### Introduction / Business Plan

Very soon I will have to make an important life decision: choosing which city I am going to live in. I speak French, English and German, which gives me a lot of potential choices.

I don't have a particular reason to choose any city or country. I just want to live in a big and dynamic city. To me, this can be measured by the density of bars and restaurants in the city.

Therefore I want to build my project around this idea. Given the top 5 cities I want to live in (Paris, Berlin, Brussels, Amsterdam and Zurich), which one has the highest density of bars and activities? The density can be calculated as the number of activities divided by the area of the city.

### Data

For this purpose I will use the Foursquare API to count the venues that match the types of activity I like. These could be the following categories: ['bar','pub','restaurant','bier bar']. I plan to do some exploratory analysis to add other activities that I like to this list.

Once I have this list, I will use the explore venues function of the Foursquare API to gather the number of venues that fit these categories in each city. I will then divide this count by the area of the city. That way, I will have an idea of the city I might like the most!

### Methodology

Let's first import all the relevant libraries and packages.

In [69]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Let's store our credentials to call the Foursquare API.

In [77]:
CLIENT_ID = 'E15VXX3FORWXIZ0F24ZJ4T2MRQEUC3WYOJ0F2ZMLMHBH0NWZ' # your Foursquare ID
CLIENT_SECRET = 'Z325HYRMPHVGNPZP3J3T4FZ5VGWPRSAQ4KC1ZNSM0FYPNGK3' # your Foursquare Secret
ACCESS_TOKEN = '4JSO4UGPHGKXVJXRTALTFAHCKDSMHU410P4F22VAP2KEGQSH' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: E15VXX3FORWXIZ0F24ZJ4T2MRQEUC3WYOJ0F2ZMLMHBH0NWZ
CLIENT_SECRET:Z325HYRMPHVGNPZP3J3T4FZ5VGWPRSAQ4KC1ZNSM0FYPNGK3
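
Keys written directly in a notebook travel with it whenever it is shared. As a minimal sketch, assuming the keys have been exported as environment variables beforehand (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are my own illustration, not something Foursquare prescribes), they could be loaded and printed in masked form like this:

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare keys from environment variables.

    The variable names are hypothetical; pick whatever fits your setup.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters so the key can be checked, not copied."""
    return secret[:visible] + "*" * (len(secret) - visible)
```

With this in place, `CLIENT_ID, CLIENT_SECRET = load_credentials()` would replace the hard-coded assignments, and `print('CLIENT_ID: ' + mask(CLIENT_ID))` would confirm that a key was loaded without displaying it in full.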


We keep a dictionary of the latitudes and longitudes of the cities we are interested in, along with the search radius we want for each city. The radius is calculated from the city's area using the well-known formula A = pi * r^2, that is r = sqrt(A / pi).

For example, Paris has an area of 105 square kilometers, so the radius we take is sqrt(105 / 3.14) ≈ 5.78 km.

Therefore let's store the area in a dictionary (in square meters, which is why the numbers seem big) and then we can calculate the radius when we pass the parameter.
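
The radius calculation can be sketched as a small helper. This is only an illustration: the function name `search_radius_m` is mine, and it uses `math.pi` rather than the rounded 3.14, so the values differ very slightly from the ones the notebook cells compute.

```python
import math


def search_radius_m(area_m2):
    """Radius in meters of a circle with the same area as the city."""
    return math.sqrt(area_m2 / math.pi)


# Areas in square meters, as used in this notebook.
area = {'Paris': 105_000_000, 'Berlin': 891_000_000, 'Amsterdam': 219_000_000,
        'Brussels': 32_000_000, 'Zürich': 88_000_000}

radius = {city: search_radius_m(a) for city, a in area.items()}
for city, r in radius.items():
    print(f"{city}: {r / 1000:.2f} km")
```

For Paris this gives about 5.78 km, in line with the hand calculation above.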

In [78]:
cities = {'Paris':'48.8566,2.3522','Berlin':'52.5200,13.4050','Amsterdam':'52.3676,4.9041','Brussels':'50.8503,4.3517','Zürich':'47.3769,8.5417'}
area = {'Paris':105000000,'Berlin':891000000,'Amsterdam':219000000,'Brussels':32000000,'Zürich':88000000 }

In [85]:
# I know Paris very well so let's start with it. I'm going to search for bars
city_to_analyse = 'Paris'

import math as m
import json, requests
url = 'https://api.foursquare.com/v2/venues/explore'

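params = dict(
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET,
    v='20180323',
    ll=cities[city_to_analyse],
    query='bars',
    limit=1,  # one venue is enough: we only need 'totalResults'
    radius=m.sqrt(area[city_to_analyse] / 3.14),  # radius in meters of a circle with the city's area
)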
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
data

{'meta': {'code': 200, 'requestId': '60a0be413c06365b45503db5'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Paris',
  'headerFullLocation': 'Paris',
  'headerLocationGranularity': 'city',
  'query': 'bars',
  'totalResults': 346,
  'suggestedBounds': {'ne': {'lat': 48.90863805203805,
    'lng': 2.4311440197432965},
   'sw': {'lat': 48.80456194796195, 'lng': 2.273255980256703}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4fd21c85e4b08315f264bdd5',
       'name': 'Le Dernier Bar avant la Fin du Monde',
       'contact': {},
       'location': {'address': '19 avenue Victoria',
        'lat': 48.857970532781664,
        'lng': 2.346152365207672,
        'labeledLatLngs': [{'l

We can see that there is a field called 'totalResults', which is exactly what we need to count the number of bars in a given city. By changing the query from 'bars' to 'restaurants' we can also find the number of restaurants. Let's therefore run these 2 queries and add their results to get the number of activities in a given city.
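
Reading that field can be wrapped in a small helper that also checks the status code reported in 'meta'. This is a sketch under my own assumptions: the function name `total_results` is hypothetical, and the sample dictionary is a trimmed copy of the response printed above.

```python
def total_results(payload):
    """Return the 'totalResults' count from a parsed explore response.

    Raises ValueError if the response reports a status code other than 200.
    """
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise ValueError(f"Foursquare request failed with code {code}")
    return payload['response'].get('totalResults', 0)


# Trimmed copy of the response shown above.
sample = {'meta': {'code': 200, 'requestId': '60a0be413c06365b45503db5'},
          'response': {'query': 'bars', 'totalResults': 346}}

print(total_results(sample))  # 346
```

Checking the code first means a failed request raises a clear error instead of a `KeyError` on a missing 'response' field.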

In [88]:
city_activities = {'Paris':0,'Berlin':0,'Amsterdam':0,'Brussels':0,'Zürich':0}

for city in city_activities:
    
    url = 'https://api.foursquare.com/v2/venues/explore'
    
    # Iterate over the queries we want to search; extend this list to count more kinds of venues
    for search in ['bars','restaurants']:
        params = dict(
            client_id=CLIENT_ID,
            client_secret=CLIENT_SECRET,
            v='20180323',
            ll=cities[city],
            query=search,
            limit=1,
            radius=m.sqrt(area[city] / 3.14),
        )
        resp = requests.get(url=url, params=params)
        data = json.loads(resp.text)
        city_activities[city] += data['response']['totalResults']
    
city_activities

{'Paris': 593, 'Berlin': 520, 'Amsterdam': 559, 'Brussels': 518, 'Zürich': 451}

Now we just have to calculate the density of activities per city and select the city with the highest density!

In [90]:
selected_city = ''
highest_density = 0

for city in cities:
    # Divide the area by 1 million to convert square meters to square kilometers
    density = city_activities[city] / (area[city] / 1000000)
    if density > highest_density:
        highest_density = density
        selected_city = city

print('The city with the highest density of activities is: ' + selected_city + ' with ' + str(highest_density) + ' activities per square kilometer!')

The city with the highest density of activities is: Brussels with 16.1875 activities per square kilometer!
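
To see how the other cities compare, the same calculation can be written as a ranking of all five. This is a sketch that reuses the counts and areas from the cells above; the names `M2_PER_KM2`, `density` and `ranking` are my own.

```python
# Values copied from the cells above.
city_activities = {'Paris': 593, 'Berlin': 520, 'Amsterdam': 559,
                   'Brussels': 518, 'Zürich': 451}
area = {'Paris': 105_000_000, 'Berlin': 891_000_000, 'Amsterdam': 219_000_000,
        'Brussels': 32_000_000, 'Zürich': 88_000_000}

M2_PER_KM2 = 1_000_000  # square meters in one square kilometer

# Activities per square kilometer for every city.
density = {city: city_activities[city] / (area[city] / M2_PER_KM2)
           for city in city_activities}

# Sort from the densest city to the least dense one.
ranking = sorted(density.items(), key=lambda item: item[1], reverse=True)
for city, d in ranking:
    print(f"{city}: {d:.2f} activities per km²")
```

Brussels comes first, followed by Paris, Zürich, Amsterdam and Berlin.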


### Results

The results of our analysis show that Brussels has the highest density of the activities I am interested in. I have visited all 5 cities studied here, and this makes sense: Brussels is very small but is one of the world's beer capitals, with bars everywhere.

Also, even though Belgium is one of the smallest countries in Europe, it is one of the leading exporters of beer by volume, with 1.9 billion liters exported per year. That is about the same as Denmark, the first exporter in Europe.

Overall the result was what I expected.

### Discussion

What is interesting about this short and simple code is how easily it can be tuned to your liking. For example, if you want to search for museums instead of bars and restaurants, you can just change the list in the code and it will return what you are looking for.
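
One way to make that tuning explicit is to pass the list of queries as a parameter. The sketch below is hypothetical: `count_activities` and `fake_fetch` are names I introduce for illustration, and the counts returned by `fake_fetch` are made up so the example runs without calling the API.

```python
import math


def count_activities(ll, area_m2, queries, fetch):
    """Sum 'totalResults' over several search terms for one city.

    `fetch` is any callable that takes the query parameters and returns the
    parsed JSON response; injecting it keeps this sketch runnable offline.
    """
    total = 0
    for query in queries:
        params = {
            'll': ll,
            'query': query,
            'limit': 1,
            'radius': math.sqrt(area_m2 / 3.14),  # same approximation as the notebook
        }
        total += fetch(params)['response']['totalResults']
    return total


def fake_fetch(params):
    """Stand-in for the real API call, returning made-up counts."""
    made_up = {'museums': 40, 'parks': 25}
    return {'response': {'totalResults': made_up[params['query']]}}


print(count_activities('48.8566,2.3522', 105_000_000, ['museums', 'parks'], fake_fetch))  # 65
```

In the notebook, `fetch` would instead wrap `requests.get` on the explore URL, adding `client_id`, `client_secret` and `v` to the parameters before sending the request.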

There is room for improvement though. For example, I had to look up by hand the area of each city I am interested in. This didn't take long, but it's not really scalable. It would be nice to add a feature where you just enter the names of the cities you are interested in and the code scrapes Wikipedia to find the area of each city. This would make the code simpler and more practical to use.
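
The scraping itself is left for later, but one piece of it can already be sketched: turning a scraped area string into the square meters used in the `area` dictionary. This assumes the text looks like '105.4 km2 (40.7 sq mi)', with a decimal point and commas as thousands separators; the function name `parse_area_m2` and the example strings are illustrative only.

```python
import re

# One or more digits, dots or commas, followed by 'km' (as in 'km2' or 'km²').
_AREA_KM2 = re.compile(r'([\d.,]+)\s*km')


def parse_area_m2(text):
    """Convert text such as '105.4 km2 (40.7 sq mi)' into square meters.

    Assumes a decimal point and comma thousands separators.
    """
    match = _AREA_KM2.search(text)
    if match is None:
        raise ValueError(f"No area in km² found in {text!r}")
    km2 = float(match.group(1).replace(',', ''))
    return km2 * 1_000_000


print(parse_area_m2('105.4 km2 (40.7 sq mi)'))
```

A real scraper would still need to locate the right field on each page and handle pages that report the area in a different format.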

### Conclusion

This report was very interesting to do! I liked the Foursquare API; it really shows what is possible to automate using the world of APIs.

I want to dive deeper into this subject and try to analyse other APIs, for example to gather data about world trends using Twitter or Reddit.

Overall, it was a great way to start this journey in Data Science!