<h1 align=center><font size = 5>Location Analysis of Gyms in Munich</font></h1>

<h2 align=center><font size = 3>Finding the best location to open a new gym</font></h2>

### Introduction
In this project I analyze location data for the city of Munich, Germany, particularly the distribution of gyms around the city. The analysis is aimed at investors and business owners looking to open a gym in the city. I collect location data for existing gyms and population data for every borough of Munich, so that I can pinpoint one or more spots where gym density is low and population density is relatively high. The final product is a report, supported by these datasets, that argues for investing in a new gym at a specific location in Munich.

### Data
The most important factor in deciding where to place the new gym is the location of the gyms that already exist in the city. To show those gyms on a map we need their coordinates, which we can get from the Foursquare result set by querying only on the venue type Gym.

Two main datasets are used in the project:
* The first is the gym locations dataset retrieved from the Foursquare platform. The dataframe consists of three main columns: the gym name, its latitude and its longitude.
* The second describes the administrative units of Munich. The dataset comes from Wikipedia and lists the Munich boroughs with their population, area and population density.

### Methodology
The main approach used to meet the project objectives is data visualisation. The collected data are plotted on a map using markers that encode two borough features at a time (area and population density). A candidate location is chosen by visual inspection of the map and then investigated further with a more specific query.<br>
The Foursquare platform is used as the location data provider.<br>
No machine learning model is used.


<b> Import the dependencies.</b>

In [115]:
import pandas as pd
import numpy as np
import geopy
from geopy.geocoders import Nominatim
import folium
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from colour import Color
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0

#### Scrape the Munich Boroughs dataset from Wikipedia
After reading it into a pandas DataFrame, we drop the unnecessary columns and the last row, which holds the totals for the whole city. Finally, we translate the column names from German to English.

In [116]:
link = "https://de.wikipedia.org/wiki/Stadtbezirke_M%C3%BCnchens"

df = pd.read_html(link, header=0, thousands='.', decimal=',')[0]
df.drop(df.tail(1).index,inplace=True)
df.drop(df.columns[[0,5]], axis=1, inplace=True)
df.columns = ['Borough','Area(Km2)','Population','Population_Density(ppl/Km2)']
df.set_index('Borough', inplace=True)
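
The `thousands='.'` and `decimal=','` arguments are needed because the German Wikipedia table uses German number formatting. The sketch below illustrates the conversion with a hypothetical helper, `parse_german_number`, which is not part of pandas and is only meant to show what the two arguments are doing. The input strings are assumed examples of how the values would be written in German formatting.

```python
def parse_german_number(text):
    """Convert a German-formatted number string to int or float.

    Hypothetical helper for illustration only: '.' is treated as the
    thousands separator and ',' as the decimal separator.
    """
    cleaned = text.strip().replace('.', '').replace(',', '.')
    return float(cleaned) if '.' in cleaned else int(cleaned)

area = parse_german_number('3,15')          # area in km2
population = parse_german_number('21.100')  # number of residents
print(area, population)
```

Without these two arguments, a value such as `21.100` could be read as the decimal number 21.1 instead of the integer 21100.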

This is what the dataset looks like after cleaning.

In [117]:
df.head()

| Borough | Area(Km2) | Population | Population_Density(ppl/Km2) |
|---|---|---|---|
| Altstadt-Lehel | 3.15 | 21100 | 6708 |
| Ludwigsvorstadt-Isarvorstadt | 4.4 | 51644 | 11734 |
| Maxvorstadt | 4.3 | 51402 | 11960 |
| Schwabing-West | 4.36 | 68527 | 15706 |
| Au-Haidhausen | 4.22 | 61356 | 14541 |


<b> Define Foursquare credentials and version.</b>

In [118]:
CLIENT_ID = 'L2LSLZQ3SCCTW15ONHLNUXHSK3IKWBSOZNHCGUHKXKETGYSB' # your Foursquare ID
CLIENT_SECRET = 'N1AUST4TYU4LNFPTXWF5RXVPEVYMVO0GZPZMH54ZEBPMDFOF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

Define a function that gets the coordinates of a given address using the geopy library.

In [119]:
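def get_coordinates(address):
    # Nominatim requires a user_agent string identifying the application
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    if location is None:
        # geocode returns None when the address cannot be resolved
        raise ValueError('No geocoding result for: {}'.format(address))
    return (location.latitude, location.longitude)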

Get the coordinates of Munich.

In [120]:
munich_latitude, munich_longitude = get_coordinates('Munich')

Create a folium map centered on the city center.

In [121]:
map_munich = folium.Map(tiles="Stamen Toner", location=[munich_latitude, munich_longitude], zoom_start=11)
map_munich

We add two empty columns to the dataset, namely Latitude and Longitude. Then we get the coordinates of each borough of the city and fill the two new columns with that information.

In [122]:
# Initialise the coordinate columns with one zero per borough
df['Latitude'] = np.zeros(len(df))
df['Longitude'] = np.zeros(len(df))
for borough in df.index:
    latitude, longitude = get_coordinates('{}, Munich'.format(borough))
    df.loc[borough,'Latitude'] = latitude
    df.loc[borough,'Longitude'] = longitude

The dataset after adding coordinates for each row.

In [123]:
df.head()

| Borough | Area(Km2) | Population | Population_Density(ppl/Km2) | Latitude | Longitude |
|---|---|---|---|---|---|
| Altstadt-Lehel | 3.15 | 21100 | 6708 | 48.137828 | 11.574582 |
| Ludwigsvorstadt-Isarvorstadt | 4.4 | 51644 | 11734 | 48.131771 | 11.555809 |
| Maxvorstadt | 4.3 | 51402 | 11960 | 48.149555 | 11.567753 |
| Schwabing-West | 4.36 | 68527 | 15706 | 48.168271 | 11.569873 |
| Au-Haidhausen | 4.22 | 61356 | 14541 | 48.128753 | 11.590536 |


Next we define a color scale using the colour library. We will use the color scale to represent the population density of each borough on the map.<br>
We sort the dataframe by the population density column. <br>
After that we add the color column to the dataset, assigning bright colors to less densely populated boroughs and darker colors to densely populated boroughs. For good visibility, the brightest color used here is yellow and the darkest is red, so the scale runs from yellow to red as density increases.

In [124]:
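sorted_df = df.sort_values(by='Population_Density(ppl/Km2)')
# One color per borough, from yellow (least dense) to red (most dense).
# Named color_scale so that it does not shadow the matplotlib.colors import.
color_scale = list(Color("yellow").range_to(Color("red"), len(sorted_df)))
sorted_df['color'] = color_scale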

In [125]:
sorted_df.head()

| Borough | Area(Km2) | Population | Population_Density(ppl/Km2) | Latitude | Longitude | color |
|---|---|---|---|---|---|---|
| Aubing-Lochhausen-Langwied | 34.06 | 47813 | 1404 | 48.158437 | 11.414066 | yellow |
| Feldmoching-Hasenbergl | 28.94 | 61774 | 2135 | 48.213804 | 11.541275 | #fff400 |
| Allach-Untermenzing | 15.45 | 33355 | 2159 | 48.195994 | 11.457013 | #ffea00 |
| Schwabing-Freimann | 25.67 | 77936 | 3036 | 48.170089 | 11.588486 | #ffdf00 |
| Trudering-Riem | 22.45 | 73206 | 3261 | 48.126036 | 11.663338 | #ffd400 |
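
To make the rank-to-color mapping concrete, here is a minimal sketch of a yellow-to-red ramp built with plain linear RGB interpolation. The function name `yellow_to_red` is hypothetical, and the code is an illustrative stand-in rather than the colour library's own implementation. Only the green channel changes, dropping from 255 to 0 while red stays at 255 and blue at 0.

```python
def yellow_to_red(n):
    """Return n hex colors running from yellow (#ffff00) to red (#ff0000).

    Illustrative linear RGB interpolation; not the colour library's code.
    """
    if n < 2:
        raise ValueError('need at least two colors')
    ramp = []
    for k in range(n):
        # Green fades linearly from 255 (yellow) down to 0 (red)
        green = round(255 * (n - 1 - k) / (n - 1))
        ramp.append('#ff{:02x}00'.format(green))
    return ramp

ramp = yellow_to_red(25)  # one color per borough
print(ramp[0], ramp[-1])
```

Because the dataframe is sorted by population density before the colors are assigned, the position of a borough in the sorted order, not its absolute density, determines its color.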


The first step in creating the second dataset, on gym locations, is defining the URL used to query location data from Foursquare.<br>
We set the results limit to 200, although the result set is much smaller than that. We set the radius of the search query to 10000 meters, since that is roughly the distance from the city center to the city borders.<br>
To search by venue type in Foursquare we need the category ID for gyms, which we find in the Foursquare documentation.<br>
After querying Foursquare, we parse the response as JSON.

In [74]:
LIMIT = 200 
radius = 10000
gym_ID = '4bf58dd8d48988d176941735'

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    munich_latitude, 
    munich_longitude, 
    radius, 
    LIMIT,
    gym_ID)

results = requests.get(url).json()
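
The same query string can also be assembled from a dictionary of parameters, which may be easier to read and reuse for the second query later on. The sketch below is one possible way to do that with the standard library. The function name `build_search_url` and the placeholder credentials are assumptions for illustration, and the example coordinates are the Altstadt-Lehel borough center from the table above, not the geocoded city center.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id):
    """Build a venue search URL from named parameters (illustrative helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,                # search radius in meters
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-escapes special characters such as the comma in 'll'
    return SEARCH_ENDPOINT + '?' + urlencode(params)

url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605',
                       48.137828, 11.574582, 10000, 200,
                       '4bf58dd8d48988d176941735')
print(url)
```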

This is what the query result looks like.

In [77]:
results['response']['venues']

[{'id': '5cfc926060d11b002c1ca397',
  'name': 'FitX Fitnessstudio',
  'location': {'address': 'Lenbachplatz 4',
   'lat': 48.1415585,
   'lng': 11.5683978,
   'labeledLatLngs': [{'label': 'display',
     'lat': 48.1415585,
     'lng': 11.5683978}],
   'distance': 717,
   'postalCode': '80333',
   'cc': 'DE',
   'city': 'München',
   'state': 'Bayern',
   'country': 'Deutschland',
   'formattedAddress': ['Lenbachplatz 4', '80333 München', 'Deutschland']},
  'categories': [{'id': '4bf58dd8d48988d175941735',
    'name': 'Gym / Fitness Center',
    'pluralName': 'Gyms or Fitness Centers',
    'shortName': 'Gym / Fitness',
    'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/building/gym_',
     'suffix': '.png'},
    'primary': True}],
  'referralId': 'v-1588185288',
  'hasPerk': False},
 {'id': '515600f245b09b48ecc544e5',
  'name': 'CrossFit eo',
  'location': {'address': 'Weltenburger Str. 6',
   'lat': 48.140814371456734,
   'lng': 11.62784409553123,
   'labeledLatLngs': [{'la

We flatten the data, select the necessary columns and put them into a dataframe.

In [94]:
flat = json_normalize(results['response']['venues'])
filtered_columns = ['name', 'location.lat', 'location.lng']
munich_gyms = flat.loc[:, filtered_columns]
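
The dotted column names such as `location.lat` come from the flattening step, which joins nested dictionary keys with a dot. The following sketch shows the idea on a shortened copy of the first venue from the query result. The `flatten` function is a hypothetical stand-in written for illustration and is not how pandas implements `json_normalize`.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Shortened copy of the first venue in the query result
venue = {
    'name': 'FitX Fitnessstudio',
    'location': {'address': 'Lenbachplatz 4',
                 'lat': 48.1415585,
                 'lng': 11.5683978},
}
row = flatten(venue)
print(row['name'], row['location.lat'], row['location.lng'])
```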

This is the final look of the gym location dataset after cleaning.

In [126]:
munich_gyms.head()

|  | name | location.lat | location.lng |
|---|---|---|---|
| 0 | FitX Fitnessstudio | 48.141559 | 11.568398 |
| 1 | CrossFit eo | 48.140814 | 11.627844 |
| 2 | Terry's Original House of Pain | 48.146224 | 11.582714 |
| 3 | BodyStreet | 48.127535 | 11.557053 |
| 4 | BODY STREET \| München Giesing \| EMS Training | 48.113541 | 11.57818 |


The query returns 43 gyms around the city.

In [127]:
munich_gyms.shape[0]

43

<b>This is the concluding step in visualizing the location data on the map.</b><br>
We add markers on the map to show all the information that we have collected so far:
* We show the boroughs as large circle markers. The size of the marker indicates the area of the borough: the larger the circle, the larger the area. The color of the circle marker indicates the population density, as explained in the previous section.
* We show the gym locations as small blue circle markers. Each marker represents a gym. <br>
<b> We are looking for places in the city where population density is high and gym density is very low. In other words, we are interested in boroughs with a dark color and no blue spots around them, which means the neighborhood is densely populated and has no gyms nearby.</b>

In [128]:
# Gym markers: one small blue circle per gym
for lat, lng, gym_name in zip(munich_gyms['location.lat'], munich_gyms['location.lng'], munich_gyms['name']):
    label = '{}'.format(gym_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',  # the borough color variable is not defined yet at this point
        fill_opacity=0.9,
        parse_html=False).add_to(map_munich) 
for lat, lng, borough, area, color in zip(sorted_df['Latitude'], sorted_df['Longitude'], sorted_df.index, sorted_df['Area(Km2)'], sorted_df['color']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=area,
        popup=label,
        color=color.hex,
        fill=True,
        fill_color=color.hex,
        fill_opacity=0.9,
        parse_html=False).add_to(map_munich) 

In [129]:
map_munich

After inspecting the map visually for a while, we discover an area near an orange circle around which there are no blue spots. We click on the marker and the label with the borough name comes up. Its name is "Schwabing-West". <br>

To verify this first visual impression, we inspect the area further. We query the Foursquare platform again for gyms around the borough that we just found.

In [130]:
LIMIT = 100 
radius = 750
gym_ID = '4bf58dd8d48988d176941735'

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    df.loc['Schwabing-West','Latitude'], 
    df.loc['Schwabing-West','Longitude'], 
    radius, 
    LIMIT,
    gym_ID)

subset_results = requests.get(url).json()

The radius was set to 500 meters at first, and after a few trials we increased it to 750 meters. At this distance from the borough center we still get an empty result set from Foursquare, which means that Foursquare lists no gyms in that area of the city.

In [131]:
flat = json_normalize(subset_results['response']['venues'])
flat.shape

(0, 0)

In [132]:
subset_results['response']

{'venues': []}
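
The empty result can also be cross-checked locally, without another API call, by measuring the distance from the borough center to each gym that is already in the dataset. The sketch below does this with the haversine formula for the five gyms shown in the table earlier. The function names `haversine_m` and `gyms_within` are hypothetical, and the sample is only the head of the dataset, so this is a partial check rather than a replacement for the query over all 43 gyms.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def gyms_within(center, gyms, radius_m):
    """Return the names of gyms no farther than radius_m from center."""
    return [name for name, lat, lng in gyms
            if haversine_m(center[0], center[1], lat, lng) <= radius_m]

schwabing_west = (48.168271, 11.569873)  # borough coordinates from the table
sample_gyms = [  # first five rows of the gym dataset
    ('FitX Fitnessstudio', 48.141559, 11.568398),
    ('CrossFit eo', 48.140814, 11.627844),
    ("Terry's Original House of Pain", 48.146224, 11.582714),
    ('BodyStreet', 48.127535, 11.557053),
    ('BODY STREET | München Giesing | EMS Training', 48.113541, 11.57818),
]
nearby = gyms_within(schwabing_west, sample_gyms, 750)
print(len(nearby))
```

Applying the same function to every row of `munich_gyms` and every borough center would give a gym count per borough, which is one way to screen the rest of the city.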

### Discussion
Further development can be done in these directions:
* Detailed inspection in other parts of the city to discover more business opportunities for new gyms. 
* Instead of showing the population data of each borough as a colored circle at the borough center, the whole borough could be filled with color. That would require finding the exact borders of the boroughs and creating a polygon for each one, which would then be filled with the chosen color.

### Conclusion

In summary, this project has achieved its objectives. We end with a location and a specific radius within which no gyms are currently listed. <br>
<b> In the borough "Schwabing-West" there is a circle centered on the borough coordinates (48.168271 N, 11.569873 E) with a radius of 750 meters in which, according to Foursquare, no gyms are currently operating. </b> <br>
This result will be presented to the stakeholders of the project, and their feedback would be appreciated for further improvement of the methodology.

Thank you!