# This notebook will be used for the capstone project

### Objective
This notebook is used to scrape venues in Zurich, Switzerland and determine where we should open a gym or yoga studio. 

### Neighbourhood data
All neighbourhood data is pulled from Wikipedia. The Wikipedia table is awkward to parse in this case (boroughs and neighbourhoods are nested in the same table), so its rows and columns need individual clean-up before they can be loaded into a dataframe.

### Statistical data of Zurich and neighbourhood borders
Further inhabitant data is collected from the statistical bureau of the city of Zurich. 
The statistical data of the city of Zurich also includes polygon data for the boundaries of the individual neighbourhoods. This allows us to draw neighbourhood borders and centers (computed from the border polygons) on the map.

### Collecting gym and yoga studio data
Next, gym and yoga venue data is collected for Zurich. Thanks to the polygon data, we can check if the individual venues are actually located within a given neighbourhood. We can thereby create a very clean dataframe of the neighbourhood locations of each venue. 

### Final assessment
Finally, we can use our statistical inhabitant data to determine in which neighbourhood we have a lot of gyms (people exercising), but not a lot of yoga studios (particular form of exercising). This will help someone to open their yoga studio in a location with demand for physical activity. 

We can also determine neighbourhoods that do not have a lot of gyms or yoga studios per inhabitant. So a new owner of a venue should investigate these neighbourhoods first to check the viability of opening a venue there.

### Import libraries

In [1]:
import pandas as pd
import numpy as np
import re
import requests # library to handle requests
from bs4 import BeautifulSoup
from pandas import json_normalize # transform JSON into a flat pandas dataframe (pandas >= 1.0)

print('Investigating the neighbourhoods of Zurich!')

Investigating the neighbourhoods of Zurich!


### Fetch Zurich neighbourhood table from Wiki

In [2]:
## URL of table
zh_url='https://de.wikipedia.org/wiki/Stadtteile_der_Stadt_Zürich'

### get page content and parse
zh_page = requests.get(zh_url).text
zh_parsed = BeautifulSoup(zh_page,'xml')

### find all tables and table rows
zh_table = zh_parsed.find_all('table',{'class':'wikitable sortable'})[1]
zh_table_rows = zh_table.find_all('tr')

### Extract website information from scraped data

In [3]:
data = []
### read our rows from data
for row in zh_table_rows:
    data.append([t.text.strip() for t in row.find_all('td')])

### omit first and last row
data = data[1:-1]
addition = []

## Pre-processing
### Deal with nested table on wiki

In [4]:
### clean nested table rows
for quartier in data:

    ### check if borough is in row, remember for next row
    if quartier[0].startswith('Kreis'):
        ### add spaces around the Kreis number
        quartier[0] = re.sub(r"([0-9]+(\.[0-9]+)?)",r" \1 ", quartier[0]).strip()
        addition = quartier[0]
    else:
        ### row belongs to the borough seen last: prepend its name
        quartier.insert(0, addition)
    
    ### remove entry of sigil
    if quartier[1] == '':
        del quartier[1]
        
    ### remove the prefix up to the first '!' (presumably a hidden sort key)
    for j, ele in enumerate(quartier):
        quartier[j] = re.sub(r'^.*?!', '', ele)
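
To make the two regular expressions above concrete, here is a small stand-alone sketch. The input strings are hypothetical stand-ins for scraped cell text (the raw Wikipedia cells are not shown in this notebook), and the extra whitespace collapse in `space_around_number` is an addition for illustration, not part of the cell above.

```python
import re

def strip_sort_key(cell):
    """Drop everything up to and including the first '!' in a cell."""
    return re.sub(r'^.*?!', '', cell)

def space_around_number(label):
    """Put spaces around the number in a 'Kreis' label, then tidy whitespace."""
    spaced = re.sub(r"([0-9]+(\.[0-9]+)?)", r" \1 ", label)
    # collapse repeated whitespace that the substitution may introduce
    return re.sub(r"\s+", " ", spaced).strip()

# hypothetical raw cell values, for illustration only
print(strip_sort_key('0003267!3267'))
print(strip_sort_key('Rathaus'))
print(space_around_number('Kreis 1Altstadt'))
print(space_around_number('Kreis 2'))
```

Cells without a `!` pass through unchanged, which is why the same substitution can safely be applied to every column.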

### Read data into dataframe and assign columns

In [5]:
### assign columns to table entries
zh_data = pd.DataFrame(data, columns=['Kreis', 'Quartier', 'BFS-Code', 'Included', 'Area in km²', 'Inhabitants (2018)', 'Inhabitants (2013)', 
 'Inhabitants (2005)', 'Immigrants'])

### remove the thousands separator (') from the inhabitant counts
for col in ['Inhabitants (2018)', 'Inhabitants (2013)', 'Inhabitants (2005)']:
    zh_data[col] = zh_data[col].str.replace("'", '', regex=False)
    
zh_data.head(34)

Unnamed: 0,Kreis,Quartier,BFS-Code,Included,Area in km²,Inhabitants (2018),Inhabitants (2013),Inhabitants (2005),Immigrants
0,Kreis 1 Altstadt,Rathaus,261011,vor 1893,0.38,3267,3194,3081,"30,1 %"
1,Kreis 1 Altstadt,Hochschulen,261012,vor 1893,0.56,664,665,695,"34,3 %"
2,Kreis 1 Altstadt,Lindenhof,261013,vor 1893,0.23,990,923,950,"30,1 %"
3,Kreis 1 Altstadt,City,261014,vor 1893,0.64,829,783,846,"30,0 %"
4,Kreis 2,Wollishofen,261021,1893,5.75,18923,15937,15592,"29,1 %"
5,Kreis 2,Leimbach,261023,1893,2.92,6320,5730,4867,"33,6 %"
6,Kreis 2,Enge,261024,1893,2.4,9634,8836,8375,"36,7 %"
7,Kreis 3 Wiedikon,Alt-Wiedikon,261031,1893,1.85,17956,16706,14971,"34,8 %"
8,Kreis 3 Wiedikon,Friesenberg,261033,1893,5.15,10933,10696,10360,"18,3 %"
9,Kreis 3 Wiedikon,Sihlfeld,261034,1893,1.64,21680,20931,20554,"31,2 %"
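
The cleaned columns are still strings at this point. The following is a minimal sketch, assuming the values look like the rows shown above, of how they could be converted to numeric types. The column name `Immigrants (%)` is a hypothetical choice for illustration.

```python
import pandas as pd

# two rows copied from the table above, still stored as strings
sample = pd.DataFrame({
    'Quartier': ['Rathaus', 'Wollishofen'],
    'Inhabitants (2018)': ['3267', '18923'],
    'Immigrants': ['30,1 %', '29,1 %'],
})

# whole-number counts: parse the cleaned strings as integers
sample['Inhabitants (2018)'] = pd.to_numeric(sample['Inhabitants (2018)'])

# percentages: drop the '%' sign and swap the decimal comma for a point
sample['Immigrants (%)'] = (
    sample['Immigrants']
    .str.replace('%', '', regex=False)
    .str.replace(',', '.', regex=False)
    .str.strip()
    .astype(float)
)

print(sample.dtypes)
```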


## Plotting map
### Load libraries for mapping

In [6]:
from geopy.geocoders import Nominatim

import matplotlib.cm as cm
import matplotlib.colors as colors

import geopandas as gpd
from shapely.geometry import shape
import geojson

import folium



### Find Lat/Lon data of Zurich

In [7]:
address = 'ZURICH'

geolocator = Nominatim(user_agent="zurich_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The lati & long coordinate of Zurich are {}, {}.'.format(latitude, longitude))

The lati & long coordinate of Zurich are 47.3744489, 8.5410422.


### Load map of Zurich

In [8]:
zh_map = folium.Map(location=[latitude, longitude], zoom_start=12)
# zh_map

### Load polygons of neighbourhoods 

In [9]:
### load polygon data for quartiere/neighbourhoods
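### note: names with umlauts come back mis-encoded in the output below ('MÃ¼hlebach'),
### so they will not match the Wikipedia names in a later merge;
### passing encoding='utf-8' to read_file may fix this
zh_geo = gpd.read_file('stzh.adm_statistische_quartiere_map.shp')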
### rename borough/kreis and neighbourhood/quartier
zh_geo.rename(columns = {'qname':'Quartier','kname':'Kreis'}, inplace = True) 
### convert to lat/lon
gjson = zh_geo.to_crs(epsg=4326).to_json()
### add polygons to map
folium.GeoJson(gjson).add_to(zh_map)
zh_map
zh_geo

Unnamed: 0,objectid,objid,qnr,Quartier,knr,Kreis,geometry
0,1,34,73.0,Hirslanden,7.0,Kreis 7,"POLYGON ((2684457.034 1246514.804, 2684466.315..."
1,2,33,83.0,Weinegg,8.0,Kreis 8,"POLYGON ((2684457.383 1246512.719, 2684458.291..."
2,3,32,82.0,MÃ¼hlebach,8.0,Kreis 8,"POLYGON ((2684269.913 1246566.796, 2684271.618..."
3,4,31,81.0,Seefeld,8.0,Kreis 8,"POLYGON ((2683794.254 1246609.895, 2683802.117..."
4,7,16,41.0,Werd,4.0,Kreis 4,"POLYGON ((2682651.888 1247587.653, 2682650.697..."
5,8,15,34.0,Sihlfeld,3.0,Kreis 3,"POLYGON ((2681620.924 1247666.669, 2681627.848..."
6,9,14,91.0,Albisrieden,9.0,Kreis 9,"POLYGON ((2680246.801 1248184.959, 2680242.274..."
7,10,13,72.0,Hottingen,7.0,Kreis 7,"POLYGON ((2686439.052 1249239.332, 2686493.549..."
8,5,30,21.0,Wollishofen,2.0,Kreis 2,"POLYGON ((2683464.971 1243316.936, 2683455.914..."
9,18,22,61.0,Unterstrass,6.0,Kreis 6,"POLYGON ((2682138.424 1251265.833, 2682137.008..."


### Obtain quartier/neighbourhood centroids from boundary polygons

In [10]:
### convert from the Swiss projected coordinates (LV95) to lat/lon (WGS84)
zh_poly = zh_geo.to_crs(epsg=4326)

### initialize polygon centroids (approximated as the mean of the boundary vertices)
lat_centroid = []
lon_centroid = []

### loop through quartiers/neighbourhoods
for i in range(len(zh_poly)):
    ### convert to shape object
    p = shape(zh_poly['geometry'][i])
    ### extract polygon points
    lon_polygon, lat_polygon = p.exterior.coords.xy
    
    ### initialize centroid for each polygon
    lat = 0
    lon = 0
    ### add up all points
    for j in range(len(lon_polygon)):
        lat += lat_polygon[j]
        lon += lon_polygon[j]

    ### final coordinates
    lat_centroid.append(lat/len(lat_polygon))
    lon_centroid.append(lon/len(lon_polygon))

# quartier centroid lat/lon coordinates
zh_geo['Latitude'] = lat_centroid
zh_geo['Longitude'] = lon_centroid

zh_geo.head()

Unnamed: 0,objectid,objid,qnr,Quartier,knr,Kreis,geometry,Latitude,Longitude
0,1,34,73.0,Hirslanden,7.0,Kreis 7,"POLYGON ((2684457.034 1246514.804, 2684466.315...",47.36202,8.572849
1,2,33,83.0,Weinegg,8.0,Kreis 8,"POLYGON ((2684457.383 1246512.719, 2684458.291...",47.356528,8.567511
2,3,32,82.0,MÃ¼hlebach,8.0,Kreis 8,"POLYGON ((2684269.913 1246566.796, 2684271.618...",47.358239,8.556874
3,4,31,81.0,Seefeld,8.0,Kreis 8,"POLYGON ((2683794.254 1246609.895, 2683802.117...",47.357012,8.55488
4,7,16,41.0,Werd,4.0,Kreis 4,"POLYGON ((2682651.888 1247587.653, 2682650.697...",47.372574,8.526534
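
The loop above averages the boundary vertices, which is a quick approximation rather than the area-weighted centroid: stretches of border that are digitised with many points pull the result toward them, and the closing vertex of each ring is counted twice. Below is a minimal, stdlib-only sketch of the difference, using a made-up L-shaped ring (not a real Zurich polygon). Shapely geometries also expose a `centroid` property that could be used on the polygons directly.

```python
def vertex_mean(ring):
    """Average of the ring's vertices (the approach used in the loop above)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def area_centroid(ring):
    """Area-weighted centroid of a closed ring via the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)

# hypothetical L-shaped ring; the first vertex is repeated to close it
ring = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4), (0, 0)]
print(vertex_mean(ring))
print(area_centroid(ring))
```

For compact, evenly digitised neighbourhoods the two results are close, so the approximation is usually good enough for placing a marker or centring a radius search.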


### Visualize all Zurich neighbourhood centroids

In [11]:
# add quartier markers to map
for lat, lon, neighbourhood, kreis in zip(zh_geo['Latitude'], zh_geo['Longitude'], zh_geo['Quartier'], zh_geo['Kreis']):
    label = '{} , {}'.format(kreis, neighbourhood)
    label = folium.Popup(label)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='blue',
        fill=False
    ).add_to(zh_map)  
    
zh_map

In [12]:
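### add geometry and centroid columns to the Wikipedia data
### (inner merge: neighbourhoods whose names do not match in both tables are dropped)
zh_data = zh_data.merge(zh_geo.drop(columns=['Kreis']), on='Quartier')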
zh_data.head()

Unnamed: 0,Kreis,Quartier,BFS-Code,Included,Area in km²,Inhabitants (2018),Inhabitants (2013),Inhabitants (2005),Immigrants,objectid,objid,qnr,knr,geometry,Latitude,Longitude
0,Kreis 1 Altstadt,Rathaus,261011,vor 1893,0.38,3267,3194,3081,"30,1 %",23,12,11.0,1.0,"POLYGON ((2683374.124 1246786.080, 2683374.192...",47.370203,8.543717
1,Kreis 1 Altstadt,Hochschulen,261012,vor 1893,0.56,664,665,695,"34,3 %",29,6,12.0,1.0,"POLYGON ((2683993.543 1247428.241, 2683993.817...",47.372247,8.545615
2,Kreis 1 Altstadt,Lindenhof,261013,vor 1893,0.23,990,923,950,"30,1 %",16,23,13.0,1.0,"POLYGON ((2683168.047 1246706.173, 2683187.786...",47.36993,8.541478
3,Kreis 1 Altstadt,City,261014,vor 1893,0.64,829,783,846,"30,0 %",22,17,14.0,1.0,"POLYGON ((2683325.312 1247912.255, 2683325.292...",47.37365,8.537898
4,Kreis 2,Wollishofen,261021,1893,5.75,18923,15937,15592,"29,1 %",5,30,21.0,2.0,"POLYGON ((2683464.971 1243316.936, 2683455.914...",47.339905,8.528241
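
Because the merge above is an inner merge, any neighbourhood whose name differs between the two sources disappears silently. The following sketch, built on toy frames rather than the real data, shows one way to list such mismatches with the `indicator` option of `DataFrame.merge`. The mis-encoded `MÃ¼hlebach` mimics the shapefile output shown earlier.

```python
import pandas as pd

# toy frames; 'MÃ¼hlebach' mimics the mis-encoded name in the shapefile output
stats = pd.DataFrame({'Quartier': ['Seefeld', 'Mühlebach', 'Weinegg']})
geo = pd.DataFrame({'Quartier': ['Seefeld', 'MÃ¼hlebach', 'Weinegg'],
                    'qnr': [81, 82, 83]})

# an outer merge with indicator=True labels each row by where its key was found
check = stats.merge(geo, on='Quartier', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']
print(unmatched[['Quartier', '_merge']])
```

Rows marked `left_only` or `right_only` are the ones an inner merge would drop.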


### Look at downtown Zurich

In [13]:
### kreis 1 is "downtown" Zurich
address = 'Kreis 1 ZURICH'

### locate the above address
geolocator = Nominatim(user_agent="zurich_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The lati & long coordinate of Downtown Zurich (Kreis 1) are {}, {}.'.format(latitude, longitude))

The lati & long coordinate of Downtown Zurich (Kreis 1) are 47.3722329, 8.5423291.


### Visualizing downtown/center Zurich

In [14]:
### select the neighbourhoods of Kreis 1 (the trailing space avoids matching Kreis 10, 11, 12)
zh_center = zh_data[zh_data['Kreis'].str.startswith('Kreis 1 ')].reset_index(drop=True)

# create map of downtown Zurich using latitude and longitude values
zh_map_center = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(zh_center['Latitude'], zh_center['Longitude'], zh_center['Quartier']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=False).add_to(zh_map_center)

### look at kreis 1/center only
zh_geo_center = zh_geo[zh_geo['Kreis'] == 'Kreis 1']
### convert to lat/lon
gjson_center = zh_geo_center.to_crs(epsg=4326).to_json()
### add polygons to map
folium.GeoJson(gjson_center).add_to(zh_map_center)

zh_map_center

### Load Foursquare info

In [15]:
import os

### read the Foursquare credentials from environment variables instead of hard-coding them
### (the variable names are an assumption; set them before starting the notebook)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Credentials loaded.')

Credentials loaded.


## Explore first neighbourhood as example (Quartier 1: Rathaus)

Extract name of the first neighbourhood/quartier

In [16]:
zh_data.loc[0, 'Quartier']

'Rathaus'

Set location data for first neighbourhood

In [17]:
neighborhood_latitude = zh_data.loc[0, 'Latitude']
neighborhood_longitude = zh_data.loc[0, 'Longitude'] 

neighborhood_name = zh_data.loc[0, 'Quartier'] 

print('Latitude and longitude values of Quartier {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Quartier Rathaus are 47.37020335236508, 8.54371726189494.


## Check top 100 venues of first neighbourhood

In [18]:
### number of venues
LIMIT = 100 
radius = 1000 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url 

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=47.37020335236508,8.54371726189494&radius=1000&limit=100'
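
As an alternative to the long format string, the query can be assembled from a dictionary of parameters. This is a minimal sketch using only the standard library; the helper name `build_explore_url` and the placeholder credentials are hypothetical. `requests.get` accepts the same kind of dictionary through its `params` argument and performs the encoding itself.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius, limit, client_id, client_secret, version):
    """Assemble the explore URL from a parameter dict instead of a format string."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# placeholder credentials, not real ones
url = build_explore_url(47.3702, 8.5437, 1000, 100, 'MY_ID', 'MY_SECRET', '20180605')
print(url)
```

Keeping the parameters in a dictionary also makes it easy to add an optional key such as `categoryId` without maintaining a second format string.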

Output the result found for the first neighbourhood

In [19]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '602adf9a3e9575501b5e5b93'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Kreis 1',
  'headerFullLocation': 'Kreis 1, Zürich',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 238,
  'suggestedBounds': {'ne': {'lat': 47.37920336136509,
    'lng': 8.556981329452189},
   'sw': {'lat': 47.36120334336507, 'lng': 8.530453194337692}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b463772f964a5205c1a26e3',
       'name': 'Fitnesspark Münstergasse',
       'location': {'address': 'Blaufahnenstr. 3',
        'lat': 47.37088805140556,
        'lng': 8.544999004166998,
        'labeledLatLngs': [{'label': 'display',
          'lat': 47

### Function that extracts the category of the venue


In [20]:
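def get_category_type(row):
    ### the column is called 'categories' or 'venue.categories', depending on the input
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    ### venues without any category get None
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']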

In [21]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Fitnesspark Münstergasse,Gym / Fitness Center,47.370888,8.544999
1,Café Schober,Café,47.3714,8.544149
2,Schwarzenbach Kolonialwaren,Gourmet Shop,47.371444,8.544091
3,Ban Song Thai,Thai Restaurant,47.369395,8.544136
4,barfussbar,Bar,47.368441,8.542181


In [22]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## Explore Neighbourhoods/Quartiere in Zurich


In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Scraping venues for quartier:', name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
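
The function above assumes that every response contains a `groups` list and that every venue has at least one category. The sketch below shows one way the extraction step could tolerate missing fields. The helper name `extract_venue_rows` and the payloads are hypothetical, shaped like the JSON shown earlier, with made-up values.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response into flat tuples, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # venues without a category get None instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hypothetical payload shaped like the JSON shown earlier; values are made up
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'location': {'lat': 47.37, 'lng': 8.54},
               'categories': [{'name': 'Gym'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 47.38, 'lng': 8.55},
               'categories': []}},
]}]}}

rows = extract_venue_rows('Rathaus', 47.370203, 8.543717, payload)
print(rows)

# a response without a 'response' key (for example an error) yields no rows
empty = extract_venue_rows('Rathaus', 47.370203, 8.543717, {'meta': {'code': 429}})
print(empty)
```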

In [24]:
zh_venues = getNearbyVenues(names=zh_data['Quartier'],
                                   latitudes=zh_data['Latitude'],
                                   longitudes=zh_data['Longitude']
                                  )

Scraping venues for quartier: Rathaus
Scraping venues for quartier: Hochschulen
Scraping venues for quartier: Lindenhof
Scraping venues for quartier: City
Scraping venues for quartier: Wollishofen
Scraping venues for quartier: Leimbach
Scraping venues for quartier: Enge
Scraping venues for quartier: Alt-Wiedikon
Scraping venues for quartier: Friesenberg
Scraping venues for quartier: Sihlfeld
Scraping venues for quartier: Werd
Scraping venues for quartier: Langstrasse
Scraping venues for quartier: Hard
Scraping venues for quartier: Gewerbeschule
Scraping venues for quartier: Escher Wyss
Scraping venues for quartier: Unterstrass
Scraping venues for quartier: Oberstrass
Scraping venues for quartier: Fluntern
Scraping venues for quartier: Hottingen
Scraping venues for quartier: Hirslanden
Scraping venues for quartier: Witikon
Scraping venues for quartier: Seefeld
Scraping venues for quartier: Weinegg
Scraping venues for quartier: Albisrieden
Scraping venues for quartier: Altstetten
Scrapin

In [25]:
print(zh_venues.shape)
zh_venues.head()

(2866, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rathaus,47.370203,8.543717,Fitnesspark Münstergasse,47.370888,8.544999,Gym / Fitness Center
1,Rathaus,47.370203,8.543717,Café Schober,47.3714,8.544149,Café
2,Rathaus,47.370203,8.543717,Schwarzenbach Kolonialwaren,47.371444,8.544091,Gourmet Shop
3,Rathaus,47.370203,8.543717,barfussbar,47.368441,8.542181,Bar
4,Rathaus,47.370203,8.543717,Old Crow,47.372092,8.541024,Cocktail Bar


In [26]:
zh_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Affoltern,40,40,40,40,40,40
Albisrieden,88,88,88,88,88,88
Alt-Wiedikon,100,100,100,100,100,100
Altstetten,94,94,94,94,94,94
City,100,100,100,100,100,100
Enge,100,100,100,100,100,100
Escher Wyss,100,100,100,100,100,100
Fluntern,100,100,100,100,100,100
Friesenberg,83,83,83,83,83,83
Gewerbeschule,100,100,100,100,100,100


In [27]:
print('There are {} unique categories.'.format(len(zh_venues['Venue Category'].unique())))

There are 196 unique categories.


## Look into specific category

In [28]:
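### maximum number of venues returned per request (this caps the per-neighbourhood counts at 50)
LIMIT = 50
radius = 2000 # search radius in metres (the function below defaults to the same value)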

def getNearbyVenuesCategory(names, latitudes, longitudes, foursquare_category_id, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Scraping venues for quartier:', name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            foursquare_category_id)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Extract specific categories

In [29]:
foursquare_id_yoga = '4bf58dd8d48988d102941735'
foursquare_id_gym = '4bf58dd8d48988d175941735'

zh_venues_yoga = getNearbyVenuesCategory(names=zh_data['Quartier'],
                                   latitudes=zh_data['Latitude'],
                                   longitudes=zh_data['Longitude'],
                                   foursquare_category_id=foursquare_id_yoga
                                  )

zh_venues_gym = getNearbyVenuesCategory(names=zh_data['Quartier'],
                                   latitudes=zh_data['Latitude'],
                                   longitudes=zh_data['Longitude'],
                                   foursquare_category_id=foursquare_id_gym
                                  )

zh_venues_yoga.head(100)
zh_venues_gym.head(400)

Scraping venues for quartier: Rathaus
Scraping venues for quartier: Hochschulen
Scraping venues for quartier: Lindenhof
Scraping venues for quartier: City
Scraping venues for quartier: Wollishofen
Scraping venues for quartier: Leimbach
Scraping venues for quartier: Enge
Scraping venues for quartier: Alt-Wiedikon
Scraping venues for quartier: Friesenberg
Scraping venues for quartier: Sihlfeld
Scraping venues for quartier: Werd
Scraping venues for quartier: Langstrasse
Scraping venues for quartier: Hard
Scraping venues for quartier: Gewerbeschule
Scraping venues for quartier: Escher Wyss
Scraping venues for quartier: Unterstrass
Scraping venues for quartier: Oberstrass
Scraping venues for quartier: Fluntern
Scraping venues for quartier: Hottingen
Scraping venues for quartier: Hirslanden
Scraping venues for quartier: Witikon
Scraping venues for quartier: Seefeld
Scraping venues for quartier: Weinegg
Scraping venues for quartier: Albisrieden
Scraping venues for quartier: Altstetten
Scrapin

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rathaus,47.370203,8.543717,Fitnesspark Münstergasse,47.370888,8.544999,Gym / Fitness Center
1,Rathaus,47.370203,8.543717,Indigo,47.372532,8.534018,Gym / Fitness Center
2,Rathaus,47.370203,8.543717,Migros Fitnesspark Stockerhof,47.366313,8.535020,Gym / Fitness Center
3,Rathaus,47.370203,8.543717,Balboa Bar and Gym,47.373542,8.533390,Gym
4,Rathaus,47.370203,8.543717,Luxor Sports Club,47.367976,8.537466,Gym
...,...,...,...,...,...,...,...
395,Hard,47.381404,8.512116,Gym @ Renaissance Zurich Tower,47.388064,8.514600,Gym / Fitness Center
396,Hard,47.381404,8.512116,Bikram Yoga,47.373539,8.524790,Yoga Studio
397,Hard,47.381404,8.512116,Balboa,47.386494,8.522711,Gym / Fitness Center
398,Hard,47.381404,8.512116,Sheraton Fitness,47.390394,8.510223,Gym / Fitness Center


### Check if venue is actually in quartier/neighbourhood

In [30]:
from shapely.geometry import Point

### initialize list of yoga venues that are not actually in their quartier
not_in_quartier = []

### go through all yoga venues
for i in range(len(zh_venues_yoga['Venue'])):

    ### polygon of the quartier this venue was returned for
    quartier_polygon = zh_poly[ zh_poly['Quartier'] == zh_venues_yoga.loc[i, 'Neighbourhood'] ].reset_index(drop=True)
    
    lat = zh_venues_yoga.loc[i, 'Venue Latitude']
    lon = zh_venues_yoga.loc[i, 'Venue Longitude']
    ### shapely points are (x, y), i.e. (lon, lat)
    point = Point([lon, lat])
    
    in_quartier = quartier_polygon.contains(point)

    ### store row index if venue is not in quartier
    if not in_quartier[0]:
        not_in_quartier.append(i)
        
zh_venues_yoga_checked = zh_venues_yoga.drop(not_in_quartier)

zh_venues_yoga_checked.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rathaus,47.370203,8.543717,AirYoga,47.368315,8.545928,Yoga Studio
2,Rathaus,47.370203,8.543717,athayoga,47.370736,8.545406,Yoga Studio
38,City,47.37365,8.537898,PowerQiBalance,47.373033,8.532857,Yoga Studio
49,Wollishofen,47.339905,8.528241,SAMiRAYOGA,47.344446,8.526638,Yoga Studio
54,Enge,47.358245,8.53004,Planet Yoga,47.365101,8.525203,Yoga Studio
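
The check above relies on shapely's `contains`. To illustrate the idea behind a point-in-polygon test, here is a stdlib-only ray-casting sketch on a made-up square ring. It is a simplified stand-in, not what shapely does internally, and it ignores edge cases such as points exactly on the border. The helper name `point_in_ring` is hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: count how often a ray going east from the point crosses the ring."""
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        # does this edge straddle the horizontal line through the point?
        if (y0 > lat) != (y1 > lat):
            # longitude at which the edge crosses that line
            x_cross = x0 + (lat - y0) * (x1 - x0) / (y1 - y0)
            if lon < x_cross:
                inside = not inside
    return inside

# hypothetical square "neighbourhood" in (lon, lat) order, closed ring
ring = [(8.54, 47.36), (8.55, 47.36), (8.55, 47.37), (8.54, 47.37), (8.54, 47.36)]

# made-up venues: (name, lon, lat)
venues = [('inside venue', 8.545, 47.365), ('outside venue', 8.56, 47.365)]
kept = [name for name, lon, lat in venues if point_in_ring(lon, lat, ring)]
print(kept)
```

An odd number of crossings means the point lies inside the ring. The same filter could be applied to the gym venues, which the cell above leaves unchecked.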


## Venue counts and inhabitants per venue

In [31]:
columns_to_drop = ['Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

### count venues per neighbourhood; the counts use the unfiltered 2 km radius queries,
### so a venue can be counted in several neighbourhoods
zh_venues_total_yoga = zh_venues_yoga.groupby('Neighbourhood').count().drop(columns=columns_to_drop)
zh_venues_total_gym = zh_venues_gym.groupby('Neighbourhood').count().drop(columns=columns_to_drop)

### merge yoga and gym df
zh_venues_total_comparison = zh_venues_total_yoga.merge(zh_venues_total_gym, left_on='Neighbourhood', right_on='Neighbourhood', suffixes=('_yoga', '_gym'))
### number of gyms per yoga studio
zh_venues_total_comparison['Gym/Yoga'] = zh_venues_total_comparison['Venue_gym'].div(zh_venues_total_comparison['Venue_yoga'], axis=0).round(2)
### merge in inhabitant data
zh_venues_total_comparison = zh_venues_total_comparison.merge(zh_data[['Quartier', 'Inhabitants (2018)']], left_on='Neighbourhood', right_on='Quartier').set_index('Quartier')

### convert inhabitant column type to int
zh_venues_total_comparison['Inhabitants (2018)'] = zh_venues_total_comparison['Inhabitants (2018)'].astype(int)

### inhabitants per yoga studio
zh_venues_total_comparison['pp_yoga'] = zh_venues_total_comparison['Inhabitants (2018)'].div(zh_venues_total_comparison['Venue_yoga'], axis=0).round(1)
### inhabitants per gym
zh_venues_total_comparison['pp_gym'] = zh_venues_total_comparison['Inhabitants (2018)'].div(zh_venues_total_comparison['Venue_gym'], axis=0).round(1)

zh_venues_total_comparison.head(34)


Unnamed: 0_level_0,Venue_yoga,Venue_gym,Gym/Yoga,Inhabitants (2018),pp_yoga,pp_gym
Quartier,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Affoltern,3,4,1.33,26562,8854.0,6640.5
Albisrieden,3,11,3.67,22304,7434.7,2027.6
Alt-Wiedikon,5,29,5.8,17956,3591.2,619.2
Altstetten,2,13,6.5,33461,16730.5,2573.9
City,13,47,3.62,829,63.8,17.6
Enge,9,29,3.22,9634,1070.4,332.2
Escher Wyss,6,39,6.5,6066,1011.0,155.5
Fluntern,4,17,4.25,8485,2121.2,499.1
Friesenberg,3,8,2.67,10933,3644.3,1366.6
Gewerbeschule,11,48,4.36,9513,864.8,198.2
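
As a worked example of the three derived columns, the sketch below recomputes them for two rows copied from the table above. It is a stand-alone illustration of the arithmetic, not a replacement for the cell.

```python
import pandas as pd

# counts and inhabitants copied from two rows of the table above
df = pd.DataFrame(
    {'Venue_yoga': [3, 3], 'Venue_gym': [4, 11], 'Inhabitants (2018)': [26562, 22304]},
    index=['Affoltern', 'Albisrieden'],
)

# gyms per yoga studio
df['Gym/Yoga'] = (df['Venue_gym'] / df['Venue_yoga']).round(2)
# inhabitants per yoga studio and per gym
df['pp_yoga'] = (df['Inhabitants (2018)'] / df['Venue_yoga']).round(1)
df['pp_gym'] = (df['Inhabitants (2018)'] / df['Venue_gym']).round(1)
print(df)
```

A high `pp_yoga` or `pp_gym` value means many inhabitants share one venue, which is the signal the later ranking relies on.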


In [32]:
# add yoga markers to map
for lat, lng, label in zip(zh_venues_yoga['Venue Latitude'], zh_venues_yoga['Venue Longitude'], zh_venues_yoga['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=False,
        parse_html=False).add_to(zh_map)  
    
# add gym markers to map
for lat, lng, label in zip(zh_venues_gym['Venue Latitude'], zh_venues_gym['Venue Longitude'], zh_venues_gym['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=False,
        parse_html=False).add_to(zh_map) 

zh_map

### Rank neighbourhoods by inhabitants per venue

In [33]:
zh_venues_total_comparison.sort_values(by=['pp_yoga'],ascending=False).head(34)

Unnamed: 0_level_0,Venue_yoga,Venue_gym,Gym/Yoga,Inhabitants (2018),pp_yoga,pp_gym
Quartier,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Wollishofen,1,3,3.0,18923,18923.0,6307.7
Altstetten,2,13,6.5,33461,16730.5,2573.9
Hirzenbach,1,5,5.0,12801,12801.0,2560.2
Oerlikon,2,16,8.0,23214,11607.0,1450.9
Schwamendingen-Mitte,1,4,4.0,11100,11100.0,2775.0
Affoltern,3,4,1.33,26562,8854.0,6640.5
Albisrieden,3,11,3.67,22304,7434.7,2027.6
Seebach,4,13,3.25,25568,6392.0,1966.8
Unterstrass,4,25,6.25,23394,5848.5,935.8
Hirslanden,2,7,3.5,7488,3744.0,1069.7


In [34]:
zh_venues_total_comparison.sort_values(by=['pp_gym'],ascending=False).head(34)

Unnamed: 0_level_0,Venue_yoga,Venue_gym,Gym/Yoga,Inhabitants (2018),pp_yoga,pp_gym
Quartier,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Affoltern,3,4,1.33,26562,8854.0,6640.5
Wollishofen,1,3,3.0,18923,18923.0,6307.7
Schwamendingen-Mitte,1,4,4.0,11100,11100.0,2775.0
Altstetten,2,13,6.5,33461,16730.5,2573.9
Hirzenbach,1,5,5.0,12801,12801.0,2560.2
Albisrieden,3,11,3.67,22304,7434.7,2027.6
Seebach,4,13,3.25,25568,6392.0,1966.8
Leimbach,2,4,2.0,6320,3160.0,1580.0
Oerlikon,2,16,8.0,23214,11607.0,1450.9
Hottingen,4,8,2.0,11265,2816.2,1408.1


## Final assessment

### Per capita
Based on the analysis above, the following parts of town look most promising for opening a gym or yoga studio, as their population is high and the number of gym and yoga venues is low:
- Affoltern,
- Wollishofen,
- Schwamendingen-Mitte and
- Sihlfeld

Sihlfeld is especially interesting because it is very centrally located.

### Ratio of gym to yoga studios
It is noticeable that many of the centrally located neighbourhoods have high ratios of gyms to yoga studios. This could be caused by a high demand for gyms right after work, before work or during lunch. Neighbourhoods further away from the city center often have a lower gym to yoga studio ratio, although they tend to have fewer of both, so those statistics must be treated with caution. Generally, neighbourhoods such as Oerlikon (8 gyms per yoga studio in the table above) or Weinegg (a ratio of 5) seem to have a lot of demand for active venues, but not many yoga studios. These neighbourhoods should therefore be investigated first when opening a yoga studio.
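
The two selection criteria can be sketched in a few lines. The rows below are copied from the tables above; the cut-off `RATIO_THRESHOLD` and the choice of the top three are assumptions made for illustration, not values used in the analysis.

```python
import pandas as pd

rows = [
    # Quartier, yoga studios, gyms, inhabitants (copied from the tables above)
    ('Affoltern', 3, 4, 26562),
    ('Wollishofen', 1, 3, 18923),
    ('Schwamendingen-Mitte', 1, 4, 11100),
    ('Oerlikon', 2, 16, 23214),
    ('City', 13, 47, 829),
]
df = pd.DataFrame(rows, columns=['Quartier', 'Venue_yoga', 'Venue_gym', 'Inhabitants'])
df = df.set_index('Quartier')
df['pp_gym'] = df['Inhabitants'] / df['Venue_gym']
df['Gym/Yoga'] = df['Venue_gym'] / df['Venue_yoga']

# most inhabitants per gym: candidates for either kind of venue
underserved = df.nlargest(3, 'pp_gym').index.tolist()

# many gyms per yoga studio: candidates for a yoga studio (threshold is an assumption)
RATIO_THRESHOLD = 5
yoga_gap = df[df['Gym/Yoga'] >= RATIO_THRESHOLD].index.tolist()

print(underserved)
print(yoga_gap)
```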