<h1>IBM Applied Data Science Capstone</h1>

<h2>Restaurants in Warsaw</h2>

The main goal of this project is to group the districts of Warsaw (the capital city of Poland) into clusters based on their restaurant concentration.

Project steps:

<ul>
    <li>getting data about Warsaw districts</li>
    <li>getting data about their geographical coordinates</li>
    <li>obtaining venue data from Foursquare API</li>
    <li>clustering districts</li>
 </ul>

<h3>Import libraries</h3>

In [5]:
import json
import time

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

<h3>Downloading data from Wikipedia - list of Warsaw's districts</h3>

In [6]:
data = requests.get('https://en.wikipedia.org/wiki/Districts_of_Warsaw').text
soup = BeautifulSoup(data, 'html.parser')

HTML code from Wikipedia

In [7]:
soup


Extracting the table with district information

In [8]:
neighborhood = []
population = []

for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 1:  # skip header rows, which contain no <td> cells
        # strip surrounding whitespace, including the trailing newline
        neighborhood.append(cells[0].text.strip())
        population.append(cells[1].text.strip())

Converting districts data into dataframe

In [9]:
warsaw_df = pd.DataFrame(list(zip(neighborhood, population)), 
                          columns = ['District', 'Population'])

warsaw_df['Latitude'] = 0
warsaw_df['Longitude'] = 0
warsaw_df.drop(warsaw_df.tail(1).index,inplace=True)
warsaw_df

Unnamed: 0,District,Population,Latitude,Longitude
0,Mokotów,220682,0,0
1,Praga Południe,178665,0,0
2,Ursynów,145938,0,0
3,Wola,137519,0,0
4,Bielany,132683,0,0
5,Targówek,123278,0,0
6,Śródmieście,122646,0,0
7,Bemowo,115873,0,0
8,Białołęka,96588,0,0
9,Ochota,84990,0,0


<h3>Add geographical coordinates to each district</h3>

Using the geopy package

In [10]:
geolocator = Nominatim(user_agent='warsaw')  # create the geocoder once, not per district

for i in range(len(warsaw_df)):
    address = warsaw_df.iloc[i, 0] + ', Warsaw'
    location = geolocator.geocode(address)
    if location is None:  # leave the placeholder 0 when the lookup fails
        continue
    warsaw_df.iloc[i, 2] = location.latitude
    warsaw_df.iloc[i, 3] = location.longitude
    
warsaw_df

Unnamed: 0,District,Population,Latitude,Longitude
0,Mokotów,220682,52.193987,21.045781
1,Praga Południe,178665,52.237396,21.071258
2,Ursynów,145938,52.141039,21.032321
3,Wola,137519,52.236238,20.954781
4,Bielany,132683,52.294652,20.92998
5,Targówek,123278,52.275192,21.058085
6,Śródmieście,122646,52.23281,21.019067
7,Bemowo,115873,52.238974,20.913288
8,Białołęka,96588,52.319665,21.021177
9,Ochota,84990,52.212225,20.97263
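Nominatim's public service asks clients to keep to roughly one request per second, so it may help to pause between lookups and to record which districts could not be resolved. The sketch below is one way to do that under stated assumptions: `geocode_all`, `geocode_fn`, and `fake_geocode` are hypothetical names, and the lookup function is passed in so that `geolocator.geocode` can be swapped for the offline stand-in used here.

```python
import time
from types import SimpleNamespace


def geocode_all(names, geocode_fn, suffix=', Warsaw', delay_seconds=1.0):
    """Return ({name: (lat, lon)}, [names that could not be geocoded])."""
    coords, missing = {}, []
    for i, name in enumerate(names):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode_fn(name + suffix)
        if location is None:
            missing.append(name)  # remember failures instead of silently skipping
            continue
        coords[name] = (location.latitude, location.longitude)
    return coords, missing


def fake_geocode(address):
    """Offline stand-in for geolocator.geocode (coordinates copied from the table above)."""
    known = {'Mokotów, Warsaw': SimpleNamespace(latitude=52.193987, longitude=21.045781)}
    return known.get(address)


coords, missing = geocode_all(['Mokotów', 'Nowhere'], fake_geocode, delay_seconds=0)
print(coords)
print(missing)
```

In the notebook, the call would presumably look like `geocode_all(warsaw_df['District'], geolocator.geocode)`; any district listed in `missing` would otherwise keep its placeholder coordinates of 0.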


Get coordinates of Warsaw

In [11]:
address = 'Warsaw, Poland'

geolocator = Nominatim(user_agent='warsaw')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Warsaw are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Warsaw are 52.2337172, 21.0714111288323.




<h3>Map of Warsaw with districts</h3>

In [12]:
map_warsaw = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, district in zip(warsaw_df['Latitude'], warsaw_df['Longitude'],
                              warsaw_df['District']):
    label = folium.Popup(district, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_warsaw)
    
map_warsaw

<h3>Use of Foursquare API to get restaurants data</h3>

API credentials

In [16]:

CLIENT_ID = 'xxx' # your Foursquare ID
CLIENT_SECRET = 'xxx' # your Foursquare Secret
VERSION = '20190928' # Foursquare API version


Function to get restaurants in each district

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=3000, LIMIT=100):
    """Return a dataframe of food venues within `radius` metres of each district centre."""
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the query parameters for the API request
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
            'query': 'Food',
        }

        # make the GET request
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['District',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
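The function above indexes straight into the JSON response, so a response without `groups` or a venue with an empty `categories` list would raise a `KeyError` or `IndexError`. A more defensive parsing step could look like the sketch below. `extract_venues` is a hypothetical helper, the `'Unknown'` fallback is an assumption, and the second venue in the sample payload is invented for illustration; only the nesting (`response` → `groups` → `items` → `venue`) is taken from the code above.

```python
def extract_venues(payload, district):
    """Flatten one explore response into (district, name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # fall back to a placeholder when the venue has no category
        categories = venue.get('categories') or [{}]
        rows.append((
            district,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0].get('name', 'Unknown'),
        ))
    return rows


# Sample payload: the first venue is taken from the notebook output,
# the second is a made-up venue without a category.
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Stary Dom',
               'location': {'lat': 52.195544, 'lng': 21.024004},
               'categories': [{'name': 'Polish Restaurant'}]}},
    {'venue': {'name': 'Uncategorised venue (hypothetical)',
               'location': {'lat': 52.19, 'lng': 21.02},
               'categories': []}},
]}]}}

rows = extract_venues(payload, 'Mokotów')
empty = extract_venues({}, 'Mokotów')  # malformed response yields no rows
print(rows)
```

Returning an empty list for a malformed response lets the loop over districts continue, at the cost of hiding the failure; logging the district name in that case may be a reasonable addition.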

In [18]:
warsaw_nearby_venues = getNearbyVenues(warsaw_df.District,
                            warsaw_df.Latitude,
                            warsaw_df.Longitude)

Mokotów
Praga Południe
Ursynów
Wola
Bielany
Targówek
Śródmieście
Bemowo
Białołęka
Ochota
Wawer
Praga Północ
Ursus
Żoliborz
Włochy
Wilanów
Rembertów
Wesoła


Dataframe with restaurants data

In [19]:
print(warsaw_nearby_venues.shape)
warsaw_nearby_venues.head()

(1120, 7)


Unnamed: 0,District,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mokotów,52.193987,21.045781,Stary Dom,52.195544,21.024004,Polish Restaurant
1,Mokotów,52.193987,21.045781,NABO Cafe,52.189653,21.068752,Scandinavian Restaurant
2,Mokotów,52.193987,21.045781,MEZZE hummus & falafel,52.203548,21.022705,Falafel Restaurant
3,Mokotów,52.193987,21.045781,Gringo Bar Burritos Tacos & More,52.201305,21.020496,Burrito Place
4,Mokotów,52.193987,21.045781,Targ Śniadaniowy Mokotów,52.189239,21.022857,Breakfast Spot


Test to see data from one district

In [20]:
warsaw_nearby_venues[warsaw_nearby_venues['District'] == 'Mokotów']

Unnamed: 0,District,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Mokotów,52.193987,21.045781,Stary Dom,52.195544,21.024004,Polish Restaurant
1,Mokotów,52.193987,21.045781,NABO Cafe,52.189653,21.068752,Scandinavian Restaurant
2,Mokotów,52.193987,21.045781,MEZZE hummus & falafel,52.203548,21.022705,Falafel Restaurant
3,Mokotów,52.193987,21.045781,Gringo Bar Burritos Tacos & More,52.201305,21.020496,Burrito Place
4,Mokotów,52.193987,21.045781,Targ Śniadaniowy Mokotów,52.189239,21.022857,Breakfast Spot
5,Mokotów,52.193987,21.045781,Boston Port,52.197249,21.024606,Seafood Restaurant
6,Mokotów,52.193987,21.045781,Pekin Express - duck & more,52.199367,21.023589,Asian Restaurant
7,Mokotów,52.193987,21.045781,Burger Bar,52.199293,21.02354,Burger Joint
8,Mokotów,52.193987,21.045781,Restauracja Polska Różana,52.208468,21.023531,Polish Restaurant
9,Mokotów,52.193987,21.045781,Ciao a Tutti Due,52.201612,21.016697,Pizza Place


Count of venues in each district

In [21]:
warsaw_nearby_venues.groupby('District').count()

District,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bemowo,62,62,62,62,62,62
Białołęka,12,12,12,12,12,12
Bielany,32,32,32,32,32,32
Mokotów,100,100,100,100,100,100
Ochota,100,100,100,100,100,100
Praga Południe,100,100,100,100,100,100
Praga Północ,100,100,100,100,100,100
Rembertów,4,4,4,4,4,4
Targówek,65,65,65,65,65,65
Ursus,13,13,13,13,13,13


In [22]:
print('There are {} unique categories.'.format(len(warsaw_nearby_venues['Venue Category'].unique())))

There are 79 unique categories.


Create dummy variables for each row

In [24]:
# one-hot encoding
warsaw_onehot = pd.get_dummies(warsaw_nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add the district column back to the dataframe
warsaw_onehot['District'] = warsaw_nearby_venues['District']

# move the district column to the first position
cols_new = ['District'] + [col for col in warsaw_onehot.columns if col != 'District']
warsaw_onehot = warsaw_onehot[cols_new]

warsaw_onehot.head()

Unnamed: 0,District,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Bakery,Bistro,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hungarian Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,Pizza Place,Polish Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Snack Place,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Mokotów,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Mokotów,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Mokotów,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
warsaw_onehot.shape

(1120, 80)

Frequency of each restaurant type in each district

In [26]:
warsaw_grouped = warsaw_onehot.groupby('District').mean().reset_index()
warsaw_grouped

Unnamed: 0,District,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Bakery,Bistro,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Hawaiian Restaurant,Hungarian Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,Pizza Place,Polish Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Snack Place,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Bemowo,0.0,0.0,0.0,0.032258,0.032258,0.016129,0.032258,0.032258,0.0,0.016129,0.0,0.0,0.177419,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.016129,0.0,0.016129,0.0,0.016129,0.0,0.064516,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.0,0.064516,0.016129,0.032258,0.0,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.145161,0.0,0.0,0.032258,0.0,0.016129,0.016129,0.0,0.016129,0.0,0.0,0.0,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129
1,Białołęka,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333
2,Bielany,0.0,0.0,0.0,0.09375,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Mokotów,0.0,0.0,0.0,0.03,0.01,0.02,0.03,0.0,0.0,0.08,0.01,0.0,0.18,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.12,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.05,0.04,0.01,0.05,0.0,0.01,0.02,0.01,0.02,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0
4,Ochota,0.0,0.05,0.01,0.01,0.02,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.1,0.03,0.0,0.04,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.07,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.03,0.04
5,Praga Południe,0.0,0.0,0.0,0.01,0.01,0.04,0.01,0.0,0.01,0.01,0.0,0.0,0.16,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.02,0.01,0.13,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.05,0.01,0.11,0.01,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.02,0.01
6,Praga Północ,0.0,0.0,0.0,0.02,0.02,0.03,0.02,0.0,0.0,0.04,0.0,0.0,0.17,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.11,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.06,0.12,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.03,0.0
7,Rembertów,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Targówek,0.0,0.0,0.0,0.030769,0.0,0.061538,0.0,0.015385,0.0,0.046154,0.0,0.0,0.123077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046154,0.0,0.0,0.0,0.046154,0.030769,0.107692,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.030769,0.0,0.046154,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.015385,0.0,0.0,0.015385,0.0,0.107692,0.046154,0.0,0.030769,0.015385,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.046154,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0
9,Ursus,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.307692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.076923,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923


In [27]:
warsaw_grouped.shape

(18, 80)
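Averaging the one-hot columns within a district is the same as dividing each category's venue count by the district's total number of venues. The plain-Python sketch below illustrates that equivalence; `category_shares` is a hypothetical helper, and the four input rows use the Rembertów categories that appear with frequency 0.25 in the table above.

```python
from collections import Counter, defaultdict


def category_shares(rows):
    """Map each district to {category: share of that district's venues}."""
    counts = defaultdict(Counter)
    for district, category in rows:
        counts[district][category] += 1
    shares = {}
    for district, counter in counts.items():
        total = sum(counter.values())
        shares[district] = {cat: n / total for cat, n in counter.items()}
    return shares


# Rembertów has four venues, each in a different category
rows = [
    ('Rembertów', 'Bakery'),
    ('Rembertów', 'Eastern European Restaurant'),
    ('Rembertów', 'Pizza Place'),
    ('Rembertów', 'Sushi Restaurant'),
]
shares = category_shares(rows)
print(shares['Rembertów'])
```

With only four venues, every category gets a share of 0.25, which shows how coarse the frequencies become for districts with few venues.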

Most common venues for each district

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['District'] = warsaw_grouped['District']

for ind in np.arange(warsaw_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(warsaw_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bemowo,Café,Pizza Place,Italian Restaurant,Fast Food Restaurant,Indian Restaurant,Sushi Restaurant,Food Court,Restaurant,Asian Restaurant,Bakery
1,Białołęka,Fast Food Restaurant,Asian Restaurant,Café,Pizza Place,Diner,Restaurant,Mediterranean Restaurant,Vietnamese Restaurant,Italian Restaurant,Deli / Bodega
2,Bielany,Café,Pizza Place,Asian Restaurant,Indian Restaurant,Polish Restaurant,Fast Food Restaurant,Italian Restaurant,Deli / Bodega,Food Court,Bakery
3,Mokotów,Café,Italian Restaurant,Burger Joint,Sushi Restaurant,Pizza Place,Restaurant,Polish Restaurant,Diner,Asian Restaurant,Breakfast Spot
4,Ochota,Café,Italian Restaurant,Sushi Restaurant,Pizza Place,American Restaurant,Vietnamese Restaurant,Korean Restaurant,Bistro,Indian Restaurant,Eastern European Restaurant
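The suffix logic used to build the column names is correct for the ten columns requested here, but it would label a 21st column as "21th" if `num_top_venues` were raised. A small helper, sketched below, handles the general case; `ordinal` is a hypothetical name and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['District'] + ['{} Most Common Venue'.format(ordinal(i))
                          for i in range(1, num_top_venues + 1)]
print(columns)
```

For `num_top_venues = 10` this produces the same column names as the loop above.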


<h3>Clustering</h3>

KMeans clustering

In [30]:
# set number of clusters
kclusters = 5

warsaw_grouped_clustering = warsaw_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(warsaw_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 0, 0, 2, 2, 2, 2, 1, 0, 4, 2, 0, 3, 2, 2, 2, 2, 2])
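To make the clustering step less of a black box, the sketch below shows the basic k-means loop (Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, recompute the centroids, and repeat until nothing changes. It is an illustration only, not scikit-learn's implementation, which by default uses k-means++ initialization and optimized code. The names `assign`, `update`, and `lloyd_kmeans`, the toy points, and the starting centroids are all assumptions.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it in place if it has none."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def lloyd_kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
        labels = assign(points, centroids)
    return labels, centroids


# Toy data: two well-separated groups in two dimensions
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = lloyd_kmeans(points, [(0, 0), (10, 10)])
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```

In the notebook, each point is a district's vector of 79 category frequencies rather than a pair of coordinates, but the loop is the same.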

Assign a cluster label to each district

In [31]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

warsaw_merged = warsaw_df

# join warsaw_df with the sorted venues to add the cluster label and top venues for each district
warsaw_merged = warsaw_merged.join(neighborhoods_venues_sorted.set_index('District'), on='District')

warsaw_merged['Cluster Labels'] =  warsaw_merged['Cluster Labels'].astype("int")
warsaw_merged.head()

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mokotów,220682,52.193987,21.045781,2,Café,Italian Restaurant,Burger Joint,Sushi Restaurant,Pizza Place,Restaurant,Polish Restaurant,Diner,Asian Restaurant,Breakfast Spot
1,Praga Południe,178665,52.237396,21.071258,2,Café,Italian Restaurant,Restaurant,Pizza Place,Polish Restaurant,Bistro,Mexican Restaurant,Sandwich Place,Sushi Restaurant,Indian Restaurant
2,Ursynów,145938,52.141039,21.032321,2,Café,Pizza Place,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Burger Joint,Indian Restaurant,Eastern European Restaurant,Restaurant,Fast Food Restaurant
3,Wola,137519,52.236238,20.954781,2,Café,Italian Restaurant,Pizza Place,Sushi Restaurant,Chinese Restaurant,Korean Restaurant,Bakery,Bistro,Indian Restaurant,Eastern European Restaurant
4,Bielany,132683,52.294652,20.92998,0,Café,Pizza Place,Asian Restaurant,Indian Restaurant,Polish Restaurant,Fast Food Restaurant,Italian Restaurant,Deli / Bodega,Food Court,Bakery


<h3>Cluster map</h3>

In [32]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(warsaw_merged['Latitude'], warsaw_merged['Longitude'],
                                  warsaw_merged['District'], warsaw_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels run from 0 to kclusters - 1
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [33]:
warsaw_merged[warsaw_merged['Cluster Labels'] == 0]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Bielany,132683,52.294652,20.92998,0,Café,Pizza Place,Asian Restaurant,Indian Restaurant,Polish Restaurant,Fast Food Restaurant,Italian Restaurant,Deli / Bodega,Food Court,Bakery
5,Targówek,123278,52.275192,21.058085,0,Café,Fast Food Restaurant,Pizza Place,Bistro,Diner,Eastern European Restaurant,Kebab Restaurant,Burger Joint,Sushi Restaurant,Polish Restaurant
7,Bemowo,115873,52.238974,20.913288,0,Café,Pizza Place,Italian Restaurant,Fast Food Restaurant,Indian Restaurant,Sushi Restaurant,Food Court,Restaurant,Asian Restaurant,Bakery
8,Białołęka,96588,52.319665,21.021177,0,Fast Food Restaurant,Asian Restaurant,Café,Pizza Place,Diner,Restaurant,Mediterranean Restaurant,Vietnamese Restaurant,Italian Restaurant,Deli / Bodega
10,Wawer,69896,52.220358,21.137083,0,Fast Food Restaurant,Café,Pizza Place,Bistro,Sushi Restaurant,Italian Restaurant,American Restaurant,Asian Restaurant,Bakery,Restaurant


In [34]:
warsaw_merged[warsaw_merged['Cluster Labels'] == 1]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,Rembertów,23280,52.261415,21.162819,1,Bakery,Eastern European Restaurant,Sushi Restaurant,Pizza Place,Vietnamese Restaurant,Fast Food Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop


In [35]:
warsaw_merged[warsaw_merged['Cluster Labels'] == 2]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mokotów,220682,52.193987,21.045781,2,Café,Italian Restaurant,Burger Joint,Sushi Restaurant,Pizza Place,Restaurant,Polish Restaurant,Diner,Asian Restaurant,Breakfast Spot
1,Praga Południe,178665,52.237396,21.071258,2,Café,Italian Restaurant,Restaurant,Pizza Place,Polish Restaurant,Bistro,Mexican Restaurant,Sandwich Place,Sushi Restaurant,Indian Restaurant
2,Ursynów,145938,52.141039,21.032321,2,Café,Pizza Place,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Burger Joint,Indian Restaurant,Eastern European Restaurant,Restaurant,Fast Food Restaurant
3,Wola,137519,52.236238,20.954781,2,Café,Italian Restaurant,Pizza Place,Sushi Restaurant,Chinese Restaurant,Korean Restaurant,Bakery,Bistro,Indian Restaurant,Eastern European Restaurant
6,Śródmieście,122646,52.23281,21.019067,2,Café,Italian Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,Polish Restaurant,Pizza Place,Bistro,Mexican Restaurant,Modern European Restaurant,Restaurant
9,Ochota,84990,52.212225,20.97263,2,Café,Italian Restaurant,Sushi Restaurant,Pizza Place,American Restaurant,Vietnamese Restaurant,Korean Restaurant,Bistro,Indian Restaurant,Eastern European Restaurant
11,Praga Północ,69510,52.264884,21.027344,2,Café,Polish Restaurant,Italian Restaurant,Pizza Place,Sushi Restaurant,Restaurant,Thai Restaurant,Burger Joint,Vegetarian / Vegan Restaurant,Bistro
13,Żoliborz,48342,52.267594,20.979698,2,Café,Italian Restaurant,Burger Joint,Bakery,Polish Restaurant,Vegetarian / Vegan Restaurant,Thai Restaurant,Bistro,Sushi Restaurant,Diner
14,Włochy,38075,52.186109,20.948438,2,Italian Restaurant,Fast Food Restaurant,Café,Restaurant,Sushi Restaurant,Bistro,Pizza Place,Turkish Restaurant,Asian Restaurant,Chinese Restaurant
15,Wilanów,23960,52.153083,21.110441,2,Italian Restaurant,Burger Joint,Restaurant,Eastern European Restaurant,Café,Japanese Restaurant,Mediterranean Restaurant,Diner,Doner Restaurant,Pizza Place


In [36]:
warsaw_merged[warsaw_merged['Cluster Labels'] == 3]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Wesoła,22811,52.251794,21.229276,3,Pizza Place,Bakery,Sushi Restaurant,Cafeteria,Fast Food Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant


In [37]:
warsaw_merged[warsaw_merged['Cluster Labels'] == 4]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Ursus,53755,52.196098,20.882899,4,Italian Restaurant,Pizza Place,Fast Food Restaurant,Restaurant,Food Truck,Mexican Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Buffet,Fish & Chips Shop


The analysis shows that Warsaw districts can be distinguished by their restaurant concentration. The best districts in which to open a restaurant are those in cluster 2, which includes the most populous districts. Within each district, it is recommended to open the kind of restaurant that sits in the middle of the most-common-venues ranking, because those categories are not yet too saturated.