## Comparing borough venues in NY and Moscow

**Introduction**  
Suppose I want to move from Moscow to New York and would like to choose a borough similar to the one I live in now. To do that, I need to analyze the boroughs of both cities and pick the one that is most similar to mine. Realtors and headhunters might also find this approach useful, since it can help solve relocation problems.  
**Data**  
I found JSON data for the Moscow districts on the open-source portal https://gis-lab.info/. The file contains each district's name and its boundary coordinates. I had to clean the data because the GeoJSON file describes every district as a shape (a polygon), whereas I need only one point per district, ideally its center (the cleaning step keeps the first boundary vertex as that point).  
For New York, I chose the existing neighborhood dataset that was used in the lab. It has the same structure as the Moscow district data.  
To compare the districts of the two cities, I used Foursquare location data about the venues in each city.  
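
If a point closer to the real center of a district is preferred over a single boundary vertex, one simple option is to average the vertices of the outer ring. The sketch below assumes standard GeoJSON `Polygon`/`MultiPolygon` geometries (RFC 7946: rings are closed and coordinates are `[longitude, latitude]`); `approximate_center` is a hypothetical helper, not part of the original notebook. A vertex average is only a rough center: it is pulled toward densely sampled edges, and an area-weighted centroid (for example shapely's `Polygon.centroid`) is more accurate.

```python
def approximate_center(geometry):
    """Return (lat, lon) as the mean of the outer-ring vertices of a GeoJSON
    Polygon or MultiPolygon. This is a rough center, not the area-weighted centroid."""
    coords = geometry['coordinates']
    if geometry['type'] == 'Polygon':
        rings = [coords[0]]                           # outer ring only
    elif geometry['type'] == 'MultiPolygon':
        rings = [polygon[0] for polygon in coords]    # outer ring of every part
    else:
        raise ValueError('unsupported geometry type: ' + geometry['type'])

    # GeoJSON rings repeat the first vertex at the end; drop it so it is not counted twice
    points = [pt for ring in rings for pt in ring[:-1]]
    lon = sum(pt[0] for pt in points) / len(points)   # GeoJSON order is [lon, lat]
    lat = sum(pt[1] for pt in points) / len(points)
    return lat, lon


# usage: a 1 x 1 degree square given as a GeoJSON Polygon
square = {'type': 'Polygon',
          'coordinates': [[[37.0, 55.2], [38.0, 55.2], [38.0, 56.2], [37.0, 56.2], [37.0, 55.2]]]}
print(approximate_center(square))
```

In the Moscow loading loop, `approximate_center(data['geometry'])` could replace the drill-down to the first vertex.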
**Methodology**  
After preparing the data, I chose k-means clustering to group the boroughs. I experimented with different numbers of clusters and settled on 8 groups. After clustering, I displayed the results on maps to visually assess how similar the boroughs of the two cities are.
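
One common way to compare several cluster counts (not necessarily how k = 8 was chosen here) is the elbow method: fit k-means for a range of k, record the inertia (sum of squared distances from each point to its nearest centroid), and look for the k after which the curve stops dropping sharply. The notebook uses scikit-learn's `KMeans`, whose fitted `inertia_` attribute holds this quantity; to keep the sketch dependency-light it instead implements plain Lloyd's iterations in NumPy. `kmeans_inertia` is a hypothetical helper, and the data are synthetic (three well-separated blobs), so the curve should bend at k = 3.

```python
import numpy as np


def kmeans_inertia(X, k, n_iter=100):
    """Cluster X with a basic Lloyd's k-means and return the inertia:
    the sum of squared distances from each point to its nearest centroid."""
    # deterministic farthest-point initialisation: start from the first row,
    # then repeatedly add the point farthest from the centroids chosen so far
    centroids = [X[0]]
    for _ in range(1, k):
        chosen = np.array(centroids)
        d2 = ((X[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[np.argmax(d2)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(n_iter):
        # assign every point to its nearest centroid, then move centroids to the cluster means
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated

    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()


# synthetic data: three well-separated 2-D blobs standing in for the venue-frequency matrix
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in blob_centers])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(float(value), 1))
```

On the real data, passing `mony_grouped_clustering.to_numpy(dtype=float)` instead of the synthetic `X` would give the corresponding curve for the neighborhood frequency table.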


In [None]:
# import all libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_rows', 10)

import json # library to handle JSON files

#!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

#!pip install folium
import folium # map rendering library


<a id='item1'></a>


## Getting data

In [None]:
# for NY, use the existing neighborhood dataset from the lab
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
# instantiate the dataframe
nyneighborhoods = pd.DataFrame(columns=column_names)

In [None]:
# load the NY neighborhoods from the JSON features
# (DataFrame.append was removed in pandas 2.0, so collect the rows first)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON points are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
nyneighborhoods = pd.DataFrame(rows, columns=column_names)

In [None]:
# for Moscow, use the open district boundaries published on gis-lab.info
!wget -q -O 'moscow_data.geojson' http://gis-lab.info/data/mos-adm/mo.geojson
with open('moscow_data.geojson') as json_data:
    moscow_data = json.load(json_data)

neighborhoods_data = moscow_data['features']
# instantiate the dataframe
moneighborhoods = pd.DataFrame(columns=column_names)

In [None]:
# load and clean the Moscow data: each feature is a (Multi)Polygon, so drill down
# the nested coordinate lists to the first [longitude, latitude] pair of the boundary
# (a boundary vertex used as the district's point, not its true center)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['NAME_AO']
    neighborhood_name = data['properties']['NAME']

    point = data['geometry']['coordinates']
    while isinstance(point[0], list):
        point = point[0]
    neighborhood_lon, neighborhood_lat = point[0], point[1]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
moneighborhoods = pd.DataFrame(rows, columns=column_names)

In [None]:
nyneighborhoods

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
| ... | ... | ... | ... | ... |
| 301 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 302 | Queens | Hammels | 40.587338 | -73.805530 |
| 303 | Queens | Bayswater | 40.611322 | -73.765968 |
| 304 | Queens | Queensbridge | 40.756091 | -73.945631 |


## Maps for NY and Moscow

In [None]:
#create map for Moscow boroughs
address = 'Moscow'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
molatitude = location.latitude
molongitude = location.longitude

map_moscow = folium.Map(location=[molatitude, molongitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(moneighborhoods['Latitude'], moneighborhoods['Longitude'], moneighborhoods['Borough'], moneighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_moscow)  
    
map_moscow

In [None]:
#create map for NY borough
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(nyneighborhoods['Latitude'], nyneighborhoods['Longitude'], nyneighborhoods['Borough'], nyneighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

The geographical coordinates of New York City are 40.7127281, -74.0060152.


## Getting data from Foursquare


In [None]:
import os

# Foursquare credentials: read them from environment variables instead of hard-coding secrets
# (the names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are just a convention chosen here)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues returned per request


In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare 'explore' endpoint around each point and return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request; guard against responses without venue groups (e.g. errors)
        response = requests.get(url).json().get('response', {})
        groups = response.get('groups', [])
        results = groups[0]['items'] if groups else []

        # keep only the relevant information for each nearby venue (skip uncategorized venues)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue'].get('categories')])

    # flatten the per-neighborhood lists into one table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Neighborhood',
                                          'Neighborhood Latitude',
                                          'Neighborhood Longitude',
                                          'Venue',
                                          'Venue Latitude',
                                          'Venue Longitude',
                                          'Venue Category'])
    return nearby_venues
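
`getNearbyVenues` relies on a particular response shape: `response.groups[0].items[*].venue` with `name`, `location.lat`/`location.lng` and `categories[0].name`. Splitting the parsing step out of the network call makes it testable offline, without an API key. The sketch below assumes that shape (it is exactly the set of fields the function reads); `parse_explore_items` is a hypothetical helper, and the payload is made up except for the first venue, which is taken from the Moscow results in this notebook.

```python
def parse_explore_items(name, lat, lng, response_json):
    """Turn one Foursquare 'explore' JSON payload into rows of
    (neighborhood, lat, lng, venue, venue_lat, venue_lng, category),
    mirroring the fields that getNearbyVenues reads."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        if not venue.get('categories'):       # skip venues without a category
            continue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows


# hypothetical payload: the first venue comes from the Moscow results,
# the second is made up to show that a venue without categories is skipped
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Андерсон',
               'location': {'lat': 55.74649, 'lng': 37.422141},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed spot',
               'location': {'lat': 55.7482, 'lng': 37.4277},
               'categories': []}},
]}]}}

print(parse_explore_items('Филёвский Парк', 55.74821, 37.42765, sample))
```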

In [None]:
#moscow places from Foursquare
mo_venues = getNearbyVenues(names=moneighborhoods['Neighborhood'],
                                   latitudes=moneighborhoods['Latitude'],
                                   longitudes=moneighborhoods['Longitude']
                                  )

In [None]:
#NY places from Foursquare
ny_venues = getNearbyVenues(names=nyneighborhoods['Neighborhood'],
                                   latitudes=nyneighborhoods['Latitude'],
                                   longitudes=nyneighborhoods['Longitude']
                                  )

In [None]:
#check Moscow results
print(mo_venues.shape)
mo_venues.head()

(1874, 7)


|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Филёвский Парк | 55.74821 | 37.42765 | Hills | 55.749205 | 37.423439 | Mediterranean Restaurant |
| 1 | Филёвский Парк | 55.74821 | 37.42765 | Спортивно-экологический комплекс «Лата Трэк» | 55.750687 | 37.426734 | Ski Area |
| 2 | Филёвский Парк | 55.74821 | 37.42765 | Андерсон | 55.74649 | 37.422141 | Café |
| 3 | Филёвский Парк | 55.74821 | 37.42765 | ЧЕТЫРЕ ЛАПЫ | 55.746892 | 37.421252 | Pet Store |
| 4 | Филёвский Парк | 55.74821 | 37.42765 | Горнолыжная База ЦСКА | 55.744827 | 37.432187 | Ski Area |


In [None]:
mo_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| "Мосрентген" | 3 | 3 | 3 | 3 | 3 | 3 |
| Академический | 30 | 30 | 30 | 30 | 30 | 30 |
| Алексеевский | 10 | 10 | 10 | 10 | 10 | 10 |
| Алтуфьевский | 11 | 11 | 11 | 11 | 11 | 11 |
| Арбат | 19 | 19 | 19 | 19 | 19 | 19 |
| ... | ... | ... | ... | ... | ... | ... |
| Южное Тушино | 4 | 4 | 4 | 4 | 4 | 4 |
| Южнопортовый | 18 | 18 | 18 | 18 | 18 | 18 |
| Якиманка | 21 | 21 | 21 | 21 | 21 | 21 |
| Ярославский | 13 | 13 | 13 | 13 | 13 | 13 |


In [None]:
#check NY results
print(ny_venues.shape)
ny_venues.head()

(10068, 7)


|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Carvel Ice Cream | 40.890487 | -73.848568 | Ice Cream Shop |
| 4 | Wakefield | 40.894705 | -73.847201 | Dunkin' | 40.890459 | -73.849089 | Donut Shop |


<a id='item3'></a>


In [None]:
#combine NY and Moscow places for further analysis
# DataFrame.append was removed in pandas 2.0; concat with a fresh 0..n-1 index instead
mony = pd.concat([mo_venues, ny_venues], ignore_index=True)
mony

|  | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Филёвский Парк | 55.748210 | 37.42765 | Hills | 55.749205 | 37.423439 | Mediterranean Restaurant |
| 1 | Филёвский Парк | 55.748210 | 37.42765 | Спортивно-экологический комплекс «Лата Трэк» | 55.750687 | 37.426734 | Ski Area |
| 2 | Филёвский Парк | 55.748210 | 37.42765 | Андерсон | 55.746490 | 37.422141 | Café |
| 3 | Филёвский Парк | 55.748210 | 37.42765 | ЧЕТЫРЕ ЛАПЫ | 55.746892 | 37.421252 | Pet Store |
| 4 | Филёвский Парк | 55.748210 | 37.42765 | Горнолыжная База ЦСКА | 55.744827 | 37.432187 | Ski Area |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 11937 | Fox Hills | 40.617311 | -74.08174 | SUBWAY | 40.618939 | -74.082881 | Sandwich Place |
| 11938 | Fox Hills | 40.617311 | -74.08174 | Mona's Cuisine | 40.618282 | -74.084975 | African Restaurant |
| 11939 | Fox Hills | 40.617311 | -74.08174 | Nettys playhouse | 40.616856 | -74.077566 | Playground |
| 11940 | Fox Hills | 40.617311 | -74.08174 | MTA Bus - Targee St & Vanderbilt Av (S74/S76) | 40.614856 | -74.084598 | Bus Stop |


In [None]:
# one hot encoding: one 0/1 column per venue category
mony_onehot = pd.get_dummies(mony[['Venue Category']], prefix="", prefix_sep="")

# Foursquare has a venue category literally named "Neighborhood"; rename that dummy column
# so that the neighborhood-name column added below does not silently overwrite it
mony_onehot = mony_onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})

# add the neighborhood names as the first column
mony_onehot.insert(0, 'Neighborhood', mony['Neighborhood'])

mony_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,Bed & Breakfast,...,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Train Station,Tram Station,Trattoria/Osteria,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volcano,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
mony_onehot['Neighborhood']

0        Филёвский Парк
1        Филёвский Парк
2        Филёвский Парк
3        Филёвский Парк
4        Филёвский Парк
              ...      
11937         Fox Hills
11938         Fox Hills
11939         Fox Hills
11940         Fox Hills
11941         Fox Hills
Name: Neighborhood, Length: 11942, dtype: object

And let's examine the new dataframe size.


In [None]:
mony_onehot.shape

(11942, 482)

#### Next, let's group rows by neighborhood and take the mean of each category column, i.e. the frequency of occurrence of each category


In [None]:
mony_grouped = mony_onehot.groupby('Neighborhood').mean().reset_index()
mony_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Terminal,American Restaurant,Amphitheater,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Basketball Stadium,Bath House,Beach,Beach Bar,...,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Toll Plaza,Tourist Information Center,Toy / Game Store,Track,Trail,Train,Train Station,Tram Station,Trattoria/Osteria,Tree,Tunnel,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volcano,Volleyball Court,Warehouse Store,Waste Facility,Waterfront,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"""Мосрентген""",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
1,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
2,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.083333,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
3,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
4,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.200000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
425,Южное Тушино,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.25,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
426,Южнопортовый,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.055556,0.0,0.000000,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
427,Якиманка,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0
428,Ярославский,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.00,0.0,0.0,0.0,0.0,0.000000,0.0,0.076923,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.000000,0.0
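
Because each one-hot row marks exactly one category, the mean of a category column within a neighborhood is the fraction of that neighborhood's venues that fall into that category, so every row of the grouped table sums to 1. This makes neighborhoods with many venues comparable to those with few. A tiny illustration with made-up data (neighborhoods `A` and `B` are hypothetical):

```python
import pandas as pd

# hypothetical venues: three in neighborhood A, one in neighborhood B
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Park'],
})

onehot = pd.get_dummies(venues['Venue Category']).astype(int)  # one 0/1 column per category
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of a 0/1 column within a group is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Here neighborhood `A` gets Café = 2/3 and Park = 1/3, while `B` gets Park = 1.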


#### Let's confirm the new size


In [None]:
mony_grouped.shape

(430, 482)

#### Let's put the top venues for each neighborhood into a _pandas_ dataframe


First, let's write a function to sort the venues in descending order.


In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th, 5th, ... use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mony_grouped['Neighborhood']

for ind in np.arange(mony_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mony_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | "Мосрентген" | Shopping Mall | Middle Eastern Restaurant | Café | Yoga Studio | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 1 | Allerton | Pizza Place | Deli / Bodega | Chinese Restaurant | Spa | Supermarket | Breakfast Spot | Bus Station | Check Cashing Service | Pharmacy | Fast Food Restaurant |
| 2 | Annadale | Pizza Place | Pharmacy | Diner | Bar | Liquor Store | Restaurant | American Restaurant | Cosmetics Shop | Park | Train Station |
| 3 | Arden Heights | Pool | Pharmacy | Smoke Shop | Pizza Place | Coffee Shop | Event Service | Event Space | Exhibit | Eye Doctor | Fabric Shop |
| 4 | Arlington | Bus Stop | Intersection | American Restaurant | Deli / Bodega | Yoga Studio | Farmers Market | Film Studio | Filipino Restaurant | Field | Fast Food Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 425 | Южное Тушино | Café | Auto Workshop | Hookah Bar | Park | Fish & Chips Shop | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 426 | Южнопортовый | Pharmacy | Café | Bath House | Beer Bar | Clothing Store | Miscellaneous Shop | Bakery | Health Food Store | Coffee Shop | Paper / Office Supplies Store |
| 427 | Якиманка | Café | Outdoor Gym | Arcade | Science Museum | Art Museum | Stables | Music Venue | Surf Spot | Theater | Theme Park |
| 428 | Ярославский | Big Box Store | Clothing Store | Food & Drink Shop | Bar | Wine Shop | Auto Dealership | Roof Deck | Supermarket | Bus Line | Arcade |
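
In this table, neighborhoods with only a few venue categories are padded with categories whose frequency is 0. For example, "Мосрентген" has only 3 venues, yet its 4th to 10th columns list "Yoga Studio", "Exhibit", "Eye Doctor" and so on, because sorting the row also returns the zero-valued columns in arbitrary order. A minimal alternative, sketched below as a hypothetical helper `top_venues_nonzero`, keeps only categories with a non-zero frequency and pads the rest with `None`:

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return up to num_top_venues category names with a non-zero frequency,
    padded with None, so sparse neighborhoods don't get arbitrary filler."""
    freqs = row.iloc[1:].astype(float)                  # skip the 'Neighborhood' column
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='stable')
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


# hypothetical row: a neighborhood with only three venue categories
row = pd.Series({'Neighborhood': 'Example', 'Café': 1/3, 'Eye Doctor': 0.0,
                 'Shopping Mall': 1/3, 'Middle Eastern Restaurant': 1/3, 'Farm': 0.0})
print(top_venues_nonzero(row, 5))
```

It can be used in place of `return_most_common_venues` in the loop that fills `neighborhoods_venues_sorted`; the empty slots then show up as missing values instead of misleading categories.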


<a id='item4'></a>


## Cluster Neighborhoods



In [None]:
# set number of clusters
kclusters = 8

mony_grouped_clustering = mony_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init is set explicitly because newer scikit-learn versions changed its default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(mony_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [None]:
neighborhoods_venues_sorted

|  | Cluster Labels | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | "Мосрентген" | Shopping Mall | Middle Eastern Restaurant | Café | Yoga Studio | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 1 | 1 | Allerton | Pizza Place | Deli / Bodega | Chinese Restaurant | Spa | Supermarket | Breakfast Spot | Bus Station | Check Cashing Service | Pharmacy | Fast Food Restaurant |
| 2 | 1 | Annadale | Pizza Place | Pharmacy | Diner | Bar | Liquor Store | Restaurant | American Restaurant | Cosmetics Shop | Park | Train Station |
| 3 | 1 | Arden Heights | Pool | Pharmacy | Smoke Shop | Pizza Place | Coffee Shop | Event Service | Event Space | Exhibit | Eye Doctor | Fabric Shop |
| 4 | 3 | Arlington | Bus Stop | Intersection | American Restaurant | Deli / Bodega | Yoga Studio | Farmers Market | Film Studio | Filipino Restaurant | Field | Fast Food Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 425 | 0 | Южное Тушино | Café | Auto Workshop | Hookah Bar | Park | Fish & Chips Shop | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 426 | 7 | Южнопортовый | Pharmacy | Café | Bath House | Beer Bar | Clothing Store | Miscellaneous Shop | Bakery | Health Food Store | Coffee Shop | Paper / Office Supplies Store |
| 427 | 7 | Якиманка | Café | Outdoor Gym | Arcade | Science Museum | Art Museum | Stables | Music Venue | Surf Spot | Theater | Theme Park |
| 428 | 7 | Ярославский | Big Box Store | Clothing Store | Food & Drink Shop | Bar | Wine Shop | Auto Dealership | Roof Deck | Supermarket | Bus Line | Arcade |


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.


In [None]:

# DataFrame.append was removed in pandas 2.0; concat with a fresh 0..n-1 index instead
mony_merged = pd.concat([moneighborhoods, nyneighborhoods], ignore_index=True)
mony_merged

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Троицкий | Киевский | 55.440830 | 36.803100 |
| 1 | Западный | Филёвский Парк | 55.748210 | 37.427650 |
| 2 | Троицкий | Новофёдоровское | 55.451620 | 36.803570 |
| 3 | Троицкий | Роговское | 55.241390 | 36.937240 |
| 4 | Новомосковский | "Мосрентген" | 55.627310 | 37.439560 |
| ... | ... | ... | ... | ... |
| 447 | Manhattan | Hudson Yards | 40.756658 | -74.000111 |
| 448 | Queens | Hammels | 40.587338 | -73.805530 |
| 449 | Queens | Bayswater | 40.611322 | -73.765968 |
| 450 | Queens | Queensbridge | 40.756091 | -73.945631 |


In [None]:
# join the cluster label and top venues onto each neighborhood's coordinates
mony_merged = mony_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mony_merged

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Троицкий | Киевский | 55.440830 | 36.803100 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Западный | Филёвский Парк | 55.748210 | 37.427650 | 7.0 | Italian Restaurant | Ski Area | Gym | Mediterranean Restaurant | Garden | Hookah Bar | Café | Gastropub | BBQ Joint | Spa |
| 2 | Троицкий | Новофёдоровское | 55.451620 | 36.803570 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Троицкий | Роговское | 55.241390 | 36.937240 | 7.0 | Nightclub | Farm | Volleyball Court | Soccer Field | Yoga Studio | Fish & Chips Shop | Exhibit | Eye Doctor | Fabric Shop | Factory |
| 4 | Новомосковский | "Мосрентген" | 55.627310 | 37.439560 | 7.0 | Shopping Mall | Middle Eastern Restaurant | Café | Yoga Studio | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | Manhattan | Hudson Yards | 40.756658 | -74.000111 | 7.0 | Gym / Fitness Center | American Restaurant | Hotel | Italian Restaurant | Café | Dog Run | Coffee Shop | Gym | Park | Restaurant |
| 448 | Queens | Hammels | 40.587338 | -73.805530 | 7.0 | Beach | Fried Chicken Joint | Bus Stop | Shoe Store | Deli / Bodega | Dog Run | Food Truck | Gym / Fitness Center | Diner | Fast Food Restaurant |
| 449 | Queens | Bayswater | 40.611322 | -73.765968 | 7.0 | Construction & Landscaping | Playground | Tennis Court | Yoga Studio | Fish & Chips Shop | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant |
| 450 | Queens | Queensbridge | 40.756091 | -73.945631 | 7.0 | Hotel | Athletics & Sports | Hotel Bar | Basketball Court | Beer Garden | Cocktail Bar | Performing Arts Venue | Scenic Lookout | Sandwich Place | Park |


In [None]:
# drop neighborhoods for which Foursquare returned no venues (they have no cluster label)
mony_merged.dropna(axis=0, how='any', inplace=True)
mony_merged

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Западный | Филёвский Парк | 55.748210 | 37.427650 | 7.0 | Italian Restaurant | Ski Area | Gym | Mediterranean Restaurant | Garden | Hookah Bar | Café | Gastropub | BBQ Joint | Spa |
| 3 | Троицкий | Роговское | 55.241390 | 36.937240 | 7.0 | Nightclub | Farm | Volleyball Court | Soccer Field | Yoga Studio | Fish & Chips Shop | Exhibit | Eye Doctor | Fabric Shop | Factory |
| 4 | Новомосковский | "Мосрентген" | 55.627310 | 37.439560 | 7.0 | Shopping Mall | Middle Eastern Restaurant | Café | Yoga Studio | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 5 | Троицкий | Вороновское | 55.354850 | 36.970080 | 6.0 | Tree | Yoga Studio | Fish & Chips Shop | Event Space | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant | Farm |
| 9 | Зеленоградский | Матушкино | 56.007950 | 37.178530 | 0.0 | Auto Workshop | Convenience Store | Flower Shop | Yoga Studio | Fish & Chips Shop | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 447 | Manhattan | Hudson Yards | 40.756658 | -74.000111 | 7.0 | Gym / Fitness Center | American Restaurant | Hotel | Italian Restaurant | Café | Dog Run | Coffee Shop | Gym | Park | Restaurant |
| 448 | Queens | Hammels | 40.587338 | -73.805530 | 7.0 | Beach | Fried Chicken Joint | Bus Stop | Shoe Store | Deli / Bodega | Dog Run | Food Truck | Gym / Fitness Center | Diner | Fast Food Restaurant |
| 449 | Queens | Bayswater | 40.611322 | -73.765968 | 7.0 | Construction & Landscaping | Playground | Tennis Court | Yoga Studio | Fish & Chips Shop | Exhibit | Eye Doctor | Fabric Shop | Factory | Falafel Restaurant |
| 450 | Queens | Queensbridge | 40.756091 | -73.945631 | 7.0 | Hotel | Athletics & Sports | Hotel Bar | Basketball Court | Beer Garden | Cocktail Bar | Performing Arts Venue | Scenic Lookout | Sandwich Place | Park |


In [None]:
# astype returns a new frame, so assign the result back to actually change the dtype
mony_merged = mony_merged.astype({'Cluster Labels': 'int32'})
mony_merged



In [None]:
mony_merged.dtypes

Borough                    object
Neighborhood               object
Latitude                  float64
Longitude                 float64
Cluster Labels            float64
                           ...   
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
Length: 15, dtype: object
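`Cluster Labels` is stored as `float64` rather than as an integer. One likely cause, assuming the labels were attached with a left merge, is that neighborhoods without a label received `NaN`, which makes pandas upcast the column to float. The small sketch below reproduces that on toy data (the frames and values are hypothetical). It then drops the unlabeled rows and casts the labels back to integers, so they can be used directly as list indices.

```python
import numpy as np
import pandas as pd

# Toy data: neighborhoods A and B have cluster labels, C does not.
neighborhoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [0, 7]})

# The left merge keeps C and fills its label with NaN, so the column becomes float64.
merged = neighborhoods.merge(labels, on='Neighborhood', how='left')
print(merged['Cluster Labels'].dtype)

# Drop unlabeled rows, then cast the labels back to a plain integer type.
clean = merged.dropna(subset=['Cluster Labels']).copy()
clean['Cluster Labels'] = clean['Cluster Labels'].astype(int)
print(clean['Cluster Labels'].tolist())
```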

Create maps


In [None]:
# build a color palette with one color per cluster (labels run from 0 to kclusters - 1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]


def add_cluster_markers(fmap, df):
    """Add one circle marker per neighborhood to fmap, colored by its cluster label."""
    for lat, lon, poi, cluster in zip(df['Latitude'], df['Longitude'],
                                      df['Neighborhood'], df['Cluster Labels']):
        cluster = int(cluster)  # labels are stored as float64
        label = folium.Popup(f'{poi} Cluster {cluster}', parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(fmap)


# map centered on New York
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)
add_cluster_markers(map_clusters, mony_merged)

# map centered on Moscow
momap_clusters = folium.Map(location=[molatitude, molongitude], zoom_start=10)
add_cluster_markers(momap_clusters, mony_merged)
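Both maps receive every row of `mony_merged`, so each map also holds the other city's markers off-screen. Below is a sketch of splitting the combined frame by city. It assumes, as the merged table suggests, that New York neighborhoods have negative longitudes and Moscow neighborhoods positive ones. The helper name `split_by_city` is hypothetical.

```python
import pandas as pd


def split_by_city(df: pd.DataFrame) -> dict:
    """Split a combined Moscow/New York frame into one frame per city.

    Assumption: New York rows have negative longitudes (west of Greenwich)
    and Moscow rows have positive ones.
    """
    is_ny = df['Longitude'] < 0
    return {'New York': df[is_ny], 'Moscow': df[~is_ny]}


# A few rows taken from the merged table above.
sample = pd.DataFrame({
    'Neighborhood': ['Филёвский Парк', 'Hudson Yards', 'Матушкино'],
    'Longitude': [37.427650, -74.000111, 37.178530],
})
parts = split_by_city(sample)
print(parts['New York']['Neighborhood'].tolist())
```

In the notebook, `split_by_city(mony_merged)['Moscow']` could then be passed to the Moscow map's marker loop instead of the full frame, and likewise for New York.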

In [None]:
# and finally, visualize the two maps side by side
import html
from IPython.display import display, HTML

# each rendered map is a full HTML document; escape it so it can sit inside an iframe's srcdoc attribute
frame = ('<iframe srcdoc="{}" style="float:{}; display:inline-block; width:49%; height:500px; '
         'margin:0 auto; border:2px solid black"></iframe>')
htmlmap = HTML(frame.format(html.escape(momap_clusters.get_root().render(), quote=True), 'left')
               + frame.format(html.escape(map_clusters.get_root().render(), quote=True), 'right'))
display(htmlmap)
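Embedding a full HTML document in `srcdoc` only works if the document is escaped as an attribute value. Escaping quotes alone leaves `&` unescaped, so any existing character references in the map's HTML get decoded once too often. Below is a sketch of the side-by-side builder as a plain function that can be tested without folium. It uses the standard library's `html.escape`; the function name `side_by_side_html` is hypothetical.

```python
import html


def side_by_side_html(left_doc: str, right_doc: str, height_px: int = 500) -> str:
    """Return two iframes that display full HTML documents next to each other."""
    frames = []
    for side, doc in (('left', left_doc), ('right', right_doc)):
        # Escape &, <, > and both quote characters for use inside srcdoc="...".
        srcdoc = html.escape(doc, quote=True)
        frames.append(
            f'<iframe srcdoc="{srcdoc}" '
            f'style="float:{side}; width:49%; height:{height_px}px; '
            f'border:2px solid black"></iframe>'
        )
    return ''.join(frames)


print(side_by_side_html('<p class="x">A & B</p>', '<p>C</p>', height_px=300))
```

In the notebook, the rendered maps (`momap_clusters.get_root().render()` and `map_clusters.get_root().render()`) would be passed as `left_doc` and `right_doc`, and the result wrapped in `IPython.display.HTML`.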

In [None]:
momap_clusters

In [None]:
map_clusters

In [None]:
# count the distinct values of each column within every cluster
mony_merged.groupby('Cluster Labels').nunique()

Cluster Labels,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0.0,6,17,16,16,5,14,15,14,12,11,8,7,8,8
1.0,7,187,188,188,51,60,80,79,88,77,95,93,91,79
2.0,3,3,3,3,1,1,1,1,1,1,1,1,1,1
3.0,11,24,24,24,12,22,16,20,18,20,19,14,14,13
4.0,1,2,2,2,1,2,2,2,2,2,2,2,2,2
5.0,3,5,5,5,4,5,4,5,5,4,3,4,3,2
6.0,1,1,1,1,1,1,1,1,1,1,1,1,1,1
7.0,17,191,194,194,84,105,99,98,100,97,101,98,100,102
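`nunique` counts distinct values per column, so it is not a row count. The columns disagree (for example Neighborhood vs Latitude in clusters 0, 1 and 7). For the relocation question, a more direct view is how many neighborhoods of each city fall into each cluster. A cluster populated by only one city offers no cross-city match. Below is a sketch using `groupby(...).size()` and `pd.crosstab` on a few rows copied from the merged table. The `City` column is hypothetical and is inferred from the sign of the longitude.

```python
import numpy as np
import pandas as pd

# A few rows copied from the merged table (Neighborhood, Longitude, Cluster Labels).
sample = pd.DataFrame({
    'Neighborhood': ['Филёвский Парк', 'Роговское', 'Матушкино',
                     'Hudson Yards', 'Hammels', 'Bayswater'],
    'Longitude': [37.427650, 36.937240, 37.178530,
                  -74.000111, -73.805530, -73.765968],
    'Cluster Labels': [7.0, 7.0, 0.0, 7.0, 7.0, 7.0],
})

# Assumption: negative longitude means New York, positive means Moscow.
sample['City'] = np.where(sample['Longitude'] < 0, 'New York', 'Moscow')

# Number of rows per cluster (unlike nunique, which counts distinct values).
sizes = sample.groupby('Cluster Labels').size()
print(sizes)

# Neighborhoods per city in each cluster; a zero in either column means
# the cluster has no candidates in that city.
by_city = pd.crosstab(sample['Cluster Labels'], sample['City'])
print(by_city)
```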


In [None]:
# inspect the cluster that contains a single neighborhood
mony_merged[mony_merged['Cluster Labels']==6.0]

,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Троицкий,Вороновское,55.35485,36.97008,6.0,Tree,Yoga Studio,Fish & Chips Shop,Event Space,Exhibit,Eye Doctor,Fabric Shop,Factory,Falafel Restaurant,Farm
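Cluster 6 contains only Вороновское, so the clustering offers no New York counterpart for it. For a neighborhood in a larger cluster, the candidate matches are the other city's neighborhoods with the same label. Below is a minimal sketch of that lookup. The function name `counterparts` is hypothetical, and the city is again inferred from the sign of the longitude.

```python
import pandas as pd


def counterparts(df: pd.DataFrame, neighborhood: str) -> pd.DataFrame:
    """Return neighborhoods in the other city that share the cluster of `neighborhood`.

    Assumes columns Neighborhood, Longitude and Cluster Labels, and that New York
    rows have negative longitudes while Moscow rows have positive ones.
    Raises IndexError if the name is not found; uses the first row if it repeats.
    """
    row = df.loc[df['Neighborhood'] == neighborhood].iloc[0]
    same_cluster = df['Cluster Labels'] == row['Cluster Labels']
    other_city = (df['Longitude'] < 0) != (row['Longitude'] < 0)
    return df.loc[same_cluster & other_city]


# A few rows copied from the merged tables above.
sample = pd.DataFrame({
    'Neighborhood': ['Филёвский Парк', 'Вороновское', 'Hudson Yards', 'Hammels'],
    'Longitude': [37.427650, 36.970080, -74.000111, -73.805530],
    'Cluster Labels': [7.0, 6.0, 7.0, 7.0],
})
print(counterparts(sample, 'Филёвский Парк')['Neighborhood'].tolist())
```

On the full data, `counterparts(mony_merged, 'Филёвский Парк')` would list every New York neighborhood in cluster 7.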
