# Battle of Neighbourhoods in Manhattan


### About Manhattan

`Manhattan` is one of the most culturally diverse places in the world: people from many different backgrounds have brought their families and dreams here and made it their home. It has long been a centre for trade and economic growth and is known worldwide as a cultural melting pot. While other places have had immigration surges, few compare to New York City in the diversity and sheer number of immigrants who have made their way there. Because people from all over the world settle here, their cultures are visible in many aspects of daily life: transport, food, clothing, and so on.

### Problem Description

A `restaurant` is a place where people pay to eat and drink. Some people like to try something new while others stick to their routines; it depends on the individual, and Manhattan has many individuals from many cultural backgrounds. It also has many cuisines, each defined by its style of cooking, ingredients, dishes and techniques. For this problem, let's stick with Indian cuisine.

> Let’s assume we want to open an Indian restaurant in one of the world’s most diverse regions. The factors we have to take into account include:
* Market places
* Competition in a particular location
* Nearby places that draw people to restaurants, such as gyms
* Public entertainment places
* Population
* Menus of competitors
> 
> And so on… Our solution therefore needs to be data-driven, aiming for low risk and a high success rate, and it applies the techniques and tools learned so far in this course.

Here `FourSquare APIs` are used to explore neighbourhoods in Manhattan. To get started with `FourSquare APIs` follow the official 🗺 [guide](https://developer.foursquare.com/docs/places-api/getting-started/)

In [1]:
import json

import requests

import numpy as np
import pandas as pd
from matplotlib import cm, colors

import folium
from geopy.geocoders import Nominatim

from sklearn.cluster import KMeans

In [2]:
# Getting co-ordinates of locations
def get_newyork_coordinates():
    with open('../input/newyork-data/newyork_data.json') as json_data:
        newyork_data = json.load(json_data)
    return newyork_data


newyork_data = get_newyork_coordinates()
newyork_data.keys()

dict_keys(['type', 'totalFeatures', 'features', 'crs', 'bbox'])

**All the relevant data is in the `features` key, which is a list of the neighbourhoods**

In [3]:
neighbourhoods_data = newyork_data['features']
neighbourhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

**Transforming the data to pandas dataframe**

In [4]:
def create_neighbourhood_df(neighbourhoods_data):
    columns = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude']

    # Collecting one row per neighbourhood
    # (DataFrame.append was removed in pandas 2.0, so build the rows first)
    rows = []
    for data in neighbourhoods_data:
        borough = data['properties']['borough']
        neighbourhood_name = data['properties']['name']

        # GeoJSON stores coordinates as [longitude, latitude]
        neighbourhood_coordinates = data['geometry']['coordinates']
        neighbourhood_lat = neighbourhood_coordinates[1]
        neighbourhood_long = neighbourhood_coordinates[0]

        rows.append(
            {
                'Borough': borough,
                'Neighbourhood': neighbourhood_name,
                'Latitude': neighbourhood_lat,
                'Longitude': neighbourhood_long
            }
        )

    # Creating the dataframe in one step
    return pd.DataFrame(rows, columns=columns)

In [5]:
df = create_neighbourhood_df(neighbourhoods_data)
print(f'Dataset size: {len(df)}')
df.head()

Dataset size: 306


Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [6]:
unique_borough = df.Borough.unique()
num_of_neighbourhood = len(df)

print(f'There are {len(unique_borough)} boroughs & {num_of_neighbourhood} neighbourhoods')
print(f"Unique borough names: {', '.join(unique_borough)}")

There are 5 boroughs & 306 neighbourhoods
Unique borough names: Bronx, Manhattan, Brooklyn, Queens, Staten Island


**Using the `geopy` library to get the latitude and longitude values of New York City**

In [7]:
def get_coordinates(address):
    geolocator = Nominatim(user_agent='ny_explorer')
    location = geolocator.geocode(address)
    if location is None:
        # geocode returns None when the address cannot be resolved
        raise ValueError(f'Could not geocode address: {address}')
    return location.latitude, location.longitude


latitude, longitude = get_coordinates('New York City, NY')
print(f'Geographical Coordinates of New York City are {latitude}, {longitude}')

Geographical Coordinates of New York City are 40.7127281, -74.0060152


**Creating a `map` of New York with neighborhoods superimposed on top**

In [8]:
def plot_folium_map_for_neighbourhood(df, latitude, longitude):
    _map = folium.Map(location=[latitude, longitude], zoom_start=10)

    # Adding markers on top
    for lat, lng, borough, neigh in zip(
        df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']
    ):
        label = f'{neigh}, {borough}'
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.5
        ).add_to(_map)

    return _map

In [9]:
# Plot New York map
plot_folium_map_for_neighbourhood(df, latitude, longitude)

## Going for Manhattan

In [10]:
# Going for Manhattan
mnh_df = df[df.Borough == 'Manhattan'].reset_index(drop=True)
mnh_df.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


**Getting `co-ordinates` of Manhattan**

In [11]:
latitude, longitude = get_coordinates('Manhattan, NY')
print(f'Geographical Coordinates of Manhattan are {latitude}, {longitude}')

Geographical Coordinates of Manhattan are 40.7896239, -73.9598939


**`Visualizing` Manhattan**

In [12]:
plot_folium_map_for_neighbourhood(mnh_df, latitude, longitude)

Defining `Foursquare API` credentials. The keys below were deactivated once this notebook was made public, so they will not work. You can create your own Foursquare API keys by following the `official` [guide](https://developer.foursquare.com/docs/places-api/getting-started/).

In [13]:
FOURSQUARE_API_CONFIG = {
    'client_id': 'LI5KWZXBMDQIF5MXN5LKRC3IM4JJSNRPUICKKCIIF2D01AKE',
    'client_secret': 'RBX03LVDQQNDJD5JIPL3PGSJ1VXG2P3ZIXS4HOOOQ1E33ZEU',
    'version': '20180605'  # Foursquare API version
}
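Hard-coded keys leak as soon as a notebook is shared. The following is a minimal sketch of one alternative, assuming the credentials are exported as environment variables beforehand; the variable names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`, `FOURSQUARE_API_VERSION`) and the helper `load_foursquare_config` are hypothetical and not part of the original notebook.

```python
import os


def load_foursquare_config(env=os.environ):
    """Build the config dict from environment variables (hypothetical names)."""
    required = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")

    return {
        'client_id': env['FOURSQUARE_CLIENT_ID'],
        'client_secret': env['FOURSQUARE_CLIENT_SECRET'],
        # Fall back to the API version used in this notebook
        'version': env.get('FOURSQUARE_API_VERSION', '20180605'),
    }
```

The returned dictionary has the same keys as `FOURSQUARE_API_CONFIG`, so it could be passed to the request functions in its place.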

### Exploring one neighborhood in Manhattan

In [14]:
def get_neighbourhood_info(df, index):
    name = df.loc[index, 'Neighbourhood']
    lat = df.loc[index, 'Latitude']
    lng = df.loc[index, 'Longitude']
    return name, lat, lng


neigh_name, neigh_lat, neigh_lng = get_neighbourhood_info(mnh_df, 0)
print(f'Latitude and Longitude of {neigh_name} are {neigh_lat}, {neigh_lng}')

Latitude and Longitude of Marble Hill are 40.87655077879964, -73.91065965862981


Getting up to `100 venues` around `Marble Hill` within a radius of 500 meters.

In [15]:
def get_neighbourhood_info_using_foursquare(
    api_config, latitude, longitude, limit=100, radius=500
):
    url = 'https://api.foursquare.com/v2/venues/explore'

    # Passing the query as params lets requests build and encode the URL
    params = {
        'client_id': api_config['client_id'],
        'client_secret': api_config['client_secret'],
        'v': api_config['version'],
        'll': f'{latitude},{longitude}',
        'radius': radius,
        'limit': limit
    }

    results = requests.get(url, params=params, timeout=30).json()
    return results


results = get_neighbourhood_info_using_foursquare(
    FOURSQUARE_API_CONFIG, neigh_lat, neigh_lng
)

status_code = results['meta']['code']
print(f"Status code: {status_code}")

if status_code == 200 and results['response']['totalResults'] > 0:
    venue = results['response']['groups'][0]['items'][0]
    print(json.dumps(venue, indent=2, sort_keys=True))

Status code: 200
{
  "reasons": {
    "count": 0,
    "items": [
      {
        "reasonName": "globalInteractionReason",
        "summary": "This spot is popular",
        "type": "general"
      }
    ]
  },
  "referralId": "e-0-4b4429abf964a52037f225e3-0",
  "venue": {
    "categories": [
      {
        "icon": {
          "prefix": "https://ss3.4sqi.net/img/categories_v2/food/pizza_",
          "suffix": ".png"
        },
        "id": "4bf58dd8d48988d1ca941735",
        "name": "Pizza Place",
        "pluralName": "Pizza Places",
        "primary": true,
        "shortName": "Pizza"
      }
    ],
    "delivery": {
      "id": "72548",
      "provider": {
        "icon": {
          "name": "/delivery_provider_seamless_20180129.png",
          "prefix": "https://fastly.4sqi.net/img/general/cap/",
          "sizes": [
            40,
            50
          ]
        },
        "name": "seamless"
      },
      "url": "https://www.seamless.com/menu/arturos-pizza-5189-broadway-ave

**The required information is in the `items` key**

In [16]:
# Function to extract the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # Flattened responses use the prefixed column name instead
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

**Cleaning the json and structuring it into a pandas dataframe**

In [17]:
def clean_json_response(response):
    venues = response['response']['groups'][0]['items']
    
    # Normalize semi-structured JSON data into a flat table
    nearby_venues = pd.json_normalize(venues) # flatten JSON
    
    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]
    
    # filter the category for each row
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
    
    # clean columns
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    
    return nearby_venues
    

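def serialize_json_response_into_df(response):
    status_code = response['meta']['code']
    print(f'Status code: {status_code}')

    num_of_results = 0
    if status_code == 200:
        # Read from the response argument, not the global `results`
        num_of_results = response['response']['totalResults']
        print(f'Num of results: {num_of_results}')

    if num_of_results > 0:
        return clean_json_response(response)

    # Return an empty frame so callers can still use .head() and .shape
    return pd.DataFrame(columns=['name', 'categories', 'lat', 'lng'])


nearby_venues = serialize_json_response_into_df(results)
nearby_venues.head()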

Status code: 200
Num of results: 23


Unnamed: 0,name,categories,lat,lng
0,Arturo's,Pizza Place,40.874412,-73.910271
1,Bikram Yoga,Yoga Studio,40.876844,-73.906204
2,Tibbett Diner,Diner,40.880404,-73.908937
3,Dunkin',Donut Shop,40.877136,-73.906666
4,Starbucks,Coffee Shop,40.877531,-73.905582


In [18]:
print(f'{nearby_venues.shape[0]} venues were returned by Foursquare.')

23 venues were returned by Foursquare.
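As a sanity check on the 500 meter radius, the distance between the neighbourhood centre and a returned venue can be estimated with the haversine formula. This is only an illustrative sketch: `haversine_m` is a hypothetical helper, the Earth is treated as a sphere, and the coordinates are the rounded values for Marble Hill and Arturo's shown in the tables above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Marble Hill centre and the first venue (Arturo's) from the output above
distance = haversine_m(40.876551, -73.91066, 40.874412, -73.910271)
print(f'Distance to first venue: {distance:.0f} m')
```

For this pair the result falls well inside the requested 500 meter radius.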


### Exploring all the Neighborhoods in Manhattan

**Function to get venues near a location**

In [19]:
# Normalizes individual response
def normalize_json_response_for_venues(venues_list, response, name, latitude, longitude):
    items = response['response']['groups'][0]['items']

    # Return only relevant information for each nearby venue
    venues_list.append(
        [
            (
                name, latitude, longitude,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                # Some venues may have no category; avoid an IndexError on those
                v['venue']['categories'][0]['name'] if v['venue']['categories'] else None
            ) for v in items
        ]
    )

    

def get_nearby_venues(api_config, names, latitudes, longitudes, radius=500):
    venues_list = []

    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Get api request
        response = get_neighbourhood_info_using_foursquare(api_config, lat, lng, radius=radius)

        status_code = response['meta']['code']
        if status_code == 200:
            num_of_results = response['response']['totalResults']
            if num_of_results > 0:
                normalize_json_response_for_venues(venues_list, response, name, lat, lng)
            else:
                print(f'Num of results: {num_of_results} for {name}')
        else:
            print(f'Status code: {status_code} while making request for {name}')

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
        'Neighbourhood', 
        'Neighbourhood.Latitude', 
        'Neighbourhood.Longitude',
        'Venue', 
        'Venue Latitude', 
        'Venue Longitude', 
        'Venue Category'
    ]

    return nearby_venues

**Getting nearby venues for every neighborhood**

In [20]:
# Running get_nearby_venues function on each neighborhood in Manhattan
mnh_venues = get_nearby_venues(FOURSQUARE_API_CONFIG, mnh_df.Neighbourhood, mnh_df.Latitude, mnh_df.Longitude)

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [21]:
print(mnh_venues.shape)
print(f'Num of examples: {len(mnh_venues)}')
mnh_venues.head()

(3242, 7)
Num of examples: 3242


Unnamed: 0,Neighbourhood,Neighbourhood.Latitude,Neighbourhood.Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


**Grouping venues on the basis of `Neighbourhood`**

In [22]:
mnh_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood.Latitude,Neighbourhood.Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,78,78,78,78,78,78
Carnegie Hill,93,93,93,93,93,93
Central Harlem,45,45,45,45,45,45
Chelsea,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Civic Center,100,100,100,100,100,100
Clinton,100,100,100,100,100,100
East Harlem,37,37,37,37,37,37
East Village,100,100,100,100,100,100
Financial District,100,100,100,100,100,100


In [23]:
print(f"There are {len(mnh_venues['Venue Category'].unique())} unique categories")

There are 332 unique categories


**Analyze each Neighborhood**

In [24]:
# One Hot Encoding
mnh_ohe = pd.get_dummies(mnh_venues[['Venue Category']], prefix='', prefix_sep='')

# add neighbourhood column back to dataframe
mnh_ohe['Neighbourhood'] = mnh_venues['Neighbourhood']

# move neighbourhood column to the first column
fixed_columns = [mnh_ohe.columns[-1]] + list(mnh_ohe.columns[:-1])
mnh_ohe = mnh_ohe[fixed_columns]

mnh_ohe.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [25]:
mnh_ohe.shape

(3242, 333)

**Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category**

In [26]:
mnh_group = mnh_ohe.groupby('Neighbourhood').mean().reset_index()
mnh_group.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.012821,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.012821,0.0,0.012821,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,...,0.0,0.010753,0.0,0.0,0.0,0.010753,0.032258,0.0,0.010753,0.032258
2,Central Harlem,0.0,0.0,0.0,0.066667,0.044444,0.0,0.0,0.0,0.022222,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.01
4,Chinatown,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01


In [27]:
mnh_group.shape

(40, 333)
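To see why the mean of the one-hot columns gives a frequency, here is a small sketch on toy data. The neighbourhoods `A` and `B` and their venues are made up for illustration and are not taken from the Foursquare results.

```python
import pandas as pd

# Toy data: hypothetical neighbourhoods "A" and "B"
toy = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Gym', 'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
toy_ohe = pd.get_dummies(toy['Venue Category'], dtype=float)
toy_ohe.insert(0, 'Neighbourhood', toy['Neighbourhood'])

# The mean of a 0/1 column is the share of venues in that category
toy_freq = toy_ohe.groupby('Neighbourhood').mean()
print(toy_freq)
```

Neighbourhood `A` has four venues, two of which are cafes, so its `Cafe` value is 0.5; every row of the grouped table sums to 1.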

**Printing top 5 most common venues**

In [28]:
num_top_venues = 5

for place in mnh_group.Neighbourhood:
    print(f'==== 🚨 {place} 🚨 ====')
    
    tmp = mnh_group[mnh_group.Neighbourhood == place].T.reset_index()
    tmp.columns = ['venue', 'freq']
    tmp = tmp.iloc[1:]
    tmp['freq'] = tmp['freq'].astype(float)
    tmp = tmp.round({'freq': 2})
    
    print(tmp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print()

==== 🚨 Battery Park City 🚨 ====
            venue  freq
0            Park  0.08
1     Coffee Shop  0.06
2  Clothing Store  0.05
3             Gym  0.05
4   Memorial Site  0.04

==== 🚨 Carnegie Hill 🚨 ====
         venue  freq
0  Coffee Shop  0.08
1         Café  0.05
2          Bar  0.04
3  Yoga Studio  0.03
4    Wine Shop  0.03

==== 🚨 Central Harlem 🚨 ====
                  venue  freq
0    African Restaurant  0.07
1    Seafood Restaurant  0.04
2                   Bar  0.04
3  Gym / Fitness Center  0.04
4   Fried Chicken Joint  0.04

==== 🚨 Chelsea 🚨 ====
                 venue  freq
0          Coffee Shop  0.07
1          Art Gallery  0.05
2               Bakery  0.05
3   Italian Restaurant  0.04
4  American Restaurant  0.04

==== 🚨 Chinatown 🚨 ====
                venue  freq
0  Chinese Restaurant  0.08
1              Bakery  0.07
2        Dessert Shop  0.04
3        Cocktail Bar  0.04
4   Hotpot Restaurant  0.04

==== 🚨 Civic Center 🚨 ====
                 venue  freq
0          C

**Function to get most common venues**

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

**Getting top 10 most common venues in a Neighborhood**

In [30]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neigh_venues_sorted = pd.DataFrame(columns=columns)
neigh_venues_sorted['Neighbourhood'] = mnh_group['Neighbourhood']

for ind in np.arange(mnh_group.shape[0]):
    neigh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mnh_group.iloc[ind, :], num_top_venues)

neigh_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Coffee Shop,Clothing Store,Gym,Memorial Site,Hotel,Gourmet Shop,Italian Restaurant,Boat or Ferry,Beer Garden
1,Carnegie Hill,Coffee Shop,Café,Bar,Yoga Studio,Wine Shop,Bookstore,French Restaurant,Pizza Place,Gym,Cosmetics Shop
2,Central Harlem,African Restaurant,Seafood Restaurant,Bar,Gym / Fitness Center,Fried Chicken Joint,French Restaurant,Cosmetics Shop,Chinese Restaurant,American Restaurant,Art Gallery
3,Chelsea,Coffee Shop,Art Gallery,Bakery,Italian Restaurant,American Restaurant,Seafood Restaurant,Ice Cream Shop,Wine Shop,French Restaurant,Market
4,Chinatown,Chinese Restaurant,Bakery,Dessert Shop,Cocktail Bar,Hotpot Restaurant,Spa,American Restaurant,Optical Shop,Salon / Barbershop,Shanghai Restaurant
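The column labels above rely on a three-item suffix list, which is fine for 10 venues but would produce labels such as "21th" for larger values of `num_top_venues`. A possible generalisation, sketched here with a hypothetical helper `ordinal` that is not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the other teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


# Same labels as in the cell above, built for the top 10 venues
example_columns = ['Neighbourhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, 11)
]
print(example_columns[:4])
```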


## Cluster Neighborhoods

In [31]:
# Number of clusters
k = 5

You can use the `elbow method` to choose `k`, the number of clusters, but it is often better to let your `requirements` decide.
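For reference, the elbow method fits `KMeans` for a range of `k` values and looks for the point where the inertia (within-cluster sum of squares) stops dropping sharply. The sketch below uses synthetic two-dimensional points as a stand-in for the neighbourhood frequency table, so the numbers it prints say nothing about Manhattan; the names `X`, `centres` and `inertias` are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: three loose groups of 20 points each
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centres])

# Fit one model per candidate k and record its inertia
inertias = {}
for n_clusters in range(1, 8):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    inertias[n_clusters] = km.inertia_

for n_clusters, inertia in inertias.items():
    print(f'k={n_clusters}: inertia={inertia:.2f}')
```

On data like this the inertia falls steeply up to the true number of groups and flattens afterwards. Real venue frequencies rarely show such a clean elbow, which is one reason to weigh the result against your requirements.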

In [32]:
mnh_group_clustering = mnh_group.drop('Neighbourhood', axis=1)
model = KMeans(n_clusters=k, random_state=0).fit(mnh_group_clustering)

model.labels_[:10]

array([1, 1, 1, 2, 1, 1, 1, 0, 1, 1], dtype=int32)

**Adding the labels to dataframe**

In [33]:
neigh_venues_sorted.insert(0, 'Cluster Labels', model.labels_)

mnh_merged = mnh_df
mnh_merged = mnh_merged.join(neigh_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

mnh_merged.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Coffee Shop,Sandwich Place,Discount Store,Gym,Supplement Shop,Steakhouse,Tennis Stadium,Pizza Place,Donut Shop,Video Game Store
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Bakery,Dessert Shop,Cocktail Bar,Hotpot Restaurant,Spa,American Restaurant,Optical Shop,Salon / Barbershop,Shanghai Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Mobile Phone Shop,Bank,Park,Sandwich Place,Tapas Restaurant,New American Restaurant,Grocery Store,Donut Shop
3,Manhattan,Inwood,40.867684,-73.92121,0,Mexican Restaurant,Café,Restaurant,Lounge,Bakery,Deli / Bodega,Park,Pizza Place,Chinese Restaurant,Caribbean Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Pizza Place,Café,Coffee Shop,Mexican Restaurant,Deli / Bodega,Yoga Studio,Indian Restaurant,Caribbean Restaurant,Cocktail Bar,Park


**Visualizing the clusters**

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(mnh_merged['Latitude'], mnh_merged['Longitude'], mnh_merged['Neighbourhood'], mnh_merged['Cluster Labels']):
    label = folium.Popup(f'{poi} Cluster {cluster}', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7
    ).add_to(map_clusters)

map_clusters

### Examine clusters

In [35]:
def return_cluster_df(mnh_merged, cluster):
    return mnh_merged.loc[mnh_merged['Cluster Labels'] == cluster, mnh_merged.columns[[1] + list(range(5, mnh_merged.shape[1]))]]

In [36]:
return_cluster_df(mnh_merged, 0)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Bakery,Mobile Phone Shop,Bank,Park,Sandwich Place,Tapas Restaurant,New American Restaurant,Grocery Store,Donut Shop
3,Inwood,Mexican Restaurant,Café,Restaurant,Lounge,Bakery,Deli / Bodega,Park,Pizza Place,Chinese Restaurant,Caribbean Restaurant
4,Hamilton Heights,Pizza Place,Café,Coffee Shop,Mexican Restaurant,Deli / Bodega,Yoga Studio,Indian Restaurant,Caribbean Restaurant,Cocktail Bar,Park
7,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Park,Deli / Bodega,Latin American Restaurant,Sandwich Place,Steakhouse,Cuban Restaurant,Performing Arts Venue


In [37]:
return_cluster_df(mnh_merged, 1)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Sandwich Place,Discount Store,Gym,Supplement Shop,Steakhouse,Tennis Stadium,Pizza Place,Donut Shop,Video Game Store
1,Chinatown,Chinese Restaurant,Bakery,Dessert Shop,Cocktail Bar,Hotpot Restaurant,Spa,American Restaurant,Optical Shop,Salon / Barbershop,Shanghai Restaurant
6,Central Harlem,African Restaurant,Seafood Restaurant,Bar,Gym / Fitness Center,Fried Chicken Joint,French Restaurant,Cosmetics Shop,Chinese Restaurant,American Restaurant,Art Gallery
8,Upper East Side,Coffee Shop,Italian Restaurant,Bakery,Exhibit,Gym / Fitness Center,Juice Bar,Pizza Place,French Restaurant,Spa,Boutique
14,Clinton,Italian Restaurant,Theater,Gym / Fitness Center,American Restaurant,Coffee Shop,Cocktail Bar,Hotel,Wine Shop,Sandwich Place,Gym
15,Midtown,Hotel,Clothing Store,Coffee Shop,Theater,Sporting Goods Shop,Bakery,Steakhouse,Sandwich Place,Café,American Restaurant
16,Murray Hill,Coffee Shop,Hotel,Sandwich Place,American Restaurant,Bar,Japanese Restaurant,Burger Joint,Taco Place,Gym / Fitness Center,Thai Restaurant
19,East Village,Bar,Mexican Restaurant,Pizza Place,Italian Restaurant,Ice Cream Shop,Cocktail Bar,Korean Restaurant,Wine Bar,Vegetarian / Vegan Restaurant,Speakeasy
20,Lower East Side,Chinese Restaurant,Café,Latin American Restaurant,Coffee Shop,Bakery,Art Gallery,Ramen Restaurant,Argentinian Restaurant,Performing Arts Venue,Tennis Court
22,Little Italy,Bakery,Café,Bubble Tea Shop,Italian Restaurant,Ice Cream Shop,Salon / Barbershop,Sandwich Place,Coffee Shop,Pizza Place,Hotel


In [38]:
return_cluster_df(mnh_merged, 2)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Manhattanville,Coffee Shop,Seafood Restaurant,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Mexican Restaurant,Deli / Bodega,Park,Ramen Restaurant,Check Cashing Service
9,Yorkville,Italian Restaurant,Gym,Bar,Coffee Shop,Sushi Restaurant,Deli / Bodega,Wine Shop,Japanese Restaurant,Bagel Shop,Park
10,Lenox Hill,Italian Restaurant,Cocktail Bar,Sushi Restaurant,Pizza Place,Coffee Shop,Burger Joint,Café,Gym / Fitness Center,Gym,Salon / Barbershop
11,Roosevelt Island,Park,Residential Building (Apartment / Condo),Gym,Greek Restaurant,Supermarket,Bubble Tea Shop,Café,Noodle House,Soccer Field,Outdoors & Recreation
12,Upper West Side,Italian Restaurant,Mediterranean Restaurant,Bar,Bakery,Café,Pizza Place,Coffee Shop,Wine Bar,American Restaurant,Bagel Shop
13,Lincoln Square,Plaza,Concert Hall,Theater,Performing Arts Venue,Café,French Restaurant,American Restaurant,Wine Shop,Italian Restaurant,Indie Movie Theater
17,Chelsea,Coffee Shop,Art Gallery,Bakery,Italian Restaurant,American Restaurant,Seafood Restaurant,Ice Cream Shop,Wine Shop,French Restaurant,Market
18,Greenwich Village,Italian Restaurant,Clothing Store,Sushi Restaurant,Café,Boutique,American Restaurant,Coffee Shop,Dessert Shop,Indian Restaurant,Gym
21,Tribeca,Italian Restaurant,American Restaurant,Park,Spa,Wine Bar,Café,Gym / Fitness Center,Playground,Basketball Court,Bakery
23,Soho,Clothing Store,Italian Restaurant,Coffee Shop,Mediterranean Restaurant,Boutique,Sporting Goods Shop,Men's Store,Women's Store,Bakery,Salon / Barbershop


In [39]:
return_cluster_df(mnh_merged, 3)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Stuyvesant Town,Park,Bar,Coffee Shop,Boat or Ferry,Farmers Market,Fountain,Gym / Fitness Center,Bistro,Harbor / Marina,Heliport


In [40]:
return_cluster_df(mnh_merged, 4)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Midtown South,Korean Restaurant,Hotel,Hotel Bar,Japanese Restaurant,Salad Place,Cosmetics Shop,Gym / Fitness Center,Coffee Shop,Dessert Shop,American Restaurant
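One of the factors listed in the problem description is competition in a particular location. A rough way to look at it is to rank neighbourhoods by the share of their venues that are Indian restaurants. The sketch below uses a made-up table with the same layout as `mnh_group`; the helper `rank_by_category` and the numbers are hypothetical.

```python
import pandas as pd

# Toy frequency table shaped like mnh_group; the values are invented
toy_group = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    'Indian Restaurant': [0.00, 0.03, 0.01],
    'Gym': [0.05, 0.01, 0.04],
})


def rank_by_category(freq_df, category):
    """Sort neighbourhoods by the share of venues in `category`, lowest first."""
    if category not in freq_df.columns:
        raise KeyError(f'{category!r} is not a venue category in this table')
    return (
        freq_df[['Neighbourhood', category]]
        .sort_values(category)
        .reset_index(drop=True)
    )


ranking = rank_by_category(toy_group, 'Indian Restaurant')
print(ranking)
```

Applied to the real table, the call would be `rank_by_category(mnh_group, 'Indian Restaurant')`. A low share suggests less direct competition, but it would still need to be weighed against the other factors, such as population and nearby venues that attract customers.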


---