# Manhattan and Downtown Toronto Top Venues Comparison: Is the structure of the two Neighborhoods similar?


# 1. Introduction

**Purpose:** This analysis compares the top venues found in the boroughs of Manhattan (NYC) and Downtown Toronto (TO), using the Foursquare API and the data provided during the course.
Even though the two boroughs differ in population density and geographical area, they are the most central boroughs of their cities and have a similar architectural structure. Do the top venues in the two cities show similar patterns?
The steps of the analysis are:


**1.** Import and preprocess the data from the web in order to identify the location and the Neighborhoods belonging to the two Boroughs.


**2.** Run a cluster analysis on the two boroughs and identify the densest clusters, assuming that these clusters can serve as a sample to generalise behaviour on trending venues.


**3.** Encode the data, select the top 3 trending venues for each cluster, compute the frequency of each venue and then run correlation and regression analysis to see how the trending venues are similar.


**4.** Analyse and visualise the results.


**5.** Conclusion.
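
Step 3 can be sketched on a toy table as follows. The venue rows, the cluster labels and the column names (`Borough`, `Cluster`, `Category`) are invented for illustration and do not come from Foursquare. The sketch only shows the mechanics: one-hot encode the categories, average per cluster to get frequencies, keep the top 3 per cluster, and correlate the frequency profiles of the two boroughs.

```python
import pandas as pd

# Toy venue table: every value here is made up for illustration.
venues = pd.DataFrame({
    "Borough":  ["Manhattan"] * 6 + ["Downtown Toronto"] * 6,
    "Cluster":  [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
    "Category": ["Coffee Shop", "Coffee Shop", "Park",
                 "Italian Restaurant", "Bar", "Coffee Shop",
                 "Coffee Shop", "Bakery", "Park",
                 "Bar", "Bar", "Coffee Shop"],
})

# One-hot encode the venue category and keep the grouping keys next to it.
onehot = pd.get_dummies(venues["Category"]).astype(float)
encoded = pd.concat([venues[["Borough", "Cluster"]], onehot], axis=1)

# The mean of a one-hot column is the relative frequency of that category.
freq = encoded.groupby(["Borough", "Cluster"]).mean()


def top_venues(row, n=3):
    """Return the n most frequent categories of one cluster."""
    return row.sort_values(ascending=False).head(n).index.tolist()


top3 = freq.apply(top_venues, axis=1)

# Borough-level frequency profiles and the Pearson correlation between them.
profile = encoded.drop(columns="Cluster").groupby("Borough").mean()
corr = profile.loc["Manhattan"].corr(profile.loc["Downtown Toronto"])

print(top3)
print(f"Correlation of category frequencies: {corr:.2f}")
```

With real data the correlation would be computed over many more categories, so a single coefficient from a table this small should not be read as evidence either way.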

**Target Audience:** Anyone interested in exploring the two boroughs and discovering their top venues. The comparison can help readers decide which borough is more attractive to visit, based on individual preferences.

**Source Data used:**


 **1.** Data sources used during the course.
 
 
 **2.** Foursquare API.



# 2. Import and preprocess Data

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


from bs4 import BeautifulSoup # library to parse HTML pages


### 2.1 Importing and preprocessing Toronto Data

In [None]:
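from io import StringIO # wrap the HTML string for pd.read_html

# Download the Wikipedia page listing the Toronto postal codes
url_file = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(url_file.content, "html5lib")
# Parse the first table on the page into a dataframe
Data = pd.read_html(StringIO(str(soup.table)))[0]
# Drop rows with missing values and renumber the index
Data = Data.dropna(axis = 0).reset_index(drop=True)
Data.head(10)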
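
The `dropna` call only removes rows with true missing values. If the scraped table marks unassigned postal codes with a text placeholder instead, those rows survive. The sketch below assumes the placeholder is the string `Not assigned` (check the actual table before relying on this) and uses an invented three-row table rather than the real Wikipedia content.

```python
import pandas as pd

# Invented miniature of the postal-code table; not the real Wikipedia content.
raw = pd.DataFrame({
    "Postal Code":  ["M1A", "M5A", "M5B"],
    "Borough":      ["Not assigned", "Downtown Toronto", "Downtown Toronto"],
    "Neighborhood": ["Not assigned", "Regent Park", "Garden District"],
})

# Assumption: unassigned rows carry exactly this text in the Borough column.
PLACEHOLDER = "Not assigned"

# Keep only the rows whose borough is a real name, then renumber the index.
clean = raw[raw["Borough"] != PLACEHOLDER].reset_index(drop=True)
print(clean)
```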

In [None]:
GeoData = pd.read_csv('http://cocl.us/Geospatial_data')
Check = GeoData[GeoData['Postal Code'] == 'M3A']
print(Check)

In [None]:
# Attach latitude and longitude to each postal code
Toronto_Data = Data.merge(GeoData, on = 'Postal Code')
# Keep only the boroughs whose name contains 'Toronto'
Toronto_Data = Toronto_Data[Toronto_Data['Borough'].str.contains('Toronto')]
Toronto_Data.head()
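
An inner merge silently drops postal codes that have no match in the geospatial file. A minimal sketch of how such rows could be detected, using invented postal codes and coordinates, is a left merge with `indicator=True`:

```python
import pandas as pd

# Invented sample rows; the coordinates are illustrative only.
postal = pd.DataFrame({
    "Postal Code": ["M5A", "M5B", "M9Z"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Etobicoke"],
})
geo = pd.DataFrame({
    "Postal Code": ["M5A", "M5B"],
    "Latitude": [43.65, 43.66],
    "Longitude": [-79.36, -79.38],
})

# how="left" keeps every postal code; indicator adds a "_merge" column that
# says where each row came from; validate raises if a key is duplicated.
merged = postal.merge(
    geo, on="Postal Code", how="left", indicator=True, validate="one_to_one"
)

# Postal codes present in the scraped table but missing from the geo file.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

If the list is empty, the inner merge used in the notebook loses nothing.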

In [None]:
#Setting Toronto Coordinates
lat_TO = 43.651070
lon_TO = -79.347015
map_Toronto = folium.Map(location=[lat_TO, lon_TO], zoom_start=10)


In [None]:
for lat, lng, borough, neighborhood in zip(Toronto_Data['Latitude'], Toronto_Data['Longitude'], Toronto_Data['Borough'], Toronto_Data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Toronto) 
    
map_Toronto
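
The same marker loop is repeated for every map in this notebook. One possible way to factor it out is sketched below. The helper names `marker_specs` and `add_markers` are hypothetical, and the sample row uses invented coordinates. `folium` is imported lazily so that the label-building part can be checked without it.

```python
import pandas as pd


def marker_specs(df, label_cols=("Neighborhood", "Borough")):
    """Return one (lat, lng, label) tuple per row of df."""
    labels = [
        ", ".join(map(str, values))
        for values in df[list(label_cols)].itertuples(index=False)
    ]
    return list(zip(df["Latitude"], df["Longitude"], labels))


def add_markers(fmap, df, label_cols=("Neighborhood", "Borough")):
    """Add one blue circle marker per row of df to a folium map."""
    import folium  # imported here so marker_specs works without folium

    for lat, lng, text in marker_specs(df, label_cols):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=folium.Popup(text, parse_html=True),
            color="blue",
            fill=True,
            fill_color="#3186cc",
            fill_opacity=0.7,
        ).add_to(fmap)
    return fmap


# Invented sample row to show the label format.
sample = pd.DataFrame({
    "Neighborhood": ["Regent Park"],
    "Borough": ["Downtown Toronto"],
    "Latitude": [43.65],
    "Longitude": [-79.36],
})
specs = marker_specs(sample)
print(specs)
```

A call such as `add_markers(map_Toronto, Toronto_Data)` would then replace the loop, and passing `label_cols=("Neighborhood",)` would cover maps that label markers with the neighborhood name only.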

### 2.2 Importing and preprocessing NYC Data

In [None]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
newyork_data

In [None]:
neighborhoods_data = newyork_data['features']
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

In [None]:
# Collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
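
As an alternative to looping over the features, the nested records can be flattened with `pd.json_normalize`. The sketch below uses an invented two-feature list with the same fields the loop reads (`properties.borough`, `properties.name`, `geometry.coordinates`); the coordinates are rounded placeholders, not values from the dataset.

```python
import pandas as pd

# Invented features mimicking the fields used above; coordinates are placeholders.
features = [
    {"properties": {"borough": "Manhattan", "name": "Chinatown"},
     "geometry": {"coordinates": [-73.99, 40.71]}},
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.84, 40.89]}},
]

# Nested dictionaries become dotted column names; the coordinate list stays a list.
flat = pd.json_normalize(features)

# Coordinates are ordered [longitude, latitude], so name the columns accordingly.
coords = pd.DataFrame(
    flat["geometry.coordinates"].tolist(), columns=["Longitude", "Latitude"]
)

result = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": coords["Latitude"],
    "Longitude": coords["Longitude"],
})
print(result)
```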

In [None]:


map_newyork = folium.Map(location=[40.730610, -73.935242], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### 2.3 Subset Neighborhoods in Downtown Toronto

In [None]:
Downtown_Toronto = Toronto_Data[Toronto_Data['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
Downtown_Toronto.head()

In [None]:
#Exploring Neighborhoods in Downtown Toronto
map_Downtown_Toronto = folium.Map(location=[43.6515, -79.3835], zoom_start=10)
for lat, lng, borough, neighborhood in zip(Downtown_Toronto['Latitude'], Downtown_Toronto['Longitude'], Downtown_Toronto['Borough'], Downtown_Toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Downtown_Toronto) 
    
map_Downtown_Toronto
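
The map centres in this notebook are typed in by hand. A small sketch of deriving the centre and the bounding box from the subset itself is shown below, using three invented points in place of the real neighborhoods.

```python
import pandas as pd

# Invented points standing in for the neighborhoods of one borough.
subset = pd.DataFrame({
    "Latitude": [43.65, 43.66, 43.64],
    "Longitude": [-79.36, -79.38, -79.40],
})

# Centre of the map: the mean latitude and longitude of the points.
center = [subset["Latitude"].mean(), subset["Longitude"].mean()]

# Bounding box as [[south, west], [north, east]].
bounds = [
    [subset["Latitude"].min(), subset["Longitude"].min()],
    [subset["Latitude"].max(), subset["Longitude"].max()],
]
print(center, bounds)

# With folium available, these values could be used as:
#   fmap = folium.Map(location=center, zoom_start=13)
#   fmap.fit_bounds(bounds)
```

Fitting the map to the bounds also removes the need to guess a `zoom_start` value for each borough.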

### 2.4 Subset Neighborhoods in Manhattan

In [None]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

In [None]:
map_manhattan = folium.Map(location=[40.78343, -73.96625], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

### 2.5 Setting Foursquare Credentials to get the most common venues

In [None]:
#Foursquare Credentials and Version
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (keep the real value out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (keep the real value out of shared notebooks)
VERSION = '20180604'
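
Credentials written directly into a notebook are exposed to anyone who can read the file. One common alternative is to read them from environment variables. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are hypothetical and can be changed freely; the helper `load_credentials` is likewise illustrative.

```python
import os


def load_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first."
        )
    return client_id, client_secret


# Demonstration with a fake environment so the sketch runs anywhere.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
demo_credentials = load_credentials(demo_env)
print(demo_credentials[0])
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_credentials()`, after setting the two variables in the shell that starts Jupyter.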
