# The Battle of Neighborhoods Final Project

## 1. Introduction/Business Problem

You have just learned some new skills and have been headhunted by two great companies, one located in New York and the other in Toronto. You know you will accept one of these roles, but you want to know more about the neighbourhoods in each city before making an informed decision. You want to find neighbourhoods that offer good amenities and venues such as schools, gyms, swimming pools, amusement parks, restaurants, and coffee shops. So let's look for borough-neighbourhoods that are very similar to your current location.

## 2. Load the Libraries and gather the Data

In [7]:
import json, requests
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import since pandas 1.0)
from geopy.geocoders import Nominatim
import folium 
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
!pip install matplotlib-venn
from matplotlib_venn import venn2
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans



## 2.1 Reading the Data

### 2.1.1 New York Data

In [10]:
with open('newyork_data.json') as f:
    ny_json = json.load(f)

# relevant information is in 'features' key
ny_json = ny_json['features']
ny_json[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [11]:
# collect the useful fields of the JSON data as rows, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
cols = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude']
rows = []
for data in ny_json:
    borough = data['properties']['borough']
    neigh   = data['properties']['name']
    lon, lat = data['geometry']['coordinates'] # GeoJSON order is [longitude, latitude]
    rows.append({'Borough': borough, 'Neighbourhood': neigh, 'Latitude': lat, 'Longitude': lon})

ny_df = pd.DataFrame(rows, columns=cols)
ny_df.head()

|   | Borough | Neighbourhood | Latitude | Longitude |
|---|---------|---------------|----------|-----------|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
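
The loop in cell 11 can also be expressed with `json_normalize`, which flattens nested keys into dotted column names. The following is a minimal sketch, assuming features shaped like the Wakefield record shown earlier. The helper name `features_to_frame` and the trimmed two-feature sample are illustrative and not part of the original notebook; the second sample uses the rounded Co-op City coordinates from the table.

```python
import pandas as pd

def features_to_frame(features):
    """Flatten GeoJSON point features into Borough/Neighbourhood/Latitude/Longitude."""
    # nested dict keys become dotted columns, e.g. 'properties.name';
    # list values such as the coordinates are kept as lists
    flat = pd.json_normalize(features)
    # GeoJSON stores points as [longitude, latitude]
    coords = pd.DataFrame(flat['geometry.coordinates'].tolist(),
                          columns=['Longitude', 'Latitude'], index=flat.index)
    return pd.DataFrame({
        'Borough': flat['properties.borough'],
        'Neighbourhood': flat['properties.name'],
        'Latitude': coords['Latitude'],
        'Longitude': coords['Longitude'],
    })

# trimmed sample (illustrative): only the keys the function reads
sample = [
    {'type': 'Feature',
     'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'type': 'Feature',
     'geometry': {'type': 'Point', 'coordinates': [-73.829939, 40.874294]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]
sample_df = features_to_frame(sample)
print(sample_df)
```

Building the frame in one call avoids growing a dataframe row by row, which gets slow as the number of features increases.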


### 2.1.2 Toronto Data

In [13]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(url).text
soup = BeautifulSoup(source, 'html.parser') # explicit parser avoids bs4's "no parser was explicitly specified" warning

table_data = soup.find('div', class_='mw-parser-output')
table = table_data.table.tbody

columns = ['Postal Code', 'Borough', 'Neighbourhood']
data = {key: [] for key in columns} # one empty list per column

for row in table.find_all('tr'):
    # rows without <td> cells (e.g. the header row) contribute nothing
    for cell, column in zip(row.find_all('td'), columns):
        data[column].append(cell.text.replace('\n', ''))
        
toronto_df = pd.DataFrame.from_dict(data=data)[columns]
print("Before dropping the 'Not assigned' rows, shape is: ",toronto_df.shape)

toronto_df = toronto_df[toronto_df['Borough'] != 'Not assigned'].reset_index(drop = True)
print('After dropping rows where borough is "Not assigned", Shape is: ',toronto_df.shape)
print('Number of rows where Neighbourhood is "Not assigned" but borough has value: ', 
      toronto_df[toronto_df['Neighbourhood'] == 'Not assigned'].shape[0])

# use the corresponding Borough as the Neighbourhood where Neighbourhood is 'Not assigned'
p, b, n = [], [], []
for postcode, borough, neigh in zip(toronto_df['Postal Code'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    p.append(postcode)
    b.append(borough)
    if neigh == 'Not assigned':
        n.append(borough)
    else:
        n.append(neigh)

toronto_df = pd.DataFrame({'Postal Code': p, 'Borough': b, 'Neighbourhood':n})[columns]

# merge the rows where Postal Code and Borough are the same; the Neighbourhoods are separated by ', '
# see https://stackoverflow.com/a/27298308
toronto_df = toronto_df.groupby(['Postal Code', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()
print('Before Adding the latitude and longitude, shape is: ',toronto_df.shape)

latlon = pd.read_csv('Geospatial_Coordinates.csv')
# merging the latitude and longitude
print("\nColumns of latlon are: {},\nSo merging on '{}'".format(latlon.columns, latlon.columns[0]))
toronto_df = pd.merge(toronto_df, latlon, how= 'inner', on = 'Postal Code')
print('Final Shape of data is: ', toronto_df.shape)

Before dropping the 'Not assigned' rows, shape is:  (180, 3)
After dropping rows where borough is "Not assigned", Shape is:  (103, 3)
Number of rows where Neighbourhood is "Not assigned" but borough has value:  0
Before Adding the latitude and longitude, shape is:  (103, 3)

Columns of latlon are: Index(['Postal Code', 'Latitude', 'Longitude'], dtype='object'),
So merging on 'Postal Code'
Final Shape of data is:  (103, 5)


In [14]:
toronto_df.head(10)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|-------------|---------|---------------|----------|-----------|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Golden Mile, Clairlea, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffside, Cliffcrest, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |
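
The cleaning steps in cell 13 (drop unassigned boroughs, fall back to the borough name, join neighbourhoods per postal code, merge the coordinates) can also be written with vectorized pandas operations instead of an explicit loop. The following is a minimal sketch on a few hand-made rows, assuming the same column names. The function name `clean_postal_table` and the `X0A`/`X0B` rows with zero coordinates are hypothetical placeholders, not scraped data.

```python
import pandas as pd

def clean_postal_table(raw, coords):
    """Vectorized sketch of the cleaning steps on Postal Code/Borough/Neighbourhood."""
    # drop rows whose borough is unassigned
    df = raw[raw['Borough'] != 'Not assigned'].copy()
    # fall back to the borough name where the neighbourhood is unassigned
    unassigned = df['Neighbourhood'] == 'Not assigned'
    df['Neighbourhood'] = df['Neighbourhood'].mask(unassigned, df['Borough'])
    # one row per postal code, neighbourhoods joined by ', '
    df = (df.groupby(['Postal Code', 'Borough'])['Neighbourhood']
            .agg(', '.join)
            .reset_index())
    # validate='one_to_one' raises if a postal code is duplicated on either side
    return pd.merge(df, coords, how='inner', on='Postal Code', validate='one_to_one')

# illustrative input: X0A and X0B are made-up postal codes
raw = pd.DataFrame({
    'Postal Code': ['M1B', 'M1B', 'M1G', 'X0A', 'X0B'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough', 'Not assigned', 'Example Borough'],
    'Neighbourhood': ['Malvern', 'Rouge', 'Woburn', 'Not assigned', 'Not assigned'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1G', 'X0B'],
    'Latitude': [43.806686, 43.770992, 0.0],
    'Longitude': [-79.194353, -79.216917, 0.0],
})
cleaned = clean_postal_table(raw, coords)
print(cleaned)
```

The `validate` argument is optional; it is included here so that an unexpected duplicate postal code fails loudly instead of silently multiplying rows.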


## 2.2 Getting Venues Using Foursquare API

In [20]:
CLIENT_ID = 'LV5QEBRRP4YADRFG2I43IPEFFYID3IKOGBIMLNHSW4U234T4' # your Foursquare ID
CLIENT_SECRET = 'TZKVRLFGNLVDEAEJFXALH2XKNLVZBKO0CCSVQN2O2BO3QLRR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: LV5QEBRRP4YADRFG2I43IPEFFYID3IKOGBIMLNHSW4U234T4
CLIENT_SECRET: TZKVRLFGNLVDEAEJFXALH2XKNLVZBKO0CCSVQN2O2BO3QLRR
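
Hard-coding and printing API secrets in a notebook makes them easy to leak when the notebook is shared. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; the helpers `load_credentials` and `mask` are illustrative and not part of the original notebook.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (defaults to the process environment).

    The variable names are assumptions made for this sketch.
    """
    try:
        return {'CLIENT_ID': env['FOURSQUARE_CLIENT_ID'],
                'CLIENT_SECRET': env['FOURSQUARE_CLIENT_SECRET']}
    except KeyError as missing:
        raise RuntimeError(f'missing environment variable: {missing}') from None

def mask(value, visible=4):
    """Show only the first `visible` characters of a secret."""
    return value[:visible] + '*' * max(len(value) - visible, 0)

# demo with placeholder values instead of real credentials
demo = load_credentials({'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
                         'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'})
print('CLIENT_ID: ' + mask(demo['CLIENT_ID']))
print('CLIENT_SECRET: ' + mask(demo['CLIENT_SECRET']))
```

In the notebook itself, `CLIENT_ID` and `CLIENT_SECRET` would then be taken from `load_credentials()` after setting the two variables in the shell that launches Jupyter.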
