# The Battle of Neighborhood

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Introduction</a>
    

2. <a href="#item2">Data Description</a>
    

3. <a href="#item3">Methodology</a>
    

4. <a href="#item3">Results</a>   
    
    
5. <a href="#item3">Discussion</a>
    
    
6. <a href="#item3">Conclusion</a>
</font>
</div>

## 1. Introduction

### 1.1 Background

In this project we are going to explore the neighborhoods of the most popular cities, New York and Toronto. They are the international centers of business, finance, arts, culture, and are recognized as the most multicultural cosmopolitans in the world.They are diverse in many ways. One interesting idea would be to compare the neighborhoods of the two cities and determine how similar or dissimilar they are.

### 1.2 Problem Description

Let me explain the idea of this project through a scenario. Say you live in the New York City of USA. You love your neighborhood, mainly because of all the great amenities and other types of venues that exist in the neighborhood, such as gourmet fast food joints, pharmacies, parks, and so on. Now say you receive a job offer from a great company which is in Toronto, Canada with great career prospects. However, given the far distance from your current place you unfortunately must move if you decide to accept the offer.

Wouldn't it be great if you are able to determine a neighborhood that are the same as your current neighborhood, and if not perhaps a similar neighborhoods that are at least closer to your new job?

### 1.3 Target Audience

We will study and analyze the neighborhoods of both the cities and group them into similar clusters and, analyze those clusters to gather meaningful information. That information can be used to find out neighborhoods that are same as your current neighborhood or at least similar.

The information provided by this project would be useful for people who are interested in relocating to different places and are interested in finding new neighborhoods that are highly similar to their existing neighborhood.

## 2. Data Description

The NYC neighborhood data exists for free on the web. Here is the link to the dataset:https://geo.nyu.edu/catalog/nyu_2451_34572, I downloaded the files and placed it on the server. For Toronto neighborhood data,a Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, exists that has all the information we needed to explore. We will scrape the Wikipedia page and wrangle the data, clean it, and then read it into a pandas dataframe.

Note: There are different website scraping libraries and packages in Python. We can simply use pandas to read the table into a pandas dataframe. Another way, for more complicated cases of web scraping is using the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

### Import the libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    openssl-1.1.1f             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.0               |             py_0          26 KB  conda-forge
    ------------------------------------------------------------
                       

### Download and Explore Data

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
# obtain New York neighborhood data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
nyneighborhoods = newyork_data['features']

# define the dataframe with four columns: City, Borough, Neighborhood, Latitude, Longitude
column_names = ['City','Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
ny_neighborhoods = pd.DataFrame(columns=column_names)

for data in nyneighborhoods:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_neighborhoods = ny_neighborhoods.append({'City': 'New York',
                                          'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
ny_neighborhoods.head()

Unnamed: 0,City,Borough,Neighborhood,Latitude,Longitude
0,New York,Bronx,Wakefield,40.894705,-73.847201
1,New York,Bronx,Co-op City,40.874294,-73.829939
2,New York,Bronx,Eastchester,40.887556,-73.827806
3,New York,Bronx,Fieldston,40.895437,-73.905643
4,New York,Bronx,Riverdale,40.890834,-73.912585


In [4]:
#Define a function to plot the city map with different neighborhood labels based on Borough

def citymap(cityname,countryname,dataframe):
    
    # create map
    address = cityname + ',' + countryname

    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    my_map = folium.Map(location=[latitude, longitude], zoom_start=10)

    # set color scheme for the Borough
    borough_name = dataframe['Borough'].unique().tolist()
    colnum = dataframe['Borough'].unique().size
    x = np.arange(colnum)
    ys = [i+x+(i*x)**2 for i in range(colnum)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map

    for lat, lon, neighborhood, borough in zip(dataframe['Latitude'], dataframe['Longitude'], dataframe['Neighborhood'], dataframe['Borough']):
        cluster = borough_name.index(borough)
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(my_map)
       
    
    return my_map

In [5]:
citymap('New York','USA',ny_neighborhoods)

In [6]:
# obtain Toronto neighborhood data
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
wikipage = requests.get(url)
wikipage.text[:100]

#Create a new pd DataFrame
toronto = pd.DataFrame()

#use beautifulsoup to read the wikipage
soup = BeautifulSoup(wikipage.text, 'lxml')
wikitable = soup.find_all('table')[0] 

row_marker = 0

for row in wikitable.find_all('tr'):
    column_marker = 0
    columns = row.find_all('td')
    for column in columns:
        toronto.loc[row_marker,column_marker] = column.get_text()
        column_marker += 1
    row_marker += 1

#rename column names
toronto.rename(columns={0:'PostalCode',1:'Borough',2:'Neighborhood'}, inplace=True)

#drop the boroughs that has 'not assigned' value 
toronto=toronto[~toronto.Borough.str.contains("Not assigned")]
toronto=toronto.reset_index(drop=True)

#create a new dataframe toronto_neighbor to find all neighborhoods with the location information
toronto_neighborhoods= pd.DataFrame(columns = ['City','Borough','Neighborhood'])

for i in range(toronto.shape[0]):
    borough = toronto.loc[i,'Borough'][:-1] #delete the last character \n
    postalcode = toronto.loc[i,'PostalCode']
    neighborhood = toronto.loc[i,'Neighborhood'][:-1] #delete the last character \n
    
    #if the neighborhood name is not assigned, than the neighorhood name is same as borough
    if neighborhood == 'Not assigned':
        neighborhood = borough
    
    #find the location data, ignore the neighborhoods that are unable to be located by Nominatim
    geolocator = Nominatim(user_agent="mycapstoneproject")
    location = geolocator.geocode("{},{},Toronto,Ontario,Canada".format(neighborhood,borough))
    
    #try one more searching
    if location is None: 
        location = geolocator.geocode("{},Toronto,Ontario,Canada".format(neighborhood))
        
    else:
        toronto_neighborhoods = toronto_neighborhoods.append({'City': 'Toronto',
                                                'Borough': borough,
                                                'Neighborhood': neighborhood,
                                                'Latitude': location.latitude,
                                                'Longitude': location.longitude
                                               }, ignore_index=True)

toronto_neighborhoods.head()

Unnamed: 0,City,Borough,Neighborhood,Latitude,Longitude
0,Toronto,North York,Parkwoods,43.7588,-79.320197
1,Toronto,North York,Victoria Village,43.732658,-79.311189
2,Toronto,Etobicoke,Islington Avenue,43.622575,-79.514215
3,Toronto,Scarborough,Malvern / Rouge,43.809196,-79.221701
4,Toronto,North York,Don Mills,43.775347,-79.345944


In [7]:
citymap('Toronto','Canada',toronto_neighborhoods)

In [8]:
#shape of toronto neighborhood
toronto_neighborhoods.shape

(47, 5)

In [9]:
#shape of ny neighborhood
ny_neighborhoods.shape

(306, 5)

In [10]:
#save the data to a csv
ny_neighborhoods.to_csv('ny_neighborhoods.csv', sep='\t')
toronto_neighborhoods.to_csv('toronto_neighborhoods.csv', sep='\t')

In [11]:
#Use Foursquare to expore the neighborhoods
#My Foursquare Credentials 
CLIENT_ID = 'PCGUZOSUUTA4TFMTKLVQA1QR3YGPKEUYYN45DF1WNQBMFLQQ' # your Foursquare ID
CLIENT_SECRET = 'OP4EQ0313NVCLHZRUC0KACAOL12JMHVTOWUSJNMEFOMVRTYF' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
radius = 500
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: PCGUZOSUUTA4TFMTKLVQA1QR3YGPKEUYYN45DF1WNQBMFLQQ
CLIENT_SECRET:OP4EQ0313NVCLHZRUC0KACAOL12JMHVTOWUSJNMEFOMVRTYF
