# Coursera Capstone Project: Battle of Neighborhoods


# Week 1
## Part 1

## Problem Statement:

A tourist plans to visit New York. He is a pizza lover and would like to stay in the borough where pizza is most easily accessible. We therefore need to find the borough that has the largest number of pizza places and where those places are most densely located. The recommendation should be easy to understand and interpret.

## Stakeholders:

This research will benefit pizza lovers travelling to New York. It will help them choose a borough to stay in that gives easy access to many pizza places.

## Part 2

## Data
### For this problem I need the following data:
   #### 1) Number of pizza places in each borough
   #### 2) Boroughs and neighborhoods with their latitude and longitude

#### Geographical data will be taken from the Foursquare API and from the links https://geo.nyu.edu/catalog/nyu_2451_34572 and https://cocl.us/new_york_dataset




In [1]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0
import folium # map rendering library



In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
neighborhoods_data = newyork_data['features']

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [8]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [9]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
5,Bronx,Kingsbridge,40.881687,-73.902818
6,Manhattan,Marble Hill,40.876551,-73.91066
7,Bronx,Woodlawn,40.898273,-73.867315
8,Bronx,Norwood,40.877224,-73.879391
9,Bronx,Williamsbridge,40.881039,-73.857446


In [10]:
# Use geopy library to get the latitude and longitude values of New York City.
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [12]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version


In [27]:
LIMIT = 500 # requested limit; the API returns at most 100 venues per request
boroughs = ["Bronx","Manhattan","Brooklyn","Queens","Staten Island"]
results = {}
for borough in boroughs:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        borough,
        LIMIT,
        "4bf58dd8d48988d1ca941735") # PIZZA PLACE CATEGORY ID
    results[borough] = requests.get(url).json()
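
The request above interpolates values straight into the URL string. A minimal sketch of an alternative, assuming the same `venues/explore` endpoint and parameters as the cell above, builds the query string with the standard library so that values such as `Staten Island` are percent-encoded explicitly. The helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'
PIZZA_CATEGORY_ID = '4bf58dd8d48988d1ca941735'  # pizza place category ID used above


def build_explore_url(client_id, client_secret, version, near, limit,
                      category_id=PIZZA_CATEGORY_ID):
    """Return an explore URL with an encoded query string (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'near': near,
        'limit': limit,
        'categoryId': category_id,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# placeholder credentials, for illustration only
example_url = build_explore_url('ID', 'SECRET', '20180605', 'Staten Island', 100)
print(example_url)
```

The resulting string could be passed to `requests.get` in place of the formatted `url`.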

In [30]:
results

{'Bronx': {'meta': {'code': 200, 'requestId': '5ec1555f14a126001bfa7e4b'},
  'response': {'suggestedFilters': {'header': 'Tap to show:',
    'filters': [{'name': '$-$$$$', 'key': 'price'},
     {'name': 'Open now', 'key': 'openNow'}]},
   'geocode': {'what': '',
    'where': 'bronx',
    'center': {'lat': 40.84985, 'lng': -73.86641},
    'displayString': 'Bronx, NY, United States',
    'cc': 'US',
    'geometry': {'bounds': {'ne': {'lat': 40.917577, 'lng': -73.74806},
      'sw': {'lat': 40.785743, 'lng': -73.933808}}},
    'slug': 'bronx-new-york',
    'longId': '72057594043038202'},
   'headerLocation': 'Bronx',
   'headerFullLocation': 'Bronx',
   'headerLocationGranularity': 'city',
   'query': 'pizza',
   'totalResults': 158,
   'suggestedBounds': {'ne': {'lat': 40.91530088164544,
     'lng': -73.77913119443664},
    'sw': {'lat': 40.80210681515975, 'lng': -73.93441222883635}},
   'groups': [{'type': 'Recommended Places',
     'name': 'recommended',
     'items': [{'reasons': {'co

In [33]:
df_venues={}
for borough in boroughs:
    venues = json_normalize(results[borough]['response']['groups'][0]['items'])
    df_venues[borough] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[borough].columns = ['Name', 'Address', 'Lat', 'Lng']
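
The flattening step can also be written without `json_normalize`. The sketch below assumes each item has the nested `venue` and `venue.location` layout implied by the column names selected above, and uses `.get` so that a venue without an address yields `None` rather than an error. The helper name `extract_venues` and the sample items are hypothetical; the names and coordinates are made up for illustration.

```python
def extract_venues(items):
    """Flatten Foursquare-style items into dicts with Name, Address, Lat, Lng."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        rows.append({
            'Name': venue.get('name'),
            'Address': location.get('address'),  # may be absent for some venues
            'Lat': location.get('lat'),
            'Lng': location.get('lng'),
        })
    return rows


# hypothetical sample items; values are illustrative, not real API output
sample_items = [
    {'venue': {'name': 'Example Pizza',
               'location': {'address': '1 Example St', 'lat': 40.85, 'lng': -73.87}}},
    {'venue': {'name': 'No Address Pizza',
               'location': {'lat': 40.86, 'lng': -73.88}}},
]
sample_rows = extract_venues(sample_items)
print(sample_rows)
```

The returned list of dicts can be handed to `pd.DataFrame(...)` to get the same four columns.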

In [34]:
df_venues

{'Bronx':                                                  Name  \
 0                             Kingsbridge Social Club   
 1                               Louie & Ernie's Pizza   
 2                                      Zero Otto Nove   
 3                                         Sam's Pizza   
 4                                  Full Moon Pizzeria   
 5                                         Nicks Pizza   
 6                               John & Joe's Pizzeria   
 7                                        Yankee Pizza   
 8                             Katonah Pizza and Pasta   
 9                                      Emilio's Pizza   
 10                                       Franks Pizza   
 11                       Franks Original Pizza Italia   
 12                                      Pugsley Pizza   
 13                                         Patricia's   
 14                        The Original Emilio's Pizza   
 15                                    Loretta's Pizza   
 16  

In [35]:
maps = {}
for borough in boroughs:
    borough_lat = np.mean([results[borough]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[borough]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    borough_lng = np.mean([results[borough]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[borough]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[borough] = folium.Map(location=[borough_lat, borough_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[borough]['Lat'], df_venues[borough]['Lng'], df_venues[borough]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[borough])  
    print(f"Total number of pizza places in {borough} = ", results[borough]['response']['totalResults'])
    print("Showing Top 100")

Total number of pizza places in Bronx =  158
Showing Top 100
Total number of pizza places in Manhattan =  252
Showing Top 100
Total number of pizza places in Brooklyn =  232
Showing Top 100
Total number of pizza places in Queens =  188
Showing Top 100
Total number of pizza places in Staten Island =  140
Showing Top 100


### This shows that Manhattan has the largest number of pizza places, followed by Brooklyn and Queens. But a larger number of pizza places doesn't mean that they are easily accessible, so I'll check how densely they are located within their respective boroughs.
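
To make the ordering explicit, the counts printed above can be sorted directly. This is a small sketch that hard-codes the `totalResults` values from the output above; the variable names are illustrative.

```python
# totalResults values printed by the cell above
total_results = {
    'Bronx': 158,
    'Manhattan': 252,
    'Brooklyn': 232,
    'Queens': 188,
    'Staten Island': 140,
}

# sort boroughs from most to fewest pizza places
ranking = sorted(total_results.items(), key=lambda kv: kv[1], reverse=True)
for rank, (borough, count) in enumerate(ranking, start=1):
    print(f'{rank}. {borough}: {count}')
```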


In [36]:
maps[boroughs[0]] #Bronx

In [37]:
maps[boroughs[1]] #Manhattan

In [38]:
maps[boroughs[2]] #Brooklyn

In [39]:
maps[boroughs[3]] #Queens

In [40]:
maps[boroughs[4]] #Staten Island

In [58]:
# Next, check how closely the pizza places are clustered, i.e. how densely they are located, using the mean distance from the mean coordinate.
# First compute the mean coordinate (the mean of all venue coordinates), then average the distances of the pizza places from it.

maps2 = {}
for borough in boroughs:
    borough_lat = np.mean([results[borough]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[borough]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    borough_lng = np.mean([results[borough]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[borough]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps2[borough] = folium.Map(location=[borough_lat, borough_lng], zoom_start=10)
    venues_mean_coor = [df_venues[borough]['Lat'].mean(), df_venues[borough]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[borough]['Lat'], df_venues[borough]['Lng'], df_venues[borough]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            legend_name='borough:'+ borough,
            parse_html=False).add_to(maps2[borough])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="red", weight=1.5, opacity=0.5).add_to(maps2[borough])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        caption=''+borough,
        parse_html=False).add_to(maps2[borough])

    print(borough)
    print("Mean Distance from Mean coordinates")
    print(np.mean(np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[borough][['Lat','Lng']].values)))

Bronx
Mean Distance from Mean coordinates
0.03498701585170117
Manhattan
Mean Distance from Mean coordinates
0.025780434583878624
Brooklyn
Mean Distance from Mean coordinates
0.0455394088214408
Queens
Mean Distance from Mean coordinates
0.06189604575236819
Staten Island
Mean Distance from Mean coordinates
0.05600907549251621


### This shows that the pizza places are most densely located in Manhattan, followed by the Bronx and Brooklyn.
### Based on both the number of pizza places and the density of their locations, the obvious choice is Manhattan.
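
The mean distances above are Euclidean norms over raw latitude and longitude values, so they are expressed in degrees, and a degree of longitude covers less ground than a degree of latitude at New York's latitude. As a hedged alternative, the sketch below computes the same statistic in kilometres with the haversine formula. It is not the method used in this notebook; the function names, the Earth radius constant, and the sample points are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_distance_from_centroid_km(points):
    """Mean haversine distance (km) of the points from their mean coordinate."""
    mean_lat = sum(p[0] for p in points) / len(points)
    mean_lng = sum(p[1] for p in points) / len(points)
    total = sum(haversine_km(lat, lng, mean_lat, mean_lng) for lat, lng in points)
    return total / len(points)


# hypothetical sample points, not venues from the dataset
sample_points = [(40.70, -74.00), (40.72, -74.00), (40.71, -73.98), (40.71, -74.02)]
print(mean_distance_from_centroid_km(sample_points))
```

In the notebook, `points` would be the `Lat` and `Lng` pairs from `df_venues[borough]`.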

In [59]:
maps2[boroughs[0]]

In [48]:
maps2[boroughs[1]]

In [49]:
maps2[boroughs[2]]

In [50]:
maps2[boroughs[3]]

In [51]:
maps2[boroughs[4]]

### In Queens we can see two pizza places at the extremes, i.e. outliers, so I'll remove them and check whether the conclusion changes.

In [66]:
borough = 'Queens'
venues_mean_coor = [df_venues[borough]['Lat'].mean(), df_venues[borough]['Lng'].mean()] 

print(borough)
print("Mean Distance from Mean coordinates")
dists = np.apply_along_axis(lambda x: np.linalg.norm(x - venues_mean_coor),1,df_venues[borough][['Lat','Lng']].values)
dists.sort()
print(np.mean(dists[:-2]))  # ignoring the 2 largest distances

Queens
Mean Distance from Mean coordinates
0.05999405348415403
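
The cell above drops the two largest distances but keeps the mean coordinate that was computed with the outliers included. A minimal sketch of a variant that recomputes the mean coordinate after trimming is shown here, using plain Euclidean distance in degrees as the notebook does. The function names, the `drop` parameter, and the sample points are hypothetical.

```python
import math


def centroid(points):
    """Mean coordinate of a list of (lat, lng) pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def euclid(p, q):
    """Euclidean distance between two coordinate pairs (in degrees)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def trimmed_mean_distance(points, drop=2):
    """Drop the `drop` points farthest from the centroid, recompute the
    centroid on the remaining points, and return their mean distance from it."""
    if not 0 <= drop < len(points):
        raise ValueError('drop must be non-negative and smaller than the number of points')
    first_centroid = centroid(points)
    by_distance = sorted(points, key=lambda p: euclid(p, first_centroid))
    kept = by_distance[:len(by_distance) - drop]
    new_centroid = centroid(kept)
    return sum(euclid(p, new_centroid) for p in kept) / len(kept)


# hypothetical points: four clustered around (0, 0) plus two far outliers
sample = [(1, 0), (-1, 0), (0, 1), (0, -1), (10, 10), (12, 12)]
print(trimmed_mean_distance(sample, drop=2))
```

Recomputing the mean coordinate matters most when the outliers pull it away from the main cluster, as they do in the sample.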


### It is still large, so removing the outliers does not affect the findings. A person travelling to NYC who wants to choose where to stay based on proximity to pizza places should prefer Manhattan.

### Another parameter that could be added to this research is the ratings of the pizza places; for now I have chosen the count and the density for this project.