# Quality Of Life, New York City

### In this Notebook, I will be leveraging the Foursquare location data to explore New York City venues in Manhattan

### Business Problem: A corporation has asked me, a Data Science contractor, to use data science algorithms to find the best location to open up an office in Manhattan. They are concerned of the quality of life that their employees will have based on how close they will be to the office. The factor that they want me to analyze is... the frequency of pizza parlors in each neighborhood. Sounds crazy to me but I guess they love their pizza?

### Question: which neighborhood in Manhattan is closest to a variety of pizza places?

### In order to solve this problem, we need to first get all of the data of New York City neighborhoods from a json file and convert it to a dataframe that we can work with. We will be using the Foursquare API with the help of folium and geopy libraries to help us plot our observations. After making an API call to the Foursquare server we have data such as 'Pizza Places' in the neighborhoods. We can now use these to calculate the frequency of pizza places in each neighborhood to then use a clustering algorithm to determine which neighborhood the firm would be likely to move into.

### Background: Wherever you want to live, your quality of life is dependent on many factors consisting of finance and safety. In this case, it consists of one thing, pizza. Enjoy :)

## Import all libraries we will be using in this workbook

In [4]:
from bs4 import BeautifulSoup
import requests
import xml

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!conda install -c conda-forge folium
#!pip install folium
import folium

## Load New York Data into Dataframe

In [5]:
# Get the New York dataset
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

# Use json file we got above '!wget ...' to load it into a data set
with open('newyork_data.json') as json_dataframe:
    newyork_dataframe = json.load(json_dataframe)

# All of the relevant data is in the features key, we must define a new variable that includes this data
ny_features = newyork_dataframe['features']

# We need to define the dataframe columns
ny_columns = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Instantiation of the dataframe
ny_neighborhoods = pd.DataFrame(columns=ny_columns)

# Load the data into the dataframe
for data in ny_features:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_neighborhoods = ny_neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

#### Dataframe

In [6]:
ny_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


## Create a Map of Manhattan

In [7]:
# Extract only the manhattan data from the nyc dataframe
manhattan_data = ny_neighborhoods[ny_neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [8]:
# Get NYC coodinates
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Borough'], manhattan_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## Call Foursquare API

In [9]:
# The code was removed by Watson Studio for sharing.

In [13]:
# create the API request
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# This function will go through all the neighborhoods in our Manhattan dataframe and 
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

# This will run the above function and return all the venues in each area
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


## Analyze Neighborhoods

In [14]:
onlyPizzaPlaces = manhattan_venues[manhattan_venues['Venue Category'] == 'Pizza Place'].reset_index(drop=True)

In [15]:
pizza_places = onlyPizzaPlaces[['Neighborhood', 'Venue Category']]

In [16]:
#pizza_places.groupby('Neighborhood').count()
#pizza_places.groupby(['Neighborhood', 'Venue Category']).size()

pizzaFrequencies = (pizza_places.groupby(['Venue Category', 'Neighborhood']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='Amount') 
   .drop_duplicates(subset='Neighborhood'))

pizzaFrequencies.head()

Unnamed: 0,Venue Category,Neighborhood,Amount
0,Pizza Place,Carnegie Hill,6
1,Pizza Place,Lenox Hill,5
2,Pizza Place,Yorkville,4
3,Pizza Place,East Village,4
4,Pizza Place,Gramercy,4


## Clutering the Neighborhoods

In [17]:
# We will run k-means to cluster the neighborhood into 5 clusters

#Get a Dataframe with just the 'Amount' to use for clustering
manhattan_pizza_frequency = pd.DataFrame(pizzaFrequencies['Amount'])

# set number of clusters
kclusters = 6

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_pizza_frequency)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:33] 

array([0, 5, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

In [18]:
# add clustering labels
#neighborhood_pizza_sorted = pd.DataFrame(pizzaFrequencies[['Neighborhood', 'Venue Category']])

pizza_places_sorted = pizza_places.drop_duplicates()
pizza_places_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(pizza_places_sorted.set_index('Neighborhood'), on='Neighborhood')


# For some reason I need to convert the 'Cluster Labels' column to integer so these next four lines will do that
clean_merge = manhattan_merged.dropna(subset=['Borough', 'Cluster Labels'])

int_CL = pd.DataFrame(clean_merge[ 'Cluster Labels' ], dtype=int)

drop_CL = clean_merge.drop(['Cluster Labels'], axis=1)

clusters = pd.concat([drop_CL, int_CL], axis=1)
clusters.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Venue Category,Cluster Labels
0,Manhattan,Marble Hill,40.876551,-73.91066,Pizza Place,0
1,Manhattan,Chinatown,40.715618,-73.994279,Pizza Place,5
2,Manhattan,Washington Heights,40.851903,-73.9369,Pizza Place,3
3,Manhattan,Inwood,40.867684,-73.92121,Pizza Place,3
4,Manhattan,Hamilton Heights,40.823604,-73.949688,Pizza Place,3


In [19]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(clusters['Latitude'], clusters['Longitude'], clusters['Neighborhood'], clusters['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Legend
### Purple > Cyan  > Green > Orange > Red
### <-- Highest to Lowest Pizza Parlours -->

## Conclusion
### If you were moving to Manhattan based on your love of easy to access pizza parlours, you could refer to this map to which neighborhood you would move to.

## Hope you had fun? I would make a joke but it might turn out to cheesy ( ͡° ͜ʖ ͡°)