# The Battle of Neighborhoods

## Business problem

The problem I am going to solve with data from Foursquare and other sources is choosing the Manhattan neighborhood that is most suitable for a Swedish pizza restaurant chain to establish itself in. This (fictional) chain is called SwePizz; it has an established brand in Sweden, and it is time for it to expand internationally. SwePizz is confident that the brand will be strong in all areas of Manhattan, so for its first entry into the city it wants to find a neighborhood with few pizza restaurants. I will achieve this by using data science methodology to find the Manhattan neighborhood that currently has the fewest pizza restaurants. The idea behind this is that demand is assumed to be high in all areas, since SwePizz's pizzas have historically captivated all categories of customers; the focus is therefore on finding an area with low supply in order to grow the customer base efficiently.

## Target audience

This research will be valuable to the management team of SwePizz in making an effective entry into their next market. Entering the right area will help them gain traction and start building their brand in New York City, making it an important strategic decision for their long-term international presence.

## Data and methodology

The data I will use for this project are:
    
- New York City neighborhood data from health.ny.gov
- Manhattan geodata from public.opendatasoft.com
- Venue categories and locations (latitude, longitude) from foursquare.com

The neighborhood data lists the names and postal codes of New York City neighborhoods, along with the borough each one belongs to; Manhattan is the borough of focus for SwePizz. Two .csv files, one from health.ny.gov and one from public.opendatasoft.com, will be merged on postal code to attach latitude and longitude to each neighborhood. Foursquare will then be queried for venues within 500 meters of each postal-code coordinate, so each venue is tied to a neighborhood by approximate location rather than by an exact boundary. Finally, pandas will be used to filter and aggregate these venues into a table showing the number of pizza restaurants in each Manhattan neighborhood.

### Importing libraries

```python
# installing and importing the required libraries
# (the "!" shell escape runs only inside Jupyter/IPython)

!pip install folium geocoder
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
import os
from sklearn.cluster import KMeans
import folium as fol
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
import geocoder  # used by get_geocode below
```



### Retrieving NYC neighborhoods

```python
# retrieving NYC neighborhood data (neighborhood, borough, postal code) from https://www.health.ny.gov/statistics/cancer/registry/appendix/neighborhoods.htm

nyc_df = pd.read_csv('https://raw.githubusercontent.com/Matmagne/Data-Science-Capstone/main/PC_NYC.csv')
nyc_df
```

| | Neighborhood | Borough | Postalcode |
|---|---|---|---|
| 0 | Central Bronx | Bronx | 10453 |
| 1 | Central Bronx | Bronx | 10457 |
| 2 | Central Bronx | Bronx | 10460 |
| 3 | Bronx Park and Fordham | Bronx | 10458 |
| 4 | Bronx Park and Fordham | Bronx | 10467 |
| ... | ... | ... | ... |
| 173 | South Shore | Staten Island | 10312 |
| 174 | Stapleton and St. George | Staten Island | 10301 |
| 175 | Stapleton and St. George | Staten Island | 10304 |
| 176 | Stapleton and St. George | Staten Island | 10305 |


### Retrieving Manhattan geodata

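```python
# helper for geocoding a single postal code with the geocoder package's
# Google provider (which typically requires a Google API key); the
# coordinates used below are loaded from a CSV file instead, so this
# helper is not called in this notebook

import time
import geocoder

def get_geocode(Postalcode, max_tries=5):
    for _ in range(max_tries):
        g = geocoder.google('{}, New York City, New York'.format(Postalcode))
        if g.latlng is not None:
            latitude, longitude = g.latlng
            return latitude, longitude
        time.sleep(1)  # back off briefly before retrying
    # give up instead of looping forever on a code that never resolves
    return None, None
```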

```python
geo_df = pd.read_csv('https://raw.githubusercontent.com/Matmagne/Data-Science-Capstone/main/us-zip-code-latitude-and-longitude.csv')
```

```python
geo_df
```

| | Postalcode | Latitude | Longitude |
|---|---|---|---|
| 0 | 10001 | 40.750742 | -73.996530 |
| 1 | 10002 | 40.717040 | -73.987000 |
| 2 | 10003 | 40.732509 | -73.989350 |
| 3 | 10005 | 40.706019 | -74.008580 |
| 4 | 10006 | 40.707904 | -74.013420 |
| ... | ... | ... | ... |
| 161 | 10292 | 40.780751 | -73.977182 |
| 162 | 10422 | 40.828279 | -73.869454 |
| 163 | 11286 | 40.658825 | -74.004495 |
| 164 | 11302 | 40.759450 | -73.715016 |


### Merging and cleaning

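```python
# merging on postal code to join neighborhoods with their coordinates
# (an inner merge: only postal codes present in both files are kept)
manhattan = pd.merge(geo_df, nyc_df, on='Postalcode')

# cleaning for Manhattan: keep only Manhattan rows and renumber the index
manhattan = manhattan[manhattan['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan
```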

| | Postalcode | Latitude | Longitude | Neighborhood | Borough |
|---|---|---|---|---|---|
| 0 | 10001 | 40.750742 | -73.99653 | Chelsea and Clinton | Manhattan |
| 1 | 10002 | 40.71704 | -73.987 | Lower East Side | Manhattan |
| 2 | 10003 | 40.732509 | -73.98935 | Lower East Side | Manhattan |
| 3 | 10005 | 40.706019 | -74.00858 | Lower Manhattan | Manhattan |
| 4 | 10006 | 40.707904 | -74.01342 | Lower Manhattan | Manhattan |
| 5 | 10007 | 40.714754 | -74.00721 | Lower Manhattan | Manhattan |
| 6 | 10009 | 40.727093 | -73.97864 | Lower East Side | Manhattan |
| 7 | 10010 | 40.739022 | -73.98205 | Gramercy Park and Murray Hill | Manhattan |
| 8 | 10011 | 40.741012 | -74.00012 | Chelsea and Clinton | Manhattan |
| 9 | 10012 | 40.72596 | -73.99834 | Greenwich Village and Soho | Manhattan |
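
The plain `pd.merge` above is an inner join, so postal codes that appear in only one of the two files are dropped without any warning. A minimal sketch of checking coverage first, assuming a full outer join with pandas' `indicator=True` option; the three-row frames are toy stand-ins for `geo_df` and `nyc_df` built from rows shown in the outputs above, and in this toy data 11302 has no neighborhood while 10453 has no coordinates.

```python
import pandas as pd

# Toy stand-ins for geo_df and nyc_df (rows copied from the outputs above;
# in this toy data 11302 has no neighborhood and 10453 has no coordinates)
geo = pd.DataFrame({
    "Postalcode": [10001, 10002, 11302],
    "Latitude": [40.750742, 40.717040, 40.759450],
    "Longitude": [-73.996530, -73.987000, -73.715016],
})
nyc = pd.DataFrame({
    "Neighborhood": ["Chelsea and Clinton", "Lower East Side", "Central Bronx"],
    "Borough": ["Manhattan", "Manhattan", "Bronx"],
    "Postalcode": [10001, 10002, 10453],
})

# Outer join with an indicator column recording which side each key came from
merged = pd.merge(geo, nyc, on="Postalcode", how="outer", indicator=True)

# Postal codes present in only one of the two files
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["Postalcode", "_merge"]])

# Matched rows restricted to Manhattan, as in the notebook cell
manhattan = merged[(merged["_merge"] == "both") & (merged["Borough"] == "Manhattan")]
manhattan = manhattan.drop(columns="_merge").reset_index(drop=True)
```

Printing `unmatched` on the real data would show whether any Manhattan postal code lost its coordinates in the merge before the Foursquare queries are made.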


### Retrieving Manhattan venues by name, latitude, longitude, and category from Foursquare

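```python
import os

# Foursquare credentials, read from environment variables instead of being
# hard-coded in the notebook (the variable names are placeholders)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20190425'  # API version date, sent as the 'v' parameter
```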

```python
import pandas as pd
import requests

def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    """Query the Foursquare v2 'explore' endpoint around each coordinate.

    Returns one row per venue, tagged with the neighborhood name and the
    coordinates that were queried.
    """
    url = 'https://api.foursquare.com/v2/venues/explore'
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)  # progress indicator

        # query parameters for the explore endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': limit,
        }

        # make the GET request and fail loudly on HTTP errors
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant fields for each nearby venue
        for v in results:
            venue = v['venue']
            categories = venue.get('categories') or []
            venues_list.append((
                name,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                # some venues have no category; avoid an IndexError
                categories[0]['name'] if categories else None))

    nearby_venues = pd.DataFrame(venues_list, columns=[
        'Neighborhood',
        'Neighborhood Latitude',
        'Neighborhood Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category'])
    return nearby_venues
```

```python
manhattan_venues = getNearbyVenues(names=manhattan['Neighborhood'],
                                   latitudes=manhattan['Latitude'],
                                   longitudes=manhattan['Longitude'])
```

```
Chelsea and Clinton
Lower East Side
Lower East Side
Lower Manhattan
Lower Manhattan
Lower Manhattan
Lower East Side
Gramercy Park and Murray Hill
Chelsea and Clinton
Greenwich Village and Soho
Greenwich Village and Soho
Greenwich Village and Soho
Gramercy Park and Murray Hill
Gramercy Park and Murray Hill
Chelsea and Clinton
Chelsea and Clinton
Chelsea and Clinton
Upper East Side
Gramercy Park and Murray Hill
Upper West Side
Upper West Side
Upper West Side
Central Harlem
Central Harlem
Upper East Side
East Harlem
Central Harlem
Inwood and Washington Heights
Inwood and Washington Heights
Inwood and Washington Heights
Inwood and Washington Heights
East Harlem
Chelsea and Clinton
Central Harlem
Lower Manhattan
Central Harlem
Inwood and Washington Heights
Upper East Side
Upper East Side
Lower Manhattan
```
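
The progress list shows that neighborhoods were queried from different numbers of postal-code centroids: East Harlem appears twice, while Chelsea and Clinton appears six times. Since every centroid contributes its own 500 m search circle, a neighborhood's raw venue count partly reflects how many circles it received. A minimal sketch of tallying sample points with `value_counts`, using two neighborhoods from this notebook's outputs (East Harlem: 2 centroids and 3 pizza places; Lower East Side: 3 centroids and 9 pizza places); dividing by the tally is one hypothetical normalization, which this analysis does not apply.

```python
import pandas as pd

# Frame shaped like `manhattan`: one row per postal-code centroid
sample = pd.DataFrame({
    "Neighborhood": ["Lower East Side", "Lower East Side", "Lower East Side",
                     "East Harlem", "East Harlem"],
})
# Pizza-place counts for the same two neighborhoods, from the tables below
pizza_counts = pd.Series({"East Harlem": 3, "Lower East Side": 9})

# Number of search circles (postal-code centroids) per neighborhood
points = sample["Neighborhood"].value_counts()

# Hypothetical normalization: pizza places per search circle
per_point = (pizza_counts / points).sort_values()
print(per_point)
```

Because the 500 m circles of nearby centroids can overlap, this ratio is only a rough indicator, not a true density per area.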


### Finding the number of pizza places in Manhattan neighborhoods

```python
# removing all venues that are not category "Pizza Place"

pizza = manhattan_venues['Venue Category'] == 'Pizza Place'
manhattan_pizza = manhattan_venues[pizza]
mp = manhattan_pizza
mp
```

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Chelsea and Clinton | 40.750742 | -73.99653 | New York Pizza Suprema | 40.750124 | -73.994992 | Pizza Place |
| 108 | Lower East Side | 40.717040 | -73.98700 | Champion Pizza - Ludlow | 40.719190 | -73.988850 | Pizza Place |
| 176 | Lower East Side | 40.717040 | -73.98700 | Sauce Pizzeria | 40.720368 | -73.988830 | Pizza Place |
| 190 | Lower East Side | 40.717040 | -73.98700 | Scarr's Pizza | 40.715335 | -73.991649 | Pizza Place |
| 195 | Lower East Side | 40.732509 | -73.98935 | Joe's Pizza | 40.733234 | -73.987672 | Pizza Place |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3098 | Upper East Side | 40.781894 | -73.95039 | Marinara Pizza Upper East | 40.782538 | -73.953359 | Pizza Place |
| 3109 | Upper East Side | 40.781894 | -73.95039 | Nick's Restaurant & Pizzeria | 40.782923 | -73.948014 | Pizza Place |
| 3172 | Upper East Side | 40.781894 | -73.95039 | Luigi's Pizzeria | 40.778222 | -73.948426 | Pizza Place |
| 3186 | Upper East Side | 40.781894 | -73.95039 | Domino's Pizza | 40.782746 | -73.945352 | Pizza Place |
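
Because several postal codes map to the same neighborhood, their 500 m search circles can overlap: the Lower Manhattan centroids for 10005 and 10006 are only about 0.46 km apart, so a pizza place between them could be returned by both queries and counted twice. A minimal sketch of removing such repeats before counting, assuming a venue is identified by its name and coordinates; the two Lower Manhattan rows use a hypothetical venue, while the third row is copied from the table above.

```python
import pandas as pd

rows = [
    # neighborhood, centroid lat/lng, venue, venue lat/lng, category
    # (the same hypothetical venue returned by the 10005 and 10006 queries)
    ("Lower Manhattan", 40.706019, -74.00858, "Hypothetical Pizza", 40.7070, -74.0110, "Pizza Place"),
    ("Lower Manhattan", 40.707904, -74.01342, "Hypothetical Pizza", 40.7070, -74.0110, "Pizza Place"),
    ("Lower East Side", 40.717040, -73.98700, "Champion Pizza - Ludlow", 40.719190, -73.988850, "Pizza Place"),
]
mp_toy = pd.DataFrame(rows, columns=[
    "Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
    "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"])

# Keep one row per (neighborhood, venue name, venue coordinates)
mp_unique = mp_toy.drop_duplicates(
    subset=["Neighborhood", "Venue", "Venue Latitude", "Venue Longitude"])

# Count the remaining pizza places per neighborhood
counts = mp_unique.groupby("Neighborhood")["Venue"].count()
print(counts)
```

`Neighborhood` is part of the key, so a venue reached by circles from two different neighborhoods is still counted once in each; dropping it from the subset would instead keep only the first neighborhood that returned it.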


```python
# counting the venues with category "Pizza Place" in each Manhattan neighborhood,
# keeping only the Venue column (groupby replaces count(level=...), which was
# removed in pandas 2.0)

pizza_num_per_neigh = mp.groupby('Neighborhood')[['Venue']].count()
pnpn = pizza_num_per_neigh
pnpn
```

| Neighborhood | Venue |
|---|---|
| Central Harlem | 5 |
| Chelsea and Clinton | 9 |
| East Harlem | 3 |
| Gramercy Park and Murray Hill | 8 |
| Greenwich Village and Soho | 4 |
| Inwood and Washington Heights | 16 |
| Lower East Side | 9 |
| Lower Manhattan | 12 |
| Upper East Side | 12 |
| Upper West Side | 6 |


```python
# renaming the venue column to "Pizza Restaurants"

manhattan_pizza_restaurants = pnpn.rename(columns={"Venue": "Pizza Restaurants"})
mpr = manhattan_pizza_restaurants
mpr
```

| Neighborhood | Pizza Restaurants |
|---|---|
| Central Harlem | 5 |
| Chelsea and Clinton | 9 |
| East Harlem | 3 |
| Gramercy Park and Murray Hill | 8 |
| Greenwich Village and Soho | 4 |
| Inwood and Washington Heights | 16 |
| Lower East Side | 9 |
| Lower Manhattan | 12 |
| Upper East Side | 12 |
| Upper West Side | 6 |
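
Choosing the recommended neighborhood amounts to taking the minimum of this table. A minimal sketch with `idxmin` and `nsmallest`, rebuilding `mpr` from the counts shown above so the block runs on its own:

```python
import pandas as pd

# Pizza-restaurant counts per neighborhood, copied from the table above
mpr = pd.DataFrame(
    {"Pizza Restaurants": [5, 9, 3, 8, 4, 16, 9, 12, 12, 6]},
    index=pd.Index([
        "Central Harlem", "Chelsea and Clinton", "East Harlem",
        "Gramercy Park and Murray Hill", "Greenwich Village and Soho",
        "Inwood and Washington Heights", "Lower East Side",
        "Lower Manhattan", "Upper East Side", "Upper West Side",
    ], name="Neighborhood"),
)

# Neighborhood with the fewest pizza restaurants
best = mpr["Pizza Restaurants"].idxmin()
# The three lowest, to see how close the runners-up are
shortlist = mpr["Pizza Restaurants"].nsmallest(3)
print(best)
print(shortlist)
```

`idxmin` returns only the first label when several neighborhoods tie for the minimum, so `nsmallest` is a quick way to check how clear-cut the choice is.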


## Results, conclusion, and discussion

East Harlem is the Manhattan neighborhood with the fewest pizza restaurants (3 in the Foursquare results, against 4 in Greenwich Village and Soho, the next lowest), so the recommendation for SwePizz is to establish its first restaurant there. In addition to the low number of pizza restaurants, East Harlem is known for its diverse food culture with influences from around the world (according to Wikipedia). SwePizz has a combined Swedish/Turkish heritage and should therefore be able to contribute to the vibrant and interesting culture of East Harlem while getting a strong start to its internationalization. Further research for SwePizz's management team could provide a geographic visualization of East Harlem and of the existing pizza restaurants in the neighborhood, which would inform the decision about the exact location within East Harlem. Does SwePizz want to be close to the competition, or keep it at arm's length? Answering that question requires more analysis, looking into factors such as centers of attraction and the flow of people in different areas.
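
The question of staying close to or away from the competition can be quantified as the distance from a candidate site to its nearest pizza place. A minimal sketch using the haversine great-circle formula (a spherical-Earth approximation with a mean radius of 6371 km); since the East Harlem rows are elided from the venue output above, the example uses the Lower East Side centroid (postal code 10002) and three of its pizza places from that table. The helper names `haversine_km` and `nearest_competitor` are illustrative, not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_competitor(site, competitors):
    """Return (name, distance_km) of the competitor closest to `site`.

    `site` is a (lat, lon) tuple; `competitors` maps names to (lat, lon).
    """
    return min(
        ((name, haversine_km(*site, *coords)) for name, coords in competitors.items()),
        key=lambda pair: pair[1],
    )

# Lower East Side centroid and three of its pizza places, from the tables above
candidate = (40.717040, -73.98700)
pizza_places = {
    "Champion Pizza - Ludlow": (40.719190, -73.988850),
    "Sauce Pizzeria": (40.720368, -73.988830),
    "Scarr's Pizza": (40.715335, -73.991649),
}
name, dist = nearest_competitor(candidate, pizza_places)
print(f"{name}: {dist:.2f} km")
```

Whether a short or a long nearest-competitor distance is preferable remains the strategic question raised above; the function only measures it.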