# Battle of the Neighborhoods: best place to start a food-truck business

## 1. Introduction

During the Covid-19 pandemic, lockdowns have heavily affected how and where jobs are performed. Venues such as restaurants and bars have been closed, and only drive-ins and food trucks are still allowed to operate. 
In this project, we'll take the role of a new entrepreneur who wants to open a food truck in New York City.  
We'll focus on how to choose the best place to set up this new business by looking at potential competitors.



## 2. Dataset

I will use the New York neighborhood dataset to look at the neighborhoods and boroughs of New York City.
Neighborhoods are provided as a flat JSON file, which allows us to plot them on a map.

## 3. Methodology

### 3.1 Recurring steps

Assuming that the lockdown is still in place, the main competitors are drive-ins and other take-away venues in the same area. 
Using the Foursquare API, we'll be able to locate them by neighborhood.
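
As a rough illustration of this lookup, the sketch below builds the query for one neighborhood and flattens a response into rows. It assumes the Foursquare v2 `venues/explore` endpoint and its usual `response → groups → items → venue` layout; the credentials, the version date, the radius, the limit and the helper names `build_explore_request` and `parse_venues` are placeholders introduced for this example, not values taken from the notebook.

```python
from urllib.parse import urlencode

# Assumed endpoint (Foursquare v2 "explore"); adjust if your account uses another API version.
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_request(lat, lng, client_id, client_secret,
                          version='20180605', radius=500, limit=100):
    """Return the full URL that asks for venues around one neighborhood center."""
    params = {
        'client_id': client_id,          # placeholder credential
        'client_secret': client_secret,  # placeholder credential
        'v': version,                    # API version date (assumed value)
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the neighborhood
        'radius': radius,                # search radius in meters (assumed value)
        'limit': limit,                  # maximum number of venues returned
    }
    return EXPLORE_URL + '?' + urlencode(params)


def parse_venues(neighborhood, payload):
    """Flatten an explore-style payload into (neighborhood, venue, lat, lng, category) rows."""
    rows = []
    for group in payload.get('response', {}).get('groups', []):
        for item in group.get('items', []):
            venue = item['venue']
            categories = venue.get('categories') or [{}]
            rows.append((neighborhood,
                         venue['name'],
                         venue['location']['lat'],
                         venue['location']['lng'],
                         categories[0].get('name')))
    return rows


# Hand-written payload with the assumed layout, so the example runs without network access.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Subway',
               'location': {'lat': 40.890468, 'lng': -73.849152},
               'categories': [{'name': 'Fast Food Restaurant'}]}}]}]}}

url = build_explore_request(40.894705, -73.847201, 'YOUR_ID', 'YOUR_SECRET')
venue_rows = parse_venues('Wakefield', sample_payload)
print(url)
print(venue_rows)
```

In a real run, the URL would be fetched with `requests.get(url).json()` and the result passed to `parse_venues`, once per neighborhood.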

Then, we'll group the neighborhoods into categories using K-means clustering.

The fewer competitors we find in an area, the more attractive that neighborhood is for setting up the food truck.
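
This ranking rule can be sketched with pandas. The toy venue list and the names `all_neighborhoods`, `venues` and `competitor_counts` are made up for this illustration; neighborhoods for which no competitor was returned are kept with a count of zero so that they rank first.

```python
import pandas as pd

# Hypothetical inputs: every neighborhood of interest, and one row per competitor found.
all_neighborhoods = ['Wakefield', 'Co-op City', 'Eastchester', 'Riverdale']
venues = pd.DataFrame({
    'Neighborhood': ['Wakefield', 'Co-op City', 'Co-op City', 'Eastchester', 'Eastchester'],
    'Venue': ['Subway', "Arby's", "Kennedy's", "McDonald's", 'New York Chicken & Grill'],
})

# Count competitors per neighborhood; neighborhoods without any venue get 0.
# A stable sort keeps ties in their original order.
competitor_counts = (venues.groupby('Neighborhood').size()
                           .reindex(all_neighborhoods, fill_value=0)
                           .sort_values(kind='mergesort'))

print(competitor_counts)
```

The neighborhoods at the top of `competitor_counts` are the ones with the least competition under this simple count.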

### 3.2 Optimization

Assuming that we have a limited number of free calls to the Foursquare API, the competitor search was run borough by borough, creating one file per borough.

We now have 5 datasets to merge and analyze to find the best location for our food truck.
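
One way to stay within a call quota is to query each borough once and cache the result on disk. The sketch below assumes a hypothetical `fetch_competitors(borough)` function that performs the API calls and returns a DataFrame; only the `competitor_<borough>_tocsv.csv` naming follows the files used later in this notebook.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(borough, fetch_competitors, folder='.'):
    """Return the competitors of a borough, calling the API only if no cached file exists."""
    file_name = os.path.join(folder, 'competitor_' + borough + '_tocsv.csv')
    if os.path.exists(file_name):
        return pd.read_csv(file_name)     # reuse the cached result, no API call
    df = fetch_competitors(borough)       # hypothetical function doing the API calls
    df.to_csv(file_name, index=False)     # cache for the next run
    return df


# Stand-in for the real fetch function, recording how often it is called.
calls = []


def fake_fetch(borough):
    calls.append(borough)
    return pd.DataFrame({'Neighborhood': ['Wakefield'], 'Venue': ['Subway']})


with tempfile.TemporaryDirectory() as folder:
    first = load_or_fetch('Bronx', fake_fetch, folder)
    second = load_or_fetch('Bronx', fake_fetch, folder)   # served from the CSV file

print('API calls made:', len(calls))
```

With this pattern, re-running the notebook does not consume additional API calls for boroughs that were already fetched.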


### 3.3 First of all, let's import some useful libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


## 4. Boroughs Analysis

### 4.1 Let's plot the Boroughs of New York city on a map

In [4]:
# read JSON file
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

# keep the list of neighborhood features for future manipulation
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)


In [6]:
# loop over the JSON structure and collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step instead of appending row by row
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [8]:
neighborhoods.groupby('Borough').count()

Borough,Neighborhood,Latitude,Longitude
Bronx,52,52,52
Brooklyn,70,70,70
Manhattan,40,40,40
Queens,81,81,81
Staten Island,63,63,63


In [9]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [10]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [11]:
map_newyork.save("newyork_neighborhoods.html") # folium saves maps as HTML pages

### 4.2 Merge the Foursquare API competitor results for the 5 boroughs of New York City

In [15]:
boroughs = neighborhoods.groupby('Borough').count()
boroughs.head()

Borough,Neighborhood,Latitude,Longitude
Bronx,52,52,52
Brooklyn,70,70,70
Manhattan,40,40,40
Queens,81,81,81
Staten Island,63,63,63


In [67]:
borough_names = ['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island']

# read one competitor file per borough and tag each row with its borough
borough_frames = []
for borough_name in borough_names:
    file_name = 'competitor_' + borough_name + '_tocsv.csv'
    df_file = pd.read_csv(file_name)
    df_file['Borough'] = borough_name
    print('File for {} has {} lines.'.format(file_name, df_file.shape))
    borough_frames.append(df_file)

# stack the five files into a single dataframe (each file keeps its own row index)
df_neighborhoods = pd.concat(borough_frames)

df_neighborhoods.shape

File for competitor_Bronx_tocsv.csv has (23, 11) lines.
File for competitor_Brooklyn_tocsv.csv has (34, 11) lines.
File for competitor_Manhattan_tocsv.csv has (89, 11) lines.
File for competitor_Queens_tocsv.csv has (33, 11) lines.
File for competitor_Staten Island_tocsv.csv has (15, 11) lines.


(194, 11)

In [68]:
df_neighborhoods.head()

Unnamed: 0.1,Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,FoodType,Customer,Borough
0,0,Wakefield,40.894705,-73.847201,Subway,40.890468,-73.849152,Fast Food Restaurant,Fast Food,NotCustomer,Bronx
1,1,Co-op City,40.874294,-73.829939,Arby's,40.870411,-73.828606,Fast Food Restaurant,Fast Food,NotCustomer,Bronx
2,2,Co-op City,40.874294,-73.829939,Kennedy's,40.876807,-73.829627,Fast Food Restaurant,Fast Food,NotCustomer,Bronx
3,3,Eastchester,40.887556,-73.827806,McDonald's,40.885636,-73.82684,Fast Food Restaurant,Fast Food,NotCustomer,Bronx
4,4,Eastchester,40.887556,-73.827806,New York Chicken & Grill,40.888248,-73.831665,Fast Food Restaurant,Fast Food,NotCustomer,Bronx


In [69]:
print('There are {} unique categories.'.format(len(df_neighborhoods['Venue Category'].unique())))

There are 40 unique categories.


In [70]:
df_neighborhoods['Venue Category'].unique()

array(['Fast Food Restaurant', 'Burger Joint', 'Warehouse Store', 'Plaza',
       'Print Shop', 'Shipping Store', 'Lawyer', 'Chinese Restaurant',
       'Fried Chicken Joint', 'Sandwich Place', 'Factory', 'Home Service',
       'Dry Cleaner', 'Coffee Shop', 'Tea Room', 'Donut Shop', 'Gym',
       'Vegetarian / Vegan Restaurant', 'Bakery', 'Mexican Restaurant',
       'Breakfast Spot', 'Yoga Studio', 'Hotel', 'Lounge',
       'History Museum', 'School', 'BBQ Joint', 'Garden Center',
       'Paper / Office Supplies Store', 'Indie Movie Theater',
       'Thai Restaurant', 'Cocktail Bar', 'Mobile Phone Shop',
       "Doctor's Office", 'Pharmacy', 'Grocery Store',
       'Vietnamese Restaurant', 'Gourmet Shop', 'Beer Garden', 'Café'],
      dtype=object)

In [71]:
# see only competitors : remove Regular FoodType venues
newyork_competitors = df_neighborhoods[df_neighborhoods['FoodType']!='Regular'].reset_index(drop=True)
newyork_competitors.head()

### 4.3 Clustering the Neighborhoods of New York

In [72]:
# one hot encoding
newyork_onehot = pd.get_dummies(df_neighborhoods[['FoodType']], prefix="C", prefix_sep="_")
# add neighborhood column back to dataframe
newyork_onehot['Neighborhood'] = df_neighborhoods['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [newyork_onehot.columns[-1]] + list(newyork_onehot.columns[:-1])
newyork_onehot = newyork_onehot[fixed_columns]

newyork_onehot.head()

Unnamed: 0,Neighborhood,C_Fast Food,C_Restaurant,C_Sandwich
0,Wakefield,1,0,0
1,Co-op City,1,0,0
2,Co-op City,1,0,0
3,Eastchester,1,0,0
4,Eastchester,1,0,0


In [73]:
newyork_onehot.shape

(194, 4)

In [74]:
# group rows by neighborhood, taking the mean of each one-hot column (the frequency of each category)
neighborhoods_grouped = newyork_onehot.groupby('Neighborhood').mean().reset_index()
neighborhoods_grouped

Unnamed: 0,Neighborhood,C_Fast Food,C_Restaurant,C_Sandwich
0,Astoria,0.833333,0.166667,0.0
1,Bay Ridge,0.8,0.2,0.0
2,Bensonhurst,0.125,0.875,0.0
3,Brighton Beach,0.75,0.25,0.0
4,Central Harlem,0.875,0.125,0.0
5,Chinatown,0.285714,0.678571,0.035714
6,Co-op City,1.0,0.0,0.0
7,Corona,1.0,0.0,0.0
8,East Harlem,1.0,0.0,0.0
9,Eastchester,1.0,0.0,0.0
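
To see what the grouped mean represents, here is a small worked example on made-up rows (the venues and counts are illustrative only). After one-hot encoding, the mean of each indicator column within a neighborhood is the share of that neighborhood's venues belonging to the category, so each row sums to 1.

```python
import pandas as pd

# Made-up venues: 3 in Astoria, 1 in Corona.
toy = pd.DataFrame({
    'Neighborhood': ['Astoria', 'Astoria', 'Astoria', 'Corona'],
    'FoodType': ['Fast Food', 'Fast Food', 'Restaurant', 'Fast Food'],
})

# One indicator column per food type, cast to int so the values are 0/1.
toy_onehot = pd.get_dummies(toy[['FoodType']], prefix='C', prefix_sep='_').astype(int)
toy_onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Share of each food type per neighborhood.
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)
```

Astoria has two fast-food venues out of three, so its `C_Fast Food` share is 2/3 and its `C_Restaurant` share is 1/3.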


In [75]:
# Let's confirm size
neighborhoods_grouped.shape

(40, 4)

In [76]:
df_data = neighborhoods.reset_index(drop=True)
df_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [77]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [78]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhoods_grouped['Neighborhood']

for ind in np.arange(neighborhoods_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhoods_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Astoria,C_Fast Food,C_Restaurant,C_Sandwich
1,Bay Ridge,C_Fast Food,C_Restaurant,C_Sandwich
2,Bensonhurst,C_Restaurant,C_Fast Food,C_Sandwich
3,Brighton Beach,C_Fast Food,C_Restaurant,C_Sandwich
4,Central Harlem,C_Fast Food,C_Restaurant,C_Sandwich


In [79]:
# set number of clusters
kclusters = 3

neighborhoods_grouped_clustering = neighborhoods_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoods_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 2, 0, 1, 2, 1, 1, 1, 1], dtype=int32)
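
The value `kclusters = 3` is fixed by hand above. A common sanity check, sketched here on random stand-in data rather than on the notebook's data, is to compare the K-means inertia (within-cluster sum of squares) for several values of k and look for the point where it stops dropping sharply. The feature matrix `X` and the range of k are assumptions for this example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: 40 rows of 3 category shares that sum to 1, like the grouped frequencies.
rng = np.random.RandomState(0)
X = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=40)

# Fit K-means for several k and record the inertia of each fit.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 3))
```

Replacing `X` with `neighborhoods_grouped_clustering` would give the same diagnostic for the real data.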

In [80]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_merged = df_data

# join the cluster labels and top venues onto the full neighborhood list (which holds latitude/longitude)
neighborhoods_merged = neighborhoods_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neighborhoods_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,1.0,C_Fast Food,C_Sandwich,C_Restaurant
1,Bronx,Co-op City,40.874294,-73.829939,1.0,C_Fast Food,C_Sandwich,C_Restaurant
2,Bronx,Eastchester,40.887556,-73.827806,1.0,C_Fast Food,C_Sandwich,C_Restaurant
3,Bronx,Fieldston,40.895437,-73.905643,1.0,C_Fast Food,C_Sandwich,C_Restaurant
4,Bronx,Riverdale,40.890834,-73.912585,,,,


In [100]:
# neighborhoods with no competitor found have no K-means label: put them in an extra cluster (3)
neighborhoods_merged['Cluster'] = neighborhoods_merged['Cluster Labels'].fillna(3).astype('int')
neighborhoods_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
0,Bronx,Wakefield,40.894705,-73.847201,1.0,C_Fast Food,C_Sandwich,C_Restaurant,1
1,Bronx,Co-op City,40.874294,-73.829939,1.0,C_Fast Food,C_Sandwich,C_Restaurant,1
2,Bronx,Eastchester,40.887556,-73.827806,1.0,C_Fast Food,C_Sandwich,C_Restaurant,1
3,Bronx,Fieldston,40.895437,-73.905643,1.0,C_Fast Food,C_Sandwich,C_Restaurant,1
4,Bronx,Riverdale,40.890834,-73.912585,,,,,3
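
The missing labels come from the left join: neighborhoods without any competitor row have nothing to match. The toy frames below (illustrative names and values) reproduce the effect and the `fillna` step that turns those gaps into an extra cluster.

```python
import pandas as pd

# All neighborhoods (left side) and the labels available for only some of them (right side).
all_nb = pd.DataFrame({'Neighborhood': ['Wakefield', 'Riverdale', 'Chinatown']})
labels = pd.DataFrame({'Neighborhood': ['Wakefield', 'Chinatown'],
                       'Cluster Labels': [1, 2]})

# Left join: Riverdale has no label, so it receives NaN and the column becomes float.
merged = all_nb.join(labels.set_index('Neighborhood'), on='Neighborhood')

# Replace the gaps with an extra cluster id and go back to integers.
extra_cluster = 3  # assumed id for "no competitor found", one past the K-means labels 0..2
merged['Cluster'] = merged['Cluster Labels'].fillna(extra_cluster).astype(int)
print(merged)
```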


In [99]:

neighborhoods_merged.tail()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
301,Manhattan,Hudson Yards,40.756658,-74.000111,,,,,3
302,Queens,Hammels,40.587338,-73.80553,,,,,3
303,Queens,Bayswater,40.611322,-73.765968,,,,,3
304,Queens,Queensbridge,40.756091,-73.945631,,,,,3
305,Staten Island,Fox Hills,40.617311,-74.08174,,,,,3


In [109]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster'] == 3,:].head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
4,Bronx,Riverdale,40.890834,-73.912585,,,,,3
10,Bronx,Baychester,40.866858,-73.835798,,,,,3
11,Bronx,Pelham Parkway,40.857413,-73.854756,,,,,3
12,Bronx,City Island,40.847247,-73.786488,,,,,3
13,Bronx,Bedford Park,40.870185,-73.885512,,,,,3


In [95]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme: one color per K-means cluster plus one for neighborhoods without competitors
colors_array = cm.rainbow(np.linspace(0, 1, kclusters + 1))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, bor, cluster in zip(neighborhoods_merged['Latitude'], 
                                  neighborhoods_merged['Longitude'], 
                                  neighborhoods_merged['Neighborhood'], 
                                  neighborhoods_merged['Borough'],          
                                  neighborhoods_merged['Cluster']):
    label = folium.Popup(str(bor) + ' '+ str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [96]:
file_name = 'FoodTruck Analysis - Result map.html' # folium saves maps as HTML pages
map_clusters.save(file_name)

In [114]:
newyork_geo = r'newyork_data.json' # geojson file

# add a choropleth layer colored by the cluster of each neighborhood
folium.Choropleth(
    geo_data=newyork_geo,
    data=neighborhoods_merged,
    columns=['Neighborhood', 'Cluster'],
    key_on='feature.properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Cluster'
).add_to(map_clusters)

# display map
map_clusters

## 5. Conclusion

In [97]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 0, neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
5,Kingsbridge,C_Restaurant,C_Fast Food,C_Sandwich,0
6,Marble Hill,C_Fast Food,C_Restaurant,C_Sandwich,0
46,Bay Ridge,C_Fast Food,C_Restaurant,C_Sandwich,0
49,Greenpoint,C_Restaurant,C_Fast Food,C_Sandwich,0
51,Brighton Beach,C_Fast Food,C_Restaurant,C_Sandwich,0
101,Washington Heights,C_Fast Food,C_Restaurant,C_Sandwich,0
103,Hamilton Heights,C_Fast Food,C_Restaurant,C_Sandwich,0
104,Manhattanville,C_Fast Food,C_Restaurant,C_Sandwich,0
107,Upper East Side,C_Fast Food,C_Restaurant,C_Sandwich,0
131,Jackson Heights,C_Fast Food,C_Restaurant,C_Sandwich,0


In [65]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 1, neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
0,Wakefield,C_Fast Food,C_Sandwich,C_Restaurant,1
1,Co-op City,C_Fast Food,C_Sandwich,C_Restaurant,1
2,Eastchester,C_Fast Food,C_Sandwich,C_Restaurant,1
3,Fieldston,C_Fast Food,C_Sandwich,C_Restaurant,1
8,Norwood,C_Fast Food,C_Sandwich,C_Restaurant,1
9,Williamsbridge,C_Fast Food,C_Sandwich,C_Restaurant,1
48,Sunset Park,C_Fast Food,C_Sandwich,C_Restaurant,1
53,Manhattan Terrace,C_Fast Food,C_Sandwich,C_Restaurant,1
54,Flatbush,C_Fast Food,C_Sandwich,C_Restaurant,1
102,Inwood,C_Fast Food,C_Restaurant,C_Sandwich,1


In [89]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 2, neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
7,Woodlawn,C_Restaurant,C_Sandwich,C_Fast Food,2
47,Bensonhurst,C_Restaurant,C_Fast Food,C_Sandwich,2
50,Gravesend,C_Restaurant,C_Sandwich,C_Fast Food,2
52,Sheepshead Bay,C_Sandwich,C_Restaurant,C_Fast Food,2
100,Chinatown,C_Restaurant,C_Fast Food,C_Sandwich,2


In [98]:
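# cluster 3 only exists in the 'Cluster' column ('Cluster Labels' is empty for these rows)
neighborhoods_merged.loc[neighborhoods_merged['Cluster'] == 3, neighborhoods_merged.columns[[1] + list(range(5, neighborhoods_merged.shape[1]))]].head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,Cluster
4,Riverdale,,,,3
10,Baychester,,,,3
11,Pelham Parkway,,,,3
12,City Island,,,,3
13,Bedford Park,,,,3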
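
To turn these listings into a recommendation, one could count, per borough, how many neighborhoods fall in each cluster. The frame below is a hand-made stand-in for `neighborhoods_merged` (a few rows copied from the tables above); the reading that cluster 3 marks neighborhoods with no competitor found follows from the `fillna(3)` step.

```python
import pandas as pd

# Stand-in for neighborhoods_merged with only the columns needed here.
sample = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Bronx', 'Manhattan', 'Manhattan', 'Queens'],
    'Neighborhood': ['Wakefield', 'Riverdale', 'Baychester',
                     'Chinatown', 'Hudson Yards', 'Hammels'],
    'Cluster': [1, 3, 3, 2, 3, 3],
})

# Rows: boroughs, columns: clusters, values: number of neighborhoods.
summary = pd.crosstab(sample['Borough'], sample['Cluster'])
print(summary)

# Neighborhoods with no competitor found (cluster 3), listed per borough.
no_competitor = (sample[sample['Cluster'] == 3]
                 .groupby('Borough')['Neighborhood'].apply(list))
print(no_competitor)
```

Applied to the full `neighborhoods_merged` dataframe, the cluster 3 entries are the candidate locations with the least direct competition, under the assumptions stated in the methodology.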
