# Capstone Project - The Battle of the Neighborhoods - Week 5
### Applied Data Science Capstone

### Introduction

In this project, the goal is to provide recommendations to stakeholders who are interested in opening a restaurant in New York City (NYC), specifically in Manhattan.

Using data analysis tools, we will recommend the optimal location for their business.

## Table of Contents


<font size = 2>

- [Data Analysis](#1)
- [Import libraries](#2)
- [Get the NYC data](#3)
- [Explore the NYC data and create a DataFrame](#4)
- [Create a Manhattan Neighborhood DataFrame](#5)
- [Explore the neighborhoods in Manhattan using FourSquare API](#6)
- [Business Problem](#7)
    - [1 - How many restaurants are there in Manhattan? - First insight](#8)
    - [2 - Cluster the restaurants and analyze them - Second insight](#9)
    - [3 - Include in the analysis the total population of each neighborhood - Third insight](#10)
    - [4 - Analyze complementary places to help the decision - Fourth Insight](#11)
    - [Conclusion](#12)
    

### Data Analysis  <a class="anchor" id="1"></a>

To address the problem, the analysis is driven by the following considerations:

- To recommend a good location for the restaurant, we will look for neighborhoods with fewer restaurants, which points to areas with less competition. This will not be easy in Manhattan, because it is a major tourist destination.

- We will also check which cuisines competing restaurants serve, since this helps decide what kind of restaurant the stakeholders should invest in.

- Finally, we will consider other, complementary businesses in each neighborhood to find the best match for the new restaurant.

### Import libraries  <a class="anchor" id="2"></a>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

from IPython import display

from bs4 import BeautifulSoup

import wget

print('Libraries imported.')

Libraries imported.


### Get the NYC data   <a class="anchor" id="3"></a>

In [2]:
nydata = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'
nyjason = wget.download(nydata)
print('Data downloaded!')

Data downloaded!


### Explore the NYC data and create a DataFrame <a class="anchor" id="4"></a>

In [3]:
# load the downloaded GeoJSON file

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
neighborhoods_data = newyork_data['features']

In [4]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one record per neighborhood (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
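GeoJSON stores a point's coordinates as `[longitude, latitude]`, which is why the loop reads latitude from index 1 and longitude from index 0. The standard-library sketch below parses one hand-written feature shaped like the entries in `newyork_data['features']`. The sample feature and the `feature_to_row` helper are illustrative assumptions; Wakefield's coordinates are taken from the output above.

```python
import json

# A hypothetical feature shaped like the entries in newyork_data['features']
sample = json.loads("""
{"type": "Feature",
 "properties": {"borough": "Bronx", "name": "Wakefield"},
 "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}}
""")

def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a neighborhood record (illustrative helper)."""
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order is [lon, lat]
    return {'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon}

row = feature_to_row(sample)
print(row)
```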


### Create a Manhattan Neighborhood DataFrame <a class="anchor" id="5"></a>

In [5]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


### Explore the neighborhoods in Manhattan using FourSquare API <a class="anchor" id="6"></a>

In [6]:
# Let's get the geographical coordinates of Manhattan

address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [7]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

### Business Problem <a class="anchor" id="7"></a>

In the real world, answering this question would require more information from other sources, such as marketing, economic and social analyses.
The goal here is to use tools such as the Foursquare API to derive a theoretical suggestion from real-world data.
In this context, the analysis has four steps:

**1 - Taking into account all the neighborhoods of Manhattan, count the restaurants and their types of cuisine**

**2 - Cluster the restaurants and analyze them**

**3 - Include in the analysis the total population of each neighborhood to find a good combination between a possible high demand and a smaller number of restaurants**

**4 - Analyze complementary places to help the decision**

#### 1 - How many restaurants are there in Manhattan?

In [8]:
# @hidden_cell

# Define Foursquare Credentials and Version
# (never publish real credentials; replace the placeholders with your own)

CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20210629' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
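Hard-coding a client ID and secret in a notebook exposes them to anyone who sees the file. A common alternative is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper below are assumptions for illustration, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Read Foursquare credentials from environment variables (variable names are hypothetical)."""
    client_id = environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

# In the notebook: CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
# Demo with a plain dict standing in for the environment (placeholder values)
demo_env = {'FOURSQUARE_CLIENT_ID': 'example-id', 'FOURSQUARE_CLIENT_SECRET': 'example-secret'}
print(load_foursquare_credentials(demo_env))
```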

In [9]:
# Define a function that queries Foursquare for the venues near each neighborhood center

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
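The request URL in `getNearbyVenues` is assembled with `str.format`. The sketch below builds the same query with `urllib.parse.urlencode`, which escapes each value. The helper name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore request URL used in getNearbyVenues (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url('ID', 'SECRET', '20210629', 40.876551, -73.91066)
print(url)
print(parse_qs(urlparse(url).query)['ll'])  # the comma is escaped in the URL but decodes back
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_URL, params=params)` has the same effect.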

In [10]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )
manhattan_venues.head(10)

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Rite Aid | 40.875467 | -73.908906 | Pharmacy |
| 4 | Marble Hill | 40.876551 | -73.91066 | Subway | 40.874667 | -73.909586 | Sandwich Place |
| 5 | Marble Hill | 40.876551 | -73.91066 | Vitamin Shoppe | 40.87716 | -73.905632 | Supplement Shop |
| 6 | Marble Hill | 40.876551 | -73.91066 | Baskin-Robbins | 40.877132 | -73.906678 | Ice Cream Shop |
| 7 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 8 | Marble Hill | 40.876551 | -73.91066 | America's Best Contacts & Eyeglasses | 40.874001 | -73.909693 | Optical Shop |
| 9 | Marble Hill | 40.876551 | -73.91066 | The Children's Place | 40.873672 | -73.908156 | Kids Store |
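`getNearbyVenues` searches a 500 m radius around each neighborhood center. Two centers closer than 1 km can therefore return some of the same venues, which would inflate the counts below. The sketch checks this with the haversine formula, assuming a mean Earth radius of 6,371 km. It uses the Marble Hill and Inwood centers from the tables above. The helper names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def search_areas_overlap(center_a, center_b, radius_m=500):
    """Two circles of equal radius overlap when their centers are closer than 2 * radius."""
    return haversine_m(*center_a, *center_b) < 2 * radius_m

marble_hill = (40.876551, -73.91066)
inwood = (40.867684, -73.92121)
distance = haversine_m(*marble_hill, *inwood)
print(round(distance), search_areas_overlap(marble_hill, inwood))
```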


In [11]:
#The shape of our data

print('Shape of our data:',manhattan_venues.shape)
manhattan_venues['Venue Category'].value_counts().to_frame()

Shape of our data: (3244, 7)


| Venue Category | Count |
|---|---|
| Coffee Shop | 142 |
| Italian Restaurant | 139 |
| Pizza Place | 84 |
| Café | 80 |
| American Restaurant | 79 |
| Bakery | 77 |
| Park | 67 |
| Hotel | 65 |
| Gym / Fitness Center | 61 |
| Bar | 54 |


Venues in the **Coffee Shop, Italian Restaurant and Pizza Place** categories are the most numerous in Manhattan.

In [12]:
# Now we select only the Restaurants in our new DataFrame

manhattan_restaurant = manhattan_venues[manhattan_venues['Venue Category'].str.contains("Restaurant")]
manhattan_restaurant.head()

|    | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 13 | Marble Hill | 40.876551 | -73.91066 | Land & Sea Restaurant | 40.877885 | -73.905873 | Seafood Restaurant |
| 22 | Marble Hill | 40.876551 | -73.91066 | Grill 26 at TCR | 40.878802 | -73.915672 | American Restaurant |
| 26 | Chinatown | 40.715618 | -73.994279 | Kiki's | 40.714476 | -73.992036 | Greek Restaurant |
| 30 | Chinatown | 40.715618 | -73.994279 | Spicy Village | 40.71701 | -73.99353 | Chinese Restaurant |
| 32 | Chinatown | 40.715618 | -73.994279 | Xi'an Famous Foods | 40.715232 | -73.997263 | Chinese Restaurant |
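The filter above keeps every category whose name contains the substring `"Restaurant"`. That includes all the `"... Restaurant"` categories and the generic `"Restaurant"` category. It excludes food venues such as `Pizza Place` or `Diner`. This is why `Pizza Place`, the third most common category overall (84 venues), does not appear in the restaurant counts. `str.contains` is case-sensitive by default. The plain-Python sketch below applies the same rule to categories taken from the outputs above.

```python
categories = ['Pizza Place', 'Seafood Restaurant', 'Diner', 'American Restaurant',
              'Restaurant', 'Coffee Shop', 'Chinese Restaurant']

def is_restaurant(category):
    """Same rule as str.contains('Restaurant'): a case-sensitive substring match."""
    return 'Restaurant' in category

restaurants = [c for c in categories if is_restaurant(c)]
print(restaurants)
```

If pizza places and diners should count as competitors, the filter would need an explicit list of categories instead.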


In [13]:
# Check the shape of the restaurant subset
print('Shape of our data:',manhattan_restaurant.shape)
manhattan_restaurant['Venue Category'].value_counts().to_frame()


Shape of our data: (927, 7)


| Venue Category | Count |
|---|---|
| Italian Restaurant | 139 |
| American Restaurant | 79 |
| Mexican Restaurant | 52 |
| Sushi Restaurant | 49 |
| Chinese Restaurant | 45 |
| French Restaurant | 42 |
| Japanese Restaurant | 41 |
| Seafood Restaurant | 37 |
| Thai Restaurant | 35 |
| Mediterranean Restaurant | 31 |


In [14]:
print('There are {} unique categories.'.format(manhattan_restaurant['Venue Category'].nunique()))

There are 74 unique categories.


We now know that there are 927 restaurant venues across 74 categories. **Italian, American, Mexican and Sushi** restaurants are the most common. Cuisines such as **Himalayan, Swiss, Afghan, Moroccan and Czech** are rare.
Now let's count the restaurants of each category in every neighborhood.

In [15]:
pivot_restaurant = pd.pivot_table(manhattan_restaurant, index = ['Neighborhood','Venue Category'], aggfunc=len)

In [16]:
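# add a 'Total' row per neighborhood and a grand total
# (DataFrame.append was removed in pandas 2.0, so the pieces are joined with pd.concat)
def total_row(frame, key):
    row = frame.sum().to_frame().T  # one-row frame holding the column sums
    row.index = pd.MultiIndex.from_tuples([key], names=pivot_restaurant.index.names)
    return row

pivot_restaurant_total = pd.concat(
    [part for k, d in pivot_restaurant.groupby(level=0) for part in (d, total_row(d, (k, 'Total')))]
    + [total_row(pivot_restaurant, ('Grand', 'Total'))])
pivot_restaurant_total[['Venue']]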

| Neighborhood | Venue Category | Venue |
|---|---|---|
| Battery Park City | American Restaurant | 1.0 |
| Battery Park City | Chinese Restaurant | 1.0 |
| Battery Park City | Italian Restaurant | 1.0 |
| Battery Park City | Mediterranean Restaurant | 1.0 |
| Battery Park City | Mexican Restaurant | 1.0 |
| Battery Park City | Seafood Restaurant | 1.0 |
| Battery Park City | Total | 6.0 |
| Carnegie Hill | American Restaurant | 1.0 |
| Carnegie Hill | Argentinian Restaurant | 1.0 |
| Carnegie Hill | Chinese Restaurant | 1.0 |
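The pivot table counts venues per `(Neighborhood, Venue Category)` pair and adds a subtotal for each neighborhood. The same counting can be sketched with `collections.Counter`. The rows below reproduce the Battery Park City entries from the table above and the two Marble Hill restaurants listed earlier. The `count_with_totals` helper is illustrative.

```python
from collections import Counter

rows = [('Battery Park City', 'American Restaurant'),
        ('Battery Park City', 'Chinese Restaurant'),
        ('Battery Park City', 'Italian Restaurant'),
        ('Battery Park City', 'Mediterranean Restaurant'),
        ('Battery Park City', 'Mexican Restaurant'),
        ('Battery Park City', 'Seafood Restaurant'),
        ('Marble Hill', 'Seafood Restaurant'),
        ('Marble Hill', 'American Restaurant')]

def count_with_totals(pairs):
    """Count (neighborhood, category) pairs and add a 'Total' entry per neighborhood."""
    counts = Counter(pairs)
    totals = Counter(neighborhood for neighborhood, _ in pairs)
    for neighborhood, total in totals.items():
        counts[(neighborhood, 'Total')] = total
    return counts

counts = count_with_totals(rows)
print(counts[('Battery Park City', 'Total')], counts[('Marble Hill', 'Total')])
```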


In [17]:
# Save this pivot table in a csv for analysis purposes

pivot_restaurant_total[['Venue']].to_csv('Pivot.csv')

### 1 - How many restaurants are there in Manhattan? First insight <a class="anchor" id="8"></a>

The neighborhoods with the most restaurants are:

- **Greenwich Village** with 46 restaurants
- **Upper West Side** with 40 restaurants
- **East Village** with 38 restaurants
- **Turtle Bay** with 35 restaurants
- **Midtown South** with 35 restaurants

The neighborhoods with the fewest restaurants (a potential business opportunity) are:

- **Marble Hill** with 2 restaurants
- **Roosevelt Island** with 4 restaurants
- **Battery Park City** with 6 restaurants
- **Morningside Heights** with 10 restaurants
- **Lower East Side** with 12 restaurants
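The two lists above amount to sorting neighborhoods by their restaurant totals. The sketch below uses only the ten totals quoted above, not all 40 neighborhoods, and the `rank_by_count` helper is illustrative. Each neighborhood query requests at most `LIMIT = 100` venues, so totals for dense neighborhoods may be a lower bound.

```python
restaurant_totals = {
    'Greenwich Village': 46, 'Upper West Side': 40, 'East Village': 38,
    'Turtle Bay': 35, 'Midtown South': 35,
    'Marble Hill': 2, 'Roosevelt Island': 4, 'Battery Park City': 6,
    'Morningside Heights': 10, 'Lower East Side': 12,
}

def rank_by_count(totals, n=5, fewest=True):
    """Return the n neighborhoods with the fewest (or most) restaurants; ties are broken by name."""
    if fewest:
        key = lambda item: (item[1], item[0])
    else:
        key = lambda item: (-item[1], item[0])
    return [name for name, _ in sorted(totals.items(), key=key)[:n]]

print(rank_by_count(restaurant_totals, fewest=True))
print(rank_by_count(restaurant_totals, fewest=False))
```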
 
 
 

#### 2 - Cluster the restaurants by neighborhood to find more insights

In [18]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_restaurant[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_restaurant['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()
restaurants_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
restaurants_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Cuban Restaurant,Czech Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,North Indian Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
0,Battery Park City,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.095238,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.047619,0.0,0.047619
2,Central Harlem,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.136364,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.136364,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.090909,0.045455,0.0,0.0,0.0,0.0,0.0,0.0
4,Chinatown,0.0,0.0,0.088235,0.0,0.058824,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.205882,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.029412
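One-hot encoding followed by `groupby('Neighborhood').mean()` turns each neighborhood into a vector of category shares. Each value is the fraction of that neighborhood's restaurants that fall in the category. Battery Park City has six restaurants in six categories, so each of those categories gets 1/6 ≈ 0.166667. The stdlib sketch below computes the same shares. The `category_shares` helper is illustrative. Categories a neighborhood lacks are simply absent, whereas the DataFrame stores them as explicit zeros.

```python
from collections import Counter, defaultdict

def category_shares(pairs):
    """Map each neighborhood to {category: fraction of its restaurants}, like one-hot + mean."""
    by_neighborhood = defaultdict(list)
    for neighborhood, category in pairs:
        by_neighborhood[neighborhood].append(category)
    shares = {}
    for neighborhood, cats in by_neighborhood.items():
        counts = Counter(cats)
        shares[neighborhood] = {c: n / len(cats) for c, n in counts.items()}
    return shares

pairs = [('Marble Hill', 'Seafood Restaurant'), ('Marble Hill', 'American Restaurant'),
         ('Battery Park City', 'American Restaurant'), ('Battery Park City', 'Chinese Restaurant'),
         ('Battery Park City', 'Italian Restaurant'), ('Battery Park City', 'Mediterranean Restaurant'),
         ('Battery Park City', 'Mexican Restaurant'), ('Battery Park City', 'Seafood Restaurant')]
shares = category_shares(pairs)
print(shares['Marble Hill'])
```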


In [19]:
#number of clusters

k_clusters = 4
restaurants_cluster = restaurants_grouped.drop(columns='Neighborhood')
#k-means clustering (n_init=10 matches the default of scikit-learn releases before 1.4)
Kmeans = KMeans(n_clusters = k_clusters, random_state =0, n_init=10).fit(restaurants_cluster)
# check cluster labels generated for each row in the dataframe
Kmeans.labels_[0:10]

array([3, 0, 3, 0, 3, 0, 0, 3, 3, 0])
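k-means alternates two steps, known as Lloyd's algorithm. First, each row is assigned to its nearest centroid. Then each centroid moves to the mean of the rows assigned to it. The pure-Python sketch below shows both steps on made-up 2-D points. It uses the first k points as initial centroids, whereas scikit-learn's `KMeans` uses k-means++ initialization and several restarts (`n_init`). It is an illustration, not a drop-in replacement.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points]
        # update step: move each centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, label in zip(points, new_labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.1, 4.9), (0.2, 0.1), (4.9, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

In the notebook, each row is a neighborhood's vector of category shares, so `squared_distance` runs over 74 coordinates instead of 2.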

In [20]:
#Add the clustering labels
restaurants_grouped.insert(0, 'Cluster_labels', Kmeans.labels_)

#join the cluster labels (and category shares) onto each restaurant row, keeping the venue coordinates
restaurants_merged = manhattan_restaurant.join(restaurants_grouped.set_index('Neighborhood'), on = 'Neighborhood')
restaurants_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster_labels,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Cuban Restaurant,Czech Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,North Indian Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant
13,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,2,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
22,Marble Hill,40.876551,-73.91066,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,2,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
26,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,3,0.0,0.0,0.088235,0.0,0.058824,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.205882,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.029412
30,Chinatown,40.715618,-73.994279,Spicy Village,40.71701,-73.99353,Chinese Restaurant,3,0.0,0.0,0.088235,0.0,0.058824,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.205882,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.029412
32,Chinatown,40.715618,-73.994279,Xi'an Famous Foods,40.715232,-73.997263,Chinese Restaurant,3,0.0,0.0,0.088235,0.0,0.058824,0.029412,0.029412,0.0,0.0,0.029412,0.0,0.0,0.205882,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.029412


In [21]:
# Manhattan coordinates (from the geocoding cell above)

latitude = 40.7896239
longitude = -73.9598939

# create map of Manhattan using latitude and longitude values
restaurants_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map; labels run from 0 to k_clusters - 1, so they index rainbow directly
for lat, lng, venue, cluster in zip(restaurants_merged['Venue Latitude'], restaurants_merged['Venue Longitude'], restaurants_merged['Venue'], restaurants_merged['Cluster_labels']):
    label = 'Cluster:{}, {}'.format(cluster, venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(restaurants_map)

restaurants_map
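`KMeans.labels_` are 0-based (0 to `k_clusters - 1`), so a palette with `k_clusters` entries can be indexed directly by the label. The same color should be passed to both `color` and `fill_color`. The stdlib sketch below builds one hex color per cluster with `colorsys`. This is an assumption standing in for `matplotlib.cm.rainbow`, and it will not produce the same shades.

```python
import colorsys

def cluster_colors(k):
    """Return k evenly spaced hex colors, indexed by 0-based cluster label."""
    hexes = []
    for label in range(k):
        # spread hues over [0, 5/6] so the last color does not wrap back to red
        r, g, b = colorsys.hsv_to_rgb(label * (5 / 6) / max(k - 1, 1), 1.0, 1.0)
        hexes.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return hexes

palette = cluster_colors(4)
print(palette)
```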

#### Analyze each cluster

In [22]:
cluster1 = restaurants_merged.loc[restaurants_merged['Cluster_labels'] == 0, restaurants_merged.columns[[0,6]]]

print('There are {} unique categories.'.format(cluster1['Venue Category'].nunique()))
cluster1['Venue Category'].value_counts().to_frame()

There are 64 unique categories.


| Venue Category | Count |
|---|---|
| Italian Restaurant | 128 |
| American Restaurant | 62 |
| Sushi Restaurant | 42 |
| French Restaurant | 36 |
| Japanese Restaurant | 34 |
| Mediterranean Restaurant | 27 |
| Thai Restaurant | 26 |
| Korean Restaurant | 25 |
| Mexican Restaurant | 23 |
| Indian Restaurant | 23 |


In Cluster 1 (label 0), the most common cuisine is **Italian**, and the least common is **Caribbean**.

In [23]:
cluster2 = restaurants_merged.loc[restaurants_merged['Cluster_labels'] == 1, restaurants_merged.columns[[0,6]]]

print('There are {} unique categories.'.format(cluster2['Venue Category'].nunique()))
cluster2['Venue Category'].value_counts().to_frame()


There are 3 unique categories.


| Venue Category | Count |
|---|---|
| Greek Restaurant | 1 |
| Japanese Restaurant | 1 |
| American Restaurant | 1 |


Cluster 2 (label 1) has only three restaurants, each of a different type: **American, Japanese and Greek**.

In [24]:
cluster3 = restaurants_merged.loc[restaurants_merged['Cluster_labels'] == 2, restaurants_merged.columns[[0,6]]]

print('There are {} unique categories.'.format(cluster3['Venue Category'].nunique()))
cluster3['Venue Category'].value_counts().to_frame()


There are 2 unique categories.


| Venue Category | Count |
|---|---|
| American Restaurant | 1 |
| Seafood Restaurant | 1 |


Cluster 3 (label 2) contains the two Marble Hill restaurants, which serve **American** and **Seafood** cuisine.

In [25]:
cluster4 = restaurants_merged.loc[restaurants_merged['Cluster_labels'] == 3, restaurants_merged.columns[[0,6]]]

print('There are {} unique categories.'.format(cluster4['Venue Category'].nunique()))
cluster4['Venue Category'].value_counts().to_frame()

There are 49 unique categories.


| Venue Category | Count |
|---|---|
| Mexican Restaurant | 29 |
| Chinese Restaurant | 23 |
| American Restaurant | 15 |
| Seafood Restaurant | 14 |
| Italian Restaurant | 11 |
| Latin American Restaurant | 10 |
| Thai Restaurant | 9 |
| Vietnamese Restaurant | 9 |
| Caribbean Restaurant | 9 |
| Spanish Restaurant | 8 |


In Cluster 4 (label 3), the most common cuisine is **Mexican**, and the least common is **Tibetan**.
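The four cells above repeat the same steps for each label: filter the rows, count the unique categories, and rank them. The sketch below summarizes every cluster in one pass. The `summarize_clusters` name is hypothetical. The sample rows reproduce the three label-1 restaurants and the two label-2 (Marble Hill) restaurants. Ties are broken alphabetically, since "most common" is ambiguous when every count is 1.

```python
from collections import Counter, defaultdict

def summarize_clusters(rows, top=3):
    """rows: (cluster_label, venue_category) pairs -> {label: {'unique': n, 'top': [...]}}."""
    by_label = defaultdict(Counter)
    for label, category in rows:
        by_label[label][category] += 1
    summary = {}
    for label in sorted(by_label):
        counts = by_label[label]
        # rank by count (descending), then by category name
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        summary[label] = {'unique': len(counts), 'top': ranked[:top]}
    return summary

rows = [(1, 'Greek Restaurant'), (1, 'Japanese Restaurant'), (1, 'American Restaurant'),
        (2, 'Seafood Restaurant'), (2, 'American Restaurant')]
summary = summarize_clusters(rows)
print(summary)
```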

### 2 - Cluster the restaurants and analyze them - Second insight <a class="anchor" id="9"></a>

Each cluster has its own dominant cuisines. In this theoretical exercise, we do not recommend competing with the most common categories, such as Italian, American, Mexican, Chinese and Korean.

The data also shows the variety of restaurant types across all clusters, and there are few healthy-food restaurants in Manhattan. For example, if vegetarian/vegan food counts as healthy food, the data shows fewer than twenty such restaurants in all of Manhattan.

#### 3 - Population by neighborhood (web scraping from an alternative source)

In [26]:
url = "https://www.worldatlas.com/articles/manhattan-neighborhoods-by-population.html"
web = requests.get(url)
soup = BeautifulSoup(web.text, 'lxml') # soup object
table = soup.find(id="article_table", class_='mod_excess excess_show_desktop')

In [27]:
headers=[]

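for i in table.find_all('th'):
    # remove the invisible byte-order mark (U+FEFF), which str.strip() leaves in place
    title = i.text.replace('\ufeff', '').strip()
    headers.append(title)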
    
manhattan_population = pd.DataFrame(columns = headers)

In [28]:
# add one DataFrame row per table row, skipping the header row
for row in table.find_all('tr')[1:]:
    data = row.find_all('td')
    row_data = [td.text.strip() for td in data]
    length = len(manhattan_population)
    manhattan_population.loc[length] = row_data

In [29]:
manhattan_population.head(10)

|   | Rank | Neighborhood | Population |
|---|---|---|---|
| 0 | 1 | Midtown | 391371 |
| 1 | 2 | Lower Manhattan | 382654 |
| 2 | 3 | Harlem | 335109 |
| 3 | 4 | Upper East Side | 229688 |
| 4 | 5 | Upper West Side | 209084 |
| 5 | 6 | Washington Heights | 158318 |
| 6 | 7 | East Harlem | 115921 |
| 7 | 8 | Chinatown | 100000 |
| 8 | 9 | Lower East Village | 72957 |
| 9 | 10 | Alphabet City | 63347 |
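A scraped header can carry an invisible byte-order mark (U+FEFF) in front of a column name. `str.strip()` does not remove that mark, so a lookup such as `manhattan_population['Neighborhood']` would fail. The standard-library sketch below parses a small inline HTML table with `html.parser` and removes the mark from every cell. The `TableParser` class and the sample HTML are illustrative assumptions, not the structure of the WorldAtlas page.

```python
from html.parser import HTMLParser

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row (a minimal sketch)."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside cells
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            # drop any byte-order mark first, then surrounding whitespace
            self._row.append(''.join(self._cell).replace('\ufeff', '').strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

html = """<table><tr><th>Rank</th><th>\ufeffNeighborhood</th><th>Population</th></tr>
<tr><td>1</td><td>Midtown</td><td>391371</td></tr>
<tr><td>9</td><td>Lower East Village</td><td>72957</td></tr></table>"""

parser = TableParser()
parser.feed(html)
parser.close()
header, *body = parser.rows
population = {name: int(pop) for _, name, pop in body}
print(header, population)
```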


### 3 - Include in the analysis the total population of each neighborhood to find a good combination between a possible high demand and a smaller number of restaurants - Third insight  <a class="anchor" id="10"></a>

Using this data, and bearing in mind that Manhattan is a world-class tourist destination, we look for candidate locations among the most populated neighborhoods.

One neighborhood combines a small number of restaurants with a place among the ten most populated: the **Lower East Side**. It has only 12 restaurants in our Foursquare data. We take the "Lower East Village" entry of the population table (ranked 9th, with 72,957 residents) to correspond to it, since the two sources name neighborhoods differently.
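One way to make the trade-off between high demand and low competition explicit is to compute residents per restaurant. The sketch below uses only figures quoted in this notebook. The Upper West Side has 40 restaurants and 209,084 residents. The Lower East Side has 12 restaurants, paired here with the 72,957 residents listed for "Lower East Village". That pairing (`NAME_MAP`) is an assumption, and the Foursquare counts are samples within 500 m of each center, not a census.

```python
# Figures from the outputs above; NAME_MAP is a hypothetical mapping between the two sources
restaurant_totals = {'Upper West Side': 40, 'Lower East Side': 12}
population = {'Upper West Side': 209084, 'Lower East Village': 72957}
NAME_MAP = {'Lower East Side': 'Lower East Village'}  # assumption: both names describe the same area

def residents_per_restaurant(totals, population, name_map):
    """Residents per restaurant for every neighborhood that can be matched to a population figure."""
    ratios = {}
    for neighborhood, count in totals.items():
        pop_name = name_map.get(neighborhood, neighborhood)
        if pop_name in population and count > 0:
            ratios[neighborhood] = population[pop_name] / count
    return ratios

ratios = residents_per_restaurant(restaurant_totals, population, NAME_MAP)
print(ratios)
```

A higher ratio means more potential customers per competitor. On these two figures, the Lower East Side comes out ahead, with about 6,080 residents per restaurant versus about 5,227 for the Upper West Side.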

#### 4 - Analyze complementary businesses

In [30]:
# Filter the original DataFrame to keep every venue except restaurants

manhattan_other_venues = manhattan_venues[~manhattan_venues['Venue Category'].str.contains("Restaurant")]
manhattan_other_venues.head()


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Rite Aid | 40.875467 | -73.908906 | Pharmacy |
| 4 | Marble Hill | 40.876551 | -73.91066 | Subway | 40.874667 | -73.909586 | Sandwich Place |


In [32]:
lower_east_side_venues = manhattan_other_venues.loc[manhattan_other_venues['Neighborhood'] == 'Lower East Side', manhattan_other_venues.columns[[0,6]]]
lower_east_side_venues['Venue Category'].value_counts().to_frame()


| Venue Category | Count |
|---|---|
| Art Gallery | 3 |
| Pizza Place | 2 |
| Café | 2 |
| Bakery | 2 |
| Park | 2 |
| Performing Arts Venue | 1 |
| Theater | 1 |
| Pharmacy | 1 |
| Shoe Store | 1 |
| Diner | 1 |


### 4 - Analyze complementary places to help the decision - Fourth Insight <a class="anchor" id="11"></a>

Focusing on the selected neighborhood (Lower East Side), we found relatively few non-restaurant venues, but they include 3 art galleries and 2 parks.
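To make "complementary" concrete, one could count the venues in categories judged to attract the same customers. Which categories qualify is a judgment call. The set below (art galleries, parks, theaters, performing arts venues) is an assumption, and the counts are the Lower East Side figures from the table above.

```python
lower_east_side_counts = {
    'Art Gallery': 3, 'Pizza Place': 2, 'Café': 2, 'Bakery': 2, 'Park': 2,
    'Performing Arts Venue': 1, 'Theater': 1, 'Pharmacy': 1, 'Shoe Store': 1, 'Diner': 1,
}

# Assumption: categories considered complementary to a healthy-food restaurant
COMPLEMENTARY = {'Art Gallery', 'Park', 'Theater', 'Performing Arts Venue'}

def complementary_score(counts, complementary):
    """Total number of venues whose category is in the complementary set."""
    return sum(n for category, n in counts.items() if category in complementary)

print(complementary_score(lower_east_side_counts, COMPLEMENTARY))
```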

## Conclusion <a class="anchor" id="12"></a>


The purpose of this project was to help stakeholders who want to open a restaurant in Manhattan find the best location. To do so, we collected data on NYC neighborhoods and analyzed the restaurants and other venues in them.

NYC neighborhood data was combined with the Foursquare API to find the venues in each neighborhood, and an alternative source was scraped for population by neighborhood.

Based on the available information, we recommend a **healthy food restaurant in the Lower East Side** neighborhood of Manhattan. The main reasons are the following:

- The Lower East Side has one of the lowest restaurant counts in our data (12) while ranking among the ten most populated neighborhoods in Manhattan.

- There are very few healthy food restaurants in Manhattan, while competing against established cuisines such as Italian, American and Mexican would be difficult.

- Although the Lower East Side has relatively few venues overall, art galleries and parks stand out, and their visitors may be a good fit for the recommended type of restaurant.
