# Capstone Project
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this capstone project I will try to find a good location for an Indian restaurant in Manhattan. Specifically, this report is targeted at stakeholders interested in opening an **Indian restaurant** in **Manhattan**, NY.

Since there are 2874 restaurants in Manhattan, I will try to find locations
* that are not already crowded with restaurants,
* where there are as few Indian restaurants as possible in the immediate surroundings,
* where the share of Indian restaurants in the neighborhood is low, and
* which are as close to the center of Manhattan as possible.

Using data science, I will identify the most promising neighborhoods of Manhattan in which to open an Indian restaurant and present them to the stakeholders.

## Data <a name="data"></a>

Based on the definition of the business problem, the decision will be influenced by the following factors:

* total number of existing restaurants in the neighborhood.
* total number of Indian restaurants in the neighborhood.
* share of Indian restaurants in the neighborhood.
* distance to the nearest Indian restaurant, if there is one.
* distance from the center of Manhattan.

To find the most promising neighborhoods in which to open an Indian restaurant in Manhattan, I will use the following data sources:

* Information about all neighborhoods in Manhattan and their centers is available in the **New York Dataset** (https://cocl.us/new_york_dataset).
* To find all restaurants and Indian restaurants in each neighborhood, I will use the **Foursquare API**.
* To draw the neighborhoods on the heatmaps and choropleth maps, I need the **borders of each neighborhood**, which are available at https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson

In [3]:
#!conda install -c conda-forge folium --yes
#!conda install -c conda-forge geopy --yes
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
import pyproj
import math

print('All necessary libraries imported!')

All necessary libraries imported!


Load the New York dataset of neighborhoods

In [4]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [6]:
neighborhoods_data = newyork_data['features']

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [8]:
# DataFrame.append was removed in pandas 2.0, so collect the rows in a list and build the frame once
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

Filter the dataframe for neighborhoods of Manhattan

In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |
| 5 | Manhattan | Manhattanville | 40.816934 | -73.957385 |
| 6 | Manhattan | Central Harlem | 40.815976 | -73.943211 |
| 7 | Manhattan | East Harlem | 40.792249 | -73.944182 |
| 8 | Manhattan | Upper East Side | 40.775639 | -73.960508 |
| 9 | Manhattan | Yorkville | 40.77593 | -73.947118 |


Get the latitude and longitude of Manhattan with geopy

In [10]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


Create a folium map of Manhattan and mark all neighborhoods as well as the center of Manhattan.

In [11]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
# white rings at 1, 3, 5, 7, 9 and 11 km around the center of Manhattan
for ring_radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=ring_radius, fill=False, color='white').add_to(map_manhattan)
# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)   

map_manhattan

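Add the projected coordinates (X, Y) and the distance to the center of Manhattan to the dataframe of neighborhoods. The helper functions below convert longitude/latitude to UTM coordinates in meters, so that distances can be computed with Pythagoras' theorem.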

In [12]:
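# Convert between WGS84 longitude/latitude and UTM x/y (meters) with pyproj.
# Caution: UTM zone 33 spans 12°E to 18°E, whereas Manhattan (about 74°W) lies in zone 18N,
# so x/y values and distances computed with zone 33 are distorted.
# pyproj.transform is deprecated in recent pyproj releases in favour of pyproj.Transformer.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]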

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
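UTM zones are 6° of longitude wide and are numbered eastwards from 180°W, so the zone for a point follows directly from its longitude. The sketch below is not the author's code: `utm_zone` computes that number (ignoring the special zones around Norway and Svalbard), and `haversine_m` gives a projection-free cross-check of distances using the great-circle formula on a spherical Earth with an assumed mean radius of 6,371 km. Both function names are hypothetical helpers introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius (spherical model)

def utm_zone(lon):
    """Standard UTM zone number (1-60) for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard.
    """
    return int((lon + 180) // 6) % 60 + 1

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

center = (40.7900869, -73.9598295)      # geocoded center of Manhattan (from above)
east_harlem = (40.792249, -73.944182)   # East Harlem neighborhood center (from above)

print(utm_zone(center[1]))                          # UTM zone containing Manhattan
print(round(haversine_m(*center, *east_harlem)))    # distance in meters
```

For East Harlem this gives roughly 1.3 km, compared with the 2,048 m in the Distance from Center column, which illustrates how much the zone-33 projection stretches distances at this longitude.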

In [13]:
#calculate distances from center
distance_from_center=[]
X=[]
Y=[]

manhatten_longitude= longitude
manhatten_latitude=latitude
manhatten_x, manhatten_y= lonlat_to_xy(manhatten_longitude,manhatten_latitude)

for i in range(len(manhattan_data)):
    neigborhood_x, neigborhood_y= lonlat_to_xy(manhattan_data['Longitude'][i],manhattan_data['Latitude'][i])
    distance_from_center.append(calc_xy_distance(manhatten_x, manhatten_y, neigborhood_x, neigborhood_y)) 
    X.append(neigborhood_x)
    Y.append(neigborhood_y)

In [14]:
manhattan_data = manhattan_data.drop(columns='Borough')  # positional axis argument is no longer accepted in pandas 2.0
manhattan_data['X']=X
manhattan_data['Y']=Y
manhattan_data['Distance from Center']=distance_from_center
manhattan_data

| | Neighborhood | Latitude | Longitude | X | Y | Distance from Center |
|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | -5794205.0 | 9858099.0 | 15945.318731 |
| 1 | Chinatown | 40.715618 | -73.994279 | -5821760.0 | 9868103.0 | 13386.331413 |
| 2 | Washington Heights | 40.851903 | -73.9369 | -5798470.0 | 9861349.0 | 10875.558655 |
| 3 | Inwood | 40.867684 | -73.92121 | -5795743.0 | 9859410.0 | 14045.714502 |
| 4 | Hamilton Heights | 40.823604 | -73.949688 | -5803305.0 | 9862859.0 | 5825.579136 |
| 5 | Manhattanville | 40.816934 | -73.957385 | -5804461.0 | 9863817.0 | 4558.839318 |
| 6 | Central Harlem | 40.815976 | -73.943211 | -5804573.0 | 9861989.0 | 4879.562566 |
| 7 | East Harlem | 40.792249 | -73.944182 | -5808594.0 | 9862002.0 | 2048.077645 |
| 8 | Upper East Side | 40.775639 | -73.960508 | -5811466.0 | 9864025.0 | 2450.104944 |
| 9 | Yorkville | 40.77593 | -73.947118 | -5811369.0 | 9862302.0 | 2904.700389 |


Insert your Foursquare credentials

In [15]:
#hidden cell: replace the placeholders with your own Foursquare credentials
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

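Query all restaurants and all Indian restaurants for each neighborhood from the Foursquare API, using the following category IDs (the Food category also covers venues such as donut shops, sandwich places and delis, as the output below shows):

* food_category = '4d4b7105d754a06374d81259'
* indian_restaurant = '4bf58dd8d48988d10f941735'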

In [16]:
def getNearbyVenues(names, latitudes, longitudes, category, radius=500, LIMIT=200):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
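`getNearbyVenues` relies on the shape of the `venues/explore` response as it indexes into it: `response → groups[0] → items`, where each item holds a `venue` with a `name`, `location.lat`/`location.lng` and a `categories` list. The sketch below runs the same extraction on a small, hand-written response fragment (the venue names and coordinates are made up) and, unlike the function above, falls back to `None` for venues without a category instead of raising an `IndexError`. The helper name `extract_venues` is hypothetical.

```python
# Hypothetical response fragment shaped like the JSON that getNearbyVenues indexes into
sample_response = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Curry House",
                           "location": {"lat": 40.7801, "lng": -73.9801},
                           "categories": [{"name": "Indian Restaurant"}]}},
                {"venue": {"name": "Uncategorised Spot",
                           "location": {"lat": 40.7805, "lng": -73.9810},
                           "categories": []}},
            ]
        }]
    }
}

def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore response into (neighborhood, lat, lng, venue, v_lat, v_lng, category) rows."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # use None instead of raising IndexError when no category is listed
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

for row in extract_venues(sample_response, "Example Neighborhood", 40.78, -73.98):
    print(row)
```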

In [17]:
manhattan_restaurants = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],category='4d4b7105d754a06374d81259', radius=500, LIMIT=200
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [18]:
manhattan_indian_restaurants = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'],category='4bf58dd8d48988d10f941735', radius=500, LIMIT=200
                                  )

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [19]:
print(manhattan_restaurants.shape)
manhattan_restaurants.head(20)

(2891, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 2 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |
| 3 | Marble Hill | 40.876551 | -73.91066 | Land & Sea Restaurant | 40.877885 | -73.905873 | Seafood Restaurant |
| 4 | Marble Hill | 40.876551 | -73.91066 | Parrilla Latina | 40.877473 | -73.906073 | Steakhouse |
| 5 | Marble Hill | 40.876551 | -73.91066 | Subway Sandwiches | 40.874667 | -73.909586 | Sandwich Place |
| 6 | Marble Hill | 40.876551 | -73.91066 | Boston Market | 40.87743 | -73.905412 | American Restaurant |
| 7 | Marble Hill | 40.876551 | -73.91066 | SUBWAY | 40.878493 | -73.905385 | Sandwich Place |
| 8 | Marble Hill | 40.876551 | -73.91066 | Subway | 40.87772 | -73.90538 | Sandwich Place |
| 9 | Marble Hill | 40.876551 | -73.91066 | Hernandez Grocery | 40.875897 | -73.912591 | Deli / Bodega |


In [20]:
print(manhattan_indian_restaurants.shape)
manhattan_indian_restaurants.head(20)

(274, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Chinatown | 40.715618 | -73.994279 | Nyonya | 40.719155 | -73.996893 | Malay Restaurant |
| 1 | Chinatown | 40.715618 | -73.994279 | New Malaysia | 40.715787 | -73.996905 | Malay Restaurant |
| 2 | Chinatown | 40.715618 | -73.994279 | Dirt Candy | 40.71789 | -73.991015 | Vegetarian / Vegan Restaurant |
| 3 | Chinatown | 40.715618 | -73.994279 | Sanuria Restaurant | 40.714681 | -73.998006 | Malay Restaurant |
| 4 | Chinatown | 40.715618 | -73.994279 | Curry House Indian Cuisine | 40.719046 | -73.990849 | Indian Restaurant |
| 5 | Chinatown | 40.715618 | -73.994279 | Roasting Plant Coffee | 40.717784 | -73.990453 | Coffee Shop |
| 6 | Washington Heights | 40.851903 | -73.9369 | Kismat Indian Restaurant | 40.855222 | -73.936967 | Indian Restaurant |
| 7 | Hamilton Heights | 40.823604 | -73.949688 | Clove Indian Restaurant & Bar | 40.82128 | -73.95062 | Indian Restaurant |
| 8 | Hamilton Heights | 40.823604 | -73.949688 | Mumbai Masala | 40.826866 | -73.946486 | Indian Restaurant |
| 9 | Manhattanville | 40.816934 | -73.957385 | Chapati House - NYC | 40.814572 | -73.959154 | Indian Restaurant |
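The query with the Indian-restaurant category ID also returns venues whose listed category is something else (Malay restaurants, a vegetarian restaurant and a coffee shop in Chinatown above). In addition, the 500 m search circles of nearby neighborhood centers can overlap, so the same venue may appear under more than one neighborhood. A minimal sketch, assuming the first-listed category is the relevant one and that name plus coordinates identify a venue, keeps only rows labelled 'Indian Restaurant' and, for Manhattan-wide totals, counts each venue once. The frame below is a toy stand-in for `manhattan_indian_restaurants`; its last row is a hypothetical duplicate.

```python
import pandas as pd

# Toy stand-in for manhattan_indian_restaurants (same column names; last row is a hypothetical duplicate)
venues = pd.DataFrame({
    "Neighborhood": ["Chinatown", "Chinatown", "Hamilton Heights", "Manhattanville"],
    "Venue": ["Nyonya", "Curry House Indian Cuisine",
              "Clove Indian Restaurant & Bar", "Clove Indian Restaurant & Bar"],
    "Venue Latitude": [40.719155, 40.719046, 40.82128, 40.82128],
    "Venue Longitude": [-73.996893, -73.990849, -73.95062, -73.95062],
    "Venue Category": ["Malay Restaurant", "Indian Restaurant",
                       "Indian Restaurant", "Indian Restaurant"],
})

# keep only venues whose listed category is 'Indian Restaurant'
indian_only = venues[venues["Venue Category"] == "Indian Restaurant"]

# for Manhattan-wide totals, count each venue once even if it appears under several neighborhoods
unique_indian = indian_only.drop_duplicates(subset=["Venue", "Venue Latitude", "Venue Longitude"])

print(len(venues), len(indian_only), len(unique_indian))  # 4 3 2
```

Per-neighborhood counts should still use the non-deduplicated rows, since a venue near a border legitimately belongs to both search circles.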


In [21]:
print('Total number of restaurants in Manhattan:', len(manhattan_restaurants))
print('Total number of Indian restaurants in Manhattan:', len(manhattan_indian_restaurants))
print('Percentage of Indian restaurants in Manhattan: {:.2f}%'.format(len(manhattan_indian_restaurants) / len(manhattan_restaurants) * 100))

Total number of restaurants in Manhattan: 2891
Total number of Indian restaurants in Manhattan: 274
Percentage of Indian restaurants in Manhattan: 9.48%


Create a folium map that displays all restaurants in Manhattan in different colors: **Indian restaurants in green**, **other restaurants in red** and the **center of Manhattan in orange**.

In [76]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
# white rings at 1, 3, 5, 7, 9 and 11 km around the center of Manhattan
for ring_radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=ring_radius, fill=False, color='white').add_to(map_manhattan)
  
for lat, lng in zip(manhattan_restaurants['Venue Latitude'], manhattan_restaurants['Venue Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
for lat, lng in zip(manhattan_indian_restaurants['Venue Latitude'], manhattan_indian_restaurants['Venue Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  

map_manhattan

Now we have developed a feel for the data.<br>
We have gathered all the information we need for the further analysis.<br>

* We know all neighborhoods and the locations of their centers
* We know all restaurants of Manhattan and their locations
* We know all Indian restaurants and their locations
* We can visualize all locations and types of restaurants in Manhattan

This concludes the data preparation phase; now we can continue with the analysis of the data to find the most promising neighborhoods.

## Methodology <a name="methodology"></a>

The goal of this project is to detect the most promising areas of Manhattan in which to open an Indian restaurant.

In the **first step** I want to see if I can **identify areas in Manhattan with a low density of restaurants/Indian restaurants that are as close as possible to the center of Manhattan**.

To do so, I **calculate additional figures for each neighborhood to get a better understanding of the data**:
* Number of restaurants in every neighborhood
* Number of Indian restaurants in every neighborhood
* Percentage of Indian restaurants in every neighborhood
* Distance from the center of a neighborhood to the nearest Indian restaurant

Then I will **use heatmaps to visualize**:
* the density of restaurants
* the density of Indian restaurants

and **choropleth maps to visualize**:
* the percentage of Indian restaurants in a neighborhood
* the distance from the center of a neighborhood to the nearest Indian restaurant

In the **second step** I will use the identified areas and **generate a grid of cells** for those areas. <br>
**For every grid cell I will calculate some figures in order to define how good the location is** and to be able to **filter them to get a map of all the areas that are promising for opening an Indian restaurant**.<br>
For each grid cell the following figures will be calculated:
* Latitude
* Longitude
* Nearby restaurants
* Distance to the nearest Indian restaurant
* Distance to center of Manhattan
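The grid step can be sketched in plain Python. This is an illustration under assumptions, not the implementation used later in the notebook: cells are square and 100 m apart (a hypothetical spacing), laid out in a local metric frame centred on a chosen point, and converted back to latitude/longitude with an equirectangular approximation that is adequate over a few kilometers. Each cell records its straight-line distance to that point. The helper name `make_grid` and the constant of 111,320 m per degree of latitude are assumptions.

```python
import math

M_PER_DEG_LAT = 111_320  # approximate meters per degree of latitude (assumption)

def make_grid(center_lat, center_lon, half_width_m=500, spacing_m=100):
    """Square grid of cells around a center, in local meters and in lat/lon.

    Uses an equirectangular approximation: fine over a few kilometers, not for large areas.
    """
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(center_lat))
    steps = int(half_width_m // spacing_m)
    cells = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            x, y = i * spacing_m, j * spacing_m  # meters east / north of the center
            cells.append({
                "Latitude": center_lat + y / M_PER_DEG_LAT,
                "Longitude": center_lon + x / m_per_deg_lon,
                "X": x,
                "Y": y,
                "Distance to Center": math.hypot(x, y),
            })
    return cells

grid = make_grid(40.7900869, -73.9598295)  # around the geocoded center of Manhattan
print(len(grid))                            # 11 x 11 = 121 cells
```

The remaining per-cell figures (nearby restaurants, distance to the nearest Indian restaurant) are distance queries against the venue coordinates, computed in the same metric frame.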

Then the generated dataframe of all grid cells will be filtered for grid cells where:
* the nearest Indian restaurant is more than 500 m away
* and there are no restaurants within a radius of 250 m
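The two filter rules above translate directly into code. In this sketch, cells and venues are given as (x, y) positions in meters in a common local frame, and the coordinates are invented for illustration; the defaults `nearby_radius_m=250` and `min_indian_dist_m=500` mirror the thresholds above. The helper name `promising_cells` is hypothetical.

```python
import math

def promising_cells(cells, restaurants, indian_restaurants,
                    nearby_radius_m=250, min_indian_dist_m=500):
    """Keep cells with no restaurant within nearby_radius_m and no Indian restaurant within min_indian_dist_m."""
    kept = []
    for cx, cy in cells:
        # number of restaurants within the "nearby" radius of the cell
        nearby = sum(1 for rx, ry in restaurants
                     if math.hypot(rx - cx, ry - cy) <= nearby_radius_m)
        # distance to the closest Indian restaurant (infinity if there are none)
        nearest_indian = min((math.hypot(ix - cx, iy - cy) for ix, iy in indian_restaurants),
                             default=math.inf)
        if nearby == 0 and nearest_indian > min_indian_dist_m:
            kept.append((cx, cy))
    return kept

# invented positions in meters relative to an arbitrary origin
cells = [(0, 0), (600, 0), (1000, 1000)]
restaurants = [(100, 50), (1000, 1600)]
indian = [(700, 0)]

print(promising_cells(cells, restaurants, indian))  # [(1000, 1000)]
```

Here (0, 0) is rejected because a restaurant is about 112 m away, and (600, 0) because an Indian restaurant is only 100 m away.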

In the **final step** I will generate a **heatmap to visualize the filtered list of grid cells**, which represents a map of all the **promising locations** for opening an Indian restaurant in Manhattan.

## Analysis <a name="analysis"></a>

Let's start the analysis by identifying areas in Manhattan with a low density of restaurants/Indian restaurants that are as close as possible to the center of Manhattan. To do this, we derive some additional data from our prepared dataset.

First we need the **number of restaurants and the number of Indian restaurants in every neighborhood**.

In [23]:
#get the total number of restaurants in each neighborhood
restaurants_count = manhattan_restaurants['Neighborhood'].value_counts().rename_axis('Neighborhood').reset_index(name='Count')

#get the total number of Indian restaurants in each neighborhood
indian_restaurants_count = manhattan_indian_restaurants['Neighborhood'].value_counts().rename_axis('Neighborhood').reset_index(name='Count')

indian_restaurants_count.head()

| | Neighborhood | Count |
|---|---|---|
| 0 | Noho | 29 |
| 1 | East Village | 21 |
| 2 | Midtown | 21 |
| 3 | Greenwich Village | 19 |
| 4 | Midtown South | 18 |


In [24]:
manhattan_data.head()

| | Neighborhood | Latitude | Longitude | X | Y | Distance from Center |
|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | -5794205.0 | 9858099.0 | 15945.318731 |
| 1 | Chinatown | 40.715618 | -73.994279 | -5821760.0 | 9868103.0 | 13386.331413 |
| 2 | Washington Heights | 40.851903 | -73.9369 | -5798470.0 | 9861349.0 | 10875.558655 |
| 3 | Inwood | 40.867684 | -73.92121 | -5795743.0 | 9859410.0 | 14045.714502 |
| 4 | Hamilton Heights | 40.823604 | -73.949688 | -5803305.0 | 9862859.0 | 5825.579136 |


In [25]:
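# note: this creates an alias, not a copy, so the columns added below also appear in manhattan_data
manhattan_data_v2=manhattan_data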

In [26]:
# look up the counts per neighborhood; neighborhoods without any Indian restaurant get NaN, replaced by 0
manhattan_data_v2['Number of Restaurants'] = manhattan_data_v2['Neighborhood'].map(restaurants_count.set_index('Neighborhood')['Count'])
manhattan_data_v2['Number of Indian Restaurants'] = manhattan_data_v2['Neighborhood'].map(indian_restaurants_count.set_index('Neighborhood')['Count']).fillna(0)
manhattan_data_v2.head()

| | Neighborhood | Latitude | Longitude | X | Y | Distance from Center | Number of Restaurants | Number of Indian Restaurants |
|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | -5794205.0 | 9858099.0 | 15945.318731 | 15 | 0.0 |
| 1 | Chinatown | 40.715618 | -73.994279 | -5821760.0 | 9868103.0 | 13386.331413 | 100 | 6.0 |
| 2 | Washington Heights | 40.851903 | -73.9369 | -5798470.0 | 9861349.0 | 10875.558655 | 74 | 1.0 |
| 3 | Inwood | 40.867684 | -73.92121 | -5795743.0 | 9859410.0 | 14045.714502 | 52 | 0.0 |
| 4 | Hamilton Heights | 40.823604 | -73.949688 | -5803305.0 | 9862859.0 | 5825.579136 | 62 | 2.0 |


Next we calculate the **share of Indian restaurants in each neighborhood**, stored as a fraction (e.g. 0.06 = 6%).

In [27]:
# share of Indian restaurants among all restaurants, as a fraction rounded to 2 decimals
manhattan_data_v2['Percentage of Indian Restaurants'] = (manhattan_data_v2['Number of Indian Restaurants'] / manhattan_data_v2['Number of Restaurants']).round(2)
manhattan_data_v2.head()

| | Neighborhood | Latitude | Longitude | X | Y | Distance from Center | Number of Restaurants | Number of Indian Restaurants | Percentage of Indian Restaurants |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | -5794205.0 | 9858099.0 | 15945.318731 | 15 | 0.0 | 0.0 |
| 1 | Chinatown | 40.715618 | -73.994279 | -5821760.0 | 9868103.0 | 13386.331413 | 100 | 6.0 | 0.06 |
| 2 | Washington Heights | 40.851903 | -73.9369 | -5798470.0 | 9861349.0 | 10875.558655 | 74 | 1.0 | 0.01 |
| 3 | Inwood | 40.867684 | -73.92121 | -5795743.0 | 9859410.0 | 14045.714502 | 52 | 0.0 | 0.0 |
| 4 | Hamilton Heights | 40.823604 | -73.949688 | -5803305.0 | 9862859.0 | 5825.579136 | 62 | 2.0 | 0.03 |


Now we calculate the **distance from the center of each neighborhood to the nearest Indian restaurant**.

In [28]:
# project every Indian restaurant once instead of once per neighborhood
restaurant_xy = [lonlat_to_xy(lon, lat) for lat, lon in zip(manhattan_indian_restaurants['Venue Latitude'],
                                                            manhattan_indian_restaurants['Venue Longitude'])]

Distances = []
for lat, lon in zip(manhattan_data_v2['Latitude'], manhattan_data_v2['Longitude']):
    #calculate x, y of neighborhood center
    x_neigh, y_neigh = lonlat_to_xy(lon, lat)
    #distance to the nearest Indian restaurant
    shortest_distance = min(calc_xy_distance(x_neigh, y_neigh, x_rest, y_rest) for x_rest, y_rest in restaurant_xy)
    Distances.append(round(shortest_distance, 2))

manhattan_data_v2['Distance to Indian Restaurants from Center'] = Distances
manhattan_data_v2

| | Neighborhood | Latitude | Longitude | X | Y | Distance from Center | Number of Restaurants | Number of Indian Restaurants | Percentage of Indian Restaurants | Distance to Indian Restaurants from Center |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | -5794205.0 | 9858099.0 | 15945.318731 | 15 | 0.0 | 0.0 | 4943.62 |
| 1 | Chinatown | 40.715618 | -73.994279 | -5821760.0 | 9868103.0 | 13386.331413 | 100 | 6.0 | 0.06 | 340.24 |
| 2 | Washington Heights | 40.851903 | -73.9369 | -5798470.0 | 9861349.0 | 10875.558655 | 74 | 1.0 | 0.01 | 561.78 |
| 3 | Inwood | 40.867684 | -73.92121 | -5795743.0 | 9859410.0 | 14045.714502 | 52 | 0.0 | 0.0 | 2922.91 |
| 4 | Hamilton Heights | 40.823604 | -73.949688 | -5803305.0 | 9862859.0 | 5825.579136 | 62 | 2.0 | 0.03 | 411.44 |
| 5 | Manhattanville | 40.816934 | -73.957385 | -5804461.0 | 9863817.0 | 4558.839318 | 41 | 2.0 | 0.05 | 460.2 |
| 6 | Central Harlem | 40.815976 | -73.943211 | -5804573.0 | 9861989.0 | 4879.562566 | 46 | 1.0 | 0.02 | 362.68 |
| 7 | East Harlem | 40.792249 | -73.944182 | -5808594.0 | 9862002.0 | 2048.077645 | 53 | 3.0 | 0.06 | 540.8 |
| 8 | Upper East Side | 40.775639 | -73.960508 | -5811466.0 | 9864025.0 | 2450.104944 | 80 | 3.0 | 0.04 | 538.49 |
| 9 | Yorkville | 40.77593 | -73.947118 | -5811369.0 | 9862302.0 | 2904.700389 | 91 | 3.0 | 0.03 | 336.31 |


In [29]:
print('On average, the distance from the center of a neighborhood to the nearest Indian restaurant is: ', manhattan_data_v2['Distance to Indian Restaurants from Center'].mean())

On average, the distance from the center of a neighborhood to the nearest Indian restaurant is:  578.28175
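The nearest-distance calculation in cell In [28] loops over neighborhoods and restaurants in Python. Once all coordinates are in a metric frame, NumPy broadcasting can compute every neighborhood-to-restaurant distance at once and take the row-wise minimum. This is an alternative sketch, not the notebook's code; the hypothetical helper `nearest_distances` takes invented coordinates here, which in practice would come from `lonlat_to_xy`.

```python
import numpy as np

def nearest_distances(points_xy, targets_xy):
    """For each point, the Euclidean distance to the closest target (same units as the inputs)."""
    points = np.asarray(points_xy, dtype=float)[:, np.newaxis, :]    # shape (n, 1, 2)
    targets = np.asarray(targets_xy, dtype=float)[np.newaxis, :, :]  # shape (1, m, 2)
    dists = np.sqrt(((points - targets) ** 2).sum(axis=2))           # shape (n, m)
    return dists.min(axis=1)

neighborhood_xy = [(0.0, 0.0), (1000.0, 0.0)]
restaurant_xy = [(300.0, 400.0), (1000.0, 200.0), (5000.0, 5000.0)]
print(nearest_distances(neighborhood_xy, restaurant_xy))  # [500. 200.]
```

The intermediate matrix has one entry per neighborhood/restaurant pair, which is tiny here (40 × 274) but grows quickly for a fine grid of cells.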


#### Heatmaps to visualize the density of restaurants/Indian restaurants
Let's visualize the **density of restaurants in Manhattan with a heatmap**. <br>
**Red** means **higher density**.  <br>
The **blue dots** represent the **center of each neighborhood**.

In [55]:
from folium import plugins
from folium.plugins import HeatMap

In [56]:
manhattan_neighborhoods_url = 'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'
manhattan_neighborhoods = requests.get(manhattan_neighborhoods_url).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [57]:
restaurant_latlons = manhattan_restaurants[['Venue Latitude', 'Venue Longitude']]

In [58]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
# white rings at 1, 3, 5, 7, 9 and 11 km around the center of Manhattan
for ring_radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=ring_radius, fill=False, color='white').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan
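The heatmap shows density only visually. To put rough numbers on it, one option (an addition, not part of the original analysis) is to bin venue coordinates into a coarse latitude/longitude grid and count venues per bin. The bin size of 0.005° (roughly 560 m north-south by 420 m east-west at Manhattan's latitude) is an arbitrary choice, the helper `bin_counts` is hypothetical, and the sample coordinates are the first five Marble Hill venues from the restaurant table.

```python
import math
from collections import Counter

def bin_counts(latlons, cell_deg=0.005):
    """Count points per (lat, lon) bin of size cell_deg degrees; keys are integer bin indices."""
    return Counter((math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in latlons)

# first five Marble Hill venues from manhattan_restaurants
sample = [(40.874412, -73.910271), (40.880404, -73.908937), (40.877136, -73.906666),
          (40.877885, -73.905873), (40.877473, -73.906073)]

for (i, j), n in bin_counts(sample).most_common():
    # report each bin by its south-west corner
    print(f"bin south-west corner {i * 0.005:.3f}, {j * 0.005:.3f}: {n} venues")
```

Applied to all rows of `manhattan_restaurants` (e.g. via `zip` over the two coordinate columns), the densest bins correspond to the hottest spots on the heatmap.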

Now let's visualize the **density of Indian restaurants in Manhattan with a heatmap**. <br>
**Red** means **higher density**.  <br>
The **blue dots** represent the **center of each neighborhood**.

In [59]:
indian_restaurant_latlons = manhattan_indian_restaurants[['Venue Latitude', 'Venue Longitude']]

In [60]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
# white rings at 1, 3, 5, 7, 9 and 11 km around the center of Manhattan
for ring_radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=ring_radius, fill=False, color='white').add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan

Now let's visualize **both heatmaps together** to see if we can spot areas near the center of Manhattan with a low density of restaurants and a low density of Indian restaurants.

In [78]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)  
# white rings at 1, 3, 5, 7, 9 and 11 km around the center of Manhattan
for ring_radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=ring_radius, fill=False, color='white').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan

#### Insights from density maps:

Looking at the **density of restaurants/Indian restaurants in Manhattan**, we can see that there are a few areas with low density close to the center of Manhattan.<br>
<br>
In the **area immediately around the center of Manhattan**:
* A bigger area north/north-east of Central Park
* A small area west of the center of Central Park
* A bigger area south/south-west of Central Park

A **bit further away**:
* A bigger area south of Central Park, between West Village and East Village

So, close to the center of Manhattan, the areas with an overall low density of restaurants match the areas with a low density of Indian restaurants quite well.
We can also see that the heatmap of Indian restaurants is not very hot in general: with an overall share of roughly 10% (9.48%), Indian restaurants are not very common in Manhattan.
<br><br>
Unfortunately, the neighborhoods in the New York dataset and in the geojson file do not match perfectly.<br>
Some neighborhoods are named differently or are merged in the geojson file.<br><br>
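One way to reconcile spelling differences between the two sources before drawing choropleths is fuzzy matching with Python's standard-library `difflib`. This is a sketch, not the notebook's approach: the geojson names below are hypothetical examples of the kind of differences described, the helper `match_names` is hypothetical, and every automatic match should be checked by hand, because merged neighborhoods cannot be resolved by string similarity alone.

```python
import difflib

dataset_names = ["Upper East Side", "Soho", "Noho", "Stuyvesant Town"]
# hypothetical spellings as they might appear in the geojson 'neighborhood' property
geojson_names = ["Upper East Side", "SoHo", "NoHo", "Stuyvesant Town-Cooper Village"]

def match_names(names, candidates, cutoff=0.6):
    """Map each name to its closest candidate (case-insensitive), or None if nothing scores >= cutoff."""
    lowered = {c.lower(): c for c in candidates}
    mapping = {}
    for name in names:
        best = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
        mapping[name] = lowered[best[0]] if best else None
    return mapping

for name, match in match_names(dataset_names, geojson_names).items():
    print(f"{name!r} -> {match!r}")
```

The resulting dictionary could then be used to rename the `Neighborhood` column so that `key_on` finds a matching feature for each row.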

#### Choropleth Maps to visualize the shares of Indian restaurants and the distance from center of a neighborhood to the next Indian restaurant 
Now let's visualize the **share of Indian restaurants in Manhattan with a choropleth map**. <br>
The share is color-coded, starting with a **low share in yellow** and increasing to a **higher share in red**. <br>
The **blue dots** represent the **center of each neighborhood**.
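The choropleth reads the `Percentage of Indian Restaurants` column of `manhattan_data_v2`, which is built earlier in the notebook. Assuming that column is simply 100 × (Indian restaurants) / (all restaurants) per neighborhood, the computation can be sketched in plain Python as below. The venue tuples and the `'Indian Restaurant'` category label are illustrative assumptions, not the actual Foursquare results.

```python
from collections import Counter


def indian_share_by_neighborhood(venues):
    """Percentage of Indian restaurants among all restaurants per neighborhood.

    `venues` is a list of (neighborhood, category) pairs. The category label
    'Indian Restaurant' is an assumption about how venues are tagged.
    """
    totals = Counter(hood for hood, _ in venues)
    indian = Counter(hood for hood, cat in venues if cat == "Indian Restaurant")
    # Counter returns 0 for neighborhoods without any Indian restaurant
    return {hood: round(100 * indian[hood] / totals[hood], 1) for hood in totals}


# Hypothetical sample venues, not taken from the Foursquare results
sample = [
    ("Chelsea", "Indian Restaurant"),
    ("Chelsea", "Pizza Place"),
    ("Chelsea", "Sushi Restaurant"),
    ("Chelsea", "Diner"),
    ("Tribeca", "Italian Restaurant"),
]
print(indian_share_by_neighborhood(sample))  # {'Chelsea': 25.0, 'Tribeca': 0.0}
```

Neighborhoods with only a handful of venues can get extreme shares (0 % or 100 %), so the colors of small neighborhoods should be read with some caution.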

In [62]:
newyork_geo = r'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'

map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
# orange marker for the center of Manhattan
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)
# white rings at 1, 3, 5, 7, 9 and 11 km around the center
for radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=radius, fill=False, color='white').add_to(map_manhattan)
# Map.choropleth is deprecated in folium; folium.Choropleth takes the same arguments
folium.Choropleth(
    geo_data=newyork_geo,
    data=manhattan_data_v2,
    columns=['Neighborhood', 'Percentage of Indian Restaurants'],
    key_on='feature.properties.neighborhood',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Percentage of Indian Restaurants'
).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
map_manhattan

Unfortunately, the GeoJSON file and the New York dataset don't match perfectly.<br>
The neighborhood names sometimes differ slightly, and the neighborhood centers sometimes don't match the GeoJSON file.<br>
<br>
For example: <br>
<br>
* Hamilton Heights, Manhattanville and Central Harlem = Harlem in the GeoJSON file
* Central Park doesn't exist in the New York dataset
* Hudson Yards and Clinton = Hell's Kitchen and Theater District in the GeoJSON file ...
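Mismatches like these can be listed systematically before plotting. The sketch below is a minimal illustration, assuming each GeoJSON feature stores its name under `properties['neighborhood']` (as the `key_on` argument of the choropleth implies). The `ALIASES` table, the helper `unmatched_names` and the stub FeatureCollection are hypothetical and only cover the examples above.

```python
# Hypothetical alias table based on the examples above; a complete mapping
# would have to be built by comparing both sources by hand.
ALIASES = {
    "Hamilton Heights": "Harlem",
    "Manhattanville": "Harlem",
    "Central Harlem": "Harlem",
}


def unmatched_names(dataset_names, geojson, aliases=ALIASES):
    """Return dataset neighborhoods whose (aliased) name has no GeoJSON feature."""
    geo_names = {f["properties"]["neighborhood"] for f in geojson["features"]}
    return sorted(n for n in dataset_names if aliases.get(n, n) not in geo_names)


# Minimal FeatureCollection stub (geometries omitted) for illustration only
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"neighborhood": "Harlem"}, "geometry": None},
        {"type": "Feature", "properties": {"neighborhood": "Central Park"}, "geometry": None},
        {"type": "Feature", "properties": {"neighborhood": "Chelsea"}, "geometry": None},
    ],
}
dataset_names = ["Hamilton Heights", "Manhattanville", "Central Harlem", "Chelsea", "Clinton"]
print(unmatched_names(dataset_names, geojson))  # ['Clinton']
```

Once names are aliased, restaurant counts would have to be summed per GeoJSON name before recomputing the share; averaging the per-neighborhood percentages would give small neighborhoods too much weight.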

Now let's visualize the **distance from the center of each neighborhood to the nearest Indian restaurant in Manhattan with a choropleth map**. <br>
The distance is color-coded from **short distances in yellow** to **long distances in red**. <br>
The **blue dots** represent the **center of each neighborhood**.
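The column `Distance to Indian Restaurants from Center` comes from an earlier part of the notebook that is not shown in this section. As a minimal sketch, assuming the distance is measured as a straight line from a neighborhood center to each Indian restaurant, the nearest distance can be computed with the haversine formula. This is a swapped-in technique for illustration; the notebook's own calculation may project coordinates to x/y instead. The helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(center, venues):
    """Distance from `center` (lat, lon) to the closest venue, or None if there are none."""
    if not venues:
        return None
    return min(haversine_m(center[0], center[1], lat, lon) for lat, lon in venues)


# Hypothetical Indian restaurant positions near focus area 1
venues = [(40.763849, -73.980685), (40.770, -73.990)]
d = nearest_distance_m((40.762849, -73.980685), venues)
print(f"{d:.1f} m")  # 111.2 m (0.001 degrees of latitude)
```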

In [63]:
newyork_geo = r'https://raw.githubusercontent.com/ibuilder/NYCPolyline/master/manhattan.geojson'

map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
# orange marker for the center of Manhattan
folium.CircleMarker([latitude, longitude], radius=7, color='orange', fill=True, fill_color='orange', fill_opacity=1).add_to(map_manhattan)
# white rings at 1, 3, 5, 7, 9 and 11 km around the center
for radius in range(1000, 12000, 2000):
    folium.Circle(location=[latitude, longitude], radius=radius, fill=False, color='white').add_to(map_manhattan)
# Map.choropleth is deprecated in folium; folium.Choropleth takes the same arguments
folium.Choropleth(
    geo_data=newyork_geo,
    data=manhattan_data_v2,
    columns=['Neighborhood', 'Distance to Indian Restaurants from Center'],
    key_on='feature.properties.neighborhood',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    # the legend must describe the plotted column (distance), not the share
    legend_name='Distance to Indian Restaurants from Center'
).add_to(map_manhattan)
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
map_manhattan

#### Insights from choropleth maps: 

Unfortunately, the choropleth maps don't tell us much, because the areas we identified in the heatmaps are exactly the ones that are named differently in the GeoJSON file. So for those areas in particular, the choropleth maps show no information.

Let's focus on the following two areas and generate a grid of cells to evaluate each location in more detail.


In [64]:
center_manhattan=[latitude, longitude]
focus_area1=[40.762849, -73.980685]
focus_area2=[40.802038, -73.948810]

map_manhattan = folium.Map(location=center_manhattan, zoom_start=13)
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
#folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
folium.Marker(center_manhattan).add_to(map_manhattan)
folium.Circle(focus_area1, radius=1500, color='white', fill=True, fill_opacity=0.4).add_to(map_manhattan)
folium.Circle(focus_area2, radius=1100, color='white', fill=True, fill_opacity=0.4).add_to(map_manhattan)
map_manhattan

Let's proceed with the **second step** of our analysis. <br>
Let's **define a grid of cells that covers the areas we identified before**.

In [66]:
# define focus areas
focus_area1=[40.762849, -73.980685]
#focus_area2=[40.80451, -73.946072]
#focus_area2=[40.803429, -73.947583]
focus_area2=[40.802038, -73.948810]


# define area 1 
lat1_min=focus_area1[0]-0.008
lon1_min=focus_area1[1]-0.014
lat1_max=focus_area1[0]+0.008
lon1_max=focus_area1[1]+0.014
# define area 2
lat2_min=focus_area2[0]-0.008*1300/1500
lon2_min=focus_area2[1]-0.014*1300/1500
lat2_max=focus_area2[0]+0.008*1300/1500
lon2_max=focus_area2[1]+0.014*1300/1500

#corner points of area 1
point1=[lat1_min,lon1_min]
point2=[lat1_max,lon1_min]
point3=[lat1_max,lon1_max]
point4=[lat1_min,lon1_max]
#corner points of area 2
point5=[lat2_min,lon2_min]
point6=[lat2_max,lon2_min]
point7=[lat2_max,lon2_max]
point8=[lat2_min,lon2_max]

#define lists for latitudes and longitudes 
focus_area_latitudes=[]
focus_area_longitudes=[]

#define a grid of points in area1
stepwith=0.0012
steps1_lat=int(round((lat1_max-lat1_min)/stepwith,0))
steps1_lon=int(round((lon1_max-lon1_min)/stepwith,0))

long=lon1_min
for i in range(steps1_lon):
    long=long+stepwith
    lati=lat1_min
    for s in range(steps1_lat):
        lati=lati+stepwith
        focus_area_latitudes.append(lati)
        focus_area_longitudes.append(long)
        
#define a grid of points in area2
steps2_lat=int(round((lat2_max-lat2_min)/stepwith,0))
steps2_lon=int(round((lon2_max-lon2_min)/stepwith,0))

long=lon2_min
for i in range(steps2_lon):
    long=long+stepwith
    lati=lat2_min
    for s in range(steps2_lat):
        lati=lati+stepwith
        focus_area_latitudes.append(lati)
        focus_area_longitudes.append(long)
        
print(str(len(focus_area_latitudes))+" grid points generated!")

539 grid points generated!
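The two nested loops above can be condensed into one helper. The sketch below uses a hypothetical `make_grid` function that mirrors the loop order (longitude outer, latitude inner) and the offset (the first point sits one step inside the lower-left corner). It computes each point as `min + k * step` instead of by repeated addition, so coordinates may differ from the loop's in the last floating-point digits.

```python
def make_grid(center, half_lat, half_lon, step):
    """Grid points covering center +/- half extents, spaced `step` degrees apart."""
    lat_min = center[0] - half_lat
    lon_min = center[1] - half_lon
    n_lat = int(round(2 * half_lat / step))
    n_lon = int(round(2 * half_lon / step))
    # longitude is the outer loop and latitude the inner one, as in the cell above
    return [
        (lat_min + (r + 1) * step, lon_min + (c + 1) * step)
        for c in range(n_lon)
        for r in range(n_lat)
    ]


step = 0.0012
area1 = make_grid([40.762849, -73.980685], 0.008, 0.014, step)
scale = 1300 / 1500
area2 = make_grid([40.802038, -73.948810], 0.008 * scale, 0.014 * scale, step)
print(len(area1), len(area2), len(area1) + len(area2))  # 299 240 539
```

At Manhattan's latitude a step of 0.0012 degrees is about 130 m north-south but only about 100 m east-west, so the grid cells are not square.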


In [77]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
for lat, lng, in zip(focus_area_latitudes, focus_area_longitudes):
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan

Looks great. The grids cover most of the free space near the center of Manhattan where the density of both restaurants and Indian restaurants is low.<br>
Now let's build a dataframe of all these points and calculate the key figures for each of them: <br>
* Latitude
* Longitude
* Number of restaurants within 250m
* Distance to the nearest Indian restaurant
* Distance to the center of Manhattan


In [68]:
restaurants_nearby=[]
distance_next_indian=[]
distance_center_manhattan=[]

# project all restaurants and the center once, instead of once per grid point
restaurant_xys=[lonlat_to_xy(lon, lat) for lon, lat in zip(restaurant_latlons['Venue Longitude'], restaurant_latlons['Venue Latitude'])]
indian_xys=[lonlat_to_xy(lon, lat) for lon, lat in zip(indian_restaurant_latlons['Venue Longitude'], indian_restaurant_latlons['Venue Latitude'])]
x_center_manhattan, y_center_manhattan = lonlat_to_xy(center_manhattan[1], center_manhattan[0])

for lon, lat in zip(focus_area_longitudes, focus_area_latitudes): #539 grid points

    #calculate x,y of grid point
    x_grid, y_grid = lonlat_to_xy(lon, lat)

    #count restaurants within 250m around the grid point
    count=sum(1 for x_r, y_r in restaurant_xys if calc_xy_distance(x_grid, y_grid, x_r, y_r)<250)
    restaurants_nearby.append(count)

    #calculate distance to the nearest Indian restaurant
    shortest_distance=min(calc_xy_distance(x_grid, y_grid, x_r, y_r) for x_r, y_r in indian_xys)
    distance_next_indian.append(round(shortest_distance,0))

    #calculate distance to center of Manhattan
    dist=calc_xy_distance(x_grid, y_grid, x_center_manhattan, y_center_manhattan)
    distance_center_manhattan.append(round(dist,0))
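The loop above relies on `lonlat_to_xy` and `calc_xy_distance`, which are defined earlier in the notebook and not shown in this section. If stand-ins are needed, a minimal sketch, assuming a local equirectangular projection, could look like this. The notebook's own helpers may use a different projection (for example UTM) and give slightly different distances; `REF_LAT` is a hypothetical reference latitude chosen for Manhattan.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters
REF_LAT = 40.78  # hypothetical reference latitude for Manhattan


def lonlat_to_xy(lon, lat, ref_lat=REF_LAT):
    """Project lon/lat (degrees) to local x/y meters (equirectangular approximation).

    Note the (lon, lat) argument order, matching the calls in the loop above.
    """
    x = math.radians(lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat) * EARTH_RADIUS_M
    return x, y


def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance between two projected points, in meters."""
    return math.hypot(x2 - x1, y2 - y1)


# two neighboring grid points, one step (0.0012 degrees) apart in latitude
x1, y1 = lonlat_to_xy(-73.993485, 40.756049)
x2, y2 = lonlat_to_xy(-73.993485, 40.757249)
print(round(calc_xy_distance(x1, y1, x2, y2), 1))  # 133.4
```

Over the few kilometers covered by the grid, the equirectangular approximation is accurate to well within the 250 m and 500 m thresholds used below.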

In [69]:
grid_df=pd.DataFrame({'Latitude':focus_area_latitudes,
                      'Longitude':focus_area_longitudes,
                      'Restaurants nearby':restaurants_nearby,
                      'Distance next Indian Restaurant':distance_next_indian,
                      'Distance to Center':distance_center_manhattan})
grid_df.head()

|   | Latitude | Longitude | Restaurants nearby | Distance next Indian Restaurant | Distance to Center |
|---|---|---|---|---|---|
| 0 | 40.756049 | -73.993485 | 18 | 143.0 | 7218.0 |
| 1 | 40.757249 | -73.993485 | 15 | 69.0 | 7056.0 |
| 2 | 40.758449 | -73.993485 | 15 | 187.0 | 6897.0 |
| 3 | 40.759649 | -73.993485 | 24 | 234.0 | 6740.0 |
| 4 | 40.760849 | -73.993485 | 14 | 298.0 | 6585.0 |


Let's filter the grid points. We are interested in locations with **no restaurant within a radius of 250m** and **no Indian restaurant within a radius of 500m**.

In [72]:
# grid points with no restaurant within 250m
good_res_count = np.array(grid_df['Restaurants nearby']<=0)
print('Locations with no restaurants within 250m:', good_res_count.sum())
# grid points whose nearest Indian restaurant is at least 500m away
good_ind_distance = np.array(grid_df['Distance next Indian Restaurant']>=500)
print('Locations with no Indian restaurants within 500m:', good_ind_distance.sum())
good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = grid_df[good_locations]
df_good_locations.head()

Locations with no restaurants within 250m: 417
Locations with no Indian restaurants within 500m: 406
Locations with both conditions met: 372


|   | Latitude | Longitude | Restaurants nearby | Distance next Indian Restaurant | Distance to Center |
|---|---|---|---|---|---|
| 7 | 40.764449 | -73.993485 | 0 | 767.0 | 6139.0 |
| 8 | 40.765649 | -73.993485 | 0 | 959.0 | 5996.0 |
| 9 | 40.766849 | -73.993485 | 0 | 1155.0 | 5858.0 |
| 10 | 40.768049 | -73.993485 | 0 | 1353.0 | 5723.0 |
| 11 | 40.769249 | -73.993485 | 0 | 1553.0 | 5592.0 |
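The two conditions boil down to a simple predicate per grid point. The sketch below applies it in plain Python to the ten grid rows printed above; the hypothetical helper `good_locations` makes both thresholds parameters, which makes it easy to check how sensitive the result is to the chosen values. The 250m radius itself is fixed when `Restaurants nearby` is counted, so it is not a parameter here.

```python
# The ten grid rows printed above:
# (index, restaurants within 250 m, distance to nearest Indian restaurant in m)
rows = [
    (0, 18, 143.0), (1, 15, 69.0), (2, 15, 187.0), (3, 24, 234.0), (4, 14, 298.0),
    (7, 0, 767.0), (8, 0, 959.0), (9, 0, 1155.0), (10, 0, 1353.0), (11, 0, 1553.0),
]


def good_locations(rows, max_restaurants=0, min_indian_distance=500):
    """Indices of grid points with at most `max_restaurants` restaurants nearby
    and no Indian restaurant closer than `min_indian_distance` meters."""
    return [
        idx
        for idx, n_restaurants, indian_dist in rows
        if n_restaurants <= max_restaurants and indian_dist >= min_indian_distance
    ]


print(good_locations(rows))  # [7, 8, 9, 10, 11]
print(good_locations(rows, min_indian_distance=1000))  # [9, 10, 11]
```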


Let's visualize the grid points with no restaurant within 250m and no Indian restaurant within 500m.

In [74]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
for lat, lng, in zip(df_good_locations['Latitude'], df_good_locations['Longitude']):
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
HeatMap(restaurant_latlons).add_to(map_manhattan)
HeatMap(indian_restaurant_latlons).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan

Looks good. The remaining grid cells fit neatly into the gaps between the heatmaps of restaurants and Indian restaurants.

Let's visualize a heatmap of the good locations, i.e. those with no restaurant within 250m and no Indian restaurant within 500m.

In [75]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=13)
folium.Marker(center_manhattan).add_to(map_manhattan)
HeatMap(pd.DataFrame({'Latitude':df_good_locations['Latitude'],
                      'Longitude':df_good_locations['Longitude']})).add_to(map_manhattan)
folium.GeoJson(manhattan_neighborhoods, style_function=boroughs_style, name='geojson').add_to(map_manhattan)
map_manhattan

The map represents the **final result**. It shows all the promising areas close to the center of Manhattan for opening an Indian restaurant. <br>
Note that the part of the heatmap overlapping Central Park has to be ignored, since it is obviously not possible to open a restaurant there.
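The heatmap shows where suitable cells cluster but does not rank them. Since the business problem also asks for locations as close to the center of Manhattan as possible, one option is to sort the surviving grid points by `Distance to Center`. A minimal sketch on the five good locations printed above, using a hypothetical helper `closest_to_center` (with pandas, `df_good_locations.nsmallest(3, 'Distance to Center')` would do the same on the full dataframe). Cells inside Central Park would still have to be excluded by hand.

```python
# Good locations printed above: (latitude, longitude, distance to center in m)
good = [
    (40.764449, -73.993485, 6139.0),
    (40.765649, -73.993485, 5996.0),
    (40.766849, -73.993485, 5858.0),
    (40.768049, -73.993485, 5723.0),
    (40.769249, -73.993485, 5592.0),
]


def closest_to_center(locations, k=3):
    """Return the k locations with the smallest distance to the center."""
    return sorted(locations, key=lambda loc: loc[2])[:k]


for lat, lon, dist in closest_to_center(good):
    print(f"{lat:.6f}, {lon:.6f}: {dist:.0f} m")
```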

## Results and Discussion <a name="results"></a>

The analysis shows some areas close to the center of Manhattan where the density of restaurants/Indian restaurants is low, even though there are nearly 3,000 restaurants in Manhattan.

The analysis identifies two areas with no Indian restaurant within a radius of at least 500m and no restaurant at all within a radius of at least 250m.

From a competition perspective, the analysis identifies two fairly large areas where it might be interesting to open an Indian restaurant. However, it does not take into account whether the rent is affordable, whether suitable spaces are available for a restaurant, or whether the neighborhood is attractive.

## Conclusion <a name="conclusion"></a>

The purpose of this analysis was to present attractive locations for opening an Indian restaurant in Manhattan to the stakeholders.

To that end, the analysis used data science to calculate the density of restaurants/Indian restaurants.
By visualizing those densities, we were able to identify two areas quite close to the center of Manhattan where the density of restaurants/Indian restaurants is very low.

This analysis provides a foundation for the stakeholders' decision on where to open an Indian restaurant. The decision also needs to take additional factors into account, for example the rent, the availability of suitable locations, the population density and the overall attractiveness of the neighborhood.