# Capstone Project - The Battle of the Neighborhoods (Week 1)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Ramen/Sushi Restaurant** in **Manhattan**, New York.

Since there are lots of restaurants in Manhattan, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Ramen/Sushi restaurants nearby**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use data science techniques to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.


## Data <a name="data"></a>

### What data is used and how will the problem be solved?
Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of Ramen/Sushi restaurants in the neighborhood, if any

We will rely entirely on Foursquare data to locate a spot for our new Ramen/Sushi restaurant that, as stated before, is not already crowded with similar restaurants. We will examine each neighborhood in the area of interest and use the Foursquare API to explore nearby venues. In particular, we are interested in venues that serve ramen or sushi.
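The two factors listed above could be computed per neighborhood roughly as follows. This is only a minimal sketch under stated assumptions: the helper names `haversine_m` and `count_venues_nearby`, the 500 m radius, the venue dictionary keys (`lat`, `lng`, `category`), and the target category names are all illustrative choices, not part of the original analysis.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_venues_nearby(center, venues, radius_m=500.0,
                        target_categories=frozenset({'Ramen Restaurant',
                                                     'Sushi Restaurant'})):
    """Return (all venues, target-category venues) within radius_m of center.

    center is a (lat, lon) tuple; each venue is a dict with the assumed
    keys 'lat', 'lng' and 'category'.
    """
    center_lat, center_lon = center
    total = 0
    targets = 0
    for venue in venues:
        distance = haversine_m(center_lat, center_lon, venue['lat'], venue['lng'])
        if distance <= radius_m:
            total += 1
            if venue.get('category') in target_categories:
                targets += 1
    return total, targets
```

A neighborhood with a low first count and a zero second count would match the criteria stated in the introduction.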

### Importing Libraries

In [1]:
# Import libraries
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

#!pip install geocoder
import geocoder

# libraries for displaying images
from IPython.display import Image, HTML

from bs4 import BeautifulSoup # scraping library

# transforming a JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
from pandas import json_normalize
import json # JSON files manipulation

from sklearn.cluster import KMeans # clustering algorithm

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

#! pip install folium==0.5.0
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Importing NY Neighborhood Data

In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

In [3]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [4]:
# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [5]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Manhattan.
To create a geocoder instance, we need to define a user_agent. We will name our agent ny_explorer, as shown below.

In [6]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

Man = 'Manhattan geographical coordinates: {},{}'.format(latitude, longitude)

print(Man)

Manhattan geographical coordinates: 40.7896239,-73.9598939
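geopy's `geocode` returns `None` when the service finds no match, in which case reading `location.latitude` fails with an `AttributeError`. A small guard makes that case explicit. The sketch below is hypothetical: `geocode_or_raise` is an illustrative helper name, and it accepts any callable shaped like `geolocator.geocode`, so it can be tried without network access.

```python
def geocode_or_raise(geocode, address):
    """Return (latitude, longitude) for address, or raise if nothing was found.

    `geocode` is any callable shaped like geopy's `geolocator.geocode`:
    it takes an address string and returns an object with `.latitude`
    and `.longitude` attributes, or None when there is no match.
    """
    location = geocode(address)
    if location is None:
        raise ValueError('No geocoding result for {!r}'.format(address))
    return location.latitude, location.longitude


# Intended use with the geolocator defined in the notebook:
# latitude, longitude = geocode_or_raise(geolocator.geocode, 'Manhattan, NY')
```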


In [7]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [8]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_manhattan)
    
map_manhattan

### Define Foursquare Credentials and Version

In [9]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your Foursquare Access Token
VERSION = '20180604'
LIMIT = 100

### Search for Ramen and Sushi Restaurants within a 2 km Radius
#### Since we're looking in Manhattan, NY, let's find out whether there are any ramen and sushi spots within 2 km of its center

In [10]:
search_query_ramen = 'Ramen'
search_query_sushi = 'Sushi'

radius = 2000
print(search_query_ramen + ' .... OK!')
print(search_query_sushi + ' .... OK!')

Ramen .... OK!
Sushi .... OK!


#### Define the corresponding URL

In [11]:
url_ramen = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_ramen, radius, LIMIT)
url_sushi = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_sushi, radius, LIMIT)
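The two URLs above differ only in the query term, and a query containing spaces or `&` would not be escaped by plain string formatting. One possible alternative, sketched here with the standard library, builds the query string from a dictionary. The helper name `build_search_url` is an assumption introduced for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, lat, lng, version,
                     query, radius, limit):
    """Build a venue search URL with all parameters URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search center
        'v': version,
        'query': query,
        'radius': radius,                # search radius in meters
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)
```

The same dictionary could instead be passed to `requests.get(SEARCH_ENDPOINT, params=params)`, which performs the encoding itself.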

#### Send the GET Request and examine the results

In [12]:
results_ramen = requests.get(url_ramen).json()
results_sushi = requests.get(url_sushi).json()

#### Get relevant part of JSON and transform it into a pandas dataframe

In [13]:
# assign relevant part of JSON to venues
venues_ramen = results_ramen['response']['venues']
venues_sushi = results_sushi['response']['venues']

# transform venues into dataframes and merge both result sets
dataframe_ramen = pd.json_normalize(venues_ramen)
dataframe_sushi = pd.json_normalize(venues_sushi)

dataframe = pd.concat([dataframe_ramen, dataframe_sushi], ignore_index=True)

print("The combined search returned {} Ramen and Sushi venues".format(dataframe.shape[0]))

The combined search returned 62 Ramen and Sushi venues
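Because the two result sets are concatenated as-is, a venue that matches both queries (for example, one with both "Ramen" and "Sushi" in its name) would presumably be counted twice. A minimal sketch of merging the raw venue lists by their `id` field before building the dataframe follows; `merge_unique_venues` is a hypothetical helper, and the ids in the usage comment are made up.

```python
def merge_unique_venues(*venue_lists):
    """Concatenate venue lists, keeping the first occurrence of each venue id."""
    seen_ids = set()
    merged = []
    for venues in venue_lists:
        for venue in venues:
            venue_id = venue.get('id')
            # venues without an id cannot be compared, so they are always kept
            if venue_id is not None:
                if venue_id in seen_ids:
                    continue
                seen_ids.add(venue_id)
            merged.append(venue)
    return merged


# Intended use with the lists from the cell above:
# unique_venues = merge_unique_venues(venues_ramen, venues_sushi)
# dataframe = pd.json_normalize(unique_venues)
```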


#### Define information of interest and filter dataframe

In [14]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy to avoid modifying a view of dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

#dataframe_filtered
df=dataframe_filtered[['name','categories','lat','lng','distance']]
df.head(10)

| | name | categories | lat | lng | distance |
|---|---|---|---|---|---|
| 0 | Jin Ramen | Ramen Restaurant | 40.785261 | -73.976839 | 1508 |
| 1 | Kitakata Ramen Ban Nai | Ramen Restaurant | 40.778841 | -73.981183 | 2158 |
| 2 | Zurutto Ramen & Gyoza Bar | Ramen Restaurant | 40.778068 | -73.98039 | 2153 |
| 3 | Bua Thai Ramen & Robata Grill | Thai Restaurant | 40.77635 | -73.95308 | 1585 |
| 4 | Naruto Ramen | Ramen Restaurant | 40.781074 | -73.952299 | 1146 |
| 5 | Naruto Ramen | Noodle House | 40.797065 | -73.970028 | 1189 |
| 6 | Mei-jin Ramen | Ramen Restaurant | 40.77502 | -73.953579 | 1710 |
| 7 | Churutto Ramen | Japanese Restaurant | 40.77899 | -73.953954 | 1285 |
| 8 | Choudaiya Saji's Ramen | Japanese Restaurant | 40.803239 | -73.966922 | 1627 |
| 9 | Mr. Peng's Ramen & Sushi | Asian Restaurant | 40.776838 | -73.949757 | 1660 |
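The name-based search also returns venues whose primary category is something else (Thai Restaurant, Noodle House, Asian Restaurant in the rows above). If only dedicated ramen or sushi venues should count as competitors, the rows could be filtered by category. This is a sketch under an assumption: the set of target categories is a judgment call (whether "Japanese Restaurant" belongs in it is left open), and `keep_target_categories` is an illustrative name.

```python
TARGET_CATEGORIES = {'Ramen Restaurant', 'Sushi Restaurant'}  # assumed target set


def keep_target_categories(rows, targets=TARGET_CATEGORIES):
    """Keep only rows whose 'categories' value is one of the target categories."""
    return [row for row in rows if row.get('categories') in targets]


# a few rows copied from the output above
sample = [
    {'name': 'Jin Ramen', 'categories': 'Ramen Restaurant'},
    {'name': 'Bua Thai Ramen & Robata Grill', 'categories': 'Thai Restaurant'},
    {'name': 'Naruto Ramen', 'categories': 'Noodle House'},
    {'name': "Mr. Peng's Ramen & Sushi", 'categories': 'Asian Restaurant'},
]
kept = keep_target_categories(sample)
```

On the dataframe itself, the equivalent filter would be `df[df['categories'].isin(TARGET_CATEGORIES)]`.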
