# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project, we look for the best place to open a ramen shop in New York. This report should be useful to anyone considering opening a new ramen restaurant. New York is the largest city in the United States and has a diverse population. It is also a city of busy working people.

New York also has a large Asian population. In Asia, ramen is a familiar dish to many people, and it has recently gained worldwide popularity. Because ramen can be eaten quickly and cheaply and appeals to a wide audience, I expect it to become a popular food in New York. In this report, we use data science to visualize the characteristics of each district from raw data and to find the most suitable district for a ramen restaurant.

## Data <a name="data"></a>

The following data is required for this project.
* Geographic data, such as the latitude and longitude of New York City neighborhoods:  
  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json  
  This data will be used to analyze the districts of New York.

* Ramen restaurants in New York, from the Foursquare API.  
  This API is used to extract information about ramen and other Japanese restaurants in New York.
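The neighborhood file linked above is not parsed anywhere in this notebook. As a minimal sketch, assuming the file is a GeoJSON `FeatureCollection` whose features carry `borough` and `name` properties and `[longitude, latitude]` point coordinates (the layout used in the IBM course lab), the neighborhoods could be extracted as follows. The function `parse_neighborhoods` and the sample record are hypothetical.

```python
import json

def parse_neighborhoods(geojson_text):
    """Turn the neighborhood GeoJSON into (borough, name, lat, lng) tuples.

    Assumes a FeatureCollection whose features carry 'borough' and 'name'
    properties and a Point geometry with [longitude, latitude] coordinates.
    """
    data = json.loads(geojson_text)
    rows = []
    for feature in data['features']:
        props = feature['properties']
        # GeoJSON stores coordinates in [lng, lat] order
        lng, lat = feature['geometry']['coordinates'][:2]
        rows.append((props['borough'], props['name'], lat, lng))
    return rows

# Hypothetical single-feature sample in the assumed layout
sample = json.dumps({'type': 'FeatureCollection', 'features': [
    {'type': 'Feature',
     'properties': {'borough': 'Manhattan', 'name': 'Example Neighborhood'},
     'geometry': {'type': 'Point', 'coordinates': [-74.0, 40.7]}}]})
print(parse_neighborhoods(sample))
```

Note the swap from GeoJSON's `[lng, lat]` order to the `(lat, lng)` order that folium and the rest of this notebook use.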

## Methodology <a name="methodology"></a>

The goal of this project is to find the best place to open a ramen restaurant in New York City.
We obtained the following data:
* Geographic data of New York City
* Information on Japanese restaurants in New York

The restaurant information was collected with the Foursquare API.

### Neighborhood Candidates

Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approx. 12x12 kilometers centered around the New York city center.
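The grid construction itself is not shown in the cells that follow. A minimal sketch, assuming square 1 km cells and a flat-Earth (equirectangular) approximation, which is reasonable over a 12 km area; the helper `make_candidate_grid` and the 1 km spacing are hypothetical, and the center used here is the geocoded coordinate printed further below.

```python
import math

def make_candidate_grid(center_lat, center_lng, size_km=12.0, step_km=1.0):
    """Return (lat, lng) centroids of a square grid of cells centered on a point.

    Local equirectangular approximation: 1 degree of latitude ~ 111.32 km,
    1 degree of longitude ~ 111.32 km * cos(latitude).
    """
    km_per_deg_lat = 111.32
    km_per_deg_lng = 111.32 * math.cos(math.radians(center_lat))
    n = int(round(size_km / step_km))  # cells per side
    half = (n - 1) / 2.0               # shift so the grid is centered on the point
    centroids = []
    for i in range(n):
        for j in range(n):
            dy_km = (i - half) * step_km  # north-south offset of this cell's center
            dx_km = (j - half) * step_km  # east-west offset of this cell's center
            centroids.append((center_lat + dy_km / km_per_deg_lat,
                              center_lng + dx_km / km_per_deg_lng))
    return centroids

grid = make_candidate_grid(40.7127281, -74.0060152)
print(len(grid))  # 12 x 12 = 144 cells
```

Each centroid could then be passed as the `ll` parameter of a Foursquare search, one request per cell.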

Let's first find the latitude & longitude of the New York city center, using a specific, well-known address and the Nominatim geocoder from geopy.

```python
# import numpy and pandas (dataframe)
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

# geocoder that converts an address into latitude/longitude
from geopy.geocoders import Nominatim

# library to handle HTTP requests
import requests

# color-map utilities from matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# import Beautiful Soup
import bs4 as bs
```

To get the latitude and longitude of New York:

```python
address = 'New York'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)
```

40.7127281 -74.0060152


### Searching and Exploring Neighborhoods in New York

```python
# Foursquare API credentials: replace the placeholders with your own keys
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20210326'  # API version date
LIMIT = 30            # maximum number of venues to return
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)
```

```
Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
```


```python
search_query = 'Japanese'
radius = 500  # search radius in meters
print(search_query + ' .... OK!')
```

Japanese .... OK!


```python
# assemble the venue-search request URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
```

'https://api.foursquare.com/v2/venues/search?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&ll=40.7127281,-74.0060152&v=20210326&query=Japanese&radius=500&limit=30'
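The URL above is assembled with `str.format`, which does not percent-encode values, so a multi-word query such as "ramen shop" would put a raw space into the URL. A sketch of the same request URL built with the standard library's `urllib.parse.urlencode`, using placeholder credentials; `build_search_url` is a hypothetical helper (passing the same dictionary as `params=` to `requests.get` is an equivalent option).

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version,
                     query, radius, limit):
    """Build a Foursquare v2 venue-search URL with percent-encoded parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET',
                       40.7127281, -74.0060152, '20210326', 'ramen shop', 500, 30)
print(url)
```

`urlencode` encodes the space as `+` and the comma in `ll` as `%2C`, both of which a standard query-string parser decodes back to the original values.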

```python
# send the request and parse the JSON response
results = requests.get(url).json()
results
```

```
{'meta': {'code': 200, 'requestId': '606312eb48d83d2b05bc3d6a'},
 'response': {'venues': [{'id': '4e6aaea3d164c37bf4b2539d',
    'name': 'Sumo Japanese Cuisine',
    'location': {'address': '104 John St',
     'crossStreet': 'at Cliff St',
     'lat': 40.70771408081055,
     'lng': -74.0062255859375,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.70771408081055,
       'lng': -74.0062255859375}],
     'distance': 558,
     'postalCode': '10038',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['104 John St (at Cliff St)',
      'New York, NY 10038',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d111941735',
      'name': 'Japanese Restaurant',
      'pluralName': 'Japanese Restaurants',
      'shortName': 'Japanese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/japanese_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1617105643',
```
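The response carries a `meta.code` field (200 above). A small guard such as the hypothetical `extract_venues` could check it before indexing into `response`, so that an authentication or quota error surfaces as a readable exception instead of a `KeyError`. It assumes that error responses include an `errorDetail` field in `meta`; the sample below is trimmed from the output above.

```python
def extract_venues(results):
    """Return the venue list from a Foursquare v2 search response.

    Raises RuntimeError if meta.code is not 200 (assumes an optional
    'errorDetail' field in 'meta' describes the failure).
    """
    meta = results.get('meta', {})
    code = meta.get('code')
    if code != 200:
        detail = meta.get('errorDetail', 'no detail given')
        raise RuntimeError(f'Foursquare request failed ({code}): {detail}')
    return results.get('response', {}).get('venues', [])

sample = {'meta': {'code': 200, 'requestId': '606312eb48d83d2b05bc3d6a'},
          'response': {'venues': [{'id': '4e6aaea3d164c37bf4b2539d',
                                   'name': 'Sumo Japanese Cuisine'}]}}
print([v['name'] for v in extract_venues(sample)])
```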

**Create a pandas DataFrame.**

```python
venues = results['response']['venues']

nearby_venues = pd.json_normalize(venues)  # flatten nested JSON into columns
```

```python
# see dataframe
nearby_venues.head()
```

| | id | name | categories | referralId | hasPerk | location.address | location.crossStreet | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress | delivery.id | delivery.url | delivery.provider.name | delivery.provider.icon.prefix | delivery.provider.icon.sizes | delivery.provider.icon.name | location.neighborhood | venuePage.id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4e6aaea3d164c37bf4b2539d | Sumo Japanese Cuisine | [{'id': '4bf58dd8d48988d111941735', 'name': 'J... | v-1617105643 | False | 104 John St | at Cliff St | 40.707714 | -74.006226 | [{'label': 'display', 'lat': 40.70771408081055... | 558 | 10038 | US | New York | NY | United States | [104 John St (at Cliff St), New York, NY 10038... | | | | | | | | |
| 1 | 5464f27c498e2e01c60151af | Kaede Japanese Restaurant | [{'id': '4bf58dd8d48988d111941735', 'name': 'J... | v-1617105643 | False | 90 Chambers St | | 40.714623 | -74.00727 | [{'label': 'display', 'lat': 40.71462286117167... | 236 | 10007 | US | New York | NY | United States | [90 Chambers St, New York, NY 10007, United St... | 2398298.0 | https://www.seamless.com/menu/kaede-90-chamber... | seamless | https://fastly.4sqi.net/img/general/cap/ | [40, 50] | /delivery_provider_seamless_20180129.png | | |
| 2 | 4e4e4c34bd4101d0d7a71f7d | Aoi Japanese Restaurant | [{'id': '4bf58dd8d48988d111941735', 'name': 'J... | v-1617105643 | False | 325 Broadway | | 40.71612 | -74.005278 | [{'label': 'display', 'lat': 40.71612035753943... | 382 | 10007 | US | New York | NY | United States | [325 Broadway, New York, NY 10007, United States] | | | | | | | | |
| 3 | 4c45d523dd1f2d7f64c681f9 | Mana Japanese | [{'id': '4bf58dd8d48988d111941735', 'name': 'J... | v-1617105643 | False | 59 Nassau St | | 40.709251 | -74.008797 | [{'label': 'display', 'lat': 40.70925140380859... | 452 | 10038 | US | New York | NY | United States | [59 Nassau St, New York, NY 10038, United States] | | | | | | | | |
| 4 | 4f326c4819836c91c7d658ff | Nagoya Japanese Restaurant | [{'id': '4d4b7105d754a06374d81259', 'name': 'F... | v-1617105643 | False | 59 Nassau St | | 40.709251 | -74.008797 | [{'label': 'display', 'lat': 40.70925140380859... | 452 | 10038 | US | New York | NY | United States | [59 Nassau St, New York, NY 10038, United States] | | | | | | | | |
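`pd.json_normalize` flattens nested dictionaries into dotted column names, which is where columns such as `location.lat` and `location.distance` come from. A minimal illustration on a trimmed-down copy of the first venue record from the response above:

```python
import pandas as pd

# Trimmed copy of the first venue returned by the search
venues = [{
    'id': '4e6aaea3d164c37bf4b2539d',
    'name': 'Sumo Japanese Cuisine',
    'location': {'address': '104 John St', 'lat': 40.70771408081055,
                 'lng': -74.0062255859375, 'distance': 558},
}]

# Nested keys under 'location' become 'location.<key>' columns
df = pd.json_normalize(venues)
print(sorted(df.columns))
```

Lists such as `categories` are not expanded by default; they stay as a single object column, which is why a helper is needed to pull out the category name.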


```python
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in nearby_venues.columns if col.startswith('location.')] + ['id']
venue_filtered = nearby_venues.loc[:, filtered_columns].copy()  # explicit copy avoids SettingWithCopyWarning

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
venue_filtered['categories'] = venue_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
venue_filtered.columns = [column.split('.')[-1] for column in venue_filtered.columns]

venue_filtered
```

| | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sumo Japanese Cuisine | Japanese Restaurant | 104 John St | at Cliff St | 40.707714 | -74.006226 | [{'label': 'display', 'lat': 40.70771408081055... | 558 | 10038 | US | New York | NY | United States | [104 John St (at Cliff St), New York, NY 10038... | | 4e6aaea3d164c37bf4b2539d |
| 1 | Kaede Japanese Restaurant | Japanese Restaurant | 90 Chambers St | | 40.714623 | -74.00727 | [{'label': 'display', 'lat': 40.71462286117167... | 236 | 10007 | US | New York | NY | United States | [90 Chambers St, New York, NY 10007, United St... | | 5464f27c498e2e01c60151af |
| 2 | Aoi Japanese Restaurant | Japanese Restaurant | 325 Broadway | | 40.71612 | -74.005278 | [{'label': 'display', 'lat': 40.71612035753943... | 382 | 10007 | US | New York | NY | United States | [325 Broadway, New York, NY 10007, United States] | | 4e4e4c34bd4101d0d7a71f7d |
| 3 | Mana Japanese | Japanese Restaurant | 59 Nassau St | | 40.709251 | -74.008797 | [{'label': 'display', 'lat': 40.70925140380859... | 452 | 10038 | US | New York | NY | United States | [59 Nassau St, New York, NY 10038, United States] | | 4c45d523dd1f2d7f64c681f9 |
| 4 | Nagoya Japanese Restaurant | Food | 59 Nassau St | | 40.709251 | -74.008797 | [{'label': 'display', 'lat': 40.70925140380859... | 452 | 10038 | US | New York | NY | United States | [59 Nassau St, New York, NY 10038, United States] | | 4f326c4819836c91c7d658ff |
| 5 | Nagoya Japanese Restaurant | Japanese Restaurant | 49 Fulton St | | 40.708168 | -74.00396 | [{'label': 'entrance', 'lat': 40.708304, 'lng'... | 536 | 10038 | US | New York | NY | United States | [49 Fulton St, New York, NY 10038, United States] | | 4e4c4fbdbd413c4cc66869c3 |
| 6 | Korin | Furniture / Home Store | 57 Warren St | Church St | 40.714824 | -74.009404 | [{'label': 'display', 'lat': 40.71482437714839... | 369 | 10007 | US | New York | NY | United States | [57 Warren St (Church St), New York, NY 10007,... | Tribeca | 4af5d65ff964a52091fd21e3 |
| 7 | China 59 | Chinese Restaurant | 59 Nassau St | btwn Maiden Lane & John St | 40.709178 | -74.008958 | [{'label': 'display', 'lat': 40.70917839380905... | 466 | 10038 | US | New York | NY | United States | [59 Nassau St (btwn Maiden Lane & John St), Ne... | | 4a982c7af964a520cc2a20e3 |
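Because the search matches the query text rather than the venue category, the results above include venues that are not Japanese restaurants (a furniture store and a Chinese restaurant). A sketch of one way to keep only relevant rows by matching category names, using rows from the table above; the keyword list is an assumption, and a venue with a generic category such as "Food" is dropped as well.

```python
import pandas as pd

# A subset of rows from the filtered table above
venues = pd.DataFrame({
    'name': ['Sumo Japanese Cuisine', 'Kaede Japanese Restaurant',
             'Nagoya Japanese Restaurant', 'Korin', 'China 59'],
    'categories': ['Japanese Restaurant', 'Japanese Restaurant', 'Food',
                   'Furniture / Home Store', 'Chinese Restaurant'],
})

keywords = ['Japanese', 'Ramen', 'Sushi']  # assumed category keywords
pattern = '|'.join(keywords)               # regex alternation: Japanese|Ramen|Sushi

# na=False treats venues without a category as non-matching
mask = venues['categories'].str.contains(pattern, case=False, na=False)
japanese_venues = venues[mask]
print(japanese_venues['name'].tolist())
```

Matching on the category rather than the name keeps the filter independent of how a venue happens to be titled, at the cost of missing Japanese venues filed under broad categories.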


**Visualize nearby restaurants on a map.**

```python
venue_filtered['name']
```

```
0         Sumo Japanese Cuisine
1     Kaede Japanese Restaurant
2       Aoi Japanese Restaurant
3                 Mana Japanese
4    Nagoya Japanese Restaurant
5    Nagoya Japanese Restaurant
6                         Korin
7                      China 59
Name: name, dtype: object
```

```python
# install folium
!pip install folium==0.5.0
```



```python
import folium
```

```python
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)  # generate map centred on the New York city center

# add a red circle marker to represent the city center
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='New York city center',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# add the Japanese-related venues as blue circle markers
for lat, lng, label in zip(venue_filtered.lat, venue_filtered.lng, venue_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
```

## Analysis <a name="analysis"></a>

**Cluster Neighborhoods**

```python
newyork_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# set number of clusters
kclusters = 4

# k-means input: one (lat, lng) row per venue
venue_coords = venue_filtered[['lat', 'lng']].to_numpy()

# run k-means clustering (n_init set explicitly so results do not depend on the library default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(venue_coords)
```
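The clustering above works on raw latitude/longitude in degrees. At New York's latitude a degree of longitude is only about three quarters as long as a degree of latitude, so Euclidean distance in degrees overstates east-west separation relative to north-south separation. A sketch that first projects the coordinates to approximate kilometer offsets (an equirectangular approximation, which is an assumption here) and then clusters them, using the venue coordinates from the filtered table; `to_local_km` is a hypothetical helper.

```python
import math

import numpy as np
from sklearn.cluster import KMeans

# Venue coordinates (lat, lng) as listed in the filtered table above
coords_deg = np.array([
    [40.707714, -74.006226], [40.714623, -74.00727], [40.71612, -74.005278],
    [40.709251, -74.008797], [40.709251, -74.008797], [40.708168, -74.00396],
    [40.714824, -74.009404], [40.709178, -74.008958],
])

def to_local_km(coords, ref_lat, ref_lng):
    """Project (lat, lng) degrees to approximate (east, north) offsets in km."""
    km_per_deg = 111.32
    north = (coords[:, 0] - ref_lat) * km_per_deg
    east = (coords[:, 1] - ref_lng) * km_per_deg * math.cos(math.radians(ref_lat))
    return np.column_stack((east, north))

coords_km = to_local_km(coords_deg, 40.7127281, -74.0060152)
kmeans_km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(coords_km)
print(kmeans_km.labels_)
```

With only eight venues, four clusters leave about two points per cluster, so the grouping is illustrative rather than statistically meaningful.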

```python
clusters = kmeans.labels_
cluster_colors = ['red', 'green', 'blue', 'yellow']  # one color per cluster; avoids shadowing matplotlib.colors
venue_filtered['Cluster'] = clusters

# separate loop names keep the city-center latitude/longitude from being overwritten;
# markers are labeled with the venue name because crossStreet is often missing
for lat, lng, name, cluster in zip(venue_filtered['lat'], venue_filtered['lng'], venue_filtered['name'], venue_filtered['Cluster']):
    label = folium.Popup(name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color=cluster_colors[cluster],
        fill_opacity=0.7).add_to(newyork_map)

newyork_map
```

## Results and Discussion <a name="results"></a>

The results of the exploratory data analysis and clustering are summarized below.

* Restaurants associated with Japanese cuisine are distributed throughout central and southern New York.
* No clear concentration of these restaurants was found.

Based on these results, competition for a new Japanese restaurant is likely to be lowest in the northern part of New York.
On the other hand, a restaurant in the southern part of New York is more likely to attract customers who already visit other Japanese restaurants nearby.
This clustering is based only on information from the Foursquare API, and the search covered a 500 m radius around the geocoded city center. It may therefore be inaccurate, because it does not account for other factors such as land prices.
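One way to make "least competition" concrete is to count existing Japanese restaurants within walking distance of a candidate site. A minimal sketch using the haversine great-circle distance; the helper names, the 500 m radius, and the choice of the geocoded city center as the candidate site are assumptions, and the coordinates are those of the six Japanese-named venues in the filtered table.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points on a spherical Earth."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def count_competitors(site, venues, radius_m):
    """Count venues lying within radius_m meters of a candidate site."""
    return sum(1 for lat, lng in venues
               if haversine_m(site[0], site[1], lat, lng) <= radius_m)

# Coordinates of the six Japanese-named venues in the filtered table above
japanese_coords = [
    (40.707714, -74.006226), (40.714623, -74.00727), (40.71612, -74.005278),
    (40.709251, -74.008797), (40.709251, -74.008797), (40.708168, -74.00396),
]
city_center = (40.7127281, -74.0060152)
print(count_competitors(city_center, japanese_coords, 500))
```

Evaluated over a grid of candidate sites, a lower count would indicate less nearby competition; the computed distances agree with the `distance` column Foursquare reports (for example, about 558 m for Sumo Japanese Cuisine).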

## Conclusion <a name="conclusion"></a>

The goal of this project was to find the best location to open a ramen restaurant in New York.
We applied the k-means clustering algorithm to group Japanese restaurants in New York by location and identify where they are concentrated.

Finally, all of these analyses rely on Foursquare data alone. A more comprehensive analysis would require information from other external databases.