# Capstone Project - The Battle of Neighborhoods (Week 1)

## 1. Introduction
#### New York is a major center of economic activity and cultural diversity: people from many cultural backgrounds have brought their families and dreams to the city.
#### Because its residents come from so many cultural backgrounds, New York offers a large variety of restaurants.
#### Sushi and ramen are popular, well-known Japanese dishes both inside and outside Japan. Manhattan in particular has many ramen and sushi restaurants because they do good business there.

## 2. Data

### What data is used and how will the problem be solved?

#### The New York neighborhood dataset is used in this analysis to get to know New York's neighborhoods.
#### We also use data obtained through web scraping and the Foursquare API, and visualize it with Folium.

#### We will rely mainly on Foursquare data to locate a spot for a new ramen/sushi restaurant that is not already crowded with similar restaurants. We will examine each neighborhood in the area of interest and use the Foursquare API to explore nearby venues, focusing on places that serve ramen or sushi.
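
One way to make "not already crowded" concrete is to count how many venues lie within a fixed distance of each neighborhood center. The following is a minimal sketch under stated assumptions, not the method used in this notebook: the helper names `haversine_m` and `count_nearby` and the default 1000 m radius are hypothetical, and the column names (`Latitude`, `Longitude`, `lat`, `lng`) follow the dataframes built later in the notebook.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs in degrees, NumPy-broadcastable."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def count_nearby(neighborhoods, venues, radius_m=1000):
    """Return a copy of `neighborhoods` with a 'VenueCount' column (hypothetical helper)."""
    # distance matrix: one row per neighborhood, one column per venue
    distances = haversine_m(
        neighborhoods['Latitude'].to_numpy()[:, None],
        neighborhoods['Longitude'].to_numpy()[:, None],
        venues['lat'].to_numpy()[None, :],
        venues['lng'].to_numpy()[None, :],
    )
    out = neighborhoods.copy()
    out['VenueCount'] = (distances <= radius_m).sum(axis=1)
    # least crowded neighborhoods first
    return out.sort_values('VenueCount')
```

Called as `count_nearby(manhattan_data, df)`, this would rank neighborhoods from least to most crowded. The radius is a judgment call and would need tuning for a real site search.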

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
import geocoder
from IPython.display import Image 
from IPython.core.display import HTML 
from bs4 import BeautifulSoup # scraping library
# json_normalize is called as pd.json_normalize below; the pandas.io.json import path is deprecated
import json # JSON files manipulation
from sklearn.cluster import KMeans # clustering algorithm
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import folium # plotting library

In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

In [3]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [4]:
rows = []  # collect one dict per neighborhood, then build the dataframe once
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the list
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods

,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


In [5]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [6]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

Man = 'Manhattan geographical coordinates: {}, {}'.format(latitude, longitude)

print(Man)

Manhattan geographical coordinates: 40.7896239, -73.9598939
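
The geocoding cell reads `location.latitude` directly, but geopy's `geocode` returns `None` when it finds no match, which would surface as an `AttributeError`. A minimal sketch of a guard follows. The helper name `geocode_or_fail` and the stand-in `fake_geocode` are hypothetical; the stand-in only exists so the example runs offline.

```python
from types import SimpleNamespace


def geocode_or_fail(geocode, address):
    """Return (latitude, longitude) for `address` or raise a clear error.

    `geocode` is any callable shaped like `geolocator.geocode`.
    """
    location = geocode(address)
    if location is None:  # geopy returns None when the address is not found
        raise ValueError(f'Could not geocode address: {address!r}')
    return location.latitude, location.longitude


# Stand-in geocoder (assumption) so the sketch runs without network access;
# in the notebook you would pass geolocator.geocode instead.
def fake_geocode(address):
    known = {'Manhattan, NY': SimpleNamespace(latitude=40.7896239,
                                              longitude=-73.9598939)}
    return known.get(address)


latitude, longitude = geocode_or_fail(fake_geocode, 'Manhattan, NY')
```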


In [7]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [8]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [9]:
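import os  # read credentials from the environment rather than hard-coding them

# FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed variable names
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']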
LIMIT = 100
VERSION = '20180605' 

In [10]:
search_query_ramen = 'Ramen'
search_query_sushi = 'Sushi'

radius = 2000
print(search_query_ramen + ' .... OK!')
print(search_query_sushi + ' .... OK!')

Ramen .... OK!
Sushi .... OK!


In [11]:
url_ramen = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_ramen, radius, LIMIT)
url_sushi = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query_sushi, radius, LIMIT)
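
Building the query string with `str.format` does not escape special characters and repeats the long URL for each search term. As a possible alternative, the parameters can be kept in a dictionary and passed to `requests.get` through its `params` argument. This is only a sketch: `build_search_params` is a hypothetical helper, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_params(client_id, client_secret, lat, lng, query,
                        radius=2000, limit=100, version='20180605'):
    """Collect the query-string fields used by the venue search call."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }


# Placeholder credentials (assumption); substitute your own
params = build_search_params('YOUR_ID', 'YOUR_SECRET',
                             40.7896239, -73.9598939, 'Ramen')

# requests.get(SEARCH_URL, params=params) would encode these for you;
# urlencode shows the resulting query string without a network call.
print(f'{SEARCH_URL}?{urlencode(params)}')
```

With this shape, the sushi search only needs a second call with `query='Sushi'`.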

In [12]:
results_ramen = requests.get(url_ramen).json()
results_sushi = requests.get(url_sushi).json()
print(results_ramen)
print(results_sushi)

{'meta': {'code': 200, 'requestId': '60b81e6e1d9d4e7451c0bad8'}, 'response': {'venues': [{'id': '54a48e30498ef8abe3c6d4c5', 'name': 'Jin Ramen', 'location': {'address': '462 Amsterdam Ave', 'crossStreet': 'W 82nd St', 'lat': 40.78526069203778, 'lng': -73.97683897404411, 'labeledLatLngs': [{'label': 'display', 'lat': 40.78526069203778, 'lng': -73.97683897404411}, {'label': 'entrance', 'lat': 40.785332, 'lng': -73.976937}], 'distance': 1508, 'postalCode': '10024', 'cc': 'US', 'city': 'New York', 'state': 'NY', 'country': 'United States', 'formattedAddress': ['462 Amsterdam Ave (W 82nd St)', 'New York, NY 10024', 'United States']}, 'categories': [{'id': '55a59bace4b013909087cb24', 'name': 'Ramen Restaurant', 'pluralName': 'Ramen Restaurants', 'shortName': 'Ramen', 'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/ramen_', 'suffix': '.png'}, 'primary': True}], 'referralId': 'v-1622679150', 'hasPerk': False}, {'id': '5d92b1059289530008a23077', 'name': 'Kitakata Ramen Ban Nai',
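
The response above carries a status in `meta.code` alongside the venue list. Checking it before indexing into `response` gives a clearer error than a `KeyError` when a request fails (for example, because of invalid credentials). A minimal sketch, assuming the payload layout shown in the output; `extract_venues` is a hypothetical helper and `sample` is a trimmed copy of the first venue.

```python
def extract_venues(payload):
    """Return the venue list from a search response, or raise on an API error."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise RuntimeError(f'Foursquare request failed with code {code}')
    return payload.get('response', {}).get('venues', [])


# Trimmed sample mirroring the structure of the printed response
sample = {
    'meta': {'code': 200},
    'response': {'venues': [{'id': '54a48e30498ef8abe3c6d4c5',
                             'name': 'Jin Ramen'}]},
}
venues = extract_venues(sample)
```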

In [13]:
# assign relevant part of JSON to venues
venues_ramen = results_ramen['response']['venues']
venues_sushi = results_sushi['response']['venues']

# transform venues into dataframes and merge both result sets
dataframe_ramen = pd.json_normalize(venues_ramen)
dataframe_sushi = pd.json_normalize(venues_sushi)

dataframe = pd.concat([dataframe_ramen,dataframe_sushi])

print("There are {} Ramen and Sushi Restaurants in Manhattan".format(dataframe.shape[0]))

There are 62 Ramen and Sushi Restaurants in Manhattan
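
Because the two searches run independently, a venue that matches both terms (a name containing both "Ramen" and "Sushi", for instance) may appear in both result sets and be counted twice. One possible way to guard against that is to drop duplicates on the Foursquare `id` column after concatenating. The helper name `merge_venues` and the sample ids below are made up for illustration.

```python
import pandas as pd


def merge_venues(*frames):
    """Concatenate venue frames and keep one row per Foursquare venue id."""
    merged = pd.concat(frames, ignore_index=True)
    return merged.drop_duplicates(subset='id').reset_index(drop=True)


# Illustrative frames with invented ids; 'b2' appears in both searches
ramen = pd.DataFrame({'id': ['a1', 'b2'],
                      'name': ['Jin Ramen', "Mr. Peng's Ramen & Sushi"]})
sushi = pd.DataFrame({'id': ['b2', 'c3'],
                      'name': ["Mr. Peng's Ramen & Sushi", 'Hypothetical Sushi Bar']})

unique_venues = merge_venues(ramen, sushi)
print(len(unique_venues))
```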


In [14]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below is safe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

# keep only the columns needed for the analysis
df = dataframe_filtered[['name', 'categories', 'lat', 'lng', 'distance']]
df.head(10)

,name,categories,lat,lng,distance
0,Jin Ramen,Ramen Restaurant,40.785261,-73.976839,1508
1,Kitakata Ramen Ban Nai,Ramen Restaurant,40.778841,-73.981183,2158
2,Zurutto Ramen & Gyoza Bar,Ramen Restaurant,40.778068,-73.98039,2153
3,Bua Thai Ramen & Robata Grill,Thai Restaurant,40.77635,-73.95308,1585
4,Naruto Ramen,Ramen Restaurant,40.781074,-73.952299,1146
5,Mei-jin Ramen,Ramen Restaurant,40.77502,-73.953579,1710
6,Naruto Ramen,Noodle House,40.797065,-73.970028,1189
7,Churutto Ramen,Japanese Restaurant,40.77899,-73.953954,1285
8,Mr. Peng's Ramen & Sushi,Asian Restaurant,40.776838,-73.949757,1660
9,Choudaiya Saji's Ramen,Japanese Restaurant,40.803239,-73.966922,1627
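
The table shows that a keyword search also returns venues whose primary category is not Japanese food, such as a Thai or a generic Asian restaurant. If only direct competitors should count, the rows could be filtered by category. The sketch below is an assumption about how that might look: the contents of `TARGET_CATEGORIES` are a judgment call, and `sample` reuses three rows from the table above.

```python
import pandas as pd

# Assumed set of categories treated as direct competitors; adjust as needed
TARGET_CATEGORIES = {'Ramen Restaurant', 'Sushi Restaurant', 'Japanese Restaurant'}

# Three rows taken from the table above
sample = pd.DataFrame({
    'name': ['Jin Ramen', 'Bua Thai Ramen & Robata Grill', 'Churutto Ramen'],
    'categories': ['Ramen Restaurant', 'Thai Restaurant', 'Japanese Restaurant'],
})

# keep only venues whose primary category is in the target set
competitors = sample[sample['categories'].isin(TARGET_CATEGORIES)]

# how many venues fall into each category
category_counts = sample['categories'].value_counts()
```

In the notebook, the same two lines would apply to `df` in place of `sample`.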
