# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is aimed at stakeholders who have a successful restaurant in one city (New York City here) and are interested in opening a **chain restaurant** (a Chinese restaurant here) in a new city (Los Angeles here).

As the economy develops, more business owners want to open chain stores, but a poorly chosen location can lead to bankruptcy. Finding an optimal location in the new city lowers that risk.

Since the stakeholder already has a popular restaurant in New York City, we will use data science to characterize the neighborhood around that successful restaurant and then find a similar area in Los Angeles. Opening the new restaurant in surroundings that resemble the original location should lower the investment risk.

## Data <a name="data"></a>

We will use Foursquare location data to carry out this idea. With it, we can retrieve the venues surrounding the restaurant we are focusing on, identify their categories, and count how often each category appears. We can then look for an area in Los Angeles whose surrounding venues have a similar category profile.

### Import several libraries we need

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

! pip install folium==0.5.0
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Find a specific restaurant to focus on

To do our analysis, we need a specific restaurant to focus on. It should be successful in New York but have no branch in Los Angeles. More specifically, we want a Chinese restaurant that satisfies both conditions.

#### Let's find the latitude and longitude of New York City first.

In [19]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Define Foursquare Credentials and Version

In [77]:
CLIENT_ID = 'OJZQTCIXLU2304V4LWI2CGHET4XQZXVZ2S5UODMOTQISJYBE' # your Foursquare ID
CLIENT_SECRET = 'GLZOQ02EVR3O5IQ0USP4NSQ4ICTKGD1E34GKWM5NLXKQUYHD' # your Foursquare Secret
VERSION = '20200715'
LIMIT = 50
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OJZQTCIXLU2304V4LWI2CGHET4XQZXVZ2S5UODMOTQISJYBE
CLIENT_SECRET:GLZOQ02EVR3O5IQ0USP4NSQ4ICTKGD1E34GKWM5NLXKQUYHD
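
Hard-coding and printing API credentials in a shared notebook exposes them to every reader. As a minimal sketch of an alternative, the credentials could be read from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for illustration; they are not part of the original notebook or of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from an environment-style mapping.

    The variable names below are hypothetical; any names would work as long
    as they are set before the notebook runs.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Report which variable is absent without echoing any secret value.
        raise RuntimeError(
            "Missing environment variable: {}".format(missing.args[0])
        ) from None
```

With this in place, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two literal assignments, and the secret would no longer need to be printed.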


#### Find 50 Chinese Restaurants around the center of New York City

In [21]:
search_query = 'Chinese Restaurant'
radius = 1000
print(search_query + ' .... OK!')

Chinese Restaurant .... OK!


In [22]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=QQ00XWC5002MSZSV5PYXPSA5RYNZ4AK4ZKR1OPKDT1OVDHJI&client_secret=FNA2DAHWEZ5Z3MV4BBUMEBDYSO4G2X21DGMFZM1LXIRIYEAF&ll=40.7127281,-74.0060152&v=20200715&query=Chinese Restaurant&radius=1000&limit=50'
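
The URL above contains an unescaped space in the `query` value. A possible way to build the same request with proper escaping is sketched below using only the standard library. The parameter names are taken from the URL in this notebook; the helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the notebook's URL.
SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the venue-search URL with percent-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes spaces, commas and other reserved characters.
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

Alternatively, the same dictionary can be passed to `requests.get(SEARCH_ENDPOINT, params=params)`, which performs the encoding itself.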

In [23]:
results = requests.get(url).json()

{'meta': {'code': 200, 'requestId': '5fc29ecc432e02660c9094b4'},
 'response': {'venues': [{'id': '58aa398b14fb41097a0cf923',
    'name': 'Great Fortune Chinese Restaurant 聚旺茶餐廳',
    'location': {'address': '5 Catherine St',
     'crossStreet': 'btwn E Broadway & Chatham Sq',
     'lat': 40.713875,
     'lng': -73.99728,
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.713875,
       'lng': -73.99728}],
     'distance': 748,
     'postalCode': '10038',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['5 Catherine St (btwn E Broadway & Chatham Sq)',
      'New York, NY 10038',
      'United States']},
    'categories': [{'id': '52af3a7c3cf9994f4e043bed',
      'name': 'Cantonese Restaurant',
      'pluralName': 'Cantonese Restaurants',
      'shortName': 'Cantonese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
       'suffix': '.png'},
      'primary': True}],
    'ref

In [24]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,...,location.state,location.country,location.formattedAddress,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,venuePage.id
0,58aa398b14fb41097a0cf923,Great Fortune Chinese Restaurant 聚旺茶餐廳,"[{'id': '52af3a7c3cf9994f4e043bed', 'name': 'C...",v-1606590156,False,5 Catherine St,btwn E Broadway & Chatham Sq,40.713875,-73.99728,"[{'label': 'display', 'lat': 40.713875, 'lng':...",...,NY,United States,[5 Catherine St (btwn E Broadway & Chatham Sq)...,,,,,,,
1,4e4463cf52b18fcc799a9761,88 Reach House chinese restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1606590156,False,88 Division St,Eldrige street,40.714468,-73.993727,"[{'label': 'display', 'lat': 40.714468, 'lng':...",...,NY,United States,"[88 Division St (Eldrige street), New York, NY...",,,,,,,
2,581299bb38fa2e54303f6891,Taste Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1606590156,False,178 Church St,,40.71569,-74.007605,"[{'label': 'display', 'lat': 40.7156902, 'lng'...",...,NY,United States,"[178 Church St, New York, NY 10013, United Sta...",322507.0,https://www.seamless.com/menu/taste-chinese-re...,seamless,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_seamless_20180129.png,
3,4b54fcddf964a520bfd627e3,Food King Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1606590156,False,56 Market St,Monroe,40.711606,-73.994186,"[{'label': 'display', 'lat': 40.71160573939490...",...,NY,United States,"[56 Market St (Monroe), New York, NY 10002, Un...",,,,,,,
4,4e4e4c7abd4101d0d7a72532,Dragon Gate Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1606590156,False,92 Elizabeth St,,40.718358,-73.995632,"[{'label': 'display', 'lat': 40.718358, 'lng':...",...,NY,United States,"[92 Elizabeth St, New York, NY 10013, United S...",,,,,,,


In [9]:
dataframe.shape

(50, 24)

Define information of interest and filter dataframe

In [118]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy() # copy so later column assignments do not touch a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id
0,Great Fortune Chinese Restaurant 聚旺茶餐廳,Cantonese Restaurant,5 Catherine St,btwn E Broadway & Chatham Sq,40.713875,-73.99728,"[{'label': 'display', 'lat': 40.713875, 'lng':...",748,10038,US,New York,NY,United States,[5 Catherine St (btwn E Broadway & Chatham Sq)...,58aa398b14fb41097a0cf923
1,88 Reach House chinese restaurant,Chinese Restaurant,88 Division St,Eldrige street,40.714468,-73.993727,"[{'label': 'display', 'lat': 40.714468, 'lng':...",1054,10002,US,New York,NY,United States,"[88 Division St (Eldrige street), New York, NY...",4e4463cf52b18fcc799a9761
2,Taste Chinese Restaurant,Chinese Restaurant,178 Church St,,40.71569,-74.007605,"[{'label': 'display', 'lat': 40.7156902, 'lng'...",355,10013,US,New York,NY,United States,"[178 Church St, New York, NY 10013, United Sta...",581299bb38fa2e54303f6891
3,Food King Chinese Restaurant,Chinese Restaurant,56 Market St,Monroe,40.711606,-73.994186,"[{'label': 'display', 'lat': 40.71160573939490...",1005,10002,US,New York,NY,United States,"[56 Market St (Monroe), New York, NY 10002, Un...",4b54fcddf964a520bfd627e3
4,Dragon Gate Chinese Restaurant,Chinese Restaurant,92 Elizabeth St,,40.718358,-73.995632,"[{'label': 'display', 'lat': 40.718358, 'lng':...",1077,10013,US,New York,NY,United States,"[92 Elizabeth St, New York, NY 10013, United S...",4e4e4c7abd4101d0d7a72532
5,Downtown Chinese Restaurant,Food,135 John St,,40.707134,-74.004944,"[{'label': 'entrance', 'lat': 40.707098, 'lng'...",629,10038,US,New York,NY,United States,"[135 John St, New York, NY 10038, United States]",4f320f6d19833175d60c8d97
6,Ann's Chinese Restaurant,Asian Restaurant,1 E Broadway #2FL,,40.713154,-73.997955,"[{'label': 'display', 'lat': 40.71315383911133...",681,10038,US,New York,NY,United States,"[1 E Broadway #2FL, New York, NY 10038, United...",5012c0fde4b0748a78b0c886
7,Malasia-Thai-Chinese Restaurant,Asian Restaurant,,,40.717035,-73.999719,"[{'label': 'display', 'lat': 40.717035, 'lng':...",715,10013,US,New York,NY,United States,"[New York, NY 10013, United States]",50b043dbe4b0be4ddec7cf6a
8,Marco Polo Chinese Restaurant,Chinese Restaurant,94 Baxter St,,40.71692,-73.999419,"[{'label': 'display', 'lat': 40.71692037258939...",726,10013,US,New York,NY,United States,"[94 Baxter St, New York, NY 10013, United States]",4e4e4c65bd4101d0d7a723cb
9,Fu Zhou An Ping Chinese Restaurant,Asian Restaurant,20 Henry St,,40.712723,-73.99688,"[{'label': 'display', 'lat': 40.71272277832031...",770,10002,US,New York,NY,United States,"[20 Henry St, New York, NY 10002, United States]",58913f12288b6a0e789bb237


Let's visualize the Chinese restaurants that are nearby

In [12]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map 

# add a red circle marker to represent the center of New York City
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Chinese restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

#### Select the restaurants that have a rating

In [29]:
rated_restaurant = []
rating = []
for i in range(dataframe_filtered.shape[0]):
    venue_id = dataframe_filtered.loc[i, 'id']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        print(result['response']['venue']['rating'])
        rated_restaurant.append(dataframe_filtered.iloc[i])
        rating.append(result['response']['venue']['rating'])
    
    except KeyError:
        print('This venue has not been rated yet.')
    

This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
5.5
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
5.6
8.0
8.7
8.3
This venue has not been rated yet.
8.0
This venue has not been rated yet.
8.0
7.5
This venue has not been rated yet.
7.1
This venue has not been rated yet.
This venue has not been rated yet.
6.7
This venue has not been rated yet.
7.4
This venue has not been rated yet.
8.0
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
This venue has not been rated yet.
7.1
Thi
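
The loop above relies on an exception to detect a missing rating. A small sketch of an alternative is shown here: it reads the rating from the response dictionary and returns `None` when the key is absent. The response shape (`response` → `venue` → `rating`) is the one used in the loop; the helper name `extract_rating` is hypothetical.

```python
def extract_rating(result):
    """Return the rating from a venue-details response, or None if absent."""
    # Each .get falls back to an empty dict so a missing level does not raise.
    venue = result.get("response", {}).get("venue", {})
    return venue.get("rating")
```

Inside the loop, `rating_value = extract_rating(result)` followed by an `if rating_value is not None:` check would then replace the `try`/`except` block.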

Combine the basic info of restaurants that are rated and their rating

In [30]:
from pandas import DataFrame
rated_restaurant2 = DataFrame(rated_restaurant)
rated_restaurant2['rating'] = rating
rated_restaurant3 = rated_restaurant2[rated_restaurant2['categories'].str.contains("Chinese")].reset_index(drop=True)
rated_restaurant3

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,id,rating
0,Canal Best Chinese Restaurant,Chinese Restaurant,266 Canal St,,40.71879,-74.001055,"[{'label': 'display', 'lat': 40.71878993882605...",794,10013,US,New York,NY,United States,"[266 Canal St, New York, NY 10013, United States]",4d83993c81fdb1f7cf87eabf,5.5
1,Number One Chinese,Chinese Restaurant,10 S William St,,40.704574,-74.01041,"[{'label': 'display', 'lat': 40.70457357537987...",980,10004,US,New York,NY,United States,"[10 S William St, New York, NY 10004, United S...",4ce56fbd5bf68cfa65f23c17,5.6
2,Deluxe Green Bo Restaurant,Chinese Restaurant,66 Bayard St,btwn Elizabeth & Mott St,40.715545,-73.998137,"[{'label': 'display', 'lat': 40.71554491813315...",734,10013,US,New York,NY,United States,"[66 Bayard St (btwn Elizabeth & Mott St), New ...",3fd66200f964a520ceea1ee3,8.0
3,XO Kitchen,Chinese Restaurant,148 Hester St,btw Bowery & Elizabeth,40.717482,-73.996047,"[{'label': 'display', 'lat': 40.71748184806784...",993,10013,US,New York,NY,United States,"[148 Hester St (btw Bowery & Elizabeth), New Y...",49f50c47f964a520896b1fe3,6.7
4,Bo Ky Restaurant 波記潮州小食,Chinese Restaurant,80 Bayard St,at Mott St,40.715696,-73.998667,"[{'label': 'display', 'lat': 40.71569636637641...",702,10013,US,New York,NY,United States,"[80 Bayard St (at Mott St), New York, NY 10013...",4a00df67f964a520ba701fe3,8.0
5,Sun Sai Gai Restaurant,Chinese Restaurant,220 Canal St,at Baxter St,40.717369,-73.999415,"[{'label': 'display', 'lat': 40.71736941942955...",759,10013,US,New York,NY,United States,"[220 Canal St (at Baxter St), New York, NY 100...",4a81ac53f964a5203af71fe3,6.6
6,218 Restaurant,Chinese Restaurant,218 Grand St,btwn Elizabeth & Mott St.,40.718833,-73.995895,"[{'label': 'display', 'lat': 40.71883283355385...",1091,10013,US,New York,NY,United States,"[218 Grand St (btwn Elizabeth & Mott St.), New...",4bfdcd3ae529c928a589bb8c,7.2
7,Chinese Tuxedo,Chinese Restaurant,5 Doyers St,,40.714433,-73.997987,"[{'label': 'display', 'lat': 40.71443328935857...",703,10013,US,New York,NY,United States,"[5 Doyers St, New York, NY 10013, United States]",582543c1b39f9613c18d8e2f,7.3


#### Check whether Los Angeles already has the chosen restaurant

We focus on the restaurant with the highest rating. In the dataframe above, the highest rating is 8.0, shared by "Deluxe Green Bo Restaurant" and "Bo Ky Restaurant". Let's try "Deluxe Green Bo Restaurant" first: if Los Angeles does not already have this restaurant, we will focus on it.

#### Find the latitude and longitude of Los Angeles.

In [31]:
address = 'Los Angeles, CA'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude_ca = location.latitude
longitude_ca = location.longitude
print('The geographical coordinates of Los Angeles are {}, {}.'.format(latitude_ca, longitude_ca))

The geographical coordinates of Los Angeles are 34.0536909, -118.2427666.


#### Search 'Deluxe Green Bo Restaurant' in Los Angeles

In [32]:
search_query = 'Deluxe Green Bo Restaurant'
radius = 1000
Limit = 3 # unused: the request below passes LIMIT (50), not Limit
print(search_query + ' .... OK!')

Deluxe Green Bo Restaurant .... OK!


In [33]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_ca, longitude_ca, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=2ZWXTA4YRRBQ3HWROURUTABBCBM4NOKA15FQTDWAM11GSNAR&client_secret=KKSVPPAEEG35XTZN1SWT0VV5WJF5JMJ5C4HYE3KR0I3LBUI3&ll=34.0536909,-118.2427666&v=20200715&query=Deluxe Green Bo Restaurant&radius=1000&limit=50'

In [34]:
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe2 = json_normalize(venues)
dataframe2.shape

(0, 0)

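The resulting dataframe is empty: the search, which covers a radius of 1,000 meters around the Los Angeles city center, returned no 'Deluxe Green Bo Restaurant'. At least within the searched area, this restaurant has no branch in Los Angeles.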

## Try to find an optimal place for "Deluxe Green Bo Restaurant" to open a new restaurant in Los Angeles

#### Find the latitude and longitude of "Deluxe Green Bo Restaurant"

In [35]:
restaurant_name = rated_restaurant3.loc[2, 'name'] # restaurant name
restaurant_latitude = rated_restaurant3.loc[2, 'lat'] # restaurant latitude value
restaurant_longitude = rated_restaurant3.loc[2, 'lng'] # restaurant longitude value
print('Latitude and longitude values of {} are {}, {}.'.format(restaurant_name, 
                                                               restaurant_latitude, 
                                                               restaurant_longitude))

Latitude and longitude values of Deluxe Green Bo Restaurant are 40.715544918133155, -73.99813747002635.


#### Get neighborhoods (nearby venues) around "Deluxe Green Bo Restaurant"

In [36]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    restaurant_latitude, 
    restaurant_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=2ZWXTA4YRRBQ3HWROURUTABBCBM4NOKA15FQTDWAM11GSNAR&client_secret=KKSVPPAEEG35XTZN1SWT0VV5WJF5JMJ5C4HYE3KR0I3LBUI3&v=20200715&ll=40.715544918133155,-73.99813747002635&radius=500&limit=100'

In [37]:
results = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [38]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Zu Yuan Spa,Spa,40.715469,-73.998627
1,Hotel 50 Bowery,Hotel,40.715936,-73.996789
2,The Original Chinatown Ice Cream Factory,Ice Cream Shop,40.715521,-73.998145
3,Xi'an Famous Foods,Chinese Restaurant,40.715232,-73.997263
4,Shanghai 21,Shanghai Restaurant,40.714423,-73.998904


In [39]:
print('There are {} unique categories.'.format(len(nearby_venues['categories'].unique())))

There are 44 unique categories.


In [40]:
nearby_venues_grouped = nearby_venues.groupby('categories').count().reset_index()
nearby_venues_grouped_rank = nearby_venues_grouped.sort_values(by=['name'], ascending=False).reset_index()
nearby_venues_grouped_rank

Unnamed: 0,index,categories,name,lat,lng
0,7,Chinese Restaurant,18,18,18
1,1,Bakery,7,7,7
2,11,Dessert Shop,5,5,5
3,12,Dim Sum Restaurant,5,5,5
4,3,Bubble Tea Shop,4,4,4
5,34,Salon / Barbershop,4,4,4
6,22,Italian Restaurant,3,3,3
7,35,Shanghai Restaurant,3,3,3
8,37,Spa,3,3,3
9,27,Noodle House,3,3,3


We now know that there are 44 unique categories of venues around "Deluxe Green Bo Restaurant". The most common category is "Chinese Restaurant", followed by "Bakery", "Dessert Shop", and so on. We want to find a similar place in Los Angeles.
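
The category ranking above comes from a pandas `groupby`. The same counting idea can be sketched with the standard library alone, which may help when checking the numbers by hand. The helper name `category_frequencies` is hypothetical, and the list in the usage note is an illustrative sample, not the full query result.

```python
from collections import Counter


def category_frequencies(categories):
    """Count how often each venue category occurs, most frequent first."""
    # Venues without a category come through as None and are skipped.
    counts = Counter(c for c in categories if c is not None)
    return counts.most_common()
```

For example, `category_frequencies(["Spa", "Chinese Restaurant", "Bakery", "Chinese Restaurant"])` puts "Chinese Restaurant" first with a count of 2.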

Now let's create a grid of equally spaced candidate areas centered on the city center. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us work in meters rather than latitude/longitude degrees. We then project those coordinates back to latitude/longitude degrees to show them on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [42]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

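# NOTE: UTM zone 33 covers central Europe; Los Angeles lies in UTM zone 11,
# so X/Y values produced with zone=33 are distorted for this city.
# pyproj.transform is deprecated since pyproj 2.6.1 in favor of pyproj.Transformer.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]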

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Los Angeles longitude={}, latitude={}'.format(longitude_ca, latitude_ca))
x, y = lonlat_to_xy(longitude_ca, latitude_ca)
print('Los Angeles UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Los Angeles longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Los Angeles longitude=-118.2427666, latitude=34.0536909
Los Angeles UTM X=-3959962.027290968, Y=15049942.111933712
Los Angeles longitude=-118.24276660000001, latitude=34.053690900000014
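
The round trip above only shows that the two functions are inverses of each other; it does not show that the chosen UTM zone suits the city. The functions pass `zone=33`, while the standard zone for a longitude of about -118.24 is 11. A minimal sketch of how the zone could be derived from the coordinates follows. The helper names are hypothetical, and the formula ignores the special zone boundaries around Norway and Svalbard.

```python
import math


def utm_zone(longitude):
    """Standard 6-degree UTM zone number (1 to 60) for a longitude in degrees."""
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1


def utm_epsg(longitude, latitude):
    """EPSG code of the WGS84 UTM zone: 326xx (north) or 327xx (south)."""
    base = 32600 if latitude >= 0 else 32700
    return base + utm_zone(longitude)
```

For Los Angeles, `utm_epsg(-118.2427666, 34.0536909)` gives 32611 (WGS 84 / UTM zone 11N). One option would be to pass that code to `pyproj.Transformer.from_crs("EPSG:4326", "EPSG:32611", always_xy=True)`; the downstream coordinates and addresses would then need to be regenerated.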




Let's create a **hexagonal grid of cells**: we offset every other row, and adjust vertical row spacing so that **every cell center is equally distant from all its neighbors**.

In [43]:
california_x, california_y = lonlat_to_xy(longitude_ca, latitude_ca) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = california_x - 6000
x_step = 600
y_min = california_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(california_x, california_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)




In [44]:
print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
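
The equal-spacing property of the hexagonal layout can be checked independently of any map projection. The sketch below rebuilds the same grid in a plain local plane centered at (0, 0), using the constants from the cell above (600 m spacing, 21 columns, 6,000 m radius), and measures each center's distance to its nearest neighbor. The function names are hypothetical.

```python
import math


def hex_grid(radius=6000.0, spacing=600.0, columns=21):
    """Hexagonal grid of centers within `radius` of the origin (local meters)."""
    k = math.sqrt(3) / 2              # row height relative to the spacing
    rows = int(columns / k)
    x_min = -radius
    y_min = -rows * k * spacing / 2   # center the rows vertically on the origin
    centers = []
    for i in range(rows):
        y = y_min + i * spacing * k
        x_offset = spacing / 2 if i % 2 == 0 else 0.0  # shift every other row
        for j in range(columns):
            x = x_min + j * spacing + x_offset
            if math.hypot(x, y) <= radius + 1:
                centers.append((x, y))
    return centers


def nearest_neighbor_distances(centers):
    """Distance from each center to its closest other center."""
    distances = []
    for i, (x1, y1) in enumerate(centers):
        distances.append(min(
            math.hypot(x2 - x1, y2 - y1)
            for j, (x2, y2) in enumerate(centers) if j != i
        ))
    return distances


centers = hex_grid()
spacings = nearest_neighbor_distances(centers)
print(len(centers), min(spacings), max(spacings))
```

In this idealized plane every nearest-neighbor distance equals the 600 m spacing. Whether that also holds on the ground depends on how well the projection fits the city.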


Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [45]:
map_california = folium.Map(location=[latitude_ca, longitude_ca], zoom_start=13)
folium.Marker([latitude_ca, longitude_ca], popup='California').add_to(map_california)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_california)
map_california

Let's now get approximate addresses of those locations.

In [46]:
def get_address(latitude, longitude):
    position = str(latitude) + ", " + str(longitude)
    geolocator = Nominatim(user_agent="ca_explorer")

    location = geolocator.reverse(position)
    # reverse() can return None when no address is found
    return location.address if location is not None else None
get_address(latitude_ca, longitude_ca)

'Los Angeles City Hall, 200, North Spring Street, Civic Center, Downtown, Los Angeles, Los Angeles County, California, 90012, United States of America'

In [47]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', United States of America', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
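
The loop above sends one reverse-geocoding request per grid cell. The public Nominatim service asks clients to stay at or below one request per second, so spacing the calls out is worth considering. The wrapper below is a generic standard-library sketch of that idea; the name `throttled` is hypothetical, and geopy also ships its own `RateLimiter` in `geopy.extra.rate_limiter`, which may be the more convenient choice.

```python
import time


def throttled(func, min_interval=1.0):
    """Wrap `func` so consecutive calls are at least `min_interval` seconds apart."""
    last_call = [None]  # mutable cell holding the time the previous call finished

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        result = func(*args, **kwargs)
        last_call[0] = time.monotonic()
        return result

    return wrapper
```

With this in place, `get_address_slow = throttled(get_address, 1.0)` could be used inside the loop in place of `get_address`.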


In [48]:
addresses[150:170]

['2570, Kent Street, Echo Park, Los Angeles, Los Angeles County, California, 90026',
 '697, Parkman Avenue, Los Angeles, Los Angeles County, California, 90026',
 'Holy Virgin Mary Russian Orthodox Cathedral, Micheltorena Street, Silver Lake, East Hollywood, Los Angeles, Los Angeles County, California, 90026-2106',
 '3167, East 3rd Street, Boyle Heights, Los Angeles, Los Angeles County, California, 90063',
 '2844, East 2nd Street, Boyle Heights, Los Angeles, Los Angeles County, California, 90033',
 'Severa Synagogue, East 2nd Street, Boyle Heights, Los Angeles, Los Angeles County, California, 90033',
 '2148, East 2nd Street, Brooklyn Heights, Boyle Heights, Los Angeles, Los Angeles County, California, 90033',
 '1885, East 2nd Street, Brooklyn Heights, Boyle Heights, Los Angeles, Los Angeles County, California, 90033',
 '199, Cancion Way, Pueblo Del Sol, Boyle Heights, Los Angeles, Los Angeles County, California, 90033',
 '242, Mission Road, Boyle Heights, Los Angeles, Los Angeles County

Let's now place all this into a Pandas dataframe.

In [117]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"3331, Thomas Street, Lincoln Heights, Los Ange...",34.082312,-118.204106,-3961762.0,15044230.0,5992.495307
1,"362, Avenue 33, Lincoln Heights, Los Angeles, ...",34.084526,-118.208553,-3961162.0,15044230.0,5840.3767
2,"Metro Park and Ride, Pasadena Avenue, Highland...",34.086739,-118.213,-3960562.0,15044230.0,5747.173218
3,"453, Isabel Street, Cypress Park, Los Angeles,...",34.088952,-118.217448,-3959962.0,15044230.0,5715.767665
4,"3344, Pepper Avenue, Cypress Park, Los Angeles...",34.091165,-118.221896,-3959362.0,15044230.0,5747.173218
5,"2679, Loosmore Street, Cypress Park, Los Angel...",34.093378,-118.226344,-3958762.0,15044230.0,5840.3767
6,"2623, Arvia Street, Cypress Park, Los Angeles,...",34.095592,-118.230793,-3958162.0,15044230.0,5992.495307
7,"Abraham Lincoln High School, Lincoln Park Aven...",34.075789,-118.199741,-3962662.0,15044750.0,5855.766389
8,"2755, Alta Street, Lincoln Heights, Montecito ...",34.078002,-118.204187,-3962062.0,15044750.0,5604.462508
9,"472, Avenue 28, Lincoln Heights, Los Angeles, ...",34.080215,-118.208633,-3961462.0,15044750.0,5408.326913
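
Rows 0 and 1 of the table are neighbors in the same grid row (their X values differ by 600), so their centers should be about 600 m apart on the ground. A quick check is to compute the great-circle distance between them. The sketch below uses the haversine formula on a spherical Earth of radius 6,371 km, which is an approximation, with the rounded coordinates shown in the table.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Rows 0 and 1 of the table above (rounded values as printed).
d = haversine_m(34.082312, -118.204106, 34.084526, -118.208553)
print(round(d), 'meters between the first two candidate centers')
```

With these inputs the distance comes out at roughly 480 m rather than 600 m. That is consistent with the scale distortion expected from projecting Los Angeles with UTM zone 33, and it is best read as a hint to re-check the zone rather than as a precise measurement.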


We keep only the 10 most frequent categories, since the categories after the tenth appear only once or twice and have little effect on the final result.

In [56]:
ten_categories = nearby_venues_grouped_rank.iloc[:10]['categories']
category_ids = ['4bf58dd8d48988d145941735', '4bf58dd8d48988d16a941735', '4bf58dd8d48988d1d0941735', '4bf58dd8d48988d1f5931735', '52e81612bcbc57f1066b7a0c', '4bf58dd8d48988d110951735', '4bf58dd8d48988d110941735', '52af3b593cf9994f4e043c00', '4bf58dd8d48988d1ed941735', '4bf58dd8d48988d1d1941735']
ten_categories

0     Chinese Restaurant
1                 Bakery
2           Dessert Shop
3     Dim Sum Restaurant
4        Bubble Tea Shop
5     Salon / Barbershop
6     Italian Restaurant
7    Shanghai Restaurant
8                    Spa
9           Noodle House
Name: categories, dtype: object

In [59]:
df_address = df_locations[df_locations.columns[:3]] 
df_address

Unnamed: 0,Address,Latitude,Longitude
0,"3331, Thomas Street, Lincoln Heights, Los Ange...",34.082312,-118.204106
1,"362, Avenue 33, Lincoln Heights, Los Angeles, ...",34.084526,-118.208553
2,"Metro Park and Ride, Pasadena Avenue, Highland...",34.086739,-118.213000
3,"453, Isabel Street, Cypress Park, Los Angeles,...",34.088952,-118.217448
4,"3344, Pepper Avenue, Cypress Park, Los Angeles...",34.091165,-118.221896
...,...,...,...
359,"753, East 33rd Street, Historic South-Central,...",34.016206,-118.263594
360,"416, East 32nd Street, Historic South-Central,...",34.018413,-118.268042
361,"129, West 32nd Street, Historic South-Central,...",34.020620,-118.272490
362,"Harbor Freeway, Historic South-Central, Los An...",34.022828,-118.276939


Let's explore the first two categories for now, since these two ("Chinese Restaurant" and "Bakery") are the most important and occur with high frequency.
Using Foursquare, we can search for venues of a specific category around each position. The total number of venues returned is what we call the frequency here.
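
The counting step can be isolated in a small helper, sketched below without any network call. The function name `count_venues` and the sample payloads are hypothetical; the sketch only assumes the `{'response': {'venues': [...]}}` shape that this notebook reads from the search response, and treats a missing key as zero venues.

```python
def count_venues(payload):
    """Return the number of venues in a Foursquare-style search payload.

    Assumes the shape {'response': {'venues': [...]}}; a payload without
    these keys is counted as zero venues (an assumption made for robustness).
    """
    venues = payload.get('response', {}).get('venues', [])
    return len(venues)


# Hypothetical payloads standing in for requests.get(url).json()
sample = {'response': {'venues': [{'name': 'A'}, {'name': 'B'}, {'name': 'C'}]}}
empty = {'meta': {'code': 400}}

print(count_venues(sample))  # 3
print(count_venues(empty))   # 0
```

Keeping the counting separate from the request makes it possible to check the logic on saved responses before spending API quota.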

In [83]:
# Work on an explicit copy so that adding columns does not trigger SettingWithCopyWarning
df_address = df_address.copy()
radius = 1000  # search radius in meters
LIMIT = 50     # maximum number of venues returned per request

for j in range(2):
    category_id = category_ids[j]
    category_name = ten_categories[j]
    category_frequency = []
    for i in range(df_locations.shape[0]):
        first_lat = df_locations.loc[i, 'Latitude']
        first_lng = df_locations.loc[i, 'Longitude']
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, first_lat, first_lng, VERSION, category_id, radius, LIMIT)

        results = requests.get(url).json()
        # assign relevant part of JSON to venues
        venues = results['response']['venues']

        # transform venues into a dataframe and count its rows
        dataframe3 = json_normalize(venues)
        category_frequency.append(dataframe3.shape[0])
    df_address[category_name] = category_frequency


In [63]:
df_address

Unnamed: 0,Address,Latitude,Longitude,Chinese Restaurant,Bakery
0,"3331, Thomas Street, Lincoln Heights, Los Ange...",34.082312,-118.204106,0,0
1,"362, Avenue 33, Lincoln Heights, Los Angeles, ...",34.084526,-118.208553,0,0
2,"Metro Park and Ride, Pasadena Avenue, Highland...",34.086739,-118.213000,1,3
3,"453, Isabel Street, Cypress Park, Los Angeles,...",34.088952,-118.217448,1,2
4,"3344, Pepper Avenue, Cypress Park, Los Angeles...",34.091165,-118.221896,1,2
...,...,...,...,...,...
359,"753, East 33rd Street, Historic South-Central,...",34.016206,-118.263594,2,9
360,"416, East 32nd Street, Historic South-Central,...",34.018413,-118.268042,3,8
361,"129, West 32nd Street, Historic South-Central,...",34.020620,-118.272490,7,6
362,"Harbor Freeway, Historic South-Central, Los An...",34.022828,-118.276939,7,7


From the above dataframe, we can see that many locations have few Chinese restaurants nearby. We want similar neighborhoods, and in New York our restaurant has 18 Chinese restaurants and 7 bakeries nearby. So let's keep only the addresses where the number of nearby Chinese restaurants is strictly between 13 and 23 (18 ± 5) and the number of nearby bakeries is strictly between 4 and 10 (7 ± 3).

In [79]:
df_category_one = df_address[df_address['Chinese Restaurant'] > 13]
df_category_one = df_category_one[df_category_one['Chinese Restaurant'] < 23].reset_index(drop=True)
df_category_one.shape

(48, 5)

In [80]:
df_category_two = df_category_one[df_category_one['Bakery'] > 4]
df_category_two = df_category_two[df_category_two['Bakery'] < 10].reset_index(drop=True)
df_category_two

Unnamed: 0,Address,Latitude,Longitude,Chinese Restaurant,Bakery
0,"Solano Avenue, Elysian Park, Chinatown, Los An...",34.074032,-118.231186,14,8
1,"North Main Street, Mission Junction LA, Chinat...",34.067509,-118.226816,16,9
2,"Dodger Stadium, 1000, Vin Scully Avenue, Elysi...",34.074144,-118.24016,14,5
3,"1480, North Boylston Street, Elysian Park, Los...",34.076355,-118.24461,14,6
4,"431, Leroy Street, Mission Junction LA, Chinat...",34.063198,-118.226893,19,9
5,"Vin Scully Avenue, Elysian Park, Los Angeles, ...",34.072043,-118.244686,14,7
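
The two filtering steps above can also be expressed as a single boolean mask. The sketch below is one possible way to do it, not the notebook's own code. The helper `strictly_within` is hypothetical, and the small dataframe reuses a few rows from the tables above with shortened address labels.

```python
import pandas as pd

# A few rows copied from the tables above (address labels shortened)
df = pd.DataFrame({
    'Address': ['Thomas Street', 'Harbor Freeway', 'Solano Avenue',
                'North Main Street', 'Leroy Street'],
    'Chinese Restaurant': [0, 7, 14, 16, 19],
    'Bakery': [0, 7, 8, 9, 9],
})


def strictly_within(series, center, tolerance):
    """Boolean mask for center - tolerance < value < center + tolerance."""
    return (series > center - tolerance) & (series < center + tolerance)


# New York reference counts: 18 Chinese restaurants and 7 bakeries
mask = (strictly_within(df['Chinese Restaurant'], 18, 5)
        & strictly_within(df['Bakery'], 7, 3))
similar = df[mask].reset_index(drop=True)
print(similar['Address'].tolist())
# ['Solano Avenue', 'North Main Street', 'Leroy Street']
```

Writing the window as a center and a tolerance keeps the link to the New York reference counts visible in the code.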


Now only 6 locations remain, which is a manageable shortlist. Next we repeat the above process for the remaining eight categories, counting the nearby venues of each category around these six locations.

In [81]:
radius = 1000  # search radius in meters
LIMIT = 50     # maximum number of venues returned per request

for j in range(2, 10):
    category_id = category_ids[j]
    category_name = ten_categories[j]
    category_frequency = []
    for i in range(df_category_two.shape[0]):
        first_lat = df_category_two.loc[i, 'Latitude']
        first_lng = df_category_two.loc[i, 'Longitude']
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&categoryId={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, first_lat, first_lng, VERSION, category_id, radius, LIMIT)

        results = requests.get(url).json()
        # assign relevant part of JSON to venues
        venues = results['response']['venues']

        # transform venues into a dataframe and count its rows
        dataframe3 = json_normalize(venues)
        category_frequency.append(dataframe3.shape[0])
    df_category_two[category_name] = category_frequency

In [93]:
df_category_two

Unnamed: 0,Address,Latitude,Longitude,Chinese Restaurant,Bakery,Dessert Shop,Dim Sum Restaurant,Bubble Tea Shop,Salon / Barbershop,Italian Restaurant,Shanghai Restaurant,Spa,Noodle House
0,"Solano Avenue, Elysian Park, Chinatown, Los An...",34.074032,-118.231186,14,8,3,3,0,8,2,0,1,1
1,"North Main Street, Mission Junction LA, Chinat...",34.067509,-118.226816,16,9,3,3,1,12,0,0,2,3
2,"Dodger Stadium, 1000, Vin Scully Avenue, Elysi...",34.074144,-118.24016,14,5,2,3,0,11,2,0,1,0
3,"1480, North Boylston Street, Elysian Park, Los...",34.076355,-118.24461,14,6,2,3,0,12,2,0,2,0
4,"431, Leroy Street, Mission Junction LA, Chinat...",34.063198,-118.226893,19,9,4,4,1,11,0,0,4,3
5,"Vin Scully Avenue, Elysian Park, Los Angeles, ...",34.072043,-118.244686,14,7,2,3,0,11,3,0,2,0


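After getting the above dataframe, we need to measure how similar each location is to the original restaurant's environment in New York. For each location we subtract the New York frequency from the local frequency in every category, take the absolute value, and sum over the ten categories. We call this sum the weight. The smaller the weight, the more similar the location, so we choose the optimal location according to it.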
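
To make the weight concrete, here is a minimal sketch restricted to the two categories whose New York counts are stated earlier (18 Chinese restaurants and 7 bakeries). The function name `weight_of` is hypothetical. Because only two of the ten categories are used, the printed values are partial weights and differ from the full weights computed in the notebook.

```python
def weight_of(candidate, reference):
    """Sum of absolute differences between two equal-length count lists."""
    if len(candidate) != len(reference):
        raise ValueError('count lists must have the same length')
    return sum(abs(c - r) for c, r in zip(candidate, reference))


# Counts for (Chinese Restaurant, Bakery) only
ny = [18, 7]       # reference restaurant in New York
solano = [14, 8]   # "Solano Avenue" row of the table above
leroy = [19, 9]    # "431, Leroy Street" row of the table above

print(weight_of(solano, ny))  # 5
print(weight_of(leroy, ny))   # 3
```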

In [84]:
categories_freq_ny = nearby_venues_grouped_rank.iloc[:10]['name']

In [102]:
# An explicit copy avoids SettingWithCopyWarning when the weight column is added
final_df = df_category_two[df_category_two.columns[:3]].copy()
weight = []
for i in range(df_category_two.shape[0]):
    value = 0
    # columns 3 to 12 hold the counts of the ten categories
    for j in range(3, 13):
        value += abs(df_category_two.iloc[i, j] - categories_freq_ny[j-3])
    weight.append(value)
final_df['weight'] = weight


In [103]:
final_df

Unnamed: 0,Address,Latitude,Longitude,weight
0,"Solano Avenue, Elysian Park, Chinatown, Los An...",34.074032,-118.231186,25
1,"North Main Street, Mission Junction LA, Chinat...",34.067509,-118.226816,26
2,"Dodger Stadium, 1000, Vin Scully Avenue, Elysi...",34.074144,-118.24016,31
3,"1480, North Boylston Street, Elysian Park, Los...",34.076355,-118.24461,30
4,"431, Leroy Street, Mission Junction LA, Chinat...",34.063198,-118.226893,22
5,"Vin Scully Avenue, Elysian Park, Los Angeles, ...",34.072043,-118.244686,27
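
The lowest weight in the table above identifies the closest match. A minimal sketch of that selection step follows. The list `weights` is a hand-copied stand-in for `final_df`, with each address shortened to its leading part.

```python
# (address, weight) pairs copied from the table above
weights = [
    ('Solano Avenue', 25),
    ('North Main Street', 26),
    ('Dodger Stadium', 31),
    ('1480, North Boylston Street', 30),
    ('431, Leroy Street', 22),
    ('Vin Scully Avenue', 27),
]

# The location with the smallest weight is the most similar to New York
best = min(weights, key=lambda pair: pair[1])
print(best)  # ('431, Leroy Street', 22)
```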


## Results and Discussion <a name="results"></a>

From the above we can see that six locations satisfy our filter on the first two categories, "Chinese Restaurant" and "Bakery", which means that these six locations have a similar number of nearby venues in those categories. Combining the other eight categories and calculating the weight, we find that the optimal location is "431, Leroy Street, Mission Junction LA" (latitude 34.063198, longitude -118.226893): it has the smallest weight, 22, so its neighborhood is the most similar to that of the restaurant in New York City. Thus, if "Deluxe Green Bo Restaurant" wants to open a chain restaurant in Los Angeles, opening around "431, Leroy Street, Mission Junction LA" should be an optimal choice.

## Conclusion <a name="conclusion"></a>

The purpose of this project is to find an optimal location in Los Angeles for a successful Chinese restaurant in New York to open a chain restaurant. The first step is to find a good restaurant to focus on; if you already know which brand you want to focus on (not just a restaurant, but any store or brand), you can skip this step. After deciding on the restaurant, we explored its neighborhood and then looked for a place in Los Angeles whose neighborhood has a similar number of venues in each category. After the analysis, we find that "431, Leroy Street, Mission Junction LA" is an optimal place for "Deluxe Green Bo Restaurant" to open a chain restaurant.