# Capstone Project - The Battle of Neighborhoods - Search for Chinese Restaurants

## Introduction  
  A Chinese restaurant is an establishment that serves Chinese cuisine. 
  
  Chinese cuisine is an important part of Chinese culture, which includes cuisine originating from the diverse regions of China, as well as from Overseas Chinese who have settled in other parts of the world. Chinese food staples such as rice, dumplings, noodles, tea, and tofu, and utensils such as chopsticks and the wok, can now be found worldwide.  
  
  Many people, including myself, love Chinese food and always look for Chinese restaurants when traveling. However, Chinese cuisine is very diverse. The eight major traditions of Chinese cuisine are Shandong, Sichuan, Cantonese, Fujian, Jiangsu, Zhejiang, Hunan and Anhui cuisine, not to mention many other regional traditions. On Foursquare, Chinese restaurants are also spread across different categories, such as dim sum restaurants, dumpling restaurants, etc.  
  
  Therefore, if we simply search for "Chinese restaurants", some restaurants that actually serve Chinese cuisine will be left out. In addition, some Foursquare venues lack an address or postalCode, both of which are essential restaurant information. 
  
  So, the purpose of this project is to collect the name, address and postalCode of all (or at least most) Chinese restaurants around any location worldwide, for those who want to look up Chinese restaurants near a specific place or share that information with others.

## Data
#### Coordinates data 
Source: the geopy library (https://geopy.readthedocs.io/en/stable/) is used to get coordinates data.  

Description: Includes the latitude, longitude, address, etc. of the searched places and restaurants. Latitude and longitude can be used to get the postalCode and address of the desired places; postalCode and address are part of the Chinese restaurant information provided in the final product, and the coordinates are also used to map the restaurants.

#### Restaurants data
Source: Foursquare API

Description: The Foursquare API is used to search for restaurants around a location and retrieve details such as name, category, address, postalCode, etc. The data is then filtered to keep only clean Chinese restaurant information.

## Methodology section  

This section is the main component of the report, where the data is gathered and prepared for analysis and mapping. 

### Approach

Use the geopy library to get the latitude and longitude values of locations.  

Use the Foursquare API to search for all restaurants around a location.  

Clean the JSON and structure it into a pandas dataframe.  

Get all Chinese restaurant information, which involves:  
* Create a list of all Chinese restaurant categories in Foursquare.
* Use the geopy library to replace NaN values in the 'postalCode' and 'address' columns.
* Select the columns most relevant to the audience for final display.

Use Folium to create a map of restaurants with names superimposed on top.

## Methodology Execution

#### Import necessary Libraries

In [1]:
import pandas as pd
import numpy as np
import requests
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
from pandas import json_normalize # transforms JSON into a flat pandas dataframe (pandas >= 1.0; pandas.io.json.json_normalize is deprecated)
print('Libraries imported.')

Libraries imported.


### Search all restaurants around a location. 
  Since the purpose of this project is to get Chinese restaurant information around any location worldwide, the final project is tested on some of the most visited cities, selected from the **List of cities by international visitors** https://en.wikipedia.org/wiki/List_of_cities_by_international_visitors   
  
  But first, to build this project, Los Angeles, CA is chosen as the starting point, since it is known to have many Chinese restaurants.  

#### Specify a searching location
Note: location can be changed.

In [2]:
address = 'Los Angeles, CA' # This can be changed to your interested city or location.

#### Use geopy library to get the latitude and longitude values of the searching location.

In [3]:
geolocator = Nominatim(user_agent="restaurant_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of Los Angeles, CA are 34.0536909, -118.242766.


#### Utilizing the Foursquare API to search all restaurants around a location.
We search for "restaurant" rather than "Chinese restaurant" so that Chinese restaurants filed under other categories, such as dim sum restaurants, are not left out.

In [4]:
# @hidden_cell
# Define Foursquare Credentials and Version
CLIENT_ID = '######' # Your Foursquare client_id
CLIENT_SECRET = '######' # Your Foursquare client_secret
VERSION = '20180605' # Foursquare API version

In [5]:
# Define search_query, radius and limit
search_query = 'restaurant'
radius = 10000 # in meters (10 km), roughly a 6-minute drive at 60 mph; the radius can be changed.
LIMIT = 100
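As a rough check of the comment on `radius`, the conversion below assumes a straight-line 10 km distance driven at a constant 60 mph; real routes are longer and speeds vary, so treat the result only as a ballpark figure. `drive_minutes` is a hypothetical helper, not part of the project.

```python
# Convert the search radius (meters) into a drive time at a constant speed.
# Assumes straight-line distance; actual driving routes are longer.
METERS_PER_MILE = 1609.344

def drive_minutes(radius_m: float, speed_mph: float) -> float:
    """Minutes needed to cover radius_m meters at speed_mph."""
    miles = radius_m / METERS_PER_MILE
    return miles / speed_mph * 60

print(round(drive_minutes(10000, 60), 1))  # 6.2
```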

In [6]:
# Define URL
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=######&client_secret=######&ll=34.0536909,-118.242766&v=20180605&query=restaurant&radius=10000&limit=100'
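Building the query string with `.format()` works for these values, but a multi-word `search_query` (or any value containing `&` or spaces) would need escaping. A minimal sketch using the standard library's `urllib.parse.urlencode`, with placeholder credentials and a hypothetical helper name `build_search_url`, produces an equivalent, properly escaped URL. Passing the same dictionary as `params=` to `requests.get` achieves the same thing.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build a Foursquare v2 venues/search URL with escaped parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes each value, e.g. spaces become '+' and ',' becomes '%2C'
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

url = build_search_url('MY_ID', 'MY_SECRET', 34.0536909, -118.242766,
                       '20180605', 'dim sum', 10000, 100)
print(url)
```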

Send GET request and examine results.

In [7]:
results = requests.get(url).json()
# Get relevant part of JSON. 
venues = results['response']['venues']
# Print sample JSON data.
venues[1]

{'id': '49ebc74af964a5202b671fe3',
 'name': 'Yang Chow Restaurant',
 'location': {'address': '819 N Broadway',
  'crossStreet': 'btwn College & Alpine',
  'lat': 34.06292584487055,
  'lng': -118.2380586952736,
  'labeledLatLngs': [{'label': 'display',
    'lat': 34.06292584487055,
    'lng': -118.2380586952736},
   {'label': 'entrance', 'lat': 34.062969, 'lng': -118.238165}],
  'distance': 1115,
  'postalCode': '90012',
  'cc': 'US',
  'city': 'Los Angeles',
  'state': 'CA',
  'country': 'United States',
  'formattedAddress': ['819 N Broadway (btwn College & Alpine)',
   'Los Angeles, CA 90012',
   'United States']},
 'categories': [{'id': '4bf58dd8d48988d145941735',
   'name': 'Chinese Restaurant',
   'pluralName': 'Chinese Restaurants',
   'shortName': 'Chinese',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
    'suffix': '.png'},
   'primary': True}],
 'venuePage': {'id': '48833648'},
 'referralId': 'v-1621229225',
 'hasPerk': False}

### Clean the json and structure it into a *pandas* dataframe.

Build a function to extract the category of the venue.

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    if len(row) == 0:
        return None
    else:
        return row[0]['name']
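The function above takes the first listed category. The sample venue JSON shown earlier also flags one entry with `'primary': True`; a variant that prefers that entry and falls back to the first one might look like this (the name `get_primary_category` is hypothetical, and the sample data is made up):

```python
def get_primary_category(categories):
    """Return the name of the primary category, else the first one, else None."""
    if not categories:
        return None
    for category in categories:
        if category.get('primary'):
            return category['name']
    return categories[0]['name']

sample = [{'name': 'Asian Restaurant'},
          {'name': 'Dim Sum Restaurant', 'primary': True}]
print(get_primary_category(sample))  # Dim Sum Restaurant
print(get_primary_category([]))      # None
```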

#### Flatten the JSON and keep the relevant columns.

In [9]:
## Process JSON and convert it to a clean dataframe
venues_df = json_normalize(venues) # flatten JSON
# filter columns; reindex() returns a new dataframe (avoiding SettingWithCopyWarning)
# and fills any column missing from this response with NaN instead of raising a KeyError
filtered_columns = ['name', 'categories', 'location.address', 'location.distance', 'location.postalCode','location.lat', 'location.lng']
venues_df = venues_df.reindex(columns=filtered_columns)
# keep only the first category name for each row
venues_df['categories'] = venues_df['categories'].apply(get_category_type)
# clean column names, e.g. 'location.address' -> 'address'
venues_df.columns = [col.split('.')[-1] for col in venues_df.columns]
venues_df.head()

  


| | name | categories | address | distance (m) | postalCode | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | Noe Restaurant & Bar | French Restaurant | 251 S Olive St | 771 | 90012 | 34.05229 | -118.250963 |
| 1 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 | 34.062926 | -118.238059 |
| 2 | Mayflower Seafood Restaurant | Seafood Restaurant | 679 N Spring St | 782 | 90012 | 34.059582 | -118.238129 |
| 3 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 | 34.066155 | -118.237916 |
| 4 | Traxx Restaurant | New American Restaurant | 800 N Alameda St | 669 | 90012 | 34.056232 | -118.236183 |
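`json_normalize` turns nested keys into dot-separated column names such as `location.postalCode`, which is why the cell above keeps only the part after the final dot. The simplified stdlib sketch below (a hypothetical `flatten` helper that handles nested dicts only and ignores `json_normalize` options such as `record_path` and `max_level`) shows where those names come from. Like `json_normalize` without `record_path`, it leaves list values such as `categories` as-is.

```python
def flatten(record, prefix=''):
    """Flatten nested dicts into {'parent.child': value} pairs."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, name + '.'))
        else:
            flat[name] = value  # scalars and lists are kept unchanged
    return flat

venue = {'name': 'Yang Chow Restaurant',
         'location': {'address': '819 N Broadway', 'postalCode': '90012'}}
flat = flatten(venue)
print(list(flat))                            # ['name', 'location.address', 'location.postalCode']
print([col.split('.')[-1] for col in flat])  # ['name', 'address', 'postalCode']
```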


#### Keep only venues whose 'categories' contain 'Restaurant'.  

In [10]:
# keep only venues whose 'categories' contain 'Restaurant'
restaurants = venues_df[venues_df['categories'].astype(str).str.contains('Restaurant')].reset_index(drop=True)
restaurants.head()

| | name | categories | address | distance (m) | postalCode | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | Noe Restaurant & Bar | French Restaurant | 251 S Olive St | 771 | 90012 | 34.05229 | -118.250963 |
| 1 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 | 34.062926 | -118.238059 |
| 2 | Mayflower Seafood Restaurant | Seafood Restaurant | 679 N Spring St | 782 | 90012 | 34.059582 | -118.238129 |
| 3 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 | 34.066155 | -118.237916 |
| 4 | Traxx Restaurant | New American Restaurant | 800 N Alameda St | 669 | 90012 | 34.056232 | -118.236183 |


### Get all Chinese restaurants information.  
  Now that we have information on all the restaurants around the location, it's time to narrow them down to the categories we want.  
#### Check 'categories' to see which unique restaurant categories exist around this location.

In [11]:
# get unique 'categories'
restaurants['categories'].unique()

array(['French Restaurant', 'Chinese Restaurant', 'Seafood Restaurant',
       'New American Restaurant', 'Mexican Restaurant',
       'Asian Restaurant', 'American Restaurant',
       'Latin American Restaurant', 'Italian Restaurant',
       'Korean BBQ Restaurant', 'Sushi Restaurant',
       'Mediterranean Restaurant', 'Indian Restaurant', 'Restaurant',
       'Korean Restaurant', 'Japanese Restaurant', 'Thai Restaurant',
       'Spanish Restaurant', 'Dim Sum Restaurant'], dtype=object)

  There's a good chance that an 'Asian Restaurant' serves Chinese food, but it could also be a Japanese or Korean restaurant. For this project, I'll keep this category.   
  Running the code above on several different locations, I also found other relevant categories, such as 'Dim Sum Restaurant', 'Hotpot Restaurant', 'Peking Duck Restaurant', 'Dumpling Restaurant', etc. Therefore, those categories will be added to **chinese_restaurants_list** in the following code cell.  

  To be more thorough, some other Chinese cuisines mentioned in **List of Chinese cuisine**: https://en.wikipedia.org/wiki/Chinese_cuisine will also be added to **chinese_restaurants_list**. I've added the eight major cuisines and some other categories, based on my personal preference, from the list below:
#### The eight major traditions of Chinese cuisine:  
Shandong cuisine, Sichuan cuisine, Cantonese cuisine, Fujian cuisine, Jiangsu cuisine, Zhejiang cuisine, Hunan cuisine, Anhui cuisine
#### Other traditions in Chinese cuisine: 
Beijing cuisine, Chinese imperial cuisine, Guizhou cuisine, Henan cuisine, Huaiyang cuisine, Hubei cuisine, Jiangxi cuisine, Shaanxi cuisine, Shanghai cuisine, Shanxi cuisine, Teochew cuisine

#### Create a list that includes other categories not listed as 'Chinese Restaurant' in Foursquare.  
Note: You can modify **chinese_restaurants_list** to include more or fewer categories, or even create your own list, such as japanese_restaurants_list, korean_restaurants_list, indian_restaurants_list, italian_restaurants_list, etc.
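Following the ' Restaurant' suffix convention used in the list below, a small hypothetical helper can turn cuisine names into category strings, and a set lookup (the same idea as pandas' `Series.isin`) can filter venues by them. Generated names only help if Foursquare actually uses them: the list below spells one category 'Szechuan Restaurant', so a generated 'Sichuan Restaurant' would match nothing, and names such as 'Ramen Restaurant' here are assumptions to check against real results. The sample rows are made up for illustration.

```python
def make_category_list(cuisines, suffix=' Restaurant'):
    """Turn cuisine names like 'Japanese' into Foursquare-style category names."""
    return [name.strip() + suffix for name in cuisines]

def filter_by_category(rows, categories):
    """Keep rows whose 'categories' value is one of the given names."""
    wanted = set(categories)  # constant-time membership checks
    return [row for row in rows if row['categories'] in wanted]

japanese_restaurants_list = make_category_list(['Japanese', 'Sushi', 'Ramen'])
print(japanese_restaurants_list)
# ['Japanese Restaurant', 'Sushi Restaurant', 'Ramen Restaurant']

rows = [{'name': 'A', 'categories': 'Sushi Restaurant'},
        {'name': 'B', 'categories': 'Thai Restaurant'}]
print([row['name'] for row in filter_by_category(rows, japanese_restaurants_list)])
# ['A']
```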

In [12]:
# create a Chinese restaurant list based on my preference.
chinese_restaurants_list = ['Dim Sum Restaurant', 
                            'Szechuan Restaurant',
                            'Chinese Restaurant', 
                            'Cantonese Restaurant', 
                            'Asian Restaurant',  # this category may include other Asian Restaurant, such as Japanese Restaurant, Korean Restaurant, etc.
                            'Hotpot Restaurant',
                            'Peking Duck Restaurant',
                            'Shanxi Restaurant',
                            'Beijing Restaurant',
                            'Xinjiang Restaurant',
                            'Hong Kong Restaurant',
                            'Dumpling Restaurant',
                            'Yunnan Restaurant',
                            'Shandong Restaurant',
                            'Fujian Restaurant', 
                            'Jiangsu Restaurant',
                            'Zhejiang Restaurant',
                            'Hunan Restaurant',
                            'Anhui Restaurant',
                           ] # this can be changed according to personal preference; just be sure to add ' Restaurant' as a suffix
chinese_restaurants = restaurants[restaurants['categories'].isin(chinese_restaurants_list)].reset_index(drop=True)
chinese_restaurants.head()

| | name | categories | address | distance (m) | postalCode | lat | lng |
|---|---|---|---|---|---|---|---|
| 0 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 | 34.062926 | -118.238059 |
| 1 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 | 34.066155 | -118.237916 |
| 2 | Wok Inn Restaurant | Asian Restaurant | 201 N Los Angeles St Ste 102 | 116 | 90012 | 34.054317 | -118.241752 |
| 3 | Taipan Restaurant | Chinese Restaurant | 330 S Hope St | 889 | 90071 | 34.052405 | -118.252288 |
| 4 | Won Kok Restaurant | Chinese Restaurant | 210 Alpine St | 1040 | 90012 | 34.061958 | -118.237509 |


#### Replace NaN value in 'postalCode' and 'address' columns.  
When testing the code on some locations, such as 'Los Angeles, CA', **NaN values** show up in postalCode or address. These NaN values are filled in by reverse geocoding each restaurant's coordinates with the geopy library.  

In [13]:
# format 'lat' and 'lng' as strings with 6 decimal places,
# then combine them into a new 'lat_lng' column used for reverse geocoding
chinese_restaurants['lat'] = chinese_restaurants['lat'].apply(lambda x: '%.6f' % x)
chinese_restaurants['lng'] = chinese_restaurants['lng'].apply(lambda x: '%.6f' % x)
chinese_restaurants['lat_lng'] = chinese_restaurants['lat'] + ', ' + chinese_restaurants['lng']
chinese_restaurants.head()

| | name | categories | address | distance (m) | postalCode | lat | lng | lat_lng |
|---|---|---|---|---|---|---|---|---|
| 0 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 | 34.062926 | -118.238059 | 34.062926, -118.238059 |
| 1 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 | 34.066155 | -118.237916 | 34.066155, -118.237916 |
| 2 | Wok Inn Restaurant | Asian Restaurant | 201 N Los Angeles St Ste 102 | 116 | 90012 | 34.054317 | -118.241752 | 34.054317, -118.241752 |
| 3 | Taipan Restaurant | Chinese Restaurant | 330 S Hope St | 889 | 90071 | 34.052405 | -118.252288 | 34.052405, -118.252288 |
| 4 | Won Kok Restaurant | Chinese Restaurant | 210 Alpine St | 1040 | 90012 | 34.061958 | -118.237509 | 34.061958, -118.237509 |


Build functions to get postalCode or address.

In [14]:
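# build a function to get postalCode for a restaurant if none exists,
# using reverse geocoding on the restaurant's 'lat_lng' string.
def get_postcode(row):
    if pd.notna(row['postalCode']):
        return str(row['postalCode'])
    location = geolocator.reverse(row['lat_lng'])
    if location is None:
        return None
    # Nominatim returns structured address parts; read the postcode directly
    # instead of relying on its position in the formatted address string.
    return location.raw.get('address', {}).get('postcode')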

In [15]:
# build a function to get address for a restaurant if none exists.
def get_address(row):
    if pd.notna(row['address']):
        return str(row['address'])
    location = geolocator.reverse(row['lat_lng'])  # one request per missing address
    if location is None:
        return None
    # join the first two comma-separated parts of the formatted address,
    # typically the house number and the street name (e.g. '819' + ' North Broadway')
    parts = location.address.split(',')
    return ''.join(parts[:2])
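Picking the postcode out of Nominatim's formatted address by position is fragile: when a result has no postcode, `split(',')[-2]` silently returns a region name instead, and truncating with `[:7]` can cut longer postcodes such as UK ones. Nominatim also returns the address broken into named parts, which geopy exposes through `location.raw['address']`. The sketch below compares the two approaches on hand-written sample dictionaries shaped like a reverse-geocoding result (not real API output); both helper names are hypothetical.

```python
def postcode_from_raw(raw):
    """Return the postcode from a Nominatim result's raw dict, or None."""
    return raw.get('address', {}).get('postcode')

def postcode_from_display_name(display_name):
    """Position-based approach: second-to-last comma-separated part."""
    return display_name.split(',')[-2].strip()

# Hand-written samples shaped like Nominatim reverse results (not real output).
with_postcode = {
    'display_name': '819, North Broadway, Los Angeles, California, 90012, United States',
    'address': {'house_number': '819', 'road': 'North Broadway', 'postcode': '90012'},
}
without_postcode = {
    'display_name': 'Some Road, Some Town, Some Region, Some Country',
    'address': {'road': 'Some Road'},
}
print(postcode_from_display_name(with_postcode['display_name']))     # 90012
print(postcode_from_display_name(without_postcode['display_name']))  # Some Region (wrong)
print(postcode_from_raw(with_postcode), postcode_from_raw(without_postcode))  # 90012 None
```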

Apply functions to replace NaN value in 'postalCode' and 'address'.

In [16]:
# apply functions
chinese_restaurants['postalCode'] = chinese_restaurants.apply(lambda x: get_postcode(x), axis=1)
chinese_restaurants['address'] = chinese_restaurants.apply(lambda x: get_address(x), axis=1)
chinese_restaurants.head()

| | name | categories | address | distance (m) | postalCode | lat | lng | lat_lng |
|---|---|---|---|---|---|---|---|---|
| 0 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 | 34.062926 | -118.238059 | 34.062926, -118.238059 |
| 1 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 | 34.066155 | -118.237916 | 34.066155, -118.237916 |
| 2 | Wok Inn Restaurant | Asian Restaurant | 201 N Los Angeles St Ste 102 | 116 | 90012 | 34.054317 | -118.241752 | 34.054317, -118.241752 |
| 3 | Taipan Restaurant | Chinese Restaurant | 330 S Hope St | 889 | 90071 | 34.052405 | -118.252288 | 34.052405, -118.252288 |
| 4 | Won Kok Restaurant | Chinese Restaurant | 210 Alpine St | 1040 | 90012 | 34.061958 | -118.237509 | 34.061958, -118.237509 |
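Each missing value triggers a reverse-geocoding request, and Nominatim's public usage policy allows at most one request per second. geopy ships a `RateLimiter` class in `geopy.extra.rate_limiter` for this purpose; the stdlib-only sketch below (the wrapper name `throttled_cached` is hypothetical) illustrates the same idea plus caching: calls are spaced at least `min_delay` seconds apart, and repeated coordinates are looked up only once. A stand-in function replaces `geolocator.reverse` so the example runs offline.

```python
import functools
import time

def throttled_cached(func, min_delay=1.0):
    """Wrap func so real calls are >= min_delay seconds apart and results are cached."""
    last_call = [float('-inf')]  # mutable cell: time of the previous real call

    @functools.lru_cache(maxsize=None)
    def wrapper(query):
        wait = min_delay - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(query)

    return wrapper

calls = []
def fake_reverse(lat_lng):
    """Stand-in for geolocator.reverse that records its argument."""
    calls.append(lat_lng)
    return 'address for ' + lat_lng

reverse = throttled_cached(fake_reverse, min_delay=0.1)
reverse('34.062926, -118.238059')
reverse('34.062926, -118.238059')  # served from the cache, no second call
reverse('34.066155, -118.237916')
print(len(calls))  # 2
```

In the notebook, one could create `reverse = throttled_cached(geolocator.reverse)` once and call `reverse(row['lat_lng'])` in both helper functions, so a restaurant missing both postalCode and address costs only one request.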


#### Select columns most related to audience.

In [17]:
# Drop 'lat', 'lng', 'lat_lng' columns.
chinese_restaurants_display = chinese_restaurants.drop(columns = ['lat', 'lng', 'lat_lng'])
chinese_restaurants_display

| | name | categories | address | distance (m) | postalCode |
|---|---|---|---|---|---|
| 0 | Yang Chow Restaurant | Chinese Restaurant | 819 N Broadway | 1115 | 90012 |
| 1 | Full House Seafood Restaurant | Chinese Restaurant | 963 N Hill St | 1457 | 90012 |
| 2 | Wok Inn Restaurant | Asian Restaurant | 201 N Los Angeles St Ste 102 | 116 | 90012 |
| 3 | Taipan Restaurant | Chinese Restaurant | 330 S Hope St | 889 | 90071 |
| 4 | Won Kok Restaurant | Chinese Restaurant | 210 Alpine St | 1040 | 90012 |
| 5 | Keung Kee B.B.Q. Restaurant | Chinese Restaurant | 420 Ord St | 790 | 90012 |
| 6 | Little Sagion International Restaurant Group H... | Asian Restaurant | Los Angeles | 457 | 90013 |
| 7 | NBC Seafood Restaurant | Dim Sum Restaurant | 404 S Atlantic Blvd Ste A | 10124 | 91754 |
| 8 | Hop Woo BBQ Seafood Restaurant | Chinese Restaurant | 845 N Broadway | 1203 | 90012 |
| 9 | Hong Kong BBQ Restaurant | Chinese Restaurant | 803 N Broadway | 1067 | 90012 |


  We now have all the Chinese restaurant information for one location, 'Los Angeles, CA'.   
  I've also tested the code on several other locations, such as 'New York city, NY', 'Shanghai, China', 'London, UK', 'Tokyo, Japan', etc.   
  Results for a selection of test locations are displayed in the **Results section**.

### Create a map of restaurants with name superimposed on top.

Import libraries

In [18]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library
print("Libraries imported.")

Libraries imported.


#### Create map of restaurants.

In [19]:
# create map of restaurants using latitude and longitude values
map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=15)

# add markers to map
for lat, lng, label in zip(chinese_restaurants['lat'].astype(float), chinese_restaurants['lng'].astype(float), chinese_restaurants['name']):
    label = label.split(' (')[0]
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_restaurants)  
map_restaurants

This interactive 'Los Angeles, CA' Chinese restaurants map does not render on GitHub.    

Static maps of the other test locations are displayed in the **Results section**.
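With `zoom_start=15` the initial view spans only a few kilometres, so restaurants near the edge of the 10 km search radius (such as the one about 10 km away in the Los Angeles table) can start off-screen. Folium maps have a `fit_bounds` method that takes `[[south, west], [north, east]]`; a small hypothetical helper can compute those corners from the restaurant coordinates, which this notebook stores as '%.6f' strings (hence the `float()` calls).

```python
def bounding_box(lats, lngs):
    """Return [[min_lat, min_lng], [max_lat, max_lng]] for a set of points."""
    lats = [float(x) for x in lats]
    lngs = [float(y) for y in lngs]
    return [[min(lats), min(lngs)], [max(lats), max(lngs)]]

# Coordinates taken from the Los Angeles table, as strings like the notebook stores them.
bounds = bounding_box(['34.062926', '34.066155', '34.052405'],
                      ['-118.238059', '-118.237916', '-118.252288'])
print(bounds)  # [[34.052405, -118.252288], [34.066155, -118.237916]]
```

Calling `map_restaurants.fit_bounds(bounding_box(chinese_restaurants['lat'], chinese_restaurants['lng']))` would then zoom the map so every marker is visible.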

## Results section  
  Here's the list of test locations:  
* Bangkok, Thailand
* Paris, France
* Shenzhen, China
* Tokyo, Japan
### Chinese Restaurants information table

Bangkok, Thailand  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Bangkok%2C%20Thailand%20-%20table.png "Bangkok, Thailand - table")

Paris, France  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Paris%2C%20France%20-%20table.png "Paris, France - table")

Shenzhen, China  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Shenzhen%2C%20China%20-%20table.png "Shenzhen, China - table")

Tokyo, Japan  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Tokyo%2C%20Japan%20-%20table.png "Tokyo, Japan - table")

#### Running the code from this project produces clean dataframes of Chinese restaurant information, showing each restaurant's name, categories, address, distance and postalCode.  
### Chinese Restaurants Mapping

Bangkok, Thailand  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Bangkok%2C%20Thailand%20-%20map.png "Bangkok, Thailand - map")

Paris, France  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Paris%2C%20France%20-%20map.png "Paris, France - map")

Shenzhen, China  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Shenzhen%2C%20China%20-%20map.png "Shenzhen, China - map")

Tokyo, Japan  
![alt text](https://raw.githubusercontent.com/Faye0924/Coursera_Capstone/master/Capstone%20Project%20-%20Result%20tables%20%26%20maps/Tokyo%2C%20Japan%20-%20map.png "Tokyo, Japan - map")

#### Running the code from this project also displays a map of each search location with the names of nearby Chinese restaurants superimposed on top.

## Discussion section  

  For the test locations, more US and China locations were selected because:   
  the US is where we live, as well as the starting point for this project, and I wanted to make sure this project at least works in the US;   
  China should naturally have more Chinese restaurant categories, which is why I decided to include **The eight major traditions of Chinese cuisine** in my **chinese_restaurants_list**.  

  In the dataframe's 'name' and 'address' columns, I kept the original foreign-language text, because it might be useful when traveling. However, the Folium map didn't display foreign-language text correctly, and fixing that was beyond my ability for now, so I could only keep the English names on the map.

  This project gives a clear table showing restaurants' names, categories, addresses, distances and postalCodes around a certain location, as well as the restaurants' names superimposed on a neighborhood map.

  Also, this project can be modified to search for other restaurant categories. All you have to do is modify **chinese_restaurants_list** to your preferred categories; just be sure to follow the instructions in the code comment there.

## Conclusion section

  Overall, if you love Chinese food and are always looking for Chinese restaurants when traveling, like me, or if you are just sitting in front of a computer wanting to explore a location and find some restaurants, this project can give you some useful information.

  It's a shame that I don't have a Foursquare Premium account; otherwise, I would have made the effort to extract restaurants' ratings to make this project more valuable.

  This project has shown me a practical application to resolve a real world situation using Data Science tools.

  I feel rewarded by the effort and time spent. I believe this course, with all the topics it covers, is well worth it.

## Reference
Chinese restaurant: https://en.wikipedia.org/wiki/Chinese_restaurant  
Chinese cuisine: https://en.wikipedia.org/wiki/Chinese_cuisine  
List of cities by international visitors: https://en.wikipedia.org/wiki/List_of_cities_by_international_visitors