# Applied Data Science - Final Capstone - Week 4
This is the capstone project for the Applied Data Science course on Coursera. I will study Chinatowns in the USA.

## 1. Description of the problem and a discussion of the background

Chinatowns are cities within cities: adventurous, bustling, full of distinctive signage, street vendors selling unusual items, specialty shops, a noted lack of big chains, a variety of dialects being spoken, and multitudes of unique and exciting food choices.

The Chinese have been established in the United States since the mid-19th century, when laborers were needed for gold mining and railroad work, but the immigrant population also grew during the 1990s and 2000s; in fact, more than one-third of the Chinese immigrants now living in the U.S. arrived in 2000 or later. Currently, there are more than 3 million Chinese in America, according to the 2008 census report. Whether they left China because of poverty, famine, or political reasons, the Chinese have, across the decades, built strong communities that keep their ethnic heritage and shared identity. This well-maintained, rich culture is a defining reason that Chinatowns endure and are so appealing to residents and tourists alike.

Around the globe, there are Chinatowns in many major cities, from London (Europe's largest) to Vancouver (Canada's largest), Melbourne to Manila; and fortunately for us, there are many within the United States. Many of these districts share their community with other immigrant cultures, making the sights, sounds and eating choices that much more exotic. In a neighborhood where English is not the primary language, a visitor can feel as though they've left the U.S. altogether and have become the foreigner, a tourist in their own city.

**So what exactly makes a Chinatown great?**

To compile my capstone, I will look at the top 10 Chinatowns in America and analyze the quality of authentic dining options, size, the cultural experiences available, and whether a visitor will feel like they've left the United States as they explore the neighborhood.

## 2. Description of the data and how it will be used to solve the problem

### 2.1 Get the City List
I use a list from USA Today as the basis for my research. It offers a top 10 list of Chinatowns in the United States. I will explore all ten Chinatowns and compare them across the aspects described above.

> - San Francisco
> - New York City
> - Chicago
> - Seattle
> - Philadelphia
> - Honolulu
> - Boston
> - Los Angeles
> - Houston
> - Washington, D.C.
 
List source: https://www.usatoday.com/story/travel/destinations/2014/03/08/chinatown-chinese-asian-food/6173601/



### 2.2 Get the City Chinatown Geo Data
I will use the Geocoder Python package to get the latitude and longitude of the Chinatown in each city.
Data from: https://geocoder.readthedocs.io/index.html.

#### Example of getting geo data
The example uses the `Nominatim` geocoder from the geopy package to look up the Chinatown in New York City.

```python
from geopy.geocoders import Nominatim  # converts an address into latitude and longitude values

address = 'Chinatown, New York, NY'
geolocator = Nominatim(user_agent="ChinaTown_explorer")
location = geolocator.geocode(address)  # returns None when the address cannot be resolved
if location is None:
    raise ValueError('Could not geocode address: {}'.format(address))
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chinatown in New York City are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Chinatown in New York City are 40.7164913, -73.9962504.
```
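
The same lookup can be repeated for all ten cities. The sketch below is one possible way to do that, not the method used in the study: the query strings in `CHINATOWN_QUERIES` are assumptions (the source list only gives city names), and `geocode_all` is a hypothetical helper that accepts any geocoding callable, so it can be exercised without network access.

```python
from typing import Callable, Dict, Optional, Tuple

Coordinates = Tuple[float, float]

# Assumed query strings; the source list only names the cities.
CHINATOWN_QUERIES: Dict[str, str] = {
    "San Francisco": "Chinatown, San Francisco, CA",
    "New York City": "Chinatown, New York, NY",
    "Chicago": "Chinatown, Chicago, IL",
    "Seattle": "Chinatown, Seattle, WA",
    "Philadelphia": "Chinatown, Philadelphia, PA",
    "Honolulu": "Chinatown, Honolulu, HI",
    "Boston": "Chinatown, Boston, MA",
    "Los Angeles": "Chinatown, Los Angeles, CA",
    "Houston": "Chinatown, Houston, TX",
    "Washington, D.C.": "Chinatown, Washington, DC",
}


def geocode_all(
    queries: Dict[str, str],
    geocode: Callable[[str], Optional[object]],
) -> Dict[str, Optional[Coordinates]]:
    """Geocode every query; a city maps to None when no match is found.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None (as geopy's Nominatim.geocode does).
    """
    coordinates: Dict[str, Optional[Coordinates]] = {}
    for city, query in queries.items():
        location = geocode(query)
        if location is None:
            coordinates[city] = None
        else:
            coordinates[city] = (location.latitude, location.longitude)
    return coordinates
```

With geopy this could be called as `geocode_all(CHINATOWN_QUERIES, Nominatim(user_agent="ChinaTown_explorer").geocode)`. Because the public Nominatim service expects requests to be spaced out, wrapping the callable in `geopy.extra.rate_limiter.RateLimiter` with `min_delay_seconds=1` is probably a sensible precaution.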


### 2.3 Explore the neighborhoods using Foursquare API
Lastly, I will use the Foursquare API to explore the Chinatown neighborhoods and analyze them. I will get the top 1000 venues in each Chinatown within a radius of 1000 or 2000 meters.
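
To make the radius concrete, the following sketch computes the great-circle distance between a neighborhood center and a venue with the haversine formula. It is only an illustrative sanity check, not part of the original analysis: `haversine_m` is a hypothetical helper, and the two coordinate pairs are taken from the New York sample output in this section (the neighborhood center and Zu Yuan Spa).

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


center = (40.716491, -73.99625)   # New York Chinatown, from the sample output
venue = (40.715469, -73.998627)   # Zu Yuan Spa, from the sample output
distance = haversine_m(*center, *venue)
within_radius = distance <= 2000  # 2000 m is the radius used in the example request
print(within_radius)
```

The venue lies roughly 230 m from the center, well inside both the 1000 m and 2000 m radii.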

The following analyses will be performed on the data we get:
- Analyze the quality of authentic dining options, size, and cultural experiences.
- Compare restaurant venues with venues in other categories.
- Find the most similar Chinatowns among the ten cities.
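
The source does not say how similarity between Chinatowns will be measured. One possible approach, sketched below under that caveat, is to turn each neighborhood's venues into a category-frequency profile and compare profiles with cosine similarity. The function names and the toy data (neighborhoods "A", "B", "C") are hypothetical; only the column names match the venue table used in this section.

```python
import numpy as np
import pandas as pd


def category_profiles(venues: pd.DataFrame) -> pd.DataFrame:
    """One row per neighborhood holding the share of its venues in each category."""
    counts = pd.crosstab(venues["Neighborhood"], venues["Venue Category"])
    return counts.div(counts.sum(axis=1), axis=0)


def cosine_similarity_matrix(profiles: pd.DataFrame) -> pd.DataFrame:
    """Pairwise cosine similarity between the rows of `profiles`."""
    values = profiles.to_numpy(dtype=float)
    unit = values / np.linalg.norm(values, axis=1, keepdims=True)
    return pd.DataFrame(unit @ unit.T, index=profiles.index, columns=profiles.index)


def most_similar_pair(similarity: pd.DataFrame) -> tuple:
    """Return the two distinct neighborhoods with the highest similarity."""
    scores = similarity.to_numpy().copy()
    np.fill_diagonal(scores, -np.inf)  # ignore each neighborhood's similarity to itself
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return similarity.index[i], similarity.index[j]


# Hypothetical toy data, not results from the study.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "Venue Category": [
        "Chinese Restaurant", "Chinese Restaurant", "Bakery",
        "Chinese Restaurant", "Chinese Restaurant", "Bakery", "Spa",
        "Pizza Place", "Spa",
    ],
})
similarity = cosine_similarity_matrix(category_profiles(toy))
pair = most_similar_pair(similarity)
print(pair)
```

Using shares rather than raw counts keeps a neighborhood with many returned venues from dominating the comparison. Clustering the profiles (for example with k-means) would be an alternative when groups of similar Chinatowns matter more than individual pairs.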

#### Example of getting the data from Foursquare API
Getting Chinatown data for New York City.

```python
import requests  # library to handle HTTP requests
import pandas as pd

CLIENT_ID = '***************'  # your Foursquare ID
CLIENT_SECRET = '*****************'  # your Foursquare Secret
VERSION = '20210417'  # Foursquare API version
LIMIT = 100  # a default Foursquare API limit value
radius = 2000  # search radius in meters

# latitude and longitude come from the geocoding example in section 2.2
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, LIMIT)
results = requests.get(url).json()["response"]['groups'][0]['items']

# one row per venue: neighborhood name and coordinates, then the venue's name,
# coordinates, and primary category
venues_list = [('New York Chinatown', latitude, longitude,
                v['venue']['name'],
                v['venue']['location']['lat'],
                v['venue']['location']['lng'],
                v['venue']['categories'][0]['name']) for v in results]
nearby_venues = pd.DataFrame(venues_list, columns=[
    'Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
    'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])
nearby_venues.head(10)
```


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | New York Chinatown | 40.716491 | -73.99625 | Zu Yuan Spa | 40.715469 | -73.998627 | Spa |
| 1 | New York Chinatown | 40.716491 | -73.99625 | Wayla | 40.718291 | -73.992584 | Thai Restaurant |
| 2 | New York Chinatown | 40.716491 | -73.99625 | Hair Toto Group | 40.718629 | -73.999593 | Salon / Barbershop |
| 3 | New York Chinatown | 40.716491 | -73.99625 | The Tyger | 40.718835 | -73.99948 | Asian Restaurant |
| 4 | New York Chinatown | 40.716491 | -73.99625 | Cheeky Sandwiches | 40.715821 | -73.99183 | Sandwich Place |
| 5 | New York Chinatown | 40.716491 | -73.99625 | Kiki's | 40.714476 | -73.992036 | Greek Restaurant |
| 6 | New York Chinatown | 40.716491 | -73.99625 | Scarr's Pizza | 40.715335 | -73.991649 | Pizza Place |
| 7 | New York Chinatown | 40.716491 | -73.99625 | Simple | 40.718145 | -73.991988 | Asian Restaurant |
| 8 | New York Chinatown | 40.716491 | -73.99625 | Michaeli Bakery | 40.714704 | -73.991847 | Bakery |
| 9 | New York Chinatown | 40.716491 | -73.99625 | Metrograph | 40.714999 | -73.991035 | Indie Movie Theater |
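
As a small illustration of the planned restaurant-versus-other comparison, the sketch below rebuilds the ten sample rows above and computes the share of venues whose category contains the word "Restaurant". The matching rule is an assumption made for this example; it deliberately leaves out food venues such as "Pizza Place", "Sandwich Place", and "Bakery", which a fuller analysis might want to group with restaurants.

```python
import pandas as pd

# The ten sample venues returned for New York Chinatown (copied from the output above).
sample = pd.DataFrame({
    "Venue": [
        "Zu Yuan Spa", "Wayla", "Hair Toto Group", "The Tyger", "Cheeky Sandwiches",
        "Kiki's", "Scarr's Pizza", "Simple", "Michaeli Bakery", "Metrograph",
    ],
    "Venue Category": [
        "Spa", "Thai Restaurant", "Salon / Barbershop", "Asian Restaurant", "Sandwich Place",
        "Greek Restaurant", "Pizza Place", "Asian Restaurant", "Bakery", "Indie Movie Theater",
    ],
})

# Assumed rule: a venue counts as a restaurant when its category name contains "Restaurant".
is_restaurant = sample["Venue Category"].str.contains("Restaurant")
restaurant_share = is_restaurant.mean()          # fraction of venues that are restaurants
category_counts = sample["Venue Category"].value_counts()

print(restaurant_share)  # 0.4 (4 of the 10 sample venues)
```

Ten rows are far too few to draw conclusions from; the same computation would be applied to the full venue list for each city.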
