
# **Capstone Project – The Battle of Neighborhoods: Exploring Boston and Finding the Best Place to Open a Chinese Restaurant (Week 2)**

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### **1. Business Problem**

This project aims to explore Boston and find the best place to open a restaurant. More specifically, it is targeted at readers interested in opening a **Chinese** restaurant in **Boston, MA, USA**.

Given the competitive environment, the ideal location is a neighborhood where **relatively few restaurants exist**, **especially Chinese restaurants**. A location **closer to the city center** is also more likely to attract enough customers and earn a profit, which is another important consideration in this project.

A few candidate neighborhoods in Boston will be recommended, and their advantages and disadvantages will be discussed in detail.

### **2. Data**

Based on the business problem defined above, the following factors are relevant to our analysis:
* number of existing restaurants in the neighborhood
* number of and distance to Chinese restaurants in the neighborhood
* distance of neighborhood from city center

The following data sources will be used to extract or generate the required information:
* the geographical coordinates of all neighborhoods, collected from the Internet and stored in `Boston_Neighborhoods.csv`
* the number, type, and location of the venues (including restaurants) in every neighborhood, obtained with the **Foursquare API**
* the coordinates of the Boston city center, obtained with **geopy** using the location of Faneuil Hall Marketplace in Downtown Boston.

*Load and examine the data of Boston neighborhoods*

In [1]:
from google.colab import drive
import os
import pandas as pd
drive.mount('/content/gdrive')
file_path = '/content/gdrive/My Drive/Data_Science/5_Applied Data Science Capstone/'

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
os.chdir(file_path)

In [3]:
df = pd.read_csv('Boston_Neighborhoods.csv', encoding='unicode_escape')

In [4]:
df.head()

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Roslindale | 42.2832 | -71.127 |
| 1 | Jamaica Plain | 42.3097 | -71.1151 |
| 2 | Mission Hill | 42.3296 | -71.1062 |
| 3 | Bay Village | 42.349 | -71.0698 |
| 4 | Chinatown | 42.3501 | -71.0624 |


In [5]:
print('Boston has {} neighborhoods.'.format(
        df.shape[0]
    )
)

Boston has 23 neighborhoods.


*Import all the libraries needed for analysis*

In [6]:
import numpy as np
from geopy.geocoders import Nominatim
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import json
from pandas import json_normalize  # the old pandas.io.json.json_normalize path was deprecated in pandas 1.0

*Use geopy library to get the latitude and longitude values of Boston.*

In [7]:
address = 'Boston, BOS'  # 'BOS' is the airport code; the returned point lies east of Downtown (see output) and only centers the map

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Boston are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Boston are 42.3630236, -71.01340798670637.


*Create a map of Boston with neighborhoods superimposed on top.*

In [8]:
map_boston = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boston)  
    
map_boston

### **3. Methodology**

Firstly, we find the ten **neighborhoods nearest to the Boston city center**. We assume that a restaurant can attract enough customers and earn a profit only if its neighborhood is close to the city center.

Secondly, for each of these ten neighborhoods, we retrieve **up to 100 venues within a radius of 500 meters** from Foursquare. The number of venues serves as an indicator of the neighborhood's living conditions.

Thirdly, among all the venues, we count **the number of restaurants** to gauge how competitive the catering industry is in the neighborhood. If restaurants are densely concentrated in an area, it may be hard to enter the market and earn a decent profit.

Fourthly, among all the restaurants, we count **how many are Chinese restaurants**. If a neighborhood is already crowded with Chinese restaurants, it is not an ideal place to open another one.

Lastly, based on this information, we compute **an evaluation score** from the variables obtained above. Using this score, we rank the candidate neighborhoods and propose several good choices for opening a Chinese restaurant.

### **4. Analysis**

*Find the geographical coordinates of Boston city center.*

In [9]:
address = '4 S Market Street, MA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
center_latitude = location.latitude
center_longitude = location.longitude
print('The geographical coordinates of the Boston city center are {}, {}.'.format(center_latitude, center_longitude))

The geographical coordinates of the Boston city center are 42.359706, -71.0550683.


*Define the great-circle (haversine) distance in kilometers between two places from their geographical coordinates.*

In [10]:
from math import radians, cos, sin, asin, sqrt
def dist(lat1, long1, lat2, long2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lat1, long1, lat2, long2 = map(radians, [lat1, long1, lat2, long2])
    # haversine formula 
    dlon = long2 - long1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km

*Find the ten nearest neighborhoods from the Boston city center.*

In [11]:
dist_list = []
for i in range(len(df)):
  dist_list.append(dist(df.loc[i,'Latitude'], df.loc[i,'Longitude'], center_latitude, center_longitude))

In [12]:
df['Distance from the City Center (KM)'] = dist_list
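The loop above calls `dist` once per row. The same haversine formula can also be vectorized with NumPy so that all distances are computed in one step. This is a minimal sketch: the function name `haversine_km` and the small `sample` frame (rows copied from the neighborhood tables in this notebook) are illustrative and not part of the original analysis.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same mean Earth radius as in dist()


def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in km; accepts scalars or arrays in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# city-center coordinates printed by the geopy cell above
center_lat, center_lon = 42.359706, -71.0550683

# a few rows from Boston_Neighborhoods.csv, as shown in this notebook
sample = pd.DataFrame({
    'Neighborhood': ['Roslindale', 'Downtown', 'North End', 'Chinatown'],
    'Latitude': [42.2832, 42.3557, 42.3647, 42.3501],
    'Longitude': [-71.127, -71.0572, -71.0542, -71.0624],
})

# one call computes the distance for every row
sample['Distance from the City Center (KM)'] = haversine_km(
    sample['Latitude'], sample['Longitude'], center_lat, center_lon)

# nsmallest is a one-step alternative to sort_values(...).head(n)
nearest = sample.nsmallest(3, 'Distance from the City Center (KM)').reset_index(drop=True)
print(nearest)
```

Applied to the full `df`, the same expression would replace both the loop and the `dist_list` helper list.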

In [13]:
df = df.sort_values(by=['Distance from the City Center (KM)'], axis=0, ascending=True)

In [14]:
df_top10 = df.head(10)

In [15]:
df_top10.index = [0,1,2,3,4,5,6,7,8,9]

In [16]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) |
|---|---|---|---|---|
| 0 | Downtown | 42.3557 | -71.0572 | 0.478647 |
| 1 | North End | 42.3647 | -71.0542 | 0.559871 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 |
| 3 | Chinatown | 42.3501 | -71.0624 | 1.226325 |
| 4 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 |
| 5 | Bay Village | 42.349 | -71.0698 | 1.697815 |
| 6 | East Boston | 42.3702 | -71.0389 | 1.768092 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 |
| 8 | Back Bay | 42.3503 | -71.081 | 2.373688 |
| 9 | South Boston | 42.3381 | -71.0476 | 2.479632 |


*Explore the Top 10 neighborhoods one by one*

In [17]:
CLIENT_ID = 'YOUR_CLIENT_ID'  # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'  # your Foursquare Secret
VERSION = '20180605'  # Foursquare API version
LIMIT = 100  # maximum number of venues to request per neighborhood
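Instead of formatting the credentials into the URL string by hand, `requests` can build and encode the query string from a `params` dict. The sketch below prepares the same v2 `explore` request offline, so nothing is sent, and prints the resulting URL for inspection. The credentials are placeholders. The endpoint and the `v` version date are the ones used in this notebook, and current Foursquare API versions may differ.

```python
import requests

params = {
    'client_id': 'YOUR_CLIENT_ID',          # placeholder, not a real credential
    'client_secret': 'YOUR_CLIENT_SECRET',  # placeholder, not a real credential
    'v': '20180605',
    'll': '{},{}'.format(42.3557, -71.0572),  # Downtown coordinates from df_top10
    'radius': 500,   # meters
    'limit': 100,
}

# Build the request without sending it, so the encoded URL can be inspected
prepared = requests.Request(
    'GET', 'https://api.foursquare.com/v2/venues/explore', params=params).prepare()
print(prepared.url)

# A live call would pass the same dict:
# results = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()
```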


*Now, let's retrieve up to 100 venues within a 500-meter radius of each neighborhood, then count how many of them are restaurants and, specifically, Chinese restaurants.*

In [18]:
venues_num = []
restaurants_num = []
chinese_restaurants_num = []

In [19]:
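def get_category_type(row):
    """Return the name of a venue's primary category, or None if it has no category."""
    try:
        categories_list = row['categories']
    except KeyError:
        # before the columns are renamed, the field is called 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']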

In [20]:
def get_info(i):
    """Query Foursquare around the i-th neighborhood of df_top10, print the counts of
    venues, restaurants, and Chinese restaurants, and append them to the global lists."""
    neighborhood_latitude = df_top10.loc[i, 'Latitude']    # neighborhood latitude value
    neighborhood_longitude = df_top10.loc[i, 'Longitude']  # neighborhood longitude value
    neighborhood_name = df_top10.loc[i, 'Neighborhood']    # neighborhood name

    print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name,
                                                                  neighborhood_latitude,
                                                                  neighborhood_longitude))

    radius = 500  # search radius in meters; LIMIT (100) is defined in the credentials cell
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        neighborhood_latitude,
        neighborhood_longitude,
        radius,
        LIMIT)

    results = requests.get(url).json()
    venues = results['response']['groups'][0]['items']

    nearby_venues = json_normalize(venues)  # flatten JSON

    # keep only the name, category list, and coordinates of each venue
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    # replace each category list with the name of its primary category
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # clean columns: 'venue.location.lat' -> 'lat', etc.
    nearby_venues.columns = [col.split('.')[-1] for col in nearby_venues.columns]

    # na=False treats venues without a category (None) as non-matches; otherwise the
    # NaN results would make the boolean indexing fail
    nearby_restaurants = nearby_venues[nearby_venues['categories'].str.contains('Restaurant', na=False)]
    nearby_china_restaurants = nearby_restaurants[nearby_restaurants['categories'].str.contains('Chinese', na=False)]

    print('In ' + neighborhood_name + ' {} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))
    venues_num.append(nearby_venues.shape[0])
    print('In ' + neighborhood_name + ' {} restaurants were returned by Foursquare.'.format(nearby_restaurants.shape[0]))
    restaurants_num.append(nearby_restaurants.shape[0])
    print('In ' + neighborhood_name + ' {} Chinese restaurants were returned by Foursquare.'.format(nearby_china_restaurants.shape[0]))
    chinese_restaurants_num.append(nearby_china_restaurants.shape[0])
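Because `get_info` needs live Foursquare credentials, it is easier to check its parsing and counting steps offline. The sketch below runs a hand-written list, shaped like the `items` returned by the v2 `explore` endpoint, through the same steps: flattening, primary-category extraction, and `str.contains` filtering. The venue names and categories are made up for illustration, and the helper names `primary_category` and `count_venues` are hypothetical.

```python
import pandas as pd


def primary_category(categories):
    """Return the name of the first category, or None when the list is empty."""
    return categories[0]['name'] if categories else None


def count_venues(items):
    """Count all venues, restaurants, and Chinese restaurants in an 'items' list."""
    # nested keys become columns such as 'venue.name' and 'venue.categories'
    venues = pd.json_normalize(items)[['venue.name', 'venue.categories']].copy()
    venues['venue.categories'] = venues['venue.categories'].apply(primary_category)
    venues.columns = [col.split('.')[-1] for col in venues.columns]

    # na=False: venues without a category are treated as non-matches
    restaurants = venues[venues['categories'].str.contains('Restaurant', na=False)]
    chinese = restaurants[restaurants['categories'].str.contains('Chinese', na=False)]
    return len(venues), len(restaurants), len(chinese)


# hypothetical sample in the shape of results['response']['groups'][0]['items']
items = [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Chinese Restaurant'}]}},
    {'venue': {'name': 'Venue B', 'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Venue C', 'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Venue D', 'categories': []}},
]

print(count_venues(items))  # (venues, restaurants, Chinese restaurants) → (4, 2, 1)
```

Venue D has no category and is counted as a venue but not as a restaurant. Without `na=False`, this case would raise an error during boolean indexing.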

#### **1. Downtown**

In [21]:
get_info(0)

Latitude and longitude values of Downtown are 42.3557, -71.0572.
In Downtown 96 venues were returned by Foursquare.
In Downtown 25 restaurants were returned by Foursquare.
In Downtown 1 Chinese restaurants were returned by Foursquare.




#### **2. North End**

In [22]:
get_info(1)

Latitude and longitude values of North End are 42.3647, -71.0542.
In North End 84 venues were returned by Foursquare.
In North End 30 restaurants were returned by Foursquare.
In North End 0 Chinese restaurants were returned by Foursquare.




#### **3. West End**

In [23]:
get_info(2)

Latitude and longitude values of West End are 42.3644, -71.0661.
In West End 83 venues were returned by Foursquare.
In West End 16 restaurants were returned by Foursquare.
In West End 0 Chinese restaurants were returned by Foursquare.




#### **4. Chinatown**

In [24]:
get_info(3)

Latitude and longitude values of Chinatown are 42.3501, -71.0624.
In Chinatown 78 venues were returned by Foursquare.
In Chinatown 40 restaurants were returned by Foursquare.
In Chinatown 15 Chinese restaurants were returned by Foursquare.




#### **5. Beacon Hill**

In [25]:
get_info(4)

Latitude and longitude values of Beacon Hill are 42.3588, -71.0707.
In Beacon Hill 28 venues were returned by Foursquare.
In Beacon Hill 5 restaurants were returned by Foursquare.
In Beacon Hill 0 Chinese restaurants were returned by Foursquare.




#### **6. Bay Village**

In [26]:
get_info(5)

Latitude and longitude values of Bay Village are 42.349, -71.0698.
In Bay Village 66 venues were returned by Foursquare.
In Bay Village 5 restaurants were returned by Foursquare.
In Bay Village 0 Chinese restaurants were returned by Foursquare.




#### **7. East Boston**

In [27]:
get_info(6)

Latitude and longitude values of East Boston are 42.3702, -71.0389.
In East Boston 43 venues were returned by Foursquare.
In East Boston 14 restaurants were returned by Foursquare.
In East Boston 1 Chinese restaurants were returned by Foursquare.




#### **8. Charlestown**

In [28]:
get_info(7)

Latitude and longitude values of Charlestown are 42.3782, -71.0602.
In Charlestown 20 venues were returned by Foursquare.
In Charlestown 1 restaurants were returned by Foursquare.
In Charlestown 0 Chinese restaurants were returned by Foursquare.




#### **9. Back Bay**

In [29]:
get_info(8)

Latitude and longitude values of Back Bay are 42.3503, -71.081.
In Back Bay 93 venues were returned by Foursquare.
In Back Bay 21 restaurants were returned by Foursquare.
In Back Bay 0 Chinese restaurants were returned by Foursquare.




#### **10. South Boston**

In [30]:
get_info(9)

Latitude and longitude values of South Boston are 42.3381, -71.0476.
In South Boston 27 venues were returned by Foursquare.
In South Boston 4 restaurants were returned by Foursquare.
In South Boston 0 Chinese restaurants were returned by Foursquare.




*Add the numbers of venues, restaurants, and Chinese restaurants to the dataframe.*

In [31]:
df_top10['No. of Venues'] = venues_num
df_top10['No. of Restaurants'] = restaurants_num
df_top10['No. of Chinese Restaurants'] = chinese_restaurants_num

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [32]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) | No. of Venues | No. of Restaurants | No. of Chinese Restaurants |
|---|---|---|---|---|---|---|---|
| 0 | Downtown | 42.3557 | -71.0572 | 0.478647 | 96 | 25 | 1 |
| 1 | North End | 42.3647 | -71.0542 | 0.559871 | 84 | 30 | 0 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 | 83 | 16 | 0 |
| 3 | Chinatown | 42.3501 | -71.0624 | 1.226325 | 78 | 40 | 15 |
| 4 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 | 28 | 5 | 0 |
| 5 | Bay Village | 42.349 | -71.0698 | 1.697815 | 66 | 5 | 0 |
| 6 | East Boston | 42.3702 | -71.0389 | 1.768092 | 43 | 14 | 1 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 | 20 | 1 | 0 |
| 8 | Back Bay | 42.3503 | -71.081 | 2.373688 | 93 | 21 | 0 |
| 9 | South Boston | 42.3381 | -71.0476 | 2.479632 | 27 | 4 | 0 |
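The `SettingWithCopyWarning` above appears because `df_top10` was created with `df.head(10)`. As a result, pandas cannot tell whether `df_top10` is an independent DataFrame or a view of `df`. The assignments still work here. One minimal way to avoid the warning, assuming the rest of the notebook stays unchanged, is to take an explicit copy when the subset is created (for example, `df.head(10).copy()`). The small `demo` frame below is only an illustration; it uses distances copied from the table.

```python
import pandas as pd

# a small stand-in for df (the names demo/demo_top are illustrative only)
demo = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Downtown', 'North End'],
    'Distance from the City Center (KM)': [1.226325, 0.478647, 0.559871],
})

# sort, keep the nearest rows, renumber them 0..n-1, and take an explicit copy
demo_top = (demo.sort_values('Distance from the City Center (KM)')
                .head(2)
                .reset_index(drop=True)
                .copy())

# demo_top is an independent DataFrame, so adding a column does not
# trigger SettingWithCopyWarning
demo_top['No. of Venues'] = [96, 84]
print(demo_top)
```

`reset_index(drop=True)` also replaces the manual `df_top10.index = [0, 1, ...]` step.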


*Calculate the Restaurants/Venues Ratio*

In [33]:
df_top10['Restaurants/Venues Ratio'] = df_top10['No. of Restaurants'] / df_top10['No. of Venues']



*Calculate the Chinese Restaurants/Restaurants Ratio*

In [34]:
df_top10['Chinese Restaurants/Restaurants Ratio'] = df_top10['No. of Chinese Restaurants'] / df_top10['No. of Restaurants']



In [35]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) | No. of Venues | No. of Restaurants | No. of Chinese Restaurants | Restaurants/Venues Ratio | Chinese Restaurants/Restaurants Ratio |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Downtown | 42.3557 | -71.0572 | 0.478647 | 96 | 25 | 1 | 0.260417 | 0.04 |
| 1 | North End | 42.3647 | -71.0542 | 0.559871 | 84 | 30 | 0 | 0.357143 | 0.0 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 | 83 | 16 | 0 | 0.192771 | 0.0 |
| 3 | Chinatown | 42.3501 | -71.0624 | 1.226325 | 78 | 40 | 15 | 0.512821 | 0.375 |
| 4 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 | 28 | 5 | 0 | 0.178571 | 0.0 |
| 5 | Bay Village | 42.349 | -71.0698 | 1.697815 | 66 | 5 | 0 | 0.075758 | 0.0 |
| 6 | East Boston | 42.3702 | -71.0389 | 1.768092 | 43 | 14 | 1 | 0.325581 | 0.071429 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 | 20 | 1 | 0 | 0.05 | 0.0 |
| 8 | Back Bay | 42.3503 | -71.081 | 2.373688 | 93 | 21 | 0 | 0.225806 | 0.0 |
| 9 | South Boston | 42.3381 | -71.0476 | 2.479632 | 27 | 4 | 0 | 0.148148 | 0.0 |
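Both ratios above divide by a count that could be zero. A neighborhood with no restaurants gives `0/0`, which pandas turns into `NaN`. scikit-learn's `MinMaxScaler` ignores `NaN` when fitting and keeps it when transforming, so a missing ratio would carry through to the score. None of the ten neighborhoods here has zero restaurants, so the results above are unaffected. A defensive version might look like the sketch below. Treating an empty denominator as a ratio of 0 is an assumption, not part of the original analysis, and the third row describes a hypothetical neighborhood.

```python
import numpy as np
import pandas as pd

counts = pd.DataFrame({
    'No. of Venues': [96, 78, 12],
    'No. of Restaurants': [25, 40, 0],        # last row: hypothetical, no restaurants
    'No. of Chinese Restaurants': [1, 15, 0],
})


def safe_ratio(numerator, denominator):
    """Element-wise numerator / denominator, returning 0.0 where the denominator is 0."""
    num = numerator.to_numpy(dtype=float)
    den = denominator.to_numpy(dtype=float)
    out = np.zeros_like(num)                      # default value for zero denominators
    np.divide(num, den, out=out, where=den != 0)  # divide only where it is defined
    return out


counts['Restaurants/Venues Ratio'] = safe_ratio(
    counts['No. of Restaurants'], counts['No. of Venues'])
counts['Chinese Restaurants/Restaurants Ratio'] = safe_ratio(
    counts['No. of Chinese Restaurants'], counts['No. of Restaurants'])
print(counts)
```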


*Normalize the relevant variables to the range [0, 1] with min-max scaling.*

In [36]:
from sklearn.preprocessing import MinMaxScaler

In [37]:
scaler = MinMaxScaler()

In [38]:
scaler.fit(df_top10[['Distance from the City Center (KM)', 'No. of Venues', 'Restaurants/Venues Ratio', 'Chinese Restaurants/Restaurants Ratio']])

MinMaxScaler(copy=True, feature_range=(0, 1))
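With its default `feature_range=(0, 1)`, `MinMaxScaler` maps each column to `(x - min(x)) / (max(x) - min(x))`. As a quick sanity check, the sketch below applies that formula by hand to the distance column copied from the table above. The results can be compared with the `Norm_Distance` column produced in the next step.

```python
import pandas as pd

# 'Distance from the City Center (KM)' for the ten candidates, in df_top10 order
distance = pd.Series([0.478647, 0.559871, 1.04593, 1.226325, 1.288336,
                      1.697815, 1.768092, 2.099208, 2.373688, 2.479632])

# min-max scaling to [0, 1], the per-column transformation MinMaxScaler applies
norm_distance = (distance - distance.min()) / (distance.max() - distance.min())
print(norm_distance.round(6).tolist())
```

The nearest neighborhood (Downtown) maps to 0 and the farthest (South Boston) maps to 1.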

In [39]:
df_top10[['Norm_Distance', 'Norm_#Venues', 'Norm_Restaurants/Venues Ratio', 'Norm_Chinese Restaurants/Restaurants Ratio']] = scaler.transform(df_top10[['Distance from the City Center (KM)', 'No. of Venues', 'Restaurants/Venues Ratio', 'Chinese Restaurants/Restaurants Ratio']])



In [40]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) | No. of Venues | No. of Restaurants | No. of Chinese Restaurants | Restaurants/Venues Ratio | Chinese Restaurants/Restaurants Ratio | Norm_Distance | Norm_#Venues | Norm_Restaurants/Venues Ratio | Norm_Chinese Restaurants/Restaurants Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Downtown | 42.3557 | -71.0572 | 0.478647 | 96 | 25 | 1 | 0.260417 | 0.04 | 0.0 | 1.0 | 0.45464 | 0.106667 |
| 1 | North End | 42.3647 | -71.0542 | 0.559871 | 84 | 30 | 0 | 0.357143 | 0.0 | 0.040592 | 0.842105 | 0.663633 | 0.0 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 | 83 | 16 | 0 | 0.192771 | 0.0 | 0.283502 | 0.828947 | 0.30848 | 0.0 |
| 3 | Chinatown | 42.3501 | -71.0624 | 1.226325 | 78 | 40 | 15 | 0.512821 | 0.375 | 0.373655 | 0.763158 | 1.0 | 1.0 |
| 4 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 | 28 | 5 | 0 | 0.178571 | 0.0 | 0.404645 | 0.105263 | 0.2778 | 0.0 |
| 5 | Bay Village | 42.349 | -71.0698 | 1.697815 | 66 | 5 | 0 | 0.075758 | 0.0 | 0.609284 | 0.605263 | 0.055653 | 0.0 |
| 6 | East Boston | 42.3702 | -71.0389 | 1.768092 | 43 | 14 | 1 | 0.325581 | 0.071429 | 0.644405 | 0.302632 | 0.595439 | 0.190476 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 | 20 | 1 | 0 | 0.05 | 0.0 | 0.809882 | 0.0 | 0.0 | 0.0 |
| 8 | Back Bay | 42.3503 | -71.081 | 2.373688 | 93 | 21 | 0 | 0.225806 | 0.0 | 0.947054 | 0.960526 | 0.379859 | 0.0 |
| 9 | South Boston | 42.3381 | -71.0476 | 2.479632 | 27 | 4 | 0 | 0.148148 | 0.0 | 1.0 | 0.092105 | 0.212065 | 0.0 |


*Define the score so that it decreases with the **distance from the city center**, the **Restaurants/Venues Ratio**, and the **Chinese Restaurants/Restaurants Ratio**, and increases with the **number of venues**; all four normalized variables carry equal weight.*
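Let $\tilde d_i$, $\tilde v_i$, $\tilde r_i$, and $\tilde c_i$ denote `Norm_Distance`, `Norm_#Venues`, `Norm_Restaurants/Venues Ratio`, and `Norm_Chinese Restaurants/Restaurants Ratio` for neighborhood $i$. The raw score computed by `get_score` and the rescaled `Score` are then:

$$
s_i = -\tilde d_i + \tilde v_i - \tilde r_i - \tilde c_i,
\qquad
\mathrm{Score}_i = \frac{s_i - \min_j s_j}{\max_j s_j - \min_j s_j}.
$$

For example, with the normalized values in the table above, Downtown has the highest raw score, $s = -0 + 1 - 0.45464 - 0.106667 = 0.438693$. Chinatown has the lowest, $s = -0.373655 + 0.763158 - 1 - 1 = -1.610497$. West End's raw score, $s = -0.283502 + 0.828947 - 0.30848 - 0 = 0.236965$, therefore rescales to $(0.236965 + 1.610497) / (0.438693 + 1.610497) \approx 0.9016$.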

In [41]:
def get_score(i):
  # higher score = closer to the center, more venues, fewer restaurants, fewer Chinese restaurants
  score = (- df_top10.loc[i, 'Norm_Distance']
           + df_top10.loc[i, 'Norm_#Venues']
           - df_top10.loc[i, 'Norm_Restaurants/Venues Ratio']
           - df_top10.loc[i, 'Norm_Chinese Restaurants/Restaurants Ratio'])
  return float(score)

In [42]:
score_list = [get_score(i) for i in range(10)]

In [43]:
pd_score_list = pd.DataFrame(score_list)

*Rescale the score to the range [0, 1] by refitting the scaler on the raw scores.*

In [44]:
df_top10['Score'] = scaler.fit_transform(pd_score_list)



In [45]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) | No. of Venues | No. of Restaurants | No. of Chinese Restaurants | Restaurants/Venues Ratio | Chinese Restaurants/Restaurants Ratio | Norm_Distance | Norm_#Venues | Norm_Restaurants/Venues Ratio | Norm_Chinese Restaurants/Restaurants Ratio | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Downtown | 42.3557 | -71.0572 | 0.478647 | 96 | 25 | 1 | 0.260417 | 0.04 | 0.0 | 1.0 | 0.45464 | 0.106667 | 1.0 |
| 1 | North End | 42.3647 | -71.0542 | 0.559871 | 84 | 30 | 0 | 0.357143 | 0.0 | 0.040592 | 0.842105 | 0.663633 | 0.0 | 0.853204 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 | 83 | 16 | 0 | 0.192771 | 0.0 | 0.283502 | 0.828947 | 0.30848 | 0.0 | 0.901557 |
| 3 | Chinatown | 42.3501 | -71.0624 | 1.226325 | 78 | 40 | 15 | 0.512821 | 0.375 | 0.373655 | 0.763158 | 1.0 | 1.0 | 0.0 |
| 4 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 | 28 | 5 | 0 | 0.178571 | 0.0 | 0.404645 | 0.105263 | 0.2778 | 0.0 | 0.504255 |
| 5 | Bay Village | 42.349 | -71.0698 | 1.697815 | 66 | 5 | 0 | 0.075758 | 0.0 | 0.609284 | 0.605263 | 0.055653 | 0.0 | 0.756798 |
| 6 | East Boston | 42.3702 | -71.0389 | 1.768092 | 43 | 14 | 1 | 0.325581 | 0.071429 | 0.644405 | 0.302632 | 0.595439 | 0.190476 | 0.235609 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 | 20 | 1 | 0 | 0.05 | 0.0 | 0.809882 | 0.0 | 0.0 | 0.0 | 0.390698 |
| 8 | Back Bay | 42.3503 | -71.081 | 2.373688 | 93 | 21 | 0 | 0.225806 | 0.0 | 0.947054 | 0.960526 | 0.379859 | 0.0 | 0.607123 |
| 9 | South Boston | 42.3381 | -71.0476 | 2.479632 | 27 | 4 | 0 | 0.148148 | 0.0 | 1.0 | 0.092105 | 0.212065 | 0.0 | 0.239381 |


*Rank the neighborhoods according to the score.*

In [46]:
df_top10 = df_top10.sort_values(by=['Score'], axis=0, ascending=False)

In [47]:
df_top10.index = [1,2,3,4,5,6,7,8,9,10]

In [48]:
df_top10

|   | Neighborhood | Latitude | Longitude | Distance from the City Center (KM) | No. of Venues | No. of Restaurants | No. of Chinese Restaurants | Restaurants/Venues Ratio | Chinese Restaurants/Restaurants Ratio | Norm_Distance | Norm_#Venues | Norm_Restaurants/Venues Ratio | Norm_Chinese Restaurants/Restaurants Ratio | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Downtown | 42.3557 | -71.0572 | 0.478647 | 96 | 25 | 1 | 0.260417 | 0.04 | 0.0 | 1.0 | 0.45464 | 0.106667 | 1.0 |
| 2 | West End | 42.3644 | -71.0661 | 1.04593 | 83 | 16 | 0 | 0.192771 | 0.0 | 0.283502 | 0.828947 | 0.30848 | 0.0 | 0.901557 |
| 3 | North End | 42.3647 | -71.0542 | 0.559871 | 84 | 30 | 0 | 0.357143 | 0.0 | 0.040592 | 0.842105 | 0.663633 | 0.0 | 0.853204 |
| 4 | Bay Village | 42.349 | -71.0698 | 1.697815 | 66 | 5 | 0 | 0.075758 | 0.0 | 0.609284 | 0.605263 | 0.055653 | 0.0 | 0.756798 |
| 5 | Back Bay | 42.3503 | -71.081 | 2.373688 | 93 | 21 | 0 | 0.225806 | 0.0 | 0.947054 | 0.960526 | 0.379859 | 0.0 | 0.607123 |
| 6 | Beacon Hill | 42.3588 | -71.0707 | 1.288336 | 28 | 5 | 0 | 0.178571 | 0.0 | 0.404645 | 0.105263 | 0.2778 | 0.0 | 0.504255 |
| 7 | Charlestown | 42.3782 | -71.0602 | 2.099208 | 20 | 1 | 0 | 0.05 | 0.0 | 0.809882 | 0.0 | 0.0 | 0.0 | 0.390698 |
| 8 | South Boston | 42.3381 | -71.0476 | 2.479632 | 27 | 4 | 0 | 0.148148 | 0.0 | 1.0 | 0.092105 | 0.212065 | 0.0 | 0.239381 |
| 9 | East Boston | 42.3702 | -71.0389 | 1.768092 | 43 | 14 | 1 | 0.325581 | 0.071429 | 0.644405 | 0.302632 | 0.595439 | 0.190476 | 0.235609 |
| 10 | Chinatown | 42.3501 | -71.0624 | 1.226325 | 78 | 40 | 15 | 0.512821 | 0.375 | 0.373655 | 0.763158 | 1.0 | 1.0 | 0.0 |
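The per-row `get_score` loop, the second scaling step, and the ranking above can also be written as a few vectorized pandas expressions. The sketch below rebuilds the four normalized columns from the tables above and reproduces the final order. The `WEIGHTS` dictionary is a hypothetical addition. It makes the equal weighting used in `get_score` explicit and easy to vary for a sensitivity check.

```python
import pandas as pd

# normalized inputs copied from the df_top10 tables above (original row order)
norm = pd.DataFrame({
    'Neighborhood': ['Downtown', 'North End', 'West End', 'Chinatown', 'Beacon Hill',
                     'Bay Village', 'East Boston', 'Charlestown', 'Back Bay', 'South Boston'],
    'Norm_Distance': [0.0, 0.040592, 0.283502, 0.373655, 0.404645,
                      0.609284, 0.644405, 0.809882, 0.947054, 1.0],
    'Norm_#Venues': [1.0, 0.842105, 0.828947, 0.763158, 0.105263,
                     0.605263, 0.302632, 0.0, 0.960526, 0.092105],
    'Norm_Restaurants/Venues Ratio': [0.45464, 0.663633, 0.30848, 1.0, 0.2778,
                                      0.055653, 0.595439, 0.0, 0.379859, 0.212065],
    'Norm_Chinese Restaurants/Restaurants Ratio': [0.106667, 0.0, 0.0, 1.0, 0.0,
                                                   0.0, 0.190476, 0.0, 0.0, 0.0],
})

# +1 rewards a variable, -1 penalizes it; these values reproduce get_score
WEIGHTS = {
    'Norm_Distance': -1,
    'Norm_#Venues': +1,
    'Norm_Restaurants/Venues Ratio': -1,
    'Norm_Chinese Restaurants/Restaurants Ratio': -1,
}

raw = sum(w * norm[col] for col, w in WEIGHTS.items())        # raw score per neighborhood
norm['Score'] = (raw - raw.min()) / (raw.max() - raw.min())  # same as refitting MinMaxScaler

ranking = norm.sort_values('Score', ascending=False).reset_index(drop=True)
ranking.index = range(1, len(ranking) + 1)  # rank 1 = best
print(ranking[['Neighborhood', 'Score']])
```

Changing a weight, for example giving distance more influence, shows how sensitive the top 3 are to the equal-weight assumption.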


### **5. Results and Discussion**

According to the ranking, the top 3 neighborhoods, each scoring above 0.8, are Downtown, West End, and North End. All three have a large number of venues (83 to 96), which indicates good living conditions and makes them suitable for a high-end Chinese restaurant.

Chinese restaurants are scarce in these three neighborhoods (one in Downtown and none in West End or North End), and restaurants account for roughly 19% to 36% of their venues. This suggests there is still room to profit from the local catering market.

Among the other neighborhoods, Back Bay does well on the number of venues (93) and faces a less competitive market. However, at 2.37 km it is the second farthest of the ten candidates from the city center, which makes it less ideal for a Chinese restaurant. Chinatown is relatively close to the city center (1.23 km), but its catering market is already very competitive, especially for Chinese restaurants (15 of its 40 restaurants). A new Chinese restaurant may find it hard to compete against the incumbents.

### **6. Conclusion**

This project is targeted at stakeholders who want to open a Chinese restaurant in one of Boston's neighborhoods. We first identified the ten candidate neighborhoods closest to the city center. We then explored them one by one in terms of the numbers of venues, restaurants, and, specifically, Chinese restaurants.

Based on this analysis, we propose the top 3 neighborhoods, i.e., Downtown, West End, and North End, as the most suitable for opening a Chinese restaurant under the criteria considered. Hopefully, these recommendations will be of practical value to interested readers.