# The Battle of Neighborhoods (Week 1)

## 1. Introduction

In this project we will try to find an optimal location for weekend activities. Specifically, this report is aimed at people who have recently moved to **Shanghai**, China.

**The location should offer a wide choice of restaurants, cinemas, coffee shops, bars, and shops.** In addition, it should offer places to go **clubbing** at night and have a **metro station nearby** for convenient transportation.

This is a relevant question for anyone who has just moved to Shanghai, and the same method can be applied to other large cities. It is also useful for anyone looking for an apartment to rent.

## 2. Data
* List of neighborhoods of Shanghai with their geodata (latitude and longitude)
* Venues for each Shanghai neighborhood (these can then be grouped into restaurants, bars, cinemas, etc.)
* Location of nearby metro stations, as needed

The data will be used as follows:
* Use Foursquare and geopy data to map the top venues for all neighborhoods and cluster the neighborhoods into groups
* Use Foursquare and geopy data to map the location of metro stations

These data will answer the key questions behind the decision: which neighborhoods meet all the requirements (restaurants, bars, cinemas, shops, and clubs), and which metro stations are nearby?
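
The first key question can be phrased as a simple coverage check. The following is a minimal sketch, not the method used later in this notebook: the `REQUIREMENTS` mapping, the category names assigned to each requirement, and the `missing_requirements` helper are all assumptions made for illustration.

```python
# Hypothetical mapping from each requirement to Foursquare-style category
# names. The category lists are an assumption, not part of the report.
REQUIREMENTS = {
    "restaurant": {"Chinese Restaurant", "Italian Restaurant", "Hotpot Restaurant"},
    "bar": {"Bar", "Cocktail Bar", "Beer Bar", "Hotel Bar"},
    "cinema": {"Multiplex", "Movie Theater"},
    "shopping": {"Shopping Mall", "Clothing Store"},
    "club": {"Nightclub", "Jazz Club"},
    "metro": {"Metro Station"},
}


def missing_requirements(categories, requirements=REQUIREMENTS):
    """Return the requirement names not covered by a neighborhood's categories."""
    found = set(categories)
    return sorted(
        name for name, accepted in requirements.items() if not (found & accepted)
    )


# Made-up category list for illustration only.
sample = ["Hotel", "Coffee Shop", "Multiplex", "Hotpot Restaurant",
          "Cocktail Bar", "Shopping Mall"]
print(missing_requirements(sample))  # ['club', 'metro']
```

A neighborhood qualifies when the returned list is empty. In practice the category sets would need to be extended to cover every restaurant and bar type that Foursquare returns.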

## 3. Methodology

In this project we will cluster the neighborhoods and limit our analysis to the area within about 6 km of each neighborhood center.

In the first step we collect the required **data: the location of each neighborhood and the top venues within 6 km of its center**.

In the next step we focus on **clusters of locations**: we create clusters (using **k-means clustering**) and map them. Afterwards we have a dataframe for exploring the most common venues in each neighborhood of each cluster.
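
To make the 6 km radius concrete, here is a hedged sketch of a great-circle distance check using the haversine formula. The notebook itself leaves the radius filtering to the Foursquare API, so the `haversine_m` and `within_radius` helpers are illustrative names only. The coordinates are the Anting center, the first Anting venue, and the Fengjing center from the Results tables.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_radius(center, point, radius_m=6000):
    """True if `point` lies within `radius_m` metres of `center`."""
    return haversine_m(*center, *point) <= radius_m


anting = (31.288333, 121.162222)    # neighborhood center
venue = (31.297209, 121.162602)     # first venue listed for Anting
fengjing = (30.891, 121.013)        # a different neighborhood center

print(within_radius(anting, venue))     # True, roughly 1 km away
print(within_radius(anting, fengjing))  # False, tens of km away
```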

## 4. Results

In [1]:
import pandas as pd
import numpy as np
import requests

### Load the geodata

In [2]:
df = pd.read_csv('Shanghai_district.csv')
df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,201805,Jiading,Anting,31.288333,121.162222
1,200060,Putuo,Changshou Road Subdistrict,31.2413,121.4296
2,201500,Jinshan,Fengjing,30.891,121.013
3,200120,Pudong New Area,Gaoqiao,31.355,121.57
4,200000,Changning,Gubei,31.188889,121.409389
5,201100,Minhang,Koreatown,31.17126,31.17126
6,200120,Pudong New Area,Lujiazui,31.213058,121.514433
7,201900,Baoshan,Luodian,31.404,121.345
8,201805,Jiading,Nanxiang,31.2937,121.321
9,200120,Pudong New Area,Qiantan International Business Zone,31.1622,121.4734
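
The Koreatown row in the table above lists the same value for latitude and longitude, which would place the point far outside Shanghai. A quick bounding-box check can flag such rows before any venue queries are made. This is a sketch only: the box limits below are approximate, assumed values for the Shanghai municipality, and `out_of_bounds` is a hypothetical helper.

```python
# Approximate bounding box for the Shanghai municipality (assumed values,
# chosen generously; adjust as needed).
LAT_RANGE = (30.6, 31.9)
LON_RANGE = (120.8, 122.2)


def out_of_bounds(rows, lat_range=LAT_RANGE, lon_range=LON_RANGE):
    """Return the names of rows whose coordinates fall outside the box."""
    bad = []
    for name, lat, lon in rows:
        inside = (lat_range[0] <= lat <= lat_range[1]
                  and lon_range[0] <= lon <= lon_range[1])
        if not inside:
            bad.append(name)
    return bad


# Three rows copied from the table above.
rows = [
    ("Gubei", 31.188889, 121.409389),
    ("Koreatown", 31.17126, 31.17126),
    ("Lujiazui", 31.213058, 121.514433),
]
print(out_of_bounds(rows))  # ['Koreatown']
```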


In [3]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium

from pandas import json_normalize # transform JSON file into a pandas dataframe

In [4]:
# get coordinates 
address = 'Shanghai'

geolocator = Nominatim(user_agent="shanghai")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Shanghai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Shanghai are 31.2252985, 121.4890497.


In [5]:
# Create map of Shanghai using latitude and longitude values in folium
map_shanghai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postal, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Postal Code'],df['Borough'], df['Neighborhood']):
    label = "({}): {} - {}".format(postal, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_shanghai)  
    
map_shanghai

### Define Foursquare Credentials and Version

In [6]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius, LIMIT):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
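
`getNearbyVenues` indexes straight into the response and into `categories[0]`, so a missing group or a venue without a category would raise an exception. Below is a hedged sketch of a more defensive parsing step. It assumes the same response layout that the function above indexes; `extract_venue_rows` and the sample payload are hypothetical and not part of the original notebook.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style response into row tuples, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hypothetical payload shaped like the response indexed in getNearbyVenues.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Wirtshaus",
               "location": {"lat": 31.291667, "lng": 121.154532},
               "categories": [{"name": "Bar"}]}},
    {"venue": {"name": "Uncategorized place",
               "location": {"lat": 31.29, "lng": 121.16},
               "categories": []}},
]}]}}

print(extract_venue_rows("Anting", 31.288333, 121.162222, sample_payload))
```

Only the first item survives, because the second has an empty category list. An empty or malformed payload yields an empty list instead of an exception.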

In [20]:
df_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   radius = 6000,
                                   LIMIT = 50
                                  )

Anting
Changshou Road Subdistrict
Fengjing
Gaoqiao
Gubei
Koreatown
Lujiazui
Luodian
Nanxiang
Qiantan International Business Zone
Qibao
Songjiang Town
Tianzifang
Wusong
Xintiandi
Xinzhuang
Xujiahui
Zhangjiang Town
Zhujiajiao


In [21]:
# check the size of the resulting dataframe
print(df_venues.shape)

# check how many venues were returned for each neighborhood
df_venues.groupby('Neighborhood').count()

print('There are {} unique categories.'.format(len(df_venues['Venue Category'].unique())))

(620, 7)
There are 135 unique categories.


In [22]:
df_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Anting,31.288333,121.162222,Alibaba,31.297209,121.162602,German Restaurant
1,Anting,31.288333,121.162222,Wirtshaus,31.291667,121.154532,Bar
2,Anting,31.288333,121.162222,Starbucks (星巴克),31.291264,121.14285,Coffee Shop
3,Anting,31.288333,121.162222,Life Hub (嘉亭荟城市生活广场),31.289792,121.157673,Shopping Mall
4,Anting,31.288333,121.162222,Crowne Plaza Shanghai Anting (上海穎奕皇冠假日酒店),31.274086,121.18731,Hotel


### Analyze Each Neighborhood

In [23]:
# one hot encoding
df_onehot = pd.get_dummies(df_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df_onehot['Neighborhood'] = df_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
df_onehot = df_onehot[fixed_columns]

df_onehot.head()

Unnamed: 0,Zhejiang Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Bakery,Bar,Beer Bar,Big Box Store,Bistro,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Water Park,Waterfront,Whisky Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
df_grouped = df_onehot.groupby('Neighborhood').mean().reset_index()
df_grouped

Unnamed: 0,Neighborhood,Zhejiang Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Bakery,Bar,Beer Bar,Big Box Store,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Video Store,Water Park,Waterfront,Whisky Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant
0,Anting,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Changshou Road Subdistrict,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,...,0.0,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.0
2,Fengjing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Gaoqiao,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Gubei,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.02,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.04
5,Koreatown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Lujiazui,0.0,0.0,0.02,0.02,0.0,0.02,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02
7,Luodian,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Nanxiang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Qiantan International Business Zone,0.0,0.0,0.04,0.06,0.02,0.06,0.02,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0
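
Taking the mean of one-hot columns per neighborhood gives the share of that neighborhood's venues in each category. For example, the 0.04 for Anting under "Bar" would correspond to 2 venues if the full 50 were returned. The sketch below reproduces the same idea in plain Python with made-up data; `category_frequencies` is a hypothetical helper, not the pandas code used in this notebook.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map neighborhood -> {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }


# Made-up (neighborhood, category) pairs for illustration only.
pairs = [("A", "Bar"), ("A", "Hotel"), ("A", "Hotel"), ("A", "Coffee Shop"),
         ("B", "Metro Station"), ("B", "Hotel")]
freqs = category_frequencies(pairs)
print(freqs["A"]["Hotel"])  # 0.5
```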


In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [33]:
num_top_venues = 30

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = df_grouped['Neighborhood']

for ind in np.arange(df_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,...,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue
0,Anting,Coffee Shop,Fast Food Restaurant,Hotel,Metro Station,Shopping Mall,German Restaurant,Park,Toll Booth,Convenience Store,...,Gastropub,General Travel,Japanese Restaurant,Golf Course,Italian Restaurant,Intersection,Hotel Bar,Greek Restaurant,Grocery Store,Gym
1,Changshou Road Subdistrict,Hotel,Hotpot Restaurant,Dumpling Restaurant,Tapas Restaurant,Coffee Shop,Chinese Restaurant,Lounge,Sports Bar,Gym / Fitness Center,...,Polish Restaurant,Yoga Studio,Dim Sum Restaurant,Bar,Italian Restaurant,Wine Shop,Art Gallery,Hotel Bar,Video Store,Vegetarian / Vegan Restaurant
2,Fengjing,Historic Site,Rest Area,Chinese Restaurant,Toll Booth,Convenience Store,Gastropub,General Travel,German Restaurant,Garden,...,History Museum,Hong Kong Restaurant,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant
3,Gaoqiao,Metro Station,Hotel,Fast Food Restaurant,Coffee Shop,Breakfast Spot,Soccer Field,Convenience Store,Stadium,Chinese Restaurant,...,Jazz Club,German Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Italian Restaurant,Hotpot Restaurant,Grocery Store,Intersection,Gym
4,Gubei,Hotel,Japanese Restaurant,Yunnan Restaurant,Bakery,French Restaurant,Dumpling Restaurant,Korean Restaurant,Chinese Restaurant,Asian Restaurant,...,Flower Shop,Pie Shop,Xinjiang Restaurant,Public Art,Cocktail Bar,Music Venue,Plaza,Gym / Fitness Center,Beer Bar,Whisky Bar
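
The column headers above read "21th", "22th", and "23th" because the suffix lookup only covers the first three positions. A small helper along the following lines would yield standard English ordinals. This is a sketch; `ordinal` is a hypothetical name and is not used elsewhere in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 31)
]
print(columns[1], "|", columns[21], "|", columns[23])
# 1st Most Common Venue | 21st Most Common Venue | 23rd Most Common Venue
```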


### Cluster Neighborhoods

In [34]:
# set number of clusters
kclusters = 8

df_grouped_clustering = df_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:16] 

array([2, 0, 5, 2, 7, 1, 0, 3, 6, 7, 0, 6, 7, 2, 0, 6])

In [35]:
kmeans.labels_

array([2, 0, 5, 2, 7, 1, 0, 3, 6, 7, 0, 6, 7, 2, 0, 6, 7, 0, 4])
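
With 19 neighborhoods spread over 8 clusters, it can help to check how many neighborhoods each cluster holds before interpreting the results. The sketch below simply counts the labels printed above. It is an added check, not part of the original analysis.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above.
labels = [2, 0, 5, 2, 7, 1, 0, 3, 6, 7, 0, 6, 7, 2, 0, 6, 7, 0, 4]

sizes = Counter(labels)
singletons = sorted(label for label, n in sizes.items() if n == 1)

print(dict(sorted(sizes.items())))
# {0: 5, 1: 1, 2: 3, 3: 1, 4: 1, 5: 1, 6: 3, 7: 4}
print(singletons)  # [1, 3, 4, 5]
```

Four of the eight clusters contain a single neighborhood, which may suggest trying a smaller number of clusters.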

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_merged = df

# merge df_grouped with df to add latitude/longitude for each neighborhood
df_merged = df_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

df_merged = df_merged.dropna()
df_merged['Cluster Labels'] = df_merged['Cluster Labels'].astype(int)
df_merged # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue
0,201805,Jiading,Anting,31.288333,121.162222,2,Coffee Shop,Fast Food Restaurant,Hotel,Metro Station,...,Gastropub,General Travel,Japanese Restaurant,Golf Course,Italian Restaurant,Intersection,Hotel Bar,Greek Restaurant,Grocery Store,Gym
1,200060,Putuo,Changshou Road Subdistrict,31.2413,121.4296,0,Hotel,Hotpot Restaurant,Dumpling Restaurant,Tapas Restaurant,...,Polish Restaurant,Yoga Studio,Dim Sum Restaurant,Bar,Italian Restaurant,Wine Shop,Art Gallery,Hotel Bar,Video Store,Vegetarian / Vegan Restaurant
2,201500,Jinshan,Fengjing,30.891,121.013,5,Historic Site,Rest Area,Chinese Restaurant,Toll Booth,...,History Museum,Hong Kong Restaurant,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant
3,200120,Pudong New Area,Gaoqiao,31.355,121.57,2,Metro Station,Hotel,Fast Food Restaurant,Coffee Shop,...,Jazz Club,German Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Italian Restaurant,Hotpot Restaurant,Grocery Store,Intersection,Gym
4,200000,Changning,Gubei,31.188889,121.409389,7,Hotel,Japanese Restaurant,Yunnan Restaurant,Bakery,...,Flower Shop,Pie Shop,Xinjiang Restaurant,Public Art,Cocktail Bar,Music Venue,Plaza,Gym / Fitness Center,Beer Bar,Whisky Bar
5,201100,Minhang,Koreatown,31.17126,31.17126,1,Market,Gym,Speakeasy,General Travel,...,Hong Kong Restaurant,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Indian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club
6,200120,Pudong New Area,Lujiazui,31.213058,121.514433,0,Hotel,Hotel Bar,Scenic Lookout,Italian Restaurant,...,Art Museum,Turkish Restaurant,Theme Park,Bakery,Art Gallery,Hunan Restaurant,General Travel,American Restaurant,Beer Bar,Golf Course
7,201900,Baoshan,Luodian,31.404,121.345,3,Metro Station,Hotel,Golf Course,Shopping Mall,...,Harbor / Marina,Flower Shop,History Museum,Hong Kong Restaurant,Korean Restaurant,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Indian Restaurant,Intersection
8,201805,Jiading,Nanxiang,31.2937,121.321,6,Fast Food Restaurant,Coffee Shop,Clothing Store,Shopping Mall,...,Gym / Fitness Center,Indian Restaurant,Korean Restaurant,Karaoke Bar,Jazz Club,Japanese Restaurant,Italian Restaurant,Intersection,Hunan Restaurant,Food Truck
9,200120,Pudong New Area,Qiantan International Business Zone,31.1622,121.4734,7,Art Museum,Bakery,Hotel,Chinese Restaurant,...,Dumpling Restaurant,Music Venue,Fast Food Restaurant,Pizza Place,Creperie,Cantonese Restaurant,Stadium,Bookstore,Xinjiang Restaurant,Asian Restaurant


In [37]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [38]:
df_merged.loc[df_merged['Cluster Labels'] == 0]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,...,21th Most Common Venue,22th Most Common Venue,23th Most Common Venue,24th Most Common Venue,25th Most Common Venue,26th Most Common Venue,27th Most Common Venue,28th Most Common Venue,29th Most Common Venue,30th Most Common Venue
1,200060,Putuo,Changshou Road Subdistrict,31.2413,121.4296,0,Hotel,Hotpot Restaurant,Dumpling Restaurant,Tapas Restaurant,...,Polish Restaurant,Yoga Studio,Dim Sum Restaurant,Bar,Italian Restaurant,Wine Shop,Art Gallery,Hotel Bar,Video Store,Vegetarian / Vegan Restaurant
6,200120,Pudong New Area,Lujiazui,31.213058,121.514433,0,Hotel,Hotel Bar,Scenic Lookout,Italian Restaurant,...,Art Museum,Turkish Restaurant,Theme Park,Bakery,Art Gallery,Hunan Restaurant,General Travel,American Restaurant,Beer Bar,Golf Course
10,201100,Minhang,Qibao,31.157778,121.351389,0,Korean Restaurant,Hotel,Japanese Restaurant,Coffee Shop,...,Electronics Store,Bakery,Thai Restaurant,Burger Joint,Brewery,Asian Restaurant,Cantonese Restaurant,Gourmet Shop,Bistro,Grocery Store
14,200000,Huangpu,Xintiandi,31.2226,121.4701,0,Hotel,Coffee Shop,Café,Gym / Fitness Center,...,French Restaurant,Theme Restaurant,Multiplex,Bakery,Lounge,Hong Kong Restaurant,Mexican Restaurant,Hunan Restaurant,Hotpot Restaurant,Theater
17,200120,Pudong New Area,Zhangjiang Town,31.205,121.614,0,Hotel,Coffee Shop,Metro Station,Fast Food Restaurant,...,Burger Joint,Intersection,Train Station,Big Box Store,Indian Restaurant,Mexican Restaurant,Harbor / Marina,General Travel,French Restaurant,Italian Restaurant


### Discussion

Our analysis identified the 30 most common venue categories in each neighborhood and then grouped the neighborhoods into 8 clusters. Looking into each cluster, we found that the first cluster (label 0) fulfilled our needs for an area for weekend activities. The top venues in this cluster include restaurants, coffee shops, shopping malls, and bars for late-night activities. In both neighborhoods from the Pudong New Area district, there are metro stations nearby (within 6 km), which narrows down our search.

### Conclusion

The purpose of this project was to identify areas in Shanghai for weekend activities: areas with several restaurants, bars, shopping malls, and a metro station. By identifying the top venues within 6 km of each neighborhood center using Foursquare data and then clustering the neighborhoods, we first identified the Pudong New Area district as one of the top boroughs, and then Xintiandi and Zhangjiang Town as our most promising candidate neighborhoods.

The final decision on the optimal location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional attractions such as museums, art galleries, theme parks, or gyms.