# The Battle of the Neighborhoods

## Import Required Libraries

In [1]:
import numpy as np  # library to handle data in a vectorized manner
import time
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json  # library to handle JSON files
import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0)

# !conda install -c conda-forge geopy --yes  # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# !conda install -c conda-forge folium=0.5.0 --yes  # uncomment this line if you haven't completed the Foursquare API lab
import folium  # map rendering library
from folium import plugins

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


## 1. Introduction

### Discussion of the business problem and the audience who would be interested in this project.

### 1.1 Scenario and Background

I am a data scientist living in Shanghai. I rent an apartment in Qingpu District. After several months there, I have found it very inconvenient: the apartment is far from a subway station, and there is no large supermarket within 3 kilometers. Furthermore, I was recently offered a new job in another district (Yangpu). I would therefore like to buy an apartment that makes both work and daily life more convenient.

### 1.2 Problem to be resolved:

The challenge is to find an apartment to buy in Shanghai that meets the following conditions:

1. At least 2 bedrooms and at least 80 square meters, priced at no more than ¥7 million (700 in Lianjia's price unit of ¥10,000)
2. Located within walking distance (≤ 1.5 km) of a subway station in Yangpu District
3. Amenities such as supermarkets, restaurants, shops and cinemas within walking distance

### 1.3 Target Audience

This project is relevant to anyone looking for a better place to live and work, since the same approach can be applied to other districts and cities. Combining Foursquare venue data, listing data scraped from the web, and mapping techniques with data analysis helps answer the key questions raised above. Lastly, the project is a practical exercise for developing data science skills.

## 2. Data

### 2.1 Data Required to resolve the problem

To choose a suitable apartment in Shanghai, the following data are required:

1. Second-hand apartment listings in Yangpu District with their details (price, location, number of bedrooms, floor area)
2. Geodata (latitude, longitude) of the second-hand apartments
3. A list of subway stations in Yangpu District with their geodata
4. Venues and amenities in the neighborhoods

### 2.2 Data source

#### 1. Scrape second-hand apartment listings from Lianjia and load them into a DataFrame
https://sh.lianjia.com/ershoufang/yangpu/rp5/
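The scraping code for this step is not shown. As a hedged sketch of the transformation into table rows, the hypothetical helper `parse_listing` below turns one listing's scraped text into a row with the same `Address`, `Price`, `Bedrooms` and `Area` columns as `lianjia.csv`. It assumes that the total price appears as text such as `890万` (in units of ¥10,000) and that the summary line looks like `2室1厅 | 105.81平米 | 南 北` (室 = bedrooms, 平米 = square meters); both formats are assumptions about Lianjia's pages, not something the notebook confirms.

```python
import re


def parse_listing(address: str, total_price: str, house_info: str) -> dict:
    """Turn one scraped listing into a row shaped like lianjia.csv (hypothetical helper).

    Assumed formats: total_price like '890万' (units of 10,000 CNY) and
    house_info like '2室1厅 | 105.81平米 | 南 北'.
    """
    price = re.search(r"(\d+(?:\.\d+)?)", total_price)   # numeric part of '890万'
    bedrooms = re.search(r"(\d+)室", house_info)           # number before 室 (bedrooms)
    area = re.search(r"(\d+(?:\.\d+)?)平米", house_info)   # number before 平米 (square meters)
    if not (price and bedrooms and area):
        # Fail loudly so that a change in the page layout is noticed
        raise ValueError(f"unexpected listing format: {total_price!r} / {house_info!r}")
    return {
        "Address": address,
        "Price": float(price.group(1)),      # in units of 10,000 CNY
        "Bedrooms": int(bedrooms.group(1)),
        "Area": float(area.group(1)),        # square meters
    }
```

Collecting the returned dictionaries in a list and passing it to `pd.DataFrame` would give a frame shaped like `df_house`. Raising on unexpected text, rather than silently skipping a listing, makes layout changes on the site visible.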

In [20]:
df_house=pd.read_csv('lianjia.csv')
df_house.head()

|   | Apartment number | Address | Price | Bedrooms | Area |
|---|---|---|---|---|---|
| 0 | 0 | 宝地东花园 | 890 | 2 | 105.81 |
| 1 | 1 | 控江路1455弄 | 315 | 2 | 55.85 |
| 2 | 2 | 眉州路515弄 | 218 | 1 | 35.11 |
| 3 | 3 | 嘉誉湾 | 1620 | 4 | 195.02 |
| 4 | 4 | 广杭苑 | 645 | 3 | 114.22 |


In [21]:
df_house.shape[0]

600

In [22]:
df_house.dtypes

Apartment number      int64
Address              object
Price                 int64
Bedrooms              int64
Area                float64
dtype: object

##### Select apartments satisfying the first condition in Section 1.2

100 listings are selected.

In [23]:
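# Condition 1: Price is in units of ¥10,000, so 700 means ¥7 million; Area is in square meters
df_selected = df_house[(df_house.Price <= 700) & (df_house.Bedrooms >= 2) & (df_house.Area >= 80)]
df_selected.shape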

(100, 5)

In [24]:
df_selected.head(10)

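|   | Apartment number | Address | Price | Bedrooms | Area |
|---|---|---|---|---|---|
| 4 | 4 | 广杭苑 | 645 | 3 | 114.22 |
| 10 | 10 | 圣骊河滨苑 | 698 | 2 | 83.31 |
| 23 | 23 | 新江湾佳苑 | 646 | 2 | 91.44 |
| 24 | 24 | 松花公寓 | 428 | 2 | 83.0 |
| 29 | 29 | 锦杨苑 | 680 | 2 | 105.15 |
| 34 | 34 | 广杭苑 | 645 | 3 | 114.22 |
| 40 | 40 | 圣骊河滨苑 | 698 | 2 | 83.31 |
| 53 | 53 | 新江湾佳苑 | 646 | 2 | 91.44 |
| 54 | 54 | 松花公寓 | 428 | 2 | 83.0 |
| 59 | 59 | 锦杨苑 | 680 | 2 | 105.15 |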


Extract the addresses of the selected apartments:

In [25]:
df_selected.Address.unique()

array(['广杭苑', '圣骊河滨苑', '新江湾佳苑', '松花公寓', '锦杨苑'], dtype=object)
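The `head(10)` output shows that rows 4 and 34, 10 and 40, and so on are identical apart from `Apartment number`, and `unique()` returns only 5 addresses. This suggests the scraper collected the same listings repeatedly, so the 100 selected rows may correspond to as few as 5 distinct apartments. A minimal sketch of removing the repeats with pandas' `drop_duplicates`, rebuilt from the ten rows shown above and assuming that two rows with the same address, price, bedroom count and area are the same listing:

```python
import pandas as pd

# The ten rows displayed by df_selected.head(10) above (index = original row label)
df_selected = pd.DataFrame(
    {
        "Apartment number": [4, 10, 23, 24, 29, 34, 40, 53, 54, 59],
        "Address": ["广杭苑", "圣骊河滨苑", "新江湾佳苑", "松花公寓", "锦杨苑"] * 2,
        "Price": [645, 698, 646, 428, 680] * 2,
        "Bedrooms": [3, 2, 2, 2, 2] * 2,
        "Area": [114.22, 83.31, 91.44, 83.0, 105.15] * 2,
    },
    index=[4, 10, 23, 24, 29, 34, 40, 53, 54, 59],
)

# "Apartment number" is only a running counter, so compare the listing attributes
listing_cols = ["Address", "Price", "Bedrooms", "Area"]
df_unique = df_selected.drop_duplicates(subset=listing_cols)  # keeps the first occurrence
print(len(df_selected), "rows ->", len(df_unique), "distinct listings")
```

Applied to the full `df_selected`, the same call would show how many distinct apartments actually pass condition 1. The assumption can fail if one complex lists several identical units, so it is worth checking a listing ID if the scraper captures one.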

#### 2. Get geodata for the addresses of the selected apartments with the Baidu Maps API
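The geocoding calls themselves are not included in the notebook; the cell below loads their saved output. A hedged sketch of how such a call might look, using only the standard library. It assumes Baidu's v3 geocoding endpoint `https://api.map.baidu.com/geocoding/v3/` with `address`, `city`, `output` and `ak` (API key) parameters and a response of the form `{"status": 0, "result": {"location": {"lng": ..., "lat": ...}}}`; verify these against Baidu's current documentation. The function names are hypothetical. Note also that Baidu returns coordinates in its own BD-09 system by default, which is offset from the WGS-84 coordinates used by Foursquare and OpenStreetMap tiles, typically by a few hundred meters.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of Baidu's v3 geocoding service
BAIDU_GEOCODER_URL = "https://api.map.baidu.com/geocoding/v3/"


def build_geocode_url(address: str, ak: str, city: str = "上海市") -> str:
    """Build a geocoding request URL (hypothetical helper); `ak` is your Baidu API key."""
    params = {"address": address, "city": city, "output": "json", "ak": ak}
    return BAIDU_GEOCODER_URL + "?" + urllib.parse.urlencode(params)


def parse_geocode_response(payload: dict) -> tuple:
    """Return (longitude, latitude) from a decoded response; status 0 is assumed to mean success."""
    if payload.get("status") != 0:
        raise RuntimeError(f"Baidu geocoding failed: {payload}")
    location = payload["result"]["location"]
    return location["lng"], location["lat"]


def geocode(address: str, ak: str) -> tuple:
    """Query the service over the network (needs a valid key and connectivity)."""
    with urllib.request.urlopen(build_geocode_url(address, ak), timeout=10) as resp:
        return parse_geocode_response(json.load(resp))
```

With a key, something like `[(a, *geocode(a, AK)) for a in df_selected.Address.unique()]` passed to `pd.DataFrame(..., columns=["Address", "Longitude", "Latitude"])` would produce a table shaped like `apartment_geocode.csv`. Keeping URL building and response parsing separate from the network call makes both easy to check offline.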

In [26]:
df_geo=pd.read_csv('apartment_geocode.csv')
df_geo

|   | Address | Longitude | Latitude |
|---|---|---|---|
| 0 | 松花公寓 | 121.545267 | 31.300812 |
| 1 | 广杭苑 | 121.542566 | 31.263955 |
| 2 | 圣骊河滨苑 | 121.538631 | 31.263405 |
| 3 | 新江湾佳苑 | 121.510335 | 31.315625 |
| 4 | 锦杨苑 | 121.535064 | 31.268477 |
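To map the listings, each selected apartment needs its coordinates attached. A minimal sketch of a left join on `Address` with pandas' `merge`, using the distinct listings and geocodes shown above; `df_listings` and `df_apartments` are illustrative names. `validate="many_to_one"` makes pandas raise an error if an address appears twice in the geocode table, which would otherwise silently duplicate listings.

```python
import pandas as pd

# Distinct selected listings (values from the tables above)
df_listings = pd.DataFrame({
    "Address": ["广杭苑", "圣骊河滨苑", "新江湾佳苑", "松花公寓", "锦杨苑"],
    "Price": [645, 698, 646, 428, 680],            # units of 10,000 CNY
    "Bedrooms": [3, 2, 2, 2, 2],
    "Area": [114.22, 83.31, 91.44, 83.0, 105.15],  # square meters
})
df_geo = pd.DataFrame({
    "Address": ["松花公寓", "广杭苑", "圣骊河滨苑", "新江湾佳苑", "锦杨苑"],
    "Longitude": [121.545267, 121.542566, 121.538631, 121.510335, 121.535064],
    "Latitude": [31.300812, 31.263955, 31.263405, 31.315625, 31.268477],
})

# Left join keeps every listing; addresses that failed to geocode get NaN coordinates
df_apartments = df_listings.merge(df_geo, on="Address", how="left", validate="many_to_one")
missing = df_apartments[df_apartments["Latitude"].isna()]
print(df_apartments)
print("addresses without coordinates:", len(missing))
```

Using a left join rather than an inner join keeps listings whose geocoding failed visible, so they can be fixed instead of disappearing from the analysis.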


#### 3. Get metro station geodata

In [27]:
df_station=pd.read_csv('station.csv')
df_station.head()

|   | Station | Longitude | Latitude |
|---|---|---|---|
| 0 | 三林东 | 121.523234 | 31.146525 |
| 1 | 三门路 | 121.507995 | 31.313091 |
| 2 | 上南路 | 121.506413 | 31.149112 |
| 3 | 上大路 | 121.409179 | 31.313524 |
| 4 | 上海体育场 | 121.443713 | 31.185522 |
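Condition 2 (≤ 1.5 km from a subway station) can be checked by computing the great-circle (haversine) distance from each apartment to every station and keeping the minimum. A minimal standard-library sketch using the station rows and one apartment shown above; `haversine_km` and `nearest_station` are illustrative names. It assumes both CSV files use the same coordinate system. Straight-line distance understates the real walking distance, so the 1.5 km threshold is an optimistic test.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(apt, stations):
    """Return (station name, distance in km) of the station closest to apt = (lon, lat)."""
    return min(
        ((name, haversine_km(apt[0], apt[1], lon, lat)) for name, lon, lat in stations),
        key=lambda pair: pair[1],
    )


stations = [  # first rows of station.csv shown above: (name, longitude, latitude)
    ("三林东", 121.523234, 31.146525),
    ("三门路", 121.507995, 31.313091),
    ("上南路", 121.506413, 31.149112),
    ("上大路", 121.409179, 31.313524),
    ("上海体育场", 121.443713, 31.185522),
]
apartment = (121.510335, 31.315625)  # 新江湾佳苑, from apartment_geocode.csv

name, dist = nearest_station(apartment, stations)
print(name, round(dist, 2), "km", "meets condition 2" if dist <= 1.5 else "too far")
```

With the full station table, looping over the apartment rows and storing the name and distance returned by `nearest_station` would give each apartment's closest station for filtering.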


#### 4. Use Foursquare data and geodata to map venues

In this part, I will use the Foursquare API to check the venues around each selected apartment and assess whether its neighborhood is convenient enough.
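A minimal standard-library sketch of this step. It assumes Foursquare's v2 `venues/explore` endpoint with `client_id`/`client_secret` authentication, a `v` version date, and the response layout `response → groups[0] → items[*].venue`; Foursquare has since introduced a newer Places API, so check the current documentation before relying on it. The helpers `build_explore_url`, `parse_venues` and `count_categories`, the 1,500 m radius (matching the walking distance in Section 1.2) and the amenity keywords are illustrative choices; to our knowledge Foursquare labels cinemas as "Movie Theater".

```python
import urllib.parse

# Assumed Foursquare v2 endpoint
FOURSQUARE_EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=1500, limit=100, version="20180605"):
    """Build an explore request for venues within radius_m meters of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius_m,
        "limit": limit,
    }
    return FOURSQUARE_EXPLORE_URL + "?" + urllib.parse.urlencode(params)


def parse_venues(payload):
    """Flatten a decoded explore response into (name, category, lat, lng) tuples."""
    venues = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or [{"name": "Unknown"}]
        venues.append((venue["name"], categories[0]["name"],
                       venue["location"]["lat"], venue["location"]["lng"]))
    return venues


def count_categories(venues, keywords=("Supermarket", "Restaurant", "Shop", "Movie Theater")):
    """Count venues whose category name contains each amenity keyword (condition 3)."""
    return {k: sum(k.lower() in cat.lower() for _, cat, _, _ in venues) for k in keywords}
```

The request can be sent with `requests.get(url).json()` (already imported in the first cell) and the result passed to `parse_venues`. Substring matching on category names is deliberately loose; for example, "Shop" also matches "Coffee Shop", so the counts are a rough convenience score rather than an exact inventory.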