# Clustering Neighborhoods Similar to the City of Mountain View

### A1. Introduction and Background
Mountain View is a diverse community nestled between the Santa Cruz Mountains and San Francisco Bay. It is ideally located in the heart of Silicon Valley (10 miles north of San Jose and 35 miles south of San Francisco). At just over 12 square miles, the City is home to approximately 75,000 residents, many nationally and internationally known corporations, and a thriving small business base. <br>
<br>
Mountain View recognizes the need for a robust local economy, and economic development continues to be a high priority. The City has been formally recognized as having a solid and diversified tax base, a very strong financial operation, and a low debt burden. <br> <br>

### A2. Explanation
Mountain View has a diverse Asian community, with Asian restaurants, supermarkets that stock ethnic Asian food, Asian street food, beverage shops, and more. Because jobs change quickly here, I might accept an offer on the other side of the state, and I would want a similar community nearby. <br><br>
This is how I came up with the idea of clustering other cities in the state that share a similar community and culture. The findings in this report should be useful to new immigrants to California who are looking for a similarly diverse environment.

### B. Data Description
To analyze the data, I will draw on the following resources:
- Scrape the HTML table of postal codes around Mountain View (Santa Clara County) from GeoNames:  https://www.geonames.org/postalcode-search.html?q=Santa+Clara&country=US&adminCode1=CA
- Obtain the latitude and longitude of California ZIP codes from the OpenDataSoft dataset: https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/?refine.state=CA
- Visualize the neighborhoods of Mountain View with the Folium library
- Explore the neighborhoods with Foursquare API calls
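
As a rough illustration of the Foursquare step in the list above, the sketch below only assembles a request URL. It assumes the legacy Foursquare v2 `venues/explore` endpoint with its `client_id`, `client_secret`, `v`, `ll`, `radius`, and `limit` parameters; the helper name `build_explore_request`, the placeholder credentials, and the default radius and limit are hypothetical, and newer Foursquare API versions use a different endpoint and authentication scheme.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "explore" endpoint; newer API versions differ.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_request(lat, lng, client_id, client_secret,
                          version="20180605", radius=500, limit=100):
    """Return the request URL for venues near (lat, lng). Hypothetical helper."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date expected by the v2 API
        "ll": f"{lat},{lng}",    # latitude,longitude of the search center
        "radius": radius,        # search radius in meters
        "limit": limit,          # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Placeholder credentials; substitute your own before sending the request.
request_url = build_explore_request(37.40679, -122.07461, "YOUR_ID", "YOUR_SECRET")
print(request_url)
```

With valid credentials, the URL could then be fetched with `requests.get(request_url).json()`.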


### C. The Map of Mountain View

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transform JSON into a pandas DataFrame (pandas >= 1.0; older versions use pandas.io.json)
from pandas import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
from bs4 import BeautifulSoup # HTML parsing
from sklearn.cluster import KMeans # k-means clustering
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


**C1. Download the data from the link**

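In [2]:
from io import StringIO

url = 'https://www.geonames.org/postalcode-search.html?q=Santa+Clara&country=US&adminCode1=CA'

res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')
tables = soup.select('table')

# parse every <table> on the page into a DataFrame
df_list = []
for table in tables:
    # wrap the markup in StringIO: newer pandas versions deprecate passing literal HTML strings
    df_list.append(pd.concat(pd.read_html(StringIO(table.prettify()))))

# the third table on the page holds the postal-code results
df = df_list[2]
df.head()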

Unnamed: 0.1,Unnamed: 0,Place,Code,Country,Admin1,Admin2,Admin3
0,1.0,Santa Clara,95054,United States,California,Santa Clara,
1,,37.392/-121.962,37.392/-121.962,37.392/-121.962,37.392/-121.962,37.392/-121.962,37.392/-121.962
2,2.0,Santa Clara,95050,United States,California,Santa Clara,
3,,37.349/-121.953,37.349/-121.953,37.349/-121.953,37.349/-121.953,37.349/-121.953,37.349/-121.953
4,3.0,Santa Clara,95051,United States,California,Santa Clara,


**C2. Data Preprocessing**

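In [3]:
# drop the columns we do not need and give the rest descriptive names
df.drop('Unnamed: 0', axis=1, inplace=True)
df.drop('Admin3', axis=1, inplace=True)
df.rename(columns={'Admin1':'State','Admin2':'County'}, inplace=True)
# odd-numbered rows hold only the coordinates, so keep the even-numbered rows
df = df[df.index % 2 == 0].copy()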

In [4]:
df.reset_index(inplace=True)
df.head()

Unnamed: 0,index,Place,Code,Country,State,County
0,0,Santa Clara,95054,United States,California,Santa Clara
1,2,Santa Clara,95050,United States,California,Santa Clara
2,4,Santa Clara,95051,United States,California,Santa Clara
3,6,Morgan Hill,95037,United States,California,Santa Clara
4,8,Santa Clara,95052,United States,California,Santa Clara


In [5]:
#df.drop('level_0', axis=1, inplace=True)
df.drop('index', axis=1, inplace=True)
df.head()

Unnamed: 0,Place,Code,Country,State,County
0,Santa Clara,95054,United States,California,Santa Clara
1,Santa Clara,95050,United States,California,Santa Clara
2,Santa Clara,95051,United States,California,Santa Clara
3,Morgan Hill,95037,United States,California,Santa Clara
4,Santa Clara,95052,United States,California,Santa Clara
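
The scraped table interleaves one record row with one coordinates row, which is why the preprocessing keeps only every other row. As a minimal sketch on a toy copy of the first four rows, the odd rows could also be parsed rather than discarded; the names `raw`, `records`, and `coords` are illustrative and not part of the notebook.

```python
import pandas as pd

# Toy copy of the first four scraped rows: even rows are records, odd rows are "lat/lon" strings.
raw = pd.DataFrame({
    "Place": ["Santa Clara", "37.392/-121.962", "Santa Clara", "37.349/-121.953"],
    "Code":  ["95054", "37.392/-121.962", "95050", "37.349/-121.953"],
})

records = raw[raw.index % 2 == 0].reset_index(drop=True)   # place and ZIP code
coords = raw[raw.index % 2 == 1].reset_index(drop=True)    # matching coordinate strings

# Split "lat/lon" into two numeric columns and attach them to the records.
lat_lon_cols = coords["Place"].str.split("/", expand=True).astype(float)
records["Latitude"] = lat_lon_cols[0]
records["Longitude"] = lat_lon_cols[1]
print(records)
```

These scraped coordinates are rounded to three decimals, so the CSV used in the next section remains the more precise source.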


### D. Importing the CSV file containing the latitudes and longitudes for various neighborhoods in California

**D1. Read data from the CSV file**

In [80]:
import pandas as pd
lat_lon = pd.read_csv('/Users/alice/Desktop/Coursera/Data Science/2. Python(Pandas)/Applied Data Science Capstone/us-zip-code-latitude-and-longitude.csv')
lat_lon.head()

Unnamed: 0,Zip,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,92232,33.026203,-115.284581,-8,1,33.026203
1,93227,36.357151,-119.425371,-8,1,36.357151
2,93234,36.209815,-120.0847,-8,1,36.209815
3,93529,37.765218,-119.07769,-8,1,37.765218
4,93761,36.746375,-119.639658,-8,1,36.746375


**D2. Data preprocessing and Cleaning**

In [81]:
lat_lon.drop(['Timezone', 'Daylight savings time flag', 'geopoint'], axis=1, inplace=True)
lat_lon.rename(columns={'Zip':'Code'}, inplace=True)
lat_lon.head()

Unnamed: 0,Code,Latitude,Longitude
0,92232,33.026203,-115.284581
1,93227,36.357151,-119.425371
2,93234,36.209815,-120.0847
3,93529,37.765218,-119.07769
4,93761,36.746375,-119.639658


In [82]:
# Before merging we need to convert both tables to object
lat_lon = lat_lon.astype(str)
lat_lon.dtypes

Code         object
Latitude     object
Longitude    object
dtype: object
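
A merge on `Code` only matches when both key columns share a dtype, which is the reason for the string conversion. The sketch below, on a made-up two-row frame, contrasts `astype(str)`, which converts each cell, with `apply(str)`, which calls `str()` once per column; the name `toy` is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({"Code": [92232, 93227], "Latitude": [33.026203, 36.357151]})

# astype(str) converts every cell and keeps the table shape.
as_strings = toy.astype(str)
print(as_strings.dtypes)

# apply(str) calls str() once per column, so each column collapses into one long string.
collapsed = toy.apply(str)
print(type(collapsed), collapsed.shape)
```

Only the `astype(str)` result can still be merged row by row.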

In [51]:
# Before merging we need to convert both tables to object.
# Use astype(str) for this: apply(str) would call str() on each whole column
# and collapse it into a single string per column.
lat_lon = lat_lon.astype(str)
lat_lon.head()

**D3. Merge the two tables to display ZIP code, latitude, and longitude**

Once we have a merged table listing country, state, and geographic data, we can start to explore the neighborhoods of Mountain View!

In [84]:
df2 = pd.merge(df,lat_lon, on="Code")
df2.drop(['Country','State','County'], axis=1,inplace=True)
df2.head()

Unnamed: 0,Place,Code,Latitude,Longitude
0,Santa Clara,95054,37.393240000000006,-121.96066
1,Santa Clara,95050,37.347791,-121.95131
2,Santa Clara,95051,37.346241,-121.9846
3,Morgan Hill,95037,37.137595000000005,-121.66211
4,Santa Clara,95052,37.189396,-121.705327
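
An inner merge silently drops any ZIP code that has no match in the coordinates table. As a hedged sketch on toy data (the "Nowhere" row and its code are made up), a left merge with `indicator=True` shows which places would be lost.

```python
import pandas as pd

places = pd.DataFrame({"Place": ["Santa Clara", "Morgan Hill", "Nowhere"],
                       "Code": ["95054", "95037", "00000"]})   # "Nowhere"/"00000" is made up
coords = pd.DataFrame({"Code": ["95054", "95037"],
                       "Latitude": ["37.39324", "37.137595"],
                       "Longitude": ["-121.96066", "-121.66211"]})

# A left merge keeps every place; the _merge column records whether coordinates were found.
checked = pd.merge(places, coords, on="Code", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Place", "Code"]])
```

A similar check with `df2.duplicated(['Latitude', 'Longitude'])` would flag rows that share the same coordinates, such as the repeated 37.189396, -121.705327 pair in the output.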


### E. Exploring the Neighborhoods of Mountain View

### F. Clustering Places on the Map with KMeans

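In [89]:
k = 5
# cluster on the coordinates only, cast to float because they were converted to strings before the merge
mtv_clustering = df2.drop(columns=['Code','Place']).astype(float)
kmeans = KMeans(n_clusters=k, random_state=0).fit(mtv_clustering)
# add the cluster label of each place as the first column
df2.insert(0, 'Cluster Labels', kmeans.labels_)
df2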

Unnamed: 0,Cluster Labels,Place,Code,Latitude,Longitude
0,0,Santa Clara,95054,37.393240000000006,-121.96066
1,3,Santa Clara,95050,37.347791,-121.95131
2,3,Santa Clara,95051,37.346241,-121.9846
3,1,Morgan Hill,95037,37.137595000000005,-121.66211000000001
4,1,Santa Clara,95052,37.189396,-121.705327
...,...,...,...,...,...
103,1,San Jose,95172,37.189396,-121.705327
104,1,San Jose,95194,37.189396,-121.705327
105,1,San Jose,95196,37.189396,-121.705327
106,1,Holy City,95026,37.189396,-121.705327
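
The value `k=5` is fixed by hand here. One common way to sanity-check such a choice, shown as a sketch and not as the method used in this notebook, is to compare mean silhouette scores across several values of k. The points below are synthetic coordinates around made-up centers, not the notebook's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic coordinates: three tight groups around made-up centers (not the notebook's data).
rng = np.random.default_rng(0)
centers = np.array([[37.20, -121.70], [37.35, -121.95], [37.40, -122.08]])
points = np.vstack([c + rng.normal(scale=0.005, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for n_clusters in range(2, 7):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(points)
    scores[n_clusters] = silhouette_score(points, labels)

# The k with the highest score separates the points most cleanly.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real ZIP-code coordinates the scores are usually much closer together, so the result is a guide and not a definitive answer.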


In [92]:
map_clusters = folium.Map(location=[37.40679,-122.07461],zoom_start=10)

# set color scheme for the clusters: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, place, cluster in zip(df2['Latitude'],df2['Longitude'],df2['Place'],df2['Cluster Labels']):
    label = folium.Popup(str(place) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
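
The color scheme boils down to sampling one color per cluster from a colormap and looking it up by cluster label. A minimal standalone sketch follows; the names `palette` and `color_by_label` are illustrative, and the labels are the first five from the clustering output.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as mcolors

k = 5
# Sample k evenly spaced colors from the rainbow colormap and convert them to hex strings.
palette = [mcolors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]

# Look up one color per cluster label (labels run from 0 to k - 1).
labels = [0, 3, 3, 1, 1]
color_by_label = {label: palette[label] for label in sorted(set(labels))}
print(color_by_label)
```

Note that Python's negative indexing means an index of `label - 1` would not fail for label 0 but would silently reuse the last color in the palette.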

In [88]:
# The coordinates of Mountain View are 37.40679, -122.07461

map_mtv = folium.Map(location=[37.40679,-122.07461],zoom_start=10)

# add one marker per place, labeled with the place name
for lat,lng,place in zip(df2['Latitude'],df2['Longitude'],df2['Place']):
    label = '{}'.format(place)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mtv)
map_mtv