# <font color="Red">Best Place for an Asian Restaurant in the Lansing Capital Area

## <font color="Blue"> Introduction/Business Problem


The client wants to open an Asian restaurant in the Lansing Capital Area of Michigan (Lansing, East Lansing, Okemos, Haslett, Mason, and Holt). 
Location matters to him, and the restaurant's location should satisfy the following considerations:
+ It lies in a region where the most popular type of restaurant is an Asian restaurant.
+ The population of the region should not be too small.

In this project, we focus on restaurants. 
We will use restaurant popularity and the population of each zip code region to cluster the venues. 
Taking the population of these regions into account, we will determine the zip code region in which Asian restaurants are the most popular.

## <font color="Blue">  Data source
 
+ Zip code data for the whole United States can be found at https://simplemaps.com/data/us-zips.
+ The website https://www.zipdatamaps.com/{zipcode} provides more detailed information for each zip code region.



#### Load modules

In [None]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
import json # library to handle JSON files
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## <font color="Blue">Get data and clean data

Get geospatial information for the Lansing Capital Area. 

In [2]:
path='uszipsv1.4.csv'
gosdf = pd.read_csv(path)
gosdf.head()

Unnamed: 0,zip,lat,lng,city,state_id,state_name,zcta,parent_zcta,population,county_fips,county_name,all_county_weights,imprecise,military
0,501,40.8133,-73.0476,Holtsville,NY,New York,False,11742.0,,,,,True,False
1,544,40.8133,-73.0476,Holtsville,NY,New York,False,11742.0,,,,,True,False
2,601,18.18,-66.7522,Adjuntas,PR,Puerto Rico,True,,18570.0,72001.0,72001.0,"{'72001':99.43,'72141':0.57}",False,False
3,602,18.3607,-67.1752,Aguada,PR,Puerto Rico,True,,41520.0,72003.0,72003.0,{'72003':100},False,False
4,603,18.4544,-67.122,Aguadilla,PR,Puerto Rico,True,,54689.0,72005.0,72005.0,{'72005':100},False,False


In [3]:
capital_cities = ['Lansing', 'Okemos', 'East Lansing', 'Haslett', 'Mason', 'Holt']
GreatLansing = gosdf[gosdf['city'].isin(capital_cities)]
LansingMI = GreatLansing[GreatLansing['state_id'] =='MI']
LansingMI.head()

Unnamed: 0,zip,lat,lng,city,state_id,state_name,zcta,parent_zcta,population,county_fips,county_name,all_county_weights,imprecise,military
20914,48805,42.7082,-84.4144,Okemos,MI,Michigan,False,48864.0,,,,,True,False
20930,48823,42.762,-84.4539,East Lansing,MI,Michigan,True,,51302.0,26065.0,26065.0,"{'26037':14.5,'26065':85.5}",False,False
20931,48824,42.7229,-84.4751,East Lansing,MI,Michigan,False,48825.0,,,,,False,False
20932,48825,42.727,-84.4809,East Lansing,MI,Michigan,True,,12596.0,26065.0,26065.0,{'26065':100},False,False
20933,48826,42.736,-84.4843,East Lansing,MI,Michigan,False,48823.0,,,,,True,False


### <font color="Blue"> Remove duplicate values

In [4]:
lasing = LansingMI[['zip','lat','lng','city','population']].drop_duplicates(['zip']).copy()
lasing.zip = lasing['zip'].astype('str')
lasing = lasing.drop_duplicates(['lat','lng']).copy()
lasing

Unnamed: 0,zip,lat,lng,city,population
20914,48805,42.7082,-84.4144,Okemos,
20930,48823,42.762,-84.4539,East Lansing,51302.0
20931,48824,42.7229,-84.4751,East Lansing,
20932,48825,42.727,-84.4809,East Lansing,12596.0
20933,48826,42.736,-84.4843,East Lansing,
20945,48840,42.769,-84.3707,Haslett,12501.0
20947,48842,42.6338,-84.5387,Holt,20432.0
20959,48854,42.582,-84.4517,Mason,18598.0
20968,48864,42.7013,-84.4067,Okemos,20148.0
21000,48901,42.7091,-84.554,Lansing,
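The two-step deduplication above can be illustrated on a toy frame. This is a minimal sketch: the labels `Z1` to `Z3` and their coordinates are hypothetical and only chosen to show which rows each step keeps.

```python
import pandas as pd

# Hypothetical rows: Z1 is listed twice, and Z2 and Z3 share one coordinate pair.
toy = pd.DataFrame({
    'zip': ['Z1', 'Z1', 'Z2', 'Z3'],
    'lat': [42.7620, 42.7620, 42.7091, 42.7091],
    'lng': [-84.4539, -84.4539, -84.5540, -84.5540],
})

# Step 1: keep the first row for each zip value.
by_zip = toy.drop_duplicates(['zip'])
# Step 2: keep the first row for each (lat, lng) pair.
by_coords = by_zip.drop_duplicates(['lat', 'lng'])

print(by_coords['zip'].tolist())
```

Note that the second step also drops a distinct zip code when it shares coordinates with an earlier row, which is what happens to `Z3` here.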


### <font color="Blue"> Remove the outlier value
There is one outlier, (40.2439, -87.1261) for zip=48918. We need to remove it from our dataframe. The cell below drops every row whose latitude lies more than three standard deviations from the mean latitude.

In [5]:
lasing = lasing[np.abs(lasing.lat-lasing.lat.mean()) <= (3*lasing.lat.std())]
lasing = lasing.sort_values(by=['zip'])
lasing.reset_index(inplace=True)
lasing

Unnamed: 0,index,zip,lat,lng,city,population
0,20914,48805,42.7082,-84.4144,Okemos,
1,20930,48823,42.762,-84.4539,East Lansing,51302.0
2,20931,48824,42.7229,-84.4751,East Lansing,
3,20932,48825,42.727,-84.4809,East Lansing,12596.0
4,20933,48826,42.736,-84.4843,East Lansing,
5,20945,48840,42.769,-84.3707,Haslett,12501.0
6,20947,48842,42.6338,-84.5387,Holt,20432.0
7,20959,48854,42.582,-84.4517,Mason,18598.0
8,20968,48864,42.7013,-84.4067,Okemos,20148.0
9,21000,48901,42.7091,-84.554,Lansing,
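The three-standard-deviation rule used above can be checked on a small example. This is a sketch under stated assumptions: the 14 latitudes near Lansing are made up, and only the stray value 40.2439 comes from the text.

```python
import numpy as np
import pandas as pd

# Hypothetical latitudes: 14 points near Lansing plus the stray value quoted above.
lats = [42.70 + 0.01 * i for i in range(14)] + [40.2439]
toy = pd.DataFrame({'lat': lats})

# Absolute distance of each latitude from the mean latitude.
deviation = np.abs(toy.lat - toy.lat.mean())
# Keep only rows within three sample standard deviations of the mean.
kept = toy[deviation <= 3 * toy.lat.std()]

print(len(toy), len(kept))
```

One caveat on this rule: with the sample standard deviation, the largest possible z-score among n values is (n-1)/sqrt(n), which is below 3 for n <= 10. The filter can therefore only remove a point when the frame has at least 11 rows.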


Remove the **index** column.

In [6]:
lasing = lasing.drop('index',axis=1)
lasing

Unnamed: 0,zip,lat,lng,city,population
0,48805,42.7082,-84.4144,Okemos,
1,48823,42.762,-84.4539,East Lansing,51302.0
2,48824,42.7229,-84.4751,East Lansing,
3,48825,42.727,-84.4809,East Lansing,12596.0
4,48826,42.736,-84.4843,East Lansing,
5,48840,42.769,-84.3707,Haslett,12501.0
6,48842,42.6338,-84.5387,Holt,20432.0
7,48854,42.582,-84.4517,Mason,18598.0
8,48864,42.7013,-84.4067,Okemos,20148.0
9,48901,42.7091,-84.554,Lansing,
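As a possible shortcut, `reset_index` accepts `drop=True`, which renumbers the rows without creating the extra **index** column in the first place. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical frame whose index still carries the row labels of the source file.
toy = pd.DataFrame({'zip': ['48823', '48840']}, index=[20930, 20945])

# drop=True discards the old index instead of inserting it as an 'index' column.
toy = toy.reset_index(drop=True)

print(list(toy.columns), list(toy.index))
```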


### <font color="Blue">Get the population from the web for NaN values
Get the index labels of the rows whose population is **NaN**.

In [7]:
lax = lasing.index[lasing['population'].isnull()]
lax

Int64Index([0, 2, 4, 9, 11, 18], dtype='int64')

Obtain the population from https://www.zipdatamaps.com/{zipcode} and add it to the dataframe.

In [8]:
for index in lax:
    zipcode = lasing.loc[index, 'zip']
    url_address = 'https://www.zipdatamaps.com/{}'.format(zipcode)
    html = requests.get(url_address).text
    soup = BeautifulSoup(html, 'lxml')
    table = soup.find('table', {'class': 'table table-striped table-bordered table-hover table-condensed'})
    # Collect the text of every <td> cell, one list per table row
    lists = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        items = [td.text.strip() for td in cells]
        lists.append(items)

    df = pd.DataFrame(lists)
    # Row 5, column 1 of the scraped table holds the population figure
    zip_population = df.loc[5, 1]
    # Store it as a number so later comparisons with 0 behave as expected
    lasing.at[index, 'population'] = float(zip_population)
    print('zipcode :',zipcode,'   population = ',zip_population)

zipcode : 48805    population =  0
zipcode : 48824    population =  1158
zipcode : 48826    population =  0
zipcode : 48901    population =  0
zipcode : 48909    population =  0
zipcode : 48951    population =  0
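Reading the population from a fixed position (`df.loc[5,1]`) depends on the page layout staying the same. A hedged alternative is to look the value up by its row label. In the sketch below, the helper `find_population`, the label text `Population`, and the sample rows are all assumptions for illustration; the labels on the actual site may differ.

```python
# Hypothetical scraped rows in the same shape as `lists` above: [label, value].
rows = [
    ['City', 'East Lansing'],
    [],                          # rows without <td> cells come out empty
    ['Population', '1,158'],
]

def find_population(rows, label='Population'):
    """Return the value next to the first row whose label equals `label`, as a float."""
    for cells in rows:
        if len(cells) >= 2 and cells[0].strip().lower() == label.lower():
            # Strip thousands separators before converting.
            return float(cells[1].replace(',', ''))
    return None  # label not found

print(find_population(rows))
```

Returning `None` when the label is missing makes a layout change visible instead of silently writing the wrong cell into the dataframe.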


In [9]:
lasing

Unnamed: 0,zip,lat,lng,city,population
0,48805,42.7082,-84.4144,Okemos,0.0
1,48823,42.762,-84.4539,East Lansing,51302.0
2,48824,42.7229,-84.4751,East Lansing,1158.0
3,48825,42.727,-84.4809,East Lansing,12596.0
4,48826,42.736,-84.4843,East Lansing,0.0
5,48840,42.769,-84.3707,Haslett,12501.0
6,48842,42.6338,-84.5387,Holt,20432.0
7,48854,42.582,-84.4517,Mason,18598.0
8,48864,42.7013,-84.4067,Okemos,20148.0
9,48901,42.7091,-84.554,Lansing,0.0


Drop the rows whose population equals 0.

In [10]:
lasing = lasing.loc[(lasing.population!=0)]
lasing.reset_index(inplace=True)
lasing

Unnamed: 0,index,zip,lat,lng,city,population
0,1,48823,42.762,-84.4539,East Lansing,51302.0
1,2,48824,42.7229,-84.4751,East Lansing,1158.0
2,3,48825,42.727,-84.4809,East Lansing,12596.0
3,5,48840,42.769,-84.3707,Haslett,12501.0
4,6,48842,42.6338,-84.5387,Holt,20432.0
5,7,48854,42.582,-84.4517,Mason,18598.0
6,8,48864,42.7013,-84.4067,Okemos,20148.0
7,10,48906,42.7845,-84.5875,Lansing,26634.0
8,12,48910,42.6985,-84.523,Lansing,34560.0
9,13,48911,42.6745,-84.5709,Lansing,40111.0


In [11]:
lasing = lasing.drop('index',axis=1)
lasing

Unnamed: 0,zip,lat,lng,city,population
0,48823,42.762,-84.4539,East Lansing,51302.0
1,48824,42.7229,-84.4751,East Lansing,1158.0
2,48825,42.727,-84.4809,East Lansing,12596.0
3,48840,42.769,-84.3707,Haslett,12501.0
4,48842,42.6338,-84.5387,Holt,20432.0
5,48854,42.582,-84.4517,Mason,18598.0
6,48864,42.7013,-84.4067,Okemos,20148.0
7,48906,42.7845,-84.5875,Lansing,26634.0
8,48910,42.6985,-84.523,Lansing,34560.0
9,48911,42.6745,-84.5709,Lansing,40111.0
