# Coursera Capstone Project

## Battle of the Neighborhoods



### Introduction:

A restaurateur is seeking to open a new restaurant in the Charlotte area but doesn't know which area to invest in.  The restaurateur hopes to open in an area that currently lacks restaurant venues and is easily accessible (near public transportation) for future customers.

Charlotte is the most populous metropolitan area in North Carolina, and the 15th most populous in the United States.  The population here is growing rapidly, with the US Census Bureau ranking it as the top area for millennial population growth from 2005 to 2015.  Millennials are often looking for that "next trendy restaurant" in neighborhoods they may be unfamiliar with.  In addition, millennials are less likely to own personal transportation than their predecessors.  Together, these facts make it important to target an area that millennials can easily reach via public transportation.  With pandemic restrictions easing, it is likely that more people will want to support local small businesses again.

### Target Audience: 

Investors in a new millennial-friendly restaurant. The investors want the new project to be in an area that has few established restaurants and is close to highly rated public transportation.

### Data:

The new location's area will be chosen from the results of a k-means cluster analysis of Charlotte's zip codes ([Link to Zip Codes online](https://www.zipcodestogo.com/North%20Carolina/)).  These zip codes will be clustered on the basis of the top venues in each area, obtained from the Foursquare API.  Geographical locations of the neighborhoods will be retrieved with the Nominatim geocoder from the geopy package in Python.  Zip codes will be paired with latitude and longitude coordinates from a publicly available csv ([Zip Code Coordinates Link](https://simplemaps.com/data/us-zips)).  The cluster analysis will be completed with the KMeans function from the sklearn package.
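
As a rough illustration of the clustering step described above, the sketch below runs `KMeans` on a small table of venue-category frequencies per zip code. The zip labels, category names, and numbers are all made up for illustration; they are not Foursquare results, and the choice of two clusters is an assumption rather than the project's final setting.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-category frequencies per zip code.
# Every value here is invented for illustration; each row sums to 1.
venue_freq = pd.DataFrame(
    {
        "Restaurant": [0.40, 0.35, 0.05, 0.10, 0.00, 0.45],
        "Coffee Shop": [0.30, 0.25, 0.10, 0.05, 0.10, 0.30],
        "Park": [0.10, 0.15, 0.45, 0.50, 0.40, 0.05],
        "Light Rail Station": [0.20, 0.25, 0.40, 0.35, 0.50, 0.20],
    },
    index=["zip_a", "zip_b", "zip_c", "zip_d", "zip_e", "zip_f"],
)

# Fit k-means with an assumed k of 2; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(venue_freq)
clustered = venue_freq.assign(cluster=kmeans.labels_)

# Average restaurant share per cluster, lowest first.
restaurant_share = clustered.groupby("cluster")["Restaurant"].mean().sort_values()

print(clustered[["cluster"]])
print(restaurant_share)
```

In this toy setup, the cluster with the lowest average restaurant share but a high share of transit venues would be the kind of area the investors are looking for.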



In [2]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library


In [3]:
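# Import all NC zip codes; the second table on the page holds the listing
table_nczip = pd.read_html('https://www.zipcodestogo.com/North%20Carolina/')[1]
# Row 1 holds the column names, so promote it to the header and drop the rows above the data
table_nczip.columns = table_nczip.iloc[1]
table_nczip = table_nczip[2:]
# Keep only the zip code and city columns
table_nczip = table_nczip.drop(['Zip Code Map', 'County'], axis=1).reset_index(drop=True)
print(table_nczip.head(10))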

1 Zip Code          City
0    27006       Advance
1    27007        Ararat
2    27009  Belews Creek
3    27010      Bethania
4    27011     Boonville
5    27012      Clemmons
6    27013     Cleveland
7    27014     Cooleemee
8    27016       Danbury
9    27017        Dobson


In [22]:
# Keep only the Charlotte rows and cast the zip codes to integers for the later merge
charlottezips = table_nczip[table_nczip["City"] == "Charlotte"]
charlottezips = charlottezips.astype({'Zip Code': 'int64'})
print(charlottezips.head(10))
# Includes PO box and single-entity zip codes, 73 total

1    Zip Code       City
602     28201  Charlotte
603     28202  Charlotte
604     28203  Charlotte
605     28204  Charlotte
606     28205  Charlotte
607     28206  Charlotte
608     28207  Charlotte
609     28208  Charlotte
610     28209  Charlotte
611     28210  Charlotte


In [23]:
charlottezips.dtypes

1
Zip Code     int64
City        object
dtype: object

In [25]:
# Pair Charlotte zip codes with their corresponding latitudes/longitudes.
# uszips.csv is the csv of USA zip codes (from https://simplemaps.com/data/us-zips)
usa_zips = pd.read_csv('uszips.csv')
# Left merge: keep every Charlotte zip code, even those with no coordinates
mergedzips = pd.merge(charlottezips, usa_zips, how='left', left_on='Zip Code', right_on='zip')
mergedzips.head(10)

   Zip Code       City      zip       lat       lng       city state_id  \
0     28201  Charlotte      NaN       NaN       NaN        NaN      NaN   
1     28202  Charlotte  28202.0  35.22778 -80.84458  Charlotte       NC   
2     28203  Charlotte  28203.0  35.20815 -80.85911  Charlotte       NC   
3     28204  Charlotte  28204.0  35.21463 -80.82702  Charlotte       NC   
4     28205  Charlotte  28205.0  35.21973 -80.78791  Charlotte       NC   
5     28206  Charlotte  28206.0  35.25679 -80.82116  Charlotte       NC   
6     28207  Charlotte  28207.0  35.19512 -80.82622  Charlotte       NC   
7     28208  Charlotte  28208.0  35.23057 -80.90992  Charlotte       NC   
8     28209  Charlotte  28209.0  35.17854 -80.85386  Charlotte       NC   
9     28210  Charlotte  28210.0  35.12900 -80.85552  Charlotte       NC   

       state_name  zcta  parent_zcta  population  density  county_fips  \
0             NaN   NaN          NaN         NaN      NaN          NaN   
1  North Carolina  True   
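
Row 0 of the merged output (28201) has no match in the coordinates file. One way to list such zip codes before dropping them is the `indicator` option of `pd.merge`. The sketch below uses two tiny stand-in frames, `city_zips` and `coords` (hypothetical names, with three zip codes copied from the output above), rather than the full `charlottezips` and `usa_zips` data.

```python
import pandas as pd

# Stand-ins for charlottezips and usa_zips; names and size are illustrative only.
city_zips = pd.DataFrame(
    {"Zip Code": [28201, 28202, 28203], "City": ["Charlotte"] * 3}
)
coords = pd.DataFrame(
    {
        "zip": [28202, 28203],
        "lat": [35.22778, 35.20815],
        "lng": [-80.84458, -80.85911],
    }
)

# indicator=True adds a "_merge" column that records where each row came from.
checked = pd.merge(
    city_zips, coords, how="left", left_on="Zip Code", right_on="zip", indicator=True
)

# "left_only" rows are zip codes that found no coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "Zip Code"].tolist()
print(unmatched)
```

Here the only unmatched zip code is 28201, mirroring the NaN row in the merged output.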

In [27]:
# Drop rows that have no latitude/longitude
mergedzips = mergedzips.dropna(subset=['lat', 'lng'])
mergedzips.head(10)

|    | Zip Code | City | zip | lat | lng | city | state_id | state_name | zcta | parent_zcta | population | density | county_fips | county_name | county_weights | county_names_all | county_fips_all | imprecise | military | timezone |
|----|----------|------|-----|-----|-----|------|----------|------------|------|-------------|------------|---------|-------------|-------------|----------------|------------------|-----------------|-----------|----------|----------|
| 1 | 28202 | Charlotte | 28202.0 | 35.22778 | -80.84458 | Charlotte | NC | North Carolina | True | NaN | 13498.0 | 2889.2 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 2 | 28203 | Charlotte | 28203.0 | 35.20815 | -80.85911 | Charlotte | NC | North Carolina | True | NaN | 16655.0 | 1938.4 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 3 | 28204 | Charlotte | 28204.0 | 35.21463 | -80.82702 | Charlotte | NC | North Carolina | True | NaN | 7199.0 | 1607.8 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 4 | 28205 | Charlotte | 28205.0 | 35.21973 | -80.78791 | Charlotte | NC | North Carolina | True | NaN | 48798.0 | 1593.8 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 5 | 28206 | Charlotte | 28206.0 | 35.25679 | -80.82116 | Charlotte | NC | North Carolina | True | NaN | 12036.0 | 658.6 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 6 | 28207 | Charlotte | 28207.0 | 35.19512 | -80.82622 | Charlotte | NC | North Carolina | True | NaN | 9986.0 | 1531.4 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 7 | 28208 | Charlotte | 28208.0 | 35.23057 | -80.90992 | Charlotte | NC | North Carolina | True | NaN | 40284.0 | 706.7 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 8 | 28209 | Charlotte | 28209.0 | 35.17854 | -80.85386 | Charlotte | NC | North Carolina | True | NaN | 23533.0 | 1657.2 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 9 | 28210 | Charlotte | 28210.0 | 35.129 | -80.85552 | Charlotte | NC | North Carolina | True | NaN | 48333.0 | 1469.4 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
| 10 | 28211 | Charlotte | 28211.0 | 35.16807 | -80.79616 | Charlotte | NC | North Carolina | True | NaN | 31121.0 | 1115.0 | 37119.0 | Mecklenburg | {"37119": "100"} | Mecklenburg | 37119 | False | False | America/New_York |
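
Note that `parent_zcta` is still empty in every remaining row. That is expected, because `dropna(subset=...)` only checks the listed columns. The toy frame below (made-up structure, with coordinates copied from the rows above) sketches the difference between dropping on a subset and dropping on any missing value.

```python
import numpy as np
import pandas as pd

# Small stand-in for mergedzips; only four columns are kept for illustration.
toy = pd.DataFrame(
    {
        "zip": [28201, 28202, 28203],
        "lat": [np.nan, 35.22778, 35.20815],
        "lng": [np.nan, -80.84458, -80.85911],
        "parent_zcta": [np.nan, np.nan, np.nan],
    }
)

# Only rows missing lat or lng are removed; parent_zcta is ignored.
by_subset = toy.dropna(subset=["lat", "lng"])

# Without subset, any missing value removes the row, so nothing survives here.
by_any = toy.dropna()

print(len(by_subset), len(by_any))
print(by_subset.index.tolist())
```

The surviving rows keep their original index labels, which is why the output above starts at 1 rather than 0; calling `reset_index(drop=True)` afterwards would renumber them if a clean index is wanted.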
