# Project Title: Find a place to open a fast-food restaurant

## Description:                                                                                                  
A customer is looking for a place to open a fast-food restaurant in Maryland, US. To shortlist candidate locations, we need to answer the following two questions.

1. Population - We want to select cities with a large population in the state of Maryland.<br>
2. Restaurant distribution - We want to avoid areas that already have many fast-food restaurants, so the ideal location has a large population and few fast-food restaurants.

## Data:
* Latitude, longitude, and estimated 2015 population - CSV data downloaded from United States Zip Codes.org
* Venue locations and surrounding data - from Foursquare

## Method:
Merge the data from United States Zip Codes.org with the data from Foursquare to obtain the information we need, such as latitude, longitude, population, and the distribution of restaurants. Afterwards, analyze the merged data with Python (pandas, numpy, sklearn, etc.) to find potential locations for our customer.

### Step 1: Import the necessary libraries

In [58]:
# Import the libraries we need in the project

import pandas as pd #Library for data analysis
import numpy as np #Library to handle data in a vectorized manner
import json
import random #Library for random number generation

import requests #Library to handle HTTP requests
from pandas import json_normalize #pandas.io.json.json_normalize is deprecated since pandas 1.0

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0
import folium
#from folium.plugins import MarkerCluster

# Libraries for displaying images
#from IPython.display import Image
#from IPython.core.display import HTML

print('Libraries imported.')


Libraries imported.


In [59]:
# Set default maximum columns to display
#pd.set_option('display.max_columns', 40)

### Step 2: Import the latitude, longitude, population, and other demographic data into a DataFrame

In [60]:
# Use pandas read_csv to read the postal data
# postalData = pd.read_csv('zip_code_database.csv')
# postalData.head()

In [61]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,zip,type,decommissioned,primary_city,acceptable_cities,unacceptable_cities,state,county,timezone,area_codes,world_region,country,latitude,longitude,irs_estimated_population_2015
0,501,UNIQUE,0,Holtsville,,I R S Service Center,NY,Suffolk County,America/New_York,631,,US,40.81,-73.04,562
1,544,UNIQUE,0,Holtsville,,Irs Service Center,NY,Suffolk County,America/New_York,631,,US,40.81,-73.04,0
2,601,STANDARD,0,Adjuntas,,"Colinas Del Gigante, Jard De Adjuntas, Urb San...",PR,Adjuntas Municipio,America/Puerto_Rico,787939,,US,18.16,-66.72,0
3,602,STANDARD,0,Aguada,,"Alts De Aguada, Bo Guaniquilla, Comunidad Las ...",PR,Aguada Municipio,America/Puerto_Rico,787939,,US,18.38,-67.18,0
4,603,STANDARD,0,Aguadilla,Ramey,"Bda Caban, Bda Esteves, Bo Borinquen, Bo Ceiba...",PR,Aguadilla Municipio,America/Puerto_Rico,787,,US,18.43,-67.15,0


### Step 3: Clean up the data and keep only the columns and rows we need. The target area is Maryland, USA, restricted to zip codes with a population of 45,000 or above.

In [62]:
MDData = postalData.drop(columns=['decommissioned', 'unacceptable_cities', 'timezone', 'world_region'])
MDData.sort_values(by='irs_estimated_population_2015', ascending=False)

Unnamed: 0,zip,type,primary_city,acceptable_cities,state,county,area_codes,country,latitude,longitude,irs_estimated_population_2015
26692,60629,STANDARD,Chicago,Bedford Park,IL,Cook County,312773872,US,41.78,-87.71,114420
4118,11220,STANDARD,Brooklyn,,NY,Kings County,718,US,40.64,-74.02,111430
34023,77449,STANDARD,Katy,Park Row,TX,Harris County,281346832,US,29.84,-95.73,109280
3135,8701,STANDARD,Lakewood,,NJ,Ocean County,732848908,US,40.09,-74.21,105330
34065,77494,STANDARD,Katy,Park Row,TX,Fort Bend County,281832,US,29.74,-95.83,104450
35228,79936,STANDARD,El Paso,,TX,El Paso County,915,US,31.78,-106.30,103850
38326,90650,STANDARD,Norwalk,,CA,Los Angeles County,562,US,33.90,-118.07,101180
4167,11368,STANDARD,Corona,Flushing,NY,Queens County,718,US,40.74,-73.85,100270
39235,93033,STANDARD,Oxnard,,CA,Ventura County,805,US,34.14,-119.10,98770
33806,77084,STANDARD,Houston,,TX,Harris County,281346832,US,29.83,-95.66,98020


## Filter to Maryland zip codes of type UNIQUE or STANDARD and sort them by population.

In [63]:
MDData = MDData[((MDData['type'] == 'UNIQUE') | (MDData['type'] == 'STANDARD')) & (MDData['state'] == 'MD')]  

MDData.sort_values(by='irs_estimated_population_2015', ascending=False, inplace=True)

MDData

Unnamed: 0,zip,type,primary_city,acceptable_cities,state,county,area_codes,country,latitude,longitude,irs_estimated_population_2015
8791,20906,STANDARD,Silver Spring,Aspen Hill,MD,Montgomery County,301240,US,39.09,-77.06,68290
8769,20878,STANDARD,Gaithersburg,"Darnestown, N Potomac, No Potomac, North Potomac",MD,Montgomery County,240301,US,39.12,-77.25,62930
8929,21234,STANDARD,Parkville,Baltimore,MD,Baltimore County,410443,US,39.38,-76.55,62620
8765,20874,STANDARD,Germantown,Darnestown,MD,Montgomery County,240,US,39.17,-77.26,59300
8873,21122,STANDARD,Pasadena,"Lake Shore, Millersville, Riviera Beach",MD,Anne Arundel County,410443,US,39.11,-76.55,57620
8789,20904,STANDARD,Silver Spring,Colesville,MD,Montgomery County,301,US,39.07,-76.98,55730
8787,20902,STANDARD,Silver Spring,Wheaton,MD,Montgomery County,240301,US,39.05,-77.04,52380
9098,21740,STANDARD,Hagerstown,,MD,Washington County,240301,US,39.63,-77.71,52380
8871,21117,STANDARD,Owings Mills,Garrison,MD,Baltimore County,410,US,39.41,-76.79,52350
8844,21061,STANDARD,Glen Burnie,,MD,Anne Arundel County,410443,US,39.16,-76.63,51090


## Keep only the zip codes with a population of 45,000 or more.

In [64]:
MDDataPop = MDData[MDData['irs_estimated_population_2015'] >= 45000].reset_index(drop=True)
MDDataPop.sort_values(by='irs_estimated_population_2015', ascending=False, inplace=True)

MDDataPop

Unnamed: 0,zip,type,primary_city,acceptable_cities,state,county,area_codes,country,latitude,longitude,irs_estimated_population_2015
0,20906,STANDARD,Silver Spring,Aspen Hill,MD,Montgomery County,301240,US,39.09,-77.06,68290
1,20878,STANDARD,Gaithersburg,"Darnestown, N Potomac, No Potomac, North Potomac",MD,Montgomery County,240301,US,39.12,-77.25,62930
2,21234,STANDARD,Parkville,Baltimore,MD,Baltimore County,410443,US,39.38,-76.55,62620
3,20874,STANDARD,Germantown,Darnestown,MD,Montgomery County,240,US,39.17,-77.26,59300
4,21122,STANDARD,Pasadena,"Lake Shore, Millersville, Riviera Beach",MD,Anne Arundel County,410443,US,39.11,-76.55,57620
5,20904,STANDARD,Silver Spring,Colesville,MD,Montgomery County,301,US,39.07,-76.98,55730
6,20902,STANDARD,Silver Spring,Wheaton,MD,Montgomery County,240301,US,39.05,-77.04,52380
7,21740,STANDARD,Hagerstown,,MD,Washington County,240301,US,39.63,-77.71,52380
8,21117,STANDARD,Owings Mills,Garrison,MD,Baltimore County,410,US,39.41,-76.79,52350
9,21061,STANDARD,Glen Burnie,,MD,Anne Arundel County,410443,US,39.16,-76.63,51090


### Now, let's plot a map to see the locations that meet our criteria.

In [65]:
# Create a map of Maryland showing the city, zip code, and 2015 estimated population of each location
map_MD = folium.Map(location=[39.09, -77.06], zoom_start=9)

for lat, lng, city, zipcode, population in zip(MDDataPop['latitude'], MDDataPop['longitude'], MDDataPop['primary_city'], MDDataPop['zip'], MDDataPop['irs_estimated_population_2015']):
        label = 'City: {}, Zip: {}, Population: {}'.format(city, zipcode, population)
        label = folium.Popup(label, parse_html=True)
 
        folium.CircleMarker(
            location=[lat, lng],
            radius=5,
            popup=label,
            #icon=folium.Icon(color='yellow',icon_color='green',icon='cloud')
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False
        ).add_to(map_MD)

map_MD

### Step 4: Pull the fast-food restaurant data from Foursquare.

In [66]:
# The code was removed by Watson Studio for sharing.

In [67]:
# The code was removed by Watson Studio for sharing.

Latitude and longitude values of Silver Spring are 39.09, -77.06.


'https://api.foursquare.com/v2/venues/explore?client_id=2DNKE3IIDD0KWZVLRXDAFBYBCAXFOIL00NVVFUCZPD1DCC2I&client_secret=004DNJPNQTBGZOJCJJN4GEERXGFQXFN5HHRAI2WN0JV50AJW&v=20180605&ll=39.09,-77.06&radius=30000&limit=400&categoryId=4bf58dd8d48988d16e941735'
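The request URL printed above carries the client ID and secret in plain text. As a minimal sketch, assuming the credentials are stored in environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical, not part of the original notebook), the same query string could be assembled like this:

```python
import os
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, category_id, radius=30000, limit=400, version='20180605'):
    """Build the explore URL without hard-coding credentials (hypothetical helper)."""
    params = {
        # Environment variable names are an assumption; placeholders are used if unset
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID'),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET'),
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'categoryId': category_id,
    }
    # urlencode percent-encodes the comma in 'll' as %2C
    return EXPLORE_URL + '?' + urlencode(params)

url = build_explore_url(39.09, -77.06, '4bf58dd8d48988d16e941735')
```

The parameter values mirror the ones visible in the URL above; only the way the credentials are supplied differs.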

## Import the data from Foursquare and create a function to extract the venue category

In [68]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f66c8b69b40ad19fee4b13c'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'query': 'fast food',
  'totalResults': 134,
  'suggestedBounds': {'ne': {'lat': 39.36000027000027,
    'lng': -76.7127811602611},
   'sw': {'lat': 38.819999729999736, 'lng': -77.4072188397389}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '52569edc11d29fcd485bac39',
       'name': 'Chick-fil-A',
       'location': {'address': '12001 Rockville Pike',
        'lat': 39.0538753,
        'lng': -77.1163845,
        'labeledLat

In [69]:
#Create a function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        #The flattened JSON uses the prefixed column name
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [70]:
#Clean the JSON and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']

nearby_venues = json_normalize(venues) #flatten JSON

#Filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.postalCode', 'venue.location.city']
nearby_venues = nearby_venues.loc[:, filtered_columns]

#Filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

#Clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng,postalCode,city
0,Chick-fil-A,Fast Food Restaurant,39.053875,-77.116384,20852,Rockville
1,Krispy Kreme Doughnuts,Fast Food Restaurant,39.09702,-77.194032,20850,Rockville
2,Chick-fil-A,Fast Food Restaurant,39.057405,-76.966877,20904,Silver Spring
3,Chick-fil-A,Fast Food Restaurant,38.997188,-77.025518,20910,Silver Spring
4,Five Guys,Fast Food Restaurant,39.086146,-77.15221,20850,Rockville


In [71]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


## Create a function to repeat the same process for all the neighbourhoods in Maryland

In [72]:
def getNearbyVenues(names, latitudes, longitudes, zipCode, radius=30000):
    #Expects CLIENT_ID, CLIENT_SECRET, VERSION, LIMIT, and categoryId to be defined as globals

    venues_list = []
    for name, lat, lng, zip_code in zip(names, latitudes, longitudes, zipCode):
        print(name)
        
        #Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryId)
        
        #Make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        #Keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            zip_code,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng']
            #v['venue']['location']['postalCode'],
            #v['venue']['location']['city'],           
            #v['venue']['categories'][0]['name']
        ) for v in results])
        
    #Build the dataframe once, after all requests have completed
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Neighbourhood Zip',                       
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude' 
                  #'Venue ZipCode',               
                  #'Venue City',                   
                  #'Venue Category'
                  ]
    
    return nearby_venues
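The function indexes straight into `['response']['groups'][0]['items']`, which raises an exception if a request fails or returns no groups. As a hedged sketch, assuming the response layout shown in the JSON output earlier (`meta.code` plus `response.groups[0].items`), a small guard could return an empty list instead. The helper name `extract_items` and the failing sample payload are hypothetical:

```python
def extract_items(payload):
    """Return the venue items of an explore response, or [] if the call failed."""
    # A successful response carries meta.code == 200, as in the output above
    if payload.get('meta', {}).get('code') != 200:
        return []
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

# Hypothetical payloads for illustration only
ok = {'meta': {'code': 200},
      'response': {'groups': [{'items': [{'venue': {'name': 'Chick-fil-A'}}]}]}}
failed = {'meta': {'code': 429}, 'response': {}}

print(len(extract_items(ok)), len(extract_items(failed)))
```

With such a guard, one failed request would contribute no rows rather than stopping the whole loop.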

In [73]:
MD_venues = getNearbyVenues(names=MDDataPop['primary_city'], 
                            latitudes=MDDataPop['latitude'], 
                            longitudes=MDDataPop['longitude'],
                            zipCode=MDDataPop['zip']
                           )

Silver Spring
Gaithersburg
Parkville
Germantown
Pasadena
Silver Spring
Silver Spring
Hagerstown
Owings Mills
Glen Burnie
Fort Washington
Dundalk
Hyattsville
Potomac
Baltimore
Baltimore
Upper Marlboro
Ellicott City


In [74]:
#Check the size of the resulting dataframe and the top 20 records of MD_venues
print(MD_venues.shape)
MD_venues.head(20)

(1800, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Neighbourhood Zip,Venue,Venue Latitude,Venue Longitude
0,Silver Spring,39.09,-77.06,20906,Chick-fil-A,39.053875,-77.116384
1,Silver Spring,39.09,-77.06,20906,Krispy Kreme Doughnuts,39.09702,-77.194032
2,Silver Spring,39.09,-77.06,20906,Chick-fil-A,39.057405,-76.966877
3,Silver Spring,39.09,-77.06,20906,Chick-fil-A,38.997188,-77.025518
4,Silver Spring,39.09,-77.06,20906,Five Guys,39.086146,-77.15221
5,Silver Spring,39.09,-77.06,20906,Dairy Queen,39.074667,-77.115919
6,Silver Spring,39.09,-77.06,20906,Duccini's,38.916813,-77.041277
7,Silver Spring,39.09,-77.06,20906,Popeyes Louisiana Kitchen,39.053279,-77.051178
8,Silver Spring,39.09,-77.06,20906,Muncheez,38.904613,-77.062759
9,Silver Spring,39.09,-77.06,20906,Taco Bell,39.080864,-77.076614


In [75]:
# One hot encoding
MD_onehot = pd.get_dummies(MD_venues[['Venue']], prefix="", prefix_sep="")
MD_onehot  #1800 rows x 164 columns

Unnamed: 0,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,Ashland Cafe,Auntie Anne's / Subway,Baja Fresh,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
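The one-hot columns are venue names, so spelling variants such as `Wendy's`, `Wendys`, and `Wendy’s` become three separate columns. As a rough sketch (the helper `normalize_name` is hypothetical and not part of the original notebook), the names could be normalized before encoding:

```python
import re

def normalize_name(name):
    """Lower-case a venue name and drop everything except ASCII letters and digits."""
    return re.sub(r'[^a-z0-9]+', '', name.lower())

variants = ["Wendy's", "Wendys", "Wendy\u2019s"]
keys = {normalize_name(v) for v in variants}
print(keys)
```

This only merges punctuation and case differences. Genuinely different spellings, such as the two "Anita's New Mexican/Mexico Style" columns, stay separate, and non-ASCII letters are dropped, so it is a simplification rather than a full name-matching step.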


In [76]:
# Add neighbourhood column back to the new dataframe, MD_onehot
MD_onehot['Neighbourhood'] = MD_venues['Neighbourhood']

MD_onehot.columns.get_loc("Neighbourhood")

164

In [77]:
# Have a variable mid to contain the column, MD_onehot['Neighbourhood'] 
mid = MD_onehot['Neighbourhood']
mid.head()

0    Silver Spring
1    Silver Spring
2    Silver Spring
3    Silver Spring
4    Silver Spring
Name: Neighbourhood, dtype: object

In [78]:
# Move neighbourhood column to the first column
MD_onehot.drop(labels=['Neighbourhood'], axis=1, inplace=True)
MD_onehot.insert(0, "Neighbourhood", mid)
MD_onehot.head()

Unnamed: 0,Neighbourhood,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,Ashland Cafe,Auntie Anne's / Subway,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,Silver Spring,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Silver Spring,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Silver Spring,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Silver Spring,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Silver Spring,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [79]:
MD_onehot.shape

(1800, 165)

In [80]:
# Group rows by neighbourhood and take the mean of the frequency of occurrence of each venue name
MD_grouped = MD_onehot.groupby('Neighbourhood').mean().reset_index()
MD_grouped.head()

Unnamed: 0,Neighbourhood,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,Ashland Cafe,Auntie Anne's / Subway,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,Baltimore,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Dundalk,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
2,Ellicott City,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
3,Fort Washington,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01
4,Gaithersburg,0.01,0.0,0.01,0.01,0.01,0.0,0.03,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0


In [81]:
# Confirm the new size
print('MD_grouped.shape: ', MD_grouped.shape)

# Keep the column names (neighbourhood plus one column per venue name) for inspection
tempCol = MD_grouped.columns
#print('tempCol: ', tempCol)
print('tempCol.shape: ', tempCol.shape)

print('tempCol: ', tempCol)


MD_grouped.shape:  (15, 165)
tempCol.shape:  (165,)
tempCol:  Index(['Neighbourhood', '&pizza', 'AC&T/Subway', 'All About Burger',
       'Anita's New Mexican Style Mexican Food',
       'Anita's New Mexico Style Mexican Food', 'Applebee's Grill + Bar',
       'Arby's', 'Ashland Cafe', 'Auntie Anne's / Subway',
       ...
       'Tropical Smoothie Cafe', 'Uncle Julio's', 'Vapiano',
       'Virginia Kitchen', 'Wata ~ Wing', 'Wendy's', 'Wendys', 'Wendy’s',
       'Woodmont Grill', 'Yum's II'],
      dtype='object', length=165)


In [82]:
MD_grouped.head()

Unnamed: 0,Neighbourhood,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,Ashland Cafe,Auntie Anne's / Subway,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,Baltimore,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Dundalk,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
2,Ellicott City,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
3,Fort Washington,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01
4,Gaithersburg,0.01,0.0,0.01,0.01,0.01,0.0,0.03,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0


In [83]:
MD_grouped.shape

(15, 165)

## Analysis

## Run k-means clustering

In [84]:
# Set number of clusters
kclusters = 5

MD_grouped_clu = MD_grouped.drop(columns='Neighbourhood')
#print('MD_grouped_clu: ', MD_grouped_clu)

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=100).fit(MD_grouped_clu)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 4, 1, 1, 0, 3, 2, 0], dtype=int32)
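The number of clusters is fixed at 5 here. One common way to sanity-check that choice, which the notebook itself does not use, is to compare silhouette scores for several values of k. The sketch below runs on synthetic data (`X_demo` is made up for illustration); for the real analysis `MD_grouped_clu` would be passed instead. The helper name `silhouette_by_k` is hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic demo data: three tight, well-separated groups of five points each
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X_demo = np.vstack([c + 0.1 * rng.randn(5, 2) for c in centers])

scores = silhouette_by_k(X_demo, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only 15 neighbourhoods, k has to stay below the number of rows, and the scores should be read as a rough guide rather than a definitive answer.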

In [85]:
# Add clustering labels
MD_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

## Merge all data sets into one for analysis.

In [86]:
MD_merged = MDDataPop

# Merge MD_grouped with MDDataPop so each high-population zip code carries its latitude/longitude, cluster label, and venue frequencies
MD_merged = MD_merged.join(MD_grouped.set_index('Neighbourhood'), on='primary_city')

MD_merged.head(10)

Unnamed: 0,zip,type,primary_city,acceptable_cities,state,county,area_codes,country,latitude,longitude,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,20906,STANDARD,Silver Spring,Aspen Hill,MD,Montgomery County,301240,US,39.09,-77.06,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
1,20878,STANDARD,Gaithersburg,"Darnestown, N Potomac, No Potomac, North Potomac",MD,Montgomery County,240301,US,39.12,-77.25,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
2,21234,STANDARD,Parkville,Baltimore,MD,Baltimore County,410443,US,39.38,-76.55,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
3,20874,STANDARD,Germantown,Darnestown,MD,Montgomery County,240,US,39.17,-77.26,...,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0
4,21122,STANDARD,Pasadena,"Lake Shore, Millersville, Riviera Beach",MD,Anne Arundel County,410443,US,39.11,-76.55,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
5,20904,STANDARD,Silver Spring,Colesville,MD,Montgomery County,301,US,39.07,-76.98,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
6,20902,STANDARD,Silver Spring,Wheaton,MD,Montgomery County,240301,US,39.05,-77.04,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
7,21740,STANDARD,Hagerstown,,MD,Washington County,240301,US,39.63,-77.71,...,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.03,0.0,0.0
8,21117,STANDARD,Owings Mills,Garrison,MD,Baltimore County,410,US,39.41,-76.79,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
9,21061,STANDARD,Glen Burnie,,MD,Anne Arundel County,410443,US,39.16,-76.63,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0


In [87]:
# Drop the type, acceptable_cities, area_codes, and country column
MD_merged = MD_merged.drop(columns=['type', 'acceptable_cities', 'area_codes', 'country'])
MD_merged.head()

Unnamed: 0,zip,primary_city,state,county,latitude,longitude,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,20906,Silver Spring,MD,Montgomery County,39.09,-77.06,68290,2,0.03,0.0,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
1,20878,Gaithersburg,MD,Montgomery County,39.12,-77.25,62930,1,0.01,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
2,21234,Parkville,MD,Baltimore County,39.38,-76.55,62620,0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
3,20874,Germantown,MD,Montgomery County,39.17,-77.26,59300,1,0.0,0.0,...,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0
4,21122,Pasadena,MD,Anne Arundel County,39.11,-76.55,57620,0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0


## Let's plot a map to see what the clusters look like.

In [88]:
# Make sure the cluster labels in MD_merged are integers
MD_merged['Cluster Labels'] = MD_merged['Cluster Labels'].astype(int)

# Visualize the resulting clusters
map_clusters = folium.Map(location=[39.09, -77.06], zoom_start=9)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster, popNum in zip(MD_merged['latitude'],  MD_merged['longitude'], MD_merged['primary_city'], MD_merged['Cluster Labels'], MD_merged['irs_estimated_population_2015']):
    label = folium.Popup(str(poi) + ' Cluster: ' + str(cluster) + ' Population: ' + str(popNum), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

In [89]:
print(MD_merged.columns[[1] + list(range(5, MD_merged.shape[1]))])
#print(MD_merged.columns[0])
print(MD_merged.columns)

Index(['primary_city', 'longitude', 'irs_estimated_population_2015',
       'Cluster Labels', '&pizza', 'AC&T/Subway', 'All About Burger',
       'Anita's New Mexican Style Mexican Food',
       'Anita's New Mexico Style Mexican Food', 'Applebee's Grill + Bar',
       ...
       'Tropical Smoothie Cafe', 'Uncle Julio's', 'Vapiano',
       'Virginia Kitchen', 'Wata ~ Wing', 'Wendy's', 'Wendys', 'Wendy’s',
       'Woodmont Grill', 'Yum's II'],
      dtype='object', length=168)
Index(['zip', 'primary_city', 'state', 'county', 'latitude', 'longitude',
       'irs_estimated_population_2015', 'Cluster Labels', '&pizza',
       'AC&T/Subway',
       ...
       'Tropical Smoothie Cafe', 'Uncle Julio's', 'Vapiano',
       'Virginia Kitchen', 'Wata ~ Wing', 'Wendy's', 'Wendys', 'Wendy’s',
       'Woodmont Grill', 'Yum's II'],
      dtype='object', length=172)


## Now, let's examine each cluster and find out the ideal location for opening the fast food restaurant.

### Cluster 1 (label 0) contains 8 zip code areas across 7 cities

In [90]:
#Examine clusters

#Cluster 1
MD_merged.loc[MD_merged['Cluster Labels'] == 0, MD_merged.columns[[1] + list(range(6, MD_merged.shape[1]))]]

Unnamed: 0,primary_city,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
2,Parkville,62620,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
4,Pasadena,57620,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
8,Owings Mills,52350,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
9,Glen Burnie,51090,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0
11,Dundalk,50150,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
14,Baltimore,47150,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
15,Baltimore,45550,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
17,Ellicott City,45480,0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0


In [91]:
# Select the venue columns for cluster 1; the number of columns with a frequency greater than 0.0 in each row is used as a proxy for the number of fast food restaurants
MD_merged_C1 = MD_merged.loc[MD_merged['Cluster Labels'] == 0, MD_merged.columns[list(range(8, MD_merged.shape[1]))]]
#np.count_nonzero(MD_merged_C1,axis=1)

In [92]:
# Calculate the population served per fast food restaurant

# Total population of cluster 1 
MD_merged_temp = MD_merged[MD_merged['Cluster Labels'] == 0].reset_index()
MD_merged_P = MD_merged_temp['irs_estimated_population_2015'].sum()

# Number of fast food restaurants in cluster 1
MD_merged_temp['SumOfRestaurant'] = np.count_nonzero(MD_merged_C1,axis=1)
MD_merged_R = MD_merged_temp['SumOfRestaurant'].sum()
#print(MD_merged_R)

print('Total population of cluster 1: ', MD_merged_P)
print('Number of fast food restaurants in cluster 1: ', MD_merged_R)
print('Population served per fast food restaurant in cluster 1: ', int(MD_merged_P/MD_merged_R))

Total population of cluster 1:  412010
Number of fast food restaurants in cluster 1:  473
Population served per fast food restaurant in cluster 1:  871
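Note that `np.count_nonzero` counts the venue-name columns with a non-zero frequency, so a chain with several branches near one location is counted once. As a hedged alternative, the rows of `MD_venues` could be counted per zip code directly. The sketch below uses a small made-up sample in the same layout as `MD_venues` (the venue rows are illustrative, not real results):

```python
import pandas as pd

# Hypothetical sample with the same column names as MD_venues
sample = pd.DataFrame({
    'Neighbourhood Zip': [20906, 20906, 20906, 20878, 20878],
    'Venue': ['Chick-fil-A', 'Chick-fil-A', 'Taco Bell', "Arby's", 'Five Guys'],
})

# total = every venue row returned, distinct = unique venue names
per_zip = sample.groupby('Neighbourhood Zip')['Venue'].agg(total='count', distinct='nunique')
print(per_zip)
```

In the sample, zip code 20906 has three venues but only two distinct names, which is the difference between the two ways of counting.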


In [93]:
#Cluster 2
MD_merged.loc[MD_merged['Cluster Labels'] == 1, MD_merged.columns[[1] + list(range(6, MD_merged.shape[1]))]]

Unnamed: 0,primary_city,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
1,Gaithersburg,62930,1,0.01,0.0,0.01,0.01,0.01,0.0,0.03,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
3,Germantown,59300,1,0.0,0.0,0.0,0.01,0.01,0.01,0.03,...,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0


In [94]:
# Select the venue columns for cluster 2; the number of columns with a frequency greater than 0.0 in each row is used as a proxy for the number of fast food restaurants
MD_merged_C2 = MD_merged.loc[MD_merged['Cluster Labels'] == 1, MD_merged.columns[list(range(8, MD_merged.shape[1]))]]
#np.count_nonzero(MD_merged_C2,axis=1)

In [95]:
# Calculate the population served per fast food restaurant

# Total population of cluster 2 
MD_merged_temp = MD_merged[MD_merged['Cluster Labels'] == 1].reset_index()
MD_merged_P = MD_merged_temp['irs_estimated_population_2015'].sum()

# Number of fast food restaurants in cluster 2
MD_merged_temp['SumOfRestaurant'] = np.count_nonzero(MD_merged_C2,axis=1)
MD_merged_R = MD_merged_temp['SumOfRestaurant'].sum()
#print(MD_merged_R)

print('Total population of cluster 2: ', MD_merged_P)
print('Number of fast food restaurants in cluster 2: ', MD_merged_R)
print('Population served per fast food restaurant in cluster 2: ', int(MD_merged_P/MD_merged_R))

Total population of cluster 2:  122230
Number of fast food restaurants in cluster 2:  99
Population served per fast food restaurant in cluster 2:  1234


In [96]:
#Cluster 3
MD_merged.loc[MD_merged['Cluster Labels'] == 2, MD_merged.columns[[1] + list(range(6, MD_merged.shape[1]))]]

Unnamed: 0,primary_city,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
0,Silver Spring,68290,2,0.03,0.0,0.02,0.0,0.0,0.0,0.026667,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
5,Silver Spring,55730,2,0.03,0.0,0.02,0.0,0.0,0.0,0.026667,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
6,Silver Spring,52380,2,0.03,0.0,0.02,0.0,0.0,0.0,0.026667,...,0.0,0.003333,0.01,0.0,0.0,0.0,0.0,0.026667,0.0,0.01
12,Hyattsville,49520,2,0.02,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01
13,Potomac,48030,2,0.03,0.0,0.0,0.01,0.01,0.0,0.03,...,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01


In [97]:
# Select the venue columns for cluster 3; the number of columns with a frequency greater than 0.0 in each row is used as a proxy for the number of fast food restaurants
MD_merged_C3 = MD_merged.loc[MD_merged['Cluster Labels'] == 2, MD_merged.columns[list(range(8, MD_merged.shape[1]))]]
#np.count_nonzero(MD_merged_C3,axis=1)

In [98]:
# Calculate the population served per fast food restaurant

# Total population of cluster 3 
MD_merged_temp = MD_merged[MD_merged['Cluster Labels'] == 2].reset_index()
MD_merged_P = MD_merged_temp['irs_estimated_population_2015'].sum()

# Number of fast food restaurants in cluster 3
MD_merged_temp['SumOfRestaurant'] = np.count_nonzero(MD_merged_C3,axis=1)
MD_merged_R = MD_merged_temp['SumOfRestaurant'].sum()
#print(MD_merged_R)

print('Total population of cluster 3: ', MD_merged_P)
print('Number of fast food restaurants in cluster 3: ', MD_merged_R)
print('Population served per fast food restaurant in cluster 3: ', int(MD_merged_P/MD_merged_R))

Total population of cluster 3:  273950
Number of fast food restaurants in cluster 3:  273
Population served per fast food restaurant in cluster 3:  1003


In [99]:
#Cluster 4
MD_merged.loc[MD_merged['Cluster Labels'] == 3, MD_merged.columns[[1] + list(range(6, MD_merged.shape[1]))]]

Unnamed: 0,primary_city,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
7,Hagerstown,52380,3,0.0,0.01,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.03,0.0,0.0


In [100]:
# Select the venue columns for cluster 4; the number of columns with a frequency greater than 0.0 in each row is used as a proxy for the number of fast food restaurants
MD_merged_C4 = MD_merged.loc[MD_merged['Cluster Labels'] == 3, MD_merged.columns[list(range(8, MD_merged.shape[1]))]]
#np.count_nonzero(MD_merged_C4,axis=1)

In [101]:
# Calculate the population served per fast food restaurant

# Total population of cluster 4 
MD_merged_temp = MD_merged[MD_merged['Cluster Labels'] == 3].reset_index()
MD_merged_P = MD_merged_temp['irs_estimated_population_2015'].sum()

# Number of fast food restaurants in cluster 4
MD_merged_temp['SumOfRestaurant'] = np.count_nonzero(MD_merged_C4,axis=1)
MD_merged_R = MD_merged_temp['SumOfRestaurant'].sum()
#print(MD_merged_R)

print('Total population of cluster 4: ', MD_merged_P)
print('Number of fast food restaurants in cluster 4: ', MD_merged_R)
print('Population served per fast food restaurant in cluster 4: ', int(MD_merged_P/MD_merged_R))

Total population of cluster 4:  52380
Number of fast food restaurants in cluster 4:  39
Population served per fast food restaurant in cluster 4:  1343


In [102]:
#Cluster 5
MD_merged.loc[MD_merged['Cluster Labels'] == 4, MD_merged.columns[[1] + list(range(6, MD_merged.shape[1]))]]

Unnamed: 0,primary_city,irs_estimated_population_2015,Cluster Labels,&pizza,AC&T/Subway,All About Burger,Anita's New Mexican Style Mexican Food,Anita's New Mexico Style Mexican Food,Applebee's Grill + Bar,Arby's,...,Tropical Smoothie Cafe,Uncle Julio's,Vapiano,Virginia Kitchen,Wata ~ Wing,Wendy's,Wendys,Wendy’s,Woodmont Grill,Yum's II
10,Fort Washington,50410,4,0.01,0.0,0.02,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01
16,Upper Marlboro,45540,4,0.03,0.0,0.02,0.0,0.0,0.0,0.03,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01


In [103]:
# Select the venue columns for cluster 5; the number of columns with a frequency greater than 0.0 in each row is used as a proxy for the number of fast food restaurants
MD_merged_C5 = MD_merged.loc[MD_merged['Cluster Labels'] == 4, MD_merged.columns[list(range(8, MD_merged.shape[1]))]]
#np.count_nonzero(MD_merged_C5,axis=1)

In [104]:
# Calculate the population served per fast food restaurant

# Total population of cluster 5 
MD_merged_temp = MD_merged[MD_merged['Cluster Labels'] == 4].reset_index()
MD_merged_P = MD_merged_temp['irs_estimated_population_2015'].sum()

# Number of fast food restaurants in cluster 5
MD_merged_temp['SumOfRestaurant'] = np.count_nonzero(MD_merged_C5,axis=1)
MD_merged_R = MD_merged_temp['SumOfRestaurant'].sum()
#print(MD_merged_R)

print('Total population of cluster 5: ', MD_merged_P)
print('Number of fast food restaurants in cluster 5: ', MD_merged_R)
print('Population served per fast food restaurant in cluster 5: ', int(MD_merged_P/MD_merged_R))

Total population of cluster 5:  95950
Number of fast food restaurants in cluster 5:  102
Population served per fast food restaurant in cluster 5:  940
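To compare the five clusters side by side, the totals printed above can be ranked by population per restaurant. This is a small sketch that copies the printed population and restaurant counts by hand; the dictionary `clusters` is an illustrative structure, not a variable from the notebook:

```python
# (total population, restaurant count) per cluster, copied from the outputs above
clusters = {
    1: (412010, 473),
    2: (122230, 99),
    3: (273950, 273),
    4: (52380, 39),
    5: (95950, 102),
}

# Integer division matches the int(...) truncation used in the cells above
ranking = sorted(((pop // n, cid) for cid, (pop, n) in clusters.items()), reverse=True)

for served, cid in ranking:
    print('Cluster {}: about {} people per restaurant'.format(cid, served))
```

By this ratio alone, cluster 4 (Hagerstown) comes out highest and cluster 1 lowest. Since the restaurant count is only a proxy, the ranking is a starting point for a shortlist rather than a final recommendation.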
