<h1 align=center><font size = 4>Capstone Project- Battle of Neighborhoods </font></h1>

## Opening a Pizza Place in Suburbs of Chicago

### Introduction
The primary objective of this project is to find the ideal area to open a pizza place in one of the many suburbs of Chicago. Chicago is the most populous city in Illinois and the third most populous city in the United States. It is an international hub for finance, culture, commerce, industry, education, technology, telecommunications, and transportation. Because living in the city is very expensive, many people who work in Chicago choose to live in its suburbs. Nearby suburbs such as Aurora, Naperville, and Joliet have transit connections to Chicago, so residents can commute to the city for work. These places also have their own malls, supermarkets, and restaurants.
Chicago already has many pizza places, and setting up a new restaurant there costs more than in the nearby cities. Rather than opening a pizza place in Chicago, one can consider the nearby, growing cities instead. First, we have to find the nearby cities and the number of pizza places already in each. Then we have to look at how populated these cities are. A restaurant in a sparsely populated place may not bring in much revenue, whereas a restaurant in a densely populated area will face more competition.

### The Target Audience
The target audience for this project is any business owner interested in opening a pizza place in the Chicagoland area. With small changes to which venue category is analyzed, the same approach can be used to find a location for any other business. Starting a new business in the Chicagoland suburbs is likely to be cheaper than starting one in the city itself; for example, rent for a building in Chicago is higher than in its nearby cities.

### Data
To find the ideal location for a new pizza place, we need the following data:

• List of the neighboring cities of Chicago with their populations.
This data is obtained from the Wikipedia page https://en.wikipedia.org/wiki/Chicago_metropolitan_area, which lists the cities near Chicago with their populations. Using web scraping, we extract the city names and the population of each city. This gives us an idea of how densely or sparsely populated these places are. The data extracted from the Wikipedia page is shown further below.

• Venue data for the above cities
We will use the Foursquare API to get this information. Using the API, we can retrieve the venues near the center of each of the above cities and analyze the pizza places in each city. A sample of the data obtained through the Foursquare API is shown further below.

First, let's install the packages and import the libraries.

In [1]:
!pip install folium
!pip install geocoder

In [2]:
!pip install beautifulsoup4
!pip install geopy

In [3]:
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library
from bs4 import BeautifulSoup

import requests

from geopy.geocoders import Nominatim
import pandas as pd
import numpy as np


### Methodology

We will get the names and populations of the cities near Chicago.

In [4]:
Chicago_page = requests.get("https://en.wikipedia.org/wiki/Chicago_metropolitan_area")
Chicago_page

<Response [200]>

Using BeautifulSoup, we will extract the required information from the Wikipedia page.

In [5]:
# parse the page and collect the list items under the "Over 100,000 population" heading
soup = BeautifulSoup(Chicago_page.content, 'html.parser')
section = soup.find(id="Over_100,000_population").parent
Neighboring_sub = section.find_next('ul').find_all('li')
records=[]
suburb_population=[]
for elem in Neighboring_sub:
    records.append(elem.text.strip().split('(')[0])
    suburb_population.append(elem.text)
records    


['Aurora, Illinois ',
 'Naperville, Illinois ',
 'Joliet, Illinois ',
 'Elgin, Illinois ',
 'Kenosha, Wisconsin ']

In [6]:
section = soup.find(id="Over_50,000_population").parent
Neighboring_sub = section.find_next('ul').find_all('li')
for elem in Neighboring_sub:
    records.append(elem.text.strip().split('(')[0])
    suburb_population.append(elem.text)
records 

['Aurora, Illinois ',
 'Naperville, Illinois ',
 'Joliet, Illinois ',
 'Elgin, Illinois ',
 'Kenosha, Wisconsin ',
 'Waukegan, Illinois ',
 'Cicero, Illinois ',
 'Bolingbrook, Illinois ',
 'Arlington Heights, Illinois ',
 'Hammond, Indiana ',
 'Gary, Indiana ',
 'Evanston, Illinois ',
 'Schaumburg, Illinois ',
 'Palatine, Illinois ',
 'Skokie, Illinois ',
 'Des Plaines, Illinois ',
 'Orland Park, Illinois ',
 'Tinley Park, Illinois ',
 'Oak Lawn, Illinois',
 'Berwyn, Illinois ',
 'Mount Prospect, Illinois ',
 'Wheaton, Illinois ',
 'Oak Park, Illinois ']

We have extracted the required information. Now we have to clean and format the data so that we have a data frame with the nearby cities and their populations.

In [7]:
column_names1 = ['suburb', 'population']

# collect one row per suburb, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for pop in suburb_population:
    suburb = pop.split('(')[0]
    # the "Oak Lawn, Illinois" entry appears to have no "(population)" part, so skip it
    if suburb != "Oak Lawn, Illinois":
        pop1 = int(pop[pop.find('(') + 1:pop.rfind(')')].replace(',', ''))
        rows.append({'suburb': suburb, 'population': pop1})

suburb_population_format = pd.DataFrame(rows, columns=column_names1)
suburb_population_format

Unnamed: 0,suburb,population
0,"Aurora, Illinois",198870
1,"Naperville, Illinois",149196
2,"Joliet, Illinois",148227
3,"Elgin, Illinois",111401
4,"Kenosha, Wisconsin",101124
5,"Waukegan, Illinois",85720
6,"Cicero, Illinois",79943
7,"Bolingbrook, Illinois",76468
8,"Arlington Heights, Illinois",74593
9,"Hammond, Indiana",74423
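The parsing step above can also be sketched with a regular expression that treats the population as optional, so an entry without parentheses does not need a special case. This is a minimal sketch, not the notebook's method: the `parse_entry` helper and the sample strings are assumptions made for illustration, and the real list items on the Wikipedia page may contain extra text (footnote markers, for example) that this pattern does not handle.

```python
import re

# Matches entries shaped like "Name, State (123,456)"; the population part is optional.
ENTRY_PATTERN = re.compile(
    r"^(?P<suburb>[^(]+?)\s*(?:\((?P<population>[\d,]+)\))?\s*$"
)

def parse_entry(text):
    """Return (suburb, population); population is an int, or None if absent."""
    cleaned = text.strip()
    match = ENTRY_PATTERN.match(cleaned)
    if match is None:
        # Unexpected shape: keep the text and report no population.
        return cleaned, None
    population = match.group("population")
    if population is not None:
        population = int(population.replace(",", ""))
    return match.group("suburb"), population

# Hypothetical inputs in the same shape as the scraped list items.
sample_entries = ["Aurora, Illinois (198,870)", "Oak Lawn, Illinois"]
parsed = [parse_entry(entry) for entry in sample_entries]
print(parsed)
```

Note that this helper strips the trailing space that `split('(')[0]` leaves on each name, so names produced this way would only match the suburb names used elsewhere in the notebook if those were stripped as well.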


Next, we will get the latitude and longitude of each of the above cities.

In [8]:
column_names = ['Suburb', 'Latitude', 'Longitude']

# create the geocoder once and reuse it for every suburb
geolocator = Nominatim(user_agent="chi_explorer")

rows = []
for suburb_name in records:
    location = geolocator.geocode(suburb_name)
    rows.append({'Suburb': suburb_name,
                 'Latitude': location.latitude,
                 'Longitude': location.longitude})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
NearBySuburb = pd.DataFrame(rows, columns=column_names)
NearBySuburb

Unnamed: 0,Suburb,Latitude,Longitude
0,"Aurora, Illinois",41.75717,-88.314754
1,"Naperville, Illinois",41.77287,-88.147928
2,"Joliet, Illinois",41.52636,-88.084021
3,"Elgin, Illinois",42.03726,-88.281099
4,"Kenosha, Wisconsin",42.584677,-87.821226
5,"Waukegan, Illinois",42.363633,-87.844794
6,"Cicero, Illinois",41.84554,-87.75402
7,"Bolingbrook, Illinois",41.70033,-88.071771
8,"Arlington Heights, Illinois",42.081156,-87.980216
9,"Hammond, Indiana",41.583366,-87.500043
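The geocoding loop assumes every lookup succeeds. geopy's `Nominatim.geocode` returns `None` when it finds no match, and the public Nominatim service asks for at most one request per second, so a more defensive version might look like the sketch below. The `geocode_all` helper, the fake geocoder, and the "Nowhere, Illinois" entry are hypothetical; the geocoder is passed in as a callable so the example runs without network access.

```python
import time
from collections import namedtuple

def geocode_all(names, geocode, pause_seconds=1.0):
    """Geocode each name, collecting unresolved names instead of failing.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when nothing is found.
    """
    rows, missing = [], []
    for name in names:
        location = geocode(name.strip())
        if location is None:
            missing.append(name)
        else:
            rows.append({"Suburb": name,
                         "Latitude": location.latitude,
                         "Longitude": location.longitude})
        time.sleep(pause_seconds)  # pause between requests
    return rows, missing

# Stand-in for a real geocoder, for illustration only.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
KNOWN = {"Aurora, Illinois": FakeLocation(41.75717, -88.314754)}

def fake_geocode(query):
    return KNOWN.get(query)

rows, missing = geocode_all(["Aurora, Illinois ", "Nowhere, Illinois"],
                            fake_geocode, pause_seconds=0)
print(rows)
print(missing)
```

With a real geocoder, one would pass `geolocator.geocode` in place of `fake_geocode` and keep the default one-second pause.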


Next, using the Foursquare API, we will explore the venues in the above cities.

In [9]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder; keep it private)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request; requests encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
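The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an error for any venue returned without a category. Whether the API ever returns such a venue is an assumption here, but a tolerant version of the flattening step could be sketched as follows. The `extract_venue_rows` helper and the second sample item are hypothetical; the first sample item reuses values from the notebook's own output.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten response items into tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use the first category name if there is one, otherwise None.
        category = categories[0].get("name") if categories else None
        rows.append((name, lat, lng,
                     venue.get("name"),
                     location.get("lat"),
                     location.get("lng"),
                     category))
    return rows

# Hand-written items in the shape the function above reads from the response.
sample_items = [
    {"venue": {"name": "Paramount Theatre",
               "location": {"lat": 41.757414, "lng": -88.314938},
               "categories": [{"name": "Theater"}]}},
    {"venue": {"name": "Uncategorized Venue",  # hypothetical venue
               "location": {"lat": 41.75, "lng": -88.31},
               "categories": []}},
]

venue_rows = extract_venue_rows("Aurora, Illinois", 41.75717, -88.314754, sample_items)
for row in venue_rows:
    print(row)
```

Rows with a `None` category simply drop out of any later filter on "Pizza Place", so the rest of the analysis would not need to change.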

In [11]:
ChicagoSuburbVenues = getNearbyVenues(names=NearBySuburb['Suburb'],
                                   latitudes=NearBySuburb['Latitude'],
                                   longitudes=NearBySuburb['Longitude']
                                  )

Aurora, Illinois 
Naperville, Illinois 
Joliet, Illinois 
Elgin, Illinois 
Kenosha, Wisconsin 
Waukegan, Illinois 
Cicero, Illinois 
Bolingbrook, Illinois 
Arlington Heights, Illinois 
Hammond, Indiana 
Gary, Indiana 
Evanston, Illinois 
Schaumburg, Illinois 
Palatine, Illinois 
Skokie, Illinois 
Des Plaines, Illinois 
Orland Park, Illinois 
Tinley Park, Illinois 
Oak Lawn, Illinois
Berwyn, Illinois 
Mount Prospect, Illinois 
Wheaton, Illinois 
Oak Park, Illinois 


In [12]:
ChicagoSuburbVenues.shape

(1377, 7)

The shape shows that 1377 venues were retrieved.
Now let us look at a sample of the data.

In [13]:
ChicagoSuburbVenues.head(25)

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Aurora, Illinois",41.75717,-88.314754,Paramount Theatre,41.757414,-88.314938,Theater
1,"Aurora, Illinois",41.75717,-88.314754,Gillerson's,41.759606,-88.315031,Pub
2,"Aurora, Illinois",41.75717,-88.314754,Endiro Coffee,41.759559,-88.314812,Café
3,"Aurora, Illinois",41.75717,-88.314754,Tecalitlan Restaurant,41.756192,-88.313986,Mexican Restaurant
4,"Aurora, Illinois",41.75717,-88.314754,Ballydoyle Irish Pub,41.759348,-88.315126,Pub
5,"Aurora, Illinois",41.75717,-88.314754,Jake's Bagels & Deli,41.760742,-88.310206,Bagel Shop
6,"Aurora, Illinois",41.75717,-88.314754,Taqueria El Tio & Restaurant,41.757515,-88.319157,Mexican Restaurant
7,"Aurora, Illinois",41.75717,-88.314754,Two Brothers Roundhouse,41.760639,-88.308788,Brewery
8,"Aurora, Illinois",41.75717,-88.314754,La Quinta De Los Reyes,41.758603,-88.312294,Mexican Restaurant
9,"Aurora, Illinois",41.75717,-88.314754,Holiday Inn Express & Suites,41.759361,-88.309838,Hotel


In [14]:
# We will find all the unique venue categories retrieved using the Foursquare API
ChicagoSuburbVenues['Venue Category'].unique()

array(['Theater', 'Pub', 'Café', 'Mexican Restaurant', 'Bagel Shop',
       'Brewery', 'Hotel', 'Ice Cream Shop', 'Science Museum',
       'Sandwich Place', 'Park', 'Brazilian Restaurant', 'Discount Store',
       'Breakfast Spot', 'Financial or Legal Service', 'Pharmacy',
       'Pizza Place', 'Gym', 'Fast Food Restaurant', 'Grocery Store',
       'Casino', 'Train Station', 'Chinese Restaurant', 'Clothing Store',
       'Farmers Market', 'Liquor Store', 'Business Service',
       'Food & Drink Shop', 'Gym / Fitness Center', 'Home Service',
       'American Restaurant', 'Shoe Store', 'Tea Room',
       'Seafood Restaurant', 'Bookstore', 'Plaza', 'Coffee Shop',
       'Italian Restaurant', 'Kitchen Supply Store', 'Nail Salon',
       'Trail', 'Bar', 'Frozen Yogurt Shop', 'Dessert Shop',
       'Candy Store', 'Portuguese Restaurant', 'Beer Bar',
       'Japanese Restaurant', 'Snack Place', 'Cosmetics Shop',
       'Noodle House', 'Steakhouse', 'Concert Hall', 'Yoga Studio',
       'Conve

In [15]:
print('There are '+ format(len(ChicagoSuburbVenues['Venue Category'].unique())) + ' unique categories')

There are 227 unique categories


Using one-hot encoding, we convert each unique venue category into a column so that we can filter the data on any category.
For our analysis, we filter the data on "Pizza Place".

In [16]:
# one hot encoding
suburbs_onehot = pd.get_dummies(ChicagoSuburbVenues[['Venue Category']], prefix="", prefix_sep="")

# add suburb column back to dataframe
suburbs_onehot['Suburbs'] = ChicagoSuburbVenues['Suburb'] 

# move suburb column to the first column
fixed_columns = [suburbs_onehot.columns[-1]] + list(suburbs_onehot.columns[:-1])
suburbs_onehot = suburbs_onehot[fixed_columns]

suburbs_onehot.head()

Unnamed: 0,Suburbs,ATM,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Assisted Living,Athletics & Sports,...,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Weight Loss Center,Wine Bar,Winery,Wings Joint,Women's Store,Yoga Studio
0,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
suburbs_onehot.shape

(1377, 228)

In [18]:
suburbs_grouped = suburbs_onehot.groupby('Suburbs').sum().reset_index()
suburbs_grouped

Unnamed: 0,Suburbs,ATM,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Assisted Living,Athletics & Sports,...,Video Store,Vietnamese Restaurant,Warehouse Store,Water Park,Weight Loss Center,Wine Bar,Winery,Wings Joint,Women's Store,Yoga Studio
0,"Arlington Heights, Illinois",0,2,0,0,0,0,0,0,0,...,2,0,0,0,0,1,0,0,1,1
1,"Aurora, Illinois",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Berwyn, Illinois",0,0,0,0,0,1,0,0,0,...,0,0,1,0,0,0,0,0,1,0
3,"Bolingbrook, Illinois",1,1,0,0,0,0,2,0,0,...,0,0,0,0,0,0,0,1,0,1
4,"Cicero, Illinois",1,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,1,0,0
5,"Des Plaines, Illinois",1,2,0,0,0,0,0,1,0,...,1,1,0,1,0,0,0,1,0,0
6,"Elgin, Illinois",0,2,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
7,"Evanston, Illinois",0,2,0,0,0,1,1,0,1,...,2,0,0,0,0,0,0,1,0,0
8,"Gary, Indiana",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Hammond, Indiana",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [19]:
print(suburbs_grouped[['Suburbs','Pizza Place']])

                         Suburbs  Pizza Place
0   Arlington Heights, Illinois             1
1              Aurora, Illinois             3
2              Berwyn, Illinois             4
3         Bolingbrook, Illinois             3
4              Cicero, Illinois             2
5         Des Plaines, Illinois             2
6               Elgin, Illinois             4
7            Evanston, Illinois             5
8                 Gary, Indiana             1
9              Hammond, Indiana             0
10             Joliet, Illinois             2
11           Kenosha, Wisconsin             1
12     Mount Prospect, Illinois             2
13         Naperville, Illinois             5
14            Oak Lawn, Illinois            2
15           Oak Park, Illinois             3
16        Orland Park, Illinois             2
17           Palatine, Illinois             3
18         Schaumburg, Illinois             2
19             Skokie, Illinois             3
20        Tinley Park, Illinois   

In [20]:
df_PizzaPlace=suburbs_grouped[['Suburbs','Pizza Place']]
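Since only the "Pizza Place" column is used from here on, the same per-suburb count can be obtained without building the full one-hot table. The sketch below uses a small toy data frame (the rows are made up for illustration) with the same column names as `ChicagoSuburbVenues`. Summing a 0/1 indicator per suburb, rather than filtering first and then counting, keeps suburbs that have zero pizza places in the result.

```python
import pandas as pd

# Toy stand-in for ChicagoSuburbVenues; the rows are illustrative only.
venues = pd.DataFrame({
    "Suburb": ["Aurora, Illinois", "Aurora, Illinois", "Hammond, Indiana"],
    "Venue Category": ["Pizza Place", "Pub", "Hotel"],
})

# 1 for pizza places, 0 for everything else
is_pizza = (venues["Venue Category"] == "Pizza Place").astype(int)

# Sum the indicator per suburb; suburbs without pizza places get 0.
pizza_counts = (is_pizza.groupby(venues["Suburb"]).sum()
                .rename("Pizza Place")
                .reset_index())
print(pizza_counts)
```

The one-hot table remains the better choice when several categories are compared at once.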

Now let's cluster the suburbs by their number of pizza places, so that suburbs with similar counts are grouped together.

In [21]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 15

suburbs_grouped_clustering = df_PizzaPlace.drop(columns='Suburbs')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(suburbs_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:15] 


array([2, 3, 0, 3, 7, 7, 0, 4, 2, 5, 7, 2, 7, 4, 7], dtype=int32)

In [32]:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=15, random_state=8)
X = df_PizzaPlace.drop(['Suburbs'], axis=1)

In [33]:
kmeans.fit(X)
kmeans.labels_[0:10]

array([1, 0, 2, 0, 0, 0, 2, 2, 1, 1], dtype=int32)

In [41]:
kclusters=6
suburbs_grouped_clustering = df_PizzaPlace.drop(columns='Suburbs')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(suburbs_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 3, 1, 3, 0, 0, 1, 4, 2, 5], dtype=int32)

In [42]:
#Create a new dataframe that includes the cluster suburb data
df_merged=df_PizzaPlace.copy()
df_merged['Cluster Labels']=kmeans.labels_
df_merged.head(25)

Unnamed: 0,Suburbs,Pizza Place,Cluster Labels
0,"Arlington Heights, Illinois",1,2
1,"Aurora, Illinois",3,3
2,"Berwyn, Illinois",4,1
3,"Bolingbrook, Illinois",3,3
4,"Cicero, Illinois",2,0
5,"Des Plaines, Illinois",2,0
6,"Elgin, Illinois",4,1
7,"Evanston, Illinois",5,4
8,"Gary, Indiana",1,2
9,"Hammond, Indiana",0,5
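Because the only feature is the pizza-place count, the six cluster labels in these rows act as a recoding of the count itself: each label corresponds to exactly one count. The short check below, using the ten rows shown above, illustrates this. It is a plain-Python sketch for reading the table, not part of the original analysis, and the label numbers themselves are arbitrary.

```python
from collections import defaultdict

# (suburb, pizza places, cluster label) copied from the ten rows shown above
rows = [
    ("Arlington Heights, Illinois", 1, 2),
    ("Aurora, Illinois", 3, 3),
    ("Berwyn, Illinois", 4, 1),
    ("Bolingbrook, Illinois", 3, 3),
    ("Cicero, Illinois", 2, 0),
    ("Des Plaines, Illinois", 2, 0),
    ("Elgin, Illinois", 4, 1),
    ("Evanston, Illinois", 5, 4),
    ("Gary, Indiana", 1, 2),
    ("Hammond, Indiana", 0, 5),
]

# Collect the distinct pizza-place counts that occur under each label.
counts_by_label = defaultdict(set)
for _suburb, pizza_places, label in rows:
    counts_by_label[label].add(pizza_places)

label_to_counts = {label: sorted(counts)
                   for label, counts in sorted(counts_by_label.items())}
print(label_to_counts)
# {0: [2], 1: [4], 2: [1], 3: [3], 4: [5], 5: [0]}
```

When reading the results, it is therefore the pizza-place count behind each label that matters, not the label number.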


In [43]:
df_merged = df_merged.join(NearBySuburb.set_index("Suburb"), on="Suburbs")
df_merged

Unnamed: 0,Suburbs,Pizza Place,Cluster Labels,Latitude,Longitude
0,"Arlington Heights, Illinois",1,2,42.081156,-87.980216
1,"Aurora, Illinois",3,3,41.75717,-88.314754
2,"Berwyn, Illinois",4,1,41.850587,-87.793668
3,"Bolingbrook, Illinois",3,3,41.70033,-88.071771
4,"Cicero, Illinois",2,0,41.84554,-87.75402
5,"Des Plaines, Illinois",2,0,42.041582,-87.887392
6,"Elgin, Illinois",4,1,42.03726,-88.281099
7,"Evanston, Illinois",5,4,42.044739,-87.693046
8,"Gary, Indiana",1,2,41.602129,-87.337137
9,"Hammond, Indiana",0,5,41.583366,-87.500043


In [44]:
df_merged.sort_values(['Cluster Labels'],inplace=True)

In [45]:
df_merged

Unnamed: 0,Suburbs,Pizza Place,Cluster Labels,Latitude,Longitude
18,"Schaumburg, Illinois",2,0,42.033361,-88.083406
4,"Cicero, Illinois",2,0,41.84554,-87.75402
5,"Des Plaines, Illinois",2,0,42.041582,-87.887392
16,"Orland Park, Illinois",2,0,41.630663,-87.853629
14,"Oak Lawn, Illinois",2,0,41.710866,-87.758108
12,"Mount Prospect, Illinois",2,0,42.066417,-87.937291
10,"Joliet, Illinois",2,0,41.52636,-88.084021
22,"Wheaton, Illinois",4,1,41.864696,-88.110171
20,"Tinley Park, Illinois",4,1,41.573367,-87.784494
2,"Berwyn, Illinois",4,1,41.850587,-87.793668


Now let's plot the clusters.

In [46]:
# map center (Chicago)
latitude = 41.8755616
longitude = -87.6244212

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Suburbs'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The suburb marked in orange (Cluster 5, which contains only Hammond, Indiana) has zero pizza places.

In [47]:
# Analysing each cluster
df_merged.loc[df_merged['Cluster Labels'] == 5, df_merged.columns[[0] + list(range(1, df_merged.shape[1]))]]

Unnamed: 0,Suburbs,Pizza Place,Cluster Labels,Latitude,Longitude
9,"Hammond, Indiana",0,5,41.583366,-87.500043


In [48]:
df_merged.loc[df_merged['Cluster Labels'] == 2, df_merged.columns[[0] + list(range(1, df_merged.shape[1]))]]

Unnamed: 0,Suburbs,Pizza Place,Cluster Labels,Latitude,Longitude
0,"Arlington Heights, Illinois",1,2,42.081156,-87.980216
8,"Gary, Indiana",1,2,41.602129,-87.337137
11,"Kenosha, Wisconsin",1,2,42.584677,-87.821226


In [49]:
suburb_population_format

Unnamed: 0,suburb,population
0,"Aurora, Illinois",198870
1,"Naperville, Illinois",149196
2,"Joliet, Illinois",148227
3,"Elgin, Illinois",111401
4,"Kenosha, Wisconsin",101124
5,"Waukegan, Illinois",85720
6,"Cicero, Illinois",79943
7,"Bolingbrook, Illinois",76468
8,"Arlington Heights, Illinois",74593
9,"Hammond, Indiana",74423


In [50]:
ChicagoSuburbVenues.loc[(ChicagoSuburbVenues['Suburb'] == 'Arlington Heights, Illinois ')]
#ChicagoSuburbVenues

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
480,"Arlington Heights, Illinois",42.081156,-87.980216,altThai,42.082113,-87.980847,Thai Restaurant
481,"Arlington Heights, Illinois",42.081156,-87.980216,Starbucks,42.082282,-87.981517,Coffee Shop
482,"Arlington Heights, Illinois",42.081156,-87.980216,Sweet T's Bakery-Cake Studio,42.081974,-87.982570,Bakery
483,"Arlington Heights, Illinois",42.081156,-87.980216,Metropolis Performing Arts Center,42.082871,-87.984665,Theater
484,"Arlington Heights, Illinois",42.081156,-87.980216,Bentley's Corner Barkery,42.081130,-87.981808,Pet Store
...,...,...,...,...,...,...,...
545,"Arlington Heights, Illinois",42.081156,-87.980216,Pioneer Park Community Center,42.076141,-87.992929,Gym
546,"Arlington Heights, Illinois",42.081156,-87.980216,Metra Union Pacific Northwest# 652,42.076601,-87.966393,Light Rail Station
547,"Arlington Heights, Illinois",42.081156,-87.980216,Metra UP-NW Line,42.088608,-87.993289,Train
548,"Arlington Heights, Illinois",42.081156,-87.980216,Arlington Heights Train Station,42.078275,-87.963649,Train Station


## Results

The results of the analysis are as follows.
Cluster 4 has the most pizza places per suburb (five), whereas Cluster 5 has none.
The only suburb in Cluster 5 is Hammond, Indiana. Since our objective is to open a pizza place in Illinois near Chicago, let's look instead at Cluster 2, the cluster with the fewest pizza places after Cluster 5 (one per suburb). The suburbs in this cluster are:
##### Arlington Heights, Illinois
##### Gary, Indiana
##### Kenosha, Wisconsin
Arlington Heights, Illinois has an estimated population of 74,593 and many other attractions such as theaters, train stations, and stores. An ideal location to open a new pizza place is therefore Arlington Heights, Illinois.
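
The trade-off between population and competition described in the introduction can be made concrete with a rough ratio: residents per existing pizza place. The sketch below is an illustration rather than part of the original analysis. It uses only the Illinois suburbs for which both a population and a pizza-place count appear in the tables above. The counts cover only venues returned within the 1500 m search radius around each suburb's geocoded center, so the ratio is a coarse indicator at best.

```python
# (population, pizza places) copied from the tables above; Illinois suburbs only
suburbs = {
    "Aurora": (198870, 3),
    "Naperville": (149196, 5),
    "Joliet": (148227, 2),
    "Elgin": (111401, 4),
    "Cicero": (79943, 2),
    "Bolingbrook": (76468, 3),
    "Arlington Heights": (74593, 1),
}

# Residents per existing pizza place: higher means less competition per resident.
residents_per_pizza_place = {
    name: population / pizza_places
    for name, (population, pizza_places) in suburbs.items()
}

ranking = sorted(residents_per_pizza_place.items(),
                 key=lambda pair: pair[1], reverse=True)
for name, ratio in ranking:
    print(f"{name}: {ratio:,.0f} residents per pizza place")
```

On these numbers Arlington Heights ranks first, narrowly ahead of Joliet, which is consistent with the conclusion above.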