# Introduction/Business problem

My research has the following goal

A friend of mine, who is in charge of a burger joint, needs to open a new location in Chicago. He intends to do it in a Douglas community area/borrough because he found numerous suitable places which are for sale or are available for rent. My friend is not very familiar with the city of Chicago and Douglas community area, which entails 8 neighbourhoods so he asked for help and needs me to analyze this community area and its most common places to decide which neighbourhood is the most suitable for the new location of his new burger place

Since I am not a Chicago native and I'm not familiar with the Douglas community area, to help my friend I decided to use Foursquare location data and give him a report on the 8 neighbourhoods and their most common places, so we can decide which one is the most suitable for the new burger place

# Data section

To solve our business problem I will have 2 main data sources in my report

1. Wikipedia article about the neighborhoods of Chicago

The article has the following URL (https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago) <br>
The article contains the table where the community areas with their respective neighborhoods are displayed. The table is a very suitable format since I can easily import it into my jupyter notebook and turn it into a pandas dataframe

2. Foursquare data <br>
Second data source is of course Foursquare. I will connect to their database and get the data about the venues in the Douglas community area and its' neighborhoods. Then I will analyze it to display the most common places based on the neighborhoods and cluster these 8 neighborhoods into 5 clusters

# Methodology

To reach my goal I will mainly use categorizing and grouping of the venues. Also, location visualization techniques, to map first the neighborhoods then created clusters on the map with relevant labels. Finally, I will use k-means clustering to facilitate the decision making

# Analysis section

First let's import the neighborhood data from the Wikipedia page, sort it by Community areas and reset the index

In [2]:
import pandas as pd
print("Done")

Done


In [5]:
url='https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Chicago'

df=pd.read_html(url, index_col=None, header=0)[0]

df.head()

Unnamed: 0,Neighborhood,Community area
0,Albany Park,Albany Park
1,Altgeld Gardens,Riverdale
2,Andersonville,Edgewater
3,Archer Heights,Archer Heights
4,Armour Square,Armour Square


In [6]:
df.sort_values(by=['Community area'], inplace=True)
df.head()

Unnamed: 0,Neighborhood,Community area
0,Albany Park,Albany Park
129,Mayfair,Albany Park
147,North Mayfair,Albany Park
179,Ravenswood Manor,Albany Park
3,Archer Heights,Archer Heights


In [7]:
df.reset_index(drop = True, inplace = True)
df.head()

Unnamed: 0,Neighborhood,Community area
0,Albany Park,Albany Park
1,Mayfair,Albany Park
2,North Mayfair,Albany Park
3,Ravenswood Manor,Albany Park
4,Archer Heights,Archer Heights


Let's also install libraries which will come handy soon

In [8]:
!pip install geopy
!pip install geocoder
!pip install folium
import folium
from geopy.geocoders import Nominatim
print("done")

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
[K     |████████████████████████████████| 98 kB 7.2 MB/s  eta 0:00:01
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
[K     |████████████████████████████████| 93 kB 2.9 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
done


As I mentioned in the introduction section, we are particularly interested in the Douglas community area. So I will create a separate dataframe for Douglas

In [9]:
douglas = df[df['Community area'] == 'Douglas'].reset_index(drop=True)
douglas

Unnamed: 0,Neighborhood,Community area
0,South Commons,Douglas
1,Prairie Shores,Douglas
2,Lake Meadows,Douglas
3,Bronzeville,Douglas
4,Stateway Gardens,Douglas
5,Groveland Park,Douglas
6,The Gap,Douglas
7,Dearborn Homes,Douglas


As you can see, unfortunately our data from Wikipedia does not include ZipCodes or lattitude/longitude values so I generated a separate list of neighborhood coordinates to append to our dataframe 

In [12]:
({"lattitude":[41.842182,41.8423,41.8352,41.824774,41.8275,41.8322,41.8343,41.8431],
                  "longitude":[-87.6203,-87.6149,-87.6140,-87.624294,-87.6284,-87.6094,-87.6199,-87.6278]})

{'lattitude': [41.842182,
  41.8423,
  41.8352,
  41.824774,
  41.8275,
  41.8322,
  41.8343,
  41.8431],
 'longitude': [-87.6203,
  -87.6149,
  -87.614,
  -87.624294,
  -87.6284,
  -87.6094,
  -87.6199,
  -87.6278]}

In [13]:
douglas.insert(2,"lattitude", [41.842182,41.8423,41.8352,41.824774,41.8275,41.8322,41.8343,41.8431], True)
douglas

Unnamed: 0,Neighborhood,Community area,lattitude
0,South Commons,Douglas,41.842182
1,Prairie Shores,Douglas,41.8423
2,Lake Meadows,Douglas,41.8352
3,Bronzeville,Douglas,41.824774
4,Stateway Gardens,Douglas,41.8275
5,Groveland Park,Douglas,41.8322
6,The Gap,Douglas,41.8343
7,Dearborn Homes,Douglas,41.8431


In [14]:
douglas.insert(3,"longitude", [-87.6203,-87.6149,-87.6140,-87.624294,-87.6284,-87.6094,-87.6199,-87.6278], True)
douglas

Unnamed: 0,Neighborhood,Community area,lattitude,longitude
0,South Commons,Douglas,41.842182,-87.6203
1,Prairie Shores,Douglas,41.8423,-87.6149
2,Lake Meadows,Douglas,41.8352,-87.614
3,Bronzeville,Douglas,41.824774,-87.624294
4,Stateway Gardens,Douglas,41.8275,-87.6284
5,Groveland Park,Douglas,41.8322,-87.6094
6,The Gap,Douglas,41.8343,-87.6199
7,Dearborn Homes,Douglas,41.8431,-87.6278


Now let's visualize neighborhoods on the map

In [15]:
address = 'Douglas'

geolocator = Nominatim(user_agent="Do_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Douglas are {}, {}.'.format(latitude, longitude))

The geographical coordinate of Douglas are 39.7628415, -88.2170516.


In [19]:
map_douglas = folium.Map(location=[latitude, longitude], zoom_start=6)


for lat, lng, label in zip(douglas['lattitude'], douglas['longitude'], douglas['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_douglas)  
    
map_douglas

For the next step I will connect to Foursquare to get the information about the venues in the first neighborhood in the list "South Commons". After that I will get the data for all the other neighborhoods and put it into one dataframe

In [20]:
CLIENT_ID = 'KKCM3R2MEVSSOYEVYNJ2423YBANRFA0C53RNMVKYSQ3ACZZ3' # your Foursquare ID
CLIENT_SECRET = 'D13PGQND0YJFVIZMRHMVVGHWIPJ0FDGHUEEJDBQMPWX5F3GD' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KKCM3R2MEVSSOYEVYNJ2423YBANRFA0C53RNMVKYSQ3ACZZ3
CLIENT_SECRET:D13PGQND0YJFVIZMRHMVVGHWIPJ0FDGHUEEJDBQMPWX5F3GD


In [21]:
neighborhood_latitude = douglas.loc[0, 'lattitude'] # neighborhood latitude value
neighborhood_longitude = douglas.loc[0, 'longitude'] # neighborhood longitude value

neighborhood_name = douglas.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of South Commons are 41.842182, -87.6203.


In [22]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id=KKCM3R2MEVSSOYEVYNJ2423YBANRFA0C53RNMVKYSQ3ACZZ3&client_secret=D13PGQND0YJFVIZMRHMVVGHWIPJ0FDGHUEEJDBQMPWX5F3GD&v=20180605&ll=41.8421825,-87.6203&radius=500&limit=100'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=KKCM3R2MEVSSOYEVYNJ2423YBANRFA0C53RNMVKYSQ3ACZZ3&client_secret=D13PGQND0YJFVIZMRHMVVGHWIPJ0FDGHUEEJDBQMPWX5F3GD&v=20180605&ll=41.8421825,-87.6203&radius=500&limit=100'

Before getting the data, I will import necessary libraries

In [23]:
import json
import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans
import numpy as np # library to handle data in a vectorized manner
print("Done")

Done


In [24]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fca18a1fd895d01ae374a9f'},
 'response': {'headerLocation': 'Bronzeville',
  'headerFullLocation': 'Bronzeville, Chicago',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 10,
  'suggestedBounds': {'ne': {'lat': 41.8466825045, 'lng': -87.61427088264942},
   'sw': {'lat': 41.8376824955, 'lng': -87.62632911735058}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4cda071c22bd721e7554ed47',
       'name': "D3: Dre's Diesel Dome",
       'location': {'address': '125 E 26th St',
        'lat': 41.84542028734087,
        'lng': -87.62288689613342,
        'labeledLatLngs': [{'label': 'display',
          'lat': 41.84542028734087,
          'lng': -87.62288689613342}],
        'distance': 419,
        'postalCode': '60616',
 

In [25]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [26]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,D3: Dre's Diesel Dome,Gym / Fitness Center,41.84542,-87.622887
1,Dunbar Park,Park,41.840119,-87.618592
2,Switch Harrisburg,Business Service,41.843214,-87.61835
3,Enterprise Rent-A-Car,Rental Car Location,41.844479,-87.623895
4,Wesley's Electric,Construction & Landscaping,41.84407,-87.61645


In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [28]:
douglas_venues = getNearbyVenues(names=douglas['Neighborhood'],
                                   latitudes=douglas['lattitude'],
                                   longitudes=douglas['longitude']
                                  )

South Commons
Prairie Shores
Lake Meadows
Bronzeville
Stateway Gardens
Groveland Park
The Gap
Dearborn Homes


In [31]:
print(douglas_venues.shape)
douglas_venues.head()

(136, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,South Commons,41.842182,-87.6203,D3: Dre's Diesel Dome,41.84542,-87.622887,Gym / Fitness Center
1,South Commons,41.842182,-87.6203,Dunbar Park,41.840119,-87.618592,Park
2,South Commons,41.842182,-87.6203,Switch Harrisburg,41.843214,-87.61835,Business Service
3,South Commons,41.842182,-87.6203,Enterprise Rent-A-Car,41.844479,-87.623895,Rental Car Location
4,South Commons,41.842182,-87.6203,Wesley's Electric,41.84407,-87.61645,Construction & Landscaping


Now we have an important piece of information in the format of the pandas dataframe, the venues of the Douglas community area and their categories, which will be vital for the analysis and the decision making <br>
Now I will group the information by neighborhoods and then calculate how many unique categories we have

In [32]:
douglas_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronzeville,11,11,11,11,11,11
Dearborn Homes,15,15,15,15,15,15
Groveland Park,12,12,12,12,12,12
Lake Meadows,21,21,21,21,21,21
Prairie Shores,8,8,8,8,8,8
South Commons,10,10,10,10,10,10
Stateway Gardens,24,24,24,24,24,24
The Gap,35,35,35,35,35,35


In [33]:
print('There are {} uniques categories.'.format(len(douglas_venues['Venue Category'].unique())))

There are 61 uniques categories.


Now let's modify the dataframe for better visualization and first get the top 5 venues by the neighborhood and finally top 10 most common venue categories by the neighborhood

In [34]:
# one hot encoding
douglas_onehot = pd.get_dummies(douglas_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
douglas_onehot['Neighborhood'] = douglas_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [douglas_onehot.columns[-1]] + list(douglas_onehot.columns[:-1])
douglas_onehot = douglas_onehot[fixed_columns]

douglas_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,...,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Sports Bar,Storage Facility,Supermarket,Tennis Court,Train Station,Video Store,Wings Joint
0,South Commons,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,South Commons,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,South Commons,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,South Commons,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,South Commons,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
douglas_onehot.shape

(136, 62)

In [36]:
douglas_grouped = douglas_onehot.groupby('Neighborhood').mean().reset_index()
douglas_grouped

Unnamed: 0,Neighborhood,African Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,...,Snack Place,Soccer Field,Southern / Soul Food Restaurant,Sports Bar,Storage Facility,Supermarket,Tennis Court,Train Station,Video Store,Wings Joint
0,Bronzeville,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0
1,Dearborn Homes,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,...,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
2,Groveland Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.083333
3,Lake Meadows,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.047619
4,Prairie Shores,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
5,South Commons,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Stateway Gardens,0.0,0.0,0.0,0.041667,0.0,0.041667,0.041667,0.041667,0.291667,...,0.041667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0
7,The Gap,0.028571,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,...,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.028571,0.057143


In [37]:
num_top_venues = 5

for hood in douglas_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = douglas_grouped[douglas_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bronzeville----
                             venue  freq
0                Convenience Store  0.09
1  Southern / Soul Food Restaurant  0.09
2                     Liquor Store  0.09
3                      Art Gallery  0.09
4                            Motel  0.09


----Dearborn Homes----
                  venue  freq
0    Chinese Restaurant  0.20
1  Gym / Fitness Center  0.13
2      Storage Facility  0.07
3   Rental Car Location  0.07
4         Grocery Store  0.07


----Groveland Park----
         venue  freq
0  Wings Joint  0.08
1        Beach  0.08
2          Gym  0.08
3         Park  0.08
4   Food Stand  0.08


----Lake Meadows----
                  venue  freq
0  Fast Food Restaurant  0.14
1        Cosmetics Shop  0.10
2            Donut Shop  0.05
3        Ice Cream Shop  0.05
4                  Park  0.05


----Prairie Shores----
                  venue  freq
0                  Park  0.25
1        Lighting Store  0.12
2    Athletics & Sports  0.12
3         Train Station  0.12


In [38]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [39]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = douglas_grouped['Neighborhood']

for ind in np.arange(douglas_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(douglas_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bronzeville,Gym / Fitness Center,Liquor Store,Train Station,Art Gallery,Park,Athletics & Sports,Southern / Soul Food Restaurant,Motel,Convenience Store,Italian Restaurant
1,Dearborn Homes,Chinese Restaurant,Gym / Fitness Center,Soccer Field,Grocery Store,New American Restaurant,Park,Rental Car Location,Bus Station,Hotel,Asian Restaurant
2,Groveland Park,Wings Joint,Lake,Gym,Residential Building (Apartment / Condo),Sandwich Place,Beach,Donut Shop,Park,Skate Park,Food Stand
3,Lake Meadows,Fast Food Restaurant,Cosmetics Shop,Wings Joint,Sandwich Place,Gym,Gym / Fitness Center,Video Store,Ice Cream Shop,Donut Shop,Mobile Phone Shop
4,Prairie Shores,Park,Shopping Mall,Train Station,Gym / Fitness Center,Athletics & Sports,Bus Station,Lighting Store,Wings Joint,Donut Shop,Eye Doctor
5,South Commons,Gym / Fitness Center,Park,Construction & Landscaping,Business Service,Bus Station,Shopping Mall,Rental Car Location,Flower Shop,Fast Food Restaurant,Food & Drink Shop
6,Stateway Gardens,Baseball Stadium,Shipping Store,Pizza Place,Sports Bar,Baseball Field,Coffee Shop,Park,Plaza,Bus Station,Sandwich Place
7,The Gap,Fast Food Restaurant,Cosmetics Shop,Wings Joint,Sandwich Place,Fried Chicken Joint,Historic Site,Mobile Phone Shop,Liquor Store,Ice Cream Shop,Video Store


As we can now see we have a dataframe, which clearly defines Douglas community area neighborhoods and top 10 most common places in the 8 areas. Now it is time to cluster these neighborhoods to simplify decision making and visualize it

In [40]:
# set number of clusters
kclusters = 5

douglas_grouped_clustering = douglas_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(douglas_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 4, 0, 0, 1, 1, 2, 0], dtype=int32)

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

douglas_merged = douglas


douglas_merged = douglas_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

douglas_merged.head() 

Unnamed: 0,Neighborhood,Community area,lattitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,South Commons,Douglas,41.842182,-87.6203,1,Gym / Fitness Center,Park,Construction & Landscaping,Business Service,Bus Station,Shopping Mall,Rental Car Location,Flower Shop,Fast Food Restaurant,Food & Drink Shop
1,Prairie Shores,Douglas,41.8423,-87.6149,1,Park,Shopping Mall,Train Station,Gym / Fitness Center,Athletics & Sports,Bus Station,Lighting Store,Wings Joint,Donut Shop,Eye Doctor
2,Lake Meadows,Douglas,41.8352,-87.614,0,Fast Food Restaurant,Cosmetics Shop,Wings Joint,Sandwich Place,Gym,Gym / Fitness Center,Video Store,Ice Cream Shop,Donut Shop,Mobile Phone Shop
3,Bronzeville,Douglas,41.824774,-87.624294,3,Gym / Fitness Center,Liquor Store,Train Station,Art Gallery,Park,Athletics & Sports,Southern / Soul Food Restaurant,Motel,Convenience Store,Italian Restaurant
4,Stateway Gardens,Douglas,41.8275,-87.6284,2,Baseball Stadium,Shipping Store,Pizza Place,Sports Bar,Baseball Field,Coffee Shop,Park,Plaza,Bus Station,Sandwich Place


In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=7)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(douglas_merged['lattitude'], douglas_merged['longitude'], douglas_merged['Neighborhood'], douglas_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

We have visualized clusters on the map on our hand now and the final step for analysis would be exploring each cluster

In [44]:
douglas_merged.loc[douglas_merged['Cluster Labels'] == 0, douglas_merged.columns[[1] + list(range(5, douglas_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Douglas,Fast Food Restaurant,Cosmetics Shop,Wings Joint,Sandwich Place,Gym,Gym / Fitness Center,Video Store,Ice Cream Shop,Donut Shop,Mobile Phone Shop
5,Douglas,Wings Joint,Lake,Gym,Residential Building (Apartment / Condo),Sandwich Place,Beach,Donut Shop,Park,Skate Park,Food Stand
6,Douglas,Fast Food Restaurant,Cosmetics Shop,Wings Joint,Sandwich Place,Fried Chicken Joint,Historic Site,Mobile Phone Shop,Liquor Store,Ice Cream Shop,Video Store


In [45]:
douglas_merged.loc[douglas_merged['Cluster Labels'] == 1, douglas_merged.columns[[1] + list(range(5, douglas_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Douglas,Gym / Fitness Center,Park,Construction & Landscaping,Business Service,Bus Station,Shopping Mall,Rental Car Location,Flower Shop,Fast Food Restaurant,Food & Drink Shop
1,Douglas,Park,Shopping Mall,Train Station,Gym / Fitness Center,Athletics & Sports,Bus Station,Lighting Store,Wings Joint,Donut Shop,Eye Doctor


In [46]:
douglas_merged.loc[douglas_merged['Cluster Labels'] == 2, douglas_merged.columns[[1] + list(range(5, douglas_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Douglas,Baseball Stadium,Shipping Store,Pizza Place,Sports Bar,Baseball Field,Coffee Shop,Park,Plaza,Bus Station,Sandwich Place


In [47]:
douglas_merged.loc[douglas_merged['Cluster Labels'] == 3, douglas_merged.columns[[1] + list(range(5, douglas_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Douglas,Gym / Fitness Center,Liquor Store,Train Station,Art Gallery,Park,Athletics & Sports,Southern / Soul Food Restaurant,Motel,Convenience Store,Italian Restaurant


In [48]:
douglas_merged.loc[douglas_merged['Cluster Labels'] == 4, douglas_merged.columns[[1] + list(range(5, douglas_merged.shape[1]))]]

Unnamed: 0,Community area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Douglas,Chinese Restaurant,Gym / Fitness Center,Soccer Field,Grocery Store,New American Restaurant,Park,Rental Car Location,Bus Station,Hotel,Asian Restaurant


# Results and discussion

We have reached our goal which included several steps: Getting the information about the venues in the Douglas community area, classifying them by the venue categories and finally clustering them and displaying visually on the map <br>

This process has given us the opportunity to make the decision regarding the location of the burger place.<br>

First of all, I would like to highlight the following observation: As expected, there are significant amounts of fast food locations in the community area. For example, we can take cluster 0, where among the first most common places were fast food restaurants and wings joint. Competition is welcome in modern business but naturally, business owner wants to open the new location at the neighborhood which is not very saturated with competitors such as wings joint, fast food restaurants, sandwich places and etc.<br>

Based on these criteriums, cluster 2 is particularly interesting for the following reasons: There is a major sports related venue here a large baseball stadium (Guaranteed rate field) which is home to the major league baseball team. This means that thousands of supporters will gather there on regular basis and many sports lovers also share the love for the fast food, especially burgers. Another reason is that is not overly saturated with fast food locations<br>

To sum up, my recommendation to the burger joint owner would be cluster 2 and I believe that he has to pursue place for sale or rent in that area for the reasons specified above

# Conclusion

The aim of this project was to identify suitable areas for opening new burger joint location in Chicago, Douglas community area.
Besides that, this project showed that using modern tools, data analyst/scientist can give a stakeholder recommendations regarding geographical location without even visiting the place itself and even more without being in the same country or the continent