<h1 align=center><font size = 5>Settling Down in Chicago</font></h1>

## Table of Contents

1. Introduction
2. Data
3. Methodology
4. Results
5. Discussion
6. Conclusion

## 1. Introduction

Suppose you have a friend who, due to his change of job, will be moving to Chicago but know little of the city. He is of upper-middle class and has a family of four, he and his wife, together with their two young kids.

You are going to give him advice on **which community area to settle down in Chicago**. After a discussion with your friend, you both agree that this community should meet 2 requirements,

- **Safe**. Since we all know that Chicago is by no means a safe city, so it is the first thing that we would consider.
- **Relaxed**. As we are looking for a community to live a life, this community should provide a calm and relax environment, and of course with sufficient venues to support daily life, such as dry cleaning, restaurants, playgrounds for kids, etc.

In this project, we are going to use data science knowledge to sort out the ideal community(s) for your friend to settle down in Chicago based on the 2 criteria.

## 2. Data

#### Regarding safety:

There are 77 communities in Chicago. In order to get the information of the safety status for each community, we can look for statistics in https://data.cityofchicago.org/ . There is a dataset recording each incident of crime that occurred in Chicago from 2001 to present, and for simplicity, We just downloaded the subset for year 2018.

But the above dataset only has community areas in numerical form. In order to get the names for each community, we will have to scape a Wikipedia page to match the numbers with the names. The webpage is https://en.wikipedia.org/wiki/Community_areas_in_Chicago .

With these statistics in hand, we could solve the problem of finding safe communities. **Let’s just define that the communities which have crime incidents less than the average of Chicago is safe**.

#### Regarding relaxation:

As for the second criterion, we’ll turn to https://foursquare.com/ to **segment the safe communities into 3 clusters** based on the similarity of venues. For example, the cluster we’re looking for should have venue categories like parks, fields, restaurants, dry cleaners, etc in a high ocurrence.

Here, we’ll use the **K-Means Clustering algorithm** of machine learning to find out each cluster’s characteristics and to decide which community(s) to recommend.

## 3. Methodology

As discussed in the Data section, we will divide our analysis into 2 parts. In the first part, we sort out the safe communities in Chicago use Pandas, BeautifulSoup4 and other Python libraries. In the second part, we cluster these safe communities using Foursquare API and k-means algorithm. Below, we’ll explain in more detail.

First thing first, let's download all the dependencies that we will need.

In [1]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

!pip install beautifulsoup4
!pip install lxml

import requests
from bs4 import BeautifulSoup
import lxml

import json 

!pip install geocoder
import geocoder

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting beautifulsoup4
[?25l  Downloading https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl (97kB)
[K     |████████████████████████████████| 102kB 981kB/s ta 0:00:01
[?25hCollecting soupsieve>=1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.8.0 soupsieve-1.9.3
Collecting lxml
[?25l  Downloading https://files.pythonhosted.org/packages/ec/be/5ab8abdd8663c0386ec2dd595a5bc0e23330a0549b8a91e32f38c20845b6/lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
[K     |████████████████████████████████| 5.8MB 5.1MB/s eta 0:00:01     |██████████████████████████████▉ | 5.6MB 5.1MB/s eta 0:00:01
[?25hInstalling collected packages: lxml
Successfully installed lxml-4.4.1


### 3.1 Sorting out the safe communities

**3.1.1 Data acquisition**

I've already downloaded the records of Chicago Crime incidents of year 2018 from https://data.cityofchicago.org/Public-Safety/Crimes-2018/3i3m-jwuy and uploaded it to the console as a csv file named 'Crimes_-_2018.csv'. First, let's use pandas to open this file and take a peak.

In [2]:
df_crimes=pd.read_csv('Crimes_-_2018.csv')
df_crimes.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location,Historical Wards 2003-2015,Zip Codes,Community Areas,Census Tracts,Wards,Boundaries - ZIP Codes,Police Districts,Police Beats
0,11556487,JC104662,12/31/2018 11:59:00 PM,112XX S SACRAMENTO AVE,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,2211,22,19.0,74.0,14,1158309.0,1829936.0,2018,01/10/2019 03:16:50 PM,41.689079,-87.696064,"(41.689078832, -87.696064026)",33.0,4447.0,73.0,256.0,42.0,33.0,9.0,254.0
1,11561837,JC110056,12/31/2018 11:59:00 PM,013XX W 72ND ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,734,7,6.0,67.0,11,1168573.0,1857018.0,2018,01/17/2019 02:26:36 PM,41.763181,-87.657709,"(41.763181359, -87.657709477)",17.0,22257.0,65.0,408.0,32.0,23.0,17.0,216.0
2,11552699,JC100043,12/31/2018 11:57:00 PM,084XX S SANGAMON ST,1310,CRIMINAL DAMAGE,TO PROPERTY,APARTMENT,False,False,613,6,21.0,71.0,14,1171454.0,1848783.0,2018,01/10/2019 03:16:50 PM,41.740521,-87.647391,"(41.740520866, -87.647390719)",18.0,21554.0,70.0,530.0,13.0,59.0,20.0,233.0
3,11552724,JC100006,12/31/2018 11:56:00 PM,018XX S ALLPORT ST,440,BATTERY,AGG: HANDS/FIST/FEET NO/MINOR INJURY,OTHER,True,False,1233,12,25.0,31.0,08B,1168327.0,1891230.0,2018,01/10/2019 03:16:50 PM,41.857068,-87.657625,"(41.857068095, -87.657625201)",8.0,14920.0,33.0,365.0,26.0,43.0,15.0,150.0
4,11552731,JC100031,12/31/2018 11:55:00 PM,078XX S SANGAMON ST,486,BATTERY,DOMESTIC BATTERY SIMPLE,APARTMENT,False,False,621,6,17.0,71.0,08B,1171332.0,1852934.0,2018,01/10/2019 03:16:50 PM,41.751914,-87.647717,"(41.75191443, -87.647716532)",17.0,21554.0,70.0,487.0,31.0,59.0,20.0,229.0


**3.1.2 Data wrangling**

As we can see, each incident is recorded as a row and there are many columns. But actually what we need here is simply the crime counts of each community. So, let's do some data wrangling.

Notice there's another problem within this dataset, it only provides each community's code but don't provide its name. We'll deal with it later.

In [3]:
# Select only 2 columns: 'ID' and 'Community Areas', to form a dataframe.

df_crimes=df_crimes[['ID','Community Areas']]
df_crimes.head()

Unnamed: 0,ID,Community Areas
0,11556487,73.0
1,11561837,65.0
2,11552699,70.0
3,11552724,33.0
4,11552731,70.0


In [4]:
# Group by 'Community Areas' to count the total number of each community's crimes.

df_counts=df_crimes.groupby('Community Areas').size().reset_index(name='Crime Counts')
df_counts.head()

Unnamed: 0,Community Areas,Crime Counts
0,1.0,2706
1,2.0,646
2,3.0,723
3,4.0,3082
4,5.0,1505


In [5]:
# Convert the data type of 'Community Areas' to integer.

df_counts['Community Areas'] = df_counts['Community Areas'].astype('int')
df_counts.head()

Unnamed: 0,Community Areas,Crime Counts
0,1,2706
1,2,646
2,3,723
3,4,3082
4,5,1505


In [6]:
# Get a general idea of the dataset.

df_counts.describe()

Unnamed: 0,Community Areas,Crime Counts
count,77.0,77.0
mean,39.0,3411.272727
std,22.371857,3003.399708
min,1.0,233.0
25%,20.0,1145.0
50%,39.0,2251.0
75%,58.0,4835.0
max,77.0,14750.0


**3.1.3 Web scraping**

Now we've almost get the full picture of the safety status for each community in Chicago, the only missing puzzle is the name reflected to the community code. Luckily, there's a Wikipedia page that has all the information, and what we are going to do is to scrape it.

Here is the website, https://en.wikipedia.org/wiki/Community_areas_in_Chicago .

In [7]:
# Web scraping using BeautifulSoup4.

url='https://en.wikipedia.org/wiki/Community_areas_in_Chicago'
source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

tables = soup.find_all('table',class_='wikitable')
commlist = []
for table in tables:
    table_rows = table.find_all('tr')
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text.strip('\n') for i in td]
        commlist.append(row)

df_commlist = pd.DataFrame(commlist,columns=['Community Areas','Community Names','Neighborhoods'])
df_commlist.head()

Unnamed: 0,Community Areas,Community Names,Neighborhoods
0,,,
1,8.0,Near North Side,Cabrini–Green\nThe Gold Coast\nGoose Island\nM...
2,32.0,Loop,Loop\nNew Eastside\nSouth Loop\nWest Loop Gate
3,33.0,Near South Side,Dearborn Park\nPrinter's Row\nSouth Loop\nPrai...
4,,,


As we can see, this dataframe is formed from several tables on the website, so it contains some blank rows. Also, we do not need the column 'Neighborhoods' in our case, again, let's do some cleaning.

In [8]:
# Drop the blank rows and the column 'Neighborhoods'

df = df_commlist.dropna()
df = df.drop(['Neighborhoods'], axis=1)
print('The size is : ', df.shape)
print('The type is :\n', df.dtypes)

The size is :  (77, 2)
The type is :
 Community Areas    object
Community Names    object
dtype: object


In [9]:
# Convert the data type of 'Community Areas' to integer.

df['Community Areas'] = df['Community Areas'].astype('int')
df.dtypes

Community Areas     int64
Community Names    object
dtype: object

Now, we can merge the two dataframes to get a full picture of each community's safety status.

In [10]:
df_final = pd.merge(df_counts, df, on='Community Areas', how="left")
df_final.head(10)

Unnamed: 0,Community Areas,Crime Counts,Community Names
0,1,2706,Rogers Park
1,2,646,West Ridge
2,3,723,Uptown
3,4,3082,Lincoln Square
4,5,1505,North Center
5,6,2017,Lake View
6,7,2718,Lincoln Park
7,8,1971,Near North Side
8,9,3658,Edison Park
9,10,3569,Norwood Park


Great! This dataframe now looks neat and tidy. The next thing we are going to do is to get each community's geocode, so that we can point them on a map.

**3.1.4 Adding geocode**

In [11]:
# Using geocoder to get each community's geocode.

Lat_list=[]
Lng_list=[]
for i in range(df_final.shape[0]):
    address='{}, Chicago, Illinois'.format(df_final.at[i,'Community Names'])
    g = geocoder.arcgis(address)
    Lat_list.append(g.latlng[0])
    Lng_list.append(g.latlng[1])

print(Lat_list)
print(Lng_list)

[42.00882000000007, 41.99948000000006, 41.98123000000004, 41.975700000000074, 41.95411000000007, 41.939820000000054, 41.92184000000003, 41.90015000000005, 42.00789000000003, 41.98547000000008, 41.97046000000006, 41.97640000000007, 41.98294000000004, 41.968290000000025, 41.95777000000004, 41.953550000000064, 41.95274000000006, 41.92902000000004, 41.92802000000006, 41.928480000000036, 41.93925000000007, 41.923280000000034, 41.89907000000005, 41.893290000000036, 41.887740000000065, 41.87702000000007, 41.87850000000003, 41.87301000000008, 41.993735597378326, 41.745733680925845, 41.85224000000005, 41.87834000000004, 41.85388000000006, 41.834580000000074, 41.840850000000046, 41.82496000000003, 41.81253000000004, 41.80931000000004, 41.80952000000008, 41.79141000000004, 41.79388000000006, 41.78046000000006, 41.76158000000004, 41.74108000000007, 41.745070000000055, 41.72261000000003, 41.720300000000066, 41.73336000000006, 41.70211000000006, 41.69282000000004, 41.69659000000007, 41.7120700000000

In [12]:
# Convert the latitude and longitude lists to dataframe.

df_latlng = pd.DataFrame({'Community Names': list(df_final['Community Names']), 'Latitude': Lat_list, 'Longitude': Lng_list})
df_latlng.head()

Unnamed: 0,Community Names,Latitude,Longitude
0,Rogers Park,42.00882,-87.66618
1,West Ridge,41.99948,-87.69266
2,Uptown,41.98123,-87.66
3,Lincoln Square,41.9757,-87.68914
4,North Center,41.95411,-87.68142


Let's merge the 2 dataframes of community safety status and community geocode.

In [13]:
df_chicago = pd.merge(df_final, df_latlng, on='Community Names', how="left")
df_chicago

Unnamed: 0,Community Areas,Crime Counts,Community Names,Latitude,Longitude
0,1,2706,Rogers Park,42.00882,-87.66618
1,2,646,West Ridge,41.99948,-87.69266
2,3,723,Uptown,41.98123,-87.66
3,4,3082,Lincoln Square,41.9757,-87.68914
4,5,1505,North Center,41.95411,-87.68142
5,6,2017,Lake View,41.93982,-87.65682
6,7,2718,Lincoln Park,41.92184,-87.64744
7,8,1971,Near North Side,41.90015,-87.63433
8,9,3658,Edison Park,42.00789,-87.81399
9,10,3569,Norwood Park,41.98547,-87.80611


**3.1.5 Sort out the safe communities**

As we discussed before, in this case, we define the safe communities as those have less crimes reported than the average of Chicago. So, from the above dataframe, we narrow down the community candidates to those who fit our definition.

In [14]:
df_chicago1 = df_chicago[df_chicago['Crime Counts']<df_chicago['Crime Counts'].mean()]
df_chicago1.count()

Community Areas    49
Crime Counts       49
Community Names    49
Latitude           49
Longitude          49
dtype: int64

It turns out there are 49 communities. Let's point them on the map.

**3.1.6 Show the community candidates on the map**

Here we use the Folium library to visualize the safe communities in Chicago.

In [15]:
# Get the geocode of Chicago.

address = 'Chicago, IL'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Chicago are {}, {}.'.format(latitude, longitude))

  """


The geograpical coordinate of Chicago are 41.8755616, -87.6244212.


In [16]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_chicago1['Latitude'], df_chicago1['Longitude'], df_chicago1['Community Names']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

**So far, we have finished our first part of data analysis: sorting out the safe communities.**

### 3.2 Clustering these safe communities

In the second part, we want to cluster our community candidates based on their similarity of environment. **We define the environment by the categories of venues each community has. And we can further assume that the communities falling to the same cluster share the similar categories of venues.**

**3.2.1 Explore each community candidate**

Here, we will use the Foursquare API to explore community candidates in Chicago, to get the most common venue categories in each community.

First, we would like to get the top 100 venues that are in each community candidates within a radius of 500 meters. And then group them by Community Names, to check how many venues are returned for each community candidate, the results are shown below.

In [17]:
# Define Foursquare Credentials and Version
CLIENT_ID = 'CZI3FZKMAMNPGZWQZETF44ARNSBZ5E0WD3AUAJJIAR12ZGI3'
CLIENT_SECRET = 'ECPSNWF4IQRO4ILMYVDVG30NNVYCQP0XSGBVIZMCYZERP2BR'
VERSION = '20190813'

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: CZI3FZKMAMNPGZWQZETF44ARNSBZ5E0WD3AUAJJIAR12ZGI3
CLIENT_SECRET:ECPSNWF4IQRO4ILMYVDVG30NNVYCQP0XSGBVIZMCYZERP2BR


In [18]:
# Get the top 100 venues that are in each community candidates within a radius of 500 meters.

def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT = 100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url1 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url1).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Community Names', 
                  'Community Latitude', 
                  'Community Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [19]:
# Create a new dataframe called chicago_venues.

chicago_venues = pd.DataFrame(getNearbyVenues(names=df_chicago1['Community Names'],
                                   latitudes=df_chicago['Latitude'],
                                   longitudes=df_chicago['Longitude']
                                  ))

Rogers Park
West Ridge
Uptown
Lincoln Square
North Center
Lake View
Lincoln Park
Near North Side
Jefferson Park
Forest Glen
North Park
Albany Park
Portage Park
Irving Park
Dunning
Montclare
Hermosa
Avondale
Logan Square
Near South Side
Armour Square
Douglas
Oakland
Hyde Park
South Shore
Chatham
South Chicago
Burnside
Calumet Heights
Roseland
South Deering
East Side
West Pullman
Riverdale
Hegewisch
Garfield Ridge
Brighton Park
Bridgeport
New City
West Elsdon
Gage Park
Greater Grand Crossing
Auburn Gresham
Beverly
Washington Heights
Mount Greenwood
Morgan Park
O'Hare
Edgewater


In [20]:
# check the size of the resulting dataframe

print(chicago_venues.shape)
chicago_venues.head()

(1126, 7)


Unnamed: 0,Community Names,Community Latitude,Community Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Rogers Park,42.00882,-87.66618,Morse Fresh Market,42.008087,-87.667041,Grocery Store
1,Rogers Park,42.00882,-87.66618,Rogers Park Social,42.00736,-87.666265,Bar
2,Rogers Park,42.00882,-87.66618,Lifeline Theatre,42.007372,-87.666284,Theater
3,Rogers Park,42.00882,-87.66618,Glenwood Sunday Market,42.008525,-87.666251,Farmers Market
4,Rogers Park,42.00882,-87.66618,Rogers Park Provisions,42.007528,-87.666193,Gift Shop


In [21]:
# check how many venues were returned for each community candidate.

chicago_venues.groupby('Community Names').count()

Unnamed: 0_level_0,Community Latitude,Community Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Community Names,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Albany Park,16,16,16,16,16,16
Armour Square,32,32,32,32,32,32
Auburn Gresham,20,20,20,20,20,20
Avondale,4,4,4,4,4,4
Beverly,8,8,8,8,8,8
Bridgeport,15,15,15,15,15,15
Brighton Park,12,12,12,12,12,12
Burnside,6,6,6,6,6,6
Calumet Heights,8,8,8,8,8,8
Chatham,12,12,12,12,12,12


In [22]:
print('There are {} uniques categories.'.format(len(chicago_venues['Venue Category'].unique())))

There are 230 uniques categories.


Analyze each community candidate.

In [23]:
# one hot encoding
chicago_onehot = pd.get_dummies(chicago_venues[['Venue Category']], prefix="", prefix_sep="")

# add Community Names column back to dataframe
chicago_onehot['Community Names'] = chicago_venues['Community Names'] 

# move Community Names column to the first column
fixed_columns = [chicago_onehot.columns[-1]] + list(chicago_onehot.columns[:-1])
chicago_onehot = chicago_onehot[fixed_columns]

chicago_onehot.head()

Unnamed: 0,Community Names,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Bed & Breakfast,Beer Garden,Beer Store,Bike Rental / Bike Share,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Check Cashing Service,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,College Rec Center,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Currency Exchange,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store,Elementary School,Event Space,Exhibit,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hot Dog Joint,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Print Shop,Pub,Public Art,Radio Station,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Restaurant,Road,Rock Club,Roof Deck,Russian Restaurant,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Shop,Wings Joint,Yoga Studio
0,Rogers Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Rogers Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Rogers Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Rogers Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Rogers Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
chicago_onehot.shape

(1126, 231)

Group rows by community names and by taking the mean of the frequency of occurrence of each category.

In [25]:
chicago_grouped = chicago_onehot.groupby('Community Names').mean().reset_index()
chicago_grouped

Unnamed: 0,Community Names,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Bed & Breakfast,Beer Garden,Beer Store,Bike Rental / Bike Share,Board Shop,Boat or Ferry,Bookstore,Border Crossing,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burmese Restaurant,Bus Line,Bus Station,Bus Stop,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Check Cashing Service,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,College Rec Center,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Currency Exchange,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Disc Golf,Discount Store,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Eastern European Restaurant,Electronics Store,Elementary School,Event Space,Exhibit,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Financial or Legal Service,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hot Dog Joint,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Theater,Indonesian Restaurant,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Light Rail Station,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Service,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Poke Place,Print Shop,Pub,Public Art,Radio Station,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Restaurant,Road,Rock Club,Roof Deck,Russian Restaurant,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Repair,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Weight Loss Center,Whisky Bar,Wine Shop,Wings Joint,Yoga Studio
0,Albany Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Armour Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
2,Auburn Gresham,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.1,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0
3,Avondale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
4,Beverly,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bridgeport,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Brighton Park,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Burnside,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Calumet Heights,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Chatham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
chicago_grouped.shape

(49, 231)

Print each community name along with the top 5 most common venues

In [27]:
num_top_venues = 5

for hood in chicago_grouped['Community Names']:
    print("----"+hood+"----")
    temp = chicago_grouped[chicago_grouped['Community Names'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albany Park----
                venue  freq
0  Light Rail Station  0.06
1         Tailor Shop  0.06
2        Liquor Store  0.06
3                 Bar  0.06
4                Bank  0.06


----Armour Square----
                venue  freq
0         Bus Station  0.09
1  Chinese Restaurant  0.06
2                Park  0.06
3          Food Truck  0.06
4             Brewery  0.03


----Auburn Gresham----
                 venue  freq
0        Deli / Bodega  0.10
1          Pizza Place  0.10
2      Harbor / Marina  0.05
3          Bus Station  0.05
4  Fried Chicken Joint  0.05


----Avondale----
                        venue  freq
0          Mexican Restaurant  0.25
1                 Wings Joint  0.25
2  Financial or Legal Service  0.25
3                 Pizza Place  0.25
4        Pakistani Restaurant  0.00


----Beverly----
                  venue  freq
0                  Park  0.38
1              Boutique  0.25
2  Fast Food Restaurant  0.12
3                Lounge  0.12
4            Donut

Put that into a pandas dataframe.

In [28]:
# Sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
# create the new dataframe and display the top 10 venues for each community candidate.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Community Names']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
CommunityNames_venues_sorted = pd.DataFrame(columns=columns)
CommunityNames_venues_sorted['Community Names'] = chicago_grouped['Community Names']

for ind in np.arange(chicago_grouped.shape[0]):
    CommunityNames_venues_sorted.iloc[ind, 1:] = return_most_common_venues(chicago_grouped.iloc[ind, :], num_top_venues)

CommunityNames_venues_sorted.head()

Unnamed: 0,Community Names,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albany Park,Liquor Store,Asian Restaurant,Cosmetics Shop,Dance Studio,Park,Paper / Office Supplies Store,Bus Stop,Light Rail Station,Bar,Bank
1,Armour Square,Bus Station,Chinese Restaurant,Food Truck,Park,Diner,Dance Studio,Discount Store,Donut Shop,Rental Car Location,Road
2,Auburn Gresham,Pizza Place,Deli / Bodega,Intersection,Playground,Pharmacy,Park,Record Shop,Optical Shop,Chinese Restaurant,Sandwich Place
3,Avondale,Wings Joint,Mexican Restaurant,Financial or Legal Service,Pizza Place,Donut Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Exhibit
4,Beverly,Park,Boutique,Donut Shop,Lounge,Fast Food Restaurant,Dry Cleaner,Financial or Legal Service,Filipino Restaurant,Field,Farmers Market


**3.2.2 Cluster communities**

We use k-means algorithm to segment the community candidates, with the feature of most common venue categories in each community, and based on our sample size (49), we decide to **set the number of clusters to 3.**

In [30]:
# set number of clusters
kclusters = 3

chicago_grouped_clustering = chicago_grouped.drop('Community Names', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(chicago_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 1, 2, 2, 2, 2, 2], dtype=int32)

In [31]:
# add clustering labels
CommunityNames_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

chicago_clustered = df_chicago1

# merge chicago_grouped with chicago to add latitude/longitude for each Community
chicago_clustered = chicago_clustered.join(CommunityNames_venues_sorted.set_index('Community Names'), on='Community Names')

chicago_clustered.head()

Unnamed: 0,Community Areas,Crime Counts,Community Names,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,2706,Rogers Park,42.00882,-87.66618,2,Theater,Pizza Place,American Restaurant,Bar,ATM,Gas Station,Mexican Restaurant,Sandwich Place,Café,Donut Shop
1,2,646,West Ridge,41.99948,-87.69266,2,Indian Restaurant,Pakistani Restaurant,Grocery Store,Fast Food Restaurant,Football Stadium,Donut Shop,Market,Fruit & Vegetable Store,Clothing Store,Juice Bar
2,3,723,Uptown,41.98123,-87.66,2,Pizza Place,Sandwich Place,Sushi Restaurant,Asian Restaurant,Vietnamese Restaurant,Bus Station,Chinese Restaurant,Coffee Shop,Theater,Mexican Restaurant
3,4,3082,Lincoln Square,41.9757,-87.68914,2,Bus Station,Bar,Café,Convenience Store,Pizza Place,Korean Restaurant,Sandwich Place,Liquor Store,Food & Drink Shop,Karaoke Bar
4,5,1505,North Center,41.95411,-87.68142,2,Bar,Coffee Shop,Bank,Mobile Phone Shop,Boutique,American Restaurant,Dive Bar,Pub,Pharmacy,Yoga Studio


In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(chicago_clustered['Latitude'], chicago_clustered['Longitude'], chicago_clustered['Community Names'], chicago_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

As the map above shows, we now have 3 clusters of communities. In the next part, we will examine each cluster.

## 4. Results

**4.1 Cluster 1**

In [33]:
chicago_clustered.loc[chicago_clustered['Cluster Labels'] == 0, chicago_clustered.columns[[1,2] + list(range(5, chicago_clustered.shape[1]))]]

Unnamed: 0,Crime Counts,Community Names,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,233,Edgewater,0,Intersection,Chinese Restaurant,Yoga Studio,Donut Shop,Financial or Legal Service,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Exhibit


As we can see, cluster 1 only contains 1 community. Based on its Top 10 Most Common Venue Categories, it looks like a community with a slow paced life, as venues like Yoga Studio, Farmers Market, Exhibit are very common, although the 1st common venue is intersection, which may seem a little awkward.

**4.2 Cluster 2**

In [34]:
chicago_clustered.loc[chicago_clustered['Cluster Labels'] == 1, chicago_clustered.columns[[1,2] + list(range(5, chicago_clustered.shape[1]))]]

Unnamed: 0,Crime Counts,Community Names,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,1093,Jefferson Park,1,Neighborhood,Park,Theater,Yoga Studio,Donut Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Exhibit
11,474,Forest Glen,1,Park,Mexican Restaurant,Bar,Yoga Studio,Dry Cleaner,Financial or Legal Service,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
48,1085,Roseland,1,Park,Gas Station,Liquor Store,Clothing Store,Seafood Restaurant,Dog Run,Field,Fast Food Restaurant,Farmers Market,Exhibit
54,2224,Hegewisch,1,Park,Food & Drink Shop,Bus Station,Yoga Studio,Donut Shop,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Exhibit
71,3113,Beverly,1,Park,Boutique,Donut Shop,Lounge,Fast Food Restaurant,Dry Cleaner,Financial or Legal Service,Filipino Restaurant,Field,Farmers Market


Cluster 2 looks exactly the type of community we are looking for. There are venues like parks, dry cleaners, fields, farmers markets, exhibits and of course restaurants, which perfectly meet our requirement of a relaxed, laid-back environment, and the parks and fields are 'must-have's for family with kids.

**4.3 Cluster 3**

In [35]:
chicago_clustered.loc[chicago_clustered['Cluster Labels'] == 2, chicago_clustered.columns[[1,2] + list(range(5, chicago_clustered.shape[1]))]]

Unnamed: 0,Crime Counts,Community Names,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2706,Rogers Park,2,Theater,Pizza Place,American Restaurant,Bar,ATM,Gas Station,Mexican Restaurant,Sandwich Place,Café,Donut Shop
1,646,West Ridge,2,Indian Restaurant,Pakistani Restaurant,Grocery Store,Fast Food Restaurant,Football Stadium,Donut Shop,Market,Fruit & Vegetable Store,Clothing Store,Juice Bar
2,723,Uptown,2,Pizza Place,Sandwich Place,Sushi Restaurant,Asian Restaurant,Vietnamese Restaurant,Bus Station,Chinese Restaurant,Coffee Shop,Theater,Mexican Restaurant
3,3082,Lincoln Square,2,Bus Station,Bar,Café,Convenience Store,Pizza Place,Korean Restaurant,Sandwich Place,Liquor Store,Food & Drink Shop,Karaoke Bar
4,1505,North Center,2,Bar,Coffee Shop,Bank,Mobile Phone Shop,Boutique,American Restaurant,Dive Bar,Pub,Pharmacy,Yoga Studio
5,2017,Lake View,2,Café,Japanese Restaurant,Bakery,Gym / Fitness Center,Pizza Place,Bagel Shop,Coffee Shop,Performing Arts Venue,Clothing Store,Sports Bar
6,2718,Lincoln Park,2,Pizza Place,Sandwich Place,Coffee Shop,Bar,Taco Place,Breakfast Spot,Fast Food Restaurant,Art Gallery,Mexican Restaurant,American Restaurant
7,1971,Near North Side,2,Gym / Fitness Center,Gym,Restaurant,American Restaurant,Coffee Shop,Cycle Studio,Breakfast Spot,Sandwich Place,Café,Pub
12,989,North Park,2,Coffee Shop,Pharmacy,Theater,Video Store,Bar,Food Truck,Sushi Restaurant,Fried Chicken Joint,Supermarket,Park
13,2395,Albany Park,2,Liquor Store,Asian Restaurant,Cosmetics Shop,Dance Studio,Park,Paper / Office Supplies Store,Bus Stop,Light Rail Station,Bar,Bank


The remaining 43 communities all belong to Cluster 3 according to k-means algorithm. By first look, it is a bit chaotic, but if we look closely, we could still find some common features in these communities. Most of them give us the impression of a fast-paced city life. For example, the most common venues include coffee shops, fast food restaurants, pizza places, bus stations, etc. They all remind me of the hustle bustle of downtown areas and CBDs. So, I don't think we'll recommend the communities in Cluster 3 to our friend.

## 5. Discussion

Based on the observations above, **we'll recommend a community from Cluster 2**. These communities perfectly meet his requirements of safety and relaxation.

But exactly which one to choose is still up to the friend's preference. Say, if he thinks safety is the first priority, then Forest Glen with the least crime records should be his choice. Or if he works in south Chicago and would like to settle his family close by, then he should choose from Roseland, Hegewisch or Beverly.
But as we discussed in the previous part, Cluster 1 which only includes 1 community called Edgewater could also be an option. Especially if the friend want to live near lake.

## 6. Conclusion

In this project, we firstly narrow down our choices by select those safe communities based on crime records. Secondly, we use the Foursquare API to explore these communities and then use k-means algorithm to group the communities into clusters based on the feature of most common venues.

We obtain 3 clusters and according to our observations, we recommend Cluster 2 to our friend, which meet his requirements the most.

But, as we can see, there are still some issues regarding to this segmentation. For instance, Cluster 1 and 2 seems alike in terms of their most common venues and within Cluster 3 we can see that there are some communities more similar to Cluster 2. I think this is because the venue categories in Foursquare do not always correctly reflect our intension.

So maybe for future improvement, we should explicitly hand-pick the venue categories that we need to solve the problem and try different clustering algorithms as well to get a better result.