# The Battle of Neighborhoods -- Chicago
---

# Part I -- Report

---
## Introduction/Business Problem
- Discuss the business problem and who would be interested in this project
- Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.
- This submission will eventually become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

### ***Background***
My friend Jonathan is moving to Chicago. As a newcomer to the city, he has asked for my advice on where to look for a new home.
- Jonathan has a 15-year-old son who will attend a public high school. Ideally the school is close to home, so that he can walk there by himself every morning. The boy likes sports, so athletic facilities ('**Gym|Sport|ball**') near the school and home would be a plus.
- Jonathan's office is downtown. It is well served by bus, train, and metro, but parking is hard to find during the day, so Jonathan would rather commute by public transportation. There must be a '**Bus|Train|Metro|.. Station**' near home.
- Jonathan's wife goes shopping every afternoon, so life is easier with a '**Market**' nearby.
- The whole family likes to spend weekend free time together in a '**Park**'.

### ***Problem***
How can I give Jonathan a short list of recommended locations for his new home?
Each location should have a public high school, a gym, a station, a market, and a park nearby.

### ***Ideas of solution***
1. Chicago has a limited number of public high schools (about 180), and their location data is available from the city government website.
2. Use Foursquare to find the venues around each school, and keep only the schools whose surroundings include a gym, a station, a market, and a park all together.
3. Finally, show my recommendations to Jonathan on a map and let him choose according to his office location and other interests.

---
## Data
- Describe the data that will be used to solve the problem and the source of the data
- Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

1. Download Chicago Public School Locations (2017-2018) from the city website

    You can download the file manually from:
>  https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Locations-SY1718/4g38-vs8v

    or load it directly with Python code:
> pd.read_csv('https://data.cityofchicago.org/api/views/d2h8-2upd/rows.csv?accessType=DOWNLOAD')


2. Use Foursquare location data: call the RESTful API to get all venues around each school, then consolidate them as that school's environment
> 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(...)

3. Clean, transform, and format the data as needed

4. Visualize data on Chicago map with folium
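
The request in step 2 can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: it assembles the same `venues/explore` query string with the standard library so that each parameter is named. The helper name `build_explore_url`, the credential strings, and the version date are placeholders.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for venues within `radius` meters of (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date, e.g. YYYYMMDD
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the school
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside the ll parameter unescaped
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials; the coordinates are those of ACE TECH HS in the dataset
url = build_explore_url('YOUR_ID', 'YOUR_SECRET', '20180605', 41.796122, -87.625849)
print(url)
```

Building the query from a dictionary avoids positional mistakes that are easy to make with a long `.format(...)` call.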

---
## Methodology
- Represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.

### ***Main findings***
1. The dataset lists 661 schools in total, 181 of which are high schools
2. According to Foursquare.com data, 5 high schools meet Jonathan's requirements
    - Jonathan needs to take public transportation to the office, so there must be a 'Bus|Train|Metro|.. Station' nearby
    - Jonathan's son likes sports, so athletic facilities 'Gym|Sport|ball' near the school and home would be nice
    - Jonathan's wife goes shopping every afternoon, so life is easier with some 'Markets' nearby
    - The whole family likes to spend weekend free time together in a 'Park'
3. The 5 recommended high schools fall into 3 categories (by clustering)

### ***Data exploration***
- Read Chicago Public School Locations (2017-2018) from City website
- Only interested in High Schools
- Get nearby venues for all public high schools
- Take a look at what we got from Foursquare.com
- Look at what the Venue Category column contains
- Count how many venues of each category surround each Public High School
- Consolidate the nearby venues as the environment of each Public High School
- Select the Public High Schools that meet the conditions

### ***Data Analysis***
- Get nearby venues for recommended public high schools
- One hot encoding
- Group rows by School Name, taking the mean of the frequency of occurrence of each category
- Check each School along with the top 5 most common venues
- Check School along with most common venues
- Check the top 10 venues for each school
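
The encoding, grouping, and ranking steps above can be sketched on toy data. The school names and venue lists below are hypothetical and the helper `top_venues` is an illustrative name, not a function from the notebook.

```python
import pandas as pd

# Toy data: hypothetical schools and venue categories, for illustration only
venues = pd.DataFrame({
    'School Name': ['A HS', 'A HS', 'A HS', 'B HS', 'B HS'],
    'Venue Category': ['Park', 'Gym', 'Park', 'Market', 'Bus Station'],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'School Name', venues['School Name'])

# The mean of each indicator column is the share of that category per school
grouped = onehot.groupby('School Name').mean()

def top_venues(row, n=2):
    """Return the n most frequent categories for one school."""
    return row.sort_values(ascending=False).index[:n].tolist()

top = {school: top_venues(row) for school, row in grouped.iterrows()}
print(top)
```

Because the frequencies are shares rather than counts, schools with many nearby venues and schools with few can be compared on the same scale.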

### ***Data Clustering***
- Run k-means to cluster the recommended Public High Schools
- Add the cluster labels to the recommended High Schools
- Visualize the recommended Public High Schools on the Chicago map (by cluster)
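
As a rough sketch of the clustering step, the snippet below runs `KMeans` with three clusters on a small made-up feature matrix. The five rows and their three columns (sports, food, transit shares) are hypothetical stand-ins for the per-school category frequencies; the real matrix has one column per venue category.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows for five schools (columns: sports, food, transit)
school_names = ['S1', 'S2', 'S3', 'S4', 'S5']
features = np.array([
    [0.60, 0.10, 0.30],
    [0.10, 0.70, 0.20],
    [0.10, 0.65, 0.25],
    [0.15, 0.70, 0.15],
    [0.30, 0.30, 0.40],
])

# random_state fixes the initialization so repeated runs give the same grouping
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
labels = dict(zip(school_names, kmeans.labels_.tolist()))
print(labels)
```

The cluster numbers themselves are arbitrary, so names such as "more sports" or "more food" have to be assigned by inspecting which categories dominate each cluster.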

### ***Machine Learning***
- Use clustering to group the recommended Public High Schools into 3 categories (Sports, Foods, Life), which makes it easy for Jonathan to choose according to his lifestyle.

---
## Results 

### ***Here is the final recommendation***

In [254]:
recommend_merged

Unnamed: 0,School Name,Address,CommArea,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,HYDE PARK HS,6220 S STONY ISLAND AVE,WOODLAWN,41.782257,-87.586615,2,Coffee Shop,Athletics & Sports,Bus Station,Market,Food & Drink Shop,Baseball Field,Soccer Field,BBQ Joint,Food,Park
1,JONES HS,700 S STATE ST,LOOP,41.873066,-87.627675,1,Pizza Place,Gym / Fitness Center,Coffee Shop,American Restaurant,Sandwich Place,Hotel,Bookstore,Gym,Convenience Store,Boutique
2,KENWOOD HS,5015 S BLACKSTONE AVE,KENWOOD,41.803772,-87.590421,1,Cosmetics Shop,Pizza Place,Mexican Restaurant,Sandwich Place,Bus Station,American Restaurant,Caribbean Restaurant,Ice Cream Shop,Mobile Phone Shop,Hobby Shop
3,NOBLE - MUCHIN HS,1 N STATE ST,LOOP,41.88274,-87.626338,1,Theater,American Restaurant,Coffee Shop,Bakery,Italian Restaurant,Hotel,Gastropub,Snack Place,Department Store,Sandwich Place
4,PERSPECTIVES - JOSLIN HS,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531,0,Chinese Restaurant,Grocery Store,Train Station,Asian Restaurant,Rental Car Location,Rock Club,Korean Restaurant,Plaza,Bar,Tea Room


---
## Discussion 
- Discuss the results
- Discuss any observations you noted and any recommendations you can make based on the results.

1) Clustering divides the final recommendations into 3 categories
    - Cluster 2. More sports
    - Cluster 1. More food
    - Cluster 0. Balanced life
2) If Jonathan can relax his criteria, he can get a wider range of recommendations
    - e.g. his son can take a bus to school
    - e.g. a Market near home is not strictly necessary; a nearby grocery store may be acceptable as well
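
One way to relax the criteria, sketched here under assumptions, is to count how many of the four requirements each school meets and accept schools above a threshold instead of requiring all four. The school names, environments, and the threshold of 3 are hypothetical; the patterns are the keywords from the Background section.

```python
import re

# The four criteria from the Background section, as regular expressions
CRITERIA = {
    'sports': re.compile(r'Gym|Sport|ball'),
    'transit': re.compile(r'Station'),
    'market': re.compile(r'Market'),
    'park': re.compile(r'Park'),
}

def criteria_met(venue_categories):
    """Return the names of the criteria matched by a comma-joined category string."""
    return [name for name, pattern in CRITERIA.items() if pattern.search(venue_categories)]

# Hypothetical environments, for illustration only
environments = {
    'School X': 'Gym, Park, Bus Station, Market',
    'School Y': 'Gym, Park, Grocery Store, Train Station',
    'School Z': 'Pizza Place, Bar',
}

scores = {school: len(criteria_met(env)) for school, env in environments.items()}
# Accept schools that meet at least 3 of the 4 criteria (assumed threshold)
shortlist = [school for school, score in scores.items() if score >= 3]
print(scores, shortlist)
```

With this scoring, a school that lacks only a Market still appears on the short list, which matches the second example above.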

## Conclusion 
- Conclude the report

Jonathan can settle into a new home near any of the recommended high schools; depending on the cluster, the neighborhood leans toward sports, food, or a balanced life.

---
---
# Part II -- Code

### Import libraries

In [186]:
# =============================================================================
# Import libraries
# =============================================================================

import numpy as np 
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

!conda install -c conda-forge beautifulsoup4 --yes
from bs4 import BeautifulSoup

!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder


print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.17.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
beautifulsoup4            4.6.3                    py35_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geocoder                  1.38.1                     py_0    

## Read Chicago Public School Locations (2017-2018) from City website
You can also manually download from https://data.cityofchicago.org/Education/Chicago-Public-Schools-School-Locations-SY1718/4g38-vs8v

In [162]:
df_schools = pd.read_csv('https://data.cityofchicago.org/api/views/d2h8-2upd/rows.csv?accessType=DOWNLOAD')
df_schools.head()

Unnamed: 0,School_ID,Network,Short_Name,the_geom,Address,Zip,Governance,Grade_Cat,Grades,Lat,Long,Phone,GeoNetwork,COMMAREA,WARD_15,ALD_15
0,400009,Charter,GLOBAL CITIZENSHIP,POINT (-87.74009743581296 41.807578506885676),4647 W 47TH ST,60632,Charter,ES,"K, 1, 2, 3, 4, 5, 6, 7, 8",41.807579,-87.740097,1(773)582-1100,8,GARFIELD RIDGE,14,Edward M. Burke
1,400010,Charter,ACE TECH HS,POINT (-87.62584903655835 41.79612150956602),5410 S STATE ST,60609,Charter,HS,"9, 10, 11, 12",41.796122,-87.625849,1(773)548-8705,9,WASHINGTON PARK,3,Patricia R. Dowell
2,400011,Charter,LOCKE A,POINT (-87.70523452593643 41.87724835219521),3141 W JACKSON BLVD,60612,Charter,ES,"K, 1, 2, 3, 4, 5, 6, 7, 8",41.877248,-87.705235,1(773)265-7232,5,EAST GARFIELD PARK,28,Jason C. Ervin
3,400013,Charter,ASPIRA - EARLY COLLEGE HS,POINT (-87.72709565849537 41.93729828444937),3986 W BARRY AVE,60618,Charter,HS,"9, 10, 11, 12",41.937298,-87.727096,1(773)252-0970 x137,4,AVONDALE,30,Ariel E. Reboyras
4,400017,Charter,ASPIRA - HAUGAN,POINT (-87.72182466520648 41.966405667183686),3729 W LELAND AVE,60625,Charter,ES,"6, 7, 8",41.966406,-87.721825,1(773)252-0970,1,ALBANY PARK,35,Rey Colon


In [163]:
df_schools.shape

(661, 16)

## Only interested in High Schools
- Filter the dataframe, keeping only the rows where Grade_Cat=='HS'
- Trim the dataframe to the school ID, name, address, CommArea, Lat, and Long columns

In [164]:
df_high_schools = df_schools[df_schools['Grade_Cat']=='HS'].reset_index(drop=True)
df_high_schools = df_high_schools[['School_ID', 'Short_Name', 'Address', 'COMMAREA', 'Lat', 'Long']]
df_high_schools.rename(index=str, columns={'School_ID':'SchoolID', 'Short_Name':'ShortName', 'COMMAREA':'CommArea', 'Lat':'Latitude', 'Long':'Longitude'}, inplace=True)
df_high_schools.head()

Unnamed: 0,SchoolID,ShortName,Address,CommArea,Latitude,Longitude
0,400010,ACE TECH HS,5410 S STATE ST,WASHINGTON PARK,41.796122,-87.625849
1,400013,ASPIRA - EARLY COLLEGE HS,3986 W BARRY AVE,AVONDALE,41.937298,-87.727096
2,400022,CHIARTS HS,2714 W AUGUSTA BLVD,WEST TOWN,41.899377,-87.694945
3,400032,CICS - ELLISON HS,1817 W 80TH ST,AUBURN GRESHAM,41.748382,-87.66898
4,400033,CICS - LONGWOOD,1309 W 95TH ST,WASHINGTON HEIGHTS,41.721221,-87.655768


In [165]:
df_high_schools.shape

(181, 6)

## Get an overview of all Public High Schools in Chicago
---
### Get the geographical coordinates of Chicago

In [166]:
address = 'Chicago, IL'

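# newer geopy releases expect an explicit user_agent; the name here is arbitrary
geolocator = Nominatim(user_agent='chicago_explorer')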
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Chicago are {}, {}.'.format(latitude, longitude))



The geographical coordinates of Chicago are 41.8755616, -87.6244212.


### Visualize Chicago public high schools on the map

In [167]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_high_schools['Latitude'], df_high_schools['Longitude'], df_high_schools['ShortName']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

## Explore the nearby venues of Public High Schools
### Get nearby venues for all public high schools

In [168]:
# The code was removed by Watson Studio for sharing.
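
The removed cell presumably defined the `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` globals that the next cell uses. Its actual content is unknown, so the following is only a sketch of one way to supply them without writing secrets into the notebook. The environment variable names, the helper `load_credentials`, and the default version date are all assumptions.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (the process environment by default)."""
    required = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError('missing credentials: {}'.format(', '.join(missing)))
    # The version date is a placeholder; use the one your Foursquare app targets
    version = env.get('FOURSQUARE_VERSION', '20180605')
    return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET'], version

# Demonstration with placeholder values instead of real secrets
CLIENT_ID, CLIENT_SECRET, VERSION = load_credentials({
    'FOURSQUARE_CLIENT_ID': 'YOUR_ID',
    'FOURSQUARE_CLIENT_SECRET': 'YOUR_SECRET',
})
print(VERSION)
```

In a real run, calling `load_credentials()` with no argument would read the values from the environment.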

In [169]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['School Name', 
                  'School Latitude', 
                  'School Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
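
The function above indexes straight into `response -> groups -> items` and into `categories[0]`, so it raises if a request fails or a venue has no category. A more defensive extraction could look like the sketch below. The helper name `extract_venue_rows`, the sample payload, and the `'Uncategorized'` fallback label are hypothetical.

```python
def extract_venue_rows(payload, school_name):
    """Flatten one explore response into (school, venue, lat, lng, category) rows."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Fall back to a placeholder label when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((school_name, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows

# Made-up payload with the same nesting the function above relies on
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Some Park', 'location': {'lat': 41.8, 'lng': -87.6},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabeled Spot', 'location': {'lat': 41.9, 'lng': -87.7},
               'categories': []}},
]}]}}

rows = extract_venue_rows(sample_payload, 'EXAMPLE HS')
print(rows)
```

An empty or error response then yields an empty list instead of stopping the loop over all schools.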

In [170]:
LIMIT = 100

chicago_venues = getNearbyVenues(names=df_high_schools['ShortName'],
                                latitudes=df_high_schools['Latitude'],
                                longitudes=df_high_schools['Longitude']
                                )



ACE TECH HS
ASPIRA - EARLY COLLEGE HS
CHIARTS HS
CICS - ELLISON HS
CICS - LONGWOOD
CICS - NORTHTOWN HS
CHICAGO MATH & SCIENCE HS
CHICAGO VIRTUAL
NOBLE - NOBLE HS
NOBLE - COMER
NOBLE - GOLDER HS
NOBLE - PRITZKER HS
NOBLE - RAUNER HS
NOBLE - ROWE CLARK HS
NOBLE - UIC HS
NORTH LAWNDALE - CHRISTIANA HS
NORTH LAWNDALE - COLLINS HS
PERSPECTIVES - LEADERSHIP HS
PERSPECTIVES - TECH HS
PERSPECTIVES - JOSLIN HS
PERSPECTIVES - MATH & SCI HS
U OF C - WOODLAWN HS
ACERO - GARCIA HS
URBAN PREP - ENGLEWOOD HS
YOUNG WOMENS HS
CHICAGO TECH HS
EPIC HS
NOBLE - BULLS HS
NOBLE - MUCHIN HS
URBAN PREP - WEST HS
INSTITUTO - HEALTH
URBAN PREP - BRONZEVILLE HS
NOBLE - JOHNSON HS
CICS - CHICAGOQUEST HS
NOBLE - HANSBERRY HS
NOBLE - DRW HS
LEGAL PREP HS
YCCS - SCHOLASTIC ACHIEVEMENT
YCCS - MCKINLEY
YCCS - ASPIRA PANTOJA
YCCS - ASSOCIATION HOUSE
YCCS - AUSTIN CAREER
YCCS - CCA ACADEMY
YCCS - HOUSTON
YCCS - YOUTH DEVELOPMENT
YCCS - CAMPOS
YCCS - INNOVATIONS
YCCS - ADDAMS
YCCS - LATINO YOUTH
YCCS - OLIVE HARVEY
LITTLE

### Take a look at what we got from Foursquare.com

In [171]:
print(chicago_venues.shape)
chicago_venues.head()

(3198, 7)


Unnamed: 0,School Name,School Latitude,School Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ACE TECH HS,41.796122,-87.625849,Wayne's Bar-B-Que and Cajun,41.795963,-87.630114,BBQ Joint
1,ACE TECH HS,41.796122,-87.625849,Dunkin' Donuts,41.794769,-87.626244,Donut Shop
2,ACE TECH HS,41.796122,-87.625849,Odyssey 2 Lounge,41.794625,-87.620419,Lounge
3,ACE TECH HS,41.796122,-87.625849,Ms. Biscuit,41.795598,-87.623768,Breakfast Spot
4,ACE TECH HS,41.796122,-87.625849,Ms. Lee's Good Food,41.794482,-87.620566,Fast Food Restaurant


### Next, look at what Venue Category contains

In [172]:
chicago_venues['Venue Category'].unique()

array(['BBQ Joint', 'Donut Shop', 'Lounge', 'Breakfast Spot',
       'Fast Food Restaurant', 'Shoe Store', 'Discount Store',
       'Sandwich Place', 'Mobile Phone Shop', 'Video Game Store',
       'Tennis Court', 'Pizza Place', 'Intersection', "Women's Store",
       'Cosmetics Shop', 'Fried Chicken Joint', 'Chinese Restaurant',
       'Train Station', 'Metro Station', 'Bus Station', 'Bus Line',
       'Gym / Fitness Center', 'Tattoo Parlor', 'American Restaurant',
       'Wings Joint', 'Salad Place', 'Taco Place', 'Bank',
       'Polish Restaurant', 'Gym', 'Mexican Restaurant', 'Café',
       'Electronics Store', 'Bar', 'Pharmacy', 'Ice Cream Shop',
       'Convenience Store', 'Dance Studio', 'IT Services',
       'Furniture / Home Store', 'Storage Facility', 'Salon / Barbershop',
       'Boutique', 'New American Restaurant', 'Coffee Shop',
       'Dessert Shop', 'Flower Shop', 'Bistro', 'Restaurant',
       'Deli / Bodega', 'Caribbean Restaurant',
       'Southern / Soul Food Restau

### Count how many venues of each category surround each Public High School

In [173]:
chicago_venues.groupby(['School Name', 'Venue Category']).count()

School Name,Venue Category,School Latitude,School Longitude,Venue,Venue Latitude,Venue Longitude
ACE TECH HS,BBQ Joint,1,1,1,1,1
ACE TECH HS,Breakfast Spot,1,1,1,1,1
ACE TECH HS,Bus Line,1,1,1,1,1
ACE TECH HS,Bus Station,1,1,1,1,1
ACE TECH HS,Chinese Restaurant,1,1,1,1,1
ACE TECH HS,Cosmetics Shop,1,1,1,1,1
ACE TECH HS,Discount Store,1,1,1,1,1
ACE TECH HS,Donut Shop,1,1,1,1,1
ACE TECH HS,Fast Food Restaurant,5,5,5,5,5
ACE TECH HS,Fried Chicken Joint,1,1,1,1,1


In [174]:
# The code was removed by Watson Studio for sharing.

In [175]:
# The code was removed by Watson Studio for sharing.

In [176]:
# The code was removed by Watson Studio for sharing.

(3198, 8)


Unnamed: 0.1,Unnamed: 0,School Name,School Latitude,School Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,ACE TECH HS,41.796122,-87.625849,Wayne's Bar-B-Que and Cajun,41.795963,-87.630114,BBQ Joint
1,1,ACE TECH HS,41.796122,-87.625849,Dunkin' Donuts,41.794769,-87.626244,Donut Shop
2,2,ACE TECH HS,41.796122,-87.625849,Odyssey 2 Lounge,41.794625,-87.620419,Lounge
3,3,ACE TECH HS,41.796122,-87.625849,Ms. Biscuit,41.795598,-87.623768,Breakfast Spot
4,4,ACE TECH HS,41.796122,-87.625849,Ms. Lee's Good Food,41.794482,-87.620566,Fast Food Restaurant


In [177]:
# The code was removed by Watson Studio for sharing.

### Consolidate the nearby venues as the environment of each Public High School

In [178]:
df_high_schools_env = chicago_venues.groupby(['School Name'])['Venue Category'].apply(', '.join).reset_index()
print(df_high_schools_env.shape)
df_high_schools_env

(181, 2)


Unnamed: 0,School Name,Venue Category
0,ACE TECH HS,"BBQ Joint, Donut Shop, Lounge, Breakfast Spot,..."
1,ACERO - GARCIA HS,"Bar, Mexican Restaurant, Seafood Restaurant, F..."
2,ACERO - SOTO HS,"Grocery Store, Gym / Fitness Center, Mexican R..."
3,AIR FORCE HS,"Baseball Stadium, Lounge, Baseball Stadium, Sp..."
4,ALCOTT HS,"Baseball Field, Playground, Park, Dog Run, Mus..."
5,AMUNDSEN HS,"Thai Restaurant, Park, Liquor Store, Restauran..."
6,ASPIRA - BUSINESS & FINANCE HS,"Eastern European Restaurant, Mexican Restauran..."
7,ASPIRA - EARLY COLLEGE HS,"Gym / Fitness Center, Tattoo Parlor, American ..."
8,AUSTIN CCA HS,"Park, Park, Gym"
9,BACK OF THE YARDS HS,"Gym / Fitness Center, Coffee Shop, Grocery Sto..."


## Select the Public High Schools that meet the conditions
- Jonathan needs to take public transportation to the office, so there must be a 'Bus|Train|Metro|.. Station' nearby
- Jonathan's son likes sports, so athletic facilities 'Gym|Sport|ball' near the school and home would be nice
- Jonathan's wife goes shopping every afternoon, so life is easier with some 'Markets' nearby
- The whole family likes to spend weekend free time together in a 'Park'

In [241]:
# Each criterion is a substring/regex match against the joined category string.
# The mask is named `mask` so that it does not shadow the built-in `filter`.
categories = df_high_schools_env['Venue Category']
mask = (categories.str.contains('Gym|Sport|ball')
        & categories.str.contains('Station')
        & categories.str.contains('Park')
        & categories.str.contains('Market'))
df_recommend_high_schools = df_high_schools_env.loc[mask].reset_index(drop=True)
df_recommend_high_schools

Unnamed: 0,School Name,Venue Category
0,HYDE PARK HS,"Gym, Park, Coffee Shop, Market, Food & Drink S..."
1,JONES HS,"Jazz Club, Music Venue, Pub, Pizza Place, Book..."
2,KENWOOD HS,"Art Gallery, Grocery Store, Arts & Crafts Stor..."
3,NOBLE - MUCHIN HS,"Park, Hotel, Museum, Coffee Shop, Gastropub, P..."
4,PERSPECTIVES - JOSLIN HS,"Thai Restaurant, Italian Restaurant, Asian Res..."
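
Substring matching is permissive: 'Sport' also matches 'Sports Bar', a category that appears among the venues in this notebook. A stricter variant, sketched below on toy data, checks for exact category names instead. The whitelists use category names seen in this notebook's output, but which categories belong in each group is an assumption, and the two schools are hypothetical.

```python
import pandas as pd

# Whitelists of exact category names; the grouping is an assumption to adjust
SPORTS = {'Gym', 'Gym / Fitness Center', 'Athletics & Sports',
          'Baseball Field', 'Soccer Field', 'Tennis Court'}
TRANSIT = {'Bus Station', 'Train Station', 'Metro Station'}
MARKETS = {'Market', 'Farmers Market'}
PARKS = {'Park'}

def meets_all(categories):
    """True when at least one category from every group is present."""
    found = set(categories)
    return all(found & group for group in (SPORTS, TRANSIT, MARKETS, PARKS))

# Toy data: B HS has a 'Sports Bar' but no real athletic facility
venues = pd.DataFrame({
    'School Name': ['A HS'] * 4 + ['B HS'] * 4,
    'Venue Category': ['Gym', 'Bus Station', 'Market', 'Park',
                       'Sports Bar', 'Bus Station', 'Market', 'Park'],
})

selected = venues.groupby('School Name')['Venue Category'].apply(meets_all)
strict = selected[selected.astype(bool)].index.tolist()

# The substring approach, for comparison
joined = venues.groupby('School Name')['Venue Category'].apply(', '.join)
loose = joined[joined.str.contains('Gym|Sport|ball')
               & joined.str.contains('Station')
               & joined.str.contains('Park')
               & joined.str.contains('Market')].index.tolist()
print(strict, loose)
```

The trade-off is that exact matching needs a maintained list of categories, while substring matching is shorter but may admit schools for the wrong reason.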


## Visualize the recommended Public High Schools on the map
---
### Join Latitude/Longitude back to the recommended school list

In [242]:
df_recommend_high_schools_final = pd.merge(df_recommend_high_schools, df_high_schools, left_on='School Name', right_on='ShortName')
df_recommend_high_schools_final

Unnamed: 0,School Name,Venue Category,SchoolID,ShortName,Address,CommArea,Latitude,Longitude
0,HYDE PARK HS,"Gym, Park, Coffee Shop, Market, Food & Drink S...",609713,HYDE PARK HS,6220 S STONY ISLAND AVE,WOODLAWN,41.782257,-87.586615
1,JONES HS,"Jazz Club, Music Venue, Pub, Pizza Place, Book...",609678,JONES HS,700 S STATE ST,LOOP,41.873066,-87.627675
2,KENWOOD HS,"Art Gallery, Grocery Store, Arts & Crafts Stor...",609746,KENWOOD HS,5015 S BLACKSTONE AVE,KENWOOD,41.803772,-87.590421
3,NOBLE - MUCHIN HS,"Park, Hotel, Museum, Coffee Shop, Gastropub, P...",400098,NOBLE - MUCHIN HS,1 N STATE ST,LOOP,41.88274,-87.626338
4,PERSPECTIVES - JOSLIN HS,"Thai Restaurant, Italian Restaurant, Asian Res...",400064,PERSPECTIVES - JOSLIN HS,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531


In [243]:
# df_recommend_high_schools_final = df_recommend_high_schools_final[['School Name', 'Venue Category', 'Address', 'CommArea', 'Latitude', 'Longitude']]
# df_recommend_high_schools_final.head()
# df_recommend_high_schools.set_index('School Name')
# df_recommend_high_schools.join(df_high_schools['ShortName'])
# df_recommend_high_schools = df_recommend_high_schools.join(df_high_schools.set_index('ShortName'))
# df_recommend_high_schools

### Visualize recommended Public High Schools on the Chicago map

In [244]:
# create map of Chicago using latitude and longitude values
map_chicago = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_recommend_high_schools_final['Latitude'], df_recommend_high_schools_final['Longitude'], df_recommend_high_schools_final['School Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_chicago)  
    
map_chicago

## Analyze Each recommended Public High School
---
### Get nearby venues for recommended public high schools

In [245]:
recommend_nearby_venues = pd.merge(chicago_venues, df_recommend_high_schools_final, left_on='School Name', right_on='School Name')
recommend_nearby_venues = recommend_nearby_venues.reset_index().drop(['index', 'Unnamed: 0', 'Venue Category_y', 'SchoolID', 'ShortName', 'School Latitude', 'School Longitude', 'Venue Latitude', 'Venue Longitude'], axis=1)
recommend_nearby_venues.rename(index=str, columns={'Venue Category_x':'Venue Category'}, inplace=True)

print(recommend_nearby_venues.shape)
recommend_nearby_venues.head()

(322, 7)


Unnamed: 0,School Name,Venue,Venue Category,Address,CommArea,Latitude,Longitude
0,PERSPECTIVES - JOSLIN HS,Opart Thai House,Thai Restaurant,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531
1,PERSPECTIVES - JOSLIN HS,Cafe Bionda,Italian Restaurant,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531
2,PERSPECTIVES - JOSLIN HS,Qing Xiang Yuan Dumpling,Asian Restaurant,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531
3,PERSPECTIVES - JOSLIN HS,Reggie's Rock Club,Rock Club,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531
4,PERSPECTIVES - JOSLIN HS,鮮芋仙 Meet Fresh,Dessert Shop,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531


### One hot encoding

In [246]:
# one hot encoding
recommend_onehot = pd.get_dummies(recommend_nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add the School Name column back to the dataframe
recommend_onehot['School Name'] = recommend_nearby_venues['School Name'] 

# move last column ('School Name') to the first column
fixed_columns = [recommend_onehot.columns[-1]] + list(recommend_onehot.columns[:-1])
recommend_onehot = recommend_onehot[fixed_columns]

print(recommend_onehot.shape)
recommend_onehot.head()

(322, 137)


Unnamed: 0,School Name,ATM,African Restaurant,American Restaurant,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Garden,Big Box Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Coffee Shop,College Residence Hall,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dog Run,Donut Shop,Drugstore,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Gastropub,General Entertainment,Gluten-free Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Moving Target,Museum,Music Venue,New American Restaurant,Nightclub,Outdoor Sculpture,Park,Performing Arts Venue,Pet Service,Pizza Place,Plaza,Poke Place,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Spa,Speakeasy,Sports Bar,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Yoga Studio
0,PERSPECTIVES - JOSLIN HS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,PERSPECTIVES - JOSLIN HS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,PERSPECTIVES - JOSLIN HS,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,PERSPECTIVES - JOSLIN HS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,PERSPECTIVES - JOSLIN HS,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### Group rows by School Name, taking the mean of the frequency of occurrence of each category

In [247]:
recommend_grouped = recommend_onehot.groupby('School Name').mean().reset_index()
print(recommend_grouped.shape)
recommend_grouped.head()

(5, 137)


Unnamed: 0,School Name,ATM,African Restaurant,American Restaurant,Amphitheater,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Beer Garden,Big Box Store,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Bus Line,Bus Station,Café,Cajun / Creole Restaurant,Candy Store,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Coffee Shop,College Residence Hall,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Cuban Restaurant,Dance Studio,Dentist's Office,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Dog Run,Donut Shop,Drugstore,Farmers Market,Fast Food Restaurant,Food,Food & Drink Shop,Food Court,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Gastropub,General Entertainment,Gluten-free Restaurant,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,Hobby Shop,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Moving Target,Museum,Music Venue,New American Restaurant,Nightclub,Outdoor Sculpture,Park,Performing Arts Venue,Pet Service,Pizza Place,Plaza,Poke Place,Pub,Public Art,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Club,Salad Place,Sandwich Place,Sculpture Garden,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,Spa,Speakeasy,Sports Bar,Sushi Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Yoga Studio
0,HYDE PARK HS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0
1,JONES HS,0.011905,0.0,0.035714,0.0,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.011905,0.0,0.0,0.011905,0.02381,0.011905,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.0,0.011905,0.02381,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.011905,0.011905,0.0,0.0,0.0,0.02381,0.047619,0.0,0.0,0.0,0.011905,0.011905,0.035714,0.011905,0.0,0.011905,0.02381,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.047619,0.011905,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.035714,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.011905,0.02381,0.0,0.011905,0.02381,0.0,0.011905,0.0,0.0,0.0,0.011905
2,KENWOOD HS,0.0,0.019231,0.038462,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.019231,0.019231,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.019231,0.0,0.019231,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.019231,0.0,0.019231,0.019231,0.0,0.0,0.038462,0.019231,0.0,0.019231,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.057692,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019231,0.019231,0.0,0.0,0.0
3,NOBLE - MUCHIN HS,0.0,0.01,0.05,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.02,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.02,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.02,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.02,0.01,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.05,0.01,0.01,0.01,0.0,0.01,0.0
4,PERSPECTIVES - JOSLIN HS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040541,0.0,0.013514,0.0,0.0,0.013514,0.027027,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.013514,0.013514,0.0,0.013514,0.013514,0.0,0.148649,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.0,0.0,0.0,0.0,0.013514,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.013514,0.0,0.0,0.013514,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.0,0.0,0.0,0.013514,0.027027,0.0,0.0,0.0,0.0,0.013514,0.0,0.0,0.013514,0.013514,0.0,0.0,0.013514,0.013514,0.0,0.027027,0.0,0.013514,0.013514,0.027027,0.0,0.0,0.0,0.013514,0.013514,0.013514,0.027027,0.0,0.027027,0.0,0.013514,0.013514,0.013514,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013514,0.0,0.013514,0.0,0.0,0.027027,0.013514,0.0,0.0,0.040541,0.0,0.013514,0.0,0.0


### Check each school's top 5 most common venues

In [248]:
num_top_venues = 5

for school in recommend_grouped['School Name']:
    print("----"+school+"----")
    temp = recommend_grouped[recommend_grouped['School Name'] == school].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----HYDE PARK HS----
               venue  freq
0       Soccer Field  0.08
1     Baseball Field  0.08
2        Bus Station  0.08
3  Food & Drink Shop  0.08
4               Food  0.08


----JONES HS----
                  venue  freq
0           Pizza Place  0.05
1           Coffee Shop  0.05
2  Gym / Fitness Center  0.05
3        Sandwich Place  0.04
4   American Restaurant  0.04


----KENWOOD HS----
                  venue  freq
0        Cosmetics Shop  0.08
1           Pizza Place  0.06
2        Sandwich Place  0.04
3  Caribbean Restaurant  0.04
4        Ice Cream Shop  0.04


----NOBLE - MUCHIN HS----
                 venue  freq
0              Theater  0.05
1  American Restaurant  0.05
2          Coffee Shop  0.04
3               Bakery  0.04
4          Snack Place  0.03


----PERSPECTIVES - JOSLIN HS----
                 venue  freq
0   Chinese Restaurant  0.15
1        Grocery Store  0.05
2        Train Station  0.04
3     Asian Restaurant  0.04
4  Rental Car Location  0.03




### Define a helper that returns a school's most common venues

In [249]:
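def return_most_common_venues(row, num_top_venues):
    # skip the first entry (School Name) and keep only the venue frequencies
    row_categories = row.iloc[1:]
    # sort venue categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]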
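
As a quick check of the helper above, the sketch below applies the same sort-and-slice logic to a small made-up row. The venue names and frequencies are illustrative only, and `top_venues` is a hypothetical stand-in for `return_most_common_venues`. With the default sort, venues that share the same frequency come back in no guaranteed order, which may explain small differences between the top-5 printout and the top-10 table for schools with many tied categories.

```python
import pandas as pd

def top_venues(row, num_top_venues):
    # Skip the first entry (the school name), then sort frequencies descending.
    freqs = row.iloc[1:].astype(float)
    return freqs.sort_values(ascending=False).index.values[:num_top_venues]

# Made-up row: one school name followed by venue frequencies (no ties).
toy_row = pd.Series(
    {"School Name": "EXAMPLE HS", "Park": 0.40, "Gym": 0.30, "Market": 0.20, "Bus Station": 0.10}
)
toy_top = top_venues(toy_row, 2)
print(list(toy_top))
```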

### Check the top 10 venues for each school

In [250]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['School Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
recommend_venues_sorted = pd.DataFrame(columns=columns)
recommend_venues_sorted['School Name'] = recommend_grouped['School Name']

for ind in np.arange(recommend_grouped.shape[0]):
    recommend_venues_sorted.iloc[ind, 1:] = return_most_common_venues(recommend_grouped.iloc[ind, :], num_top_venues)

print(recommend_venues_sorted.shape)
recommend_venues_sorted.head()

(5, 11)


Unnamed: 0,School Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,HYDE PARK HS,Coffee Shop,Athletics & Sports,Bus Station,Market,Food & Drink Shop,Baseball Field,Soccer Field,BBQ Joint,Food,Park
1,JONES HS,Pizza Place,Gym / Fitness Center,Coffee Shop,American Restaurant,Sandwich Place,Hotel,Bookstore,Gym,Convenience Store,Boutique
2,KENWOOD HS,Cosmetics Shop,Pizza Place,Mexican Restaurant,Sandwich Place,Bus Station,American Restaurant,Caribbean Restaurant,Ice Cream Shop,Mobile Phone Shop,Hobby Shop
3,NOBLE - MUCHIN HS,Theater,American Restaurant,Coffee Shop,Bakery,Italian Restaurant,Hotel,Gastropub,Snack Place,Department Store,Sandwich Place
4,PERSPECTIVES - JOSLIN HS,Chinese Restaurant,Grocery Store,Train Station,Asian Restaurant,Rental Car Location,Rock Club,Korean Restaurant,Plaza,Bar,Tea Room
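
The column labels above are built with a three-item suffix list and a fallback to "th". That is correct for the 10 columns used here, but it would label a 21st column "21th". A minimal sketch of a more general suffix helper follows; `ordinal` and `example_columns` are hypothetical names, not part of the notebook.

```python
def ordinal(n):
    # 11, 12 and 13 (and 111, 112, ...) always take "th".
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as the notebook, built for 10 venues.
example_columns = ["School Name"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(example_columns[1], "...", example_columns[-1])
```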


## Clustering
---
### Run k-means to cluster the Recommended Public High Schools

In [251]:
# set number of clusters
kclusters = 3

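# keep only the numeric venue-frequency columns (the positional axis argument is not accepted by recent pandas)
recommend_grouped_clustering = recommend_grouped.drop(columns='School Name')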

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(recommend_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 1, 1, 1, 0], dtype=int32)
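
With only five schools, `kclusters = 3` is a judgment call. One common sanity check is to compare silhouette scores for a few values of k. The sketch below does this on a made-up frequency matrix, not on `recommend_grouped_clustering`, so the numbers it prints say nothing about the actual schools.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Made-up venue-frequency rows for five hypothetical schools.
toy_freqs = np.array([
    [0.50, 0.30, 0.20, 0.00],
    [0.10, 0.10, 0.40, 0.40],
    [0.12, 0.08, 0.42, 0.38],
    [0.08, 0.12, 0.38, 0.42],
    [0.00, 0.90, 0.05, 0.05],
])

scores = {}
for k in range(2, 5):  # silhouette needs 2 <= k <= n_samples - 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(toy_freqs)
    scores[k] = silhouette_score(toy_freqs, labels)

for k, score in scores.items():
    print(k, round(float(score), 3))
```

A higher score suggests tighter, better separated clusters, though with so few points any such measure should be read loosely.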

### Add cluster labels to the Recommended High Schools

In [252]:
recommend_merged = df_recommend_high_schools_final
# note: this assumes the rows are in the same order as recommend_grouped
recommend_merged['Cluster Labels'] = kmeans.labels_
recommend_merged = recommend_merged.join(recommend_venues_sorted.set_index('School Name'), on='School Name')
recommend_merged = recommend_merged.reset_index().drop(columns=['index', 'Venue Category', 'SchoolID', 'ShortName'])

recommend_merged

Unnamed: 0,School Name,Address,CommArea,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,HYDE PARK HS,6220 S STONY ISLAND AVE,WOODLAWN,41.782257,-87.586615,2,Coffee Shop,Athletics & Sports,Bus Station,Market,Food & Drink Shop,Baseball Field,Soccer Field,BBQ Joint,Food,Park
1,JONES HS,700 S STATE ST,LOOP,41.873066,-87.627675,1,Pizza Place,Gym / Fitness Center,Coffee Shop,American Restaurant,Sandwich Place,Hotel,Bookstore,Gym,Convenience Store,Boutique
2,KENWOOD HS,5015 S BLACKSTONE AVE,KENWOOD,41.803772,-87.590421,1,Cosmetics Shop,Pizza Place,Mexican Restaurant,Sandwich Place,Bus Station,American Restaurant,Caribbean Restaurant,Ice Cream Shop,Mobile Phone Shop,Hobby Shop
3,NOBLE - MUCHIN HS,1 N STATE ST,LOOP,41.88274,-87.626338,1,Theater,American Restaurant,Coffee Shop,Bakery,Italian Restaurant,Hotel,Gastropub,Snack Place,Department Store,Sandwich Place
4,PERSPECTIVES - JOSLIN HS,1930 S ARCHER AVE,NEAR SOUTH SIDE,41.855999,-87.628531,0,Chinese Restaurant,Grocery Store,Train Station,Asian Restaurant,Rental Car Location,Rock Club,Korean Restaurant,Plaza,Bar,Tea Room
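
Assigning `kmeans.labels_` directly as a column relies on the school table having the same row order as the grouped frame used for clustering. That holds in the output above, where both are alphabetical. If the order could differ, one option is to attach labels by school name instead. A minimal sketch with hypothetical stand-in frames (`grouped`, `schools`) and made-up labels:

```python
import pandas as pd

# Hypothetical stand-ins: the grouped frame is sorted by name, the school table is not.
grouped = pd.DataFrame({"School Name": ["A HS", "B HS", "C HS"]})
labels = [2, 1, 0]  # pretend these came from kmeans.labels_, in grouped's row order
schools = pd.DataFrame({"School Name": ["C HS", "A HS", "B HS"]})

# Index the labels by school name, then look each school up by name.
label_by_school = pd.Series(labels, index=grouped["School Name"])
schools["Cluster Labels"] = schools["School Name"].map(label_by_school)
print(schools)
```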


### Visualize recommended Public High Schools on Chicago map (by clustering)

In [253]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
# one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(recommend_merged['Latitude'], recommend_merged['Longitude'], recommend_merged['School Name'], recommend_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # cluster labels are 0-based, so they index the color list directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
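
The color lookup used for the markers can be checked in isolation, without rendering a map. The sketch below builds one hex color per cluster with the same matplotlib calls as the cell above; `palette` is a hypothetical name. KMeans labels are 0-based, so a label can index the palette directly.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 3  # same value as used for k-means above

# One evenly spaced rainbow color per cluster, as hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

for label in range(kclusters):
    print(label, palette[label])
```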