<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Leuven City (part 3 by Levent Bingol)</font></h1>

## Introduction
In this project, we explore, segment, and cluster the neighborhoods in the city of Leuven. A Wikipedia page holds all the information we need about the Leuven neighborhoods. We scrape that web page, wrangle and clean the data, and then read it into a pandas dataframe so that it is in a structured format.

Once the data is in a structured format, we will do the analysis on the dataset to explore and cluster the neighborhoods in the city of Leuven.

## Table of Contents



<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
    
 <a href="#item1">Gathering Foursquare Data </a>
    
 <a href="#item2">Explore Data/ Neighborhoods in Leuven</a>

 <a href="#item3">Analyse Data/ Each Neighborhood</a>  

 <a href="#item4">Cluster Neighborhoods</a>

 <a href="#item6">Examine Clusters</a>  
 
 
</font>
</div>

# Part 3

### Part 1 and Part 2 have been completed in the previous notebooks

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (recent pandas versions expose this as pd.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 4. Gathering Foursquare Data

Leuven has 5 unique postal codes. 
We continue gathering data here so that we can later explore, analyze, and cluster the neighborhoods in Leuven. We will work with five neighborhoods: Leuven (City Center), Heverlee, Kessel-Lo, Wilsele, and Wijgmaal. To segment and explore them, we need a dataset that also contains the latitude and longitude coordinates of each neighborhood.

We will now load the CSV file with the Leuven data that we prepared in the first two parts of the project.

#### Load the data

Next, let's load the data.

In [2]:
LeuvenDF = pd.read_csv("LeuvenDFpart2.csv") 

Let's take a quick look at the data.

In [3]:
LeuvenDF.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,3000,Leuven,Leuven,50.881253,4.69299
1,3001,Leuven,Heverlee,50.851729,4.693131
2,3010,Leuven,Kessel-Lo,50.889915,4.730761
3,3012,Leuven,Wilsele,50.909536,4.713629
4,3018,Leuven,Wijgmaal,50.926428,4.700121


And make sure that the dataset has 1 borough and 5 neighborhoods.

In [4]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(LeuvenDF['Borough'].unique()),
        LeuvenDF.shape[0]
    )
)

The dataframe has 1 boroughs and 5 neighborhoods.


At this stage we can use the neighborhood location information that we already have in the Leuven dataframe. We will use it to request data from Foursquare.

We can also use the geopy library to get the location values of Leuven. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>leuven_explorer</em>, as shown below.

In [5]:
# we can get the location data of Leuven from the already prepared LeuvenDF
latitude = LeuvenDF.loc[0, 'Latitude']
longitude = LeuvenDF.loc[0, 'Longitude']

# alternative method: geocode the address with Nominatim
# (this overwrites the values read from LeuvenDF above)
address = '3000 Leuven'

geolocator = Nominatim(user_agent="leuven_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Leuven City are {}, {}.'.format(latitude, longitude))
latitude

The geographical coordinates of Leuven City are 50.879202, 4.7011675.


50.879202
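
The two methods above give slightly different center points for Leuven. As a rough check, the sketch below estimates how far apart they are with the haversine formula. The function name `haversine_m` is an illustrative helper, not part of the notebook, and the coordinates are copied from the outputs shown earlier.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates copied from the outputs above
csv_center = (50.881253, 4.69299)          # row 0 of LeuvenDF
geocoded_center = (50.879202, 4.7011675)   # Nominatim result for '3000 Leuven'

offset = haversine_m(*csv_center, *geocoded_center)
print('The two center points are about {:.0f} m apart.'.format(offset))
```

An offset of a few hundred meters is small compared with the 1500 m search radius used later, so either center point should be a workable map origin.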

#### Create a map of Leuven with neighborhoods superimposed on top.
In this step, we draw a map of Leuven to get an overall idea of how to collect venue data from Foursquare, so that we can later choose a search radius around the center point of each neighborhood.


In [6]:
# create map of Leuven using latitude and longitude values
map_leuven = folium.Map(location=[latitude, longitude], zoom_start=12)
neighborhoods=LeuvenDF
# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_leuven)  
    
map_leuven

Next, we are going to start utilizing the Foursquare API to get the venue information of  the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [7]:
CLIENT_ID = 'IAYTHJO0R2Y5KEZEJX1QLME20C5CBB5TPA5NHSWT1COADQA3' # your Foursquare ID
CLIENT_SECRET = '5EYRUPZTYZE5A0R2X04KQIFISBCGLCMV3TYRU4I00MHYRKPK' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: IAYTHJO0R2Y5KEZEJX1QLME20C5CBB5TPA5NHSWT1COADQA3
CLIENT_SECRET:5EYRUPZTYZE5A0R2X04KQIFISBCGLCMV3TYRU4I00MHYRKPK
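
Printing the credentials in full stores them in the notebook output. A common alternative is to read them from environment variables and show only a masked prefix. The sketch below assumes two hypothetical variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; the helpers `load_foursquare_credentials` and `mask` are illustrative and not part of the original lab.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook.'.format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters so the value can be checked without leaking it."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with placeholder values; a real run would call
# load_foursquare_credentials() with no argument.
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
CLIENT_ID_DEMO, CLIENT_SECRET_DEMO = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(CLIENT_ID_DEMO))
print('CLIENT_SECRET: ' + mask(CLIENT_SECRET_DEMO))
```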


#### Let's gather the data for (and explore) the first neighborhood in our dataframe.

Get the neighborhood's name. This is the Leuven city center (old town).

In [8]:
LeuvenDF.loc[0, 'Neighborhood']

'Leuven'

Get the neighborhood's latitude and longitude values.

In [9]:
neighborhood_latitude = LeuvenDF.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = LeuvenDF.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = LeuvenDF.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} City Center are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Leuven City Center are 50.8812533, 4.6929903215189.


#### Now, let's get the top 150 venues that are in Leuven City Center within a radius of 1500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [10]:
# Request up to 150 venues within a 1500 m radius of the Leuven City Center neighborhood
radius=1500
LIMIT=150
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=IAYTHJO0R2Y5KEZEJX1QLME20C5CBB5TPA5NHSWT1COADQA3&client_secret=5EYRUPZTYZE5A0R2X04KQIFISBCGLCMV3TYRU4I00MHYRKPK&ll=50.8812533,4.6929903215189&v=20180605&radius=1500&limit=150'
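
The long format string above is easy to get wrong when a parameter is added or reordered. As a sketch, the same URL can be assembled from a dictionary of named query parameters with the standard library. The function name `build_explore_url` and the placeholder credentials `MY_ID` and `MY_SECRET` are assumptions made for illustration.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble the venues/explore URL used above from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" unescaped, matching the hand-built URL
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

demo_url = build_explore_url('MY_ID', 'MY_SECRET',
                             50.8812533, 4.6929903215189,
                             '20180605', 1500, 150)
print(demo_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the query string by hand at all.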

Send the GET request and examine the results.

In [11]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c9fcf2e9fb6b73b71f9afd5'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Leuven',
  'headerFullLocation': 'Leuven',
  'headerLocationGranularity': 'city',
  'totalResults': 240,
  'suggestedBounds': {'ne': {'lat': 50.89475331350001,
    'lng': 4.714347388166628},
   'sw': {'lat': 50.86775328649998, 'lng': 4.671633254871173}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '572aa2e4498ee378950fa659',
       'name': 'Bar Berlin',
       'location': {'address': 'Brusselsestraat 115',
        'lat': 50.8806993417003,
        'lng': 4.692574727156073,
        'labeledLatLngs': [{'label': 'display',
          'lat': 50.8806993417003,
          'lng

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [43]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [48]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.location.distance']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)
#venues

Unnamed: 0,name,categories,lat,lng,distance
0,Bar Berlin,Coffee Shop,50.880699,4.692575,68
1,Dijleterrassen,Plaza,50.881423,4.69698,280
2,El Sombrero,Mexican Restaurant,50.881635,4.696864,275
3,De Frittoerist,Friterie,50.879483,4.690391,268
4,Kruidtuin,Botanical Garden,50.878124,4.69077,381
5,Bakkerij Gielis,Bakery,50.880424,4.695632,207
6,Martin's Klooster Hotel,Hotel,50.879243,4.695844,300
7,Pizzeria l'Aurora,Pizza Place,50.88126,4.690344,185
8,Villa de Frit,Friterie,50.882946,4.693647,193
9,Kaasambacht Elsen,Cheese Shop,50.880821,4.699688,472
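
To make the flattening step easier to follow, here is a minimal sketch that runs on a hand-made stand-in for the `items` list instead of a live response. The first entry uses values from the table above with the structure trimmed; the second entry is hypothetical and has an empty category list to show how that case turns into a missing value. It uses `pd.json_normalize`, the name under which recent pandas versions expose the function.

```python
import pandas as pd

# Hand-made stand-in for two entries of results['response']['groups'][0]['items']
sample_items = [
    {'venue': {'name': 'Bar Berlin',
               'categories': [{'name': 'Coffee Shop'}],
               'location': {'lat': 50.880699, 'lng': 4.692575, 'distance': 68}}},
    {'venue': {'name': 'Hypothetical venue',   # assumed edge case: no category listed
               'categories': [],
               'location': {'lat': 50.88, 'lng': 4.69, 'distance': 100}}},
]

flat = pd.json_normalize(sample_items)  # nested keys become dotted column names

# keep only the name of the first category, or None when the list is empty
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None
)

# keep only the last part of each dotted column name
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```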


And how many venues were returned by Foursquare?

In [41]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


<a id='item2'></a>

## 5. Get the Remaining Data for the Neighborhoods in Leuven (5 neighborhoods)

#### Let's create a function to repeat the same process for all 5 designated neighborhoods in Leuven

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
         # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
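
The function above indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for a venue that has no category. The sketch below isolates the row-building step in a hypothetical helper, `extract_venue_rows`, that stores `None` in that case. It runs on hand-made items, so no API call is needed; the second venue is invented for illustration.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn one neighborhood's 'items' list into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-made items; the second venue is hypothetical and has no category
demo_items = [
    {'venue': {'name': 'Sportlokaal',
               'location': {'lat': 50.924938, 'lng': 4.686548},
               'categories': [{'name': 'Bar'}]}},
    {'venue': {'name': 'Hypothetical venue',
               'location': {'lat': 50.92, 'lng': 4.70},
               'categories': []}},
]
demo_rows = extract_venue_rows('Wijgmaal', 50.926428, 4.700121, demo_items)
print(demo_rows)
```

Inside `getNearbyVenues`, the list comprehension could be swapped for a call to such a helper without changing the columns of the resulting dataframe.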

#### Now let's write the code to run the above function on each neighborhood and create a new dataframe called *leuven_venues*.

In [13]:
# run getNearbyVenues on every neighborhood and collect the results in leuven_venues

leuven_venues = getNearbyVenues(names=LeuvenDF['Neighborhood'],
                                   latitudes=LeuvenDF['Latitude'],
                                   longitudes=LeuvenDF['Longitude']
                                  )
# the function prints the name of each of the 5 neighborhoods as it processes it

Leuven
 Heverlee
 Kessel-Lo
 Wilsele
 Wijgmaal
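
In the printed output, four of the five names appear to start with a space. If the CSV really stores them that way, the padded names sort before `Leuven`, which would explain why `Leuven` comes last in the grouped tables later on. A minimal sketch of a cleanup, assuming the padding is real and using a small hand-typed frame rather than `LeuvenDF` itself:

```python
import pandas as pd

# Names typed as they appear in the printed output above (note the leading spaces)
names = pd.DataFrame(
    {'Neighborhood': ['Leuven', ' Heverlee', ' Kessel-Lo', ' Wilsele', ' Wijgmaal']}
)

# remove surrounding whitespace from every name
names['Neighborhood'] = names['Neighborhood'].str.strip()
print(sorted(names['Neighborhood']))
```

Applying the same `.str.strip()` call to `LeuvenDF['Neighborhood']` before requesting venues would keep the padded names out of every later table.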


In [14]:
# number of neighborhoods in the designated parts of Leuven
len(LeuvenDF['Neighborhood'].unique())

5

#### Let's check the size of the resulting dataframe

In [15]:
# Let's check the size of the resulting dataframe
print(leuven_venues.shape)
leuven_venues.tail()

(294, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
289,Wijgmaal,50.926428,4.700121,Sportlokaal,50.924938,4.686548,Bar
290,Wijgmaal,50.926428,4.700121,Lijnloperspad,50.917226,4.706344,Bike Trail
291,Wijgmaal,50.926428,4.700121,Apotheek Haegemans,50.917074,4.685834,Pharmacy
292,Wijgmaal,50.926428,4.700121,Apotheek Adriaens,50.922932,4.720527,Pharmacy
293,Wijgmaal,50.926428,4.700121,Nachtwinkel Euro,50.924335,4.721062,Convenience Store


Let's check how many venues were returned for each neighborhood

In [16]:
#Let's check how many venues were returned for each neighborhood
leuven_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Heverlee,47,47,47,47,47,47
Kessel-Lo,84,84,84,84,84,84
Wijgmaal,27,27,27,27,27,27
Wilsele,36,36,36,36,36,36
Leuven,100,100,100,100,100,100


#### Let's find out how many unique categories can be curated from all the returned venues

In [17]:
# count the distinct venue categories
print('There are {} unique categories.'.format(len(leuven_venues['Venue Category'].unique())))

There are 110 unique categories.


<a id='item3'></a>

## 6. Analyze Each Neighborhood

In [18]:
# one hot encoding
leuven_onehot = pd.get_dummies(leuven_venues[['Venue Category']], prefix="", prefix_sep="")

leuven_onehot.head()

Unnamed: 0,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Belgian Restaurant,Bike Trail,Bistro,Board Shop,Boarding House,Botanical Garden,Boutique,Bowling Alley,Brasserie,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Dessert Shop,Discount Store,Doner Restaurant,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food Court,Forest,French Restaurant,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden Center,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,Hockey Field,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Kebab Restaurant,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Optical Shop,Organic Grocery,Outdoor Gym,Outdoors & Recreation,Park,Pastry Shop,Pet Store,Pharmacy,Pie Shop,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Restaurant,Sandwich Place,Scenic Lookout,Skating Rink,Soccer Field,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sports Club,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [21]:
# add neighborhood column back to dataframe
leuven_onehot['Neighborhood'] = leuven_venues['Neighborhood'] 
leuven_onehot.head()

Unnamed: 0,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Belgian Restaurant,Bike Trail,Bistro,Board Shop,Boarding House,Botanical Garden,Boutique,Bowling Alley,Brasserie,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Dessert Shop,Discount Store,Doner Restaurant,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food Court,Forest,French Restaurant,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden Center,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,Hockey Field,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Kebab Restaurant,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Optical Shop,Organic Grocery,Outdoor Gym,Outdoors & Recreation,Park,Pastry Shop,Pet Store,Pharmacy,Pie Shop,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Restaurant,Sandwich Place,Scenic Lookout,Skating Rink,Soccer Field,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sports Club,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar,Neighborhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Leuven
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Leuven
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Leuven
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Leuven
4,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Leuven


And let's examine the new dataframe size.

In [22]:
leuven_onehot.shape

(294, 111)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [23]:
leuven_grouped =leuven_onehot.groupby('Neighborhood').mean().reset_index()
leuven_grouped

Unnamed: 0,Neighborhood,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Belgian Restaurant,Bike Trail,Bistro,Board Shop,Boarding House,Botanical Garden,Boutique,Bowling Alley,Brasserie,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Dessert Shop,Discount Store,Doner Restaurant,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food Court,Forest,French Restaurant,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden Center,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,Hockey Field,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Kebab Restaurant,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Optical Shop,Organic Grocery,Outdoor Gym,Outdoors & Recreation,Park,Pastry Shop,Pet Store,Pharmacy,Pie Shop,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Restaurant,Sandwich Place,Scenic Lookout,Skating Rink,Soccer Field,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sports Club,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar
0,Heverlee,0.042553,0.0,0.042553,0.06383,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.085106,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.021277,0.021277,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.021277,0.0,0.042553,0.021277,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0
1,Kessel-Lo,0.02381,0.0,0.083333,0.119048,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.011905,0.0,0.011905,0.059524,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.02381,0.071429,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.047619,0.011905,0.0,0.0,0.02381,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.011905,0.011905,0.0,0.02381,0.0,0.0,0.02381,0.0,0.011905,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.02381,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0,0.011905,0.0,0.0
2,Wijgmaal,0.0,0.0,0.037037,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.074074,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0
3,Wilsele,0.027778,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0
4,Leuven,0.0,0.01,0.02,0.14,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.03,0.1,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.02,0.0,0.07,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.03,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.01


#### Let's confirm the new size

In [24]:
leuven_grouped.shape

(5, 111)
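
The one-hot encoding followed by a grouped mean is what turns venue lists into per-neighborhood category frequencies. A toy sketch with invented neighborhoods `A` and `B` shows the mechanics on four rows. The `dtype=int` argument is added here so the dummy columns are 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Invented data: neighborhood A has three venues, neighborhood B has one
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Bar', 'Bar', 'Bakery', 'Bar'],
})

# one column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='', dtype=int)
onehot['Neighborhood'] = toy['Neighborhood']

# the mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Each row of `freq` sums to 1, because every venue contributes to exactly one category column. The same holds for each row of `leuven_grouped`.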

#### Let's print each neighborhood along with the top 5 most common venues

In [25]:
# print each neighborhood along with its top 5 most common venues
num_top_venues = 5

for hood in leuven_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = leuven_grouped[leuven_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Heverlee----
                venue  freq
0            Bus Stop  0.09
1                 Bar  0.06
2  Athletics & Sports  0.04
3              Forest  0.04
4              Bakery  0.04


---- Kessel-Lo----
      venue  freq
0       Bar  0.12
1    Bakery  0.08
2  Friterie  0.07
3  Bus Stop  0.06
4     Hotel  0.05


---- Wijgmaal----
                  venue  freq
0                   Bar  0.11
1              Bus Stop  0.11
2            Playground  0.07
3  Gym / Fitness Center  0.07
4              Pharmacy  0.07


---- Wilsele----
                venue  freq
0            Bus Stop  0.14
1         Supermarket  0.11
2    Basketball Court  0.06
3  Athletics & Sports  0.03
4   Electronics Store  0.03


----Leuven----
                venue  freq
0                 Bar  0.14
1         Coffee Shop  0.10
2  Italian Restaurant  0.07
3            Friterie  0.03
4               Plaza  0.03




#### Let's put that into a *pandas* dataframe
First, let's write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
#Now let's create the new dataframe and display the top 10 venues for each neighborhood.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = leuven_grouped['Neighborhood']

for ind in np.arange(leuven_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(leuven_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Heverlee,Bus Stop,Bar,Athletics & Sports,Bakery,Belgian Restaurant,Park,Brasserie,Chinese Restaurant,Gym / Fitness Center,Forest
1,Kessel-Lo,Bar,Bakery,Friterie,Bus Stop,Hotel,Coffee Shop,Sandwich Place,Soccer Field,Pizza Place,Plaza
2,Wijgmaal,Bus Stop,Bar,Playground,Gym / Fitness Center,Pharmacy,Outdoors & Recreation,Kebab Restaurant,Sandwich Place,Intersection,Soccer Field
3,Wilsele,Bus Stop,Supermarket,Basketball Court,Athletics & Sports,Sandwich Place,Pool Hall,Discount Store,Indian Restaurant,Pizza Place,Friterie
4,Leuven,Bar,Coffee Shop,Italian Restaurant,Park,Plaza,Restaurant,Friterie,Cocktail Bar,Bistro,Pizza Place
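
The column labels above rely on a three-item suffix list with a fallback to `th`. That is fine for ten columns, but a larger `num_top_venues` would label position 21 as `21th`. A small helper, named `ordinal` here for illustration, is one way to produce the suffix for any positive integer:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'   # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column labels as above, built without try/except
columns_demo = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns_demo[:4])
```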


In [28]:
leuven_grouped.head()


Unnamed: 0,Neighborhood,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Belgian Restaurant,Bike Trail,Bistro,Board Shop,Boarding House,Botanical Garden,Boutique,Bowling Alley,Brasserie,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Dessert Shop,Discount Store,Doner Restaurant,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food Court,Forest,French Restaurant,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden Center,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,Hockey Field,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Kebab Restaurant,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Optical Shop,Organic Grocery,Outdoor Gym,Outdoors & Recreation,Park,Pastry Shop,Pet Store,Pharmacy,Pie Shop,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Restaurant,Sandwich Place,Scenic Lookout,Skating Rink,Soccer Field,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sports Club,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar
0,Heverlee,0.042553,0.0,0.042553,0.06383,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.085106,0.0,0.021277,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.021277,0.021277,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.021277,0.0,0.042553,0.021277,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0
1,Kessel-Lo,0.02381,0.0,0.083333,0.119048,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.011905,0.0,0.011905,0.059524,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.02381,0.071429,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.047619,0.011905,0.0,0.0,0.02381,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.011905,0.011905,0.0,0.02381,0.0,0.0,0.02381,0.0,0.011905,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.02381,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0,0.011905,0.0,0.0
2,Wijgmaal,0.0,0.0,0.037037,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.074074,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0
3,Wilsele,0.027778,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0
4,Leuven,0.0,0.01,0.02,0.14,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.03,0.1,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.02,0.0,0.07,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.03,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.01


<a id='item4'></a>

## 7. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 3 clusters.

In [32]:
# set number of clusters
kclusters = 3

# keep only the numeric venue-frequency columns for clustering
leuven_grouped_clustering = leuven_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(leuven_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 2, 0], dtype=int32)
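
For intuition about what `KMeans(...).fit(...)` does, here is a minimal NumPy sketch of Lloyd's algorithm, the assign-then-update iteration behind *k*-means. The five toy rows and the `lloyd_kmeans` helper are made up for illustration. scikit-learn's implementation differs in how it initializes and restarts, so treat this as a sketch rather than a replacement.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=20):
    """Toy k-means: start from the first k rows, then alternate assign/update."""
    points = np.asarray(points, dtype=float)
    centroids = points[:k].copy()  # assumption: deterministic start, unlike scikit-learn
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # distance of every row to every centroid, shape (n_rows, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# made-up venue-frequency rows: two similar pairs and one outlier
toy = [
    [0.90, 0.10, 0.00],
    [0.88, 0.12, 0.00],
    [0.10, 0.90, 0.00],
    [0.12, 0.88, 0.00],
    [0.00, 0.00, 1.00],
]
labels, centroids = lloyd_kmeans(toy, k=3)
print(labels)
```

The label numbers themselves carry no meaning: a different starting point or `random_state` can permute them, so clusters are best described by their members rather than by their number.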

In [33]:
leuven_grouped_clustering.insert(0,'Cluster Labels', kmeans.labels_)
leuven_grouped_clustering.groupby('Cluster Labels').mean()

Cluster Labels,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Belgian Restaurant,Bike Trail,Bistro,Board Shop,Boarding House,Botanical Garden,Boutique,Bowling Alley,Brasserie,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Concert Hall,Construction & Landscaping,Convenience Store,Dessert Shop,Discount Store,Doner Restaurant,Electronics Store,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop,Food Court,Forest,French Restaurant,Friterie,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden Center,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Gym,Gym / Fitness Center,Hockey Field,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Kebab Restaurant,Liquor Store,Market,Mexican Restaurant,Middle Eastern Restaurant,Noodle House,Optical Shop,Organic Grocery,Outdoor Gym,Outdoors & Recreation,Park,Pastry Shop,Pet Store,Pharmacy,Pie Shop,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Pool Hall,Restaurant,Sandwich Place,Scenic Lookout,Skating Rink,Soccer Field,Social Club,Soup Place,Spa,Sporting Goods Shop,Sports Bar,Sports Club,State / Provincial Park,Steakhouse,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar
0,0.011905,0.005,0.051667,0.129524,0.0,0.005,0.005952,0.0,0.01,0.005,0.0,0.005,0.005,0.0,0.016905,0.011905,0.010952,0.005,0.005952,0.029762,0.005952,0.0,0.005,0.005,0.005,0.0,0.005,0.0,0.015,0.067857,0.0,0.0,0.0,0.005952,0.005,0.0,0.005952,0.005,0.0,0.0,0.0,0.0,0.010952,0.005952,0.005952,0.016905,0.050714,0.005,0.0,0.0,0.005,0.005952,0.005,0.01,0.0,0.005952,0.01,0.0,0.005952,0.02881,0.015952,0.01,0.0,0.046905,0.005952,0.0,0.005952,0.01,0.005952,0.005,0.005952,0.005,0.0,0.0,0.020952,0.005,0.005952,0.005952,0.005,0.021905,0.0,0.0,0.026905,0.005,0.005952,0.0,0.015,0.022857,0.0,0.0,0.017857,0.0,0.005,0.005,0.005,0.005952,0.005952,0.005952,0.0,0.011905,0.01,0.005,0.005,0.010952,0.005952,0.005,0.0,0.005952,0.01,0.005
1,0.021277,0.0,0.039795,0.08747,0.0,0.0,0.021277,0.018519,0.0,0.0,0.010638,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.098109,0.0,0.010638,0.0,0.039795,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.010638,0.010638,0.010638,0.010638,0.0,0.0,0.021277,0.0,0.029157,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.010638,0.0,0.058314,0.010638,0.0,0.010638,0.0,0.0,0.018519,0.010638,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.021277,0.0,0.0,0.047675,0.0,0.0,0.018519,0.037037,0.010638,0.0,0.0,0.0,0.029157,0.029157,0.0,0.010638,0.018519,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.029157,0.0,0.0,0.0,0.010638,0.0,0.0,0.0,0.029157,0.0,0.0
2,0.027778,0.0,0.027778,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.138889,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0
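
A table this wide is easier to read when reduced to the strongest categories per cluster. The following is a minimal sketch, assuming a frame with one row per cluster and one column per venue category. The values are copied from four columns of the output above, and `top_categories` is a hypothetical helper introduced only for this example.

```python
import pandas as pd

# four columns taken from the cluster-mean table above
cluster_means = pd.DataFrame(
    {
        "Bar": [0.129524, 0.087470, 0.000000],
        "Bus Stop": [0.029762, 0.098109, 0.138889],
        "Coffee Shop": [0.067857, 0.000000, 0.000000],
        "Supermarket": [0.011905, 0.029157, 0.111111],
    },
    index=pd.Index([0, 1, 2], name="Cluster Labels"),
)

def top_categories(means, n=2):
    """Return the n highest-mean categories for each cluster as a dict."""
    return {label: row.nlargest(n).index.tolist() for label, row in means.iterrows()}

top = top_categories(cluster_means, n=2)
print(top)
```

On the full table, the same call would rank all 110 categories; only the subset here is restricted for brevity.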


In [34]:
LeuvenDF.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,3000,Leuven,Leuven,50.881253,4.69299
1,3001,Leuven,Heverlee,50.851729,4.693131
2,3010,Leuven,Kessel-Lo,50.889915,4.730761
3,3012,Leuven,Wilsele,50.909536,4.713629
4,3018,Leuven,Wijgmaal,50.926428,4.700121


Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [35]:
# add clustering labels (assumes neighborhoods_venues_sorted has the same row order as leuven_grouped)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join on 'Neighborhood' to attach the cluster label and top venues to each neighborhood's postal code and coordinates
leuven_merged = LeuvenDF.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

leuven_merged.head()  # the cluster label and venue columns are appended at the end

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000,Leuven,Leuven,50.881253,4.69299,0,Bar,Coffee Shop,Italian Restaurant,Park,Plaza,Restaurant,Friterie,Cocktail Bar,Bistro,Pizza Place
1,3001,Leuven,Heverlee,50.851729,4.693131,1,Bus Stop,Bar,Athletics & Sports,Bakery,Belgian Restaurant,Park,Brasserie,Chinese Restaurant,Gym / Fitness Center,Forest
2,3010,Leuven,Kessel-Lo,50.889915,4.730761,0,Bar,Bakery,Friterie,Bus Stop,Hotel,Coffee Shop,Sandwich Place,Soccer Field,Pizza Place,Plaza
3,3012,Leuven,Wilsele,50.909536,4.713629,2,Bus Stop,Supermarket,Basketball Court,Athletics & Sports,Sandwich Place,Pool Hall,Discount Store,Indian Restaurant,Pizza Place,Friterie
4,3018,Leuven,Wijgmaal,50.926428,4.700121,1,Bus Stop,Bar,Playground,Gym / Fitness Center,Pharmacy,Outdoors & Recreation,Kebab Restaurant,Sandwich Place,Intersection,Soccer Field
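
The join matches rows by the `Neighborhood` key, not by position, which matters here because the two frames list the neighborhoods in different orders. A small sketch with made-up stand-ins for `LeuvenDF` and `neighborhoods_venues_sorted` (the names `left` and `right` are hypothetical):

```python
import pandas as pd

# hypothetical stand-ins: same neighborhoods, different row order
left = pd.DataFrame({
    "PostalCode": [3000, 3001, 3010],
    "Neighborhood": ["Leuven", "Heverlee", "Kessel-Lo"],
})
right = pd.DataFrame({
    "Cluster Labels": [1, 0, 0],
    "Neighborhood": ["Heverlee", "Kessel-Lo", "Leuven"],
})

# each left row looks up its label through the Neighborhood index of the right frame
merged = left.join(right.set_index("Neighborhood"), on="Neighborhood")
print(merged)

# a neighborhood missing from the right frame would get NaN here,
# which also turns the label column into floats
unmatched = merged["Cluster Labels"].isna().sum()
print(unmatched)
```

Checking for unmatched rows before plotting is a cheap safeguard, since a `NaN` label cannot be used to index a color list.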


In [37]:
neighborhoods_venues_sorted.head()


Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Heverlee,Bus Stop,Bar,Athletics & Sports,Bakery,Belgian Restaurant,Park,Brasserie,Chinese Restaurant,Gym / Fitness Center,Forest
1,0,Kessel-Lo,Bar,Bakery,Friterie,Bus Stop,Hotel,Coffee Shop,Sandwich Place,Soccer Field,Pizza Place,Plaza
2,1,Wijgmaal,Bus Stop,Bar,Playground,Gym / Fitness Center,Pharmacy,Outdoors & Recreation,Kebab Restaurant,Sandwich Place,Intersection,Soccer Field
3,2,Wilsele,Bus Stop,Supermarket,Basketball Court,Athletics & Sports,Sandwich Place,Pool Hall,Discount Store,Indian Restaurant,Pizza Place,Friterie
4,0,Leuven,Bar,Coffee Shop,Italian Restaurant,Park,Plaza,Restaurant,Friterie,Cocktail Bar,Bistro,Pizza Place


Finally, let's visualize the resulting clusters on a map.

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by cluster
for lat, lon, poi, cluster in zip(leuven_merged['Latitude'], leuven_merged['Longitude'], leuven_merged['Neighborhood'], leuven_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so they index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [39]:
rainbow

['#8000ff', '#80ffb4', '#ff0000']
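
When 0-based cluster labels are mapped onto a palette, the index arithmetic deserves a quick check. The sketch below reuses the three hex values printed above; `color_for` is a hypothetical helper, not part of folium or matplotlib.

```python
# palette copied from the `rainbow` output above
palette = ['#8000ff', '#80ffb4', '#ff0000']

def color_for(label, palette):
    """Map a 0-based cluster label to a palette entry, failing loudly if out of range."""
    label = int(label)  # labels can arrive as floats after a join that introduced NaN
    if not 0 <= label < len(palette):
        raise ValueError(f"no color for cluster label {label}")
    return palette[label]

print([color_for(c, palette) for c in (0, 1, 2)])

# for comparison: an offset of -1 silently wraps label 0 around to the last entry
print(palette[0 - 1])
```

Python's negative indexing means an off-by-one offset does not raise an error. It only rotates which color each cluster receives, so the map still renders but the legend order shifts.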

<a id='item5'></a>

## 8. Examine Clusters

Now you can examine each cluster and identify the venue categories that distinguish it from the others. Based on those defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
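
One possible way to find distinguishing categories is to compare each cluster's average venue share with the average over all neighborhoods. The sketch below uses made-up frequencies and labels; `freq`, `labels`, and `lift` are illustrative names, and the difference of means is just one simple choice of measure.

```python
import pandas as pd

# made-up venue frequencies, one row per neighborhood
freq = pd.DataFrame(
    {
        "Bar": [0.14, 0.12, 0.06, 0.11, 0.00],
        "Bus Stop": [0.00, 0.06, 0.09, 0.11, 0.14],
        "Supermarket": [0.00, 0.02, 0.02, 0.04, 0.11],
    }
)
# made-up cluster label for each row
labels = pd.Series([0, 0, 1, 1, 2], name="Cluster Labels")

overall = freq.mean()                      # average share across all neighborhoods
per_cluster = freq.groupby(labels).mean()  # average share inside each cluster
lift = per_cluster - overall               # positive = over-represented in the cluster

most_distinctive = lift.idxmax(axis=1)
print(most_distinctive)
```

With only a handful of neighborhoods, such differences are indicative at best, so the resulting names should be read as rough descriptions.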

In [101]:
leuven_merged.groupby('Cluster Labels').count()

Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1


#### Cluster 1 (label 0)

In [102]:
leuven_merged.loc[leuven_merged['Cluster Labels'] == 0, leuven_merged.columns[[2] + list(range(5, leuven_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Heverlee,0,Bus Stop,Athletics & Sports,Bar,Bakery,Belgian Restaurant,Boarding House,Brasserie,Park,Chinese Restaurant,Gym / Fitness Center
4,Wijgmaal,0,Bar,Playground,Gym / Fitness Center,Intersection,Market,Bus Stop,Public Art,Restaurant,Sandwich Place,Pizza Place


#### Cluster 2 (label 1)

In [103]:
leuven_merged.loc[leuven_merged['Cluster Labels'] == 1, leuven_merged.columns[[2] + list(range(5, leuven_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Leuven,1,Bar,Coffee Shop,Italian Restaurant,Plaza,Restaurant,Cocktail Bar,Friterie,Park,Gourmet Shop,Pizza Place
2,Kessel-Lo,1,Bar,Bakery,Friterie,Hotel,Bus Stop,Soccer Field,Coffee Shop,Plaza,Italian Restaurant,Park


#### Cluster 3 (label 2)

In [104]:
leuven_merged.loc[leuven_merged['Cluster Labels'] == 2, leuven_merged.columns[[2] + list(range(5, leuven_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Wilsele,2,Supermarket,Soccer Field,Fruit & Vegetable Store,Music Store,Sandwich Place,Bus Stop,Pizza Place,Park,Climbing Gym,Club House
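
The per-cluster cells repeat the same filter with a different label each time. As a sketch, the filter can be wrapped in a small function and run over whatever labels actually occur, so the number of cells never has to match `kclusters` by hand. The miniature `merged` frame and the `cluster_view` helper are hypothetical.

```python
import pandas as pd

# hypothetical miniature of leuven_merged
merged = pd.DataFrame({
    "PostalCode": [3000, 3001, 3012],
    "Neighborhood": ["Leuven", "Heverlee", "Wilsele"],
    "Cluster Labels": [0, 1, 2],
    "1st Most Common Venue": ["Bar", "Bus Stop", "Bus Stop"],
})

def cluster_view(df, label):
    """Rows of one cluster, keeping the name, the label, and the venue columns."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    keep = ["Neighborhood", "Cluster Labels"] + venue_cols
    return df.loc[df["Cluster Labels"] == label, keep]

# iterate over the labels present in the data instead of hardcoding them
for label in sorted(merged["Cluster Labels"].unique()):
    print(f"Cluster label {label}")
    print(cluster_view(merged, label))
```

Selecting columns by name rather than by position also keeps the view stable if columns are added to or removed from the merged frame.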


### End of project

This notebook is part of a course on **Coursera** called *Applied Data Science Capstone*. If you accessed this notebook outside the course, you can take this course online by clicking [here](http://cocl.us/DP0701EN_Coursera_Week3_LAB2).

<hr>

Copyright &copy; 2018 [Cognitive Class](https://cognitiveclass.ai/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).