<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

# Project Capstone - Comparing Restaurants categories in Airports with more passenger traffic #

## Introduction

The Airport restaurants with more passenger traffic need to serve a diverse and multicultural population. Most people choose airports fast food restaurant because of lack of time between airplane connections but other people eat at the fast food restaurant because there are no other restaurants. The analysis of the number of restaurants per airport can indicate the percentage of fast-food restaurants and other restaurants and give new insights to the proposition.

In order to obtain the data about the restaurants near to airports with more passenger traffic in the world, we present in this document the process to obtain the dataset that is in the table of airports and how explore the dataset available freely in the wikipedia page. Futhermore, we explain how the data will be transformed and stored into a pandas dataframe. Then, we collect the longitude and latitude from each airport list using the Geocoder API.

Finally, the Capstone project will compare the number of restaurants within a 1000-meters radius among the top 50 airports in the world in 2017, with more passenger traffic using the foursquare location data.

## Table of Contents

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Get the Geographical Coordinates from Airports</a>

3. <a href="#item3">Explore Restaurants near to the Airports</a>

4. <a href="#item4">Analyze Each Airport</a>

5. <a href="#item5">Cluster Airports</a>

6. <a href="#item6">Examine Clusters</a>

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests

## 1. Download and Explore Dataset

The dataset to explore the busiest airports by passenger traffic is the wikipedia site.

https://en.wikipedia.org/wiki/List_of_busiest_airports_by_passenger_traffic

In addition to the dataset from the wikipedia page, each airport will have complementary data from GeoPy's geocoding web services. Using the Geocoder API is possible collect longitude and latitude for every 50 one airports.

We get airport geolocalization using **the Foursquare location data** and then select venues limited to 100 with category equal "Food" near to the each one 50 airport. 

In this link is possible identify all available categories: https://developer.foursquare.com/docs/resources/categories

In [2]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_busiest_airports_by_passenger_traffic'

In [3]:
airports_wikipedia_page = requests.get(wikipedia_link)

Storing the wikipedia page in a page variable

In [4]:
page = airports_wikipedia_page.text

Finding Statistics 2017 Table inside 'wikipedia page' and storing in a table_script variable

In [5]:
html_table_tag_start = "<th>%<br />Change"
html_table_tag_end = "</tbody></table>"
table_start = page.find(html_table_tag_start) + len(html_table_tag_start)
table_end = page.find(html_table_tag_end,table_start)
table_script = page[table_start:table_end]


Removing tags not important to dataset

In [6]:
table_script = table_script.replace("\n","")
table_script = table_script.replace("\t","")
table_script = table_script.replace("&#39;","'")
#table_script = table_script.replace("\"><tbody><tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th>","")

Extract Airport, Country and total of passengens for each row in the list and store in DataFrame

In [7]:
tr_table = table_script.split("</td></tr>")
tr_table_valid = [];
html_table_tag_start = '</td><td><img alt='
for p in tr_table:
    tr_find_start = p.find(html_table_tag_start)
    tr_airport = p[:tr_find_start]
    tr_find_start = tr_airport.rfind('<td>') + 4
    passangers_number = tr_airport[tr_find_start:]
    passangers_number = passangers_number.replace(",","")
    
    
    tr_airport = tr_airport[p.find('title="')+7:]
    tr_title = tr_airport.split('title="')
    title = ''
    for r in tr_title:
        tr_find_start = r.find('"')
        title = title + "|" + r[:tr_find_start]
    title = title + "|" + passangers_number
    tr_table_valid.append(title)

In [8]:
print(tr_table_valid)

['|United States|Hartsfield–Jackson Atlanta International Airport|Atlanta|Georgia (U.S. state)|103902992', '|China|Beijing Capital International Airport|Chaoyang District, Beijing|Shunyi District|Beijing|95786442', '|United Arab Emirates|Dubai International Airport|Garhoud|Dubai|88242099', '|Japan|Haneda Airport|Ōta, Tokyo|Tokyo|85408975', '|United States|Los Angeles International Airport|Los Angeles|California|84557968', "|United States|O'Hare International Airport|Chicago|Illinois|79828183", '|United Kingdom|London Heathrow Airport|London Borough of Hillingdon|London|78014598', '|Hong Kong|Hong Kong International Airport|Chek Lap Kok|Hong Kong|72665078', '|China|Shanghai Pudong International Airport|Pudong|Shanghai|70001237', '|France|Paris-Charles de Gaulle Airport|Roissy-en-France|Île-de-France (region)|69471442', '|Netherlands|Amsterdam Airport Schiphol|Haarlemmermeer|North Holland|68515425', '|United States|Dallas/Fort Worth International Airport|Dallas|Fort Worth, Texas|Texas|67

Create a new DataFrame

In [9]:
# define the dataframe columns
column_names = ['Airport','Country','TotalPassengers'] 
#column_names = ['Airport','Country','TotalPassengers','Longitude','Latitute'] 

# instantiate the dataframe
airports = pd.DataFrame(columns=column_names)

for r in tr_table_valid:
    if (len(r) > 2):
        tr_table = r.split('|')
        #print(tr_table[2])
        airports = airports.append({'Airport': tr_table[2],
                    'Country': tr_table[1],'TotalPassengers':tr_table[-1]}, ignore_index=True)         

Some airports name need update because it was not found latitude and longitude using the Geocoder API.

In [10]:
airports.loc[airports['Airport'].str.contains("Leonardo da Vinci"),'Airport'] = 'Rome Fiumicino Airport'
airports.loc[airports['Airport'].str.contains("George Bush Intercontinental Airport"),'Airport'] = 'Houston Intercontinental Airport'

In [11]:
airports.set_index('Airport', inplace=True)

In [12]:
airports[['TotalPassengers']] = airports[['TotalPassengers']].astype(float)

In [13]:
airports.sort_values(by=['TotalPassengers'], ascending=False)

Unnamed: 0_level_0,Country,TotalPassengers
Airport,Unnamed: 1_level_1,Unnamed: 2_level_1
Hartsfield–Jackson Atlanta International Airport,United States,103902992.0
Beijing Capital International Airport,China,95786442.0
Dubai International Airport,United Arab Emirates,88242099.0
Haneda Airport,Japan,85408975.0
Los Angeles International Airport,United States,84557968.0
O'Hare International Airport,United States,79828183.0
London Heathrow Airport,United Kingdom,78014598.0
Hong Kong International Airport,Hong Kong,72665078.0
Shanghai Pudong International Airport,China,70001237.0
Paris-Charles de Gaulle Airport,France,69471442.0


Check dataframe results

In [14]:
airports.head(60)

Unnamed: 0_level_0,Country,TotalPassengers
Airport,Unnamed: 1_level_1,Unnamed: 2_level_1
Hartsfield–Jackson Atlanta International Airport,United States,103902992.0
Beijing Capital International Airport,China,95786442.0
Dubai International Airport,United Arab Emirates,88242099.0
Haneda Airport,Japan,85408975.0
Los Angeles International Airport,United States,84557968.0
O'Hare International Airport,United States,79828183.0
London Heathrow Airport,United Kingdom,78014598.0
Hong Kong International Airport,Hong Kong,72665078.0
Shanghai Pudong International Airport,China,70001237.0
Paris-Charles de Gaulle Airport,France,69471442.0


Check Shape from dataFrame

In [15]:
airports.shape

(50, 2)

## 2. Get the Geographical Coordinates from Airports

In order to utilize the Map location data, we need to get the latitude and the longitude coordinates of each Airport.

We use the Geopy API Python package: https://geopy.readthedocs.io/en/stable/ to get the latitude and the longitude coordinates.

After to capture the latitude and the longitude coordinates, we will create a new dataframe that will consist of five columns: Airport, Country, TotalPassangers, Latitude and Longitude.

Install libraries to GeoCoder API

In [16]:
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [17]:
# define the dataframe columns
column_names = ['Airport','Longitude','Latitude'] 

# instantiate the dataframe
airports_coordinates = pd.DataFrame(columns=column_names)

geolocator = Nominatim()

for row_index, row in airports.iterrows():
    address = row_index
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    airports_coordinates = airports_coordinates.append({'Airport': address,
                    'Latitude': latitude,'Longitude':longitude}, ignore_index=True) 
airports_coordinates.set_index('Airport', inplace=True) 



In [18]:
airports_coordinates.head()

Unnamed: 0_level_0,Longitude,Latitude
Airport,Unnamed: 1_level_1,Unnamed: 2_level_1
Hartsfield–Jackson Atlanta International Airport,-84.429271,33.637799
Beijing Capital International Airport,116.594561,40.079285
Dubai International Airport,55.368541,25.251417
Haneda Airport,139.781107,35.545511
Los Angeles International Airport,-118.407057,33.942113


The Latitude and Longitude from some airports is not location correct. For example, Amsterdam airport have difference of 8 km between Geocoder API and the location correct.

In [19]:
airports_coordinates.loc['Amsterdam Airport Schiphol', 'Latitude'] = '52.307432'
airports_coordinates.loc['Amsterdam Airport Schiphol', 'Longitude'] = '4.772017'
airports_coordinates.loc['Chengdu Shuangliu International Airport', 'Latitude'] = '30.568674'
airports_coordinates.loc['Chengdu Shuangliu International Airport', 'Longitude'] = '103.949851'
airports_coordinates.loc['Chengdu Shuangliu International Airport', 'Latitude'] = '50.049565'
airports_coordinates.loc['Chengdu Shuangliu International Airport', 'Longitude'] = '8.572448'

Merge two dataframes: **airports** and **airports_coordinates** with output in third **airport_result** dataframe.

Index common between dataframes is the **Airport** column.

In [20]:
airport_result = pd.merge(airports,
                     airports_coordinates[['Latitude','Longitude']],
                     on='Airport')


In [21]:
airport_result

Unnamed: 0_level_0,Country,TotalPassengers,Latitude,Longitude
Airport,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Hartsfield–Jackson Atlanta International Airport,United States,103902992.0,33.6378,-84.4293
Beijing Capital International Airport,China,95786442.0,40.0793,116.595
Dubai International Airport,United Arab Emirates,88242099.0,25.2514,55.3685
Haneda Airport,Japan,85408975.0,35.5455,139.781
Los Angeles International Airport,United States,84557968.0,33.9421,-118.407
O'Hare International Airport,United States,79828183.0,41.978,-87.9093
London Heathrow Airport,United Kingdom,78014598.0,51.4678,-0.459082
Hong Kong International Airport,Hong Kong,72665078.0,22.3074,113.917
Shanghai Pudong International Airport,China,70001237.0,31.1405,121.805
Paris-Charles de Gaulle Airport,France,69471442.0,49.0067,2.57077


## 3. Explore Restaurants near to the Airports

Next, we are going to start utilizing the Foursquare API to explore the restaurants near to the airports and segment them.

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [22]:
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

print('Libraries imported.')

Libraries imported.


Define Foursquare Credentials and Version

In [23]:
CLIENT_ID = 'ZCEJYMP51D1PU3UTWRIVAKPFLL14CX3AXLEQGB551IBUUQXZ' # your Foursquare ID
CLIENT_SECRET = '5KQOOD03AVBUMJ4UM3ZM05AJSWBMQZ0U52AR5ALRWDPR0T5X' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: XXXXXXXXXXXXXXXXXXXXXXXXXX
CLIENT_SECRET:5KQOOD03AVBUMJ4UM3ZM05AJSWBMQZ0U52AR5ALRWDPR0T5X


Now, let's get the top 100 venues that are in a radius of 1000 meters.

In [24]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius

In [25]:
airport_result.reset_index(inplace=True)

In [26]:
airport_result

Unnamed: 0,Airport,Country,TotalPassengers,Latitude,Longitude
0,Hartsfield–Jackson Atlanta International Airport,United States,103902992.0,33.6378,-84.4293
1,Beijing Capital International Airport,China,95786442.0,40.0793,116.595
2,Dubai International Airport,United Arab Emirates,88242099.0,25.2514,55.3685
3,Haneda Airport,Japan,85408975.0,35.5455,139.781
4,Los Angeles International Airport,United States,84557968.0,33.9421,-118.407
5,O'Hare International Airport,United States,79828183.0,41.978,-87.9093
6,London Heathrow Airport,United Kingdom,78014598.0,51.4678,-0.459082
7,Hong Kong International Airport,Hong Kong,72665078.0,22.3074,113.917
8,Shanghai Pudong International Airport,China,70001237.0,31.1405,121.805
9,Paris-Charles de Gaulle Airport,France,69471442.0,49.0067,2.57077


Let's create a function to repeat the same process to all the airport neighborhoods and define a **venue category = FOOD**

In [27]:
food_category_id = '4d4b7105d754a06374d81259'

In [28]:
def getNearbyVenues(names, latitudes, longitudes, country, totalPassengers, categoryid, radius=1000):
    
    venues_list=[]
    for name, lat, lng, cty, psg in zip(names, latitudes, longitudes, country, totalPassengers):
        print("Airport name: " + name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryid)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            cty,
            psg,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Airport', 
                  'Airport Latitude', 
                  'Airport Longitude', 
                  'Country',
                  'Total Passengers',           
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#### Now write the code to run the above function on each airport and create a new dataframe called *airport_venues*.

In [29]:
airport_venues = getNearbyVenues(names=airport_result['Airport'],
                                   latitudes=airport_result['Latitude'],longitudes=airport_result['Longitude'],
                                   country=airport_result['Country'],totalPassengers=airport_result['TotalPassengers'],
                                   categoryid=food_category_id)

Airport name: Hartsfield–Jackson Atlanta International Airport
Airport name: Beijing Capital International Airport
Airport name: Dubai International Airport
Airport name: Haneda Airport
Airport name: Los Angeles International Airport
Airport name: O'Hare International Airport
Airport name: London Heathrow Airport
Airport name: Hong Kong International Airport
Airport name: Shanghai Pudong International Airport
Airport name: Paris-Charles de Gaulle Airport
Airport name: Amsterdam Airport Schiphol
Airport name: Dallas/Fort Worth International Airport
Airport name: Guangzhou Baiyun International Airport
Airport name: Frankfurt Airport
Airport name: Istanbul Atatürk Airport
Airport name: Indira Gandhi International Airport
Airport name: Soekarno-Hatta International Airport
Airport name: Singapore Changi Airport
Airport name: Seoul Incheon International Airport
Airport name: Denver International Airport
Airport name: Suvarnabhumi Airport
Airport name: John F. Kennedy International Airport
Ai

Let's check the size of the resulting dataframe

In [30]:
print(airport_venues.shape)
airport_venues.head()

(1741, 9)


Unnamed: 0,Airport,Airport Latitude,Airport Longitude,Country,Total Passengers,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Chicken & Beer,33.638494,-84.429318,Fried Chicken Joint
1,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,One Flew South Restaurant & Sushi Bar,33.640935,-84.42591,Sushi Restaurant
2,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Chick-Fil-A,33.640433,-84.432647,Fast Food Restaurant
3,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Fresh Healthy Cafe,33.642733,-84.432595,Snack Place
4,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Five Guys,33.642196,-84.432556,Burger Joint


Let's check how many venues were returned for each airport

In [31]:
airport_venues.groupby('Airport').count()

Unnamed: 0_level_0,Airport Latitude,Airport Longitude,Country,Total Passengers,Venue,Venue Latitude,Venue Longitude,Venue Category
Airport,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Amsterdam Airport Schiphol,68,68,68,68,68,68,68,68
Barcelona–El Prat Airport,39,39,39,39,39,39,39,39
Beijing Capital International Airport,15,15,15,15,15,15,15,15
Benito Juárez International Airport,35,35,35,35,35,35,35,35
Charlotte Douglas International Airport,17,17,17,17,17,17,17,17
Chengdu Shuangliu International Airport,65,65,65,65,65,65,65,65
Chhatrapati Shivaji International Airport,20,20,20,20,20,20,20,20
Dallas/Fort Worth International Airport,84,84,84,84,84,84,84,84
Denver International Airport,57,57,57,57,57,57,57,57
Dubai International Airport,40,40,40,40,40,40,40,40


#### Let's check the size of the resulting dataframe

In [32]:
print(airport_venues.shape)
airport_venues.head()

(1741, 9)


Unnamed: 0,Airport,Airport Latitude,Airport Longitude,Country,Total Passengers,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Chicken & Beer,33.638494,-84.429318,Fried Chicken Joint
1,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,One Flew South Restaurant & Sushi Bar,33.640935,-84.42591,Sushi Restaurant
2,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Chick-Fil-A,33.640433,-84.432647,Fast Food Restaurant
3,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Fresh Healthy Cafe,33.642733,-84.432595,Snack Place
4,Hartsfield–Jackson Atlanta International Airport,33.6378,-84.4293,United States,103902992.0,Five Guys,33.642196,-84.432556,Burger Joint


#### Let's find out how many unique categories can be curated from all the returned venues

In [33]:
print('There are {} uniques categories.'.format(len(airport_venues['Venue Category'].unique())))

There are 98 uniques categories.


## 4. Analyze Each Airport

In [34]:
# one hot encoding
airport_onehot = pd.get_dummies(airport_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
airport_onehot['Airport'] = airport_venues['Airport'] 

# move Airport column to the first column
fixed_columns = [airport_onehot.columns[-1]] + list(airport_onehot.columns[:-1])
airport_onehot = airport_onehot[fixed_columns]

airport_onehot.head()

Unnamed: 0,Airport,African Restaurant,American Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bavarian Restaurant,Bistro,Blini House,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Creperie,Cuban Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Empanada Restaurant,English Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Friterie,Gastropub,German Restaurant,Greek Restaurant,Halal Restaurant,Hot Dog Joint,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Meyhane,Middle Eastern Restaurant,New American Restaurant,Noodle House,Okonomiyaki Restaurant,Peruvian Restaurant,Pizza Place,Portuguese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soba Restaurant,Soup Place,Southern / Soul Food Restaurant,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tempura Restaurant,Tex-Mex Restaurant,Thai Restaurant,Tonkatsu Restaurant,Turkish Home Cooking Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Yoshoku Restaurant
0,Hartsfield–Jackson Atlanta International Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Hartsfield–Jackson Atlanta International Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Hartsfield–Jackson Atlanta International Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Hartsfield–Jackson Atlanta International Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Hartsfield–Jackson Atlanta International Airport,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [35]:
airport_onehot.shape

(1741, 99)

#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [36]:
airport_grouped = airport_onehot.groupby('Airport').mean().reset_index()
airport_grouped

Unnamed: 0,Airport,African Restaurant,American Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bavarian Restaurant,Bistro,Blini House,Brazilian Restaurant,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Creperie,Cuban Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Empanada Restaurant,English Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Friterie,Gastropub,German Restaurant,Greek Restaurant,Halal Restaurant,Hot Dog Joint,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Meyhane,Middle Eastern Restaurant,New American Restaurant,Noodle House,Okonomiyaki Restaurant,Peruvian Restaurant,Pizza Place,Portuguese Restaurant,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,Soba Restaurant,Soup Place,Southern / Soul Food Restaurant,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tempura Restaurant,Tex-Mex Restaurant,Thai Restaurant,Tonkatsu Restaurant,Turkish Home Cooking Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint,Yoshoku Restaurant
0,Amsterdam Airport Schiphol,0.0,0.014706,0.014706,0.0,0.0,0.014706,0.0,0.029412,0.0,0.044118,0.0,0.0,0.044118,0.014706,0.029412,0.0,0.014706,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.073529,0.0,0.0,0.0,0.0,0.044118,0.0,0.014706,0.0,0.0,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.029412,0.0,0.044118,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.088235,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barcelona–El Prat Airport,0.0,0.025641,0.0,0.0,0.0,0.025641,0.051282,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.051282,0.384615,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.102564,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Beijing Capital International Airport,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.133333,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Benito Juárez International Airport,0.0,0.142857,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.085714,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.171429,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.028571,0.057143,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Charlotte Douglas International Airport,0.058824,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.176471,0.0,0.0,0.117647,0.058824,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chengdu Shuangliu International Airport,0.0,0.015385,0.046154,0.0,0.0,0.015385,0.0,0.138462,0.0,0.046154,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.030769,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.0,0.030769,0.123077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.092308,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061538,0.015385,0.015385,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Chhatrapati Shivaji International Airport,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.35,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dallas/Fort Worth International Airport,0.0,0.130952,0.0,0.0,0.0,0.047619,0.02381,0.035714,0.0,0.02381,0.0,0.0,0.02381,0.0,0.011905,0.0,0.0,0.035714,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.071429,0.0,0.02381,0.011905,0.119048,0.0,0.0,0.0,0.0,0.02381,0.011905,0.011905,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0
8,Denver International Airport,0.0,0.140351,0.0,0.0,0.0,0.017544,0.017544,0.052632,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.140351,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.070175,0.0,0.0,0.017544,0.0,0.0,0.0,0.052632,0.0,0.0,0.035088,0.0,0.087719,0.0,0.070175,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dubai International Airport,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.1,0.0,0.025,0.0,0.0,0.025,0.0,0.05,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.05,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.075,0.05,0.05,0.075,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0


#### Let's confirm the new size

In [37]:
airport_grouped.shape

(50, 99)

#### Let's print each airport along with the top 5 most common venues

In [38]:
num_top_venues = 5

for hood in airport_grouped['Airport']:
    print("----"+hood+"----")
    temp = airport_grouped[airport_grouped['Airport'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Amsterdam Airport Schiphol----
                  venue  freq
0                  Café  0.25
1            Restaurant  0.09
2  Fast Food Restaurant  0.07
3        Sandwich Place  0.06
4            Food Court  0.04


----Barcelona–El Prat Airport----
                  venue  freq
0                  Café  0.38
1  Fast Food Restaurant  0.15
2        Sandwich Place  0.10
3            Bagel Shop  0.05
4      Tapas Restaurant  0.05


----Beijing Capital International Airport----
                  venue  freq
0  Fast Food Restaurant  0.20
1   Japanese Restaurant  0.13
2         Deli / Bodega  0.13
3    Chinese Restaurant  0.13
4                  Café  0.13


----Benito Juárez International Airport----
                 venue  freq
0   Mexican Restaurant  0.17
1  American Restaurant  0.14
2         Burger Joint  0.09
3          Pizza Place  0.09
4       Sandwich Place  0.06


----Charlotte Douglas International Airport----
                 venue  freq
0          Pizza Place  0.18
1           R

<a id='item2'></a>

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [39]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [40]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Airport']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
airport_venues_sorted = pd.DataFrame(columns=columns)
airport_venues_sorted['Airport'] = airport_grouped['Airport']

for ind in np.arange(airport_grouped.shape[0]):
    airport_venues_sorted.iloc[ind, 1:] = return_most_common_venues(airport_grouped.iloc[ind, :], num_top_venues)

airport_venues_sorted

Unnamed: 0,Airport,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amsterdam Airport Schiphol,Café,Restaurant,Fast Food Restaurant,Sandwich Place,Japanese Restaurant,Breakfast Spot,Food Court,Bistro,Bakery,Deli / Bodega
1,Barcelona–El Prat Airport,Café,Fast Food Restaurant,Sandwich Place,Cafeteria,Bagel Shop,Tapas Restaurant,American Restaurant,Pizza Place,BBQ Joint,Bakery
2,Beijing Capital International Airport,Fast Food Restaurant,Deli / Bodega,Chinese Restaurant,Japanese Restaurant,Café,Pizza Place,Asian Restaurant,Sandwich Place,Snack Place,English Restaurant
3,Benito Juárez International Airport,Mexican Restaurant,American Restaurant,Burger Joint,Pizza Place,Sandwich Place,Breakfast Spot,Seafood Restaurant,Fast Food Restaurant,Food Truck,Bakery
4,Charlotte Douglas International Airport,Pizza Place,Restaurant,African Restaurant,Italian Restaurant,New American Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Café,Fast Food Restaurant
5,Chengdu Shuangliu International Airport,Café,Bakery,German Restaurant,Italian Restaurant,Restaurant,Asian Restaurant,Bistro,Fast Food Restaurant,Gastropub,Thai Restaurant
6,Chhatrapati Shivaji International Airport,Café,Indian Restaurant,Donut Shop,Fast Food Restaurant,Restaurant,American Restaurant,Chinese Restaurant,Bakery,Italian Restaurant,Snack Place
7,Dallas/Fort Worth International Airport,American Restaurant,Snack Place,Fast Food Restaurant,Pizza Place,Restaurant,Donut Shop,BBQ Joint,Tex-Mex Restaurant,Bakery,Café
8,Denver International Airport,American Restaurant,Fast Food Restaurant,Sandwich Place,Mexican Restaurant,Snack Place,Chinese Restaurant,Food Court,Bakery,Pizza Place,Café
9,Dubai International Airport,Bakery,Café,Restaurant,Seafood Restaurant,French Restaurant,Mediterranean Restaurant,Salad Place,Sandwich Place,Italian Restaurant,Burger Joint


## 5. Cluster Airports

Let's import K-means libraries

In [41]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


Run *k*-means to cluster the neighborhood into 5 clusters.

In [42]:
# set number of clusters
kclusters = 5

airport_grouped_clustering = airport_grouped.drop('Airport', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(airport_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 2, 2, 2, 0, 0, 2, 2, 2, 4, 0, 0, 2, 2, 2, 0, 0, 2, 0, 2, 2,
       2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 3, 0, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2,
       0, 0, 0, 2, 1, 0], dtype=int32)

In [43]:
kmeans.cluster_centers_

array([[ 2.16840434e-19,  1.02943457e-02,  2.81248397e-02,
         8.50340136e-04,  8.50340136e-04,  6.06154143e-03,
         4.14268271e-03,  2.93493910e-02,  2.55102041e-03,
         1.01116235e-02,  1.76366843e-03, -2.71050543e-19,
         5.80316195e-03,  4.38508737e-03,  6.43961288e-03,
         7.32600733e-04,  7.38727138e-03,  2.90465733e-01,
         0.00000000e+00,  1.15079365e-02,  0.00000000e+00,
         5.24187453e-02,  1.08420217e-19, -5.42101086e-19,
         6.84903591e-03,  6.61375661e-04,  4.77755205e-03,
         4.87528345e-03,  8.50340136e-04,  2.92945811e-02,
         6.50521303e-19, -2.71050543e-19, -1.08420217e-19,
         8.72059232e-02,  7.32600733e-04,  8.50340136e-04,
        -1.08420217e-19,  2.43328100e-03,  8.74512198e-03,
         7.32600733e-04,  9.20368147e-03,  6.24512100e-03,
         9.44822373e-03,  7.00280112e-04,  4.14960856e-03,
         1.18131868e-02, -1.08420217e-19,  5.42101086e-20,
         7.65306122e-03,  1.39455782e-02,  1.32275132e-0

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each airport.

In [44]:
airport_merged = airport_result

# add clustering labels
airport_merged['Cluster Labels'] = kmeans.labels_

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
airport_merged = airport_merged.join(airport_venues_sorted.set_index('Airport'), on='Airport')

airport_merged.head() # check the last columns!

Unnamed: 0,Airport,Country,TotalPassengers,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hartsfield–Jackson Atlanta International Airport,United States,103902992.0,33.6378,-84.4293,0,American Restaurant,Sandwich Place,Hot Dog Joint,Southern / Soul Food Restaurant,Burger Joint,Café,Fried Chicken Joint,Food Court,Asian Restaurant,Seafood Restaurant
1,Beijing Capital International Airport,China,95786442.0,40.0793,116.595,0,Fast Food Restaurant,Deli / Bodega,Chinese Restaurant,Japanese Restaurant,Café,Pizza Place,Asian Restaurant,Sandwich Place,Snack Place,English Restaurant
2,Dubai International Airport,United Arab Emirates,88242099.0,25.2514,55.3685,2,Bakery,Café,Restaurant,Seafood Restaurant,French Restaurant,Mediterranean Restaurant,Salad Place,Sandwich Place,Italian Restaurant,Burger Joint
3,Haneda Airport,Japan,85408975.0,35.5455,139.781,2,Café,Japanese Restaurant,Tempura Restaurant,Soba Restaurant,Yoshoku Restaurant,Udon Restaurant,Sushi Restaurant,Ramen Restaurant,Japanese Curry Restaurant,Noodle House
4,Los Angeles International Airport,United States,84557968.0,33.9421,-118.407,2,American Restaurant,Burger Joint,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Sandwich Place,Bakery,Sushi Restaurant,New American Restaurant,Chinese Restaurant


Finally, let's visualize the resulting clusters

In [45]:
airport_merged[['Latitude']] = airport_merged[['Latitude']].astype(float)
airport_merged[['Longitude']] = airport_merged[['Longitude']].astype(float)

Create a map with central point in the **Prime Meridian (Greenwich)** with zoom 2

In [46]:
# create map
latitude = 51.0
longitude = 0.0
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=2)

In [47]:
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(airport_merged['Latitude'], airport_merged['Longitude'], airport_merged['Airport'], airport_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>

## 6. Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster:

#### Cluster 1

In [48]:
airport_merged.loc[airport_merged['Cluster Labels'] == 0, airport_merged.columns[[1] + [0] + list(range(5, airport_merged.shape[1]))]]

Unnamed: 0,Country,Airport,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,United States,Hartsfield–Jackson Atlanta International Airport,0,American Restaurant,Sandwich Place,Hot Dog Joint,Southern / Soul Food Restaurant,Burger Joint,Café,Fried Chicken Joint,Food Court,Asian Restaurant,Seafood Restaurant
1,China,Beijing Capital International Airport,0,Fast Food Restaurant,Deli / Bodega,Chinese Restaurant,Japanese Restaurant,Café,Pizza Place,Asian Restaurant,Sandwich Place,Snack Place,English Restaurant
5,United States,O'Hare International Airport,0,Hot Dog Joint,Snack Place,American Restaurant,Sandwich Place,Salad Place,Pizza Place,Fast Food Restaurant,Deli / Bodega,Mexican Restaurant,Mediterranean Restaurant
6,United Kingdom,London Heathrow Airport,0,Restaurant,Sandwich Place,Café,Seafood Restaurant,Sushi Restaurant,Italian Restaurant,English Restaurant,Bistro,Bakery,Fast Food Restaurant
11,United States,Dallas/Fort Worth International Airport,0,American Restaurant,Snack Place,Fast Food Restaurant,Pizza Place,Restaurant,Donut Shop,BBQ Joint,Tex-Mex Restaurant,Bakery,Café
12,China,Guangzhou Baiyun International Airport,0,Chinese Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Noodle House,Restaurant,Sushi Restaurant,Cantonese Restaurant,Buffet,Food Court
16,Indonesia,Soekarno-Hatta International Airport,0,Café,Donut Shop,Asian Restaurant,Indonesian Restaurant,Chinese Restaurant,Fast Food Restaurant,Bakery,American Restaurant,Japanese Restaurant,Noodle House
17,Singapore,Singapore Changi Airport,0,Sandwich Place,Chinese Restaurant,Snack Place,Fast Food Restaurant,Café,BBQ Joint,Donut Shop,Restaurant,Asian Restaurant,Food Court
19,United States,Denver International Airport,0,American Restaurant,Fast Food Restaurant,Sandwich Place,Mexican Restaurant,Snack Place,Chinese Restaurant,Food Court,Bakery,Pizza Place,Café
27,Spain,Barcelona–El Prat Airport,0,Café,Fast Food Restaurant,Sandwich Place,Cafeteria,Bagel Shop,Tapas Restaurant,American Restaurant,Pizza Place,BBQ Joint,Bakery


#### Cluster 2

In [49]:
airport_merged.loc[airport_merged['Cluster Labels'] == 1, airport_merged.columns[[1] + [0] + list(range(5, airport_merged.shape[1]))]]

Unnamed: 0,Country,Airport,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,Japan,Narita International Airport,1,Café,Sushi Restaurant,Donburi Restaurant,Japanese Restaurant,Fast Food Restaurant,Udon Restaurant,Takoyaki Place,Okonomiyaki Restaurant,Restaurant,Soba Restaurant


#### Cluster 3

In [50]:
airport_merged.loc[airport_merged['Cluster Labels'] == 2, airport_merged.columns[[1] + [0] + list(range(5, airport_merged.shape[1]))]]

Unnamed: 0,Country,Airport,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,United Arab Emirates,Dubai International Airport,2,Bakery,Café,Restaurant,Seafood Restaurant,French Restaurant,Mediterranean Restaurant,Salad Place,Sandwich Place,Italian Restaurant,Burger Joint
3,Japan,Haneda Airport,2,Café,Japanese Restaurant,Tempura Restaurant,Soba Restaurant,Yoshoku Restaurant,Udon Restaurant,Sushi Restaurant,Ramen Restaurant,Japanese Curry Restaurant,Noodle House
4,United States,Los Angeles International Airport,2,American Restaurant,Burger Joint,Fast Food Restaurant,Mexican Restaurant,Pizza Place,Sandwich Place,Bakery,Sushi Restaurant,New American Restaurant,Chinese Restaurant
7,Hong Kong,Hong Kong International Airport,2,Restaurant,Irish Pub,Chinese Restaurant,Thai Restaurant,Ramen Restaurant,Asian Restaurant,Fried Chicken Joint,English Restaurant,Dim Sum Restaurant,Diner
8,China,Shanghai Pudong International Airport,2,Asian Restaurant,Snack Place,Café,Korean Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant
9,France,Paris-Charles de Gaulle Airport,2,Café,French Restaurant,Sandwich Place,Bakery,Restaurant,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Snack Place,Seafood Restaurant,Sushi Restaurant
13,Germany,Frankfurt Airport,2,Pizza Place,Yoshoku Restaurant,Deli / Bodega,Diner,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Empanada Restaurant,English Restaurant
14,Turkey,Istanbul Atatürk Airport,2,Café,Fast Food Restaurant,Turkish Restaurant,Food Court,Bakery,Cafeteria,Kebab Restaurant,Snack Place,Restaurant,Diner
15,India,Indira Gandhi International Airport,2,Café,Indian Restaurant,Pizza Place,Restaurant,Sandwich Place,Deli / Bodega,Snack Place,Fast Food Restaurant,Gastropub,Bakery
18,South Korea,Seoul Incheon International Airport,2,Café,Korean Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Diner,Donburi Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Empanada Restaurant


#### Cluster 4

In [51]:
airport_merged.loc[airport_merged['Cluster Labels'] == 3, airport_merged.columns[[1] + [0] + list(range(5, airport_merged.shape[1]))]]

Unnamed: 0,Country,Airport,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,United Kingdom,London Gatwick Airport,3,Sandwich Place,Restaurant,Chinese Restaurant,Italian Restaurant,Mexican Restaurant,Seafood Restaurant,Café,Lebanese Restaurant,Breakfast Spot,Sushi Restaurant


#### Cluster 5

In [52]:
airport_merged.loc[airport_merged['Cluster Labels'] == 4, airport_merged.columns[[1] + [0] + list(range(5, airport_merged.shape[1]))]]

Unnamed: 0,Country,Airport,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Netherlands,Amsterdam Airport Schiphol,4,Café,Restaurant,Fast Food Restaurant,Sandwich Place,Japanese Restaurant,Breakfast Spot,Food Court,Bistro,Bakery,Deli / Bodega


### About the Author:  
 [Clayton Magalhaes]( https://www.linkedin.com/in/cvianam/) Clayton Magalhaes is a Fraud Prevention Specialist at IBM.



 <hr>
Copyright &copy; 2018 [cognitiveclass.ai](cognitiveclass.ai?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).