## Example: Analyzing Airport Operations

For this project, we will use `airports.csv`, `airport-frequencies.csv`, `countries.csv`, and `regions.csv` from [OurAirports.com](https://ourairports.com/data/).

Instead of downloading the files, we can load the data directly from their URLs.

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Load the datasets
airports = pd.read_csv("https://ourairports.com/data/airports.csv", sep=",")
airports.head()

Unnamed: 0,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,iata_code,local_code,home_link,wikipedia_link,keywords
0,6523,00A,heliport,Total RF Heliport,40.070985,-74.933689,11.0,,US,US-PA,Bensalem,no,K00A,,00A,https://www.penndot.pa.gov/TravelInPA/airports...,,
1,323361,00AA,small_airport,Aero B Ranch Airport,38.704022,-101.473911,3435.0,,US,US-KS,Leoti,no,00AA,,00AA,,,
2,6524,00AK,small_airport,Lowell Field,59.947733,-151.692524,450.0,,US,US-AK,Anchor Point,no,00AK,,00AK,,,
3,6525,00AL,small_airport,Epps Airpark,34.864799,-86.770302,820.0,,US,US-AL,Harvest,no,00AL,,00AL,,,
4,506791,00AN,small_airport,Katmai Lodge Airport,59.093287,-156.456699,80.0,,US,US-AK,King Salmon,no,00AN,,00AN,,,


In [3]:
print("Size:\n", airports.shape)
print("Data types:\n", airports.dtypes)

Size:
 (77024, 18)
Data types:
 id                     int64
ident                 object
type                  object
name                  object
latitude_deg         float64
longitude_deg        float64
elevation_ft         float64
continent             object
iso_country           object
iso_region            object
municipality          object
scheduled_service     object
gps_code              object
iata_code             object
local_code            object
home_link             object
wikipedia_link        object
keywords              object
dtype: object


In [4]:
airports.isnull().sum()

id                       0
ident                    0
type                     0
name                     0
latitude_deg             0
longitude_deg            0
elevation_ft         14407
continent            37094
iso_country            259
iso_region               0
municipality          4924
scheduled_service        0
gps_code             35364
iata_code            68124
local_code           44155
home_link            73291
wikipedia_link       65929
keywords             59561
dtype: int64

In [5]:
# Load other csv files
airport_freq = pd.read_csv("https://ourairports.com/data/airport-frequencies.csv", sep=',')
countries = pd.read_csv("https://ourairports.com/data/countries.csv", sep=',')
regions = pd.read_csv("https://ourairports.com/data/regions.csv", sep=',')

In [6]:
def basic_info(df):
    print("Size:")
    print(df.shape)
    print("="*20)
    print("Data types:")
    print(df.dtypes)
    print("="*20)
    print("Missing Values:")
    print(df.isnull().sum())
    print(df.head())

In [8]:
basic_info(airport_freq)

Size:
(29318, 6)
Data types:
id                 int64
airport_ref        int64
airport_ident     object
type              object
description       object
frequency_mhz    float64
dtype: object
Missing Values:
id                 0
airport_ref        0
airport_ident      0
type               0
description      984
frequency_mhz      0
dtype: int64
       id  airport_ref airport_ident   type          description  \
0   70518         6528          00CA   CTAF                 CTAF   
1  307581         6589          01FL  ARCAL                  NaN   
2   75239         6589          01FL   CTAF  CEDAR KNOLL TRAFFIC   
3   60191         6756          04CA   CTAF                 CTAF   
4   59287         6779          04MS   UNIC               UNICOM   

   frequency_mhz  
0          122.9  
1          122.9  
2          122.8  
3          122.9  
4          122.8  


In [7]:
basic_info(countries)

Size:
(248, 6)
Data types:
id                 int64
code              object
name              object
continent         object
wikipedia_link    object
keywords          object
dtype: object
Missing Values:
id                 0
code               1
name               0
continent         41
wikipedia_link     0
keywords          16
dtype: int64
       id code                  name continent  \
0  302672   AD               Andorra        EU   
1  302618   AE  United Arab Emirates        AS   
2  302619   AF           Afghanistan        AS   
3  302722   AG   Antigua and Barbuda       NaN   
4  302723   AI              Anguilla       NaN   

                                      wikipedia_link  \
0              https://en.wikipedia.org/wiki/Andorra   
1  https://en.wikipedia.org/wiki/United_Arab_Emir...   
2          https://en.wikipedia.org/wiki/Afghanistan   
3  https://en.wikipedia.org/wiki/Antigua_and_Barbuda   
4             https://en.wikipedia.org/wiki/Anguilla   

                

In [8]:
basic_info(regions)

Size:
(3927, 8)
Data types:
id                 int64
code              object
local_code        object
name              object
continent         object
iso_country       object
wikipedia_link    object
keywords          object
dtype: object
Missing Values:
id                  0
code                0
local_code          4
name                0
continent         433
iso_country        15
wikipedia_link    264
keywords           97
dtype: int64
       id   code local_code                        name continent iso_country  \
0  302811  AD-02         02              Canillo Parish        EU          AD   
1  302812  AD-03         03               Encamp Parish        EU          AD   
2  302813  AD-04         04           La Massana Parish        EU          AD   
3  302814  AD-05         05               Ordino Parish        EU          AD   
4  302815  AD-06         06  Sant Julià de Lòria Parish        EU          AD   

                                      wikipedia_link  \
0         

In [13]:
# Check for duplicated records
print(airports.duplicated().sum())
print(airport_freq.duplicated().sum())
print(countries.duplicated().sum())
print(regions.duplicated().sum())

0
0
0
0
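
`duplicated()` with no arguments only flags rows that repeat in every column. A key column such as `ident` can still contain repeats when the other columns differ. The sketch below uses a small made-up frame (`demo` is hypothetical, not the real data) to show the `subset` argument for a key-level check.

```python
import pandas as pd

# Hypothetical toy frame: the ident "00A" is repeated on purpose,
# but the two rows differ in their id column.
demo = pd.DataFrame({
    "id": [1, 2, 3],
    "ident": ["00A", "00AA", "00A"],
})

# Whole-row check: no row is an exact copy of another.
full_row_dupes = demo.duplicated().sum()

# Key-level check: the second "00A" is flagged as a duplicate.
key_dupes = demo.duplicated(subset="ident").sum()

print(full_row_dupes, key_dupes)
```

The same call on the real tables would be `airports.duplicated(subset="ident").sum()`, assuming `ident` is meant to be unique.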


#### 1. Select data with multiple conditions

In [14]:
# Find the region code for New York in the regions data frame.

regions.head()

Unnamed: 0,id,code,local_code,name,continent,iso_country,wikipedia_link,keywords
0,302811,AD-02,2,Canillo Parish,EU,AD,https://en.wikipedia.org/wiki/Canillo,Airports in Canillo Parish
1,302812,AD-03,3,Encamp Parish,EU,AD,https://en.wikipedia.org/wiki/Encamp,Airports in Encamp Parish
2,302813,AD-04,4,La Massana Parish,EU,AD,https://en.wikipedia.org/wiki/La_Massana,Airports in La Massana Parish
3,302814,AD-05,5,Ordino Parish,EU,AD,https://en.wikipedia.org/wiki/Ordino,Airports in Ordino Parish
4,302815,AD-06,6,Sant Julià de Lòria Parish,EU,AD,https://en.wikipedia.org/wiki/Sant_Julià_de_Lòria,Airports in Sant Julià de Lòria Parish


In [15]:
countries.head()

Unnamed: 0,id,code,name,continent,wikipedia_link,keywords
0,302672,AD,Andorra,EU,https://en.wikipedia.org/wiki/Andorra,Andorran airports
1,302618,AE,United Arab Emirates,AS,https://en.wikipedia.org/wiki/United_Arab_Emir...,"UAE,مطارات في الإمارات العربية المتحدة"
2,302619,AF,Afghanistan,AS,https://en.wikipedia.org/wiki/Afghanistan,
3,302722,AG,Antigua and Barbuda,,https://en.wikipedia.org/wiki/Antigua_and_Barbuda,Antiguan airports
4,302723,AI,Anguilla,,https://en.wikipedia.org/wiki/Anguilla,


In [16]:
countries[countries['name'] == 'United States'] # This tells us that the country code for the United States is US

Unnamed: 0,id,code,name,continent,wikipedia_link,keywords
229,302755,US,United States,,https://en.wikipedia.org/wiki/United_States,American airports


In [18]:
regions[(regions['iso_country'] == "US") & (regions['local_code'] == "NY")]

Unnamed: 0,id,code,local_code,name,continent,iso_country,wikipedia_link,keywords
3744,306110,US-NY,NY,New York,,US,https://en.wikipedia.org/wiki/New_York,Airports in New York


In [16]:
regions[(regions['iso_country'] == "US") & (regions['name'] == "New York")]

Unnamed: 0,id,code,local_code,name,continent,iso_country,wikipedia_link,keywords
3744,306110,US-NY,NY,New York,,US,https://en.wikipedia.org/wiki/New_York,Airports in New York
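
The two selections above combine boolean masks with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `==`. As a sketch, the same filter can also be written with `DataFrame.query`. The frame `regions_demo` below is a hypothetical miniature of the regions table, used so that the example runs without a download.

```python
import pandas as pd

# Hypothetical miniature of the regions table (only a few columns and rows).
regions_demo = pd.DataFrame({
    "code": ["US-NY", "US-NJ", "CA-ON"],
    "local_code": ["NY", "NJ", "ON"],
    "name": ["New York", "New Jersey", "Ontario"],
    "iso_country": ["US", "US", "CA"],
})

# Boolean-mask form, as used in the cells above.
by_mask = regions_demo[
    (regions_demo["iso_country"] == "US") & (regions_demo["local_code"] == "NY")
]

# Equivalent query() form: column names are written directly in the string.
by_query = regions_demo.query("iso_country == 'US' and local_code == 'NY'")

print(by_query)
```

Both forms return the same single row. Which one reads better is largely a matter of taste.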


In [17]:
# Extract all large airports in New York State from the airports data frame

airports.head()

Unnamed: 0,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,iata_code,local_code,home_link,wikipedia_link,keywords
0,6523,00A,heliport,Total RF Heliport,40.070985,-74.933689,11.0,,US,US-PA,Bensalem,no,K00A,,00A,https://www.penndot.pa.gov/TravelInPA/airports...,,
1,323361,00AA,small_airport,Aero B Ranch Airport,38.704022,-101.473911,3435.0,,US,US-KS,Leoti,no,00AA,,00AA,,,
2,6524,00AK,small_airport,Lowell Field,59.947733,-151.692524,450.0,,US,US-AK,Anchor Point,no,00AK,,00AK,,,
3,6525,00AL,small_airport,Epps Airpark,34.864799,-86.770302,820.0,,US,US-AL,Harvest,no,00AL,,00AL,,,
4,506791,00AN,small_airport,Katmai Lodge Airport,59.093287,-156.456699,80.0,,US,US-AK,King Salmon,no,00AN,,00AN,,,


In [19]:
# Using the region code "US-NY", we can extract all airports in
# New York State
airports[airports['iso_region'] == "US-NY"]

Unnamed: 0,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,iata_code,local_code,home_link,wikipedia_link,keywords
37,321919,00NK,seaplane_base,Cliche Cove Seaplane Base,44.811861,-73.369806,96.0,,US,US-NY,Beekmantown,no,00NK,,00NK,,,
39,6554,00NY,small_airport,Weiss Airfield,42.895760,-77.495263,1000.0,,US,US-NY,West Bloomfield,no,00NY,,00NY,,,
110,6616,01NY,heliport,Vassar Hospital Heliport,41.692415,-73.936830,100.0,,US,US-NY,Poughkeepsie,no,01NY,,01NY,,,
177,6672,02NY,heliport,Hansen Heliport,43.132599,-75.655502,435.0,,US,US-NY,Durhamville,no,02NY,,02NY,,,
245,6732,03NY,small_airport,Talmage Field,40.958308,-72.717326,95.0,,US,US-NY,Riverhead,no,03NY,,03NY,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
71515,511814,US-8134,heliport,Clover Stadium Heliport,41.170806,-74.037556,473.0,,US,US-NY,Ramapo,no,29NY,,29NY,,,
71545,514365,US-8164,heliport,Massena Substation Heliport,44.921495,-74.834651,239.0,,US,US-NY,Massena,no,36NY,,36NY,,,
71556,514709,US-8175,heliport,Wynn Hospital Helipad,43.104244,-75.236025,487.0,,US,US-NY,Utica,no,NY86,,NY86,,,
71565,38690,US-BPA,closed,Grumman Bethpage Airport,40.749401,-73.496002,115.0,,US,US-NY,Bethpage,no,,,,,,"BPA, BPA"


In [20]:
# set(airports['type'].values)

In [21]:
# airports['type'].unique()
airports['type'].value_counts()

small_airport     39964
heliport          19732
closed            10880
medium_airport     4763
seaplane_base      1166
large_airport       470
balloonport          49
Name: type, dtype: int64

In [22]:
airports_NY_large = airports[(airports['iso_region'] == "US-NY") & (airports['type'] == 'large_airport')]

In [21]:
airports_NY_large

Unnamed: 0,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,iata_code,local_code,home_link,wikipedia_link,keywords
35591,3431,KBUF,large_airport,Buffalo Niagara International Airport,42.940498,-78.732201,728.0,,US,US-NY,Buffalo,yes,KBUF,BUF,BUF,,https://en.wikipedia.org/wiki/Buffalo_Niagara_...,
37009,3622,KJFK,large_airport,John F Kennedy International Airport,40.639447,-73.779317,13.0,,US,US-NY,New York,yes,KJFK,JFK,JFK,https://www.jfkairport.com/,https://en.wikipedia.org/wiki/John_F._Kennedy_...,"Manhattan, New York City, NYC, Idlewild, IDL, ..."
37150,3643,KLGA,large_airport,La Guardia Airport,40.777199,-73.872597,21.0,,US,US-NY,New York,yes,KLGA,LGA,LGA,https://www.laguardiaairport.com/,https://en.wikipedia.org/wiki/LaGuardia_Airport,"Manhattan, New York City, NYC, Glenn H. Curtis..."
39419,3913,KSYR,large_airport,Syracuse Hancock International Airport,43.111198,-76.1063,421.0,,US,US-NY,Syracuse,yes,KSYR,SYR,SYR,http://www.syrairport.org/,https://en.wikipedia.org/wiki/Syracuse_Hancock...,


In [23]:
# Extract the name, identification code, and municipality of
# all airports with ISO region "US-NY" and type "large_airport"

airports_NY_large[["name", "ident", "municipality"]].reset_index(drop=True) # reset the index to drop the original row labels

Unnamed: 0,name,ident,municipality
0,Buffalo Niagara International Airport,KBUF,Buffalo
1,John F Kennedy International Airport,KJFK,New York
2,La Guardia Airport,KLGA,New York
3,Syracuse Hancock International Airport,KSYR,Syracuse


#### 2. Sorting

In [23]:
# From airport_freq, extract all communication frequencies for KJFK,
# with frequencies sorted in ascending order

airport_freq.head()

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
0,70518,6528,00CA,CTAF,CTAF,122.9
1,307581,6589,01FL,ARCAL,,122.9
2,75239,6589,01FL,CTAF,CEDAR KNOLL TRAFFIC,122.8
3,60191,6756,04CA,CTAF,CTAF,122.9
4,59287,6779,04MS,UNIC,UNICOM,122.8


In [24]:
# We already know the id of KJFK (3622) from airports_NY_large,
# so we can filter on the airport_ref column.
airport_freq[airport_freq['airport_ref'] == 3622]

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11901,69294,3622,KJFK,ATIS,ATIS,115.1
11902,69295,3622,KJFK,CLD,CLNC DEL,135.05
11903,69296,3622,KJFK,DEP,NEW YORK DEP,135.9
11904,332895,3622,KJFK,GND,GND ALT,121.65
11905,69297,3622,KJFK,GND,GND,121.9
11906,69298,3622,KJFK,RDO,NEW YORK RDO,115.9
11907,69299,3622,KJFK,TWR,KENNEDY TWR,119.1


In [26]:
KJFK_freq = airport_freq[airport_freq['airport_ident'] == 'KJFK']
KJFK_freq

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11901,69294,3622,KJFK,ATIS,ATIS,115.1
11902,69295,3622,KJFK,CLD,CLNC DEL,135.05
11903,69296,3622,KJFK,DEP,NEW YORK DEP,135.9
11904,332895,3622,KJFK,GND,GND ALT,121.65
11905,69297,3622,KJFK,GND,GND,121.9
11906,69298,3622,KJFK,RDO,NEW YORK RDO,115.9
11907,69299,3622,KJFK,TWR,KENNEDY TWR,119.1


In [27]:
KJFK_freq.sort_values(by="frequency_mhz")

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
11901,69294,3622,KJFK,ATIS,ATIS,115.1
11906,69298,3622,KJFK,RDO,NEW YORK RDO,115.9
11907,69299,3622,KJFK,TWR,KENNEDY TWR,119.1
11904,332895,3622,KJFK,GND,GND ALT,121.65
11905,69297,3622,KJFK,GND,GND,121.9
11909,69300,3622,KJFK,UNIC,UNICOM,122.95
11908,332894,3622,KJFK,TWR,TWR ALT,123.9
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4


In [28]:
# From airport_freq, extract all communication frequencies for KJFK,
# with frequencies sorted in descending order

KJFK_freq.sort_values(by="frequency_mhz", ascending=False)

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
11903,69296,3622,KJFK,DEP,NEW YORK DEP,135.9
11902,69295,3622,KJFK,CLD,CLNC DEL,135.05
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11908,332894,3622,KJFK,TWR,TWR ALT,123.9
11909,69300,3622,KJFK,UNIC,UNICOM,122.95
11905,69297,3622,KJFK,GND,GND,121.9
11904,332895,3622,KJFK,GND,GND ALT,121.65
11907,69299,3622,KJFK,TWR,KENNEDY TWR,119.1


In [None]:
# Find the five rows with the largest frequency values

# KJFK_freq itself is unsorted, so sort in descending order first;
# by default head() returns the first 5 rows
KJFK_freq.sort_values(by="frequency_mhz", ascending=False).head()
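
For a top-N selection, `DataFrame.nlargest` is a possible shortcut for sorting in descending order and then taking the head. The sketch below uses a small frame, `kjfk_demo`, holding ten type and frequency pairs copied from the KJFK output above. The variable name is an assumption made for this illustration.

```python
import pandas as pd

# Ten KJFK rows copied from the unsorted output above (two columns only).
kjfk_demo = pd.DataFrame({
    "type": ["APP", "APP", "APP", "ATIS", "CLD", "DEP", "GND", "GND", "RDO", "TWR"],
    "frequency_mhz": [125.7, 127.4, 132.4, 115.1, 135.05, 135.9, 121.65, 121.9, 115.9, 119.1],
})

# Sort descending, then take the first five rows.
top5_sorted = kjfk_demo.sort_values(by="frequency_mhz", ascending=False).head(5)

# Same selection in one call.
top5_nlargest = kjfk_demo.nlargest(5, "frequency_mhz")

print(top5_nlargest)
```

Calling `head()` on the unsorted frame would instead return the first five rows in file order, which are not the five largest values.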

#### 3. Filter on a list of values

In [31]:
# Print the communication frequencies used by each large NY airport,
# one airport at a time
airport_ids = airports_NY_large['id']

for airport_id in airport_ids:
    print(airport_freq[airport_freq['airport_ref'] == airport_id])

         id  airport_ref airport_ident  type      description  frequency_mhz
7982  69857         3431          KBUF   A/D  Buffalo APP/DEP         126.15
7983  69858         3431          KBUF  ATIS             ATIS         135.35
7984  69859         3431          KBUF   CLD         CLNC DEL         124.70
7985  69860         3431          KBUF   GND              GND         133.20
7986  69861         3431          KBUF   RDO              RDO         122.60
7987  69862         3431          KBUF   TWR              TWR         120.50
           id  airport_ref airport_ident  type                description  \
11898   69293         3622          KJFK   APP       NEW YORK APP (ROBER)   
11899  301312         3622          KJFK   APP  NEW YORK APPROACH (CAMRN)   
11900  301313         3622          KJFK   APP  NEW YORK APPROACH (FINAL)   
11901   69294         3622          KJFK  ATIS                       ATIS   
11902   69295         3622          KJFK   CLD                   CLNC DEL   

In [35]:
# Extract all communication frequencies used by the large airports in New York State

freq_NY_airports = pd.DataFrame()

for ident in airports_NY_large['ident']:
#     print(ident) # verify that the identification codes are extracted correctly
    freq_airport = airport_freq[airport_freq['airport_ident'] == ident]
#     print(freq_airport) # verify that the frequencies are extracted correctly
    freq_NY_airports = pd.concat([freq_NY_airports, freq_airport])

freq_NY_airports

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
7982,69857,3431,KBUF,A/D,Buffalo APP/DEP,126.15
7983,69858,3431,KBUF,ATIS,ATIS,135.35
7984,69859,3431,KBUF,CLD,CLNC DEL,124.7
7985,69860,3431,KBUF,GND,GND,133.2
7986,69861,3431,KBUF,RDO,RDO,122.6
7987,69862,3431,KBUF,TWR,TWR,120.5
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11901,69294,3622,KJFK,ATIS,ATIS,115.1


In [36]:
idents = airports_NY_large['ident']
filter1 = airport_freq['airport_ident'].isin(idents)
airport_freq[filter1]

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
7982,69857,3431,KBUF,A/D,Buffalo APP/DEP,126.15
7983,69858,3431,KBUF,ATIS,ATIS,135.35
7984,69859,3431,KBUF,CLD,CLNC DEL,124.7
7985,69860,3431,KBUF,GND,GND,133.2
7986,69861,3431,KBUF,RDO,RDO,122.6
7987,69862,3431,KBUF,TWR,TWR,120.5
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11901,69294,3622,KJFK,ATIS,ATIS,115.1


In [37]:
airport_freq[airport_freq['airport_ident'].isin(airports_NY_large['ident'])]

Unnamed: 0,id,airport_ref,airport_ident,type,description,frequency_mhz
7982,69857,3431,KBUF,A/D,Buffalo APP/DEP,126.15
7983,69858,3431,KBUF,ATIS,ATIS,135.35
7984,69859,3431,KBUF,CLD,CLNC DEL,124.7
7985,69860,3431,KBUF,GND,GND,133.2
7986,69861,3431,KBUF,RDO,RDO,122.6
7987,69862,3431,KBUF,TWR,TWR,120.5
11898,69293,3622,KJFK,APP,NEW YORK APP (ROBER),125.7
11899,301312,3622,KJFK,APP,NEW YORK APPROACH (CAMRN),127.4
11900,301313,3622,KJFK,APP,NEW YORK APPROACH (FINAL),132.4
11901,69294,3622,KJFK,ATIS,ATIS,115.1
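
The loop with `pd.concat` and the `isin` filter select the same rows, but they may not return them in the same order. The loop groups rows by airport, while `isin` keeps the original row order of the frequency table. The sketch below shows this on a small frame, `freq_demo`, built from a few rows of the outputs above and deliberately interleaved. The interleaving is an assumption made for illustration.

```python
import pandas as pd

# A few frequency rows taken from the outputs above, interleaved on purpose.
freq_demo = pd.DataFrame({
    "airport_ident": ["KBUF", "KJFK", "00CA", "KJFK", "KBUF"],
    "type": ["TWR", "TWR", "CTAF", "GND", "GND"],
    "frequency_mhz": [120.5, 119.1, 122.9, 121.9, 133.2],
})
idents = ["KBUF", "KJFK"]

# Loop version: one filter per airport, concatenated once at the end.
pieces = [freq_demo[freq_demo["airport_ident"] == ident] for ident in idents]
by_loop = pd.concat(pieces)

# Vectorized version: a single membership test.
by_isin = freq_demo[freq_demo["airport_ident"].isin(idents)]

# After restoring the original row order, the two results are identical.
same_rows = by_loop.sort_index().equals(by_isin)
print(by_loop.index.tolist(), by_isin.index.tolist(), same_rows)
```

Collecting the pieces in a list and calling `pd.concat` once also avoids re-copying the growing frame on every iteration.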


#### 4. Grouping

In [38]:
countries.head()

Unnamed: 0,id,code,name,continent,wikipedia_link,keywords
0,302672,AD,Andorra,EU,https://en.wikipedia.org/wiki/Andorra,Andorran airports
1,302618,AE,United Arab Emirates,AS,https://en.wikipedia.org/wiki/United_Arab_Emir...,"UAE,مطارات في الإمارات العربية المتحدة"
2,302619,AF,Afghanistan,AS,https://en.wikipedia.org/wiki/Afghanistan,
3,302722,AG,Antigua and Barbuda,,https://en.wikipedia.org/wiki/Antigua_and_Barbuda,Antiguan airports
4,302723,AI,Anguilla,,https://en.wikipedia.org/wiki/Anguilla,


In [39]:
airports.head()

Unnamed: 0,id,ident,type,name,latitude_deg,longitude_deg,elevation_ft,continent,iso_country,iso_region,municipality,scheduled_service,gps_code,iata_code,local_code,home_link,wikipedia_link,keywords
0,6523,00A,heliport,Total RF Heliport,40.070985,-74.933689,11.0,,US,US-PA,Bensalem,no,K00A,,00A,https://www.penndot.pa.gov/TravelInPA/airports...,,
1,323361,00AA,small_airport,Aero B Ranch Airport,38.704022,-101.473911,3435.0,,US,US-KS,Leoti,no,00AA,,00AA,,,
2,6524,00AK,small_airport,Lowell Field,59.947733,-151.692524,450.0,,US,US-AK,Anchor Point,no,00AK,,00AK,,,
3,6525,00AL,small_airport,Epps Airpark,34.864799,-86.770302,820.0,,US,US-AL,Harvest,no,00AL,,00AL,,,
4,506791,00AN,small_airport,Katmai Lodge Airport,59.093287,-156.456699,80.0,,US,US-AK,King Salmon,no,00AN,,00AN,,,


In [44]:
# Calculate the number of large airports for each country

airports_by_country = pd.DataFrame()

for country_code in countries['code']:
    # extract the large airports from that country
    large_airports_country = airports[(airports['iso_country'] == country_code) & (airports['type'] == "large_airport")]
#     print(large_airports_country)
    # count the number of large airports
    num_large_airport = large_airports_country.shape[0]

    # add a row in airports_by_country with the country code and the number of large airports
    airports_by_country.loc[country_code, 'Number of Large Airports'] = num_large_airport

airports_by_country

Unnamed: 0,Number of Large Airports
AD,0.0
AE,4.0
AF,0.0
AG,0.0
AI,0.0
...,...
YT,0.0
ZA,3.0
ZM,1.0
ZW,1.0


In [52]:
large_airports = airports[airports['type'] == 'large_airport']
# large_airports.head()
large_airports.groupby('iso_country').size().to_frame("Number of Large Airports")

Unnamed: 0_level_0,Number of Large Airports
iso_country,Unnamed: 1_level_1
AE,4
AL,1
AM,1
AO,1
AR,2
...,...
VN,2
VU,1
ZA,3
ZM,1


In [64]:
# Alternatively, split the records by country and type
airport_counts = airports.groupby(['iso_country', 'type']).size().unstack()
results = airport_counts[['large_airport']].copy()
results.head()

type,large_airport
iso_country,Unnamed: 1_level_1
AD,
AE,4.0
AF,
AG,
AI,


In [62]:
# An interesting challenge is to convert all NaNs to 0.
for index in results.index:
    # if the large_airport value is NaN, change it to 0.
    if pd.isnull(results.loc[index, 'large_airport']):
        results.loc[index, 'large_airport'] = 0
results.head(10)

type,large_airport
iso_country,Unnamed: 1_level_1
AD,0.0
AE,4.0
AF,0.0
AG,0.0
AI,0.0
AL,1.0
AM,1.0
AO,1.0
AQ,0.0
AR,2.0


In [66]:
results = results.fillna(0)
results.head(10)

type,large_airport
iso_country,Unnamed: 1_level_1
AD,0.0
AE,4.0
AF,0.0
AG,0.0
AI,0.0
AL,1.0
AM,1.0
AO,1.0
AQ,0.0
AR,2.0
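
The counts above are floats (`0.0`, `4.0`) because `unstack()` first produces `NaN` for missing country and type combinations. One possible way to avoid both the `NaN` values and the float conversion is to pass `fill_value=0` to `unstack`. The sketch below uses a made-up frame, `airports_demo`, whose rows are hypothetical and do not reflect real counts.

```python
import pandas as pd

# Hypothetical toy data: only AE has any large airports.
airports_demo = pd.DataFrame({
    "iso_country": ["AD", "AE", "AE", "AF", "AE"],
    "type": ["heliport", "large_airport", "large_airport",
             "small_airport", "small_airport"],
})

# Missing (country, type) pairs are filled with 0 while unstacking,
# so the counts stay integers.
counts = airports_demo.groupby(["iso_country", "type"]).size().unstack(fill_value=0)
large = counts["large_airport"]

print(large)
```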


In [69]:
# Add country names to the results
# countries.head()
final_results = pd.merge(results, countries, left_index=True, 
                         right_on='code', how='left')
final_results.head()

Unnamed: 0,large_airport,id,code,name,continent,wikipedia_link,keywords
0,0.0,302672,AD,Andorra,EU,https://en.wikipedia.org/wiki/Andorra,Andorran airports
1,4.0,302618,AE,United Arab Emirates,AS,https://en.wikipedia.org/wiki/United_Arab_Emir...,"UAE,مطارات في الإمارات العربية المتحدة"
2,0.0,302619,AF,Afghanistan,AS,https://en.wikipedia.org/wiki/Afghanistan,
3,0.0,302722,AG,Antigua and Barbuda,,https://en.wikipedia.org/wiki/Antigua_and_Barbuda,Antiguan airports
4,0.0,302723,AI,Anguilla,,https://en.wikipedia.org/wiki/Anguilla,


In [70]:
final_results = final_results[['name', 'large_airport']]
final_results.head(10)

Unnamed: 0,name,large_airport
0,Andorra,0.0
1,United Arab Emirates,4.0
2,Afghanistan,0.0
3,Antigua and Barbuda,0.0
4,Anguilla,0.0
5,Albania,1.0
6,Armenia,1.0
7,Angola,1.0
8,Antarctica,0.0
9,Argentina,2.0


In [73]:
# Alternatively, drop the unwanted columns before merging
final_results = pd.merge(results, countries[['code', 'name']], left_index=True, 
                         right_on='code', how='left')
final_results.drop('code', inplace=True, axis=1)
final_results.head()

Unnamed: 0,large_airport,name
0,0.0,Andorra
1,4.0,United Arab Emirates
2,0.0,Afghanistan
3,0.0,Antigua and Barbuda
4,0.0,Anguilla


In [74]:
# Find the top 10 countries with the most large airports

final_results.sort_values('large_airport', ascending=False).head(10)

Unnamed: 0,large_airport,name
229,67.0,United States
45,39.0,China
188,16.0,Russia
154,16.0,Mexico
107,14.0,Italy
35,13.0,Canada
111,12.0,Japan
65,11.0,Spain
29,10.0,Brazil
102,10.0,India
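
When only the ranking by country code is needed, `Series.value_counts` offers a shorter route, since it counts and sorts in descending order in one step. The sketch below uses a hypothetical column of country codes, `large_demo`, in place of the filtered large-airport rows.

```python
import pandas as pd

# Hypothetical country codes, one entry per large airport.
large_demo = pd.DataFrame({"iso_country": ["US", "CN", "US", "IT", "US", "CN"]})

# Count occurrences and keep the two most frequent codes.
top = large_demo["iso_country"].value_counts().head(2)

print(top)
```

Unlike the merged table above, this result contains only countries with at least one matching row, and it is indexed by country code rather than by name.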


#### 5. Merging

In [None]:
# Merge the above result with the countries data frame to find the country names



In [None]:
# Append full country name and region name to airports.
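
One possible way to approach this task is sketched below with two left merges. The three small frames (`airports_demo`, `countries_demo`, `regions_demo`) and the output column names `country_name` and `region_name` are assumptions made for this illustration. The key columns follow the real tables: `airports.iso_country` matches `countries.code`, and `airports.iso_region` matches `regions.code`.

```python
import pandas as pd

# Hypothetical miniatures of the three tables.
airports_demo = pd.DataFrame({
    "ident": ["KJFK", "KBUF", "00AK"],
    "iso_country": ["US", "US", "US"],
    "iso_region": ["US-NY", "US-NY", "US-AK"],
})
countries_demo = pd.DataFrame({
    "code": ["US", "AD"],
    "name": ["United States", "Andorra"],
})
regions_demo = pd.DataFrame({
    "code": ["US-NY", "US-AK"],
    "name": ["New York", "Alaska"],
})

# Rename before merging so the "code" and "name" columns of the two
# lookup tables do not collide with each other or with airport columns.
country_names = countries_demo.rename(
    columns={"code": "iso_country", "name": "country_name"})
region_names = regions_demo.rename(
    columns={"code": "iso_region", "name": "region_name"})

# Left merges keep every airport, even if a code has no match.
airports_named = (
    airports_demo
    .merge(country_names, on="iso_country", how="left")
    .merge(region_names, on="iso_region", how="left")
)

print(airports_named)
```

On the real tables, selecting only `['code', 'name']` from `countries` and `regions` before renaming keeps the extra columns (links, keywords) out of the result.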

