# Python Scraping Demo

Dependencies:
* BeautifulSoup: a library for traversing the DOM in HTML pages (e.g. going through each `<table>` element in a web page).
* Scrapy: a library for crawling the web (i.e. going from one page to another by following the links that connect them).
* (Optional) `urllib3` or `requests`: more powerful libraries for dealing with URLs. You can also use the built-in `urllib` (the code below uses `urllib.request`; `urllib2` exists only in Python 2). You can install these libraries with `pip3 install urllib3` or `pip3 install requests`.

To install these, you should have two things installed:
* Python 3: This is the latest major version of Python and is actively maintained, although you should be aware that many applications are still built in Python 2 for legacy reasons.
    * https://www.python.org
* pip: a Python package manager that installs packages/libraries for you. Ensure that you install pip for Python 3 (i.e. pip3).
    * https://pip.pypa.io/en/stable/
* (Optional) Jupyter notebook: lets you create and work in a notebook like this one:
    * `pip3 install jupyter`

In [131]:
from bs4 import BeautifulSoup
# import scrapy
import urllib.request
import csv

### Skytrax Scraping
Using Skytrax to combine the rankings and find subcategories.

Skytrax lists 555 Airports each year.

BeautifulSoup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Pages listing the Top 100 airports for 2012-2016:
* http://www.worldairportawards.com/Awards/world_airport_rating.html
* http://www.worldairportawards.com/Awards/world_airport_rating_2015.html
* http://www.worldairportawards.com/Awards/world_airport_rating_2014.html
* http://www.worldairportawards.com/Awards/world_airport_rating_2013.html
* http://www.worldairportawards.com/Awards/world_airport_rating_2012.html

How do we figure out where to scrape from?<br />
1) Right click on a word you want to include and click on "Inspect element":<br />
<!--![title](Instructions 1.png)-->
<img src="Instructions 1.png" height="400px" width="400px" />
2) Find the HTML tags associated with our element:<br />
<!--![title](Instructions 2.png)-->
<img src="Instructions 2.png" height="600px" width="600px" />

We notice a lot of `<tr>` elements surrounding each row of airport entries that we want; so let's try to grab each `<tr>` element that we think is important:

Observe the following code for this example:<br />
`<tr>`<br />
    `<td class="left">1</td>`<br />
    `<td class="middle">Singapore Airport</td>`<br />
    `<td class="right">1</td>`<br />
`</tr>`
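
As a minimal sketch of how those three cells map to one record, the snippet below parses the sample row with Python's built-in `html.parser` instead of BeautifulSoup, purely so that it runs without extra installs. `RowParser` and `SAMPLE_ROW` are illustrative names and are not part of this notebook's code.

```python
from html.parser import HTMLParser

SAMPLE_ROW = """
<tr>
    <td class="left">1</td>
    <td class="middle">Singapore Airport</td>
    <td class="right">1</td>
</tr>
"""


class RowParser(HTMLParser):
    """Collect the text of each <td> in a row, keyed by its class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []              # one dict per <tr>
        self._current_class = None  # class of the <td> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({})
        elif tag == "td":
            self._current_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._current_class = None

    def handle_data(self, data):
        # Ignore whitespace between tags; only keep text inside a <td>
        if self._current_class is not None and self.rows:
            self.rows[-1][self._current_class] = data.strip()


parser = RowParser()
parser.feed(SAMPLE_ROW)
row = parser.rows[0]
print(row)  # {'left': '1', 'middle': 'Singapore Airport', 'right': '1'}
```

In the scraping code, `left` is read as the current rank, `middle` as the airport name, and `right` as the previous year's rank.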

In [101]:
# Show what happens without the [1:-1]

In [102]:
apts = set()
ratings_16 = {}
ratings_15 = {}
ratings_14 = {}
ratings_13 = {}
ratings_12 = {}
ratings_11 = {}

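def read_page(curr_dict, page, prev_dict):
    """Scrape one yearly ranking page.

    Fills curr_dict with {airport name: rank on this page} and prev_dict with
    {airport name: previous-year rank}. Ranks are kept as strings here.
    """
    soup = BeautifulSoup(urllib.request.urlopen(page).read(), "lxml")
    # [1:-1] drops the first and last matching cells
    for tag in soup.find_all("td", class_="left")[1:-1]:
        curr_rating = tag.get_text()
        apt_name = tag.find_next_sibling("td", class_="middle").get_text()
        prev_rating = tag.find_next_sibling("td", class_="right").get_text()
        apts.add(apt_name)
        if apt_name in curr_dict:
            # Rank stored from the newer page's "previous" column disagrees with this page
            if (curr_dict[apt_name] != curr_rating):
                print("Error for " + apt_name)
                print(page)
                print(curr_rating + " vs. " + curr_dict[apt_name])
                curr_dict[apt_name] = curr_rating
        else:
            curr_dict[apt_name] = curr_rating
        prev_dict[apt_name] = prev_rating
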

In [103]:
read_page(ratings_16, "http://www.worldairportawards.com/Awards/world_airport_rating.html", ratings_15)
print(ratings_16)
print(ratings_15)

{'Tokyo Intl  Haneda': '4', 'Narita Intl Airport': '11', 'Cincinnati/Kentucky': '32', 'Amsterdam Schiphol': '13', 'Vancouver Airport': '14', 'Madrid-Barajas Airport': '31', 'Johannesburg Intl': '30', 'Sydney Airport': '23', 'San Francisco Airport': '37', 'Lisbon Airport': '57', 'Beijing Capital Airport': '16', 'Kansai Intl Airport': '9', 'Boston Logan Airport ': '97', 'Geneva Intl Airport': '96', 'Durban Airport ': '35', 'Toronto Pearson': '44', 'Copenhagen Airport': '18', 'Gold Coast Airport': '55', 'Paris CDG Airport': '33', 'Nice Airport': '84', 'Seattle-Tacoma ': '54', 'Frankfurt Airport': '12', 'Keflavik Intl Airport': '72', 'Barcelona Airport': '27', 'Abu Dhabi Airport': '38', 'Houston George Bush': '71', 'Shenzhen Airport': '77', 'Brussels Airport': '83', 'Incheon Intl Airport': '2', 'Taiwan Taoyuan Airport ': '20', 'Brisbane Airport': '17', 'Doha Hamad Airport': '10', 'Montréal Intl Airport': '88', 'Birmingham Airport': '87', 'London Heathrow': '8', 'Shanghai Hongqiao': '34', '

In [104]:
read_page(ratings_15, "http://www.worldairportawards.com/Awards/world_airport_rating_2015.html", ratings_14)
read_page(ratings_14, "http://www.worldairportawards.com/Awards/world_airport_rating_2014.html", ratings_13)
read_page(ratings_13, "http://www.worldairportawards.com/Awards/world_airport_rating_2013.html", ratings_12)
read_page(ratings_12, "http://www.worldairportawards.com/Awards/world_airport_rating_2012.html", ratings_11)

Error for Copenhagen Airport
http://www.worldairportawards.com/Awards/world_airport_rating_2013.html
17 vs. 12
Error for Dallas/Fort Worth
http://www.worldairportawards.com/Awards/world_airport_rating_2013.html
54 vs. 49


In [105]:
ratings_tbls = [ratings_16, ratings_15, ratings_14, ratings_13, ratings_12, ratings_11]

def add_rating(apt_key, r_list, r_tbl):
    if apt_key in r_tbl:
        r_list.append(r_tbl[apt_key])
    else:
        r_list.append("--")

cumul_ratings = {}
for apt in apts:
    ratings = []
    for ratings_tbl in ratings_tbls:
        add_rating(apt, ratings, ratings_tbl)
    cumul_ratings[apt] = ratings

In [106]:
print(cumul_ratings)

{'Tokyo Intl  Haneda': ['4', '5', '--', '--', '--', '--'], 'Porto Airport': ['--', '65', '63', '53', '55', '55'], 'Minneapolis-Saint Paul': ['--', '--', '67', '71', '--', '--'], 'London Stansted ': ['--', '--', '--', '41', '42', '--'], 'London City': ['--', '--', '--', '34', '37', '45'], 'Shanghai PuDong': ['--', '--', '--', '--', '32', '39'], 'Heathrow Airport': ['--', '--', '10', '10', '--', '--'], 'London Stansted': ['--', '73', '49', '41', '42', '47'], 'Philadelphia Airport': ['--', '--', '--', '--', '97', '100'], 'Madrid-Barajas Airport': ['31', '27', '41', '47', '--', '--'], 'Charlotte Douglas': ['--', '--', '--', '--', '88', '87'], 'Jakarta Intl Airport': ['--', '57', '60', '--', '--', '--'], 'Christchurch Airport': ['--', '76', '74', '64', '70', '65'], 'San Francisco Airport': ['37', '36', '39', '40', '--', '--'], 'Madrid Barajas': ['--', '--', '--', '--', '38', '22'], 'Prague Airport': ['93', '85', '70', '61', '54', '44'], 'Minneapolis St.Paul': ['--', '--', '--', '--', '65', 
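
Several airports show up twice in this table because the scraped names differ only in whitespace (for example `'London Stansted '` and `'London Stansted'`). Below is a small sketch of one way to normalize names before using them as dictionary keys. `normalize_name` is a hypothetical helper, not something the cells above call.

```python
import re


def normalize_name(name):
    """Collapse runs of whitespace to a single space and trim both ends."""
    # In Python 3, \s on a str pattern also matches the non-breaking space \xa0.
    return re.sub(r"\s+", " ", name).strip()


raw_names = ["London Stansted ", "London Stansted", "Tokyo Intl  Haneda"]
clean_names = sorted({normalize_name(n) for n in raw_names})
print(clean_names)  # ['London Stansted', 'Tokyo Intl Haneda']
```

This only merges whitespace variants. Names that are spelled differently from year to year (such as `'Madrid Barajas'` and `'Madrid-Barajas Airport'`) still need a separate mapping.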

In [107]:
for ratings_tbl in ratings_tbls:
    print(len(ratings_tbl))

100
151
142
147
136
99


In [108]:
print(len(cumul_ratings))

229


In [109]:
rat_15s = set()
rat_15s_nonset = []
for apt in ratings_15:
    rat_15s.add(ratings_15[apt])
    rat_15s_nonset.append(ratings_15[apt])
print(len(rat_15s))
print(len(rat_15s_nonset))
print(rat_15s_nonset)

103
151
['5', '14', '65', '30', '92', '9', '11', '99', '30', '27', '27', '24', '57', '4', '21', '36', '52', '10', '96', '12', '88', '33', '83', '88', '66', '95', '28', '43', '23', '16', '42', '48', '19', '69', '79', '54', '13', '67', '94', '37', '29', '101', '72', '11', '78', '2', '17', '20', '22', '87', '82', '8', '31', '39', '60', '45', '74', '4', '57', '79', '44', '73', '74', '53', '34', '6', '59', '75', '38', '32', '77', '49', '141', '47', '103', '97', '23', '18', '81', '24', '33', '55', '105', '65', '36', '58', '10', '15', '35', '60', '28', '67', '26', '19', '51', '98', '41', '25', '7', '69', '80', '51', '70', '94', '97', '44', '1', '29', '31', '62', '85', '73', '84', '40', '64', '218', '54', '63', '46', '50', '7', '91', '17', '53', '40', '71', '3', '55', '87', '64', '61', '86', '15', '39', '47', '76', '68', '89', '89', '93', '95', '38', '76', '22', '18', '66', '56', '5', '96', '83', '46']
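
The gap between 103 unique values and 151 entries means that some previous-year ranks are shared by more than one name. One way to see which ranks repeat, sketched with `collections.Counter` on the first eleven values printed above (the variable names are illustrative):

```python
from collections import Counter

# First eleven values from the rat_15s_nonset output above
sample_ranks = ['5', '14', '65', '30', '92', '9', '11', '99', '30', '27', '27']

counts = Counter(sample_ranks)
# Keep only the ranks that occur more than once
duplicates = {rank: n for rank, n in counts.items() if n > 1}
print(duplicates)  # {'30': 2, '27': 2}
```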


In [110]:
apts_set = set()

# Naming: *_ator maps airport name -> rank; *_rtoa maps rank -> airport name.
ratings_16_ator = {}
ratings_15_ator = {}
ratings_14_ator = {}
ratings_13_ator = {}
ratings_12_ator = {}
ratings_11_ator = {}

ratings_16_rtoa = {}
ratings_15_rtoa = {}
ratings_14_rtoa = {}
ratings_13_rtoa = {}
ratings_12_rtoa = {}
ratings_11_rtoa = {}


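def read_page_16(curr_dict_ator, curr_dict_rtoa, page, prev_dict_ator, prev_dict_rtoa):
    """Scrape the newest page: fill both lookup directions for the current and previous year, with no conflict checks."""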
    soup = BeautifulSoup(urllib.request.urlopen(page).read(), "lxml")
    for tag in soup.find_all("td", class_="left")[1:-1]:
        curr_rating = int(tag.get_text())
        apt_name = tag.find_next_sibling("td", class_="middle").get_text().strip()
        prev_rating = int(tag.find_next_sibling("td", class_="right").get_text())
        curr_dict_rtoa[curr_rating] = apt_name
        curr_dict_ator[apt_name] = curr_rating
        prev_dict_rtoa[prev_rating] = apt_name
        prev_dict_ator[apt_name] = prev_rating

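def read_page_2(curr_dict_ator, curr_dict_rtoa, page, prev_dict_ator, prev_dict_rtoa):
    """Like read_page_16, but report name conflicts for ranks already stored and overwrite them with this page's spelling."""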
    soup = BeautifulSoup(urllib.request.urlopen(page).read(), "lxml")
    for tag in soup.find_all("td", class_="left")[1:]:
        curr_rating = int(tag.get_text())
        apt_name = tag.find_next_sibling("td", class_="middle").get_text().strip()
        prev_rating = int(tag.find_next_sibling("td", class_="right").get_text())
        # apts.add(apt_name)
        if curr_rating in curr_dict_rtoa:
            if (curr_dict_rtoa[curr_rating] != apt_name):
                print(str(curr_rating) + ": " + apt_name + " vs. " + curr_dict_rtoa[curr_rating])
                curr_dict_rtoa[curr_rating] = apt_name
                if apt_name in curr_dict_ator:
                    print("ERROR: " + apt_name + " " + str(curr_dict_ator[apt_name]) + " vs. " + str(curr_rating))
                curr_dict_ator[apt_name] = curr_rating
        else:
            curr_dict_rtoa[curr_rating] = apt_name
            curr_dict_ator[apt_name] = curr_rating
        prev_dict_rtoa[prev_rating] = apt_name
        prev_dict_ator[apt_name] = prev_rating

In [111]:
# Need to add CSV Reader afterwards
# Double count of prev_rating "49" in 2014 leads to problem with DFW and Moscow-Sheremetyevo

print("2016:")
read_page_16(ratings_16_ator, ratings_16_rtoa, "http://www.worldairportawards.com/Awards/world_airport_rating.html", ratings_15_ator, ratings_15_rtoa)
print("2015:")
read_page_2(ratings_15_ator, ratings_15_rtoa, "http://www.worldairportawards.com/Awards/world_airport_rating_2015.html", ratings_14_ator, ratings_14_rtoa)
print("2014:")
read_page_2(ratings_14_ator, ratings_14_rtoa, "http://www.worldairportawards.com/Awards/world_airport_rating_2014.html", ratings_13_ator, ratings_13_rtoa)
print("2013:")
read_page_2(ratings_13_ator, ratings_13_rtoa, "http://www.worldairportawards.com/Awards/world_airport_rating_2013.html", ratings_12_ator, ratings_12_rtoa)
print("2012:")
read_page_2(ratings_12_ator, ratings_12_rtoa, "http://www.worldairportawards.com/Awards/world_airport_rating_2012.html", ratings_11_ator, ratings_11_rtoa)


2016:
2015:
4: Hong Kong Intl vs. Hong Kong Intl Airport
5: Tokyo Intl Haneda vs. Tokyo Intl  Haneda
7: Central Japan Intl vs. Centrair Airport
10: Beijing Capital vs. Beijing Capital Airport
11: Vancouver Intl Airport vs. Vancouver Airport
15: Auckland Intl Airport vs. Auckland Airport
17: Taiwan Taoyuan vs. Taiwan Taoyuan Airport
18: Helsinki-Vantaa vs. Helsinki Airport
19: Kuala Lumpur Intl vs. Kuala Lumpur Airport
22: Hamad Intl Airport vs. Doha Hamad Airport
23: Cologne / Bonn vs. Cologne/Bonn Airport
24: Johannesburg Airport vs. Johannesburg Intl
27: Madrid-Barajas vs. Madrid-Barajas Airport
28: Durban Intl Airport vs. Durban Airport
29: Abu Dhabi Intl Airport vs. Abu Dhabi Airport
30: Cincinnati vs. Cincinnati/Kentucky
33: Denver Intl Airport vs. Denver Airport
36: San Francisco Intl vs. San Francisco Airport
38: Vienna Intl Airport vs. Vienna Airport
39: Dubai Intl Airport vs. Dubai Airport
40: London Gatwick vs. Gatwick Airport
44: Atlanta vs. Hartsfield-Jackson
46: Bahrain In

In [112]:
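def list_integrity_check(dict_rtoa, dict_ator):
    # Print every rank from 1 to 100 that has no entry in dict_rtoa
    # (dict_ator is unused in this version of the check)
    for i in range(100):
        if (i+1 not in dict_rtoa):
            print(i+1)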
print("2016:")
list_integrity_check(ratings_16_rtoa, ratings_16_ator)
print("2015:")
list_integrity_check(ratings_15_rtoa, ratings_15_ator)
print("2014:")
list_integrity_check(ratings_14_rtoa, ratings_14_ator)
print("2013:")
list_integrity_check(ratings_13_rtoa, ratings_13_ator)
print("2012:")
list_integrity_check(ratings_12_rtoa, ratings_12_ator)
print("2011:")
list_integrity_check(ratings_11_rtoa, ratings_11_ator)

2016:
2015:
90
2014:
2013:
2012:
2011:
30
93
94
96
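
The same missing-rank check can be sketched as a set difference that returns the gaps instead of printing them. `missing_ranks` and the toy table below are made up for illustration and are not part of the notebook's data.

```python
def missing_ranks(dict_rtoa, top_n=100):
    """Return the ranks in 1..top_n that have no entry in dict_rtoa."""
    return sorted(set(range(1, top_n + 1)) - set(dict_rtoa))


# Toy rank -> airport table with ranks 3 and 5 absent (made-up data)
toy_rtoa = {1: "Airport A", 2: "Airport B", 4: "Airport D"}
gaps = missing_ranks(toy_rtoa, top_n=5)
print(gaps)  # [3, 5]
```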


In [113]:
# Not everything was captured because of errors in the HTML pages
# (e.g. the 2015 row for Shanghai Pudong, #90, has two <td class="middle"> cells).
# Let's add it manually:
ratings_15_rtoa[90] = "Shanghai Pudong"
ratings_15_ator["Shanghai Pudong"] = 90

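def list_integrity_check(dict_rtoa, dict_ator):
    """Print ranks whose two lookups disagree or that exceed 100, then the entry count."""
    sum_rats = 0
    for rating in dict_rtoa:
        # Round trip rank -> name -> rank should give back the same rank
        if (rating != dict_ator[dict_rtoa[rating]]):
            print(rating)
        # Flag ranks that fall outside the Top 100
        if (rating > 100):
            print(rating)
        sum_rats += 1
    print("TOTAL: " + str(sum_rats))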
print("2016:")
list_integrity_check(ratings_16_rtoa, ratings_16_ator)
print("2015:")
list_integrity_check(ratings_15_rtoa, ratings_15_ator)
print("2014:")
list_integrity_check(ratings_14_rtoa, ratings_14_ator)
print("2013:")
list_integrity_check(ratings_13_rtoa, ratings_13_ator)
print("2012:")
list_integrity_check(ratings_12_rtoa, ratings_12_ator)
print("2011:")
list_integrity_check(ratings_11_rtoa, ratings_11_ator)

2016:
TOTAL: 100
2015:
101
103
105
141
218
TOTAL: 105
2014:
102
109
111
260
TOTAL: 104
2013:
101
102
104
113
133
176
182
TOTAL: 107
2012:
101
104
105
106
TOTAL: 104
2011:
101
102
105
TOTAL: 99
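
Because the check above prints mismatches and out-of-range ranks in the same stream, the two kinds of problem are hard to tell apart in the output. A minimal sketch that reports them separately, assuming a hypothetical helper `find_problems` and made-up toy tables:

```python
def find_problems(dict_rtoa, dict_ator, max_rank=100):
    """Return (ranks whose round trip disagrees, ranks above max_rank)."""
    # .get() avoids a KeyError when a name is missing from dict_ator
    mismatched = sorted(r for r, name in dict_rtoa.items()
                        if dict_ator.get(name) != r)
    out_of_range = sorted(r for r in dict_rtoa if r > max_rank)
    return mismatched, out_of_range


# Made-up data: rank 2 disagrees between the two tables, rank 141 is above 100
toy_rtoa = {1: "Airport A", 2: "Airport B", 141: "Airport C"}
toy_ator = {"Airport A": 1, "Airport B": 3, "Airport C": 141}

mismatched, out_of_range = find_problems(toy_rtoa, toy_ator)
print(mismatched, out_of_range)  # [2] [141]
```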


In [114]:
chain_ratings_16 = {}
chain_ratings_15 = {}
chain_ratings_14 = {}
chain_ratings_13 = {}

def chain_ratings(chain_dict, dict_rtoa, dict_ator):
    for apt_key in dict_ator:
        if dict_rtoa[dict_ator[apt_key]] != apt_key:
            chain_dict[apt_key] = dict_rtoa[dict_ator[apt_key]]
            print(apt_key + " -> " + chain_dict[apt_key])
print("2016->2015:")
chain_ratings(chain_ratings_16, ratings_15_rtoa, ratings_15_ator)
print("2015->2014:")
chain_ratings(chain_ratings_15, ratings_14_rtoa, ratings_14_ator)
print("2014->2013:")
chain_ratings(chain_ratings_14, ratings_13_rtoa, ratings_13_ator)
print("2013->2012:")
chain_ratings(chain_ratings_13, ratings_12_rtoa, ratings_12_ator)
    

2016->2015:
Tokyo Intl  Haneda -> Tokyo Intl Haneda
Cincinnati/Kentucky -> Cincinnati
Vancouver Airport -> Vancouver Intl Airport
Madrid-Barajas Airport -> Madrid-Barajas
Johannesburg Intl -> Johannesburg Airport
San Francisco Airport -> San Francisco Intl
Beijing Capital Airport -> Beijing Capital
Geneva Intl Airport -> Geneva Airport
Nice Airport -> Nice Côte d'Azur
Keflavik Intl Airport -> Keflavik Airport
Abu Dhabi Airport -> Abu Dhabi Intl Airport
Doha Hamad Airport -> Hamad Intl Airport
Montréal Intl Airport -> Montréal Trudeau
New York JFK Airport -> New York JFK
Xi'an Airport -> Xi'an Intl Airport
Hong Kong Intl Airport -> Hong Kong Intl
Hartsfield-Jackson -> Atlanta
JakartaIntl Airport -> Jakarta Intl Airport
Cologne/Bonn Airport -> Cologne / Bonn
Durban Airport -> Durban Intl Airport
Denver Airport -> Denver Intl Airport
Athens Intl Airport -> Athens Airport
Porto  Airport -> Porto Airport
Boston Logan Airport -> Boston Logan
Kuala Lumpur Airport -> Kuala Lumpur Intl
Budapest
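
The idea behind `chain_ratings` can be sketched on a three-entry excerpt. The 2016 page's "previous rank" column stores a 2015 rank under the 2016 spelling, while the 2015 page stores the same rank under its own spelling, so looking up rank -> name exposes the alias. The toy dictionaries below reuse values from the outputs in this notebook; the variable names are illustrative.

```python
# Airport name -> 2015 rank, holding both the 2016 spelling and the 2015 spelling
ator_2015 = {"Hartsfield-Jackson": 44, "Atlanta": 44, "Sydney Airport": 21}
# 2015 rank -> name as spelled on the 2015 page
rtoa_2015 = {44: "Atlanta", 21: "Sydney Airport"}

# A name is an alias when its rank points back to a different spelling
aliases = {
    name: rtoa_2015[rank]
    for name, rank in ator_2015.items()
    if rtoa_2015[rank] != name
}
print(aliases)  # {'Hartsfield-Jackson': 'Atlanta'}
```

This only works when a rank identifies exactly one airport, which is why the duplicated ranks found earlier need manual fixes.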

In [115]:
# Manually fix the problem ones that can't be resolved by name
chain_ratings_14['Narita Intl Airport'] = 'Tokyo Naritar'
chain_ratings_14['Istanbul Atatürk Airport'] = 'Istanbul Atatürk'
chain_ratings_14['Abu Dhabi Airport'] = 'Abu Dhabi Intl Airport'
chain_ratings_14['London Gatwick Airport'] = 'London Gatwick'
chain_ratings_14['Guangzhou Airport'] = 'Guangzhou  Airport'

print(chain_ratings_14)

{'Panama Airport': 'Panama Tocumen', 'Abu Dhabi Airport': 'Abu Dhabi Intl Airport', 'Narita Intl Airport': 'Tokyo Naritar', 'Athens Intl Airport': 'Athens Airport', 'Cologne/Bonn Airport': 'Cologne / Bonn', 'Minneapolis-Saint Paul': 'Minneapolis-St Paul', 'Cincinnati Intl Airport': 'Cincinnati', 'Johannesburg Airport': 'Johannesburg', 'Christchurch Intl Airport': 'Christchurch Airport', 'Vancouver Airport': 'Vancouver Intl Airport', 'Tokyo Intl Airport': 'Tokyo Haneda', 'Heathrow Airport': 'London Heathrow', 'Cape Town Intl Airport': 'Cape Town Airport', 'Madrid-Barajas Airport': 'Madrid-Barajas', 'Auckland Intl Airport': 'Auckland Airport', 'Kuala Lumpur Intl': 'Kuala Lumpur', 'Lisbon Airport': 'Lisbon Portela Airport', 'San Francisco Airport': 'San Francisco', 'Guayaquil Intl Airport': 'Guayaquil Airport', 'New York JFK Airport': 'New York JFK', 'Istanbul Atatürk Airport': 'Istanbul Atatürk', 'Hyderabad  Airport': 'Hyderabad Airport', 'Singapore Changi Airport': 'Singapore Changi', '

In [116]:
def rev_dict(dict_in):
    dict_out = {}
    for elem in dict_in:
        dict_out[dict_in[elem]] = elem
    return dict_out

chain_ratings_16r = rev_dict(chain_ratings_16)
chain_ratings_15r = rev_dict(chain_ratings_15)
chain_ratings_14r = rev_dict(chain_ratings_14)
chain_ratings_13r = rev_dict(chain_ratings_13)
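
Reversing a dictionary silently drops entries when two keys share a value, because the later key overwrites the earlier one. A minimal sketch of a variant that also reports such collisions; `rev_dict_checked` is a hypothetical helper, and `'Atlanta Airport'` is a made-up spelling used only to force a collision.

```python
def rev_dict_checked(dict_in):
    """Reverse dict_in and list every value that more than one key mapped to."""
    dict_out = {}
    collisions = []
    for key, value in dict_in.items():
        if value in dict_out:
            collisions.append(value)
        dict_out[value] = key  # the last key seen wins, as in rev_dict
    return dict_out, collisions


# 'Atlanta Airport' is invented for this example
toy_chain = {"Hartsfield-Jackson": "Atlanta", "Atlanta Airport": "Atlanta"}
reversed_chain, collisions = rev_dict_checked(toy_chain)
print(reversed_chain)  # {'Atlanta': 'Atlanta Airport'}
print(collisions)      # ['Atlanta']
```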

In [117]:
# Causes the chain_lists to reflect the oldest name:
# chain_lists = [chain_ratings_16r, chain_ratings_15r, chain_ratings_14r, chain_ratings_13r]

# for i in range(len(chain_lists) - 1):
#     for rating in chain_lists[i+1]:
#         for j in range(i+1)[::-1]:
#             if chain_lists[i+1][rating] in chain_lists[j]:
#                 chain_lists[i+1][rating] = chain_lists[j][chain_lists[i+1][rating]]
#                 break

# Written out/thorough explanation:
# for rating in chain_ratings_15r:
#     if chain_ratings_15r[rating] in chain_ratings_16r:
#         chain_ratings_15r[rating] = chain_ratings_16r[chain_ratings_15r[rating]]
# for rating in chain_ratings_14r:
#     if chain_ratings_14r[rating] in chain_ratings_15r:
#         chain_ratings_14r[rating] = chain_ratings_15r[chain_ratings_14r[rating]]
#     elif chain_ratings_14r[rating] in chain_ratings_16r:
#         chain_ratings_14r[rating] = chain_ratings_16r[chain_ratings_14r[rating]]
# and so on...

In [125]:
# print(chain_ratings_15r)
# Fix all airport-to-rank dicts: get rid of duplicate names, then chain them together into the final table as before.
# First handle 2016 -> 2015.
# This is not foolproof: an airport that appears in 2016, is skipped in 2015, and reappears under a different name in 2014 will not be caught!
chain_ratings_12r = {}
rtoa_lists = [ratings_16_rtoa, ratings_15_rtoa, ratings_14_rtoa, ratings_13_rtoa, ratings_12_rtoa, ratings_11_rtoa]
ator_lists = [ratings_16_ator, ratings_15_ator, ratings_14_ator, ratings_13_ator, ratings_12_ator, ratings_11_ator]
chain_lists = [chain_ratings_16r, chain_ratings_15r, chain_ratings_14r, chain_ratings_13r, chain_ratings_12r]
for i in range(len(rtoa_lists) - 1):
    for rank in rtoa_lists[i+1]:
        for j in range(i+1)[::-1]:
            curr_apt = rtoa_lists[i+1][rank]
            if curr_apt in chain_lists[j]:
                rtoa_lists[i+1][rank] = chain_lists[j][curr_apt]
        ator_lists[i+1][rtoa_lists[i+1][rank]] = rank
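
The renaming loop above can be read as "walk an old spelling forward through each year's alias table until it reaches the newest spelling". A minimal sketch of that walk for a single name, assuming a hypothetical helper `resolve_newest`. The two non-empty tables are excerpts of the reversed chains shown earlier, and the empty 2015 table is an assumption made for this example.

```python
def resolve_newest(name, reverse_chains):
    """Apply each older-name -> newer-name table in order, oldest first."""
    for chain in reverse_chains:
        # Keep the name unchanged when this year's table has no alias for it
        name = chain.get(name, name)
    return name


chain_14r = {"Kuala Lumpur": "Kuala Lumpur Intl"}          # 2013 name -> 2014 name
chain_15r = {}                                             # assumed: no rename here
chain_16r = {"Kuala Lumpur Intl": "Kuala Lumpur Airport"}  # 2015 name -> 2016 name

newest = resolve_newest("Kuala Lumpur", [chain_14r, chain_15r, chain_16r])
print(newest)  # Kuala Lumpur Airport
```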

In [120]:
apt_set = set()
for rtoa_list in rtoa_lists:
    for ranking in rtoa_list:
        apt_set.add(rtoa_list[ranking])
print(sorted(apt_set))
print(len(apt_set))

['Abu Dhabi Airport', 'Adelaide Airport', 'Amsterdam Schiphol', 'Athens Intl Airport', 'Auckland Airport', 'Bahrain Airport', 'Bangkok Suvarnabhumi', 'Barcelona Airport', 'Beijing Capital Airport', 'Bengaluru Airport', 'Berlin Schönefeld', 'Berlin Tegel Airport', 'Billund Airport', 'Birmingham Airport', 'Bogota El Dorado', 'Boston Logan Airport', 'Brisbane Airport', 'Brussels Airport', 'Budapest Intl Airport', 'Cairo Airport', 'Cape Town Airport', 'Centrair Airport', 'Charlotte/Douglas', 'Chengdu Airport', "Chicago O'Hare", 'Christchurch Airport', 'Cincinnati/Kentucky', 'Cologne/Bonn Airport', 'Copenhagen Airport', 'Dallas/Fort Worth', 'Delhi Intl Airport', 'Denver Airport', 'Detroit Airport', 'Doha Hamad Airport', 'Dubai Airport', 'Dublin Airport', 'Durban Airport', 'Dusseldorf Airport', 'Frankfurt Airport', 'Fukuoka Airport', 'Gatwick Airport', 'Geneva Intl Airport', 'Gimpo Intl Airport', 'Gold Coast Airport', 'Guangzhou Airport', 'Guangzhou\xa0 Airport', 'Guayaquil Airport', 'Haikou

In [121]:
# Problem: 'Guangzhou Airport' is duplicated as 'Guangzhou\xa0 Airport' (non-breaking space) in 2011-13
for rtoa_list in rtoa_lists:
    for rank in rtoa_list:
        if rtoa_list[rank] == 'Guangzhou\xa0 Airport':
            rtoa_list[rank] = 'Guangzhou Airport'

apt_set = set()
for rtoa_list in rtoa_lists:
    for ranking in rtoa_list:
        apt_set.add(rtoa_list[ranking])
print(sorted(apt_set))
print(len(apt_set))

['Abu Dhabi Airport', 'Adelaide Airport', 'Amsterdam Schiphol', 'Athens Intl Airport', 'Auckland Airport', 'Bahrain Airport', 'Bangkok Suvarnabhumi', 'Barcelona Airport', 'Beijing Capital Airport', 'Bengaluru Airport', 'Berlin Schönefeld', 'Berlin Tegel Airport', 'Billund Airport', 'Birmingham Airport', 'Bogota El Dorado', 'Boston Logan Airport', 'Brisbane Airport', 'Brussels Airport', 'Budapest Intl Airport', 'Cairo Airport', 'Cape Town Airport', 'Centrair Airport', 'Charlotte/Douglas', 'Chengdu Airport', "Chicago O'Hare", 'Christchurch Airport', 'Cincinnati/Kentucky', 'Cologne/Bonn Airport', 'Copenhagen Airport', 'Dallas/Fort Worth', 'Delhi Intl Airport', 'Denver Airport', 'Detroit Airport', 'Doha Hamad Airport', 'Dubai Airport', 'Dublin Airport', 'Durban Airport', 'Dusseldorf Airport', 'Frankfurt Airport', 'Fukuoka Airport', 'Gatwick Airport', 'Geneva Intl Airport', 'Gimpo Intl Airport', 'Gold Coast Airport', 'Guangzhou Airport', 'Guayaquil Airport', 'Haikou Meilan Airport', 'Halifa

In [130]:
def add_rating(apt_key, r_list, r_tbl):
    # Append the airport's rank for one year; 0 marks a year with no entry
    if apt_key in r_tbl:
        r_list.append(r_tbl[apt_key])
    else:
        r_list.append(0)

cumul_ratings = {}
for apt in apt_set:
    ratings = []
    for ator_list in ator_lists:
        add_rating(apt, ratings, ator_list)
    cumul_ratings[apt] = ratings
print(cumul_ratings)

{'Tokyo Intl  Haneda': [4, 5, 6, 9, 14, 17], 'Narita Intl Airport': [11, 14, 16, 16, 17, 19], 'Christchurch Airport': [51, 76, 74, 64, 70, 65], 'Haikou Meilan Airport': [52, 53, 34, 44, 64, 53], 'Cincinnati/Kentucky': [32, 30, 27, 30, 24, 24], 'Amsterdam Schiphol': [13, 9, 5, 3, 4, 6], 'Vancouver Airport': [14, 11, 9, 8, 9, 12], 'Raleigh-Durham': [0, 99, 93, 86, 82, 84], 'Philadelphia Airport': [0, 0, 100, 104, 97, 100], 'Madrid-Barajas Airport': [31, 27, 41, 47, 38, 22], 'Johannesburg Intl': [30, 24, 26, 28, 31, 25], 'Sydney Airport': [23, 21, 21, 31, 20, 40], 'San Francisco Airport': [37, 36, 39, 40, 39, 50], 'Auckland Airport': [21, 15, 11, 12, 13, 8], 'Prague Airport': [93, 85, 70, 61, 54, 44], 'Kansai Intl Airport': [9, 12, 14, 18, 19, 14], 'Mauritius Airport': [0, 100, 111, 0, 0, 0], 'Bengaluru Airport': [74, 64, 79, 73, 67, 59], 'Guangzhou Airport': [85, 66, 42, 42, 52, 41], 'Geneva Intl Airport': [96, 95, 109, 96, 105, 0], 'Toronto Pearson': [44, 43, 43, 46, 47, 49], 'Copenhage

In [134]:
with open('skytrax_top100.csv', 'w', newline='') as csvfile:
    skt_writer = csv.DictWriter(csvfile, fieldnames=['apt_name', '16', '15', '14', '13', '12', '11'])
    skt_writer.writeheader()
    for apt in cumul_ratings:
        ratings = cumul_ratings[apt]
        skt_writer.writerow({'apt_name': apt, '16': ratings[0], '15': ratings[1], '14': ratings[2], '13': ratings[3], '12': ratings[4], '11': ratings[5]})
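
An earlier cell notes that a CSV reader still needs to be added. A minimal sketch of the round trip with `csv.DictReader`, written against an in-memory buffer rather than `skytrax_top100.csv` so that it runs on its own. The single sample row is taken from the `cumul_ratings` output above; `buffer` and `loaded` are illustrative names.

```python
import csv
import io

fieldnames = ['apt_name', '16', '15', '14', '13', '12', '11']
sample = {'Amsterdam Schiphol': [13, 9, 5, 3, 4, 6]}

# Write the same layout as the cell above, but into memory
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
for apt, ratings in sample.items():
    writer.writerow(dict(zip(fieldnames, [apt] + ratings)))

# Read it back; csv returns every field as a string, so convert ranks to int
buffer.seek(0)
loaded = {}
for row in csv.DictReader(buffer):
    loaded[row['apt_name']] = [int(row[year]) for year in fieldnames[1:]]
print(loaded)  # {'Amsterdam Schiphol': [13, 9, 5, 3, 4, 6]}
```

To read the real file, replace the buffer with `open('skytrax_top100.csv', newline='')`.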