# Wikipedia route database construction

This notebook uses the route list retrieved by 02-routes_parsing.ipynb:

- the destination URL is replaced by the canonical URL to allow a clean merge with the airport data
- the airline URL is used to find information on airlines


In [2]:
import numpy as np
import requests
from requests.adapters import HTTPAdapter, Retry
from bs4 import BeautifulSoup
from string import Template
import pandas as pd
from tqdm.notebook import tqdm

tqdm.pandas()

## Step 1 : find the destination airports

**First run**:  
The Wikipedia list of airports is not exhaustive and some airports are missing. Hence, we first try merging the routes with the airports.  
The links of the missing airports (only big airports with more than 14 routes; otherwise the iterative process could go on for a long time) are reinjected into the 01-airport_parsing notebook, added to the airports database, and passed to 02-routes_parsing to find their routes.

**Other runs**:  
The resulting new list is processed again in this notebook. There are still "missing airports", but those should not be big airports, which is fine for our application.

In [4]:
raw_routes = pd.read_csv("data/wikipedia_relations_new24_04.csv")
# NB: besides the "Unnamed: 0" column, this call also drops the row labelled 1 (index=1)
raw_routes.drop(columns=["Unnamed: 0"], index=1, inplace=True)
raw_routes.columns = ["airline", "origin", "destination", "type"]

One problem that emerged during this preparation is that a Wikipedia page can be reachable through several different URLs. However, a unique 'canonical' URL is always present.  
We need to associate each airport with its canonical URL.

Process: deduplicate the airport links, open each page and get its canonical URL.  
NB: the origin of each route is already the canonical URL, since the canonical URL (rather than the raw URL) of each airport is captured during the airport HTML parsing.
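
The deduplicate-then-resolve process can be sketched offline as follows. This is only a minimal illustration under stated assumptions: `resolve_stub` and its `REDIRECTS` table are hypothetical stand-ins for the HTTP lookup, and the `example.org` URLs are placeholders, not real Wikipedia pages.

```python
import pandas as pd

# Hypothetical stand-in for the HTTP lookup: maps an alias URL to its canonical URL.
REDIRECTS = {
    "https://example.org/wiki/A_alias": "https://example.org/wiki/A",
}


def resolve_stub(url):
    """Return the canonical URL for `url` (identity when no alias is known)."""
    return REDIRECTS.get(url, url)


# Placeholder routes: two of the three destinations use an alias of the same page.
routes = pd.DataFrame(
    {
        "origin": ["https://example.org/wiki/X"] * 3,
        "destination": [
            "https://example.org/wiki/A_alias",
            "https://example.org/wiki/A",
            "https://example.org/wiki/A_alias",
        ],
    }
)

# Resolve each distinct destination once, then map the result back onto every route.
unique_destinations = routes["destination"].drop_duplicates()
canonical_by_url = {url: resolve_stub(url) for url in unique_destinations}
routes["canonical_destination"] = routes["destination"].map(canonical_by_url)
```

Resolving only the distinct links keeps the number of page requests equal to the number of unique destinations rather than the number of routes.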


In [3]:
airports_nodups = pd.DataFrame(raw_routes["destination"].drop_duplicates())

In [4]:
def find_canonical_url(airport_link):
    """Return the canonical URL of a Wikipedia page, or the input link on failure."""
    headers = {"User-Agent": "AirProjectBot/0.0 (antoine732@hotmail.fr)"}

    try:
        session = requests.Session()
        # Retry up to 5 times on HTTP 429 (rate limiting), honouring Retry-After
        retries = Retry(
            total=5,
            backoff_factor=1,
            status_forcelist=[429],
            respect_retry_after_header=True,
        )
        session.mount("https://", HTTPAdapter(max_retries=retries))
        response = session.get(airport_link, headers=headers)
    except Exception:
        response = None

    wdpa_link = None
    if response is not None:
        soup = BeautifulSoup(response.content, "html.parser")
        # The canonical URL is exposed as <link rel="canonical" href="...">
        canonical_tag = soup.find("link", {"rel": "canonical"})
        if canonical_tag is not None:
            wdpa_link = canonical_tag.get("href")

    if wdpa_link is None:
        wdpa_link = airport_link  # we shouldn't get there
        print("No canonical url found!")
    return wdpa_link
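
The scraping functions in this notebook build a new session on every call. A possible alternative, sketched here under assumptions, is to build the session once and reuse it. `make_session` and the `ExampleBot` user agent are hypothetical and not part of the notebook; the retry settings are copied from the function above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent):
    """Build one Session that retries on HTTP 429 and sends a custom User-Agent."""
    session = requests.Session()
    retries = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429],
        respect_retry_after_header=True,
    )
    # Every https:// request made through this session uses the retry policy.
    session.mount("https://", HTTPAdapter(max_retries=retries))
    session.headers.update({"User-Agent": user_agent})
    return session


# Hypothetical user agent, for illustration only.
session = make_session("ExampleBot/0.0 (contact@example.org)")
```

A single shared session also reuses the underlying connections, which may reduce the load on the server when thousands of pages are fetched.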

In [None]:
airport_canonical = airports_nodups.progress_apply(
    lambda x: find_canonical_url(x["destination"]), axis=1, result_type="expand"
)

In [7]:
airport_canonical

0         https://en.wikipedia.org/wiki/Toronto_Pearson_...
2         https://en.wikipedia.org/wiki/John_F._Kennedy_...
3         https://en.wikipedia.org/wiki/Miami_Internatio...
4         https://en.wikipedia.org/wiki/Philadelphia_Int...
5         https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...
                                ...                        
100507         https://en.wikipedia.org/wiki/Likiep_Airport
100585    https://en.wikipedia.org/wiki/Islam_Karimov_Ta...
100612    https://en.wikipedia.org/wiki/H._Hasan_Aroeboe...
100621    https://en.wikipedia.org/wiki/Rosario_%E2%80%9...
100695    https://en.wikipedia.org/wiki/H.A.S._Hanandjoe...
Length: 5870, dtype: object

In [8]:
airports_nodups["canonical_destination"] = airport_canonical

In [3]:
# airports_nodups.to_csv('data/airport_canonical_24_04.csv')
airports_nodups = pd.read_csv("data/airport_canonical_24_04.csv")

In [5]:
raw_routes = raw_routes.merge(
    airports_nodups, left_on="destination", right_on="destination", how="left"
)

In [6]:
raw_routes

Unnamed: 0.1,airline,origin,destination,type,Unnamed: 0,canonical_destination
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Toronto_Pearson_...,Regular,0,https://en.wikipedia.org/wiki/Toronto_Pearson_...
1,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/John_F._Kennedy_...,Regular,2,https://en.wikipedia.org/wiki/John_F._Kennedy_...
2,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Miami_Internatio...,Seasonal,3,https://en.wikipedia.org/wiki/Miami_Internatio...
3,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Philadelphia_Int...,Seasonal,4,https://en.wikipedia.org/wiki/Philadelphia_Int...
4,https://en.wikipedia.org/wiki/Azores_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...,Seasonal,5,https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...
...,...,...,...,...,...,...
100884,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/%C3%9Cr%C3%BCmqi...,Regular,32102,https://en.wikipedia.org/wiki/%C3%9Cr%C3%BCmqi...
100885,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xiamen_Gaoqi_Int...,Regular,723,https://en.wikipedia.org/wiki/Xiamen_Gaoqi_Int...
100886,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xi%27an_Xianyang...,Regular,30858,https://en.wikipedia.org/wiki/Xi%27an_Xianyang...
100887,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Yantai_Penglai_I...,Regular,63198,https://en.wikipedia.org/wiki/Yantai_Penglai_I...


Now, we merge the routes dataframe with the airport data from the 01 notebook.
This is JUST to look for missing airports: the proper merge with aggregated airport data is done afterwards (02_airport_features).

As mentioned in the introduction, this notebook is run TWICE:

- First run --> only airports retrieved from the Wikipedia continental lists of airports. Airports are merged with routes.  
Sometimes the destination link is not in the airport database (the origin always is, by construction).  
It is then stored in a missing-airports file, which is processed in the 01 and 02 notebooks.  
Unknown airports are added only if they have more than 14 routes.

- After the first run, missing airports should only be "minor" airports, with fewer than 14 destinations. Nevertheless, they are processed once again in 01 (not in 02) if they have more than 2 routes. Airports with a single route were not included because they include overly specific Wikipedia pages (natural reserves, oil rigs, ...) that are not suited to the query. This means we still look up the features of these destinations on Wikipedia. Their routes are not parsed, however. *NB: this could be done, but this second step was added as a refinement later in the process, and doing so would introduce two route parsing dates in the database, which should be avoided. Skipping it only removes routes between TWO airports of this category, which should be a very marginal proportion of the database and would add much noise.*
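
The missing-airport detection described above can be sketched as a left merge followed by a per-destination route count. This is a minimal illustration, not the notebook's actual cell: the toy data, the `MIN_ROUTES` value and the use of `indicator=True` are assumptions made for the example.

```python
import pandas as pd

# Toy data: one known airport and two destinations absent from the airport table.
routes = pd.DataFrame(
    {
        "airline": ["a1", "a2", "a3", "a1", "a2"],
        "canonical_destination": [
            "known",
            "big_missing",
            "big_missing",
            "big_missing",
            "small_missing",
        ],
    }
)
airports = pd.DataFrame({"wdpa_link": ["known"]})

MIN_ROUTES = 2  # hypothetical threshold; the notebook uses larger values

# indicator=True adds a "_merge" column telling which side each row came from.
merged = routes.merge(
    airports,
    left_on="canonical_destination",
    right_on="wdpa_link",
    how="left",
    indicator=True,
)

# Routes whose destination has no match in the airport table.
unmatched = merged[merged["_merge"] == "left_only"]

# Count routes per missing destination and keep only the larger ones.
route_counts = unmatched.groupby("canonical_destination").size()
to_reinject = route_counts[route_counts > MIN_ROUTES]
```

Here only `big_missing` passes the threshold, so it would be the only link sent back to the airport parsing step.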


In [74]:
nam_airport_df = pd.read_csv("data/n_amer_arpt_v2.csv")
sam_airport_df = pd.read_csv("data/s_amer_arpt_v2.csv")
eu_airport_df = pd.read_csv("data/eu_arpt_v2.csv")
af_airport_df = pd.read_csv("data/af_arpt_v2.csv")
as_airport_df = pd.read_csv("data/as_arpt_v2.csv")
oc_airport_df = pd.read_csv("data/oc_arpt_v2.csv")
special_airport_df = pd.read_csv("data/pb_arpt_v2.csv")

# Only after first run
missing_airport_df = pd.read_csv("data/missing_arpt_v2.csv")

# Only after second run
missing_airport_extra_df = pd.read_csv("data/missing_arpt_extra.csv")
# Combine the airports in a single dataframe.

In [75]:
all_airport_df = (
    pd.concat(
        [
            nam_airport_df,
            sam_airport_df,
            eu_airport_df,
            af_airport_df,
            as_airport_df,
            oc_airport_df,
            special_airport_df,
            missing_airport_df,
            missing_airport_extra_df,
        ],
        axis=0,
    )
    .reset_index()
    .drop(columns=["index", "Unnamed: 0"], axis=1)
)

In [76]:
airport_cols = [
    "icao",
    "iata",
    "max_population",
    "max_passengers19",
    "maxpax",
    "wdpa_link",
    "wpda_iata",
    "wpda_icao",
]
reduced_airport_df = all_airport_df[airport_cols]

# Attach the origin airport features, prefixed with "O_"
routes_w_departures = raw_routes.merge(
    reduced_airport_df.add_prefix("O_"),
    left_on="origin",
    right_on="O_wdpa_link",
    how="left",
)

# Attach the destination airport features, prefixed with "D_"
routes_full = routes_w_departures.merge(
    reduced_airport_df.add_prefix("D_"),
    left_on="canonical_destination",
    right_on="D_wdpa_link",
    how="left",
)

In [77]:
routes_full.dropna(subset="O_iata", inplace=True)

In [78]:
missing_airports = (
    routes_full[routes_full["D_wdpa_link"].isna()]
    .groupby(["destination"])["airline"]
    .count()
    .sort_values()
)

In [79]:
# missing_airports.index
missing_airports = missing_airports.reset_index()
missing_airports = missing_airports[missing_airports["airline"] > 3]

Unnamed: 0,destination,airline
1009,https://en.wikipedia.org/wiki/Saadani_National...,4
1010,https://en.wikipedia.org/wiki/Neom_Airport,5
1011,https://en.wikipedia.org/wiki/Chongqing_Xiann%...,6


In [81]:
# Store the additional missing airports in a csv => routes won't be searched from these airports, but their information is parsed to improve data quality and remove unknown airports from the database
# missing_airports.reset_index()['destination'].to_csv('data/extra_airport_refs.csv')

In [82]:
routes_full = routes_full[~routes_full["D_wdpa_link"].isna()][
    ["airline", "origin", "canonical_destination", "type"]
]
routes_full

Unnamed: 0,airline,origin,canonical_destination,type
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Toronto_Pearson_...,Regular
1,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/John_F._Kennedy_...,Regular
2,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Miami_Internatio...,Seasonal
3,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Philadelphia_Int...,Seasonal
4,https://en.wikipedia.org/wiki/Azores_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...,Seasonal
...,...,...,...,...
101040,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/%C3%9Cr%C3%BCmqi...,Regular
101041,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xiamen_Gaoqi_Int...,Regular
101042,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xi%27an_Xianyang...,Regular
101043,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Yantai_Penglai_I...,Regular


## Step 2: add airlines codes


Let's list the unique airline links before querying their IATA ID.

In [84]:
airline_nodups = routes_full["airline"].drop_duplicates()

In [85]:
def find_airlines_iata(airline_link):
    """Return [link, name, IATA code, ICAO code] scraped from an airline page."""
    headers = {"User-Agent": "AirProjectBot/0.0 (antoine732@hotmail.fr)"}

    try:
        session = requests.Session()
        # Retry up to 5 times on HTTP 429 (rate limiting), honouring Retry-After
        retries = Retry(
            total=5,
            backoff_factor=1,
            status_forcelist=[429],
            respect_retry_after_header=True,
        )
        session.mount("https://", HTTPAdapter(max_retries=retries))
        response = session.get(airline_link, headers=headers)
    except Exception:
        response = None

    airline_name = np.nan
    wpda_iata = np.nan
    wpda_icao = np.nan

    if response is not None:
        soup = BeautifulSoup(response.content, "html.parser")

        # Find the airline information card and name (page title)
        vcard = soup.find("table", class_="infobox vcard")
        title = soup.find(class_="mw-page-title-main")
        if title is not None:
            airline_name = title.text

        if vcard is not None:
            # The codes are read from the first two cells of the table nested
            # in the "infobox-full-data" cell of the information card
            full_data = vcard.find("td", class_="infobox-full-data")
            table = full_data.find("table") if full_data is not None else None

            if table is not None:
                row = [
                    td.text.strip("\n")
                    for tr in table.find_all("tr")
                    for td in tr.find_all("td")
                ]
                # Only read the codes when both cells are present
                if len(row) >= 2:
                    wpda_iata = row[0]
                    wpda_icao = row[1]

    return [airline_link, airline_name, wpda_iata, wpda_icao]

In [86]:
pd.DataFrame(airline_nodups)

Unnamed: 0,airline
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge
1,https://en.wikipedia.org/wiki/American_Airlines
4,https://en.wikipedia.org/wiki/Azores_Airlines
5,https://en.wikipedia.org/wiki/British_Airways
6,https://en.wikipedia.org/wiki/Delta_Air_Lines
...,...
98890,https://en.wikipedia.org/wiki/Vietjet_Air
100645,https://en.wikipedia.org/wiki/Air_Mandalay
100669,https://en.wikipedia.org/wiki/Air_Transport_In...
100797,https://en.wikipedia.org/wiki/Air_Excel


In [87]:
airline_data = pd.DataFrame(airline_nodups).progress_apply(
    lambda x: find_airlines_iata(x["airline"]), axis=1, result_type="expand"
)

  0%|          | 0/1020 [00:00<?, ?it/s]

In [88]:
airline_data.reset_index(inplace=True)

In [89]:
airline_data.drop(columns="index", axis=1, inplace=True)

In [90]:
airline_data.columns = ["airline_link", "airline_name", "airline_iata", "airline_icao"]
airline_data

Unnamed: 0,airline_link,airline_name,airline_iata,airline_icao
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge,Air Canada Rouge,RV,ROU[1]
1,https://en.wikipedia.org/wiki/American_Airlines,American Airlines,AA[1],AAL[1]
2,https://en.wikipedia.org/wiki/Azores_Airlines,Azores Airlines,S4,RZO
3,https://en.wikipedia.org/wiki/British_Airways,British Airways,BA,BAW; SHT
4,https://en.wikipedia.org/wiki/Delta_Air_Lines,Delta Air Lines,DL,DAL
...,...,...,...,...
1015,https://en.wikipedia.org/wiki/Vietjet_Air,VietJet Air,VJ,VJC
1016,https://en.wikipedia.org/wiki/Air_Mandalay,Air Mandalay,6T,AMY
1017,https://en.wikipedia.org/wiki/Air_Transport_In...,Air Transport International,8C,ATN
1018,https://en.wikipedia.org/wiki/Air_Excel,Air Excel,—,XLL


In [96]:
airline_data[airline_data["airline_icao"].isna()]

Unnamed: 0,airline_link,airline_name,airline_iata,airline_icao
123,https://en.wikipedia.org/wiki/Glencore,Glencore,,
124,https://en.wikipedia.org/wiki/Hydro-Quebec,Hydro-Québec,,
138,https://en.wikipedia.org/wiki/Aero_Pac%C3%ADfico,Aero Pacífico,,
147,https://en.wikipedia.org/wiki/DHL,DHL,,
157,https://en.wikipedia.org/wiki/Aerotuc%C3%A1n,Aerotucán,,
250,https://en.wikipedia.org/wiki/ABM_Air,BMN Air,,
267,https://en.wikipedia.org/wiki/Condor,Condor,,
270,https://en.wikipedia.org/wiki/Island_Birds,Island Birds,,
271,https://en.wikipedia.org/wiki/Island_Air_(Caym...,Island Air (Cayman Islands),,
278,https://en.wikipedia.org/wiki/Havana_Air,Havana Air,,


In [97]:
## Manual correction of small problems encountered before ==> not worth spending time on robust code for 5 wrong airlines
## The index was reset above, so .loc labels match the row positions. .loc is used
## because chained assignment (iloc[i][col] = ...) may write to a temporary copy.

airline_data.loc[5, "airline_name"] = "JetBlue"
airline_data.loc[365, "airline_name"] = "EasyJet"
airline_data.loc[232, "airline_name"] = "iAero Airways"
airline_data.loc[372, "airline_name"] = "airBaltic"
airline_data.loc[568, "airline_name"] = "flyadeal"
airline_data.loc[607, "airline_name"] = "airBaltic"
airline_data.loc[628, "airline_link"] = (
    "https://en.wikipedia.orghttps://fr.wikipedia.org/wiki/Amelia_International"
)
airline_data.loc[628, "airline_name"] = "Amelia International"
airline_data.loc[628, "airline_iata"] = "NL"
airline_data.loc[628, "airline_icao"] = "AEH"

## The following airlines exist, but they are aggregates of operational divisions, each being a proper airline.
## Wiki doesn't focus on that and sadly it isn't possible to know which division flies each route.
## Flights are assigned to their parent airlines.


# airline_data.loc[364, "airline_iata"] = "U2"
# airline_data.loc[364, "airline_icao"] = "EZY"

airline_data.loc[41, "airline_iata"] = "AA"
airline_data.loc[41, "airline_icao"] = "AAL"

airline_data.loc[22, "airline_iata"] = "UA"
airline_data.loc[22, "airline_icao"] = "UAL"

airline_data.loc[14, "airline_iata"] = "DL"
airline_data.loc[14, "airline_icao"] = "DAL"

airline_data.loc[9, "airline_iata"] = "AC"
airline_data.loc[9, "airline_icao"] = "ACA"

In [16]:
# airline_data.to_csv('data/airline_data_24_04.csv')
# airline_data=pd.read_csv('data/airline_data_24_04.csv')

Now, the airport database constructed above is used to convert the Wikipedia links of the route O&D airports into a global df with the relevant information on the O&D airports.

In [98]:
routes_w_airlines = routes_full.merge(
    airline_data, left_on="airline", right_on="airline_link", how="left"
)

In [99]:
routes_w_airlines[routes_w_airlines["airline_iata"].isna()].groupby("airline_name")[
    "airline_link"
].count()

## Safety check, no airline with a significant number of routes should have NaN iata.
## I consider airlines with less than 30 routes not worth the investigation (the dataset contains 100k routes).

airline_name
Aero Pacífico                                                                      7
Aerotucán                                                                          8
African Airways Alliance                                                           1
Air Moana                                                                         12
Airly                                                                              1
As Salaam Air                                                                      9
Asia Pacific Airlines (PNG)                                                        4
Azimuth                                                                            1
BAE Systems                                                                        6
BMN Air                                                                            6
Canadian Airways Congo                                                             6
Caribbean Helicopters                               
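
The safety check above can also be expressed as an explicit filter. This is a minimal sketch on toy data: the airline names, the `MAX_UNCODED_ROUTES` value and the `needs_review` name are hypothetical, and the notebook itself applies a threshold of 30 routes by visual inspection.

```python
import pandas as pd

# Toy routes: two airlines without an IATA code, one with a code.
routes = pd.DataFrame(
    {
        "airline_name": ["Big Air"] * 4 + ["Tiny Air"] + ["Coded Air"] * 2,
        "airline_iata": [None] * 5 + ["CA"] * 2,
    }
)

MAX_UNCODED_ROUTES = 3  # hypothetical; the notebook reasons with 30 routes

# Number of routes per airline whose IATA code is missing.
uncoded_counts = (
    routes[routes["airline_iata"].isna()].groupby("airline_name").size()
)

# Airlines flying enough routes to be worth a manual investigation.
needs_review = uncoded_counts[uncoded_counts >= MAX_UNCODED_ROUTES]
```

An empty `needs_review` would mean that no airline with a significant number of routes is left without a code.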

In [100]:
routes_w_airlines

Unnamed: 0,airline,origin,canonical_destination,type,airline_link,airline_name,airline_iata,airline_icao
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Toronto_Pearson_...,Regular,https://en.wikipedia.org/wiki/Air_Canada_Rouge,Air Canada Rouge,RV,ROU[1]
1,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/John_F._Kennedy_...,Regular,https://en.wikipedia.org/wiki/American_Airlines,American Airlines,AA[1],AAL[1]
2,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Miami_Internatio...,Seasonal,https://en.wikipedia.org/wiki/American_Airlines,American Airlines,AA[1],AAL[1]
3,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Philadelphia_Int...,Seasonal,https://en.wikipedia.org/wiki/American_Airlines,American Airlines,AA[1],AAL[1]
4,https://en.wikipedia.org/wiki/Azores_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...,Seasonal,https://en.wikipedia.org/wiki/Azores_Airlines,Azores Airlines,S4,RZO
...,...,...,...,...,...,...,...,...
99348,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/%C3%9Cr%C3%BCmqi...,Regular,https://en.wikipedia.org/wiki/Shandong_Airlines,Shandong Airlines,SC,CDG
99349,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xiamen_Gaoqi_Int...,Regular,https://en.wikipedia.org/wiki/Shandong_Airlines,Shandong Airlines,SC,CDG
99350,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xi%27an_Xianyang...,Regular,https://en.wikipedia.org/wiki/Shandong_Airlines,Shandong Airlines,SC,CDG
99351,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Yantai_Penglai_I...,Regular,https://en.wikipedia.org/wiki/Shandong_Airlines,Shandong Airlines,SC,CDG


In [101]:
routes_w_airlines = routes_w_airlines.drop(columns=["airline_link"])
routes_w_airlines["airline_iata"] = (
    routes_w_airlines["airline_iata"]
    .str.split("[")
    .str[0]
    .str.split(";")
    .str[0]
    .str.split("*")
    .str[0]
)
routes_w_airlines["airline_icao"] = (
    routes_w_airlines["airline_icao"]
    .str.split("[")
    .str[0]
    .str.split(";")
    .str[0]
    .str.split("*")
    .str[0]
)
routes_w_airlines.loc[
    routes_w_airlines["airline_iata"] == "See Operators", "airline_iata"
] = "WS"
routes_w_airlines.loc[
    routes_w_airlines["airline_icao"] == "See Operators", "airline_icao"
] = "WJA"

routes_w_airlines.to_csv("data/wiki_route_db_26_09.csv")

In [102]:
routes_w_airlines

Unnamed: 0,airline,origin,canonical_destination,type,airline_name,airline_iata,airline_icao
0,https://en.wikipedia.org/wiki/Air_Canada_Rouge,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Toronto_Pearson_...,Regular,Air Canada Rouge,RV,ROU
1,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/John_F._Kennedy_...,Regular,American Airlines,AA,AAL
2,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Miami_Internatio...,Seasonal,American Airlines,AA,AAL
3,https://en.wikipedia.org/wiki/American_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Philadelphia_Int...,Seasonal,American Airlines,AA,AAL
4,https://en.wikipedia.org/wiki/Azores_Airlines,https://en.wikipedia.org/wiki/L.F._Wade_Intern...,https://en.wikipedia.org/wiki/Jo%C3%A3o_Paulo_...,Seasonal,Azores Airlines,S4,RZO
...,...,...,...,...,...,...,...
99348,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/%C3%9Cr%C3%BCmqi...,Regular,Shandong Airlines,SC,CDG
99349,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xiamen_Gaoqi_Int...,Regular,Shandong Airlines,SC,CDG
99350,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Xi%27an_Xianyang...,Regular,Shandong Airlines,SC,CDG
99351,https://en.wikipedia.org/wiki/Shandong_Airlines,https://en.wikipedia.org/wiki/Heze_Mudan_Airport,https://en.wikipedia.org/wiki/Yantai_Penglai_I...,Regular,Shandong Airlines,SC,CDG
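
The code cleaning applied before saving (dropping footnote markers such as `[1]`, secondary codes after `;`, and anything after `*`) can be sketched on plain strings. `clean_code` is a hypothetical helper written for this illustration; it cuts at the first of the three characters, which gives the same result as the chained `.str.split(...).str[0]` calls.

```python
import re


def clean_code(raw):
    """Keep the first code, dropping footnote markers and secondary codes."""
    if not isinstance(raw, str):
        return raw  # leave missing values (NaN/None) untouched
    # Cut at the first "[", ";" or "*".
    return re.split(r"[\[;*]", raw, maxsplit=1)[0]


# Values taken from the airline table displayed earlier in this notebook.
cleaned = [clean_code(code) for code in ["AA[1]", "BAW; SHT", "RZO", None]]
```

For these inputs the helper yields `AA`, `BAW`, `RZO` and leaves the missing value as is.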


The following steps are in the 02 folder.