<a href="https://colab.research.google.com/github/NimaZah/Air/blob/main/Airport.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The solution will start with some basic data cleaning and then continue with some exploratory data analysis and visualizations. I will be looking at the following question:

How does Rayan Air compare to other airlines in terms of average distance?

# Ingesting, cleaning, and merging the data

#### **Airlines dataset**

In [311]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from geopy import distance

In [312]:
airlines = pd.read_csv('https://raw.githubusercontent.com/NimaZah/Air/main/airlines.csv')
airports = pd.read_csv('https://raw.githubusercontent.com/NimaZah/Air/main/airports.csv')
routes = pd.read_csv('https://raw.githubusercontent.com/NimaZah/Air/main/routes.csv')

In [313]:
airlines.head()

Unnamed: 0,-1,Unknown,\N,-,N/A,\N.1,\N.2,Y
0,1,Private flight,\N,-,,,,Y
1,2,135 Airways,\N,,GNL,GENERAL,United States,N
2,3,1Time Airline,\N,1T,RNX,NEXTIME,South Africa,Y
3,4,2 Sqn No 1 Elementary Flying Training School,\N,,WYT,,United Kingdom,N
4,5,213 Flight Unit,\N,,TFU,,Russia,N


In [314]:
# Add the follwoing headers to the dataframe:
airlines.columns = [
    "Airline_ID",
    "name",
    "alias",
    "iata",
    "icao",
    "callsign",
    "country",
    "active",
]
airlines.head()

Unnamed: 0,Airline_ID,name,alias,iata,icao,callsign,country,active
0,1,Private flight,\N,-,,,,Y
1,2,135 Airways,\N,,GNL,GENERAL,United States,N
2,3,1Time Airline,\N,1T,RNX,NEXTIME,South Africa,Y
3,4,2 Sqn No 1 Elementary Flying Training School,\N,,WYT,,United Kingdom,N
4,5,213 Flight Unit,\N,,TFU,,Russia,N


In [315]:
# a)is there any null value in the table? b) is there any duplicated value in 
#the table? c) is there any incorrect value in the table?
missing_values= airlines.isnull().sum()
print(missing_values, '\n')
duplicates= airlines.duplicated().sum()
print(duplicates, '\n')
print(airlines.describe())

Airline_ID       0
name             0
alias          506
iata          4627
icao            86
callsign       808
country         15
active           0
dtype: int64 

0 

         Airline_ID
count   6161.000000
mean    4153.397500
std     4507.362192
min        1.000000
25%     1542.000000
50%     3083.000000
75%     4629.000000
max    21317.000000


In [316]:
# The main objective of the study is to use data to better understand Rayan 
# Air’s routes and how they compare to other airlines in the dataset, 
#including some statistical properties of the routes. Considering this 
#objective, we need to drop the unhelpful information from the dataframe.
airlines.drop(['alias', 'iata', 'icao', 'callsign'], axis=1, inplace=True)
airlines.dropna(inplace=True)
airlines = airlines[airlines['active'] == 'Y']
airlines.head()

Unnamed: 0,Airline_ID,name,country,active
2,3,1Time Airline,South Africa,Y
9,10,40-Mile Air,United States,Y
12,13,Ansett Australia,Australia,Y
13,14,Abacus International,Singapore,Y
20,21,Aigle Azur,France,Y


#### **Airport**

In [317]:
# check the airports dataframe.
airports.head()

Unnamed: 0,1,Goroka Airport,Goroka,Papua New Guinea,GKA,AYGA,-6.081689834590001,145.391998291,5282,10,U,Pacific/Port_Moresby,airport,OurAirports
0,2,Madang Airport,Madang,Papua New Guinea,MAG,AYMD,-5.20708,145.789001,20,10,U,Pacific/Port_Moresby,airport,OurAirports
1,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.82679,144.296005,5388,10,U,Pacific/Port_Moresby,airport,OurAirports
2,4,Nadzab Airport,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569803,146.725977,239,10,U,Pacific/Port_Moresby,airport,OurAirports
3,5,Port Moresby Jacksons International Airport,Port Moresby,Papua New Guinea,POM,AYPY,-9.44338,147.220001,146,10,U,Pacific/Port_Moresby,airport,OurAirports
4,6,Wewak International Airport,Wewak,Papua New Guinea,WWK,AYWK,-3.58383,143.669006,19,10,U,Pacific/Port_Moresby,airport,OurAirports


In [318]:
# give the follwoing headers to the dataset:
airports.columns = [
    "Airport_ID",
    "name",
    "city",
    "country",
    "iata",
    "icao",
    "latitude",
    "longitude",
    "altitude",
    "timezone",
    "dst",
    "tz_database_time_zone",
    "type",
    "source",
]
airports.head()


Unnamed: 0,Airport_ID,name,city,country,iata,icao,latitude,longitude,altitude,timezone,dst,tz_database_time_zone,type,source
0,2,Madang Airport,Madang,Papua New Guinea,MAG,AYMD,-5.20708,145.789001,20,10,U,Pacific/Port_Moresby,airport,OurAirports
1,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,HGU,AYMH,-5.82679,144.296005,5388,10,U,Pacific/Port_Moresby,airport,OurAirports
2,4,Nadzab Airport,Nadzab,Papua New Guinea,LAE,AYNZ,-6.569803,146.725977,239,10,U,Pacific/Port_Moresby,airport,OurAirports
3,5,Port Moresby Jacksons International Airport,Port Moresby,Papua New Guinea,POM,AYPY,-9.44338,147.220001,146,10,U,Pacific/Port_Moresby,airport,OurAirports
4,6,Wewak International Airport,Wewak,Papua New Guinea,WWK,AYWK,-3.58383,143.669006,19,10,U,Pacific/Port_Moresby,airport,OurAirports


In [319]:
airports.source.value_counts

<bound method IndexOpsMixin.value_counts of 0       OurAirports
1       OurAirports
2       OurAirports
3       OurAirports
4       OurAirports
           ...     
7692    OurAirports
7693    OurAirports
7694    OurAirports
7695    OurAirports
7696    OurAirports
Name: source, Length: 7697, dtype: object>

In [320]:
# clean the data
missing_values= airports.isnull().sum()
print(missing_values, '\n')
duplicates= airports.duplicated().sum()
print(duplicates, '\n')
print(airports.describe())

Airport_ID                0
name                      0
city                     49
country                   0
iata                      0
icao                      0
latitude                  0
longitude                 0
altitude                  0
timezone                  0
dst                       0
tz_database_time_zone     0
type                      0
source                    0
dtype: int64 

0 

         Airport_ID     latitude    longitude      altitude
count   7697.000000  7697.000000  7697.000000   7697.000000
mean    5171.621801    25.812586    -1.409616   1015.319085
std     3777.045541    28.404465    86.508602   1628.154781
min        2.000000   -90.000000  -179.876999  -1266.000000
25%     1994.000000     6.922420   -78.984200     63.000000
50%     4069.000000    34.086102     6.364167    352.000000
75%     7729.000000    47.239601    55.938801   1203.000000
max    14110.000000    89.500000   179.951004  14472.000000


In [321]:
# remove the columns 'iata', 'icao' and 'type' from the dataframe
airports.drop(['iata', 'icao', 'type'], axis=1, inplace=True)
airports.head()

Unnamed: 0,Airport_ID,name,city,country,latitude,longitude,altitude,timezone,dst,tz_database_time_zone,source
0,2,Madang Airport,Madang,Papua New Guinea,-5.20708,145.789001,20,10,U,Pacific/Port_Moresby,OurAirports
1,3,Mount Hagen Kagamuga Airport,Mount Hagen,Papua New Guinea,-5.82679,144.296005,5388,10,U,Pacific/Port_Moresby,OurAirports
2,4,Nadzab Airport,Nadzab,Papua New Guinea,-6.569803,146.725977,239,10,U,Pacific/Port_Moresby,OurAirports
3,5,Port Moresby Jacksons International Airport,Port Moresby,Papua New Guinea,-9.44338,147.220001,146,10,U,Pacific/Port_Moresby,OurAirports
4,6,Wewak International Airport,Wewak,Papua New Guinea,-3.58383,143.669006,19,10,U,Pacific/Port_Moresby,OurAirports


In [322]:
# The missing value in the column city are NaN. We can fill the missing value with the name of the airport.
airports['city'].fillna(airports['name'], inplace=True)

#### **Routs**

In [323]:
# check the routes dataframe and give the following headers to the dataframe:
routes.columns = [
    "Airline",
    "Airline_ID",
    "Source_airport",
    "Source_airport_ID",
    "Destination_airport",
    "Destination_airport_ID",
    "Codeshare",
    "Stops",
    "Equipment",
]
routes.head()


Unnamed: 0,Airline,Airline_ID,Source_airport,Source_airport_ID,Destination_airport,Destination_airport_ID,Codeshare,Stops,Equipment
0,2B,410,ASF,2966,KZN,2990,,0,CR2
1,2B,410,ASF,2966,MRV,2962,,0,CR2
2,2B,410,CEK,2968,KZN,2990,,0,CR2
3,2B,410,CEK,2968,OVB,4078,,0,CR2
4,2B,410,DME,4029,KZN,2990,,0,CR2


In [324]:
# clean the dataframe
missing_values= routes.isnull().sum()
print(missing_values, '\n')
duplicates= routes.duplicated().sum()
print(duplicates, '\n')
print(routes.describe())

Airline                       0
Airline_ID                    0
Source_airport                0
Source_airport_ID             0
Destination_airport           0
Destination_airport_ID        0
Codeshare                 53065
Stops                         0
Equipment                    18
dtype: int64 

0 

              Stops
count  67662.000000
mean       0.000163
std        0.012749
min        0.000000
25%        0.000000
50%        0.000000
75%        0.000000
max        1.000000


In [325]:
# remove the redundant and unrelated columns from the route dataset
routes.drop(
    ["Source_airport", "Source_airport_ID", "Codeshare", "Stops"], axis=1, inplace=True
)
routes.dropna(inplace=True)
routes.head()


Unnamed: 0,Airline,Airline_ID,Destination_airport,Destination_airport_ID,Equipment
0,2B,410,KZN,2990,CR2
1,2B,410,MRV,2962,CR2
2,2B,410,KZN,2990,CR2
3,2B,410,OVB,4078,CR2
4,2B,410,KZN,2990,CR2


In [326]:
# now, we need to merge the three dataframes to be able to use them to answer our questions.
# merge the airlines and routes dataframes on 'Airline_ID' (first we need to convert the datatypes to object)
airlines['Airline_ID'] = airlines['Airline_ID'].astype(str)
airlines_routes = pd.merge(airlines, routes, left_on='Airline_ID', right_on='Airline_ID')
airlines_routes.head()


Unnamed: 0,Airline_ID,name,country,active,Airline,Destination_airport,Destination_airport_ID,Equipment
0,10,40-Mile Air,United States,Y,Q5,TKJ,7235,CNA
1,10,40-Mile Air,United States,Y,Q5,HKB,7242,CNA
2,10,40-Mile Air,United States,Y,Q5,FAI,3832,CNA
3,10,40-Mile Air,United States,Y,Q5,CKX,\N,CNA
4,21,Aigle Azur,France,Y,ZI,MRS,1353,319


In [327]:
# merge the airports and airlines_routes dataframes on 'Destination_airport_ID'
airlines_routes["Destination_airport_ID"] = airlines_routes[
    "Destination_airport_ID"
].astype(str)
airports["Airport_ID"] = airports["Airport_ID"].astype(str)
airlines_routes_airports = pd.merge(
    airlines_routes, airports, left_on="Destination_airport_ID", right_on="Airport_ID"
)
airlines_routes_airports.head()


Unnamed: 0,Airline_ID,name_x,country_x,active,Airline,Destination_airport,Destination_airport_ID,Equipment,Airport_ID,name_y,city,country_y,latitude,longitude,altitude,timezone,dst,tz_database_time_zone,source
0,10,40-Mile Air,United States,Y,Q5,TKJ,7235,CNA,7235,Tok Junction Airport,Tok,United States,63.329498,-142.953995,1639,-9,A,America/Anchorage,OurAirports
1,10,40-Mile Air,United States,Y,Q5,HKB,7242,CNA,7242,Healy River Airport,Healy,United States,63.866199,-148.968994,1263,-9,A,America/Anchorage,OurAirports
2,10,40-Mile Air,United States,Y,Q5,FAI,3832,CNA,3832,Fairbanks International Airport,Fairbanks,United States,64.815102,-147.856003,439,-9,A,America/Anchorage,OurAirports
3,55,Astral Aviation,Kenya,Y,8V,FAI,3832,CNC,3832,Fairbanks International Airport,Fairbanks,United States,64.815102,-147.856003,439,-9,A,America/Anchorage,OurAirports
4,55,Astral Aviation,Kenya,Y,8V,FAI,3832,PAG,3832,Fairbanks International Airport,Fairbanks,United States,64.815102,-147.856003,439,-9,A,America/Anchorage,OurAirports


Now, we have a dataframe that contains all the information we need to answer our questions. After merging, our new dataframe has columns, identified by `name_x`, `country_x`, `name_y`, and `country_y`.

`'name_x'`: the name of the airline or the airport in the dataframe airlines or 
the dataframe airports.

`'country_x'`: the country of the airline or the airport in the dataframe 
airlines or the dataframe airports.

`'name_y'`: the name of the destination airport in the dataframe routes.

`'country_y'`: the country of the destination airport in the dataframe routes.

# Visualizing and exploring the data

In [328]:
#First, we need to know what acronyms or names are used for Rayan Air. 
#We can see that the used name is Ryanair.
airlines_routes_airports['name_x'].unique()

array(['40-Mile Air', 'Astral Aviation', 'Alaska Airlines', 'Era Alaska',
       'Aigle Azur', 'American Airlines', 'Aegean Airlines', 'Air France',
       'Air Malta', 'Alitalia', 'Air Algerie', 'Aer Lingus',
       'Air Madagascar', 'Airlinair', 'Air Transat', 'British Airways',
       'Brussels Airlines', 'Corse-Mediterranee', 'El Al Israel Airlines',
       'Ethiopian Airlines', 'easyJet', 'Formosa Airlines', 'Germania',
       'Iberia Airlines', 'KLM Royal Dutch Airlines', 'Lufthansa',
       'Nouvel Air Tunisie', 'Olympic Airlines', 'Pegasus Airlines',
       'Royal Air Maroc', 'Ryanair', 'TAP Portugal', 'Tunisair',
       'Turkish Airlines', 'Twin Jet', 'XL Airways France', 'Air Europa',
       'Air Berlin', 'Air Caraïbes', 'CityJet', 'Corsairfly',
       'Cubana de Aviación', 'Flybe', "Hex'Air", 'Iran Air',
       'Norwegian Air Shuttle', 'SATA International',
       'Transaero Airlines', 'Transavia France', 'Asiana Airlines',
       'Adria Airways', 'Aeroflot Russian Airlines'

In [329]:
# Are most flies by Ryanaire domestic or international?

fig = go.Figure(
    data=[
        go.Pie(
            labels=["Domestic", "International"],
            values=[
                airlines_route_airports[
                    airlines_route_airports["country_x"] == "Ireland"
                ]["Airline_ID"].count(),
                airlines_route_airports[
                    airlines_route_airports["country_x"] != "Ireland"
                ]["Airline_ID"].count(),
            ],
            marker_colors=px.colors.sequential.Plasma,
        )
    ],
    layout_title_text="Number of domestic and international flights by Rayan Air",
)
fig.show()

In [330]:
# create an interactive map to show the routes and destinations of Rayan Air
fig = px.scatter_mapbox(
    airlines_routes_airports[airlines_routes_airports['name_x'] == 'Ryanair'],
    lat="latitude",
    lon="longitude",
    hover_name="city",
    hover_data=["name_x"],
    zoom=3,
)
fig.update_layout(mapbox_style="open-street-map")
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()

In [331]:
# Filter the top 10 airlines that fly to the most destinations
airlines_route_airports = airlines_routes_airports
top = (
    airlines_route_airports.groupby(["name_x"])["Destination_airport_ID"]
    .count()
    .sort_values(ascending=False)
    .head(10)
)
top

name_x
Ryanair                    2484
American Airlines          2350
United Airlines            2179
Delta Air Lines            1979
US Airways                 1960
China Southern Airlines    1450
China Eastern Airlines     1257
Air China                  1252
Southwest Airlines         1145
easyJet                    1130
Name: Destination_airport_ID, dtype: int64

In [332]:
# Visualize the extracted rankings
fig = go.Figure(
    data=[go.Bar(x=top.index, y=top.values, marker_color=px.colors.sequential.Plasma)],
    layout_title_text="Top 10 airlines that fly to the most destinations",
)
fig.show()

Clearly the data shows that Rayan Air is the airline that flies to the most destinations, among all the most active airlines.

In [333]:
# What is the average distance of Rayan Air’s routes? we don't have the latitude
# and longitude of `name_y` in the dataset (The 'name_y' is the name of the 
#destination airport in the dataframe routes.) 
# We do know that Ryanaire is located in Dublin, Ireland. So, we use geopy to 
# calculate the distance between Dublin and the destination airports of Ryanaire.


dublin = (53.3498, 6.2603)

distances = []
for i in range(len(airlines_routes_airports)):
    if airlines_routes_airports['name_x'][i] == 'Ryanair':
        destination = (
            airlines_routes_airports['latitude'][i],
            airlines_routes_airports['longitude'][i],
        )
        distances.append(distance.distance(dublin, destination).km)

print(f"The average distance of Rayan Air’s routes is {np.mean(distances)} km.")

The average distance of Rayan Air’s routes is 1193.9909159206895 km.


In [334]:
# We would like to get the average route distance of other airlines: 
#American Airlines (Texas), United Airlines (Chicago), Delta Airlines (Atlanta),
# US Airways (Pittsburgh ), China Southern Airlines (Guangzhou), 
# China Eastern Airlines (Shanghai), Air China (Beijing), 
 #Southwest Airlines (Texas ), easyJet (Swiss).

def average_distance(airline, city):
    """
    This function calculates the average distance of the routes of the airlines.
    """
    city = (city[0], city[1])
    distances = []
    for i in range(len(airlines_routes_airports)):
        if airlines_routes_airports['name_x'][i] == airline:
            destination = (
                airlines_routes_airports['latitude'][i],
                airlines_routes_airports['longitude'][i],
            )
            distances.append(distance.distance(city, destination).km)
    return np.mean(distances)

In [335]:
# calculate the average distance of the routes of other airlines
print(
    f"The average distance of American Airlines’s routes is {average_distance('American Airlines', (32.896801, -97.038002))} km."
)
print(
    f"The average distance of United Airlines’s routes is {average_distance('United Airlines', (41.9742, -87.9073))} km."
)
print(
    f"The average distance of Delta Airlines’s routes is {average_distance('Delta Air Lines', (33.640444, -84.426944))} km."
)
print(
    f"The average distance of US Airways’s routes is {average_distance('US Airways', (40.491467, -80.232872))} km."
)
print(
    f"The average distance of China Southern Airlines’s routes is {average_distance('China Southern Airlines', (23.392436, 113.298786))} km."
)
print(
    f"The average distance of China Eastern Airlines’s routes is {average_distance('China Eastern Airlines', (31.222222, 121.458056))} km."
)
print(
    f"The average distance of Air China’s routes is {average_distance('Air China', (40.080111, 116.584556))} km."
)
print(
    f"The average distance of Southwest Airlines’s routes is {average_distance('Southwest Airlines', (32.896801, -97.038002))} km."
)
print(
    f"The average distance of easyJet’s routes is {average_distance('easyJet', (47.3769, 8.5417))} km."
)


The average distance of American Airlines’s routes is 3036.218189679644 km.
The average distance of United Airlines’s routes is 2624.1487889089813 km.
The average distance of Delta Airlines’s routes is 2233.49278942073 km.
The average distance of US Airways’s routes is 2111.4381256276783 km.
The average distance of China Southern Airlines’s routes is 1390.5528399127065 km.
The average distance of China Eastern Airlines’s routes is 1406.1808900033525 km.
The average distance of Air China’s routes is 1654.5755229135284 km.
The average distance of Southwest Airlines’s routes is 1428.967112906936 km.
The average distance of easyJet’s routes is 938.7064567285631 km.


In [336]:
# Now, let's narrow our focus on Europe. Get the top 2 airlines in United Kingdom, 
#Denmark, France, Greece, Italy, Ireland, Netherlands, Norway, Portugal, Spain,
# Sweden, and Switzerland.

def top_airlines(country):
    """
    This function returns the top 2 airlines in the countries.
    """
    top = (
        airlines_routes_airports[airlines_routes_airports["country_y"] == country]
        .groupby(["name_x"])["Destination_airport_ID"]
        .count()
        .sort_values(ascending=False)
        .head(2)
    )
    return top


In [337]:
print(f"The top 2 airlines in United Kingdom are {top_airlines('United Kingdom')}.")
print(f"The top 2 airlines in Denmark are {top_airlines('Denmark')}.")
print(f"The top 2 airlines in France are {top_airlines('France')}.")
print(f"The top 2 airlines in Greece are {top_airlines('Greece')}.")
print(f"The top 2 airlines in Italy are {top_airlines('Italy')}.")
print(f"The top 2 airlines in Ireland are {top_airlines('Ireland')}.")
print(f"The top 2 airlines in Netherlands are {top_airlines('Netherlands')}.")
print(f"The top 2 airlines in Norway are {top_airlines('Norway')}.")
print(f"The top 2 airlines in Portugal are {top_airlines('Portugal')}.")
print(f"The top 2 airlines in Spain are {top_airlines('Spain')}.")
print(f"The top 2 airlines in Sweden are {top_airlines('Sweden')}.")
print(f"The top 2 airlines in Switzerland are {top_airlines('Switzerland')}.")

The top 2 airlines in United Kingdom are name_x
Ryanair    398
easyJet    344
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Denmark are name_x
Scandinavian Airlines System    91
Norwegian Air Shuttle           49
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in France are name_x
Air France    346
easyJet       191
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Greece are name_x
Aegean Airlines     152
Olympic Airlines     96
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Italy are name_x
Ryanair     400
Alitalia    264
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Ireland are name_x
Ryanair       125
Aer Lingus    107
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Netherlands are name_x
KLM Royal Dutch Airlines    138
Transavia Holland           110
Name: Destination_airport_ID, dtype: int64.
The top 2 airlines in Norway are name_x
Widerøe                  199
Norwegian Air

In [338]:
# calculate the average route distance for Ryanair, Scandinavian Airlines System,
# Air France, Aegean Airlines, KLM Royal Dutch, Airlines, Widerøe, TAP Portugal,
# Swiss International Air Lines.


print(
    f"The average distance of Ryanair’s routes is {average_distance('Ryanair', (53.3498, 6.2603))} km."
)
print(
    f"The average distance of Scandinavian Airlines System’s routes is {average_distance('Scandinavian Airlines System', (55.6761, 12.5683))} km."
)
print(
    f"The average distance of Air France’s routes is {average_distance('Air France', (48.8566, 2.3522))} km."
)
print(
    f"The average distance of Aegean Airlines’s routes is {average_distance('Aegean Airlines', (37.9838, 23.7275))} km."
)
print(
    f"The average distance of KLM Royal Dutch Airlines’s routes is {average_distance('KLM Royal Dutch Airlines', (52.3086, 4.7639))} km."
)
print(
    f"The average distance of Widerøe’s routes is {average_distance('Widerøe', (59.9139, 10.7522))} km."
)
print(
    f"The average distance of TAP Portugal’s routes is {average_distance('TAP Portugal', (38.7223, -9.1393))} km."
)
print(
    f"The average distance of Swiss International Air Lines’s routes is {average_distance('Swiss International Air Lines', (46.9479, 7.4474))} km."
)

The average distance of Ryanair’s routes is 1193.9909159206895 km.
The average distance of Scandinavian Airlines System’s routes is 1049.3968199605324 km.
The average distance of Air France’s routes is 3823.3073649270473 km.
The average distance of Aegean Airlines’s routes is 726.5980847354356 km.
The average distance of KLM Royal Dutch Airlines’s routes is 5313.575927254898 km.
The average distance of Widerøe’s routes is 760.8640072676747 km.
The average distance of TAP Portugal’s routes is 1585.794767156709 km.
The average distance of Swiss International Air Lines’s routes is 1475.464168599631 km.


In [339]:
import plotly.graph_objects as go

fig = go.Figure(
    data=[
        go.Bar(
            x=[
                "Ryanair",
                "Scandinavian Airlines System",
                "Air France",
                "Aegean Airlines",
                "KLM Royal Dutch Airlines",
                "Widerøe",
                "TAP Portugal",
                "Swiss International Air Lines",
            ],
            y=[
                average_distance("Ryanair", (53.3498, 6.2603)),
                average_distance("Scandinavian Airlines System", (55.6761, 12.5683)),
                average_distance("Air France", (48.8566, 2.3522)),
                average_distance("Aegean Airlines", (37.9838, 23.7275)),
                average_distance("KLM Royal Dutch Airlines", (52.3086, 4.7639)),
                average_distance("Widerøe", (59.9139, 10.7522)),
                average_distance("TAP Portugal", (38.7223, -9.1393)),
                average_distance("Swiss International Air Lines", (46.9479, 7.4474)),
            ],
            marker_color=px.colors.sequential.Plasma,
        )
    ],
    layout_title_text="Average route distance for Ryanair, Scandinavian Airlines System, Air France, Aegean Airlines, KLM Royal Dutch, Airlines, Widerøe, TAP Portugal, Swiss International Air Lines",
)
fig.show()

The data suggests that Ryanair has the shortest average route distance, while Swiss International Air Lines has the longest average route distance. One reason for why Ryanair has a shorter average route distance is that it might have direct flights than the other airlines. As well, we need information about the year variable to see how these average route distances have changed over time. Other variables to control for include the size of the airport, the size of the city where the airport is located, and the number of stops.