# <center> Interactive Geospatial Visualization </center>
<center> <b>DLBDSEDAV01</b> - Explorative Data Analysis and Visualization </center>
<center> IU International University of Applied Sciences </center>

# Greetings
In this project, we perform an exploratory analysis of an airline dataset, focusing on uncovering patterns and insights visually. By applying statistical techniques and visualization methods, we aim to better understand the relationships and trends within the data. The analysis employs descriptive statistics, such as measures of location and variation, together with graphical representations to summarize and interpret the dataset's key features. It also applies design principles to create meaningful and informative visualizations, making the data more accessible for analysis and decision-making.

## Table of contents
1. __About the data__
2. __Data Exploration__
3. __Data Pre-processing__
4. __Data Visualization__
5. __Summary__

First, we import the required libraries.

In [34]:
# Import required libraries
from pathlib import Path
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd

# 1. About the data

The [data](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/airline_2m.tar.gz) used in this notebook is a sample of the original dataset, __The Reporting Carrier On-Time Performance Dataset__, which contains information on approximately 200 million domestic US flights reported to the [United States Bureau of Transportation Statistics](https://www.bts.gov/). The dataset contains basic information about each flight (such as date, time, departure airport, and arrival airport) and, where applicable, the length of the delay and the reason for it. A complete glossary of the data fields is available [here](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html).

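__NOTE__: Due to its large size, it is recommended to download the archive manually and extract `airline_2m.csv` into a `data` folder one level above the notebook's working directory, which is where the loading code reads it from.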
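If you prefer to script this step, the following is a minimal sketch using only the standard library. It assumes the archive contains a file named `airline_2m.csv` (the name `pd.read_csv` expects in the loading cell) and that the `data` folder sits one level above the working directory; `download_archive` and `extract_csv` are hypothetical helpers, not part of the original project.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Archive URL as given in the "About the data" section
DATA_URL = "https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/airline_2m.tar.gz"


def download_archive(url: str, archive_path: Path) -> Path:
    """Download the archive to archive_path unless a copy already exists there."""
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    return archive_path


def extract_csv(archive_path: Path, data_dir: Path, member_name: str = "airline_2m.csv") -> Path:
    """Copy one CSV out of a .tar.gz archive into data_dir, dropping any sub-folders."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / member_name
    with tarfile.open(archive_path, "r:gz") as tar:
        # Find the member by its file name, wherever it sits inside the archive (assumption)
        member = next(
            (m for m in tar.getmembers() if m.isfile() and Path(m.name).name == member_name),
            None,
        )
        if member is None:
            raise FileNotFoundError(f"{member_name} not found in {archive_path}")
        # Stream the file out instead of calling tar.extract(), so no path stored in the archive is trusted
        with tar.extractfile(member) as source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target


# Usage (downloads a large file, so it is left commented out here):
# data_dir = Path.cwd().parent / "data"
# extract_csv(download_archive(DATA_URL, data_dir / "airline_2m.tar.gz"), data_dir)
```

Streaming the single member with `extractfile` keeps the extracted file directly in `data/`, regardless of how the archive nests it.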


# 2. Data Exploration
In this phase, we attempt to explore the dataset, reveal its weaknesses, and prepare it for the next phase.

Let's load the data into a pandas DataFrame.

In [35]:
# Locating our directory
path = Path.cwd()

# Read the airline data into pandas dataframe
df = pd.read_csv(
    f'{path.parent}/data/airline_2m.csv', encoding="ISO-8859-1", low_memory=False, index_col=0,
    dtype={'Div1Airport': str, 'Div1TailNum': str, 'Div2Airport': str, 'Div2TailNum': str}
)

# Resetting the index
df.reset_index(inplace=True)

# Display the first few rows
df.head()

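|   | Year | Quarter | Month | DayofMonth | DayOfWeek | FlightDate | Reporting_Airline | DOT_ID_Reporting_Airline | IATA_CODE_Reporting_Airline | Tail_Number | ... | Div4WheelsOff | Div4TailNum | Div5Airport | Div5AirportID | Div5AirportSeqID | Div5WheelsOn | Div5TotalGTime | Div5LongestGTime | Div5WheelsOff | Div5TailNum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1998 | 1 | 1 | 2 | 5 | 1998-01-02 | NW | 19386 | NW | N297US | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | 2009 | 2 | 5 | 28 | 4 | 2009-05-28 | FL | 20437 | FL | N946AT | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | 2013 | 2 | 6 | 29 | 6 | 2013-06-29 | MQ | 20398 | MQ | N665MQ | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | 2010 | 3 | 8 | 31 | 2 | 2010-08-31 | DL | 19790 | DL | N6705Y | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | 2006 | 1 | 1 | 15 | 7 | 2006-01-15 | US | 20355 | US | N504AU | ... |  |  |  |  |  |  |  |  |  |  |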


In [36]:
# Print the shape of the data
print(f"The dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

The dataset contains 2000000 rows and 109 columns.


The following cell builds an overview of every column: its data type, its number of missing values, and its unique values.

In [37]:
# Initialize a dictionary mapping each column to its statistics
new_dict = {}

# Collect the data type, number of missing values, and unique values of every column
for column in df.columns:
    new_dict[column] = [df[column].dtype, df[column].isna().sum(), df[column].unique()]

# Build a dataframe with one row per column
discover_df = pd.DataFrame(new_dict).transpose().reset_index()
discover_df.columns = ["Columns", "Data types", "Number of empty values", "Unique Values"]

# Display the overview
discover_df

|   | Columns | Data types | Number of empty values | Unique Values |
|---|---|---|---|---|
| 0 | Year | int64 | 0 | [1998, 2009, 2013, 2010, 2006, 1995, 2019, 200... |
| 1 | Quarter | int64 | 0 | [1, 2, 3, 4] |
| 2 | Month | int64 | 0 | [1, 5, 6, 8, 11, 2, 4, 7, 9, 10, 3, 12] |
| 3 | DayofMonth | int64 | 0 | [2, 28, 29, 31, 15, 7, 11, 3, 8, 21, 24, 30, 4... |
| 4 | DayOfWeek | int64 | 0 | [5, 4, 6, 2, 7, 3, 1] |
| ... | ... | ... | ... | ... |
| 104 | Div5WheelsOn | float64 | 2000000 | [nan] |
| 105 | Div5TotalGTime | float64 | 2000000 | [nan] |
| 106 | Div5LongestGTime | float64 | 2000000 | [nan] |
| 107 | Div5WheelsOff | float64 | 2000000 | [nan] |
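The same overview can also be built without an explicit loop. The following is a minimal sketch assuming pandas only. Unlike the cell above, it reports the *number* of distinct values rather than the values themselves, which keeps the table narrow for high-cardinality columns such as `Tail_Number`. `summarize_columns` is a hypothetical helper name.

```python
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, number of missing values, number of distinct values."""
    return pd.DataFrame({
        "Data types": frame.dtypes,
        "Number of empty values": frame.isna().sum(),
        # dropna=False counts NaN as a value, so an all-empty column reports 1
        "Number of unique values": frame.nunique(dropna=False),
    }).rename_axis("Columns").reset_index()
```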


Based on this overview, we conclude the following:
- Several columns are entirely empty and should be dropped.
- Redundant information (for example, the same airport encoded as a code, an ID, and a sequence ID) should be removed.
- Data types require proper handling.
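The first two conclusions can be checked programmatically rather than by eye. The following is a minimal sketch assuming pandas only. `find_empty_columns` and `is_one_to_one` are hypothetical helpers, and the sample values in the test are made up. A one-to-one mapping between two columns (for example, an airport code and its numeric ID) is one way to justify treating one of them as redundant.

```python
import pandas as pd


def find_empty_columns(frame: pd.DataFrame) -> list[str]:
    """Return the names of columns in which every value is missing."""
    return [column for column, empty in frame.isna().all().items() if empty]


def is_one_to_one(frame: pd.DataFrame, left: str, right: str) -> bool:
    """True if each value of `left` maps to exactly one value of `right`, and vice versa."""
    # After de-duplicating the (left, right) pairs, a one-to-one mapping leaves both sides unique
    pairs = frame[[left, right]].dropna().drop_duplicates()
    return pairs[left].is_unique and pairs[right].is_unique
```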

# 3. Data Pre-processing
In this phase, we clean the data, handle data types, eliminate redundancy, and prepare it for the visualization phase.

In [38]:
# Columns to drop: redundant identifiers and completely empty diversion columns
columns_to_drop = [
    # Redundant information
    "DOT_ID_Reporting_Airline",
    "IATA_CODE_Reporting_Airline",
    "OriginAirportID",
    "OriginAirportSeqID",
    "OriginCityMarketID",
    "OriginStateFips",
    "OriginWac",
    "DestAirportID",
    "DestAirportSeqID",
    "DestCityMarketID",
    "DestStateFips",
    "DestWac",
    "Div1AirportID",
    "Div1AirportSeqID",

    # Empty columns
    "Div2Airport",
    "Div2AirportID",
    "Div2AirportSeqID",
    "Div2WheelsOn",
    "Div2TotalGTime",
    "Div2LongestGTime",
    "Div2WheelsOff",
    "Div2TailNum",
    "Div3Airport",
    "Div3AirportID",
    "Div3AirportSeqID",
    "Div3WheelsOn",
    "Div3TotalGTime",
    "Div3LongestGTime",
    "Div3WheelsOff",
    "Div3TailNum",
    "Div4Airport",
    "Div4AirportID",
    "Div4AirportSeqID",
    "Div4WheelsOn",
    "Div4TotalGTime",
    "Div4LongestGTime",
    "Div4WheelsOff",
    "Div4TailNum",
    "Div5Airport",
    "Div5AirportID",
    "Div5AirportSeqID",
    "Div5WheelsOn",
    "Div5TotalGTime",
    "Div5LongestGTime",
    "Div5WheelsOff",
    "Div5TailNum",
]

# Dropping the columns from the dataset
df.drop(columns_to_drop, axis=1, inplace=True)

# Strip the state suffix from city names (e.g. "Atlanta, GA" becomes "Atlanta"); the state is already stored in its own column
df['OriginCityName'] = df['OriginCityName'].str.split(',').str[0]
df['DestCityName'] = df['DestCityName'].str.split(',').str[0]

# Handle data types
df['Flights'] = df['Flights'].astype(int)

# Review the shape of the data
print(f"The resulting dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

The resulting dataset contains 2000000 rows and 63 columns.
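The drop step above mutates `df` with `inplace=True` and expects every listed column to exist. A slightly more defensive variant, sketched below under the assumption that the column list may be reused on an extract lacking some `Div*` columns, removes repeated names and ignores missing ones. It also returns a new frame, which makes the cell safe to re-run. `drop_columns` is a hypothetical helper.

```python
import pandas as pd


def drop_columns(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop the listed columns, ignoring repeated names and names that are not present."""
    unique_columns = list(dict.fromkeys(columns))  # de-duplicate while keeping the original order
    return frame.drop(columns=unique_columns, errors="ignore")
```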


The dataset covers two million flights from __1987-10-01__ to __2020-03-31__. For this task, I focus on roughly the last two decades, keeping only the complete years 1999 through 2019.

In [39]:
# Keep the years 1999 to 2019, both inclusive
df = df[(df["Year"] > 1998) & (df["Year"] < 2020)]

# Review the shape of the data
print(f"The resulting dataset contains {df.shape[0]} rows and {df.shape[1]} columns.")

The resulting dataset contains 1376877 rows and 63 columns.
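The strict comparisons above keep the years 1999 through 2019. Stating the bounds inclusively with `Series.between` makes that range explicit. The following is a minimal sketch with a hypothetical `filter_years` helper that selects the same rows as the original filter.

```python
import pandas as pd


def filter_years(frame: pd.DataFrame, first_year: int, last_year: int) -> pd.DataFrame:
    """Keep the rows whose Year lies between first_year and last_year, both ends inclusive."""
    return frame[frame["Year"].between(first_year, last_year)]
```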


Now that the data is cleaned, let's define some helper functions that aggregate it for each visualization.

In [40]:
### Optional
## The dataset's biggest challenge was the lack of airport coordinates and states. To overcome this, I've created a function that fetches the coordinates from the internet.
## The function is as follows:
# from geopy.geocoders import Nominatim
# from geopy.exc import GeocoderTimedOut
# import pandas as pd
# import time
# import os
#
# # Initialize geolocator
# geolocator = Nominatim(user_agent="airport_locator")
#
# # Function to fetch coordinates
# def get_coordinates(airport_code, retries=3):
#     """
#     Get the coordinates for the given airport code.
#     :param airport_code: The airport code.
#     :param retries: How many more times to retry after a timeout.
#     :return: The airport's latitude and longitude, or (None, None) if not found.
#     """
#     # Try to get the coordinates
#     try:
#
#         # Define the location
#         location = geolocator.geocode(f"{airport_code} airport")
#
#         # Return the coordinates
#         if location:
#             return location.latitude, location.longitude
#
#         else:
#             return None, None
#
#     # Handle timeout errors
#     except GeocoderTimedOut:
#
#         # Give up after the last retry instead of recursing forever
#         if retries <= 0:
#             return None, None
#
#         # Wait a second before retrying
#         time.sleep(1)
#
#         # Try again
#         return get_coordinates(airport_code, retries - 1)
#
# # Load cached data if exists
# cache_file = "airports_coordinates.csv"
#
# # Load the cached data
# if os.path.exists(cache_file):
#     cached_df = pd.read_csv(cache_file)
# else:
#     cached_df = pd.DataFrame(columns=["AirportCode", "Latitude", "Longitude"])
#
# # Create a dictionary for storing results
# airports_data = {
#     "AirportCode": [],
#     "Latitude": [],
#     "Longitude": []
# }
#
# # Get the airport codes
# airports_list = list(set(df["Origin"].unique()).union(df["Dest"].unique()))
#
# # Fetch coordinates for each airport
# for airport in airports_list:
#
#     # Check if the airport is already in the cache
#     if airport in cached_df["AirportCode"].values:
#
#         lat = cached_df.loc[cached_df["AirportCode"] == airport, "Latitude"].values[0]
#         lon = cached_df.loc[cached_df["AirportCode"] == airport, "Longitude"].values[0]
#     else:
#         lat, lon = get_coordinates(airport)
#
#         # Rate-limiting: wait 1 second between requests
#         time.sleep(1)
#
#     # Append newly acquired values
#     airports_data["AirportCode"].append(airport)
#     airports_data["Latitude"].append(lat)
#     airports_data["Longitude"].append(lon)
#
# # Convert to DataFrame
# airports_df = pd.DataFrame(airports_data)
#
# # Merge with cached data and remove duplicates
# updated_df = pd.concat([cached_df, airports_df]).drop_duplicates(subset=["AirportCode"])
#
# # Save the updated cache
# updated_df.to_csv(cache_file, index=False)
print("This part is optional")

This part is optional


In [41]:
# Defining the functions
def get_airports_data(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Get the airports data from the given dataframe.

    Arguments:
    dataframe: Filtered dataframe.

    Returns:
    Dataframe containing the airport data.
    """
    # Load the airport coordinates (AirportCode, StateCode, StateName, Latitude, Longitude)
    airport_coords_df = pd.read_csv(f"{path.parent}/data/airports_coordinates.csv")

    # Aggregate the number of departing flights per airport
    airport_flights = dataframe.groupby('Origin')['Flights'].sum().reset_index()
    airport_flights.rename(columns={'Origin': 'AirportCode'}, inplace=True)

    # Keep every airport from the coordinates file, even those without flights
    data = pd.merge(airport_coords_df, airport_flights, on='AirportCode', how='left')

    # Airports without flights get a count of 0
    data['Flights'] = data['Flights'].fillna(0)
    data['Flights'] = data['Flights'].astype(int)

    # Sort the airports alphabetically by state
    data.sort_values('StateName', ascending=True, inplace=True)

    # Return the results
    return data

def get_map_data(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Get the map data for the given dataframe.

    Arguments:
    dataframe: Filtered dataframe.

    Returns:
    Dataframe containing the map data.
    """
    # Group data by state
    data = dataframe.groupby(['OriginState', 'OriginStateName'])['Flights'].sum().reset_index()

    # Fill missing values
    data['Flights'] = data['Flights'].fillna(0)

    # Convert the flights to integers
    data['Flights'] = data['Flights'].astype(int)

    # Rename the columns
    data.rename(columns={'OriginState': 'StateCode', 'OriginStateName': 'StateName'}, inplace=True)

    # Sort the states from fewest to most flights
    data.sort_values('Flights', ascending=True, inplace=True)

    # Return the results
    return data

def get_tree_data(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Get the tree data for the given dataframe.

    Arguments:
    dataframe: Filtered dataframe.

    Returns:
    Dataframe containing the tree data.
    """
    # Group data by state, airport, and airline
    data = dataframe.groupby(
        ['OriginState', 'OriginStateName', 'Origin', 'Reporting_Airline']
    )['Flights'].sum().reset_index()
    data["Total Flights per state"] = data.groupby(['OriginState', 'OriginStateName'])['Flights'].transform('sum')

    # Fill missing values
    data['Flights'] = data['Flights'].fillna(0)
    data["Total Flights per state"] = data["Total Flights per state"].fillna(0)

    # Convert the flights to integers
    data['Flights'] = data['Flights'].astype(int)
    data["Total Flights per state"] = data["Total Flights per state"].astype(int)

    # Rename the columns
    data.rename(columns={'OriginState': 'StateCode', 'OriginStateName': 'StateName'}, inplace=True)

    # Return the results
    return data

def get_sunburst_data(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Get the sunburst data for the given dataframe.

    Arguments:
    dataframe: Filtered dataframe.

    Returns:
    Dataframe containing the sunburst data.
    """
    # Common grouping keys for the delayed and on-time subsets
    keys = ["Reporting_Airline", "Origin", "OriginState", "OriginStateName"]

    # Prepare the cancelled data (one leaf per cancellation code)
    cancelled = dataframe[dataframe['Cancelled'] == 1].groupby(
        ['OriginState', 'OriginStateName', 'Origin', "Reporting_Airline", "CancellationCode"]
    )["Flights"].sum().reset_index()

    # Label the cancelled flights
    cancelled["Status"] = "Cancelled"

    # The cancellation code becomes the innermost ring of the sunburst
    cancelled = cancelled.rename(columns={"CancellationCode": "Detail"})

    # Prepare the delay data: flights arriving after schedule (ArrDelay > 0)
    arr_delayed = dataframe[dataframe['ArrDelay'] > 0].groupby(keys)["Flights"].sum().reset_index()
    arr_delayed["Detail"] = "Arrival Delay"

    # Flights departing after schedule (DepDelay > 0)
    dep_delayed = dataframe[dataframe['DepDelay'] > 0].groupby(keys)["Flights"].sum().reset_index()
    dep_delayed["Detail"] = "Departure Delay"

    # Stack arrival and departure delays under the "Delayed" status
    delayed = pd.concat([arr_delayed, dep_delayed], ignore_index=True)
    delayed["Status"] = "Delayed"

    # Prepare the on-time data: only flights with exactly zero delay count here,
    # so early flights (negative delay) appear in neither "Delayed" nor "On-Time"
    arr_ontime = dataframe[dataframe['ArrDelay'] == 0].groupby(keys)["Flights"].sum().reset_index()
    arr_ontime["Detail"] = "Arrival On-Time"

    # Flights departing exactly on schedule (DepDelay == 0)
    dep_ontime = dataframe[dataframe['DepDelay'] == 0].groupby(keys)["Flights"].sum().reset_index()
    dep_ontime["Detail"] = "Departure On-Time"

    # Stack arrival and departure on-time counts under the "On-Time" status
    on_time = pd.concat([arr_ontime, dep_ontime], ignore_index=True)
    on_time["Status"] = "On-Time"

    # Combine all data and sort it by airline
    data = pd.concat([cancelled, delayed, on_time], ignore_index=True).sort_values(
        by="Reporting_Airline", ascending=True
    )

    # Fill missing values
    data['Flights'] = data['Flights'].fillna(0)
    data['Flights'] = data['Flights'].astype(int)

    # Rename the columns
    data.rename(columns={'OriginState': 'StateCode', 'OriginStateName': 'StateName'}, inplace=True)

    # Return the results
    return data

def get_line_data(dataframe: pd.DataFrame) -> pd.DataFrame:
    """
    Get the line data for the given dataframe.

    Arguments:
    dataframe: Filtered dataframe.

    Returns:
    Dataframe containing the line data.
    """
    # Sum the air time per state, airport, calendar month, and airline
    data = dataframe.groupby(
        ["OriginStateName", "OriginState", "Origin", "Month", "Reporting_Airline"]
    )["AirTime"].sum().reset_index()

    # Fill missing values
    data['AirTime'] = data['AirTime'].fillna(0)
    data['AirTime'] = data['AirTime'].astype(int)

    # Rename the columns
    data.rename(columns={'OriginState': 'StateCode', 'OriginStateName': 'StateName'}, inplace=True)

    # Return the results
    return data

def create_choropleth(data: pd.DataFrame, airport_data: pd.DataFrame, selected_state=None, zoom_level=4):
    """
    Create a choropleth map of US states by flights with airport markers.

    :param data: the data for the map
    :param airport_data: the data for the airports
    :param selected_state: the selected state
    :param zoom_level: the zoom level

    :return:
    Plotly figure object
    """
    # Create the main choropleth map
    fig = px.choropleth(
        data,
        locations='StateCode',
        color='Flights',
        hover_data=['StateName', 'Flights'],
        locationmode='USA-states',
        color_continuous_scale='YlOrRd',
        range_color=[0, data['Flights'].max()],
        title="Choropleth Map of US States by Flights",
        labels={
            'StateCode': 'State Code',
            'Flights': 'Number of Flights',
            'StateName': 'State Name'
        }
    )

    # Add airport markers only if a state is selected
    if selected_state:

        # Get the state data
        states_coords_df = pd.read_csv(f"{path.parent}/data/us_state_coordinates.csv")
        state_data = states_coords_df[states_coords_df['Abbreviation'] == selected_state]

        # Fall back to the state code if the state is missing from the coordinates file
        state_name = state_data.iloc[0]['State'] if not state_data.empty else selected_state

        # Center the map on the selected state
        if not state_data.empty:
            # Get the coordinates
            coords = {"lat": state_data.iloc[0]['Latitude'], "lon": state_data.iloc[0]['Longitude']}

            # Update the layout with the selected state
            fig.update_layout(
                geo=dict(
                    center={"lat": coords["lat"], "lon": coords["lon"]},
                    projection_scale=zoom_level  # Zoom level
                )
            )

        # Filter airports in the selected state
        airports_in_state = airport_data[airport_data['StateCode'] == selected_state]

        # Add airport markers for each airport using a locator symbol
        fig.add_trace(
            go.Scattergeo(
                locationmode='USA-states',
                lat=airports_in_state['Latitude'],
                lon=airports_in_state['Longitude'],
                text=airports_in_state.apply(
                    lambda row: f"{row['AirportCode']}<br>Flights: {row['Flights']}", axis=1),
                marker=dict(
                    size=30,
                    color='black',
                    opacity=1,
                    symbol='x'
                ),
                name="Airport Locations"
            )
        )

        # Add scaled circles for flight volume
        fig.add_trace(
            go.Scattergeo(
                locationmode='USA-states',
                lat=airports_in_state['Latitude'],
                lon=airports_in_state['Longitude'],
                text=airports_in_state.apply(
                    lambda row: f"{row['AirportCode']}<br>Flights: {row['Flights']}", axis=1),
                marker=dict(
                    size=airports_in_state['Flights'] / airports_in_state['Flights'].max() * 100,  # Scale size
                    color='red',
                    opacity=0.5,
                    symbol='circle'
                ),
                name="Flight Volume"
            )
        )

        # Update the title
        fig.update_layout(title_text=f"Choropleth Map of {state_name}, US by Flights")

    # Return the figure
    return fig

def create_line(data: pd.DataFrame, state=None, airport=None):
    """
    Create a line chart of the total monthly air time of the 12 airlines with the most air time.
    The airport filter is only applied when a state is provided as well.
    :param data: The data needed to create the chart (output of get_line_data)
    :param state: A US state code, else set to None
    :param airport: A US-based airport code, else set to None
    :return: A line chart
    """
    # Start from the whole country
    subset = data
    title = 'Total Monthly Air Time per Airline in the US'

    # When a state is provided, narrow the data down to it
    if state:
        subset = subset[subset["StateCode"] == state]

        # When an airport is provided as well, keep only flights departing from it
        if airport:
            subset = subset[subset["Origin"] == airport]

        # Get the state's name safely
        state_name = subset['StateName'].iloc[0] if not subset.empty else state

        # Adapt the title to the selection
        if airport:
            title = f'Total Monthly Air Time per Airline at {airport} Airport in {state_name}, US'
        else:
            title = f'Total Monthly Air Time per Airline in {state_name}, US'

    # Rank the airlines by their total air time and keep the top 12
    air_time = subset.groupby("Reporting_Airline")["AirTime"].sum().sort_values(ascending=False)
    top_airlines = air_time.index.tolist()[:12]

    # Sum the air time per airline and calendar month (all years combined)
    plot_data = subset.groupby(["Reporting_Airline", "Month"])["AirTime"].sum().reset_index()

    # Create the line chart
    fig = px.line(
        plot_data[plot_data["Reporting_Airline"].isin(top_airlines)],
        x='Month',
        y='AirTime',
        color='Reporting_Airline',
        title=title,
        labels={'AirTime': 'Air time (minutes)'}
    )

    # Return the result
    return fig
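In `create_choropleth`, each circle's size is the airport's flight count divided by the state's maximum and multiplied by 100. `get_airports_data` fills missing counts with 0, so a selection in which every airport has zero flights would divide by zero and produce NaN sizes. The following is a minimal guarded sketch; `scale_marker_sizes` is a hypothetical helper, not part of the original code.

```python
import pandas as pd


def scale_marker_sizes(flights: pd.Series, max_size: float = 100.0) -> pd.Series:
    """Scale flight counts linearly so that the busiest airport gets max_size."""
    peak = flights.max()
    if pd.isna(peak) or peak <= 0:
        # Empty selection or no flights at all: use zero-sized markers instead of dividing by zero
        return pd.Series(0.0, index=flights.index)
    return flights / peak * max_size
```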

Next, we apply the functions we have just defined and inspect their output.

In [42]:
# Apply the functions
map_data = get_map_data(df)

# Display the data
map_data.head()

|   | StateCode | StateName | Flights |
|---|---|---|---|
| 7 | DE | Delaware | 17 |
| 43 | TT | U.S. Pacific Trust Territories and Possessions | 59 |
| 51 | WV | West Virginia | 850 |
| 47 | VI | U.S. Virgin Islands | 891 |
| 48 | VT | Vermont | 1201 |


In [43]:
# Apply the function
tree_data = get_tree_data(df)

# Display the data
tree_data.head()

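|   | StateCode | StateName | Origin | Reporting_Airline | Flights | Total Flights per state |
|---|---|---|---|---|---|---|
| 0 | AK | Alaska | ADK | AS | 25 | 8624 |
| 1 | AK | Alaska | ADQ | AS | 134 | 8624 |
| 2 | AK | Alaska | AKN | AS | 42 | 8624 |
| 3 | AK | Alaska | ANC | AA | 37 | 8624 |
| 4 | AK | Alaska | ANC | AS | 3280 | 8624 |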
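The `Total Flights per state` column repeats the same number on every row of a state because `get_tree_data` computes it with `groupby(...).transform('sum')`. This call broadcasts each group's total back onto the group's rows, whereas `sum()` alone would return one row per group. The toy example below (made-up counts) isolates that step with a hypothetical `add_state_totals` helper.

```python
import pandas as pd


def add_state_totals(frame: pd.DataFrame) -> pd.DataFrame:
    """Attach each state's total flights to every row of that state."""
    result = frame.copy()
    # transform("sum") returns one value per row (its group's total), aligned to the original index
    result["Total Flights per state"] = result.groupby("StateCode")["Flights"].transform("sum")
    return result
```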


In [44]:
# Apply the function
sunburst_data = get_sunburst_data(df)

# Display the data
sunburst_data.head()

|   | StateCode | StateName | Origin | Reporting_Airline | Detail | Flights | Status |
|---|---|---|---|---|---|---|---|
| 2850 | TN | Tennessee | CHA | 9E | C | 1 | Cancelled |
| 12065 | WI | Wisconsin | RHI | 9E | Departure On-Time | 1 | On-Time |
| 12064 | NC | North Carolina | RDU | 9E | Departure On-Time | 8 | On-Time |
| 12063 | ME | Maine | PWM | 9E | Departure On-Time | 4 | On-Time |
| 12062 | WA | Washington | PSC | 9E | Departure On-Time | 1 | On-Time |
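Because `get_sunburst_data` filters arrival and departure delays separately, the sunburst's leaves count flight events rather than flights. A flight that departs late and also arrives late contributes to two leaves, while a flight with a negative (early) delay contributes to none. The toy data below (made-up values) illustrates this with a hypothetical `detail_counts` helper. It applies the same filters but, for brevity, lumps all cancellation codes into a single `Cancelled` label.

```python
import pandas as pd


def detail_counts(frame: pd.DataFrame) -> dict[str, int]:
    """Sum Flights per leaf label, using the same filters as get_sunburst_data."""
    rules = {
        "Cancelled": frame["Cancelled"] == 1,
        "Arrival Delay": frame["ArrDelay"] > 0,
        "Departure Delay": frame["DepDelay"] > 0,
        "Arrival On-Time": frame["ArrDelay"] == 0,
        "Departure On-Time": frame["DepDelay"] == 0,
    }
    # Comparisons with NaN are False, so cancelled flights (no delay values) only hit "Cancelled"
    return {label: int(frame.loc[mask, "Flights"].sum()) for label, mask in rules.items()}
```

With one cancelled, one late, one exactly on-time, and one early flight, the five leaves sum to 5 even though only 4 flights exist, and the early flight is not counted at all.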


In [45]:
# Apply the function
line_data = get_line_data(df)

# Display the data
line_data.head()

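|   | StateName | StateCode | Origin | Month | Reporting_Airline | AirTime |
|---|---|---|---|---|---|---|
| 0 | Alabama | AL | BFM | 5 | F9 | 152 |
| 1 | Alabama | AL | BFM | 8 | F9 | 154 |
| 2 | Alabama | AL | BFM | 11 | F9 | 165 |
| 3 | Alabama | AL | BHM | 1 | 9E | 412 |
| 4 | Alabama | AL | BHM | 1 | AA | 2373 |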
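`create_line` ranks airlines by their total air time and keeps the first 12. The same ranking can be written with `Series.nlargest`, which skips sorting the full table. The following is a minimal sketch with a hypothetical `top_airlines_by_air_time` helper; the test values are made up.

```python
import pandas as pd


def top_airlines_by_air_time(frame: pd.DataFrame, n: int = 12) -> list[str]:
    """Return the n airlines with the largest total AirTime, largest first."""
    totals = frame.groupby("Reporting_Airline")["AirTime"].sum()
    return totals.nlargest(n).index.tolist()
```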


In [46]:
# Save the data
get_airports_data(df).to_csv(f'{path.parent}/data/airport_data.csv')
map_data.to_csv(f'{path.parent}/data/map_data.csv')
tree_data.to_csv(f'{path.parent}/data/tree_data.csv')
sunburst_data.to_csv(f'{path.parent}/data/sunburst_data.csv')
line_data.to_csv(f'{path.parent}/data/line_data.csv')

# 4. Data Visualization
With the data aggregated, we can now build the visualizations.

In [47]:
# Create the choropleth map
map_fig = create_choropleth(map_data, get_airports_data(df))

# Update the layout
map_fig.update_layout(geo_scope='usa', showlegend=True)

# Save the figure
map_fig.write_image(f"{path.parent}/assets/Figure 01 - Choropleth Map of US States by Flights.png", width=1280, height=720)

# Display the chart
map_fig.show()

In [48]:
# Create the choropleth map
oregon_map_fig = create_choropleth(map_data, get_airports_data(df), selected_state="OR", zoom_level=4)

# Update the layout
oregon_map_fig.update_layout(geo_scope='usa', showlegend=True)

# Save the figure
oregon_map_fig.write_image(f"{path.parent}/assets/Figure 02 - Choropleth Map of Oregon, US by Flights.png", width=1280, height=720)

# Display the chart
oregon_map_fig.show()

In [49]:
# Create the treemap
tree_fig = px.treemap(
    tree_data,
    color='Total Flights per state',
    values='Flights',
    title='Flights distribution across States and Airports',
    path=['StateCode', 'Origin', 'Reporting_Airline'],
    color_continuous_scale='YlOrRd'
)

# Save the figure
tree_fig.write_image(f"{path.parent}/assets/Figure 03 - Flights distribution across airports in the US.png", width=1280, height=640) # One-time run

# Display the chart
tree_fig.show()

In [50]:
# Create the treemap
oregon_tree_fig = px.treemap(
    tree_data[tree_data['StateCode'] == 'OR'],
    color='Flights',
    values='Flights',
    title='Flights distribution per airline in Oregon, US.',
    path=['StateCode', 'Origin', 'Reporting_Airline'],
    color_continuous_scale='YlOrRd'
)

# Save the figure
oregon_tree_fig.write_image(f"{path.parent}/assets/Figure 04 - Flights distribution per airline in Oregon, US.png", width=640, height=640) # One-time run

# Display the chart
oregon_tree_fig.show()

In [51]:
# Create the treemap
pdx_oregon_tree_fig = px.treemap(
    tree_data[tree_data['Origin'] == 'PDX'],
    color='Flights',
    values='Flights',
    title='Flights distribution per airline at the PDX airport in Oregon, US.',
    path=['StateCode', 'Origin', 'Reporting_Airline'],
    color_continuous_scale='YlOrRd'
)

# Save the figure
pdx_oregon_tree_fig.write_image(f"{path.parent}/assets/Figure 05 - Flights distribution per airline at the PDX airport in Oregon, US.png", width=640, height=640) # One-time run

# Display the chart
pdx_oregon_tree_fig.show()

In [52]:
# Create the sunburst chart
sunburst_fig = px.sunburst(
        sunburst_data,
        path=["Reporting_Airline", "Status", "Detail"],
        color="Status",
        values="Flights",
        title="Airlines performances in the US",
        color_discrete_map={"Cancelled": "red", "Delayed": "orange", "On-Time": "green"})


# Save the figure
sunburst_fig.write_image(f"{path.parent}/assets/Figure 06 - Airlines performances in the US.png", width=500, height=500) # One-time run

# Display the chart
sunburst_fig.show()

In [53]:
# Create the sunburst chart
oregon_sunburst_fig = px.sunburst(
        sunburst_data[sunburst_data['StateCode'] == 'OR'],
        path=["Reporting_Airline", "Status", "Detail"],
        color="Status",
        values="Flights",
        title="Airlines performances in Oregon, US",
        color_discrete_map={"Cancelled": "red", "Delayed": "orange", "On-Time": "green"})


# Save the figure
oregon_sunburst_fig.write_image(f"{path.parent}/assets/Figure 07 - Airlines performances in Oregon, US.png", width=500, height=500) # One-time run

# Display the chart
oregon_sunburst_fig.show()

In [54]:
# Create the sunburst chart
pdx_oregon_sunburst_fig = px.sunburst(
        sunburst_data[(sunburst_data['StateCode'] == 'OR') & (sunburst_data['Origin'] == 'PDX')],
        path=["Reporting_Airline", "Status", "Detail"],
        color="Status",
        values="Flights",
        title="Airlines performances at PDX airport in Oregon, US",
        color_discrete_map={"Cancelled": "red", "Delayed": "orange", "On-Time": "green"})


# Save the figure
pdx_oregon_sunburst_fig.write_image(f"{path.parent}/assets/Figure 08 - Airlines performances at PDX airport in Oregon, US.png", width=500, height=500) # One-time run

# Display the chart
pdx_oregon_sunburst_fig.show()

In [55]:
# Create the line chart
line_fig = create_line(line_data)

# Save the figure
line_fig.write_image(f"{path.parent}/assets/Figure 09 - Average Monthly Flight Time per airline in the US.png", width=1500, height=400) # One-time run

# Display the chart
line_fig.show()

In [56]:
# Create the line chart
oregon_line_fig = create_line(line_data, state="OR")

# Save the figure
oregon_line_fig.write_image(f"{path.parent}/assets/Figure 10 - Average Monthly Flight Time per airline in Oregon US.png", width=1500, height=400) # One-time run

# Display the chart
oregon_line_fig.show()

In [57]:
# Create the line chart
pdx_oregon_line_fig = create_line(line_data, state="OR", airport="PDX")

# Save the figure
pdx_oregon_line_fig.write_image(f"{path.parent}/assets/Figure 11 - Average Monthly Flight Time per airline at PDX airport in Oregon US.png", width=1500, height=400) # One-time run

# Display the chart
pdx_oregon_line_fig.show()

# 5. Summary
In this project, we worked with a dataset of two million records to derive meaningful insights for building a dashboard, using <a href="https://plotly.com/">Plotly</a> for __Interactive Data Visualization__ and <a href="https://dash.plotly.com/">Dash</a> for __Building Web Apps__.

The final dashboard is shown below.

<img src="Final Dashboard.gif" alt="The Final Dashboard" width="2000">

## Author
<a href="https://www.linkedin.com/in/ab0858s/">Abdelali BARIR</a> is a veteran of the Moroccan Royal Armed Forces and a self-taught Python programmer. He is currently enrolled in the B.Sc. Data Science program at __IU International University of Applied Sciences__.

## Change Log

| Date         | Version   | Changed By       | Change Description        |
|--------------|-----------|------------------|---------------------------|
| 2025-01-25   | 1.0       | Abdelali Barir   | Modified markdown         |
| 2025-01-28   | 1.01      | Abdelali Barir   | Updated the dashboard     |
