# Flight Delays EDA - Julia Notebook

This notebook is an interactive guide for exploring the flight delay dataset. We will walk through setting up the environment, loading the data, cleaning it, and creating visualizations to answer key questions about flight delays.

### 1. Setup Environment

First, we need to set up our Julia environment. The `Project.toml` file in the main project directory lists all the necessary packages. The following code cell will:
1. **Activate** the project environment.
2. **Add** the packages used for interactivity (`Interact`, `WebIO`, `Widgets`, and `Blink`).
3. **Instantiate** the environment, which downloads and installs all the required packages.
4. **Install** the WebIO Jupyter notebook extension, logging a warning instead of failing if that step throws an error.

You only need to run this cell once.

In [1]:
# Import the package manager
import Pkg

# Activate the project environment from the parent directory
Pkg.activate(joinpath(@__DIR__, ".."))

# Ensure interactive deps are available by default
Pkg.add(["Interact", "WebIO", "Widgets", "Blink"])
Pkg.instantiate()
try
    using WebIO
    WebIO.install_jupyter_nbextension()
catch e
    @warn "WebIO nbextension install skipped" exception=e
end


[32m[1m  Activating[22m[39m project at `~/Downloads/EDA/Flight_EDA_Analysis`
[32m[1m   Resolving[22m[39m package versions...
[32m[1m      Compat[22m[39m entries added for 
[36m[1m     Project[22m[39m No packages added to or removed from `~/Downloads/EDA/Flight_EDA_Analysis/Project.toml`
[36m[1m    Manifest[22m[39m No packages added to or removed from `~/Downloads/EDA/Flight_EDA_Analysis/Manifest.toml`
[33m[1m│ [22m[39m  exception =
[33m[1m│ [22m[39m   LoadError: InterruptException:
[33m[1m│ [22m[39m   Stacktrace:
[33m[1m│ [22m[39m     [1] [0m[1mdisplay_mimestring[22m[0m[1m([22m[90mmime_array[39m::[0mVector[90m{MIME}[39m, [90mx[39m::[0mAny[0m[1m)[22m
[33m[1m│ [22m[39m   [90m    @[39m [35mIJulia[39m [90m~/.julia/packages/IJulia/TXScA/src/[39m[90m[4mdisplay.jl:74[24m[39m
[33m[1m│ [22m[39m     [2] [0m[1m_display_dict[22m[0m[1m([22m[90mx[39m::[0mAny[0m[1m)[22m
[33m[1m│ [22m[39m   [90m    @[39m [35mIJul

### 2. Load Project Code and Dependencies

Now we load the libraries we'll use for data manipulation and plotting, including our new interactivity packages. We also load our own `FlightEDA` module, which contains all the custom functions we've written for this project.

In [None]:
# Load standard libraries for data handling
using CSV, DataFrames, Statistics, Dates, Random

# Load the plotting library
using Plots

# Load libraries for interactive widgets
using Interact, WebIO

# Load our custom module from the 'src' directory
const SRC_DIR = abspath(joinpath(@__DIR__, "..", "src"))
include(joinpath(SRC_DIR, "FlightEDA.jl"))
using .FlightEDA

### 3. Load the Data

We are now ready to load the flight data. For quick analysis, we'll use a small sample file.

In [None]:
# Load the project configuration to get file paths
cfg = FlightEDA.load_config()

# Base directory for relative paths (works even if the notebook is run from notebooks/)
BASE_DIR = normpath(joinpath(@__DIR__, ".."))

# Set to `false` to load the full cleaned dataset instead of the sample
USE_SAMPLE = true

# Resolve the file path relative to the project root
file_to_load = joinpath(BASE_DIR, USE_SAMPLE ? cfg.data.sample_file : cfg.data.cleaned_file)

# Check if the file exists before trying to read it
if !isfile(file_to_load)
    error("Data file not found: $(file_to_load). You may need to run the cleaning script first or check your config.")
end

# Read the CSV file into a DataFrame and time the operation
println("Loading data from: $file_to_load")
@time df = CSV.read(file_to_load, DataFrame)

println("\nLoaded $(nrow(df)) rows and $(ncol(df)) columns.")
first(df, 5)
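
The path resolution in the cell above can be tried in isolation. This is a minimal sketch that uses a `NamedTuple` as a hypothetical stand-in for whatever `FlightEDA.load_config()` returns; the file names are made up for illustration, and `resolve_data_path` is not a function from the project.

```julia
# Hypothetical stand-in for the config object (field names mirror the notebook)
cfg = (data = (sample_file = "data/sample.csv", cleaned_file = "data/cleaned.csv"),)

# Hypothetical helper: pick the sample or the cleaned file and anchor it at base_dir
resolve_data_path(base_dir, cfg, use_sample) =
    joinpath(base_dir, use_sample ? cfg.data.sample_file : cfg.data.cleaned_file)

base_dir = normpath(joinpath("project", "notebooks", ".."))

println(resolve_data_path(base_dir, cfg, true))    # sample file under the project root
println(resolve_data_path(base_dir, cfg, false))   # cleaned file under the project root
```

Note that `joinpath` discards `base_dir` when the configured path is already absolute, so absolute paths in the config would pass through unchanged.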

### 4. Clean Data and Engineer Features

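The loaded data needs some preparation before it's ready for analysis. We'll use the `enrich_features!` function from our `FlightEDA` module to add new columns. By Julia convention, the trailing `!` indicates that the function modifies its argument, so `df` is updated in place.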

In [None]:
# Use the function from our module to add new features to the DataFrame.
enrich_features!(df)

# Display the first few rows to see the new columns
println("DataFrame after adding features:")
first(df, 5)
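
The notebook does not show what `enrich_features!` computes, but later cells rely on an `hour_of_day` column. As an illustration only, here is a sketch of how such a feature could be derived, assuming departure times are stored as `hhmm` integers (an assumption about the dataset). `hour_from_hhmm` is a hypothetical helper, not a function from `FlightEDA`.

```julia
# Hypothetical helper: convert an hhmm integer (e.g. 1435) to an hour in 0:23.
# Missing inputs stay missing.
function hour_from_hhmm(t::Union{Missing,Integer})
    ismissing(t) && return missing
    return div(t, 100) % 24   # a value of 2400 wraps around to hour 0
end

# Toy departure times, made up for illustration
dep_times = [5, 930, 1435, 2400, missing]

# Broadcast the helper over the vector, as one might over a DataFrame column
hours = hour_from_hhmm.(dep_times)
println(hours)
```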

### 5. Quick Data Quality Check

Let's perform a quick check on the data to see summary statistics and the number of missing values in each column.

In [None]:
# Summarize each column: missing count, minimum, median, maximum, and mean
describe(df, :nmissing, :min, :median, :max, :mean)
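
To make the summary columns concrete, here is a minimal sketch that computes the same statistics by hand using only Base and `Statistics`. The data values are made up, and `column_summary` is a hypothetical helper for illustration.

```julia
using Statistics

# Hypothetical helper: the per-column statistics requested from `describe`.
# Assumes the column has at least one non-missing value.
function column_summary(col)
    present = collect(skipmissing(col))   # drop missing values before aggregating
    return (
        nmissing = count(ismissing, col),
        min = minimum(present),
        median = median(present),
        max = maximum(present),
        mean = mean(present),
    )
end

# Toy arrival delays in minutes, with two missing entries
arr_delay = [12.0, missing, -3.0, 45.0, missing]
println(column_summary(arr_delay))
```

A high `nmissing` count for a column such as the arrival delay is worth noting before plotting, since the aggregations skip those rows.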

### 6. Static Visualizations

First, we'll create some static plots to get a general overview of the data.

In [None]:
# Ensure the directory for saving plots exists
plot_dir = cfg.plots.dir
mkpath(plot_dir)

println("Plots will be saved to: $plot_dir")

In [None]:
FlightEDA.plot_arrival_delay_histogram(df, cfg.plots.delay_lower, cfg.plots.delay_upper, plot_dir)
current()
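
The notebook does not show how `plot_arrival_delay_histogram` uses `delay_lower` and `delay_upper`. Assuming they bound the range of delays shown in the histogram, the underlying binning could look roughly like this sketch. `delay_histogram_counts` and the `bin_width` keyword are hypothetical, and the data is made up.

```julia
# Hypothetical helper: count delays per fixed-width bin inside [lower, upper).
# Values outside the range and missing values are ignored.
function delay_histogram_counts(delays, lower, upper; bin_width = 15)
    edges = lower:bin_width:upper
    counts = zeros(Int, length(edges) - 1)
    for d in skipmissing(delays)
        lower <= d < last(edges) || continue          # clip to the plotted range
        counts[Int(fld(d - lower, bin_width)) + 1] += 1
    end
    return edges, counts
end

delays = [-70.0, -60.0, 0.0, 14.9, 15.0, missing, 200.0]
edges, counts = delay_histogram_counts(delays, -60, 180)
println("bins: ", length(counts), ", values counted: ", sum(counts))
```

Clipping the range this way keeps a few extreme delays from stretching the x-axis and flattening the rest of the distribution.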

In [None]:
FlightEDA.plot_average_delay_by_hour(df, plot_dir)
current()

### 7. Interactive Analysis

Now let's add some interactivity. The widget below lets you select a specific airline and see its average arrival delay by hour of day. Unlike a single static plot, it lets you switch between airlines quickly and compare their delay patterns.

In [None]:
# Get a list of unique airline carriers to populate the dropdown
airlines = unique(skipmissing(df.op_unique_carrier))

# @manipulate creates an interactive widget.
# Each time a different airline is selected in the dropdown, the code block re-runs.
@manipulate for airline_picker in dropdown(airlines)
    # Filter the DataFrame for the selected airline.
    # isequal (rather than ==) returns false for missing carriers instead of
    # `missing`, which would make `filter` throw an error.
    df_filtered = filter(:op_unique_carrier => isequal(airline_picker), df)
    
    # Group by hour and calculate the mean arrival delay
    df_grouped = combine(groupby(df_filtered, :hour_of_day), :arr_delay => (mean ∘ skipmissing) => :mean_delay)
    sort!(df_grouped, :hour_of_day)
    
    # Create the plot
    plot(df_grouped.hour_of_day, df_grouped.mean_delay, 
        marker=:circle, 
        xlabel="Hour of Day", 
        ylabel="Average Delay (min)",
        title="Average Delay by Hour for $(airline_picker)",
        legend=false,
        grid=true
    )
end
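
The aggregation inside the widget can also be checked without a browser. The following is a minimal sketch of the same idea on plain vectors, using only Base; `mean_delay_by_hour` is a hypothetical helper and the data is made up. Unlike the DataFrames version, it also skips rows whose hour is missing.

```julia
# Hypothetical helper: mean delay per hour for one airline, as sorted hour => mean pairs
function mean_delay_by_hour(carriers, hours, delays, airline)
    sums = Dict{Int,Float64}()
    counts = Dict{Int,Int}()
    for (c, h, d) in zip(carriers, hours, delays)
        # isequal yields false (not missing) when the carrier is missing
        isequal(c, airline) || continue
        # skip rows where the hour or the delay is missing
        (ismissing(h) || ismissing(d)) && continue
        sums[h] = get(sums, h, 0.0) + d
        counts[h] = get(counts, h, 0) + 1
    end
    return sort([h => sums[h] / counts[h] for h in keys(sums)]; by = first)
end

carriers = ["AA", "AA", "AA", "DL", missing, "AA"]
hours    = [7, 7, 7, 7, 8, 18]
delays   = [10.0, 20.0, missing, 5.0, 3.0, 30.0]

for (hour, avg) in mean_delay_by_hour(carriers, hours, delays, "AA")
    println("hour ", hour, ": mean delay ", avg, " min")
end
```

Keeping the computation in a plain function like this makes it easy to test the numbers separately from the plotting and widget code.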

### 8. Summary & Next Steps

This notebook demonstrates a basic workflow for our flight delay analysis, including static and interactive plots. From here, you can:

- **Use the full dataset** by setting `USE_SAMPLE = false` for a more comprehensive analysis.
- **Generate all plots** by running `FlightEDA.generate_plots(cfg, df)`.
- **Add more interactive widgets** to explore other features like origin airport or day of the week.