# Load ADS-B Flight Data

This notebook demonstrates how to fetch flight data from the contrails.org [ADS-B API](https://apidocs.contrails.org/notebooks/adsb_api.html), impute missing flight IDs, and structure the data into `pycontrails.Flight` objects.

## Imports

In [None]:
%load_ext autoreload
%autoreload 2

import asyncio
import io
import time
from datetime import date, datetime, timedelta

import aiohttp
import pandas as pd
import plotly.graph_objects as go

from pycontrails import Flight
from pycontrails.core import flight
from src import adsb
from src import flight_visualization

## Configuration

Set the date for data retrieval and your contrails API key.

Contact api@contrails.org if you need an API key.

In [None]:
# Replace with the desired date range. End date is inclusive.
START_DATE = date(2025, 1, 15)
END_DATE = date(2025, 1, 16)

# Replace with your contrails.org API key
CONTRAILS_API_KEY = "Insert key here"  # @param {type:"string"}

## Fetch global ADS-B asynchronously

See [API documentation](https://apidocs.contrails.org/notebooks/adsb_api.html).

Fetch data in hourly chunks for each target date concurrently using `asyncio` and `aiohttp`.

In [None]:
# Asynchronously fetch waypoint data for the given date range, one day at a time
all_raw_dfs = []
date_range = pd.date_range(start=START_DATE, end=END_DATE, freq="D")

print(f"Fetching data from {START_DATE} to {END_DATE}")
start_time = time.time()

for target_date in date_range:
    print(f"Fetching data for {target_date.date()}")
    try:
        daily_df = await adsb.fetch_all_day_data(target_date.date(), CONTRAILS_API_KEY)
        if not daily_df.empty:
            all_raw_dfs.append(daily_df)
    except ValueError as e:
        print(f"Error for {target_date.date()}: {e}")

if all_raw_dfs:
    raw_df = pd.concat(all_raw_dfs, ignore_index=True)
    total_time_taken = time.time() - start_time
    print(f"Flight data ingestion completed in {total_time_taken:.2f}s")
    print(f"Fetched {len(raw_df)} waypoints in total.")
    print(raw_df.head())
else:
    raw_df = pd.DataFrame()  # Initialize empty DataFrame
    print("No data fetched for the specified date range.")
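
The hourly fan-out happens inside `adsb.fetch_all_day_data`, whose body is not shown in this notebook. As a rough sketch of the pattern, assuming one request per hour and a semaphore to cap concurrency, the block below uses a stand-in coroutine (`fetch_hour`, hypothetical) in place of a real `aiohttp` request, so it runs without network access or an API key. The names `fetch_day` and `max_concurrency` are also illustrative and not part of `src.adsb`.

```python
import asyncio
from datetime import date, datetime, timedelta


async def fetch_hour(hour_start: datetime) -> list[dict]:
    """Hypothetical stand-in for one hourly API request (no network I/O)."""
    await asyncio.sleep(0.01)  # simulate request latency
    return [{"hour": hour_start.isoformat()}]


async def fetch_day(target_date: date, max_concurrency: int = 8) -> list[dict]:
    """Fetch the 24 hourly chunks of one day concurrently."""
    semaphore = asyncio.Semaphore(max_concurrency)
    day_start = datetime.combine(target_date, datetime.min.time())

    async def bounded_fetch(hour_start: datetime) -> list[dict]:
        # The semaphore keeps at most max_concurrency requests in flight
        async with semaphore:
            return await fetch_hour(hour_start)

    hours = [day_start + timedelta(hours=h) for h in range(24)]
    # gather() returns results in the order the coroutines were passed in
    chunks = await asyncio.gather(*(bounded_fetch(h) for h in hours))
    return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    rows = asyncio.run(fetch_day(date(2025, 1, 15)))
    print(f"Fetched {len(rows)} hourly chunks")
```

Inside a notebook cell an event loop is already running, so `await fetch_day(...)` would be used directly instead of `asyncio.run`.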

## Data Cleaning and Preparation

Ensure correct data types and sort the data.

In [None]:
if not raw_df.empty:
    cleaned_df = adsb.clean_adsb_df(raw_df)
    print(f"Cleaned DataFrame has {len(cleaned_df)} waypoints.")
    display(cleaned_df.head())
else:
    cleaned_df = pd.DataFrame()
    print("Skipping cleaning, no data loaded.")
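
`adsb.clean_adsb_df` lives in this repository's `src` package and its body is not shown here. As a hedged illustration of what "correct data types and sort" could involve, the sketch below assumes columns named `time`, `latitude`, `longitude`, and `altitude` alongside `icao` and `flight_id`. Only the last two names appear in this notebook; the others, and the function name `clean_waypoints`, are assumptions.

```python
import pandas as pd


def clean_waypoints(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce dtypes, drop unusable rows, and sort by aircraft and time."""
    out = df.copy()
    # Unparseable values become NaT/NaN instead of raising
    out["time"] = pd.to_datetime(out["time"], errors="coerce", utc=True)
    for col in ("latitude", "longitude", "altitude"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # A waypoint without a time or position cannot be placed on a trajectory
    out = out.dropna(subset=["time", "latitude", "longitude"])
    out = out.drop_duplicates(subset=["icao", "time"])
    return out.sort_values(["icao", "time"]).reset_index(drop=True)


example = pd.DataFrame(
    {
        "icao": ["ABC123", "ABC123", "ABC123", "DEF456"],
        "flight_id": ["F1", "F1", None, "F2"],
        "time": [
            "2025-01-15T00:10:00Z",
            "2025-01-15T00:05:00Z",
            "not a time",
            "2025-01-15T00:00:00Z",
        ],
        "latitude": ["51.5", "51.4", "51.3", "40.7"],
        "longitude": ["-0.1", "-0.2", "-0.3", "-74.0"],
        "altitude": ["10000", "9800", "9600", "bad"],
    }
)
cleaned_example = clean_waypoints(example)
```

The row with an unparseable time is dropped, while the row with an unparseable altitude is kept with a missing value, since altitude is not in the `dropna` subset.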

## Create Flights from grouped waypoints

Create a DataFrame of flights grouped by `flight_id`. Impute missing `flight_id` values based on temporal proximity for the same ICAO address.

**Methodology:**

1. Group waypoints by `icao`.
2. Identify segments where `flight_id` is missing.
3. Group consecutive missing `flight_id` waypoints if the time gap is less than `MAX_GAP_SECONDS`.
4. For each group of missing IDs, look for a known `flight_id` within `LOOKUP_WINDOW_SECONDS` before the start or after the end of the group.
5. If multiple known IDs are found, use the chronologically closest one.
6. If no known ID is found, generate a new unique `flight_id` for that segment.
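
The steps above can be sketched in pandas roughly as follows. This is not the implementation in `adsb.impute_flight_ids`, which is not shown here. The column name `time`, the two constant values, the function names, and the placeholder format for generated IDs are all assumptions made for illustration.

```python
import pandas as pd

MAX_GAP_SECONDS = 600  # assumed value
LOOKUP_WINDOW_SECONDS = 1800  # assumed value


def impute_for_aircraft(group: pd.DataFrame) -> pd.DataFrame:
    """Fill missing flight_id values for the waypoints of one ICAO address."""
    group = group.sort_values("time").copy()
    missing = group["flight_id"].isna()
    if not missing.any():
        return group

    # Steps 2-3: a run starts at a missing row whose predecessor is known
    # or lies MAX_GAP_SECONDS or more in the past.
    gap = group["time"].diff().dt.total_seconds()
    prev_missing = missing.shift(fill_value=False).astype(bool)
    starts_run = missing & (~prev_missing | (gap >= MAX_GAP_SECONDS))
    run_id = starts_run.cumsum().where(missing)

    known = group.loc[~missing, ["time", "flight_id"]]
    window = pd.Timedelta(seconds=LOOKUP_WINDOW_SECONDS)
    icao = group["icao"].iloc[0]

    for _, rows in group[missing].groupby(run_id[missing]):
        start, end = rows["time"].iloc[0], rows["time"].iloc[-1]
        # Step 4: known IDs shortly before the run starts or after it ends
        before = known[(known["time"] < start) & (known["time"] >= start - window)]
        after = known[(known["time"] > end) & (known["time"] <= end + window)]
        candidates = []
        if not before.empty:
            candidates.append((start - before["time"].iloc[-1], before["flight_id"].iloc[-1]))
        if not after.empty:
            candidates.append((after["time"].iloc[0] - end, after["flight_id"].iloc[0]))
        if candidates:
            # Step 5: the chronologically closest known ID wins
            new_id = min(candidates, key=lambda c: c[0])[1]
        else:
            # Step 6: placeholder ID, not the SPIRE-INFERRED format
            new_id = f"INFERRED-{icao}-{int(start.timestamp())}-{int(end.timestamp())}"
        group.loc[rows.index, "flight_id"] = new_id
    return group


def impute_flight_ids_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: group by icao, impute per aircraft, restore the row order."""
    df = df.reset_index(drop=True)
    parts = [impute_for_aircraft(g) for _, g in df.groupby("icao", sort=False)]
    return pd.concat(parts).sort_index()


base = pd.Timestamp("2025-01-15 00:00:00", tz="UTC")
demo = pd.DataFrame(
    {
        "icao": ["ABC123"] * 6,
        "time": base + pd.to_timedelta([0, 60, 120, 4000, 8000, 8100], unit="s"),
        "flight_id": ["F1", None, None, None, None, "F2"],
    }
)
imputed_demo = impute_flight_ids_sketch(demo)
```

In the demo, the two waypoints right after `F1` inherit `F1`, the isolated waypoint at 4000 s has no known neighbour within the lookup window and receives a generated ID, and the waypoint at 8000 s inherits `F2` from the waypoint 100 s later.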

**On flight ID generation:**

Flight IDs are generated based on the flight's start and end timestamps and its ICAO address. All IDs are prefixed with `SPIRE-INFERRED-{icao_address}-`. The rest of the ID depends on the time of day:

1. Midnight Rollover/Holdover: Special formatting is applied if the flight period crosses midnight within a certain threshold (`midnight_threshold_mins`).
   * If the flight ends just after midnight (a "holdover"), the ID includes the date of the day before the start and the start date, formatted as: `{start_date - 1 day}-rollover-{start_date}`.
   * If the flight starts just before midnight (a "rollover"), the ID includes the start date and the day after the end date, formatted as: `{start_date}-rollover-{end_date + 1 day}`.
2. Standard: If the flight period doesn't cross the midnight threshold, the ID is generated using the Unix timestamp (in seconds) of the start and end times: `{int(start_timestamp)}-{int(end_timestamp)}`.

Examples:

* Holdover: `SPIRE-INFERRED-ABC123-2026-02-03-rollover-2026-02-04`
* Rollover: `SPIRE-INFERRED-ABC123-2026-02-04-rollover-2026-02-05`
* Standard: `SPIRE-INFERRED-ABC123-1760035200-1760042400`
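
One possible reading of these rules is sketched below. The function name, the 30-minute default for `midnight_threshold_mins`, and the precedence of the holdover check over the rollover check are assumptions. The real logic in `src.adsb` may differ.

```python
from datetime import datetime, timedelta, timezone


def make_inferred_flight_id(
    icao_address: str,
    start: datetime,
    end: datetime,
    midnight_threshold_mins: int = 30,  # assumed default
) -> str:
    """Build an inferred flight ID from a segment's start and end times."""
    prefix = f"SPIRE-INFERRED-{icao_address}-"
    threshold_s = midnight_threshold_mins * 60

    def seconds_since_midnight(ts: datetime) -> int:
        return ts.hour * 3600 + ts.minute * 60 + ts.second

    # Holdover: the segment ends within the threshold after midnight
    if seconds_since_midnight(end) <= threshold_s:
        return f"{prefix}{start.date() - timedelta(days=1)}-rollover-{start.date()}"
    # Rollover: the segment starts within the threshold before midnight
    if 86400 - seconds_since_midnight(start) <= threshold_s:
        return f"{prefix}{start.date()}-rollover-{end.date() + timedelta(days=1)}"
    # Standard: Unix timestamps in whole seconds
    return f"{prefix}{int(start.timestamp())}-{int(end.timestamp())}"


utc = timezone.utc
holdover_id = make_inferred_flight_id(
    "ABC123",
    datetime(2026, 2, 4, 0, 2, tzinfo=utc),
    datetime(2026, 2, 4, 0, 10, tzinfo=utc),
)
rollover_id = make_inferred_flight_id(
    "ABC123",
    datetime(2026, 2, 4, 23, 50, tzinfo=utc),
    datetime(2026, 2, 4, 23, 58, tzinfo=utc),
)
standard_id = make_inferred_flight_id(
    "ABC123",
    datetime.fromtimestamp(1760035200, tz=utc),
    datetime.fromtimestamp(1760042400, tz=utc),
)
```

Under these assumptions the three calls reproduce the holdover, rollover, and standard examples listed above.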

In [None]:
# Run imputation and create Flight objects
if not cleaned_df.empty:
    imputed_df = adsb.impute_flight_ids(cleaned_df)

    # Group into pycontrails Flight objects
    # Limit the demonstration to the first 100 flight IDs to save memory
    unique_ids = imputed_df["flight_id"].unique()[:100]
    flights_data = []
    for fid in unique_ids:
        f_df = imputed_df[imputed_df["flight_id"] == fid]
        if len(f_df) > 200:  # Only keep flights with more than 200 waypoints
            try:
                flights_data.append(Flight(f_df, flight_id=fid))
            except Exception as e:
                print(f"Error creating Flight object for {fid}: {e}")

    print(f"Created {len(flights_data)} Flight objects.")
else:
    flights_data = []

## Example: Accessing Data for a Flight

In [None]:
# Example: Plot the first flight in the list
if flights_data:
    flight_visualization.plot_flight_on_globe(flights_data[0])
else:
    print("No flights available to plot.")