### AviationStack API Overview

AviationStack is a REST API that provides real-time and historical flight data from global aviation sources. It offers comprehensive flight information including schedules, status, delays, and airline/airport details.

### How You'll Use It in Your Project

#### Primary Data Source:

Fetch real-time flight status (scheduled/estimated/actual times)

Get historical delay patterns for specific flights/routes

Access airline and aircraft information

Retrieve airport-specific data

#### Key Use Cases:

Flight Identification: Look up flights by number/route to get unique flight IDs

Delay History: Analyze past performance of the same flight route

Real-time Status: Check current flight status for model features

Alternative Flights: Find other flights between same origin-destination
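
The lookups above can be sketched as query-parameter builders for the `flights` endpoint. This is a minimal sketch, not a definitive client: it assumes the `flight_iata`, `dep_iata`, `arr_iata`, and `limit` filters are available on your plan, and the helper names `build_flights_query` and `build_flights_url` are hypothetical. No request is sent; the code only assembles the URL.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aviationstack.com/v1/flights"


def build_flights_query(access_key, flight_iata=None, dep_iata=None,
                        arr_iata=None, limit=100):
    """Hypothetical helper: assemble query parameters, skipping unset filters."""
    params = {"access_key": access_key, "limit": limit}
    optional = {
        "flight_iata": flight_iata,  # look up one flight by its IATA code
        "dep_iata": dep_iata,        # origin airport filter
        "arr_iata": arr_iata,        # destination airport filter
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


def build_flights_url(params):
    """Hypothetical helper: render the full request URL for inspection."""
    return f"{BASE_URL}?{urlencode(params)}"


# Alternative flights: every flight between the same origin and destination
route_params = build_flights_query("YOUR_KEY", dep_iata="JFK", arr_iata="LAX")
print(build_flights_url(route_params))
```

Passing the resulting dictionary as `params=` to `requests.get` avoids hand-formatting the query string.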

### Benefits for Your Project

Comprehensive Data: Single API for most flight information needs

Real-time & Historical: Supports both current status and pattern analysis (historical lookups may depend on the subscription plan)

Reliable Source: Commercial-grade API with good uptime

Easy Integration: Simple REST endpoints with JSON responses

Free Tier Available: Sufficient for capstone project development

##### Libraries used for exploration: 

In [0]:
# AviationStack API Exploration and Analysis
import requests
import pandas as pd
import json
from pprint import pprint

In [0]:
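# Set up: read the API key from a Databricks secret scope
api_key = dbutils.secrets.get(scope="my-secrets", key="aviation_stack_api")
url = "https://api.aviationstack.com/v1/flights"

# Get flight data; passing the key via params keeps it out of the URL string,
# and the timeout stops the cell from hanging on a stalled connection
resp = requests.get(url, params={"access_key": api_key}, timeout=30)
data = resp.json()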

#### API DATA EXPLORATION

In [0]:
print(f"API Response Status: {resp.status_code}")
# Use .get() so an error payload without a "data" key does not raise KeyError
print(f"Total flights in response: {len(data.get('data') or [])}")
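
A failed request (for example, an invalid key or an exhausted quota) may return a payload with an `error` object instead of `data`. The sketch below assumes that error shape (`{"error": {"code": ..., "message": ...}}`); the helper name `extract_flights` and both sample payloads are hand-written for illustration, not captured API output.

```python
def extract_flights(payload):
    """Hypothetical helper: return flight records or raise on an error payload."""
    if "error" in payload:
        err = payload["error"] or {}
        raise RuntimeError(f"API error {err.get('code')}: {err.get('message')}")
    # A missing or null "data" key is treated as an empty result set
    return payload.get("data") or []


# Hand-written payloads that mimic the assumed response shapes
ok_payload = {"data": [{"flight_status": "active"}]}
err_payload = {"error": {"code": "invalid_access_key", "message": "Invalid key."}}

print(len(extract_flights(ok_payload)))
try:
    extract_flights(err_payload)
except RuntimeError as exc:
    print(exc)
```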

#### 1. RESPONSE STRUCTURE:

In [0]:
print(f"Top-level keys: {list(data.keys())}")
print(f"Pagination info: {data.get('pagination', {})}")
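
The pagination object can drive follow-up requests. As a rough sketch, assuming it carries `limit`, `offset`, `count`, and `total` fields, the remaining `offset` values can be computed as follows; `remaining_offsets` is a hypothetical helper and the sample dictionary is made up.

```python
def remaining_offsets(pagination):
    """Hypothetical helper: offsets still to fetch after the current page."""
    limit = pagination.get("limit", 0)
    total = pagination.get("total", 0)
    # The next page starts right after the records already received
    start = pagination.get("offset", 0) + pagination.get("count", 0)
    if limit <= 0:
        return []
    return list(range(start, total, limit))


# Made-up pagination block: first page of 100 out of 350 results
sample_pagination = {"limit": 100, "offset": 0, "count": 100, "total": 350}
print(remaining_offsets(sample_pagination))  # [100, 200, 300]
```

Each returned value would be sent as the `offset` query parameter of a later request. On a limited request quota, it may be preferable to fetch only the pages you need.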

#### 2. FIRST FLIGHT DATA STRUCTURE:

In [0]:

# Inspect the first flight record in the response
first_flight = data['data'][0]

for key in ['airline', 'departure', 'arrival', 'flight', 'aircraft', 'live']:
    if key in first_flight and first_flight[key] is not None:
        print(f"  {key}: {list(first_flight[key].keys())}")
    elif key in first_flight:
        print(f"  {key}: None (null value)")
    else:
        print(f"  {key}: Key not present")

#### 3. FLIGHTS DATA ANALYSIS:

In [0]:

flights_df = pd.DataFrame(data['data'])

# Basic info about the dataset
print(f"DataFrame shape: {flights_df.shape}")
print(f"Columns available: {list(flights_df.columns)}")

# Extract and display key information from nested objects
def extract_nested_data(records):
    """Extract important nested fields from raw flight records into flat columns"""
    extracted_data = []

    for flight in records:
        # Nested objects can be null in the response, so fall back to empty dicts
        flight_meta = flight.get('flight') or {}
        airline = flight.get('airline') or {}
        departure = flight.get('departure') or {}
        arrival = flight.get('arrival') or {}
        flight_info = {
            'flight_iata': flight_meta.get('iata'),
            'flight_number': flight_meta.get('number'),
            'airline_name': airline.get('name'),
            'departure_airport': departure.get('airport'),
            'departure_iata': departure.get('iata'),
            'departure_scheduled': departure.get('scheduled'),
            'arrival_airport': arrival.get('airport'),
            'arrival_iata': arrival.get('iata'),
            'arrival_scheduled': arrival.get('scheduled'),
            'flight_status': flight.get('flight_status'),
            # A null delay means none was reported; treat it as 0 minutes
            'delay_departure': departure.get('delay') or 0,
            'delay_arrival': arrival.get('delay') or 0,
            'terminal': departure.get('terminal') or 'N/A',
            'gate': departure.get('gate') or 'N/A'
        }
        extracted_data.append(flight_info)

    return pd.DataFrame(extracted_data)

# Create analysis DataFrame from the raw records
analysis_df = extract_nested_data(data['data'])
print(f"\nExtracted data shape: {analysis_df.shape}")
print(f"\nFirst 5 flights:")
display(analysis_df.head())

#### 4. KEY INSIGHTS FOR THE PROJECT:

In [0]:
# Flight status distribution
print("\nFlight Status Distribution:")
status_counts = analysis_df['flight_status'].value_counts()
print(status_counts)

# Delay analysis
delayed_flights = analysis_df[analysis_df['delay_departure'] > 0]
print(f"\nDelayed flights: {len(delayed_flights)}/{len(analysis_df)} ({len(delayed_flights)/len(analysis_df)*100:.1f}%)")

if len(delayed_flights) > 0:
    print(f"Average departure delay: {delayed_flights['delay_departure'].mean():.1f} minutes")
    print(f"Max departure delay: {delayed_flights['delay_departure'].max()} minutes")

# Airlines in the dataset
print(f"\nAirlines represented: {analysis_df['airline_name'].nunique()}")
print("Top 5 airlines:")
print(analysis_df['airline_name'].value_counts().head())

# Airports analysis
print(f"\nDeparture airports: {analysis_df['departure_airport'].nunique()}")
print(f"Arrival airports: {analysis_df['arrival_airport'].nunique()}")

#### 5. HOW THIS DATA WILL BE USED IN YOUR PROJECT:


Flight Identification: Use flight numbers and codes to track specific flights

Delay Data: Current delay information for real-time predictions

Status Tracking: Active, landed, scheduled status for context

Route Information: Origin/destination pairs for alternative flight suggestions

Historical Patterns: Build delay patterns for specific routes/airlines

Real-time Features: Current flight status as input to ML model
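
As an illustration of the historical-pattern idea, the sketch below aggregates departure delays per route. It assumes a flattened DataFrame with `departure_iata`, `arrival_iata`, and `delay_departure` columns; the sample rows, the 15-minute threshold, and the `route_delay_summary` helper are all made up for the example.

```python
import pandas as pd


def route_delay_summary(df, threshold=15):
    """Hypothetical helper: per-route flight count, mean delay, delayed share."""
    work = df.copy()
    # Missing delays are treated as 0 minutes (an assumption of this sketch)
    work["delay_departure"] = pd.to_numeric(
        work["delay_departure"], errors="coerce"
    ).fillna(0)
    work["is_delayed"] = work["delay_departure"] >= threshold
    return (
        work.groupby(["departure_iata", "arrival_iata"])
        .agg(
            flights=("delay_departure", "size"),
            mean_delay=("delay_departure", "mean"),
            delayed_share=("is_delayed", "mean"),
        )
        .reset_index()
    )


# Made-up rows standing in for accumulated API responses
sample_df = pd.DataFrame({
    "departure_iata": ["JFK", "JFK", "SFO", "SFO", "JFK"],
    "arrival_iata": ["LAX", "LAX", "SEA", "SEA", "LAX"],
    "delay_departure": [0, 30, None, 45, 60],
})
summary = route_delay_summary(sample_df)
print(summary)
```

A single API response is only a snapshot, so a summary like this becomes meaningful once responses are collected over many days.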


#### 6. SAMPLE FEATURES FOR ML MODEL:
'flight_number'

'airline_name'

'departure_iata'

'arrival_iata'

'flight_status'

'delay_departure'

'delay_arrival'
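
One possible way to turn the features listed above into a model-ready table is sketched here. It assumes the flattened column names used in this notebook; the sample rows and the `build_feature_matrix` helper are hypothetical, and one-hot encoding is just one of several reasonable encodings.

```python
import pandas as pd

FEATURE_COLUMNS = [
    "flight_number", "airline_name", "departure_iata", "arrival_iata",
    "flight_status", "delay_departure", "delay_arrival",
]
CATEGORICAL_COLUMNS = [
    "airline_name", "departure_iata", "arrival_iata", "flight_status",
]


def build_feature_matrix(df):
    """Hypothetical helper: select features, fill delays, one-hot encode."""
    features = df[FEATURE_COLUMNS].copy()
    for col in ("delay_departure", "delay_arrival"):
        # Missing delays are treated as 0 minutes (an assumption of this sketch)
        features[col] = pd.to_numeric(features[col], errors="coerce").fillna(0)
    # flight_number is left untouched; it likely needs separate handling
    return pd.get_dummies(features, columns=CATEGORICAL_COLUMNS)


# Made-up rows for illustration only
sample_flights = pd.DataFrame({
    "flight_number": ["100", "200"],
    "airline_name": ["Airline A", "Airline B"],
    "departure_iata": ["JFK", "SFO"],
    "arrival_iata": ["LAX", "SEA"],
    "flight_status": ["active", "scheduled"],
    "delay_departure": [None, 25],
    "delay_arrival": [None, 10],
})
X = build_feature_matrix(sample_flights)
print(X.columns.tolist())
```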