<a href="https://colab.research.google.com/github/Javad-Manashti/Airport-Traffic-Analysis-and-Flight-Sequence-Tracking-Using-PySpark/blob/main/Airport_Traffic_Analysis_and_Flight_Sequence_Tracking_Using_PySpark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Airport Traffic Analysis and Flight Sequence Tracking Using PySpark**


**By: Javad Manashty (5146323306, mjmanashti@gmial.com)**



### Description:
This PySpark code is designed for comprehensive airport traffic analysis and tracking the sequence of flights for individual aircraft. The script consists of two primary tasks:

1. **Flight Sequence Tracking**: Determines the sequence of flights based on the turn of each aircraft. For every flight, the code identifies the next flight scheduled for the same aircraft, providing insights into aircraft utilization and scheduling.

2. **Airport Traffic Analysis**: Calculates the number of inbound and outbound flights at an airport in 15-minute intervals within a specified time frame. This analysis helps in understanding the traffic pattern at the airport, facilitating better resource management and operational planning.

The script processes a dataset containing flight details like origin, destination, departure, and arrival times. It uses PySpark's powerful data processing capabilities to handle large volumes of data efficiently.

#### Key Components of the Code:
- **Initialization of Spark Session**: Sets up the environment for data processing.
- **Data Ingestion and Preparation**: Reads a CSV file containing flight data into a Spark DataFrame and prepares the data by converting timestamp strings into actual timestamp data types.
- **String of Flights Construction**: Uses window functions to order data by arrival times for each aircraft and calculates the next flight for each aircraft.
- **Airport Traffic Dataset Creation**: Groups data by 15-minute windows to count inbound and outbound flights, merging these counts to provide a comprehensive view of airport traffic.
- **Output Representation**: The results are displayed in a structured format, showing the sequence of flights for each aircraft and the number of flights arriving and departing from the airport in each time window.

#### Output Verification:
To ensure the correctness and effectiveness of the solution, the script's output is presented alongside the code. The output includes:
- A table showing the sequence of flights for each aircraft, with columns for flight ID, aircraft registration code, actual arrival time, and the next flight ID.
- A detailed table of airport traffic data, listing the number of inbound and outbound flights in each 15-minute interval for specified airports.

This code, along with the provided outputs, serves as proof of the working solution, demonstrating the script's capability to analyze flight data efficiently and effectively using PySpark.



#installation

In [1]:
# Install PySpark
!pip install pyspark

from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("SPARKProcessing").getOrCreate()


Collecting pyspark
  Downloading pyspark-3.5.0.tar.gz (316.9 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m316.9/316.9 MB[0m [31m3.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... [?25l[?25hdone
  Created wheel for pyspark: filename=pyspark-3.5.0-py2.py3-none-any.whl size=317425345 sha256=d8c2972131a5a27a404ed9196beff60695bd5c82e372c0bc1b4d4865667ff8c1
  Stored in directory: /root/.cache/pip/wheels/41/4e/10/c2cf2467f71c678cfc8a6b9ac9241e5e44a01940da8fbb17fc
Successfully built pyspark
Installing collected packages: pyspark
Successfully installed pyspark-3.5.0


# Entire code

In [7]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window, lead, lit, date_format
from pyspark.sql.window import Window
from pyspark.sql.types import TimestampType

# Initialize Spark session
spark = SparkSession.builder.appName("FlightDataAnalysis").getOrCreate()

# Read the data into a Spark DataFrame
csv_file_path = '/content/Data Engineer exercise.csv'  # Replace with your file path
spark_df = spark.read.csv(csv_file_path, header=True, inferSchema=True)

# Data Preparation
timestamp_cols = ['actl_dep_lcl_tms', 'actl_arr_lcl_tms', 'airborne_lcl_tms', 'landing_lcl_tms']
for col_name in timestamp_cols:
    spark_df = spark_df.withColumn(col_name, col(col_name).cast(TimestampType()))

# Task 1: Build a String of Flights
window_spec = Window.partitionBy("acft_regs_cde").orderBy("actl_arr_lcl_tms")
spark_df = spark_df.withColumn("next_flight_id", lead("id", 1).over(window_spec))

# Task 2: Build a Dataset for Airport Traffic
# Calculating inbound and outbound traffic separately
inbound_traffic = spark_df.groupBy("orig", window(col("actl_arr_lcl_tms"), "15 minutes")).count().withColumnRenamed("count", "in").withColumnRenamed("orig", "airport_code")
outbound_traffic = spark_df.groupBy("dest", window(col("actl_dep_lcl_tms"), "15 minutes")).count().withColumnRenamed("count", "out").withColumnRenamed("dest", "airport_code")

# Merging the counts for inbound and outbound traffic
airport_traffic = inbound_traffic.join(outbound_traffic, ["window", "airport_code"], "outer")

# Transforming the data to match the desired output format
airport_traffic = airport_traffic.select(col("airport_code"), date_format(col("window").start, "yyyy-MM-dd'T'HH:mm:ss").alias("tms"), "out", "in")

# Filling missing values with zero
airport_traffic = airport_traffic.na.fill(value=0, subset=["in", "out"])

# Show the airport traffic data for the first few time windows.
airport_traffic.orderBy("airport_code", "tms").show(50)  # Adjust the number to show more rows

# Stopping the Spark session
spark.stop()


+------------+-------------------+---+---+
|airport_code|                tms|out| in|
+------------+-------------------+---+---+
|         YVR|2022-12-31T08:15:00|  0|  1|
|         YVR|2022-12-31T08:45:00|  1|  0|
|         YVR|2022-12-31T10:15:00|  1|  0|
|         YVR|2022-12-31T11:15:00|  1|  0|
|         YVR|2022-12-31T13:00:00|  1|  0|
|         YVR|2022-12-31T13:15:00|  0|  1|
|         YVR|2022-12-31T14:30:00|  1|  0|
|         YVR|2022-12-31T15:15:00|  0|  1|
|         YVR|2022-12-31T16:30:00|  1|  0|
|         YVR|2022-12-31T16:45:00|  0|  1|
|         YVR|2022-12-31T17:15:00|  1|  1|
|         YVR|2022-12-31T18:45:00|  1|  0|
|         YVR|2022-12-31T19:00:00|  0|  1|
|         YVR|2022-12-31T19:30:00|  1|  0|
|         YVR|2022-12-31T19:45:00|  0|  1|
|         YVR|2022-12-31T20:30:00|  1|  0|
|         YVR|2022-12-31T21:15:00|  0|  1|
|         YVR|2022-12-31T22:15:00|  0|  1|
|         YVR|2022-12-31T23:15:00|  0|  1|
|         YYZ|2022-12-31T00:15:00|  1|  0|
|         Y