## Trajectory optimization - COP - YOLO - Doei

In [1]:
print('Hello world')

Hello world


As a Flight Operations Engineering researcher, your task is to develop different trajectory prediction algorithms. The goal is to predict the position of the aircraft over the next 10 minutes from any given point. To that end, different models should be evaluated in order to propose to Eurocontrol which one should be explored further.
The following requirements apply to the problem:
1. EDA + plots.
2. Data cleaning and variable conversion are expected.
3. Regression algorithm + another (explain which and why).
5. You should predict the trajectory in the next 10 minutes from a selected point.
   a. 4D output: latitude, longitude, altitude, and time.
   b. Show the degradation (or improvement) of the solution.
6. Your choice of the parameters used in your prediction algorithm should be validated using statistical tools or techniques such as feature engineering, or any other you think is valid. An explanation is expected.
7. You must justify the quality of your model using tools such as residuals, F-statistics, or any other relevant tool.
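
Requirement 5b asks how the prediction degrades with look-ahead time. One possible way to quantify this, sketched here under assumptions, is the great-circle (haversine) distance between predicted and observed positions, averaged per look-ahead horizon. The function names `haversine_km` and `error_by_horizon` and the columns `horizon_min`, `lat_true`, `lon_true`, `lat_pred` and `lon_pred` are hypothetical and not part of the dataset.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def error_by_horizon(pred):
    """Mean horizontal error (km) per look-ahead horizon.

    `pred` is assumed to hold one row per prediction with the hypothetical
    columns horizon_min, lat_true, lon_true, lat_pred and lon_pred.
    """
    err = haversine_km(pred['lat_true'], pred['lon_true'],
                       pred['lat_pred'], pred['lon_pred'])
    return err.groupby(pred['horizon_min']).mean()
```

Plotting the returned series against the horizon gives a direct picture of the degradation; altitude error could be handled the same way with a plain absolute difference.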


The optimization is divided as follows:
* Data initialization;
* Data-type conversion;
* Data cleaning & variable conversion;
* Data splitting:
    - Climb;
    - Enroute;
    - Descent;
* Data plotting & visualisation;
* Regression models
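
The split into climb, enroute and descent could, as a first approximation, be derived from the sign of the vertical rate. The sketch below assumes the `vertical_rate` column used elsewhere in this notebook; the function name `label_flight_phase` and the threshold `rate_threshold` are hypothetical and would need tuning to the unit of the data.

```python
import numpy as np
import pandas as pd


def label_flight_phase(df, rate_threshold=1.0):
    """Label each row as climb, enroute or descent from its vertical rate.

    rate_threshold is an assumed cut-off in the same unit as 'vertical_rate'.
    """
    phase = np.select(
        [df['vertical_rate'] > rate_threshold,
         df['vertical_rate'] < -rate_threshold],
        ['climb', 'descent'],
        default='enroute',
    )
    # Return a new DataFrame with the extra 'phase' column
    return df.assign(phase=phase)
```

A per-row threshold is sensitive to noise in the vertical rate, so smoothing the signal per flight before labelling may give cleaner phases.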

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
df = pd.read_csv('combined_data.csv', nrows = 5000000)
df.head()

In [None]:
df.drop(columns = 'Unnamed: 0', inplace = True)

In [None]:
df.info()

## Data cleaning

The data needs to be converted to the following:
* Timestamp - to datetime
* Callsign  - to string

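The following data needs to be dropped:
* Rows where longitude, latitude, altitude or barometric altitude is NaN (dropna):
    - These values are missing while the aircraft is on the ground and not operating
* Rows where the aircraft is still on the ground:
    - Where the vertical rate == 0.0 AND the ground speed is below Vlof (lift-off speed) OR Vmin (minimum speed)
    - Altitude cannot be used as the criterion, as EHAM (Schiphol) lies below sea level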
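
The on-ground rule described above (zero vertical rate combined with a low ground speed) can be written as a single boolean mask. This is a minimal sketch, assuming the `vertical_rate` and `ground_speed` columns of this dataset; the function name `drop_ground_rows` and the parameter `v_threshold` are hypothetical.

```python
import pandas as pd


def drop_ground_rows(df, v_threshold):
    """Remove rows that look like ground movement.

    v_threshold is an assumed speed limit (for example an estimate of
    Vlof or Vmin) in the same unit as 'ground_speed'.
    """
    on_ground = (df['vertical_rate'] == 0.0) & (df['ground_speed'] < v_threshold)
    return df.loc[~on_ground].reset_index(drop=True)
```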

In [None]:
# Approximate VLOF (lift-off speed) and VMIN (minimum speed) from low-altitude samples
filtered_df_lof = df[(df['baro_altitude'] <= 250) & (df['vertical_rate'] > 1)]   # climbing near the ground
filtered_df_min = df[(df['baro_altitude'] <= 250) & (df['vertical_rate'] < -1)]  # descending near the ground
VLOF = filtered_df_lof['ground_speed'].mean()
VMIN = filtered_df_min['ground_speed'].min()

In [None]:
# Set ground speeds below VLOF / VMIN to NaN so that those rows are removed by dropna() later.
# Both thresholds are global, so a per-aircraft loop is not needed.
df.loc[df['ground_speed'] < VLOF, 'ground_speed'] = np.nan
df.loc[df['ground_speed'] < VMIN, 'ground_speed'] = np.nan

In [None]:
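def clean_first(df):
    # dropna() returns a new DataFrame, so the caller's df is not modified in place
    df = df.dropna().reset_index(drop=True)
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['callsign'] = df['callsign'].astype(str)
    # 'callsign' is kept because the data exploration groups by it
    df = df.drop(columns=['altitude', 'hour', 'squawk', 'alert', 'spi', 'last_position'])
    return df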

In [None]:
df = clean_first(df)

In [None]:
# In the onground column, some values are True although the aircraft does not appear to be on the ground
def in_region(lat, lon, min_lat, max_lat, min_lon, max_lon):
    return (lat.between(min_lat, max_lat)) & (lon.between(min_lon, max_lon))

onground_true_df = df[df['onground'] == True]

# Define the latitude and longitude bounds for Amsterdam Schiphol Airport region
min_latitude_amsterdam, max_latitude_amsterdam = 52.3000, 52.4000
min_longitude_amsterdam, max_longitude_amsterdam = 4.7000, 4.8000

# Define the latitude and longitude bounds for El Prat Barcelona Airport region
min_latitude_barcelona, max_latitude_barcelona = 41.3000, 41.4000
min_longitude_barcelona, max_longitude_barcelona = 2.0500, 2.1500

# Check if any row is within the specified regions
in_amsterdam_region = in_region(
    onground_true_df['latitude'], onground_true_df['longitude'],
    min_latitude_amsterdam, max_latitude_amsterdam, min_longitude_amsterdam, max_longitude_amsterdam
).any()

in_barcelona_region = in_region(
    onground_true_df['latitude'], onground_true_df['longitude'],
    min_latitude_barcelona, max_latitude_barcelona, min_longitude_barcelona, max_longitude_barcelona
).any()

print(f"On-ground rows within the Schiphol region: {in_amsterdam_region}")
print(f"On-ground rows within the El Prat region: {in_barcelona_region}")
# So, the on-ground flags are treated as outliers and the column is ignored

In [None]:
# df.drop(columns = 'onground', inplace = True)

In [None]:
df.head()

## Data exploration

Determine the number of days covered by the data and the total number of flights.

In [None]:
total_days = df['timestamp'].dt.date.nunique()

general_flights = df['callsign'].nunique()

print("Total number of days:", total_days)
print("Total number of flights (unique callsigns):", general_flights)

In [None]:
df = df.sort_values(by=['callsign', 'timestamp'])
flight_times = df.groupby('callsign')['timestamp'].agg(['first', 'last'])

flight_times['flight_duration'] = (flight_times['last'] - flight_times['first']).dt.total_seconds()

mean_flight_time_per_flight = flight_times['flight_duration']
general_flight_time_seconds = mean_flight_time_per_flight.mean()

general_flight_hours = int(general_flight_time_seconds // 3600)
general_flight_minutes = int((general_flight_time_seconds % 3600) // 60)

print("Mean flight time across all flights (in seconds):", general_flight_time_seconds)
print("Mean flight time across all flights (in hours and minutes): {} hours and {} minutes".format(general_flight_hours, general_flight_minutes))
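
Callsigns are commonly reused from day to day, so the first-to-last span of a callsign may cover several flights and inflate the mean duration. One possible refinement, sketched here under assumptions, is to start a new flight whenever the gap between consecutive samples of a callsign exceeds a limit. The function name `add_flight_id`, the column `flight_id` and the 30-minute `max_gap` are hypothetical choices.

```python
import pandas as pd


def add_flight_id(df, max_gap=pd.Timedelta(minutes=30)):
    """Number the individual flights within each callsign.

    A new flight is assumed to start whenever two consecutive samples of
    the same callsign are more than max_gap apart.
    """
    df = df.sort_values(['callsign', 'timestamp']).reset_index(drop=True)
    # Time since the previous sample of the same callsign (NaT for the first one)
    gap = df.groupby('callsign')['timestamp'].diff()
    new_flight = (gap > max_gap).astype(int)
    # A running count of gaps per callsign gives a flight number starting at 0
    df['flight_id'] = new_flight.groupby(df['callsign']).cumsum()
    return df
```

Grouping by `['callsign', 'flight_id']` instead of `'callsign'` alone then yields one duration per individual flight.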

In [None]:
df.head()