# tutorial on the labeled data

the data can be found at `/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/2023_2024/miniProject`

each directory contains a csv file of flight data from 10AM to 4PM, labeled with either the origin airport or the aircraft type

however, Tracktable only supports up to 7 columns while we need at least 8 columns for this task

as such, we've saved the data as pickle files (covered in the next section) so that you have access to all the data you need

### pickle files

pickle files are a way to serialize (save) Python objects into files that can be deserialized (loaded) later

In [None]:
import pickle

#### how to load a pickle file

`variable = pickle.load(open('filename', 'rb'))`

the `rb` opens the file for reading (`r`) in binary mode (`b`), which is what pickle expects

#### how to dump (save) a pickle file

`pickle.dump(variable, open('filename', 'wb'))`

the `wb` opens the file for writing (`w`) in binary mode (`b`)
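
a minimal round-trip sketch of the two calls above, using `with` so the files get closed automatically. the dictionary contents and the file name `labels.pickle` are made up for illustration, and the file is written to a temporary directory rather than the project directory. only load pickle files from sources you trust, since loading a pickle can run arbitrary code.

```python
import os
import pickle
import tempfile

# made-up object standing in for real data
labels = {"1": "KORD", "2": "KDFW"}

# write to a temporary directory so nothing in the project folder is touched
path = os.path.join(tempfile.mkdtemp(), "labels.pickle")

# 'wb' = write binary; the with-block closes the file when it ends
with open(path, "wb") as f:
    pickle.dump(labels, f)

# 'rb' = read binary
with open(path, "rb") as f:
    loaded = pickle.load(f)

# the loaded object is an equal copy of the original, not the same object
print(loaded == labels, loaded is labels)
```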

# flight origin

In [None]:
origin_flights = pickle.load(open('/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/2023_2024/miniProject/origin/origin_clean.pickle', 'rb'))

`origin_clean.pickle` contains a list of Tracktable trajectories

In [None]:
type(origin_flights)

In [None]:
type(origin_flights[0])

the origin airport of a trajectory can be accessed using `flight.property('origin_id')`

In [None]:
origin_flights[0].property('origin_id')

this id can be mapped to the airport code using the `origin_dict`

In [None]:
origin_dict = pickle.load(open('/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/2023_2024/miniProject/origin/origin_dict.pickle', 'rb'))

In [None]:
origin_dict[str(5847)]

the data set has been filtered down to 3 airports to make classification easier

In [None]:
set(origin_dict.values())

note: the keys of the origin dictionary are strings, so make sure you pass in strings, not ints
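
a small sketch of why the key type matters. `example_dict` is a made-up stand-in for `origin_dict` (the ids are invented), so the only thing to take from it is the `str(...)` conversion.

```python
# made-up stand-in for origin_dict: string keys, airport codes as values
example_dict = {"101": "KORD", "102": "KDFW"}

origin_id = 101  # an int, e.g. one you typed in by hand

# example_dict[origin_id] would raise KeyError, because 101 is not "101"
airport = example_dict[str(origin_id)]

# .get() returns a default instead of raising when the key is missing
missing = example_dict.get(str(999), "unknown")

print(airport, missing)
```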

# flight size

In [None]:
size_flights = pickle.load(open('/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/2023_2024/miniProject/size/size_clean.pickle', 'rb'))

`size_clean.pickle` also contains a list of Tracktable trajectories

In [None]:
type(size_flights)

In [None]:
type(size_flights[0])

In [None]:
size_flights[0].property('size_id')

In [None]:
size_dict = pickle.load(open('/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/2023_2024/miniProject/size/size_dict.pickle', 'rb'))

In [None]:
size_dict[str(6190)]

there are two types of aircraft present in the filtered data set

In [None]:
set(size_dict.values())

note: the original dataset does not mark flights as jet or airliner; it records the model of the aircraft (e.g. Boeing 737). we picked the most common aircraft models in the dataset and labeled each of them as airliner or jet. if you want more specific details on the aircraft model, let the TAs know and we can help you get that data
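
a rough sketch of the kind of labeling described in the note above, not the actual script that produced `size_dict`. the model codes, their counts, and the `model_to_label` mapping are all invented for illustration.

```python
from collections import Counter

# made-up aircraft model codes, one per flight (not the real dataset)
models = ["B737", "B737", "A320", "C560", "B737", "C560", "PA28", "A320"]

# hypothetical hand-written mapping from model to a coarse label
model_to_label = {"B737": "airliner", "A320": "airliner", "C560": "jet"}

# keep only the 3 most common models
top_models = [model for model, _ in Counter(models).most_common(3)]

# label the flights whose model was kept; everything else is dropped
labels = [model_to_label[m] for m in models
          if m in top_models and m in model_to_label]

print(Counter(labels))
```

counting the labels like this is also a quick way to check how balanced the two classes are before training anything.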

### overall notes

unlike the data from the Intro Tracktable notebook, each trajectory in this dataset should cover the whole flight, from takeoff to landing

# super basic classification tutorial

note: linear regression predicts a number (regression), not a category (classification), so strictly speaking this is not a classification example. it is easier to understand than most classification models, though, so we use it here to walk through the workflow :D

In [None]:
from tracktable.render.render_trajectories import render_trajectories

In [None]:
len(size_flights[420])

In [None]:
render_trajectories(size_flights[420])

In [None]:
import pandas as pd

In [None]:
data = []

In [None]:
for point in size_flights[420]:
    data.append([point.property('altitude'), point.property('speed')])

In [None]:
df = pd.DataFrame(data, columns=["alt", "speed"])

In [None]:
df.head()

In [None]:
import matplotlib.pyplot as plt

In [None]:
df.plot(kind="scatter", x="alt", y="speed")

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

In [None]:
X_train, X_test, y_train, y_test = train_test_split(df[['alt']], df['speed'], test_size=0.2, random_state=42)

In [None]:
model = LinearRegression()
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
score = r2_score(y_test, y_pred)
print(f"r^2: {score}")
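
for reference, the r^2 score reported above is `1 - SS_res / SS_tot`: one minus the squared error of the predictions divided by the squared error of always predicting the mean. below is a pure-Python sketch of that formula with made-up numbers; the helper name `r2` is ours, not part of scikit-learn.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # squared error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# made-up speeds
y_true = [100.0, 200.0, 300.0, 400.0]

perfect = r2(y_true, y_true)               # predictions match exactly
mean_only = r2(y_true, [250.0] * 4)        # always predict the mean

print(perfect, mean_only)
```

a score near 1 means the line explains most of the variation in speed, a score near 0 means it does no better than predicting the average, and the score can go negative when the model is worse than that.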

In [None]:
plt.scatter(df['alt'], df['speed'], color='blue')
plt.plot(df['alt'], model.predict(df[['alt']]), color='red')
plt.xlabel('Altitude')
plt.ylabel('Speed')
plt.show()
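
the regression above predicts a number. for the mini project you need a model that predicts a label instead. as a hedged illustration of what that looks like, here is a tiny nearest-centroid classifier written in plain Python: it averages the features of each class and assigns a new flight to the class whose average is closest. the feature values below are invented (max altitude in thousands of feet, max speed in hundreds of knots) and say nothing about real aircraft; `fit` and `predict` are our own helper names.

```python
import math

# made-up training data: (max altitude in kft, max speed in 100 kt) -> label
train = [
    ((35.0, 4.5), "airliner"),
    ((37.0, 4.7), "airliner"),
    ((43.0, 4.9), "jet"),
    ((45.0, 5.1), "jet"),
]

def centroid(points):
    """Average each feature over a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit(samples):
    """Group the samples by label and return one centroid per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

centroids = fit(train)
print(predict(centroids, (36.0, 4.6)))
print(predict(centroids, (44.0, 5.0)))
```

in practice you would likely reach for a scikit-learn classifier and scale the features first, since a feature with a large range otherwise dominates the distance.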

In [None]:
# Flight Origin

In [None]:
#Import needed libraries
import os.path
import tracktable

from tracktable.core import geomath
from tracktable.domain.terrestrial import TrajectoryPointReader
from tracktable.applications.assemble_trajectories import AssembleTrajectoryFromPoints
from tracktable.render.render_trajectories import render_trajectories

from datetime import datetime, timedelta

tracktable.__version__

In [None]:
%%bash
head "/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/raw_data/asdi_2014_07_01_h121314_safe.tsv"

In [None]:
#Read in the file, let it know what the comment & delimiter character is
data_filename = os.path.join("/anvil/projects/tdm/corporate/sandia-trajectory/previous_files/flight/data/raw_data/asdi_2014_07_01_h121314_safe.tsv")
inFile = open(data_filename, 'r')
reader = TrajectoryPointReader()
reader.input = inFile
reader.comment_character = '#' 	#What character is used for comments
reader.field_delimiter = '\t' 	#What character "breaks" each data value ex: Comma-Separated Values

#Columns start at 0, ex: 0 is column A, 2 is column C
reader.object_id_column = 0 	#What column holds the object ID
reader.timestamp_column = 1 	#What column holds the timestamp
reader.coordinates[1] = 3		#What column holds LAT data
reader.coordinates[0] = 2		#What column holds LONG data
reader.set_real_field_column('speed', 4) #Extra data (speed)
reader.set_real_field_column('heading', 5) #Extra data (heading)
reader.set_real_field_column('altitude', 6) #Extra data (altitude)
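
to make the reader settings above more concrete, here is a sketch of the same column mapping done by hand with Python's `csv` module. this is not how Tracktable parses the file internally; it only shows what the comment character, the delimiter, and the column numbers mean. the two sample rows and the timestamp format are assumptions, not copied from the real file.

```python
import csv
import io
from datetime import datetime

# made-up tab-separated rows in the column order configured above:
# 0 object id, 1 timestamp, 2 longitude, 3 latitude, 4 speed, 5 heading, 6 altitude
raw = (
    "# this line is a comment\n"
    "AAA111\t2014-07-01 12:00:00\t-87.90\t41.97\t250\t90\t3000\n"
    "AAA111\t2014-07-01 12:01:00\t-87.80\t41.98\t280\t92\t4500\n"
)

points = []
for row in csv.reader(io.StringIO(raw), delimiter="\t"):
    # skip blank lines and lines that start with the comment character
    if not row or row[0].startswith("#"):
        continue
    points.append({
        "object_id": row[0],
        "timestamp": datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S"),
        "longitude": float(row[2]),
        "latitude": float(row[3]),
        "speed": float(row[4]),
        "heading": float(row[5]),
        "altitude": float(row[6]),
    })

print(len(points), points[0]["object_id"], points[1]["altitude"])
```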

In [None]:
#Test to see if data has been imported correctly.
limit = 5					# Used to limit how many results we see
for i, x in enumerate(reader):
    if i >= limit: break	# Exits a loop early
    print(x)				# Print a line from reader

In [None]:
#Combine datapoints together using the object_id
builder = AssembleTrajectoryFromPoints()
builder.input = reader
builder.minimum_length = 10
builder.separation_time = timedelta(minutes = 30)
traj = list(builder.trajectories())
print(len(traj), 'flights built! ✈')

print(f'The type of traj is {type(traj)}')
print(f'traj is a list of {type(traj[0])}')
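
the builder settings above can be read as: group points by object id, start a new trajectory whenever two consecutive points are more than `separation_time` apart, and throw away trajectories with fewer than `minimum_length` points. below is a plain-Python sketch of that idea. it is an illustration under those assumptions, not Tracktable's implementation, and the `assemble` function and the sample ids are made up.

```python
from datetime import datetime, timedelta
from itertools import groupby

def assemble(points, separation_time, minimum_length):
    """points: iterable of (object_id, timestamp). Returns a list of trajectories."""
    trajectories = []
    # sort by id, then by time, so each object's points are contiguous and ordered
    ordered = sorted(points, key=lambda p: (p[0], p[1]))
    for _, group in groupby(ordered, key=lambda p: p[0]):
        current = []
        for point in group:
            # a long gap since the previous point ends the current trajectory
            if current and point[1] - current[-1][1] > separation_time:
                if len(current) >= minimum_length:
                    trajectories.append(current)
                current = []
            current.append(point)
        # keep the last trajectory of this object if it is long enough
        if len(current) >= minimum_length:
            trajectories.append(current)
    return trajectories

start = datetime(2014, 7, 1, 12, 0)
# AAA111 has two bursts of 3 points separated by almost an hour
points = [("AAA111", start + timedelta(minutes=m)) for m in (0, 1, 2, 60, 61, 62)]
# BBB222 has only 2 points, so it is too short to keep
points += [("BBB222", start + timedelta(minutes=m)) for m in (0, 1)]

result = assemble(points, separation_time=timedelta(minutes=30), minimum_length=3)
print(len(result))
```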

In [None]:
#speed ratio as one of the features: first speed divided by max speed
def speed_ratio(flight,flightNum=None):
    # flightNum is not used; it is kept so that calls like speed_ratio(flight, i) still work
    # the first speed is computed from the first two points of *this* flight
    # (indexing the global traj list here would read a different dataset)
    firstspeed = tracktable.core.geomath.speed_between(flight[0], flight[1])
    maxspeed = 0
    for point in flight:
        try:
            if(point.properties['speed'] > maxspeed):
                maxspeed = point.properties['speed']
        except TypeError:
            # the speed is None for this point, so skip it
            pass
    if maxspeed == 0:
        # no usable speed values, so the ratio is undefined
        return float('nan')
    return firstspeed/maxspeed

In [None]:
print(str(tracktable.core.geomath.speed_between(traj[1][0], traj[1][1])))
print(str(traj[1][0].properties['speed']))

In [None]:
# max altitude as one of the features
def max_altitude(flight):
    maxaltitude = 0
    for point in flight:
        try:
            if(point.properties['altitude'] > maxaltitude):
                maxaltitude = point.properties['altitude']
        except TypeError:
            pass
    return maxaltitude
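
the same "largest value, skipping missing ones" pattern can be written with the built-in `max`. the sketch below uses `SimpleNamespace` objects with a plain dict as stand-ins for Tracktable points, so it only models the `.properties[...]` lookup; `max_property` is a hypothetical helper name.

```python
from types import SimpleNamespace

# stand-in points: only the .properties lookup is modeled here
flight = [
    SimpleNamespace(properties={"altitude": 1200}),
    SimpleNamespace(properties={"altitude": None}),   # missing reading
    SimpleNamespace(properties={"altitude": 35000}),
]

def max_property(flight, name, default=0):
    """Largest non-missing value of a property, or default if there is none."""
    values = (point.properties[name] for point in flight)
    return max((v for v in values if v is not None), default=default)

print(max_property(flight, "altitude"))
print(max_property([], "altitude"))
```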

In [None]:
import pandas as pd

In [None]:
datas = []

In [None]:
for count, flight in enumerate(size_flights):
    datas.append([max_altitude(flight),speed_ratio(flight, count)])

In [None]:
df = pd.DataFrame(datas, columns=["maximum altitude", "speed ratio"])

In [None]:
df.sort_values(by=["speed ratio"]) 

In [None]:
import matplotlib.pyplot as plt

In [None]:
temp = df.plot(kind="scatter", x="maximum altitude", y="speed ratio")
temp.set_ylim(-10000,10000)

In [None]:
# the first altitude as one of the features
def getFirstAlt(flight):
    # altitude of the first point in the trajectory (may be None)
    return flight[0].properties['altitude']

In [None]:
# climb rate (altitude change per minute) as one of the features
def getClimbRate(flight):
    # collect the first 3 points that have an altitude value
    altList = []
    timestamps = []
    for point in flight:
        if point.properties['altitude'] is not None:
            altList.append(point.properties['altitude'])
            timestamps.append(point.timestamp)
        if len(altList) == 3:
            break
    if len(altList) < 3:
        raise ValueError('need at least 3 points with an altitude')
    # seconds elapsed between consecutive points
    dif1 = (timestamps[1] - timestamps[0]).total_seconds()
    dif2 = (timestamps[2] - timestamps[1]).total_seconds()
    # altitude change per second over each interval
    val1 = (altList[1] - altList[0]) / dif1
    val2 = (altList[2] - altList[1]) / dif2
    # average the two rates and convert from per second to per minute
    return ((val1 + val2)/2) * 60
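
a worked example of the climb rate arithmetic with made-up numbers, assuming altitude is in feet so the result is in feet per minute. plain `(timestamp, altitude)` tuples stand in for trajectory points.

```python
from datetime import datetime

# made-up samples: (timestamp, altitude in feet)
samples = [
    (datetime(2014, 7, 1, 12, 0, 0), 1000.0),
    (datetime(2014, 7, 1, 12, 0, 30), 2000.0),   # +1000 ft in 30 s
    (datetime(2014, 7, 1, 12, 1, 30), 3000.0),   # +1000 ft in 60 s
]

rates = []
for (t0, a0), (t1, a1) in zip(samples, samples[1:]):
    seconds = (t1 - t0).total_seconds()
    # altitude change per second, scaled to per minute
    rates.append((a1 - a0) * 60 / seconds)

# average of the two interval rates
climb_rate = sum(rates) / len(rates)

print(rates, climb_rate)
```

note that averaging the two interval rates weights both intervals equally. dividing the total climb by the total time (2000 ft over 90 s here) would give a lower number, so pick whichever definition you want and use it consistently.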

In [None]:
data = []

In [None]:
for count, flight in enumerate(size_flights):
    data.append([getFirstAlt(flight),speed_ratio(flight, count)])

In [None]:
df = pd.DataFrame(data, columns=["first altitude", "speed ratio"])

In [None]:
df.head()

In [None]:
temp = df.plot(kind="scatter", x="speed ratio", y="first altitude") 
temp.set_xlim(0,3)

In [None]:
# the first height listed in the flight as one of the features
def initalAltitudeOfFlight(flight):
    # altitudes of -2000 or lower are replaced with 0
    if(flight[0].properties['altitude'] > -2000):
        return flight[0].properties['altitude']
    return 0

In [None]:
# makes the ORD and DFW scatter plots comparing climb rate and altitude
kordData = []
kdfwData = []


for i, flight in enumerate(origin_flights):
    
    if(origin_dict[flight.property('origin_id')] == "KORD"):
        try:
            kordData.append([initalAltitudeOfFlight(flight), getClimbRate(flight)])
        except:
            print(i)
            print('err')
    elif(origin_dict[flight.property('origin_id')] == "KDFW"):
        try:
            kdfwData.append([initalAltitudeOfFlight(flight), getClimbRate(flight)])
        except:
            print(i)
            print('err')
        
print(len(kordData))
print(len(kdfwData))

dfKORD = pd.DataFrame(kordData, columns=["alt", "climb_rate"])
dfKDFW = pd.DataFrame(kdfwData, columns=["alt", "climb_rate"])
import matplotlib.pyplot as plt
plt.scatter(dfKORD['alt'], dfKORD['climb_rate'], label="KORD")
plt.scatter(dfKDFW['alt'], dfKDFW['climb_rate'], label="KDFW", marker="^")
plt.legend()
plt.xlabel('altitude')
plt.ylabel('climb rate')
plt.ylim([0,5000])
plt.show()

In [None]:
# makes the ORD and DFW scatter plots comparing speed ratio and altitude
kordData = []
kdfwData = []


for i, flight in enumerate(origin_flights):
    
    if(origin_dict[flight.property('origin_id')] == "KORD"):
        try:
            kordData.append([initalAltitudeOfFlight(flight), speed_ratio(flight,i)])
        except:
            print(i)
            print('err')
    elif(origin_dict[flight.property('origin_id')] == "KDFW"):
        try:
            kdfwData.append([initalAltitudeOfFlight(flight), speed_ratio(flight,i)])
        except:
            print(i)
            print('err')
        
print(len(kordData))
print(len(kdfwData))

dfKORD = pd.DataFrame(kordData, columns=["alt", "speed_ratio"])
dfKDFW = pd.DataFrame(kdfwData, columns=["alt", "speed_ratio"])
import matplotlib.pyplot as plt
plt.scatter(dfKORD['alt'], dfKORD['speed_ratio'], label="KORD")
plt.scatter(dfKDFW['alt'], dfKDFW['speed_ratio'], label="KDFW", marker="^")
plt.legend()
plt.xlabel('altitude')
plt.ylabel('speed_ratio')
plt.ylim([0,5])
plt.show()

In [None]:
# makes the ORD and DFW scatter plots comparing max altitude and initial altitude
kordData = []
kdfwData = []


for i, flight in enumerate(origin_flights):
    
    if(origin_dict[flight.property('origin_id')] == "KORD"):
        try:
            kordData.append([initalAltitudeOfFlight(flight), max_altitude(flight)])
        except:
            print(i)
            print('err')
    elif(origin_dict[flight.property('origin_id')] == "KDFW"):
        try:
            kdfwData.append([initalAltitudeOfFlight(flight), max_altitude(flight)])
        except:
            print(i)
            print('err')
        
print(len(kordData))
print(len(kdfwData))

dfKORD = pd.DataFrame(kordData, columns=["alt", "max_altitude"])
dfKDFW = pd.DataFrame(kdfwData, columns=["alt", "max_altitude"])
import matplotlib.pyplot as plt
plt.scatter(dfKORD['alt'], dfKORD['max_altitude'], label="KORD")
plt.scatter(dfKDFW['alt'], dfKDFW['max_altitude'], label="KDFW", marker="^")
plt.legend()
plt.xlabel('altitude')
plt.ylabel('max_altitude')
#plt.ylim([0,5000])
plt.show()
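
the three plotting cells above repeat the same loop with a different feature each time. one way to cut down on the copy-paste, sketched under assumptions: a helper that takes a list of feature functions and returns the feature rows grouped by airport. `features_by_origin` is a hypothetical name, and the flights here are `SimpleNamespace` stand-ins with an `origin_id` attribute and a list of altitudes; with real trajectories you would use `flight.property('origin_id')` and the feature functions defined earlier.

```python
from collections import defaultdict
from types import SimpleNamespace

def features_by_origin(flights, id_to_airport, feature_funcs):
    """Return {airport: [[feature values], ...]}, skipping flights that fail."""
    rows = defaultdict(list)
    for flight in flights:
        airport = id_to_airport[flight.origin_id]
        try:
            rows[airport].append([func(flight) for func in feature_funcs])
        except (TypeError, IndexError, ValueError, ZeroDivisionError):
            # a feature could not be computed for this flight, so leave it out
            continue
    return dict(rows)

# made-up stand-in flights and id mapping
flights = [
    SimpleNamespace(origin_id="1", altitudes=[100, 3000]),
    SimpleNamespace(origin_id="2", altitudes=[600, 4000]),
    SimpleNamespace(origin_id="1", altitudes=[]),          # no data, gets skipped
]
id_to_airport = {"1": "KORD", "2": "KDFW"}

# stand-in feature functions: first altitude and max altitude
first_alt = lambda flight: flight.altitudes[0]
max_alt = lambda flight: max(flight.altitudes)

rows = features_by_origin(flights, id_to_airport, [first_alt, max_alt])
print(rows)
```

catching a named set of exceptions, rather than a bare `except`, keeps typos and other real bugs from being silently counted as bad flights.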