# Flights
This notebook builds a memory network from US flight data.

In [1]:
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os.path
import pathpy as pp
import my_functions
from my_functions import matprint
import igraph
import csv

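from IPython.display import *  # wildcard import of IPython's rich display helpers (display, HTML, Image, ...)
from IPython.display import HTML  # explicit import of HTML; already covered by the wildcard import above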

## Data description
US flights in the first quarter of 2011; a subset is taken for the analyses.

**flight data:** https://www.transtats.bts.gov/Tables.asp?QO_VQ=EED&QO_anzr=Nv4%FDPn44vr4%FDf6n6v56vp5%FD%FLS14z%FDHE%FDg4nssvp%FM-%FD%FDh.f.%FDPn44vr45&QO_fu146_anzr=Nv4%FDPn44vr45 (T-100 Domestic Segment (U.S. Carriers))

**origin-destination data:** https://www.transtats.bts.gov/Tables.asp?QO_VQ=EFI&QO_anzr=Nv4yv0r%FDb4vtv0%FDn0q%FDQr56v0n6v10%FDf748rB%FD%FLQOEO%FM&QO_fu146_anzr=b4vtv0%FDn0q%FDQr56v0n6v10%FDf748rB (DB1B Coupon)

**csh tutorial:** https://ingoscholtes.github.io/pathpy/tutorial.html#data

## 1. Processing the data
a. process origin-destination data to an ngram file <br>
b. process carrier data to a temporal network .tedges file <br>
c. process carrier data to an aggregated weighted network .edges file <br>
### a. Processing origin-destination data

In [2]:
PATH = "Data/US flights 2011/od_stats.csv"

od_df = pd.read_csv(PATH)
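# Keep only the columns needed to rebuild each itinerary.
od_df = od_df[["ITIN_ID", "SEQ_NUM", "COUPONS", "ORIGIN", "DEST", "PASSENGERS"]]
# Order the coupons within each itinerary.
od_df = od_df.sort_values(by=["ITIN_ID", "SEQ_NUM"])
# od_df = od_df.iloc[:2000, :]  # optionally sample the first 2000 rows to speed things up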
#od_df.columns = ["source","target","frequency"]
od_df.head()

Unnamed: 0,ITIN_ID,SEQ_NUM,COUPONS,ORIGIN,DEST,PASSENGERS
1,2011128,4,7,PHX,SJC,1.0
0,2011128,5,7,SJC,PHX,1.0
2,20111103,2,4,CLT,LAX,1.0
3,20111103,3,4,LAX,PHL,1.0
4,20111147,3,3,PHX,ONT,1.0


In [3]:
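# One row per itinerary: comma-joined origins and destinations in SEQ_NUM order.
paths = (od_df[["ITIN_ID", "ORIGIN", "DEST", "PASSENGERS"]]
         .groupby(["ITIN_ID", "PASSENGERS"], as_index=False)
         .agg({"ORIGIN": ",".join, "DEST": ",".join}))
paths["DEST"] = paths["DEST"].apply(lambda x: x[-3:])  # keep only the final destination (3-letter code)
paths["PATH"] = paths["ORIGIN"] + "," + paths["DEST"]  # all origins followed by the final destination
paths["LENGTH"] = paths["PATH"].apply(lambda x: x.count(",") + 1)  # number of airports on the path
# Sum passengers over all itineraries that follow the same path.
paths = paths.groupby(["PATH", "LENGTH"], as_index=False).agg({"PASSENGERS": "sum"})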
paths.head()

Unnamed: 0,PATH,LENGTH,PASSENGERS
0,"ABQ,BUR",2,16.0
1,"ABQ,BUR,ABQ",3,9.0
2,"ABQ,BUR,DFW",3,1.0
3,"ABQ,BUR,LAS",3,5.0
4,"ABQ,BUR,LAX,ABQ",4,1.0
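The cell above builds each path from all origins plus the last destination, which only gives a valid route if every coupon starts where the previous one ended. Because the data is a subset (the first itinerary shown has coupons 4 and 5 of 7), some itineraries may have gaps. The sketch below replays the same rule in plain Python on made-up coupon rows and flags non-contiguous itineraries. The itinerary IDs, the rows, and the `build_paths` helper are illustrative assumptions, not part of the dataset or the notebook.

```python
from itertools import groupby
from operator import itemgetter

# Toy coupon rows (ITIN_ID, SEQ_NUM, ORIGIN, DEST); IDs and rows are made up for illustration.
coupons = [
    (1, 1, "ABQ", "BUR"),
    (1, 2, "BUR", "LAX"),
    (1, 3, "LAX", "ABQ"),
    (2, 1, "PHX", "SJC"),
    (2, 3, "LAS", "PHX"),  # SEQ_NUM 2 is missing, so this itinerary has a gap
]

def build_paths(rows):
    """Return {itin_id: (path, is_contiguous)} from coupon rows (hypothetical helper)."""
    result = {}
    rows = sorted(rows, key=itemgetter(0, 1))  # order by itinerary, then by sequence number
    for itin_id, legs in groupby(rows, key=itemgetter(0)):
        legs = list(legs)
        # Same rule as the pandas cell: all origins, then the last destination.
        path = [leg[2] for leg in legs] + [legs[-1][3]]
        # Contiguous means every leg starts where the previous one ended.
        contiguous = all(a[3] == b[2] for a, b in zip(legs, legs[1:]))
        result[itin_id] = (",".join(path), contiguous)
    return result

paths_by_itin = build_paths(coupons)
for itin_id, (path, ok) in paths_by_itin.items():
    print(itin_id, path, "contiguous" if ok else "has a gap")
```

Itineraries flagged as having a gap could be dropped or split before aggregation; which is appropriate depends on the analysis.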


In [4]:
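# Each ngram line is "node1,node2,...,frequency"; the passenger count is the frequency.
to_ngram = paths["PATH"] + "," + paths["PASSENGERS"].astype(str)
to_ngram.to_csv(
    "Data/US flights 2011/US flights od.ngram",
    sep="\t",
    index=False,
    header=False,
)

# TODO: something unexpected is printed at the end; this still needs fixing.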
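To make the file format concrete: each line written above is the comma-separated airport sequence followed by the passenger count, which is what the later `pp.Paths.read_file(..., separator=',', frequency=True)` call expects. The sketch below formats one line and parses it back. The helper names `to_ngram_line` and `parse_ngram_line` are hypothetical and only illustrate the layout; they are not pathpy functions.

```python
def to_ngram_line(path, frequency, separator=","):
    """Format one path and its frequency as a single ngram line (hypothetical helper)."""
    return f"{path}{separator}{frequency}"

def parse_ngram_line(line, separator=","):
    """Split an ngram line into (list of nodes, frequency) (hypothetical helper)."""
    *nodes, freq = line.strip().split(separator)
    return nodes, float(freq)

# Values taken from the paths.head() output: path "ABQ,BUR,ABQ" with 9.0 passengers.
line = to_ngram_line("ABQ,BUR,ABQ", 9.0)
nodes, freq = parse_ngram_line(line)
print(line)
print(nodes, freq)
```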

### b. Process data to a temporal network

### c. Process data to an aggregated weighted network

## 2. Exploring the data
a. import network and temporal network and basic visuals <br>
b. import pathway data and construct memory network <br>
c. start looking at pathway data
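As a rough picture of what a memory network captures, the sketch below counts second-order transitions by hand: each flown leg becomes a node, and two legs are linked when they were flown back to back on the same itinerary, weighted by passengers. The paths and counts come from the `paths.head()` output earlier in the notebook. The `second_order_links` function is an illustrative assumption and is not how pathpy builds its higher-order models internally.

```python
from collections import Counter

# Observed paths and passenger counts, taken from the paths.head() output.
observed = {
    ("ABQ", "BUR", "ABQ"): 9.0,
    ("ABQ", "BUR", "DFW"): 1.0,
    ("ABQ", "BUR", "LAS"): 5.0,
    ("ABQ", "BUR", "LAX", "ABQ"): 1.0,
}

def second_order_links(paths):
    """Count transitions between consecutive legs, weighted by path frequency (sketch)."""
    links = Counter()
    for path, freq in paths.items():
        edges = list(zip(path, path[1:]))  # legs of the path, e.g. ("ABQ", "BUR")
        for prev_edge, next_edge in zip(edges, edges[1:]):
            links[(prev_edge, next_edge)] += freq
    return links

links = second_order_links(observed)
for (prev_edge, next_edge), weight in links.items():
    print(prev_edge, "->", next_edge, weight)
```

Paths with a single leg contribute nothing here, since they contain no pair of consecutive legs.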

In [5]:
flight_paths = pp.Paths.read_file("Data/US flights 2011/US flights od.ngram", separator=',', frequency=True)
print(flight_paths)

2021-03-28 22:11:48 [Severity.INFO]	Reading ngram data ... 
2021-03-28 22:11:49 [Severity.INFO]	finished. Read 35801 paths with maximum length 8
2021-03-28 22:11:49 [Severity.INFO]	Calculating sub path statistics ... 
2021-03-28 22:11:49 [Severity.INFO]	finished.
Total path count: 		1470628.0 
[Unique / Sub paths / Total]: 	[35801.0 / 6597817.0 / 8068445.0]
Nodes:				185 
Edges:				2115
Max. path length:		8
Avg path length:		1.7830097074175115 
Paths of length k = 0		0.0 [ 0.0 / 4092772.0 / 4092772.0 ]
Paths of length k = 1		467906.0 [ 1333.0 / 2154238.0 / 2622144.0 ]
Paths of length k = 2		904505.0 [ 13274.0 / 247011.0 / 1151516.0 ]
Paths of length k = 3		49828.0 [ 10905.0 / 98966.0 / 148794.0 ]
Paths of length k = 4		46616.0 [ 8909.0 / 3961.0 / 50577.0 ]
Paths of length k = 5		1390.0 [ 1092.0 / 798.0 / 2188.0 ]
Paths of length k = 6		358.0 [ 265.0 / 57.0 / 415.0 ]
Paths of length k = 7		18.0 [ 16.0 / 14.0 / 32.0 ]
Paths of length k = 8		7.0 [ 7.0 / 0.0 / 7.0 ]
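A quick sanity check on the summary: pathpy measures path length k in edges, so a path like `ABQ,BUR` (LENGTH 2 in the `paths` table) has k = 1. The total path count and the average path length can be recomputed from the per-length counts printed above. The numbers below are copied from that output; the variable names are only illustrative.

```python
# Path counts per length k, copied from the summary printed above.
path_counts = {1: 467906, 2: 904505, 3: 49828, 4: 46616, 5: 1390, 6: 358, 7: 18, 8: 7}

total_paths = sum(path_counts.values())                   # all observed paths (passengers)
total_edges = sum(k * n for k, n in path_counts.items())  # legs flown across all paths
avg_length = total_edges / total_paths                    # frequency-weighted mean of k

print(total_paths)
print(avg_length)
```

Both values should agree with the "Total path count" and "Avg path length" lines in the summary.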



In [None]:
help(pp.Paths)