# Notes

## Flight paths:
- **Houston, TX** to **Los Angeles, CA** (IAH - LAX)
- **New York City, NY** to **Miami, FL** (JFK - MIA)
- **Portland, OR** to **Chicago, IL** (PDX - ORD)

## Number of Total Routes
- At least 1,000 flights per route for now (all times in GMT)

    - __Times:__

        - 0000 hours to 0600 hours

        - 0601 hours to 1200 hours

        - 1201 hours to 1800 hours

        - 1801 hours to 2359 hours

    - __Times of the Year:__
    
        - Try to get every month

    - __Times of the Week:__
        - Try to get every day. 
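To check how evenly collected flights cover the time-of-day windows above, each departure timestamp can be mapped to its window. The following is a minimal sketch, not part of the original notebook: the name `time_window` is hypothetical, GMT is treated as UTC, and the last window is assumed to end at 2359.

```python
from datetime import datetime, timezone

# (last minute of the window, label). The final window is assumed to end at
# 2359 so that the four windows together cover one full day.
WINDOWS = [
    (6 * 60, "0000-0600"),
    (12 * 60, "0601-1200"),
    (18 * 60, "1201-1800"),
    (23 * 60 + 59, "1801-2359"),
]

def time_window(unix_ts):
    """Return the label of the window containing a Unix timestamp (UTC)."""
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    minute_of_day = moment.hour * 60 + moment.minute
    for last_minute, label in WINDOWS:
        if minute_of_day <= last_minute:
            return label
    raise ValueError("minute of day out of range")  # not reachable for valid input
```

Counting the labels for all collected departures shows which windows still need more queries.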
    
## Notes about Data considerations

- See what the typical flight times are for your paths. You may be limited here.

- Consistently check the other API to see what you can grab.

- Grab future flights too!

- Do some division on how many flights you can grab from how many time zones.


## Features to Scrape
- want aircraft type
- want airline flight info
- want airline flight 
- want aircraft type struct
- want flight struct

One query returns 15 results. Example: if you request all flights from airport Alpha to airport Bravo and the search comes back with 5,000 flights, the pricing estimate is 5000 / 15 ≈ 333 queries, and 333 × $0.0079 ≈ $2.63 (Class 2).
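The arithmetic above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name is hypothetical, the per-query price of $0.0079 is the Class 2 figure quoted above, and the query count is rounded up (a partial page is assumed to still cost one query), so it gives 334 queries rather than the rounded-down 333.

```python
import math

def estimate_query_cost(total_results, results_per_query=15, price_per_query=0.0079):
    """Return (number of queries, estimated cost in USD) for a result count."""
    # Assumption: a final, partially filled page is billed as a full query.
    queries = math.ceil(total_results / results_per_query)
    return queries, queries * price_per_query

queries, cost = estimate_query_cost(5000)
print(f"{queries} queries, about ${cost:.2f}")
```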

In [1]:
import sys
from suds import null, WebFault
from suds.client import Client
import logging
import json
import pandas as pd
import datetime

In [2]:
with open('/Users/ChristopherKuzemka/Documents/GA/dsi_11/projects/capstone/env.json') as f:
    information = json.load(f)

In [3]:
information.keys()

dict_keys(['FA_API_KEY', 'FA_USERNAME', 'x-rapidapi-host', 'x-rapidapi-key'])

In [4]:
username = information.get('FA_USERNAME')
apiKey = information.get('FA_API_KEY')
url = 'http://flightxml.flightaware.com/soap/FlightXML2/wsdl'

In [5]:
logging.basicConfig(level=logging.INFO)
api = Client(url, username=username, password=apiKey)

In [80]:
# Get the flights enroute - test
result = api.service.Enroute('KSMO', 10, '', 0)
flights = result['enroute']
#result
flights

[(EnrouteFlightStruct){
    ident = "N880WC"
    aircrafttype = "C25B"
    actualdeparturetime = 1590245756
    estimatedarrivaltime = 1590257451
    filed_departuretime = 1590246000
    origin = "KSAT"
    destination = "KSMO"
    originName = "San Antonio Intl"
    originCity = "San Antonio, TX"
    destinationName = "Santa Monica Muni"
    destinationCity = "Santa Monica, CA"
  },
 (EnrouteFlightStruct){
    ident = "N288G"
    aircrafttype = "C525"
    actualdeparturetime = 0
    estimatedarrivaltime = 1590262320
    filed_departuretime = 1590259200
    origin = "KDVO"
    destination = "KSMO"
    originName = "Gnoss Field"
    originCity = "Novato, CA"
    destinationName = "Santa Monica Muni"
    destinationCity = "Santa Monica, CA"
  }]

In [82]:
print("Aircraft en route to KSMO:")
for flight in flights:
    print("%s (%s) \t%s (%s)" % (flight['ident'], flight['aircrafttype'],
                                  flight['originName'], flight['origin']))

Aircraft en route to KSMO:
N880WC (C25B) 	San Antonio Intl (KSAT)
N288G (C525) 	Gnoss Field (KDVO)


## Testing playground determining usefulness of API

In [83]:
#Get Aircraft type  - semi-useful
aircraft_type = api.service.AircraftType('B744')
aircraft_type

(AircraftTypeStruct){
   manufacturer = "Boeing"
   type = "747-400"
   description = "quad-jet"
 }

# JFK - MIA

From [here](https://www.flights.com/flights/new-york-jfk-to-miami-mia/): "With 3 different airlines operating flights between New York and Miami, there are, on average, 2,197 flights per month. This equates to about 523 flights per week, and 75 flights per day from JFK to MIA." The three airlines are:
- American Airlines (Flight AA 2572)

- British Airways (Flight BA 1687) 

- Malaysia Airlines (Flight MH 9446).

Input parameters:

- start date
- end date
- n (number )

In [121]:
def make_unix_lists(start_date, end_date, frequency):
    created_range = pd.date_range(start = start_date, end = end_date, freq = frequency) #creates a DatetimeIndex
    list_created_range = list(created_range) #converts the range into a list
    #transforms the list into unix epoch timestamps represented as floats
    #note: .timestamp() on a naive datetime interprets it in the machine's local time zone
    unix_floats = [date.to_pydatetime().timestamp() for date in list_created_range]
    unix_ints = [int(i) for i in unix_floats] #converts the floats to integers
    #return unix_ints

    #Split into start and end lists so another function can loop over consecutive windows
    start_ints = unix_ints[:-1] #all the start dates (everything except the last element)
    end_ints = unix_ints[1:] #all the end dates (everything except the first element)

    return start_ints, end_ints
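Because `.timestamp()` on a naive datetime uses the machine's local time zone, the windows produced by this function shift with the computer they run on, while the notes ask for GMT. One possible alternative is sketched below using only the standard library; the function name and the `hours` parameter are hypothetical, and GMT is treated as UTC.

```python
from datetime import datetime, timedelta, timezone

def make_utc_unix_windows(start_date, end_date, hours):
    """Return (start, end) Unix-timestamp pairs covering the range in UTC."""
    fmt = "%m/%d/%Y"  # same month/day/year style used in the calls in this notebook
    current = datetime.strptime(start_date, fmt).replace(tzinfo=timezone.utc)
    end = datetime.strptime(end_date, fmt).replace(tzinfo=timezone.utc)
    step = timedelta(hours=hours)

    stamps = []
    while current <= end:
        stamps.append(int(current.timestamp()))
        current += step

    # Pair each timestamp with the next one to form consecutive windows.
    return list(zip(stamps[:-1], stamps[1:]))

windows = make_utc_unix_windows('5/11/2020', '5/24/2020', 8)
```

Returning pairs instead of two parallel lists keeps each window's start and end together when looping.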

In [202]:
start, end = make_unix_lists('5/11/2020', '5/24/2020', '8H')

In [234]:
start

[1589169600,
 1589198400,
 1589227200,
 1589256000,
 1589284800,
 1589313600,
 1589342400,
 1589371200,
 1589400000,
 1589428800,
 1589457600,
 1589486400,
 1589515200,
 1589544000,
 1589572800,
 1589601600,
 1589630400,
 1589659200,
 1589688000,
 1589716800,
 1589745600,
 1589774400,
 1589803200,
 1589832000,
 1589860800,
 1589889600,
 1589918400,
 1589947200,
 1589976000,
 1590004800,
 1590033600,
 1590062400,
 1590091200,
 1590120000,
 1590148800,
 1590177600,
 1590206400,
 1590235200,
 1590264000]

In [154]:
jfk_mia_info = {'AA':'2572',
                'BA':'1687',
                'MH':'9446'}

In [181]:
def make_schedule_dicts(start_input, end_input, frequency_input, origin_input, destination_input, flight_info_input, howMany_input):
    start_list, end_list = make_unix_lists(start_input, end_input, frequency_input)
    output = []
    for k in flight_info_input:
        for n in range(len(start_list)):
            airline_flight_schedules = api.service.AirlineFlightSchedules(startDate = start_list[n], endDate = end_list[n], origin = origin_input, destination = destination_input, airline = k, flightno = flight_info_input.get(k), howMany = howMany_input)
            airline_flight_dict = Client.dict(airline_flight_schedules)
            output.append(airline_flight_dict)
    return output

In [182]:
jfk_mia_flight_scheds = make_schedule_dicts('5/11/2020', '5/24/2020', '8H', 'JFK', 'MIA', jfk_mia_info, 15)

In [250]:
full_jfk_mia_flight_scheds = []

for sched in jfk_mia_flight_scheds:
    # Skip responses that have no 'data' key
    if 'data' in sched:
        full_jfk_mia_flight_scheds.append(sched['data'])

In [253]:
full_jfk_mia_flight_scheds

[[(AirlineFlightScheduleStruct){
     ident = "MAS9446"
     actual_ident = "AAL2572"
     departuretime = 1589221800
     arrivaltime = 1589233680
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_service = "Business: Meal / Economy: Meal"
     seats_cabin_first = 0
     seats_cabin_business = 16
     seats_cabin_coach = 144
   },
  (AirlineFlightScheduleStruct){
     ident = "GLO6391"
     actual_ident = "AAL2572"
     departuretime = 1589223600
     arrivaltime = 1589235600
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_service = "Business: No meal / Economy: No meal"
     seats_cabin_first = 0
     seats_cabin_business = 16
     seats_cabin_coach = 144
   },
  (AirlineFlightScheduleStruct){
     ident = "AAL2572"
     actual_ident = None
     departuretime = 1589223600
     arrivaltime = 1589235600
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_service = "Business: No meal / 

In [230]:
#Start date must be in the past
#Up to 15 results per query
airline_flight_schedules = api.service.AirlineFlightSchedules(startDate = 1589878509, endDate = 1589921709, origin = 'JFK', destination = 'MIA', airline = 'AA', flightno = '2572', howMany = 15)
airline_flight_schedules
#If actual_ident is None, use ident. Must mention the possibility that an ident is flown by someone else (actual_ident names that flight). Step one is to get the ident.

(ArrayOfAirlineFlightScheduleStruct){
   next_offset = -1
   data[] = 
      (AirlineFlightScheduleStruct){
         ident = "MAS9446"
         actual_ident = "AAL2572"
         departuretime = 1589913000
         arrivaltime = 1589924880
         origin = "KJFK"
         destination = "KMIA"
         aircrafttype = "B738"
         meal_service = "Business: Meal / Economy: Meal"
         seats_cabin_first = 0
         seats_cabin_business = 16
         seats_cabin_coach = 144
      },
      (AirlineFlightScheduleStruct){
         ident = "AAL2572"
         actual_ident = None
         departuretime = 1589914800
         arrivaltime = 1589926800
         origin = "KJFK"
         destination = "KMIA"
         aircrafttype = "B738"
         meal_service = "Business: No meal / Economy: No meal"
         seats_cabin_first = 0
         seats_cabin_business = 16
         seats_cabin_coach = 144
      },
 }
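The rule in the comment above (use `ident` when `actual_ident` is `None`, otherwise the flight is flown under `actual_ident`) can be sketched as follows. The function names are hypothetical, and the sketch assumes each schedule has already been converted to a plain dict with the field names shown in the output.

```python
def operating_ident(schedule):
    """Return the ident the flight is actually flown under."""
    actual = schedule.get("actual_ident")
    # No actual_ident means the listed ident is itself the operating flight.
    return actual if actual else schedule["ident"]

def unique_operating_flights(schedules):
    """Collapse rows that share an operating ident and departure time."""
    seen = set()
    flights = []
    for schedule in schedules:
        key = (operating_ident(schedule), schedule["departuretime"])
        if key not in seen:
            seen.add(key)
            flights.append(key)
    return flights

# Sample rows copied from the schedule output earlier in this notebook.
sample = [
    {"ident": "MAS9446", "actual_ident": "AAL2572", "departuretime": 1589221800},
    {"ident": "GLO6391", "actual_ident": "AAL2572", "departuretime": 1589223600},
    {"ident": "AAL2572", "actual_ident": None, "departuretime": 1589223600},
]
print(unique_operating_flights(sample))
```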

In [178]:
test_dict.keys()

dict_keys(['next_offset', 'data'])
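Since each response carries a `next_offset` next to `data`, results beyond the first 15 could be collected by repeating the query with that offset. The sketch below is an assumption-heavy illustration: `fetch_all` and `fetch_page` are hypothetical names, `next_offset = -1` is assumed to mean "no more pages" (as the output above suggests), and the real call is assumed to accept an `offset` argument the way `FlightInfoEx` does.

```python
def fetch_all(fetch_page, max_pages=100):
    """Collect 'data' from successive pages until next_offset turns negative.

    fetch_page is any callable taking an offset and returning a dict with
    'next_offset' and (optionally) 'data' keys.
    """
    records = []
    offset = 0
    for _ in range(max_pages):  # hard cap so a bad response cannot loop forever
        page = fetch_page(offset)
        records.extend(page.get("data", []))
        offset = page.get("next_offset", -1)
        if offset < 0:
            break
    return records

# Stand-in for the API: two pages of fake records keyed by offset.
fake_pages = {
    0: {"next_offset": 15, "data": list(range(15))},
    15: {"next_offset": -1, "data": [15, 16]},
}
all_records = fetch_all(lambda offset: fake_pages[offset])
```

Every extra page is another query, so the page cap also limits how much a single route can cost.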

In [87]:
#Credit:
#https://stackoverflow.com/questions/17581731/parsing-suds-soap-complex-data-type-into-python-dict
test_dict = Client.dict(airline_flight_schedules)

In [180]:
test_dict

{'next_offset': 15,
 'data': [(AirlineFlightScheduleStruct){
     ident = "MAS9446"
     actual_ident = "AAL2572"
     departuretime = 1589221800
     arrivaltime = 1589233680
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_service = "Business: Meal / Economy: Meal"
     seats_cabin_first = 0
     seats_cabin_business = 16
     seats_cabin_coach = 144
   },
  (AirlineFlightScheduleStruct){
     ident = "AAL2572"
     actual_ident = None
     departuretime = 1589223600
     arrivaltime = 1589235600
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_service = "Business: No meal / Economy: No meal"
     seats_cabin_first = 0
     seats_cabin_business = 16
     seats_cabin_coach = 144
   },
  (AirlineFlightScheduleStruct){
     ident = "GLO6391"
     actual_ident = "AAL2572"
     departuretime = 1589223600
     arrivaltime = 1589235600
     origin = "KJFK"
     destination = "KMIA"
     aircrafttype = "B738"
     meal_ser

In [91]:
test_dict['data'][1]['ident']

AAL2572

So something doesn't work: the call below (`GetFlightID`) does not recognize flights whose departure time has changed.

In [94]:
get_flight_ID = api.service.GetFlightID(ident = 'AAL2572', departureTime = 1589223600)
get_flight_ID

ERROR:suds.client:<suds.sax.document.Document object at 0x10c8ef110>


WebFault: Server raised fault: 'NO_DATA flight not found'

In [95]:
flight_info_ex = api.service.FlightInfoEx(ident = test_dict['data'][1]['ident'], howMany = 15, offset = 0)
flight_info_ex

(FlightInfoExStruct){
   next_offset = -1
   flights[] = 
      (FlightExStruct){
         faFlightID = "AAL2572-1590295537-airline-0339"
         ident = "AAL2572"
         aircrafttype = "B738"
         filed_ete = "02:58:00"
         filed_time = 1590295537
         filed_departuretime = 1590518400
         filed_airspeed_kts = 321
         filed_airspeed_mach = None
         filed_altitude = 0
         route = None
         actualdeparturetime = 0
         estimatedarrivaltime = 1590529080
         actualarrivaltime = 0
         diverted = None
         origin = "KJFK"
         destination = "KMIA"
         originName = "John F Kennedy Intl"
         originCity = "New York, NY"
         destinationName = "Miami Intl"
         destinationCity = "Miami, FL"
      },
      (FlightExStruct){
         faFlightID = "AAL2572-1590209159-airline-0537"
         ident = "AAL2572"
         aircrafttype = "B738"
         filed_ete = "02:33:00"
         filed_time = 1590209159
         filed_dep

In [118]:
test_dict_2 = Client.dict(flight_info_ex)
test_dict_2['flights'][1]['faFlightID']

AAL2572-1590209159-airline-0537
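Since an exact departure time fails when the time has changed, one workaround is to pick the `FlightInfoEx` entry whose `filed_departuretime` is closest to the scheduled departure. This is only a sketch of that idea, not something the notebook does: `closest_flight_id` and the six-hour tolerance are hypothetical, and the flights are assumed to be plain dicts.

```python
def closest_flight_id(flights, scheduled_departure, tolerance_s=6 * 3600):
    """Return the faFlightID filed closest to a scheduled departure, or None."""
    candidates = [f for f in flights if f.get("filed_departuretime")]
    if not candidates:
        return None
    best = min(candidates,
               key=lambda f: abs(f["filed_departuretime"] - scheduled_departure))
    # Reject matches that are too far away to be the same flight.
    if abs(best["filed_departuretime"] - scheduled_departure) > tolerance_s:
        return None
    return best["faFlightID"]

# Made-up records for illustration only (not real API output).
example_flights = [
    {"faFlightID": "EXAMPLE-1", "filed_departuretime": 1000},
    {"faFlightID": "EXAMPLE-2", "filed_departuretime": 5000},
]
match = closest_flight_id(example_flights, 1200, tolerance_s=600)
```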

In [119]:
#Returns a list of track points (position, speed, altitude). I believe this will be enough; it must be converted to a DataFrame.
#GetHistoricalTrack expects the flight's faFlightID as its argument.
get_historical_track = api.service.GetHistoricalTrack()

In [120]:
get_historical_track[0]


    groundspeed = 452
    altitude = 300
    altitudeStatus = "-"
    updateType = "A"
    altitudeChange = "-"
  },
 (TrackStruct){
    timestamp = 1589402648
    latitude = 29.38345
    longitude = -79.01486
    groundspeed = 452
    altitude = 300
    altitudeStatus = "-"
    updateType = "A"
    altitudeChange = "-"
  },
 (TrackStruct){
    timestamp = 1589402679
    latitude = 29.32879
    longitude = -79.02864
    groundspeed = 452
    altitude = 300
    altitudeStatus = "-"
    updateType = "A"
    altitudeChange = "-"
  },
 (TrackStruct){
    timestamp = 1589402709
    latitude = 29.25938
    longitude = -79.04616
    groundspeed = 452
    altitude = 300
    altitudeStatus = "-"
    updateType = "A"
    altitudeChange = "-"
  },
 (TrackStruct){
    timestamp = 1589402739
    latitude = 29.1996
    longitude = -79.06116
    groundspeed = 452
    altitude = 300
    altitudeStatus = "-"
    updateType = "A"
    altitudeChange = "-"
  },
 (TrackStruct){
    timestamp = 1589402770


In [130]:
data = [Client.dict(suds_object) for suds_object in get_historical_track[0]]

In [131]:
df = pd.DataFrame(data)

In [132]:
df.head()

| | timestamp | latitude | longitude | groundspeed | altitude | altitudeStatus | updateType | altitudeChange |
|---|---|---|---|---|---|---|---|---|
| 0 | 1589396232 | 40.61018 | -73.7926 | 170 | 10 | - | A | - |
| 1 | 1589396248 | 40.59851 | -73.80048 | 183 | 14 | - | A | - |
| 2 | 1589396264 | 40.58582 | -73.80909 | 196 | 16 | - | A | - |
| 3 | 1589396281 | 40.57109 | -73.81738 | 218 | 20 | - | A | - |
| 4 | 1589396297 | 40.55347 | -73.81714 | 241 | 22 | - | A | - |


In [137]:
df.shape

(322, 8)

In [133]:
df['flight_identifier'] = 'aasjhdfhj'

In [134]:
df.head()

| | timestamp | latitude | longitude | groundspeed | altitude | altitudeStatus | updateType | altitudeChange | flight_identifier |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1589396232 | 40.61018 | -73.7926 | 170 | 10 | - | A | - | aasjhdfhj |
| 1 | 1589396248 | 40.59851 | -73.80048 | 183 | 14 | - | A | - | aasjhdfhj |
| 2 | 1589396264 | 40.58582 | -73.80909 | 196 | 16 | - | A | - | aasjhdfhj |
| 3 | 1589396281 | 40.57109 | -73.81738 | 218 | 20 | - | A | - | aasjhdfhj |
| 4 | 1589396297 | 40.55347 | -73.81714 | 241 | 22 | - | A | - | aasjhdfhj |


# IAH - LAX


From [here](https://www.flights.com/flights/houston-iah-to-los-angeles-lax/): "With 6 different airlines operating between Houston and Los Angeles, there are, on average, 2,642 flights per month. This equates to about 629 flights per week, and 90 flights per day from IAH to LAX." Five of these six airlines are: United Airlines (Flight UA 302), Air China (CA 7398), Azul (AD 7026), Asiana Airlines (OZ 9512), and Air New Zealand (NZ 9653).

# Generating a time range every 8 hours

In [36]:
def make_unix_list(start_date, end_date, frequency):
    created_range = pd.date_range(start = start_date, end = end_date, freq = frequency) #creates a daterange series
    list_created_range = list(created_range) #converts such range into a list
    unix_floats = [date.to_pydatetime().timestamp() for date in list_created_range] #transforms the daterange list into unix epoch timestamps represented as floats
    return [int(i) for i in unix_floats] #return the above list as a list of integers

In [37]:
test_list = make_unix_list('3/1/2020', '12/31/2020', '8H')

In [39]:
len(test_list)

916
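The length of 916 can be checked by hand: March 1 to December 31, 2020 spans 305 days, each holding three 8-hour steps, plus one for the starting point. A small sketch of that check (the function name is hypothetical):

```python
from datetime import datetime, timedelta

def expected_points(start_date, end_date, hours):
    """Number of evenly spaced timestamps from start to end, both inclusive."""
    fmt = "%m/%d/%Y"
    span = datetime.strptime(end_date, fmt) - datetime.strptime(start_date, fmt)
    # Whole steps that fit in the span, plus one for the starting timestamp.
    return span // timedelta(hours=hours) + 1

print(expected_points('3/1/2020', '12/31/2020', 8))  # 305 * 3 + 1 = 916
```

One fewer than this count (915) is the number of consecutive windows, which is what determines how many queries a single flight number needs over the period.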