In this project, I played the role of a consultant working for one of the air carriers. The task was to optimize the process of providing and analyzing data.

# Business context

Recently, the air carrier received access to an API service for downloading data: https://api-datalab.coderslab.com/api/v2 ([documentation](https://api-datalab.coderslab.com/v2/docs/)). <br>
Previously, the data could only be retrieved manually from an external application, which also imposed a row limit. Because of these limits, data processing and analysis were extremely inefficient: they required the involvement of many employees and caused delays in providing data, which in turn delayed important decisions.<br>

The client does not have its own unit responsible for data analysis. That is why our employer was hired to create a reporting and analytical system that will shorten the time needed to receive data and eliminate the manual effort required to process it. As a result, the customer expects to receive reports on aircraft delays faster and to learn the causes of those delays, which will make it possible to plan preventive actions.

# Data Engineer module

According to the documentation, four API endpoints are available (named here as they are called in the script below):
 - `airport` - airport data,
 - `airportWeather` - information about the recorded weather at the airport on a given day,
 - `aircraft` - aircraft data,
 - `flight` - data on departures from a given airport per day.

I am also provided with the IDs of the airports that the air carrier serves, in the file `airports.csv`.

My task is to download the data into the workspace, from where it will later be loaded into the database.

Data processing and analysis will be carried out in subsequent notebooks.

# Script

Importing the required libraries

In [1]:
import requests
import pprint
import csv
import pandas as pd

Defining connection parameters to the API

In [3]:
airportId = 11638
TOKEN="iKRsQ8vdqgT903o2vH1rsejOeQ0F7YC9TvutH6Wk"
headers= {'Authorization': TOKEN}
response = requests.get(f'https://api-datalab.coderslab.com/api/v2/airport/{airportId}', headers = headers)
Airport_data = response.json()
Airport_data

{'ORIGIN_AIRPORT_ID': 11638,
 'DISPLAY_AIRPORT_NAME': 'Fresno Air Terminal',
 'ORIGIN_CITY_NAME': 'Fresno, CA',
 'NAME': 'FRESNO YOSEMITE INTERNATIONAL, CA US'}
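
The cell above keeps the token directly in the notebook. A minimal sketch of an alternative, assuming the token is exported in an environment variable: the variable name `DATALAB_TOKEN` and the helper `build_headers` are hypothetical and not part of the original project.

```python
import os

# Base URL taken from the notebook above
API_URL = "https://api-datalab.coderslab.com/api/v2"


def build_headers(env_var="DATALAB_TOKEN"):
    """Build the Authorization header from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": token}
```

With the variable set, `headers = build_headers()` could replace the hard-coded `TOKEN` and `headers` lines, so the notebook can be shared without exposing the credential.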

Loading the `airports.csv` file and extracting the airport IDs needed to query the subsequent endpoints

In [4]:
with open(r'../data/airports.csv', 'r', encoding = 'utf-8') as file:
    airports_file = csv.reader(file, delimiter = ',')
    airports_ids = []
    for row in airports_file:
        airports_ids.append(row[0])
airports_ids.pop(0)
airports_ids

['10874',
 '11233',
 '13360',
 '15008',
 '11638',
 '14150',
 '15323',
 '14814',
 '12007',
 '11337',
 '13342',
 '15070',
 '13244',
 '12280',
 '15096',
 '11641',
 '13832',
 '10268',
 '10397',
 '15041',
 '10529',
 '12119',
 '11537',
 '11092',
 '10581',
 '13829',
 '15389',
 '10140',
 '12389',
 '11648',
 '15023',
 '11982',
 '10967',
 '11525',
 '10792',
 '14259',
 '11637',
 '10466',
 '10599',
 '10208',
 '15841',
 '14831',
 '12898',
 '13241',
 '13367',
 '11481',
 '14108',
 '13931',
 '13873',
 '10157',
 '10245',
 '11146',
 '13277',
 '11292',
 '11109',
 '13459',
 '11775',
 '16218',
 '14698',
 '14252',
 '13256',
 '13139',
 '12250',
 '11259',
 '11468',
 '14952',
 '12402',
 '14574',
 '11996',
 '11977',
 '11867',
 '11203',
 '11995',
 '15016',
 '10747',
 '14905',
 '12012',
 '14783',
 '14730',
 '10431',
 '10434',
 '16869',
 '10408',
 '12264',
 '11618',
 '15304',
 '13577',
 '12954',
 '11624',
 '13541',
 '13422',
 '14057',
 '13232',
 '10800',
 '14689',
 '12391',
 '10868',
 '14711',
 '10257',
 '11067',
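
The loading step above appends every row and then removes the header with `pop(0)`. A possible variant, sketched here on an in-memory sample, skips the header before reading. The header name `origin_airport_id` in the sample and the helper `read_airport_ids` are assumptions; the actual header of `airports.csv` is not shown in this notebook.

```python
import csv
import io


def read_airport_ids(handle):
    """Return the first column of a CSV file as a list of strings, without the header."""
    reader = csv.reader(handle, delimiter=",")
    next(reader, None)  # skip the header row; does nothing for an empty file
    return [row[0] for row in reader if row]  # ignore blank lines


# In-memory stand-in for ../data/airports.csv (header name is hypothetical)
sample = io.StringIO("origin_airport_id\n10874\n11233\n13360\n")
ids = read_airport_ids(sample)
```

For the real file, the same function could be called inside `with open(r'../data/airports.csv', 'r', encoding='utf-8') as file:`.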


## Downloading `Airport`
Downloading data regarding individual airports <br>
    Not every airport listed in the `airports.csv` file is available in the endpoint, so only successful responses are kept.

In [5]:
# Collect one record per airport; DataFrame.append was removed in pandas 2.0,
# so the rows are gathered in a list and converted once at the end
columns = ["origin_airport_id", "display_airport_name", "origin_city_name", "name"]
records = []

for airport_id in airports_ids:
    response = requests.get(f'https://api-datalab.coderslab.com/api/v2/airport/{airport_id}', headers = headers)
    # Skip IDs for which the endpoint does not return a successful response
    if response.status_code != 200:
        continue
    air = response.json()
    records.append({"origin_airport_id": str(air['ORIGIN_AIRPORT_ID']),
                    "display_airport_name": air['DISPLAY_AIRPORT_NAME'],
                    "origin_city_name": air['ORIGIN_CITY_NAME'],
                    "name": air['NAME']})

airport_df = pd.DataFrame(records, columns = columns)
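
Since some IDs from `airports.csv` are not served by the endpoint, it can be useful to know which ones were skipped. A minimal sketch, where the helper `find_missing_ids` is hypothetical and the sample IDs are only illustrative values taken from the lists shown in this notebook:

```python
def find_missing_ids(requested_ids, found_ids):
    """Return the requested IDs that are absent from found_ids, keeping request order."""
    found = {str(value) for value in found_ids}  # compare as strings in both lists
    return [value for value in requested_ids if str(value) not in found]


# Illustrative sample values
requested = ["10874", "11233", "11638"]
found = ["11638"]
missing = find_missing_ids(requested, found)
```

In the notebook this could be called as `find_missing_ids(airports_ids, airport_df['origin_airport_id'])` once `airport_df` has been built.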

In [6]:
airport_df

| | origin_airport_id | display_airport_name | origin_city_name | name |
|---|---|---|---|---|
| 0 | 11638 | Fresno Air Terminal | Fresno, CA | FRESNO YOSEMITE INTERNATIONAL, CA US |
| 1 | 13342 | General Mitchell Field | Milwaukee, WI | MILWAUKEE MITCHELL AIRPORT, WI US |
| 2 | 13244 | Memphis International | Memphis, TN | MEMPHIS INTERNATIONAL AIRPORT, TN US |
| 3 | 15096 | Syracuse Hancock International | Syracuse, NY | SYRACUSE HANCOCK INTERNATIONAL AIRPORT, NY US |
| 4 | 10397 | Atlanta Municipal | Atlanta, GA | ATLANTA HARTSFIELD JACKSON INTERNATIONAL AIRPO... |
| ... | ... | ... | ... | ... |
| 92 | 13198 | Kansas City International | Kansas City, MO | KANSAS CITY INTERNATIONAL AIRPORT, MO US |
| 93 | 10423 | Austin - Bergstrom International | Austin, TX | AUSTIN BERGSTROM INTERNATIONAL AIRPORT, TX US |
| 94 | 15370 | Tulsa International | Tulsa, OK | OKLAHOMA CITY WILL ROGERS WORLD AIRPORT, OK US |
| 95 | 13303 | Miami International | Miami, FL | MIAMI INTERNATIONAL AIRPORT, FL US |


Saving dataframe `airport_df` to file `airport_list.csv`

In [7]:
airport_df.to_csv(r'../data/raw/airport_list.csv', sep = ';', encoding = 'utf-8', index = True, index_label='id')
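
The file is written with `;` as the separator and an index column labelled `id`, so whatever reads it later has to use the same settings. The sketch below illustrates the round trip with the standard `csv` module on an in-memory buffer, using the first row of the table above; it is an illustration and not the loading code of the later notebooks.

```python
import csv
import io

# First row of airport_df as shown above, with the index written as "id"
rows = [{
    "id": 0,
    "origin_airport_id": "11638",
    "display_airport_name": "Fresno Air Terminal",
    "origin_city_name": "Fresno, CA",
    "name": "FRESNO YOSEMITE INTERNATIONAL, CA US",
}]

# Write with ';' as the delimiter, as in the to_csv call
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter=";")
writer.writeheader()
writer.writerows(rows)

# Read back with the same delimiter; commas inside fields stay intact
buffer.seek(0)
loaded = list(csv.DictReader(buffer, delimiter=";"))
```

In pandas, the counterpart for the real file would be `pd.read_csv(r'../data/raw/airport_list.csv', sep=';', index_col='id')`.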

## Downloading `Weather`

Downloading data on recorded weather at individual airports <br>
    The data starts on `2019-01-01` and ends on `2020-03-31`, i.e. it covers 15 months, so one `YYYY-MM` string is prepared for each month.

In [8]:
dates_list = []
for i in range (1,16,1):
    if i <= 9:
        dates_list.append(f"2019-0{i}")
    elif i >= 10 and i <= 12:
        dates_list.append(f"2019-{i}")
    elif i > 12:
        dates_list.append(f"2020-0{i - 12}")
dates_list

['2019-01',
 '2019-02',
 '2019-03',
 '2019-04',
 '2019-05',
 '2019-06',
 '2019-07',
 '2019-08',
 '2019-09',
 '2019-10',
 '2019-11',
 '2019-12',
 '2020-01',
 '2020-02',
 '2020-03']
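
The same list of months can be produced without the three-way `if`, by letting integer division handle the rollover into the next year. A minimal sketch, where `month_range` is a hypothetical helper name:

```python
def month_range(start_year, start_month, count):
    """Return `count` consecutive months as 'YYYY-MM' strings."""
    months = []
    for offset in range(count):
        # divmod splits the running month index into whole years and a 0-based month
        extra_years, month_index = divmod(start_month - 1 + offset, 12)
        months.append(f"{start_year + extra_years}-{month_index + 1:02d}")
    return months


# 15 months starting in January 2019, matching the data range described above
dates_list = month_range(2019, 1, 15)
```

This version also works for ranges that span more than two calendar years or start in a month other than January.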

In [9]:
data = []

# One request per month in dates_list
for month in dates_list:
    response = requests.get('https://api-datalab.coderslab.com/api/v2/airportWeather',
                            params = {'date': month},
                            headers = headers)
    # Keep only successful responses, as in the other download cells
    if response.status_code == 200:
        data.extend(response.json())
weather_df = pd.DataFrame.from_records(data)

Saving dataframe `weather_df` to file `airport_weather.csv`

In [11]:
weather_df.to_csv(r'../data/raw/airport_weather.csv', sep = ';', encoding = 'utf-8', index = True, index_label='id')

 ## Downloading `Aircraft`
 Downloading data about aircraft production details <br>

In [12]:
response = requests.get('https://api-datalab.coderslab.com/api/v2/aircraft',
                        headers = headers)
aircraft_data = response.json()

# The response is a list of records, so it can be loaded directly
aircraft_df = pd.DataFrame.from_records(aircraft_data)

Saving dataframe `aircraft_df` to file `aircraft.csv`

In [14]:
aircraft_df.to_csv(r'../data/raw/aircraft.csv', sep = ';', encoding = 'utf-8', index = True, index_label='id')

 ## Downloading `Flight`
 Downloading air traffic data

In [15]:
data3 = []
for airport_id in airports_ids:
    for month in dates_list:
        # One request per airport and month
        data_to_send = {
                'airportId': airport_id,
                'date': month}
        response = requests.get('https://api-datalab.coderslab.com/api/v2/flight',
                                params = data_to_send,
                                headers = headers)
        # Skip airport/month combinations without a successful response
        if response.status_code == 200:
            data3.extend(response.json())
flight_df = pd.DataFrame.from_records(data3)
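
The nested loop above silently skips failed requests. A possible refinement, sketched under the assumption that it is worth knowing which airport/month combinations failed, collects them alongside the records. The helpers `flight_params` and `fetch_records` are hypothetical; the `get` argument is expected to behave like `requests.get`.

```python
from itertools import product


def flight_params(airport_ids, months):
    """Build one parameter dict per airport/month combination."""
    return [{"airportId": airport_id, "date": month}
            for airport_id, month in product(airport_ids, months)]


def fetch_records(get, url, param_sets, headers, timeout=30):
    """Call `get` once per parameter set and collect records and failures.

    `get` is any callable with the keyword interface of requests.get.
    """
    records, failed = [], []
    for params in param_sets:
        response = get(url, params=params, headers=headers, timeout=timeout)
        if response.status_code == 200:
            records.extend(response.json())
        else:
            # Remember what failed so it can be inspected or retried later
            failed.append((params, response.status_code))
    return records, failed
```

In the notebook this could be used as `data3, failed = fetch_records(requests.get, 'https://api-datalab.coderslab.com/api/v2/flight', flight_params(airports_ids, dates_list), headers)`. Passing `get` as an argument also makes the function easy to try out with a stub instead of the real API.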

Saving dataframe `flight_df` to file `flight.csv`

In [17]:
flight_df.to_csv(r'../data/raw/flight.csv', sep = ';', encoding = 'utf-8', index = True, index_label='id')

## Summary
In this notebook, I completed the first step of any data analysis: acquiring the data. The data is now ready for further work, i.e. it can be loaded into the database and then explored to see what information it carries. The subsequent notebooks serve precisely these purposes.