# Setting up airports MySQL database(s) on AWS
The goal is to set up one or more databases with airport data. The airport data we want are:
- city_name
- airport_name
- airport_id
- country_code
- latitude
- longitude

The airport database(s) will be static, as their information won't change on a daily basis. Any changes to airports in the databases will be made manually.

## API request
As a first step, we want to find out which airports each city has and retrieve the information listed above for each of them. To do this, we send one API request per city for the list of airports around it. We then normalize each API response to keep only the information relevant to us and push it to the different SQL databases on AWS.
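
The normalization step can be sketched as follows. The real `normalize_airports_info` lives in `src.normalizers.aero_databox_normalizer` and is not shown in this notebook, so this is only an illustration: the function name `normalize_airports_sketch`, the response layout (an `items` list with `icao`, `name`, `countryCode`, and a nested `location`), and the choice of the ICAO code as `airport_id` are all assumptions.

```python
from typing import Any


def normalize_airports_sketch(response: dict[str, Any], city_id: int) -> list[dict[str, Any]]:
    """Flatten an assumed airport-search response into one record per airport."""
    records = []
    for item in response.get("items", []):
        location = item.get("location", {})
        records.append(
            {
                "city_id": city_id,
                "airport_id": item.get("icao"),  # assumption: ICAO code used as id
                "airport_name": item.get("name"),
                "country_code": item.get("countryCode"),
                "latitude": location.get("lat"),
                "longitude": location.get("lon"),
            }
        )
    return records


# Made-up sample payload in the assumed layout (placeholder values, not real API output).
sample_response = {
    "items": [
        {
            "icao": "XXXX",
            "name": "Example Airport",
            "countryCode": "DE",
            "location": {"lat": 52.0, "lon": 13.0},
        }
    ]
}

records = normalize_airports_sketch(sample_response, city_id=1)
print(records)
```

Returning a list of flat dictionaries keeps the result easy to wrap in `pd.DataFrame(...)`, which is how the output of the normalizer is used further down.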

In [1]:
%load_ext autoreload
%autoreload 2

In [4]:
import pandas as pd
from tqdm.notebook import tqdm
from src.service.mysql_db import MySQL
from src.service.aero_databox_api import AeroDataBox
from src.normalizers.aero_databox_normalizer import normalize_airports_info

In [5]:
# initialize MySQL class
con = MySQL()

In [6]:
# read cities_df from the sql server
cities_df = pd.read_sql_table('cities', con=con.con())
cities_df.head()

Unnamed: 0,city_id,city_name,country,country_code
0,1,Berlin,Germany,DE


In [7]:
# merge the cities_df with the latitude and longitude information of each city
cities_df = cities_df.merge(pd.read_sql_table('cities_location', con=con.con()), how='left')
cities_df.head()

Unnamed: 0,city_id,city_name,country,country_code,id,latitude,longitude
0,1,Berlin,Germany,DE,1,52.52,13.405


In [8]:
# initialize AeroDataBox class
aero_databox_api = AeroDataBox()

In [9]:
# search airports api http url
url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

In [10]:
# maximum distance in km between airport and city center
distance_from_center = 20

# collect airport information, one DataFrame per city
airport_infos = []
for row in tqdm(cities_df.itertuples(), total=cities_df.shape[0]):
    query_params = {
        "lat": row.latitude,
        "lon": row.longitude,
        "radiusKm": distance_from_center,
        "limit": 10,
        "withFlightInfoOnly": True,
    }
    response = aero_databox_api.get_response(url=url, params=query_params)
    # skip cities for which the request returned nothing
    if not response:
        continue
    airport_infos.append(pd.DataFrame(normalize_airports_info(response, row.city_id)))
airports_df = pd.concat(airport_infos, ignore_index=True)

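
The `radiusKm` parameter asks the API to limit results to the given radius. For a local sanity check of how far a returned airport lies from the city center, the great-circle distance can be computed with the haversine formula. This is a sketch under stated assumptions: the helper `haversine_km` is hypothetical, the Earth is treated as a sphere with a mean radius of 6371 km, and the airport coordinates in the usage lines are made-up placeholders.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


city = (52.52, 13.405)    # Berlin center, as stored in cities_df
airport = (52.40, 13.50)  # placeholder coordinates, not a real airport record
distance = haversine_km(*city, *airport)
print(f"{distance:.1f} km, within radius: {distance <= 20}")
```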

In [11]:
# check data format
print(airports_df.shape)
airports_df.info()

(1, 6)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   city_id       1 non-null      int64  
 1   airport_id    1 non-null      object 
 2   airport_name  1 non-null      object 
 3   country_code  1 non-null      object 
 4   latitude      1 non-null      float64
 5   longitude     1 non-null      float64
dtypes: float64(2), int64(1), object(3)
memory usage: 180.0+ bytes


In [12]:
# write df to airports sql table
airports_df[['airport_id', 'airport_name']].to_sql('airports', if_exists='append', con=con.con(), index=False)

1

In [13]:
# write df to cities_airports sql table
airports_df[['airport_id', 'city_id']].to_sql('cities_airports', if_exists='append', con=con.con(), index=False)

1

In [14]:
# write df to airports_location sql table
airports_df[['airport_id', 'latitude', 'longitude']].to_sql('airports_location', if_exists='append', con=con.con(), index=False)

1
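
Because the three writes use `if_exists='append'`, re-running the notebook would append the same airports again unless the tables reject duplicates. One way to guard against that is a primary key combined with an insert that skips existing rows. The sketch below uses Python's built-in `sqlite3` with an in-memory database purely for illustration; the column types and the key on `airport_id` are assumptions, not the actual schema on AWS. In MySQL the corresponding statement would be `INSERT IGNORE`.

```python
import sqlite3

# In-memory stand-in for the airports table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (airport_id TEXT PRIMARY KEY, airport_name TEXT)")

# Placeholder row, not real airport data.
rows = [("XXXX", "Example Airport")]

# Simulate running the write step twice; the second run is skipped by the key.
for _ in range(2):
    conn.executemany(
        "INSERT OR IGNORE INTO airports (airport_id, airport_name) VALUES (?, ?)",
        rows,
    )
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM airports").fetchone()[0]
print(row_count)
```

The same idea carries over to `cities_airports` and `airports_location`, with a key on the columns that identify a row in each table.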