So far we've learnt how to scrape the web and how to request information from an API. Some websites make working with APIs even easier. Check out [RapidAPI](https://rapidapi.com/): they take care of writing most of the code for you.

We will use the [AeroDataBox API](https://rapidapi.com/aedbx-aedbx/api/aerodatabox/), which can retrieve all sorts of information about flights and airports. We will show you how to retrieve information about airports, and then it's up to you to apply this, along with what you've already learnt this week, to **produce a function that retrieves tomorrow's flight information for the major airports in the cities you web scraped**.

In [1]:
import pandas as pd
import requests

On the left-hand side of the AeroDataBox API page, you'll see a list of options for information that you can retrieve:
- Flights API
- Subscription / PUSH API
- Airport API
- Aircraft API
- Healthcheck & Status API

1. We want to select `Airport API`

2. Then within Airport API we want to select `Search airports by location`

3. Now, in the middle third of the page, enter the `latitude` and `longitude` of any city to test. We chose Berlin: latitude 52.31, longitude 13.24. Next, we changed `radiusKm` to 50 km. Finally, we set `withFlightInfoOnly` to true, so that only airports with flight data (scheduled or live) are returned.

4. On the right-hand third of the screen you should see a block of code that looks pretty unfamiliar. This is because the code is probably set to *(Node.js) Axios* by default. However, we can change this to familiar Python: open the dropdown box at the top of the code and select `python > requests`.

Now you can copy the code to your notebook and it should look a little something like the cell below:

In [2]:
url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

querystring = {"lat":"52.31","lon":"13.24","radiusKm":"50","limit":"10","withFlightInfoOnly":"true"}

headers = {
	"X-RapidAPI-Key": "YOUR_API_KEY_HERE",
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

{"searchBy":{"lat":52.31,"lon":13.24},"items":[{"icao":"EDDB","iata":"BER","name":"Berlin, Berlin Brandenburg","shortName":"Brandenburg","municipalityName":"Berlin","location":{"lat":52.35139,"lon":13.493889},"countryCode":"DE"}]}


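Let's view the response with `.json()` instead of `.text` so that it's easier to read. Note the parentheses: `response.json` without them only refers to the method and does not call it.

In [3]:
print(response.json())

{'searchBy': {'lat': 52.31, 'lon': 13.24}, 'items': [{'icao': 'EDDB', 'iata': 'BER', 'name': 'Berlin, Berlin Brandenburg', 'shortName': 'Brandenburg', 'municipalityName': 'Berlin', 'location': {'lat': 52.35139, 'lon': 13.493889}, 'countryCode': 'DE'}]}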


In [4]:
from IPython.display import JSON

In [5]:
display(JSON(response.json()))

We can now turn this into a DataFrame using `pd.json_normalize()`.

In [6]:
pd.json_normalize(response.json()['items'])

| | icao | iata | name | shortName | municipalityName | countryCode | location.lat | location.lon |
|---|---|---|---|---|---|---|---|---|
| 0 | EDDB | BER | Berlin, Berlin Brandenburg | Brandenburg | Berlin | DE | 52.35139 | 13.493889 |
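
To see why the nested `location` field ends up in the columns `location.lat` and `location.lon`, here is a minimal sketch of the flattening idea in plain Python. The helper name `flatten_record` is hypothetical, and this is not how pandas implements `json_normalize()` internally; it only mimics the dotted column names.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one dictionary with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the prefix along
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Airport record copied from the response shown above
airport = {
    "icao": "EDDB",
    "iata": "BER",
    "name": "Berlin, Berlin Brandenburg",
    "shortName": "Brandenburg",
    "municipalityName": "Berlin",
    "location": {"lat": 52.35139, "lon": 13.493889},
    "countryCode": "DE",
}

flat_airport = flatten_record(airport)
print(flat_airport["location.lat"], flat_airport["location.lon"])
```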


Let's now use this for the latitudes and longitudes of multiple cities.

In [7]:
lat = [52.3112, 51.3026]
lon = [13.2418, 0.739]

list_for_df_airports = []

for i in range(len(lat)):
    url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

    querystring = {"lat":lat[i],"lon":lon[i],"radiusKm":"50","limit":"10","withFlightInfoOnly":"true"}

    headers = {
        "X-RapidAPI-Key": "YOUR_API_KEY_HERE",
        "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
    }

    response = requests.request("GET", url, headers=headers, params=querystring)
    
    list_for_df_airports.append(response.json()['items'])
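
The loop above walks through both lists by index. A slightly more defensive sketch pairs the coordinates with `zip()` and builds the query parameters in a helper, so that mismatched lists are caught before any request is sent. The helper name `build_airport_queries` is an assumption for illustration; the parameter names are the ones used in the query string above.

```python
def build_airport_queries(latitudes, longitudes, radius_km=50, limit=10):
    """Return one query-parameter dictionary per (latitude, longitude) pair."""
    if len(latitudes) != len(longitudes):
        raise ValueError("latitudes and longitudes must have the same length")
    return [
        {
            "lat": lat,
            "lon": lon,
            "radiusKm": radius_km,
            "limit": limit,
            "withFlightInfoOnly": "true",
        }
        for lat, lon in zip(latitudes, longitudes)
    ]


queries = build_airport_queries([52.3112, 51.3026], [13.2418, 0.739])
for query in queries:
    print(query)
```

Each dictionary could then be passed as `params=` to `requests.request()` inside the loop.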

In [8]:
# lat = [52.3112, 51.3026]
# lon = [13.2418, 0.739]

# list_for_df_airports = []

# for i in range(len(lat)):
#     url = f"https://aerodatabox.p.rapidapi.com/airports/search/location/{lat[i]}/{lon[i]}/km/50/10"

#     querystring = {"withFlightInfoOnly":"true"}

#     headers = {
#         "X-RapidAPI-Key": "YOUR_API_KEY_HERE",
#         "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
#     }

#     response = requests.request("GET", url, headers=headers, params=querystring)
    
#     list_for_df_airports.append(response.json()['items'])

In [9]:
list_for_df_airports

[[{'icao': 'EDDB',
   'iata': 'BER',
   'name': 'Berlin, Berlin Brandenburg',
   'shortName': 'Brandenburg',
   'municipalityName': 'Berlin',
   'location': {'lat': 52.35139, 'lon': 13.493889},
   'countryCode': 'DE'}],
 [{'icao': 'EGMC',
   'iata': 'SEN',
   'name': 'Southend',
   'shortName': 'Southend',
   'municipalityName': 'Southend',
   'location': {'lat': 51.5714, 'lon': 0.695555},
   'countryCode': 'GB'}]]

In [10]:
list_for_df_airports[1][0]['icao']

'EGMC'

In [11]:
airport_dict = {}
airport_dict['icao'] = list_for_df_airports[0][0]['icao']
airport_dict['name'] = list_for_df_airports[0][0]['name']

airport_dict

{'icao': 'EDDB', 'name': 'Berlin, Berlin Brandenburg'}

In [12]:
list_for_df = []

for airports in list_for_df_airports:
    airport_dict = {}
    
    # keep only the first airport returned for each city
    airport_dict['icao'] = airports[0]['icao']
    airport_dict['name'] = airports[0]['name']
    list_for_df.append(airport_dict)

In [13]:
list_for_df

[{'icao': 'EDDB', 'name': 'Berlin, Berlin Brandenburg'},
 {'icao': 'EGMC', 'name': 'Southend'}]

In [14]:
airports_df = pd.DataFrame(list_for_df)
airports_df

| | icao | name |
|---|---|---|
| 0 | EDDB | Berlin, Berlin Brandenburg |
| 1 | EGMC | Southend |
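
The loop above keeps only the first airport returned for each city. If a city has several airports within the search radius, or none at all, a sketch like the following keeps every airport and simply skips empty results instead of raising an `IndexError`. The variable `sample_results` is a trimmed copy of the data shown earlier plus one hypothetical empty result; both names are just for illustration.

```python
# Trimmed copy of the API results shown earlier, plus one made-up
# empty list standing in for a city with no airport in range.
sample_results = [
    [{"icao": "EDDB", "name": "Berlin, Berlin Brandenburg"}],
    [{"icao": "EGMC", "name": "Southend"}],
    [],
]

# Nested comprehension: outer loop over cities, inner loop over airports
all_airports = [
    {"icao": airport["icao"], "name": airport["name"]}
    for city_airports in sample_results
    for airport in city_airports
]

print(all_airports)
```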


In [15]:
def icao_airport_codes(latitudes, longitudes):

  #assert len(latitudes) == len(longitudes)
  
  list_for_df = []

  for i in range(len(latitudes)):

    url = f"https://aerodatabox.p.rapidapi.com/airports/search/location/{latitudes[i]}/{longitudes[i]}/km/100/16"

    querystring = {"withFlightInfoOnly":"true"}

    headers = {
      "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
      "X-RapidAPI-Key": "YOUR_API_KEY_HERE"
    }

    response = requests.request("GET", url, headers=headers, params=querystring)

    list_for_df.append(pd.json_normalize(response.json()['items']))

  return pd.concat(list_for_df, ignore_index=True)
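
Rather than pasting the key into the notebook, one option is to read it from an environment variable. This is a minimal sketch; the variable name `RAPIDAPI_KEY` and the helper `rapidapi_headers` are assumptions for illustration, not something RapidAPI prescribes.

```python
import os


def rapidapi_headers(host="aerodatabox.p.rapidapi.com", env_var="RAPIDAPI_KEY"):
    """Build the request headers, reading the API key from the environment."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your RapidAPI key")
    return {"X-RapidAPI-Host": host, "X-RapidAPI-Key": key}


# Usage inside a request (only works once the variable is set):
# headers = rapidapi_headers()
```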

###### **Challenge 1:** ICAO codes
If you use the above for all of your cities, you can create a DataFrame of all the airports and their associated `icao` codes. Perfect for a relational database.

###### **Challenge 2:** Arrivals information
Using what you have been shown above, plus the skills you've learnt in the last couple of days:
1. In `AeroDataBox API`, use the `Flights API` > `FIDS/Schedules: Airport departures and arrivals by airport ICAO code` section.
2. Fill out the parameters in the middle third and then copy the `python > requests` code from the right-hand third.
3. Explore the data you get back. What would be useful in your DataFrame and what can be excluded? Remember, Gans wants to know when people are arriving in the city.
4. Make a DataFrame from the information you consider important.
5. Condense everything you did above into a function that takes a list of ICAO codes as input and returns a DataFrame with the information for *tomorrow's arrivals*.

In [16]:
from datetime import datetime, date, timedelta
from pytz import timezone


In [17]:
# your code here

url = "https://aerodatabox.p.rapidapi.com/flights/airports/icao/EDDB/2022-10-04T20:00/2022-10-05T08:00"

querystring = {"withLeg":"true","direction":"Arrival","withCancelled":"false","withCodeshared":"false","withCargo":"false","withPrivate":"false","withLocation":"false"}

headers = {
	"X-RapidAPI-Key": "YOUR_API_KEY_HERE",
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)

{"arrivals":[{"departure":{"airport":{"icao":"EYVI","iata":"VNO","name":"Vilnius"},"scheduledTimeLocal":"2022-10-04 20:00+03:00","scheduledTimeUtc":"2022-10-04 17:00Z","quality":["Basic"]},"arrival":{"scheduledTimeLocal":"2022-10-04 20:35+02:00","scheduledTimeUtc":"2022-10-04 18:35Z","terminal":"1","quality":["Basic"]},"number":"BT 921","status":"Unknown","codeshareStatus":"Unknown","isCargo":false,"aircraft":{"model":"Airbus A220-300"},"airline":{"name":"Air Baltic"}},{"departure":{"airport":{"icao":"ENGM","iata":"OSL","name":"Oslo"},"scheduledTimeLocal":"2022-10-04 18:30+02:00","actualTimeLocal":"2022-10-04 18:27+02:00","scheduledTimeUtc":"2022-10-04 16:30Z","actualTimeUtc":"2022-10-04 16:27Z","checkInDesk":"123","gate":"D3","quality":["Basic","Live"]},"arrival":{"scheduledTimeLocal":"2022-10-04 20:05+02:00","scheduledTimeUtc":"2022-10-04 18:05Z","terminal":"1","quality":["Basic"]},"number":"DY 1108","status":"Expected","codeshareStatus":"Unknown","isCargo":false,"aircraft":{"model":

In [None]:
display(JSON(response.json()))

In [19]:
today = datetime.now().astimezone(timezone('Europe/Berlin')).date()
today

datetime.date(2023, 3, 31)

In [20]:
tomorrow = today + timedelta(days=1)
tomorrow

datetime.date(2023, 4, 1)
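
The arrivals endpoint used above takes a start and an end timestamp in the URL path. A small helper can build two half-day windows for tomorrow. This sketch uses the standard-library `zoneinfo` module (Python 3.9+) rather than `pytz`; the helper name `tomorrow_windows` is hypothetical, and splitting the day into two windows mirrors the 12-hour span of the example URL above rather than a documented requirement.

```python
from datetime import date, datetime, timedelta
from zoneinfo import ZoneInfo


def tomorrow_windows(today=None, tz_name="Europe/Berlin"):
    """Return two (start, end) timestamp strings covering tomorrow."""
    if today is None:
        # Use the current date in the given time zone
        today = datetime.now(ZoneInfo(tz_name)).date()
    tomorrow = today + timedelta(days=1)
    return [
        (f"{tomorrow}T00:00", f"{tomorrow}T11:59"),
        (f"{tomorrow}T12:00", f"{tomorrow}T23:59"),
    ]


# Fixed date so the result is reproducible
windows = tomorrow_windows(today=date(2023, 3, 31))
print(windows)
# [('2023-04-01T00:00', '2023-04-01T11:59'), ('2023-04-01T12:00', '2023-04-01T23:59')]
```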

In [21]:
# Copenhagen ICAO: EKCH
# Vienna ICAO: LOWW

In [22]:
def tomorrows_flight_arrivals(icao_list):

  today = datetime.now().astimezone(timezone('Europe/Berlin')).date()
  tomorrow = (today + timedelta(days=1))

  list_for_df = []

  for icao in icao_list:
    times = [["00:00","11:59"],["12:00","23:59"]]

    for time in times:
      url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{icao}/{tomorrow}T{time[0]}/{tomorrow}T{time[1]}"
      querystring = {"withLeg":"true","direction":"Arrival","withCancelled":"false","withCodeshared":"true","withCargo":"false","withPrivate":"false"}
      headers = {
          'x-rapidapi-host': "aerodatabox.p.rapidapi.com",
          'x-rapidapi-key': "YOUR_API_KEY_HERE"
          }
      response = requests.request("GET", url, headers=headers, params=querystring)
      flights_json = response.json()

      for flight in flights_json['arrivals']:
        flights_dict = {}
        flights_dict['arrival_icao'] = icao
        # .get() is another way of ensuring our code doesn't break.
        # In the previous two notebooks you learnt about 'if' (cities) and 'try/except' (weather).
        # .get() works similarly: it returns the value if the key exists, otherwise None is inserted instead.
        flights_dict['arrival_time_local'] = flight['arrival'].get('scheduledTimeLocal', None)
        flights_dict['arrival_terminal'] = flight['arrival'].get('terminal', None)
        flights_dict['departure_city'] = flight['departure']['airport'].get('name', None)
        flights_dict['departure_icao'] = flight['departure']['airport'].get('icao', None)
        flights_dict['departure_time_local'] = flight['departure'].get('scheduledTimeLocal', None)
        flights_dict['airline'] = flight['airline'].get('name', None)
        flights_dict['flight_number'] = flight.get('number', None)
        flights_dict['data_retrieved_on'] = datetime.now().astimezone(timezone('Europe/Berlin')).date()
        list_for_df.append(flights_dict)

  return pd.DataFrame(list_for_df)
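
Note that `flight['departure']['airport']` and `flight['airline']` in the function above still use square brackets, so a record without those keys would raise a `KeyError`. A hedged sketch of a more defensive version chains `.get()` with an empty-dictionary default. The helper name `flight_to_row` is hypothetical; the sample record is a trimmed copy of the first arrival in the response shown earlier.

```python
def flight_to_row(flight, arrival_icao):
    """Extract the fields of interest, tolerating missing nested keys."""
    arrival = flight.get("arrival", {})
    departure = flight.get("departure", {})
    departure_airport = departure.get("airport", {})
    airline = flight.get("airline", {})
    return {
        "arrival_icao": arrival_icao,
        "arrival_time_local": arrival.get("scheduledTimeLocal"),
        "arrival_terminal": arrival.get("terminal"),
        "departure_city": departure_airport.get("name"),
        "departure_icao": departure_airport.get("icao"),
        "departure_time_local": departure.get("scheduledTimeLocal"),
        "airline": airline.get("name"),
        "flight_number": flight.get("number"),
    }


# Trimmed copy of the first arrival from the response shown earlier
sample_flight = {
    "departure": {
        "airport": {"icao": "EYVI", "iata": "VNO", "name": "Vilnius"},
        "scheduledTimeLocal": "2022-10-04 20:00+03:00",
    },
    "arrival": {"scheduledTimeLocal": "2022-10-04 20:35+02:00", "terminal": "1"},
    "number": "BT 921",
    "airline": {"name": "Air Baltic"},
}

row = flight_to_row(sample_flight, "EDDB")
print(row)

# A record with nothing in it yields None values instead of an error
empty_row = flight_to_row({}, "EDDB")
print(empty_row)
```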

In [23]:
icaos = ['EDDB', 'LOWW'] 


In [24]:
tomorrows_flight_arrivals(icaos) 

| | arrival_icao | arrival_time_local | arrival_terminal | departure_city | departure_icao | departure_time_local | airline | flight_number | data_retrieved_on |
|---|---|---|---|---|---|---|---|---|---|
| 0 | EDDB | 2023-04-01 05:45+02:00 | 1 | İzmir | LTBJ | 2023-04-01 03:40+03:00 | SunExpress | XQ 966 | 2023-03-31 |
| 1 | EDDB | 2023-04-01 07:40+02:00 | 1 | Riga | EVRA | 2023-04-01 07:05+03:00 | Air Baltic | BT 211 | 2023-03-31 |
| 2 | EDDB | 2023-04-01 07:45+02:00 | 0 | Bologna | LIPE | 2023-04-01 06:00+02:00 | Ryanair | FR 137 | 2023-03-31 |
| 3 | EDDB | 2023-04-01 07:55+02:00 | 1 | Frankfurt-am-Main | EDDF | 2023-04-01 06:45+02:00 | Lufthansa | LH 170 | 2023-03-31 |
| 4 | EDDB | 2023-04-01 07:55+02:00 | 2 | Paris | LFPO | 2023-04-01 06:10+02:00 | Transavia France | TO 3402 | 2023-03-31 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 542 | LOWW | 2023-04-01 23:00+02:00 | 0 | Barcelona | LEBL | 2023-04-01 20:30+02:00 | Ryanair | FR 7351 | 2023-03-31 |
| 543 | LOWW | 2023-04-01 23:05+02:00 | 0 | Fuerteventura Island | GCFV | 2023-04-01 17:25+01:00 | Ryanair | FR 744 | 2023-03-31 |
| 544 | LOWW | 2023-04-01 23:00+02:00 | 0 | Copenhagen | EKCH | 2023-04-01 21:15+02:00 | Ryanair | FR 9888 | 2023-03-31 |
| 545 | LOWW | 2023-04-01 23:10+02:00 | 3 | Frankfurt-am-Main | EDDF | 2023-04-01 21:50+02:00 | Austrian | OS 220 | 2023-03-31 |
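
The local times in the table are strings with a UTC offset. If you want real datetimes (for example, to sort arrivals or convert them to UTC), they can be parsed with the standard library. A minimal sketch, assuming every value follows the `YYYY-MM-DD HH:MM±HH:MM` pattern seen above; the helper name `parse_local_time` is hypothetical.

```python
import datetime as dt


def parse_local_time(value):
    """Parse a string like '2023-04-01 05:45+02:00' into an aware datetime."""
    return dt.datetime.strptime(value, "%Y-%m-%d %H:%M%z")


arrival = parse_local_time("2023-04-01 05:45+02:00")
arrival_utc = arrival.astimezone(dt.timezone.utc)

print(arrival.isoformat())      # 2023-04-01T05:45:00+02:00
print(arrival_utc.isoformat())  # 2023-04-01T03:45:00+00:00
```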


# Establishing a connection from pandas to SQL

In [25]:
!pip install pymysql
import sqlalchemy



In [26]:
schema="gans"   # name of the database you want to use here
host="127.0.0.1"        # to connect to your local server
user="root"
password="1234" # your password!!!!
port=3306
connection_details = f'mysql+pymysql://{user}:{password}@{host}:{port}/{schema}'
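
A password containing characters such as `@` or `/` would break the connection string above, because those characters have a special meaning inside a URL. A minimal sketch that escapes the credentials with the standard library first; the helper name `build_connection_string` and the sample password are made up for illustration.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, schema):
    """Build a mysql+pymysql URL with URL-escaped credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{schema}"
    )


example = build_connection_string("root", "p@ss/word", "127.0.0.1", 3306, "gans")
print(example)  # mysql+pymysql://root:p%40ss%2Fword@127.0.0.1:3306/gans
```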

In [28]:
tomorrows_flight_arrivals(icaos).to_sql('flights',con=connection_details,if_exists='append',index=False)

547