So far we've learnt how to scrape the web and how to request information from an API. Some websites make APIs even easier to use. Check out [RapidAPI](https://rapidapi.com/): it takes care of writing most of the code for you.

We will use the [AeroDataBox API](https://rapidapi.com/aedbx-aedbx/api/aerodatabox/), which can retrieve all sorts of information about flights and airports. We will show you how to retrieve information about airports; it's then up to you to apply this, along with what you've already learnt this week, to **produce a function that retrieves tomorrow's flight information for the major airports in the cities you web scraped**.

In [None]:
import pandas as pd
import requests
from datetime import date

On the left-hand side of the AeroDataBox API page, you'll see a list of options for information that you can retrieve:
- Flights API
- Subscription / PUSH API
- Airport API
- Aircraft API
- Healthcheck & Status API

1. We want to select `Airport API`

2. Then within Airport API we want to select `Search airports by location`

3. Now, in the middle third of the page, enter the `latitude` and `longitude` of any city to test. We chose Berlin: latitude 52.31, longitude 13.24. Next, we changed `radiusKm` to 50, so that only airports within 50 km are returned. Finally, we set `withFlightInfoOnly` to true, so that only airports with flight data (scheduled or live) available are returned.

4. In the right-hand third of the screen you should see a block of code that looks pretty unfamiliar. This is because the code is probably set to *(Node.js) Axios* by default. However, we can change this to familiar Python: open the dropdown box at the top of the code and select `python > requests`.

Now you can copy the code to your notebook and it should look a little something like the cell below:

In [8]:
import requests

url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

querystring = {"lat":"52.31","lon":"13.24","radiusKm":"50","limit":"10","withFlightInfoOnly":"true"}

headers = {
	"X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # replace with your own API key
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)

print(response.json())

{'searchBy': {'lat': 52.31, 'lon': 13.24}, 'count': 1, 'items': [{'icao': 'EDDB', 'iata': 'BER', 'name': 'Berlin Brandenburg', 'shortName': 'Brandenburg', 'municipalityName': 'Berlin', 'location': {'lat': 52.35139, 'lon': 13.493889}, 'countryCode': 'DE'}]}
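An API key is tied to your RapidAPI account, so it is worth keeping it out of the notebook itself. Here is a minimal sketch, assuming you have stored the key in an environment variable. The variable name `RAPIDAPI_KEY` and the helper `build_headers` are our own choices for illustration, not something RapidAPI prescribes.

```python
import os


def build_headers(env_var="RAPIDAPI_KEY"):
    """Build the RapidAPI request headers, reading the key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # fail early with a readable message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
    }
```

With this in place, `headers = build_headers()` can be passed to `requests.get(...)` exactly as before.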


Let's display `response.json()` directly instead of printing it. The notebook then lays the dictionary out over several lines, which makes it easier to read:

In [9]:
response.json()

{'searchBy': {'lat': 52.31, 'lon': 13.24},
 'count': 1,
 'items': [{'icao': 'EDDB',
   'iata': 'BER',
   'name': 'Berlin Brandenburg',
   'shortName': 'Brandenburg',
   'municipalityName': 'Berlin',
   'location': {'lat': 52.35139, 'lon': 13.493889},
   'countryCode': 'DE'}]}

We can now turn the list stored under `'items'` into a DataFrame using `pd.json_normalize()`:

In [10]:
pd.json_normalize(response.json()['items'])

| | icao | iata | name | shortName | municipalityName | countryCode | location.lat | location.lon |
|---|---|---|---|---|---|---|---|---|
| 0 | EDDB | BER | Berlin Brandenburg | Brandenburg | Berlin | DE | 52.35139 | 13.493889 |
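To see where column names such as `location.lat` come from, here is a plain-Python sketch of the same idea: nested dictionaries are collapsed into one level, and the nested keys are joined with a dot. The `flatten` helper is ours, written for illustration only; it is not how pandas implements `json_normalize`.

```python
def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # descend into the nested dict, carrying the key path along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# trimmed copy of the airport record returned above
airport = {
    "icao": "EDDB",
    "iata": "BER",
    "name": "Berlin Brandenburg",
    "location": {"lat": 52.35139, "lon": 13.493889},
}

flat_airport = flatten(airport)
print(flat_airport)
```

Each key of the flattened dictionary corresponds to one column of the DataFrame.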


**Airport DataFrame**

**Function**

Let's now apply this to the latitudes and longitudes of several cities and send the result to our database:

In [23]:
import pandas as pd
import requests


def get_and_send_airport_data():
    connection_string = connection()
    # coordinates for Berlin, Frankfurt, Cologne, Munich, Hamburg, Hörstel
    latitudes = [52.520008, 50.110924, 50.933594, 48.137154,  53.551086, 51.961563]
    longitudes = [13.404954, 8.682127, 6.961899, 11.576124, 9.993682, 7.628202]
    airport_df = get_airport_data(latitudes, longitudes)
    send_airport_data(airport_df, connection_string)
    return 'airport data sent'

def connection():
  schema = "sql_challenge1"
  host = "127.0.0.1"
  user = "root"
  password = "root1234"
  port = 3306
  connection_string = f'mysql+pymysql://{user}:{password}@{host}:{port}/{schema}'
  return connection_string


def get_airport_data(latitudes, longitudes):
    # the query string and headers are the same for every request
    querystring = {"withFlightInfoOnly": "true"}
    headers = {
        "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com",
        "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY"  # replace with your own API key
    }

    list_for_df = []

    for lat, lon in zip(latitudes, longitudes):
        # search for up to 10 airports within 50 km of each city
        url = f"https://aerodatabox.p.rapidapi.com/airports/search/location/{lat}/{lon}/km/50/10"
        response = requests.get(url, headers=headers, params=querystring)
        list_for_df.append(pd.json_normalize(response.json()['items']))

    # combine the per-city results once, after the loop has finished
    airport_data = pd.concat(list_for_df, ignore_index=True)
    airport_df = airport_data.loc[:, ['icao', 'name']]
    return airport_df

def send_airport_data(airport_df, connection_string):
  airport_df.to_sql('airport',
                    if_exists='append',
                    con=connection_string,
                    index=False)
get_and_send_airport_data()

'airport data sent'
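One thing to watch in `connection()`: the password is pasted straight into the URL, so a password containing characters such as `@` or `/` would produce a malformed connection string. Below is a hedged sketch of one way around this, using `urllib.parse.quote_plus` from the standard library. The function name `build_connection_string` and the example password are made up for illustration.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host="127.0.0.1", port=3306,
                            schema="sql_challenge1"):
    """Return a SQLAlchemy-style MySQL URL with the password percent-encoded."""
    safe_password = quote_plus(password)
    return f"mysql+pymysql://{user}:{safe_password}@{host}:{port}/{schema}"


print(build_connection_string("root", "p@ss/word"))
# mysql+pymysql://root:p%40ss%2Fword@127.0.0.1:3306/sql_challenge1
```

A password made only of letters and digits, like the one used above, passes through unchanged.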

###### **Challenge:** Arrivals information
Using what you have been shown above, plus the skills you've learnt in the last couple of days:
1. In `AeroDataBox API`, use the `Flights API` > `FIDS/Schedules: Airport departures and arrivals by airport ICAO code` section
2. Fill out the parameters in the middle third and then copy the `python > requests` code from the right-hand third
3. Explore the data you get back. What would be useful in your DataFrame and what can be excluded? Remember, Gans wants to know when people are arriving in the city.
4. Make a DataFrame from the information you see as important
5. Condense everything you did above into a function that takes a list of ICAO codes as input and returns a DataFrame with the information for *tomorrow's arrivals*
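For step 5 you will need tomorrow's date in the form the endpoint expects. Here is a minimal sketch, assuming the time range is passed in the URL as `YYYY-MM-DDTHH:MM` and that the day is requested in two halves of just under 12 hours each. The helper name `tomorrow_windows` is ours.

```python
from datetime import date, timedelta


def tomorrow_windows(today=None):
    """Return two (start, end) timestamp pairs that together cover tomorrow."""
    today = today or date.today()
    tomorrow = today + timedelta(days=1)
    halves = [("00:00", "11:59"), ("12:00", "23:59")]
    # str(date) is already in ISO format, e.g. 2024-03-12
    return [(f"{tomorrow}T{start}", f"{tomorrow}T{end}") for start, end in halves]


for start, end in tomorrow_windows(date(2024, 3, 11)):
    print(start, end)
# 2024-03-12T00:00 2024-03-12T11:59
# 2024-03-12T12:00 2024-03-12T23:59
```

Passing a fixed date, as in the example, makes the function easy to check; calling it with no argument uses the current date.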

In [13]:
import requests
url = "https://aerodatabox.p.rapidapi.com/flights/airports/icao/EDDB"

querystring = {"offsetMinutes":"-120","durationMinutes":"720","withLeg":"true","withCancelled":"true","withCodeshared":"true","withCargo":"true","withPrivate":"true","withLocation":"false"}

headers = {
	"X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # replace with your own API key
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response_flight = requests.get(url, headers=headers, params=querystring)

print(response_flight.json())


{'departures': [{'departure': {'scheduledTime': {'utc': '2024-03-12 07:40Z', 'local': '2024-03-12 08:40+01:00'}, 'revisedTime': {'utc': '2024-03-12 08:05Z', 'local': '2024-03-12 09:05+01:00'}, 'runwayTime': {'utc': '2024-03-12 08:11Z', 'local': '2024-03-12 09:11+01:00'}, 'terminal': '1', 'checkInDesk': '011-013', 'gate': 'C13', 'quality': ['Basic', 'Live']}, 'arrival': {'airport': {'icao': 'HEGN', 'iata': 'HRG', 'name': 'Hurghada'}, 'quality': []}, 'number': '6Y 305', 'callSign': 'ART305', 'status': 'Departed', 'codeshareStatus': 'IsOperator', 'isCargo': False, 'aircraft': {'reg': 'YR-HLA', 'modeS': '4A2181', 'model': 'Boeing 737-800'}, 'airline': {'name': 'SmartLynx', 'iata': '6Y', 'icao': 'ART'}}, {'departure': {'scheduledTime': {'utc': '2024-03-12 07:56Z', 'local': '2024-03-12 08:56+01:00'}, 'revisedTime': {'utc': '2024-03-12 07:56Z', 'local': '2024-03-12 08:56+01:00'}, 'runwayTime': {'utc': '2024-03-12 07:56Z', 'local': '2024-03-12 08:56+01:00'}, 'quality': ['Basic', 'Live']}, 'arr

In [14]:
response_flight.json()

{'departures': [{'departure': {'scheduledTime': {'utc': '2024-03-12 07:40Z',
     'local': '2024-03-12 08:40+01:00'},
    'revisedTime': {'utc': '2024-03-12 08:05Z',
     'local': '2024-03-12 09:05+01:00'},
    'runwayTime': {'utc': '2024-03-12 08:11Z',
     'local': '2024-03-12 09:11+01:00'},
    'terminal': '1',
    'checkInDesk': '011-013',
    'gate': 'C13',
    'quality': ['Basic', 'Live']},
   'arrival': {'airport': {'icao': 'HEGN', 'iata': 'HRG', 'name': 'Hurghada'},
    'quality': []},
   'number': '6Y 305',
   'callSign': 'ART305',
   'status': 'Departed',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'YR-HLA', 'modeS': '4A2181', 'model': 'Boeing 737-800'},
   'airline': {'name': 'SmartLynx', 'iata': '6Y', 'icao': 'ART'}},
  {'departure': {'scheduledTime': {'utc': '2024-03-12 07:56Z',
     'local': '2024-03-12 08:56+01:00'},
    'revisedTime': {'utc': '2024-03-12 07:56Z',
     'local': '2024-03-12 08:56+01:00'},
    'runwayTime': {'utc': '2024

In [15]:
# first item in the 'departures' list
response_flight.json()['departures'][0]

{'departure': {'scheduledTime': {'utc': '2024-03-12 07:40Z',
   'local': '2024-03-12 08:40+01:00'},
  'revisedTime': {'utc': '2024-03-12 08:05Z',
   'local': '2024-03-12 09:05+01:00'},
  'runwayTime': {'utc': '2024-03-12 08:11Z',
   'local': '2024-03-12 09:11+01:00'},
  'terminal': '1',
  'checkInDesk': '011-013',
  'gate': 'C13',
  'quality': ['Basic', 'Live']},
 'arrival': {'airport': {'icao': 'HEGN', 'iata': 'HRG', 'name': 'Hurghada'},
  'quality': []},
 'number': '6Y 305',
 'callSign': 'ART305',
 'status': 'Departed',
 'codeshareStatus': 'IsOperator',
 'isCargo': False,
 'aircraft': {'reg': 'YR-HLA', 'modeS': '4A2181', 'model': 'Boeing 737-800'},
 'airline': {'name': 'SmartLynx', 'iata': '6Y', 'icao': 'ART'}}

In [16]:
#extracting arrival icao
response_flight.json()['departures'][0]['arrival']['airport']['icao']

'HEGN'

In [31]:
# extracting the scheduled local departure time (this item comes from the 'departures' list)
response_flight.json()['departures'][0]['departure']['scheduledTime']['local']


In [30]:
# extracting the airline's ICAO code (note: this is not the ICAO code of the departure airport)
response_flight.json()['departures'][0]['airline']['icao']

'EJU'

In [38]:
# extracting the name of the arrival airport
response_flight.json()['departures'][0]['arrival']['airport']['name']

'Rome'

In [33]:
#Extracting flight number
response_flight.json()['departures'][0]['number']

'EC 5079'
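Chained square-bracket lookups like the ones above raise a `KeyError` as soon as one level is missing, and not every flight in the response carries every field (the second departure shown earlier has no `terminal` or `gate`, for example). Below is a small sketch of a more forgiving lookup. `dig` is a hypothetical helper of our own, and the sample record is trimmed from the first departure shown earlier.

```python
def dig(data, *keys, default=None):
    """Follow a path of dict keys or list indexes; return default if any step fails."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


flight = {
    "arrival": {"airport": {"icao": "HEGN", "iata": "HRG", "name": "Hurghada"}},
    "number": "6Y 305",
    "airline": {"name": "SmartLynx", "iata": "6Y", "icao": "ART"},
}

print(dig(flight, "arrival", "airport", "icao"))  # HEGN
print(dig(flight, "departure", "gate"))           # None
```

This plays the same role as the `.get('icao', None)` calls used later in this notebook, but for paths of any depth.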

In [11]:
# What are the current departures or arrivals at the airport?
# or: What is the flight schedule at the airport?
# or: What is the flight history at the airport?
import requests
from datetime import date

icao_code = 'EDDB'
today = date.today()  # a separate name, so the imported `date` class is not overwritten
start_time = '00:00'
end_time = '12:00'

url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{icao_code}/{today}T{start_time}/{today}T{end_time}"

querystring = {"withLeg":"true","withCancelled":"true","withCodeshared":"true","withCargo":"true","withPrivate":"true","withLocation":"false"}

headers = {
	"X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # replace with your own API key
	"X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response_arrivals = requests.get(url, headers=headers, params=querystring)

print(response_arrivals.json())

{'departures': [{'departure': {'scheduledTime': {'utc': '2024-03-11 23:21Z', 'local': '2024-03-12 00:21+01:00'}, 'revisedTime': {'utc': '2024-03-11 23:21Z', 'local': '2024-03-12 00:21+01:00'}, 'runwayTime': {'utc': '2024-03-11 23:29Z', 'local': '2024-03-12 00:29+01:00'}, 'quality': ['Basic', 'Live']}, 'arrival': {'airport': {'icao': 'EDDS', 'iata': 'STR', 'name': 'Stuttgart'}, 'quality': []}, 'number': 'EW 3001', 'callSign': 'EWG3001', 'status': 'Departed', 'codeshareStatus': 'IsOperator', 'isCargo': False, 'aircraft': {'reg': 'D-AEWR', 'modeS': '3C56F2', 'model': 'Airbus A320'}, 'airline': {'name': 'Eurowings', 'iata': 'EW', 'icao': 'EWG'}}, {'departure': {'scheduledTime': {'utc': '2024-03-12 05:41Z', 'local': '2024-03-12 06:41+01:00'}, 'revisedTime': {'utc': '2024-03-12 05:41Z', 'local': '2024-03-12 06:41+01:00'}, 'runwayTime': {'utc': '2024-03-12 05:48Z', 'local': '2024-03-12 06:48+01:00'}, 'quality': ['Basic', 'Live']}, 'arrival': {'airport': {'icao': 'EDDS', 'iata': 'STR', 'name':

In [29]:
response_arrivals.json()

{'departures': [{'departure': {'scheduledTime': {'utc': '2024-03-11 23:21Z',
     'local': '2024-03-12 00:21+01:00'},
    'revisedTime': {'utc': '2024-03-11 23:21Z',
     'local': '2024-03-12 00:21+01:00'},
    'runwayTime': {'utc': '2024-03-11 23:29Z',
     'local': '2024-03-12 00:29+01:00'},
    'quality': ['Basic', 'Live']},
   'arrival': {'airport': {'icao': 'EDDS', 'iata': 'STR', 'name': 'Stuttgart'},
    'quality': []},
   'number': 'EW 3001',
   'callSign': 'EWG3001',
   'status': 'Departed',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'D-AEWR', 'modeS': '3C56F2', 'model': 'Airbus A320'},
   'airline': {'name': 'Eurowings', 'iata': 'EW', 'icao': 'EWG'}},
  {'departure': {'scheduledTime': {'utc': '2024-03-12 05:41Z',
     'local': '2024-03-12 06:41+01:00'},
    'revisedTime': {'utc': '2024-03-12 05:41Z',
     'local': '2024-03-12 06:41+01:00'},
    'runwayTime': {'utc': '2024-03-12 05:48Z',
     'local': '2024-03-12 06:48+01:00'},
    'quality'

In [33]:
# only the last expression in a cell is displayed; wrap the others in print() to see them too
# ICAO code of the arrival airport
response_arrivals.json()['departures'][0]['arrival']['airport']['icao']
# ICAO code of the airline (not of the departure airport)
response_arrivals.json()['departures'][0]['airline']['icao']
# name of the arrival airport
response_arrivals.json()['departures'][0]['arrival']['airport']['name']
# flight number
response_arrivals.json()['departures'][0]['number']
# scheduled local departure time
response_arrivals.json()['departures'][0]['departure']['scheduledTime']['local']


'2024-03-12 00:21+01:00'

In [86]:
icao_code = 'EDDB'
today = date.today()  # a separate name, so the imported `date` class is not overwritten
start_time = '00:00'
end_time = '12:00'

url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{icao_code}/{today}T{start_time}/{today}T{end_time}"

querystring = {"withLeg":"true","withCancelled":"true","withCodeshared":"true","withCargo":"true","withPrivate":"true","withLocation":"false"}

headers = {
    "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # replace with your own API key
    "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
}

response_arrivals1 = requests.get(url, headers=headers, params=querystring)
arrivals = response_arrivals1.json()
flight_list = []
# note: this loop reads the 'departures' list, i.e. flights leaving icao_code
for item in arrivals['departures']:
    flight_item = {
        # ICAO code of the airport the flight is heading to
        'arrival_icao': item['arrival']['airport'].get('icao', None),
        # ICAO code of the airline (not of the departure airport)
        'departure_icao': item['airline'].get('icao', None),
        # name of the airport the flight is heading to
        'arrival_city': item['arrival']['airport']['name'],
        # flight number
        'flight_number': item.get('number', None),
        # scheduled local departure time from icao_code
        'arrival_time': item['departure']['scheduledTime']['local']
    }
    flight_list.append(flight_item)

# build the DataFrame once, after all flights have been collected
flight_df = pd.DataFrame(flight_list)

flight_df['arrival_time'] =  flight_df['arrival_time'].str[:-6] #getting rid of +01:00 from arrival_time
flight_df 

| | arrival_icao | departure_icao | arrival_city | flight_number | arrival_time |
|---|---|---|---|---|---|
| 0 | EDDS | EWG | Stuttgart | EW 3001 | 2024-03-12 00:21 |
| 1 | EDDS | GWI | Stuttgart | 4U 4EP | 2024-03-12 06:41 |
| 2 | EDDK | GWI | Cologne | 4U 9AF | 2024-03-12 06:58 |
| 3 | LFPG | EJU | Paris | EC 5147 | 2024-03-12 07:25 |
| 4 | HEGN | EJU | Hurghada | EC 5369 | 2024-03-12 07:10 |
| ... | ... | ... | ... | ... | ... |
| 109 | LTFM | THY | Istanbul | TK 1728 | 2024-03-12 06:40 |
| 110 | LTFM | THY | Istanbul | TK 1722 | 2024-03-12 11:10 |
| 111 | OTHH | QTR | Doha | QR 80 | 2024-03-12 09:00 |
| 112 | UBBB | AHY | Baku | J2 64 | 2024-03-12 09:55 |


In [78]:
# the +01:00 offset was already removed in the previous cell; slicing again would cut off the time itself
flight_df.head()

| | arrival_icao | departure_icao | arrival_city | flight_number | arrival_time |
|---|---|---|---|---|---|
| 0 | EDDS | EWG | Stuttgart | EW 3001 | 2024-03-12 00:21 |
| 1 | EDDS | GWI | Stuttgart | 4U 4EP | 2024-03-12 06:41 |
| 2 | EDDK | GWI | Cologne | 4U 9AF | 2024-03-12 06:58 |
| 3 | LFPG | EJU | Paris | EC 5147 | 2024-03-12 07:25 |
| 4 | HEGN | EJU | Hurghada | EC 5369 | 2024-03-12 07:10 |
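Slicing off the last six characters works as long as every timestamp ends in an offset like `+01:00`. As an alternative sketch, the standard library can parse the offset explicitly, which also lets you keep it if you need it later. The helper name `parse_local_time` is ours.

```python
from datetime import datetime


def parse_local_time(stamp):
    """Parse 'YYYY-MM-DD HH:MM+HH:MM' into (naive local datetime, UTC offset)."""
    aware = datetime.strptime(stamp, "%Y-%m-%d %H:%M%z")
    # drop the timezone for storage, but hand the offset back separately
    return aware.replace(tzinfo=None), aware.utcoffset()


local_time, offset = parse_local_time("2024-03-12 00:21+01:00")
print(local_time)  # 2024-03-12 00:21:00
print(offset)      # 1:00:00
```

A string in an unexpected format raises a `ValueError` here, whereas slicing would quietly produce a wrong value.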


In [27]:
import requests
from datetime import date, timedelta
import pandas as pd

def get_and_send_flight_data():
    connection_string = connection()
    airport_df = pd.read_sql('airport', con= connection_string)
    flight_df = get_flight_data(airport_df)
    send_flight_data(flight_df, connection_string)
    return 'Flights data sent'




def connection():
  schema = "sql_challenge1"
  host = "127.0.0.1"
  user = "root"
  password = "root1234"
  port = 3306
  connection_string =  f'mysql+pymysql://{user}:{password}@{host}:{port}/{schema}'
  return connection_string


def get_flight_data(airport_df):
    tomorrow = date.today() + timedelta(days=1)

    # the day is requested in two halves
    times = [['00:00', '11:59'], ['12:00', '23:59']]

    # the query string and headers are the same for every request
    querystring = {"withLeg":"true","withCancelled":"true","withCodeshared":"true","withCargo":"true","withPrivate":"true","withLocation":"false"}
    headers = {
        "X-RapidAPI-Key": "YOUR_RAPIDAPI_KEY",  # replace with your own API key
        "X-RapidAPI-Host": "aerodatabox.p.rapidapi.com"
    }

    flight_list = []

    for icao_code in airport_df['icao']:
        for start_time, end_time in times:
            url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{icao_code}/{tomorrow}T{start_time}/{tomorrow}T{end_time}"

            response_arrivals1 = requests.get(url, headers=headers, params=querystring)
            arrivals = response_arrivals1.json()

            # note: this loop reads the 'departures' list of the queried airport
            for item in arrivals['departures']:
                flight_item = {
                    # ICAO code of the airport being queried
                    'arrival_icao': icao_code,
                    # ICAO code of the airline (missing for some flights)
                    'departure_icao': item['airline'].get('icao', None),
                    # name of the airport at the other end of the flight
                    'arrival_city': item['arrival']['airport']['name'],
                    # flight number
                    'flight_number': item.get('number', None),
                    # scheduled local time at the queried airport
                    'arrival_time': item['departure']['scheduledTime']['local']
                }
                flight_list.append(flight_item)

    # build the DataFrame and clean the time column once, after all requests
    flight_df = pd.DataFrame(flight_list)
    flight_df['arrival_time'] = flight_df['arrival_time'].str[:-6]  # drop the UTC offset, e.g. +01:00
    flight_df['arrival_time'] = pd.to_datetime(flight_df['arrival_time'])

    print(flight_df.info())
    return flight_df

def send_flight_data(flight_df, connection_string):
  flight_df.to_sql('flight',
                    if_exists='append',
                    con=connection_string,
                    index=False)
  
get_and_send_flight_data()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2968 entries, 0 to 2967
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   arrival_icao    2968 non-null   object        
 1   departure_icao  2907 non-null   object        
 2   arrival_city    2968 non-null   object        
 3   flight_number   2968 non-null   object        
 4   arrival_time    2968 non-null   datetime64[ns]
dtypes: datetime64[ns](1), object(4)
memory usage: 116.1+ KB
None


'Flights data sent'
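The function above reads the `departures` list, so its rows describe flights leaving each airport. For the arrivals side, a hedged sketch follows. It assumes the response also carries an `arrivals` list whose items mirror the departures shown earlier, with the origin under `departure.airport` and the landing time under `arrival.scheduledTime`; check this against a real response before relying on it. The sample payload is invented for illustration, as are the helper name `extract_arrivals` and the output column names.

```python
def extract_arrivals(payload, airport_icao):
    """Turn the (assumed) 'arrivals' list of one airport into flat rows."""
    rows = []
    for item in payload.get("arrivals", []):
        origin = item.get("departure", {}).get("airport", {})
        scheduled = item.get("arrival", {}).get("scheduledTime", {})
        rows.append({
            "arrival_icao": airport_icao,           # the airport that was queried
            "departure_icao": origin.get("icao"),   # where the flight comes from
            "departure_city": origin.get("name"),
            "flight_number": item.get("number"),
            "arrival_time": scheduled.get("local"),
        })
    return rows


# Invented sample: shaped like the departures records, but mirrored.
sample = {
    "arrivals": [
        {
            "departure": {"airport": {"icao": "EDDS", "name": "Stuttgart"}},
            "arrival": {"scheduledTime": {"utc": "2024-03-12 07:00Z",
                                          "local": "2024-03-12 08:00+01:00"}},
            "number": "EW 2000",
        },
        # a record with missing fields, to show that nothing raises
        {"departure": {"airport": {}}, "arrival": {}, "number": "XX 1"},
    ]
}

rows = extract_arrivals(sample, "EDDB")
print(rows[0])
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows)` in the same way as `flight_list` above.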

In [4]:
import pandas as pd
import pymysql

def get_and_send_airport_cities_data():
   connection_string = connection()
   cities = pd.read_sql('cities', con= connection_string)
   airport_df = pd.read_sql('airport', con=connection_string)
   cities_airports_df = get_cities_airport_data(cities, airport_df)
   send_cities_airport_data(cities_airports_df, connection_string)
   return 'airport_cities data was sent'




def connection():
  schema = "sql_challenge1"
  host = "127.0.0.1"
  user = "root"
  password = "root1234"
  port = 3306
  connection_string =  f'mysql+pymysql://{user}:{password}@{host}:{port}/{schema}'
  return connection_string



def get_cities_airport_data(cities, airport_df):
    
    # note: only the first city and the first airport are linked here
    city_id = list(cities['city_id'])[0]
    airport_icao = list(airport_df['icao'])[0]

    cities_airports_list = []
    cities_airports_dict = {
       'city_id':city_id,
       'airport_icao':airport_icao

    }
    cities_airports_list.append(cities_airports_dict)
    cities_airports_df = pd.DataFrame(cities_airports_list)
    print(cities_airports_df.info())
    return cities_airports_df


def send_cities_airport_data(cities_airports_df, connection_string):
  cities_airports_df.to_sql('cities_airports',
                    if_exists='append',
                    con=connection_string,
                    index=False)
  
get_and_send_airport_cities_data()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   city_id       1 non-null      int64 
 1   airport_icao  1 non-null      object
dtypes: int64(1), object(1)
memory usage: 148.0+ bytes
None


'airport_cities data was sent'
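As written, `get_cities_airport_data` links only the first city to the first airport. A possible extension is sketched below, under the assumption that you record which city each airport search was made for. The `airports_by_city` mapping, its `city_id` values and the helper name `link_cities_to_airports` are illustrative only.

```python
def link_cities_to_airports(airports_by_city):
    """Build one {'city_id', 'airport_icao'} row per city/airport pair."""
    rows = []
    for city_id, icao_codes in airports_by_city.items():
        for icao in icao_codes:
            rows.append({"city_id": city_id, "airport_icao": icao})
    return rows


# hypothetical mapping: city_id -> ICAO codes found near that city
airports_by_city = {1: ["EDDB"], 2: ["EDDH"], 3: []}

links = link_cities_to_airports(airports_by_city)
print(links)
# [{'city_id': 1, 'airport_icao': 'EDDB'}, {'city_id': 2, 'airport_icao': 'EDDH'}]
```

A city with no airports nearby simply contributes no rows, and `pd.DataFrame(links)` gives a table with the same two columns as `cities_airports_df`.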