So far we've learnt how to scrape the web and how to request information from an API. Some websites make APIs even easier to use: [RapidAPI](https://rapidapi.com/), for example, takes care of writing most of the code for you.

We will use the [AeroDataBox API](https://rapidapi.com/aedbx-aedbx/api/aerodatabox/), which can retrieve all sorts of information about flights and airports. We will show you how to retrieve information about the airports, and then it's up to you to apply this, along with what you've already learnt this week, to **produce a function that retrieves tomorrow's flight information for the major airports in the cities you web scraped**.

In [144]:
import pandas as pd
import requests
from datetime import datetime, timedelta
from pytz import timezone
from timezonefinder import TimezoneFinder

import sys
# Add the folder containing safe_key.py (a dictionary holding all passwords and keys) to the import path.
# sys.path entries must be directories, not the .py file itself.
sys.path.append(r'E:\WBS DS\Python\GANS_e-scooter')

from safe_key import safe_key


On the left-hand side of the AeroDataBox API page, you'll see a list of options for information that you can retrieve:
- Flights API
- Subscription / PUSH API
- Airport API
- Aircraft API
- Healthcheck & Status API

1. We want to select `Airport API`

2. Then within Airport API we want to select `Search airports by location`

3. In the middle third of the page, enter the `latitude` and `longitude` of any city to test. We chose Berlin: latitude 52.32, longitude 13.24. Next, we changed `radiusKm` to 50 km. Finally, we set `withFlightInfoOnly` to true so that only airports with flight data (scheduled or live) are returned.

4. On the right-hand third of the screen you should see a block of code that looks pretty unfamiliar. This is because the code is probably set to *(Node.js) Axios* by default. However, we can change this to familiar Python: open the dropdown box at the top of the code and select `python > requests`.

Now you can copy the code to your notebook and it should look a little something like the cell below:

In [66]:
url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

querystring = {"lat":"52.32","lon":"13.24","radiusKm":"50","limit":"16","withFlightInfoOnly":"true"}

headers = {
    "x-rapidapi-key": safe_key["x_rapidapi_key"],
    "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)

response.json()

{'searchBy': {'lat': 52.32, 'lon': 13.24},
 'count': 1,
 'items': [{'icao': 'EDDB',
   'iata': 'BER',
   'name': 'Berlin Brandenburg',
   'shortName': 'Brandenburg',
   'municipalityName': 'Berlin',
   'location': {'lat': 52.35139, 'lon': 13.493889},
   'countryCode': 'DE',
   'timeZone': 'Europe/Berlin'}]}

We can now turn this into a DataFrame using `pd.json_normalize()`:

In [67]:
pd.json_normalize(response.json()['items'])

Unnamed: 0,icao,iata,name,shortName,municipalityName,countryCode,timeZone,location.lat,location.lon
0,EDDB,BER,Berlin Brandenburg,Brandenburg,Berlin,DE,Europe/Berlin,52.35139,13.493889
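To see what `pd.json_normalize()` does with nested fields, here is a small offline sketch. The `sample_items` list is copied by hand from the response above and trimmed to a few keys, so no API call is needed. Note how the nested `location` dictionary ends up as the dot-separated columns `location.lat` and `location.lon`.

```python
import pandas as pd

# Hand-copied from the response above, trimmed to a few keys (no API call needed)
sample_items = [
    {
        "icao": "EDDB",
        "iata": "BER",
        "name": "Berlin Brandenburg",
        "location": {"lat": 52.35139, "lon": 13.493889},
    }
]

# Nested dictionaries are flattened into dot-separated column names
sample_df = pd.json_normalize(sample_items)
print(sample_df.columns.tolist())
```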


Let's now apply this to the latitudes and longitudes of multiple cities:

In [78]:
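def icao_airport_codes(latitudes, longitudes):
  """Return a DataFrame of airports within 100 km of each (latitude, longitude) pair."""

  # Both lists must line up: one longitude for every latitude
  assert len(latitudes) == len(longitudes)

  list_for_df = []

  for index, value in enumerate(latitudes):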

    url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

    querystring = {
      "lat": value, 
      "lon": longitudes[index], 
      "radiusKm":"100",
      "limit":"16", 
      "withFlightInfoOnly":"true" 
      }

    headers = {
      "x-rapidapi-key": safe_key["x_rapidapi_key"],
      "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
      }

    response = requests.get(url, headers=headers, params=querystring)
    
    list_for_df.append(pd.json_normalize(response.json()['items']))

  return pd.concat(list_for_df, ignore_index=True)

In [180]:
# coordinates for Berlin, Paris, Pune (London, not included here, would be 51.5072, -0.1275)
latitudes = [52.5200, 48.8567,  18.5203]
longitudes = [13.4050, 2.3522, 73.8567]

airport_data_df = icao_airport_codes(latitudes, longitudes)
airport_data_df

Unnamed: 0,icao,iata,name,shortName,municipalityName,countryCode,timeZone,location.lat,location.lon
0,EDDB,BER,Berlin Brandenburg,Brandenburg,Berlin,DE,Europe/Berlin,52.35139,13.493889
1,LFPB,LBG,Paris -Le Bourget,-Le Bourget,Paris,FR,Europe/Paris,48.9694,2.44139
2,LFPO,ORY,Paris -Orly,-Orly,Paris,FR,Europe/Paris,48.7253,2.35944
3,LFPG,CDG,Paris Charles de Gaulle,Charles de Gaulle,Paris,FR,Europe/Paris,49.0128,2.549999
4,LFOB,BVA,Beauvais/Tillé Paris Beauvais Tillé,Paris Beauvais Tillé,Beauvais/Tillé,FR,Europe/Paris,49.4544,2.11278
5,VAPO,PNQ,Pune,Pune,Pune,IN,Asia/Kolkata,18.5821,73.9197
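A slightly more defensive way to pair the coordinates is to check the list lengths first and then `zip` the two lists together. The sketch below only builds the query dictionaries, so it runs without an API key; the helper name `build_airport_queries` and its default arguments are made up for this example.

```python
def build_airport_queries(latitudes, longitudes, radius_km=100, limit=16):
    """Build one query dictionary per (latitude, longitude) pair."""
    if len(latitudes) != len(longitudes):
        raise ValueError("latitudes and longitudes must have the same length")

    queries = []
    # zip pairs each latitude with the longitude at the same position
    for lat, lon in zip(latitudes, longitudes):
        queries.append({
            "lat": lat,
            "lon": lon,
            "radiusKm": str(radius_km),
            "limit": str(limit),
            "withFlightInfoOnly": "true",
        })
    return queries


# Berlin, Paris, Pune (same coordinates as above)
queries = build_airport_queries([52.5200, 48.8567, 18.5203], [13.4050, 2.3522, 73.8567])
print(len(queries))
```

Each dictionary could then be passed as `params=` to `requests.get`, as in the function above.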


In [None]:
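def icao_airport_codes(cities_info_from_sql):
     # Variant of the function above: takes a DataFrame (e.g. read from SQL)
     # with the columns City, City_id, Latitude and Longitude

     list_for_df = []

     for row in cities_info_from_sql.itertuples(index=False):

       # city_name and city_id are read but not used in the request yet
       city_name = row.City
       city_id = row.City_id
       latitude = row.Latitude
       longitude = row.Longitude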


       url = "https://aerodatabox.p.rapidapi.com/airports/search/location"

       querystring = {
       "lat": latitude, 
       "lon": longitude, 
       "radiusKm":"100",
       "limit":"16", 
       "withFlightInfoOnly":"true" 
       }

       headers = { 
       "x-rapidapi-key": safe_key["x_rapidapi_key"],
       "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
                            }

       response = requests.get(url, headers=headers, params=querystring)
       
       list_for_df.append(pd.json_normalize(response.json()['items']))
     
     return pd.concat(list_for_df, ignore_index=True)

In [188]:
icao_airport_codes_df = airport_data_df.loc[:, ["icao"]]
icao_airport_codes_df

Unnamed: 0,icao
0,EDDB
1,LFPB
2,LFPO
3,LFPG
4,LFOB
5,VAPO


In [182]:
for code in icao_airport_codes_df.itertuples(index=False):
    print(code.icao)

EDDB
LFPB
LFPO
LFPG
LFOB
VAPO


In [183]:
icao_airport_names_df = airport_data_df.loc[:, ["icao", "name"]]
icao_airport_names_df

Unnamed: 0,icao,name
0,EDDB,Berlin Brandenburg
1,LFPB,Paris -Le Bourget
2,LFPO,Paris -Orly
3,LFPG,Paris Charles de Gaulle
4,LFOB,Beauvais/Tillé Paris Beauvais Tillé
5,VAPO,Pune


### Extra information about ICAO and IATA


- IATA stands for the International Air Transport Association, an industry trade group that represents airlines worldwide.

- IATA codes are three-letter airport codes that are widely used by airlines and passengers to identify airports in ticketing, luggage tags, and flight schedules.
- Unlike ICAO codes (used in professional aviation operations), IATA codes are more consumer-facing and are typically easier to remember.

- ICAO stands for the International Civil Aviation Organization.

- ICAO codes are four-letter codes that uniquely identify airports around the world. They are standardized by ICAO and commonly used in aviation operations, flight planning, and air traffic control. AeroDataBox uses them to identify airports in its endpoints.
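
As a quick illustration of the difference, the sketch below guesses the code type from its length alone. The function name `guess_code_type` is hypothetical, and length is only a rough heuristic: it does not check that the code belongs to a real airport.

```python
def guess_code_type(code):
    """Guess whether an airport code is IATA (3 letters) or ICAO (4 characters)."""
    code = code.strip().upper()
    if len(code) == 3 and code.isalpha():
        return "IATA"
    if len(code) == 4 and code.isalnum():
        return "ICAO"
    return "unknown"


for example in ["BER", "EDDB", "Berlin"]:
    print(example, guess_code_type(example))
```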

### **Challenge:** Arrivals information
Using what you have been shown above, plus the skills you've learnt in the last couple of days:
1. In `AeroDataBox API` use the `Flights API` > `FIDS/Schedules: Airport departures and arrivals (by time range)` section.
2. Fill out the parameters in the middle third and then copy the `python > requests` code from the right-hand third.
3. Explore the data you get back. What would be useful in your DataFrame and what can be excluded? Remember that Gans wants to know when people are arriving in the city.
4. Make a DataFrame from the information you consider important.
5. Condense everything you did above into a function that takes a list of ICAO codes as input and returns a DataFrame with the information for *tomorrow's arrivals*.

In [124]:
# your code here

url = "https://aerodatabox.p.rapidapi.com/flights/airports/icao/EDDB/2024-12-03T20:00/2024-12-04T08:00"

querystring = {
    "withLeg": "true",
    "direction": "Arrival",
    "withCancelled": "false",
    "withCodeshared": "true",
    "withCargo": "false",
    "withPrivate": "false"
}

headers = {
	"x-rapidapi-key": safe_key["x_rapidapi_key"],
	"x-rapidapi-host": "aerodatabox.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)

flight = response.json()
flight

{'arrivals': [{'departure': {'airport': {'icao': 'EDDK',
     'iata': 'CGN',
     'name': 'Cologne',
     'timeZone': 'Europe/Berlin'},
    'scheduledTime': {'utc': '2024-12-03 17:50Z',
     'local': '2024-12-03 18:50+01:00'},
    'terminal': '1',
    'quality': ['Basic']},
   'arrival': {'scheduledTime': {'utc': '2024-12-03 19:00Z',
     'local': '2024-12-03 20:00+01:00'},
    'revisedTime': {'utc': '2024-12-03 19:00Z',
     'local': '2024-12-03 20:00+01:00'},
    'terminal': '1',
    'gate': 'B02',
    'baggageBelt': 'B2',
    'quality': ['Basic', 'Live']},
   'number': 'EW 14',
   'status': 'Expected',
   'codeshareStatus': 'IsOperator',
   'isCargo': False,
   'aircraft': {'reg': 'D-ABNI', 'modeS': '3C49C9', 'model': 'Airbus A320'},
   'airline': {'name': 'Eurowings', 'iata': 'EW', 'icao': 'EWG'}},
  {'departure': {'airport': {'icao': 'LEBL',
     'iata': 'BCN',
     'name': 'Barcelona',
     'timeZone': 'Europe/Madrid'},
    'scheduledTime': {'utc': '2024-12-03 16:20Z',
     'loca

Fields that look useful:
- departure airport icao
- arrival revisedTime local
- airline icao (this identifies the airline, not the arrival airport)
- number (the flight number)

In [105]:
flight["arrivals"]

[{'departure': {'airport': {'icao': 'EDDK',
    'iata': 'CGN',
    'name': 'Cologne',
    'timeZone': 'Europe/Berlin'},
   'scheduledTime': {'utc': '2024-12-03 17:50Z',
    'local': '2024-12-03 18:50+01:00'},
   'terminal': '1',
   'quality': ['Basic']},
  'arrival': {'scheduledTime': {'utc': '2024-12-03 19:00Z',
    'local': '2024-12-03 20:00+01:00'},
   'revisedTime': {'utc': '2024-12-03 19:00Z',
    'local': '2024-12-03 20:00+01:00'},
   'terminal': '1',
   'gate': 'B02',
   'baggageBelt': 'B2',
   'quality': ['Basic', 'Live']},
  'number': 'EW 14',
  'status': 'Expected',
  'codeshareStatus': 'IsOperator',
  'isCargo': False,
  'aircraft': {'reg': 'D-ABNI', 'modeS': '3C49C9', 'model': 'Airbus A320'},
  'airline': {'name': 'Eurowings', 'iata': 'EW', 'icao': 'EWG'}},
 {'departure': {'airport': {'icao': 'LEBL',
    'iata': 'BCN',
    'name': 'Barcelona',
    'timeZone': 'Europe/Madrid'},
   'scheduledTime': {'utc': '2024-12-03 16:20Z',
    'local': '2024-12-03 17:20+01:00'},
   'revis

In [104]:
len(flight["arrivals"])

68

In [110]:
flight["arrivals"][0]['departure']

{'airport': {'icao': 'EDDK',
  'iata': 'CGN',
  'name': 'Cologne',
  'timeZone': 'Europe/Berlin'},
 'scheduledTime': {'utc': '2024-12-03 17:50Z',
  'local': '2024-12-03 18:50+01:00'},
 'terminal': '1',
 'quality': ['Basic']}

In [91]:
#Departure_airport_icao
flight["arrivals"][0]['departure']['airport']

{'icao': 'EDDK', 'iata': 'CGN', 'name': 'Cologne', 'timeZone': 'Europe/Berlin'}

In [92]:

flight["arrivals"][0]['departure']['scheduledTime']

{'utc': '2024-12-03 17:50Z', 'local': '2024-12-03 18:50+01:00'}

In [117]:
# Arrival_time
flight["arrivals"][0]['arrival']['revisedTime']

{'utc': '2024-12-03 19:00Z', 'local': '2024-12-03 20:00+01:00'}

In [97]:
# Flight_number
flight["arrivals"][0]['number']

'EW 14'

In [136]:
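# Airline ICAO code (note: this is the airline, not the arrival airport;
# the arrival airport is the ICAO code we put in the request URL)
airline_icao = flight["arrivals"][0]["airline"].get("icao")
airline_icao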

'EWG'
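
Pulling these fields out one by one can be wrapped in a small helper. This is a sketch, not the final function: `extract_arrival` is a made-up name, the `sample` record is trimmed by hand from the response above, and falling back to `scheduledTime` when `revisedTime` is missing is an assumption about what some records may lack.

```python
def extract_arrival(item):
    """Pick the fields of interest out of one entry of flight["arrivals"]."""
    arrival = item.get("arrival", {})
    # Assumption: prefer the revised time, fall back to the scheduled one
    time_info = arrival.get("revisedTime") or arrival.get("scheduledTime") or {}
    return {
        "Departure_airport_icao": item.get("departure", {}).get("airport", {}).get("icao"),
        "Flight_number": item.get("number"),
        "Arrival_time": time_info.get("local"),
    }


# Trimmed by hand from the first arrival shown above
sample = {
    "departure": {"airport": {"icao": "EDDK", "iata": "CGN"}},
    "arrival": {
        "scheduledTime": {"utc": "2024-12-03 19:00Z", "local": "2024-12-03 20:00+01:00"},
        "revisedTime": {"utc": "2024-12-03 19:00Z", "local": "2024-12-03 20:00+01:00"},
    },
    "number": "EW 14",
}

row = extract_arrival(sample)
print(row)
```

Chaining `.get()` with an empty dictionary as the default means a record with a missing key produces `None` in that field instead of raising a `KeyError`.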

In [131]:
print(datetime.now().date())

2024-12-03


In [186]:
def get_timezone_by_icao(icao_code):
    try:
        url = f"https://aerodatabox.p.rapidapi.com/airports/icao/{icao_code}"
        
        headers = {
            "x-rapidapi-key": safe_key["x_rapidapi_key"],
            "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
        }

        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            data = response.json()
            timezone_str = data.get("timeZone", "UTC")  # Default to UTC if not found
            return timezone_str
        else:
            print(f"Failed to fetch timezone for ICAO: {icao_code}")
            return "UTC"
    except Exception as e:
        print(f"Error in timezone lookup for ICAO: {icao_code}: {e}")
        return "UTC"
    
icao_code = "VAPO"
timezone_str = get_timezone_by_icao(icao_code) or "UTC"
city_timezone = timezone(timezone_str)
today = datetime.now(city_timezone).date()
tomorrow = (today + timedelta(days=1))

print(today)
print(tomorrow)

2024-12-03
2024-12-04
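
Why the timezone matters: "tomorrow" depends on where the airport is. The sketch below uses only the standard library and a fixed UTC+05:30 offset as a stand-in for `Asia/Kolkata` (an assumption made to keep the example self-contained; the notebook itself uses `pytz`). It also splits tomorrow into the two half-day windows that the flight functions further down request. The helper name `tomorrow_windows` is hypothetical.

```python
from datetime import datetime, timedelta, timezone as dt_timezone


def tomorrow_windows(now):
    """Return two (start, end) strings covering tomorrow, relative to `now`."""
    tomorrow = now.date() + timedelta(days=1)
    # Two half-day windows that together cover the whole day
    return [
        (f"{tomorrow}T00:00", f"{tomorrow}T11:59"),
        (f"{tomorrow}T12:00", f"{tomorrow}T23:59"),
    ]


# Fixed UTC+05:30 offset standing in for Asia/Kolkata (illustration only)
kolkata = dt_timezone(timedelta(hours=5, minutes=30))
# 20:00 UTC on 3 December is already 01:30 on 4 December in Pune
now_in_pune = datetime(2024, 12, 3, 20, 0, tzinfo=dt_timezone.utc).astimezone(kolkata)

windows = tomorrow_windows(now_in_pune)
print(windows)
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the helper, keeps the result reproducible.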


In [177]:
def FlightData_1(icao_code):

    flight_data = []

    for code in icao_code:

        # Look up the airport's timezone from its ICAO code via the API
        timezone_str = get_timezone_by_icao(code) or "UTC"
        city_timezone = timezone(timezone_str)
        # get today's date in that timezone
        today = datetime.now(city_timezone).date()
        # get tomorrow's date
        tomorrow = (today + timedelta(days=1))


        # the API accepts a time range of at most 12 hours per call, so two calls cover a full day
        # using the nested list below we make a morning call and extract the data,
        # then make an afternoon call and extract the data
        times = [["00:00","11:59"],
                ["12:00","23:59"]]

        for time in times:

            url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{code}/{tomorrow}T{time[0]}/{tomorrow}T{time[1]}"

            querystring = {
                            "withLeg":"true",
                            "direction":"Arrival",
                            "withCancelled":"false",
                            "withCodeshared":"true",
                            "withCargo":"false",
                            "withPrivate":"false"
                        }

            headers = {
                "x-rapidapi-key": safe_key["x_rapidapi_key"],
                "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
                }

            response = requests.get(url, headers=headers, params=querystring)

            flight = response.json()

            retrieval_time = datetime.now(city_timezone).strftime("%Y-%m-%d %H:%M:%S")

        
        
            for item in flight["arrivals"]:
                data = {
                    "Arrival_airport_icao" : code,
                    "Departure_airport_icao" : item["departure"]["airport"].get("icao", None),
                    "Flight_number" : item.get("number", None),
                    "Arrival_time" : item["arrival"]["scheduledTime"].get("local", None),
                    "Data_retrieved_time": retrieval_time
                    }
                
                flight_data.append(data)
    
    flight_df = pd.DataFrame(flight_data)
    flight_df["Arrival_time"] = flight_df["Arrival_time"].str[:-6]
    flight_df["Arrival_time"] = pd.to_datetime(flight_df["Arrival_time"])
    flight_df["Data_retrieved_time"] = pd.to_datetime(flight_df["Data_retrieved_time"])
    
    return flight_df

icao_code = ["EDDB", "EDDF"]
FlightData_1(icao_code)


Unnamed: 0,Arrival_airport_icao,Departure_airport_icao,Flight_number,Arrival_time,Data_retrieved_time
0,EDDB,BIKF,FI 518,2024-12-04 06:15:00,2024-12-03 16:51:01
1,EDDB,ZBAA,HU 489,2024-12-04 06:40:00,2024-12-03 16:51:01
2,EDDB,OTHH,QR 79,2024-12-04 06:55:00,2024-12-03 16:51:01
3,EDDB,EDDS,EW 8001,2024-12-04 07:35:00,2024-12-03 16:51:01
4,EDDB,EDDK,EW 2,2024-12-04 07:40:00,2024-12-03 16:51:01
...,...,...,...,...,...
2169,EDDF,LPPT,LH 6957,2024-12-04 22:40:00,2024-12-03 16:51:05
2170,EDDF,LPPT,AC 2660,2024-12-04 22:40:00,2024-12-03 16:51:05
2171,EDDF,LPPT,TP 574,2024-12-04 22:40:00,2024-12-03 16:51:05
2172,EDDF,LPPT,S4 8762,2024-12-04 22:40:00,2024-12-03 16:51:05
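
The `.str[:-6]` step in the function drops the UTC offset (for example `+01:00`) from the local time string before parsing. Here is a small offline sketch with two hand-written values in the same format as the API output:

```python
import pandas as pd

# Two local arrival times in the format returned by the API
local_times = pd.Series(["2024-12-04 06:15+01:00", "2024-12-04 23:15+05:30"])

# Cut the last six characters (the "+HH:MM" offset), then parse what remains
naive_times = pd.to_datetime(local_times.str[:-6])
print(naive_times)
```

The result is a timezone-naive local time. That is fine as long as each row is read together with its arrival airport, but the offset is lost, so times from airports in different timezones are not directly comparable.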


In [190]:
def FlightData(icao_airport_names_df):
    flight_data = []

    for code_row in icao_airport_names_df.itertuples(index=False):
        code = code_row.icao

        # Find the city timezone based on icao code API
        timezone_str = get_timezone_by_icao(code) or "UTC"
        city_timezone = timezone(timezone_str)

        # Get today and tomorrow's dates
        today = datetime.now(city_timezone).date()
        tomorrow = today + timedelta(days=1)

        # API requires two 12-hour calls
        times = [["00:00", "11:59"], ["12:00", "23:59"]]

        for time in times:
            url = f"https://aerodatabox.p.rapidapi.com/flights/airports/icao/{code}/{tomorrow}T{time[0]}/{tomorrow}T{time[1]}"
            querystring = {
                "withLeg": "true",
                "direction": "Arrival",
                "withCancelled": "false",
                "withCodeshared": "true",
                "withCargo": "false",
                "withPrivate": "false"
            }
            headers = {
                "x-rapidapi-key": safe_key["x_rapidapi_key"],
                "x-rapidapi-host": "aerodatabox.p.rapidapi.com"
            }

            # API request
            response = requests.get(url, headers=headers, params=querystring)
            
            if response.status_code != 200:
                print(f"Error fetching data for ICAO: {code}, Status Code: {response.status_code}")
                continue

            try:
                flight = response.json()
            except requests.exceptions.JSONDecodeError:
                print(f"Invalid JSON response for ICAO: {code}")
                continue

            # Data retrieval timestamp
            retrieval_time = datetime.now(city_timezone).strftime("%Y-%m-%d %H:%M:%S")

            # Parse arrivals
            for item in flight.get("arrivals", []):
                arrival_time = item["arrival"]["scheduledTime"].get("local", None)
                if not arrival_time:
                    continue

                data = {
                    "Arrival_airport_icao": code,
                    "Departure_airport_icao": item["departure"]["airport"].get("icao", None),
                    "Flight_number": item.get("number", None),
                    "Arrival_time": arrival_time,
                    "Data_retrieved_time": retrieval_time
                }
                flight_data.append(data)

    # Create DataFrame
    flight_df = pd.DataFrame(flight_data)
    if not flight_df.empty:
        flight_df["Arrival_time"] = pd.to_datetime(flight_df["Arrival_time"].str[:-6])
        flight_df["Data_retrieved_time"] = pd.to_datetime(flight_df["Data_retrieved_time"])
    return flight_df

# Example usage
#icao_airport_codes_df = pd.DataFrame({"icao": ["EDDB", "EDDF"]})
FlightData(icao_airport_names_df)



Error fetching data for ICAO: LFPB, Status Code: 204
Error fetching data for ICAO: LFPB, Status Code: 204


Unnamed: 0,Arrival_airport_icao,Departure_airport_icao,Flight_number,Arrival_time,Data_retrieved_time
0,EDDB,BIKF,FI 518,2024-12-04 06:15:00,2024-12-03 17:11:12
1,EDDB,ZBAA,HU 489,2024-12-04 06:40:00,2024-12-03 17:11:12
2,EDDB,OTHH,QR 79,2024-12-04 06:55:00,2024-12-03 17:11:12
3,EDDB,EDDS,EW 8001,2024-12-04 07:35:00,2024-12-03 17:11:12
4,EDDB,EDDK,EW 2,2024-12-04 07:40:00,2024-12-03 17:11:12
...,...,...,...,...,...
1088,VAPO,VIDP,6E 5342,2024-12-04 23:15:00,2024-12-03 21:41:26
1089,VAPO,VIJP,6E 6116,2024-12-04 23:20:00,2024-12-03 21:41:26
1090,VAPO,VILK,6E 118,2024-12-04 23:30:00,2024-12-03 21:41:26
1091,VAPO,VECC,6E 476,2024-12-04 23:35:00,2024-12-03 21:41:26
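
The two `Status Code: 204` messages for LFPB are not really failures: HTTP 204 means "No Content", i.e. the request was accepted but there was nothing to return for that time window. One possible way to separate that case from genuine errors is sketched below; `classify_status` is a hypothetical helper, not part of `requests` or the API.

```python
def classify_status(status_code):
    """Map an HTTP status code to a short label for logging."""
    if status_code == 200:
        return "ok"
    if status_code == 204:
        # No Content: the call worked, but no flights were returned
        return "no_data"
    return "error"


for status in (200, 204, 429):
    print(status, classify_status(status))
```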
