# ETL EII: Exercises

## 1. The airports data

To solve this exercise, you will need to retrieve airport data using two APIs. The first API provides a random list of airports (keep 50 of them, chosen at random), while the second API provides detailed information about each airport.

The objective of this exercise is twofold. First, you will make a basic authenticated POST request to the application. Second, you will learn how to read and apply the documentation of an API.


### Part 1: 1st API

To make these calls, you will need to authenticate your requests.

The endpoint to use is https://api.oan.one/assembler/etl/ej1

To make the calls, you will need to include the `oanToken` header with the token value `ZF7xNEuLAZ5DKLQAEGVUq6VquGLQdsL7`.


**Reminder:** In the digital world, a token can be thought of as a digital "key" that allows access to specific services or resources within a system, e.g., an API. It serves as a form of authentication, ensuring that only authorized individuals can access protected information or perform certain actions. Think of it as a special passcode that you use to unlock specific features or gain entry to restricted areas.


**TIP**: Did you try entering the URL in a browser? Before executing the Python code, you can practice making the POST requests in Postman with the corresponding `oanToken`. Then proceed to create the Python script.


Once you make the request, you will receive a JSON object containing a list of airports. You will need to perform a search in part 2 for each airport in the list.


**NOTE**: The API will provide you with the IATA codes of the airports.


From the list, randomly select 50 IATA codes of the airports.
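
As a starting point, here is a minimal sketch of Part 1 using only the standard library. It prepares the authenticated POST request with the `oanToken` header and draws 50 codes at random. The helper names (`build_request`, `fetch_json`, `sample_codes`) are made up for illustration, and sending an empty body is an assumption; the structure of the JSON you get back is not specified here, so inspect it before extracting the IATA codes.

```python
import json
import random
import urllib.request

URL = "https://api.oan.one/assembler/etl/ej1"
TOKEN = "ZF7xNEuLAZ5DKLQAEGVUq6VquGLQdsL7"


def build_request(url=URL, token=TOKEN):
    """Prepare an authenticated POST request (nothing is sent yet)."""
    # Assumption: the endpoint accepts an empty body.
    return urllib.request.Request(
        url, data=b"", headers={"oanToken": token}, method="POST"
    )


def fetch_json(request):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def sample_codes(codes, k=50, seed=None):
    """Pick k distinct codes at random (all of them if fewer than k exist)."""
    unique = sorted(set(codes))
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))
```

Calling `fetch_json(build_request())` performs the actual request. If you prefer the `requests` library, `requests.post(URL, headers={"oanToken": TOKEN})` should be equivalent.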


### Part 2: 2nd API

To proceed with the exercise, you will need to create an account at https://www.air-port-codes.com/ and obtain the API keys. Once you have the API keys, you can use Postman to explore the available endpoints and determine which one best suits your needs for this exercise.

Create a dataframe containing the following fields for your 50 random airports:

- ID (IATA code): str
- Name: str
- Latitude: float (round to 2 digits)
- Longitude: float (round to 2 digits)
- Country: str
- City: str
- Continent: str


**TIP**: To learn how to use the API, refer to the documentation on the website and try out the endpoints using Postman.
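
Once you know which fields the second API returns, each airport can be reduced to one row with the columns listed above. The sketch below only shows the shaping and rounding step; `airport_row` is a hypothetical helper, and mapping the API's actual field names onto its arguments is left to you, since those names come from the documentation. The sample values in the comment are purely illustrative.

```python
def airport_row(iata, name, latitude, longitude, country, city, continent):
    """Build one dataframe row; coordinates are rounded to 2 decimals."""
    return {
        "ID": iata,
        "Name": name,
        "Latitude": round(float(latitude), 2),
        "Longitude": round(float(longitude), 2),
        "Country": country,
        "City": city,
        "Continent": continent,
    }


# Illustrative call with made-up input values:
# airport_row("MAD", "Madrid Barajas", "40.471926", "-3.56264",
#             "Spain", "Madrid", "Europe")
```

A list of such rows can be passed directly to `pandas.DataFrame(rows)`.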

In [None]:
# Type your code here:

## 2. The airlines data

### Part 1: scrape the web!

Get the full list of airlines contained in https://www.flightradar24.com/data/airlines. Save that data in a dataframe (with headers) containing the following features:


- Name (str): the name corresponding to the airline.


- IATA code (str): the 2-character airline designator assigned by IATA.


- ICAO code (str): the 3-letter airline designator assigned by ICAO (OACI in Spanish); None if not available.


- Fleet (int): number of airplanes.


- Image URL (str): a working URL that displays the company logo as an image.


- Timestamp: precise moment when the information is captured, in the format YYYY-MM-DD hh:mm:ss.



**Note:** You must not transform the data obtained from the request. The only allowed modifications are minor changes, such as converting data types or removing the word "aircraft" from the fleet field.
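
The two allowed clean-ups can be sketched as follows. This assumes the fleet text looks like `"12 aircraft"`, as the note suggests; `parse_fleet` and `capture_timestamp` are hypothetical helper names, not part of any library.

```python
import re
from datetime import datetime


def parse_fleet(text):
    """Turn '12 aircraft' into 12; return None when no number is present."""
    # Drop thousands separators before looking for the digits.
    match = re.search(r"\d+", (text or "").replace(",", ""))
    return int(match.group()) if match else None


def capture_timestamp(moment=None):
    """Format the capture time as YYYY-MM-DD hh:mm:ss (defaults to now)."""
    return (moment or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
```

Call `capture_timestamp()` right after the request so the value reflects when the data was actually captured.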



### Part 2: store

Download and save all the airline logos in a folder.
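
One possible way to do this with the standard library is sketched below. The folder name `logos` and the helper names are assumptions; the file name is simply taken from the last segment of each image URL. Some sites reject requests that lack a browser-like `User-Agent`, so you may need to add that header.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def logo_path(url, folder="logos"):
    """Derive the local file path from the last segment of the URL path."""
    name = Path(urlparse(url).path).name or "logo"
    return Path(folder) / name


def download_logo(url, folder="logos"):
    """Download one image and save it inside the folder; return its path."""
    target = logo_path(url, folder)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Looping over the Image URL column and calling `download_logo` on each value fills the folder.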

In [None]:
# Type your code here:

## 3. Live flights: intercept an API and get some information 

For this exercise, you will need to find the optimal approach, and it may require some research to complete the tasks.

To access a map showing currently active flights in Central Europe, you can visit https://www.flightradar24.com/multiview/49.05,10.82/7. Alternatively, you can choose any other location on the planet.

**Warning:** DO NOT TRY OUT the new Flightradar24 site in beta. Click on "Stay on current site".

Please be aware that data scraping is not a standardized process. It involves extracting information from websites in a manner that may not align with their intended use. As a result, there is typically no official documentation to guide us, and we may need to make "guesses" about the data structure and content.

The objective of this exercise is to extract all the data of the currently active flights on the screen, such as origin and destination, altitude and speed, airline, aircraft type, etc., and store this data in a dataframe.

**Note**: You should not transform the data in any way; keep each value exactly as it appears in the response.

**Some more warnings:**

- You will need to find the URL corresponding to the active flights displayed on your screen by inspecting the web page. It looks like this: `https://data-cloud.flightradar24.com/zones/fcgi/feed.js?...QUERY_PARAMS...`, where the query parameters (everything after the question mark, `?`) depend on the GET request your browser makes to Flightradar24 to show the information displayed on your screen.

- You don't need the full experience, an unlimited session, or any Flightradar24 subscription for this exercise. We will inspect the webpage and kindly borrow some information. The URL you are looking for carries the request for the flights shown on your screen, but only on the regular Flightradar24 webpage. The beta version of the page makes this request via POST, which is much less straightforward. I don't recommend it.

- Compare what is displayed on your screen (the airplanes and their information) with the corresponding information in the response. The information must match.

- As a suggestion, I recommend you zoom in on an area with low air traffic to understand how the responses you see when inspecting the page correspond to the movement of the airplanes you can see on the screen. 

Here is an example:

<p align="center">
<img width=1400 src="Images/flightradar.png">
</p>



<br>

**Lifejacket:** how does the page in front of you work?

Flightradar24 provides real-time information about flights around the world through its web platform. When you access the website, you make specific queries through the graphical interface and receive up-to-date flight data.

When you interact with the Flightradar24 platform, your web browser sends HTTP requests to the Flightradar24 server over the network connection. These requests are generally GET requests and contain the specific URL for the information you want, along with any other required parameters (whatever follows the question mark, `?`, in the URL). This is the URL we are looking for in order to make our request from Python.

The Flightradar24 server processes these requests and retrieves the corresponding data from its database or from other real-time sources, such as the information transmitted by the aircraft. Once the server has gathered the requested data, it packages it into an HTTP response and sends it back to the browser.

The browser interprets the server's response and displays the data on your screen, letting you see up-to-date flight information. This exchange between browser and server happens continuously, which is what allows the information to update in real time as new data arrives.

Keep in mind that this flow of information and GET requests happens in the background, so you won't necessarily see the individual requests and responses on your screen. However, you can inspect the page and watch the requests being made as the flight data updates. They appear under the name `feed.js?faa=...`. Because new requests are made every few seconds, new entries named `feed.js?faa=...` will keep appearing in your inspector panel, tracking the movement of the airplanes on your screen.

You can copy the URL of these requests: they return the information we want to store in the dataframe!
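
Replaying that request from Python could look like the sketch below. It rests on a guess about the response that you should verify in your inspector: that the body is a JSON object in which each flight is a key mapped to a list of values, next to a few scalar metadata entries. `fetch_feed` and `flights_from_feed` are hypothetical helpers, `feed_url` is the URL you copied, and the browser-like `User-Agent` header is an assumption in case the default one is rejected.

```python
import json
import urllib.request


def fetch_feed(feed_url):
    """GET the copied feed.js URL and decode the JSON response."""
    request = urllib.request.Request(
        feed_url, headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def flights_from_feed(payload):
    """Return one row per flight, keeping every value untouched.

    Assumption: flights are the entries whose value is a list; entries
    with scalar values are treated as metadata and skipped.
    """
    rows = []
    for key, value in payload.items():
        if isinstance(value, list):
            rows.append([key, *value])
    return rows
```

The rows can then be loaded with `pandas.DataFrame(rows)`; naming the columns is part of the "guessing" described above, done by comparing the values with what you see on the map.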

<br>

**Is the task clear?**

To summarize, the objective is to save all the flights of the planes visible on the screen in your browser into a dataframe. 

A possible output, with the identified fields, looks like this:

<p align="center">
<img width=1400 src="Images/Output.png">
</p>


In [None]:
# Type your code here:

## 4. Airlines Routes

In this exercise, you will extract the different airline codes from a database and then utilize the Amadeus API to retrieve all the routes operated by each airline. You will store all airline destinations in a single dataframe with the following structure:

- Name: City name
- cityCode: IATA location code
- airlineCode: IATA airline code
- timeZone: City's timezone (OPTIONAL)

**Notes:**

1 - To use Amadeus, you need to create and register a developer account, which is free of charge. You can do so by visiting the following link: https://developers.amadeus.com/


2 - The MySQL database connection details are as follows:
   - Server: iosqlde.onairnet.xyz
   - User: assemblerReader
   - Password: !Reader2022
   - Database: sqlCourse
   - Table: rawFlights

3 - The table used for this exercise contains more information than necessary. Your task is to obtain only the list of airline codes.
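
Getting only the airline codes boils down to a `SELECT DISTINCT` query. The sketch below builds the query and tidies the result; the column name `airline` is a placeholder assumption (check the real column names in `rawFlights` first), and both helper names are hypothetical.

```python
def distinct_codes_query(table="rawFlights", column="airline"):
    """Build a query returning each non-null value of the column once."""
    # Assumption: 'airline' is a placeholder; use the real column name.
    return (
        f"SELECT DISTINCT `{column}` FROM `{table}` "
        f"WHERE `{column}` IS NOT NULL"
    )


def unique_codes(rows):
    """Flatten cursor rows such as [('IB',), ('VY',)] into a sorted list."""
    codes = {str(row[0]).strip() for row in rows if row and row[0]}
    return sorted(codes)
```

With any DB-API connector (for example PyMySQL or mysql-connector-python), run `cursor.execute(distinct_codes_query(column=...))` and pass `cursor.fetchall()` to `unique_codes`.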


Please ensure that you follow the provided instructions and refer to the Amadeus documentation for guidance on utilizing their API effectively.
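
Whatever endpoint you end up using, each airline's response has to be flattened into the structure described above. The sketch below assumes the response is a JSON object with a `data` list whose items carry `name` and `iataCode` keys, plus an optional `timeZone` key; treat these key names as assumptions and confirm them in the Amadeus documentation. `destination_rows` is a hypothetical helper.

```python
def destination_rows(airline_code, payload):
    """Flatten one airline's destinations into dataframe rows."""
    rows = []
    for item in payload.get("data", []):
        rows.append(
            {
                "Name": item.get("name"),
                "cityCode": item.get("iataCode"),  # assumed key name
                "airlineCode": airline_code,
                "timeZone": item.get("timeZone"),  # assumed key; None if absent
            }
        )
    return rows
```

Extending a single list with `destination_rows(code, payload)` for every airline code and then calling `pandas.DataFrame(all_rows)` yields the single dataframe the exercise asks for.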

In [None]:
# Type your code here: