# ETL EII: Exercises

## 1. The airports data

To solve this exercise, you will retrieve airport data from two APIs. The first API returns a list of airports, from which you will keep 50 chosen at random; the second API provides detailed information about each airport.

The objective of this exercise is twofold. First, you will make a basic authenticated POST request to the application. Second, you will learn how to read and apply the documentation of an API.


### Part 1: 1st API

To make these calls, you will need to authenticate your requests.

The endpoint to use is https://api.oan.one/assembler/etl/ej1

To make the calls, you will need to include the `oanToken` header with the token value `ZF7xNEuLAZ5DKLQAEGVUq6VquGLQdsL7`.
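
A minimal sketch of the authenticated call, written with the standard library only so that it runs without extra installs. It assumes that the endpoint accepts an empty POST body and answers with JSON; neither is confirmed above. The names `build_request` and `fetch_airports` are illustrative.

```python
import json
import urllib.request

URL = "https://api.oan.one/assembler/etl/ej1"
TOKEN = "ZF7xNEuLAZ5DKLQAEGVUq6VquGLQdsL7"


def build_request(url: str = URL, token: str = TOKEN) -> urllib.request.Request:
    """Prepare a POST request that carries the token in the oanToken header."""
    # Assumption: the API accepts an empty body; adjust `data` if it does not.
    # urllib stores the header name as "Oantoken"; HTTP header names are
    # case-insensitive, so the server should treat it like "oanToken".
    return urllib.request.Request(
        url, data=b"", headers={"oanToken": token}, method="POST"
    )


def fetch_airports() -> object:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_airports()` performs the actual request. With the `requests` library, the equivalent call is `requests.post(URL, headers={"oanToken": TOKEN})`.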


**Reminder:** In the digital world, a token can be thought of as a digital "key" that allows access to specific services or resources within a system, for example an API. It serves as a form of authentication, ensuring that only authorized individuals can access protected information or perform certain actions. Think of it as a special passcode that you use to unlock specific features or gain entry to restricted areas.


**TIP**: Did you try entering the URL in a browser? Before executing the Python code, you can practice making the POST requests in Postman with the corresponding `oanToken`. Then proceed to create the Python script.


Once you make the request, you will receive a JSON object containing a list of airports. You will need to perform a search in part 2 for each airport in the list.


**NOTE**: The API will provide you with the IATA codes of the airports.


From the list, randomly select 50 IATA codes of the airports.
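
The random selection can be sketched as follows, assuming the response has already been reduced to a flat list of IATA code strings (the actual JSON layout is not described here). `pick_codes` is an illustrative name, and the optional `seed` is only there to make a run reproducible.

```python
import random


def pick_codes(codes, k=50, seed=None):
    """Return k distinct codes chosen at random, without replacement."""
    unique = sorted(set(codes))  # drop duplicates so no airport is picked twice
    if len(unique) < k:
        raise ValueError(f"need at least {k} distinct codes, got {len(unique)}")
    return random.Random(seed).sample(unique, k)
```

Sampling without replacement guarantees 50 different airports, which repeated calls to `random.choice` would not.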


### Part 2: 2nd API

To proceed with the exercise, you will need to create an account at https://www.air-port-codes.com/ and obtain the API keys. Once you have the API keys, you can use Postman to explore the available endpoints and determine which one best suits your needs for this exercise.

Create a dataframe containing the following fields for your 50 random airports:

- ID (IATA code): str
- Name: str
- Latitude: float (rounded to 2 decimal places)
- Longitude: float (rounded to 2 decimal places)
- Country: str
- City: str
- Continent: str


**TIP**: To learn how to use the API, refer to the documentation on the website and try out the endpoints using Postman.
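
Once you know what the second API returns, each response has to be mapped to the columns listed above. The sketch below shows one way to do that mapping. Every key read from the raw record (`iata`, `name`, `latitude`, and so on) is hypothetical, and the sample record is invented for illustration; check the real response in Postman and adapt the names. The IATA code is kept as a string because it is a letter code.

```python
def to_row(airport: dict) -> dict:
    """Map one raw airport record to the requested columns."""
    # All keys read from `airport` are hypothetical; replace them with the
    # names that the API actually returns.
    return {
        "ID": str(airport["iata"]),
        "Name": str(airport["name"]),
        "Latitude": round(float(airport["latitude"]), 2),
        "Longitude": round(float(airport["longitude"]), 2),
        "Country": str(airport["country"]),
        "City": str(airport["city"]),
        "Continent": str(airport["continent"]),
    }


# Invented record, only to show the conversion and the rounding.
sample = {
    "iata": "BCN",
    "name": "Barcelona-El Prat",
    "latitude": "41.2971",
    "longitude": "2.07846",
    "country": "Spain",
    "city": "Barcelona",
    "continent": "Europe",
}
row = to_row(sample)
```

A list of such rows can be passed directly to `pandas.DataFrame(rows)` to build the dataframe.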

In [None]:
# Type your code here:

## 2. The airlines data

### Part 1: scrape the web!

Get the full list of airlines contained in https://www.flightradar24.com/data/airlines. Save that data in a dataframe (with headers) containing the following features:


- Name (str): the name corresponding to the airline.


- IATA code (str): the airline's 2-character IATA designator, for example `IB`.


- ICAO code (str): the airline's 3-letter ICAO designator, for example `IBE`; None if not available.


- Fleet (int): number of aircraft.


- Image URL (str): a working URL that points to the airline's logo image.


- Timestamp: the precise moment when the information was captured, in the format YYYY-MM-DD hh:mm:ss.



**Notes:** you must not transform the data obtained from the request. The only allowed modifications are minor changes, such as converting data types or removing the word "aircraft" from the fleet field.
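
The two permitted conversions can be sketched with the standard library. This assumes that the fleet field looks like `"25 aircraft"` and that the timestamp is meant as year-month-day followed by a 24-hour time. `parse_fleet` and `capture_timestamp` are illustrative names.

```python
import re
from datetime import datetime


def parse_fleet(text):
    """Turn a fleet label such as '25 aircraft' into the integer 25."""
    match = re.search(r"\d+", text or "")
    return int(match.group()) if match else None


def capture_timestamp(moment=None):
    """Format a capture time as YYYY-MM-DD hh:mm:ss (current time by default)."""
    return (moment or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
```

Taking the timestamp once, right after the request, and reusing it for every row keeps the column consistent across the dataframe.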



### Part 2: store

Download all the airline logos and save them in a folder.
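
One possible shape for the download step, using only the standard library. The folder name `logos`, the `.png` fallback extension, and both function names are assumptions made for this sketch; the site may also require extra request headers, which are not handled here.

```python
import re
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def logo_filename(airline_name: str, url: str) -> str:
    """Build a filesystem-safe file name, keeping the extension from the URL."""
    stem = re.sub(r"[^A-Za-z0-9]+", "_", airline_name).strip("_") or "logo"
    suffix = Path(urlparse(url).path).suffix or ".png"  # assumed default
    return stem + suffix


def download_logo(airline_name: str, url: str, folder: str = "logos") -> Path:
    """Download one logo into `folder` and return the path that was written."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / logo_filename(airline_name, url)
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Looping over the `Name` and `Image URL` columns of the dataframe and calling `download_logo` for each pair fills the folder; rows without a logo URL should be skipped.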

In [None]:
# Type your code here:

## 3. Cat-Fact Data

<img src="https://www.padoniavets.com/sites/default/files/field/image/cats-and-dogs.jpg">

Let's get information about animals! 🐶🐱 To do this, we'll use the API of the website https://cat-fact.herokuapp.com. Its Developer API section documents the different endpoints: https://alexwohlbruck.github.io/cat-facts/docs/. With this API, you should be able to complete the following tasks:
+ Research the different API endpoints
+ Get 50 random facts about cats and 50 about dogs (we'll only use the verified facts)
+ Save that information in a dataframe that has the following columns:
    + Document ID
    + Text recovered from the document.
    + Type of animal
    + Date of last update of the document.
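
The filtering and column mapping could look like the sketch below. The field names (`_id`, `text`, `type`, `updatedAt`, and `status.verified`) are assumptions about the response format and should be checked against the API documentation; the sample facts are invented.

```python
def verified_rows(facts):
    """Keep verified facts only and map them to the four requested columns."""
    rows = []
    for fact in facts:
        # Assumption: verification is exposed as fact["status"]["verified"].
        if not (fact.get("status") or {}).get("verified"):
            continue
        rows.append({
            "id": fact.get("_id"),
            "text": fact.get("text"),
            "animal": fact.get("type"),
            "updated_at": fact.get("updatedAt"),
        })
    return rows


# Invented records, only to show the filter at work.
sample_facts = [
    {"_id": "a1", "text": "Fact one.", "type": "cat",
     "updatedAt": "2020-01-01T00:00:00Z", "status": {"verified": True}},
    {"_id": "b2", "text": "Fact two.", "type": "dog",
     "updatedAt": "2020-01-02T00:00:00Z", "status": {"verified": False}},
    {"_id": "c3", "text": "Fact three.", "type": "dog",
     "updatedAt": "2020-01-03T00:00:00Z", "status": {"verified": None}},
]
rows = verified_rows(sample_facts)
```

Because unverified facts are discarded, you may have to request more than 50 facts per animal, and repeat the request, until 50 verified ones have been collected.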

In [None]:
# Type your code here:

## 4. Airlines Routes

In this exercise, you will extract the different airline codes from a database and then utilize the Amadeus API to retrieve all the routes operated by each airline. You will store all airline destinations in a single dataframe with the following structure:

- Name: City name
- cityCode: IATA location code
- airlineCode: IATA airline code
- timeZone: City's timezone (OPTIONAL)

**Notes:**

1 - To use Amadeus, you need to create and register a developer account, which is free of charge. You can do so by visiting the following link: https://developers.amadeus.com/


2 - The MySQL database connection details are as follows:
   - Server: iosqlde.onairnet.xyz
   - User: assemblerReader
   - Password: !Reader2022
   - Database: sqlCourse
   - Table: rawFlights

3 - The table used for this exercise contains more information than necessary. Your task is to obtain only the list of airline codes.
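
The query for note 3 can be sketched as a `SELECT DISTINCT`. The column name `airlineCode` is hypothetical, since the table schema is not given here. To keep the example runnable offline, an in-memory SQLite database with invented rows stands in for the MySQL server.

```python
import sqlite3

# Hypothetical column name; inspect rawFlights to find the real one.
AIRLINE_COLUMN = "airlineCode"
QUERY = (
    f"SELECT DISTINCT {AIRLINE_COLUMN} FROM rawFlights "
    f"WHERE {AIRLINE_COLUMN} IS NOT NULL"
)


def airline_codes(connection):
    """Return the sorted, de-duplicated airline codes from rawFlights."""
    cursor = connection.cursor()
    cursor.execute(QUERY)
    return sorted(row[0] for row in cursor.fetchall())


# Stand-in database with invented rows, only to make the sketch runnable.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE rawFlights (airlineCode TEXT, origin TEXT)")
demo.executemany(
    "INSERT INTO rawFlights VALUES (?, ?)",
    [("IB", "MAD"), ("VY", "BCN"), ("IB", "BCN"), (None, "LHR")],
)
codes = airline_codes(demo)
```

`airline_codes` only relies on the standard DB-API cursor interface, so it should also work with a connection opened by a MySQL driver (for example PyMySQL) using the connection details from note 2. Letting the database de-duplicate the codes avoids downloading the whole table.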


Please ensure that you follow the provided instructions and refer to the Amadeus documentation for guidance on utilizing their API effectively.

In [None]:
# Type your code here:

## 5. Star Wars Data

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Star_Wars_Logo.svg/1200px-Star_Wars_Logo.svg.png" >

For this exercise, we're going to use an API about Star Wars: https://swapi.dev/. The API is free, so you won't need an API key (feel free to explore its different endpoints).

To check the connection status of a GET request and the results of each query, you can use Postman. The site https://swapi.dev/ also includes an interface for trying out your queries.

For example, if you run the following code:

```python
import requests

# Using the get method 
req_js = requests.get("https://swapi.dev/api/").json()
```

You'll get the following results:

```python
print(req_js)

# Results
{'people': 'https://swapi.dev/api/people/',
 'planets': 'https://swapi.dev/api/planets/',
 'films': 'https://swapi.dev/api/films/',
 'species': 'https://swapi.dev/api/species/',
 'vehicles': 'https://swapi.dev/api/vehicles/',
 'starships': 'https://swapi.dev/api/starships/'}
```

As you can see, the API exposes the following endpoints:
+ People
+ Planets
+ Films
+ Species
+ Vehicles
+ Starships

Each endpoint returns data about a specific kind of item, for instance planets. To retrieve a single item, append its number to the endpoint, as in `planets/1`; this call returns data about the first planet, such as its name and gravity. For further information, see the API documentation at https://swapi.dev/documentation

With the Star Wars API, you need to complete the following tasks:
* Get a dataframe with all records about Star Wars people (only the key fields from name to gender).
* With the dataframe, show the mean mass by gender (note that the string `'n/a'` counts as a gender).
* Plot the relationship between the weight and height of the people.
* Plot the mean height for each eye color.
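
The people endpoint returns its records in pages, so the first task needs a loop that follows the `next` link of each response. The sketch below assumes each page has `results` and `next` fields; verify this in the documentation. `fetch_all`, `name_to_gender`, and the two fake pages are illustrative and let the example run without network access.

```python
def fetch_all(first_url, get_json):
    """Collect the `results` of every page by following the `next` links."""
    records, url = [], first_url
    while url:
        page = get_json(url)
        records.extend(page["results"])
        url = page.get("next")  # None on the last page, which ends the loop
    return records


def name_to_gender(person):
    """Keep the keys from `name` up to and including `gender`, in order."""
    keys = list(person)
    start, stop = keys.index("name"), keys.index("gender")
    return {key: person[key] for key in keys[start:stop + 1]}


# Two invented pages standing in for the live API, so the sketch runs offline.
fake_pages = {
    "page1": {"next": "page2", "results": [
        {"name": "A", "height": "172", "gender": "male", "homeworld": "x"}]},
    "page2": {"next": None, "results": [
        {"name": "B", "height": "150", "gender": "n/a", "homeworld": "y"}]},
}
people = [name_to_gender(p) for p in fetch_all("page1", fake_pages.__getitem__)]
```

Against the live API, the call would be `fetch_all("https://swapi.dev/api/people/", lambda url: requests.get(url).json())`. Numeric fields arrive as strings and can hold values such as `"unknown"`, so convert them (for example with `pandas.to_numeric(..., errors="coerce")`) before computing means or plotting.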

In [None]:
# Type your code here:

## 6. Non-Structured Problematic Data

<img src="https://i0.wp.com/dryviq.com/wp-content/uploads/2020/11/DryvIQ-Unstructured-VS-Structured-Data-Diagram-Light.png?resize=1024%2C512&ssl=1">

Let's work with more unstructured data. In this case, you will not call any API; the data is already stored in a file called `qa_Cell_Phones_and_Accessories.json`. The data contains conversations between an agent at a cell phone store and the store's clients.

Each row of the file contains a JSON object, so we have a collection of multiple JSON documents (the kind of data that is typically stored in non-relational databases such as MongoDB). All of this information is messy and in raw format; your mission is to take the data from the unstructured world (a collection of JSON documents) to a structured format (a pandas dataframe).

**Notes**: If you look at the data, you will see that all key : value pairs are written with single quotes, whereas JSON requires double quotes.

Some things you can try in order to read the data:
- Use the `open` function with `json.load` function
- Use the `open` function with `json.loads` function 
- Use the `open` function with `json.loads` function in each line of the file.
- Replace all single quotes `' '` with double quotes `" "` in each pair of key : value fields in each row of the file.
- Research about more solutions...

Some URLs that may help with the errors you will probably encounter:

+ https://stackoverflow.com/questions/39491420/python-jsonexpecting-property-name-enclosed-in-double-quotes
+ https://stackoverflow.com/questions/9156417/valid-json-giving-jsondecodeerror-expecting-delimiter
+ https://www.datasciencebyexample.com/2023/03/16/what-to-do-when-single-quotes-in-json-string/
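
As one of the "more solutions" to research: if each line is a Python-style dict literal, `ast.literal_eval` can parse it without touching the quotes. This is a sketch under that assumption, not necessarily the intended answer. The sample lines and their keys are invented; the first one shows why a blind replacement of `'` with `"` breaks on apostrophes.

```python
import ast


def parse_lines(lines):
    """Parse one Python-style dict literal per line, skipping blank lines."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(ast.literal_eval(line))
        except (ValueError, SyntaxError) as error:
            raise ValueError(f"line {number} is not a valid literal") from error
    return records


# Invented lines in the same single-quoted style as the file.
sample = [
    "{'question': 'Does it fit?', 'answer': \"It doesn't.\"}\n",
    "\n",
    "{'question': 'Color?', 'answer': 'Black'}\n",
]
records = parse_lines(sample)
```

With the real file, pass the open file handle instead of the sample list (`with open("qa_Cell_Phones_and_Accessories.json") as handle: records = parse_lines(handle)`) and then build the dataframe with `pandas.DataFrame(records)`. Unlike `eval`, `ast.literal_eval` only accepts literals, so it does not execute arbitrary code from the file.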

In [None]:
# Type your code here: