# ETL II: Exercises

## 1. The flights data

In this exercise, you will call an API to obtain flight data and store that data in an SQL database with the fields and formats specified below.



The destination table should have the following structure:

- id: A unique identifier for each flight (composed of the date in the format YYYYMMDD + flight number)
- flightNumber: The flight number in the format AA0000 (can contain null values)
- flightDate: The date of the flight (DATE format)
- flightTime: The departure time of the flight in HH:MM format
- arrivalTime: The arrival time of the flight in HH:MM format
- aircraftType: The aircraft model in XXXX format
- aircraftPlate: The aircraft plate number in XXXXXX format (can contain null values)
- origin: The airport code of departure in XX format
- destination: The airport code of arrival in XX format
- status: The flight status in text format
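
As a starting point, the structure above could be sketched as follows. This is only one possible layout, not the expected solution: it assumes SQLite as the target database (the exercise does not name one), the table name `flights` and the helper `build_flight_id` are hypothetical, and the example row is made up rather than taken from the API.

```python
import sqlite3
from datetime import date

# Assumption: SQLite stands in for the SQL database. SQLite has no native
# DATE type, so flightDate is stored as ISO text (YYYY-MM-DD).
CREATE_FLIGHTS = """
CREATE TABLE IF NOT EXISTS flights (
    id            TEXT PRIMARY KEY,
    flightNumber  TEXT,
    flightDate    DATE NOT NULL,
    flightTime    TEXT,
    arrivalTime   TEXT,
    aircraftType  TEXT,
    aircraftPlate TEXT,
    origin        TEXT,
    destination   TEXT,
    status        TEXT
)
"""

INSERT_FLIGHT = """
INSERT INTO flights VALUES (
    :id, :flightNumber, :flightDate, :flightTime, :arrivalTime,
    :aircraftType, :aircraftPlate, :origin, :destination, :status
)
"""


def build_flight_id(flight_date: date, flight_number: str) -> str:
    """Concatenate the date as YYYYMMDD with the flight number."""
    return f"{flight_date:%Y%m%d}{flight_number}"


conn = sqlite3.connect(":memory:")
conn.execute(CREATE_FLIGHTS)

# Made-up example row, only to exercise the schema.
flight_date = date(2024, 1, 15)
example = {
    "id": build_flight_id(flight_date, "AA0001"),
    "flightNumber": "AA0001",
    "flightDate": flight_date.isoformat(),
    "flightTime": "08:30",
    "arrivalTime": "10:45",
    "aircraftType": "A320",
    "aircraftPlate": None,  # nullable column
    "origin": "AA",
    "destination": "BB",
    "status": "scheduled",
}
conn.execute(INSERT_FLIGHT, example)
stored = conn.execute(
    "SELECT id, flightDate, aircraftPlate FROM flights"
).fetchone()
```

The sketch does not decide how to build `id` when `flightNumber` is null; that case is left for you to handle.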

**TIP**: Before running the Python code, you can practice making the requests in Postman.

**Notes:**
- API base: `https://api.oan.one/assembler/`
- API endpoint: `/etl/ej2`
- Token Header: `oanToken`
- Token: `ZF7xNEuLAZ5DKLQAEGVUq6VquGLQdsL7`
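
A minimal sketch of how the request could be assembled from the notes above. It uses only the standard library so it stays dependency-free; `requests.get(url, headers=...)` is the more common choice. Two things are assumptions here: that the base and endpoint are joined with a single slash, and that the response body is JSON. The helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.oan.one/assembler/"
ENDPOINT = "/etl/ej2"
TOKEN_HEADER = "oanToken"


def build_url(base: str, endpoint: str) -> str:
    """Join base and endpoint with exactly one slash (assumed to be the intended URL)."""
    return base.rstrip("/") + "/" + endpoint.lstrip("/")


def build_request(token: str) -> urllib.request.Request:
    """Create a GET request carrying the token in the oanToken header."""
    return urllib.request.Request(
        build_url(API_BASE, ENDPOINT),
        headers={TOKEN_HEADER: token},
    )


def fetch_flights(token: str):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(build_request(token), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_flights(...)` with the token from the notes performs the actual network call; inspect the returned structure before mapping it to the destination table.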

In [None]:
# Type your code here

## 2. Spain Cities and Villages

For this exercise, you need to create a database containing geographic data about Spain.

To accomplish this, you'll need to create three SQL tables containing the following information:

**State (comunidades autónomas)**
- id -> Sequential numeric value starting from 1001
- countryCode -> Always "ES" in our case
- name -> Name of the state

**Region (provincia)**
- id -> Sequential numeric value starting from 1001
- stateID -> ID of the corresponding state
- name -> Name of the region

**Cities**
- id -> Sequential numeric value starting from 1001
- regionID -> ID of the corresponding region
- name -> Name of the city
- residents -> Number of residents in each city

<br>

**Notes:**

- The data must be extracted from Wikipedia by performing web scraping on the page: https://es.wikipedia.org/wiki/Anexo:Municipios_de_Espa%C3%B1a_por_poblaci%C3%B3n

- Keep in mind that you'll need to clean and standardize the data formats.
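
The cleaning and ID assignment could look roughly like the sketch below. It assumes each scraped row has already been reduced to a `(city, residents, region, state)` tuple of strings, and that population figures may carry thousands separators or footnote markers; check both against the real page. The function names and the toy rows are hypothetical, not data from Wikipedia.

```python
import re


def parse_residents(text):
    """Turn a formatted figure such as '1.234.567' or '45 678[1]' into an int."""
    text = re.sub(r"\[[^\]]*\]", "", text)  # drop footnote markers like [1]
    digits = re.sub(r"\D", "", text)        # drop dots, spaces, non-breaking spaces
    return int(digits) if digits else None


def build_tables(rows, first_id=1001):
    """Split flat rows into state, region and city records with sequential ids."""
    state_ids, region_ids = {}, {}
    states, regions, cities = [], [], []
    for city_id, (city, residents, region, state) in enumerate(rows, start=first_id):
        if state not in state_ids:
            state_ids[state] = first_id + len(state_ids)
            states.append({"id": state_ids[state], "countryCode": "ES", "name": state})
        if region not in region_ids:
            region_ids[region] = first_id + len(region_ids)
            regions.append(
                {"id": region_ids[region], "stateID": state_ids[state], "name": region}
            )
        cities.append(
            {
                "id": city_id,
                "regionID": region_ids[region],
                "name": city,
                "residents": parse_residents(residents),
            }
        )
    return states, regions, cities


# Toy rows with made-up names and figures.
rows = [
    ("City A", "1.234.567", "Region X", "State P"),
    ("City B", "45 678[1]", "Region X", "State P"),
    ("City C", "9.012", "Region Y", "State Q"),
]
states, regions, cities = build_tables(rows)
```

Each of the three lists maps directly onto one of the SQL tables, so they can be inserted row by row or loaded into a DataFrame first.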

In [None]:
# Type your code here

## 3. Web Scraping - Ibex 35

<img src = "https://media.istockphoto.com/id/1317587887/es/foto/gr%C3%A1ficos-de-trading-en-una-pantalla.jpg?s=612x612&w=0&k=20&c=LyckBY4eyaJIm-y9pUVfA5THPwnRYRIHTm7sIt3dVgc=">

Now we're going to "download" (haha) some data from Expansión, a newspaper that covers economic news. Its digital edition publishes information about the IBEX 35, the benchmark stock market index of the Spanish stock exchange. On the page https://www.expansion.com/mercados/cotizaciones/indices/ibex35_I.IB.html you can find stock market information about the 35 companies that make up the index. Our mission? Simple: do some web scraping with Beautiful Soup and build a DataFrame from the table with the information about the 35 companies.

**Optional**: Export the data to a relational database
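
Once the table element has been located, turning it into rows is mostly mechanical. The sketch below shows one way to do it with Beautiful Soup on a small inline table. The HTML, the `id="demo"` attribute, the column names and the values are all made up for illustration; the real page's markup has to be inspected in the browser first. `table_to_records` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Toy HTML standing in for the downloaded page (not the real markup).
HTML = """
<table id="demo">
  <tr><th>Name</th><th>Last</th></tr>
  <tr><td>Company A</td><td>10,50</td></tr>
  <tr><td>Company B</td><td>7,25</td></tr>
</table>
"""


def table_to_records(table):
    """Return one dict per data row, keyed by the header cells of the first row."""
    rows = table.find_all("tr")
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if len(cells) == len(header):  # skip separator or malformed rows
            records.append(dict(zip(header, cells)))
    return records


soup = BeautifulSoup(HTML, "html.parser")
records = table_to_records(soup.find("table", {"id": "demo"}))
```

A list of dicts like `records` can be passed straight to `pandas.DataFrame(records)`. Numeric columns arrive as strings, so values written with a decimal comma still need converting.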

In [None]:
# Type your code here

## 4. Web Scraping - The "millardos" List

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/db/Forbes_logo.svg/640px-Forbes_logo.svg.png">

Let's keep "downloading" more data, this time from Wikipedia: the list of billionaires (_milmillonarios_ in Spanish) according to Forbes. According to the RAE, "_millardo_" is the Spanish word for one thousand million. Using the URL https://es.wikipedia.org/wiki/Anexo:Milmillonarios_seg%C3%BAn_Forbes, obtain the table that appears under the heading "Estadísticas". Finally, plot the time series of the year against the number of billionaires.

**Notes**: Here `find` is used a little differently. Instead of searching for a specific _id_, we look the table up by its class, because it is contained in a `<table class="...">` element. The following call shows how to access that table element:

`table = soup.find("table", {"class":"wikitable sortable col1cen col2cen col3cen"})`

**Optional**: Export the data to a relational database
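
The class-based lookup from the notes can be tried on a small inline document before pointing it at Wikipedia. In the sketch below the HTML, the header names and the numbers are made up, not Forbes figures. When `find` is given a multi-word class string, Beautiful Soup compares it against the exact value of the `class` attribute, so the order of the classes matters. The CSS selector shown next to it is an alternative that matches on individual classes and is less sensitive to changes in the page.

```python
from bs4 import BeautifulSoup

# Toy HTML: two tables, only the first carries the full class string.
HTML = """
<table class="wikitable sortable col1cen col2cen col3cen">
  <tr><th>Year</th><th>Count</th></tr>
  <tr><td>2001</td><td>10</td></tr>
  <tr><td>2002</td><td>12</td></tr>
</table>
<table class="wikitable"><tr><td>other</td></tr></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Lookup by the exact class attribute value, as in the notes.
by_string = soup.find(
    "table", {"class": "wikitable sortable col1cen col2cen col3cen"}
)

# Alternative: CSS selector requiring both classes, in any order.
by_selector = soup.select_one("table.wikitable.sortable")

# Collect (year, count) pairs from the data rows, skipping the header row.
series = []
for row in by_string.find_all("tr")[1:]:
    year, number = (cell.get_text(strip=True) for cell in row.find_all("td"))
    series.append((int(year), int(number)))
```

Pairs like those in `series` can then be unpacked into two lists and passed to a plotting call such as matplotlib's `plt.plot(years, counts)`.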

In [None]:
# Type your code here