# <center><div class="alert alert-block alert-info">This code extracts data from an <b>API</b>, transforms it, and loads it into a database</div></center>

## The API call goes to __[OpenWeatherMap.org](https://openweathermap.org/api/air-pollution)__ to collect air pollution data for a specific pair of coordinates

### Importing needed packages

In [6]:

import pandas as pd
import requests


### Required variables

In [7]:
# API key needed for the call - generated from the web site -
api_key = 'bd0251f8398a0ebec1613513b5d6ceca'

# Coordinates of the specific area we are interested in - here Central Omaha -
lat, lon = 41.24571173787448, -96.0306766000668

# Start and end dates of the period we are interested in - here 2023/11/02 to 2024/01/01 (00:00 UTC) -
# in Unix time format (seconds)
start_date, end_date = 1698883200, 1704067200

# Actual API call structure
api_call = f'https://api.openweathermap.org/data/2.5/air_pollution/history?\
lat={lat}&lon={lon}&start={start_date}&end={end_date}&appid={api_key}'
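
The start and end values above are Unix timestamps. As a minimal sketch, they can be derived from calendar dates with the standard library instead of being typed by hand. The helper name `to_unix` is hypothetical, and the sketch assumes the period is meant to start and end at midnight UTC.

```python
from datetime import datetime, timezone


def to_unix(year, month, day):
    """Return the Unix timestamp (in seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Assumption: the period runs from 2023/11/02 to 2024/01/01, both at 00:00 UTC
start_ts = to_unix(2023, 11, 2)
end_ts = to_unix(2024, 1, 1)
print(start_ts, end_ts)  # 1698883200 1704067200
```

These two values match the `start_date` and `end_date` used in the call.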

### Launching the API call

In [8]:
# Querying the web site - a Response [200] means the call was successful -
api_request = requests.get(api_call)
api_request

<Response [200]>

In [9]:
# Converting the request's response to JSON
response = api_request.json()
# response


> The response is a Python dictionary

### Transforming the data extracted

In [10]:
# Dictionary keys
print(response.keys())

dict_keys(['coord', 'list'])


In [11]:
# Retrieving only the value of the 'list' key
# response['list']

> The value of this key (`list`) is a list of dictionaries
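
Before transforming the data, it can help to confirm that the payload really has this shape. The following is only a sketch: `check_payload` is a hypothetical helper, and the sample payload reuses the first record printed in this notebook, with the `coord` value left empty because its contents are not shown here.

```python
def check_payload(payload):
    """Return the list of hourly records, or raise ValueError if the shape is unexpected."""
    if not isinstance(payload, dict) or 'list' not in payload:
        raise ValueError("expected a dictionary with a 'list' key")
    records = payload['list']
    if not isinstance(records, list):
        raise ValueError("expected the 'list' value to be a list")
    for record in records:
        # Every record should carry the three keys used in the transformation
        missing = {'main', 'components', 'dt'} - set(record)
        if missing:
            raise ValueError(f"record is missing keys: {sorted(missing)}")
    return records


# Sample payload (assumption: 'coord' is a placeholder, left empty)
sample_payload = {
    'coord': {},
    'list': [
        {'main': {'aqi': 1},
         'components': {'co': 360.49, 'no': 0.03, 'no2': 24.33, 'o3': 46.49,
                        'so2': 0.94, 'pm2_5': 5.55, 'pm10': 7.21, 'nh3': 2.72},
         'dt': 1698883200},
    ],
}

records = check_payload(sample_payload)
print(len(records))  # 1
```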

#### Restructuring the first element of the list

In [12]:
# Inspecting the first element
print(response['list'][0])

# Transforming the first element - aq: air quality -
# dict() makes a copy, so the original response is left unchanged
aq_components_dict = dict(response['list'][0]['components'])
aq_components_dict['aqi'] = response['list'][0]['main']['aqi']
aq_components_dict['date'] = response['list'][0]['dt']
print(aq_components_dict)

{'main': {'aqi': 1}, 'components': {'co': 360.49, 'no': 0.03, 'no2': 24.33, 'o3': 46.49, 'so2': 0.94, 'pm2_5': 5.55, 'pm10': 7.21, 'nh3': 2.72}, 'dt': 1698883200}
{'co': 360.49, 'no': 0.03, 'no2': 24.33, 'o3': 46.49, 'so2': 0.94, 'pm2_5': 5.55, 'pm10': 7.21, 'nh3': 2.72, 'aqi': 1, 'date': 1698883200}


> Now all the needed information is grouped in a single dictionary

In [13]:
# Looping through the whole list to restructure every element
aq_components_list = []

for entry in response['list']:
    # dict() makes a copy, so the original response is left unchanged
    aq_components_dict = dict(entry['components'])
    aq_components_dict['aqi'] = entry['main']['aqi']
    aq_components_dict['date'] = entry['dt']
    aq_components_list.append(aq_components_dict)

# aq_components_list


> Now we have a flat list of dictionaries, one per hourly record
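
As a variation, the restructuring can be wrapped in a small helper that returns a new dictionary and leaves the input record untouched. This is a sketch rather than the notebook's own code: `flatten_record` is a hypothetical name, and the sample record is the first element printed earlier.

```python
def flatten_record(record):
    """Return a new flat dictionary built from one hourly record."""
    flat = dict(record['components'])  # copy, so the input record is not modified
    flat['aqi'] = record['main']['aqi']
    flat['date'] = record['dt']
    return flat


sample_record = {
    'main': {'aqi': 1},
    'components': {'co': 360.49, 'no': 0.03, 'no2': 24.33, 'o3': 46.49,
                   'so2': 0.94, 'pm2_5': 5.55, 'pm10': 7.21, 'nh3': 2.72},
    'dt': 1698883200,
}

flat_record = flatten_record(sample_record)
print(flat_record)
```

With such a helper, the whole list could be rebuilt with a list comprehension, for example `[flatten_record(r) for r in response['list']]`.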

#### Converting the list to pandas DataFrame

In [18]:
# Building the DataFrame from the list of dictionaries
aq_df = pd.DataFrame(aq_components_list)
aq_df.head()

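| | co | no | no2 | o3 | so2 | pm2_5 | pm10 | nh3 | aqi | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 360.49 | 0.03 | 24.33 | 46.49 | 0.94 | 5.55 | 7.21 | 2.72 | 1 | 1698883200 |
| 1 | 377.18 | 0.04 | 26.39 | 38.62 | 0.86 | 6.19 | 7.9 | 2.94 | 1 | 1698886800 |
| 2 | 367.16 | 0.03 | 23.65 | 39.34 | 0.77 | 6.29 | 8.02 | 2.85 | 1 | 1698890400 |
| 3 | 337.12 | 0.01 | 18.16 | 45.06 | 0.68 | 5.64 | 7.3 | 2.6 | 1 | 1698894000 |
| 4 | 330.45 | 0.02 | 16.45 | 45.06 | 0.59 | 5.42 | 7.02 | 2.56 | 1 | 1698897600 |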
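
The `date` column still holds Unix timestamps in seconds. A minimal sketch of converting them to readable datetimes with `pd.to_datetime` follows. The two-row `sample_df` and the new `datetime` column name are assumptions made for illustration. The timestamps are taken from the first two rows of the preview.

```python
import pandas as pd

# Two timestamps copied from the first rows of the preview
sample_df = pd.DataFrame({'aqi': [1, 1], 'date': [1698883200, 1698886800]})

# unit='s' tells pandas the integers are seconds since the Unix epoch
sample_df['datetime'] = pd.to_datetime(sample_df['date'], unit='s', utc=True)

first = sample_df['datetime'].iloc[0]
second = sample_df['datetime'].iloc[1]
print(sample_df)
```

The records are one hour apart, so the first row maps to 2023-11-02 00:00 UTC and the second to 01:00 UTC on the same day.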


### Loading the data to a database

> Here I used SQLite to create a database in the same folder as my code file

In [15]:
# Importing required package
import sqlite3


In [16]:
# Connecting to the database and loading the data
conn = sqlite3.connect('Air_quality.db')
aq_df.to_sql(name='aq_index', con=conn, if_exists='replace', index=False)
conn.close()
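
After a load like this, a quick query can confirm that the rows arrived. The sketch below uses only the standard `sqlite3` module and an in-memory database so that it runs on its own. The in-memory connection, the reduced set of columns, and the two sample rows are assumptions. Against the real file, the same `SELECT` statements would run on a connection to `Air_quality.db`.

```python
import sqlite3

# Two sample rows (co, aqi, date) copied from the DataFrame preview
rows = [
    (360.49, 1, 1698883200),
    (377.18, 1, 1698886800),
]

# Assumption: ':memory:' stands in for the 'Air_quality.db' file
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE aq_index (co REAL, aqi INTEGER, date INTEGER)')
conn.executemany('INSERT INTO aq_index VALUES (?, ?, ?)', rows)
conn.commit()

# Counting the rows and reading back the earliest record
row_count = conn.execute('SELECT COUNT(*) FROM aq_index').fetchone()[0]
first_row = conn.execute(
    'SELECT co, aqi, date FROM aq_index ORDER BY date LIMIT 1'
).fetchone()
conn.close()

print(row_count)
print(first_row)
```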

> We can also save the data locally, straight into a folder

In [17]:
# Loading the data into the current working directory as a CSV file
aq_df.to_csv('aq_index.csv', index=False)

<b>Notice that so far all we have done is extract the data and transform it.<br>
No cleaning has been done yet. Cleaning involves handling duplicate entries, outliers, and inaccurate, unwanted, irrelevant, or missing data, as well as fixing structural errors.<br>We will go through data cleaning in detail in the analysis part.</b>