# Data Report #


**Introduction**

Climate change is an ever-present topic, especially nowadays. It affects us by changing weather patterns, increasing the chance that weather events become more dangerous, and harming our planet. In the current age of misinformation, one popular source to cite on climate change is a graph showing the change in our planet's temperature over the last 100 years.

Picture credit to NASA

![Image of Graph](../images/nasa100yeartemp.jpg)



Despite the upward trend, what confuses some people is the size of the increase. Something so small surely can't be a cause for worry, right? Unfortunately, that isn't the case. Although the change looks tiny, continuing on this upward trend will have disastrous effects on our planet, such as rising sea levels and the possible loss of our ice caps (outlined [here](https://earthobservatory.nasa.gov/features/GlobalWarming/page1.php)).

How does this tie in to an anonymous Italian city? We can analyze the quality of its air and its temperature readings and see whether we can spot that same upward trend in temperature. We can also ask the following questions to identify how pollution in this city is changing.

**1. Is the average pollution of this Italian city increasing over time, or is it relatively constant?**

**2. Does temperature have an impact on the amount of pollution in the air? Does humidity?**
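
As a rough illustration of how these two questions could be approached, the sketch below averages a pollutant by calendar month (question 1) and computes a Pearson correlation between temperature and that pollutant (question 2). It runs on synthetic data generated purely for illustration; only the column names `T` and `CO(GT)` are borrowed from the dataset, and this is not necessarily the method used later in this report.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data for illustration only (not the real measurements).
rng = np.random.default_rng(0)
index = pd.date_range(
    "2004-03-10 18:00", periods=24 * 90, freq=pd.Timedelta(hours=1)
)
temperature = 15 + 5 * np.sin(np.arange(len(index)) * 2 * np.pi / 24)
co = 2.0 + 0.05 * temperature + rng.normal(0, 0.2, len(index))
df = pd.DataFrame({"T": temperature, "CO(GT)": co}, index=index)

# Question 1: average pollution per calendar month, to look for a trend.
monthly_mean = df["CO(GT)"].groupby(df.index.to_period("M")).mean()

# Question 2: Pearson correlation between temperature and pollution.
correlation = df["T"].corr(df["CO(GT)"])

print(monthly_mean)
print(f"Correlation between T and CO(GT): {correlation:.2f}")
```

A correlation close to 0 would suggest little linear relationship, while values near 1 or -1 would suggest a strong one. Correlation alone does not show that temperature causes a change in pollution.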

# Dataset #

The dataset contains 9358 entries with the following columns (descriptions sourced from the dataset page [here](https://archive.ics.uci.edu/ml/datasets/Air+Quality)):

0. Date (DD/MM/YYYY)
1. Time (HH.MM.SS)
2. CO (Carbon Monoxide) Concentration in mg/m^3
3. PT08.S1 (Tin Oxide) hourly averaged sensor response (nominally CO targeted)
4. True hourly averaged overall Non-Methane Hydrocarbons (NMHC) concentration in microg/m^3 (reference analyzer)
5. True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
6. PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
7. True hourly averaged NOx concentration in ppb (reference analyzer)
8. PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
9. True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
10. PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
11. PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
12. Temperature in °C
13. Relative Humidity (%)
14. AH Absolute Humidity
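
The Date (DD/MM/YYYY) and Time (HH.MM.SS) formats listed above can be combined into a single timestamp when loading the data. The sketch below is a minimal example that assumes the raw file is semicolon-delimited and uses a comma as the decimal mark; that layout is an assumption, not something stated in the column descriptions. The two inline rows reuse values from the first rows shown later in this report, and this is not necessarily how `project_functions.import_and_clean` works.

```python
import io

import pandas as pd

# Two rows in the ASSUMED raw layout: ';' as separator, ',' as decimal mark.
raw = io.StringIO(
    "Date;Time;CO(GT);T;RH\n"
    "10/03/2004;18.00.00;2,6;13,6;48,9\n"
    "10/03/2004;19.00.00;2,0;13,3;47,7\n"
)

df = pd.read_csv(raw, sep=";", decimal=",")

# Combine DD/MM/YYYY and HH.MM.SS into one timestamp column.
df["Timestamp"] = pd.to_datetime(
    df["Date"] + " " + df["Time"], format="%d/%m/%Y %H.%M.%S"
)

print(df[["Timestamp", "CO(GT)", "T", "RH"]])
```

Passing an explicit `format` avoids the day and month being swapped, which can happen when DD/MM/YYYY dates are parsed with default settings.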

**Notes**

1. Although the PT08 columns look alike by name, each refers to a different sensor: Tin Oxide (PT08.S1), Titania (PT08.S2), Tungsten Oxide (PT08.S3 for NOx, PT08.S4 for NO2) and Indium Oxide (PT08.S5 for O3). Unlike CO(GT), NMHC(GT), C6H6(GT), etc., which are reference-analyzer concentrations, these columns record an hourly averaged sensor response. The name in parentheses after each PT08 column refers to the pollutant the sensor nominally targets (whose reference value is measured in microg/m^3 or mg/m^3). Using a sensor column together with its matching reference column would likely show how strongly the sensor responds to a given amount of that pollutant.

2. Absolute humidity is often expressed in either grams/m^3 or grams per kilogram, but the dataset creator does not specify which is used.

3. Error values are displayed as -200.
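
Because -200 is a valid-looking number, leaving it in place would distort averages and correlations. The sketch below shows one common way to handle such a sentinel in pandas: convert it to `NaN`, then drop incomplete rows. The toy values are illustrative only, and this is not necessarily what `project_functions.import_and_clean` does internally.

```python
import numpy as np
import pandas as pd

# Toy frame: -200 marks a failed reading (values are illustrative).
df = pd.DataFrame(
    {
        "CO(GT)": [2.6, -200.0, 2.2, 1.6],
        "T": [13.6, 13.3, -200.0, 11.2],
    }
)

# Turn the sentinel into NaN so it cannot distort means or correlations.
cleaned = df.replace(-200, np.nan)

# Keep only rows where every column holds a valid reading.
complete = cleaned.dropna()

print(len(df), "rows before,", len(complete), "rows after")
```

Dropping every incomplete row is a strict choice: a single column with many failed readings can remove a large share of the data.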


# Data & Observations #

To begin, we can import our data and clean it, removing any rows that contain error values. 

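```python
import os

import numpy as np
import pandas as pd
import matplotlib.pylab as plt
import seaborn as sns

# Project-local helper module for this report
from scripts import project_functions

# Load the data and remove rows that contain the -200 error value
df = project_functions.import_and_clean()
# Print an overview of the cleaned data
project_functions.eda(df)
```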

Rows, Columns
 (827, 15)

First Five rows
        Date      Time  CO(GT)  PT08.S1(CO)  NMHC(GT)  C6H6(GT)  \
0 2004-03-10  18:00:00     2.6       1360.0     150.0      11.9   
1 2004-03-10  19:00:00     2.0       1292.0     112.0       9.4   
2 2004-03-10  20:00:00     2.2       1402.0      88.0       9.0   
3 2004-03-10  21:00:00     2.2       1376.0      80.0       9.2   
4 2004-03-10  22:00:00     1.6       1272.0      51.0       6.5   

   PT08.S2(NMHC)  NOx(GT)  PT08.S3(NOx)  NO2(GT)  PT08.S4(NO2)  PT08.S5(O3)  \
0         1046.0    166.0        1056.0    113.0        1692.0       1268.0   
1          955.0    103.0        1174.0     92.0        1559.0        972.0   
2          939.0    131.0        1140.0    114.0        1555.0       1074.0   
3          948.0    172.0        1092.0    122.0        1584.0       1203.0   
4          836.0    131.0        1205.0    116.0        1490.0       1110.0   

      T    RH      AH  
0  13.6  48.9  0.7578  
1  13.3  47.7  0.7255  
2  11.9 