# Datasets - description

For air quality we selected 3 (or only those 2) datasets from [the following site](https://archive.ics.uci.edu/ml/datasets):
1. [Beijing Multi-Site Air-Quality Data Data Set](https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data) - the dataset contains 6 main pollutants and 6 main meteorological variables, measured every hour at several different stations around Beijing. The fields in the dataset are:
    1. No: row number
    1. year: year of data in this row
    1. month: month of data in this row
    1. day: day of data in this row
    1. hour: hour of data in this row
    1. PM2.5: PM2.5 concentration (ug/m^3)
    1. PM10: PM10 concentration (ug/m^3)
    1. SO2: SO2 concentration (ug/m^3)
    1. NO2: NO2 concentration (ug/m^3)
    1. CO: CO concentration (ug/m^3)
    1. O3: O3 concentration (ug/m^3)
    1. TEMP: temperature (degree Celsius)
    1. PRES: pressure (hPa)
    1. DEWP: dew point temperature (degree Celsius)
    1. RAIN: precipitation (mm)
    1. wd: wind direction
    1. WSPM: wind speed (m/s)
    1. station: name of the air-quality monitoring site


2. [Beijing PM2.5 Data Data Set](https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data) - contains data from the US Embassy in Beijing and from Beijing's main airport; here too the PM2.5 particles are measured every hour. Features:
    1. No: row number
    1. year: year of data in this row
    1. month: month of data in this row
    1. day: day of data in this row
    1. hour: hour of data in this row
    1. pm2.5: PM2.5 concentration (ug/m^3)
    1. DEWP: Dew Point (°C)
    1. TEMP: Temperature (°C)
    1. PRES: Pressure (hPa)
    1. cbwd: Combined wind direction
    1. Iws: Cumulated wind speed (m/s)
    1. Is: Cumulated hours of snow
    1. Ir: Cumulated hours of rain


3. [Air Quality Data Set](https://archive.ics.uci.edu/ml/datasets/Air+Quality) - contains measurements from multisensor devices deployed in the field in an Italian city. The features are:
    1. Date (DD/MM/YYYY)
    1. Time (HH.MM.SS)
    1. True hourly averaged concentration CO in mg/m^3 (reference analyzer)
    1. PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
    1. True hourly averaged overall Non-Methane Hydrocarbons (NMHC) concentration in microg/m^3 (reference analyzer)
    1. True hourly averaged Benzene concentration in microg/m^3 (reference analyzer)
    1. PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
    1. True hourly averaged NOx concentration in ppb (reference analyzer)
    1. PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
    1. True hourly averaged NO2 concentration in microg/m^3 (reference analyzer)
    1. PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
    1. PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
    1. Temperature in °C
    1. Relative Humidity (%)
    1. AH Absolute Humidity

# Loading the data

In [1]:
import pandas as pd

base_data_folder = "./Data"

## Beijing dataset

In [2]:
beijing_folder = "/Beijing dataset"

In [3]:
beijing_ds = pd.read_csv(base_data_folder + beijing_folder + "/Beijing.csv")
beijing_ds.head(10)

Unnamed: 0,No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,Iws,Is,Ir
0,1,2010,1,1,0,,-21,-11.0,1021.0,NW,1.79,0,0
1,2,2010,1,1,1,,-21,-12.0,1020.0,NW,4.92,0,0
2,3,2010,1,1,2,,-21,-11.0,1019.0,NW,6.71,0,0
3,4,2010,1,1,3,,-21,-14.0,1019.0,NW,9.84,0,0
4,5,2010,1,1,4,,-20,-12.0,1018.0,NW,12.97,0,0
5,6,2010,1,1,5,,-19,-10.0,1017.0,NW,16.1,0,0
6,7,2010,1,1,6,,-19,-9.0,1017.0,NW,19.23,0,0
7,8,2010,1,1,7,,-19,-9.0,1017.0,NW,21.02,0,0
8,9,2010,1,1,8,,-19,-9.0,1017.0,NW,24.15,0,0
9,10,2010,1,1,9,,-20,-8.0,1017.0,NW,27.28,0,0
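
The first rows above have empty `pm2.5` values, and the timestamp is split across the `year`, `month`, `day` and `hour` columns. A minimal sketch of how one might combine those columns into a single timestamp and count the gaps is given below. The small `sample` frame is a hypothetical stand-in for `beijing_ds`, and the `timestamp` name is an assumption, not a column of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for beijing_ds, using the same column names.
sample = pd.DataFrame({
    "year": [2010, 2010, 2010],
    "month": [1, 1, 1],
    "day": [1, 1, 2],
    "hour": [0, 1, 0],
    "pm2.5": [np.nan, np.nan, 129.0],
})

# pd.to_datetime can assemble timestamps from year/month/day/hour columns.
sample["timestamp"] = pd.to_datetime(sample[["year", "month", "day", "hour"]])
sample = sample.set_index("timestamp")

# Count the rows where the PM2.5 reading is missing.
missing_pm25 = int(sample["pm2.5"].isna().sum())
print(missing_pm25)
```

The same two steps should carry over to `beijing_ds` unchanged, provided its columns are named as in the output above.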


## PRSA dataset

In [4]:
prsa_folder = "/PRSA Data - Chinese cities"

In [5]:
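import os

# Read every per-station CSV in the PRSA folder; sorting the names keeps the row order reproducible
prsa_path = base_data_folder + prsa_folder
datasets = [pd.read_csv(prsa_path + "/" + file) for file in sorted(os.listdir(prsa_path)) if file.endswith('.csv')]

# Stack the station tables row-wise; each file keeps its own 0..n index
prsa_dataset = pd.concat(datasets, axis=0)

del datasets  # the per-file frames are no longer needed
prsa_dataset.head(10)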

Unnamed: 0,No,year,month,day,hour,PM2.5,PM10,SO2,NO2,CO,O3,TEMP,PRES,DEWP,RAIN,wd,WSPM,station
0,1,2013,3,1,0,4.0,4.0,4.0,7.0,300.0,77.0,-0.7,1023.0,-18.8,0.0,NNW,4.4,Aotizhongxin
1,2,2013,3,1,1,8.0,8.0,4.0,7.0,300.0,77.0,-1.1,1023.2,-18.2,0.0,N,4.7,Aotizhongxin
2,3,2013,3,1,2,7.0,7.0,5.0,10.0,300.0,73.0,-1.1,1023.5,-18.2,0.0,NNW,5.6,Aotizhongxin
3,4,2013,3,1,3,6.0,6.0,11.0,11.0,300.0,72.0,-1.4,1024.5,-19.4,0.0,NW,3.1,Aotizhongxin
4,5,2013,3,1,4,3.0,3.0,12.0,12.0,300.0,72.0,-2.0,1025.2,-19.5,0.0,N,2.0,Aotizhongxin
5,6,2013,3,1,5,5.0,5.0,18.0,18.0,400.0,66.0,-2.2,1025.6,-19.6,0.0,N,3.7,Aotizhongxin
6,7,2013,3,1,6,3.0,3.0,18.0,32.0,500.0,50.0,-2.6,1026.5,-19.1,0.0,NNE,2.5,Aotizhongxin
7,8,2013,3,1,7,3.0,6.0,19.0,41.0,500.0,43.0,-1.6,1027.4,-19.1,0.0,NNW,3.8,Aotizhongxin
8,9,2013,3,1,8,3.0,6.0,16.0,43.0,500.0,45.0,0.1,1028.3,-19.2,0.0,NNW,4.1,Aotizhongxin
9,10,2013,3,1,9,3.0,8.0,12.0,28.0,400.0,59.0,1.2,1028.5,-19.3,0.0,N,2.6,Aotizhongxin
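
Because `pd.concat` keeps each input frame's own row index by default, the combined PRSA frame has repeated index labels, one set per station file. The sketch below illustrates this with two made-up frames in place of the per-station CSVs. The station names `A` and `B`, the values, and the variable names are illustrative assumptions only.

```python
import pandas as pd

# Two hypothetical per-station frames, each with its own 0..n index.
station_a = pd.DataFrame({"No": [1, 2], "PM2.5": [4.0, 8.0], "station": "A"})
station_b = pd.DataFrame({"No": [1, 2], "PM2.5": [10.0, 4.0], "station": "B"})

# Default concat: the index labels 0 and 1 repeat for every input frame.
combined = pd.concat([station_a, station_b], axis=0)
print(combined.index.is_unique)

# ignore_index=True renumbers the rows 0..N-1 across all stations.
combined_reset = pd.concat([station_a, station_b], axis=0, ignore_index=True)
print(combined_reset.index.is_unique)

# With the station name kept as a column, per-station statistics are one groupby away.
station_means = combined_reset.groupby("station")["PM2.5"].mean()
print(station_means)
```

Whether to renumber the index depends on later use: label-based lookups such as `.loc[0]` return one row per station on the default index, but a single row after `ignore_index=True`.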


## Italy AQ

In [6]:
italy_folder = "/Italy AQ"

In [7]:
italy_ds = pd.read_csv(base_data_folder + italy_folder + "/AirQualityUCI.csv", sep=";")
italy_ds.head(10)

Unnamed: 0,Date,Time,CO(GT),PT08.S1(CO),NMHC(GT),C6H6(GT),PT08.S2(NMHC),NOx(GT),PT08.S3(NOx),NO2(GT),PT08.S4(NO2),PT08.S5(O3),T,RH,AH
0,10/03/2004,18.00.00,2.6,1360,150,11.9,1046,166,1056,113,1692,1268,13.6,48.9,0.7578
1,10/03/2004,19.00.00,2.0,1292,112,9.4,955,103,1174,92,1559,972,13.3,47.7,0.7255
2,10/03/2004,20.00.00,2.2,1402,88,9.0,939,131,1140,114,1555,1074,11.9,54.0,0.7502
3,10/03/2004,21.00.00,2.2,1376,80,9.2,948,172,1092,122,1584,1203,11.0,60.0,0.7867
4,10/03/2004,22.00.00,1.6,1272,51,6.5,836,131,1205,116,1490,1110,11.2,59.6,0.7888
5,10/03/2004,23.00.00,1.2,1197,38,4.7,750,89,1337,96,1393,949,11.2,59.2,0.7848
6,11/03/2004,00.00.00,1.2,1185,31,3.6,690,62,1462,77,1333,733,11.3,56.8,0.7603
7,11/03/2004,01.00.00,1.0,1136,31,3.3,672,62,1453,76,1333,730,10.7,60.0,0.7702
8,11/03/2004,02.00.00,0.9,1094,24,2.3,609,45,1579,60,1276,620,10.7,59.7,0.7648
9,11/03/2004,03.00.00,0.6,1010,19,1.7,561,-200,1705,-200,1235,501,10.3,60.2,0.7517
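
The last row above contains `-200` in `NOx(GT)` and `NO2(GT)`. The UCI description of this dataset states that missing values are tagged with -200, so they are usually converted to NaN before any statistics are computed. The sketch below does that and also parses `Date` and `Time` into one timestamp. The `sample` frame is a hypothetical stand-in for `italy_ds`, and the `timestamp` column name is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for italy_ds, with values like those in the last two rows above.
sample = pd.DataFrame({
    "Date": ["11/03/2004", "11/03/2004"],
    "Time": ["02.00.00", "03.00.00"],
    "NOx(GT)": [45, -200],
    "NO2(GT)": [60, -200],
})

# Treat the -200 marker as a missing value.
cleaned = sample.replace(-200, np.nan)

# Combine the DD/MM/YYYY date and HH.MM.SS time into one timestamp column.
cleaned["timestamp"] = pd.to_datetime(
    cleaned["Date"] + " " + cleaned["Time"], format="%d/%m/%Y %H.%M.%S"
)
print(cleaned)
```

Passing an explicit `format` avoids day/month ambiguity for dates such as 11/03/2004.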
