# Urban Environment

## Air Quality Measures in Barcelona and Murcia

> ### An analysis and comparison of public datasets

First, we explore the available data.

In [9]:
import pandas as pd

quality = pd.read_csv('../datasets/2.-Urban-Environment/air-quality-nov-2017.csv')
quality

Unnamed: 0,Station,Air Quality,Longitude,Latitude,O3 Hour,O3 Quality,O3 Value,NO2 Hour,NO2 Quality,NO2 Value,PM10 Hour,PM10 Quality,PM10 Value,Generated,Date Time
0,Barcelona - Sants,Good,2.1331,41.3788,,,,0h,Good,84.0,,,,01/11/2018 0:00,1541027104
1,Barcelona - Eixample,Moderate,2.1538,41.3853,0h,Good,1.0,0h,Moderate,113.0,0h,Good,36.0,01/11/2018 0:00,1541027104
2,Barcelona - Gràcia,Good,2.1534,41.3987,0h,Good,10.0,0h,Good,73.0,,,,01/11/2018 0:00,1541027104
3,Barcelona - Ciutadella,Good,2.1874,41.3864,0h,Good,2.0,0h,Good,86.0,,,,01/11/2018 0:00,1541027104
4,Barcelona - Vall Hebron,Good,2.1480,41.4261,0h,Good,7.0,0h,Good,69.0,,,,01/11/2018 0:00,1541027104
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5739,Barcelona - Ciutadella,Good,2.1874,41.3864,22h,Good,10.0,22h,Good,57.0,,,,30/11/2018 23:00,1543615502
5740,Barcelona - Vall Hebron,Good,2.1480,41.4261,22h,Good,32.0,22h,Good,31.0,22h,Good,21.0,30/11/2018 23:00,1543615502
5741,Barcelona - Palau Reial,Good,2.1151,41.3875,22h,Good,40.0,22h,Good,20.0,22h,Good,15.0,30/11/2018 23:00,1543615502
5742,Barcelona - Poblenou,Good,2.2045,41.4039,,,,22h,Good,70.0,22h,Good,25.0,30/11/2018 23:00,1543615502
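The preview shows two time columns: `Generated`, a day-first string, and `Date Time`, which looks like a Unix timestamp. The sketch below parses both for the first and last rows shown above. It assumes that `Date Time` holds seconds since the Unix epoch; the source does not state this, but the resulting values fall a few minutes after each `Generated` hour once shifted to Central European Time.

```python
import pandas as pd

# 'Generated' and 'Date Time' values copied from the first and last preview rows.
sample = pd.DataFrame({
    "Generated": ["01/11/2018 0:00", "30/11/2018 23:00"],
    "Date Time": [1541027104, 1543615502],
})

# Parse the day-first strings with an explicit format instead of relying on inference.
sample["generated_ts"] = pd.to_datetime(sample["Generated"], format="%d/%m/%Y %H:%M")

# Assumption: 'Date Time' is seconds since the Unix epoch, interpreted here as UTC.
sample["epoch_ts"] = pd.to_datetime(sample["Date Time"], unit="s", utc=True)

print(sample[["generated_ts", "epoch_ts"]])
```

Parsing `Generated` into a real timestamp also makes it possible to group by `.dt.hour` or `.dt.date` later instead of slicing strings.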


### Question 1
#### Which measures determine good air quality?

In [10]:
reduced_quality = quality[['Air Quality','O3 Value','NO2 Value','PM10 Value']]

reduced_good_quality = reduced_quality[reduced_quality['Air Quality'] == 'Good']
print("GOOD AIR QUALITY")
print()
print(reduced_good_quality.mean(numeric_only=True))
print()
print(reduced_good_quality.describe()[2:])

print()
print("MODERATE AIR QUALITY")
print()
reduced_moderate_quality = reduced_quality[reduced_quality['Air Quality'] == 'Moderate']
print(reduced_moderate_quality.mean(numeric_only=True))
print()
print(reduced_moderate_quality.describe()[2:])
print()


indexes = pd.get_dummies(reduced_quality['Air Quality'])

indexed_quality = reduced_quality.join(indexes)


binary_quality = indexed_quality.drop(['Air Quality', '--'], axis=1)

print('GOOD CORRELATION')
print(binary_quality.corrwith(binary_quality['Good']))

print()
print('MODERATE CORRELATION')
print(binary_quality.corrwith(binary_quality['Moderate']))

'''
CONCLUSIONS

Better air quality coincides with:
- More O3
- Less NO2
- Less PM10

Lower NO2 values may indicate that NO2 has been converted into O3.
'''

GOOD AIR QUALITY

O3 Value      34.734735
NO2 Value     34.802550
PM10 Value    15.994325
dtype: float64

      O3 Value  NO2 Value  PM10 Value
std   22.83198  21.417759      7.3575
min    1.00000   1.000000      2.0000
25%   15.00000  16.000000     10.0000
50%   34.00000  32.000000     15.0000
75%   53.00000  51.000000     21.0000
max  100.00000  91.000000     36.0000

MODERATE AIR QUALITY

O3 Value       9.276190
NO2 Value     75.118110
PM10 Value    33.658537
dtype: float64

      O3 Value   NO2 Value  PM10 Value
std  11.002181   25.378285    8.769198
min   1.000000   23.000000    9.000000
25%   2.000000   51.500000   28.000000
50%   3.000000   79.000000   37.000000
75%  13.000000   97.000000   39.000000
max  44.000000  117.000000   44.000000

GOOD CORRELATION
O3 Value      0.175154
NO2 Value    -0.271825
PM10 Value   -0.395422
Good          1.000000
Moderate     -0.658729
dtype: float64

MODERATE CORRELATION
O3 Value     -0.175154
NO2 Value     0.271825
PM10 Value    0.395422
Good         -0.658729
Moderate      1.000000
dtype: float64

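The per-category means and the indicator-column correlation used for Question 1 can also be written more compactly. The following is a minimal sketch on made-up rows in the same shape as `reduced_quality`; the `demo` frame and its values are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical rows shaped like reduced_quality (values are made up).
demo = pd.DataFrame({
    "Air Quality": ["Good", "Good", "Moderate", "Moderate", "--"],
    "O3 Value": [40.0, 30.0, 5.0, None, None],
    "NO2 Value": [30.0, 40.0, 80.0, 100.0, None],
    "PM10 Value": [10.0, 20.0, 35.0, 45.0, None],
})

# One row of means per category; missing readings are skipped by mean().
category_means = demo.groupby("Air Quality").mean(numeric_only=True)
print(category_means.loc[["Good", "Moderate"]])

# Indicator (0/1) columns for the categories; dtype=float keeps them numeric.
indicators = pd.get_dummies(demo["Air Quality"], dtype=float)
numeric = demo.drop(columns="Air Quality").join(indicators[["Good", "Moderate"]])

# Correlation of every column with the 'Good' indicator.
good_corr = numeric.corrwith(numeric["Good"])
print(good_corr)
```

Correlating a measurement with a 0/1 indicator only says whether the measurement tends to be higher or lower in that category; it does not show which pollutant actually drives the rating.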

### Question 2
#### Which zones have better air quality?

In [11]:
### Q2

# Which zones have better air quality?
located_quality = quality[['Station', 'Air Quality']]

located_good_quality = located_quality[located_quality['Air Quality'] == 'Good']
lgq = located_good_quality.groupby('Station')['Air Quality'].agg(Good='count')

located_moderate_quality = located_quality[located_quality['Air Quality'] == 'Moderate']
lmq = located_moderate_quality.groupby('Station')['Air Quality'].agg(Moderate='count')

location_comparison = pd.merge(lgq,lmq, on='Station', how='left')
print(location_comparison.fillna(0))


'''
CONCLUSION

Most recorded values are Good.

Moderate values were recorded most often at:
1. Eixample (83)
2. Gràcia (22)
3. Poblenou (22, tied with Gràcia)
4. Palau Reial (3)

'''

                          Good  Moderate
Station                                 
Barcelona - Ciutadella     701       0.0
Barcelona - Eixample       635      83.0
Barcelona - Gràcia         630      22.0
Barcelona - Observ Fabra   710       0.0
Barcelona - Palau Reial    710       3.0
Barcelona - Poblenou       680      22.0
Barcelona - Sants          671       0.0
Barcelona - Vall Hebron    716       0.0


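The filter, group, and merge steps used for Question 2 can be collapsed into a single cross-tabulation. Below is a minimal sketch on hypothetical stations `A`, `B`, and `C` (made-up data), which also adds a Moderate share so that stations with different numbers of readings can be compared.

```python
import pandas as pd

# Hypothetical readings shaped like located_quality (values are made up).
demo = pd.DataFrame({
    "Station": ["A", "A", "A", "B", "B", "C"],
    "Air Quality": ["Good", "Moderate", "Good", "Good", "Good", "Moderate"],
})

# Count every category per station in one step; absent combinations become 0.
counts = pd.crosstab(demo["Station"], demo["Air Quality"])

# Keep the two categories of interest, adding a zero column if one never occurs.
counts = counts.reindex(columns=["Good", "Moderate"], fill_value=0)

# Fraction of Good-or-Moderate readings that were Moderate.
counts["Moderate share"] = counts["Moderate"] / (counts["Good"] + counts["Moderate"])
print(counts)
```

One difference from a left merge on the Good counts: a station with no Good readings at all (like `C` here) is kept rather than dropped.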

### Question 3

#### Which times have better air quality?

In [12]:
### Q3

# Which times have better air quality?

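# Copy the slice so the assignment below does not trigger SettingWithCopyWarning
time_quality = quality[['Air Quality','Generated']].copy()

# Keep only the time of day (the last five characters, e.g. '10:00')
time_quality['Generated'] = time_quality['Generated'].str[-5:]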

time_good_quality = time_quality[time_quality['Air Quality'] == 'Good']
tgq = time_good_quality.groupby('Generated')['Air Quality'].agg(Good = 'count')

time_moderate_quality = time_quality[time_quality['Air Quality'] == 'Moderate']
tmq = time_moderate_quality.groupby('Generated')['Air Quality'].agg(Moderate = 'count')

time_comparison = pd.merge(tgq, tmq, on='Generated', how='left')
print(time_comparison)

'''
CONCLUSIONS

Moderate readings peak between:
- 9:00 - 11:00
- 20:00 - 22:00

These peaks align with working hours.

Note: the Generated timestamp lags the measurement by 1 to 3 hours.

'''

           Good  Moderate
Generated                
 0:00       218         5
 1:00       236         4
 2:00       230         4
 3:00       229         4
 4:00       227         4
 5:00       228         4
 6:00       227         5
 7:00       230         5
 8:00       233         5
 9:00       231         7
10:00       219        12
11:00       230         9
12:00       250         4
13:00       218         2
14:00       241         2
15:00       220         3
16:00       230         4
17:00       226         4
18:00       232         4
19:00       223         3
20:00       222        10
21:00       220        11
22:00       220        10
23:00       213         5




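Since the number of readings differs slightly from hour to hour, the share of Moderate readings per hour may be easier to compare than raw counts. The sketch below uses hypothetical rows (made-up data) and parses `Generated` into a timestamp rather than slicing the string, which avoids the leading space in labels such as ` 0:00`.

```python
import pandas as pd

# Hypothetical readings shaped like time_quality (values are made up).
demo = pd.DataFrame({
    "Air Quality": ["Good", "Moderate", "Good", "Good"],
    "Generated": ["01/11/2018 0:00", "01/11/2018 10:00",
                  "02/11/2018 10:00", "02/11/2018 23:00"],
})

# Work on an explicit copy so adding a column never warns about chained assignment.
hourly = demo.copy()
hourly["Hour"] = pd.to_datetime(hourly["Generated"], format="%d/%m/%Y %H:%M").dt.hour

# Fraction of readings in each hour that were rated Moderate.
moderate_share = (hourly["Air Quality"] == "Moderate").groupby(hourly["Hour"]).mean()
print(moderate_share)
```

The integer hour index also sorts numerically, so no extra handling is needed to keep the rows in order.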

### Comparison between Barcelona and Murcia values


In [13]:

quality_murcia = pd.read_csv('../datasets/2.-Urban-Environment/AirQuality_Murcia_02_11_2017.csv', sep='\t')

quality_murcia['Generated'] = quality_murcia['Generated'].str[-6:-1]


print('MURCIA MEAN VALUES')
print(quality_murcia.mean(numeric_only=True))
print()
print('BARCELONA MEAN VALUES')
print(reduced_quality.mean(numeric_only=True))

# The mean values for the two cities are quite similar

MURCIA MEAN VALUES
O3 Value      36.541667
NO2 Value     33.125000
PM10 Value    18.416667
dtype: float64

BARCELONA MEAN VALUES
O3 Value      34.082907
NO2 Value     35.740293
PM10 Value    16.590074
dtype: float64
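To make "quite similar" more concrete, the two sets of means can be placed side by side with their absolute and relative differences. This sketch copies the mean values from the output above; the column names `Difference` and `Relative (%)` are chosen here for illustration.

```python
import pandas as pd

# Mean values copied from the printed output above.
murcia_means = pd.Series(
    {"O3 Value": 36.541667, "NO2 Value": 33.125000, "PM10 Value": 18.416667}
)
barcelona_means = pd.Series(
    {"O3 Value": 34.082907, "NO2 Value": 35.740293, "PM10 Value": 16.590074}
)

# One column per city, aligned on the pollutant name.
comparison = pd.concat({"Murcia": murcia_means, "Barcelona": barcelona_means}, axis=1)

# Absolute difference and difference relative to the Barcelona mean.
comparison["Difference"] = comparison["Murcia"] - comparison["Barcelona"]
comparison["Relative (%)"] = 100 * comparison["Difference"] / comparison["Barcelona"]
print(comparison.round(2))
```

The Murcia file name suggests it may cover a single day, whereas the Barcelona data spans a month, so this comparison is probably best read as indicative only.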
