<a href="https://colab.research.google.com/github/KamilBartosik/RNN_AirPolutionPrediction/blob/main/RNN_AirPolutionData.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [2]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, r2_score

from keras.models import Sequential
from keras.layers import LSTM, GRU, Dense, Dropout
from keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

In [3]:
dataset_path = '/content/gdrive/MyDrive/MachineLearning/Datasets/Air_Pollution/Dataset.csv'
df = pd.read_csv(dataset_path)

In [4]:
df.head()

Unnamed: 0,Date,Temperature,NOx,Wind Direction,Wind Speed,PM2.5
0,1/1/19 0:00,17.2,16.2,18,2.0,17
1,1/1/19 1:00,17.2,17.0,357,2.2,20
2,1/1/19 2:00,17.0,14.6,16,2.3,14
3,1/1/19 3:00,16.8,12.8,6,2.7,15
4,1/1/19 4:00,16.7,16.3,14,2.2,10


In [5]:
df.shape

(1416, 6)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1416 entries, 0 to 1415
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            1416 non-null   object 
 1   Temperature     1416 non-null   float64
 2   NOx             1416 non-null   object 
 3   Wind Direction  1416 non-null   object 
 4   Wind Speed      1416 non-null   object 
 5   PM2.5           1416 non-null   object 
dtypes: float64(1), object(5)
memory usage: 66.5+ KB


In [7]:
wrong_NOx = []
wrong_WD = []
wrong_WS = []
wrong_PM = []

def check_wrong_values(column, values):
  
  for i in range(len(df)):
    try:
      df[column][i] = float(df[column][i])
    except:
      values.append(i)

In [8]:
def display_examples(column, values, ex_11, ex_12, ex_21, ex_22):
  
  print('Wrong values:')
  print(df[column][values])

  print('\nHow value(s) of 1st example look among neighbours:\n')
  print(df[column][ex_11:ex_12])

  print('\nHow value(s) of 2nd example look among neighbours:\n')
  print(df[column][ex_21:ex_22])

In [9]:
check_wrong_values('NOx', wrong_NOx)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.


In [10]:
display_examples('NOx', wrong_NOx, 157, 164, 1258, 1268)

Wrong values:
159     13.1#                          
161     23#                            
179     18.5#                          
272     49.9#                          
325     10.9#                          
491     26.2#                          
673     38.6#                          
951     11.2#                          
1261    9.1#                           
1263    8.5#                           
1264    12.4#                          
1335    50.1#                          
Name: NOx, dtype: object

How value(s) of 1st example look among neighbours:

157                               12.8
158                               13.0
159    13.1#                          
160                               16.1
161    23#                            
162                               21.6
163                               15.0
Name: NOx, dtype: object

How value(s) of 2nd example look among neighbours:

1258                               15.1
1259                               14

In [11]:
check_wrong_values('Wind Direction', wrong_WD)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.


In [12]:
display_examples('Wind Direction', wrong_WD, 417, 424, 1089, 1096)

Wrong values:
420     295#                           
1092    0#                             
Name: Wind Direction, dtype: object

How value(s) of 1st example look among neighbours:

417                              268.0
418                              344.0
419                              312.0
420    295#                           
421                              299.0
422                              301.0
423                              302.0
Name: Wind Direction, dtype: object

How value(s) of 2nd example look among neighbours:

1089                               17.0
1090                              338.0
1091                               31.0
1092    0#                             
1093                              284.0
1094                              221.0
1095                              256.0
Name: Wind Direction, dtype: object


In [13]:
check_wrong_values('Wind Speed', wrong_WS)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.


In [14]:
display_examples('Wind Speed', wrong_WS, 417, 424, 1089, 1096)

Wrong values:
420     1.6#                           
1092    0#                             
Name: Wind Speed, dtype: object

How value(s) of 1st example look among neighbours:

417                                1.4
418                                1.6
419                                2.1
420    1.6#                           
421                                2.5
422                                2.5
423                                2.5
Name: Wind Speed, dtype: object

How value(s) of 2nd example look among neighbours:

1089                                2.1
1090                                2.7
1091                                1.5
1092    0#                             
1093                                1.7
1094                                3.9
1095                                2.4
Name: Wind Speed, dtype: object


In [15]:
check_wrong_values('PM2.5', wrong_PM)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  # Remove the CWD from sys.path while we load stuff.


In [16]:
print('Wrong values:')
print(df['PM2.5'][wrong_PM])

Wrong values:
37      16#                            
38      27#                            
159     793#                           
178     745#                           
323     27#                            
324     30#                            
420     1#                             
491     785#                           
492     33#                            
580     161*                           
581     157x                           
582     155x                           
606     98x                            
734     43#                            
735     27#                            
950     784#                           
1091    33#                            
1092    47#                            
1262    22#                            
1263    753#                           
1264    42#                            
1335    800#                           
1359    25#                            
1360    174#                           
1361    170#              

In [17]:
def replace_hashes(column, values):
  
  for i in values:
    df[column][i] = df[column][i].replace("#", "")

replace_hashes('NOx', wrong_NOx)
replace_hashes('Wind Direction', wrong_WD)
replace_hashes('Wind Speed', wrong_WS)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  after removing the cwd from sys.path.


In [18]:
df = df.astype({'NOx':'float', 'Wind Direction':'int', 'Wind Speed':'float'})

In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1416 entries, 0 to 1415
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Date            1416 non-null   object 
 1   Temperature     1416 non-null   float64
 2   NOx             1416 non-null   float64
 3   Wind Direction  1416 non-null   int64  
 4   Wind Speed      1416 non-null   float64
 5   PM2.5           1416 non-null   object 
dtypes: float64(3), int64(1), object(2)
memory usage: 66.5+ KB


In [20]:
df.describe()

Unnamed: 0,Temperature,NOx,Wind Direction,Wind Speed
count,1416.0,1416.0,1416.0,1416.0
mean,20.950282,19.421398,116.644774,2.450777
std,3.438216,8.622356,141.731594,0.801641
min,12.0,5.7,0.0,0.0
25%,18.7,13.6,13.0,1.9
50%,20.5,17.3,24.0,2.5
75%,23.4,23.0,301.0,3.0
max,30.5,73.1,360.0,4.8
