<a href="https://colab.research.google.com/github/Pimentell/timeSeriesAnalysysAIQ/blob/main/Time_Series_AQI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# __Neural Nets for Time Series Analysis.__

### __Import Modules and Python Dependencies__

In [None]:
import os
import datetime 
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

### __Create the Data Directory and Download Data from Google Drive__

In [None]:
!rm -rf sample_data
!gdown 1GN1P7mrqLIIqIIfVtLluCW-s_PSgQf4E
!mkdir data
!unzip data.zip  -d data 


# __Air Quality Data in India (2015 - 2020)__

### Multivariate Time Series
#### __Import Data__

In [None]:
# Parse the Date column as datetimes and use it as the index
data = pd.read_csv("data/city_day.csv", parse_dates=["Date"], index_col="Date")

- __PM2.5 (particulate matter, 2.5 micrometers or smaller)__: measured in µg/m³ (micrograms per cubic meter of air)
- __PM10 (particulate matter, 10 micrometers or smaller)__: measured in µg/m³ (micrograms per cubic meter of air)
- __SO2 (sulphur dioxide)__: measured in µg/m³ (micrograms per cubic meter of air)
- __NOx (nitrogen oxides)__: measured in ppb (parts per billion)
- __NH3 (ammonia)__: measured in µg/m³ (micrograms per cubic meter of air)
- __CO (carbon monoxide)__: measured in mg/m³ (milligrams per cubic meter of air)
- __O3 (ozone, or trioxygen)__: measured in µg/m³ (micrograms per cubic meter of air)

- __AQI__: Air Quality Index
![AQI](https://drive.google.com/uc?id=1zdwS3uFmkytjB4xlDOImnbnsVCaDAYzp)

### __Air Quality by City__


In [None]:
data[["City", "AQI"]].groupby(["City"]).mean().plot.barh(figsize=(20,10))
plt.show()

### __Correlation Between Air Quality and Predictor Variables__

In [None]:
data = data[data.City == "Delhi"] # Delhi only
data[["AQI"]].plot(figsize=(20,10))
plt.show()

In [None]:
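# numeric_only=True skips text columns such as City and AQI_Bucket (required on pandas >= 2.0)
data_corr = data.corr(numeric_only=True)\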
  .reset_index()[["index", "AQI"]]\
  .sort_values(by = "AQI", ascending=False)\
  .rename(columns={"index":"Variable"})
data_corr = data_corr[data_corr.Variable != "AQI"]
data_corr.plot.bar(x="Variable", figsize=(20,10))
plt.show()

### __Data Cleaning__

In [None]:
# Drop variables that are not needed or are not part of the Air Quality Index calculation
data = data.drop(columns = ["Benzene", "Toluene", "Xylene", "City", "AQI_Bucket"])
# Fraction of null values per column
data.isna().mean()

The usual treatment for missing values is imputation, for example with one of the following:

- Mean
- Median
- Regression
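
As a minimal sketch of the three strategies listed above, the following uses a small made-up array rather than the notebook's data. The names `x` and `y` and the toy values are assumptions for illustration, and regression imputation is shown as a straight-line fit of `y` on a fully observed `x`, which is only one of several possible regression models.

```python
import numpy as np

# Hypothetical toy data: y has two missing values, x is fully observed.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, np.nan, 8.0, np.nan, 12.0])
missing = np.isnan(y)

# Mean imputation: replace every gap with the mean of the observed values.
y_mean = np.where(missing, np.nanmean(y), y)

# Median imputation: same idea, but less sensitive to outliers.
y_median = np.where(missing, np.nanmedian(y), y)

# Regression imputation: fit y = slope * x + intercept on the observed rows,
# then predict the missing rows from x.
slope, intercept = np.polyfit(x[~missing], y[~missing], deg=1)
y_reg = np.where(missing, slope * x + intercept, y)

print("mean:      ", y_mean)
print("median:    ", y_median)
print("regression:", y_reg)
```

Mean and median imputation fill every gap with the same constant, while regression imputation lets the filled value depend on another variable.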

In [None]:
data.fillna(data.mean(), inplace = True)
data.isna().mean()

In [None]:
plt.figure(figsize=(8,6))
plt.hist2d(data['PM10'], data['AQI'])
plt.colorbar()
plt.title("PM10 VS AQI")
plt.xlabel("PM10")
plt.ylabel('AQI')
plt.show()

### __Data Preparation__

In [None]:
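# Write the cleaned data to CSV (without the Date index) and parse the rows by hand
data.to_csv("clean_data.csv", index = False)

with open("clean_data.csv") as f: 
  clean_data = f.read()

lines = clean_data.split("\n")[:-1]  # drop the empty string after the final newline
header = lines[0].split(",")
lines = lines[1:]

# AQI (the last column) is the target; every other column is a predictor
aqi = np.zeros(len(lines))
raw_variables = np.zeros((len(lines), len(header)-1))

for i, line in enumerate(lines): 
  values = [float(x) for x in line.split(",")]
  aqi[i] = values[-1]
  raw_variables[i,:] = values[:-1]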
  

In [None]:
num_train_samples = int(0.5 * len(raw_variables))
num_val_samples = int(0.25 * len(raw_variables))
num_test_samples = len(raw_variables) - num_train_samples - num_val_samples

print("Training samples: ", num_train_samples)
print("Validation samples: ", num_val_samples)
print("Test samples: ", num_test_samples)


In [None]:
# Normalize the data using the mean and standard deviation of the training split only
mean = raw_variables[:num_train_samples].mean(axis=0)
raw_variables -= mean
std = raw_variables[:num_train_samples].std(axis=0)
raw_variables /= std
raw_variables[0]

In [None]:
train_x = raw_variables[:num_train_samples]
val_x = raw_variables[num_train_samples:num_train_samples + num_val_samples]
test_x = raw_variables[num_train_samples + num_val_samples:]

In [None]:
train_y = aqi[:num_train_samples]
val_y = aqi[num_train_samples:num_train_samples + num_val_samples]
test_y = aqi[num_train_samples + num_val_samples:]

# __Basic Linear Regression__

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from math import sqrt

reg = LinearRegression().fit(train_x, train_y)
R2 = reg.score(train_x, train_y)
print("Coefficient of determination (R^2, training set): ", R2)

In [None]:
train_predict = reg.predict(train_x)
test_predict = reg.predict(test_x)
val_predict = reg.predict(val_x)

rmse_train = sqrt(mean_squared_error(train_y, train_predict))
rmse_test = sqrt(mean_squared_error(test_y, test_predict))
print('Train RMSE: %.3f' % rmse_train)
print('Test RMSE: %.3f' % rmse_test)

In [None]:
t = np.arange(0,len(test_y),1)

plt.figure(figsize=(20,4))
plt.title("Test DataSet")
plt.plot(t,test_y,label="actual")
plt.plot(t,test_predict,'r',label="predicted")
plt.legend()  # show the "actual" / "predicted" labels
plt.show()

# __Neural Nets Models__

![Neuron](https://drive.google.com/uc?id=10xLgWxoWUpHzX2kxVSE1MvfJ6w10fZrt)

Retrieved from: https://devskrol.com/wp-content/uploads/2020/11/neuron-296581_1280.png

## __Perceptron__

![](https://drive.google.com/uc?id=1J_SWo3N8ydc7vymRavEG_7B3pOsY7S8J)


- Originally designed for classification problems.

A perceptron has three fundamental parts:

1. Weights (w_n)
2. Bias
3. Activation function


__Weights__: They are initialized randomly when the perceptron is built and may be positive or negative; the optimization method (Stochastic Gradient Descent) then adjusts them during training.

__Bias__: Shifts the decision boundary away from the origin, independently of the input values.


__Activation function__: Maps the weighted sum of the inputs to the output, acting as the decision rule. There are several types of activation function:
- Sigmoid
- Step function
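
The two activation functions listed above can be sketched in NumPy as follows. The function names and the sample values in `z` are assumptions for illustration, and the step function here uses the convention that an input of exactly 0 maps to 1 (other texts map it to 0).

```python
import numpy as np

def step(z):
    """Heaviside step: 1 where z >= 0, otherwise 0 (a hard yes/no decision)."""
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    """Logistic sigmoid: squashes any real value smoothly into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical pre-activation values (weighted sum of inputs plus bias).
z = np.array([-2.0, 0.0, 2.0])

step_out = step(z)
sigmoid_out = sigmoid(z)

print("step:   ", step_out)
print("sigmoid:", sigmoid_out)
```

The step function is not differentiable at 0 and has zero slope elsewhere, which is one reason smooth functions such as the sigmoid are preferred when training with gradient-based methods.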



Question: 

- What function represents how a perceptron computes its output?

# __Gradient Descent__

Gradient descent is an optimization method widely used in machine learning. It works by iteratively adjusting the parameters, moving them a small step in the direction opposite to the gradient of the cost function, in order to minimize that cost.
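
A minimal sketch of this iterative update, assuming a one-variable linear model and a mean squared error cost. The data, the learning rate of 0.1, and the 500 iterations are made-up values chosen for illustration, not settings taken from this notebook.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 3x + 1.
x = np.linspace(-1.0, 1.0, 50)
y = 3.0 * x + 1.0

w, b = 0.0, 0.0        # parameters to learn, starting from zero
learning_rate = 0.1    # step size (assumed value)

for _ in range(500):
    error = (w * x + b) - y
    # Gradients of the cost mean(error**2) with respect to w and b.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    # Move each parameter against its gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_cost = np.mean((w * x + b - y) ** 2)
print(f"w = {w:.4f}, b = {b:.4f}, cost = {final_cost:.2e}")
```

With a learning rate that is too large the updates can overshoot and diverge, and with one that is too small convergence becomes slow, so the step size is usually tuned.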

![](https://drive.google.com/uc?id=1YLwdXitPesJyQ2D-gcJx0kHfJTw55Hqe)




Question: What condition must the cost function satisfy for gradient descent to be usable as the optimization method?



# __Base Model__

Densely Connected Layers

![](https://drive.google.com/uc?id=1nSRvHK6uLAs1e2mZwkRh1o1U90uEZx1g)
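
To make the computation of a densely connected layer concrete, here is a NumPy sketch of a forward pass through a hidden layer of 16 ReLU units followed by a single linear output. The helper names (`dense`, `relu`), the random weights, and the assumption of 9 input features are illustrative only; Keras initializes and trains its own weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Rectified linear unit: keeps positive values, zeroes out the rest."""
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """Every output unit is a weighted sum of all inputs plus a bias."""
    z = x @ weights + bias
    return activation(z) if activation is not None else z

n_features = 9  # assumed number of predictor columns
x = rng.normal(size=(4, n_features))  # a hypothetical batch of 4 samples

# Hidden layer: 16 units with ReLU.
w1 = rng.normal(scale=0.1, size=(n_features, 16))
b1 = np.zeros(16)
# Output layer: 1 unit, no activation (regression).
w2 = rng.normal(scale=0.1, size=(16, 1))
b2 = np.zeros(1)

hidden = dense(x, w1, b1, activation=relu)
output = dense(hidden, w2, b2)

n_params = w1.size + b1.size + w2.size + b2.size
print(hidden.shape, output.shape, n_params)
```

Under these assumptions the network has 9 * 16 + 16 weights and biases in the hidden layer and 16 + 1 in the output layer.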

In [None]:
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential


model = Sequential()
model.add(layers.Flatten())
model.add(layers.Dense(16, activation="relu"))  # hidden layer
model.add(layers.Dense(1))  # single linear output for regression
# metrics is passed as a list, which newer Keras versions require
model.compile(loss='mse', optimizer='rmsprop', metrics=["mae"])

In [None]:
# Validate on the validation split so the test split stays untouched until the final evaluation
history = model.fit(train_x, train_y, epochs=150, validation_data=(val_x, val_y), shuffle=False)

In [None]:
train_predict = model.predict(train_x)    
test_predict = model.predict(test_x)
val_predict = model.predict(val_x)  

In [None]:
from sklearn.metrics import mean_squared_error
from math import sqrt

rmse_train = sqrt(mean_squared_error(train_y, train_predict))
rmse_test = sqrt(mean_squared_error(test_y, test_predict))
rmse_val = sqrt(mean_squared_error(val_y, val_predict))
print('Train RMSE: %.3f' % rmse_train)
print('Test RMSE: %.3f' % rmse_test)
print('Val RMSE: %.3f' % rmse_val)

In [None]:
t = np.arange(0,len(test_y),1)

plt.figure(figsize=(20,4))
plt.title("Test DataSet")
plt.plot(t,test_y,label="actual")
plt.plot(t,test_predict,'r',label="predicted")
plt.legend()  # show the "actual" / "predicted" labels
plt.show()

In [None]:
t = np.arange(0,len(train_y),1)

plt.figure(figsize=(20,4))
plt.title("Train DataSet")
plt.plot(t,train_y,label="actual")
plt.plot(t,train_predict,'r',label="predicted")
plt.legend()  # show the "actual" / "predicted" labels
plt.show()

## __LSTM__ 

Long Short-Term Memory Networks

Long Short-Term Memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network (RNN) can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, robot control, video games, and healthcare. LSTM has become the most cited neural network of the 20th century.

Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Long_short-term_memory
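
The cells that follow give each sample a single time step, so the LSTM sees one day at a time. As a hedged sketch of how longer sequences could be built instead, the hypothetical helper `make_windows` below stacks several consecutive rows into each sample; the helper name, the window length, and the toy arrays are assumptions and are not part of the original notebook.

```python
import numpy as np

def make_windows(features, targets, window):
    """Build samples of shape (window, n_features) from consecutive rows.

    Each sample is paired with the target of its last row, so the model
    sees the current day plus the (window - 1) days before it.
    """
    n_samples = len(features) - window + 1
    x = np.stack([features[i:i + window] for i in range(n_samples)])
    y = targets[window - 1:]
    return x, y

# Hypothetical toy data: 10 days, 2 features per day.
features = np.arange(20, dtype=float).reshape(10, 2)
targets = np.arange(10, dtype=float)

x_seq, y_seq = make_windows(features, targets, window=3)
print(x_seq.shape, y_seq.shape)  # (samples, time steps, features), (samples,)
```

With `window=1` this reduces to the single-time-step layout `(samples, 1, features)` used in this notebook. Windows should be built separately within each split so that no sample spans the boundary between training, validation, and test data.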

In [None]:
train_x = train_x.reshape((train_x.shape[0], 1, train_x.shape[1]))
test_x = test_x.reshape((test_x.shape[0], 1, test_x.shape[1]))
val_x = val_x.reshape((val_x.shape[0], 1, val_x.shape[1]))

In [None]:
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout


# Save only the model with the lowest validation loss seen so far
callbacks = keras.callbacks.ModelCheckpoint(
    filepath="models/lstm.keras", 
    save_best_only=True, 
    monitor="val_loss"
)

model = Sequential()
model.add(LSTM(300))
model.add(Dropout(0.2))
model.add(Dense(1))
# metrics is passed as a list, which newer Keras versions require
model.compile(loss='mse', optimizer='rmsprop', metrics=["mae"])

In [None]:
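# Validate (and checkpoint) on the validation split so the test split stays untouched until the final evaluation
history = model.fit(train_x, train_y, epochs=150, validation_data=(val_x, val_y), shuffle=False, callbacks=[callbacks])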

In [None]:
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.legend()
plt.show()

In [None]:
model = keras.models.load_model("models/lstm.keras")

In [None]:
train_predict = model.predict(train_x)    
test_predict = model.predict(test_x)
val_predict = model.predict(val_x)  

In [None]:
from sklearn.metrics import mean_squared_error
from math import sqrt

rmse_train = sqrt(mean_squared_error(train_y, train_predict))
rmse_test = sqrt(mean_squared_error(test_y, test_predict))
print('Train RMSE: %.3f' % rmse_train)
print('Test RMSE: %.3f' % rmse_test)

In [None]:
t = np.arange(0,len(test_y),1)

plt.figure(figsize=(20,4))
plt.title("Test DataSet")
plt.plot(t,test_y,label="actual")
plt.plot(t,test_predict,'r',label="predicted")
plt.legend()  # show the "actual" / "predicted" labels
plt.show()

In [None]:
t = np.arange(0,len(train_y),1)

plt.figure(figsize=(20,4))
plt.title("Train DataSet")
plt.plot(t,train_y,label="actual")
plt.plot(t,train_predict,'r',label="predicted")
plt.legend()  # show the "actual" / "predicted" labels
plt.show()