# About
This notebook convolves multiple 1D filters over all time steps of the Air Quality dataset to address a multivariate, multi-step time series forecasting problem.

# Convolutional Neural Network
A typical Convolutional Neural Network (CNN) is composed of three types of layers:
1. Convolutional layer
2. Activation layer
3. Pooling layer

CNNs rely on convolution operations, which are most often applied to the spatial structure of images. For time series data, 1D convolutions can instead be used to extract information along the time dimension.
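As a minimal sketch of a filter sliding along the time dimension, the snippet below applies one fixed filter of width 2 to a short univariate series with NumPy. The values are made up for illustration, and in a CNN the filter weights would be learned rather than fixed.

```python
import numpy as np

# Hypothetical readings of a single variable (illustrative values only).
series = np.array([4.0, 3.2, 10.0, 12.7, 4.3, 10.0])

# A filter of width 2 that averages each pair of neighbouring time steps.
kernel = np.array([0.5, 0.5])

# mode="valid" keeps only the positions where the filter fully overlaps
# the series, so the output has len(series) - len(kernel) + 1 values.
smoothed = np.convolve(series, kernel, mode="valid")

print(smoothed.shape)  # (5,)
```

Note that `np.convolve` flips the kernel before sliding it, while deep learning libraries usually do not. The kernel here is symmetric, so both conventions give the same result.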

# Libraries

In [1]:
%run "/home/cesar/Python_NBs/HDL_Project/HDL_Project/global_fv.ipynb"

# User-Defined Functions

# Data

## Parameters

In [2]:
station_number = 'SE'
target_name = 'pm25'
lag = 3

# Meteorological parameters
col_names = [i[0] for i in qdata("select meteorological_code from cat_meteorological_params")]

# Default necessary columns
cols = "datetime, " + target_name

# Columns
for i in col_names:
    cols = cols + ", " + str(i)

print(cols)
    
# Where filter:
where_txt = "where datetime >= \'2021-01-01\'"

datetime, pm25, tout, wdr, wsr, rh, sr, rainf, prs


## Creating samples
A supervised learning algorithm requires that data is provided as a collection of samples, where each sample has an input component (X) and an output component (y).

The dataset will be represented in Python using a NumPy array.

The input to every CNN and LSTM layer must be three-dimensional. The three dimensions of this input are:
* Samples. One sequence is one sample. A batch is comprised of one or more samples.
* Time Steps. One time step is one point of observation in the sample. One sample is comprised of multiple time steps.
* Features. One feature is one observation at a time step. One time step is comprised of one or more features.
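The three dimensions above can be illustrated with a sliding window over a 2D table of observations. This is only a sketch: `make_windows` is a hypothetical helper, not the `samples_creation` method used in this notebook, and it assumes a single-step target taken right after each window.

```python
import numpy as np

def make_windows(features, target, lag):
    """Hypothetical helper: build (samples, time steps, features) windows."""
    X, y = [], []
    for start in range(len(features) - lag):
        X.append(features[start:start + lag])  # lag consecutive time steps
        y.append(target[start + lag])          # value right after the window
    return np.stack(X), np.array(y)

# Toy data: 6 time steps, 2 features (made-up values).
features = np.arange(12, dtype=float).reshape(6, 2)
target = np.arange(6, dtype=float)

X, y = make_windows(features, target, lag=3)
print(X.shape, y.shape)  # (3, 3, 2) (3,)
```

Stacking the windows with `np.stack` yields a single three-dimensional array, which is the layout a CNN or LSTM layer expects.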


In [3]:
# Initializing class
main_processed_df = data_processing_stations(station_number, cols, where_txt)

# Execution of processing functions
#initial_df = main_processed_df.initial_df()

# Samples numpy.ndarray object 
X, y = main_processed_df.samples_creation(lag, target_name)

In [4]:
X

[array([[4.0e+00, 0.0e+00, 4.0e+00, 8.6e+01, 7.0e-03, 0.0e+00, 0.0e+00],
        [4.0e+00, 0.0e+00, 3.2e+00, 8.6e+01, 7.0e-03, 0.0e+00, 0.0e+00],
        [0.0e+00, 0.0e+00, 1.0e+01, 6.4e+01, 7.0e-03, 0.0e+00, 0.0e+00]]),
 array([[4.00e+00, 0.00e+00, 3.20e+00, 8.60e+01, 7.00e-03, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 1.00e+01, 6.40e+01, 7.00e-03, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 1.27e+01, 3.80e+01, 3.62e-01, 0.00e+00,
         0.00e+00]]),
 array([[0.00e+00, 0.00e+00, 1.00e+01, 6.40e+01, 7.00e-03, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 1.27e+01, 3.80e+01, 3.62e-01, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 4.30e+00, 7.50e+01, 7.00e-03, 0.00e+00,
         0.00e+00]]),
 array([[0.00e+00, 0.00e+00, 1.27e+01, 3.80e+01, 3.62e-01, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 4.30e+00, 7.50e+01, 7.00e-03, 0.00e+00,
         0.00e+00],
        [0.00e+00, 0.00e+00, 1.00e+01, 2.80e+01, 3.97e-01, 0.00

# Data model: CNN

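The equations below use the following index nomenclature:

- i: Input channel (feature map) of the previous layer
- j: Output channel (filter) of the current layer
- k: Output unit of the fully connected layer
- l: Layer index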

1. **Convolutional Layer Learning process**: There are three convolution layers. Each layer learns a non-linear representation from the previous layer. The sequential feeding forms hierarchical feature representations. Afterwards, this is fed into the activation layer to...

<center>$c^l_j = \sum_i x^{l-1}_i \ast w^l_{i,j} + b^l_j$</center>
<center>$x^l_j = \mathrm{ReLU}(c^l_j)$</center>

Where:
* $\ast$ : Convolution operator
* $x^{l-1}_i$: Input vector of a convolution layer.
* $c^l_j$: Output vector of a convolution layer.
* ReLU: Activation function.
* $w^l_{i,j}$: Filters
* $b^l_j$: Biases
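The convolution and activation equations can be sketched in NumPy for a single sample. `conv1d_layer` is a hypothetical name, and the array layouts are assumptions chosen for readability. Like most deep learning libraries, the sketch slides the filter without flipping it.

```python
import numpy as np

def conv1d_layer(x, w, b):
    """One convolution layer followed by ReLU (illustrative sketch).

    x: (time steps, in_channels)             inputs x^{l-1}_i
    w: (kernel, in_channels, out_channels)   filters w^l_{i,j}
    b: (out_channels,)                       biases b^l_j
    """
    kernel, _, out_channels = w.shape
    steps = x.shape[0] - kernel + 1
    c = np.zeros((steps, out_channels))
    for t in range(steps):
        window = x[t:t + kernel]  # (kernel, in_channels)
        # Sum over kernel positions and input channels i for each output j.
        c[t] = np.tensordot(window, w, axes=([0, 1], [0, 1])) + b
    return np.maximum(c, 0.0)  # ReLU

x = np.ones((3, 2))            # 3 time steps, 2 input channels
w = np.ones((2, 2, 4))         # kernel width 2, 4 filters
b = np.array([-5.0, 0.0, 1.0, 2.0])

out = conv1d_layer(x, w, b)
print(out.shape)  # (2, 4)
```

The first filter produces a negative pre-activation (4 - 5 = -1), which ReLU clips to zero.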

2. **Flatten Layer**: After processing the three convolutional layers, we use a flatten layer to transform...

<center>$x^l_j = Flatten(x^l_j)$</center>

3. **Fully Connected Layer**: Reduces the dimension of the final output vector.

<center>$x^{l+1}_k = FC\left(\sum_j w^{l+1}_{k,j} x^l_j + b^{l+1}_k\right)$</center>
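The flatten and fully connected steps can be sketched together in NumPy. The shapes (3 time steps, 32 filters, 1 output unit) and the random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of the last convolution layer: (time steps, filters).
feature_maps = rng.normal(size=(3, 32))

# Flatten: collapse the 2D feature maps into one vector of length 3 * 32.
flat = feature_maps.reshape(-1)

# Fully connected layer with a single output unit (k = 1).
W = rng.normal(size=(1, flat.size))  # weights w_{k,j}
b = np.zeros(1)                      # one bias per output unit k
y_hat = W @ flat + b

print(flat.shape, y_hat.shape)  # (96,) (1,)
```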




The model then learns how to automatically extract the features from the raw data that are directly useful for the problem being addressed. This is called representation learning and the CNN achieves this in such a way that the features are extracted regardless of how they occur in the data, so-called transform or distortion invariance.

## LSTM with an input layer
For example, the model below defines an input layer that expects 1 or more samples, 3 time steps, and 1 feature.

* model = Sequential()
* model.add(LSTM(32, input_shape=(3, 1)))
* model.add(Dense(1))



The input layer for CNN and LSTM models is specified by the `input_shape` argument on the first hidden layer of the network.
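The same convention applies to a 1D CNN. The sketch below stacks three convolution layers, a flatten layer and a dense layer, assuming 3 time steps and 7 features per sample. The filter count, the kernel size and `padding="same"` are illustrative assumptions, not the notebook's final architecture. The shape is declared with an explicit `Input` layer, which is equivalent to passing `input_shape` to the first layer.

```python
from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input
from tensorflow.keras.models import Sequential

n_steps, n_features = 3, 7  # assumed: lag of 3, seven meteorological inputs

model = Sequential([
    Input(shape=(n_steps, n_features)),
    # padding="same" keeps all 3 time steps; with the default "valid"
    # padding, three layers of kernel size 2 would shrink them to nothing.
    Conv1D(filters=32, kernel_size=2, padding="same", activation="relu"),
    Conv1D(filters=32, kernel_size=2, padding="same", activation="relu"),
    Conv1D(filters=32, kernel_size=2, padding="same", activation="relu"),
    Flatten(),
    Dense(1),  # single-step forecast; a multi-step target needs more units
])

model.summary()
```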

# Questions
- What is "hierarchical feature representation"?
- Explanation for each layer in this model:
    * Convolution layer
    * Involved layer
    * Activation layer
    * Pooling layer
    * Flatten layer
    * Fully connected layer
- Why ReLU?

# Sources

- Convolutional Neural Network (CNN) for Time Series Classification
    * https://www.macnica.co.jp/business/ai_iot/columns/135112/

- Deep learning for time series classification: a review
    * https://arxiv.org/pdf/1809.04356v4.pdf
    
- Deep Learning for Time Series Classification
    * https://github.com/hfawaz/dl-4-tsc