## LSTM (long short-term memory) based Recurrent NeuralNet for share price forecasting
###### Abdulla Al Blooshi
----------------
- As an overview, the idea behind this model architecture is to assign weights to selected features; in this case the **open** and the **highest** price the stock reached for a given day were selected. These learned weights represents the model's view on the importance of said features from recent and previous time blocks and how they affect the price in the coming day(s).
- This model will be trained on ADNOC's stock price history and the weights it learns will be saved and used via a transfer learning approach to be tested on and predict Borouge's future stock price.
- This was done due to the lack of training data for Borouge's stock price as it (relatively) recently IPO'd
<br>
<br>
> *This is by no means financial advice as I am not a financial expert, and all the data used here is publicly available.*

In [6]:
import numpy as np 
import pandas as pd 
import tensorflow.keras as keras 
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Dropout,
    LSTM
)
from tensorflow.keras.optimizers import SGD
from sklearn.preprocessing import MinMaxScaler

#### Splitting and normalizing the data:
   - It is easier for the model to work with numbers that are closer together; namely in the range of (0,1) for our case
   - The `MinMaxScaler()` transformation is given by:
      
        X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))<br>
        X_scaled = X_std * (max - min) + min <br>
        
        *where min, max = feature_range.*
        > taken straight from the docs

In [7]:
unsplit_dat = pd.read_csv('./data/ADNOCDIST_Historical_Data.csv', index_col='Date')
valid_dat = pd.read_csv('./data/ADNOC_Valid.csv', index_col='Date')
unsplit_dat
training_set = unsplit_dat[['Open','High']].values
valid_set = valid_dat[['Open','High']].values

normalizer = MinMaxScaler(feature_range=(0,1))
scaled_train_set = normalizer.fit_transform(training_set)
scaled_valid_set = normalizer.fit_transform(valid_set)
print(f"Training set\n{scaled_train_set} \n Validation set\n{scaled_valid_set}")

Training set
[[0.70333333 0.68551237]
 [0.70333333 0.68551237]
 [0.69333333 0.6819788 ]
 ...
 [0.21666667 0.17667845]
 [0.22666667 0.18374558]
 [0.3        0.25441696]] 
 Validation set
[[0.8        0.59090909]
 [0.8        0.59090909]
 [0.9        0.68181818]
 [0.86666667 0.72727273]
 [0.9        0.68181818]
 [0.73333333 0.68181818]
 [0.7        0.72727273]
 [0.93333333 0.77272727]
 [0.83333333 0.72727273]
 [0.8        0.81818182]
 [0.63333333 0.5       ]
 [0.53333333 0.31818182]
 [0.83333333 0.59090909]
 [0.83333333 0.72727273]
 [1.         0.81818182]
 [0.76666667 0.86363636]
 [0.7        0.54545455]
 [0.7        0.5       ]
 [0.5        0.5       ]
 [0.26666667 0.09090909]
 [0.46666667 0.04545455]
 [0.3        0.04545455]
 [0.56666667 0.36363636]
 [0.7        0.45454545]
 [0.8        0.5       ]
 [0.7        0.5       ]
 [0.63333333 0.5       ]
 [0.73333333 0.5       ]
 [0.73333333 0.5       ]
 [0.63333333 0.40909091]
 [0.66666667 0.36363636]
 [0.66666667 0.36363636]
 [0.53333333 0

#### LSTMs distinguishing features and other important notes:
- As per the [paper](http://www.bioinf.jku.at/publications/older/2604.pdf) first proposing this architecture by Hochreiter et al. Constant error carousels (CEC) are the central features of LSTMs. Controlling (deciding) the backward propagation of errors through the network.
    - These CEC's are then extended to form what is referred to as a memory cell; the extension adds multiplicative input and output gates. These gates control the contents with in a cell from being propagated and control the cell from activating other units respectively.
- RNNs are able to use recently seen previous information and cannot do so with information with larger time lags between them, this is where the LSTM architecture comes into play.
- This is another one of the distinguishing features of using the LSTM architecture, its ability to 'remember' or erase parts of previously seen data in a window (or timestep).
- By creating a window our training data will be turned into an array of arrays divided into chunks of N, where N would be the size of our timestep/window.
    - for example having N be 60 would allow our model to use the previous sixty days of data to make the prediction for the 61st.


In [10]:
# Bismillah
#TODO: y_train should predict close prices?
#TODO: Consider using sklearn's standardScaler()
X_train=[]
y_train=[]
X_valid=[]
y_valid=[]

for i in range(60,scaled_train_set.shape[0]):
    X_train.append(scaled_train_set[i-60:i,0])
    y_train.append(scaled_train_set[i,0])

for j in range(60,scaled_valid_set.shape[0]):
    X_valid.append(scaled_valid_set[j-60:j,0])
    y_valid.append(scaled_valid_set[j,0])

# Keras accepts numpy arrays
X_train, y_train = np.array(X_train), np.array(y_train)

X_valid, y_valid = np.array(X_valid), np.array(y_valid) 
print(X_train.shape, y_train.shape)

(984, 60) (984,)


#### Implementation Notes:
- The model base architecture will be sequential which ["_groups a linear stack of layers into a tf.keras.Model_"](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)
- Ideally I would've liked to add more dropout layers but I do not have that luxury as data is limited. The dropout layers aids in reducing the amount of overfitting.

___However, before we can do that the data must be transformed further into a 3D array with X_train learning examples. In our case it will be of dimension (984,60,1), the 984 comes from the number of samples we have, the 60 is because we grouped our samples into groups of 60, and the 1 is because we want the model to access one feature at each timestep___ 


In [12]:
X_train = np.reshape(X_train,(X_train.shape[0],X_train.shape[1],1))
X_valid = np.reshape(X_valid,(X_valid.shape[0],X_valid.shape[1],1))

In [13]:
print(X_train.shape)

(984, 60, 1)


In [14]:
lstm_model = Sequential()
lstm_model.add(LSTM(units=32, activation='tanh', return_sequences=True, input_shape=(X_train.shape[1],1)))
lstm_model.add(Dropout(0.1))
lstm_model.add(LSTM(units=32, return_sequences=True))
lstm_model.add(Dropout(0.05))
lstm_model.add(LSTM(units=32))
lstm_model.add(Dropout(0.05))
lstm_model.add(Dense(units=1))


lstm_model.summary()
lstm_model.compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 60, 32)            4352      
_________________________________________________________________
dropout (Dropout)            (None, 60, 32)            0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 60, 32)            8320      
_________________________________________________________________
dropout_1 (Dropout)          (None, 60, 32)            0         
_________________________________________________________________
lstm_2 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 3

#### Training the model on the data:

In [18]:
lstm_model.fit(X_train, y_train, epochs=100,batch_size=32,validation_data=(X_valid,y_valid))

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100

KeyboardInterrupt: 

<p style="color:orange">TODO: ExponMovingAvg for comparison</p>