In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Forecasting

According to Wikipedia, forecasting is the process of making predictions about the future based on past and present data, most commonly by analysis of trends. A commonplace example might be estimation of some variable of interest at some specified future date. Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgmental methods. Usage can differ between areas of application: for example, in hydrology the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.

Investors utilize forecasting to determine if events affecting a company, such as sales expectations, will increase or decrease the price of shares in that company. Stock analysts use forecasting to extrapolate how trends, such as GDP or unemployment, will change in the coming quarter or year. 

# How to choose the right method for forecasting?
Choosing the right forecasting method is essential because it determines how accurate your insights will be. Forecasting methods differ in the assumptions they make and the statistics they rely on, so it helps to first characterize today's business problem along the following dimensions.

1.	Inputs vs. Outputs

Inputs: Historical data provided to the model in order to make a single forecast.
Outputs: Prediction or forecast for a future time step beyond the data provided as input.

2.	Endogenous vs. Exogenous

Endogenous: Input variables that are influenced by other variables in the system and on which the output variable depends.
Exogenous: Input variables that are not influenced by other variables in the system and on which the output variable depends.

3.	Unstructured vs. Structured

Unstructured: No obvious systematic time-dependent pattern in a time series variable.
Structured: Systematic time-dependent patterns in a time series variable (e.g. trend and/or seasonality).

4.	Regression vs. Classification

Regression: Forecast a numerical quantity.

Classification: Classify as one of two or more labels.

5.	Univariate vs. Multivariate

Univariate: One variable measured over time.

Multivariate: Multiple variables measured over time.

6.	Single-step vs. Multi-step

Single-step: Forecast the next time step.

Multi-step: Forecast more than one future time step.

7.	Static vs. Dynamic

Static: A forecast model is fit once and used to make predictions.

Dynamic: A forecast model is fit on newly available data prior to each prediction.

8.	Contiguous vs. Discontiguous

Contiguous: Observations are made at uniform intervals over time.

Discontiguous: Observations are not made at uniform intervals over time.
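
To make the single-step versus multi-step distinction concrete, here is a minimal sketch on a toy series. The helper `make_windows` and its parameters `n_in` and `n_out` are assumptions made up for illustration; they are not part of this notebook's pipeline.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Split a 1-D series into (input window, target window) pairs.

    n_in  : number of past time steps given to the model
    n_out : number of future time steps to forecast (1 = single-step)
    """
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(10)  # toy data: 0, 1, ..., 9

X_one, y_one = make_windows(series, n_in=3, n_out=1)      # single-step
X_multi, y_multi = make_windows(series, n_in=3, n_out=2)  # multi-step

print(X_one[0], y_one[0])      # first input window and its one-step target
print(X_multi[0], y_multi[0])  # same input window, two-step target
```

The only difference between the two settings is the width of the target window, so the same input windows can feed either kind of model.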

**What are different types of forecasting methods?**
The most common forecasting methods are given below:
1. ARIMA
2. SARIMA
3. Exponential Smoothing
4. Facebook Prophet Forecasting
5. RNN
6. LSTM
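
As a feel for the simplest method on this list, here is a minimal sketch of simple exponential smoothing written by hand in NumPy. The function name, the toy prices, and the choice `alpha=0.5` are assumptions for illustration; in practice a library implementation would usually be preferred.

```python
import numpy as np

def simple_exp_smoothing(x, alpha):
    """Return the smoothed series s, where s[0] = x[0] and
    s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

prices = [10.0, 12.0, 11.0, 13.0]  # toy data, not the Google prices
smoothed = simple_exp_smoothing(prices, alpha=0.5)
next_forecast = smoothed[-1]       # flat one-step-ahead forecast
print(smoothed, next_forecast)
```

A larger `alpha` makes the forecast react faster to recent observations, while a smaller one averages over a longer history.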

Please comment below if you know any other type of forecasting. I'll be happy to learn 🙂.

In this notebook we're going to learn multivariate time series forecasting. Statistically, **multivariate analysis** is a statistical procedure for analyzing data that involves more than one type of measurement or observation. It may also mean solving problems where more than one dependent variable is analyzed simultaneously with other variables.

# Importing packages
To design any machine learning or deep learning model we need libraries such as **Pandas**, **NumPy**, and **Matplotlib**.
For this multivariate time series, I'm using an LSTM layer along with **Dense** and **Dropout** layers. To use these layers we need to import them from Keras. We also need a modeling API, so we'll import the **Sequential** model from Keras.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tensorflow import keras
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout , LSTM , Bidirectional 

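import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler

# Let TensorFlow grow GPU memory on demand instead of reserving it all up front.
# This must run before the GPU is first used.
# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
print(tf.test.gpu_device_name())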

# Importing Data using pandas library

In [None]:
train = pd.read_csv("../input/google-stock-price/Google_Stock_Price_Train.csv")
test = pd.read_csv("../input/google-stock-price/Google_Stock_Price_Test.csv")

In [None]:
train.head()

In [None]:
test.head()

The first step for any time series model is to **set the time column (date, month, week, day, or time) as the index**.

In [None]:
train = train.set_index("Date")
test = test.set_index("Date")
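
Because the CSV files were read without date parsing, the index set above holds plain strings. Here is a minimal sketch of parsing the dates into a `DatetimeIndex` first, which lets pandas sort and slice by date. The toy frame, its values, and the `%m/%d/%Y` format are assumptions; check the format actually used in the CSV before reusing this.

```python
import pandas as pd

# Toy frame with made-up prices and string dates.
toy = pd.DataFrame({
    "Date": ["1/5/2012", "1/3/2012", "1/4/2012"],
    "Open": [99.8, 100.0, 101.5],
})

# Parse the strings into timestamps, then index and sort by them.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y")
toy = toy.set_index("Date").sort_index()

print(toy.index.dtype)
print(toy.loc["2012-01-04":"2012-01-05"])  # date-based slicing
```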

# Data Cleaning

In this data, two columns, "Volume" and "Close", hold numeric values written with comma thousands separators, so pandas reads them as strings. We'll first remove the commas and then convert the columns to float.

In [None]:
train["Volume"] = train["Volume"].replace(",", "",regex=True)
train["Close"] = train["Close"].replace(",", "",regex=True)

train["Volume"] = train["Volume"].astype("float")
train["Close"] = train["Close"].astype("float")
print("train dataset shape", train.shape)
print("test dataset shape", test.shape)
train.info()

We'll apply the same steps to the test dataset so that the training and test data are in the same format.

In [None]:
test["Volume"] = test["Volume"].replace(",", "",regex=True)
test["Close"] = test["Close"].replace(",", "",regex=True)

test["Volume"] = test["Volume"].astype("float")
test["Close"] = test["Close"].astype("float")
test.info()
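
As an alternative to cleaning the columns after loading, `pd.read_csv` accepts a `thousands` argument that strips the separators while parsing. The sketch below uses a small in-memory CSV with made-up values rather than the real files, so treat it as an illustration of the option, not a drop-in replacement for the cells above.

```python
import io
import pandas as pd

# Toy CSV in the same spirit as the dataset: quoted numbers containing commas.
csv_text = (
    "Date,Close,Volume\n"
    '1/3/2012,"1,008.64","7,380,500"\n'
    '1/4/2012,666.45,"5,749,400"\n'
)

# thousands="," tells the parser to drop the separators inside numbers.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)
print(df)
```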

# Normalization

In [None]:
scale = MinMaxScaler()

num_col = ["High", "Low", "Close", "Volume"]
train1 = scale.fit(train[num_col].to_numpy())

train.loc[:, num_col] = train1.transform(train[num_col].to_numpy())
test.loc[:,num_col] = train1.transform(test[num_col].to_numpy())



In [None]:
#Output variable
scale1 = MinMaxScaler()
Open = scale1.fit(train[["Open"]].to_numpy())
train["Open"] = Open.transform(train[["Open"]].to_numpy())
test["Open"] = Open.transform(test[["Open"]].to_numpy())
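
With its default settings, `MinMaxScaler` maps each column to `(x - min) / (max - min)`, using the minimum and maximum learned from the training data. Here is a minimal NumPy sketch of that arithmetic and its inverse. The toy values and the helper names `scale_to_unit` and `unscale` are assumptions for illustration.

```python
import numpy as np

train_open = np.array([100.0, 150.0, 200.0])  # toy training values
test_open = np.array([125.0, 220.0])          # toy test values

# "Fit": learn the minimum and maximum from the training data only.
lo, hi = train_open.min(), train_open.max()

def scale_to_unit(x):
    """Map the training range [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)

def unscale(z):
    """Invert scale_to_unit, returning values in the original units."""
    return z * (hi - lo) + lo

scaled_test = scale_to_unit(test_open)
print(scaled_test)           # the second value lies above the training max
print(unscale(scaled_test))  # back in the original units
```

Scaled test values can fall outside [0, 1] whenever test prices leave the training range. That matters for the model defined later, because a sigmoid output is bounded between 0 and 1.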

# Data preparation before building model

In [None]:
from tqdm.notebook import tqdm  # notebook-friendly progress bar
def prepare_data(X,y,time_steps=1):
    Xs = []
    Ys = []
    for i in tqdm(range(len(X) - time_steps)):
        a = X.iloc[i:(i + time_steps)].to_numpy()
        Xs.append(a)
        Ys.append(y.iloc[i+time_steps])
    return np.array(Xs),np.array(Ys)    

In the data preparation step, we build x_train, y_train, x_test and y_test. In our case the "Open" column is the predicted variable, and each input window contains all columns (including past values of "Open") as predictors.

In [None]:
steps = 10
X_train , y_train = prepare_data(train,train.Open,time_steps=steps)
X_test , y_test = prepare_data(test,test.Open,time_steps=steps)
print("X_train : {}\nX_test : {}\ny_train : {}\ny_test: {}".format(X_train.shape,X_test.shape,y_train.shape,y_test.shape))
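
To see what the windowing does, here is a sketch that repeats the logic of `prepare_data` on a six-row toy frame. The name `prepare_windows` and the toy columns are assumptions; the progress bar is left out so that the block runs on its own.

```python
import numpy as np
import pandas as pd

def prepare_windows(X, y, time_steps=1):
    """Same logic as prepare_data above, without the progress bar."""
    Xs, ys = [], []
    for i in range(len(X) - time_steps):
        Xs.append(X.iloc[i:i + time_steps].to_numpy())  # rows i .. i+time_steps-1
        ys.append(y.iloc[i + time_steps])               # value right after the window
    return np.array(Xs), np.array(ys)

toy = pd.DataFrame({"Open": np.arange(6.0), "High": np.arange(6.0) + 1})
X_toy, y_toy = prepare_windows(toy, toy.Open, time_steps=3)

print(X_toy.shape, y_toy.shape)  # (samples, time_steps, features) and (samples,)
print(X_toy[0], y_toy[0])
```

A frame with `n` rows yields `n - time_steps` samples, so a short test set leaves only a few windows to evaluate on.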

In [None]:
X_train = np.asarray(X_train).astype(np.float32)

# Inputs in LSTM:

•	The input of an LSTM is always a 3D array: (batch_size, time_steps, features).

•	The output of an LSTM is a 2D or 3D array, depending on the return_sequences argument.

•	If return_sequences is False, the output is a 2D array: (batch_size, units).

•	If return_sequences is True, the output is a 3D array: (batch_size, time_steps, units).
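
The shapes above can be checked with a small LSTM forward pass written by hand in NumPy. This is an illustrative re-implementation with random weights, not the Keras layer itself; the function `lstm_forward`, its gate ordering, and the toy dimensions are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, units, return_sequences=False, seed=0):
    """Run a randomly initialised LSTM over x of shape (batch, time_steps, features)."""
    batch, time_steps, features = x.shape
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(features, 4 * units))  # input weights
    U = rng.normal(scale=0.1, size=(units, 4 * units))     # recurrent weights
    b = np.zeros(4 * units)                                # biases

    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(time_steps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)

    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time_steps, units)
    return h                              # (batch, units)

x = np.random.default_rng(1).normal(size=(32, 10, 5))  # toy batch
last_only = lstm_forward(x, units=128)
full_seq = lstm_forward(x, units=128, return_sequences=True)
print(last_only.shape, full_seq.shape)
```

With `return_sequences=False` the layer keeps only the hidden state of the last time step, which is why it equals the final slice of the full sequence.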


In [None]:
model = Sequential()
model.add(LSTM(128,input_shape=(X_train.shape[1],X_train.shape[2])))
model.add(Dropout(0.2))

model.add(Dense(1,activation="sigmoid"))
model.compile(optimizer="adam",loss="mse")

with tf.device('/GPU:0'):
    prepared_model = model.fit(X_train,y_train,batch_size=32,epochs=1000,validation_data=(X_test,y_test))

plt.plot(prepared_model.history["loss"],label="loss")
plt.plot(prepared_model.history["val_loss"],label="val_loss")
plt.legend(loc="best")
plt.xlabel("No. Of Epochs")
plt.ylabel("mse score")

In [None]:
pred = model.predict(X_test)

y_test_inv = scale1.inverse_transform(y_test.reshape(-1,1))
pred_inv = scale1.inverse_transform(pred)

plt.figure(figsize=(16,6))
plt.plot(y_test_inv.flatten(),marker=".",label="actual")
plt.plot(pred_inv.flatten(),marker=".",label="prediction",color="r")

In [None]:
y_test_actual = scale1.inverse_transform(y_test.reshape(-1,1))
y_test_pred = scale1.inverse_transform(pred)

arr_1 = np.array(y_test_actual)
arr_2 = np.array(y_test_pred)

actual = pd.DataFrame(data=arr_1.flatten(),columns=["actual"])
predicted = pd.DataFrame(data=arr_2.flatten(),columns = ["predicted"])

In [None]:
final = pd.concat([actual,predicted],axis=1)
final.head()

Looking at the final data, the predicted values appear close to the actual values; the RMSE and R² computed in the next cell quantify how close. If you want, you can try to improve the model further with techniques such as hyperparameter tuning (for example GridSearchCV) or K-fold cross-validation.
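
One caveat when applying K-fold to time series: plain K-fold lets the model train on observations that come after the validation fold. Here is a minimal sketch of a time-ordered alternative, where each validation block follows its training block. `expanding_window_splits` is a hypothetical helper written for illustration; scikit-learn provides `TimeSeriesSplit` for the same purpose.

```python
import numpy as np

def expanding_window_splits(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs where validation always follows training.

    Samples left over when n_samples is not divisible by n_splits + 1 are unused.
    """
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_idx = np.arange(0, k * fold)           # everything seen so far
        val_idx = np.arange(k * fold, (k + 1) * fold)  # the next block in time
        yield train_idx, val_idx

splits = list(expanding_window_splits(n_samples=12, n_splits=3))
for train_idx, val_idx in splits:
    print(train_idx, val_idx)
```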

In [None]:
from sklearn.metrics import mean_squared_error, r2_score
rmse = np.sqrt(mean_squared_error(final.actual,final.predicted)) 
r2 = r2_score(final.actual,final.predicted) 
print("rmse is : {}\nr2 is : {}".format(rmse,r2))
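
For reference, both metrics can be written out directly in NumPy. The sketch below uses made-up toy values rather than the model's predictions, and the `_toy` variable names are assumptions chosen to avoid clashing with the cell above.

```python
import numpy as np

actual_toy = np.array([3.0, 5.0, 7.0, 9.0])
predicted_toy = np.array([2.0, 5.0, 8.0, 9.0])

residuals = actual_toy - predicted_toy

# RMSE: square root of the mean squared residual, in the units of the target.
rmse_toy = np.sqrt(np.mean(residuals ** 2))

# R^2: 1 minus the ratio of residual variation to total variation around the mean.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual_toy - actual_toy.mean()) ** 2)
r2_toy = 1.0 - ss_res / ss_tot

print(rmse_toy, r2_toy)
```

RMSE is easiest to judge against the typical price level, while R² compares the model with always predicting the mean of the actual values.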

In [None]:
plt.figure(figsize=(16,6))
plt.plot(final.actual,label="Actual data")
plt.plot(final.predicted,label="predicted values")
plt.legend(loc="best")

If you have any questions or feedback, do not hesitate to write, and if you like this kernel, please do not forget to UPVOTE 🙂