## DeepTime: Deep learning framework for time series forecasting
(https://github.com/MRYingLEE/DeepTime-Deep-Learning-Framework-for-Time-Series-Forecasting)

1. Built on the popular stack of TensorFlow 2.x + Keras + scikit-learn + Pandas/NumPy.

2. A Jupyter Notebook <font color=red>to be FILLED</font>, with Colab support.

3. Support for scikit-learn terminology and workflow.

TensorFlow 2.x is selected by running a cell with the `tensorflow_version` magic **before** you run `import tensorflow`.

Google Colab recommends against using ``pip install`` to specify a particular TensorFlow version for both GPU and TPU backends.

In [0]:
%tensorflow_version 2.x

import tensorflow as tf
print(tf.__version__)
print(tf.test.gpu_device_name())
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

2.1.0

Num GPUs Available:  0


The most commonly used stack

<font color=red>Do not mix classes from standalone Keras (`keras`) with those from `tensorflow.keras`.</font>

In [0]:
import os
import numpy as np
import pandas as pd
from tensorflow import keras
import seaborn as sns
from pylab import rcParams
import matplotlib.pyplot as plt
from matplotlib import rc

from tensorflow.keras.layers import Bidirectional, Dropout, Activation, Dense, LSTM
from tensorflow.python.keras.layers import CuDNNLSTM
from tensorflow.keras.models import Sequential

%matplotlib inline

# Forecasting Task in Business View

Usually the business problem comes from your boss, your client or Kaggle.

It always comes with metric requirements and non-metric ones.

## Your task is as follows: <font color=red>(To be filled)</font>

Predict a stock's daily HIGH and LOW prices from its OPEN and its historical daily prices with an LSTM model. The assigned stock is Tencent. (
https://github.com/MRYingLEE/Stock-HIGH-LOW-Prediction)

The task is viewed from a day trader's perspective.

I published that repository earlier and won't repeat the details here. Instead, I adapt that repository to this framework by <font color=red>FILLING blank cells</font>. 

# Forecasting Task in Data Science View
**Forecasting = Cleaned Data Source + Target Columns + Loss + Metrics**



**Target Columns** : The columns we will predict

**Loss**: A function that returns a single value by comparing the target columns with the predicted values.

**Metrics**: 

The loss and metrics of a problem are nearly the same concepts as those in Keras, except that here they can use all columns of the cleaned data source.


**Non-metric requirements**

Such as platform, time and cost.

## Cleaned Data Source = 1 Dataframe of Pandas

<font color=red>This is a technical requirement for the following steps.</font><Br>
<font color=red>You have to clean the data and do exploratory data analysis in advance, outside this notebook.</font> 

For this example, you may find a lot of exploratory data analysis at https://github.com/MRYingLEE/Stock-HIGH-LOW-Prediction.

### Data provided directly


You have to clean the data source before you run this notebook. 
Typical cleaning includes:
1. Missing data
2. Unknown data types


Technically, we use one Pandas dataframe as the data source.

The dataframe should already be indexed by datetime.

However, you don't need to do feature engineering here!
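
The cleaning requirements above (datetime index, no missing values, numeric dtypes) can be checked before the rest of the notebook runs. The following is a minimal sketch; `check_cleaned` is a hypothetical helper that is not part of this framework, and the toy frame contains made-up rows.

```python
import pandas as pd

def check_cleaned(frame: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame looks clean."""
    problems = []
    # The framework expects a datetime index in chronological order.
    if not isinstance(frame.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    elif not frame.index.is_monotonic_increasing:
        problems.append("index is not sorted in time order")
    # Missing values should be handled before the notebook runs.
    n_missing = int(frame.isna().sum().sum())
    if n_missing:
        problems.append(f"{n_missing} missing value(s)")
    # Columns that were not parsed as numbers (for example strings).
    non_numeric = list(frame.select_dtypes(exclude="number").columns)
    if non_numeric:
        problems.append(f"non-numeric columns: {non_numeric}")
    return problems

# Toy frame (made-up data): one missing OPEN and a VOLUME column read as strings.
toy = pd.DataFrame(
    {"OPEN": [126.2, None, 134.6], "VOLUME": ["33788560", "39063807", "29965533"]},
    index=pd.to_datetime(["2015-01-21", "2015-01-22", "2015-01-23"]),
)
print(check_cleaned(toy))

# One possible repair: forward-fill the gap and convert the strings to numbers.
fixed = toy.assign(OPEN=toy["OPEN"].ffill(), VOLUME=pd.to_numeric(toy["VOLUME"]))
print(check_cleaned(fixed))
```

Whether forward-filling is an acceptable repair depends on the data; it is used here only to show a frame that passes the check.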


### Your Pandas data source <font color=red>(To be filled)</font>
By the way, github.io is a good place to host your data, but it has a size limit of 25 MB.

In [0]:
DATA_PATH='https://mryinglee.github.io/DailyPrices.csv'

### Domain Knowledge

There is always important information that the business view does not mention.

#### Domain Knowledge in Idea <font color=red>(To be filled)</font>

Return is a much better feature than price.

Using the last price as the forecast is usually not a bad idea.

Outliers are not so important for a day trader. In other words, a day trader may miss some rare opportunities to make an exceptional return, but in exchange the model is more robust because outliers have less effect on it.

#### Domain Knowledge in Data<font color=red>(To be filled)</font>

In [0]:
domain_data={
    "TICK_UNIT":0.2, # Tencent has a tick unit of 0.2 HKD.
    "MAX_RANGE_TICKS":100, # Cap on Tencent's normal daily range, in ticks.
    "SEQ_LEN":20 # about 1 month of trading days
}

### To load data

You may use any data format, but in the end you should provide a cleaned Pandas dataframe.

<font color=red>You have to modify the code here according to your data format and data columns.</font>

In [0]:
df=pd.read_csv(DATA_PATH,parse_dates=['DATE']) # Here we use CSV 
df.set_index('DATE',inplace=True)
df = df.sort_values('DATE')
df

             OPEN   HIGH    LOW  CLOSE  ADJ CLOSE    VOLUME
DATE                                                       
2015-01-21  126.2  128.9  125.2  128.7      127.1  33788560
2015-01-22  130.5  132.0  129.7  131.6      130.0  39063807
2015-01-23  134.6  134.9  131.1  132.7      131.1  29965533
2015-01-26  136.7  137.1  134.3  137.0      135.3  34952624
2015-01-27  138.0  138.0  133.0  136.0      134.3  24455759
...           ...    ...    ...    ...        ...       ...
2020-01-14  410.0  413.0  396.6  400.4      400.4  26827634
2020-01-15  397.2  403.0  396.2  398.8      398.8  15938138
2020-01-16  399.0  403.0  396.4  400.0      400.0  13770626
2020-01-17  400.0  400.6  396.0  399.0      399.0  13670846
2020-01-20  405.0  405.0  396.0  396.0      396.0  13282412

[1231 rows x 6 columns]

All numeric columns should have a numerical dtype instead of "object".

In [0]:
df.dtypes

OPEN         float64
HIGH         float64
LOW          float64
CLOSE        float64
ADJ CLOSE    float64
VOLUME         int64
dtype: object

## Forecasting Task Kind

### Regression, Classification or Anomaly Detection?

This is a regression task.

### Univariate or multivariate?


This is a multivariate task.

### One step or multiple steps?


This is a one-step task.

In [0]:
FORECAST=1 

### Lookback Periods

Usually the history (not only the latest record) is used to forecast the future.

In [0]:
LOOKBACK=20 # 20 trading days, about one month

## Forecasting Task Define <font color=red>(Ignore this section for now)</font>

The loss function used in model training is different from the loss function here. 

Here, the loss means a loss function on the target columns and the forecasted ones.

Technically, the difference is that Keras losses and metrics are computed on NumPy arrays of Y and Y_hat, while the task-level loss and metrics are computed on ALL columns of the whole dataframe.

<font color=red>So far, the loss and metrics defined for the task are not implemented.</font>

### Loss choice: general advice

- Think about what errors you would like to eliminate in the first place and
choose the loss type accordingly (relative, absolute, tolerant to outlier
deviations, etc.)
- Use sample_weight with your losses (and metrics, if they allow it) to
adapt to the business needs, but be aware of diminished interpretability
- Try transforming the target variable (adding constants, target scaling,
power transformations, etc.) while using simpler metrics/losses
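
To make the `sample_weight` advice concrete, here is a minimal NumPy sketch of a weighted mean squared error. The function `weighted_mse` and the recency weights are illustrative assumptions, not part of the framework or of Keras.

```python
import numpy as np

def weighted_mse(y_true, y_pred, sample_weight=None):
    """Mean squared error; with sample_weight it becomes a weighted average."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    squared_error = (y_true - y_pred) ** 2
    # np.average falls back to the plain mean when weights is None.
    return float(np.average(squared_error, weights=sample_weight))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 6.0])    # only the most recent point is wrong
recency = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical weights: newer samples count more

plain = weighted_mse(y_true, y_pred)              # (0 + 0 + 0 + 4) / 4
weighted = weighted_mse(y_true, y_pred, recency)  # (4 * 4) / (1 + 2 + 3 + 4)
print(plain, weighted)
```

The weighted value is larger because the only error sits on the most heavily weighted sample, which also shows why weighted losses are harder to interpret than plain ones.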

### forecast Periods

In [0]:
class DTTask():  
  # init constructor   
  def __init__(self, cleanedDf, targetCols, loss=keras.losses.mean_squared_error, metrics=keras.losses.mean_squared_error):  
    self.cleanedDf = cleanedDf
    self.targetCols=targetCols
    self.loss=loss
    self.metrics=metrics  
    
  # To check the assumptions  
  def check(self): 
    if set(self.targetCols).issubset(self.cleanedDf.columns):  
      print('Good for', self.targetCols)
    else:
      print('Please check:', self.targetCols)


In [0]:
dt_task = DTTask(df,["CLOSE",], keras.losses.mean_squared_error,keras.losses.mean_squared_error)  
dt_task.check() 

Good for ['CLOSE']


In [0]:
dt_task.loss

<function tensorflow.python.keras.losses.mean_squared_error>

# The whole workflow = Feature Engineering, Data Split, Validation and Evaluation

Cross-validated grid search over a parameter grid is used to choose the best hyperparameters.

For time series specific cross validation, you may find 
https://github.com/WenjieZ/TSCV helpful.

Nested cross-validation (
https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html ) is not used. Nested CV is complicated and may not help in predicting the future, because recent data carries more information for forecasting.

Instead, after the cross validation, all training data is used to refit the model and the most recent data is used as test data.

## Feature Engineering
<font color=red>The most creative part</font>

### Some Guidance

#### Your feature engineering<font color=red>(To be filled)</font>

In [0]:
df['CLOSE-1']=df['CLOSE'].shift(1)       # The LAST (closing) price of the previous day
df['OPEN/LAST-1']=df['OPEN']/df['CLOSE-1'] # The ratio of the current OPEN to the LAST price of the previous day
df['OPEN+1']=df['OPEN/LAST-1'].shift(-1) # The next day's OPEN/LAST-1 ratio (not a price)
df['OPEN/HIGH']=df['OPEN']/df['HIGH'] # The ratio of OPEN to HIGH, which lies in (0,1]
df['LOW/OPEN']=df['LOW']/df['OPEN'] # The ratio of LOW to OPEN, which lies in (0,1]
df.dropna(inplace=True)
df.head()

DATE,OPEN,HIGH,LOW,CLOSE,ADJ CLOSE,VOLUME,CLOSE-1,OPEN/LAST-1,OPEN+1,OPEN/HIGH,LOW/OPEN
2015-01-22,130.5,132.0,129.7,131.6,130.0,39063807,128.7,1.013986,1.022796,0.988636,0.99387
2015-01-23,134.6,134.9,131.1,132.7,131.1,29965533,131.6,1.022796,1.030143,0.997776,0.973997
2015-01-26,136.7,137.1,134.3,137.0,135.3,34952624,132.7,1.030143,1.007299,0.997082,0.982443
2015-01-27,138.0,138.0,133.0,136.0,134.3,24455759,137.0,1.007299,1.006618,1.0,0.963768
2015-01-28,136.9,137.0,134.7,136.9,135.2,16216906,136.0,1.006618,0.993426,0.99927,0.98393


In [0]:
TICK_UNIT=domain_data["TICK_UNIT"]
df['HIGH_tick']=(np.round((df['HIGH']-df['OPEN'])/TICK_UNIT)).astype('int32')
df['LOW_tick']=(np.round((df['OPEN']-df['LOW'])/TICK_UNIT)).astype('int32')

df['RANGE_TICKS']=df['HIGH_tick']+df['LOW_tick'] ## ticks between day low and day high

In [0]:
MAX_RANGE_TICKS=domain_data["MAX_RANGE_TICKS"] # We use this to standardize the RANGE_TICKS

In [0]:
# Scale the tick counts by MAX_RANGE_TICKS and cap the result at 1
df["HIGH_tick_std"] = (df["HIGH_tick"]/MAX_RANGE_TICKS).clip(upper=1)
df["LOW_tick_std"] = (df["LOW_tick"]/MAX_RANGE_TICKS).clip(upper=1)
df["RANGE_TICKS_std"] = (df["RANGE_TICKS"]/MAX_RANGE_TICKS).clip(upper=1)
#df["LOW_tick_std"].max()

In [0]:
df.dropna(inplace=True)
df.head()

DATE,OPEN,HIGH,LOW,CLOSE,ADJ CLOSE,VOLUME,CLOSE-1,OPEN/LAST-1,OPEN+1,OPEN/HIGH,LOW/OPEN,HIGH_tick,LOW_tick,RANGE_TICKS,HIGH_tick_std,LOW_tick_std,RANGE_TICKS_std
2015-01-22,130.5,132.0,129.7,131.6,130.0,39063807,128.7,1.013986,1.022796,0.988636,0.99387,8,4,12,0.08,0.04,0.12
2015-01-23,134.6,134.9,131.1,132.7,131.1,29965533,131.6,1.022796,1.030143,0.997776,0.973997,2,18,20,0.02,0.18,0.2
2015-01-26,136.7,137.1,134.3,137.0,135.3,34952624,132.7,1.030143,1.007299,0.997082,0.982443,2,12,14,0.02,0.12,0.14
2015-01-27,138.0,138.0,133.0,136.0,134.3,24455759,137.0,1.007299,1.006618,1.0,0.963768,0,25,25,0.0,0.25,0.25
2015-01-28,136.9,137.0,134.7,136.9,135.2,16216906,136.0,1.006618,0.993426,0.99927,0.98393,0,11,11,0.0,0.11,0.11


In [0]:
df.dtypes

OPEN               float64
HIGH               float64
LOW                float64
CLOSE              float64
ADJ CLOSE          float64
VOLUME               int64
CLOSE-1            float64
OPEN/LAST-1        float64
OPEN+1             float64
OPEN/HIGH          float64
LOW/OPEN           float64
HIGH_tick            int32
LOW_tick             int32
RANGE_TICKS          int32
HIGH_tick_std      float64
LOW_tick_std       float64
RANGE_TICKS_std    float64
dtype: object

## Data Split

### The numpy values for split

In [0]:
SEQ_LEN = domain_data["SEQ_LEN"] # about 1 month
history_values = df[["RANGE_TICKS_std","OPEN/HIGH","LOW/OPEN"]].values
history_values.shape

(1229, 3)

### To make sequences

In [0]:
def to_sequences(data, seq_len):
    d = []

    for index in range(len(data) - seq_len):
        d.append((data[index: index + seq_len]))

    na=np.array(d)
    return na

In [0]:
X = to_sequences(history_values, SEQ_LEN)
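
To see how the windows line up with their targets, the same windowing logic can be run on a tiny array. This is a sketch on made-up data (the integers 0 to 9); `X_toy` and `y_toy` are illustrative names.

```python
import numpy as np

def to_sequences(data, seq_len):
    # Same logic as the cell above: one window per valid start index.
    return np.array([data[i : i + seq_len] for i in range(len(data) - seq_len)])

toy = np.arange(10).reshape(-1, 1)   # 10 time steps, 1 feature, values 0..9
seq_len = 3

X_toy = to_sequences(toy, seq_len)   # windows of 3 consecutive steps
y_toy = toy[seq_len:]                # the step that follows each window

print(X_toy.shape, y_toy.shape)
print(X_toy[0].ravel(), "->", y_toy[0])    # first window and its target
print(X_toy[-1].ravel(), "->", y_toy[-1])  # last window and its target
```

Window `i` covers rows `i` to `i + seq_len - 1`, and its target is row `i + seq_len`, which is why the targets later in this notebook are taken from `history_values[SEQ_LEN:]`.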

In [0]:
use_tscv = True

### Time Series Split

For time series specific cross validation, you may find 
https://github.com/WenjieZ/TSCV helpful.

![gap train test split](http://www.zhengwenjie.net/images/tscv/gap%20train%20test%20split.svg)

In [0]:
if use_tscv:
  !pip install tscv
  from tscv import gap_train_test_split

  X_train, X_test, y_train, y_test = gap_train_test_split(X, history_values[SEQ_LEN:,1:],test_size=SEQ_LEN, gap_size=SEQ_LEN)





In [0]:
X_train.shape

(1169, 20, 3)

In [0]:
y_train.shape

(1169, 2)
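
The shapes above can be reproduced with plain index arithmetic. The sketch below assumes the split is laid out chronologically as train, then gap, then test, which is consistent with the reported shapes; it does not call `tscv` itself.

```python
import numpy as np

n_rows, seq_len = 1229, 20        # rows after feature engineering, window length
n_samples = n_rows - seq_len      # number of (window, target) pairs
test_size = gap_size = seq_len    # same values as in the gap_train_test_split call

idx = np.arange(n_samples)
train_idx = idx[: n_samples - gap_size - test_size]
gap_idx = idx[n_samples - gap_size - test_size : n_samples - test_size]
test_idx = idx[n_samples - test_size :]

print(len(train_idx), len(gap_idx), len(test_idx))
```

Because consecutive windows overlap, dropping `SEQ_LEN` samples between train and test keeps the rows used by the test windows out of the training windows.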

## Preprocess

**Traditional Data Split**

In [0]:
def preprocess(data_raw, seq_len, train_split):
    # Build windows of seq_len + 1 rows: seq_len rows of history plus the row to predict,
    # so that X keeps seq_len time steps as the model expects
    data = to_sequences(data_raw, seq_len + 1)

    num_train = int(train_split * data.shape[0])

    X_train = data[:num_train, :-1, :]
    y_train = data[:num_train, -1, -2:]   # last row, last two columns (OPEN/HIGH, LOW/OPEN)

    X_test = data[num_train:, :-1, :]
    y_test = data[num_train:, -1, -2:]

    return X_train, y_train, X_test, y_test

if not use_tscv:
  X_train, y_train, X_test, y_test = preprocess(history_values, SEQ_LEN, train_split = 0.95)

In [0]:
X_train.shape

(1169, 20, 3)

In [0]:
y_train.shape

(1169, 2)

## Estimators
<font color=red>Some built in, some for you to add</font>


## Model Dictionary (Not implemented yet)

We may predefine some models, such as a vanilla CNN, an LSTM and so on.

**A Training Pipeline = Preprocessing + Model + Postprocessing**

The pipeline concept is similar to the one in scikit-learn.

There is only one problem, but there may be several pipelines.

A pipeline can contain neural network models, statistical models and even other kinds of models.

The same model structure can be used in different pipelines, possibly with different hyperparameters.


**Classical Models as baseline**

Deep learning is not always better than classical models.

There is a good summary on https://machinelearningmastery.com/time-series-forecasting-methods-in-python-cheat-sheet/ .

**AutoML as baseline**

There are few AutoML tools ready for time series forecasting. 
One is the AutoML module of intel-analytics/analytics-zoo, a distributed automated machine learning library based on Ray, TensorFlow and Keras (https://github.com/intel-analytics/analytics-zoo/tree/automl/pyzoo/zoo/automl).

Another is DenisVorotyntsev/AutoSeries, a public solution for the AutoSeries competition (https://github.com/DenisVorotyntsev/AutoSeries).

More AutoML tools for time series may come.

**Your Domain knowledge based baseline**<font color=red>(To be filled)</font>


For example, repeat the latest value as the prediction.

In the stock market, using the last price as the prediction is not too bad.
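
A "repeat the latest value" baseline can be sketched in a few lines of NumPy. The helper `naive_last_value` is hypothetical; the numbers are the first five OPEN/HIGH and LOW/OPEN rows from the feature table earlier in this notebook.

```python
import numpy as np

def naive_last_value(y):
    """Use each row as the prediction for the next row.

    Returns (predictions, matching truths); the first row has no prediction.
    """
    y = np.asarray(y, dtype=float)
    return y[:-1], y[1:]

# Columns: OPEN/HIGH, LOW/OPEN (first five rows of the engineered features).
y = np.array([
    [0.988636, 0.993870],
    [0.997776, 0.973997],
    [0.997082, 0.982443],
    [1.000000, 0.963768],
    [0.999270, 0.983930],
])

y_pred, y_true = naive_last_value(y)
baseline_mse = float(np.mean((y_true - y_pred) ** 2))
print(baseline_mse)
```

Any trained model should beat this number on the same rows before it is worth the extra complexity.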

## Your model <font color=red>(To be filled)</font>

scikit-learn's GridSearchCV will be used, so your model should follow its API requirements. 

In [0]:
from tensorflow.keras.optimizers import Adam
def CreateLSTMModel(learning_rate=0.01,rate = 0.2, n_features=3,activation_function="relu"):
  DROPOUT = rate # Dropout rate, to reduce overfitting

  WINDOW_SIZE = SEQ_LEN

  model = keras.Sequential()

  model.add(LSTM(WINDOW_SIZE, return_sequences=True,input_shape=(WINDOW_SIZE, n_features)))
  model.add(Dropout(rate=DROPOUT))

  model.add(LSTM(WINDOW_SIZE * 2))
  model.add(Dropout(rate=DROPOUT))

  model.add(Dense(units=2,activation = activation_function))

  model.add(Activation('sigmoid')) # So the output values lie in (0,1)
 
  adam = Adam(learning_rate = learning_rate) # `lr` is a deprecated alias of `learning_rate`
  model.compile(loss='mean_squared_error', optimizer=adam) # Don't forget to compile before return 
  return model

## Model selection and evaluation

### Cross-validation: evaluating estimator performance

Validation helps to evaluate a model's performance, quality and ability to generalize. It can be used to select the model that performs best on unseen data.



In [0]:
from tscv import GapKFold, GapLeavePOut, GapWalkForward

from enum import Enum
 
class CVType(Enum):
    GapKFold = GapKFold.__name__
    GapLeavePOut = GapLeavePOut.__name__
    GapWalkForward = GapWalkForward.__name__

cvSelected=CVType.GapKFold ## You may choose one of the other 2

![Gap K-Fold](http://www.zhengwenjie.net/images/tscv/gap%20k-fold.svg)

In [0]:
if cvSelected==CVType.GapKFold:
  cv = GapKFold(n_splits=5, gap_before=2, gap_after=1)

![Gap leave p out](http://www.zhengwenjie.net/images/tscv/gap%20leave%20p%20out.svg)

In [0]:
if cvSelected==CVType.GapLeavePOut:
  cv = GapLeavePOut(p=3, gap_before=1, gap_after=2)

![Gap walk forward](http://www.zhengwenjie.net/images/tscv/gap%20walk%20forward.svg)


In [0]:
if cvSelected==CVType.GapWalkForward:
  cv = GapWalkForward(n_splits=3, gap_size=1, test_size=2)
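
To illustrate the idea behind a walk-forward split with a gap, here is a small index generator in plain NumPy. `walk_forward_with_gap` is a hypothetical helper written for this illustration; it is not the `tscv` implementation and its fold layout may differ from `GapWalkForward`.

```python
import numpy as np

def walk_forward_with_gap(n_samples, n_splits, test_size, gap_size):
    """Yield (train_idx, test_idx); the training set grows, the test block moves forward."""
    indices = np.arange(n_samples)
    for k in range(n_splits):
        # Test blocks are laid out back to back at the end of the series.
        test_end = n_samples - (n_splits - 1 - k) * test_size
        test_start = test_end - test_size
        # Leave gap_size samples unused between training data and test block.
        train_end = test_start - gap_size
        if train_end <= 0:
            raise ValueError("not enough samples for this many splits")
        yield indices[:train_end], indices[test_start:test_end]

splits = list(walk_forward_with_gap(n_samples=12, n_splits=3, test_size=2, gap_size=1))
for train_idx, test_idx in splits:
    print(train_idx.tolist(), "->", test_idx.tolist())
```

Every fold trains only on the past and skips `gap_size` samples before testing, which is the property that matters for overlapping time series windows.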

The cross-validation code goes here.

In [0]:
X_test.shape

(20, 20, 3)

In [0]:
X_train.shape

(1169, 20, 3)

In [0]:
y_train

array([[0.98078344, 0.98794273],
       [0.99706745, 0.98382353],
       [0.98240469, 0.99701493],
       ...,
       [0.98826436, 0.99375   ],
       [1.        , 0.9869969 ],
       [0.98930481, 0.99219219]])

## Tuning the hyper-parameters of an estimator

In [0]:
import sklearn
from sklearn.model_selection import GridSearchCV

model = keras.wrappers.scikit_learn.KerasRegressor(build_fn=CreateLSTMModel,verbose = 0)
params = {'epochs' : [100],
          'batch_size' : [100],
          'rate': [0.3,0.2],
          'learning_rate' : [0.001,0.01,0.1],
          'activation_function' : ['softmax','relu','tanh','linear']
          # , 'validation_data' : [(X_test, y_test)]
          }
          
regressor = GridSearchCV(estimator = model, param_grid = params, n_jobs = 1, refit=True, cv=cv)
regressor.fit(X_train, y_train)


GridSearchCV(cv=GapKFold(gap_after=1, gap_before=2, n_splits=5),
             error_score=nan,
             estimator=<tensorflow.python.keras.wrappers.scikit_learn.KerasRegressor object at 0x7f8d52ac60b8>,
             iid='deprecated', n_jobs=1,
             param_grid={'activation_function': ['softmax', 'relu', 'tanh',
                                                 'linear'],
                         'batch_size': [100], 'epochs': [100],
                         'learning_rate': [0.001, 0.01, 0.1],
                         'rate': [0.3, 0.2]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

## Metrics and scoring: quantifying the quality of predictions

In [0]:
print("Grid scores on development set:")
print()
means = regressor.cv_results_['mean_test_score']
stds = regressor.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, regressor.cv_results_['params']):
    print("%0.3f (+/-%0.03f) for %r"
              % (mean, std * 2, params))
print()

Grid scores on development set:

-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.001, 'rate': 0.3}
-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.001, 'rate': 0.2}
-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.01, 'rate': 0.3}
-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.01, 'rate': 0.2}
-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.1, 'rate': 0.3}
-0.135 (+/-0.002) for {'activation_function': 'softmax', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.1, 'rate': 0.2}
-0.000 (+/-0.000) for {'activation_function': 'relu', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.001, 'rate': 0.3}
-0.000 (+/-0.000) for {'activation_function': 'relu', 'batch_size': 100, 'epochs

In [0]:
regressor.cv_results_

{'mean_fit_time': array([25.2106885 , 25.70252218, 24.77644405, 25.44223666, 24.51933465,
        25.15099487, 25.03214288, 24.85007753, 25.09984641, 25.09249034,
        25.22297354, 24.85418949, 24.76678672, 24.96579728, 24.64537811,
        24.23700852, 24.36532106, 24.96893053, 24.55621386, 24.75088468,
        24.25538135, 24.1613955 , 25.06416817, 24.1803854 ]),
 'mean_score_time': array([0.58759995, 0.60103741, 0.58629541, 0.58332   , 0.58352084,
        0.58367658, 0.57658482, 0.58215475, 0.58707466, 0.59184608,
        0.58665824, 0.59420228, 1.08307276, 0.59560256, 0.57706723,
        1.07851915, 0.58693032, 0.58222899, 0.58419766, 0.57169447,
        0.58418589, 0.5769649 , 0.59469161, 0.57157516]),
 'mean_test_score': array([-1.34814910e-01, -1.34814518e-01, -1.34814659e-01, -1.34815167e-01,
        -1.34812951e-01, -1.34816962e-01, -1.32804942e-04, -1.14707658e-04,
        -1.27572501e-04, -1.11092887e-04, -1.95845092e-04, -1.77222673e-04,
        -6.68895464e-02, -6.68901
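
The raw `cv_results_` dictionary is hard to read. One option is to turn it into a sorted Pandas dataframe. The sketch below uses a small made-up dictionary with the same keys as scikit-learn's `cv_results_`, so the numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for regressor.cv_results_ (made-up numbers, real key names).
toy_cv_results = {
    "params": [
        {"activation_function": "softmax", "learning_rate": 0.001},
        {"activation_function": "relu", "learning_rate": 0.001},
        {"activation_function": "linear", "learning_rate": 0.01},
    ],
    "mean_test_score": [-0.135, -0.00013, -0.00011],
    "std_test_score": [0.001, 0.00002, 0.00001],
}

params_df = pd.DataFrame(toy_cv_results["params"])
scores_df = pd.DataFrame(
    {key: toy_cv_results[key] for key in ("mean_test_score", "std_test_score")}
)

# scikit-learn treats higher scores as better, so sort in descending order.
summary = (
    pd.concat([params_df, scores_df], axis=1)
    .sort_values("mean_test_score", ascending=False)
    .reset_index(drop=True)
)
print(summary)
```

With the real results, replace `toy_cv_results` with `regressor.cv_results_`. The scores in this notebook are negative because they are error based, so values closer to zero are better.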

## Model persistence

In [0]:
print("Best parameters set found on development set:")
print()
print(regressor.best_params_)
print()

Best parameters set found on development set:

{'activation_function': 'linear', 'batch_size': 100, 'epochs': 100, 'learning_rate': 0.01, 'rate': 0.2}



## Validation curves: plotting scores to evaluate models

### Final fit on the whole train data

In [0]:
print("Final evaluation:")
print()
print("The model is trained on the full development set.")
print("The scores are computed on the full evaluation set.")
print()
y_true, y_pred = y_test, regressor.predict(X_test)
print()

Final evaluation:

The model is trained on the full development set.
The scores are computed on the full evaluation set.




### Final Evaluation on the test data

We will use the test data to evaluate.

**Metrics at Output Level**

**Primary Metrics** Similar to the concept of loss, but at the output level.
No loss is needed.

# The best solution is the best pipeline in the validation

We output the best solution.


# From Output to Real Target <font color=red>(To be filled)</font>

Usually the model outputs float values, but the real targets to be forecasted are often not arbitrary floats; they may be ordinal values. 

Tick-based stock prices are an example. For AAPL (Apple Inc.), the stock price moves in units of 0.01.

So you have to convert the model output to the real targets. This is usually beyond the forecast model itself, but it is a necessary step for practical use.
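
As an illustration of this conversion, the sketch below snaps a float price onto a tick grid. `snap_to_tick` is a hypothetical helper, and the small tolerance is an assumption added to guard against floating-point division artifacts. The inputs are the OPEN and the two predicted ratios of the first test row shown further down.

```python
import numpy as np

def snap_to_tick(price, tick, mode="nearest"):
    """Snap price(s) onto the grid of integer multiples of `tick`."""
    ticks = np.asarray(price, dtype=float) / tick
    if mode == "nearest":
        n = np.round(ticks)
    elif mode == "down":
        n = np.floor(ticks + 1e-9)   # tolerance: a value like 1899.9999999 counts as 1900
    elif mode == "up":
        n = np.ceil(ticks - 1e-9)
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Round away the tiny binary error left by the multiplication.
    return np.round(n * tick, 10)

open_price, tick_unit = 380.0, 0.2
high_hat = snap_to_tick(open_price / 0.993971, tick_unit, mode="down")  # predicted HIGH
low_hat = snap_to_tick(open_price * 0.991406, tick_unit, mode="up")     # predicted LOW
print(high_hat, low_hat)
```

Rounding the predicted HIGH down and the predicted LOW up gives the more conservative range, which matches the `np.floor` and `np.ceil` choices used later in this notebook.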

In [0]:
y_hat = regressor.predict(X_test)
df_test= df[-X_test.shape[0]:].reset_index()
df_test

,DATE,OPEN,HIGH,LOW,CLOSE,ADJ CLOSE,VOLUME,CLOSE-1,OPEN/LAST-1,OPEN+1,OPEN/HIGH,LOW/OPEN,HIGH_tick,LOW_tick,RANGE_TICKS,HIGH_tick_std,LOW_tick_std,RANGE_TICKS_std
0,2019-12-18,380.0,380.0,373.2,377.6,377.6,31560449,371.2,1.023707,0.992055,1.0,0.982105,0,34,34,0.0,0.34,0.34
1,2019-12-19,374.6,375.6,370.4,375.6,375.6,18441112,377.6,0.992055,1.00639,0.997338,0.988788,5,21,26,0.05,0.21,0.26
2,2019-12-20,378.0,378.0,372.2,375.2,375.2,21656659,375.6,1.00639,1.007463,1.0,0.984656,0,29,29,0.0,0.29,0.29
3,2019-12-23,378.0,378.0,373.8,377.8,377.8,11672033,375.2,1.007463,1.003176,1.0,0.988889,0,21,21,0.0,0.21,0.21
4,2019-12-24,379.0,379.8,376.8,376.8,376.8,6360485,377.8,1.003176,1.005308,0.997894,0.994195,4,11,15,0.04,0.11,0.15
5,2019-12-27,378.8,385.4,378.2,384.0,384.0,19180036,376.8,1.005308,0.993229,0.982875,0.998416,33,3,36,0.33,0.03,0.36
6,2019-12-30,381.4,384.6,381.2,383.2,383.2,14328612,384.0,0.993229,0.98904,0.99168,0.999476,16,1,17,0.16,0.01,0.17
7,2019-12-31,379.0,380.2,374.6,375.6,375.6,10378526,383.2,0.98904,1.000532,0.996844,0.988391,6,22,28,0.06,0.22,0.28
8,2020-01-02,375.8,384.8,375.6,382.4,382.4,13991006,375.6,1.000532,1.014644,0.976611,0.999468,45,1,46,0.45,0.01,0.46
9,2020-01-03,388.0,390.0,381.2,383.0,383.0,15313106,382.4,1.014644,0.992167,0.994872,0.982474,10,34,44,0.1,0.34,0.44


In [0]:

df_hat = pd.DataFrame(data =y_hat,columns=['OPEN/HIGH<hat>','LOW/OPEN<hat>'])
df_hat

,OPEN/HIGH<hat>,LOW/OPEN<hat>
0,0.993971,0.991406
1,0.993971,0.991405
2,0.993971,0.991405
3,0.993971,0.991405
4,0.993971,0.991405
5,0.993972,0.991405
6,0.993972,0.991406
7,0.993971,0.991406
8,0.993972,0.991406
9,0.993971,0.991406


In [0]:
df_compare=pd.concat([df_test, df_hat], axis=1)

df_compare

,DATE,OPEN,HIGH,LOW,CLOSE,ADJ CLOSE,VOLUME,CLOSE-1,OPEN/LAST-1,OPEN+1,OPEN/HIGH,LOW/OPEN,HIGH_tick,LOW_tick,RANGE_TICKS,HIGH_tick_std,LOW_tick_std,RANGE_TICKS_std,OPEN/HIGH<hat>,LOW/OPEN<hat>
0,2019-12-18,380.0,380.0,373.2,377.6,377.6,31560449,371.2,1.023707,0.992055,1.0,0.982105,0,34,34,0.0,0.34,0.34,0.993971,0.991406
1,2019-12-19,374.6,375.6,370.4,375.6,375.6,18441112,377.6,0.992055,1.00639,0.997338,0.988788,5,21,26,0.05,0.21,0.26,0.993971,0.991405
2,2019-12-20,378.0,378.0,372.2,375.2,375.2,21656659,375.6,1.00639,1.007463,1.0,0.984656,0,29,29,0.0,0.29,0.29,0.993971,0.991405
3,2019-12-23,378.0,378.0,373.8,377.8,377.8,11672033,375.2,1.007463,1.003176,1.0,0.988889,0,21,21,0.0,0.21,0.21,0.993971,0.991405
4,2019-12-24,379.0,379.8,376.8,376.8,376.8,6360485,377.8,1.003176,1.005308,0.997894,0.994195,4,11,15,0.04,0.11,0.15,0.993971,0.991405
5,2019-12-27,378.8,385.4,378.2,384.0,384.0,19180036,376.8,1.005308,0.993229,0.982875,0.998416,33,3,36,0.33,0.03,0.36,0.993972,0.991405
6,2019-12-30,381.4,384.6,381.2,383.2,383.2,14328612,384.0,0.993229,0.98904,0.99168,0.999476,16,1,17,0.16,0.01,0.17,0.993972,0.991406
7,2019-12-31,379.0,380.2,374.6,375.6,375.6,10378526,383.2,0.98904,1.000532,0.996844,0.988391,6,22,28,0.06,0.22,0.28,0.993971,0.991406
8,2020-01-02,375.8,384.8,375.6,382.4,382.4,13991006,375.6,1.000532,1.014644,0.976611,0.999468,45,1,46,0.45,0.01,0.46,0.993972,0.991406
9,2020-01-03,388.0,390.0,381.2,383.0,383.0,15313106,382.4,1.014644,0.992167,0.994872,0.982474,10,34,44,0.1,0.34,0.44,0.993971,0.991406


In [0]:
df_compare['HIGH_hat']=np.floor((df_compare['OPEN']/df_compare['OPEN/HIGH<hat>'])/TICK_UNIT)*TICK_UNIT
df_compare['LOW_hat']=np.ceil((df_compare['OPEN']*df_compare['LOW/OPEN<hat>'])/TICK_UNIT)*TICK_UNIT
df_compare['HIGH_diff']=df_compare['HIGH_hat']-df_compare['HIGH']
df_compare['LOW_diff']=df_compare['LOW_hat']-df_compare['LOW']
df_compare['HIGH_diff_tick']=(np.round(df_compare['HIGH_diff']/TICK_UNIT)).astype('int32')
df_compare['LOW_diff_tick']=(np.round(df_compare['LOW_diff']/TICK_UNIT)).astype('int32')
df_compare['RANGE_TICKS_hat']=(np.round((df_compare['HIGH_hat']-df_compare['LOW_hat'])/TICK_UNIT)).astype('int32')

In [0]:
df_compare[['DATE', 'OPEN', 'CLOSE', 'HIGH', 'HIGH_hat','HIGH_diff_tick', 'LOW','LOW_hat','LOW_diff_tick','RANGE_TICKS','RANGE_TICKS_hat']]

,DATE,OPEN,CLOSE,HIGH,HIGH_hat,HIGH_diff_tick,LOW,LOW_hat,LOW_diff_tick,RANGE_TICKS,RANGE_TICKS_hat
0,2019-12-18,380.0,377.6,380.0,382.2,11,373.2,376.8,18,34,27
1,2019-12-19,374.6,375.6,375.6,376.8,6,370.4,371.4,5,26,27
2,2019-12-20,378.0,375.2,378.0,380.2,11,372.2,374.8,13,29,27
3,2019-12-23,378.0,377.8,378.0,380.2,11,373.8,374.8,5,21,27
4,2019-12-24,379.0,376.8,379.8,381.2,7,376.8,375.8,-5,15,27
5,2019-12-27,378.8,384.0,385.4,381.0,-22,378.2,375.6,-13,36,27
6,2019-12-30,381.4,383.2,384.6,383.6,-5,381.2,378.2,-15,17,27
7,2019-12-31,379.0,375.6,380.2,381.2,5,374.6,375.8,6,28,27
8,2020-01-02,375.8,382.4,384.8,378.0,-34,375.6,372.6,-15,46,27
9,2020-01-03,388.0,383.0,390.0,390.2,1,381.2,384.8,18,44,27
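
The signed tick errors in this table can be condensed into one number per target. The sketch below computes the mean absolute error in ticks from the ten displayed rows only, so it is a summary of this excerpt and not of the full test set.

```python
import numpy as np

# Signed tick errors copied from the ten rows displayed above.
high_diff_tick = np.array([11, 6, 11, 11, 7, -22, -5, 5, -34, 1])
low_diff_tick = np.array([18, 5, 13, 5, -5, -13, -15, 6, -15, 18])

mae_high = float(np.mean(np.abs(high_diff_tick)))  # mean absolute error of HIGH, in ticks
mae_low = float(np.mean(np.abs(low_diff_tick)))    # mean absolute error of LOW, in ticks

tick_unit = 0.2  # HKD per tick, from domain_data["TICK_UNIT"]
print(mae_high, mae_low)
print(mae_high * tick_unit, mae_low * tick_unit)   # the same errors in HKD
```

Note that `RANGE_TICKS_hat` is 27 in every displayed row, which suggests that the model output is nearly constant across these days.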


In [0]:
df_compare

,DATE,OPEN,HIGH,LOW,CLOSE,ADJ CLOSE,VOLUME,CLOSE-1,OPEN/LAST-1,OPEN+1,OPEN/HIGH,LOW/OPEN,HIGH_tick,LOW_tick,RANGE_TICKS,HIGH_tick_std,LOW_tick_std,RANGE_TICKS_std,OPEN/HIGH<hat>,LOW/OPEN<hat>,HIGH_hat,LOW_hat,HIGH_diff,LOW_diff,HIGH_diff_tick,LOW_diff_tick,RANGE_TICKS_hat
0,2019-12-18,380.0,380.0,373.2,377.6,377.6,31560449,371.2,1.023707,0.992055,1.0,0.982105,0,34,34,0.0,0.34,0.34,0.993971,0.991406,382.2,376.8,2.2,3.6,11,18,27
1,2019-12-19,374.6,375.6,370.4,375.6,375.6,18441112,377.6,0.992055,1.00639,0.997338,0.988788,5,21,26,0.05,0.21,0.26,0.993971,0.991405,376.8,371.4,1.2,1.0,6,5,27
2,2019-12-20,378.0,378.0,372.2,375.2,375.2,21656659,375.6,1.00639,1.007463,1.0,0.984656,0,29,29,0.0,0.29,0.29,0.993971,0.991405,380.2,374.8,2.2,2.6,11,13,27
3,2019-12-23,378.0,378.0,373.8,377.8,377.8,11672033,375.2,1.007463,1.003176,1.0,0.988889,0,21,21,0.0,0.21,0.21,0.993971,0.991405,380.2,374.8,2.2,1.0,11,5,27
4,2019-12-24,379.0,379.8,376.8,376.8,376.8,6360485,377.8,1.003176,1.005308,0.997894,0.994195,4,11,15,0.04,0.11,0.15,0.993971,0.991405,381.2,375.8,1.4,-1.0,7,-5,27
5,2019-12-27,378.8,385.4,378.2,384.0,384.0,19180036,376.8,1.005308,0.993229,0.982875,0.998416,33,3,36,0.33,0.03,0.36,0.993972,0.991405,381.0,375.6,-4.4,-2.6,-22,-13,27
6,2019-12-30,381.4,384.6,381.2,383.2,383.2,14328612,384.0,0.993229,0.98904,0.99168,0.999476,16,1,17,0.16,0.01,0.17,0.993972,0.991406,383.6,378.2,-1.0,-3.0,-5,-15,27
7,2019-12-31,379.0,380.2,374.6,375.6,375.6,10378526,383.2,0.98904,1.000532,0.996844,0.988391,6,22,28,0.06,0.22,0.28,0.993971,0.991406,381.2,375.8,1.0,1.2,5,6,27
8,2020-01-02,375.8,384.8,375.6,382.4,382.4,13991006,375.6,1.000532,1.014644,0.976611,0.999468,45,1,46,0.45,0.01,0.46,0.993972,0.991406,378.0,372.6,-6.8,-3.0,-34,-15,27
9,2020-01-03,388.0,390.0,381.2,383.0,383.0,15313106,382.4,1.014644,0.992167,0.994872,0.982474,10,34,44,0.1,0.34,0.44,0.993971,0.991406,390.2,384.8,0.2,3.6,1,18,27


# Done

Yes, you are done.

