
# Developing models for full dataset
---

Overview of the configuration of all simple models:

| Model name | Description | # cars | # vars | # targets | # const_vars | # steps | # futures |
|---|:--|:-:|:-:|:-:|:-:|:-:|:-:|
| Model 1 | Baseline model with multiple input time steps | 1 | 1 | 1 | 0 | **4** | 1 |
| Model 2 | Add multiple output time steps | 1 | 1 | 1 | 0 | 4 | **2** |
| Model 3 | Add multiple input series | 1 | **2** | 1 | 0 | 4 | 2 |
| Model 4 | Add multiple output series | 1 | 2 | **2** | 0 | 4 | 2 |
| Model 5 | Add multiple constant inputs | 1 | 2 | 2 | **2** | 4 | 2 |
| Model 6 | Add more objects | **4** | 2 | 2 | 2 | 4 | 2 |
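
To make the table more concrete, the sketch below computes how many flattened input and output columns each configuration would produce. `flat_widths` is a hypothetical helper, not part of the notebook, and it assumes that several cars are handled by simply placing their columns side by side; the notebook itself only shows the single-car case.

```python
# Hypothetical helper (not from the notebook): flattened column counts
# implied by one row of the configuration table.
def flat_widths(n_cars, n_vars, n_targets, n_const, n_steps, n_futures):
    # Each car contributes n_vars lagged series over n_steps, plus its constants.
    n_inputs = n_cars * (n_vars * n_steps + n_const)
    # Each car contributes n_targets predictions for each future step.
    n_outputs = n_cars * n_targets * n_futures
    return n_inputs, n_outputs

# (cars, vars, targets, const_vars, steps, futures), copied from the table
configs = {
    "Model 1": (1, 1, 1, 0, 4, 1),
    "Model 2": (1, 1, 1, 0, 4, 2),
    "Model 3": (1, 2, 1, 0, 4, 2),
    "Model 4": (1, 2, 2, 0, 4, 2),
    "Model 5": (1, 2, 2, 2, 4, 2),
    "Model 6": (4, 2, 2, 2, 4, 2),
}

for name, cfg in configs.items():
    print(name, flat_widths(*cfg))
```

The same counting agrees with the single-car data prepared later in this notebook: five series features over three steps plus five constant columns give the 20 input columns reported for `X`.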


## Import packages

In [1]:
# For general
import matplotlib.pyplot as plt
import numpy as np
import time
plt.rcParams['figure.figsize'] = (8, 6)

# For data processing
import pandas as pd
from sklearn.model_selection import train_test_split

# For prediction model
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

## Load dataset

In [2]:
url_1 = 'https://github.com/duonghung86/Vehicle-trajectory-tracking/raw/master/Data/NGSIM/0750_0805_us101_smoothed_11_.zip'
zip_path = tf.keras.utils.get_file(origin=url_1, fname=url_1.split('/')[-1], extract=True)
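# Replace only the file extension; replacing the bare string 'zip' would also alter any folder name containing it
csv_path = zip_path.replace('.zip','.csv')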
csv_path

'C:\\Users\\Duong Hung\\.keras\\datasets\\0750_0805_us101_smoothed_11_.csv'

Let's take a glance at the data. Here are the first few rows:

In [3]:
df = pd.read_csv(csv_path)
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048575 entries, 0 to 1048574
Data columns (total 18 columns):
 #   Column        Non-Null Count    Dtype  
---  ------        --------------    -----  
 0   Vehicle_ID    1048575 non-null  int64  
 1   Frame_ID      1048575 non-null  int64  
 2   Total_Frames  1048575 non-null  int64  
 3   Global_Time   1048575 non-null  int64  
 4   Local_X       1048575 non-null  float64
 5   Local_Y       1048575 non-null  float64
 6   Global_X      1048575 non-null  float64
 7   Global_Y      1048575 non-null  float64
 8   v_Length      1048575 non-null  float64
 9   v_Width       1048575 non-null  float64
 10  v_Class       1048575 non-null  int64  
 11  v_Vel         1048575 non-null  float64
 12  v_Acc         1048575 non-null  float64
 13  Lane_ID       1048575 non-null  int64  
 14  Preceeding    1048575 non-null  int64  
 15  Following     1048575 non-null  int64  
 16  Space_Hdwy    1048575 non-null  float64
 17  Time_Hdwy     1048575 non-n

Unnamed: 0,Vehicle_ID,Frame_ID,Total_Frames,Global_Time,Local_X,Local_Y,Global_X,Global_Y,v_Length,v_Width,v_Class,v_Vel,v_Acc,Lane_ID,Preceeding,Following,Space_Hdwy,Time_Hdwy
0,2,13,437,1118846980200,16.467196,35.380427,6451137.641,1873344.962,14.5,4.9,2,40.0,0.0,2,0,0,0.0,0.0
1,2,14,437,1118846980300,16.446594,39.381608,6451140.329,1873342.0,14.5,4.9,2,40.012349,0.123485,2,0,0,0.0,0.0
2,2,15,437,1118846980400,16.425991,43.381541,6451143.018,1873339.038,14.5,4.9,2,39.999855,-0.124939,2,0,0,0.0,0.0
3,2,16,437,1118846980500,16.405392,47.38078,6451145.706,1873336.077,14.5,4.9,2,39.99292,-0.069349,2,0,0,0.0,0.0
4,2,17,437,1118846980600,16.384804,51.379881,6451148.395,1873333.115,14.5,4.9,2,39.991544,-0.013759,2,0,0,0.0,0.0


Next, look at the summary statistics of the dataset:

In [4]:
df.describe().transpose().round(3)

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Vehicle_ID,1048575.0,1533.08,790.271,2.0,932.0,1574.0,2210.0,2783.0
Frame_ID,1048575.0,4518.249,2412.479,8.0,2455.0,4586.0,6598.0,8906.0
Total_Frames,1048575.0,560.877,146.577,177.0,464.0,518.0,640.0,1010.0
Global_Time,1048575.0,1118847000000.0,241247.914,1118847000000.0,1118847000000.0,1118847000000.0,1118848000000.0,1118848000000.0
Local_X,1048575.0,29.406,16.666,0.534,17.284,29.557,41.875,73.478
Local_Y,1048575.0,1002.056,596.357,17.966,488.396,964.028,1491.548,2195.47
Global_X,1048575.0,6451838.0,446.275,6451107.0,6451450.0,6451808.0,6452205.0,6452734.0
Global_Y,1048575.0,1872677.0,397.006,1871875.0,1872352.0,1872699.0,1873015.0,1873365.0
v_Length,1048575.0,14.635,4.87,4.0,12.0,14.5,16.5,76.1
v_Width,1048575.0,6.132,1.037,2.0,5.4,6.0,6.9,8.5


In [5]:
df.columns

Index(['Vehicle_ID', 'Frame_ID', 'Total_Frames', 'Global_Time', 'Local_X',
       'Local_Y', 'Global_X', 'Global_Y', 'v_Length', 'v_Width', 'v_Class',
       'v_Vel', 'v_Acc', 'Lane_ID', 'Preceeding', 'Following', 'Space_Hdwy',
       'Time_Hdwy'],
      dtype='object')

In [6]:
#  keep only columns that are useful for now
kept_cols = ['Vehicle_ID', 'Frame_ID', 'Total_Frames', 'Local_X','Local_Y','v_Length', 'v_Width', 'v_Class',
       'v_Vel', 'v_Acc', 'Lane_ID']
df = df[kept_cols]
df.head()

Unnamed: 0,Vehicle_ID,Frame_ID,Total_Frames,Local_X,Local_Y,v_Length,v_Width,v_Class,v_Vel,v_Acc,Lane_ID
0,2,13,437,16.467196,35.380427,14.5,4.9,2,40.0,0.0,2
1,2,14,437,16.446594,39.381608,14.5,4.9,2,40.012349,0.123485,2
2,2,15,437,16.425991,43.381541,14.5,4.9,2,39.999855,-0.124939,2
3,2,16,437,16.405392,47.38078,14.5,4.9,2,39.99292,-0.069349,2
4,2,17,437,16.384804,51.379881,14.5,4.9,2,39.991544,-0.013759,2


In [7]:
'the number of vehicles is {}'.format(len(df.Vehicle_ID.unique()))

'the number of vehicles is 1993'

# Model 1

### Constant values

In [8]:
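n_steps = 3    # number of past time steps used as input
n_future = 1   # number of future time steps to predict
series_feature_names = ['Local_X','v_Vel','Local_Y', 'v_Acc', 'Lane_ID']
# len(df) would count rows, not features; the time-varying features are counted here instead
n_features = len(series_feature_names)
target_names = ['Local_X','v_Vel']
n_labels = len(target_names)
vehicle_ids = df.Vehicle_ID.unique()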

## Data preparation

### `series2seq`: Function that returns the input and output sequences for one object

**Arguments**:

- data: Sequence of observations for one vehicle as a Pandas DataFrame.
- n_in: Number of lag observations used as input (X).
- n_out: Number of observations used as output (y).
- labels: Names of the target variables.
- **series_features**: Names of the time-series features.
- show_result: Boolean, whether or not to print a summary of X and y.
    
**Returns**:
- X: Feature Pandas DataFrame (lagged series features plus the remaining constant columns; rows left incomplete by shifting are dropped)
- y: Label Pandas DataFrame
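
The windowing relies on `DataFrame.shift`. As a minimal sketch on a toy series (the column name `x` and its values are made up for illustration), shifting by `+i` brings past values onto the current row and shifting by `-i` brings future values onto it:

```python
import pandas as pd

# Toy series standing in for one vehicle's trajectory (illustrative values).
toy = pd.DataFrame({"x": [10.0, 11.0, 12.0, 13.0, 14.0]})
n_in, n_out = 2, 1

cols, names = [], []
# Lagged inputs: x(t-2), x(t-1)
for i in range(n_in, 0, -1):
    cols.append(toy["x"].shift(i))
    names.append("x(t-{})".format(i))
# Targets: x(t+0), ..., x(t+n_out-1)
for i in range(n_out):
    cols.append(toy["x"].shift(-i))
    names.append("x(t+{})".format(i))

# Rows that lack a full history (or a full future) contain NaN and are dropped.
windows = pd.concat(cols, axis=1).dropna()
windows.columns = names
print(windows)
```

With `n_in=2` and `n_out=1`, the first two rows have no complete history, so three of the five rows survive.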

  

In [9]:
# Test data frame
# data set of the vehicle #2
df_test = df[df.Vehicle_ID==vehicle_ids[0]].copy()
df_test.info()
df_test.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 437 entries, 0 to 436
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Vehicle_ID    437 non-null    int64  
 1   Frame_ID      437 non-null    int64  
 2   Total_Frames  437 non-null    int64  
 3   Local_X       437 non-null    float64
 4   Local_Y       437 non-null    float64
 5   v_Length      437 non-null    float64
 6   v_Width       437 non-null    float64
 7   v_Class       437 non-null    int64  
 8   v_Vel         437 non-null    float64
 9   v_Acc         437 non-null    float64
 10  Lane_ID       437 non-null    int64  
dtypes: float64(6), int64(5)
memory usage: 41.0 KB


Unnamed: 0,Vehicle_ID,Frame_ID,Total_Frames,Local_X,Local_Y,v_Length,v_Width,v_Class,v_Vel,v_Acc,Lane_ID
0,2,13,437,16.467196,35.380427,14.5,4.9,2,40.0,0.0,2
1,2,14,437,16.446594,39.381608,14.5,4.9,2,40.012349,0.123485,2
2,2,15,437,16.425991,43.381541,14.5,4.9,2,39.999855,-0.124939,2
3,2,16,437,16.405392,47.38078,14.5,4.9,2,39.99292,-0.069349,2
4,2,17,437,16.384804,51.379881,14.5,4.9,2,39.991544,-0.013759,2


In [10]:
def series2seq(data, n_in=1, n_out=1,labels=None,series_features=None, show_result=False):
    dat = data.copy()
    targets = labels
    cols, names = list(), list()

    # input sequence (t-n_in, ..., t-1) for the series features
    for i in range(n_in, 0, -1):
        cols.append(dat[series_features].shift(i))
        names += ['{}(t-{})'.format(j, i) for j in series_features]
    # forecast sequence (t, t+1, ..., t+n_out-1) for the selected labels
    for i in range(0, n_out):
        cols.append(dat[targets].shift(-i))
        names += ['{}(t+{})'.format(j, i) for j in targets]
    # put it all together and drop the rows left incomplete by shifting
    agg = pd.concat(cols, axis=1).dropna()
    agg.columns = names

    # the first len(series_features)*n_in columns are the lagged inputs
    X = agg.iloc[:,:len(series_features)*n_in]
    # concatenate with the constant features; dropna keeps only the rows present in agg
    X = pd.concat([X,dat.drop(columns=series_features)], axis=1).dropna()
    y = agg.iloc[:,len(series_features)*n_in:].copy()
    if show_result:
        X.info()
        print(X.head(), X.shape)
        y.info()
        print(y.head(), y.shape)
    return X, y

# Test the function
X, y = series2seq(df_test, n_in=2, n_out=1,labels = target_names,series_features=series_feature_names, show_result=True)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 435 entries, 2 to 436
Data columns (total 16 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Local_X(t-2)  435 non-null    float64
 1   v_Vel(t-2)    435 non-null    float64
 2   Local_Y(t-2)  435 non-null    float64
 3   v_Acc(t-2)    435 non-null    float64
 4   Lane_ID(t-2)  435 non-null    float64
 5   Local_X(t-1)  435 non-null    float64
 6   v_Vel(t-1)    435 non-null    float64
 7   Local_Y(t-1)  435 non-null    float64
 8   v_Acc(t-1)    435 non-null    float64
 9   Lane_ID(t-1)  435 non-null    float64
 10  Vehicle_ID    435 non-null    int64  
 11  Frame_ID      435 non-null    int64  
 12  Total_Frames  435 non-null    int64  
 13  v_Length      435 non-null    float64
 14  v_Width       435 non-null    float64
 15  v_Class       435 non-null    int64  
dtypes: float64(12), int64(4)
memory usage: 57.8 KB
   Local_X(t-2)  v_Vel(t-2)  Local_Y(t-2)  v_Acc(t-2)  Lane_ID(t-2)  

### `treatment_cars` Function to prepare the data set for each car

In [11]:
# Test data frame
# data set of the first 5 vehicles
df_test = df[df.Vehicle_ID.isin(vehicle_ids[:5])].copy()
df_test.info()
print(df_test.Vehicle_ID.unique())
df_test.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2045 entries, 0 to 2044
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Vehicle_ID    2045 non-null   int64  
 1   Frame_ID      2045 non-null   int64  
 2   Total_Frames  2045 non-null   int64  
 3   Local_X       2045 non-null   float64
 4   Local_Y       2045 non-null   float64
 5   v_Length      2045 non-null   float64
 6   v_Width       2045 non-null   float64
 7   v_Class       2045 non-null   int64  
 8   v_Vel         2045 non-null   float64
 9   v_Acc         2045 non-null   float64
 10  Lane_ID       2045 non-null   int64  
dtypes: float64(6), int64(5)
memory usage: 191.7 KB
[2 4 5 6 8]


Unnamed: 0,Vehicle_ID,Frame_ID,Total_Frames,Local_X,Local_Y,v_Length,v_Width,v_Class,v_Vel,v_Acc,Lane_ID
0,2,13,437,16.467196,35.380427,14.5,4.9,2,40.0,0.0,2
1,2,14,437,16.446594,39.381608,14.5,4.9,2,40.012349,0.123485,2
2,2,15,437,16.425991,43.381541,14.5,4.9,2,39.999855,-0.124939,2
3,2,16,437,16.405392,47.38078,14.5,4.9,2,39.99292,-0.069349,2
4,2,17,437,16.384804,51.379881,14.5,4.9,2,39.991544,-0.013759,2


In [12]:
def treatment_cars(data, n_in=1, n_out=1,labels=None,series_features=None, show_result=False):
  # Build the windows vehicle by vehicle so that no window spans two vehicles
  X_parts, y_parts = [], []
  for veh_id in data.Vehicle_ID.unique():
    dat = data[data.Vehicle_ID==veh_id].copy()
    X, y = series2seq(dat.drop(columns=['Frame_ID']), n_in=n_in, n_out=n_out,labels = labels,series_features=series_features)
    X_parts.append(X)
    y_parts.append(y)
  # Concatenate once at the end instead of growing a DataFrame inside the loop
  dat_X = pd.concat(X_parts,ignore_index=True)
  dat_y = pd.concat(y_parts,ignore_index=True)
  if show_result:
    dat_X.info()
    print(dat_X.head(), dat_X.shape)
    dat_y.info()
    print(dat_y.head(), dat_y.shape)
  return dat_X ,dat_y
treatment_cars(df_test,n_in=2, n_out=1, labels = target_names,series_features=series_feature_names, show_result=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2035 entries, 0 to 2034
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Local_X(t-2)  2035 non-null   float64
 1   v_Vel(t-2)    2035 non-null   float64
 2   Local_Y(t-2)  2035 non-null   float64
 3   v_Acc(t-2)    2035 non-null   float64
 4   Lane_ID(t-2)  2035 non-null   float64
 5   Local_X(t-1)  2035 non-null   float64
 6   v_Vel(t-1)    2035 non-null   float64
 7   Local_Y(t-1)  2035 non-null   float64
 8   v_Acc(t-1)    2035 non-null   float64
 9   Lane_ID(t-1)  2035 non-null   float64
 10  Vehicle_ID    2035 non-null   int64  
 11  Total_Frames  2035 non-null   int64  
 12  v_Length      2035 non-null   float64
 13  v_Width       2035 non-null   float64
 14  v_Class       2035 non-null   int64  
dtypes: float64(12), int64(3)
memory usage: 238.6 KB
   Local_X(t-2)  v_Vel(t-2)  Local_Y(t-2)  v_Acc(t-2)  Lane_ID(t-2)  \
0     16.467196   40.000000     35.380

(      Local_X(t-2)  v_Vel(t-2)  Local_Y(t-2)  v_Acc(t-2)  Lane_ID(t-2)  \
 0        16.467196   40.000000     35.380427    0.000000           2.0   
 1        16.446594   40.012349     39.381608    0.123485           2.0   
 2        16.425991   39.999855     43.381541   -0.124939           2.0   
 3        16.405392   39.992920     47.380780   -0.069349           2.0   
 4        16.384804   39.991544     51.379881   -0.013759           2.0   
 ...            ...         ...           ...         ...           ...   
 2030     44.212485   61.023160   2092.082904   -2.031700           4.0   
 2031     44.206021   60.569662   2098.139867   -4.534983           4.0   
 2032     44.201809   60.005353   2104.140401   -5.643088           4.0   
 2033     44.197358   59.879807   2110.128380   -1.255460           4.0   
 2034     44.193051   59.887172   2116.117096    0.073649           4.0   
 
       Local_X(t-1)  v_Vel(t-1)  Local_Y(t-1)  v_Acc(t-1)  Lane_ID(t-1)  \
 0        16.446594   4

### Choose the size of the raw data set

In [13]:
np.random.seed(23)
new_size = 100
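# Note: np.random.choice samples with replacement by default, so veh_list may contain
# duplicates; pass replace=False to draw exactly new_size distinct vehicles.
veh_list = np.random.choice(df.Vehicle_ID.unique(),new_size)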
sub_df = df[df.Vehicle_ID.isin(veh_list)].copy()
sub_df.info()
sub_df.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 50149 entries, 6303 to 1040359
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Vehicle_ID    50149 non-null  int64  
 1   Frame_ID      50149 non-null  int64  
 2   Total_Frames  50149 non-null  int64  
 3   Local_X       50149 non-null  float64
 4   Local_Y       50149 non-null  float64
 5   v_Length      50149 non-null  float64
 6   v_Width       50149 non-null  float64
 7   v_Class       50149 non-null  int64  
 8   v_Vel         50149 non-null  float64
 9   v_Acc         50149 non-null  float64
 10  Lane_ID       50149 non-null  int64  
dtypes: float64(6), int64(5)
memory usage: 4.6 MB


Unnamed: 0,Vehicle_ID,Frame_ID,Total_Frames,Local_X,Local_Y,v_Length,v_Width,v_Class,v_Vel,v_Acc,Lane_ID
6303,25,77,436,36.937434,39.958126,18.5,5.9,2,43.64,0.0,4
6304,25,78,436,36.913762,44.531105,18.5,5.9,2,45.730403,20.904029,4
6305,25,79,436,36.890084,49.025028,18.5,5.9,2,44.939855,-7.905483,4
6306,25,80,436,36.866466,53.459789,18.5,5.9,2,44.34824,-5.91615,4
6307,25,81,436,36.842977,57.855282,18.5,5.9,2,43.955558,-3.926819,4
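
`np.random.choice` draws with replacement unless told otherwise, so asking for 100 vehicles does not guarantee 100 distinct ones. The sketch below illustrates the difference on stand-in IDs (`ids` is a made-up array of 1993 integers, matching only the number of vehicles reported earlier, not the real `Vehicle_ID` values):

```python
import numpy as np

rng = np.random.RandomState(23)
ids = np.arange(1993)  # stand-in for the 1993 vehicle IDs (assumption)

# Default behaviour: sampling with replacement, duplicates are possible.
with_repl = rng.choice(ids, 100)
# Explicitly sampling without replacement: always 100 distinct IDs.
without_repl = rng.choice(ids, 100, replace=False)

print("distinct IDs with replacement:   ", len(np.unique(with_repl)))
print("distinct IDs without replacement:", len(np.unique(without_repl)))
```

Whether duplicates matter depends on the goal: `isin` ignores them, so the only effect here is a subset that may be slightly smaller than `new_size` vehicles.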


In [14]:
%%time
X, y = treatment_cars(sub_df, 
                   n_in=n_steps, n_out=n_future,
                   labels = target_names,
                   series_features=series_feature_names, show_result=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49855 entries, 0 to 49854
Data columns (total 20 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Local_X(t-3)  49855 non-null  float64
 1   v_Vel(t-3)    49855 non-null  float64
 2   Local_Y(t-3)  49855 non-null  float64
 3   v_Acc(t-3)    49855 non-null  float64
 4   Lane_ID(t-3)  49855 non-null  float64
 5   Local_X(t-2)  49855 non-null  float64
 6   v_Vel(t-2)    49855 non-null  float64
 7   Local_Y(t-2)  49855 non-null  float64
 8   v_Acc(t-2)    49855 non-null  float64
 9   Lane_ID(t-2)  49855 non-null  float64
 10  Local_X(t-1)  49855 non-null  float64
 11  v_Vel(t-1)    49855 non-null  float64
 12  Local_Y(t-1)  49855 non-null  float64
 13  v_Acc(t-1)    49855 non-null  float64
 14  Lane_ID(t-1)  49855 non-null  float64
 15  Vehicle_ID    49855 non-null  int64  
 16  Total_Frames  49855 non-null  int64  
 17  v_Length      49855 non-null  float64
 18  v_Width       49855 non-nu
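
The row counts printed in this notebook can be cross-checked with simple arithmetic. Shifting leaves the first `n_in` rows without a full history and the last `n_out - 1` rows without a full future, so each vehicle loses `n_in + n_out - 1` rows. The sketch below assumes every vehicle has more rows than that; `rows_after_windowing` is a hypothetical helper, not part of the notebook.

```python
# Hypothetical helper (not from the notebook): expected number of windows.
def rows_after_windowing(n_rows, n_vehicles, n_in, n_out):
    # Each vehicle loses n_in leading rows and n_out - 1 trailing rows.
    return n_rows - n_vehicles * (n_in + n_out - 1)

# First five vehicles, n_in=2, n_out=1: 2045 raw rows
print(rows_after_windowing(2045, 5, 2, 1))  # 2035, as reported by treatment_cars

# Sampled subset, n_in=3, n_out=1: 50149 raw rows became 49855 windows
lost = 50149 - 49855
print(lost // (3 + 1 - 1))  # 98 vehicles account for the 294 lost rows
```

The second figure suggests that the sampled subset covers 98 distinct vehicles rather than 100, which is what one would expect if the random draw contained a couple of duplicates.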

### Split the data set


In [15]:
X_train, X_test, y_train, y_test = train_test_split(X,y, 
                                                    test_size=0.3, random_state=42)
print(X_train.shape,X_test.shape, y_train.shape, y_test.shape)

(34898, 20) (14957, 20) (34898, 2) (14957, 2)


In [16]:
X_train.describe()

Unnamed: 0,Local_X(t-3),v_Vel(t-3),Local_Y(t-3),v_Acc(t-3),Lane_ID(t-3),Local_X(t-2),v_Vel(t-2),Local_Y(t-2),v_Acc(t-2),Lane_ID(t-2),Local_X(t-1),v_Vel(t-1),Local_Y(t-1),v_Acc(t-1),Lane_ID(t-1),Vehicle_ID,Total_Frames,v_Length,v_Width,v_Class
count,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0
mean,29.825893,39.601055,979.097874,0.378789,2.990888,29.826814,39.639846,983.060561,0.387917,2.990372,29.827846,39.678644,987.027129,0.387981,2.989971,1500.487306,540.138489,15.056063,6.262224,2.019686
std,16.917318,13.716168,592.015808,5.5317,1.500046,16.911918,13.736282,592.690677,5.556588,1.499833,16.906664,13.756811,593.368784,5.51274,1.499735,775.94592,129.800916,5.65116,0.976939,0.138921
min,2.365371,0.0,28.339231,-112.779778,1.0,2.361841,0.0,31.360769,-112.779778,1.0,2.36673,0.0,34.549814,-110.969666,1.0,25.0,283.0,9.0,4.9,2.0
25%,17.287998,32.172082,459.780124,-1.679239,2.0,17.29091,32.158581,463.318751,-1.685207,2.0,17.303439,32.21467,466.749357,-1.679907,2.0,871.0,452.0,12.5,5.5,2.0
50%,30.156991,40.06615,943.758431,0.041234,3.0,30.159845,40.076013,947.27293,0.043439,3.0,30.15651,40.082277,950.452304,0.042833,3.0,1556.0,522.0,14.5,6.0,2.0
75%,41.948858,49.430577,1458.5489,2.783238,4.0,41.95564,49.513019,1463.448717,2.797432,4.0,41.95703,49.5411,1467.941622,2.791441,4.0,2171.0,599.0,16.5,6.9,2.0
max,72.084203,83.660956,2128.450341,74.070325,8.0,71.755825,83.660956,2134.345627,79.159472,8.0,71.420084,83.660956,2140.250014,90.62111,8.0,2761.0,920.0,61.5,8.5,3.0


In [17]:
# Standardize the data using the training-set mean and standard deviation
train_mean = X_train.mean()
train_std = X_train.std()

X_train = (X_train - train_mean) / train_std
X_test = (X_test - train_mean) / train_std

print(X_train.shape)
X_train.describe()

(34898, 20)


Unnamed: 0,Local_X(t-3),v_Vel(t-3),Local_Y(t-3),v_Acc(t-3),Lane_ID(t-3),Local_X(t-2),v_Vel(t-2),Local_Y(t-2),v_Acc(t-2),Lane_ID(t-2),Local_X(t-1),v_Vel(t-1),Local_Y(t-1),v_Acc(t-1),Lane_ID(t-1),Vehicle_ID,Total_Frames,v_Length,v_Width,v_Class
count,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0,34898.0
mean,-4.418241e-17,-9.752707000000001e-17,-1.759152e-16,-2.443267e-18,-1.384518e-16,8.836482e-17,7.065113e-17,-1.828378e-16,3.054084e-19,-1.504645e-16,-1.708251e-16,-5.232663e-16,-3.094805e-17,1.1656420000000002e-17,1.738792e-16,-6.230331e-17,-2.505367e-16,-1.290859e-16,-9.986853e-16,1.684836e-17
std,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
min,-1.62322,-2.887181,-1.605968,-20.45638,-1.327217,-1.624001,-2.885777,-1.605728,-20.3664,-1.327062,-1.624278,-2.884291,-1.605203,-20.20006,-1.326882,-1.901534,-1.981022,-1.07165,-1.39438,-0.1417063
25%,-0.7411278,-0.5416216,-0.8772025,-0.3720426,-0.6605714,-0.7412467,-0.5446354,-0.8769192,-0.373093,-0.6603215,-0.740797,-0.5425657,-0.8768203,-0.3751108,-0.6600972,-0.8112515,-0.6790283,-0.4523077,-0.7802166,-0.1417063
50%,0.01957154,0.03390854,-0.05969341,-0.06102187,0.006074659,0.01969211,0.03175291,-0.06038163,-0.06199461,0.006419421,0.01943997,0.02934055,-0.06163928,-0.06260915,0.006687334,0.07154196,-0.1397408,-0.09839809,-0.268414,-0.1417063
75%,0.7166009,0.7166377,0.8098619,0.4346674,0.6727207,0.7171763,0.7187661,0.8105209,0.433632,0.6731603,0.7174203,0.7169144,0.8104816,0.4359827,0.6734719,0.864123,0.4534753,0.2555115,0.6528306,-0.1417063
max,2.497932,3.21226,1.941422,13.32168,3.339305,2.479258,3.204733,1.942472,14.17625,3.340124,2.460109,3.19713,1.943518,16.36811,3.34061,1.624485,2.926493,8.218478,2.290599,7.056645


### Reshape data sets


In [18]:
X_train = X_train.values
X_test = X_test.values
# reshape into [# samples, # timesteps, # features];
# here each input column is treated as one time step carrying a single feature
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1],1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1],1))
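
The reshape above turns every column into its own time step, so the LSTM sees a sequence of 20 scalar values. An alternative layout, shown here only as a sketch and not as the notebook's method, keeps the lagged series as a `(samples, n_steps, n_series)` block and separates the constant columns. It assumes the column order produced by `series2seq` (all series features at the oldest lag first, constants last); the array `flat` is synthetic.

```python
import numpy as np

# Synthetic stand-in for the standardized feature matrix (assumed layout:
# n_steps blocks of n_series lagged columns, followed by n_const constants).
n_samples, n_steps, n_series, n_const = 4, 3, 5, 5
n_cols = n_steps * n_series + n_const
flat = np.arange(n_samples * n_cols, dtype=float).reshape(n_samples, n_cols)

# Layout used in this notebook: one time step per column, one feature each.
as_notebook = flat.reshape(n_samples, n_cols, 1)

# Alternative: one time step per lag, n_series features per step.
series_part = flat[:, :n_steps * n_series].reshape(n_samples, n_steps, n_series)
# Constant columns kept aside, e.g. for a separate dense input.
const_part = flat[:, n_steps * n_series:]

print(as_notebook.shape, series_part.shape, const_part.shape)
```

In the alternative layout, `series_part[s, k, f]` holds feature `f` at the `k`-th oldest lag for sample `s`, which matches the usual meaning of the time-step axis in a recurrent layer.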

## Prediction model

In [19]:
%%time
# define model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(X_train.shape[1],1)))
model.add(Dense(n_labels*n_future))
model.compile(optimizer='adam', loss='mse', metrics=['mse'])

# For saving the best model during the whole training process.
#checkpointer = tf.keras.callbacks.ModelCheckpoint(filepath='BestModel.h5', monitor='val_loss', save_best_only=True)

# Interrupt training if `val_loss` has not improved for 10 consecutive epochs
stop_learn= tf.keras.callbacks.EarlyStopping(patience=10, monitor='val_loss')


# fit model
Monitor = model.fit(X_train, y_train, epochs=50, 
                    callbacks=[stop_learn],
                    validation_data=(X_test, y_test), verbose=1)

Train on 34898 samples, validate on 14957 samples
Epoch 1/50


InternalError:  Blas GEMM launch failed : a.shape=(32, 50), b.shape=(50, 200), m=32, n=200, k=50
	 [[{{node sequential/lstm/while/body/_1/MatMul_1}}]] [Op:__inference_distributed_function_1865]

Function call stack:
distributed_function


In [20]:
hist = pd.DataFrame(Monitor.history)
hist['epoch'] = Monitor.epoch
fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(10,4),dpi=150)
hist[['loss','val_loss']].plot(ax=axes[0])
hist[['mse','val_mse']].plot(ax=axes[1])
plt.show()
hist.tail()

NameError: name 'Monitor' is not defined

## Evaluation 


In [None]:
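from math import sqrt
from sklearn.metrics import mean_squared_error

yhat = model.predict(X_test, verbose=1)
rms = []  # collects one RMSE value per evaluated model
rms += [sqrt(mean_squared_error(y_test, yhat))]
print(yhat[:5])
rms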
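
Since the training cell above did not complete, no RMSE value is available here. As a small worked example on made-up numbers (the arrays below are illustrative, not model output), the root mean squared error can be computed directly with NumPy, both overall and per target column:

```python
import numpy as np

# Illustrative values only: two target columns, three samples.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 32.0]])

squared_error = (y_true - y_pred) ** 2

# Single RMSE over all targets together.
overall = np.sqrt(np.mean(squared_error))
# One RMSE per target column.
per_target = np.sqrt(np.mean(squared_error, axis=0))

print(overall)     # sqrt(2), about 1.414
print(per_target)  # [0. 2.]
```

Because the two targets here (`Local_X` and `v_Vel`) are measured in different units, a per-target RMSE is usually easier to interpret than a single pooled value.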

In [None]:
plt.figure(figsize=(n_labels*5,n_future*5))
k = 1
for i in range(n_future):
  for j in range(n_labels):
    # columns of y are ordered as: all targets at t+0, then all targets at t+1, ...
    col = i*n_labels + j
    plt.subplot(n_future,n_labels,k)
    plt.scatter(y_test.index,y_test.iloc[:,col], label = "true {} at t+{}".format(target_names[j],i),marker = 'X')
    plt.scatter(y_test.index,yhat[:,col], label = "prediction {} at t+{}".format(target_names[j],i),marker = '.')
    plt.legend()
    k+=1
plt.show()

#END
