### EXAMPLE OF PREDICTIVE MODEL WITH GRU LAYERS

"Predictive" mode means we try to predict the forest state at time t using information from historical field surveys (NFI data at times t-1, t-2, etc.) and, optionally, from satellite or weather data at time t (the present), as described in this [diagram](schematic_diagram_of_predictive_models.png).

It is as if we had no access to current NFI data (perhaps because the last survey campaign took place five or six years ago) and were trying to predict the state of the forest using only the information available.

This could be treated as a "time series" problem, with the difficulty that we only have three historical data points (when we predict LFI4 from LFI1, LFI2 and LFI3).
Classical time-series models such as Facebook's Prophet would have a hard time here, because it is difficult to identify a trend or a periodicity in the target from only three points.

#### Specific approach for our GRU-Layers model :

The second approach is more complex. We try to predict the state of the forest at time t using all the historical information (data at times t-3, t-2, t-1). In our case, we fit a model that predicts LFI4 from LFI1, LFI2 and LFI3, which amounts to fitting a model that predicts the future from the past.

So the preprocessing here is different again.

Here we use RNN units in our deep learning architecture. RNNs, commonly used for NLP problems, are also well suited to "time series" problems, because they are designed to carry a "memory" along a sequence. Our sequence here is not a sentence but the historical development of the characteristics of a forest plot. We only have three historical points, but an RNN architecture has a further advantage: if the model cannot build a useful "memory" of the features, it can still behave like a simple regressor on all the features. We can therefore hope that an RNN model finds the right balance between the temporal and non-temporal aspects of the problem on its own (behaving, in the worst case, like the basic models of step 1).

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import copy

from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder, OrdinalEncoder
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import r2_score, accuracy_score, f1_score

import tensorflow as tf
import tensorflow_addons as tfa

 The versions of TensorFlow you are currently using is 2.11.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons


Target definition :

In [2]:
TARGET = 'SURF_TER_HA'

Import data :

In [87]:
data_base = pd.read_excel('../1_DATA_global_processing/data_processed_and_merged/big_merge_V2_meteo_SAT.xlsx').drop('Unnamed: 0', axis=1)

--------------

SPECIFIC TEMPORAL PREPROCESSING :

We convert 'LFI' into a numerical variable, which is more convenient :

In [88]:
data_base['LFI'] = data_base['LFI'].map({'LFI1' : 1,
                                               'LFI2' : 2,
                                               'LFI3' : 3,
                                               'LFI4' : 4 })

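We need to sort the dataset by plot, then by survey, because the following preprocessing relies on the four rows of each plot being consecutive and in LFI order :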

In [89]:
data_base.sort_values(['PARCELLE', 'LFI'], inplace=True)
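Everything downstream (the `i % 4` tests, the reshape into 3 time steps) assumes that every plot has exactly the four surveys LFI1 to LFI4. A small sanity check, sketched below as a hypothetical helper `has_complete_sequences` on toy data, can confirm this after sorting, e.g. with `assert has_complete_sequences(data_base)`. If it returns False, some plots are missing a survey and the position-based logic would silently mix plots.

```python
import pandas as pd


def has_complete_sequences(df, plot_col="PARCELLE", lfi_col="LFI", n_surveys=4):
    """True if every plot occupies n_surveys consecutive rows with LFI = 1, 2, ..., n_surveys."""
    if len(df) % n_surveys != 0:
        return False
    plots = df[plot_col].to_numpy().reshape(-1, n_surveys)
    lfi = df[lfi_col].to_numpy().reshape(-1, n_surveys)
    one_plot_per_block = (plots == plots[:, :1]).all()
    surveys_in_order = (lfi == list(range(1, n_surveys + 1))).all()
    no_split_plot = len(set(plots[:, 0])) == len(plots)  # a plot must not appear in two blocks
    return bool(one_plot_per_block and surveys_in_order and no_split_plot)


# Toy frames (illustrative): two complete plots, then the same data with one survey missing
ok = pd.DataFrame({"PARCELLE": ["A"] * 4 + ["B"] * 4, "LFI": [1, 2, 3, 4] * 2})
gap = ok.drop(index=2)  # plot A loses its LFI3 row
print(has_complete_sequences(ok), has_complete_sequences(gap))  # True False
```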

Feature engineering :

In [90]:
# adding aridity index
data_base["AI"] = data_base['PRCP_GROWTH'] / data_base['TAVE_GROWTH']
# adding H/D index
data_base["H_D"] = data_base['HAUTEUR_ARBRE'] / data_base['DBH']


Here we distinguish between "past" and "future" data. "Past" data are all the features we know from historical surveys (NFI data), including the historical values of our target. "Future" data are the satellite data, the meteorological data and a few NFI characteristics that are very "stable" (their values do not change over time for a given forest plot).

In [91]:

# --- PAST ---
cat_strict_past = ['PRODREG', 'ESPECE_DOM', 'TYP_RAJ_PPL', 'RELIEF']  # candidates: 'PRODREG', 'ORIENTATION', 'ESPECE_DOM', 'TYP_RAJ_PPL', 'DEG_FERMETURE', 'STR_PPL', 'RELIEF'
cat_ord_past = ['TAUX_COUV_RAJ', 'HT_VEG', 'NIV_DEV', 'QUAL_STATION']  # candidates: 'TAILLE_PPL', 'DEGRAD_PPL', 'MELANGE', 'QUAL_STATION', 'TAUX_COUV_RAJ', 'SURF_TROU_AER', 'HT_VEG'
numerics_past = ['LFI', 'SLOPE25', '25_GRID_PER', 'UNIT_ACCR', 'H_D', 'AI', 'SDI', 'ALT', 'TIGES_VIV_H', 'SURF_TER_HA', 'FEUILL_PER', 'CONIF_PER', 'PERF_CROI']  # other candidate: 'AGE_PPL'

# --- FUTURE ---
cat_strict_future = []  # candidate: 'ORIENTATION'
cat_ord_future = ['QUAL_STATION']
numerics_future = ['LFI', 'ALT', 'SLOPE25', 'PERF_CROI']
add_meteo_known = ['PRCP', 'TAVE_AVG', 'TAVE', 'TAVE_GROWTH', 'PRCP_S_S', 'PRCP_G_S', 'AI']
add_SAT_known = ['NDVI', 'EVI', 'NDMI', 'NDWI', 'DSWI']

Grouping of feature categories by past and future :

In [92]:
feats_past = cat_strict_past + cat_ord_past + numerics_past
feats_future_base = cat_strict_future + cat_ord_future + numerics_future + add_meteo_known + add_SAT_known

Here we have to "mark" the future features, because some of them duplicate past features. In the same loop, we create three more lists to store the future features by variable category (numerical, ordered categorical, strict categorical) :

In [93]:
feats_future_f_names = []
feats_future_f_ord = []
feats_future_f_num = []
feats_future_f_cat_strict  = []

for cat in feats_future_base:

    feat_list = data_base[cat].to_list()

    data_base[cat + "_f"] = feat_list

    feats_future_f_names.append(cat + '_f')

    if cat in cat_ord_future:
        feats_future_f_ord.append(cat + "_f")
        
    if cat in cat_strict_future:
        feats_future_f_cat_strict.append(cat + "_f")
        
    if cat in (numerics_future + add_SAT_known + add_meteo_known):
        feats_future_f_num.append(cat + "_f")

Now we can restrict the whole dataset to our past and future features :

In [94]:
feats_total = feats_past + feats_future_f_names

In [95]:
data_red = data_base[feats_total].copy()  # explicit copy, so later assignments do not trigger SettingWithCopyWarning

MISSING VALUES & NOT DETERMINED VALUES :

In the documentation, class "-1" means "not determined". So, for our ordered categorical features, we convert this class into a missing value, so that "-1" values and empty cells go through the same preprocessing.

In [96]:
for cat in (cat_ord_past + feats_future_f_ord):
    # "-1" (not determined) becomes NaN; existing NaN values are left untouched
    data_red[cat] = data_red[cat].replace(-1, np.nan)



SPECIAL "FINE" IMPUTER :

Before the scikit-learn preprocessing pipeline, which includes the usual imputers, we build a finer, hand-made ("artisanal") imputer.

Because the dataset is sorted first by forest plot and then by time, we loop over the rows and, for each forest plot, replace when possible a missing 'LFI1' value with the 'LFI2' value, a missing 'LFI2' value with the 'LFI1' value (or, failing that, the 'LFI3' value), and a missing 'LFI3' value with the 'LFI2' value. 'LFI4' values are never used to fill gaps, so no information from the target time leaks into the past data.

In [97]:
# Rows are sorted by plot, then by LFI: row i is survey (i % 4) + 1 of its plot.
# LFI4 rows (i % 4 == 3) are neither filled nor used as a source.
for i in range(len(data_red)):
    for j in range(len(data_red.columns)):
        if not pd.isna(data_red.iloc[i, j]):
            continue
        if i % 4 == 0:    # LFI1 <- LFI2
            data_red.iloc[i, j] = data_red.iloc[i + 1, j]
        elif i % 4 == 1:  # LFI2 <- LFI1 if available, else LFI3
            if not pd.isna(data_red.iloc[i - 1, j]):
                data_red.iloc[i, j] = data_red.iloc[i - 1, j]
            else:
                data_red.iloc[i, j] = data_red.iloc[i + 1, j]
        elif i % 4 == 2:  # LFI3 <- LFI2
            data_red.iloc[i, j] = data_red.iloc[i - 1, j]
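The nested loop visits every cell with `.iloc`, which is slow on a large frame. Below is a vectorized sketch of the same filling rules (our own reformulation, not the notebook's code) based on pandas `groupby(...).shift` within each plot. It keys on the LFI value instead of the row position, and it assumes a frame that still contains the plot identifier: `data_red` as built above has no `PARCELLE` column, so it would have to be added back from `data_base`. `fine_impute` is a hypothetical helper name and the toy data is illustrative.

```python
import numpy as np
import pandas as pd


def fine_impute(df, cols, plot_col="PARCELLE", lfi_col="LFI"):
    """Fill LFI1-LFI3 gaps from neighbouring surveys of the same plot; LFI4 is never used or filled."""
    out = df.sort_values([plot_col, lfi_col]).reset_index(drop=True)
    lfi, plots = out[lfi_col], out[plot_col]
    for c in cols:
        v = out[c]
        prev_v = v.groupby(plots).shift(1)   # previous survey of the same plot
        next_v = v.groupby(plots).shift(-1)  # next survey of the same plot
        v = v.mask(v.isna() & (lfi == 1), next_v)                     # LFI1 <- LFI2
        v = v.mask(v.isna() & (lfi == 2), prev_v.fillna(next_v))      # LFI2 <- LFI1, else LFI3
        v = v.mask(v.isna() & (lfi == 3), v.groupby(plots).shift(1))  # LFI3 <- (filled) LFI2
        out[c] = v
    return out


# Toy frame (illustrative): three plots with gaps
toy = pd.DataFrame({
    "PARCELLE": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "LFI": [1, 2, 3, 4] * 3,
    "x": [np.nan, 2.0, np.nan, np.nan, 1.0, np.nan, 3.0, 4.0, np.nan, np.nan, 5.0, 6.0],
})
filled = fine_impute(toy, ["x"])
print(filled["x"].tolist())  # [2.0, 2.0, 2.0, nan, 1.0, 1.0, 3.0, 4.0, nan, 5.0, 5.0, 6.0]
```

Because `shift` works inside each plot's group, a value can never be borrowed from a neighbouring plot, even at the first or last row of a group.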

SPLITTING :

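We take a test set whose number of rows is a multiple of 4, to keep complete "LFI1 / LFI2 / LFI3 / LFI4" sequences. With shuffle=False, the split simply takes the last 2000 rows (the last 500 plots in sort order), so random_state has no effect here.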

In [98]:
X_train, X_test = train_test_split(data_red, test_size=2000, shuffle=False, random_state=2)
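If a random selection of test plots were preferred, one option (an alternative, not what this notebook does) is scikit-learn's `GroupShuffleSplit`, which draws whole groups, so the four rows of each plot stay together. Since `data_red` has no `PARCELLE` column, the groups would come from `data_base['PARCELLE']` (same row order), e.g. `split_by_plot(data_red, data_base['PARCELLE'], n_test_plots=500)`. `split_by_plot` is a hypothetical helper and the toy frame is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_plot(df, groups, n_test_plots, seed=2):
    """Random train/test split that keeps all rows of a plot on the same side."""
    gss = GroupShuffleSplit(n_splits=1, test_size=n_test_plots, random_state=seed)
    train_idx, test_idx = next(gss.split(df, groups=groups))
    # Sorting the positions keeps each plot's rows in their LFI1..LFI4 order
    return df.iloc[np.sort(train_idx)], df.iloc[np.sort(test_idx)]


# Toy frame: 10 plots x 4 surveys (illustrative)
toy = pd.DataFrame({"PARCELLE": np.repeat([f"P{k}" for k in range(10)], 4),
                    "LFI": np.tile([1, 2, 3, 4], 10)})
toy_train, toy_test = split_by_plot(toy, toy["PARCELLE"], n_test_plots=3)
print(len(toy_train), len(toy_test))  # 28 12
```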

SCIKIT-LEARN PIPELINES PREPROCESSING :

For the common preprocessing pipelines, we create four distinct categories of features :
- numerical future features : we can use a KNN imputer, which is finer than a simple imputer, followed by a standard scaler.
- numerical past features : in theory we should not use a KNN imputer here, because it replaces a missing value with values from the nearest neighbours, and these neighbours may be rows from the future ('LFI4') surveys! A simple imputer is therefore a better option, followed by a standard scaler.
- strict categorical features : as usual, a simple imputer and a one-hot encoder.
- ordinal categorical features : as usual, a simple imputer and an ordinal encoder.

In [99]:
ordinal_cat_tot = cat_ord_past + feats_future_f_ord
categorials_strict_tot = cat_strict_past + feats_future_f_cat_strict

In [100]:
numerics_transforms_past = Pipeline(
    [("imputer", SimpleImputer()),
    ('encoder',StandardScaler())
])

numerics_transforms_future = Pipeline(
    [("imputer", KNNImputer()),
    ('encoder',StandardScaler())
])

categorials_transforms = Pipeline([
    ("imputer", SimpleImputer(strategy='most_frequent')),
    ('encoder',OneHotEncoder(drop="first"))
])

ordinal_cat_transforms = Pipeline([
    ("imputer", SimpleImputer(strategy='most_frequent')),
    ('encoder',OrdinalEncoder())
])

preprocessor = ColumnTransformer(
    [("num_past", numerics_transforms_past, numerics_past),
    ('num_future', numerics_transforms_future, feats_future_f_num),
    ("ord_cat", ordinal_cat_transforms, ordinal_cat_tot),
     ("cat_strict", categorials_transforms, categorials_strict_tot)])

In [101]:
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

Dimensions of X_train and X_test :

In [102]:
np.shape(X_train)

(7612, 58)

In [123]:
np.shape(X_test)

(2000, 58)

To rebuild named dataframes (and for the feature extraction later), we need two specific lists of one-hot encoded categorical feature names, plus a global list of all the features :

In [103]:
list_features_in = numerics_past + feats_future_f_num + ordinal_cat_tot  # same order as the ColumnTransformer output
list_cat_strict_past = []
list_cat_strict_future = []

# One-hot columns per feature = categories seen by the fitted encoder, minus the dropped first one
# (counting data_red[cat].unique() would also count NaN and categories absent from the train set)
encoder = preprocessor.named_transformers_['cat_strict'].named_steps['encoder']
for cat, categories in zip(categorials_strict_tot, encoder.categories_):
    for i in range(len(categories) - 1):
        list_features_in.append(f'{cat}_{i}')
        if cat in cat_strict_past:
            list_cat_strict_past.append(f'{cat}_{i}')
        else:
            list_cat_strict_future.append(f'{cat}_{i}')

CONVERSION TO TENSORS :

To create the time-specific datasets for our models, we first need to rebuild dataframes for the train and test sets :

In [104]:
df_train = pd.DataFrame(X_train, columns=list_features_in)
df_test = pd.DataFrame(X_test, columns=list_features_in)

... and we must separate past and future features in these datasets :

In [105]:
df_train_past = df_train[numerics_past + cat_ord_past + list_cat_strict_past]
df_train_future = df_train[feats_future_f_num + feats_future_f_ord + list_cat_strict_future]

df_test_past = df_test[numerics_past + cat_ord_past + list_cat_strict_past]
df_test_future = df_test[feats_future_f_num + feats_future_f_ord + list_cat_strict_future]

Now, it's time to filter what are the past rows and the future rows in our dataset, to create our tensors.

To do that, we simply use the position of each row in the dataframes: if the remainder of the division of this position by 4 is 3 (i % 4 == 3), it is an 'LFI4' row and belongs to the future data; otherwise, it belongs to the past data.

Then we "numpy-ise" the dataframes and, for the past data, reshape the array to add a temporal dimension (the 3 LFI points). The final past tensor has the shape (number of plots, 3 time steps, number of features), which is the (samples, timesteps, features) layout expected by Keras recurrent layers.

In [106]:
X_train_past = df_train_past.iloc[[i for i in range(len(df_train_past)) if i%4!=3],:].to_numpy()
X_train_past_3D = X_train_past.reshape(len(X_train_past)//3, 3, np.shape(X_train_past)[1])
train_tensor_past = tf.convert_to_tensor(X_train_past_3D)

In [107]:
X_test_past = df_test_past.iloc[[i for i in range(len(df_test_past)) if i%4!=3],:].to_numpy()
X_test_past_3D = X_test_past.reshape(len(X_test_past)//3, 3, np.shape(X_test_past)[1])
test_tensor_past = tf.convert_to_tensor(X_test_past_3D)
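The row filtering and the reshape can be checked on a toy matrix whose values encode plot, survey and feature (100*plot + 10*LFI + feature); this is an illustrative check, not notebook data. Because rows are ordered plot by plot, `reshape(n_plots, 3, n_features)` puts the LFI1 to LFI3 rows of each plot along the time axis, and `rows[3::4]` is an equivalent, shorter way to select the LFI4 rows used for the future tensors and the targets.

```python
import numpy as np

n_plots, n_feat = 2, 3
# Toy matrix, rows ordered plot by plot and LFI1..LFI4; value = 100*plot + 10*lfi + feature
rows = np.array([[100 * p + 10 * lfi + f for f in range(n_feat)]
                 for p in range(n_plots) for lfi in range(1, 5)], dtype=float)

past = rows[np.arange(len(rows)) % 4 != 3]         # drop the LFI4 rows
past_3d = past.reshape(len(past) // 3, 3, n_feat)  # (plots, time steps, features)
future = rows[3::4]                                # one LFI4 row per plot

print(past_3d.shape)              # (2, 3, 3)
print(past_3d[1, :, 0].tolist())  # [110.0, 120.0, 130.0]
print(future[:, 0].tolist())      # [40.0, 140.0]
```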

For the future data, we do not need to reshape the numpy array, because it represents a single time point :

In [108]:
X_train_future = df_train_future.iloc[[i for i in range(len(df_train_future)) if i%4==3],:].to_numpy()
train_tensor_future = tf.convert_to_tensor(X_train_future)

In [109]:
X_test_future = df_test_future.iloc[[i for i in range(len(df_test_future)) if i%4==3],:].to_numpy()
test_tensor_future = tf.convert_to_tensor(X_test_future)

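Finally, we create tensors for the train and test targets, taking the target value from the 'LFI4' row of each plot. Note that these values come from the preprocessed dataframes, so the target is standardized (and any missing 'LFI4' value has been replaced by the training mean by the SimpleImputer); the R² score itself is unaffected by the standardization.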

In [110]:
targets_train = []
for i in range(len(X_train)//4):
    targets_train.append(df_train.iloc[i*4+3,:][TARGET])
y_train = tf.convert_to_tensor(targets_train)

In [111]:
targets_test = []
for i in range(len(X_test)//4):
    targets_test.append(df_test.iloc[i*4+3,:][TARGET])
y_test = tf.convert_to_tensor(targets_test)

----------------

MODEL 1 : GRU LAYERS Model with just the past data

To begin, we create an RNN model without the additional future data.
It is therefore a single sequential model with GRU layers. We also tried SimpleRNN and LSTM units: GRU units gave the best performance, although SimpleRNN and LSTM units came close.

Different depths of the model and numbers of units have been tested, and here we have kept what seems to be the best configuration.

The model ends with a single neuron with a linear activation, because our target is numeric (regression).

In [165]:
GRU_past = tf.keras.models.Sequential([
        tf.keras.layers.GRU(64, input_shape=(3,np.shape(X_train_past)[1],), return_sequences=True),
        tf.keras.layers.GRU(32, return_sequences=False),
        tf.keras.layers.Dense(8, 'linear'),
        tf.keras.layers.Dense(1, 'linear')
    ])

In [166]:
GRU_past.summary()

Model: "sequential_21"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 gru_40 (GRU)                (None, 3, 64)             20544     
                                                                 
 gru_41 (GRU)                (None, 32)                9408      
                                                                 
 dense_38 (Dense)            (None, 8)                 264       
                                                                 
 dense_39 (Dense)            (None, 1)                 9         
                                                                 
Total params: 30,225
Trainable params: 30,225
Non-trainable params: 0
_________________________________________________________________
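The parameter counts in this summary can be checked by hand. The sketch below assumes Keras's default GRU configuration (`reset_after=True`), in which each of the 3 gates has an input kernel, a recurrent kernel and two bias vectors; the 41 input features are the value of `np.shape(X_train_past)[1]` printed later in this notebook.

```python
def gru_params(input_dim, units):
    # Keras GRU with reset_after=True: 3 gates, each with an input kernel,
    # a recurrent kernel and two bias vectors
    return 3 * units * (input_dim + units + 2)


def dense_params(input_dim, units):
    # weight matrix + one bias per unit
    return input_dim * units + units


n_features_past = 41  # np.shape(X_train_past)[1] in this notebook
counts = [gru_params(n_features_past, 64), gru_params(64, 32), dense_params(32, 8), dense_params(8, 1)]
print(counts, sum(counts))  # [20544, 9408, 264, 9] 30225
```

Without the final `Dense(1)` layer, the same arithmetic gives 30,216, the count shown for the GRU branch of model 2 below.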


Compilation with loss function, optimizer and metric definition :

In [168]:
GRU_past.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[tfa.metrics.RSquare()])

Training :

In [169]:
GRU_past.fit(
    x=train_tensor_past,
    y=y_train,
    epochs=15,
    batch_size=128,
    validation_data=(test_tensor_past, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x247a3c2b8b0>

Good performance (for a predictive model) !...

In [171]:
y_test_pred = GRU_past(X_test_past_3D)

In [172]:
r2_score(y_test, y_test_pred)

0.7620323243365349

Feature extraction :

We try a very simplistic feature-importance estimate for this complex model: the mean absolute weight of the first GRU layer's input kernel for each feature. But is it really reliable? It ignores the sequential aspect of the problem (the recurrent weights are not considered), so it is not a very good feature extraction. Other approaches should be explored.

In [187]:
np.shape(X_train_past)[1]

41

In [191]:
coeff_mean_GRU = []
for i in range(np.shape(X_train_past)[1]):
    coeff_mean_GRU.append(np.mean(abs(GRU_past.layers[0].trainable_variables[0][i])))

In [192]:
df_coef = pd.DataFrame(coeff_mean_GRU, columns=['Coeff_GRU'], index= (numerics_past + cat_ord_past + list_cat_strict_past) )

In [193]:
fig = px.bar(df_coef['Coeff_GRU'], title=f"Features importance for target : {TARGET} in GRU Layers")
fig.show()

It is very difficult to interpret this feature extraction, but we can see that 'SURF_TER_HA' has a large weight...
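One possible alternative (a model-agnostic technique swapped in here, not the method used in this notebook) is permutation importance: shuffle one feature across plots, moving its whole 3-step time profile together, and measure the drop in R². The NumPy sketch below is checked on a toy predictor; with the trained model one could presumably pass `lambda X: GRU_past.predict(X, verbose=0).ravel()` as `predict`, with `X_test_past_3D` and `y_test.numpy()`, but that call has not been run here. `permutation_importance_3d` is a hypothetical helper.

```python
import numpy as np


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance_3d(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in R^2 when one feature is shuffled across plots.

    X has shape (plots, time steps, features); a feature's whole time profile is moved
    to another plot, so each feature's sequence stays internally consistent.
    """
    rng = np.random.default_rng(seed)
    base = r2(y, predict(X))
    drops = np.zeros(X.shape[2])
    for f in range(X.shape[2]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, :, f] = X[rng.permutation(X.shape[0]), :, f]
            drops[f] += base - r2(y, predict(X_perm))
    return drops / n_repeats


# Toy check (illustrative): only feature 0 at the last time step drives the target
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(200, 3, 3))
y_toy = 3.0 * X_toy[:, 2, 0]
imp = permutation_importance_3d(lambda A: 3.0 * A[:, 2, 0], X_toy, y_toy)
```

Unlike the mean of the first-layer weights, this measures the effect of each feature on the predictions through the whole network, recurrent part included.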

--------------------

MODEL 2 : GRU LAYERS Model with a second additional "future" data pipeline :

Let's try to go further !...

Here, we try to feed our first model with the "future" data (satellite data, weather data, NFI data known at time t...).

To do that, we create a second pipeline that takes our future data as input. The architecture of this second pipeline is a simple multi-layer dense network (MLD)...

In [268]:
MLD_future = tf.keras.models.Sequential([
        tf.keras.layers.Dense(64, input_shape=(np.shape(X_train_future)[1],), activation='relu'),
        tf.keras.layers.Dense(32, 'linear'),
        tf.keras.layers.Dense(8, 'linear')
    ])

Testing the MLD second pipeline with a "future" data sample :

In [269]:
MLD_future(tf.expand_dims(train_tensor_future[0], axis=0))

<tf.Tensor: shape=(1, 8), dtype=float32, numpy=
array([[-0.4471811 , -0.44472128,  0.16102505, -0.9128637 , -1.0829674 ,
        -0.36235887, -0.5392028 , -0.4718085 ]], dtype=float32)>

We also have to redefine the first GRU_past pipeline, removing its final Dense(1) layer so that it outputs an 8-dimensional representation :

In [270]:
GRU_past = tf.keras.models.Sequential([
        tf.keras.layers.GRU(64, input_shape=(3,np.shape(X_train_past)[1],), return_sequences=True),
        tf.keras.layers.GRU(32, return_sequences=False),
        tf.keras.layers.Dense(8, 'linear')
    ])

We can then define the global architecture with the two pipelines (past and future): a simple concatenation of the two pipeline outputs, followed by our final unit. We concatenate 8 outputs with 8 outputs, so the two pipelines are perfectly balanced.

Afterwards, we tried different balances (16 outputs for the MLD "future" pipeline and 4 outputs for the GRU "past" pipeline, and vice versa). The model performs better when the GRU "past" pipeline has the larger number of outputs.

In [275]:
input_GRU = tf.keras.layers.Input(shape=(3,np.shape(X_train_past)[1],))

output_GRU = GRU_past(input_GRU)

input_MLD = tf.keras.layers.Input(shape=(np.shape(X_train_future)[1],))

output_MLD = MLD_future(input_MLD)

x = tf.keras.layers.Concatenate(axis=1)([output_GRU, output_MLD])

output = tf.keras.layers.Dense(1, 'linear')(x)

In [276]:
model = tf.keras.models.Model(inputs=[input_GRU, input_MLD], outputs=output)

In [277]:
model.summary()

Model: "model_10"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_21 (InputLayer)          [(None, 3, 41)]      0           []                               
                                                                                                  
 input_22 (InputLayer)          [(None, 17)]         0           []                               
                                                                                                  
 sequential_42 (Sequential)     (None, 8)            30216       ['input_21[0][0]']               
                                                                                                  
 sequential_41 (Sequential)     (None, 8)            3496        ['input_22[0][0]']               
                                                                                           

Compilation :

In [278]:
model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=[tfa.metrics.RSquare()])

Fitting :

In [279]:
model.fit(
    x=[train_tensor_past, train_tensor_future],
    y=y_train,
    epochs=20,
    batch_size=128,
    validation_data=([test_tensor_past, test_tensor_future], y_test))

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x247e0d40e80>

An R² of 0.75 is a good performance, but slightly below the first model (without the additional future data)...

We tried many training runs with different concatenation balances, different numbers of units in each layer and different layers. The conclusion is always the same: the model makes little use of the future data (satellite and weather data) and relies mainly on the historical data (NFI data) to make its predictions.

Feature extraction :

In [280]:
coeff_mean_GRU = []
for i in range(np.shape(X_train_past)[1]):
    coeff_mean_GRU.append(np.mean(abs(model.layers[2].trainable_variables[0][i])))

In [281]:
df_coef = pd.DataFrame(coeff_mean_GRU, columns=['Coeff_GRU'], index= (numerics_past + cat_ord_past + list_cat_strict_past) )

In [282]:
fig = px.bar(df_coef['Coeff_GRU'], title=f"Features importance for target : {TARGET} in GRU Layers")
fig.show()

The first-layer weights of the GRU pipeline look similar to the feature extraction of the first model.

Let's see on the second "future data" pipeline :

In [288]:
np.shape(X_train_future)[1]

17

In [289]:
len(feats_future_f_num + feats_future_f_ord + list_cat_strict_future)

17

In [283]:
coeff_mean_MLP = []
for i in range(np.shape(X_train_future)[1]):
    coeff_mean_MLP.append(np.mean(abs(model.layers[3].trainable_variables[0][i])))

In [285]:
df_coef_2 = pd.DataFrame(coeff_mean_MLP, columns=['Coeff_MLP'], index= feats_future_f_num + feats_future_f_ord + list_cat_strict_future)

In [286]:
fig = px.bar(df_coef_2['Coeff_MLP'], title=f"Features importance for target : {TARGET} in MLP Dense Layers")
fig.show()

Once again, it is very difficult to interpret, and all the features have roughly the same importance...