# Lettuce Weight Prediction: a Regression Problem

In a *regression* problem, we aim to predict the output of a continuous value, like a temperature or a probability. Contrast this with a *classification* problem, where we aim to select a class from a list of classes.

We use the `tf.keras` API; see [this guide](https://www.tensorflow.org/guide/keras) for details.

This notebook aims to predict the fresh weight of a lettuce at a given date after planting.

## Set up your environment

### Requirements

```sh
python3 --version
pip3 --version
virtualenv --version
```

### Create a virtualenv and install TensorFlow
```sh
virtualenv --system-site-packages -p python3 ./venv
source ./venv/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow==2.0
pip install pandas matplotlib seaborn distro
```

#### Check your setup is correct
```sh
python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```

```python
import datetime
import os
import pathlib

import distro
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

# Load the TensorBoard notebook extension
%load_ext tensorboard

plt.rcParams['figure.figsize'] = (8, 8)
print(tf.__version__)

# Quick sanity check that TensorFlow can run an operation
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

# Report the Linux distribution the notebook is running on
print('Running TensorFlow on {} {} ({})'.format(distro.name(), distro.version(), distro.codename()))
```

## Dataset

The dataset is available from the GitHub repository [IoT Tensorflow Lite DevoxxBE](https://github.com/alexisduque/iot-tensorflow-lite-devoxxbe) along with this notebook.

### Get the data

Load it using pandas:

```python
dataset_path = './datasets/lg_plantings.pickle'
raw_dataset = pd.read_pickle(dataset_path)
# Work on a copy so the raw data stays untouched
dataset = raw_dataset.copy()
dataset.columns.values
```

### Clean the data

The dataset contains a few unknown values.

```python
dataset.isna().sum()
```

To keep this initial tutorial simple, drop those rows.

```python
dataset = dataset.dropna()
```

The `"Irrigation"` column is categorical, not numeric, so convert it to a one-hot encoding:

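```python
# Remove the categorical column from the features...
irrigation = dataset.pop('Irrigation')

# ...and replace it with one 0/1 column per irrigation type
dataset['HPA'] = (irrigation == 1) * 1.0
dataset['Ebb&Flood'] = (irrigation == 2) * 1.0
dataset['Nebu'] = (irrigation == 3) * 1.0
dataset.tail()
```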
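
The same encoding can be captured in a small standalone function, which helps when new measurements must be encoded exactly like the training data. This is a minimal sketch, assuming (as the cell above does) that the raw `Irrigation` codes are 1, 2 and 3; `IRRIGATION_COLUMNS` and `one_hot_irrigation` are hypothetical names, not part of the original notebook.

```python
# Hypothetical mapping; the column order must match the one used for training.
IRRIGATION_COLUMNS = {1: "HPA", 2: "Ebb&Flood", 3: "Nebu"}


def one_hot_irrigation(code):
    """Return 1.0 in the column matching `code` and 0.0 in the others.

    An unknown code yields all zeros, which is also what the
    `(irrigation == k) * 1.0` expressions above would produce.
    """
    return {name: 1.0 if code == k else 0.0 for k, name in IRRIGATION_COLUMNS.items()}


print(one_hot_irrigation(2))  # {'HPA': 0.0, 'Ebb&Flood': 1.0, 'Nebu': 0.0}
```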

### Split the data into train and test

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our model.

```python
# 80% of the rows for training (reproducible thanks to random_state), the rest for testing
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
```
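
Keep in mind that `model.fit(..., validation_split=0.2)` later carves a validation set out of these training rows: Keras takes the *last* 20% of the arrays it receives, before any shuffling. Since `sample` has already shuffled the rows, that slice is effectively random, and the overall proportions end up around 64% fit / 16% validation / 20% test. The hypothetical helper below only does that arithmetic; its rounding is our assumption about how `sample(frac=...)` and `validation_split` round, so treat the exact counts as approximate.

```python
def split_sizes(n_rows, train_frac=0.8, validation_split=0.2):
    """Return (fit, validation, test) row counts for the pipeline in this notebook.

    Assumptions: pandas rounds train_frac * n_rows in `sample`, and Keras keeps
    int(n_train * (1 - validation_split)) rows for fitting.
    """
    n_train = round(train_frac * n_rows)
    n_test = n_rows - n_train
    n_fit = int(n_train * (1.0 - validation_split))
    n_val = n_train - n_fit
    return n_fit, n_val, n_test


print(split_sizes(100))  # (64, 16, 20)
```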

### Inspect the data

Have a quick look at the joint distribution of a few pairs of columns from the training set.

```python
sns.pairplot(train_dataset[["Weight", "CumCO2", "CumLight", "DoP"]], diag_kind="kde")
plt.show()
```

Also look at the overall statistics:

```python
train_stats = train_dataset.describe()
train_stats.pop("Weight")  # the label is not a feature, so drop its statistics
train_stats = train_stats.transpose()
train_stats
```

### Split features from labels

Separate the target value, or "label", from the features. This label is the value that you will train the model to predict. Here, it is the **weight**.

```python
train_weights = train_dataset.pop('Weight')
test_weights = test_dataset.pop('Weight')
```

### Normalize the data

Look again at the `train_stats` block above and note how different the ranges of each feature are.

It is good practice to normalize features that use different scales and ranges. Although the model *might* converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

Note: Although we intentionally generate these statistics from only the training dataset, these statistics will also be used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model has been trained on.

```python
def norm(x):
    # Standardize each feature with the *training* mean and standard deviation
    return (x - train_stats['mean']) / train_stats['std']


normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)
```

This normalized data is what we will use to train the model.

Caution: The statistics used to normalize the inputs here (mean and standard deviation) need to be applied to any other data that is fed to the model, along with the one-hot encoding that we did earlier.  That includes the test set as well as live data when the model is used in production.
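
To make that requirement concrete, here is a minimal, framework-free sketch of the normalization step as an object you fit once and reuse: it stores the per-feature mean and standard deviation of the training rows and applies them to any later row. It uses the sample standard deviation (`n - 1` in the denominator), which is what `DataFrame.describe()` reports. The class name `FeatureScaler` and the example values are hypothetical; the one-hot step is assumed to have been applied to each row beforehand.

```python
import json
import statistics


class FeatureScaler:
    """Hypothetical z-score scaler fitted on training rows only."""

    def fit(self, rows):
        # rows: list of dicts {feature_name: value}, all with the same keys
        names = list(rows[0])
        self.mean = {n: statistics.mean([r[n] for r in rows]) for n in names}
        # statistics.stdev is the sample std (n - 1), like pandas' describe()
        self.std = {n: statistics.stdev([r[n] for r in rows]) for n in names}
        return self

    def transform(self, row):
        # Apply the *training* statistics to any row (test set or live data)
        return {n: (row[n] - self.mean[n]) / self.std[n] for n in self.mean}

    def to_json(self):
        # Persist the statistics so production code can reuse them
        return json.dumps({"mean": self.mean, "std": self.std})


train_rows = [{"DoP": 10.0}, {"DoP": 20.0}, {"DoP": 30.0}]
scaler = FeatureScaler().fit(train_rows)
print(scaler.transform({"DoP": 40.0}))  # {'DoP': 2.0}
```

Storing the statistics next to the exported model (for example as the JSON string above) is one way to make sure live data is scaled exactly like the training data.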

## The model

### Build the model

Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers, and an output layer that returns a single, continuous value. The model building steps are wrapped in a function, `build_model`, since we'll create a second model later on.

```python
def build_model():
    # Two hidden layers of 64 ReLU units, one linear output for the predicted weight
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
        tf.keras.layers.Dense(64, activation=tf.nn.relu),
        tf.keras.layers.Dense(1)
    ])

    optimizer = tf.keras.optimizers.RMSprop(0.001)

    # MSE is the training loss; MAE and MSE are also tracked as metrics
    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
```

```python
model = build_model()
```

### Prepare for TensorBoard

```python
# One log directory per run, named after the current date and time
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
```

### Inspect the model

Use the `.summary()` method to print a simple description of the model.

```python
model.summary()
```
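
The parameter counts printed by `model.summary()` can be checked by hand: a `Dense` layer with `n_in` inputs and `units` outputs has `n_in * units` weights plus `units` biases. The sketch below applies that rule to the architecture in `build_model` (hidden widths 64 and 64, one output). The number of input features depends on your copy of the dataset, so the value 10 used here is only an illustrative assumption, and `dense_params`/`count_params` are hypothetical helpers.

```python
def dense_params(n_in, units):
    # Weight matrix (n_in x units) plus one bias per unit
    return n_in * units + units


def count_params(n_features, hidden=(64, 64), outputs=1):
    total, n_in = 0, n_features
    for units in (*hidden, outputs):
        total += dense_params(n_in, units)
        n_in = units
    return total


# With 10 input features: 704 + 4160 + 65 = 4929
print(count_params(10))
```

Calling `count_params(len(train_dataset.keys()))` in the notebook should match the `Total params` line of the summary.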

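```python
# plot_model needs the pydot and graphviz packages; make sure the output folder exists
os.makedirs('./figures', exist_ok=True)
tf.keras.utils.plot_model(model, './figures/weight_reg_model.png', show_shapes=True)
```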

Now try out the model. Take a batch of `10` examples from the training data and call `model.predict` on it.

```python
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
```

It seems to be working, and it produces a result of the expected shape and type.

### Train the model

Train the model using the `early_stop` callback, and record the training and validation metrics (loss, MAE and MSE) in the `history` object.

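```python
# Display training progress by printing a single dot for each completed epoch
# (optional: pass PrintDot() in the `callbacks` list of `model.fit`)
class PrintDot(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        if epoch % 200 == 0:
            print('')
        print('.', end='')


# Recreate the TensorBoard callback without weight histograms (histogram_freq=0)
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir, histogram_freq=0)

# Upper bound on training epochs; early stopping may end training sooner
EPOCHS = 500
```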

Visualize the model's training progress using the stats stored in the `history` object.

```python
def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [Weight]')
    plt.plot(hist['epoch'], hist['mean_absolute_error'],
             label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_absolute_error'],
             label='Val Error')
    plt.ylim([0, 120])
    plt.legend()

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$Weight^2$]')
    plt.plot(hist['epoch'], hist['mean_squared_error'],
             label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_squared_error'],
             label='Val Error')
    plt.ylim([0, 14000])
    plt.legend()
    plt.show()
```

```python
model = build_model()

# Stop training when val_loss has not improved for `patience` consecutive epochs
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_weights, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, tensorboard_callback])

plot_history(history)
```
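
`EarlyStopping(monitor='val_loss', patience=10)` stops training once `val_loss` has failed to improve for `patience` consecutive epochs. The pure-Python sketch below replays that rule on a list of validation losses, so you can see at which epoch training would stop. It is a simplified model of the callback (default `min_delta=0`, `mode='min'`), not the Keras implementation; `early_stop_epoch` and the loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch); stop_epoch is None if never triggered."""
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the counter
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


losses = [5.0, 4.0, 3.0, 3.5, 3.2, 3.1]
print(early_stop_epoch(losses, patience=2))  # (4, 2)
```

Because `restore_best_weights` defaults to `False`, the model returned by `fit` keeps the weights from the epoch at which training stopped, not from the best epoch; pass `restore_best_weights=True` to `EarlyStopping` if you want the latter.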

```python
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
```

```python
%tensorboard --logdir $log_dir
```

The plots show that, on the validation set, the mean absolute error is usually around ±3 `Weight` units.

Let's see how well the model generalizes by using the **test** set, which we did not use when training the model.  This tells us how well we can expect the model to predict when we use it in the real world.

```python
loss, mae, mse = model.evaluate(normed_test_data, test_weights, verbose=2)
print("Testing set Mean Abs Error: {:5.2f}".format(mae))
```
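
`model.evaluate` returns the loss followed by the compiled metrics, here MAE and MSE (MSE is also the loss). For intuition, this plain-Python sketch shows what those two numbers measure on a few made-up values; it is not the Keras implementation.

```python
def mean_absolute_error(y_true, y_pred):
    # Average size of the errors, in the units of the label
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    # Average squared error; large errors weigh much more than small ones
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 200.0]
print(mean_absolute_error(y_true, y_pred))  # 6.666666666666667
print(mean_squared_error(y_true, y_pred))   # 66.66666666666667
```

MAE is expressed in the same units as `Weight`, while MSE is in squared units, which is why `plot_history` labels its second axis with `Weight^2`.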

### Make predictions

Finally, predict Weight values using data in the testing set:

```python
%%time
test_predictions = model.predict(normed_test_data).flatten()
```

```python
plt.rcParams['figure.figsize'] = (8, 8)


def plot_predictions(true, inferences):
    # Scatter predictions against true values; a perfect model lies on the diagonal
    plt.scatter(true, inferences, marker='x')
    plt.xlabel('True Values [Weight]')
    plt.ylabel('Inferences [Weight]')
    plt.axis('equal')
    plt.axis('square')
    plt.xlim([0, plt.xlim()[1]])
    plt.ylim([0, plt.ylim()[1]])
    plt.plot([0, 250], [0, 250], color='orange')

    plt.show()


plot_predictions(test_weights, test_predictions)
```

```python
error = test_predictions - test_weights
plt.hist(error, bins=25)
plt.xlabel("Prediction Error [Weight]")
_ = plt.ylabel("Count")
```

### Export your Keras model to the filesystem

#### Using the Python API

```python
model_export_dir = "./models/lg_weight/"
tf.saved_model.save(model, model_export_dir)

# Size on disk of the whole SavedModel directory (graph .pb file plus variables)
root_directory = pathlib.Path(model_export_dir)
saved_model_size = sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file())
print("SavedModel is {} bytes".format(saved_model_size))
```

## Convert to TensorFlow Lite

### Using the Python API

```python
# Convert the model to the TensorFlow Lite format without quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model to disk
with open('./models/lg_weight.tflite', 'wb') as f:
    f.write(tflite_model)

model_size = os.path.getsize('./models/lg_weight.tflite')
print("TFLite model is {} bytes".format(model_size))
```

### Using `tflite_convert`

```python
!tflite_convert \
  --saved_model_dir=$model_export_dir \
  --output_file='./models/lg_weight.tflite'
```

### Encode the Model in a C Header File
The next cell creates a constant byte array that contains the TFLite model.

```python
# Requires the `xxd` command-line tool (it ships with Vim)
!echo "const unsigned char model[] = {" > ./models/lg_weight_model.h
!xxd -i < ./models/lg_weight.tflite      >> ./models/lg_weight_model.h
!echo "};"                              >> ./models/lg_weight_model.h

model_h_size = os.path.getsize("./models/lg_weight_model.h")
print("Header file, lg_weight_model.h, is {:,} bytes.".format(model_h_size))
```
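
If `xxd` is not available (for example on Windows), a header with the same layout can be generated from Python. A minimal sketch: the formatting only approximates `xxd -i` (12 bytes per line), and `to_c_header`, `write_c_header` and the extra `model_len` constant are hypothetical additions that the shell cell does not produce.

```python
from pathlib import Path


def to_c_header(data, var_name="model", per_line=12):
    """Render raw bytes as a C array definition, similar to `xxd -i` output."""
    lines = ["const unsigned char {}[] = {{".format(var_name)]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join("0x{:02x}".format(b) for b in chunk) + ",")
    lines.append("};")
    # Hypothetical length constant; C code usually needs the array size
    lines.append("const unsigned int {}_len = {};".format(var_name, len(data)))
    return "\n".join(lines) + "\n"


def write_c_header(tflite_path, header_path, var_name="model"):
    data = Path(tflite_path).read_bytes()
    Path(header_path).write_text(to_c_header(data, var_name))


# Usage in this notebook:
# write_c_header('./models/lg_weight.tflite', './models/lg_weight_model.h')
```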

### Quantization

#### Weight quantization
The simplest form of post-training quantization quantizes only the weights, from floating point to 8 bits of precision (also called "hybrid" quantization).

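```python
# Reuse the converter from above, this time with post-training weight quantization
# (older code uses tf.lite.Optimize.OPTIMIZE_FOR_SIZE, deprecated in favor of DEFAULT)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# Save the model to disk
with open('./models/lg_weight_quant.tflite', 'wb') as f:
    f.write(tflite_quant_model)

model_size = os.path.getsize('./models/lg_weight_quant.tflite')
print("Quantized TFLite model is {} bytes".format(model_size))
```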
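
To see where the size reduction comes from: each float32 weight (4 bytes) is replaced by an 8-bit integer plus a shared scale factor, so weight storage shrinks roughly 4x. The sketch below shows a simple symmetric per-tensor scheme (`scale = max|w| / 127`) on a handful of made-up weights. It illustrates the idea only; TensorFlow Lite's actual scheme (for instance per-channel scales for some ops) may differ, and the function names are hypothetical.

```python
def quantize_symmetric(weights):
    """Map floats to int8 codes in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    # Recover approximate float weights from the int8 codes
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.01, 1.0]
codes, scale = quantize_symmetric(weights)
restored = dequantize(codes, scale)
print(codes)  # [50, -127, 1, 100]
```

After dequantization each weight is off by at most `scale / 2`, which is usually small compared with the weights themselves.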

# Conclusion

This notebook introduced the basics of handling a regression problem and applied them to lettuce weight prediction.

* Mean Squared Error (MSE) is a common loss function used for regression problems (different loss functions are used for classification problems).
* Similarly, evaluation metrics used for regression differ from classification. A common regression metric is Mean Absolute Error (MAE).
* When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.
* If there is not much training data, one technique is to prefer a small network with few hidden layers to avoid overfitting.
* Early stopping is a useful technique to prevent overfitting.

It also briefly introduced [TensorBoard](https://www.tensorflow.org/tensorboard/) to monitor the training process and visualize the model graph and parameters.

Finally, it showed several techniques to convert a trained model to TensorFlow Lite:

* Using the Python API or the command line
* From a SavedModel on the filesystem or from a Keras model

# On your Raspberry Pi

```python
# Build one converter from the SavedModel directory and one from the in-memory Keras model
converter_ffile = tf.lite.TFLiteConverter.from_saved_model(model_export_dir)
converter_fkeras = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model_ffile = converter_ffile.convert()
tflite_model_fkeras = converter_fkeras.convert()
```

## Load your model into the TFLite Interpreter and allocate memory

```python
interpreter = tf.lite.Interpreter(model_content=tflite_model_ffile)
# You must allocate tensors before running inference
interpreter.allocate_tensors()
# Get the input and output tensor details (index, shape, dtype)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
```

## Iterate through the test dataset and run TensorFlow Lite inference

```python
%%time
test_tflite_prediction = []
for _, row in normed_test_data.iterrows():
    # Reshape each row to the input shape expected by the model, as float32
    input_tensor = np.array(row.values, dtype=np.float32).reshape(input_details[0]['shape'])
    # The tensor index comes from the 'index' field of get_input_details()
    interpreter.set_tensor(tensor_index=input_details[0]['index'], value=input_tensor)
    # Run inference
    interpreter.invoke()
    # Get the result
    inference = interpreter.get_tensor(output_details[0]['index'])
    test_tflite_prediction.append(inference.flatten())
```

```python
# Flatten the per-row outputs and compare them with the true weights
test_tflite_prediction = np.array(test_tflite_prediction).flatten()
plot_predictions(test_weights, test_tflite_prediction)
```