<table class="ee-notebook-buttons" align="left">
    <td><a target="_blank"  href="https://github.com/davidelomeo/mangroves_deep_learning/blob/main/Notebook_2-Generate_Model.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> View source on GitHub</a></td>
    <td><a target="_blank"  href="https://colab.research.google.com/github/davidelomeo/mangroves_deep_learning/blob/main/Notebook_2-Generate_Model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run in Google Colab</a></td>
</table>

# **Requirements**
The requirements to run this notebook are:
1. Have a Gmail account **->** here is how to get one: https://support.google.com/accounts/answer/27441?hl=en
2. Have a Google Earth Engine account set up **->** here is how to get one: https://signup.earthengine.google.com/
3. (Optional) Have Google Cloud Storage set up **->** here is how to get started: https://cloud.google.com/storage *(please read Note 2 below)*

---
**Note**: Google Earth Engine is a free-to-use online tool, but it requires authorisation from Google first. After signing up, it may take a few days before you can access the platform.

**Note 2**: Google Cloud Storage IS NOT a free tool. Like many other cloud services, it has different costs for different services. <br/>
At the time of writing of this notebook, Google offers new Cloud Storage users a 90-day free trial with some funds attached to it.<br/>
Please find more info at: https://cloud.google.com/free/docs/gcp-free-tier <br/>

# **Objective**

The purpose of this notebook is to load patches as TFRecords from the target storage, split the data into training, test and (optionally) validation datasets, convert them into datasets readable by a Keras model, and train the target model.

The model is then saved to the target storage for later use in Notebook 3 to make predictions.

# 1. Preparing the workspace

## Cloning the Github Repository
The GitHub repository that stores the project is cloned to the workspace so that the needed packages can be accessed.

In [None]:
github_repo = "https://github.com/davidelomeo/mangroves_deep_learning.git"
print("Github Repository: ", github_repo)

!git clone "{github_repo}" # clone the github repository

## Installing the required packages
Although Google Colab has a pre-installed environment that contains many packages, a `requirements.txt` file is provided in the GitHub repository for consistency (please see the disclaimer below).

The following code also installs custom packages created specifically to make key parts of the workflow reproducible, so that the user can re-use these packages in other projects.

---
**Disclaimer**: The notebook was specifically designed to work on Google Colab. The user may run the notebook on a local machine (e.g. using Jupyter Notebook), but mounting Google Drive will not be possible with the method shown below. In that scenario, the user may need to use Google Cloud Storage only.

In [2]:
# Installing the packages listed in requirements.txt
# '&> /dev/null' hides the terminal output when running the command
!pip install -r mangroves_deep_learning/requirements.txt &> /dev/null

## Importing the required packages
Here the code imports all the needed packages for this notebook.

**Note**: it is necessary to authenticate Google Drive and Google Earth Engine to use the notebook. Make sure you have previously created the necessary accounts. If the user wants to use Google Cloud Storage, then this also needs to be authenticated, as shown further below.

---
When the cell below is executed, both Google Drive and Google Earth Engine will require authentication. Please select the links that appear below the code (each opens a new tab in the browser), log in with the desired Gmail account, allow Google to access the application, and paste the key shown on screen into the box below.

In [None]:
import os
import ee
import geemap
import datetime
import json
import subprocess
import time  # used later to time the training run
import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa
import matplotlib.pyplot as plt

# These are custom packages. Please see the README in the repo for details.
import eeCustomDeepTools as cdt
import CustomNeuralNetworks as cnn 

from pprint import pprint
from pathlib import Path

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from tensorflow.python.tools import saved_model_utils

from google.colab import auth, drive
# Authorising Google Colab notebook to access the target Google Drive and mount it
drive.mount('/content/drive')

# Authorising Google Colab to access the Google Earth Engine account
try:
    ee.Initialize()
except Exception as e:
    ee.Authenticate()
    ee.Initialize()

# outputting plots in the notebook
%pylab inline
#loading tensorboard notebook extensions
%load_ext tensorboard

----> Only run the next cell if you want to export the generated models to Google Cloud Storage or load a pre-trained model from Google Cloud Storage

---
Please authenticate Google Cloud Storage as done above with Google Drive and Earth Engine

In [None]:
# Authorising Google Colab notebook to access the target Google Cloud account
auth.authenticate_user()


## Checking GPU availability
Checking if a GPU is available. This check is especially useful if the user has a basic Google Colab account, for which GPU availability is time-restricted.

Having a GPU available is essential for training models. Without an available GPU, training becomes infeasible.

If there is no available GPU, then please try to run this notebook later.

In [None]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

## Loading the .json file with the info about the patches

In [None]:
# Loading the JSON file that contains the name and the path of the exported
# patches as saved in Notebook_1-Generate_Patches
exported_pacthes_info = '/content/drive/MyDrive/...FILENAME.js' # change this path accordingly
with open(exported_pacthes_info) as j:
  patches_info = json.load(j)

pprint(patches_info)

## Setting global variables
The variables below are taken by default from the .json file saved at the end of Notebook 1. Please provide your own values if you have not saved the data using Notebook 1.

In [28]:
patch_size = patches_info['pixels']
year = patches_info['year']

pacthes_folder = patches_info['folder'] + '/'
patches_prefix = patches_info['prefix']

classes_label = 'classes'

classes = patches_info['classes']
n_classes = len(classes)

bands = patches_info['bands']
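
The cell above reads six keys from the loaded dictionary. If the file was not produced by Notebook 1, a quick check like the following may help catch a missing key early. This is only a sketch: the helper `missing_keys` and every value in `example_info` are hypothetical placeholders, not the real content of the file exported by Notebook 1.

```python
import json

# Keys read by the "Setting global variables" cell of this notebook
REQUIRED_KEYS = ('pixels', 'year', 'folder', 'prefix', 'classes', 'bands')


def missing_keys(info):
    """Return the required keys that are absent from the loaded dictionary."""
    return [key for key in REQUIRED_KEYS if key not in info]


# Hypothetical example content: every value below is a placeholder
example_info = json.loads('''{
    "pixels": 256,
    "year": 2020,
    "folder": "patches_folder",
    "prefix": "patch",
    "classes": ["class_a", "class_b"],
    "bands": ["B2", "B3", "B4"]
}''')

# An empty list means that all the keys used by the notebook are present
print(missing_keys(example_info))
# A dictionary with only 'pixels' reports the remaining keys as missing
print(missing_keys({'pixels': 256}))
```

In the notebook, the same check could be applied to `patches_info` right after loading it.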

----> Only use the following code if getting data from **Google Drive**

---


In [29]:
# Pointing to data in Google Drive
gdrive = Path('/content/drive/MyDrive/')
patches_path = gdrive / pacthes_folder

----> Only use the following code if getting data from **Google Cloud Storage**

---


In [None]:
# Name of the Google Cloud Storage Bucket: please change to your own
bucket = 'mangroves_classification_bucket'

# Pointing to data in Google Cloud Storage
patches_path = !gsutil ls 'gs://'{bucket}'/'{pacthes_folder}'/'

# 2. Import and prepare dataset
This section allows the user to import the dataset stored in the target Google storage, split the data and prepare the datasets for the Keras models

## Get TFRecords paths and info
Getting info about the exported patches. The folder where the patches were exported contains a mixer.json file that is automatically generated by Earth Engine.

In [None]:
# Loading the class to get the info. The class will look into Google Drive by default
info = cdt.GetFilesInfo()
records_list, json_file = info.get_files(patches_path, patches_prefix)
mixer = info.get_mixer(json_file)

patch_nx = mixer['patchDimensions'][0]
patch_ny = mixer['patchDimensions'][1]
patches_tot = mixer['totalPatches']
patch_dims = [patch_nx, patch_ny]
pprint(mixer)

## Load TFRecords
The TFRecord filenames in the list `records_list` obtained above are passed to `tf.data.TFRecordDataset` to generate a records dataset.

In [31]:
# Create a dataset from the TFRecord file in Cloud Storage.
dataset = tf.data.TFRecordDataset(records_list, compression_type='GZIP')

## Split data in Training, Test and optionally Validation datasets

In [None]:
# Train-test split
training_chunk = 0.80
test_chunk = 0.10
valid_chunk = 0.10

# splitting the dataset into training, validation and test
training_ds, test_ds, val_ds = cdt.dataset_split(dataset, patches_tot, training_chunk, test_chunk, valid_chunk)
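
To give a feel for what the three fractions mean in terms of patch counts, here is a minimal arithmetic sketch. It assumes that the split is made by counting patches and that any rounding remainder goes to the training set; the actual `cdt.dataset_split` may handle this differently, and `split_counts` is a hypothetical helper, not part of the custom package.

```python
def split_counts(total, train_frac, test_frac, valid_frac):
    """Translate split fractions into patch counts (assumed behaviour)."""
    if abs(train_frac + test_frac + valid_frac - 1.0) > 1e-9:
        raise ValueError('The three fractions must sum to 1')
    n_test = int(total * test_frac)
    n_valid = int(total * valid_frac)
    # Whatever is left after rounding down goes to the training set
    n_train = total - n_test - n_valid
    return n_train, n_test, n_valid


# Example with a made-up total of 1000 patches
n_train, n_test, n_valid = split_counts(1000, 0.80, 0.10, 0.10)
print(n_train, n_test, n_valid)
```

Setting `valid_frac` to 0 in this sketch yields an empty validation set, which mirrors the "optional validation" case.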

## Generate dictionary of features
Creating a dictionary of fixed-size features is key for later parsing the patches and creating multi-channel TensorFlow features.

In [None]:
bands_of_interest = bands[:12]
n_channels = len(bands_of_interest)
features_dict = cdt.get_features_dict(bands, classes_label, bands_of_interest, patch_dims)
pprint(features_dict)

## Prepare datasets for training in keras models

In [None]:
# Please modify these parameters as needed
training_batch_size = 12
test_batch_size = 1

# Preparing the data to be fed to keras models
prepare_data = cdt.PrepareBatches(features_dict, n_classes, classes_label)
train_b, test_b, valid_b = prepare_data.prepare_batches(training_batch_size, test_batch_size, training_ds, test_ds, val_ds)
pprint(train_b)
pprint(test_b)
pprint(valid_b)
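
When choosing `training_batch_size`, it can help to estimate how many batches one epoch will contain. The sketch below assumes that the last, incomplete batch is kept and that the dataset is not repeated; whether `cdt.PrepareBatches` behaves this way is an assumption, and `batches_per_epoch` is a hypothetical helper.

```python
def batches_per_epoch(n_patches, batch_size):
    """Number of batches needed to see every patch once (ceiling division)."""
    if batch_size <= 0:
        raise ValueError('batch_size must be positive')
    return -(-n_patches // batch_size)


# Example with a made-up training set of 800 patches and the batch size above
print(batches_per_epoch(800, 12))
```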

# 3. Training Keras models
In this section the datasets are fed to the target Keras model for training.

## Load target model

In [None]:
u_net = cnn.UNet(n_classes)
image_shapes = tuple(patch_dims) + (n_channels,)
model = u_net.build_model(image_shapes)
model.summary()

## Compile the target model

In [None]:
# Note: the plain Accuracy metric cannot be used with the Sigmoid Focal Cross-Entropy loss
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tfa.losses.SigmoidFocalCrossEntropy(),
              metrics=[tf.keras.metrics.Precision(name='prec'),
                       tf.keras.metrics.Recall(name='rec'),
                       tf.keras.metrics.CategoricalAccuracy(name='cat_acc'),
                       tf.keras.metrics.CategoricalCrossentropy(name='cat_xntrp'),
                       tf.keras.metrics.KLDivergence(name='KLDiv')])

## Identify target folder
Here a Google Drive path was selected by default, but the user may wish to change this to a Google Cloud Storage path

In [None]:
folder_name = 'UNet_models'
model_subfolder_identifier = 'UNet_model_sig_foc_crossentropy'
path_to_model = str(gdrive) + '/' + folder_name + '/' + model_subfolder_identifier
print(path_to_model)
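
If the target is a Google Cloud Storage bucket instead of Google Drive, the path can be assembled in the same way with plain string joining. Plain strings are used in this sketch because `pathlib` collapses the double slash in `gs://`. The helper `build_model_path` and the bucket name `my-bucket` are hypothetical.

```python
def build_model_path(root, folder_name, subfolder):
    """Join the storage root, the models folder and the model subfolder."""
    return '/'.join([root.rstrip('/'), folder_name, subfolder])


# Google Drive root, as used by default in this notebook
print(build_model_path('/content/drive/MyDrive/', 'UNet_models',
                       'UNet_model_sig_foc_crossentropy'))

# Hypothetical Cloud Storage bucket: replace 'my-bucket' with your own
print(build_model_path('gs://my-bucket', 'UNet_models',
                       'UNet_model_sig_foc_crossentropy'))
```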

### Loading a pre-trained model (Optional)
Run the following cell only if wanting to load a pre-trained model to continue its training for more epochs

In [16]:
folder = '/content/drive/MyDrive/.../' # change path accordingly
model_name = 'MODEL_NAME.h5' # change name accordingly
model = keras.models.load_model(folder + model_name)

## Setup model's callbacks and epochs
Here the user may decide to add more model callbacks. Callbacks are used during training to save the state and the performance of the model at different stages.



In [36]:
# Setting the number of epochs
nepochs = 10

# Setting up tensorboard folder to save model's logs
log_path = path_to_model + '/logs'
logdir = os.path.join(log_path, datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tb_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

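# patience: number of epochs with no improvement after which training will be stopped.
# Note: with patience equal to nepochs, early stopping cannot trigger; lower it to make it effective.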
early_stopping = EarlyStopping(monitor='val_loss', patience=10)

# Defining the path and the model name to save at each epoch
checkpoint_path =  path_to_model + '/epochs:{epoch:03d}.h5'
checkpoint = ModelCheckpoint(filepath=checkpoint_path)

# Setting the path to the history of the model (where metrics are saved)
history_path = path_to_model + '/history.js'

# Alternative checkpoint that monitors a metric explicitly:
# use mode='max' when monitoring val_acc and mode='min' when monitoring val_loss
# checkpoint = ModelCheckpoint(filepath=checkpoint_path, monitor='val_loss', save_best_only=False, mode='min')
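
The checkpoint path above contains the placeholder `{epoch:03d}`, which Keras fills in with the epoch number when saving, and the TensorBoard folder is named after the time the cell is run. The following sketch reproduces both naming schemes with plain Python so the resulting file and folder names can be previewed. The helper names are hypothetical, and the date used is made up.

```python
import datetime
import os

# Same template as the one used for the checkpoint file name above
checkpoint_template = 'epochs:{epoch:03d}.h5'


def checkpoint_name(epoch):
    """Expand the template for a given epoch, zero-padded to three digits."""
    return checkpoint_template.format(epoch=epoch)


def run_log_dir(log_path, now):
    """Build a per-run log folder named after a timestamp."""
    return os.path.join(log_path, now.strftime('%Y%m%d-%H%M%S'))


print(checkpoint_name(7))
print(run_log_dir('logs', datetime.datetime(2021, 8, 1, 12, 30, 5)))
```

Because each run gets its own timestamped folder, the logs of successive runs do not overwrite each other.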

## Run the training
The code below trains the model with the input training and validation set.

It also updates the automatically generated history dictionary by computing the f1 score and adding the total time taken by the model to run.

In [None]:
# Starting the timing of the model runtime
start_time = time.time()

# Fit the model to the training data.
history = model.fit(x = train_b, 
                    epochs = nepochs, 
                    validation_data = valid_b, 
                    callbacks = [tb_callback,
                                 early_stopping,
                                 checkpoint])

# Ending the timing of the model runtime
end_time = time.time()
total_time = end_time - start_time
print('\nThe model has taken {:.3} minutes to run\n'.format(total_time/60))

# Rounding the values of the metrics for cleanliness
for k, v in history.history.items():
  history.history[k] = [round(i, 4) for i in v]

# Calculate the F1 score for the training and validation sets from the computed metrics
# (The F1 score can also be calculated automatically using tensorflow-addons.
# During this project, however, that metric proved heavy to compute and
# greatly increased the training time)
def F1(prec, rec):
  # Guard against a division by zero when both precision and recall are 0
  if prec + rec == 0:
    return 0.0
  return 2 * (prec * rec) / (prec + rec)

f1_score = []
val_f1_score = []

for a, b, c, d, in zip (history.history['prec'], history.history['rec'], 
                        history.history['val_prec'], history.history['val_rec']):
  f1_score.append(round(F1(a, b), 4))
  val_f1_score.append(round(F1(c, d), 4))

# Adding the extra computed metrics to the history dictionary
history.history['f1_score'] = f1_score
history.history['val_f1_score'] = val_f1_score
history.history['time_taken'] = round(total_time/60, 3)

# Exporting the history dictionary to the previously defined path
with open(history_path, 'w') as f:
    json.dump(history.history, f)
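
The F1 score added to the history is the harmonic mean of precision and recall. As a small worked example, the sketch below computes it per epoch from two made-up metric lists; the values are illustrative only and are not results from this project. The helper `f1_from` is hypothetical and returns 0 when both inputs are 0.

```python
def f1_from(prec, rec):
    """Harmonic mean of precision and recall, or 0.0 if both are zero."""
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)


# Made-up per-epoch metrics, shaped like the lists stored in history.history
example_history = {'prec': [0.5, 0.8], 'rec': [0.5, 0.6]}

example_f1 = [round(f1_from(p, r), 4)
              for p, r in zip(example_history['prec'], example_history['rec'])]
print(example_f1)
```

For the second epoch this gives 2 * 0.8 * 0.6 / (0.8 + 0.6) = 0.96 / 1.4, which is about 0.6857.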

### Explore tensorboard logs interactive plots (optional)
This line of code allows the user to plot the logs generated by the model in an interactive shell.

In [None]:
%tensorboard --logdir {log_path}

## Evaluate the model
Here the model is evaluated using the previously generated test dataset.

In [None]:
model.evaluate(x = test_b)

### Plotting model metrics (Optional)
Load the dictionary that contains the metrics from the target storage and plot them.

In [None]:
# Loading the JSON file that contains the model history (the metrics saved after training)
exported_pacthes_info = '/content/drive/MyDrive/...FILENAME.js' #change this path accordingly
with open(exported_pacthes_info) as j:
  full_model_hist = json.load(j)

fig, ax = plt.subplots(3, 2, figsize=(12, 12))
fig.tight_layout(w_pad=3, h_pad=4)

# Plotting history for accuracy
ax[0, 0].plot(full_model_hist['cat_acc'])
ax[0, 0].plot(full_model_hist['val_cat_acc'])
ax[0, 0].set_title('Model Accuracy')
ax[0, 0].set_ylabel('accuracy')
ax[0, 0].set_xlabel('epoch')
ax[0, 0].legend(['train', 'validation'], loc='upper left')

# Plotting history for categorical crossentropy
ax[0, 1].plot(full_model_hist['cat_xntrp'])
ax[0, 1].plot(full_model_hist['val_cat_xntrp'])
ax[0, 1].set_title('Model Crossentropy')
ax[0, 1].set_ylabel('categorical crossentropy')
ax[0, 1].set_xlabel('epoch')
ax[0, 1].legend(['train', 'validation'], loc='upper left')

# Plotting history for loss
ax[2, 1].plot(full_model_hist['loss'])
ax[2, 1].plot(full_model_hist['val_loss'])
ax[2, 1].set_title('Model Loss')
ax[2, 1].set_ylabel('loss')
ax[2, 1].set_xlabel('epoch')
ax[2, 1].legend(['train', 'validation'], loc='upper left')

# Plotting history for f1 score
ax[2, 0].plot(full_model_hist['f1_score'])
ax[2, 0].plot(full_model_hist['val_f1_score'])
ax[2, 0].set_title('Model F1 Score')
ax[2, 0].set_ylabel('f1 score')
ax[2, 0].set_xlabel('epoch')
ax[2, 0].legend(['train', 'validation'], loc='upper left')

# Plotting history for precision
ax[1, 0].plot(full_model_hist['prec'])
ax[1, 0].plot(full_model_hist['val_prec'])
ax[1, 0].set_title('Model Precision')
ax[1, 0].set_ylabel('precision')
ax[1, 0].set_xlabel('epoch')
ax[1, 0].legend(['train', 'validation'], loc='upper left')

# Plotting history for recall
ax[1, 1].plot(full_model_hist['rec'])
ax[1, 1].plot(full_model_hist['val_rec'])
ax[1, 1].set_title('Model Recall')
ax[1, 1].set_ylabel('recall')
ax[1, 1].set_xlabel('epoch')
ax[1, 1].legend(['train', 'validation'], loc='upper left')
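
Alongside the plots, a short numeric summary can make it easier to pick which saved epoch to use later. The sketch below finds the epoch with the best value of a given metric. It assumes a history dictionary shaped like the one saved above, where each metric is a list with one value per epoch; `best_epoch` is a hypothetical helper and the numbers are made up.

```python
def best_epoch(history, key='val_loss', mode='min'):
    """Return (epoch number starting at 1, value) of the best entry for a metric."""
    values = history[key]
    pick = min if mode == 'min' else max
    best_index = pick(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Made-up history with one value per epoch for each metric
example_hist = {'val_loss': [0.9, 0.4, 0.6],
                'val_f1_score': [0.2, 0.7, 0.65]}

print(best_epoch(example_hist, 'val_loss', mode='min'))
print(best_epoch(example_hist, 'val_f1_score', mode='max'))
```

Note that the `time_taken` entry of the saved history is a single number rather than a list, so it should not be passed to this helper.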

# 4. Access Notebook 3 to make predictions
<table class="ee-notebook-buttons" align="left">
    <td><a target="_blank"  href="https://github.com/davidelomeo/mangroves_deep_learning/blob/main/Notebook_3-Make_Predictions.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" /> Access Notebook_3 on Github</a></td>
    <td><a target="_blank"  href="https://colab.research.google.com/github/davidelomeo/mangroves_deep_learning/blob/main/Notebook_3-Make_Predictions.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run Notebook_3 in Google Colab</a></td>
</table>