# Project Pipeline
<img src="tuto_images/9.PNG?modified=02022021200000" width="800" >  

#### You are in the deep learning model training step  
***

# <font color=red> Welcome to this deep learning model training tutorial ! </font>
            
## There are 7 different steps :   
>- 1) Import libraries and define parameters.  
>    *Parameters followed by **#@param** are variables, you can change them.*
>- 2) Construct your dataset. (Skip this step if you already have one.)
>- 3) Load and process your dataset.
>- 4) Set/find the loss, metrics and optimal hyper-parameters.
>- 5) Train and evaluate your model.  
>    *If you want to load a pretrained model, you can skip these steps and go to the next section.*
>- 6) Predict your image.
>- 7) Visualise your predictions.

### Table of Contents
* [I. Libraries and Variables](#I.-Libraries-and-Variables)
* [II. Dataset Construction](#II.-Dataset-Construction)
* [III. Dataset Loading and Processing](#III.-Dataset-Loading-and-Processing)
* [IV. Hyperparameters Tuning](#IV.-Hyperparameters-Tuning)
* [V. Model Training and Evaluation](#V.-Model-Training-and-Evaluation)
* [VI. Inference](#VI.-Inference)
* [VII. Visualization](#VII.-Visualization)
* [VIII. Earth Engine Editor Tutorials](#VIII.-Tutorials)

# I. Libraries and Variables
### First create a new virtual environment (otherwise you are likely to run into package version conflicts)  
### Then install all requirements by running the following :

*Careful* if you have multiple Python versions :
- If you are using your default Python, run : **!pip** install -r requirements.txt
- If you are using another Python version, append its version number to pip. For example, with Python 3.8, run : **!pip3.8** install -r requirements.txt

In [None]:
# Check which Python your default pip is bound to
!pip --version

In [None]:
# Change the version of pip if needed
!pip install -r requirements.txt
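
If you are not sure which `pip` matches the Python running this notebook, one option is to call pip through the current interpreter. The sketch below only builds the command; the helper name `pip_install_command` is an assumption made for illustration and is not part of the project.

```python
import sys

def pip_install_command(requirements_file="requirements.txt"):
    """Build a pip command bound to the interpreter running this notebook."""
    # sys.executable is the full path of the Python that executes this cell
    return [sys.executable, "-m", "pip", "install", "-r", requirements_file]

print(" ".join(pip_install_command()))
```

You can pass the returned list to `subprocess.run(..., check=True)` to run the installation from Python.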

In [2]:
import tensorflow as tf
tf.keras.backend.clear_session()
gpu = tf.config.experimental.list_physical_devices('GPU')
# Enable memory growth only if a GPU is available (gpu[0] fails on CPU-only machines)
if gpu:
    tf.config.experimental.set_memory_growth(gpu[0], True)
from tensorflow.keras import backend as K
import numpy as np 
import os

from dataset_construction import TFDatasetConstruction
from dataset_loader import TFDatasetProcessing
from unet import DLModel, ModelEvaluation
from losses_and_metrics import Loss, Metric, get_class_weights
from learning_rate import  LRFinder, CyclicLR, step_decay_schedule
from inference import download_tif, download_kml_from_tif

### Import, authenticate and initialize the Earth Engine library.  
If you have a Gmail account that already has access to Earth Engine, authenticate with it. If not, you can use this account, which was created for this project :  
Gmail address : `mounierseb93@gmail.com`    
Code : `mounse$15`

In [None]:
import ee
ee.Authenticate()
ee.Initialize()

### Create all folders you will need
After running the following cell, check that your directory contains these folders :
- **Main folder**
    * models
    * results
    * data
        * train
        * eval
        * predictions
            * colored_pipes
            * kml

In [None]:
create_folders()
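
`create_folders` is not defined in this notebook, so its body is not shown here. A minimal sketch of a function that builds the tree listed above could look like this. The name `make_project_folders` and the `root` parameter are assumptions, not the project's code.

```python
import os
import tempfile

# Folder layout copied from the tree above; everything else is an assumption.
FOLDERS = [
    "models",
    "results",
    os.path.join("data", "train"),
    os.path.join("data", "eval"),
    os.path.join("data", "predictions", "colored_pipes"),
    os.path.join("data", "predictions", "kml"),
]

def make_project_folders(root="."):
    """Create every folder of the project tree under root (existing folders are kept)."""
    created = []
    for folder in FOLDERS:
        path = os.path.join(root, folder)
        os.makedirs(path, exist_ok=True)  # also creates missing parent folders
        created.append(path)
    return created

# Demo in a temporary directory so that nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    print(len(make_project_folders(tmp)), "folders created")
```

Calling `make_project_folders()` without argument would create the folders in the current working directory.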

### Constants

In [None]:
LANDSAT  = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11']
SENTINEL = ['VV','VH','VV_1','VH_1']
BANDS    = LANDSAT + SENTINEL
RESPONSE = 'landcover'
FEATURES = BANDS+[RESPONSE]

KERNEL_SIZE   = 128 #@param {type:"integer"}
KERNEL_SHAPE  = [KERNEL_SIZE, KERNEL_SIZE]
COLUMNS       = [tf.io.FixedLenFeature(shape=KERNEL_SHAPE, dtype=tf.float32) for k in FEATURES]
FEATURES_DICT = dict(zip(FEATURES, COLUMNS))
NUM_FEATURES  = len(BANDS)
NUM_CLASSES   = 4 #@param {type:"integer"}

LABEL_NAMES  = [0,1,2,3]
TARGET_NAMES = ['field','forest','urban','water']

### Variables

In [None]:
# Specify batch and dataset sizes
BATCH_SIZE = 4 #@param {type:"integer"}
TRAIN_SIZE = 5000 #@param {type:"integer"}
EVAL_SIZE  = 3000 #@param {type:"integer"}

# Specify model, loss and optimizer
MODEL_NAME = 'unet' #@param {type:"string"}
LOSS_STR = 'weighted_scc' #@param ["scc", "weighted_scc","dice","jaccard"]
OPTIMIZER = 'sgd' #@param ["sgd","adam","rmsprop"]
HISTORY_FILE = 'results/'+MODEL_NAME+'.log'
CHECKPOINT_FILE = 'models/'+MODEL_NAME

# Callbacks
MONITOR_EARLY_STOP = 'loss' #@param ["loss","val_loss"]
MONITOR_CHECKPOINT = 'loss' #@param ["loss","val_loss"]
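
The following cells combine `TRAIN_SIZE` and `BATCH_SIZE` as `int(TRAIN_SIZE / BATCH_SIZE)` to get the number of steps per epoch. As a quick check with the default values (the helper name `steps_per_epoch` is only for illustration) :

```python
def steps_per_epoch(train_size, batch_size):
    """Number of full batches in one pass over the training set."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer division: a last incomplete batch is not counted
    return train_size // batch_size

print(steps_per_epoch(5000, 4))  # 1250
```

If you change `BATCH_SIZE`, remember that an incomplete last batch is dropped from this count.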

# II. Dataset Construction  
a) First connect to Google Drive using this address and password :  
Gmail address : [mounierseb93@gmail.com]  
Code : [mounse$15]

**Every time you run a cell, if you receive a message like "Please download file from Drive from folder ...", go to the Google Drive folder mentioned and download the file into the folder of the same name on your computer.**

b) Run the following only if you do not already have access to the training dataset (.tfrecord.gz files in the folders 'train' and 'eval') 

In [None]:
# Export training and evaluation tfrecords
tfdataconstructor = TFDatasetConstruction(LANDSAT,SENTINEL,RESPONSE,KERNEL_SIZE)
tfdataconstructor.dataset_construction("2017-01-01","2017-12-31") #the date should not change since the label dataset is from 2017

# III. Dataset Loading and Processing
If you don't have access to the training dataset, download it from Google Drive using this address and password (and make sure to put it in the right folder, with the same name as on the Drive) :    
Gmail address : [mounierseb93@gmail.com]  
Code : [mounse$15]  
If it's not there for some reason, or if you want to construct your own dataset, go to tuto_dataset_construction.ipynb and run it.


In [None]:
# Load and process training and evaluation tf.Datasets
tfdataloader = TFDatasetProcessing(FEATURES_DICT,FEATURES,BANDS,NUM_FEATURES)
training     = tfdataloader.get_training_dataset()
evaluation   = tfdataloader.get_eval_dataset()
NUM_FEATURES = tfdataloader.num_features

In [None]:
# Python 3 iterators have no .next() method, use the built-in next()
print(next(iter(evaluation.take(1))))

# IV. Hyperparameters Tuning

* [1. Optimal Learning Rate](#1.-Optimal-Learning-Rate)  
  Find the best learning rate to start with, then choose a scheduler : constant, staircase or cyclical learning rate.
* [2. Scheduler : Staircase Learning Rate](#2.-Scheduler-:-Staircase-Learning-Rate)
* [3. Scheduler : Cyclical Learning Rate](#3.-Scheduler-:-Cyclical-Learning-Rate)

## 1. Optimal Learning Rate

In [None]:
MIN_LR    = 1e-2 #@param {type:"number"} 
MAX_LR    = 1e-1 #@param {type:"number"} 
EPOCHS    = 2 #@param {type:"integer"}
lr_finder = LRFinder(min_lr=MIN_LR, 
                      max_lr=MAX_LR, 
                      steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), 
                      epochs=EPOCHS)

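FROM_CHECKPOINT = True #@param{type:"boolean"}

# `modelconstructor` and `checkpoint` are defined in the "Callbacks" cell further down : run that cell first
m = modelconstructor.init_model(from_checkpoint=FROM_CHECKPOINT)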
m.fit(
    x=training, 
    epochs=EPOCHS, 
    batch_size=BATCH_SIZE,
    steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), 
    callbacks=[lr_finder,checkpoint])
lr_finder.plot_loss()

ind_max = np.argmax(lr_finder.history['sparse_categorical_accuracy'])
print('The lr corresponding to the maximal metric = ',lr_finder.history['lr'][ind_max])
ind_min = np.argmin(lr_finder.history['loss'])
print('The lr corresponding to the minimal loss = ',lr_finder.history['lr'][ind_min])
import gc
gc.collect()
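
The idea behind the learning rate search is to increase the learning rate from `MIN_LR` to `MAX_LR` during a short training run and to keep the value that gives the best loss. The code of `LRFinder` is not shown here, so the sketch below only illustrates the principle. It assumes a linear increase, and the loss values are made up for the example.

```python
def lr_range_schedule(min_lr, max_lr, total_steps):
    """Learning rates increasing linearly from min_lr to max_lr (both included)."""
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    span = max_lr - min_lr
    return [min_lr + span * step / (total_steps - 1) for step in range(total_steps)]

def lr_at_min_loss(lrs, losses):
    """Return the learning rate recorded at the step with the lowest loss."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best]

lrs = lr_range_schedule(1e-2, 1e-1, total_steps=10)
# Made-up losses, only to show how the best learning rate is picked
losses = [1.9, 1.7, 1.4, 1.2, 1.0, 0.9, 1.1, 1.6, 2.5, 4.0]
print(lr_at_min_loss(lrs, losses))
```

In practice the loss curve is noisy, so it is worth looking at the plot as well as the printed value.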

## 2. Scheduler : Staircase Learning Rate

In [None]:
# Choose the initial learning rate as the optimal learning rate found just above
INITIAL_LR   = 0.08 #@param {type:"number"}
DECAY_FACTOR = 0.9 #@param {type:"number"}
STEP_SIZE    = 5 #@param {type:"integer"}
lr_step      = step_decay_schedule(initial_lr=INITIAL_LR, decay_factor=DECAY_FACTOR, step_size=STEP_SIZE)
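
A staircase schedule keeps the learning rate constant for `STEP_SIZE` epochs, then multiplies it by `DECAY_FACTOR`. The body of `step_decay_schedule` is not shown in this notebook, so the function below is a sketch of the usual formula, with a hypothetical name `staircase_lr`.

```python
import math

def staircase_lr(epoch, initial_lr=0.08, decay_factor=0.9, step_size=5):
    """Learning rate at a given epoch: one decay every step_size epochs."""
    return initial_lr * decay_factor ** math.floor(epoch / step_size)

# Epochs 0 to 4 share the initial rate, the first decay happens at epoch 5
for epoch in (0, 4, 5, 10):
    print(epoch, staircase_lr(epoch))
```

With the default values above, the rate after 10 epochs is `0.08 * 0.9 ** 2`.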

## 3. Scheduler : Cyclical Learning Rate

In [None]:
BASE_LR   =  1e-8 #@param {type:"number"}
MAX_LR    =  5e-8 #@param {type:"number"}
STEP_SIZE =  1 #@param {type:"number"}
MODE      = 'triangular2' #@param ["triangular","triangular2","exp_range"]
clr       = CyclicLR(base_lr=BASE_LR, max_lr=MAX_LR,step_size=STEP_SIZE*int(TRAIN_SIZE / BATCH_SIZE), mode=MODE)
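
A cyclical schedule moves the learning rate between `BASE_LR` and `MAX_LR`. It goes up during `step_size` iterations, then down during `step_size` iterations. The sketch below follows the usual triangular formulas and is only an illustration. It is not the code of the `CyclicLR` class used above, and `exp_range` is left out because it needs an extra decay parameter.

```python
import math

def cyclical_lr(iteration, base_lr, max_lr, step_size, mode="triangular"):
    """Learning rate at a given iteration for the triangular policies."""
    if mode not in ("triangular", "triangular2"):
        raise ValueError("this sketch only covers triangular and triangular2")
    cycle = math.floor(1 + iteration / (2 * step_size))  # cycle number, starting at 1
    x = abs(iteration / step_size - 2 * cycle + 1)       # 1 at cycle ends, 0 at the peak
    # triangular2 halves the amplitude after each full cycle
    scale = 1.0 if mode == "triangular" else 1.0 / (2 ** (cycle - 1))
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x) * scale

step = 1250  # STEP_SIZE * int(TRAIN_SIZE / BATCH_SIZE) with the default values
for it in (0, step, 2 * step, 3 * step):
    print(it, cyclical_lr(it, 1e-8, 5e-8, step, mode="triangular2"))
```

With `triangular2`, the second peak is halfway between `BASE_LR` and `MAX_LR`.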

## Callbacks

In [None]:
lr_sched = 'staircase' #@param ["staircase","cyclical","constant"]

In [None]:
# Callbacks
csv_logger = tf.keras.callbacks.CSVLogger(HISTORY_FILE, separator=',', append=False)
early_stop = tf.keras.callbacks.EarlyStopping(monitor=MONITOR_EARLY_STOP, mode='min', verbose=2,min_delta=0.001,patience=4)
checkpoint = tf.keras.callbacks.ModelCheckpoint(CHECKPOINT_FILE, monitor=MONITOR_CHECKPOINT, verbose=0, save_best_only=True,save_weights_only=False, mode='min',save_freq=int(TRAIN_SIZE / BATCH_SIZE))
CALLBACKS = [csv_logger,checkpoint]

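if lr_sched =='constant' :
  # Use the learning rate found in IV.1 (the value, not its index); requires `m` and `lr_finder` from that cell
  K.set_value(m.optimizer.lr, lr_finder.history['lr'][ind_max])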
elif lr_sched == 'staircase' :
  CALLBACKS.append(lr_step)
elif lr_sched == 'cyclical' :
  CALLBACKS.append(clr)
else : 
  raise NotImplementedError('Unrecognised schedule')

# Losses
from tensorflow.keras.utils import get_custom_objects
class_weights = get_class_weights(training,TRAIN_SIZE,KERNEL_SIZE,NUM_CLASSES)
losses        = Loss(class_weights,NUM_CLASSES)
LOSS          = losses.get_loss(LOSS_STR)
get_custom_objects().update({"loss": LOSS})

# Metrics
metrics = Metric(NUM_CLASSES)
METRICS = ['sparse_categorical_accuracy',metrics.f1_score]
get_custom_objects().update({"f1_score": metrics.f1_score})

# Model
modelconstructor = DLModel(training,NUM_FEATURES,NUM_CLASSES,BATCH_SIZE,OPTIMIZER,LOSS,METRICS,CHECKPOINT_FILE)
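
The `weighted_scc` loss uses one weight per class, so that rare classes count more. The body of `get_class_weights` is not shown in this notebook. One common choice, sketched here as an assumption, is to weight each class by the inverse of its pixel frequency. The pixel counts are made up for the example.

```python
def inverse_frequency_weights(pixel_counts):
    """One weight per class: total / (num_classes * count). Empty classes get 0."""
    total = sum(pixel_counts)
    num_classes = len(pixel_counts)
    return [total / (num_classes * count) if count else 0.0 for count in pixel_counts]

# Made-up pixel counts for 'field', 'forest', 'urban', 'water'
counts = [600, 250, 100, 50]
print(inverse_frequency_weights(counts))
```

A class that covers exactly one quarter of the pixels gets a weight of 1 with this formula.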

# V. Model Training and Evaluation

## Training

In [None]:
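FROM_CHECKPOINT = True #@param{type:"boolean"}
# `m` is reused from section IV.1. Uncomment the next line if you skipped the learning rate search.
#m = modelconstructor.init_model(from_checkpoint=FROM_CHECKPOINT)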

EPOCHS =  100 #@param {type:"integer"}

m.fit(
    x=training, 
    epochs=EPOCHS, 
    batch_size=BATCH_SIZE,
    steps_per_epoch=int(TRAIN_SIZE / BATCH_SIZE), 
    #validation_data=evaluation,
    #validation_steps=EVAL_SIZE,
    callbacks=CALLBACKS)

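# Show the learning rate reached at the end of training
print('Final learning rate :', K.get_value(m.optimizer.lr))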
import gc
gc.collect()

## Evaluation

In [None]:
evaluator = ModelEvaluation(MODEL_NAME,EVAL_SIZE,TARGET_NAMES,LABEL_NAMES)
evaluator.evaluate(m,evaluation)
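
The report produced by `ModelEvaluation` is not shown here. To make the F1 metric used above concrete, here is a small sketch that computes one F1 score per class from flattened label lists. The function name and the toy labels are assumptions for illustration.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 score of each class, computed from true and predicted class indices."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels: 0 = field, 1 = forest, 2 = urban, 3 = water
y_true = [0, 0, 1, 1, 2, 3, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 0]
print(per_class_f1(y_true, y_pred, num_classes=4))  # [0.5, 0.8, 1.0, 0.8]
```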

# VI. Inference

Inference works as follows :
* You specify the input location.
* The code extracts the test image at that location.
* The pretrained model processes the input image and produces a mask image.
    
<img src="tuto_images/8bis.PNG?modified=08022021170000" width="800">  

You have two options for creating your test image :  
>**1)** Export an image from a square window of a given center ```[Lon,Lat]``` and ```radius``` (in meters)   
>**2)** Export an image from a window given a bounding box ```[Lon1, Lat1, Lon2, Lat2]``` 

### Parameters

In [None]:
# Specify inference parameters
start_date = "2020-01-01"
end_date   = "2020-12-31"
image_name   = 'test' # Name your image as you want

#Do not change the following
lon    = None 
lat    = None
point  = [lon,lat]
radius = None
minLng = None
minLat = None 
maxLng = None 
maxLat = None
rectangle = [minLng, minLat, maxLng, maxLat]

You have two options for creating your test image :  
* [1. Export an image from a square window of a given center `[Lon,Lat]` and `radius` (in meters)](#1.-Export-an-image-from-a-square-window-of-given-center-and-radius)
* [2. Export an image from a window given a bounding box `[Lon1, Lat1, Lon2, Lat2]`](#2.-Export-an-image-from-a-window-given-a-bounding-box)

## Fill only one of the following :

### 1. Export an image from a square window of given center and radius

In [None]:
lon    = None #@param {type:'number'}
lat    = None #@param {type:'number'}
point  = [lon,lat]
radius = None #@param {type:'number'} #(in meters)
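
To get a feeling for what a square window of a given center and `radius` covers, you can convert it to an approximate bounding box. This is only a sketch with a flat-Earth approximation, and the function name `square_window` is hypothetical. The project builds the real window with Earth Engine inside `test_dataset_construction`.

```python
import math

METERS_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude

def square_window(lon, lat, radius):
    """Approximate [minLng, minLat, maxLng, maxLat] around a center, radius in meters."""
    d_lat = radius / METERS_PER_DEGREE
    # One degree of longitude gets shorter as you move away from the equator
    d_lon = radius / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return [lon - d_lon, lat - d_lat, lon + d_lon, lat + d_lat]

# Example center and radius, chosen only for illustration
print(square_window(2.35, 48.85, 5000))
```

The approximation breaks down close to the poles, where the cosine goes to zero.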

### 2. Export an image from a window given a bounding box

In [None]:
minLng    = None #@param {type:'number'}
minLat    = None #@param {type:'number'}
maxLng    = None #@param {type:'number'}
maxLat    = None #@param {type:'number'}
rectangle = [minLng, minLat, maxLng, maxLat]
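
Since only one of the two options should be filled, a small check before running the inference can save you a failed export. The helper below is a sketch and is not part of the project code.

```python
def chosen_window(point, radius, rectangle):
    """Return 'point' or 'rectangle', and fail unless exactly one option is filled."""
    has_point = all(v is not None for v in point) and radius is not None
    has_rectangle = all(v is not None for v in rectangle)
    if has_point == has_rectangle:
        raise ValueError("Fill exactly one option : point and radius, or rectangle")
    return "point" if has_point else "rectangle"

# Example values, chosen only for illustration
print(chosen_window([2.35, 48.85], 5000, [None, None, None, None]))  # point
```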

## Inference

In [None]:
# Construct inference dataset
tfdataconstructor = TFDatasetConstruction(LANDSAT,SENTINEL,RESPONSE,KERNEL_SIZE)
corners = tfdataconstructor.test_dataset_construction(start_date,end_date,image_name,patch_size=KERNEL_SHAPE,point=point,radius=radius,rectangle=rectangle)

In [None]:
# Load and predict inference dataset and write predictions
tfdataloader = TFDatasetProcessing(FEATURES_DICT,FEATURES,BANDS,NUM_FEATURES)
testdataset  = tfdataloader.get_inference_dataset(image_name)
NUM_FEATURES = tfdataloader.num_features

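from inference import Inference  # assumption : the class lives in inference.py, next to download_tif
inference = Inference(NUM_CLASSES,MODEL_NAME)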
predictions = inference.doDLPrediction(testdataset,image_name)
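
The mask image contains one class index per pixel. The body of `doDLPrediction` is not shown here. A usual way to get such a mask, sketched below as an assumption, is to keep for each pixel the class with the highest predicted probability.

```python
def probabilities_to_mask(probabilities):
    """Turn rows x cols x classes probabilities into rows x cols class indices."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in probabilities
    ]

# Toy 2 x 2 image with 4 class probabilities per pixel
probs = [
    [[0.7, 0.1, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1]],
    [[0.2, 0.2, 0.5, 0.1], [0.1, 0.1, 0.1, 0.7]],
]
print(probabilities_to_mask(probs))  # [[0, 1], [2, 3]]
```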

# VII. Visualization
On **GEEMap** OR on **Google Earth Pro**

*Note : replace **filename** with the name of your inference image in the tutorial below.*
  
You can visualize predictions directly on GEEMap OR on Google Earth Pro by creating a kml file.
*But* you first need to georeference your predictions by uploading the files you just created to the Earth Engine Code Editor. To do so, follow the tutorial [Add Image to Assets](#2.-Add-Image-to-Assets).

## 1. Visualise on GEEMap

In [None]:
import geemap.folium as gmap
Map = gmap.Map()
predictions = ee.Image('users/lealm/'+image_name+'_pred')
Map.addLayer(predictions,{'min':0,'max':3,'palette':['lime','darkgreen','yellow','blue']},'predictions')
Map  # last expression of the cell, so that the map is displayed
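
To read the map, it helps to know which color stands for which class. The sketch below pairs the palette above with `TARGET_NAMES`. It assumes that class indices 0 to 3 follow the order of `TARGET_NAMES`, which matches `LABEL_NAMES` but is not stated explicitly in this notebook.

```python
TARGET_NAMES = ['field', 'forest', 'urban', 'water']
PALETTE = ['lime', 'darkgreen', 'yellow', 'blue']

# Class index -> (class name, color), assuming both lists share the same order
legend = {index: pair for index, pair in enumerate(zip(TARGET_NAMES, PALETTE))}

for index, (name, color) in legend.items():
    print(index, name, color)
```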

## 2. Export a tif and kml

In [3]:
ASSETID = image_name + '_pred'
# Since Earth Engine allows export only in tif format, we export the tif first and then we convert it to kml
download_tif(ASSETID)

Image export completed
Download image test_image_pred from drive (directory data/predictions) if you work on your local computer


In [4]:
download_kml_from_tif('data/predictions/'+ASSETID+'.tif')

Kml saved at "data/predictions/kml"


# VIII. Tutorials
The Earth Engine (EE) Code Editor at https://code.earthengine.google.com is a web-based IDE for the Earth Engine JavaScript API. Code Editor features are designed to make developing complex geospatial workflows fast and easy. The Code Editor has the following elements :
* JavaScript code editor
* Map display for visualizing geospatial datasets
* API reference documentation (Docs tab)
* Git-based Script Manager (Scripts tab)
* Console output (Console tab)
* Task Manager (Tasks tab) to handle long-running queries
* Interactive map query (Inspector tab)
* Search of the data archive or saved scripts
* Geometry drawing tools
  
<img src="tuto_images/earth-engine-code-editor.PNG" width=500>

Read https://developers.google.com/earth-engine/guides/playground for more information

## 1. Add Table to Assets

- **1.** If your file is a "kml" file, convert it to a ".shp" file by using this website https://mygeodata.cloud/converter/kml-to-shp OR by using **QGIS software** : 
  * **a.** Drag your kml to the QGIS window.
  * **b.** Right-click on your layer and choose "Export" then "Export Feature As"   
  <img src="tuto_images/3.PNG" width="400">  

  * **c.** Set "Format" as "ESRI Shapefile" and the CRS as "EPSG:3857 / Pseudo-Mercator"  
  * **d.** Set "File name" to the name of the output file. Careful, you must include the directory, for example : C:\....pipe.shp
  * **e.** Click on "OK" and wait, this can take a moment. This creates several files, keep them all.  
  <img src="tuto_images/4.PNG" width="400"> 
    
- **2.** Upload your files to the Earth Engine Code Editor :   
  * **a.** Go to https://code.earthengine.google.com/
  * **b.** Click on "Assets", then "NEW", then below "Table Upload", click on "Shape files". 
  <img src="tuto_images/5.PNG" width="400">
  * **c.** Select all the files you just created with QGIS.  
  * **d.** Name your Table, for example "saur_zone2"
  * **e.** Click on "UPLOAD" 
  <img src="tuto_images/16.PNG" width="400">
  * **f.** In the right corner of your screen, click on "Tasks". Check the status of your upload.
  <img src="tuto_images/17.PNG" width="400">
 
Now you can access your table by typing : table = **ee.FeatureCollection(assetid)** with assetid = directory/network_name.  
You can find the assetid by clicking on the asset.  
<img src="tuto_images/11.PNG?modified=02022021134100" width="400">

## 2. Add Image to Assets
*replace `filename` by the name of your inference image in the tutorial below*

- **1.** First, if not already done, add your network to the Earth Engine Code Editor's Assets by following the tutorial in the previous section.
- **2.** Go to https://code.earthengine.google.com/ .
- **3.** Click on "Assets", then "NEW", then below "Image Upload", click on "GeoTIFF".  
<img src="tuto_images/1.PNG" width="400">
- **4.** In "Source files", select the files `pred_filename.TFRecord` and `filename-mixer.json` from your folder 'inference'
- **5.** Set "AssetId" to "`filename`_pred"
- **6.** Click on "UPLOAD"  
<img src="tuto_images/2.PNG" width="400">
- **7.** In the right corner of your screen, click on "Tasks". Check the status of your upload.  
If you get the error "cannot read mixer file", retry the steps above, swapping the order of the mixer file and the TFRecord file, until the upload succeeds. The uploader is sometimes unreliable.
<img src="tuto_images/6.PNG" width="400">

Now you can access your image by typing : image = **ee.Image(assetid)** with assetid = directory/filename_pred.  
You can find the assetid by clicking on the asset.  
<img src="tuto_images/7.PNG?modified=02022021134700" width="400">