# Project Pipeline
<img src="tuto_images/9.PNG" width="800">  

***

# <font color=red> Welcome to this machine learning prediction tutorial! </font>
            
### There are 5 steps:   
>- Install and import libraries, create folders and define parameters.  
>    *Parameters followed by **#@param** are variables that you can change.*
>- Construct your dataset. 
>- Load and process your dataset.
>- Train and evaluate your model.  
>- Predict your zone of interest. 

**If you already have a pretrained model, do only steps 1 and 5.**

### Table of Contents
* [I. Libraries and Variables](#I.-Libraries-and-Variables)
* [II. Training](#II.-Training) (**skip this if you already have a pretrained model**)
    - [II. 1. Dataset Construction](#II.-1.-Dataset-Construction)
    - [II. 2. Dataset Loading and Processing](#II.-2.-Dataset-Loading-and-Processing)
    - [II. 3. Model Training and Evaluation](#II.-3.-Model-Training-and-Evaluation)
* [III. Inference](#III.-Inference)  
    - [III. 1. Label Image](#III.-1.-Label-Image)
    - [III. 2. Pipe Classification, Network Statistics and KMLs](#III.-2.-Pipe-Classification,-Network-Statistics-and-KMLs)
* [IV. Google Earth Engine Editor Tutorials](#IV.-Tutorials)  

# I. Libraries and Variables
### First create a new virtual environment (otherwise package versions may conflict with ones already installed)  
### Then install all requirements by running the following:

*Careful* if you have multiple Python versions:
- If you are on your default Python, run: **!pip** install -r requirements.txt
- If you are on another Python version, add the version number to pip. For example, on Python 3.8 run **!pip3.8** install -r requirements.txt
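
One way to avoid guessing which `pip` matches the running kernel is to call pip through the interpreter itself. The sketch below only builds and prints the command; `pip_install_command` is a hypothetical helper and not part of this project.

```python
import sys

def pip_install_command(requirements="requirements.txt"):
    """Build a pip command bound to the interpreter running this notebook."""
    # sys.executable is the path of the Python that executes this cell,
    # so "-m pip" installs into that same environment.
    return [sys.executable, "-m", "pip", "install", "-r", requirements]

print(" ".join(pip_install_command()))
```

The printed command can be pasted into a terminal. Inside a notebook, the `%pip install -r requirements.txt` magic has a similar effect, since it targets the environment of the current kernel.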

In [1]:
# Check which Python your default pip uses
!pip --version

pip 20.2.2 from C:\Users\leakm\AppData\Roaming\Python\Python37\site-packages\pip (python 3.7)



In [3]:
# Change the version of pip if needed
!pip install -r requirements.txt

Collecting osgeo
  Using cached osgeo-0.0.0-py3-none-any.whl (1.1 kB)
Processing c:\users\leakm\appdata\local\pip\cache\wheels\65\55\85\945cfb3d67373767e4dc3e9629300a926edde52633df4f0efe\umap-0.1.1-py3-none-any.whl
Installing collected packages: osgeo, umap
Successfully installed osgeo-0.0.0 umap-0.1.1


In [None]:
from utils import create_folders
import tensorflow as tf
from dataset_construction import TFDatasetConstruction
from dataset_loader import TFDatasetProcessing, NPDatasetProcessing, undersample
from models import ModelTrainingAndEvaluation
from joblib import load
from inference import Inference, download_kml
from utils import predict_pipes, predict_pipes_from_csv, clean_predictions, color_pipes, get_statistics

### Import, authenticate and initialize the Earth Engine library.  
If you have a Google account that already has access to Earth Engine, authenticate with it. If not, you can use this one, which was created for this project:  
Gmail address : `mounierseb93@gmail.com`    
Code : `mounse$15`

In [None]:
import ee
ee.Authenticate()
ee.Initialize()

### Create all folders you will need
After running the following cell, your directory should look like this.
Check that the folders are in your directory.
- **Main folder**
    * models
    * results
    * data
        * train
        * eval
        * predictions
            * colored_pipes
            * kml

In [None]:
create_folders()
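
The implementation of `create_folders` lives in `utils` and is not shown here. As a rough sketch of what building and checking the tree above could look like with the standard library only (`make_project_folders` and `missing_folders` are hypothetical names):

```python
import os
import tempfile

# Folder tree copied from the list above.
SUBFOLDERS = [
    "models",
    "results",
    os.path.join("data", "train"),
    os.path.join("data", "eval"),
    os.path.join("data", "predictions", "colored_pipes"),
    os.path.join("data", "predictions", "kml"),
]

def make_project_folders(root="."):
    """Create every folder of the tree; existing folders are left untouched."""
    for sub in SUBFOLDERS:
        os.makedirs(os.path.join(root, sub), exist_ok=True)

def missing_folders(root="."):
    """Return the folders of the tree that do not exist under root."""
    return [sub for sub in SUBFOLDERS if not os.path.isdir(os.path.join(root, sub))]

# Demonstration in a throwaway directory so nothing is written to the project.
with tempfile.TemporaryDirectory() as tmp:
    make_project_folders(tmp)
    print(missing_folders(tmp))  # prints []
```

Calling `missing_folders()` from the main folder after `create_folders()` gives a quick way to do the check asked for above.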

### Constants

In [None]:
LANDSAT  = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11']
SENTINEL = ['VV','VH','VV_1','VH_1']
BANDS    = LANDSAT + SENTINEL
RESPONSE = 'landcover'
FEATURES = BANDS+[RESPONSE]

KERNEL_SIZE   = 128
KERNEL_SHAPE  = [KERNEL_SIZE, KERNEL_SIZE]
COLUMNS       = [tf.io.FixedLenFeature(shape=KERNEL_SHAPE, dtype=tf.float32) for k in FEATURES]
FEATURES_DICT = dict(zip(FEATURES, COLUMNS))
NUM_FEATURES  = len(BANDS)
NUM_CLASSES   = 4 

LABEL_NAMES  = [0,1,2,3]
TARGET_NAMES = ['field','forest','urbain','water']

#Do not change the following
point  = None
radius = None
rectangle = None
network_name = None
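
To get a feel for what these constants imply, the short calculation below counts the bands and estimates the raw size of one example, assuming each feature is stored as a `KERNEL_SIZE` x `KERNEL_SIZE` patch of 4-byte floats as declared in `COLUMNS`. It ignores TFRecord framing and gzip compression, so it is only an order of magnitude.

```python
LANDSAT = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11']
SENTINEL = ['VV', 'VH', 'VV_1', 'VH_1']
BANDS = LANDSAT + SENTINEL
FEATURES = BANDS + ['landcover']
KERNEL_SIZE = 128
BYTES_PER_FLOAT32 = 4

# One patch per feature, KERNEL_SIZE * KERNEL_SIZE values per patch.
values_per_patch = KERNEL_SIZE * KERNEL_SIZE
bytes_per_example = len(FEATURES) * values_per_patch * BYTES_PER_FLOAT32

print(len(BANDS), len(FEATURES), values_per_patch, bytes_per_example)
# prints 13 14 16384 917504
```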

# II. Training 
### <font color=red> If you already have a pretrained model </font>, skip this part and go to [inference](#III.-Inference)

### Variables

In [None]:
#-------------Specify preprocessing parameters-----------#

TRAIN_SIZE = 5000 #@param {type:"integer"} #maximum : 5000
EVAL_SIZE  = 3000 #@param {type:"integer"} #maximum : 3000

# Whether to undersample (i.e. take the same number of pixels from each class)
UNDERSAMPLING = True #@param {type:'boolean'}

# Number of pixels taken from each class if UNDERSAMPLING=True; if None, the size of the rarest class is used
SAMPLES_PER_CLASS = None #@param (None or integer)

# Whether to add misclassified pixels from a previous model.
# Do not use this unless you have learnt how to create them; see my tutorial "eeTutoriel.ipynb"
MISCLASSIFIED_PIXELS = False #@param {type:'boolean'}

# The name of the Asset of misclassified pixels, if MISCLASSIFIED_PIXELS=True
ASSETID = "users/leakm/misclassified_pixels"

#-------------Specify model parameters--------------#

# Careful, the name of your model should contain the model type such as : knn_something-something, or something-rf_something
MODEL_NAME   = 'rf' #@param ["knn", "svm", "rf","pca_rf","umap_rf"] 

FINETUNE = False #@param {type: 'boolean'}

# II. 1. Dataset Construction  
If you don't have access to the training dataset (.tfrecord.gz files in the folders 'train' and 'eval'), download it from the Google Drive of the account below (and make sure to put it in the right folder, with the same name as in the Drive).   
    Gmail address : `mounierseb93@gmail.com`  
    Code : `mounse$15`  
      
If it's not there for some reason, or if you want to construct your own, run the following:
    
*Every time you run a cell, if you receive a message like "Please download file from Drive from folder ...", go to the Google Drive folder mentioned and download the file into the same folder on your computer.*

In [None]:
# Export training and evaluation tfrecords
tfdataconstructor = TFDatasetConstruction(LANDSAT,SENTINEL,RESPONSE,KERNEL_SIZE)
tfdataconstructor.dataset_construction("2017-01-01","2017-12-31") #the date should not change since the label dataset is from 2017

# II. 2. Dataset Loading and Processing

In [None]:
# Load and process training and evaluation tf.Datasets
# BATCH_SIZE is not defined in the cells above: set it before running this cell
tfdataloader = TFDatasetProcessing(FEATURES_DICT,FEATURES,BANDS,NUM_FEATURES,batch_size=BATCH_SIZE)
training     = tfdataloader.get_training_dataset()
evaluation   = tfdataloader.get_eval_dataset()
NUM_FEATURES = tfdataloader.num_features

# Convert tf.Datasets to numpy arrays
npdataloader = NPDatasetProcessing(NUM_FEATURES,NUM_CLASSES)
train        = npdataloader.tf_to_numpy(training,TRAIN_SIZE)
eval         = npdataloader.tf_to_numpy(evaluation,EVAL_SIZE)
del training,evaluation

# Undersampling
if UNDERSAMPLING :
    train['features'],train['labels'] = undersample(train['features'],train['labels'],SAMPLES_PER_CLASS)
    eval['features'] ,eval['labels']  = undersample(eval['features'],eval['labels'],SAMPLES_PER_CLASS)

# Adding samples to the dataset
if MISCLASSIFIED_PIXELS :
    train = npdataloader.adding_more_pixels(train,ASSETID,tfdataconstructor,tfdataloader)
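
The `undersample` function used above comes from `dataset_loader` and its code is not shown here. The sketch below illustrates the general idea on plain Python lists: keep the same number of samples from every class, defaulting to the size of the rarest class. `undersample_sketch` is a hypothetical name, and capping `samples_per_class` at the rarest class size is an assumption of this sketch.

```python
import random
from collections import defaultdict

def undersample_sketch(features, labels, samples_per_class=None, seed=0):
    """Keep the same number of randomly chosen samples from every class."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Default to the rarest class; never ask for more than it contains (assumption).
    rarest = min(len(indices) for indices in by_class.values())
    n = rarest if samples_per_class is None else min(samples_per_class, rarest)

    rng = random.Random(seed)
    kept = []
    for label in sorted(by_class):
        kept.extend(rng.sample(by_class[label], n))
    kept.sort()  # preserve the original sample order
    return [features[i] for i in kept], [labels[i] for i in kept]

features = [[i] for i in range(10)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
x, y = undersample_sketch(features, labels)
print(sorted(y))  # prints [0, 1, 2]: class 2 has a single sample
```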

# II. 3. Model Training and Evaluation
You can :  
## a. Train your model and evaluate it

In [None]:
# 1) Model fitting, if you want to (re)train your model
model = ModelTrainingAndEvaluation(MODEL_NAME,train,eval,FINETUNE)
if 'pca' in MODEL_NAME :
    model.pca(NUM_CLASSES)
elif 'umap' in MODEL_NAME :
    model.umap(NUM_CLASSES)
elif 'knn' in MODEL_NAME :
    model.knn() #check the parameters you can pass as arguments
elif 'svm' in MODEL_NAME :
    model.svm() #check the parameters you can pass as arguments
elif 'rf' in MODEL_NAME :
    model.rf() #check the parameters you can pass as arguments

# Model evaluation
%matplotlib inline
model.eval_model(LABEL_NAMES,TARGET_NAMES)
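
The training cell picks a branch by testing substrings of `MODEL_NAME` in a fixed order, so the name decides which method runs. The sketch below reproduces only that selection logic (`pick_branch` is a hypothetical helper); what `model.pca` or `model.umap` do internally is not shown in this notebook.

```python
# Same order as the if/elif chain in the training cell.
BRANCH_ORDER = ['pca', 'umap', 'knn', 'svm', 'rf']

def pick_branch(model_name):
    """Return the first key of BRANCH_ORDER found in model_name, or None."""
    for key in BRANCH_ORDER:
        if key in model_name:
            return key
    return None

for name in ['rf', 'pca_rf', 'umap_rf', 'knn_something-something', 'my_model']:
    print(name, '->', pick_branch(name))
```

Two consequences are worth keeping in mind: a name such as `pca_rf` selects the `pca` branch because it is tested first, and a name containing none of the keys (here `my_model`) selects no branch, so nothing would be fitted before evaluation. Substring matching can also trigger by accident, for example `perf_model` contains `rf`.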

## b. Evaluate a saved model

In [None]:
#the name of the model (without the extension), it should be the same as the one in your folder "models"
MODEL_NAME = 'rf' #@param 

In [None]:
model = ModelTrainingAndEvaluation(MODEL_NAME,train,eval,False)
model.model = load('models/'+MODEL_NAME+'.joblib')
model.eval_model(LABEL_NAMES,TARGET_NAMES)
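
Since `load` fails if the file name does not match, it can help to check the path before loading. This is a minimal sketch assuming the models are stored as `models/<MODEL_NAME>.joblib`, as in the cell above; `saved_model_path` is a hypothetical helper.

```python
from pathlib import Path

def saved_model_path(model_name, models_dir="models"):
    """Return the path of a saved model, or raise with a readable message."""
    path = Path(models_dir) / (model_name + ".joblib")
    if not path.is_file():
        raise FileNotFoundError(
            f"No saved model at {path}; train it first or check MODEL_NAME."
        )
    return path

try:
    print(saved_model_path("rf"))
except FileNotFoundError as err:
    print(err)
```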

# III. Inference

This is how inference works: you specify the input, and the pretrained model processes it and produces different outputs.
    
<img src="tuto_images/8.PNG" width="800">  

### Variables

In [None]:
MODEL_NAME = 'rf' #@param #the name of the model (without the extension), it should be the same as the one in your folder "models"

# Specify inference parameters

# The image that will be created is the mean of an Image Collection of satellite images.
# Specify the date range of this collection:
start_date = "2020-01-01"   
end_date   = "2020-12-31"

image_name = 'test' #Name your image as you want

#Whether to perform Conditional Random Fields on your predictions
PERFORM_CRF = False #@param

if PERFORM_CRF:
    !pip install --upgrade cython
    !pip install --upgrade pydensecrf
    
# If you are on Windows and have trouble installing pydensecrf:
# if you use Anaconda, run: conda install -c conda-forge pydensecrf
# if not, or if it still fails, check https://github.com/lucasb-eyer/pydensecrf
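
The inference image is described above as the mean of an image collection over the chosen date range. As a toy illustration of a per-pixel mean on small grids in plain Python (`mean_composite` is a hypothetical name; the real reduction runs in Earth Engine, where `ee.ImageCollection.mean()` offers this kind of operation, though whether the project calls exactly that is not visible here):

```python
def mean_composite(images):
    """Per-pixel mean of equally sized 2D grids given as lists of rows."""
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / len(images) for c in range(cols)]
        for r in range(rows)
    ]

# Two acquisitions of the same 2x2 area at different dates.
january = [[0.0, 2.0], [4.0, 6.0]]
july = [[2.0, 4.0], [6.0, 8.0]]
print(mean_composite([january, july]))  # prints [[1.0, 3.0], [5.0, 7.0]]
```

A longer date range averages more acquisitions per pixel, which smooths out seasonal variation.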

## III. 1. Label Image

You have three options for creating your test image:  
* [1. Export an image from a square window of a given center `[Lon,Lat]` and `radius` (in meters)](#1.-Export-an-image-from-a-square-window-of-given-center-and-radius)
* [2. Export an image from a window given a bounding box `[Lon1, Lat1, Lon2, Lat2]`](#2.-Export-an-image-from-a-window-given-a-bounding-box)
* [3. Export the whole area of a network (for the brave who want to use Earth Engine's Editor)](#3.-Export-the-whole-area-of-a-network)
<br>
***  
If you want to **export the whole area of a network** (the 3rd option), you will have to use Earth Engine's Editor. It will calculate the bounding box of your network.

I have already added Sieccao, Saur (zone 1) and Brioude to the Editor. You can access them by specifying `network_name='sieccao'`, `network_name='brioude'` or `network_name='saur'`.

If your network is not already uploaded to your **Google Earth Editor Assets**, either provide a bounding box covering the whole area of the network (**follow the 2nd option**) or follow the tutorial [Add Table to Assets](#1.-Add-Table-to-Assets).

### Fill only one of the following :
#### 1. Export an image from a square window of given center and radius

In [None]:
lon    = None #@param {type:'number'}
lat    = None #@param {type:'number'}
point  = [lon,lat]
radius = None #@param {type:'number'} #(in meters)
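
To see roughly which area a given center and `radius` cover, the square window can be approximated in degrees. This is a simple spherical approximation written for illustration only (`square_window` and the constant are assumptions); how `test_dataset_construction` actually builds the window is not shown here.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude

def square_window(lon, lat, radius_m):
    """Approximate [minLng, minLat, maxLng, maxLat] of a square around a point."""
    d_lat = radius_m / METERS_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    d_lon = radius_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return [lon - d_lon, lat - d_lat, lon + d_lon, lat + d_lat]

# Illustrative coordinates and a 5 km radius.
print(square_window(3.38, 45.29, 5000))
```

The result has the same layout as `rectangle` in the second option, so it can also serve as a sanity check before exporting.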

#### 2. Export an image from a window given a bounding box

In [None]:
minLng    = None #@param {type:'number'}
minLat    = None #@param {type:'number'}
maxLng    = None #@param {type:'number'}
maxLat    = None #@param {type:'number'}
rectangle = [minLng, minLat, maxLng, maxLat]

#### 3. Export the whole area of a network
First, if you want to use this option, either set `network_name` to 'brioude', 'sieccao' or 'saur', or add your own network by following the tutorial [Add Table to Assets](#1.-Add-Table-to-Assets)

In [None]:
# can be 'brioude','sieccao','saur' or the name of the network you just created following the tutorial "Add Table to Assets"
network_name = None #@param {type:'string'}
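
Because only one of the three options should be filled, a small check before launching the export can catch mistakes early. The helper below is a hypothetical sketch, not part of the project; it treats a list containing `None` (such as `point = [None, None]`) as unfilled.

```python
def chosen_region(point=None, radius=None, rectangle=None, network_name=None):
    """Return which option is filled; raise if zero or several are."""
    def filled(values):
        return values is not None and all(v is not None for v in values)

    options = {
        "point": filled(point) and radius is not None,
        "rectangle": filled(rectangle),
        "network": network_name is not None,
    }
    selected = [name for name, is_set in options.items() if is_set]
    if len(selected) != 1:
        raise ValueError(f"Fill exactly one option, got: {selected or 'none'}")
    return selected[0]

# Illustrative values: only the bounding box is filled.
print(chosen_region(point=[None, None], rectangle=[3.3, 45.2, 3.5, 45.4]))  # prints rectangle
```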

### Prediction :
#### a) First go to your browser and connect to the Google Drive you used to authenticate to Earth Engine.

#### b) Run the following code to construct your image  
  
*Every time you run a cell, if you receive a message like "Please download file from Drive from folder ...", go to the Google Drive folder mentioned and download the file into the same folder on your computer.*



In [None]:
# Construct inference dataset
tfdataconstructor = TFDatasetConstruction(LANDSAT,SENTINEL,RESPONSE,KERNEL_SIZE)
corners = tfdataconstructor.test_dataset_construction(start_date,end_date,image_name,network_name=network_name,point=point,radius=radius,rectangle=rectangle)

In [None]:
# Load inference dataset
tfdataloader = TFDatasetProcessing(FEATURES_DICT,FEATURES,BANDS,NUM_FEATURES,None)
testdataset  = tfdataloader.get_inference_dataset(image_name)
NUM_FEATURES = tfdataloader.num_features

# Predict and write predictions to .tfrecord (this file will only be useful if you run the III.2.)
inference = Inference(NUM_CLASSES,MODEL_NAME)
predictions = inference.doMLPrediction(testdataset,image_name,NUM_FEATURES,perform_crf=PERFORM_CRF)

# Downloads the label image as KML 
download_kml(predictions,image_name,*corners)

## III. 2. Pipe Classification, Network Statistics and KMLs
There are two options: an easy but slow one that predicts each pipe from scratch, and a tricky but fast one that reuses your earlier predictions:
* [1. Easy but slow option](#1.-Easy-but-Slow-Option)
* [2. Tricky but fast option](#2.-Tricky-but-Fast-Option)


### 1. Easy but Slow Option 
#### No need to run section III. 1. "Label Image"
Using Earth Engine Editor can be tricky for a beginner, so I made the following functions that let you assign a class to each pipe and create KML files of colored pipes without using the Editor.
These functions are **time consuming** (4 hours for Sieccao, for example) but easy to execute.   
  
*Note*: you should provide a CSV of your network (you should have a function in sql_connector.py that does that) with 5 columns:
- `Name,lon1, lon2, lat1, lat2`. Be careful: the column names must match exactly.
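
Since the column names matter, the header can be checked before starting a run that takes hours. A minimal sketch with the standard `csv` module (`missing_columns` is a hypothetical helper; the sample rows are made up):

```python
import csv
import io

REQUIRED_COLUMNS = ["Name", "lon1", "lon2", "lat1", "lat2"]

def missing_columns(csv_file):
    """Return the required columns that are absent from the header row."""
    header = next(csv.reader(csv_file), [])
    return [col for col in REQUIRED_COLUMNS if col not in header]

# In-memory examples standing in for real files.
good = io.StringIO("Name,lon1,lon2,lat1,lat2\npipe_1,3.38,3.39,45.29,45.30\n")
bad = io.StringIO("name,lon1,lon2,lat1\npipe_1,3.38,3.39,45.29\n")
print(missing_columns(good))  # prints []
print(missing_columns(bad))   # prints ['Name', 'lat2']
```

With a real file, open it with `open(coordsfilename, newline='')` and pass the file object to `missing_columns`.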

In [None]:
# Enter the name of your csv file :
coordsfilename = 'data/predictions/coords.csv' #@param 

# This function assigns a class to each pipe, it takes time to run ! 
# If you have errors due to multi-processing, set multi_process=False
predict_pipes_from_csv(coordsfilename,MODEL_NAME, BANDS, "2020-01-01","2020-12-31",multi_process=True)

# This function calculates the statistics of the network provided (proportion of each class)
get_statistics(coordsfilename)
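
The code of `get_statistics` is not shown here. As an illustration of "proportion of each class", the sketch below counts pipes per class and divides by the total; whether the real function weights by pipe count or by pipe length is not known, so counting pipes is an assumption. `class_proportions` is a hypothetical name.

```python
from collections import Counter

TARGET_NAMES = ['field', 'forest', 'urbain', 'water']

def class_proportions(classes):
    """Share of each class among the predicted pipe classes (labels 0 to 3)."""
    if not classes:
        raise ValueError("no predictions given")
    counts = Counter(classes)
    total = len(classes)
    return {
        TARGET_NAMES[label]: counts.get(label, 0) / total
        for label in range(len(TARGET_NAMES))
    }

print(class_proportions([0, 0, 1, 2, 2, 2, 3, 0]))
# prints {'field': 0.375, 'forest': 0.125, 'urbain': 0.375, 'water': 0.125}
```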

In [21]:
# This function creates a KML of the network provided where each pipe has a color corresponding to its class
# If it takes too long, run the multi-threaded version of color_pipes, named mp_color_pipes
color_pipes(coordsfilename) 

def mp_color_pipes(file_name) :
    import simplekml
    import pandas as pd
    import matplotlib
    import numpy as np
    import os
    ds_test = pd.read_csv(file_name)
    lines = (ds_test['Name'], ds_test['lon1'], ds_test['lat1'], ds_test['lon2'], ds_test['lat2'], ds_test['landcover'])
    kml = simplekml.Kml()
    ids, lons1, lats1, lons2, lats2, classes = lines
    
    def color(id, lon1, lat1, lon2, lat2, classe):
        line = kml.newlinestring(name=str(id), coords=[(lon1,lat1), (lon2,lat2)])
        if classe == 0:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('lime')).astype(int)
        elif classe == 1:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('darkgreen')).astype(int)
        elif classe == 2:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('yellow')).astype(int)
        else:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('blue')).astype(int)
        line.style.linestyle.color = simplekml.Color.rgb(r,g,b)

    def color_wrapper(args):
        color(*args)
        
    from multiprocessing.pool import ThreadPool as Pool
    p = Pool(5)
    
    inputs = zip(ids, lons1, lats1, lons2, lats2, classes)
    p.map(color_wrapper,inputs)
    name = os.path.splitext(os.path.basename(file_name))[0].split('_classification')[0]
    kml.save('data/predictions/colored_pipes/'+name+'_colored.kml')

In [22]:
coordsfilename = 'data/predictions/coords.csv' #@param 
mp_color_pipes(coordsfilename)
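
For reference, the class-to-color mapping used in `mp_color_pipes` can be written as a small table. KML stores colors as hexadecimal `aabbggrr` (alpha, blue, green, red), which is why the order looks reversed compared to RGB. The sketch below is dependency-free and only illustrates the mapping; `kml_color` is a hypothetical helper.

```python
# RGB values of the named colors used in mp_color_pipes.
CLASS_COLORS_RGB = {
    0: (0, 255, 0),    # 'lime'
    1: (0, 100, 0),    # 'darkgreen'
    2: (255, 255, 0),  # 'yellow'
}
DEFAULT_RGB = (0, 0, 255)  # 'blue', used for any other class value

def kml_color(classe, alpha=255):
    """Return the KML hex color (aabbggrr) for a class label."""
    r, g, b = CLASS_COLORS_RGB.get(classe, DEFAULT_RGB)
    return f"{alpha:02x}{b:02x}{g:02x}{r:02x}"

for classe in range(4):
    print(classe, kml_color(classe))
# prints:
# 0 ff00ff00
# 1 ff006400
# 2 ff00ffff
# 3 ffff0000
```

Note that, as in the `else` branch of `mp_color_pipes`, any label other than 0, 1 or 2 ends up blue, including unexpected values.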

### 2. Tricky but Fast Option 
#### Requires that you use Earth Engine Editor and that you first run section III. 1. Label Image
The following functions are VERY fast, but in order to run them, you should add your predictions (calculated in section III. 1.) to your Google Earth Editor's Assets by following the tutorial [Add Image to Assets](#2.-Add-Image-to-Assets)

In [None]:
network_name = 'saur' #@param can be 'sieccao', 'brioude', 'saur' or the name of your network you just added to Assets

In [None]:
# Produce a csv file with pipe names, coordinates and classes
predict_pipes(network_name,image_name) # image_name is the name of the image you created in section III. 1. "Label Image"

# Format the csv so that it has the right columns: Name, lon1, lon2, lat1, lat2
clean_predictions(image_name)
filename = 'data/predictions/'+image_name+'_classification.csv'

#Calculates statistics of your network (the proportions of each class)
get_statistics(filename)

In [None]:
# This function creates a KML of the network provided where each pipe has a color corresponding to its class
# If it takes too long, run the multi-threaded version of color_pipes, named mp_color_pipes
color_pipes(filename) 

def mp_color_pipes(filename) :
    import simplekml
    import pandas as pd
    import matplotlib
    import numpy as np
    import os  # used below to build the output file name
    ds_test = pd.read_csv(filename)  # read the classification csv
    lines = (ds_test['Name'], ds_test['lon1'], ds_test['lat1'], ds_test['lon2'], ds_test['lat2'], ds_test['landcover'])
    kml = simplekml.Kml()
    ids, lons1, lats1, lons2, lats2, classes = lines
    
    def color(id, lon1, lat1, lon2, lat2, classe):
        line = kml.newlinestring(name=str(id), coords=[(lon1,lat1), (lon2,lat2)])
        if classe == 0:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('lime')).astype(int)
        elif classe == 1:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('darkgreen')).astype(int)
        elif classe == 2:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('yellow')).astype(int)
        else:
            r,g,b = np.multiply(255,matplotlib.colors.to_rgb('blue')).astype(int)
        line.style.linestyle.color = simplekml.Color.rgb(r,g,b)

    def color_wrapper(args):
        color(*args)
        
    from multiprocessing.pool import ThreadPool as Pool
    p = Pool(5)
    
    inputs = zip(ids, lons1, lats1, lons2, lats2, classes)
    p.map(color_wrapper,inputs)
    name = os.path.splitext(os.path.basename(filename))[0].split('_classification')[0]
    kml.save('data/predictions/colored_pipes/'+name+'_colored.kml')

In [None]:
mp_color_pipes(filename)

# IV. Tutorials

## 1. Add Table to Assets

- If your file is a ".kml" file, convert it to a ".shp" file using **QGIS**:
  * Drag your kml to the QGIS window.
  * Right-click on your layer and choose "Export" then "Export Feature As"   
  <img src="tuto_images/3.PNG" width="400">  

  * Set "Format" as "ESRI Shapefile" and the CRS as "EPSG:3857 / Pseudo-Mercator"  
  * Set "File name" to the name of the file. Careful, you should provide the full path, for example C:\....pipe.shp
  * Click on "OK" and wait; this could take a moment. Several files are created: please keep them all.  
  <img src="tuto_images/4.PNG" width="400"> 
    
- Upload your files to your Google Earth Engine Editor :   
  * Go to https://code.earthengine.google.com/
  * Click on "Assets", then "NEW", then below "Table Upload", click on "Shape files". Select all the files you just created with QGIS.  
  * Name your Table, for example "saur_zone2"
  * Click on "UPLOAD"  
  <img src="tuto_images/5.PNG" width="400">
 
Now you can access your table by typing: table = **ee.FeatureCollection(assetid)** with assetid = directory/network_name.  
You can find the assetid by clicking on the asset. 
<img src="tuto_images/11.PNG" width="400">

## 2. Add Image to Assets
*Replace `filename` with the name of your inference image in the tutorial below.*

- First, if not already done, add your network to Google Earth Editor's Assets following the tutorial in the previous section.
- Go to https://code.earthengine.google.com/ .
- Click on "Assets", then "NEW", then below "Image Upload", click on "GeoTIFF".
<img src="tuto_images/1.PNG" width="400">
- In "Sources files" select the files `pred_filename.TFRecord` and `filename-mixer.json` in your folder 'inference'
- Set "AssetId" to "`filename`_pred"
- Click on "UPLOAD"
<img src="tuto_images/2.PNG" width="400">
- On the right corner of your screen, click on "Tasks". Check the status of your export.
- If there's an error "cannot read mixer file", retry the steps above, putting the mixer file before the tfrecord file and vice versa, several times until it succeeds; the system is sometimes buggy.
<img src="tuto_images/6.PNG" width="400">

Now you can access your image by typing : image = **ee.Image(assetid)** with assetid = directory/filename_pred.  
You can find the assetid by clicking on the asset.
<img src="tuto_images/7.PNG" width="400">
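
The two asset ids described in these tutorials follow simple naming patterns, which can be written down as tiny helpers. This is only a sketch: the helper names are hypothetical and `users/your_username` is a placeholder for your own Assets directory.

```python
def table_asset_id(directory, network_name):
    """Asset id of a network table: directory/network_name."""
    return f"{directory}/{network_name}"

def image_asset_id(directory, filename):
    """Asset id of a prediction image: directory/filename_pred."""
    return f"{directory}/{filename}_pred"

# Placeholder directory; replace it with the one shown in your Assets tab.
print(table_asset_id("users/your_username", "saur_zone2"))  # prints users/your_username/saur_zone2
print(image_asset_id("users/your_username", "test"))        # prints users/your_username/test_pred
```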