# Tutorial: Classification of European Land Cover Products using Sentinel-1

This tutorial focuses on the automated random sampling of the reference data and the SAR image data
and on feeding the samples into a machine learning framework. The study area is a wetland site in the Donana Delta in Spain.

## Content

1. Importing Python modules
2. Define input paths
3. Resample data and reclassify raster data
4. Select training samples
5. Create Random Forest and Predict the result
6. Visualize the result
7. Accuracy Assessment
8. Parameter tuning

### 1. Importing Python modules from .py files

These modules are part of the SENClass project and provide the functions needed for computing and visualizing the results.

In [None]:
from SENClass import geodata
from SENClass import random_forest
from SENClass import accuracy_assessment
from SENClass import pca

### 2. Define input paths

The cells in this section are the only ones that must be changed, because the paths have to be adjusted; all other cells run automatically. You can still adjust the parameters in the later cells if you wish.

Two directories need to be defined, using / as the separator:

<ul>
<li> <i>path</i> a directory containing all the raster files provided with this tutorial.</li>
<li> <i>path_ref_p</i> a directory containing a reference product file, in this case the Global Surface Water Product (Seasonality).</li>
</ul>

In [None]:
path = '/path/to/data/'
path_ref_p = '/path/to/reference_data'  

Next, define the file extension of the raster images and the name of the reference product file.

In [None]:
raster_ext = "tif"
ref_p_name = "seasonality_10W_40Nv1_3_2020_sub_reprojected.tif"

#### define the output directories and file names

In [None]:
out_folder_resampled_scenes = "resamp/"
out_folder_prediction = "results/"  

In [None]:
name_predicted_image = "base_prediction_nd_0"
name_tuned_predicted_image = "tune_prediction_nd_0"

### 3. Resample data and reclassify raster data

The raster used as reference product is projected into the coordinate system of the satellite images. The satellite
images are not reprojected, but the pixel size is adjusted to that of the reference product.

In [None]:
out_ref_p = geodata.reproject_raster(path, path_ref_p, ref_p_name, raster_ext, out_folder_resampled_scenes)

<i>raster_value</i> lists the upper bounds of the value ranges in the reference raster, and <i>class_value</i> lists the new class values. The reclassification uses a less-than-or-equal approach: every pixel with the value 0 receives the value 100, every pixel with a value from 1 to 11 receives the value 200, and every pixel with the value 12 receives the value 300. Both lists must have the same length for the reclassification to be executed.

In [None]:
raster_value = [0, 11, 12]
class_value = [100, 200, 300]

In [None]:
geodata.reclass_raster(raster_value, class_value, out_ref_p)
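
The less-than-or-equal rule described above can be illustrated with a small NumPy sketch. This is not the SENClass implementation; the helper `reclass_leq` and the array `seasonality` are hypothetical names used only for illustration.

```python
import numpy as np


def reclass_leq(arr, raster_value, class_value):
    """Give each pixel the class of the smallest threshold it does not exceed."""
    if len(raster_value) != len(class_value):
        raise ValueError("raster_value and class_value must have the same length")
    # Pixels above the largest threshold keep the value 0 in this sketch.
    out = np.zeros_like(arr)
    # Go from the largest to the smallest threshold, so that smaller
    # thresholds overwrite the result of larger ones.
    for threshold, new_class in sorted(zip(raster_value, class_value), reverse=True):
        out[arr <= threshold] = new_class
    return out


# Hypothetical 2 x 3 excerpt of a seasonality raster (months of water presence)
seasonality = np.array([[0, 1, 5], [11, 12, 0]])
reclassified = reclass_leq(seasonality, [0, 11, 12], [100, 200, 300])
print(reclassified)
```

With the lists used in this tutorial, the value 0 maps to 100, the values 1 to 11 map to 200, and the value 12 maps to 300.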

### 4. Select training samples

#### define parameters for sample selection

Three processing parameters for sample selection can be adjusted:
<ul>
<li> <i>random_state</i> Seed of the random number generator, so that the sample selection is reproducible</li>
<li> <i>train_size</i> Specifies the share of the samples that is used for training (0.25 means 25 %)</li>
<li> <i>sss</i> True: using stratified random sampling, False: using random sampling</li>
</ul>

The function selects samples for training and testing. The user can choose between two methods to select the
test and training pixels. If <i>sss</i> is set to True, the pixels and labels are selected using the sklearn algorithm
<i>StratifiedShuffleSplit</i>. Otherwise, the pixels and labels are randomly selected from the data frame using the
sklearn function <i>train_test_split</i>.

In [None]:
random_state = 0
train_size = 0.25  
sss = False 

In [None]:
x_train, y_train, data, mask = geodata.select_samples(path, path_ref_p, out_ref_p, out_folder_resampled_scenes,
                                                          raster_ext, train_size, random_state, sss)
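
The difference between the two sampling methods can be sketched with scikit-learn directly. The helper `split_samples` and the synthetic arrays are assumptions made for illustration; the actual `geodata.select_samples` additionally reads and masks the raster data and may differ in detail.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split


def split_samples(x, y, train_size, random_state, sss):
    """Return training pixels and labels, stratified by class if sss is True."""
    if sss:
        splitter = StratifiedShuffleSplit(n_splits=1, train_size=train_size,
                                          random_state=random_state)
        train_idx, _ = next(splitter.split(x, y))
        return x[train_idx], y[train_idx]
    x_tr, _, y_tr, _ = train_test_split(x, y, train_size=train_size,
                                        random_state=random_state)
    return x_tr, y_tr


# Synthetic, imbalanced demo data: 200 pixels, 4 features, 3 classes
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 4))
y_demo = np.repeat([100, 200, 300], [150, 40, 10])

x_s, y_s = split_samples(x_demo, y_demo, 0.25, 0, sss=True)
x_r, y_r = split_samples(x_demo, y_demo, 0.25, 0, sss=False)
print(np.unique(y_s, return_counts=True))  # class shares follow the full data set
print(np.unique(y_r, return_counts=True))  # class shares vary with the random draw
```

Stratified sampling keeps the class proportions of the reference data in the training set, which can matter when one class, such as permanent water, is rare.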

#### Define the number of components for Principal Component Analysis

We recommend skipping the PCA step, because it leads to a much worse result. The PCA cells below are therefore commented out.

In [None]:
# n_components = 0.95

PCA performs a linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.

In [None]:
# data, x_train = pca.principal(data, x_train, n_components)
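
For readers who want to try the PCA step anyway, the following sketch shows what a value such as `n_components = 0.95` means in scikit-learn: a float between 0 and 1 keeps as many components as are needed to explain that share of the variance. Fitting on the training samples and then transforming the full data set is an assumption here, and the synthetic arrays are made up; `pca.principal` may work differently.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 6 correlated "bands" built from 2 underlying signals plus noise
rng = np.random.default_rng(0)
signals = rng.normal(size=(500, 2))
data_demo = signals @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(500, 6))
x_train_demo = data_demo[:100]

# Keep enough components to explain at least 95 % of the variance
pca_model = PCA(n_components=0.95)
x_train_pca = pca_model.fit_transform(x_train_demo)  # fit on the training samples
data_pca = pca_model.transform(data_demo)            # apply the same projection

print(data_demo.shape, "->", data_pca.shape)
print(pca_model.explained_variance_ratio_.sum())
```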

### 5. Create basic Random Forest and Predict the result

#### define random forest parameter

The following random forest parameters can also be adjusted:
<ul>
<li> <i>max_depth</i> The maximum depth of the tree, default None</li>
<li> <i>n_estimator</i> The number of trees in the forest, default 100</li>
<li> <i>n_cores</i> Defines the number of cores to use, if -1 all cores are used</li>
<li> <i>verbose</i> Shows output from the random forest in the console</li>
</ul>

In [1]:
max_depth = 3 
n_estimator = 2  
n_cores = 1
verbose = 1 

#### create random forest

This function creates the random forest classifier with the parameters defined above.

In [None]:
rf = random_forest.rf_create(max_depth, random_state, n_estimator, n_cores, verbose)

#### fit random forest model

<i>rf_fit</i> fits the random forest created above to the training data.

In [None]:
rf_fitted = random_forest.rf_fit(rf, x_train, y_train)

#### predict result 

<i>rf_predict</i> applies the fitted model to the image data and returns the predicted classes.

In [None]:
prediction = random_forest.rf_predict(data, rf_fitted)
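
The three steps above (create, fit, predict) correspond to the usual scikit-learn workflow. The sketch below assumes that the SENClass functions wrap `RandomForestClassifier` and that <i>n_estimator</i> and <i>n_cores</i> map to the scikit-learn arguments `n_estimators` and `n_jobs`; the demo arrays are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic demo data: the class depends on the sign of the first feature
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(300, 4))
y_demo = np.where(x_demo[:, 0] > 0, 200, 100)

# Create the classifier (same values as in the parameter cell above)
rf_demo = RandomForestClassifier(max_depth=3, n_estimators=2, n_jobs=1,
                                 random_state=0, verbose=0)
# Fit on the first 200 samples, predict the remaining 100
rf_demo.fit(x_demo[:200], y_demo[:200])
pred_demo = rf_demo.predict(x_demo[200:])
print(pred_demo[:10])
```

A forest with only two shallow trees trains quickly but is a weak classifier; larger values for the number of trees usually give more stable predictions at the cost of run time.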

### 6. Visualize the result


The function <i>tif_visualize</i> shows the classification result for the Global Surface Water Product (Seasonality).
The visualization only works for the Global Surface Water Product.

In [None]:
geodata.tif_visualize(path, out_folder_prediction, name_predicted_image, raster_ext) #visualize a base prediction

We can save the result as a GeoTIFF.

In [None]:
geodata.prediction_to_gtiff(prediction, path, out_folder_prediction, name_predicted_image, out_ref_p, raster_ext,
                                mask)

### 7. Accuracy Assessment

The <i>accuracy_assessment</i> module contains functions to assess the accuracy of the RF classifier. The following metrics are
evaluated:
<ul>
<li>Confusion matrix (CM)</li>
<li>Kappa statistic (Kappa)</li>
<li>Classification report (precision, recall, f1-score, support)</li>
</ul>

In [None]:
# use the module name imported in section 1 (accuracy_assessment)
accuracy_assessment.accuracy_assessment(prediction, out_ref_p)
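
The three metrics listed above are all available in scikit-learn. The following sketch computes them for a tiny hand-made example; the label arrays are invented for illustration, and the SENClass function may compute or print the metrics differently.

```python
import numpy as np
from sklearn.metrics import classification_report, cohen_kappa_score, confusion_matrix

# Invented reference labels and predictions for six pixels
y_true = np.array([100, 100, 200, 200, 300, 300])
y_pred = np.array([100, 200, 200, 200, 300, 100])

# Rows are the reference classes, columns the predicted classes
cm = confusion_matrix(y_true, y_pred, labels=[100, 200, 300])
# Agreement corrected for chance: 4 of 6 pixels agree, 1/3 expected by chance
kappa = cohen_kappa_score(y_true, y_pred)
report = classification_report(y_true, y_pred, zero_division=0)

print(cm)
print(kappa)  # 0.5 for this example
print(report)
```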

### 8. Parameter tuning

#### define random forest parameter for parameter tuning

The following parameters can be adjusted for the tuning:
<ul>
<li> <i> min_depth_t </i> minimum tree depth</li>
<li> <i> max_depth_t </i> maximum tree depth</li>
<li> <i> min_estimator </i> minimum number of estimators</li>
<li> <i> max_estimator </i> maximum number of estimators</li>
<li> <i> value_generator </i> number of values to generate</li>
<li> <i> n_iter </i> number of parameter settings that are sampled</li>
<li> <i> cv </i> number of folds of cross validation</li>
</ul>

In [None]:
min_depth_t = 3
max_depth_t = 10
min_estimator = 10  
max_estimator = 50  
value_generator = 5  
n_iter = 5    

The function <i>random_forest.rf_parameter_tuning</i> searches for suitable parameters and executes the fitting and prediction directly. Depending on the choice of parameters, the process can take several hours.

In [None]:
tuned_prediction = random_forest.rf_parameter_tuning(x_train, y_train, data, min_depth_t, max_depth_t,
                                                         min_estimator, max_estimator, value_generator, n_iter,
                                                         random_state, n_cores)
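
How these parameters could drive a randomized search is sketched below with scikit-learn's `RandomizedSearchCV`. This is an assumption about what the tuning function does internally, not its actual code: generating the candidate values with `np.linspace`, the value `cv = 3`, and the synthetic data are all choices made for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

min_depth_t, max_depth_t = 3, 10
min_estimator, max_estimator = 10, 50
value_generator, n_iter = 5, 5
cv = 3  # illustrative value, not taken from the tutorial

# Generate value_generator evenly spaced candidates per parameter
param_distributions = {
    "max_depth": [int(v) for v in np.linspace(min_depth_t, max_depth_t, value_generator)],
    "n_estimators": [int(v) for v in np.linspace(min_estimator, max_estimator, value_generator)],
}

# Synthetic demo data: the class depends on the sign of the first feature
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(150, 4))
y_demo = np.where(x_demo[:, 0] > 0, 200, 100)

# Sample n_iter parameter combinations and score each with cv-fold cross validation
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions, n_iter=n_iter, cv=cv,
                            random_state=0, n_jobs=1)
search.fit(x_demo, y_demo)
tuned_demo = search.predict(x_demo)  # prediction with the best model found
print(search.best_params_)
```

The run time grows with `n_iter`, `cv`, and the largest number of estimators, which is why a wide search on full Sentinel-1 scenes can take hours.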

We can visualize the result again.

In [None]:
geodata.tif_visualize(path, out_folder_prediction, name_tuned_predicted_image, raster_ext) #visualize a tuned prediction

And save the result as a GeoTIFF.

In [None]:
geodata.prediction_to_gtiff(tuned_prediction, path, out_folder_prediction, name_tuned_predicted_image, out_ref_p,
                               raster_ext, mask)

At the end, we can again perform the accuracy assessment. The following metrics are
evaluated:
<ul>
<li>Confusion matrix (CM)</li>
<li>Kappa statistic (Kappa)</li>
<li>Classification report (precision, recall, f1-score, support)</li>
</ul>

In [None]:
# use the module name imported in section 1 (accuracy_assessment)
accuracy_assessment.accuracy_assessment(tuned_prediction, out_ref_p)