# Model 2: High resolution prediction of flood maps

The output of this model is used to measure the global performance of the pipeline.

**The last section of this notebook creates a submission file for the challenge.**

In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import src.baseline_model02 as bm
import shutil

# Training

The next cell allows you to quickly test models by reducing the amount of data used.

*nb_train_minicube*, *nb_test_minicube* and *nb_val_minicube* limit the number of "mini data cubes" used in the train/test/val sets, which reduces the computational cost of training and hyperparameter exploration. We also only keep minicubes above the *min_score_model1* threshold.

If you choose a high number of minicubes, the threshold should be low enough for that many minicubes to pass it.
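
The interaction between the minicube count and the score threshold can be sketched as follows. This is only an illustration, not the code of `bm.BaseLineModel`: the function `select_minicubes` and the idea of one score per minicube are assumptions made for the example.

```python
import numpy as np

def select_minicubes(scores, min_score, nb_max, seed=1):
    """Return the indices of at most nb_max minicubes with score >= min_score.

    Hypothetical helper: `scores` holds one Model 1 score per minicube.
    """
    scores = np.asarray(scores, dtype=float)
    # Keep only the minicubes that pass the threshold
    eligible = np.flatnonzero(scores >= min_score)
    if eligible.size <= nb_max:
        # Not enough candidates: fewer minicubes than requested are returned
        return eligible
    # Otherwise draw nb_max of them at random, reproducibly
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(eligible, size=nb_max, replace=False))

scores = [0.05, 0.35, 0.10, 0.80, 0.25, 0.50]
picked = select_minicubes(scores, min_score=0.2, nb_max=3)
print(picked)
```

With a threshold that is too high, fewer minicubes pass the filter than requested, which is why a large `nb_*_minicube` value calls for a lower `min_score_model1`.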


In [None]:
baseline_model_generator_test = bm.BaseLineModel(
    "localdata/smallbox/label/label_",
    dynamic_features_path = "localdata/Model1_score_ERA5_Rez_v2.nc",
    static_features_root_path = "localdata/smallbox/static/static_",
    dynamic_features_FR_path = "localdata/Model1_Score_Full_Rez_v2.nc",
    inf_dynamic_features_FR_path = "localdata/Model1_Score_Full_Rez_inf.nc",
    static_features_FR_path = "localdata/static_Full_Rez.nc",
    labels_ERA5_path = "localdata/final_label_Full_ERA5.nc",
    labels_FR_path = "localdata/final_label_Full_Rez.nc",
    nb_train_minicube = 80, # These values are very small; for good performance you will need more datacubes
    nb_test_minicube = 80, # These values are very small; for an appropriate test you will need more datacubes
    nb_val_minicube = 20,
    min_score_model1 = 0.2,
    name="Baseline_Model_2_Small_20_02",
    seed=1
    )

### Preparation of the train / test / val dataset

With a high number of minicubes or a low threshold, this process can still take quite long. The vectorised train / test / val sets can be saved to save time when training several models on the same data.

In [None]:
baseline_model_generator_test.prepare_data()

### Training

Training a Random Forest with all features, 150 trees and depth 8.

In [None]:
baseline_model_generator_test.load_indiv([True, #soilgrid_bdod
                                          True, #soilgrid_cfvo
                                          True, #soilgrid_silt
                                          True, #soilgrid_clay
                                          True, #soilgrid_sand
                                          True, #depth_to_bedrock
                                          True, #altitude
                                          True, #aspect
                                          True, #slope
                                          True, #water_density
                                          True, #watershed
                                          True, #topological_catchment_areas
                                          True, #dist_sea
                                          True, #dist_riv
                                          True, #M1_score
                                          150, 
                                          8], 
                                     False)
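
The list passed to `load_indiv` above reads as one boolean per feature followed by two integers. A minimal sketch of how such a list could be decoded is given below, assuming the two trailing values are the number of trees and the depth (as the text states). `decode_individual` is a hypothetical helper, not part of `src.baseline_model02`, and the meaning of the final `False` argument is not covered here.

```python
# Feature order taken from the comments in the cell above
FEATURE_NAMES = [
    "soilgrid_bdod", "soilgrid_cfvo", "soilgrid_silt", "soilgrid_clay",
    "soilgrid_sand", "depth_to_bedrock", "altitude", "aspect", "slope",
    "water_density", "watershed", "topological_catchment_areas",
    "dist_sea", "dist_riv", "M1_score",
]

def decode_individual(individual):
    """Split an individual into (selected features, n_trees, depth)."""
    n = len(FEATURE_NAMES)
    if len(individual) != n + 2:
        raise ValueError(f"expected {n + 2} values, got {len(individual)}")
    mask = individual[:n]
    n_trees, depth = individual[n], individual[n + 1]
    # Keep the names whose flag is True
    selected = [name for name, keep in zip(FEATURE_NAMES, mask) if keep]
    return selected, n_trees, depth

individual = [True] * 15 + [150, 8]
selected, n_trees, depth = decode_individual(individual)
print(len(selected), n_trees, depth)
```

Setting a flag to `False` in the list would drop the corresponding feature under this reading.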

### Saving and loading model

The vectorised train/test/validation datasets and the trained models are saved (the full test set is saved independently).

In [None]:
baseline_model_generator_test.save_to_disk()

# To load a previously trained model:
# baseline_model_generator_test = baseline_model_generator_test.load_from_disk("Baseline_Model_2_Small_20_02")

### Hyper parameters search

Using Genetic Algorithms for hyper parameters optimisation.

In [None]:
# baseline_model_generator_test.GA_optimisation(ngen = 40, pop = 60)
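
To give an idea of what a genetic search over such individuals involves, here is a toy sketch. It is not the implementation behind `GA_optimisation`: the value ranges, the operators and `toy_fitness` are all assumptions, and a real fitness would be a validation score of the trained model.

```python
import random

N_FEATURES = 15          # one boolean flag per feature
TREE_RANGE = (10, 300)   # assumed search range for the number of trees
DEPTH_RANGE = (2, 16)    # assumed search range for the depth

def random_individual(rng):
    mask = [rng.random() < 0.5 for _ in range(N_FEATURES)]
    return mask + [rng.randint(*TREE_RANGE), rng.randint(*DEPTH_RANGE)]

def mutate(ind, rng, rate=0.1):
    child = list(ind)
    for i in range(N_FEATURES):
        if rng.random() < rate:
            child[i] = not child[i]  # flip a feature flag
    if rng.random() < rate:
        child[N_FEATURES] = rng.randint(*TREE_RANGE)
    if rng.random() < rate:
        child[N_FEATURES + 1] = rng.randint(*DEPTH_RANGE)
    return child

def crossover(a, b, rng):
    cut = rng.randint(1, len(a) - 1)  # single-point crossover
    return a[:cut] + b[cut:]

def toy_fitness(ind):
    # Placeholder objective: more features is better, depth close to 8 is better
    return sum(ind[:N_FEATURES]) - abs(ind[N_FEATURES + 1] - 8)

def run_ga(fitness, ngen=40, pop=60, seed=1):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop)]
    for _ in range(ngen):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = run_ga(toy_fitness, ngen=20, pop=30)
print(best, toy_fitness(best))
```

Each fitness evaluation in a real search trains a model, so `ngen` and `pop` drive the total cost directly.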

# Model Analysis

### Feature importance

In [None]:
baseline_model_generator_test.print_feature_importance()

### Geographical results

##### Prediction score Map

##### False Positive, True Positive, False Negative Mapping

In [None]:
baseline_model_generator_test.load_FullRez()

In [None]:
baseline_model_generator_test.print_TNTPFN(save_path="graph/Model2/TNTPFN/", thresholdM1=0.5, thresholdM2=0.5)
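
As an illustration of what a TN/TP/FP/FN map contains, the sketch below labels each pixel from a probability map and a binary label map. It uses a single threshold; how `print_TNTPFN` combines `thresholdM1` and `thresholdM2` is not shown in this notebook, so `confusion_map` and its integer codes are assumptions made for the example.

```python
import numpy as np

TN, TP, FP, FN = 0, 1, 2, 3  # arbitrary integer codes for the four outcomes

def confusion_map(proba, label, threshold=0.5):
    """Return an array of the same shape with one outcome code per pixel."""
    pred = np.asarray(proba) >= threshold
    truth = np.asarray(label).astype(bool)
    out = np.full(pred.shape, TN, dtype=np.uint8)
    out[pred & truth] = TP
    out[pred & ~truth] = FP
    out[~pred & truth] = FN
    return out

proba = np.array([[0.9, 0.2], [0.7, 0.4]])
label = np.array([[1, 0], [0, 1]])
cmap = confusion_map(proba, label)
print(cmap)
```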

In [None]:
baseline_model_generator_test.print_proba(save_path="graph/Model2/Proba/", thresholdM1=0.5, thresholdM2=0.5)

### Performance analysis

#### Pre-processing of the full test data

Loading the Full Test Dataset from disk. To compute the performance of the model, we need to flatten it.

Flattening the Full test set (all France data on the defined time slices) can take quite long.
Furthermore, the *Full test set* is fixed by nature, so we process the flattened *Full Test Set* independently.

Once you have run this step, you don't need to run it again as long as the outputs of your first model don't change.

In [None]:
baseline_model_generator_test.prepare_data(compute_full_test_set=True) #This will take a while, only do it one time
baseline_model_generator_test.save_full_test_to_disk(name="Full") #Saving the results to disk
baseline_model_generator_test.load_full_test_from_disk(name="Full") #Loading the results from disk, start from here if you already computed the full test set
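
The flattening step mentioned above can be pictured as turning a gridded cube into a table with one row per pixel and time step. The sketch below assumes a `(time, y, x, feature)` layout and drops rows with missing values; `flatten_cube` is a hypothetical helper and the actual layout used by `prepare_data` may differ.

```python
import numpy as np

def flatten_cube(features, labels):
    """Flatten features (time, y, x, n_features) and labels (time, y, x)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)
    X = features.reshape(-1, features.shape[-1])  # one row per pixel and time step
    y = labels.reshape(-1)
    # Drop rows where any feature or the label is missing
    valid = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[valid], y[valid]

rng = np.random.default_rng(0)
features = rng.random((2, 3, 4, 5))                 # 2 dates, 3x4 grid, 5 features
labels = (rng.random((2, 3, 4)) > 0.8).astype(float)
features[0, 0, 0, 0] = np.nan                       # simulate one missing value
X, y = flatten_cube(features, labels)
print(X.shape, y.shape)
```

The number of rows grows with the grid size times the number of time slices, which is why this step is slow for the full test set and worth caching on disk.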

#### ROC plots

In [None]:
baseline_model_generator_test.auc_graph("Full_Test", "", [0.01,0.05,0.1,0.15, 0.2,0.3, 0.5, 0.9])
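
For readers unfamiliar with ROC curves, here is a minimal sketch of how ROC points and an approximate AUC can be computed on a fixed grid of thresholds. It treats the list of values as decision thresholds, which is an assumption: the role of the list passed to `auc_graph` is not documented in this notebook, and `roc_points` and `auc_trapezoid` are hypothetical helpers.

```python
import numpy as np

def roc_points(y_true, y_score, thresholds):
    """Return (fpr, tpr) arrays, one point per threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    fpr, tpr = [], []
    for t in thresholds:
        pred = y_score >= t
        tpr.append((pred & y_true).sum() / n_pos)
        fpr.append((pred & ~y_true).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC points, with (0, 0) and (1, 1) added as end points."""
    fpr = np.concatenate(([0.0], fpr, [1.0]))
    tpr = np.concatenate(([0.0], tpr, [1.0]))
    order = np.lexsort((tpr, fpr))  # sort by fpr, then by tpr
    fpr, tpr = fpr[order], tpr[order]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

thresholds = [0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 0.9]
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score, thresholds)
auc = auc_trapezoid(fpr, tpr)
print(auc)
```

A coarse threshold grid only approximates the curve, so the area obtained this way can differ from the exact AUC computed over all distinct scores.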

In [None]:
baseline_model_generator_test.process_AUC_metrics(filter=False)

In [None]:
baseline_model_generator_test.process_prediction_metrics(filter=False)

# Computation of predictions for codabench

#### Data loading

In [None]:
baseline_model_generator_test.load_InfRez()


#### Printing of the prediction map

In [None]:
baseline_model_generator_test.print_proba_inf(save_path="graph/Model2/inference/")

#### Saving predictions

In [None]:
baseline_model_generator_test.save_full_pred()

Loading of the previously computed predictions