## Data preparation

For the algorithm to work correctly, you need to prepare a training sample and the matrices whose gaps are to be filled. Create a project folder (LST, NDVI, or any other name) that contains the following subfolders (the subfolder names are fixed):

![Database.png](https://raw.githubusercontent.com/Dreamlone/SSGP-toolbox/master/Supplementary/images/rm_1_Database.png)

## 1. History - folder with matrices in the .npy format - training sample

File names must follow the pattern "20190625T185030.npy", where 2019 is the year, 06 the month, 25 the day, and T185030 the time in hours, minutes, and seconds (format = '%Y%m%dT%H%M%S'). Matrices in the training sample may contain gaps. During training, the algorithm either removes these gaps from the training sample or replaces them with the median of the time series for that pixel.
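The median replacement described above can be illustrated with a minimal sketch. This is not the toolbox's own code: the array `history`, its (time, rows, columns) layout, and the helper name `fill_with_pixel_median` are assumptions made for illustration. Only the default gap value -100.0 is taken from this document.

```python
import numpy as np

def fill_with_pixel_median(history, gap=-100.0):
    """Replace gap cells with the per-pixel median over time (hypothetical helper)."""
    stack = np.asarray(history, dtype=float).copy()
    masked = np.where(stack == gap, np.nan, stack)  # hide gaps from the median
    medians = np.nanmedian(masked, axis=0)          # one median per pixel
    idx = np.where(stack == gap)                    # (time, row, col) indices of gaps
    stack[idx] = medians[idx[1], idx[2]]
    return stack

# Three time steps of a 2x2 scene; one cell in the second step is a gap
history = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[12.0, -100.0], [32.0, 44.0]],
    [[14.0, 24.0], [34.0, 48.0]],
])
filled = fill_with_pixel_median(history)
```

In this sketch a pixel that is a gap in every layer would end up as NaN, so such pixels would need separate handling.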

## 2. Inputs - a folder with matrices in .npy format that must be filled in

File names must follow the pattern "20190625T185030.npy", where 2019 is the year, 06 the month, 25 the day, and T185030 the time in hours, minutes, and seconds (format = '%Y%m%dT%H%M%S').
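A file name can be checked against this pattern with the standard library before the matrix is placed in a folder. The helper name `parse_matrix_name` is hypothetical; the format string is the one given above.

```python
from datetime import datetime

NAME_FORMAT = '%Y%m%dT%H%M%S'

def parse_matrix_name(file_name):
    """Return the timestamp encoded in a name such as '20190625T185030.npy'."""
    stem, extension = file_name.rsplit('.', 1)
    if extension != 'npy':
        raise ValueError(f'Expected a .npy file, got {file_name!r}')
    # strptime raises ValueError if the stem does not match the format
    return datetime.strptime(stem, NAME_FORMAT)

stamp = parse_matrix_name('20190625T185030.npy')
print(stamp)  # 2019-06-25 18:50:30

# The reverse direction: build a valid file name from a timestamp
file_name = stamp.strftime(NAME_FORMAT) + '.npy'
```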

## 3. Extra - a folder with a matrix in .npy format that allows you to divide matrix cells into groups. The file name must be "Extra.npy"

The matrix can look like this:

![Biomes.png](https://raw.githubusercontent.com/Dreamlone/SSGP-toolbox/master/Supplementary/images/rm_2_Biomes.png)

This matrix must contain only integer values.
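A minimal sketch of building such a matrix and saving it under the required name. The 4x4 size, the two group labels, and the temporary directory standing in for the project folder are assumptions for illustration; in practice the matrix would presumably have the same shape as the matrices in "History" and "Inputs".

```python
import os
import tempfile
import numpy as np

# Hypothetical 4x4 scene split into two groups (for example, biomes) labelled 1 and 2
extra = np.ones((4, 4), dtype=int)
extra[:, 2:] = 2

# A temporary directory stands in for the project folder in this sketch
project_dir = tempfile.mkdtemp()
extra_dir = os.path.join(project_dir, 'Extra')
os.makedirs(extra_dir, exist_ok=True)

extra_path = os.path.join(extra_dir, 'Extra.npy')
np.save(extra_path, extra)

# Reload to confirm that the integer dtype survives the round trip
loaded = np.load(extra_path)
```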

## The values in the matrices

1. gap --- the value that marks pixels to be filled in (the default is "-100.0")

2. skip --- the value that marks pixels with no data that do not need to be filled in, such as sea water when only land surface temperature should be restored. The algorithm checks whether each pixel had the skip value in the input matrix and, if it did, sets the output for that pixel to skip instead of the value predicted by the model (the default is "-200.0")

3. NoData --- the value that marks pixels outside the image's extent. This value may also indicate errors that occurred when the raster was reprojected. If the share of pixels with this value in a matrix from the 'History' folder exceeds a set fraction of the image (self.main_threshold = 0.05), then this matrix is not included in the training sample (the default is "-32768.0")

The algorithm only fills in pixels with "gap" values.
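The NoData rule for training matrices can be sketched as follows. The helper name `nodata_share` is hypothetical, and reading `main_threshold = 0.05` as a fraction of all cells in the matrix is an assumption based on the description above, not a statement about the toolbox internals.

```python
import numpy as np

def nodata_share(matrix, nodata=-32768.0):
    """Fraction of cells equal to the NoData value (hypothetical helper)."""
    matrix = np.asarray(matrix)
    return float(np.mean(matrix == nodata))

main_threshold = 0.05  # value quoted above for self.main_threshold

matrix = np.full((10, 10), 25.0)
matrix[0, :3] = -32768.0                       # 3 of 100 cells are NoData
keep = nodata_share(matrix) <= main_threshold  # share does not exceed the threshold

matrix[1, :] = -32768.0                        # now 13 of 100 cells are NoData
keep_after = nodata_share(matrix) <= main_threshold
```

Here `keep` is `True` and `keep_after` is `False`, so only the first version of the matrix would stay in the training sample under this reading of the rule.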

If the matrix to be filled in has fewer than 101 known pixels (i.e. pixels that are not "gap", "skip", or "NoData"), the algorithm does not fill it in. The message "No calculation for matrix NAME_OF_MATRIX" is displayed, and the matrix is not added to the "Outputs" folder.
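Whether a matrix passes this check can be estimated before running the algorithm. The sketch below counts the cells that carry none of the three flag values; the helper name `count_known_pixels` and the constant `MIN_KNOWN` are assumptions derived from the 101-pixel limit stated above.

```python
import numpy as np

KEY_VALUES = {'gap': -100.0, 'skip': -200.0, 'NoData': -32768.0}
MIN_KNOWN = 101  # from the text: fewer than 101 known pixels means no calculation

def count_known_pixels(matrix, key_values=KEY_VALUES):
    """Count cells that are neither gap, skip, nor NoData (hypothetical helper)."""
    flagged = np.isin(matrix, list(key_values.values()))
    return int(np.size(matrix) - np.count_nonzero(flagged))

matrix = np.full((20, 20), 15.0)  # 400 known cells
matrix[:15, :] = -100.0           # 300 gaps leave 100 known cells
known_before = count_known_pixels(matrix)
can_fill_before = known_before >= MIN_KNOWN

matrix[14, 0] = 15.0              # one more known cell: 101
known_after = count_known_pixels(matrix)
can_fill_after = known_after >= MIN_KNOWN
```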

If the matrix has no gaps, the message "No gaps in matrix NAME_OF_MATRIX" will appear on the screen. The matrix is automatically added to the "Outputs" folder.

Thus:
- The training sample should be placed in the "History" folder
- The matrices to be filled in should be placed in the "Inputs" folder
- The "Extra" folder is optional and contains a single matrix "Extra.npy"
- The "Outputs" folder is generated while the algorithm is running
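The layout summarised above can be created with a few lines of standard-library code. The project name 'LST' and the temporary parent directory are placeholders for illustration; "Outputs" is deliberately not created because the algorithm generates it.

```python
import os
import tempfile

# A temporary directory stands in for the place where the project should live
project_dir = os.path.join(tempfile.mkdtemp(), 'LST')

# 'Extra' is optional; 'Outputs' is generated by the algorithm itself
for name in ('History', 'Inputs', 'Extra'):
    os.makedirs(os.path.join(project_dir, name), exist_ok=True)

created = sorted(os.listdir(project_dir))
print(created)  # ['Extra', 'History', 'Inputs']
```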

When the algorithm finishes, the 'Outputs' folder contains the filled matrices saved in the .npy format. A .json file is also created with the accuracy metrics of the algorithm for each layer. Accuracy is evaluated by cross-validation on data from the training sample.
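The results can be read back with NumPy and the `json` module. The sketch below is an assumption-heavy illustration: the helper name `load_outputs` is hypothetical, and because the name and layout of the .json file are not described here, the code simply collects every .json file it finds. The stand-in folder, the file name `report.json`, and its content are made up so that the example runs on its own.

```python
import glob
import json
import os
import tempfile
import numpy as np

def load_outputs(outputs_dir):
    """Load all .npy matrices and all .json reports from a folder (hypothetical helper)."""
    matrices = {}
    for path in sorted(glob.glob(os.path.join(outputs_dir, '*.npy'))):
        matrices[os.path.basename(path)] = np.load(path)
    reports = {}
    for path in sorted(glob.glob(os.path.join(outputs_dir, '*.json'))):
        with open(path) as handle:
            reports[os.path.basename(path)] = json.load(handle)
    return matrices, reports

# Stand-in for a real 'Outputs' folder; file name and JSON content are placeholders
outputs_dir = os.path.join(tempfile.mkdtemp(), 'Outputs')
os.makedirs(outputs_dir)
np.save(os.path.join(outputs_dir, '20190625T185030.npy'), np.zeros((2, 2)))
with open(os.path.join(outputs_dir, 'report.json'), 'w') as handle:
    json.dump({'placeholder': 0.0}, handle)

matrices, reports = load_outputs(outputs_dir)
```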

## Parameters

### Selecting an algorithm for filling in gaps - method
- DEFAULT, 'Lasso' - Lasso regression
- 'RandomForest' - random forest
- 'ExtraTrees' - extremely randomized trees (extra trees)
- 'Knn' - k-nearest neighbors
- 'SVR' - support vector regression

### Strategies for selecting predictors - predictor_configuration
- DEFAULT, 'Random' - 100 randomly selected points in the matrix are used as predictors
- 'All' - all known cells in the matrix are used as predictors (the runtime is very long)
- 'Biome' - the 40 pixels closest to the gap (by Euclidean distance) from the same biome as the gap are used as predictors
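The idea behind the 'Biome' strategy can be sketched with NumPy. This is an illustration of the selection rule as described above, not the toolbox's implementation: the helper name `closest_same_group`, the boolean `known_mask`, and the 10x10 test scene are all assumptions.

```python
import numpy as np

def closest_same_group(extra, known_mask, target, n_predictors=40):
    """Return (row, col) pairs of the nearest known cells in the target's group."""
    row, col = target
    # Candidates: known cells that share the group label of the target cell
    candidates = np.argwhere((extra == extra[row, col]) & known_mask)
    # Drop the target itself in case it is marked as known
    candidates = candidates[~np.all(candidates == (row, col), axis=1)]
    # Euclidean distance in pixel coordinates
    distances = np.hypot(candidates[:, 0] - row, candidates[:, 1] - col)
    order = np.argsort(distances, kind='stable')[:n_predictors]
    return candidates[order]

# Hypothetical scene: left half is group 1, right half is group 2
extra = np.ones((10, 10), dtype=int)
extra[:, 5:] = 2
known = np.ones((10, 10), dtype=bool)
known[4, 2] = False  # the gap to be filled

predictors = closest_same_group(extra, known, (4, 2))
```

All 40 selected cells lie in the left half of the scene, because cells from the other group are never considered even when they are closer to the gap.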

### Options for configuring hyperparameters - hyperparameters
- DEFAULT, 'RandomGridSearch' - random grid search
- 'GridSearch' - full grid search
- 'Custom' - custom settings (dictionary)

### Dictionary with hyperparameters (if hyperparameters = 'Custom') - params
- DEFAULT, None. If hyperparameters != 'Custom', this parameter is ignored

### Ability to use filled layers - add_outputs
- DEFAULT, False, filled layers are not added to the training sample
- True - in this case the matrices filled in by the algorithm are included in the training sample

### Dictionary with gaps, skip and NoData values - key_values
- DEFAULT, {'gap': -100.0, 'skip': -200.0, 'NoData': -32768.0}
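Before a matrix is saved to "History" or "Inputs", the flag values from `key_values` have to be written into it. A minimal sketch, assuming that cells outside the image extent arrive as NaN and that boolean masks `cloud_mask` and `water_mask` are available (both names are hypothetical, and how such masks are obtained is outside the scope of this document):

```python
import numpy as np

key_values = {'gap': -100.0, 'skip': -200.0, 'NoData': -32768.0}

def encode_matrix(values, cloud_mask, water_mask, key_values=key_values):
    """Write gap, skip and NoData flags into a copy of the matrix (hypothetical helper)."""
    out = np.asarray(values, dtype=float).copy()
    out[np.isnan(out)] = key_values['NoData']  # cells outside the extent
    out[cloud_mask] = key_values['gap']        # cells to be filled in
    out[water_mask] = key_values['skip']       # cells that must stay unfilled
    return out

values = np.array([[1.0, np.nan], [3.0, 4.0]])
cloud_mask = np.array([[False, False], [True, False]])
water_mask = np.array([[False, False], [False, True]])

encoded = encode_matrix(values, cloud_mask, water_mask)
```

The skip mask is applied last in this sketch, so a cell that is both cloudy and water ends up as skip rather than gap.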

## Examples

In [None]:
from SSGPToolbox.Gapfiller import SimpleSpatialGapfiller
# Additional imports
import os

The selected method is support vector regression. The strategy for selecting predictors is "Biome". Hyperparameters are passed as custom settings in the form of a dictionary. The "add_outputs" and "key_values" parameters keep their default values.

In [None]:
Gapfiller_SVR = SimpleSpatialGapfiller(directory = os.path.join(os.pardir, 'Samples', 'S3LST_gapfilling_example'))
Gapfiller_SVR.fill_gaps(method = 'SVR', predictor_configuration = 'Biome',
                        hyperparameters = 'Custom',
                        params = {'kernel': 'linear', 'gamma': 'scale', 'C': 1000, 'epsilon': 1})

Another example of applying the algorithm. The selected method is Lasso regression. The strategy for selecting predictors is "Random" (100 random points). Hyperparameters are selected by full grid search. The matrices filled in by the algorithm are included in the training sample for subsequent layers, and custom gap, skip, and NoData values are passed through "key_values".

In [None]:
Gapfiller_LASSO = SimpleSpatialGapfiller(directory = os.path.join(os.pardir, 'Samples', 'S3LST_gapfilling_example'))
Gapfiller_LASSO.fill_gaps(method = 'Lasso', predictor_configuration = 'Random',
                          hyperparameters = 'GridSearch', add_outputs = True,
                          key_values = {'gap': -1.0, 'skip': -10.0, 'NoData': -100.0})