# Discretizator - a class for building time series and filling in their gaps

To analyze the received data, it is often necessary to place it on a regular time grid, and this class does that. After sampling, the time series usually contains gaps, because the parameter values are unknown at some points in time. The class can fill these gaps with a local median, approximate them with local polynomials, or fill them with a chosen value.

![Discretization.png](https://raw.githubusercontent.com/Dreamlone/SSGP-toolbox/master/Supplementary/images/rm_6_Discretization.png)

## Data preparation

For the algorithm to work correctly, put all .npy matrices into one folder. File names must follow the pattern "20190625T185030.npy", where 2019 is the year, 06 is the month, 25 is the day, and T185030 is the time as hours, minutes, and seconds (format = '%Y%m%dT%H%M%S').
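The naming convention can be checked before running the algorithm. The following is a minimal sketch that uses only the standard library; the helper name `parse_timestamp` is hypothetical and is not part of SSGP-toolbox.

```python
from datetime import datetime
import os


def parse_timestamp(file_name):
    """Return the datetime encoded in a name such as '20190625T185030.npy'."""
    stem, extension = os.path.splitext(os.path.basename(file_name))
    if extension != '.npy':
        raise ValueError(f'Expected a .npy file, got {file_name!r}')
    # strptime raises ValueError if the stem does not match the format
    return datetime.strptime(stem, '%Y%m%dT%H%M%S')


print(parse_timestamp('20190625T185030.npy'))  # 2019-06-25 18:50:30
```

A file whose name does not match the format raises `ValueError`, which makes it easy to spot misnamed files in the input folder.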

## Parameters

### Strategy if several layers fall within the same interval - averaging
- DEFAULT, 'None' - in case of a collision, the matrix closest to the timestamp is used for this timestamp
- 'simple' - in case of a collision, the average value for all matrices is used
- 'weighted' - in case of a collision, averaging is used so that the layer closest to the timestamp has the highest weight
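The three strategies above can be illustrated on two small matrices. This is only a sketch, not the library's implementation: the function `combine_layers` is hypothetical, the inverse-distance weighting is an assumption (the text only states that the closest layer gets the highest weight), and gap values are ignored for brevity.

```python
import numpy as np


def combine_layers(layers, offsets_seconds, averaging=None):
    """Combine 2-D layers that fall within the same interval.

    layers          -- list of 2-D arrays with identical shape
    offsets_seconds -- absolute distance of each layer from the grid timestamp
    """
    stack = np.stack(layers).astype(float)
    offsets = np.asarray(offsets_seconds, dtype=float)
    if averaging in (None, 'None'):
        # Default: keep only the layer closest to the timestamp
        return stack[np.argmin(offsets)]
    if averaging == 'simple':
        return stack.mean(axis=0)
    if averaging == 'weighted':
        # Assumed weighting: inverse distance, so closer layers weigh more
        weights = 1.0 / (offsets + 1.0)
        return np.average(stack, axis=0, weights=weights)
    raise ValueError(f'Unknown averaging strategy: {averaging!r}')


a = np.full((2, 2), 10.0)  # 1 hour from the timestamp
b = np.full((2, 2), 20.0)  # 3 hours from the timestamp
offsets = [3600, 10800]
print(combine_layers([a, b], offsets)[0, 0])            # 10.0
print(combine_layers([a, b], offsets, 'simple')[0, 0])  # 15.0
print(combine_layers([a, b], offsets, 'weighted')[0, 0])
```

With the weighted strategy the result lies between the closest layer and the simple mean, because the closer layer contributes more.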

### Method for filling in gaps - filling_method
- DEFAULT, 'None' - all missing values will be filled in with the "gap" number
- 'median' - gaps are filled with the local median of the time series (5 known neighbors are used)
- 'poly' - gaps are filled using a local polynomial approximation of degree 2 (5 known neighbors are used)
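To make the two local methods concrete, here is a sketch for a single one-dimensional series (for example, one pixel through time). It is not the library's code: the function `fill_gaps` is hypothetical, and choosing the 5 known points closest in index to the gap is an assumption about how "neighbors" are selected.

```python
import numpy as np


def fill_gaps(series, gap=-100.0, filling_method='median', n_neighbors=5):
    """Fill gap values in a 1-D series from the nearest known points."""
    series = np.asarray(series, dtype=float)
    filled = series.copy()
    known = np.flatnonzero(series != gap)
    for i in np.flatnonzero(series == gap):
        # Indices of the known points closest in time to the gap;
        # a stable sort keeps the choice deterministic when distances tie
        order = np.argsort(np.abs(known - i), kind='stable')
        nearest = known[order[:n_neighbors]]
        if filling_method == 'median':
            filled[i] = np.median(series[nearest])
        elif filling_method == 'poly':
            # Least-squares polynomial of degree 2 through the neighbors
            coefficients = np.polyfit(nearest, series[nearest], deg=2)
            filled[i] = np.polyval(coefficients, i)
        else:
            raise ValueError(f'Unknown filling method: {filling_method!r}')
    return filled


# y = x**2 with a gap at x = 3
series = [0.0, 1.0, 4.0, -100.0, 16.0, 25.0, 36.0]
print(fill_gaps(series, filling_method='median'))
print(fill_gaps(series, filling_method='poly'))
```

On this smooth series the polynomial fit recovers a value close to 9 at the gap, while the median returns one of the neighboring observations, which shows why 'poly' tends to suit data with a clear local trend.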

### Time step of sampling - timestep
- DEFAULT, '12H' - time interval as a string, for example '6H' - 6 hours, '2D' - 2 days, etc.
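As an illustration of what a regular grid with a given step looks like, the sketch below builds one with the standard library. The helper `regular_grid` is hypothetical and understands only the 'H' and 'D' suffixes mentioned above; the class itself may accept other interval strings.

```python
from datetime import datetime, timedelta


def regular_grid(start, end, timestep='12H'):
    """Return evenly spaced timestamps from start to end (inclusive)."""
    units = {'H': 'hours', 'D': 'days'}
    amount, unit = int(timestep[:-1]), timestep[-1]
    step = timedelta(**{units[unit]: amount})
    grid = []
    current = start
    while current <= end:
        grid.append(current)
        current += step
    return grid


grid = regular_grid(datetime(2019, 9, 10), datetime(2019, 9, 12), '12H')
print(len(grid))  # 5
print(grid[1])    # 2019-09-10 12:00:00
```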

### Dictionary with gap and skip values - key_values
- DEFAULT, {'gap': -100.0, 'skip': -200.0}
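The flag values can be used to separate valid cells from flagged ones in any input matrix. The sketch below assumes that 'gap' marks missing values that should be filled and that 'skip' marks cells to be left out of processing; the second meaning is an assumption, since it is not spelled out here. The matrix is made-up example data.

```python
import numpy as np

key_values = {'gap': -100.0, 'skip': -200.0}

# Made-up 2x2 matrix with one gap cell and one skip cell
matrix = np.array([[21.5, -100.0],
                   [-200.0, 19.0]])

gap_mask = matrix == key_values['gap']
skip_mask = matrix == key_values['skip']
valid_mask = ~(gap_mask | skip_mask)

print(int(gap_mask.sum()), int(skip_mask.sum()))  # 1 1
print(matrix[valid_mask].mean())                  # 20.25
```

Choosing flag values far outside the physical range of the parameter keeps them from being confused with real measurements.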

## Examples

```python
from SSGPToolbox.TimeSeries import Discretizator
import os
```

The "make_time_series" method returns 2 objects: 
- tensor --- a multidimensional matrix made up of .npy matrices that were located in the "directory" folder
- tensor_timesteps --- array consisting of timestamp values

```python
# Input folder with .npy matrices, flag values, and collision strategy
D = Discretizator(directory=os.path.join(os.pardir, 'Samples', 'S3LST_timeseries_example', 'Inputs'),
                  key_values={'gap': -100.0, 'skip': -200.0}, averaging='weighted')
# Building the time series and filling the gaps in it
tensor, tensor_timesteps = D.make_time_series(timestep='12H', filling_method='poly')
```

```
The time series will be composed with frequency - 12H
Start date - 2019-09-10
Final date - 2019-09-17
```

```python
# Saving matrices to the folder
D.save_npy(tensor, tensor_timesteps,
           save_path=os.path.join(os.pardir, 'Samples', 'S3LST_timeseries_example', 'Outputs'))
```

```python
# Saving matrices as a multidimensional array in netCDF format
D.save_netcdf(tensor, tensor_timesteps,
              save_path=os.path.join(os.pardir, 'Samples', 'S3LST_timeseries_example', 'Outputs'))
```