<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

In the following we define the classes [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) and [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn), where KDE is short for 'Kernel Density Estimator' and the 'x' signals that both classes can be built on top of any point predictor. The name 'LevelSet' stems from the fact that every approach presented in this notebook interprets the values of the point forecasts as a similarity measure between samples. The point predictor is specified by the argument `estimator`; it must have a `.predict()` method and should have been trained beforehand.

Both classes [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) and [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) fulfill the same task: after running `.fit(XTrain, yTrain)` and then calling `.getWeights(XTest)`, they output an estimate of the conditional density for every sample in `XTest`. The basic idea of both approaches is also identical. Suppose we have a single test sample at hand. First, we compare the point prediction of this sample with the point predictions of the training samples, computed via `estimator.predict(XTest)` and `estimator.predict(XTrain)`, respectively. Based on this comparison, we select `binSize` many training samples that we deem most similar to the test sample. The concrete way in which these training samples are selected is the only difference between [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) and [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn). Finally, the empirical distribution of the y-values of the selected training samples serves as our estimate of the conditional distribution.

Further details on how both approaches work can be found below.
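The shared workflow can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: the function names, the `Selector` type, and the toy data are hypothetical, and the selection rule is passed in as a callable only to highlight that it is the single step in which the two classes differ.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical selector signature: (train predictions, test prediction, bin size) -> training indices
Selector = Callable[[Sequence[float], float, int], List[int]]


def closest_by_prediction(pred_train: Sequence[float], pred_test: float, bin_size: int) -> List[int]:
    """Illustrative selector: the bin_size training samples whose prediction is nearest to pred_test."""
    order = sorted(range(len(pred_train)), key=lambda i: abs(pred_train[i] - pred_test))
    return sorted(order[:bin_size])


def empirical_distribution(y_train: Sequence[float], indices: Sequence[int]) -> Dict[float, float]:
    """Relative frequency of every y-value among the selected training samples."""
    counts = Counter(y_train[i] for i in indices)
    return {value: count / len(indices) for value, count in counts.items()}


def estimate_conditional_distribution(pred_train, y_train, pred_test, bin_size,
                                      select: Selector = closest_by_prediction):
    """Select similar training samples, then return the empirical distribution of their y-values."""
    indices = select(pred_train, pred_test, bin_size)
    return empirical_distribution(y_train, indices)


# Toy data: the predictions stand in for estimator.predict(XTrain) and estimator.predict(XTest)
pred_train = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_train = [1, 3, 3, 9, 12, 12]
dist = estimate_conditional_distribution(pred_train, y_train, pred_test=2.2, bin_size=3)
print(dist)
```

Swapping `select` for a different rule changes which training samples are used, while the final step of forming the empirical distribution stays the same.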

## Base Level-Set Class

In [1]:
#| echo: false
#| output: asis
show_doc(BaseLSx)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L27){target="_blank" style="float:right; font-size:smaller"}

### BaseLSx

>      BaseLSx (estimator, binSize:int=None, weightsByDistance:bool=False)

Base class for the Level-Set based approaches. This class is not supposed to be used directly.
Use derived classes instead.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| estimator |  |  | Model with a `fit` and `predict` method (implementing the scikit-learn estimator interface). |
| binSize | int | None | Number of training samples considered for creating weights. |
| weightsByDistance | bool | False | Determines behaviour of method `getWeights`. If False, all weights receive the same  <br>value. If True, the distance of the point forecasts is taken into account. |
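The effect of `weightsByDistance` can be pictured with the following sketch. The documentation above does not state how the distance enters the weights, so the inverse-distance scheme used here is purely an assumption for illustration, as are the function names and the small constant `eps`.

```python
def uniform_weights(n_selected):
    """weightsByDistance=False: every selected training sample receives the same weight."""
    return [1.0 / n_selected] * n_selected


def inverse_distance_weights(pred_selected, pred_test, eps=1e-9):
    """Assumed scheme for weightsByDistance=True: weights proportional to 1 / distance.

    eps (hypothetical) avoids a division by zero when a prediction matches exactly.
    """
    raw = [1.0 / (abs(p - pred_test) + eps) for p in pred_selected]
    total = sum(raw)
    return [r / total for r in raw]


# Point predictions of three selected training samples and of one test sample
pred_selected = [4.0, 5.0, 7.0]
w_uniform = uniform_weights(len(pred_selected))
w_distance = inverse_distance_weights(pred_selected, pred_test=5.5)
print(w_uniform)
print(w_distance)
```

In both variants the weights sum to one, so they can be read as probabilities attached to the selected training samples.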

In [None]:
# show_doc(BaseLSx)

In [None]:
# show_doc(BaseLSx.pointPredict)

In [None]:
# show_doc(BaseLSx.refitPointEstimator)

## Level-Set Approach based on Bin Building

In [2]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L77){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx

>      LevelSetKDEx (estimator, binSize:int=None, weightsByDistance:bool=False)

[`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) turns any point forecasting model into an estimator of the underlying conditional density.
The name 'LevelSet' stems from the fact that this approach interprets the values of the point forecasts
as a similarity measure between samples. The point forecasts of the training samples are sorted and 
recursively assigned to a bin until the size of the current bin reaches `binSize` many samples. Then
a new bin is created and so on. For a new test sample we check into which bin its point prediction
would have fallen and interpret the y-values of the training samples in that bin as the empirical
distribution of this test sample.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| estimator |  |  | Model with a .fit and .predict-method (implementing the scikit-learn estimator interface). |
| binSize | int | None | Size of the bins created while running fit. |
| weightsByDistance | bool | False | Determines behaviour of method `getWeights`. If False, all weights receive the same  <br>value. If True, the distance of the point forecasts is taken into account. |
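The bin-building idea described above can be sketched as follows. This is a simplified illustration and not the code of `generateBins`: the helper names are hypothetical, and two details are assumptions made here, namely that a short remainder is merged into the last full bin and that a bin is identified by the smallest training prediction it contains.

```python
from bisect import bisect_right


def build_bins(pred_train, bin_size):
    """Sort training predictions and cut them into consecutive bins of bin_size samples."""
    order = sorted(range(len(pred_train)), key=lambda i: pred_train[i])
    bins = [order[start:start + bin_size] for start in range(0, len(order), bin_size)]
    # Assumption: a remainder smaller than bin_size is merged into the previous bin
    if len(bins) > 1 and len(bins[-1]) < bin_size:
        remainder = bins.pop()
        bins[-1].extend(remainder)
    # Assumption: each bin starts at the smallest training prediction it contains
    lower_bounds = [pred_train[b[0]] for b in bins]
    return bins, lower_bounds


def lookup_bin(lower_bounds, pred_test):
    """Index of the bin into which a test prediction falls."""
    position = bisect_right(lower_bounds, pred_test) - 1
    return max(position, 0)  # predictions below the first bound go to the first bin


pred_train = [0.5, 3.0, 1.0, 2.5, 9.0, 8.0, 7.5]
y_train = [10, 30, 12, 28, 95, 80, 77]
bins, lower_bounds = build_bins(pred_train, bin_size=3)

bin_index = lookup_bin(lower_bounds, pred_test=2.8)
y_selected = [y_train[i] for i in bins[bin_index]]
print(bins, lower_bounds, y_selected)
```

The y-values in `y_selected` then form the empirical distribution assigned to the test sample.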

In [3]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx.fit)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L108){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx.fit

>      LevelSetKDEx.fit (X:numpy.ndarray, y:numpy.ndarray)

Fit [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) model by grouping the point predictions of the samples specified via `X`
according to their value. Samples are recursively sorted into bins until each bin contains
`binSize` many samples. For details, check out the function [`generateBins`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#generatebins), which does the
heavy lifting.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| X | np.ndarray | Feature matrix used by `estimator` to predict `y`. |
| y | np.ndarray | 1-dimensional target variable corresponding to the feature matrix `X`. |

In [4]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx.getWeights)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L160){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx.getWeights

>      LevelSetKDEx.getWeights (X:numpy.ndarray,
>                               outputType:str='onlyPositiveWeights',
>                               scalingList:list=None)

Computes estimated conditional density for each sample specified by `X`. The concrete structure of each element 
of the returned list depends on the specified value of `outputType`:

- **all**: An array with the same length as the number of training samples. Each entry represents the probability 
  of the corresponding training sample.
- **onlyPositiveWeights**: A tuple. The first element of the tuple represents the probabilities and the second 
  one the indices of the corresponding training samples. Only probabilities greater than zero are returned. 
  Note: This is the most memory-efficient and computationally efficient output type.
- **summarized**: A tuple. The first element of the tuple represents the probabilities and the second one the 
  corresponding value of `yTrain`. The probabilities corresponding to identical values of `yTrain` are aggregated.
- **cumDistribution**: A tuple. The first element of the tuple represents the cumulative probabilities and the second 
  one the corresponding value of `yTrain`.
- **cumDistributionSummarized**: A tuple. The first element of the tuple represents the cumulative probabilities and 
  the second one the corresponding value of `yTrain`. The probabilities corresponding to identical values of `yTrain` are aggregated.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | np.ndarray |  | Feature matrix for which conditional density estimates are computed. |
| outputType | str | onlyPositiveWeights | Specifies structure of the returned density estimates. One of: <br>'all', 'onlyPositiveWeights', 'summarized', 'cumDistribution', 'cumDistributionSummarized' |
| scalingList | list | None | Optional. List with length X.shape[0]. The estimated density of each sample is multiplied <br>by the corresponding value for scaling purposes. |
| **Returns** | **list** |  | **List whose elements are the conditional density estimates for the samples specified by `X`.** |
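To make the relationship between the output types concrete, the sketch below derives three of them from a single weight vector of type `all`. The helper names are hypothetical and the code is not taken from the library. It assumes that summarized values are sorted in ascending order and that the `cumDistribution` variants hold running sums of the probabilities.

```python
from itertools import accumulate


def only_positive_weights(weights_all):
    """(probabilities, training indices) restricted to strictly positive weights."""
    indices = [i for i, w in enumerate(weights_all) if w > 0]
    return [weights_all[i] for i in indices], indices


def summarized(weights_all, y_train):
    """(probabilities, y-values) with weights of identical y-values added up."""
    totals = {}
    for w, y in zip(weights_all, y_train):
        if w > 0:
            totals[y] = totals.get(y, 0.0) + w
    values = sorted(totals)  # assumption: values are reported in ascending order
    return [totals[v] for v in values], values


def cum_distribution_summarized(weights_all, y_train):
    """(cumulative probabilities, y-values), assuming a running sum over ascending y-values."""
    probs, values = summarized(weights_all, y_train)
    return list(accumulate(probs)), values


# One test sample, five training samples
weights_all = [0.0, 0.25, 0.25, 0.0, 0.5]
y_train = [7, 3, 5, 1, 3]

positive = only_positive_weights(weights_all)
summary = summarized(weights_all, y_train)
cumulative = cum_distribution_summarized(weights_all, y_train)
print(positive, summary, cumulative)
```

The `onlyPositiveWeights` form stores only the non-zero entries, which is why it is the most compact representation when `binSize` is small relative to the number of training samples.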

#### Generate Bins

In [5]:
#| echo: false
#| output: asis
show_doc(generateBins)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L226){target="_blank" style="float:right; font-size:smaller"}

### generateBins

>      generateBins (binSize:int, yPred:numpy.ndarray)

Used to generate the bin-structure used by [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) to compute density estimations.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| binSize | int | Size of the bins of values of `yPred` being grouped together. |
| yPred | np.ndarray | 1-dimensional array of predicted values. |

## Level-Set Approach based on kNN

In [6]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx_kNN)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L267){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx_kNN

>      LevelSetKDEx_kNN (estimator, binSize:int=None,
>                        weightsByDistance:bool=False)

[`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) turns any point predictor that has a .predict-method 
into an estimator of the conditional density of the underlying distribution.
The basic idea of each level-set based approach is to interpret the point forecast
generated by the underlying point predictor as a similarity measure of samples.
In the case of the [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) defined here, for every new sample
the `binSize` many training samples are determined whose point forecasts are closest
to the point forecast of the new sample.
The resulting empirical distribution of these 'nearest' training samples is 
viewed as our estimate of the conditional distribution of the new sample 
at hand.

NOTE: In contrast to the standard [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex), it is possible to apply
[`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) to point predictors of arbitrary dimension.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| estimator |  |  | Model with a .fit and .predict-method (implementing the scikit-learn estimator interface). |
| binSize | int | None | Size of the bins created while running fit. |
| weightsByDistance | bool | False | Determines behaviour of method `getWeights`. If False, all weights receive the same  <br>value. If True, the distance of the point forecasts is taken into account. |
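The nearest-neighbor selection, including the multi-dimensional case mentioned in the note, can be sketched with plain Python. This is an illustration under assumptions rather than the library's code: the function name is hypothetical, the Euclidean distance is an assumed choice of metric, and a brute-force search replaces whatever neighbor search the class uses internally.

```python
import math


def nearest_training_samples(pred_train, pred_test, bin_size):
    """Indices of the bin_size training predictions closest to pred_test (Euclidean distance)."""
    distances = [math.dist(p, pred_test) for p in pred_train]
    order = sorted(range(len(pred_train)), key=lambda i: distances[i])
    return order[:bin_size]


# Two-dimensional point predictions, e.g. from a predictor with two output variables
pred_train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0), (1.0, 0.0)]
y_train = [10, 12, 50, 55, 11]

neighbors = nearest_training_samples(pred_train, pred_test=(0.9, 0.2), bin_size=2)
y_selected = [y_train[i] for i in neighbors]
weights = [1.0 / len(neighbors)] * len(neighbors)  # uniform weights, as with weightsByDistance=False
print(neighbors, y_selected, weights)
```

Unlike the bin-based approach, the selected neighborhood is centered on each test prediction, so two test samples with slightly different predictions can receive overlapping but not identical sets of training samples.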

In [7]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx_kNN.fit)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L303){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx_kNN.fit

>      LevelSetKDEx_kNN.fit (X:numpy.ndarray, y:numpy.ndarray)

Fit [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) model by applying the nearest neighbors algorithm to the point
predictions of the samples specified by `X` based on `estimator`.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| X | np.ndarray | Feature matrix used by `estimator` to predict `y`. |
| y | np.ndarray | 1-dimensional target variable corresponding to the feature matrix `X`. |

In [8]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx_kNN.getWeights)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L358){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx_kNN.getWeights

>      LevelSetKDEx_kNN.getWeights (X:numpy.ndarray,
>                                   outputType:str='onlyPositiveWeights',
>                                   scalingList:list=None)

Computes estimated conditional density for each sample specified by `X`. The concrete structure of each element 
of the returned list depends on the specified value of `outputType`:

- **all**: An array with the same length as the number of training samples. Each entry represents the probability 
  of the corresponding training sample.
- **onlyPositiveWeights**: A tuple. The first element of the tuple represents the probabilities and the second 
  one the indices of the corresponding training samples. Only probabilities greater than zero are returned. 
  Note: This is the most memory-efficient and computationally efficient output type.
- **summarized**: A tuple. The first element of the tuple represents the probabilities and the second one the 
  corresponding value of `yTrain`. The probabilities corresponding to identical values of `yTrain` are aggregated.
- **cumDistribution**: A tuple. The first element of the tuple represents the cumulative probabilities and the second 
  one the corresponding value of `yTrain`.
- **cumDistributionSummarized**: A tuple. The first element of the tuple represents the cumulative probabilities and 
  the second one the corresponding value of `yTrain`. The probabilities corresponding to identical values of `yTrain` are aggregated.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| X | np.ndarray |  | Feature matrix for which conditional density estimates are computed. |
| outputType | str | onlyPositiveWeights | Specifies structure of the returned density estimates. One of: <br>'all', 'onlyPositiveWeights', 'summarized', 'cumDistribution', 'cumDistributionSummarized' |
| scalingList | list | None | Optional. List with length X.shape[0]. The estimated density of each sample is multiplied <br>by the corresponding value for scaling purposes. |
| **Returns** | **list** |  | **List whose elements are the conditional density estimates for the samples specified by `X`.** |

## Level-Set Approach based on NN

In [9]:
#| echo: false
#| output: asis
show_doc(LevelSetKDEx_NN)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L444){target="_blank" style="float:right; font-size:smaller"}

### LevelSetKDEx_NN

>      LevelSetKDEx_NN (estimator, binSize:int=None,
>                       weightsByDistance:bool=False)

[`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) turns any point predictor that has a .predict-method 
into an estimator of the conditional density of the underlying distribution.
The basic idea of each level-set based approach is to interpret the point forecast
generated by the underlying point predictor as a similarity measure of samples.
In the case of the [`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) defined here, for every new sample
the `binSize` many training samples are determined whose point forecasts are closest
to the point forecast of the new sample.
The resulting empirical distribution of these 'nearest' training samples is 
viewed as our estimate of the conditional distribution of the new sample 
at hand.

NOTE: In contrast to the standard [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex), it is possible to apply
[`LevelSetKDEx_kNN`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex_knn) to point predictors of arbitrary dimension.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| estimator |  |  | Model with a .fit and .predict-method (implementing the scikit-learn estimator interface). |
| binSize | int | None | Size of the bins created while running fit. |
| weightsByDistance | bool | False | Determines behaviour of method `getWeights`. If False, all weights receive the same  <br>value. If True, the distance of the point forecasts is taken into account. |

### Get Neighbors

In [10]:
#| echo: false
#| output: asis
show_doc(getNeighbors)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L577){target="_blank" style="float:right; font-size:smaller"}

### getNeighbors

>      getNeighbors (binSize:int, yPred:numpy.ndarray)

Used to generate the neighborhoods used by [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) to compute density estimations.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| binSize | int | Size of the bins of values of `yPred` being grouped together. |
| yPred | np.ndarray | 1-dimensional array of predicted values. |

### Get Neighbors Test

In [11]:
#| echo: false
#| output: asis
show_doc(getNeighborsTest)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L703){target="_blank" style="float:right; font-size:smaller"}

### getNeighborsTest

>      getNeighborsTest (binSize:int, yPred:numpy.ndarray,
>                        yPredTrain:numpy.ndarray, neighborsDictTrain:dict)

Used to generate the neighborhoods used by [`LevelSetKDEx`](https://kaiguender.github.io/dddex/levelsetkdex_univariate.html#levelsetkdex) to compute density estimations.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| binSize | int | Size of the bins of values of `yPred` being grouped together. |
| yPred | np.ndarray | 1-dimensional array of predicted values. |
| yPredTrain | np.ndarray | 1-dimensional array of predicted train values. |
| neighborsDictTrain | dict | Dict containing the neighbors of all train samples. Keys are the train predictions. |

### Get Kernel Values

In [12]:
#| echo: false
#| output: asis
show_doc(getKernelValues)

---

[source](https://github.com/kaiguender/dddex/blob/main/dddex/levelSetKDEx_univariate.py#L799){target="_blank" style="float:right; font-size:smaller"}

### getKernelValues

>      getKernelValues (yPred, yPredTrain, neighborsDictTest,
>                       neighborsDictTrain, neighborsRemoved, neighborsAdded,
>                       binSize, returnWeights=True)

# Test Code

In [None]:
# yPred = np.concatenate([np.arange(5000)] * 2, axis = 0)
# yPredTrain = np.concatenate([np.arange(50000)] * 2, axis = 0)
# binSize = 200

# neighborsDictTrain, neighborsRemoved, neighborsAdded = generateNeighborhoodsUnique(binSize = binSize,
#                                                                                yPred = yPredTrain)

# neighborsDictTest = generateNeighborhoodsTestUnique(binSize = binSize,
#                                                 yPred = yPred,
#                                                 yPredTrain = yPredTrain,
#                                                 neighborsDictTrain = neighborsDictTrain)

# start = time.time()
# kernelValuesList = getKernelValues(binSize = binSize,
#                                    yPred = yPred,
#                                    yPredTrain = yPredTrain,
#                                    neighborsDictTest = neighborsDictTest,
#                                    neighborsDictTrain = neighborsDictTrain,
#                                    neighborsRemoved = neighborsRemoved,
#                                    neighborsAdded = neighborsAdded)
# print(time.time() - start)