# Hyperparams And Distributions

This page introduces hyperparams and distributions in Neuraxle. See the API reference for [hyperparameter distributions](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html) and for [hyperparameter samples and spaces](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.space.html).

A hyperparameter is a parameter drawn from a prior distribution. Neuraxle ships a few built-in distributions and is also compatible with scipy distributions.

Create a [Uniform Distribution](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html#neuraxle.hyperparams.distributions.Uniform):

```python
from neuraxle.hyperparams.distributions import Uniform

hd = Uniform(
    min_included=-10,
    max_included=10,
    null_default_value=0
)
```


Sample the random variable using [rvs](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html#neuraxle.hyperparams.distributions.HyperparameterDistribution.rvs):

```python
sample = hd.rvs()
print(sample)
```

Get the null default value of the distribution using [nullify](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html#neuraxle.hyperparams.distributions.HyperparameterDistribution.nullify):

```python
nullified_sample = hd.nullify()
assert nullified_sample == hd.null_default_value
```

Get the probability density function value at `x` using [pdf](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html#neuraxle.hyperparams.distributions.HyperparameterDistribution.pdf):

```python
pdf = hd.pdf(1)
print('pdf: {}'.format(pdf))
```

Get the cumulative distribution function value at `x` using [cdf](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.distributions.html#neuraxle.hyperparams.distributions.HyperparameterDistribution.cdf):

```python
cdf = hd.cdf(1)
print('cdf: {}'.format(cdf))
```
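To sanity-check the two values printed above, here is a minimal sketch of the closed-form density and cumulative distribution of a continuous uniform distribution on `[min_included, max_included]`. It assumes Neuraxle's `Uniform` follows these textbook formulas. The helpers `uniform_pdf` and `uniform_cdf` are hypothetical and are not part of Neuraxle.

```python
def uniform_pdf(x: float, low: float, high: float) -> float:
    """Density of a continuous uniform distribution on [low, high]."""
    if low <= x <= high:
        return 1.0 / (high - low)
    return 0.0


def uniform_cdf(x: float, low: float, high: float) -> float:
    """Probability that a uniform sample on [low, high] is <= x."""
    if x < low:
        return 0.0
    if x > high:
        return 1.0
    return (x - low) / (high - low)


# Same bounds as Uniform(min_included=-10, max_included=10) above.
pdf_at_1 = uniform_pdf(1, -10, 10)  # 1 / 20
cdf_at_1 = uniform_cdf(1, -10, 10)  # 11 / 20
print(pdf_at_1, cdf_at_1)
```

The density is flat over the interval. The cdf grows linearly from 0 at `min_included` to 1 at `max_included`.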

## Setting And Updating Hyperparams


In Neuraxle, each step holds its hyperparams as [HyperparameterSamples](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.space.html#neuraxle.hyperparams.space.HyperparameterSamples), which are concrete values. It holds its hyperparameter space as a [HyperparameterSpace](https://www.neuraxle.org/stable/api/neuraxle.hyperparams.space.html#neuraxle.hyperparams.space.HyperparameterSpace), which maps each hyperparameter to a distribution.

Consider a simple pipeline that contains two `MultiplyByN` steps and a nested pipeline holding one `PCA` step:

```python
from sklearn.decomposition import PCA

from neuraxle.hyperparams.distributions import RandInt
from neuraxle.hyperparams.space import HyperparameterSpace, HyperparameterSamples
from neuraxle.pipeline import Pipeline
from neuraxle.steps.numpy import MultiplyByN

# Unnamed steps are named after their class, which is why the keys
# below address the nested PCA step as 'Pipeline__PCA__...'.
p = Pipeline([
    ('step1', MultiplyByN(2)),
    ('step2', MultiplyByN(2)),
    Pipeline([
        PCA(n_components=4)
    ])
])
```

Hyperparams and hyperparameter spaces can then be set or updated on the whole pipeline. Double-underscore paths address the nested steps:

```python
# Set the hyperparams of all three steps at once.
p.set_hyperparams(HyperparameterSamples({
    'step1__multiply_by': 42,
    'step2__multiply_by': -10,
    'Pipeline__PCA__n_components': 2
}))

# Change only n_components; the other hyperparams keep their values.
p.update_hyperparams(HyperparameterSamples({
    'Pipeline__PCA__n_components': 3
}))

# Define the distribution each hyperparam can be sampled from.
p.set_hyperparams_space(HyperparameterSpace({
    'step1__multiply_by': RandInt(42, 50),
    'step2__multiply_by': RandInt(-10, 0),
    'Pipeline__PCA__n_components': RandInt(2, 3)
}))
```
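The flat keys used above encode a path through the pipeline. `Pipeline__PCA__n_components` is the `n_components` hyperparameter of the `PCA` step inside the nested step named `Pipeline`. The plain-Python sketch below illustrates that convention and the idea behind `update_hyperparams`, which changes only the keys it receives. The helpers `flat_to_nested` and `deep_update` are hypothetical and are not Neuraxle's implementation.

```python
from typing import Any, Dict


def flat_to_nested(flat: Dict[str, Any], sep: str = "__") -> Dict[str, Any]:
    """Turn {'a__b__c': v} into {'a': {'b': {'c': v}}}."""
    nested: Dict[str, Any] = {}
    for flat_key, value in flat.items():
        *path, leaf = flat_key.split(sep)
        node = nested
        for name in path:
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


def deep_update(target: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overwrite only the leaves present in `changes`."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


hyperparams = flat_to_nested({
    "step1__multiply_by": 42,
    "step2__multiply_by": -10,
    "Pipeline__PCA__n_components": 2,
})
# Change a single leaf, leaving the other hyperparams untouched.
deep_update(hyperparams, flat_to_nested({"Pipeline__PCA__n_components": 3}))
print(hyperparams)
```

Single underscores, as in `multiply_by`, are not split. Only the double-underscore separator marks a step boundary.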

We can sample the space of random variables:

```python
samples = p.get_hyperparams_space().rvs()

assert 42 <= samples['step1__multiply_by'] <= 50
assert -10 <= samples['step2__multiply_by'] <= 0
assert samples['Pipeline__PCA__n_components'] in [2, 3]
```
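Conceptually, sampling a space means drawing one value from each distribution, keyed by the same flat names. Below is a minimal pure-Python sketch of that idea. It assumes that `RandInt(low, high)` includes both bounds, which the assertions above are consistent with. `sample_space` and the lambda-based `space` are hypothetical stand-ins, not Neuraxle objects.

```python
import random
from typing import Any, Callable, Dict

# Each entry maps a flat hyperparameter name to a function that draws one value.
# random.randint includes both bounds (assumed to mirror RandInt above).
space: Dict[str, Callable[[random.Random], Any]] = {
    "step1__multiply_by": lambda rng: rng.randint(42, 50),
    "step2__multiply_by": lambda rng: rng.randint(-10, 0),
    "Pipeline__PCA__n_components": lambda rng: rng.randint(2, 3),
}


def sample_space(space: Dict[str, Callable[[random.Random], Any]],
                 rng: random.Random) -> Dict[str, Any]:
    """Draw one value per hyperparameter, like calling rvs() on a whole space."""
    return {name: draw(rng) for name, draw in space.items()}


rng = random.Random(0)  # seeded so the example is reproducible
samples = sample_space(space, rng)
print(samples)
```

Drawing samples does not modify the pipeline. To use them, pass them to `p.set_hyperparams(samples)`.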

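Get all the hyperparams currently set on the pipeline. Sampling the space above does not change them, so they still hold the values given to `set_hyperparams` and `update_hyperparams`: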

```python
samples = p.get_hyperparams()

# Values from set_hyperparams, with n_components changed by update_hyperparams.
assert samples['step1__multiply_by'] == 42
assert samples['step2__multiply_by'] == -10
assert samples['Pipeline__PCA__n_components'] == 3
# The wrapped scikit-learn PCA received the updated value too.
assert p['Pipeline']['PCA'].get_wrapped_sklearn_predictor().n_components == 3
```

## Neuraxle Custom Distributions

## Scipy Distributions 

To make a scipy distribution compatible with Neuraxle, wrap it with `ScipyDistributionWrapper`:

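```python
from scipy.stats import randint

from neuraxle.hyperparams.scipy_distributions import ScipyDistributionWrapper

min_included = -10
max_included = 10
null_default_value = 0

# scipy's randint excludes `high`, so add 1 to include max_included.
hd = ScipyDistributionWrapper(
    scipy_distribution=randint(low=min_included, high=max_included + 1),
    null_default_value=null_default_value
)
```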
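One detail to watch when wrapping `scipy.stats.randint` is that scipy excludes the upper bound `high`. The sketch below uses scipy only. It shows why the integers -10..10 need `high=11`, and it reproduces the pmf and cdf values that this page later asserts for a uniform distribution over those 21 integers.

```python
from scipy.stats import randint

exclusive = randint(low=-10, high=10)      # support is -10 .. 9
inclusive = randint(low=-10, high=10 + 1)  # support is -10 .. 10

# 10 has zero probability unless high is shifted by one.
print(exclusive.pmf(10), inclusive.pmf(10))
# Each of the 21 integers has probability 1/21, so cdf(5) counts 16 of them.
print(inclusive.cdf(5))
```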

### Discrete Distributions

For discrete distributions that inherit from [rv_discrete](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_discrete.html#scipy.stats.rv_discrete), you only need to implement `_pmf`. scipy then derives the other methods (`cdf`, `ppf`, `rvs`, ...) generically from it.

For example, here is a discrete Poisson distribution:

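```python
import numpy as np
from scipy.special import factorial
from scipy.stats import rv_discrete

from neuraxle.hyperparams.scipy_distributions import ScipyDistributionWrapper


class PoissonScipyDistribution(rv_discrete):
    def _pmf(self, k, mu):
        # scipy passes arrays, so use vectorized NumPy/scipy functions
        # rather than math.exp and math.factorial.
        return np.exp(-mu) * mu ** k / factorial(k)


class Poisson(ScipyDistributionWrapper):
    def __init__(self, min_included: int, max_included: int, null_default_value: int = None, mu: float = 0.6):
        super().__init__(
            scipy_distribution=PoissonScipyDistribution(
                a=min_included,
                b=max_included,
                name='poisson'
            ),
            null_default_value=null_default_value,
            mu=mu
        )
```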
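To check that implementing `_pmf` alone is enough, the sketch below builds the same pmf with NumPy and `scipy.special.factorial`. It lets scipy derive the cdf generically, then compares both against the reference `scipy.stats.poisson`. The support starts at 0 here. The class name `PoissonSketch` is hypothetical.

```python
import numpy as np
from scipy.special import factorial
from scipy.stats import poisson, rv_discrete


class PoissonSketch(rv_discrete):
    def _pmf(self, k, mu):
        # Vectorized Poisson pmf: exp(-mu) * mu**k / k!
        return np.exp(-mu) * mu ** k / factorial(k)


dist = PoissonSketch(a=0, name="poisson_sketch")
mu = 0.6

pmf_sketch, pmf_reference = dist.pmf(2, mu), poisson.pmf(2, mu)
# _cdf is not implemented above: scipy sums the pmf from the lower bound up to k.
cdf_sketch, cdf_reference = dist.cdf(3, mu), poisson.cdf(3, mu)
print(pmf_sketch, pmf_reference)
print(cdf_sketch, cdf_reference)
```

The generic cdf sums the pmf term by term, so it is slower than a closed form. It is, however, exact enough for hyperparameter search.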

### Continuous Distributions

For continuous distributions that inherit from [rv_continuous](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html), you only need to implement `_pdf`. scipy then derives the other methods generically, for example `cdf` by numerically integrating `_pdf`.

For example, here is a continuous Gaussian distribution:

```python
import numpy as np
from scipy.stats import rv_continuous

from neuraxle.hyperparams.scipy_distributions import ScipyDistributionWrapper


class GaussianScipyDistribution(rv_continuous):
    def _pdf(self, x):
        # Standard normal density, vectorized with NumPy (math.exp fails on arrays).
        return np.exp(-x ** 2 / 2.) / np.sqrt(2.0 * np.pi)


class Gaussian(ScipyDistributionWrapper):
    def __init__(self, min_included: float, max_included: float, null_default_value: float = None):
        ScipyDistributionWrapper.__init__(
            self,
            scipy_distribution=GaussianScipyDistribution(
                name='gaussian',
                a=min_included,
                b=max_included
            ),
            null_default_value=null_default_value
        )
```
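The sketch below vectorizes the standard normal density with NumPy and compares the cdf that scipy derives with `scipy.stats.norm`. It also shows a consequence worth knowing: scipy does not renormalize `_pdf` when the support is restricted with `a` and `b`. As a result, a standard normal density truncated to `[0, 10]` only accumulates about half of its mass before `b`. The class name `GaussianSketch` is hypothetical.

```python
import numpy as np
from scipy.stats import norm, rv_continuous


class GaussianSketch(rv_continuous):
    def _pdf(self, x):
        # Standard normal density; np.exp works on scalars and arrays alike.
        return np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)


full = GaussianSketch(name="gaussian_full")  # support (-inf, inf)
truncated = GaussianSketch(a=0.0, b=10.0, name="gaussian_0_10")

# scipy derives the cdf by numerically integrating _pdf from the lower bound.
full_cdf_1 = full.cdf(1.0)
# Only the [0, 5] part of the density is integrated, so this is close to 0.5.
truncated_cdf_5 = truncated.cdf(5.0)
print(full_cdf_1, norm.cdf(1.0))
print(truncated_cdf_5)
```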

### Custom Arguments 

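Extra arguments of `_pdf` (or `_pmf`) and `_cdf` become shape parameters. scipy's default `_argcheck` only accepts shape parameters that are strictly positive. If yours can be zero or negative, such as a mean in log2 space, override `_argcheck`: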

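```python
import math

import numpy as np
from scipy.stats import rv_continuous


class LogNormalScipyDistribution(rv_continuous):
    def _pdf(self, x, log2_space_mean, log2_space_std):
        # Log-normal density in base 2; it is 0 for x <= 0.
        # scipy passes arrays, so use np.where instead of an if statement.
        positive = x > 0
        safe_x = np.where(positive, x, 1.0)  # avoid log2(0) for masked-out values
        pdf_x = np.exp(-(np.log2(safe_x) - log2_space_mean) ** 2 / (2 * log2_space_std ** 2)) / (
            safe_x * math.log(2) * log2_space_std * math.sqrt(2 * math.pi))
        return np.where(positive, pdf_x, 0.)

    def _argcheck(self, log2_space_mean, log2_space_std):
        # Replace scipy's default "every shape parameter > 0" rule:
        # any finite mean (negative ones included) and a positive std are valid.
        # (np.float and np.int no longer exist in recent NumPy versions.)
        return np.isfinite(log2_space_mean) & (np.asarray(log2_space_std) > 0)
```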
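The effect of the default `_argcheck` can be seen with a small, hypothetical normal distribution whose mean `m` is a shape parameter. With scipy's default check, a negative mean is treated as invalid, and `pdf` returns `nan` instead of raising. Overriding `_argcheck` lets the same call return the expected density. The class names are made up for illustration.

```python
import numpy as np
from scipy.stats import rv_continuous


class ShiftedNormalDefault(rv_continuous):
    # Normal density with mean m (a shape parameter); keeps scipy's default _argcheck.
    def _pdf(self, x, m):
        return np.exp(-(x - m) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)


class ShiftedNormalAnyMean(ShiftedNormalDefault):
    # Same density, but any finite mean, including a negative one, is accepted.
    def _argcheck(self, m):
        return np.isfinite(m)


default_dist = ShiftedNormalDefault(name="shifted_normal_default")
relaxed_dist = ShiftedNormalAnyMean(name="shifted_normal_any_mean")

rejected = default_dist.pdf(0.0, -1.0)  # m = -1 fails the default "> 0" check
accepted = relaxed_dist.pdf(0.0, -1.0)  # density of N(-1, 1) evaluated at 0
print(rejected, accepted)
```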

### Scipy methods

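All of the scipy distribution methods (`pdf`, `cdf`, `logpdf`, `sf`, `ppf`, `isf`, `moment`, `stats`, `entropy`, `interval`, `support`, ...) are available on the wrapped distributions: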

```python
import numpy as np

# `hd` is the ScipyDistributionWrapper defined earlier: uniform over the
# 21 integers -10..10. Draw many samples from it.
samples = [hd.rvs() for _ in range(1000)]

for s in samples:
    assert isinstance(s, (int, np.integer))
samples_mean = np.abs(np.mean(samples))
assert samples_mean < 1.0
assert min(samples) >= -10.0
assert max(samples) <= 10.0

assert hd.pdf(-11) == 0.
assert abs(hd.pdf(-10) - 1 / (10 + 10 + 1)) < 1e-6
assert abs(hd.pdf(0) - 1 / (10 + 10 + 1)) < 1e-6
assert hd.pdf(0.5) == 0.
assert abs(hd.pdf(10) - 1 / (10 + 10 + 1)) < 1e-6
assert hd.pdf(11) == 0.

assert hd.cdf(-10.1) == 0.
assert abs(hd.cdf(-10) - 1 / (10 + 10 + 1)) < 1e-6
assert abs(hd.cdf(0) - (0 + 10 + 1) / (10 + 10 + 1)) < 1e-6
assert abs(hd.cdf(5) - (5 + 10 + 1) / (10 + 10 + 1)) < 1e-6
assert abs(hd.cdf(10) - 1.) < 1e-6
assert hd.cdf(10.1) == 1.

# The values below correspond to the Gaussian defined above, restricted to
# [0, 10]. They come from scipy's numerical integration and root finding,
# so they are compared approximately rather than with ==.
hd = Gaussian(min_included=0, max_included=10)

assert np.isclose(hd.logpdf(5), -13.418938533204672)
assert np.isclose(hd.logcdf(5), -0.6931477538632531)
assert np.isclose(hd.sf(5), 0.5000002866515718)
assert np.isclose(hd.logsf(5), -0.693146607256966)
assert np.all(hd.ppf([0.0, 0.01, 0.05, 0.1, 1 - 0.10, 1 - 0.05, 1 - 0.01, 1.0], 10))
assert np.isclose(hd.isf(q=0.5), 8.798228093189323)
assert np.isclose(hd.moment(2), 50.50000000091249)
assert hd.stats()[0]
assert hd.stats()[1]
assert np.isclose(hd.entropy(), 0.7094692666023363)
assert hd.median()
assert np.isclose(hd.mean(), 5.398942280397029)
assert np.isclose(hd.std(), 4.620759921685374)
assert np.isclose(hd.var(), 21.35142225385382)
assert np.isclose(hd.expect(), 0.39894228040143276)
interval = hd.interval(alpha=[0.25, 0.50])
assert np.all(interval[0])
assert np.all(interval[1])
assert hd.support() == (0, 10)
```

## SKLearn Hyperparams

`SKLearnWrapper` wraps scikit-learn predictors to make them compatible with Neuraxle. When you set the hyperparams of an `SKLearnWrapper`, it automatically sets the params of the wrapped scikit-learn predictor for you:

```python
from sklearn.tree import DecisionTreeClassifier

from neuraxle.hyperparams.distributions import Choice, RandInt
from neuraxle.hyperparams.space import HyperparameterSamples, HyperparameterSpace
from neuraxle.steps.sklearn import SKLearnWrapper

decision_tree_classifier = SKLearnWrapper(
    DecisionTreeClassifier(),
    HyperparameterSpace({
        'criterion': Choice(['gini', 'entropy']),
        'splitter': Choice(['best', 'random']),
        'min_samples_leaf': RandInt(2, 5),
        # scikit-learn requires an integer min_samples_split of at least 2.
        'min_samples_split': RandInt(2, 3)
    })
).set_hyperparams(HyperparameterSamples({
    'criterion': 'gini',
    'splitter': 'best',
    'min_samples_leaf': 3,
    'min_samples_split': 3
}))
```
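Setting hyperparams on the wrapper amounts to forwarding them to the scikit-learn estimator's own parameter API. The sketch below uses scikit-learn only, without Neuraxle, to show the equivalent `set_params` and `get_params` calls. It assumes the wrapper forwards hyperparams under the same names, as the example above suggests.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
# set_params returns the estimator itself, so the call can be chained.
clf = clf.set_params(
    criterion="gini",
    splitter="best",
    min_samples_leaf=3,
    min_samples_split=3,
)
params = clf.get_params()
print(params["criterion"], params["splitter"], params["min_samples_leaf"], params["min_samples_split"])
```

Parameters are only validated when `fit` is called. An invalid value such as `min_samples_split=1` is therefore accepted by `set_params` and rejected later, during training.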