# Introduction to Random Distributions

> More correctly called random variables, these distributions are useful in hyperparameter tuning.

For each hyperparameter, a range of values can be defined as a statistical distribution, which makes the hyperparameter a random variable.
This random variable defines which values the hyperparameter is likely to take.
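
As a minimal sketch of this idea, using only Python's standard library rather than Neuraxle, a hyperparameter space can be pictured as a mapping from names to sampling functions. The names `toy_space` and `sample_space`, as well as the hyperparameter names, are hypothetical and chosen for illustration only.

```python
import random

# Hypothetical toy space: each hyperparameter name maps to a function that
# draws one value from that hyperparameter's distribution.
toy_space = {
    "n_layers": lambda rng: rng.randint(1, 4),          # integer in [1, 4], both ends included
    "use_dropout": lambda rng: rng.random() < 0.5,      # True or False with equal probability
    "hidden_scale": lambda rng: rng.uniform(2.0, 4.0),  # float between 2 and 4
}


def sample_space(space, rng):
    """Draw one value per hyperparameter and return them as a dict."""
    return {name: sampler(rng) for name, sampler in space.items()}


rng = random.Random(0)  # seeded so the draws are reproducible
trials = [sample_space(toy_space, rng) for _ in range(3)]
for trial in trials:
    print(trial)
```

Each call to `sample_space` yields one candidate configuration, which is what a random search would then evaluate.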

Let's explore the hyperparameter distributions by plotting the following graphs:
- Probability density function (pdf) or probability mass function (pmf)
- Cumulative distribution function (cdf)
- Histogram of samples
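
To make these three views concrete before plotting anything, here is a small text-only sketch for a normal distribution with mean 3 and standard deviation 1. It relies on `statistics.NormalDist` from the standard library, not on Neuraxle, and the `histogram` helper is a hypothetical name written for this example.

```python
from statistics import NormalDist

dist = NormalDist(mu=3.0, sigma=1.0)
samples = dist.samples(10_000, seed=0)  # reproducible draws


def histogram(values, low, high, num_bins):
    """Count how many values fall into each of num_bins equal-width bins on [low, high)."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for value in values:
        if low <= value < high:
            index = min(int((value - low) / width), num_bins - 1)
            counts[index] += 1
    return counts


low, high, num_bins = -1.0, 7.0, 16
counts = histogram(samples, low, high, num_bins)
bin_width = (high - low) / num_bins

for i, count in enumerate(counts):
    center = low + (i + 0.5) * bin_width
    # Dividing by (number of samples * bin width) turns counts into a density
    # that can be compared directly with the pdf.
    empirical_density = count / (len(samples) * bin_width)
    print(f"x={center:5.2f}  pdf={dist.pdf(center):.3f}  "
          f"cdf={dist.cdf(center):.3f}  histogram={empirical_density:.3f}")
```

With enough samples, the normalized histogram column should track the pdf column, while the cdf column rises from about 0 to about 1.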

## Plotting Each Hyperparameter Distribution

Let's import the plotting functions and the Neuraxle hyperparameter classes.

In [13]:
# Note: some of the code in the present code block is derived from another project licensed under The MIT License (MIT), 
# Copyright (c) 2017 Vooban Inc. For the full information, see:
#     https://github.com/guillaume-chevalier/Hyperopt-Keras-CNN-CIFAR-100/blob/Vooban/LICENSE

from neuraxle.hyperparams.distributions import *
from neuraxle.hyperparams.space import HyperparameterSpace
from plotting import *
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

DISCRETE_NUM_BINS = 40
CONTINUOUS_NUM_BINS = 1000
NUM_TRIALS = 100000
X_DOMAIN = np.array(range(-100, 600)) / 100

## Discrete Distributions

- The discrete distributions below sample discrete values or categories.
- For example, the Boolean distribution yields either true or false.
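
For a discrete distribution, the pmf assigns a probability to each possible value. The sketch below computes the pmf, the cdf and sampled frequencies for a uniform integer distribution on 1 to 4 using only the standard library. It assumes both bounds are inclusive, and it is an illustration rather than Neuraxle's own implementation.

```python
import random
from collections import Counter

low, high = 1, 4  # assumed to be inclusive on both ends
support = list(range(low, high + 1))

# Theoretical pmf: every integer in the support is equally likely.
pmf = {k: 1 / len(support) for k in support}

# Theoretical cdf: running sum of the pmf.
cdf = {}
running_total = 0.0
for k in support:
    running_total += pmf[k]
    cdf[k] = running_total

# Empirical frequencies from repeated sampling.
rng = random.Random(0)
num_trials = 100_000
counts = Counter(rng.randint(low, high) for _ in range(num_trials))
frequencies = {k: counts[k] / num_trials for k in support}

for k in support:
    print(f"k={k}  pmf={pmf[k]:.3f}  cdf={cdf[k]:.3f}  sampled={frequencies[k]:.3f}")
```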

### RandInt

In [14]:
discrete_hyperparameter_space = HyperparameterSpace({
    "randint": RandInt(1, 4)
})

plot_distribution_space(discrete_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)


randint:


AttributeError: 'RandInt' object has no attribute 'pdf'

### Boolean

In [12]:
discrete_hyperparameter_space = HyperparameterSpace({
    "boolean": Boolean()
})
plot_distribution_space(discrete_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

boolean:


AttributeError: 'Boolean' object has no attribute 'pdf'

### Choice

In [None]:
discrete_hyperparameter_space = HyperparameterSpace({
    "choice": Choice([0, 1, 3])
})
plot_distribution_space(discrete_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

### Priority Choice

In [None]:
discrete_hyperparameter_space = HyperparameterSpace({
    "priority_choice": PriorityChoice([0, 1, 3])
})
plot_distribution_space(discrete_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

## Continuous Distributions

- The continuous distributions below sample from a continuous range of values. These are probably the ones you'll use most.
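
To give a feel for how two of these continuous distributions differ over the same range, the sketch below compares uniform and log-uniform sampling on 1 to 4 with the standard library. It assumes the usual definition of a log-uniform variable (uniform in log space, then exponentiated) and is not taken from Neuraxle's code.

```python
import math
import random
from statistics import median

rng = random.Random(0)
num_trials = 100_000
low, high = 1.0, 4.0

# Uniform: every value between low and high is equally likely.
uniform_samples = [rng.uniform(low, high) for _ in range(num_trials)]

# Log-uniform: draw uniformly between log(low) and log(high), then exponentiate.
log_uniform_samples = [
    math.exp(rng.uniform(math.log(low), math.log(high))) for _ in range(num_trials)
]

print("uniform median:    ", round(median(uniform_samples), 2))
print("log-uniform median:", round(median(log_uniform_samples), 2))
```

The uniform median should land near the arithmetic midpoint 2.5, while the log-uniform median should land near the geometric midpoint 2, which is why log-uniform ranges are a common choice for scale-like hyperparameters such as learning rates.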

### Continuous Uniform

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "uniform": Uniform(2., 4.)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

### Continuous Loguniform

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "loguniform": LogUniform(1., 4.)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

### Continuous Normal

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "normal": Normal(3.0, 1.0)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

### Continuous Lognormal

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "lognormal": LogNormal(1.0, 0.5)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

### Continuous Normal Clipped 

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "normal_clipped": Normal(3.0, 1.0, hard_clip_min=1., hard_clip_max=5.)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

### Continuous Lognormal Clipped

In [None]:
continuous_hyperparameter_space = HyperparameterSpace({
    "lognormal_clipped": LogNormal(1.0, 0.5, hard_clip_min=2., hard_clip_max=4.)
})
plot_distribution_space(continuous_hyperparameter_space, num_bins=CONTINUOUS_NUM_BINS)

## Quantized Hyperparameter Distributions

- The quantized hyperparameter distributions below yield integers or other specific, rounded values.
- Notice the border effects at the left and right of the charts when `Quantized(...)` is used as a distribution wrapper to round the numbers.
- These border effects would not appear if the distribution were bounded at half-integers instead of whole numbers.
- Say you have a `Quantized(Uniform(-10, 10))`: samples from approximately -9.5 to -8.5 are rounded into the -9 bin, but only the values from -10 to -9.5 are rounded into the -10 bin. Half of that bin's range is missing, so the -10 bin is sampled half as often as the -9 bin. That explains the border effect, and you can fix it easily by taking the uniform range from -10.49999 to 10.49999.
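
The border effect described above can be reproduced without Neuraxle. The following sketch rounds uniform samples to the nearest integer and compares a range bounded at whole numbers with one bounded at half-integers. The helper `quantized_uniform_counts` is a hypothetical name, and the rounding rule (round half up) is an assumption made for this illustration.

```python
import math
import random
from collections import Counter


def quantized_uniform_counts(low, high, num_trials, seed=0):
    """Sample uniformly between low and high, round to the nearest integer, count each result."""
    rng = random.Random(seed)
    return Counter(
        math.floor(rng.uniform(low, high) + 0.5)  # round half up
        for _ in range(num_trials)
    )


num_trials = 100_000
plain = quantized_uniform_counts(1.0, 5.0, num_trials)     # bounded at whole numbers
repaired = quantized_uniform_counts(0.5, 5.5, num_trials)  # bounded at half-integers

for k in range(1, 6):
    print(f"{k}: plain={plain[k] / num_trials:.3f}  repaired={repaired[k] / num_trials:.3f}")
```

In the plain case, bins 1 and 5 each cover only half a unit of the range, so they should receive roughly half as many samples as bins 2 to 4. In the repaired case, every bin covers a full unit and the frequencies should be roughly equal.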

### Quantized Uniform

In [None]:
quantized_hyperparameter_space = HyperparameterSpace({
    "quantized uniform": Quantized(Uniform(1., 5.))
})
plot_distribution_space(quantized_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

### Repaired Quantized Uniform

In [None]:
quantized_hyperparameter_space = HyperparameterSpace({
    # The lower bound sits just above 0.5 so that no sample rounds down to 0.
    "repaired quantized uniform": Quantized(Uniform(0.50001, 5.49999))
})

plot_distribution_space(quantized_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

### Quantized Log Uniform

In [None]:
quantized_hyperparameter_space = HyperparameterSpace({
    "quantized loguniform": Quantized(LogUniform(1.0, 4.0))
})

plot_distribution_space(quantized_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

### Quantized Normal 

In [None]:
quantized_hyperparameter_space = HyperparameterSpace({
    "quantized normal": Quantized(Normal(3.0, 1.0))
})

plot_distribution_space(quantized_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)

### Quantized Lognormal

In [None]:
quantized_hyperparameter_space = HyperparameterSpace({
    "quantized lognormal": Quantized(LogNormal(1.0, 0.5))
})
plot_distribution_space(quantized_hyperparameter_space, num_bins=DISCRETE_NUM_BINS)