# Example: Estimate the parameter space probability density with a 1D data space

([From BET Documentation](http://ut-chg.github.io/BET/examples/example_rst_files/Q_1D.html#q1d))

In this example the parameter space $\Lambda \subset \mathbb{R}^2$ is two-dimensional. This example demonstrates three different methods to estimate $\hat{\rho}_{\Lambda, j}$ where
$$P_\Lambda \approx \sum_{\mathcal{V}_j \subset A} \hat{\rho}_{\Lambda, j}.$$

These methods are distinguished primarily by the way $\mathcal{V}_j$ are defined and the approximation of the volume of $\mathcal{V}_j$. See [Q_1D.py](Q_1D.py) for the example source code.
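
To make the sum above concrete, the sketch below adds up per-cell probabilities for the cells that fall inside an event $A$. The cell probabilities and the membership set are made-up illustrative values, not output from BET.

```python
# Hypothetical per-cell probabilities rho_hat[j] for five cells V_j.
# They sum to 1 over the whole parameter space.
rho_hat = [0.10, 0.25, 0.40, 0.20, 0.05]

# Hypothetical event A, described by the indices of the cells it contains.
cells_in_A = {1, 2}

# P(A) is approximated by summing rho_hat over the cells contained in A.
p_A = sum(rho_hat[j] for j in cells_in_A)
print(round(p_A, 2))
```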

First, import the necessary packages and modules:

In [1]:
import bet.sampling.basicSampling as bsam
import bet.calculateP.calculateP as calcP
import bet.calculateP.simpleFunP as sfun
import numpy as np
import scipy.io as sio
import bet.sample as sample

Load the data, for which the parameter space is 2-dimensional, along with a reference solution:

In [4]:
# Import "Truth" (reference solution)
mdat = sio.loadmat('../matfiles/Q_2D')
Q = mdat['Q']
Q_ref = mdat['Q_true']

# Import Data 
points = mdat['points']
lam_domain = np.array([[0.07, .15], [0.1, 0.2]])

We will use the `points`, $\lambda_{samples} = \{ \lambda^{(j) } \}, j = 1, \ldots, N$, to create an input `sample_set` object. These `points` are the points in parameter space at which the forward model was solved to generate the data `Q`, where $Q_j = Q(\lambda^{(j)})$.

Define the parameter domain $\Lambda$:

In [5]:
lam_domain = np.array([[0.07, .15], [0.1, 0.2]])

Create input sample set objects:

In [6]:
input_sample_set = sample.sample_set(points.shape[0])
input_sample_set.set_values(points.transpose())
input_sample_set.set_domain(lam_domain)

### Methods for approximating $\hat{\rho}_{\Lambda, j}$
For ease of use we have created a function, `postprocess(station_nums, ref_num)`, so that we can loop through different QoI (maximum water surface height at various measurement stations) and reference solutions (points in data space around which we center a uniform probability solution). The three methods for approximating $\hat{\rho}_{\Lambda, j}$ are combined in this postprocessing function.

The function is defined as follows:

```python
def postprocess(station_nums, ref_num):
```

Define the filename to save $\hat{\rho}_{\Lambda, j}$ to:

```python
    filename = 'P_q'+str(station_nums[0]+1)+'_q'
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)
```

Define the data space $\mathcal{D} \subset \mathbb{R}^d$ where $d$ is the dimension of the data space:

```python
    data = Q[:, station_nums]
    output_sample_set = sample.sample_set(data.shape[1])
    output_sample_set.set_values(data)
```

Define the reference solution. We define a region of interest, $R_{ref} \subset \mathcal{D}$, centered at $Q_{ref}$, whose length is 15% of the length of $q_n$ (the QoI for station $n$). We set $\rho_\mathcal{D}(q) = \dfrac{\mathbf{1}_{R_{ref}}(q)}{||\mathbf{1}_{R_{ref}}||}$ and then create a simple function approximation to this density:

```python
    q_ref = Q_ref[ref_num, station_nums]
    output_probability_set = sfun.regular_partition_uniform_distribution_rectangle_scaled(\
            output_sample_set, q_ref, rect_scale=0.15,
            cells_per_dimension=np.ones((data.shape[1],)))
```
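
As a rough illustration of this density in one dimension, the sketch below builds the interval $R_{ref}$ and evaluates the normalized indicator function. It assumes that "the length of $q_n$" means the range (maximum minus minimum) of the observed data. The data values and the helper names `make_interval` and `rho_D` are hypothetical and not part of BET.

```python
def make_interval(data, q_ref, rect_scale=0.15):
    """Return (lo, hi) of an interval centered at q_ref whose width is
    rect_scale times the range of the data (an assumed definition)."""
    width = rect_scale * (max(data) - min(data))
    return q_ref - width / 2.0, q_ref + width / 2.0


def rho_D(q, lo, hi):
    """Uniform density on [lo, hi]: indicator divided by interval length."""
    return 1.0 / (hi - lo) if lo <= q <= hi else 0.0


# Made-up QoI values for a single station and a made-up reference value.
data = [1.0, 1.4, 2.2, 2.9, 3.0]
q_ref = 2.0

lo, hi = make_interval(data, q_ref)
print(lo, hi)
print(rho_D(2.1, lo, hi), rho_D(2.5, lo, hi))
```

The density is constant inside the interval and zero outside, so it integrates to one over $R_{ref}$.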

We generate $10^4$ uniformly distributed points in $\Lambda$ (`num_l_emulate = 1e4` in the code below). We call these points $\lambda_{emulate} = \{ \lambda_j \}_{j=1}^{10^4}$:

```python
    num_l_emulate = 1e4
    set_emulated = bsam.random_sample_set('r', lam_domain, num_l_emulate)
    my_disc = sample.discretization(input_sample_set, output_sample_set,
            output_probability_set, emulated_input_sample_set=set_emulated)
```
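
For intuition, uniform sampling in the rectangular domain $\Lambda$ can be sketched with the standard library alone. This illustrates the idea rather than what `bsam.random_sample_set` does internally; the helper `uniform_in_box`, the name `lam_domain_list`, and the small sample count are hypothetical.

```python
import random


def uniform_in_box(domain, num_points, seed=0):
    """Draw num_points points uniformly from a box given as [lo, hi] pairs."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in domain]
            for _ in range(num_points)]


# Same bounds as lam_domain in this example, written as plain lists.
lam_domain_list = [[0.07, 0.15], [0.1, 0.2]]
lam_emulate = uniform_in_box(lam_domain_list, 1000)

# Check that every emulated point lies inside the box.
inside = all(lo <= x <= hi
             for point in lam_emulate
             for x, (lo, hi) in zip(point, lam_domain_list))
print(len(lam_emulate), inside)
```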

**Estimation Method 1:** Calculate $\hat{\rho}_{\Lambda, j}$ where $\mathcal{V}_j$ are the Voronoi cells defined by $\lambda_{emulate}$:

```python
    calcP.prob_on_emulated_samples(my_disc)
    sample.save_discretization(my_disc, filename, "prob_on_emulated_samples_solution")
```

**Estimation Method 2:** Calculate $\hat{\rho}_{\Lambda, j}$ where $\mathcal{V}_j$ are the Voronoi cells defined by $\lambda_{samples}$. We assume that $\lambda_{samples}$ are uniformly distributed, so that the cells have approximately the same volume:

```python
    input_sample_set.estimate_volume_mc()
    calcP.prob(my_disc)
    sample.save_discretization(my_disc, filename, "prob_solution")
```
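
The equal-volume idea can be illustrated with a toy calculation: the probability of each data-space cell is split evenly among the samples whose QoI values land in that cell. This is a sketch of the idea under that assumption, not the body of `calcP.prob`; the cell assignments and probabilities are invented.

```python
from collections import Counter

# Hypothetical index of the data-space cell that each sample's QoI falls in.
output_cell_of_sample = [0, 1, 1, 0, 1, 2, 1, 0]

# Hypothetical probability of each data-space cell (sums to 1).
p_output = [0.0, 1.0, 0.0]

# Number of samples mapped into each data-space cell.
counts = Counter(output_cell_of_sample)

# Equal volumes: each sample gets an equal share of its cell's probability.
p_samples = [p_output[i] / counts[i] for i in output_cell_of_sample]
print(p_samples)
```

Here four samples map into the only cell with nonzero probability, so each of them receives 0.25 and all other samples receive zero.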

**Estimation Method 3:** Calculate $\hat{\rho}_{\Lambda, j}$ where $\mathcal{V}_j$ are the Voronoi cells defined by $\lambda_{samples}$ and we approximate the volume of $\mathcal{V}_j$ using Monte Carlo integration, with $\lambda_{emulate}$ used to estimate the volume of $\mathcal{V}_j$:

```python
    calcP.prob_with_emulated_volumes(my_disc)
    sample.save_discretization(my_disc, filename, "prob_with_emulated_volumes_solution")
```
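
A minimal sketch of the Monte Carlo volume estimate: each emulated point is assigned to its nearest sample, and the fraction of emulated points assigned to a sample approximates the relative volume of that sample's Voronoi cell. The sample locations, the unit-square domain, and the helper `nearest_index` are hypothetical; BET's own routine may differ in detail.

```python
import random


def nearest_index(point, samples):
    """Index of the sample closest to point (squared Euclidean distance)."""
    return min(range(len(samples)),
               key=lambda j: sum((p - s) ** 2
                                 for p, s in zip(point, samples[j])))


rng = random.Random(1)

# Three hypothetical samples in the unit square.
samples = [[0.25, 0.5], [0.75, 0.25], [0.75, 0.75]]

# Emulated points drawn uniformly from the unit square.
num_emulate = 5000
emulated = [[rng.random(), rng.random()] for _ in range(num_emulate)]

# Count the emulated points that fall in each Voronoi cell.
counts = [0] * len(samples)
for point in emulated:
    counts[nearest_index(point, samples)] += 1

# Relative volume of each cell as a fraction of the domain volume.
volumes = [c / float(num_emulate) for c in counts]
print(volumes)
```

The estimate improves as the number of emulated points grows, which is why the emulated set is much larger than the sample set.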

Putting the above pieces together, the function `postprocess(station_nums, ref_num)` is written as follows:

In [7]:
def postprocess(station_nums, ref_num):
    
    filename = 'P_q'+str(station_nums[0]+1)+'_q'
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)

    data = Q[:, station_nums]
    output_sample_set = sample.sample_set(data.shape[1])
    output_sample_set.set_values(data)
    q_ref = Q_ref[ref_num, station_nums]

    # Create Simple function approximation
    # Save points used to partition D for simple function approximation and the
    # approximation itself (this can be used to make close comparisons...)
    output_probability_set = sfun.regular_partition_uniform_distribution_rectangle_scaled(\
            output_sample_set, q_ref, rect_scale=0.15,
            cells_per_dimension=np.ones((data.shape[1],)))

    num_l_emulate = 1e4
    set_emulated = bsam.random_sample_set('r', lam_domain, num_l_emulate)
    my_disc = sample.discretization(input_sample_set, output_sample_set,
            output_probability_set, emulated_input_sample_set=set_emulated)

    print("Finished emulating lambda samples")

    # Calculate P on lambda emulate
    print("Calculating prob_on_emulated_samples")
    calcP.prob_on_emulated_samples(my_disc)
    sample.save_discretization(my_disc, filename, "prob_on_emulated_samples_solution")

    # Calculate P on the actual samples with assumption that Voronoi cells have
    # equal size
    input_sample_set.estimate_volume_mc()
    print("Calculating prob")
    calcP.prob(my_disc)
    sample.save_discretization(my_disc, filename, "prob_solution")

    # Calculate P on the actual samples estimating Voronoi cell volume with MC
    # integration
    print("Calculating prob_with_emulated_volumes")
    calcP.prob_with_emulated_volumes(my_disc)
    sample.save_discretization(my_disc, filename, "prob_with_emulated_volumes_solution")

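Finally, having defined our postprocessing function, we calculate $\hat{\rho}_{\Lambda, j}$ for three reference solutions. Because the data space here is 1D, every call passes `[0]`, so only $q_1$ is used; the loop variable `stat` is ignored and each reference solution is processed three times: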

In [8]:
ref_nums = [6, 11, 15] # 7, 12, 16
stations = [1, 4, 5] # 2, 5, 6

ref_nums, stations = np.meshgrid(ref_nums, stations)
ref_nums = ref_nums.ravel()
stations = stations.ravel()

for tnum, stat in zip(ref_nums, stations):
    postprocess([0], tnum)

Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_e

# Example: Estimate the parameter space probability density with a 2D data space
([From BET Documentation](http://ut-chg.github.io/BET/examples/example_rst_files/Q_2D.html))

In this example the parameter space $\Lambda \subset \mathbb{R}^2$ is two-dimensional. This example demonstrates three different methods to estimate $\hat{\rho}_{\Lambda, j}$ where
$$P_\Lambda \approx \sum_{\mathcal{V}_j \subset A} \hat{\rho}_{\Lambda, j}.$$

These methods are distinguished primarily by the way $\mathcal{V}_j$ are defined and the approximation of the volume of $\mathcal{V}_j$. See [Q_2D.py](Q_2D.py) for the example source code. Since this example is essentially the same as the previous example in this notebook, which estimates the parameter space probability density with a 1D data space, we will only highlight the differences between the two.

>**Note:** *If the code from the previous example above has already been run, then most of the environment has already been defined and the following code excerpts can be run as written.*

First, reload the data and redefine the input sample set. The parameter space is still 2D; only the data space changes from 1D to 2D:



In [10]:
# Import "Truth"
mdat = sio.loadmat('../matfiles/Q_2D')
Q = mdat['Q']
Q_ref = mdat['Q_true']

# Import Data
points = mdat['points']
lam_domain = np.array([[0.07, .15], [0.1, 0.2]]) # 2D parameter domain, as in the previous example


In [11]:
input_sample_set = sample.sample_set(points.shape[0])
input_sample_set.set_values(points.transpose())
input_sample_set.set_domain(lam_domain)

Edit the postprocessing function, `postprocess(station_nums, ref_num)`, defined earlier in the following ways.

First, change the save filename for the estimates of $\hat{\rho}_{\Lambda, j}$:

```python
    filename = 'P_q'+str(station_nums[0]+1)+'_q'+str(station_nums[1]+1)
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)
```

Define the data space $\mathcal{D} \subset \mathbb{R}^d$ where $d$ is the dimension of the data space:

```python
    data = Q[:, station_nums]
    output_sample_set = sample.sample_set(data.shape[1])
    output_sample_set.set_values(data)
```

Define the reference solution. We define a region of interest, $R_{ref} \subset \mathcal{D}$, centered at $Q_{ref}$ with sides 15% the length of $q_{station\_nums[0]}$ and $q_{station\_nums[1]}$ (the QoI for the two selected stations). We set $\rho_\mathcal{D}(q) = \dfrac{\mathbf{1}_{R_{ref}}(q)}{||\mathbf{1}_{R_{ref}}||}$ and then create a simple function approximation to this density:

```python
    q_ref = Q_ref[ref_num, station_nums]

    output_probability_set = sfun.regular_partition_uniform_distribution_rectangle_scaled(\
            output_sample_set, q_ref, rect_scale=0.15,
            cells_per_dimension=np.ones((data.shape[1],)))
```

As above, the postprocessing function, `postprocess(station_nums, ref_num)`, estimates $\hat{\rho}_{\Lambda, j}$ using the three different methods discussed earlier. The modified `postprocess(station_nums, ref_num)` function is shown in its entirety below:

In [12]:
def postprocess(station_nums, ref_num):
    
    filename = 'P_q'+str(station_nums[0]+1)+'_q'+str(station_nums[1]+1)
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)

    data = Q[:, station_nums]
    output_sample_set = sample.sample_set(data.shape[1])
    output_sample_set.set_values(data)
    q_ref = Q_ref[ref_num, station_nums]

    # Create Simple function approximation
    # Save points used to partition D for simple function approximation and the
    # approximation itself (this can be used to make close comparisons...)
    output_probability_set = sfun.regular_partition_uniform_distribution_rectangle_scaled(\
            output_sample_set, q_ref, rect_scale=0.15,
            cells_per_dimension=np.ones((data.shape[1],)))

    num_l_emulate = 1e4
    set_emulated = bsam.random_sample_set('r', lam_domain, num_l_emulate)
    my_disc = sample.discretization(input_sample_set, output_sample_set,
            output_probability_set, emulated_input_sample_set=set_emulated)

    print("Finished emulating lambda samples")

    # Calculate P on lambda emulate
    print("Calculating prob_on_emulated_samples")
    calcP.prob_on_emulated_samples(my_disc)
    sample.save_discretization(my_disc, filename, "prob_on_emulated_samples_solution")

    # Calculate P on the actual samples with assumption that Voronoi cells have
    # equal size
    input_sample_set.estimate_volume_mc()
    print("Calculating prob")
    calcP.prob(my_disc)
    sample.save_discretization(my_disc, filename, "prob_solution")

    # Calculate P on the actual samples estimating Voronoi cell volume with MC
    # integration
    print("Calculating prob_with_emulated_volumes")
    calcP.prob_with_emulated_volumes(my_disc)
    sample.save_discretization(my_disc, filename, "prob_with_emulated_volumes_solution")

Finally, we calculate $\hat{\rho}_{\Lambda, j}$ for three reference solutions and the QoI pairs $(q_1, q_2)$, $(q_1, q_5)$, and $(q_1, q_6)$.
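
The loop in the next cell visits every combination of reference solution and second station. As a sketch, the same nine pairs, in the same order as the raveled `np.meshgrid` arrays with default indexing, can be listed with `itertools.product`. The variable `pairs` is introduced only for this illustration.

```python
from itertools import product

ref_nums = [6, 11, 15]  # zero-based indices of reference solutions 7, 12, 16
stations = [1, 4, 5]    # zero-based indices of stations 2, 5, 6

# Stations vary slowest, reference solutions vary fastest.
pairs = [(tnum, stat) for stat, tnum in product(stations, ref_nums)]

for tnum, stat in pairs:
    # Each pair corresponds to one call postprocess([0, stat], tnum).
    print("q1 with q%d, reference solution %d" % (stat + 1, tnum + 1))
```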

In [13]:
# Post-process and save P and emulated points
ref_nums = [6, 11, 15] # 7, 12, 16
stations = [1, 4, 5] # 2, 5, 6

ref_nums, stations = np.meshgrid(ref_nums, stations)
ref_nums = ref_nums.ravel()
stations = stations.ravel()

for tnum, stat in zip(ref_nums, stations):
    postprocess([0, stat], tnum)


Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_emulated_volumes
Finished emulating lambda samples
Calculating prob_on_emulated_samples
Calculating prob
Calculating prob_with_e

# Example: Estimate the parameter space probability density with a 3D data space

([From BET Documentation](http://ut-chg.github.io/BET/examples/example_rst_files/Q_3D.html))

In this example the parameter space $\Lambda \subset \mathbb{R}^3$ is three-dimensional.

This example demonstrates how to estimate $\hat{\rho}_{\Lambda, j}$ using `prob()` where
$$P_\Lambda \approx \sum_{\mathcal{V}_j \subset A} \hat{\rho}_{\Lambda, j}.$$

See [Q_3D.py](Q_3D.py) for the example source code. Since this example is essentially the same as the previous examples in this notebook, which estimate the parameter space probability density with 1D and 2D data spaces, we will only highlight the differences.

>**Note:** *If the code from the previous example above has already been run, then most of the environment has already been defined and the following code excerpts can be run as written.*

First, instead of loading data for a 2-dimensional parameter space, we load data for a 3-dimensional parameter space:

In [19]:
# Import "Truth"
mdat = sio.loadmat('../matfiles/Q_3D')
Q = mdat['Q']
Q_ref = mdat['Q_true']

# Import Data
samples = mdat['points'].transpose()

We define the parameter domain $\Lambda$:

In [20]:
lam_domain = np.array([[-900, 1200], [0.07, .15], [0.1, 0.2]])

Define the input sample set, which is now 3D rather than 2D:

In [21]:
# Create input, output, and discretization from data read from file
points = mdat['points']
input_sample_set = sample.sample_set(points.shape[0])
input_sample_set.set_values(points.transpose())
input_sample_set.set_domain(lam_domain)

In our postprocessing function, the filename used to save $\hat{\rho}_{\Lambda, j}$ is built as follows:

```python
    filename = 'P_q'+str(station_nums[0]+1)+'_q'+str(station_nums[1]+1)
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)
```
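
For example, the naming convention above can be checked in isolation. The helper `make_filename` is a hypothetical wrapper around the same string operations, used here only to show the resulting names.

```python
def make_filename(station_nums, ref_num):
    """Build the save name from zero-based station and reference indices."""
    filename = 'P_q' + str(station_nums[0] + 1) + '_q' + str(station_nums[1] + 1)
    if len(station_nums) == 3:
        filename += '_q' + str(station_nums[2] + 1)
    filename += '_ref_' + str(ref_num + 1)
    return filename


print(make_filename([0, 4, 1], 14))  # P_q1_q5_q2_ref_15
print(make_filename([0, 4], 14))     # P_q1_q5_ref_15
```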

The edited postprocessing function `postprocess(station_nums, ref_num)`, which now uses only `prob()` and no emulated samples, is shown in its entirety below:

In [22]:
def postprocess(station_nums, ref_num):
    
    filename = 'P_q'+str(station_nums[0]+1)+'_q'+str(station_nums[1]+1)
    if len(station_nums) == 3:
        filename += '_q'+str(station_nums[2]+1)
    filename += '_ref_'+str(ref_num+1)

    data = Q[:, station_nums]
    output_sample_set = sample.sample_set(data.shape[1])
    output_sample_set.set_values(data)
    q_ref = Q_ref[ref_num, station_nums]

    # Create Simple function approximation
    # Save points used to partition D for simple function approximation and the
    # approximation itself (this can be used to make close comparisons...)
    output_probability_set = sfun.regular_partition_uniform_distribution_rectangle_scaled(\
            output_sample_set, q_ref, rect_scale=0.15,
            cells_per_dimension=np.ones((data.shape[1],)))

    my_disc = sample.discretization(input_sample_set, output_sample_set,
            output_probability_set)

    # Calculate P on the actual samples with assumption that Voronoi cells have
    # equal size
    input_sample_set.estimate_volume_mc()
    print("Calculating prob")
    calcP.prob(my_disc)
    sample.save_discretization(my_disc, filename, "prob_solution")


### Example Solutions

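Finally, we calculate $\hat{\rho}_{\Lambda, j}$ for the 15th reference solution. The candidate QoI combinations are listed below; only the first is run by default, and the others are commented out in the code: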

* $Q = (q_1, q_5, q_2)$, 
* $Q=(q_1, q_5)$, 
* $Q=(q_1, q_5, q_{12})$,
* $Q=(q_1, q_9, q_7),$ and 
* $Q=(q_1, q_9, q_{12})$.

Try other reference solutions or other points in $Q$.
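
When trying other combinations, note that the code uses zero-based indices while the QoI are named starting from $q_1$. A small hypothetical helper, `to_station_nums`, makes the conversion explicit:

```python
def to_station_nums(qoi_labels):
    """Convert one-based QoI labels (q_1 -> 1) to zero-based indices."""
    return [label - 1 for label in qoi_labels]


print(to_station_nums([1, 5, 12]))  # [0, 4, 11]
print(to_station_nums([1, 9, 7]))   # [0, 8, 6]
```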


In [23]:
# Post-process and save P and emulated points
ref_num = 14 # 15th reference solution
#ref_num = 15 # 16th reference solution

# q1, q5, q2 
station_nums = [0, 4, 1] # 1, 5, 2
postprocess(station_nums, ref_num)


# q1, q5
# station_nums = [0, 4] # 1, 5
# postprocess(station_nums, ref_num)

# q1, q5, q12
#station_nums = [0, 4, 11] # 1, 5, 12
#postprocess(station_nums, ref_num)


# q1, q9, q7
#station_nums = [0, 8, 6] # 1, 9, 7
#postprocess(station_nums, ref_num)


# q1, q9, q12
#station_nums = [0, 8, 11] # 1, 9, 12
#postprocess(station_nums, ref_num)


Calculating prob
