# Tutorial: experimenting with models

In this tutorial you will learn:

 - How to play with model variations
 - How UltraNest can resume and reuse an existing run, even if you modify the data/likelihood

As a simple example, let's say we want to estimate the mean and standard deviation of a sample of points. Over time, more and more points are added.

## Generate some data

In [None]:
import numpy as np
from numpy import pi, log

np.random.seed(1)
Ndata = 200
mean_true = 42.0
sigma_true = 0.1
y = np.random.normal(mean_true, sigma_true, size=Ndata)
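
As a point of comparison for the sampler output later on, the sample mean and standard deviation of the first `i` points can be computed directly. The sketch below regenerates the same data and prints these estimates together with the standard error of the mean, `s / sqrt(i)`. The helper name `quick_estimates` is hypothetical and not part of UltraNest.

```python
import numpy as np

# regenerate the tutorial data (same seed and settings as above)
np.random.seed(1)
Ndata = 200
y = np.random.normal(42.0, 0.1, size=Ndata)

def quick_estimates(values):
    """Return sample mean, sample standard deviation and standard error of the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std(ddof=1)  # uses the n-1 denominator
    return mean, std, std / np.sqrt(len(values))

for i in (10, 50, 190):
    mean, std, sem = quick_estimates(y[:i])
    print("%3d points: mean = %.3f +- %.3f, scatter = %.3f" % (i, mean, sem, std))
```

The uncertainty on the mean should shrink roughly as `1 / sqrt(i)`, which is the behaviour to expect from the posterior widths in the comparison plot further down.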


## Visualise the data

Let's plot the data first to see what is going on:



In [None]:
import matplotlib.pyplot as plt

plt.figure()
plt.errorbar(x=np.arange(Ndata), y=y, yerr=sigma_true, marker='x', ls=' ');


## Model setup

In [None]:
from ultranest import ReactiveNestedSampler

parameters = ['mean', 'scatter']

def prior_transform(x):
    z = np.empty_like(x)
    # mean: uniform prior between -1000 and 1000
    z[0] = x[0] * 2000 - 1000
    # scatter: log-uniform prior between 0.01 and 100
    z[1] = 10**(x[1] * 4 - 2)
    return z

import scipy.stats

def log_likelihood(params):
    # yseen is a global array holding the data seen so far; it is set before each run
    mean, sigma = params
    return scipy.stats.norm(mean, sigma).logpdf(yseen).sum()
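
To make the prior choices concrete: the transform maps the unit cube to a uniform prior on the mean over [-1000, 1000] and a log-uniform prior on the scatter over [0.01, 100]. The sketch below evaluates the transform at the corners and centre of the cube, and also writes out the Gaussian log-likelihood in closed form with NumPy, which should agree with the `scipy.stats.norm` version above. The function name `gauss_loglike` and its explicit `data` argument are illustrative choices, not part of the tutorial's code.

```python
import numpy as np

def prior_transform(x):
    # same transform as in the tutorial
    z = np.empty_like(x)
    z[0] = x[0] * 2000 - 1000      # mean: uniform on [-1000, 1000]
    z[1] = 10**(x[1] * 4 - 2)      # scatter: log-uniform on [0.01, 100]
    return z

def gauss_loglike(params, data):
    """Closed-form log-likelihood of independent Gaussian points."""
    mean, sigma = params
    data = np.asarray(data, dtype=float)
    return (-0.5 * ((data - mean) / sigma)**2
            - np.log(sigma) - 0.5 * np.log(2 * np.pi)).sum()

# lower corner, centre and upper corner of the unit cube
for cube in ([0.0, 0.0], [0.5, 0.5], [1.0, 1.0]):
    print(cube, '->', prior_transform(np.array(cube)))
```

Passing the data as an argument, rather than reading a global such as `yseen`, makes the function easier to test in isolation; the tutorial keeps the global because the sampler calls the likelihood with the parameters only.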


## Adding data points in batches, no warm start

In [None]:
reference_results = []

for i in range(10, Ndata, 20):
    print()
    print("Iteration with %d data points" % i)
    yseen = y[:i]
    sampler_ref = ReactiveNestedSampler(parameters, log_likelihood, prior_transform)
    res_ref = sampler_ref.run(min_num_live_points=400, max_num_improvement_loops=0, viz_callback=None, frac_remain=0.5)
    reference_results.append(res_ref)


## Adding data points in batches, with warm start

In [None]:
results = []

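# log_likelihood reads the global yseen, so define it before constructing a sampler
# (use the first batch instead of relying on `i` left over from the previous cell)
yseen = y[:10]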

# delete any existing content:
ReactiveNestedSampler(parameters, log_likelihood, prior_transform,
                      log_dir='warmstartdoc', resume='overwrite')

for i in range(10, Ndata, 20):
    print()
    print("Iteration with %d data points" % i)
    
    yseen = y[:i]
    sampler = ReactiveNestedSampler(parameters, log_likelihood, prior_transform,
                                    log_dir='warmstartdoc', resume='resume-similar')
    # likelihood calls already recorded when resuming; subtracted later to count only new calls
    ncall_initial = int(sampler.ncall)
    res = sampler.run(frac_remain=0.5, viz_callback=None)
    results.append((i, res, ncall_initial))



## Likelihood evaluations saved

In [None]:
ndim = len(parameters)
plt.figure(figsize=(10, 10))
for (i, res, ncall_initial), res_ref in zip(results, reference_results):
    for j in range(ndim):
        plt.subplot(ndim + 2, 1, 1+j)
        plt.ylabel(parameters[j])
        plt.errorbar(x=i, y=res['samples'][:,j].mean(), yerr=res['samples'][:,j].std(), marker='x', color='r')
        plt.errorbar(x=i, y=res_ref['samples'][:,j].mean(), yerr=res_ref['samples'][:,j].std(), marker='x', color='gray')
    
    plt.subplot(ndim + 2, 1, 1+ndim)
    plt.ylabel(r'$\Delta \log Z$')
    plt.plot(i, res['logz'] - res_ref['logz'], 'x', color='r')
    plt.subplot(ndim + 2, 1, 1+ndim+1)
    plt.ylabel('Likelihood call fraction')
    plt.plot(i, ((res['ncall'] - ncall_initial) / res_ref['ncall']), 'x', color='r')
    plt.ylim(0, 1)

plt.subplot(ndim + 2, 1, 1)
plt.hlines(mean_true, 0, i+1, color='k', linestyles=':')
plt.subplot(ndim + 2, 1, 2)
plt.hlines(sigma_true, 0, i+1, color='k', linestyles=':')


## Conclusion

Notice the saving in likelihood evaluations in the bottom panel: a cost reduction of ~30%. This benefit is *independent of problem dimension*. The more similar the modified problem is to the original one, the higher the cost savings.
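
The fraction plotted in the bottom panel is the number of new likelihood calls in the warm-started run divided by the number of calls in the run without warm start. A minimal sketch of that bookkeeping, assuming the `results` and `reference_results` lists built in the cells above (the helpers `call_fractions` and `mean_saving` are hypothetical names, not UltraNest functions):

```python
def call_fractions(results, reference_results):
    """Return (number of data points, new-call fraction) for each iteration.

    results: list of (i, res, ncall_initial) tuples from the warm-start loop
    reference_results: list of result dicts from the runs without warm start
    """
    fractions = []
    for (i, res, ncall_initial), res_ref in zip(results, reference_results):
        new_calls = res['ncall'] - ncall_initial  # calls made in this run only
        fractions.append((i, new_calls / res_ref['ncall']))
    return fractions

def mean_saving(fractions):
    """Average fractional cost reduction, 1 - mean(call fraction)."""
    return 1.0 - sum(f for _, f in fractions) / len(fractions)
```

After running the cells above, `mean_saving(call_fractions(results, reference_results))` gives the average saving. The first iteration starts from a freshly overwritten `warmstartdoc` directory, so it has nothing to reuse and may be worth excluding from the average.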

This feature allows you to:

* vary the data (change the analysis pipeline)
* vary model assumptions 

**without needing to start the computation from scratch** (potentially costly).
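
As an illustration of varying a model assumption, the Gaussian likelihood could be swapped for a heavier-tailed Student-t likelihood while keeping the same two parameters. The sketch below is an assumption-laden example, not part of the original tutorial: the function `make_student_t_loglike`, the fixed degrees of freedom `nu=4` and the wrapper `run_warm` are hypothetical, and how much the warm start saves for this particular change is untested here. The sampler calls mirror those used in the warm-start cell.

```python
import math
import numpy as np

def make_student_t_loglike(data, nu=4.0):
    """Build a log-likelihood with Student-t noise (nu is an illustrative choice)."""
    data = np.asarray(data, dtype=float)
    # log of the normalisation constant of the standard Student-t density
    lognorm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
               - 0.5 * math.log(nu * math.pi))

    def log_likelihood(params):
        mean, sigma = params
        t = (data - mean) / sigma
        return (lognorm - np.log(sigma)
                - 0.5 * (nu + 1) * np.log1p(t**2 / nu)).sum()

    return log_likelihood

def run_warm(parameters, log_likelihood, prior_transform, log_dir='warmstartdoc'):
    """Resume from an earlier run stored in log_dir, as in the warm-start cell."""
    # imported here so that the sketch can be loaded without UltraNest installed
    from ultranest import ReactiveNestedSampler
    sampler = ReactiveNestedSampler(parameters, log_likelihood, prior_transform,
                                    log_dir=log_dir, resume='resume-similar')
    return sampler.run(frac_remain=0.5, viz_callback=None)
```

With `yseen` defined as before, `run_warm(parameters, make_student_t_loglike(yseen), prior_transform)` would rerun the analysis under the new noise assumption, reusing the run stored in `warmstartdoc`.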

Warm start (`resume='resume-similar'`) is *experimental*; we recommend doing a full, clean run to obtain final, reliable results before publication.
