# Bootstrapping

The confidence intervals calculated from the fit-parameter uncertainties returned by ```scipy.optimize.curve_fit``` or ```lmfit``` (```Model.fit```) are symmetrical: each parameter is reported as a value $\pm$ one standard deviation. This is a perfectly good way to estimate the confidence interval (all errors are estimates), but it hides a subtle truth: the real confidence interval is not necessarily symmetrical. The data do not necessarily contribute equally to each parameter across the range of measurements, and some regions may have more error than others, especially with non-linear curve fits.
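The following minimal sketch makes the symmetry concrete. It is not part of the original notebook. It fits a hypothetical exponential-decay model to synthetic data with `curve_fit` and reports each parameter as its value ± the square root of the corresponding diagonal element of the covariance matrix. The model `exp_decay`, its "true" parameters and the noise level are made-up assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_decay(t, a, k):
    """Hypothetical non-linear model: a * exp(-k * t)."""
    return a * np.exp(-k * t)

# Synthetic data (made up for illustration): true a = 10, k = 0.3, plus Gaussian noise
rng = np.random.default_rng(42)
t = np.linspace(0, 10, 25)
y = exp_decay(t, 10.0, 0.3) + rng.normal(0, 0.3, t.size)

popt, pcov = curve_fit(exp_decay, t, y, p0=(5.0, 0.1))
perr = np.sqrt(np.diag(pcov))   # one standard deviation per parameter

for name, value, err in zip(("a", "k"), popt, perr):
    # The implied interval runs from value - err to value + err: symmetric by construction
    print(f"{name} = {value:.3f} ± {err:.3f}")
```

Nothing in this output can describe an interval that extends farther on one side of the best-fit value than on the other. That limitation is what bootstrapping addresses.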

One way to estimate a confidence interval that reflects these regional effects in your data series is bootstrapping. It is a computationally expensive method: we randomly select data points from our data set (with replacement, so that each resample is the same size as the original set) and perform the curve fit hundreds, perhaps thousands, of times. Then we find the parameter values that bound the central 68.3%, 95.4% and 99.7% of the results ($1\sigma$, $2\sigma$, $3\sigma$), or any other range we like. The median values and these upper and lower bounds give a more realistic picture of the estimated error, provided we have enough data.
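The resampling procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the notebook's own method. The helper `bootstrap_line` is hypothetical, it refits with `np.polyfit` (a straight line) instead of `curve_fit` for speed, and the data are synthetic. Resamples with fewer than two distinct x values are skipped because no line can be fitted through them.

```python
import numpy as np

def bootstrap_line(x, y, n_boot=2000, seed=0):
    """Percentile bootstrap for a straight-line fit y = m*x + b (hypothetical helper).

    Resamples (x, y) pairs with replacement, refits each resample with np.polyfit,
    and returns an (n_kept, 2) array of [slope, intercept] values.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    fits = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # n indices drawn with replacement
        if np.unique(x[idx]).size < 2:         # a line needs at least 2 distinct x values
            continue
        fits.append(np.polyfit(x[idx], y[idx], 1))   # returns [slope, intercept]
    return np.array(fits)

# Synthetic data (made up for illustration): y = 2x + 1 with Gaussian noise
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)

params = bootstrap_line(x, y)
median = np.median(params, axis=0)
print("median slope, intercept:", median)

# Percentile limits for ~1σ, ~2σ and ~3σ coverage (68.27 %, 95.45 %, 99.73 %)
for coverage in (68.27, 95.45, 99.73):
    lo, hi = np.percentile(params, [50 - coverage / 2, 50 + coverage / 2], axis=0)
    print(f"{coverage}%: slope {lo[0]:.3f} to {hi[0]:.3f}, "
          f"intercept {lo[1]:.3f} to {hi[1]:.3f}")
```

Comparing each lower and upper bound with the median shows whether the interval is lopsided. The 99.73 % limits rest on only two or three resamples in each tail here, so many more resamples are needed before a $3\sigma$ bound is stable.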

This method is not very useful with small data sets because there may not be enough distinct combinations for the percentiles to be meaningful. Choosing the 95% percentile range from 6 values is not useful.
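A quick counting argument shows why small data sets are a problem. When sampling with replacement, the number of different resamples (multisets) of size $n$ drawn from $n$ points is $\binom{2n-1}{n}$. A resample that repeats a single point $n$ times, through which no line can be fitted, occurs with probability $n \cdot (1/n)^n$. The sketch below evaluates both quantities for the 5-point and 12-point sets used in this notebook. The helper names are hypothetical. The distinct resamples are not equally likely, so the effective variety is smaller still.

```python
from math import comb
from fractions import Fraction

def distinct_resamples(n):
    """Number of different multisets of size n drawn with replacement from n points."""
    return comb(2 * n - 1, n)

def p_single_point(n):
    """Probability that a resample repeats one data point n times (no line can be fitted)."""
    return Fraction(n, n ** n)   # n choices of point, each with probability (1/n)**n

for n in (5, 12):
    print(n, distinct_resamples(n), float(p_single_point(n)))
# n = 5 gives only 126 distinct resamples, and 1 in 625 resamples is a single repeated point
```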



In [11]:
### Install and load packages
# 
#!pip install uncertainties scikits.bootstrap   # uncomment (remove the #) to install dependencies

from scipy.optimize import curve_fit     # tool for curve fitting
from scipy.stats import linregress       # tool for linear regression
import scipy                             # includes tools for data analysis
import scipy.stats
import numpy as np                       # import the tools of NumPy but use a shorter name, "np"
from matplotlib import pyplot as plt     # tools for plotting
import pandas as pd

import uncertainties as un               # tool set for handling numbers with uncertainties
from uncertainties import unumpy as unp  # a replacement for numpy that uses uncertainty values

### Set global variables

location_data = "../data/"                   ## Use either the local folder or the GitHub folder; use the GitHub locations for Colab
location_styles = "../styles/"
#location_data = "https://raw.githubusercontent.com/blinkletter/StealThisCode/main/data/"
#location_styles = "https://raw.githubusercontent.com/blinkletter/StealThisCode/main/styles/"


## The Data Set

We will begin with the 5-point data set that we have been using so far, and later use the 12-point data set from the CSV data file, as in earlier examples in this series. First, we use the smaller set.

In [15]:
temp = [293, 298, 303, 308, 313]       # list of temperatures
k_obs = [7.6, 11.7, 15.2, 21.3, 27.8]  # list of observed rate constants (s^-1)
k_obs_err= [0.2 , 0.3, 0.1, 0.9, 0.9]  # list of standard deviations for data

### Convert lists to numpy arrays (enables numpy math tools with these lists)
temp = np.array(temp)
k = unp.uarray(k_obs, k_obs_err)   # make an array of ufloat values

x = 1/temp
y_u = unp.log(k/temp)         # uncertain array for y-axis
y = unp.nominal_values(y_u) # extract arrays of nominal values and errors
y_err = unp.std_devs(y_u)   # because curve_fit cannot handle ufloats

print(x)


def linear(x, slope, intercept):
    return slope * x + intercept

[0.00341297 0.0033557  0.00330033 0.00324675 0.00319489]
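The cell above prepares `x`, `y` and `y_err` but does not fit them yet. The sketch below gives a baseline for the bootstrap. It is one possible approach, not a cell from the original notebook: a conventional weighted straight-line fit with `curve_fit` that reports the symmetric ± errors. It skips the `uncertainties` package and propagates the error to first order directly. For $y = \ln(k/T)$ with $T$ taken as exact, $\sigma_y = \sigma_k / k$, which is the same propagation `unp.log` performs.

```python
import numpy as np
from scipy.optimize import curve_fit

temp = np.array([293, 298, 303, 308, 313], dtype=float)   # K
k_obs = np.array([7.6, 11.7, 15.2, 21.3, 27.8])           # s^-1
k_obs_err = np.array([0.2, 0.3, 0.1, 0.9, 0.9])           # s^-1

x = 1 / temp
y = np.log(k_obs / temp)
# First-order propagation for y = ln(k/T) with T exact: sigma_y = sigma_k / k
y_err = k_obs_err / k_obs

def linear(x, slope, intercept):
    return slope * x + intercept

# sigma weights each point by 1/y_err; absolute_sigma=True treats y_err as true standard
# deviations. An unweighted np.polyfit gives a sensible starting guess for the optimizer.
popt, pcov = curve_fit(linear, x, y, p0=np.polyfit(x, y, 1),
                       sigma=y_err, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
print(f"slope     = {popt[0]:.0f} ± {perr[0]:.0f}")
print(f"intercept = {popt[1]:.2f} ± {perr[1]:.2f}")
```

These symmetric ± values are what the bootstrap percentiles can be compared against.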


In [38]:
import scikits.bootstrap as bs          # third-party bootstrap package (used in the next cell)
from scipy.stats import bootstrap       # SciPy's bootstrap confidence-interval tool

def slope(x, y):
    """Statistic to bootstrap: slope of a straight-line fit through (x, y)."""
    m, b = np.polyfit(x, y, 1)
    return m

# statistic= (not statfunction=) takes the function itself, not the result of calling it.
# paired=True resamples (x, y) points together; vectorized=False because slope()
# handles one 1-D resample at a time. With only 5 points, a few resamples repeat a
# single point; np.polyfit then emits a RankWarning and that slope is meaningless.
res = bootstrap((x, y), statistic=slope, paired=True, vectorized=False,
                method="percentile")
print(res.confidence_interval)   # 95 % interval (default confidence_level=0.95)

In [39]:
bs.ci(np.random.rand(100), np.average)   # sanity check: bootstrap CI for the mean of 100 random numbers

array([0.48918114, 0.59969109])