dougiesquire/xbootstrap

xbootstrap

License: MIT. Code style: black.

xbootstrap is a simple package for performing nested* circular block bootstrapping of xarray objects.

  • Bootstrapping is random resampling with replacement.
  • Block bootstrapping resamples blocks of consecutive values rather than individual values, which is a simple way to account for autocorrelation in the data being randomly resampled.
  • Circular block bootstrapping avoids undersampling data values near the beginning and end of the dimension(s) being resampled. This is optional in xbootstrap.
  • * Here, nested bootstrapping means that, when multiple dimensions are specified, the first dimension is randomly resampled; then, for each resampled element along the first dimension, the second dimension is randomly resampled; then, for each resampled element along the second dimension, the third dimension is randomly resampled; and so on.
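
To make the terms above concrete, here is a minimal sketch of circular block resampling and of nesting across two dimensions. It is an illustration only, not xbootstrap's implementation: the function names block_indices and nested_resample are hypothetical, and the sketch uses plain Python lists and the standard library rather than xarray.

```python
import random


def block_indices(n, block, circular=True, rng=random):
    """Return n resampled positions along a dimension of length n,
    built from randomly chosen blocks of consecutive positions.

    Hypothetical helper for illustration; not part of xbootstrap.
    """
    if block < 1 or block > n:
        raise ValueError("block must be between 1 and n")
    n_blocks = -(-n // block)  # ceil(n / block)
    # Circular: a block may start anywhere and wraps past the end,
    # so every position is equally likely to be drawn.
    # Non-circular: a block must fit entirely inside the dimension,
    # so positions near the edges are drawn less often.
    n_starts = n if circular else n - block + 1
    indices = []
    for _ in range(n_blocks):
        start = rng.randrange(n_starts)
        indices.extend((start + k) % n for k in range(block))
    return indices[:n]  # trim the overhang from the last block


def nested_resample(table, row_block, col_block, rng=random):
    """Nested resampling of a 2-D list of lists: resample the rows,
    then resample the columns independently for each resampled row.

    Hypothetical helper for illustration; not part of xbootstrap.
    """
    n_rows, n_cols = len(table), len(table[0])
    out = []
    for i in block_indices(n_rows, row_block, rng=rng):
        cols = block_indices(n_cols, col_block, rng=rng)
        out.append([table[i][j] for j in cols])
    return out


# Example: 6 "time" steps by 4 "ensemble" members, blocks of 2 and 1
rng = random.Random(0)
table = [[10 * i + j for j in range(4)] for i in range(6)]
resampled = nested_resample(table, row_block=2, col_block=1, rng=rng)
```

Each row of resampled comes from a single row of table, but the column draw differs from row to row, which is what "nested" refers to in the bullet above.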

Installation

To install this package from PyPI:

pip install xbootstrap

Example usage

xbootstrap currently comprises a single function called block_bootstrap that can be used as follows:

import numpy as np
import xarray as xr
from xbootstrap import block_bootstrap

# Generate some example data
n_time = 100
n_ensemble = 10
# Dimensions are named explicitly rather than inferred from the coords dict
ds1 = xr.DataArray(
    np.random.random((n_time, n_ensemble)),
    coords={"time": range(n_time), "ensemble": range(n_ensemble)},
    dims=["time", "ensemble"],
)
ds2 = xr.DataArray(
    np.random.random((n_time, n_ensemble)),
    coords={"time": range(n_time), "ensemble": range(n_ensemble)},
    dims=["time", "ensemble"],
)
ds3 = xr.DataArray(
    np.random.random(n_time), coords={"time": range(n_time)}, dims=["time"]
)

# Create 1000 circularly bootstrapped resamples of ds1, ds2 and ds3
# using a blocksize of 5 for the time dimension and 1 for the ensemble
# dimension, and only bootstrapping the time dimension for ds2
ds1_bs, ds2_bs, ds3_bs = block_bootstrap(
    ds1,
    ds2,
    ds3,
    blocks={"time": 5, "ensemble": 1},
    n_iteration=1000,
    exclude_dims=[[], ["ensemble"], []],
    circular=True,
)
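
A common next step is to summarise a statistic across the resamples, for example as a percentile interval. The sketch below shows the idea on a plain NumPy array standing in for bootstrapped output. The array layout (resamples along the last axis) and the variable names are assumptions made for illustration, because the name of the dimension that block_bootstrap adds is not stated here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for bootstrapped data: 20 time steps x 1000 resamples.
# (Assumed layout for illustration; not the output of block_bootstrap.)
resamples = rng.random((20, 1000))

# Statistic of interest, computed separately for every resample:
# here, the mean over the time axis.
boot_means = resamples.mean(axis=0)  # shape (1000,)

# 95% percentile interval of that statistic across the resamples
lower, upper = np.percentile(boot_means, [2.5, 97.5])
```

With real output from block_bootstrap, the same reduction would be taken along whichever dimension holds the resamples.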

block_bootstrap also operates lazily with dask-backed xarray objects, but this requires dask to be installed:

ds1_bs, ds2_bs, ds3_bs = block_bootstrap(
    ds1.chunk(),
    ds2.chunk(),
    ds3.chunk(),
    blocks={"time": 5, "ensemble": 1},
    n_iteration=10,
    exclude_dims=[[], ["ensemble"], []],
    circular=True,
)

Contributing

Contributions are very welcome, particularly in the form of reporting bugs and writing tests. Please open an issue and check out the contributor guide.

References

Wilks, D.S., 2011. Statistical Methods in the Atmospheric Sciences (Vol. 100). Academic Press. (See particularly Chapter 5.3.)