Skip to content
Library for bootstrapping statistics
Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
bootstrap
tests
.gitignore
.travis.yml
README.md
requirements.txt
setup.py

README.md

Bootstrap

TRAVIS Coverage Status

A library for bootstrapping statistics.

Features

While incomplete, the library already incudes a number of features:

  • Bootstrap samples
  • Bootstrap matrices
  • Bootstrap statistics
    • Provides SEM and confidence intervals for statistics
  • Jackknife samples and statistics
  • Two sample testing

Installation

python setup.py install

Usage

Here, we document some of the library features using the University of Wisconsin breast cancer data set. Available here. For simplicity, only the first dimension will be looked at.

import numpy as np
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

First, we will look at how the data are distributed.

import matplotlib.pyplot as plt
import seaborn as sns
plt.hist(data.data[:,0], bins=40)
plt.title('Measurements')

Data

Next, we will bootstrap 10,000 samples, to bootstrap the mean and 95% confidence interval for the mean. Below, the mean of each bootstrapped sample is plotted, with the estimated mean and confidence intervals shown.

results = bootstrap_statistic(data.data[:,0], func=np.mean, n_samples=10000)

# Make plot of bootstrapped mean
plt.hist(results.statistics, bins=40)
plt.title('Bootstrapped Means')
plt.xlabel('Mean')
plt.ylabel('Counts')
ax = plt.gca()
ax.axvline(x=results.ci[0], color='red', linestyle='dashed', linewidth=2)
ax.axvline(x=results.ci[1], color='red', linestyle='dashed', linewidth=2)
ax.axvline(x=results.statistic, color='black', linewidth=5)

Mean

An advantage of the bootstrap method is its adaptability. For example, you can bootstrap an estimate of the 95th percentile of the data.

def percentile(data):
    """returns 95th percentile of data"""
    return np.percentile(data, 95)
    
# Bootstrap the 95th percentile
results = bootstrap_statistic(data.data[:,0], func=percentile, n_samples=10000)

# Make plot of bootstrapped 95th percentile
plt.hist(results.statistics, bins=40)
plt.title('Bootstrapped 95th Percentiles')
plt.xlabel('95th Percentile')
plt.ylabel('Counts')
ax = plt.gca()
ax.axvline(x=results.ci[0], color='red', linestyle='dashed', linewidth=2)
ax.axvline(x=results.ci[1], color='red', linestyle='dashed', linewidth=2)
ax.axvline(x=results.statistic, color='black', linewidth=5)

Percentile

Additionally, the library can perform two sample testing. First lets view the distribution of the same data, but broken up by tumor type.

benign = data.data[data.target == 0]
malignant = data.data[data.target == 1]

# Plot benign and malignant samples
plt.hist(benign[:,0], bins=30, alpha=0.5, label='benign')
plt.hist(malignant[:,0], bins=30, alpha=0.5, label='malignant')
plt.legend()
plt.xlabel('Measurement')
plt.ylabel('Counts')

split

It appears their is a different in the groups distribution. The level of significance can be computer via the bootstrap method.

significance = two_sample_testing(benign[:, 0], malignant[:, 0],
                                  statistic_func=compare_means,
                                  n_samples=5000)
print(significance) # prints 0.0

Hmmm, with 5,000 random bootstrapped samples, not a single one had the difference of means of the observed samples.

What about a feature that is less predictive? Below, we look at feature 9.

plt.hist(benign[:,9], bins=30, alpha=0.5, label='benign')
plt.hist(malignant[:,9], bins=30, alpha=0.5, label='malignant')
plt.legend()
plt.xlabel('Measurement')
plt.ylabel('Counts')

Feature9

If then bootstrap the difference between the two means, we get a non-significant difference.

significance = two_sample_testing(malignant[:, 9], benign[:, 9],
                                  statistic_func=compare_means,
                                  n_samples=5000)
print(significance) # prints 0.387
You can’t perform that action at this time.