# Bootstrap Resampling
Bootstrap resamples a dataset of $N$ instances with replacement to generate a new dataset of the same size. On each of the $N$ draws, a given instance has probability $1 - 1/N$ of not being picked. Since the draws are independent, the probability that it is never picked, and therefore ends up in the test data, is:

$$
\left(1 - \frac{1}{N}\right)^N \approx e^{-1} \approx 0.368 \quad \text{for large } N
$$

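Equivalently, each instance is picked at least once with probability $1 - e^{-1} \approx 0.632$, so the bootstrap (training) data contains approximately 63.2% of the distinct original instances. The remaining rows of the bootstrap sample are duplicates.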
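To get a feel for how quickly the limit is reached, the sketch below evaluates $(1 - 1/N)^N$ for a few dataset sizes and compares it with $e^{-1}$. The sizes are arbitrary illustrative choices (506 is the number of rows in the Boston Housing data used below), and the helper name `prob_not_picked` is our own.

```python
import math

def prob_not_picked(n):
    """Probability that a given instance is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n

# finite-N probabilities of being left out / picked at least once
for n in (10, 100, 506, 10_000):
    p_out = prob_not_picked(n)
    print(f"N={n:>6}: P(not picked)={p_out:.4f}, P(picked)={1 - p_out:.4f}")

# large-N limits
print(f"e^-1 = {math.exp(-1):.4f}, 1 - e^-1 = {1 - math.exp(-1):.4f}")
```

The sequence increases towards $e^{-1}$ from below; already at $N = 100$ it is within about 0.002 of the limit.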

Let's define a function that generates a new dataset by resampling an original dataset with replacement.

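```python
import numpy as np
import pandas as pd
import random
# note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import requires an older scikit-learn release
from sklearn.datasets import load_boston
```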

```python
def bootstrap(dataset, ratio=1.0):
    """Return a new dataset built by sampling rows of `dataset` with replacement."""
    # number of rows in the original dataset (the pool we sample from)
    n_original = dataset.shape[0]

    # number of rows of the generated dataset
    n_rows = int(n_original * ratio)

    # number of columns
    n_cols = dataset.shape[1]

    # create the output dataset, initially all zeros
    sampled_dataset = np.zeros((n_rows, n_cols))

    # for each output row, pick a random row index of the ORIGINAL dataset
    # (uniform in [0, n_original)) and copy that row to the output dataset
    for s in range(n_rows):
        sample_index = random.randrange(n_original)
        sampled_dataset[s, :] = dataset[sample_index, :]

    return sampled_dataset
```
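As an alternative sketch (not the function used in the rest of this notebook), the same resampling can be vectorized with NumPy's `Generator.integers`, which also makes it easy to recover the out-of-bag instances that were never drawn and can serve as the test data. The helper `bootstrap_indices` and the random stand-in matrix `X` are hypothetical, chosen only for illustration.

```python
import numpy as np

def bootstrap_indices(n_instances, ratio=1.0, rng=None):
    """Hypothetical helper: draw bootstrap row indices and the out-of-bag rows.

    Returns (sampled, oob): `sampled` holds int(n_instances * ratio) indices drawn
    with replacement; `oob` lists the indices that were never drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_rows = int(n_instances * ratio)
    # one vectorized draw instead of a Python loop
    sampled = rng.integers(0, n_instances, size=n_rows)
    # rows never selected form the out-of-bag (test) set
    picked = np.zeros(n_instances, dtype=bool)
    picked[sampled] = True
    oob = np.flatnonzero(~picked)
    return sampled, oob

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))  # stand-in data with the same shape as the Boston features
sampled, oob = bootstrap_indices(X.shape[0], rng=rng)
X_train, X_test = X[sampled], X[oob]
print(X_train.shape, X_test.shape, "out-of-bag fraction: %.3f" % (len(oob) / X.shape[0]))
```

Fancy indexing with `X[sampled]` copies all selected rows in one operation, so no per-row Python loop is needed; this matters mostly for large datasets.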

Now let's apply bootstrapping to the Boston Housing dataset.

```python
# feature matrix of the Boston Housing dataset (506 rows, 13 columns)
data = load_boston().data
```

We can now apply bootstrapping to generate a new dataset and count how many unique rows it contains:

```python
# bootstrap() draws indices with Python's `random` module, so seed that generator
random.seed(328489)
bootstrap_dataset = bootstrap(data)

# count the distinct rows in the bootstrap sample
n_unique = np.unique(bootstrap_dataset, axis=0).shape[0]
print("Bootstrap dataset contains %d (%.1f%%) unique examples" %
      (n_unique, 100 * n_unique / data.shape[0]))
```

Bootstrap dataset contains 319 (63.0%) unique examples


The exact count depends on the random draw, so we can repeat the process several times and compute the average percentage of unique examples in a dataset generated by bootstrap resampling.

```python
# average the fraction of unique rows over many bootstrap samples
n_trials = 500
fractions = [np.unique(bootstrap(data), axis=0).shape[0] / data.shape[0]
             for _ in range(n_trials)]
print("The percentage of unique data points is %.1f%%" % (100 * np.mean(fractions)))
```

The percentage of unique data points is 63.3%


This agrees with the theoretical value of about 63.2% derived above.
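Since only the row indices matter, the same experiment can be sketched without any dataset: draw index vectors directly and compare the average fraction of distinct indices with the exact finite-sample expectation $1 - (1 - 1/N)^N$, which follows from linearity of expectation over the $N$ instances. The helper `mean_unique_fraction`, the seed, and the number of trials are illustrative assumptions.

```python
import numpy as np

def mean_unique_fraction(n, n_trials=500, seed=0):
    """Average fraction of distinct indices in n draws with replacement from range(n)."""
    rng = np.random.default_rng(seed)
    fractions = [np.unique(rng.integers(0, n, size=n)).size / n for _ in range(n_trials)]
    return float(np.mean(fractions))

n = 506
empirical = mean_unique_fraction(n)
expected = 1 - (1 - 1 / n) ** n  # exact expected fraction of distinct instances for finite n
print("empirical %.4f, expected %.4f, large-N limit %.4f"
      % (empirical, expected, 1 - np.exp(-1)))
```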