# 6.4: Resampling Exercises

## Getting Started

### Import Libraries 

We import our standard libraries and specific objects/libraries at the top level of our notebook.

In [1]:
# Load our previous libraries and objects
import numpy as np
from ISLP import load_data

## The Bootstrap

The bootstrap approach can be applied in almost all situations. While there are several implementations of the bootstrap in Python, its use for estimating standard error is simple enough that we write our own function below for the case when our data is stored in a dataframe. We will use the `Portfolio` data set in the `ISLP` package to illustrate.

We will create a function `alpha_func()`, which takes as input a dataframe `D` assumed to have columns `X` and `Y`, as well as a vector `idx` indicating which observations should be used to estimate $\alpha$. The function then outputs the estimate for $\alpha$ based on the selected observations.

$$
\hat{\alpha} = \frac{\hat{\sigma}_{Y}^2-\hat{\sigma}_{XY}}{\hat{\sigma}_{X}^2+\hat{\sigma}_{Y}^2-2\hat{\sigma}_{XY}}
$$
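
As background (this comes from the text these exercises are adapted from, not from the notebook itself): $\alpha$ is the fraction of money invested in $X$, with the remaining $1-\alpha$ invested in $Y$, chosen to minimize the variance of the combined return. A short sketch of why the formula has this shape, writing $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ for the population variances and covariance:

$$
\mathrm{Var}(\alpha X + (1-\alpha)Y) = \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY}
$$

Setting the derivative with respect to $\alpha$ to zero and solving gives

$$
2\alpha\sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + 2(1-2\alpha)\sigma_{XY} = 0
\quad\Longrightarrow\quad
\alpha = \frac{\sigma_Y^2-\sigma_{XY}}{\sigma_X^2+\sigma_Y^2-2\sigma_{XY}}
$$

Replacing the population quantities with their sample estimates yields $\hat{\alpha}$.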

In [2]:
Portfolio = load_data('Portfolio')
def alpha_func(D, idx):
   cov_ = np.cov(D[['X','Y']].loc[idx], rowvar=False) # https://numpy.org/doc/stable/reference/generated/numpy.cov.html
   return ((cov_[1,1] - cov_[0,1]) /
           (cov_[0,0]+cov_[1,1]-2*cov_[0,1]))

The following command estimates $\alpha$ using all 100 observations.

In [3]:
alpha_func(Portfolio, range(100))

0.57583207459283
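
To make the role of `np.cov` concrete, the sketch below applies the same formula to synthetic data (not the `Portfolio` data) using NumPy only, and checks the result against variances and a covariance computed by hand. The names `alpha_from_cov`, `x`, `y`, and `xy` are hypothetical and exist only for this illustration.

```python
import numpy as np

def alpha_from_cov(xy):
    """Plug-in estimate of alpha from an (n, 2) array with columns X and Y."""
    # rowvar=False: each column is a variable, each row an observation
    cov_ = np.cov(xy, rowvar=False)
    return (cov_[1, 1] - cov_[0, 1]) / (cov_[0, 0] + cov_[1, 1] - 2 * cov_[0, 1])

# synthetic, correlated data (an assumption for illustration only)
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)
xy = np.column_stack([x, y])

# the same quantities computed directly, using the n - 1 divisor that np.cov uses by default
var_x = np.var(x, ddof=1)
var_y = np.var(y, ddof=1)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
alpha_manual = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

alpha_hat = alpha_from_cov(xy)
print(alpha_hat, alpha_manual)
```

Because every term in the ratio carries the same divisor, the estimate does not depend on whether the covariance matrix is normalized by $n$ or $n-1$.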

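Next we randomly select 100 observations from `range(100)`, with replacement. Because we sample with replacement, some observations appear more than once and others not at all. This is equivalent to constructing **one** new bootstrap sample and recomputing $\alpha$ based on the new data set.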

In [4]:
rng = np.random.default_rng(0) # control randomness, refer to https://numpy.org/doc/stable/reference/random/generator.html
alpha_func(D=Portfolio,
           idx=rng.choice(100, 100, replace=True))

0.6074452469619004
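
A minimal sketch of what sampling with replacement does to the index vector: it counts how many distinct observations end up in one bootstrap sample and how many are drawn more than once. The variable names (`n_unique`, `n_repeated`) are illustrative and not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
idx = rng.choice(n, n, replace=True)  # one bootstrap sample of indices

# count how often each index was drawn
values, counts = np.unique(idx, return_counts=True)
n_unique = len(values)
n_repeated = int(np.sum(counts > 1))

print(f"distinct observations drawn: {n_unique} of {n}")
print(f"observations drawn more than once: {n_repeated}")
# probability that a given observation appears at least once in the sample
print(f"expected share of distinct observations: {1 - (1 - 1 / n) ** n:.3f}")
```

On average roughly 63% of the original observations appear in any one bootstrap sample, which is why each resample gives a slightly different estimate of $\alpha$.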

## Estimating the Standard Error of $\hat{\alpha}$

In [24]:
def boot_SE(D, B=1000, seed=0):
    """Bootstrap estimate of the standard error of alpha_func on dataframe D."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]  # number of rows (observations)
    bootstrap_alphas = []  # local list, so repeated calls do not accumulate results

    for _ in range(B):
        # draw n row labels with replacement: one bootstrap sample
        idx = rng.choice(D.index,
                         n,
                         replace=True)
        # estimate alpha on this bootstrap sample and store it
        bootstrap_alphas.append(alpha_func(D, idx))

    bootstrap_alphas = np.array(bootstrap_alphas)
    print(f"Bootstrap Mean: {np.mean(bootstrap_alphas)}")
    print("Standard Deviation: ", np.std(bootstrap_alphas))
    print("95% percentile: ", np.percentile(bootstrap_alphas, [2.5, 97.5]))

    return np.std(bootstrap_alphas)

# Interpreting the standard deviation (deciding whether it is large or small)
# requires knowing the mean, because spread is only meaningful relative to the center.
#
# Mean: the center of the distribution.
#
# Standard deviation: how far the data points are spread around the mean (variability).
#
#      Low standard deviation: most values in the dataset are close to the mean.
#      High standard deviation: the values are more spread out from the mean.

Let’s use our function to evaluate the accuracy of our estimate of $\alpha$ using $B = 1,000$ bootstrap replications.

In [25]:
alpha_SE = boot_SE(D=Portfolio,
                   B=1000,
                   seed=0)
alpha_SE

Bootstrap Mean: 0.5821442531025649
Standard Deviation:  0.09118176521277165
95% percentile:  [0.41334744 0.76977199]


0.09118176521277165

The final output shows that the bootstrap estimate for ${\rm SE}(\hat{\alpha})$ is $0.0912$.
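
One common use of a standard error is a normal-approximation interval, $\hat{\alpha} \pm 1.96 \cdot {\rm SE}(\hat{\alpha})$. The sketch below computes it from the two numbers reported above; it assumes the bootstrap distribution of $\hat{\alpha}$ is roughly normal, which this notebook does not check. The result can be compared with the percentile interval printed by `boot_SE()`.

```python
alpha_hat = 0.57583207459283    # estimate from all 100 observations (reported above)
alpha_se = 0.09118176521277165  # bootstrap standard error (reported above)
z = 1.96                        # approximate 97.5% quantile of the standard normal

# normal-approximation 95% interval (assumes approximate normality)
lower = alpha_hat - z * alpha_se
upper = alpha_hat + z * alpha_se
print(f"normal-approximation 95% interval: [{lower:.4f}, {upper:.4f}]")
```

The percentile interval makes no normality assumption, so the two need not coincide; a large gap between them would suggest a skewed bootstrap distribution.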

*These exercises were adapted from:* James, Gareth, et al. *An Introduction to Statistical Learning: with Applications in Python*. Springer, 2023.