### Ch5_Q09
We will now consider the `Boston` housing data set, from the `ISLP` library.

In [1]:
!pip install ISLP



In [46]:
import numpy as np

#### (a) Based on this data set, provide an estimate for the population mean of `medv`. Call this estimate $\hat{\mu}$.

In [47]:
from ISLP import load_data

df = load_data('Boston')
mu_hat = df['medv'].mean()
print("Estimated mean (mu_hat):", mu_hat)

Estimated mean (mu_hat): 22.532806324110677


#### (b) Provide an estimate of the standard error of $\hat{\mu}$. Interpret this result.
*Hint: We can compute the standard error of the sample mean by dividing the sample standard deviation by the square root of the number of observations.*

In [48]:
std_error = df['medv'].std() / np.sqrt(len(df['medv']))
print("Standard error of mu_hat:", std_error)

Standard error of mu_hat: 0.4088611474975351


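The standard error estimates how much the sample mean would vary from one sample to another. Here it is about 0.41 (roughly 410 dollars, since `medv` is measured in thousands of dollars), which is small relative to the estimated mean of 22.53. A smaller standard error means a more precise estimate of the population mean; a larger one means more uncertainty.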
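
The hint's formula can be written out as follows. Here $s$ is the sample standard deviation as returned by pandas' `.std()`, which uses the $n-1$ denominator by default, and $n$ is the number of observations; the symbols $x_i$ and $\bar{x}$ are notation introduced here for the individual `medv` values and their mean.

$$
\mathrm{SE}(\hat{\mu}) = \frac{s}{\sqrt{n}}, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$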

#### (c) Now estimate the standard error of $\hat{\mu}$ using the bootstrap. How does this compare to your answer from (b)?

In [56]:
# Set the number of bootstrap iterations
n_bootstrap = 10000
bootstrap_means = []

# Perform bootstrap resampling
for i in range(n_bootstrap):
    bootstrap_sample = df['medv'].sample(n=len(df), replace=True)
    bootstrap_means.append(bootstrap_sample.mean())

# Estimate the standard error from the bootstrap sample means
se_mu_bootstrap = np.std(bootstrap_means)
print(f"Bootstrap standard error of the mean (SE(mu) bootstrap): {se_mu_bootstrap}")

Bootstrap standard error of the mean (SE(mu) bootstrap): 0.41133516590580477


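The two estimates agree closely: the bootstrap gives about 0.411, while the formula in (b) gives about 0.409. This is expected, since the bootstrap approximates the sampling variability of the mean empirically by resampling the data. No random seed is set, so the bootstrap value changes slightly from run to run.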
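
To illustrate why the two approaches agree, here is a minimal, seeded sketch on synthetic data (an assumption: the array `values` is a made-up stand-in, not the Boston data). It draws all resamples at once with NumPy instead of looping, and compares the bootstrap standard error of the mean with the formula-based one.

```python
import numpy as np

# Synthetic stand-in for a numeric column (assumption: not the Boston data).
rng = np.random.default_rng(0)
values = rng.normal(loc=22.5, scale=9.2, size=506)

# Formula-based standard error: sample standard deviation over sqrt(n).
se_formula = values.std(ddof=1) / np.sqrt(values.size)

# Bootstrap: draw n_boot resamples of size n with replacement, one per row.
n_boot = 5000
resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
boot_means = resamples.mean(axis=1)

# Standard deviation of the resampled means estimates the standard error.
se_boot = boot_means.std(ddof=1)

print(f"formula SE:   {se_formula:.4f}")
print(f"bootstrap SE: {se_boot:.4f}")
```

Fixing the seed makes the bootstrap estimate reproducible, which the unseeded `.sample()` calls in the notebook are not.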

#### (d) Based on your bootstrap estimate from (c), provide a 95% confidence interval for the mean of `medv`. Compare it to the results obtained by using `Boston['medv'].std()` and the two standard error rule (3.9).
*Hint: You can approximate a 95% confidence interval using the formula $[\hat{\mu} - 2SE(\hat{\mu}), \hat{\mu} + 2SE(\hat{\mu})]$.*

In [50]:
# 95% Confidence interval using the bootstrap standard error
ci_lower = mu_hat - 2 * se_mu_bootstrap
ci_upper = mu_hat + 2 * se_mu_bootstrap
print(f"95% Confidence interval for the mean of medv: [{ci_lower}, {ci_upper}]")

95% Confidence interval for the mean of medv: [21.71802068247179, 23.347591965749565]
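
Part (d) also asks for a comparison with the interval built from the formula-based standard error in (b). A minimal sketch, reusing the values printed in (a) and (b); the names `se_formula`, `ci_lower_formula` and `ci_upper_formula` are introduced here for illustration only.

```python
# Values copied from the printed outputs of parts (a) and (b).
mu_hat = 22.532806324110677
se_formula = 0.4088611474975351

# Two standard error rule applied with the formula-based standard error.
ci_lower_formula = mu_hat - 2 * se_formula
ci_upper_formula = mu_hat + 2 * se_formula
print(f"Formula-based 95% CI: [{ci_lower_formula:.3f}, {ci_upper_formula:.3f}]")
```

This gives roughly [21.715, 23.351], which is very close to the bootstrap interval printed above; the endpoints differ by about 0.003.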


#### (e) Based on this data set, provide an estimate, $\hat{\mu}_{med}$, for the median value of `medv` in the population.

In [51]:
mu_med_hat = df['medv'].median()
print(f"Estimated median (mu-med): {mu_med_hat}")

Estimated median (mu-med): 21.2


#### (f) We now would like to estimate the standard error of $\hat{\mu}_{med}$. Unfortunately, there is no simple formula for computing the standard error of the median. Instead, estimate the standard error of the median using the bootstrap. Comment on your findings.

In [57]:
# Bootstrap for the median
bootstrap_medians = []

for i in range(n_bootstrap):
    bootstrap_sample = df['medv'].sample(n=len(df), replace=True)
    bootstrap_medians.append(bootstrap_sample.median())

# Estimate the standard error of the median from the bootstrap
se_med_bootstrap = np.std(bootstrap_medians)
print(f"Bootstrap standard error of the median (SE(mu_med) bootstrap): {se_med_bootstrap}")

Bootstrap standard error of the median (SE(mu_med) bootstrap): 0.38171164496252896


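The bootstrap standard error of the median (about 0.38) is slightly smaller than that of the mean (about 0.41) and is small relative to the estimate of 21.2, so the median is estimated with similar precision. Because there is no simple formula for the standard error of the median, we use resampling (the bootstrap) to estimate its variability.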

#### (g) Based on this data set, provide an estimate for the tenth percentile of `medv` in Boston census tracts. Call this quantity $\hat{\mu}_{0.1}$. (You can use the `np.percentile()` function.)

In [53]:
# Estimate for the 10th percentile of medv
mu_0_1_hat = np.percentile(df['medv'], 10)
print(f"Estimated 10th percentile (mu_0.1): {mu_0_1_hat}")

Estimated 10th percentile (mu_0.1): 12.75


#### (h) Use the bootstrap to estimate the standard error of $\hat{\mu}_{0.1}$. Comment on your findings.

In [54]:
# Bootstrap for the 10th percentile
bootstrap_0_1_percentiles = []

for i in range(n_bootstrap):
    bootstrap_sample = df['medv'].sample(n=len(df), replace=True)
    bootstrap_0_1_percentiles.append(np.percentile(bootstrap_sample, 10))

# Estimate the standard error of the 10th percentile from the bootstrap
se_0_1_bootstrap = np.std(bootstrap_0_1_percentiles)
print(f"Bootstrap standard error of the 10th percentile (SE(mu_0.1) bootstrap): {se_0_1_bootstrap}")

Bootstrap standard error of the 10th percentile (SE(mu_0.1) bootstrap): 0.5066999822133409


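The bootstrap standard error of the 10th percentile (about 0.51) is larger than those of the mean (about 0.41) and the median (about 0.38), but still small relative to the estimate of 12.75. A larger value is plausible because fewer observations fall in the tail of the distribution, so tail percentiles tend to be estimated less precisely. As with the median, there is no simple formula for this standard error, so the bootstrap is used to estimate it.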
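
The three bootstrap loops above differ only in the statistic they compute, so they can be folded into one helper. The sketch below is one possible way to do that; the function names `bootstrap_se` and `tenth_percentile` are hypothetical, and the right-skewed `sample` is synthetic data standing in for `medv`, not the Boston data itself.

```python
import numpy as np

def bootstrap_se(values, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error of statistic(values).

    `statistic` must accept an `axis` keyword, as np.mean and np.median do.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row of idx holds the indices of one resample drawn with replacement.
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    estimates = statistic(values[idx], axis=1)
    return estimates.std(ddof=1)

def tenth_percentile(a, axis=None):
    """10th percentile along the given axis."""
    return np.percentile(a, 10, axis=axis)

# Synthetic right-skewed sample (assumption: illustrative only).
rng = np.random.default_rng(1)
sample = rng.gamma(shape=6.0, scale=3.75, size=506)

for name, stat in [("mean", np.mean),
                   ("median", np.median),
                   ("10th percentile", tenth_percentile)]:
    print(f"{name:>16}: SE = {bootstrap_se(sample, stat):.4f}")
```

Passing the statistic as a function keeps the resampling logic in one place, so a different quantile or a trimmed mean only needs a new small function.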