## Ch05_Q09 in p227
We will now consider the `Boston` housing data set, from the `ISLP` library.

## Part(a)
Based on this data set, provide an estimate for the population mean of `medv`. Call this estimate $\hat{\mu}$.

In [3]:
from ISLP import load_data
from IPython.display import display, Math

boston = load_data('Boston')

# Estimate the population mean of `medv`
mu_hat = boston['medv'].mean()

# Display mu_hat in LaTeX with two decimal places
display(Math(f"\\hat{{\\mu}} = {mu_hat:.2f}"))


## Part(b)
Provide an estimate of the standard error of $\hat{\mu}$. Interpret this result.  
*Hint: We can compute the standard error of the sample mean by dividing the sample standard deviation by the square root of the number of observations.*

In [6]:
# Calculate the sample standard deviation and number of observations
std_dev = boston['medv'].std()
n = len(boston['medv'])

# Calculate the standard error of the mean
se_mu_hat = std_dev / (n ** 0.5)
display(Math(f"\\text{{SE}}(\\hat{{\\mu}}) = {se_mu_hat:.2f}"))


## Part(c)
Now estimate the standard error of $\hat{\mu}$ using the bootstrap. How does this compare to your answer from (b)?

In [12]:
import numpy as np

# Number of bootstrap samples
n_bootstrap = 1000

# Generate bootstrap sample means
bootstrap_means = [boston['medv'].sample(frac=1, replace=True).mean() for _ in range(n_bootstrap)]

# Calculate the bootstrap standard error of the mean
se_mu_hat_bootstrap = np.std(bootstrap_means)

# Display the bootstrap standard error in LaTeX format
display(Math(f"\\text{{SE}}_{{\\text{{Bootstrap}}}}(\\hat{{\\mu}}) = {se_mu_hat_bootstrap:.2f}"))


### Answer:
The results for the standard error of $\hat{\mu}$ are as follows:

- **Standard Error from Part (b)**: $\text{SE}(\hat{\mu}) = 0.41$
- **Bootstrap Standard Error from Part (c)**: $\text{SE}_{\text{Bootstrap}}(\hat{\mu}) = 0.42$

#### Comparison and Interpretation

The two standard error estimates differ by only 0.01, so the formula-based and bootstrap approaches agree closely for the sample mean $\hat{\mu}$. Here’s what this implies:

1. **Consistency**: The formula $s/\sqrt{n}$ from Part (b) and the bootstrap in Part (c) estimate the same quantity in different ways. The bootstrap resamples the data directly instead of using the formula, and it reaches almost the same value.
   
2. **Bootstrap Reliability**: Because no random seed is set, the bootstrap estimate changes slightly from run to run. With 1000 resamples, a gap of 0.01 is within the range that this resampling noise alone can produce.

3. **Conclusion**: A standard error of about 0.41 means that the mean of `medv` in a sample of this size would typically differ from the population mean by roughly 0.4 (in the units of `medv`), so $\hat{\mu}$ is a fairly precise estimate.
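
The comparison between the formula and the bootstrap can be made reproducible by fixing a seed. The following is a minimal sketch rather than the notebook's own code: it uses synthetic data in place of `medv` (the array `values`, the gamma parameters, and the seeds are assumptions chosen only for illustration) and draws the resamples with NumPy's `default_rng`.

```python
import numpy as np

# Hypothetical stand-in for boston['medv']: 506 synthetic positive values.
rng = np.random.default_rng(0)
values = rng.gamma(shape=6.0, scale=3.75, size=506)
n = values.size

# Formula-based standard error of the mean: s / sqrt(n), with s computed using ddof=1.
se_formula = values.std(ddof=1) / np.sqrt(n)

# Bootstrap: each row of idx holds the indices of one resample of size n,
# drawn with replacement.
B = 2000
idx = rng.integers(0, n, size=(B, n))
boot_means = values[idx].mean(axis=1)

# The bootstrap SE is the standard deviation of the B resampled means.
se_boot = boot_means.std(ddof=1)

print(f"formula SE:   {se_formula:.3f}")
print(f"bootstrap SE: {se_boot:.3f}")
```

With a fixed seed the two printed numbers are the same on every run, which makes it easier to tell resampling noise apart from a real disagreement between the methods.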

## Part(d)

$$
\hat{\beta}_1 \pm 2 \cdot \text{SE}\left(\hat{\beta}_1\right) \tag{3.9}
$$
Based on your bootstrap estimate from (c), provide a 95 % confidence interval for the mean of `medv`. Compare it to the results obtained by using `Boston['medv'].std()` and the two standard error rule (3.9).

*Hint: You can approximate a 95 % confidence interval using the formula $\left[\hat{\mu}-2SE\left(\hat{\mu}\right),\ \hat{\mu}+2SE\left(\hat{\mu}\right)\right]$.*

In [8]:
# Calculate the 95% confidence interval using the bootstrap SE
bootstrap_ci_lower = mu_hat - 2 * se_mu_hat_bootstrap
bootstrap_ci_upper = mu_hat + 2 * se_mu_hat_bootstrap

print(f"Bootstrap 95% Confidence Interval for the Mean of medv: [{bootstrap_ci_lower:.2f}, {bootstrap_ci_upper:.2f}]")

# Calculate the 95% confidence interval using the standard error from Part (b)
standard_ci_lower = mu_hat - 2 * se_mu_hat
standard_ci_upper = mu_hat + 2 * se_mu_hat

print(f"Standard 95% Confidence Interval for the Mean of medv: [{standard_ci_lower:.2f}, {standard_ci_upper:.2f}]")

Bootstrap 95% Confidence Interval for the Mean of medv: [21.73, 23.33]
Standard 95% Confidence Interval for the Mean of medv: [21.72, 23.35]


### Answer:
The results show that both confidence intervals are very close:

- **Bootstrap 95% Confidence Interval**: $[21.73, 23.33]$
- **Standard 95% Confidence Interval**: $[21.72, 23.35]$

#### Interpretation
The endpoints differ by at most 0.02, so the bootstrap standard error and the formula-based standard error from Part (b) lead to essentially the same two-standard-error interval for the mean of `medv`. The bootstrap interval shown has a half-width of 0.80, which implies a bootstrap SE of about 0.40 rather than the 0.42 reported in Part (c). The bootstrap cell was presumably rerun, and without a fixed seed its value changes slightly between runs.

#### Conclusion
Either method gives an approximate 95% confidence interval of roughly 21.7 to 23.3 for the population mean of `medv`.
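
The two-standard-error rule is one way to turn bootstrap output into an interval. A different technique, the percentile interval, reads the 2.5th and 97.5th percentiles directly off the bootstrap means. The sketch below compares the two on synthetic data; it is not the method the exercise asks for, and `values`, the gamma parameters, and the seed are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for boston['medv'].
rng = np.random.default_rng(1)
values = rng.gamma(shape=6.0, scale=3.75, size=506)
n = values.size
mu_hat = values.mean()

# Bootstrap distribution of the sample mean.
B = 2000
boot_means = values[rng.integers(0, n, size=(B, n))].mean(axis=1)
se_boot = boot_means.std(ddof=1)

# Interval 1: two-standard-error rule around the point estimate.
two_se_ci = (mu_hat - 2 * se_boot, mu_hat + 2 * se_boot)

# Interval 2: percentile interval taken from the bootstrap distribution itself.
percentile_ci = tuple(np.percentile(boot_means, [2.5, 97.5]))

print(f"two-SE interval:     [{two_se_ci[0]:.2f}, {two_se_ci[1]:.2f}]")
print(f"percentile interval: [{percentile_ci[0]:.2f}, {percentile_ci[1]:.2f}]")
```

For a statistic like the mean, whose bootstrap distribution is close to symmetric, the two intervals should nearly coincide. They can differ more for skewed statistics.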

## Part(e)
Based on this data set, provide an estimate, $\hat{\mu}_{med}$, for the median value of `medv` in the population.

In [14]:
# Calculate the median of medv
mu_med_hat = boston['medv'].median()

# Display the result
display(Math(f"\\hat{{\\mu}}_{{\\text{{med}}}} = {mu_med_hat:.2f}"))


## Part(f)
We now would like to estimate the standard error of $\hat{\mu}_{med}$. Unfortunately, there is no simple formula for computing the standard error of the median. Instead, estimate the standard error of the median using the bootstrap. Comment on your findings.

In [16]:
# Number of bootstrap samples
n_bootstrap = 1000

# Generate bootstrap sample medians
bootstrap_medians = [boston['medv'].sample(frac=1, replace=True).median() for _ in range(n_bootstrap)]

# Calculate the standard error of the median using bootstrap
se_mu_med_hat_bootstrap = np.std(bootstrap_medians)

# Display the bootstrap standard error in LaTeX format (display and Math were imported in Part (a))
display(Math(f"\\text{{SE}}_{{\\text{{Bootstrap}}}}(\\hat{{\\mu}}_{{\\text{{med}}}}) = {se_mu_med_hat_bootstrap:.2f}"))


### Answer: 
- **Bootstrap Standard Error**: $\text{SE}_{\text{Bootstrap}}(\hat{\mu}_{\text{med}}) = 0.38$
  
- **Interpretation**:
  - The standard error is small relative to the values of `medv` (whose mean is about 22.5), so the sample median is a fairly precise estimate of the population median.
  - It is similar in size to the standard error of the mean (0.41 from Part (b)), so the median is estimated about as precisely as the mean here.
  
- **Usefulness of Bootstrap**:
  - The bootstrap provides a standard error for the median without requiring an analytical formula, which is why it is used here.
  
In short, the bootstrap suggests that the sample median is a dependable estimate of central tendency for this dataset.
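
Since the same resampling loop recurs for each statistic in this exercise, it can be wrapped in a small function. The helper below, `bootstrap_se`, is a hypothetical name introduced for this sketch (it is not part of `ISLP` or NumPy), and the synthetic `values` array stands in for `medv`.

```python
import numpy as np

def bootstrap_se(values, statistic, n_boot=1000, seed=0):
    """Bootstrap standard error of statistic(values).

    Hypothetical helper for illustration: resamples `values` with
    replacement `n_boot` times and returns the standard deviation of
    the resampled statistics.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = values.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(values, size=n, replace=True)
        estimates[b] = statistic(resample)
    return estimates.std(ddof=1)

# Synthetic stand-in data (an assumption, not the Boston data).
data_rng = np.random.default_rng(42)
values = data_rng.normal(loc=22.0, scale=9.0, size=506)

se_median = bootstrap_se(values, np.median)
se_mean = bootstrap_se(values, np.mean)
print(f"bootstrap SE of median: {se_median:.3f}")
print(f"bootstrap SE of mean:   {se_mean:.3f}")
```

With the notebook's data, the same call would be `bootstrap_se(boston['medv'], np.median)`. Any function that maps an array to a single number can be passed as `statistic`.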

## Part(g)
Based on this data set, provide an estimate for the tenth percentile of `medv` in Boston census tracts. Call this quantity $\hat{\mu}_{0.1}$.  
(You can use the `np.percentile()` function.)

In [17]:
# Calculate the 10th percentile of medv
mu_0_1_hat = np.percentile(boston['medv'], 10)

display(Math(f"\\hat{{\\mu}}_{{0.1}} = {mu_0_1_hat:.2f}"))


## Part(h)
Use the bootstrap to estimate the standard error of $\hat{\mu}_{0.1}$. Comment on your findings.

In [19]:
# Number of bootstrap samples
n_bootstrap = 1000

# Generate bootstrap sample percentiles
bootstrap_percentiles = [np.percentile(boston['medv'].sample(frac=1, replace=True), 10) for _ in range(n_bootstrap)]

# Calculate the standard error of the 10th percentile using bootstrap
se_mu_0_1_hat_bootstrap = np.std(bootstrap_percentiles)

# Display the bootstrap standard error in LaTeX format (display and Math were imported in Part (a))
display(Math(f"\\text{{SE}}_{{\\text{{Bootstrap}}}}(\\hat{{\\mu}}_{{0.1}}) = {se_mu_0_1_hat_bootstrap:.2f}"))


### Answer:
- **Bootstrap Standard Error**: $\text{SE}_{\text{Bootstrap}}(\hat{\mu}_{0.1}) = 0.51$

- **Interpretation**:
  - This is larger than the bootstrap standard error of the median (0.38) and of the mean (0.42), so the 10th percentile is estimated less precisely than the central measures.
  - A plausible reason is that fewer observations lie near the lower tail, so the estimate depends more heavily on which of them appear in each resample.

- **Reliability**:
  - As with the median, the bootstrap supplies a standard error where no simple formula is available.

In summary, the 10th percentile is a usable but noticeably more variable estimate than the mean or median.
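
One way to check whether tail percentiles are noisier than central ones is to compute bootstrap standard errors at several percentile levels from the same set of resamples. The sketch below does this on synthetic normal data; `values`, `levels`, and the seed are assumptions for illustration, and the pattern it shows on synthetic data need not match `medv` exactly.

```python
import numpy as np

# Hypothetical stand-in for boston['medv'].
rng = np.random.default_rng(7)
values = rng.normal(loc=22.0, scale=9.0, size=506)
n = values.size

B = 1000
levels = [10, 25, 50, 75, 90]

# One resample per row, drawn with replacement.
resamples = values[rng.integers(0, n, size=(B, n))]

# With a list of levels and axis=1, np.percentile returns an array of
# shape (len(levels), B): one row of bootstrap estimates per level.
boot_q = np.percentile(resamples, levels, axis=1)

# Bootstrap SE at each level: standard deviation across the B resamples.
se_by_level = boot_q.std(axis=1, ddof=1)

for q, se in zip(levels, se_by_level):
    print(f"{q:>2}th percentile: bootstrap SE = {se:.3f}")
```

Reusing the same resamples for every level keeps the comparison fair, since any difference between the standard errors then comes from the statistic and not from a different random draw.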