WEEK-17,ASS NO-03

Q1. What is an ensemble technique in machine learning?

An **ensemble technique** in machine learning combines multiple models (called "base learners", or "weak learners" when each is only modestly accurate) to improve overall performance, prediction accuracy, and generalization. The idea is that by aggregating the predictions of multiple models, the ensemble can reduce errors and often produces better results than any single model alone.

### Key Concepts in Ensemble Techniques

1. **Diversity of Models**: Ensemble methods rely on the idea that different models may capture different patterns in the data. By combining their predictions, the ensemble reduces individual model weaknesses.

2. **Weak Learners**: Many ensemble methods involve "weak learners," which are models that perform slightly better than random guessing. When aggregated, they become "strong learners."

3. **Combining Predictions**: The final prediction of an ensemble is typically obtained by averaging the individual predictions (for regression tasks) or by taking a majority vote (for classification tasks).
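
To make the "weak learners become strong learners" idea concrete, here is a small calculation. It rests on a strong simplifying assumption that is not in the text above: every classifier is correct with the same probability `p`, and their errors are independent. Real models are correlated, so actual gains are usually smaller. The function name `majority_vote_accuracy` is illustrative.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models classifiers is correct.

    ASSUMPTION: each classifier is correct with probability p, independently
    of the others. Use an odd n_models so that ties cannot occur.
    """
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# Accuracy of the majority vote as the ensemble grows (p = 0.6 per model)
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With three such models the majority is right with probability 0.648, already above the 0.6 of a single model, and the value keeps rising as more independent models are added.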

### Types of Ensemble Techniques

1. **Bagging (Bootstrap Aggregating)**:
   - **Description**: Bagging reduces variance by training multiple models on different random subsets of the training data (with replacement) and then averaging their predictions (for regression) or using a majority vote (for classification).
   - **Example**: The **Random Forest** algorithm is an example of bagging applied to decision trees. It trains multiple decision trees on random subsets of data and features, and combines their predictions.
   
   **Advantage**: Reduces overfitting and variance in the model.

2. **Boosting**:
   - **Description**: Boosting focuses on reducing bias. It works by sequentially training models, with each model trying to correct the errors of the previous one. The models are weighted based on their performance, and predictions are combined in a weighted manner.
   - **Example**: **AdaBoost**, **Gradient Boosting**, and **XGBoost** are popular boosting algorithms.
   
   **Advantage**: Improves accuracy by focusing on misclassified instances in the training set.

3. **Stacking (Stacked Generalization)**:
   - **Description**: Stacking combines predictions from multiple different models (often heterogeneous models like decision trees, SVMs, and neural networks). These predictions are then used as inputs to a "meta-model" that makes the final prediction.
   - **Example**: Train several base models like logistic regression, decision trees, and SVMs, and then use a logistic regression model to combine their predictions.
   
   **Advantage**: Can leverage the strengths of multiple different types of models to improve overall performance.

4. **Voting**:
   - **Description**: Voting is used in classification tasks. It involves training multiple models and predicting the final class label based on a majority vote (hard voting) or averaging probabilities (soft voting).
   - **Example**: Combining a decision tree, a logistic regression, and a k-nearest neighbors (KNN) classifier to make a final prediction.
   
   **Advantage**: Simple to implement and can improve the accuracy of base models.
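
The difference between hard and soft voting can be sketched in a few lines. The probabilities below are made-up outputs from three hypothetical classifiers for a single sample; the helper names `hard_vote` and `soft_vote` are illustrative and not from any library.

```python
from collections import Counter


def hard_vote(class_probs):
    """Each model votes for its most probable class; the most common vote wins."""
    votes = [max(probs, key=probs.get) for probs in class_probs]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(class_probs):
    """Average per-class probabilities across models; the highest average wins."""
    classes = class_probs[0].keys()
    averages = {
        c: sum(probs[c] for probs in class_probs) / len(class_probs)
        for c in classes
    }
    return max(averages, key=averages.get)


# Hypothetical predicted probabilities from three classifiers for one sample
model_outputs = [
    {"cat": 0.55, "dog": 0.45},
    {"cat": 0.55, "dog": 0.45},
    {"cat": 0.10, "dog": 0.90},
]

print("hard voting:", hard_vote(model_outputs))
print("soft voting:", soft_vote(model_outputs))
```

Here two lukewarm votes for "cat" win under hard voting, while soft voting picks "dog" because the third model is far more confident. Soft voting only makes sense when the base models output reasonably calibrated probabilities.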

### Why Use Ensemble Techniques?

1. **Improved Accuracy**: By combining multiple models, ensemble techniques can capture more diverse patterns in the data and improve prediction accuracy.
2. **Robustness**: Ensembles are less likely to overfit compared to individual models because they smooth out the predictions.
3. **Reduction of Variance and Bias**: Bagging helps reduce variance, while boosting helps reduce bias, making ensemble techniques versatile for different types of problems.

### Example

Consider the **Random Forest** algorithm, which is an ensemble method based on decision trees:
- It trains multiple decision trees on different random subsets of the data.
- For classification, it predicts the class that has the majority vote from all the decision trees.
- For regression, it averages the predictions of all the trees.

By doing this, the Random Forest reduces the risk of overfitting that may occur with a single decision tree.

 

Q2. Why are ensemble techniques used in machine learning?

Ensemble techniques are used in machine learning because they improve the performance, accuracy, and robustness of models by combining the strengths of multiple individual models. Here's why ensemble methods are commonly used:

### 1. **Improved Accuracy**
   - **Combining multiple models** allows an ensemble to outperform individual models by aggregating their predictions. Each model may capture different patterns or relationships in the data, and when combined, the ensemble can achieve more accurate results.
   - By blending the outputs of multiple weak learners (models that perform slightly better than random), ensemble methods create a stronger combined model that often surpasses the performance of any single model.

### 2. **Reduction of Overfitting**
   - Overfitting occurs when a model performs well on training data but poorly on unseen data (test set). Ensemble techniques, such as **bagging** (e.g., Random Forest), mitigate overfitting by reducing the variance of the model.
   - Since each model in the ensemble is trained on different subsets of the data, the risk of individual models memorizing noise or irrelevant patterns is reduced.

### 3. **Reduction of Variance and Bias**
   - **Variance**: Models like decision trees can have high variance, meaning they are sensitive to small changes in the training data. By averaging predictions (as in bagging), ensemble methods stabilize predictions and reduce variance.
   - **Bias**: Models like linear regression may have high bias and may be too simplistic to capture complex relationships. Boosting techniques like **AdaBoost** or **XGBoost** work iteratively to correct errors, which reduces bias and improves accuracy.
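
One standard way to see why averaging reduces variance is the following identity. It assumes (a simplification not stated above) that the \(B\) base models' predictions \(f_b(x)\) each have variance \(\sigma^2\) and share a pairwise correlation \(\rho\):

\[
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} f_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
\]

As \(B\) grows, the second term shrinks toward zero, so the remaining variance is governed by \(\rho\). This is why bagging gains the most from base models that are as uncorrelated as possible.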

### 4. **Increased Robustness**
   - An ensemble is more **robust** than individual models because it aggregates the knowledge of several different models, reducing the likelihood that one model's error will dominate the overall prediction. Even if some individual models perform poorly on certain data points, their impact is minimized by the rest of the models.
   - This leads to better generalization on unseen data, making the ensemble more reliable.

### 5. **Versatility with Different Models**
   - Ensembles are flexible enough to combine **different types of models** (e.g., decision trees, support vector machines, and logistic regression) through methods like stacking. This allows leveraging the strengths of each model type and compensating for their individual weaknesses.
   - **Stacking**, for example, allows different algorithms to be combined and fine-tuned for complex tasks, leading to superior performance compared to using any single algorithm.

### 6. **Handling Complex Data**
   - Some datasets have **complex patterns** that are hard to capture with a single model. Ensembles allow you to explore and exploit multiple hypotheses about the data, helping to capture non-linear relationships, interactions between features, and other complexities that a single model might miss.

### 7. **Stability of Predictions**
   - By aggregating multiple models, ensemble techniques help to smooth out the variations and **stabilize predictions**. This is particularly useful when individual models are sensitive to small fluctuations in the data (as is the case with high-variance models like decision trees).
   - This stability can make ensemble methods more resistant to outliers or noisy data, leading to better performance in real-world applications.

### 8. **Better Performance in Competitions**
   - In data science competitions, such as Kaggle, ensemble methods are widely used because they often yield the **best performance**. Participants frequently use ensembles to combine multiple models and achieve the highest possible prediction accuracy.

### Examples of Popular Ensemble Techniques

- **Random Forest**: An ensemble of decision trees trained with bagging (bootstrap aggregation), often used to reduce variance and prevent overfitting.
- **AdaBoost** and **Gradient Boosting**: Boosting algorithms that build models sequentially, with each new model focusing on correcting the errors of the previous models, which reduces bias.
- **XGBoost**: A highly efficient implementation of gradient boosting, popular for its speed and performance in machine learning competitions.

 

Q3. What is bagging?

**Bagging** (short for **Bootstrap Aggregating**) is an ensemble learning technique in machine learning that improves the accuracy and robustness of models by combining the predictions of multiple models trained on different subsets of the data.

Here's how bagging works:

1. **Bootstrap Sampling**: Multiple subsets of the original dataset are created by randomly sampling with replacement. This means some data points may be repeated in a subset, while others may be left out (for large datasets, roughly 63% of the distinct points appear in any given subset). Each subset is typically the same size as the original dataset.

2. **Model Training**: A separate model (often a decision tree) is trained on each subset. Because each model sees different data, they are likely to make slightly different predictions.

3. **Aggregation**:
   - For **classification tasks**, the final prediction is determined by a majority vote of the individual models.
   - For **regression tasks**, the final prediction is the average of the predictions made by the individual models.

**Advantages of Bagging**:
- **Reduces variance**: Since different models are trained on different subsets, the overall prediction becomes less sensitive to the specific training data.
- **Improves accuracy**: By averaging multiple models' outputs, the ensemble typically outperforms a single model, especially if the base models are prone to overfitting.

A well-known algorithm that uses bagging is **Random Forest**, which applies bagging to decision trees.
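
The three steps above can be sketched with the standard library alone. This is a minimal illustration, not a production implementation: the base learner is a 1-nearest-neighbour regressor (chosen here only because it is short and high-variance), and the training pairs are hypothetical.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Step 1: draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def nearest_neighbor_predict(train, x):
    """Base learner: return the target of the closest training point (1-NN)."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y


def bagged_predict(train, x, n_models=25, seed=0):
    """Steps 2 and 3: fit one base learner per bootstrap sample, then average."""
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_models):
        sample = bootstrap_sample(train, rng)
        predictions.append(nearest_neighbor_predict(sample, x))
    return statistics.mean(predictions)


# Hypothetical noisy (x, y) training pairs
train = [(1, 1.2), (2, 1.9), (3, 3.4), (4, 3.8), (5, 5.1), (6, 6.3)]
print(bagged_predict(train, 3.5))
```

For classification, the final `statistics.mean` would be replaced by a majority vote over the predicted labels.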

Q4. What is boosting?

**Boosting** is another ensemble learning technique that aims to improve the performance of weak learners (models that perform slightly better than random guessing) by combining them into a strong learner. Unlike bagging, where models are trained independently, boosting trains models sequentially, and each model focuses on correcting the errors of its predecessor.

Here’s how boosting works:

1. **Initialize Weights**: In weight-based boosting such as AdaBoost, all data points start with equal weights. These weights reflect the importance of each data point when training the models.

2. **Sequential Learning**:
   - A weak learner (like a decision tree with limited depth) is trained on the dataset.
   - After the first model is trained, boosting identifies which data points were misclassified or had higher errors. It increases the weights of these misclassified points so that the next model focuses more on correcting those errors.
   - The process repeats, with each new model concentrating more on the previously misclassified instances.

3. **Aggregation**: The predictions of all weak learners are combined, usually as a weighted sum of their outputs, with more accurate learners receiving larger weights. In classification tasks, the final decision is a weighted majority vote; in regression tasks, it is a weighted average (or, in gradient boosting, a sum) of the outputs.
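
The steps above can be sketched as a small AdaBoost-style loop. Assumptions made for this illustration: labels are coded as -1/+1, features are one-dimensional, the weak learner is a decision stump (a single threshold), and the toy data are invented. The learner weight `alpha = 0.5 * ln((1 - err) / err)` is the usual discrete AdaBoost choice.

```python
import math


def fit_stump(xs, ys, weights):
    """Return (threshold, polarity, weighted_error) of the best 1-D stump.

    The stump predicts `polarity` for x <= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[2]:
                best = (threshold, polarity, err)
    return best


def stump_predict(stump, x):
    threshold, polarity, _ = stump
    return polarity if x <= threshold else -polarity


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # step 1: equal weights
    ensemble = []
    for _ in range(rounds):  # step 2: sequential learning
        stump = fit_stump(xs, ys, weights)
        err = min(max(stump[2], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this learner
        ensemble.append((alpha, stump))
        # Increase weights of misclassified points, decrease the rest
        weights = [
            w * math.exp(-alpha * y * stump_predict(stump, x))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Step 3: weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(xs, ys, rounds)
    errors = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"rounds={rounds}: {errors} training errors")
```

On this toy set a single stump misclassifies three points, while the weighted vote of three stumps classifies every training point correctly.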

**Popular Boosting Algorithms**:
- **AdaBoost (Adaptive Boosting)**: One of the earliest and simplest boosting algorithms, where each new model focuses more on the misclassified examples from previous models.
- **Gradient Boosting**: Instead of adjusting weights directly, gradient boosting builds models to minimize the errors (loss function) of the previous models in a gradient descent fashion.
- **XGBoost, LightGBM, CatBoost**: Advanced, optimized implementations of gradient boosting that are widely used in machine learning competitions and real-world applications due to their speed and performance.
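
To contrast with the weight-based view, here is a minimal sketch of gradient boosting for squared-error loss, where the negative gradient is simply the residual. The regression stump, the `learning_rate` value, and the toy data are assumptions made for this example; libraries such as XGBoost add regularization and many optimizations on top of this idea.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump: one threshold, one constant per side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum(
            (t - right_mean) ** 2 for t in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def gradient_boost(xs, ys, rounds, learning_rate=0.5):
    """Return training-set predictions after `rounds` boosting steps."""
    baseline = sum(ys) / len(ys)  # start from the mean prediction
    preds = [baseline] * len(xs)
    for _ in range(rounds):
        # For squared loss, the negative gradient is the residual y - prediction
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        preds = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, preds)
        ]
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical non-linear target
xs = list(range(10))
ys = [x**2 for x in xs]

for rounds in (0, 1, 10, 50):
    print(rounds, round(mse(ys, gradient_boost(xs, ys, rounds)), 3))
```

The training error falls as rounds are added, which also illustrates why too many rounds can overfit: the ensemble keeps fitting whatever residual remains, including noise.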

**Advantages of Boosting**:
- **Reduces bias**: Boosting helps reduce bias by focusing on hard-to-classify examples and correcting errors iteratively.
- **High accuracy**: Boosting models often outperform other algorithms in terms of predictive accuracy, especially when fine-tuned.

However, boosting can be sensitive to noise and outliers since it gives more weight to hard-to-predict examples, which might include noisy data.

Q5. What are the benefits of using ensemble techniques?

Ensemble techniques combine the predictions of multiple models to create a more powerful model than any of the individual models alone. Here are the key benefits of using ensemble techniques:

### 1. **Improved Accuracy**
   - By combining the strengths of several models, ensemble methods often achieve higher predictive accuracy than individual models. Methods like **bagging** and **boosting** aggregate the results of many weaker models to create a strong overall prediction.

### 2. **Reduced Overfitting**
   - Ensembles like **bagging** (e.g., Random Forest) reduce the risk of overfitting by training multiple models on different subsets of the data and averaging their predictions. This reduces the likelihood that the ensemble will overfit to any particular training set's noise or idiosyncrasies.

### 3. **Increased Robustness**
   - Ensemble models are generally more robust to noisy data and outliers. By combining multiple models, the impact of any single weak model making incorrect predictions is diminished.

### 4. **Reduction of Variance**
   - Techniques like bagging help to lower the variance of the model, which is especially useful for high-variance models like decision trees. This means the ensemble method makes the model more stable and consistent when applied to new data.

### 5. **Reduction of Bias**
   - **Boosting** algorithms reduce bias by iteratively improving weak models. The sequential nature of boosting allows the ensemble to reduce bias that might be present in individual models, resulting in more accurate predictions.

### 6. **Better Generalization**
   - Because errors made by individual models tend to partially cancel when their predictions are aggregated, ensembles usually generalize better to new, unseen data than a single model, though this is not guaranteed.

### 7. **Handles Complex Problems**
   - For complex datasets and problems with non-linear relationships or high dimensionality, individual models may struggle to capture the full complexity. Ensembles combine diverse models and can handle such cases better.

### 8. **Flexibility in Model Choices**
   - You can combine different types of models (e.g., decision trees, support vector machines, neural networks) in an ensemble to leverage the unique advantages of each. **Voting** and **stacking** both support this; stacking goes further by training a meta-model on the base models' predictions, which helps improve overall performance by weighing models with different perspectives on the data.

### 9. **Mitigates Weaknesses of Single Models**
   - Even strong models like neural networks or decision trees have weaknesses (e.g., sensitivity to noise or high variance). Ensembles can mitigate these weaknesses by averaging or boosting several models, improving robustness and performance.

### 10. **Easy to Parallelize**
   - Techniques like bagging can be easily parallelized since the models are trained independently of each other. This can lead to faster training times when computational resources allow for parallel execution.

  

Q6. Are ensemble techniques always better than individual models?

Ensemble techniques are powerful, but they are not always guaranteed to outperform individual models. Whether an ensemble is better depends on several factors. Here’s why ensemble techniques are often better, but also why they might not always be the best choice:

### **When Ensemble Techniques Are Better:**

1. **Reducing Overfitting and Variance**:
   - **Bagging** methods like Random Forest reduce the variance of high-variance models (e.g., decision trees) by combining multiple models trained on different data subsets. This leads to more stable predictions and better generalization to unseen data.
   - **Boosting** improves model performance by focusing on the mistakes made by previous models, reducing bias over time.

2. **Improving Accuracy**:
   - Ensembles combine the strengths of multiple models, leading to higher accuracy in complex datasets where individual models struggle to capture all patterns.
   - **Stacking** uses different types of models and combines their predictions to improve the overall performance, making it better suited to handle varied and complex patterns in the data.

3. **Handling Complex Data**:
   - When dealing with data that has non-linear relationships or interactions that are hard to capture with a single model, ensembles can better capture these complexities by using a diverse set of learners.

4. **Reducing Sensitivity to Noisy Data**:
   - By averaging out the predictions of multiple models, ensembles are often more robust to noisy data points or outliers, as the impact of a single noisy instance is minimized.

---

### **When Ensemble Techniques May Not Be Better:**

1. **Simple Problems**:
   - If the dataset is simple and the underlying patterns are easy to model, a single model may perform just as well as an ensemble. For example, in cases where there’s a clear linear relationship, a single linear regression model may suffice, and an ensemble would add unnecessary complexity.

2. **Increased Complexity**:
   - Ensembles introduce additional complexity, both in terms of model design and computation. If the gain in performance is minimal, the extra complexity might not justify using an ensemble. This can also lead to longer training and inference times.

3. **Risk of Overfitting in Boosting**:
   - Boosting techniques like **AdaBoost** or **Gradient Boosting** are more prone to overfitting, especially if the base models are too complex or if the boosting process continues for too many iterations. In such cases, a single, regularized model might perform better.

4. **Computational Costs**:
   - Ensembles, especially large ones, can be computationally expensive and require more memory. If resources are limited, a well-tuned single model (like a Support Vector Machine or a neural network) could offer a better trade-off between performance and efficiency.

5. **Interpretability**:
   - Ensembles like Random Forests or Gradient Boosting models are less interpretable than simpler models (e.g., linear models or a single shallow decision tree). In cases where interpretability is critical (e.g., in healthcare or finance), a simpler, interpretable model may be preferred.

6. **Diminishing Returns**:
   - Beyond a certain point, adding more models to an ensemble might not yield significant performance improvements. This could lead to diminishing returns, where the ensemble's added complexity doesn’t translate to meaningful gains in accuracy.

---

 

Q7. How is the confidence interval calculated using bootstrap?

The **bootstrap** method is a statistical technique for estimating a confidence interval for a population parameter (e.g., mean, median, variance) by repeatedly resampling the observed data. Unlike traditional parametric methods, it doesn’t assume a specific distribution, which makes it useful for skewed samples; with very small samples, however, the resulting intervals can be unreliable because the sample may not represent the population well.

Here’s how to calculate a confidence interval using bootstrap:

### Steps for Calculating a Confidence Interval Using Bootstrap:

1. **Obtain the Original Sample**:
   - Start with your original sample data, say, of size \(n\). For example, suppose you want a confidence interval for the population mean based on this sample.

2. **Generate Bootstrap Samples**:
   - Create **B** bootstrap samples by randomly sampling (with replacement) from the original dataset. Each bootstrap sample should have the same size \(n\) as the original dataset. Because the sampling is done with replacement, some points in the dataset may appear multiple times in a bootstrap sample, while others may not appear at all.

   Example: If you have a sample of 100 data points, you create several new datasets (bootstrap samples), each of 100 data points, by sampling with replacement from the original sample.

3. **Calculate the Statistic for Each Bootstrap Sample**:
   - For each of the \(B\) bootstrap samples, calculate the statistic of interest (e.g., mean, median, etc.). Let’s assume you are calculating the mean of each bootstrap sample.
   
   So for each bootstrap sample \(i\), you get a statistic \( \hat{\theta}_i \) (e.g., \( \hat{\mu}_i \)).

4. **Build the Distribution of the Statistic**:
   - After resampling \(B\) times, you will have a distribution of \(B\) bootstrap estimates (e.g., \( \hat{\mu}_1, \hat{\mu}_2, ..., \hat{\mu}_B \)).

5. **Calculate Confidence Intervals**:
   - There are two common methods to calculate confidence intervals using bootstrap: the **percentile method** and the **bias-corrected and accelerated (BCa) method**.

   #### a) **Percentile Method**:
   - Sort the \(B\) bootstrap statistics in ascending order.
   - For a **\(100(1 - \alpha)\%\) confidence interval** (e.g., 95%, where \(\alpha = 0.05\)), find the lower and upper bounds by selecting the \(\alpha/2\) and \(1 - \alpha/2\) quantiles of the sorted bootstrap statistics.
     - For a 95% confidence interval, you would select the 2.5th percentile and the 97.5th percentile of the bootstrap distribution.
   
   Example:
   - If you generated 1,000 bootstrap samples and calculated their means, sort these 1,000 means. For a 95% confidence interval, the 25th value (approximately the 2.5th percentile) gives the lower bound, and the 975th value (approximately the 97.5th percentile) gives the upper bound.

   #### b) **Bias-Corrected and Accelerated (BCa) Method**:
   - The BCa method adjusts the confidence interval for bias and skewness in the bootstrap distribution. It's more accurate, especially when the bootstrap distribution is not symmetric. It involves calculating bias and acceleration factors, but it is more computationally intensive and often used in software packages.

### **Formula for Percentile Method**:

For a \( 1 - \alpha \) confidence level (e.g., 95% confidence interval), let:
- \( \hat{\theta}_{(\alpha/2)} \) be the lower bound (e.g., 2.5th percentile).
- \( \hat{\theta}_{(1-\alpha/2)} \) be the upper bound (e.g., 97.5th percentile).

The confidence interval for the parameter \( \theta \), estimated by the statistic \( \hat{\theta} \), is:
\[
[\hat{\theta}_{(\alpha/2)}, \hat{\theta}_{(1-\alpha/2)}]
\]
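
The percentile rule can be written as a short helper. This sketch uses the order-statistic convention from the example above (the 25th and 975th of 1,000 sorted values); other quantile conventions, such as the interpolation used by `numpy.percentile`, give slightly different bounds. The function name and the stand-in input are illustrative.

```python
def percentile_interval(bootstrap_stats, alpha=0.05):
    """Return (lower, upper) bounds from bootstrap statistics.

    Uses ranks alpha/2 * B and (1 - alpha/2) * B in the sorted list,
    where B is the number of bootstrap statistics.
    """
    ordered = sorted(bootstrap_stats)
    b = len(ordered)
    lower_rank = max(1, round(alpha / 2 * b))
    upper_rank = min(b, round((1 - alpha / 2) * b))
    # Ranks are 1-based, list indices are 0-based
    return ordered[lower_rank - 1], ordered[upper_rank - 1]


# Stand-in for 1,000 bootstrap statistics: the integers 1..1000,
# so the returned values equal the ranks that were selected
stand_in_stats = list(range(1, 1001))
print(percentile_interval(stand_in_stats))
```

With the stand-in input the function returns `(25, 975)`, matching the ranks described in the percentile method.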

### **Example of Bootstrap Confidence Interval Calculation**:

Let’s say you have a sample of 100 values and want to estimate the 95% confidence interval for the mean:

1. Take the sample of 100 values.
2. Generate, say, 1,000 bootstrap samples by sampling with replacement from the original sample.
3. Calculate the mean for each of the 1,000 bootstrap samples.
4. Sort the 1,000 means and select the 25th value (2.5th percentile) and the 975th value (97.5th percentile).
5. These two values form the lower and upper bounds of the 95% confidence interval.

### **Advantages of Using Bootstrap for Confidence Intervals**:
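- **Non-parametric**: No specific distributional form (such as normality) is assumed; the sample only needs to be representative of the population.
- **Applicable when traditional methods struggle**: Useful for skewed samples or statistics without a simple standard-error formula, though very small samples still limit accuracy.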
- **Flexibility**: Can estimate confidence intervals for many statistics (e.g., mean, median, variance) and complex models.



Q8. How does bootstrap work, and what are the steps involved in bootstrap?

**Bootstrap** is a statistical technique that involves resampling the data to make inferences about a population. It's particularly useful for estimating the distribution of a statistic (e.g., mean, median, variance) without relying on strong parametric assumptions. Bootstrap is often used to calculate confidence intervals, standard errors, and to assess the variability of a statistic when the underlying distribution is unknown.

### How Bootstrap Works:

Bootstrap works by treating the sample data as if it were the population. It creates many new datasets (called **bootstrap samples**) by sampling with replacement from the original dataset. Then, the statistic of interest (e.g., mean, median) is computed for each bootstrap sample. By repeating this process many times, you can approximate the sampling distribution of the statistic.

### Steps Involved in Bootstrap:

1. **Original Sample**:
   - Start with your original dataset of size \(n\). This is the only data you have, and you treat it as a proxy for the population.
   
   Example: Suppose your sample is \( \{x_1, x_2, \dots, x_n\} \) with \(n = 100\) data points.

2. **Resampling with Replacement**:
   - Randomly draw \(n\) observations from the original dataset **with replacement** to create a new dataset (called a **bootstrap sample**). Since sampling is done with replacement, some data points may appear more than once, while others may not appear at all.
   
   Example: From the original dataset, create a new dataset like \( \{x'_1, x'_2, \dots, x'_n\} \), where \( x'_i \) comes from the original data, but some values may repeat.

3. **Calculate the Statistic of Interest**:
   - Compute the statistic (e.g., mean, median, variance) for this bootstrap sample. Let’s denote this statistic by \( \hat{\theta}_1 \).
   
   Example: If you're interested in the mean, calculate the mean for the bootstrap sample.

4. **Repeat the Process**:
   - Repeat steps 2 and 3 many times (typically \(B\) times, where \(B\) can range from 1,000 to 10,000). Each time, you create a new bootstrap sample and calculate the statistic of interest. After \(B\) iterations, you will have \(B\) different estimates of the statistic.
   
   Example: After 1,000 bootstrap iterations, you will have 1,000 mean estimates: \( \hat{\theta}_1, \hat{\theta}_2, \dots, \hat{\theta}_B \).

5. **Analyze the Bootstrap Distribution**:
   - The distribution of these \(B\) statistics is treated as an approximation of the sampling distribution of the statistic. This allows you to estimate important properties like the **standard error**, **bias**, and **confidence intervals** of the statistic.
   
   Example: You now have a distribution of 1,000 sample means, and you can use this distribution to estimate the confidence interval for the true population mean.

6. **Calculate Confidence Intervals** (Optional):
   - Sort the \(B\) bootstrap estimates and compute confidence intervals. For a \(1 - \alpha\) confidence level (e.g., 95%), take the \( \alpha/2 \) and \( 1 - \alpha/2 \) quantiles of the bootstrap estimates as the lower and upper bounds of the interval.
   
   Example: For a 95% confidence interval, select the 2.5th and 97.5th percentiles of the 1,000 sample means.
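
Steps 1 through 5 can be sketched generically for any statistic. The example below estimates the standard error and bias of the median; the data values are hypothetical, and `bootstrap_replicates` is a helper name chosen for this illustration.

```python
import random
import statistics


def bootstrap_replicates(data, statistic, n_resamples=2000, seed=0):
    """Steps 2 to 4: resample with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


# Step 1: hypothetical original sample
data = [12, 15, 9, 22, 17, 14, 30, 11, 16, 13]

replicates = bootstrap_replicates(data, statistics.median)

# Step 5: summarize the bootstrap distribution
standard_error = statistics.stdev(replicates)
bias = statistics.mean(replicates) - statistics.median(data)

print(f"median of sample:        {statistics.median(data)}")
print(f"bootstrap standard error: {standard_error:.3f}")
print(f"bootstrap bias estimate:  {bias:.3f}")
```

Passing a different function, for example `statistics.mean` or `statistics.pvariance`, reuses the same procedure for another statistic.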

### Example of Bootstrap in Practice:

Let’s say you have a small sample of data and you want to estimate a 95% confidence interval for the population mean:

1. **Original Sample**: \( \{5, 10, 15, 20, 25\} \) (5 data points).
   
2. **Resampling**:
   - Create a bootstrap sample by resampling with replacement. For example, one bootstrap sample might be \( \{10, 15, 10, 25, 5\} \).
   
3. **Compute the Mean**:
   - For this bootstrap sample, the mean is \( \hat{\mu} = (10 + 15 + 10 + 25 + 5)/5 = 13 \).

4. **Repeat**:
   - Repeat the resampling process, say, 1,000 times, generating 1,000 different means.
   
5. **Analyze**:
   - Sort the 1,000 bootstrap means and extract the 2.5th and 97.5th percentiles for a 95% confidence interval.
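
The worked example can be reproduced directly. The seed is an arbitrary choice made so that reruns give the same interval; with only five data points the interval is coarse and serves as an illustration rather than a reliable estimate.

```python
import random
import statistics

sample = [5, 10, 15, 20, 25]

# Steps 2 and 3 for the single bootstrap sample shown above
example_resample = [10, 15, 10, 25, 5]
print("mean of example resample:", statistics.mean(example_resample))

# Step 4: repeat 1,000 times
rng = random.Random(0)  # arbitrary seed for reproducibility
boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample))) for _ in range(1000)
)

# Step 5: 25th and 975th sorted values approximate the 2.5th/97.5th percentiles
lower, upper = boot_means[24], boot_means[974]
print(f"95% bootstrap CI for the mean: [{lower}, {upper}]")
```

Because every resample is drawn from the same five values, the bounds can never fall outside the range 5 to 25.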

### Key Features of Bootstrap:
- **Non-parametric**: Bootstrap doesn’t assume a specific form of the data distribution, making it versatile and widely applicable.
- **Flexible**: Handles skewed data well, although results from very small samples should be interpreted with caution.
- **Applicable to Various Statistics**: Can be used to estimate confidence intervals for many different statistics (mean, median, variance, etc.).
  

Q9. A researcher wants to estimate the mean height of a population of trees. They measure the height of a
sample of 50 trees and obtain a mean height of 15 meters and a standard deviation of 2 meters. Use
bootstrap to estimate the 95% confidence interval for the population mean height.

To estimate the 95% confidence interval for the population mean height using the bootstrap method, here’s how we can proceed step by step:

### Given Information:
- Sample size (\( n \)): 50 trees
- Sample mean (\( \bar{x} \)): 15 meters
- Sample standard deviation (\( s \)): 2 meters

### Steps for Bootstrap Estimation of the 95% Confidence Interval:

1. **Original Sample**: 
   - The researcher has a sample of 50 trees, but the exact heights are not given. Since the bootstrap needs the raw observations, we’ll assume the heights are approximately normally distributed and simulate a stand-in sample with a mean of 15 meters and a standard deviation of 2 meters.

2. **Generate Bootstrap Samples**:
   - We’ll generate multiple bootstrap samples by sampling 50 heights (with replacement) from the stand-in original sample, which is simulated from \( N(15, 2^2) \), i.e., a normal distribution with mean 15 and standard deviation 2.
   - Each bootstrap sample will also contain 50 data points.

3. **Calculate the Statistic of Interest**:
   - For each bootstrap sample, we will calculate the mean height of the 50 trees.

4. **Repeat the Process**:
   - Repeat the resampling process \(B\) times (e.g., 1,000 or 10,000 iterations), each time computing the mean height for the bootstrap sample.

5. **Construct the Confidence Interval**:
   - After obtaining \(B\) bootstrap estimates of the mean height, we’ll sort the bootstrap means.
   - For a 95% confidence interval, take the 2.5th percentile and the 97.5th percentile of the sorted bootstrap means.

### Python Code for Bootstrapping:

```python
import numpy as np

# Given information
n = 50            # sample size
mean_height = 15  # sample mean (meters)
std_dev = 2       # sample standard deviation (meters)
B = 10000         # number of bootstrap samples

# Seeded generator so that the result is reproducible
rng = np.random.default_rng(42)

# ASSUMPTION: the raw heights are not given, so simulate a stand-in
# sample from a normal distribution with the reported mean and SD
original_sample = rng.normal(mean_height, std_dev, n)

# Draw all B bootstrap samples at once: each row is one resample of
# size n taken with replacement from the original sample
bootstrap_samples = rng.choice(original_sample, size=(B, n), replace=True)

# Mean of each bootstrap sample (one value per row)
bootstrap_means = bootstrap_samples.mean(axis=1)

# 95% confidence interval (2.5th and 97.5th percentiles)
lower_bound, upper_bound = np.percentile(bootstrap_means, [2.5, 97.5])

# Output the confidence interval
print(f"95% Confidence Interval: [{lower_bound:.2f}, {upper_bound:.2f}]")
```

### Steps Explanation:
- **Step 1**: We generate a sample from the normal distribution to simulate the original data.
- **Step 2**: We repeatedly (10,000 times) generate bootstrap samples by sampling with replacement and calculate the mean height for each bootstrap sample.
- **Step 3**: We then extract the 2.5th and 97.5th percentiles from the bootstrap means to construct the confidence interval.

### Expected Outcome:
The code will generate a 95% confidence interval for the population mean height, giving the range within which the true mean height is likely to lie based on the sample data. The exact bounds depend on the random draws (both the simulated stand-in sample and the resamples), but the interval should be relatively narrow and lie close to the sample mean of 15 meters.
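
As a sanity check on the bootstrap result, the summary statistics alone allow a normal-theory interval, \( \bar{x} \pm 1.96 \, s / \sqrt{n} \). This is a different method from the bootstrap, shown here only for comparison, and it assumes the sample mean is approximately normally distributed.

```python
import math

n = 50              # sample size
sample_mean = 15.0  # meters
sample_sd = 2.0     # meters

standard_error = sample_sd / math.sqrt(n)
z = 1.96  # approximate 97.5th percentile of the standard normal distribution

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"Normal-theory 95% CI: [{lower:.2f}, {upper:.2f}]")
```

This gives roughly [14.45, 15.55] meters. A bootstrap interval built from a simulated sample should have a similar width, shifted slightly according to the simulated sample's own mean.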

