# Ensemble Techniques & Bootstrap Confidence Interval
This notebook answers key questions on ensemble techniques and implements bootstrap confidence interval estimation.

## Q1: What is an ensemble technique in machine learning?
Ensemble techniques combine the predictions of multiple models so that the combined model is more accurate and less prone to overfitting than any single member.

## Q2: Why are ensemble techniques used in machine learning?
- Improve accuracy
- Reduce variance (e.g., bagging) and bias (e.g., boosting)
- Enhance generalization
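
The variance point can be illustrated with a small simulation. This is a minimal sketch, assuming each model's error is independent noise around a hypothetical true value; `true_value`, `noise_std`, and `n_models` are made-up illustration parameters. Real models trained on the same data have correlated errors, so the gain is usually smaller than shown here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

true_value = 1.0   # hypothetical quantity every model tries to predict
noise_std = 0.5    # assumed error spread of a single model
n_models = 10      # assumed ensemble size
n_trials = 20_000  # number of simulated prediction rounds

# Each row is one trial; each column is one model's noisy prediction.
predictions = true_value + rng.normal(0.0, noise_std, size=(n_trials, n_models))

single_var = predictions[:, 0].var()           # variance of one model alone
ensemble_var = predictions.mean(axis=1).var()  # variance of the averaged prediction

print(f"Single model variance:    {single_var:.4f}")
print(f"Ensemble (n={n_models}) variance: {ensemble_var:.4f}")
```

With independent errors, the variance of the average shrinks roughly by a factor of `n_models`.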

## Q3: What is bagging?
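Bagging (Bootstrap Aggregating) trains multiple models on bootstrap samples of the training data (random samples drawn with replacement) and combines their predictions, by averaging for regression or by majority vote for classification.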
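
A minimal sketch of bagging for regression, using NumPy only. The toy sine data, the polynomial base model, and the helper name `bagged_predict` are assumptions made for illustration; in practice the base model is often a decision tree.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy regression data: a noisy sine curve.
x = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)

def bagged_predict(x_train, y_train, x_new, n_models=50, degree=5):
    """Average the predictions of polynomial fits trained on bootstrap samples."""
    all_preds = np.empty((n_models, len(x_new)))
    for m in range(n_models):
        # Bootstrap sample: draw n indices with replacement.
        idx = rng.integers(0, len(x_train), size=len(x_train))
        coeffs = np.polyfit(x_train[idx], y_train[idx], deg=degree)
        all_preds[m] = np.polyval(coeffs, x_new)
    # Aggregate: mean for regression (classification would use a majority vote).
    return all_preds.mean(axis=0)

x_new = np.array([0.25, 0.5, 0.75])
bagged = bagged_predict(x, y, x_new)
print(bagged)
```

Each base model sees a slightly different dataset, so their individual quirks tend to cancel out in the average.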

## Q4: What is boosting?
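Boosting trains models sequentially, with each new model focusing on the errors of the previous ones, for example by giving more weight to misclassified instances (as in AdaBoost). The final prediction is a weighted combination of all models.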
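
The reweighting idea can be sketched with a small AdaBoost-style loop over decision stumps. This is an illustrative sketch under stated assumptions, not a production implementation: the helper names (`fit_stump`, `adaboost`, `predict`) and the eight-point toy dataset are hypothetical, and labels are assumed to be -1 or +1.

```python
import numpy as np

def fit_stump(x, y, w):
    """Find the threshold/polarity stump with the lowest weighted error."""
    best = (np.inf, 0.0, 1)  # (weighted error, threshold, polarity)
    for thr in np.unique(x):
        for polarity in (1, -1):
            pred = np.where(x < thr, -polarity, polarity)
            err = w[pred != y].sum()
            if err < best[0]:
                best = (err, thr, polarity)
    return best

def adaboost(x, y, n_rounds=3):
    """Minimal AdaBoost for labels in {-1, +1} on a single feature."""
    w = np.full(len(x), 1.0 / len(x))  # start with uniform instance weights
    ensemble = []
    for _ in range(n_rounds):
        err, thr, polarity = fit_stump(x, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # the stump's vote strength
        pred = np.where(x < thr, -polarity, polarity)
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified points
        w /= w.sum()                           # renormalize to sum to 1
        ensemble.append((alpha, thr, polarity))
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * np.where(x < t, -p, p) for a, t, p in ensemble)
    return np.sign(score)

# Toy data: the positive class sits in the middle, so no single threshold separates it.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([-1, -1, 1, 1, 1, 1, -1, -1])

model = adaboost(x, y, n_rounds=3)
accuracy = (predict(model, x) == y).mean()
print(f"Training accuracy after 3 rounds: {accuracy:.2f}")
```

Each stump alone misclassifies at least two points here, but the weighted vote of three stumps fits the toy training set.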

## Q5: Benefits of Ensemble Techniques
- Better accuracy
- Reduced overfitting
- Works well with weak learners

## Q6: Are ensemble techniques always better than individual models?
Not always. Ensembles cost more to train and to run, are harder to interpret, and may add little when a single base model is already strong.

## Q7: How is the confidence interval calculated using bootstrap?
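By resampling the dataset with replacement many times, computing the statistic on each resample, and taking percentiles of the resulting values (e.g., the 2.5th and 97.5th percentiles for a 95% interval).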

## Q8: How does bootstrap work? Steps involved:
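1. Randomly sample with replacement from the dataset, drawing as many points as the original dataset contains.
2. Compute the statistic (e.g., mean) for the resample.
3. Repeat many times (typically thousands) to build a distribution of the statistic.
4. Compute the confidence interval from the percentiles of that distribution.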
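
The steps above can be sketched as a small function. This is a minimal sketch of the percentile bootstrap; the function name `bootstrap_ci`, its parameters, and the `heights` values are made up for illustration and are not data from this notebook.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_resamples=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of 1-D data."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    stats = np.empty(n_resamples)
    for i in range(n_resamples):
        # Step 1: resample with replacement, keeping the original size.
        resample = rng.choice(data, size=data.size, replace=True)
        # Step 2: compute the statistic on the resample.
        stats[i] = stat(resample)
    # Steps 3-4: the collected statistics form the bootstrap distribution;
    # its percentiles give the interval.
    alpha = 1.0 - confidence
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Hypothetical measurements, made up for illustration only.
heights = np.array([12.1, 14.8, 15.3, 13.9, 16.2, 15.0, 17.4, 14.1, 15.7, 16.0])
low, high = bootstrap_ci(heights)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing a different `stat`, such as `np.median`, gives an interval for that statistic without changing the procedure.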

## Q9: Bootstrap Confidence Interval Estimation for Tree Height
Given:
- Sample mean = 15 meters
- Standard deviation = 2 meters
- Sample size = 50
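
Because only summary statistics are available, not the raw measurements, the code below uses a parametric bootstrap: it simulates samples from a normal distribution with this mean and standard deviation, then takes the 2.5th and 97.5th percentiles of the simulated means as the 95% confidence interval.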

```python
import numpy as np
import matplotlib.pyplot as plt

# Given summary statistics
sample_mean = 15  # meters
sample_std = 2  # meters
sample_size = 50
num_bootstrap_samples = 10000

# Fixed seed so the interval is reproducible between runs
rng = np.random.default_rng(42)

# Parametric bootstrap: the raw heights are not available, so each sample is
# simulated from a normal distribution with the given mean and std.
# Each row is one simulated sample of 50 trees.
samples = rng.normal(sample_mean, sample_std, size=(num_bootstrap_samples, sample_size))
bootstrap_means = samples.mean(axis=1)

# Compute the 95% confidence interval from the 2.5th and 97.5th percentiles
lower_bound, upper_bound = np.percentile(bootstrap_means, [2.5, 97.5])

print(f"95% Confidence Interval: ({lower_bound:.2f}, {upper_bound:.2f}) meters")

# Plot the distribution of simulated means with the interval bounds
plt.hist(bootstrap_means, bins=30, alpha=0.7, color='blue', edgecolor='black')
plt.axvline(lower_bound, color='red', linestyle='dashed', label='2.5th percentile')
plt.axvline(upper_bound, color='green', linestyle='dashed', label='97.5th percentile')
plt.title('Bootstrap Distribution of Sample Means')
plt.xlabel('Mean Height (meters)')
plt.ylabel('Frequency')
plt.legend()
plt.show()
```
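
As a sanity check, the simulated interval can be compared with the closed-form normal interval, mean ± z · std / √n. This is a sketch assuming the same normal model the simulation uses; it relies only on the Python standard library.

```python
import math
from statistics import NormalDist

sample_mean, sample_std, sample_size = 15, 2, 50

standard_error = sample_std / math.sqrt(sample_size)  # std of the sample mean
z = NormalDist().inv_cdf(0.975)                       # about 1.96 for a 95% interval
margin = z * standard_error

print(f"Analytic 95% CI: ({sample_mean - margin:.2f}, {sample_mean + margin:.2f}) meters")
# prints: Analytic 95% CI: (14.45, 15.55) meters
```

The simulated percentiles should land close to these bounds, with small differences due to random sampling.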

### Conclusion:
- The nonparametric bootstrap estimates confidence intervals without assuming normality. The tree-height example has only summary statistics, so it simulates from an assumed normal distribution instead.
- Ensemble methods improve model performance by combining multiple models, often weak learners.
- Future work: Try different ensemble techniques such as Random Forest, XGBoost, and stacking.