### The wisdom of the crowd in machine learning
When we combine the predictions of many models, the overall result can become **more accurate** and **more stable**. This works because different models tend to make different mistakes: a model that is wrong on one example can be outvoted or averaged out by others that get it right, so individual errors partly cancel.
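
To make the "wisdom of the crowd" idea concrete, here is a minimal sketch that assumes binary classifiers whose errors are **fully independent**, each correct with the same probability `p`. It computes the exact probability that a strict majority vote is correct. Real models trained on the same data are correlated, so the gains in practice are smaller than this idealized calculation suggests.

```python
from math import comb

def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models independent classifiers,
    each correct with probability p, gives the correct answer."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    k_min = n_models // 2 + 1  # fewest correct votes that still form a majority
    # Sum the binomial probabilities of k correct votes for every winning k.
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )

for n in (1, 5, 25, 101):
    print(f"{n:>3} models, each 60% accurate -> majority vote accuracy "
          f"{majority_vote_accuracy(n, 0.6):.3f}")
```

With `p` above 0.5 the majority vote improves as more independent models are added; with correlated models the improvement levels off.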

This idea helps us in two major ways:
- **Reducing bias** (fixing overly simple models)
- **Reducing variance** (fixing overly complex models)

### Bias and variance
- **Bias** means the model is **too simple** and cannot capture the real patterns in the data (it underfits).  
- **Variance** means the model is **too sensitive** to the training data and performs inconsistently on new data (it overfits).  

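To build good models, we want **low bias** and **low variance**. Usually, though, there is a trade-off between them: making a model more flexible tends to lower its bias but raise its variance, and making it simpler does the opposite.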

### Using ensembles
An **ensemble** means combining several models to make a single, stronger one.  
In bagging and boosting, the models inside the ensemble are usually **variations of the same type of model** (for example, many decision trees), each trained on a slightly different version of the data.  

Depending on the problem:
- If each base model has **high variance**, we use **bagging (Bootstrap Aggregation)**.
- If each base model has **high bias**, we use **boosting**.
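
Why does averaging target variance? A standard result (not derived in the video) says that if $B$ models each have prediction variance $\sigma^2$ and pairwise correlation $\rho$, the variance of their average is $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. The sketch below evaluates that formula and checks it with a small simulation; the correlated predictions are generated synthetically from a shared random component, which is an illustrative assumption rather than real model outputs.

```python
import random
import statistics

def averaged_variance(sigma2: float, rho: float, n_models: int) -> float:
    """Variance of the mean of n_models predictions, each with variance sigma2
    and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_models

def simulate_averaged_variance(sigma2: float, rho: float, n_models: int,
                               trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity using synthetic correlated predictions."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    means = []
    for _ in range(trials):
        shared = rng.gauss(0, 1)  # component common to every model -> induces correlation rho
        preds = [
            sigma * (rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1))
            for _ in range(n_models)
        ]
        means.append(sum(preds) / n_models)
    return statistics.pvariance(means)

print(averaged_variance(1.0, 0.3, 10), simulate_averaged_variance(1.0, 0.3, 10))
```

The second term shrinks as $B$ grows, but the $\rho\sigma^2$ term does not, which is why bagging tries to make its base models differ from one another.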

In this video, the focus is on **bagging**.

### The high variance problem
High variance models depend too much on their specific training data.  
If the training set is small or noisy, these models memorize it too closely and fail to generalize.  
Ideally, we’d fix this by collecting more data, but that’s often expensive or impossible.

### The bootstrap idea
When we can’t get new data, we can **simulate** new training datasets using the one we already have.  
This approach is called the **bootstrap** — named after the expression “pull yourself up by your bootstraps,” meaning to help yourself without outside help.

How does bootstrapping work?
- We treat our existing dataset as if it were the entire population.
- We then **create new datasets by sampling from it**, with an important detail: **sampling is done with replacement**.
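
The two steps above can be sketched as follows. This version resamples row **indices** so that each row's features and label stay together; the names `bootstrap_indices` and `bootstrap_datasets` and the tiny `X`/`y` dataset are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence, Tuple

def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """n row indices drawn uniformly at random, with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def bootstrap_datasets(X: Sequence, y: Sequence, n_datasets: int,
                       seed: int = 0) -> List[Tuple[list, list]]:
    """Create n_datasets resampled copies of (X, y), each the same size as the original."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)
    datasets = []
    for _ in range(n_datasets):
        idx = bootstrap_indices(len(X), rng)
        # Using the same indices for X and y keeps features and labels aligned.
        datasets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return datasets

X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1, 1]
for Xb, yb in bootstrap_datasets(X, y, n_datasets=3):
    print(Xb, yb)
```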

### Sampling with replacement
“Sampling with replacement” means that after we select a data point from our dataset, we **put it back**, so it can be picked again on the next draw.  
This allows some data points to appear **multiple times**, while others might **not appear at all**, making each new dataset slightly different.
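
A quick way to see both effects is to draw one bootstrap sample from a hypothetical ten-item dataset labeled A to J and list which items repeat and which are missing. The sketch assumes the items are distinct and hashable; the seed is arbitrary, so the exact items that repeat will change with it.

```python
import random
from collections import Counter

def describe_bootstrap(data, seed=42):
    """Draw one bootstrap sample and report which items repeat and which are missing."""
    rng = random.Random(seed)
    sample = rng.choices(data, k=len(data))  # same size as data, drawn with replacement
    counts = Counter(sample)
    duplicated = sorted(item for item, c in counts.items() if c > 1)
    missing = sorted(set(data) - set(sample))
    return sample, duplicated, missing

sample, duplicated, missing = describe_bootstrap(list("ABCDEFGHIJ"))
print("sample:    ", sample)
print("duplicated:", duplicated)
print("missing:   ", missing)
```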

### What percentage of the original data appears?
Mathematically, suppose we draw a bootstrap sample of $N$ items from a dataset of $N$ items:
- The probability that a specific item **is not chosen** in one draw is $1 - \frac{1}{N}$.
- Because the draws are independent, the probability that it is **never chosen** in $N$ draws is $\left(1 - \frac{1}{N}\right)^N$.
- So, the probability that it **is chosen at least once** is $1 - \left(1 - \frac{1}{N}\right)^N$.

As $N$ becomes very large, $\left(1 - \frac{1}{N}\right)^N$ approaches $e^{-1}$, so this probability approaches $1 - \frac{1}{e} \approx 0.632$.  

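This means each bootstrap dataset contains roughly **63.2% of the distinct data points** from the original set. Its remaining slots (about **36.8%** of its $N$ entries) are filled by repeated copies of those points, and, equivalently, about 36.8% of the original points are left out of that bootstrap dataset entirely.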
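
The formulas above can be checked numerically. The sketch below evaluates the exact probability $1 - \left(1 - \frac{1}{N}\right)^N$ for a few values of $N$ and compares it with a small Monte Carlo simulation that draws indices with replacement and counts how many distinct ones appear; the number of trials and the seed are arbitrary choices.

```python
import math
import random

def prob_chosen_at_least_once(n: int) -> float:
    """Exact probability that a given item appears in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n

def simulated_in_bag_fraction(n: int, trials: int = 200, seed: int = 0) -> float:
    """Average fraction of the n original items that appear in a bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        picks = {rng.randrange(n) for _ in range(n)}  # distinct indices among n draws
        total += len(picks) / n
    return total / trials

for n in (10, 100, 1000):
    print(f"N={n:>5}: exact {prob_chosen_at_least_once(n):.4f}, "
          f"simulated {simulated_in_bag_fraction(n):.4f}")
print(f"limit 1 - 1/e = {1 - 1 / math.e:.4f}")
```

The exact value decreases toward the limit as $N$ grows, and the points that do not appear in a given bootstrap sample are commonly called *out-of-bag* points.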

In summary:
- **Bootstrapping** creates multiple new datasets by randomly sampling (with replacement) from the original data.  
- **Bagging** trains one model on each bootstrapped dataset and then combines their predictions (averaging for regression, majority vote for classification) to reduce variance and improve stability.
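
Putting the two steps together, here is a minimal bagging sketch for a hypothetical 1-D regression problem ($y = \sin x$ plus Gaussian noise). As an illustrative assumption, the base model is 1-nearest-neighbour regression, chosen because it is simple and has high variance; in practice decision trees are the more common choice. Predictions are averaged because this is regression; a classifier would use a majority vote instead. The names `fit_1nn` and `fit_bagged`, the data size, noise level, and number of models are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pair for a 1-D regression problem

def fit_1nn(train: Sequence[Point]) -> Callable[[float], float]:
    """High-variance base model: predict the y of the closest training x."""
    pts = list(train)
    def predict(x: float) -> float:
        return min(pts, key=lambda p: abs(p[0] - x))[1]
    return predict

def fit_bagged(train: Sequence[Point], n_models: int = 50,
               seed: int = 0) -> Callable[[float], float]:
    """Bagging: fit one base model per bootstrap sample, then average their predictions."""
    rng = random.Random(seed)
    train = list(train)
    models: List[Callable[[float], float]] = [
        fit_1nn(rng.choices(train, k=len(train)))  # bootstrap sample: same size, with replacement
        for _ in range(n_models)
    ]
    def predict(x: float) -> float:
        return sum(m(x) for m in models) / len(models)  # average the models (regression)
    return predict

# Hypothetical toy data: y = sin(x) plus Gaussian noise with standard deviation 0.3.
rng = random.Random(1)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(100)]
train = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]
grid = [2 * math.pi * i / 199 for i in range(200)]

def mse_vs_truth(predict: Callable[[float], float]) -> float:
    """Mean squared error against the noise-free curve sin(x) on an evenly spaced grid."""
    return sum((predict(x) - math.sin(x)) ** 2 for x in grid) / len(grid)

single_mse = mse_vs_truth(fit_1nn(train))
bagged_mse = mse_vs_truth(fit_bagged(train))
print(f"single 1-NN MSE: {single_mse:.4f}")
print(f"bagged 1-NN MSE: {bagged_mse:.4f}")
```

Each bootstrap sample leaves out a different subset of points, so the 1-NN models disagree; averaging them smooths out much of the noise a single 1-NN model copies from its training data.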
