The “wisdom of the crowd” is the idea that many independent opinions, when combined, can give a better answer than most individuals on their own. In ensemble learning, we apply the same idea to many models instead of many people.

### Single models vs many opinions

So far, you have seen models like KNN, linear regression, logistic regression, decision trees, and SVMs. Each of these is a **single** function that takes input data (a sample) and outputs a prediction. For example, a classifier outputs a class label, and a regressor outputs a number.

Up to this point, the pattern has been: pick one model, train it, and use only that model to predict. In real life, though, we rarely rely on a single opinion. We ask multiple doctors, debate in parliaments, and elect leaders by voting. Ensemble learning brings this “many opinions” idea into machine learning by combining multiple models’ predictions.

### What is ensemble learning?

Ensemble learning means training several models and then combining their outputs to make a final decision. Each individual model might be weak or “not great” on its own, but together they can form a strong overall predictor.

The key idea is: instead of trusting one model completely, you build a group of models and then aggregate their predictions (for example, by averaging or majority vote). Whether this works well depends on how these models behave: if their errors are different and not all pointing in the same direction, the combination can reduce overall error.
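
As a minimal sketch of the aggregation step, the snippet below combines predictions that are assumed to have already been produced by three trained models. The function names `majority_vote` and `average_prediction`, and the example predictions, are illustrative and do not come from any library.

```python
from collections import Counter
from statistics import fmean


def majority_vote(labels):
    """Return the label predicted by the most models.

    Ties go to the label that appears first in `labels`.
    """
    return Counter(labels).most_common(1)[0][0]


def average_prediction(values):
    """Return the mean of the models' numeric predictions."""
    return fmean(values)


# Hypothetical outputs of three classifiers and three regressors for one sample.
class_predictions = ["spam", "not spam", "spam"]
regression_predictions = [2.0, 3.0, 4.0]

final_label = majority_vote(class_predictions)            # "spam"
final_value = average_prediction(regression_predictions)  # 3.0
```

Voting is the usual choice for class labels and averaging for numeric outputs; neither step looks inside the models, so any mix of model types can be combined this way.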

### Why can combining “bad” models work?

At first, this seems strange: if each model is pretty bad, why would combining them help? The answer is that “bad” here usually means “noisy but not systematically wrong”: each model is still better than random guessing, just unreliable on any single prediction. If different models make different mistakes, their individual errors can partly cancel each other out when you average or vote.

What matters is:
- The models’ individual biases (do they tend to be too high or too low?).  
- Their variances (how much their predictions fluctuate).  
- How correlated they are (do they all make the same mistakes or different ones?).  

If the models are not all biased in the same direction and are somewhat independent, combining them can reduce variance and improve reliability.
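
To make the cancellation argument concrete, here is a small calculation under an idealized assumption: every classifier is correct with the same probability `p`, and their errors are fully independent. The majority vote of `n` such classifiers is correct whenever more than half of them are, which is a binomial tail probability. The function name `majority_vote_accuracy` is illustrative.

```python
from math import comb


def majority_vote_accuracy(n_models, p):
    """Probability that a majority of n independent models is correct.

    Assumes an odd n_models (so there are no ties) and that each model is
    right with probability p, independently of the others.
    """
    smallest_majority = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(smallest_majority, n_models + 1)
    )


# Accuracy of the vote for growing ensembles of "60% accurate" models.
for n in (1, 3, 11, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With `p = 0.6`, three models already reach 0.648, and the vote keeps improving as `n` grows. With `p = 0.4`, the same formula gives 0.352 for three models, so the vote gets worse: models that are systematically wrong do not benefit from being combined. Real models are rarely fully independent, so treat these numbers as an upper bound on what voting can deliver.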

### Wisdom of the crowd, step by step

Imagine many people each making a guess about some unknown quantity. Each person might be a bit biased (some guess too low, some too high), and each person’s guesses vary (they don’t always say the same number). However:

- Across the whole population, some people are low and some are high; as long as there is no shared tendency in one direction, these biases roughly balance out.
- If you take the **average** of many independent guesses, the average tends to be close to the true value.  
- As you increase the number of people, the average becomes more stable: it jumps around less from one group to another.

In statistics, this stability of the average comes from a core result: if the guesses are independent and each has variance $\sigma^2$, the variance of their average is $\sigma^2 / n$, where $n$ is the number of guesses. You don’t need to memorize the formula; the intuition is: “more independent opinions → averaged result becomes less noisy.” Note that averaging shrinks this random noise, but it does not remove a bias that everyone shares.
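
The $1 / n$ behaviour can be checked with a short simulation. This is a sketch under stated assumptions: each person's guess is the true value plus independent Gaussian noise, and the true value (100), noise level (standard deviation 20), and function name `variance_of_crowd_average` are all made up for illustration.

```python
import random
from statistics import fmean, pvariance


def variance_of_crowd_average(n_people, n_trials=2000, true_value=100.0,
                              noise_sd=20.0, seed=0):
    """Estimate how much the crowd's average varies from one crowd to the next."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        # One crowd: n_people independent, unbiased but noisy guesses.
        guesses = [rng.gauss(true_value, noise_sd) for _ in range(n_people)]
        averages.append(fmean(guesses))
    return pvariance(averages)


for n in (1, 10, 100):
    # Theory predicts noise_sd**2 / n, i.e. about 400, 40 and 4.
    print(n, round(variance_of_crowd_average(n), 1))
```

The printed estimates should land close to the theoretical values, dropping by roughly a factor of ten each time the crowd becomes ten times larger.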

### Importance of independence

A crucial requirement is that each person (or model) makes their decision independently. If everyone copies one “expert” before answering, then the crowd is no longer diverse and independent. The group’s answer then inherits that expert’s bias, and averaging can no longer cancel it out.

The same is true in machine learning:
- If all models are essentially the same and make the same mistakes, combining them doesn’t help much.  
- If models are trained in ways that make their errors different (for example, on different data samples or with different features), then combining them can reduce overall error.
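
The effect of correlated mistakes can also be simulated. For $n$ models whose errors each have variance $\sigma^2$ and pairwise correlation $\rho$, the variance of the averaged error is $\rho \sigma^2 + (1 - \rho) \sigma^2 / n$, so the first term never shrinks no matter how many models are added. The sketch below assumes a simple hypothetical error model (one mistake shared by all models plus one private mistake per model, with $\sigma^2 = 1$); the name `variance_of_ensemble_error` is illustrative.

```python
import random
from statistics import fmean, pvariance


def variance_of_ensemble_error(n_models, rho, n_trials=2000, seed=0):
    """Variance of the averaged error when every pair of models has correlation rho."""
    rng = random.Random(seed)
    averaged_errors = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # mistake common to all models
        errors = [
            # Each error has variance 1; any two errors have covariance rho.
            rho**0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            for _ in range(n_models)
        ]
        averaged_errors.append(fmean(errors))
    return pvariance(averaged_errors)


independent = variance_of_ensemble_error(n_models=50, rho=0.0)  # theory: 0.02
copycats = variance_of_ensemble_error(n_models=50, rho=0.9)     # theory: 0.902
print(independent, copycats)
```

Fifty independent models cut the error variance to about a fiftieth of a single model's, while fifty highly correlated models barely improve on one.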

### Connecting to ensemble methods in ML

In this module, you will study two major families of ensemble methods:

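- Bagging (bootstrap aggregating): train many models independently, each on a different bootstrap sample (a random sample of the training data drawn with replacement), and combine their predictions by averaging or voting. This mainly reduces variance. Random forests build on this idea, using decision trees that also pick from a random subset of features at each split.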
- Boosting: build models sequentially, where each new model focuses on correcting the mistakes of the previous ones. AdaBoost and gradient-boosted trees are key examples.
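
To give a feel for bagging before the detailed treatment, here is a minimal sketch in plain Python. It makes several simplifying assumptions: the data is synthetic (a noisy sine curve), the base model is a 1-nearest-neighbour regressor chosen only because it is short to write and has high variance, and all function names are hypothetical. It is not how a library such as scikit-learn implements bagging, and boosting follows a different, sequential recipe that is not shown here.

```python
import math
import random
from statistics import fmean


def nearest_neighbor_predict(train, x):
    """1-nearest-neighbour regressor: return the y of the closest training x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def fit_bagged(train, n_models, rng):
    """'Train' each model by giving it its own bootstrap sample of the data."""
    return [rng.choices(train, k=len(train)) for _ in range(n_models)]


def predict_bagged(models, x):
    """Average the predictions of all bootstrap-trained models."""
    return fmean([nearest_neighbor_predict(sample, x) for sample in models])


def mse_against_truth(predict, xs):
    """Mean squared error between predictions and the noise-free sine curve."""
    return fmean([(predict(x) - math.sin(x)) ** 2 for x in xs])


rng = random.Random(0)

# Synthetic training data: y = sin(x) plus Gaussian noise.
train = []
for _ in range(200):
    x = rng.uniform(0.0, 6.0)
    train.append((x, math.sin(x) + rng.gauss(0.0, 0.5)))

test_xs = [0.015 + 0.03 * i for i in range(200)]  # evenly spaced grid on [0, 6]

single_mse = mse_against_truth(lambda x: nearest_neighbor_predict(train, x), test_xs)

models = fit_bagged(train, n_models=25, rng=rng)
bagged_mse = mse_against_truth(lambda x: predict_bagged(models, x), test_xs)

print(single_mse, bagged_mse)
```

Each bootstrap sample leaves out some training points, so the 25 models disagree about which neighbour is closest. Averaging their predictions smooths out part of the noise that a single 1-nearest-neighbour model copies directly from its nearest point, which is why the bagged error should come out lower here.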

In all of these, the core wisdom-of-the-crowd principle is the same: use multiple, preferably diverse and not overly correlated models, and combine them so that individual weaknesses cancel out and the final prediction is more reliable than any single model.