Aggregating predictors is the core of how ensemble learning works: instead of relying on one model, you train multiple models (called "component models") and combine their predictions into a single final prediction using a "metamodel" or "ensemble model." This metamodel acts like a manager that collects opinions from the group and decides the best overall answer.

### What is a metamodel?

A metamodel (or ensemble model) is the big-picture system that handles two main jobs:
- **Distributing input**: It takes a new data point (like a test sample) and sends it to all the individual models inside the ensemble.
- **Aggregating outputs**: It collects the predictions from each model and combines them into one final prediction.

The component models can be of different types (logistic regression, KNN, decision trees) or many variations of a single type (for example, many decision trees). The metamodel doesn't care what they are, as long as their predictions can be combined sensibly: all class labels, all class probabilities, or all numeric values.
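The two jobs described above can be sketched as a tiny wrapper class. This is only an illustration: the `Metamodel` class, its `aggregate` parameter, and the three toy component functions are hypothetical names invented here, not part of any library.

```python
from statistics import mean


class Metamodel:
    """Hypothetical wrapper: sends an input to every component and aggregates."""

    def __init__(self, components, aggregate):
        self.components = list(components)  # any callables that map x -> prediction
        self.aggregate = aggregate          # function that combines a list of predictions

    def predict(self, x):
        # Job 1: distribute the input to every component model.
        outputs = [model(x) for model in self.components]
        # Job 2: aggregate the individual outputs into one prediction.
        return self.aggregate(outputs)


# Toy "models" standing in for trained regressors (assumption for illustration).
components = [
    lambda x: 2.0 * x,
    lambda x: 2.0 * x + 1.0,
    lambda x: x * x,
]

ensemble = Metamodel(components, aggregate=mean)
print(ensemble.predict(3.0))  # mean of 6.0, 7.0 and 9.0
```

Because the wrapper only calls each component and passes the results to `aggregate`, swapping in a different kind of model or a different combination rule does not change the wrapper itself.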

### How aggregation works: classification vs regression

The way you combine predictions depends on your task.

**For classification** (predicting categories or labels):
- **Hard voting (majority vote)**: Each model votes for a class label. Pick the class that gets the most votes. This works for binary (yes/no) or multiclass (multiple categories) problems. Ties need a tie-breaking rule; for binary problems, an odd number of models avoids them.
- **Soft voting**: Each model outputs a probability for every class (e.g., 30% chance of class A, 70% chance of class B). Average these probabilities across models, then pick the class with the highest average probability. This requires every component to produce probabilities. Soft voting often works better than hard voting because it takes each model's confidence into account, not just its top choice.
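As a minimal sketch of the two voting rules, the snippet below applies both to one sample. The probabilities are made up for illustration, and the helper names `hard_vote` and `soft_vote` are hypothetical. The numbers are chosen so that the two rules disagree: two models lean slightly toward B, while one is very confident in A.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label chosen by the most models (on a tie, the first one seen wins)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Average per-class probabilities across models; return (best_class, averages)."""
    classes = probas[0].keys()
    n_models = len(probas)
    averages = {c: sum(p[c] for p in probas) / n_models for c in classes}
    return max(averages, key=averages.get), averages


# Made-up outputs of three models for a single sample.
probas = [
    {"A": 0.45, "B": 0.55},
    {"A": 0.40, "B": 0.60},
    {"A": 0.90, "B": 0.10},
]

# Each model's hard vote is its most probable class.
labels = [max(p, key=p.get) for p in probas]

print(hard_vote(labels))     # B (two votes against one)
print(soft_vote(probas)[0])  # A (average probability of A is about 0.58)
```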

**For regression** (predicting numbers):
- **Simple averaging**: Take the numerical prediction from each model and compute the average. That's your final prediction.

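In all cases, you can optionally give each model a "weight" (α1, α2, etc.) so that some models influence the final result more than others: a model's vote, probability, or numerical prediction is multiplied by its weight before the results are combined.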
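For regression, weighting can be sketched as a weighted average. The function name `weighted_average` is hypothetical, and dividing by the sum of the weights is an assumption made here so that the weights do not have to add up to 1.

```python
def weighted_average(predictions, weights=None):
    """Combine numeric predictions; with no weights this is the plain average."""
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Assumption: normalize by the total weight so the weights need not sum to 1.
    return sum(w * p for w, p in zip(weights, predictions)) / total


preds = [10.0, 12.0, 20.0]  # made-up predictions from three models

print(weighted_average(preds))             # 14.0 (simple average)
print(weighted_average(preds, [2, 1, 1]))  # 13.0 (first model counts double)
```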

### Why train many models? The training process

Normally, you split data into training and testing sets. With ensembles:
- Train each component model separately on the **same** training data (or sometimes on different subsets, which is the key idea behind bagging, covered next).
- Test the metamodel by feeding the test data to all components, aggregating their predictions, and comparing the result against the true values (accuracy for classification, an error measure such as mean squared error for regression).
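The two steps above can be sketched with scikit-learn. The dataset (iris), the three model types, and the hyperparameters are assumptions chosen only to keep the example small; the aggregation is a hand-written majority vote rather than a library ensemble class.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Step 1: train every component separately on the same training data.
components = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(random_state=0),
]
for model in components:
    model.fit(X_train, y_train)

# Step 2: feed the test data to all components...
all_preds = [model.predict(X_test) for model in components]

# ...aggregate with a majority vote per test sample...
ensemble_pred = [
    Counter(votes).most_common(1)[0][0] for votes in zip(*all_preds)
]

# ...and compare against the true labels.
accuracy = sum(int(p == t) for p, t in zip(ensemble_pred, y_test)) / len(y_test)
print(f"ensemble accuracy: {accuracy:.3f}")
```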

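A common trick: pair each scale-sensitive model with a preprocessing step such as StandardScaler, which rescales every feature to zero mean and unit variance. This helps models such as KNN and logistic regression, where a feature with a large range (e.g., salary in dollars) would otherwise dominate one with a small range (e.g., age in years). Fit the scaler on the training data only, then apply it to the test data. Decision trees don't need scaling because they split on one feature at a time using thresholds, so rescaling a feature does not change which splits are possible.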
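One way to pair models with scaling, sketched with scikit-learn: wrap each scale-sensitive model in its own pipeline and pass the pipelines to `VotingClassifier`. The wine dataset is an assumption, picked because its features have very different ranges, and the model choices are illustrative only.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

ensemble = VotingClassifier(
    estimators=[
        # Scale-sensitive models get their own StandardScaler step.
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        # The decision tree is used on the raw features.
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted probabilities; "hard" would use majority vote
)

# Each pipeline fits its scaler on the training data only.
ensemble.fit(X_train, y_train)
test_accuracy = ensemble.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is fitted during `fit` and reused unchanged during `score`, so no test-set statistics leak into training.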

### Why independence matters for good results

If all models are trained on the exact same data, they tend to make similar mistakes: their errors are "correlated." Correlated errors don't cancel out, so the ensemble might not beat the best single model.

The wisdom of the crowd only shines when individual decisions are independent (different mistakes in different places). That's why techniques like bagging (next topic) train each model on a different random sample of the data: the models become less correlated, and their errors average out better.
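To make the benefit of independence concrete, here is a small calculation under idealized assumptions: a binary problem, an odd number of models, each model correct with the same probability `p`, and fully independent errors. The function name `majority_accuracy` is hypothetical. Real models are never perfectly independent, so this is an upper-end illustration rather than a prediction.

```python
from math import comb


def majority_accuracy(n_models, p):
    """Probability that a majority of n independent models is correct.

    Assumes a binary task, odd n_models, and the same accuracy p for every model.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 3, 11, 25):
    print(n, round(majority_accuracy(n, 0.7), 3))
```

With three independent models at 70% accuracy, the majority vote is right about 78.4% of the time, and the figure keeps rising as models are added. At the other extreme, fully correlated models always agree with each other, so the vote is exactly as accurate as one model: 70%.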

### Quick connection to bagging and random forests

Bagging (bootstrap aggregating) builds on this by creating bootstrap samples (random samples drawn with replacement from the training data, typically the same size as the original set) and training one model per sample. The metamodel then averages or votes. Random forests are a popular bagging method that uses decision trees as the component models and additionally considers only a random subset of features at each split, which decorrelates the trees further. Boosting (later) trains models sequentially instead of in parallel.
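The sampling step of bagging can be sketched in a few lines of standard-library Python. The helper name `bootstrap_indices` is hypothetical, and drawing as many indices as there are training samples is the conventional choice assumed here.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 1000        # size of a made-up training set
n_models = 5            # one bootstrap sample per component model

samples = [bootstrap_indices(n_samples, rng) for _ in range(n_models)]

for i, indices in enumerate(samples):
    unique_fraction = len(set(indices)) / n_samples
    print(f"model {i}: {unique_fraction:.1%} of the training rows appear at least once")
```

Because rows are drawn with replacement, each sample repeats some rows and leaves out others (roughly 63% of the rows appear in any one sample when the set is large). Each component therefore sees a slightly different dataset, which is what reduces the correlation between their errors.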