# Ensemble Learning Techniques

Ensemble learning is an ML paradigm in which multiple models (often called "learners" or "base models") are trained and combined to solve a single ML problem.

The idea is to build a prediction model by combining the outputs of several smaller models to improve robustness and accuracy. Depending on the technique, the goal is to reduce variance (bagging), reduce bias (boosting), or improve overall predictions (stacking).


## Bootstrap Aggregation (Bagging)
https://en.wikipedia.org/wiki/Bootstrap_aggregating

### Process:
Bagging involves training multiple models of the same type on different subsets of the training data. The subsets are created by randomly sampling with replacement from the original dataset, so some samples can appear more than once. The final prediction is made by averaging the individual predictions (for regression problems) or by majority vote (for classification problems).

Bagging is most commonly used with decision trees (as in Random Forests), but it can be applied to most ML models to improve performance.

### Sampling: 
You start with a standard training dataset "D" of size "n". Bagging generates "m" new training sets, each of size "n'" (which may differ from "n").
- "With replacement" means that when you draw a sample from the original dataset "D" and add it to a new training set, you don't remove it from "D", so the same sample can be drawn again
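A minimal sketch of drawing one bootstrap sample using only Python's standard library. The toy dataset `D` and the helper name `bootstrap_sample` are illustrative assumptions, not part of any library.

```python
import random

def bootstrap_sample(data, n_prime=None, rng=None):
    """Draw n_prime items from data with replacement (defaults to len(data))."""
    rng = rng or random.Random()
    n_prime = len(data) if n_prime is None else n_prime
    # rng.choices samples with replacement: the same item can be picked
    # repeatedly, and `data` itself is never modified.
    return rng.choices(data, k=n_prime)

D = list(range(10))                           # toy dataset with n = 10
sample = bootstrap_sample(D, rng=random.Random(0))
print(sample)                                 # same length as D; duplicates are likely
print(len(D))                                 # D is unchanged by the sampling
```

Passing `n_prime` smaller than `len(D)` gives a smaller bootstrap sample, which corresponds to the n' < n case.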

The new training sets created through this process are the bootstrap samples. When n' = n (the new training sets are the same size as the original dataset D):
- each bootstrap sample is expected to contain about 63.2% of the unique instances in the original dataset, with the rest being duplicates (63.2% comes from the formula "1 - (1/e)", which holds for large n, where "e" ≈ 2.71828 is the base of the natural logarithm)
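The 63.2% figure follows from the chance that a given data point is never picked in n draws, which is (1 - 1/n)^n and approaches 1/e as n grows. The sketch below compares the exact formula with a small simulation; the function names, the choice of n = 1000, and the number of trials are arbitrary assumptions for illustration.

```python
import math
import random

def expected_unique_fraction(n):
    """Exact expected fraction of distinct points in a bootstrap sample of size n."""
    # A given point is missed by all n draws with probability (1 - 1/n)^n.
    return 1 - (1 - 1 / n) ** n

def simulated_unique_fraction(n, trials=200, seed=0):
    """Average fraction of distinct points over repeated bootstrap draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.choices(range(n), k=n)   # n draws with replacement
        total += len(set(sample)) / n
    return total / trials

limit = 1 - 1 / math.e
print(round(limit, 4))                        # 0.6321
print(expected_unique_fraction(1000))         # close to the limit for large n
print(simulated_unique_fraction(1000))        # empirical estimate
```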

Sampling with replacement ensures that each bootstrap sample is drawn independently of the others. Within a sample, selecting one data point does not affect the selection of another, so each training set can be considered independently created. This is crucial for giving the models diverse perspectives on the data.

#### Why is it ok to have duplicate values in the bootstrapped dataset? 
Multiple occurrences of the same data point in one bootstrapped dataset, but not in others, introduce variability between the models. When the predictions from these models are aggregated (by majority vote or averaging), the ensemble can achieve more generalized performance.

The ensemble approach assumes that while each model may have its own biases due to its training dataset (like having duplicates of a certain data point), the aggregation process will even out those biases and lead to a final prediction that is more accurate and less prone to overfitting than any single model's prediction.

### Model Training:
Training: for each of the "m" bootstrap samples, a separate model is trained on that sample. This means you end up with "m" models, each trained on a slightly different set of data due to the random sampling process.
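The training step can be sketched as a loop that fits one model per bootstrap sample. The base learner here is a toy 1-nearest-neighbour classifier on 1-D data, chosen only because it is short and high-variance; `train_1nn`, `train_bagged_models`, and the dataset are hypothetical names and values, not a reference implementation.

```python
import random

def train_1nn(sample):
    """'Train' a 1-nearest-neighbour model: it simply memorises its sample."""
    points = list(sample)

    def predict(x):
        # Return the label of the stored point whose feature is closest to x.
        nearest = min(points, key=lambda p: abs(p[0] - x))
        return nearest[1]

    return predict

def train_bagged_models(data, m, rng):
    """Fit m models, each on its own bootstrap sample of the data."""
    models = []
    for _ in range(m):
        sample = rng.choices(data, k=len(data))   # n' = n, with replacement
        models.append(train_1nn(sample))
    return models

# Toy dataset of (feature, label) pairs: the label is 1 when the feature exceeds 5.
D = [(x, int(x > 5)) for x in range(11)]
models = train_bagged_models(D, m=25, rng=random.Random(0))
print(len(models))                                # 25
print([model(8.2) for model in models[:5]])       # per-model predictions for x = 8.2
```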

Combining: once we have the predictions from the "m" trained models, we combine the results
- for regression: the output is averaged across all "m" models
- for classification: the final class is chosen by majority vote across the "m" models
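The two combining rules can be sketched as small helpers. The function names and example predictions are assumptions for illustration; how ties are broken in the vote is a design choice this sketch leaves to `Counter`.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Average the numeric predictions of the individual models."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Return the most common predicted label (majority vote)."""
    # most_common(1) yields [(label, count)]; on a tie, the label that
    # appeared first in `predictions` is returned.
    return Counter(predictions).most_common(1)[0][0]

print(aggregate_regression([2.0, 3.0, 4.0]))             # 3.0
print(aggregate_classification(["cat", "dog", "cat"]))   # cat
```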

### Pros:
1. Bagging reduces variance (without increasing bias), leading to a model that generalizes better to unseen data
    - however, bagging will not significantly reduce bias if the base model is already biased (but it doesn't dramatically increase bias either)
    - the main objective is variance reduction (when averaging the outputs of multiple models, the variance decreases)
2. Helps avoid overfitting when models get complex
    - even though each individual model might have high-variance predictions due to overfitting to its bootstrapped dataset, averaging the models can cancel out much of the individual variance, leading to a more stable prediction
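The variance-reduction effect can be illustrated with a small simulation. It makes an idealized assumption: each model's prediction is the true value plus independent Gaussian noise. Real bagged models are trained on overlapping data and are therefore correlated, so the reduction in practice is smaller than shown here. The function name and all parameter values are hypothetical.

```python
import random
from statistics import pvariance

def simulate(m, trials=2000, true_value=1.0, noise_sd=1.0, seed=0):
    """Compare the variance of one noisy model with the average of m such models."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(trials):
        # Assumption: every model's error is independent noise around the truth.
        preds = [true_value + rng.gauss(0, noise_sd) for _ in range(m)]
        single.append(preds[0])               # prediction of a single model
        averaged.append(sum(preds) / m)       # bagged (averaged) prediction
    return pvariance(single), pvariance(averaged)

var_single, var_avg = simulate(m=25)
print(var_single)   # roughly noise_sd ** 2
print(var_avg)      # roughly noise_sd ** 2 / m under the independence assumption
```

Both estimates are centred on the same true value, so averaging leaves the bias unchanged while shrinking the spread.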


## Boosting 



## Stacking 

# Interpreting ML and Traditional ML Algorithms

## Interpretability Analysis

# Sampling and Data Splitting


# Loss

## Class-balanced Loss

## Focal-loss 

## Cross-entropy loss

## MSE loss

## MAE loss

## Huber loss

# Model and Data Parallelism

# Regularization

## L1 and L2 Regularization

## Entropy Regularization

# K-fold cross validation

# Dropout

# Optimization Algorithms

## Stochastic Gradient Descent

## AdaGrad 

## Momentum

## RMSProp 


# Activation Function

## ELU

## ReLU

## Tanh

## Sigmoid

# Model Eval

## FID Score

## Inception score

## BLEU metrics

## METEOR metrics

## ROUGE score

## CIDEr score

## SPICE score

## Model Compression Survey

## Shadow deployment

## A/B Testing

## Canary Release 




# Quantization-aware training


# Interleaving Experiment

# Multi-armed Bandit

# ML Infrastructure