Definition of Prediction Interval
A prediction interval is a statistical concept that provides an estimated range within which a future observation or measurement is expected to fall, with a specified level of confidence.

Mathematically, for a random variable Y and a confidence level 1-α, a prediction interval [L, U] satisfies:

P(L ≤ Y ≤ U) = 1 - α

Where:

L is the lower bound of the interval
U is the upper bound of the interval
Y is the future observation
1-α is the confidence level (e.g. 95% for α = 0.05)
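The definition can be checked numerically in the idealized case where the distribution of Y is fully known. The sketch below assumes, purely for illustration, that Y ~ Normal(μ = 100, σ = 15). In that case L = μ - z(α/2)·σ and U = μ + z(α/2)·σ satisfy P(L ≤ Y ≤ U) = 1 - α exactly, and about 95% of simulated draws should land inside. In practice μ and σ are unknown, so L and U have to be estimated from a sample.

```python
import random
from statistics import NormalDist

# Illustrative assumption: future observations follow Normal(mu=100, sigma=15)
# and both parameters are known.
mu, sigma, alpha = 100.0, 15.0, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value, about 1.96
L, U = mu - z * sigma, mu + z * sigma    # bounds of the prediction interval

# Exact probability mass between L and U under the assumed distribution.
exact = NormalDist(mu, sigma).cdf(U) - NormalDist(mu, sigma).cdf(L)

# Empirical check: fraction of simulated future observations that fall in [L, U].
rng = random.Random(0)
draws = [rng.gauss(mu, sigma) for _ in range(100_000)]
coverage = sum(L <= y <= U for y in draws) / len(draws)

print(f"[{L:.2f}, {U:.2f}]  exact={exact:.4f}  simulated={coverage:.4f}")
```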
Mathematical Formulation of Prediction Interval
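Mathematically speaking, a two-sided prediction interval for a single future observation Yf, based on a sample of n observations from a normal distribution, can be stated as:

Ŷ ± t(α/2, n-1) × s × √(1 + 1/n)

Where:

Ŷ is the predicted value of Yf (for a single sample, the sample mean x̄)
t(α/2, n-1) is the critical value from the t-distribution for confidence level 1-α with n-1 degrees of freedom
s is the sample standard deviation
n is the sample size

The 1 under the square root reflects the scatter of the new observation itself, and the 1/n reflects the uncertainty in estimating the mean. This is why a prediction interval is always wider than the corresponding confidence interval for the mean.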
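Here is a minimal Python sketch of the prediction-interval formula, assuming i.i.d. normal data. `prediction_interval` is a hypothetical helper. The critical value is passed in rather than computed, because the Python standard library has no t-distribution quantile function. The value 2.262 is the tabulated t(0.025, 9) for n = 10. If SciPy is installed, `scipy.stats.t.ppf(1 - alpha/2, n - 1)` returns the same critical value.

```python
import math
import statistics


def prediction_interval(sample, t_crit):
    """Two-sided prediction interval for ONE future observation.

    Assumes the sample is i.i.d. from a normal distribution. t_crit is the
    critical value t(alpha/2, n-1), supplied by the caller (e.g. from a t-table).
    """
    n = len(sample)
    y_hat = statistics.mean(sample)   # point prediction: the sample mean
    s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
    margin = t_crit * s * math.sqrt(1 + 1 / n)
    return y_hat - margin, y_hat + margin


# Hypothetical sample of n = 10 measurements.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9]
lo, hi = prediction_interval(sample, t_crit=2.262)  # t(0.025, 9) from a t-table
print(f"95% prediction interval: [{lo:.3f}, {hi:.3f}]")
```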

Definition of Confidence Interval
A confidence interval in statistics is a range of values that is likely to contain an unknown population parameter with a specified level of confidence. It is constructed around a point estimate and provides a measure of uncertainty.

Mathematically, for a population parameter θ and a confidence level (1-α), where α is the significance level, the confidence interval can be expressed as:

[θ̂ - z(α/2) × SE(θ̂), θ̂ + z(α/2) × SE(θ̂)]

Where:

θ̂ (theta hat) is the point estimate of the parameter
z(α/2) is the critical value from the standard normal distribution
SE(θ̂) is the standard error of the estimate
Mathematical Formulation of Confidence Interval
The general form of a confidence interval (CI) is:

Point Estimate ± (Critical Value × Standard Error)

For Population Mean (Known Population Standard Deviation)

When the population standard deviation (σ) is known:

CI = x̄ ± (z × σ/√n)

Where:

x̄ is the sample mean
z is the critical value z(α/2) from the standard normal distribution (about 1.96 for 95% confidence)
σ is the known population standard deviation
n is the sample size
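The confidence level describes repeated sampling. If the procedure is applied to many independent samples, about a fraction 1-α of the resulting intervals contain the true mean. The sketch below checks this by simulation. It assumes a normal population with known μ = 50 and σ = 8 (values chosen only for illustration). `z_interval` is a hypothetical helper implementing CI = x̄ ± (z × σ/√n).

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """CI = x_bar ± z * sigma / sqrt(n), for a known population standard deviation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    x_bar = fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return x_bar - half_width, x_bar + half_width


# Illustrative assumption: population is Normal(mu=50, sigma=8) with sigma known.
mu, sigma, n, trials = 50.0, 8.0, 25, 20_000
rng = random.Random(42)

covered = 0
for _ in range(trials):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    lo, hi = z_interval(sample, sigma)
    covered += lo <= mu <= hi  # True counts as 1

print(f"{covered / trials:.3f} of the intervals contain the true mean")
```

Each individual interval either contains μ or does not. The 95% describes the long-run behaviour of the procedure, not any single interval.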
For Population Mean (Unknown Population Standard Deviation)

When the population standard deviation is unknown:

CI = x̄ ± (t × s/√n)

Where:

x̄ is the sample mean
t is the critical value t(α/2, n-1) from the t-distribution with n-1 degrees of freedom
s is the sample standard deviation
n is the sample size
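A minimal sketch of the unknown-σ case, using the same hypothetical sample of 10 measurements. `t_interval` is a hypothetical helper. As before, the critical value t(0.025, 9) ≈ 2.262 is taken from a t-table and passed in, because the standard library cannot compute it.

```python
import math
import statistics


def t_interval(sample, t_crit):
    """CI for the population mean when sigma is unknown: x_bar ± t * s / sqrt(n).

    t_crit is the critical value t(alpha/2, n-1), supplied by the caller.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of n = 10 measurements; t(0.025, 9) ≈ 2.262 from a t-table.
sample = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0, 9.9]
lo, hi = t_interval(sample, t_crit=2.262)
print(f"95% confidence interval for the mean: [{lo:.3f}, {hi:.3f}]")
```

For the same data and critical value, a prediction interval x̄ ± t × s × √(1 + 1/n) is √(n + 1) times wider than this confidence interval (about 3.3 times for n = 10). The extra width comes from covering the scatter of an individual new observation, not just the uncertainty in the mean.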

Ensemble learning is a method where we combine several models instead of relying on just one. Each of these models may not be very strong on its own, but when we put their predictions together, we often get a more accurate result. It's like asking a group of people for advice instead of just one person: each one might be a little wrong, but together they usually give a better answer.
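The "group of people" intuition can be made concrete for a binary classification task under a strong simplifying assumption. Suppose each model is correct with the same probability p and the models' errors are independent. The majority vote is then correct whenever more than half of the models are, which is a binomial tail probability. `majority_vote_accuracy` is a hypothetical helper written for this illustration.

```python
from math import comb


def majority_vote_accuracy(n_models, p):
    """Probability that a strict majority of n_models independent classifiers,
    each correct with probability p, is correct (binary task, n_models odd)."""
    need = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
               for k in range(need, n_models + 1))


for n in (1, 5, 11, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, one model is right 60% of the time. A majority of five is right 68.256% of the time, and the figure keeps rising as models are added. Real models trained on the same data make correlated errors, so the gain in practice is smaller than this idealized calculation suggests.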

There are three main types of ensemble methods:

Bagging (Bootstrap Aggregating):

Models are trained independently on different bootstrap samples of the training data (random samples drawn with replacement). Their results are then combined, usually by averaging (for regression) or voting (for classification). This mainly reduces variance and helps prevent overfitting.

Boosting:

Models are trained one after another. Each new model focuses on fixing the errors made by the previous ones. The final prediction is a weighted combination of all models, which helps reduce bias and improve accuracy.

Stacking (Stacked Generalization):   

Multiple different models (often of different types) are trained, and their predictions are used as inputs to a final model, called a meta-model. The meta-model learns how to best combine the predictions of the base models, aiming for better performance than any individual model.
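Below is a minimal, dependency-free sketch of stacking. The specific choices are assumptions for illustration only:

- two base models: a least-squares line and a 1-nearest-neighbour regressor
- a meta-model that is a least-squares weighted sum of their predictions
- synthetic data y = x²

All function names are hypothetical. The meta-model is trained on out-of-fold predictions, so it learns how much to trust each base model on data that model did not see during training.

```python
def fit_linear(xs, ys):
    """Base model 1: least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_1nn(xs, ys):
    """Base model 2: 1-nearest-neighbour regressor."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


BASE_LEARNERS = [fit_linear, fit_1nn]


def out_of_fold_features(xs, ys, k=5):
    """Level-1 training data: row i holds each base model's prediction for point i,
    made by a copy of that model trained on the other k-1 folds."""
    n = len(xs)
    feats = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in range(fold, n, k):  # points held out in this fold
                feats[i][j] = model(xs[i])
    return feats


def fit_meta(feats, ys):
    """Meta-model for exactly two base models: weights (w1, w2) minimising
    sum (y - w1*f1 - w2*f2)^2, solved from the 2x2 normal equations."""
    s11 = sum(f[0] * f[0] for f in feats)
    s12 = sum(f[0] * f[1] for f in feats)
    s22 = sum(f[1] * f[1] for f in feats)
    t1 = sum(f[0] * y for f, y in zip(feats, ys))
    t2 = sum(f[1] * y for f, y in zip(feats, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_stacking(xs, ys, k=5):
    weights = fit_meta(out_of_fold_features(xs, ys, k), ys)
    base_models = [fit(xs, ys) for fit in BASE_LEARNERS]  # refit on all data

    def predict(x):
        return sum(w * m(x) for w, m in zip(weights, base_models))

    return predict, weights


xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]
predict, (w_linear, w_1nn) = fit_stacking(xs, ys)
print(f"meta-model weights: linear={w_linear:.3f}, 1-NN={w_1nn:.3f}")
print(f"stacked prediction at x=2.0: {predict(2.0):.3f} (true value 4.0)")
```

On this curved data the straight line is a poor fit, so the meta-model puts almost all of its weight on the 1-NN model. On other data the balance would shift.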

1. Bagging Algorithm
Bagging can be used for both regression and classification tasks. Here is an overview of the bagging algorithm:

Bootstrap Sampling: Create N bootstrap samples from the original training data. Each sample is the same size as the original set and is built by drawing rows at random with replacement, so some rows appear several times and others not at all. This ensures that the base models are trained on diverse versions of the data.
Base Model Training: Train a base model independently on each bootstrap sample. Because the models do not depend on each other, they can be trained in parallel, which reduces training time. Standard bagging uses the same type of base learner (for example, a decision tree) for every model, and the diversity comes from the different bootstrap samples.
Prediction Aggregation: To make a prediction on test data, combine the predictions of all base models: majority voting (or weighted majority voting) for classification, and averaging for regression.
Out-of-Bag (OOB) Evaluation: Because each bootstrap sample is drawn with replacement, some training rows are left out of it (roughly 37% on average for large datasets). These “out-of-bag” rows can be used to estimate the model’s performance without a separate validation set or cross-validation.
Final Prediction: The aggregated prediction from all the base models is the ensemble's final prediction for each instance.
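The steps above can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation:

- the base learner is a one-split regression tree (a "stump") on 1-D inputs
- the data are a synthetic noisy step function
- `fit_stump`, `bagging_fit`, `bagging_predict` and `oob_mse` are hypothetical names

The sketch covers bootstrap sampling, independent training, aggregation by averaging, and out-of-bag evaluation.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Base learner: a 1-D regression stump (one split, mean prediction per side)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:
            continue  # the largest x would leave the right side empty
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x identical: fall back to predicting the mean
        mean = statistics.fmean(ys)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models, in_bag = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws WITH replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        in_bag.append(set(idx))
    return models, in_bag


def bagging_predict(models, x):
    """Aggregate by averaging, as bagging does for regression."""
    return statistics.fmean(m(x) for m in models)


def oob_mse(models, in_bag, xs, ys):
    """Out-of-bag error: score each point only with models that never saw it."""
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [m(x) for m, bag in zip(models, in_bag) if i not in bag]
        if preds:  # a point drawn into every bootstrap sample has no OOB prediction
            errors.append((statistics.fmean(preds) - y) ** 2)
    return statistics.fmean(errors)


# Synthetic 1-D data (assumed for illustration): a step at x = 2 plus noise.
rng = random.Random(1)
xs = [i / 10 for i in range(40)]
ys = [(1.0 if x >= 2.0 else 0.0) + rng.gauss(0, 0.1) for x in xs]

models, in_bag = bagging_fit(xs, ys)
print(f"prediction at x=0.5: {bagging_predict(models, 0.5):.2f}")
print(f"prediction at x=3.5: {bagging_predict(models, 3.5):.2f}")
print(f"OOB mean squared error: {oob_mse(models, in_bag, xs, ys):.3f}")
```

For real projects, scikit-learn's `BaggingRegressor` and `BaggingClassifier` implement the same idea, including OOB scoring via `oob_score=True`.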


2. Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners into a strong learner. Weak models are trained in sequence, with each new model trying to correct the errors of the models before it. Training stops when a set number of models has been trained or the training data is fit well enough. One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting). Here is an overview of the boosting algorithm:

Initialize Sample Weights: Assign equal weights to all training examples (1/n each for n examples).
Train Weak Learner: Train a weak learner on the weighted training data.
Sequential Learning: Boosting trains models sequentially, with each model focusing on correcting the errors of its predecessors. Boosting typically uses a single type of weak learner, such as a shallow decision tree.
Weight Adjustment: After each round, the weights of the training data points are updated. Misclassified examples receive higher weights in the next iteration so that subsequent models pay more attention to them.
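Below is a minimal sketch of discrete AdaBoost. It uses threshold "stumps" as weak learners and binary labels in {-1, +1} on 1-D toy data. Both the data and the function names are hypothetical, chosen for illustration. Beyond the steps listed above, the sketch computes each learner's vote weight alpha = 0.5 × ln((1 - err) / err). AdaBoost uses these weights to form the weighted combination of models.

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Weak learner: threshold classifier on 1-D inputs with labels in {-1, +1},
    chosen to minimise the WEIGHTED training error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            # Predict `polarity` when x > thr, otherwise `-polarity`.
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (polarity if xi > thr else -polarity) != yi)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    err, thr, polarity = best
    return err, (lambda x: polarity if x > thr else -polarity)


def adaboost_fit(xs, ys, n_rounds):
    n = len(xs)
    w = [1.0 / n] * n  # equal initial weights
    ensemble = []
    for _ in range(n_rounds):
        err, h = fit_weighted_stump(xs, ys, w)  # train on the weighted data
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # lower error -> larger say in the vote
        ensemble.append((alpha, h))
        # Misclassified points (y * h(x) = -1) are scaled up by e^alpha,
        # correctly classified ones scaled down by e^-alpha; then renormalise.
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Final prediction: sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1


# Toy data (assumed): the positive class sits in the middle of the range,
# so no single threshold can classify every point correctly.
xs = list(range(10))
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]

_, single = fit_weighted_stump(xs, ys, [0.1] * 10)
ensemble = adaboost_fit(xs, ys, n_rounds=3)
print("single stump accuracy:", sum(single(x) == y for x, y in zip(xs, ys)) / len(xs))
print("AdaBoost accuracy:", sum(adaboost_predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs))
```

On this toy data a single stump classifies 7 of the 10 points correctly. Three boosting rounds combine three stumps into an interval-shaped rule that classifies all 10 correctly.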