# Ordinary Least Squares

In OLS, we minimize the **sum of squared errors**, where the error (residual) for the i-th observation is:

$ y_{i} - x_{i} \beta $

where:

- $y_{i}$: the observed value of the dependent variable for the i-th observation.
- $x_{i}$: the vector of independent variables for the i-th observation.
- $\beta$: the vector of regression coefficients.

Summing the squared errors over all N observations gives the objective that OLS minimizes:

$ Q(\beta) = \sum_{i=1}^{N} (y_{i} - x_{i} \beta)^2, \qquad \hat{\beta} = \arg\min_{\beta} Q(\beta) $
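A minimal pure-Python sketch of this objective, assuming the simplest case of one predictor plus an intercept (so $x_{i}\beta = \beta_{0} + \beta_{1}x_{i}$), where the minimizer has the textbook closed form. The helper names `ols_fit` and `sse` and the toy data are hypothetical; real code would normally use a linear-algebra library instead.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Closed-form OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared errors of the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_fit(xs, ys)
print(f"intercept={b0:.2f}, slope={b1:.2f}")

# The objective is convex, so moving away from the OLS solution raises the SSE.
assert sse(xs, ys, b0, b1) < sse(xs, ys, b0 + 0.1, b1)
assert sse(xs, ys, b0, b1) < sse(xs, ys, b0, b1 - 0.1)
```

With no predictor at all (intercept only), the same minimization returns the sample mean of $y$, which is the sense in which the mean minimizes the sum of squared errors.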

# Least Absolute Deviation (LAD) Regression

In LAD regression, we minimize the **sum of absolute errors**, which penalizes positive and negative errors symmetrically:

$ Q(\beta) = \sum_{i=1}^{N} |y_{i} - x_{i} \beta|, \qquad \hat{\beta} = \arg\min_{\beta} Q(\beta) $

Least Absolute Deviation (LAD) regression aims to predict the median of the distribution of the dependent variable given the independent variables (the predictors).
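The link between LAD and the median is easiest to see in the intercept-only case, where every prediction is the same constant $c$. The sketch below (hypothetical helper `lad_objective`, toy data) searches over the observed values, which is sufficient here because the objective is piecewise linear with its kinks at the data points.

```python
from statistics import mean, median

def lad_objective(ys, c):
    """Sum of absolute errors when every observation is predicted as the constant c."""
    return sum(abs(y - c) for y in ys)

# Hypothetical toy data (odd length, so the median is unique)
ys = [3.0, 7.0, 1.0, 9.0, 4.0]
# The objective is piecewise linear with kinks at the data points, so the
# minimum is attained at one of the observed values; searching them suffices.
best_c = min(ys, key=lambda c: lad_objective(ys, c))
assert best_c == median(ys)

# Replace 9.0 with an extreme value: the mean jumps, the LAD fit does not move.
ys_outlier = [3.0, 7.0, 1.0, 900.0, 4.0]
best_c_outlier = min(ys_outlier, key=lambda c: lad_objective(ys_outlier, c))
assert best_c_outlier == median(ys_outlier)
print(mean(ys), mean(ys_outlier), best_c, best_c_outlier)
```

The second check hints at why LAD is considered robust: an extreme value moves the mean a lot but leaves the median, and hence the constant LAD fit, unchanged.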

# Quantile Regression

$ Q(\beta_{\tau}) = \sum_{i:\, y_{i} \ge x_{i}\beta_{\tau}} \tau \, |y_{i} - x_{i}\beta_{\tau}| + \sum_{i:\, y_{i} < x_{i}\beta_{\tau}} (1 - \tau) \, |y_{i} - x_{i}\beta_{\tau}|, \qquad \hat{\beta}_{\tau} = \arg\min_{\beta_{\tau}} Q(\beta_{\tau}) $

This equation represents the process of quantile regression, where the goal is to estimate the regression coefficients 𝛽𝜏 for a specific quantile 𝜏 (0 < 𝜏 < 1). 

The objective function is a sum of weighted absolute residuals, and 𝛽𝜏 is chosen to minimize it.

- $Q(\beta_{\tau})$: the objective function to be minimized.
- $\tau$: the target quantile, which lies between 0 and 1 (e.g., $\tau = 0.5$ represents the median).
- $N$: the number of observations in the dataset.
- $y_{i}$: the observed value of the dependent variable for the i-th observation.
- $x_{i}$: the vector of independent variables for the i-th observation.
- $\beta_{\tau}$: the vector of regression coefficients corresponding to the $\tau$-th quantile.
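The objective can be transcribed almost line for line. In this sketch, each row of `X` is one $x_{i}$ (with a leading 1 for the intercept), and the names `predict` and `quantile_objective` are hypothetical. It only evaluates $Q(\beta_{\tau})$ for a given coefficient vector; it does not perform the minimization, which in practice is typically solved with linear programming.

```python
def predict(x_row, beta):
    """Linear prediction x_i * beta as a dot product."""
    return sum(xj * bj for xj, bj in zip(x_row, beta))

def quantile_objective(X, y, beta, tau):
    """Q(beta_tau) for design rows X, responses y, coefficients beta, quantile tau."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for x_row, y_i in zip(X, y):
        r = y_i - predict(x_row, beta)
        if r >= 0:   # first sum: observed >= predicted, weight tau
            total += tau * abs(r)
        else:        # second sum: observed < predicted, weight (1 - tau)
            total += (1 - tau) * abs(r)
    return total

# Hypothetical data: first column of ones is the intercept
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 5.0, 4.0]
beta = [1.0, 1.0]  # predictions 2, 3, 4, so residuals are -1, +2, 0

print(quantile_objective(X, y, beta, 0.8))  # approx. 0.2*1 + 0.8*2 = 1.8
print(quantile_objective(X, y, beta, 0.2))  # approx. 0.8*1 + 0.2*2 = 1.2
```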

The objective function consists of two parts:

- The first sum, taken over all observations where $y_{i} \ge x_{i}\beta_{\tau}$, covers the cases where the observed value ($y_{i}$) is greater than or equal to the predicted value ($x_{i}\beta_{\tau}$). Each absolute residual in this sum is multiplied by the weight $\tau$.
- The second sum, taken over all observations where $y_{i} < x_{i}\beta_{\tau}$, covers the cases where the observed value ($y_{i}$) is less than the predicted value ($x_{i}\beta_{\tau}$). Each absolute residual in this sum is multiplied by the weight $(1 - \tau)$.

The model is fit by minimizing the total of these two weighted sums. The balance between the weights on positive and negative residuals is what makes the fit capture the desired quantile $\tau$.

For example, when $\tau = 0.5$ (the median), positive and negative residuals receive equal weight, so the objective is exactly half of the LAD objective and quantile regression reduces to LAD regression: it estimates the coefficients that best predict the median of the dependent variable given the independent variables. When $\tau \ne 0.5$, the model estimates the coefficients that best predict the $\tau$-th quantile of the dependent variable.
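A small check of these claims, under the simplifying assumption of an intercept-only model with constant prediction $c$: minimizing the objective over $c$ should return a $\tau$-th sample quantile, and at $\tau = 0.5$ the objective should equal half the LAD objective. The helpers `check_loss`, `lad_loss`, and `fitted_quantile` are illustrative names, not a library API.

```python
def check_loss(ys, c, tau):
    """Quantile objective when every observation is predicted as the constant c."""
    return sum(tau * (y - c) if y >= c else (1 - tau) * (c - y) for y in ys)

def lad_loss(ys, c):
    """LAD objective for the same constant prediction."""
    return sum(abs(y - c) for y in ys)

def fitted_quantile(ys, tau):
    """Constant c minimizing check_loss; an optimum always lies at an observed value."""
    return min(ys, key=lambda c: check_loss(ys, c, tau))

ys = [float(v) for v in range(1, 10)]  # hypothetical data: 1.0, 2.0, ..., 9.0

for tau in (0.2, 0.5, 0.8):
    print(f"tau={tau}: fitted constant = {fitted_quantile(ys, tau)}")

# At tau = 0.5 both residual weights are 0.5, so the objective is half of LAD.
assert all(abs(check_loss(ys, c, 0.5) - 0.5 * lad_loss(ys, c)) < 1e-12 for c in ys)
```

On these nine values the fits are 2.0, 5.0 and 8.0 for 𝜏 = 0.2, 0.5 and 0.8. With so few observations, the 0.8 fit has 7 of the 9 points strictly below it, so a sample quantile only approximates the exact 80/20 split.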

# Two Weights on the Absolute Residuals

### Example: a quantile regression model fit at the 80th percentile, 𝜏 = 0.8

When the error is positive, i.e. Actual > Prediction (an under-prediction):

- We penalize these cases more, with a weight of 0.8, because predictions that are too low are NOT favourable for an 80th-percentile model.

When the error is negative, i.e. Actual < Prediction (an over-prediction):

- We penalize these cases less, with a weight of 0.2, because they are favourable.
- However, they are still penalized, so that the fit balances at the 80th percentile: roughly 80% of observations fall below the prediction and 20% above it.
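The balance can be made precise for the simplest case, a constant prediction $c$ with no predictors (a simplifying assumption). Substituting $x_{i}\beta_{\tau} = c$ into the quantile objective, and letting $N_{+}$ and $N_{-}$ count the observations strictly above and strictly below $c$, the slope of the objective for $c$ not equal to any observation is:

$$ Q(c) = \sum_{i:\, y_{i} \ge c} \tau \, (y_{i} - c) + \sum_{i:\, y_{i} < c} (1 - \tau) \, (c - y_{i}) $$

$$ \frac{\partial Q}{\partial c} = -\tau N_{+} + (1 - \tau) N_{-} $$

Setting the slope to zero gives the balance condition:

$$ -\tau N_{+} + (1 - \tau) N_{-} = 0 \quad\Longrightarrow\quad \frac{N_{-}}{N_{+} + N_{-}} = \tau $$

With 𝜏 = 0.8, raising $c$ costs 0.2 for each point below it and saves 0.8 for each point above it, so the prediction keeps rising until about 80% of the observations lie below it. Because the counts are integers, exact equality may not be reachable; the minimizer is where the slope changes sign.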
    
### Example: a quantile regression model fit at the 20th percentile, 𝜏 = 0.2

When the error is positive, i.e. Actual > Prediction (an under-prediction):

- We penalize these cases less, with a weight of 0.2, because predictions that are too low are favourable for a 20th-percentile model.

When the error is negative, i.e. Actual < Prediction (an over-prediction):

- We penalize these cases more, with a weight of 0.8, because predictions that are too high are NOT favourable.
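To put numbers on the two examples, the sketch below evaluates the per-observation penalty for a miss of 10 units in each direction. The function name `penalty` is hypothetical; it is just one term of the quantile objective, with residual = actual minus prediction.

```python
def penalty(residual, tau):
    """tau * residual for under-predictions, (1 - tau) * |residual| for over-predictions."""
    if residual >= 0:
        return tau * residual
    return (1 - tau) * (-residual)

# Compare the same size of miss (10 units) in each direction for both example models.
costs = {}
for tau in (0.8, 0.2):
    costs[tau] = {
        "under": penalty(10.0, tau),   # actual is 10 above the prediction
        "over": penalty(-10.0, tau),   # actual is 10 below the prediction
    }
    print(f"tau={tau}: under-prediction costs {costs[tau]['under']:.1f}, "
          f"over-prediction costs {costs[tau]['over']:.1f}")
```

At 𝜏 = 0.8 an under-prediction of 10 costs four times as much as an over-prediction of the same size; at 𝜏 = 0.2 the ratio flips.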

![quantiles_20_50_80.png](attachment:28d66cfe-b0f0-40fb-b1fa-8d8ffedf0ec5.png)

- Robust Regression by Least Absolute Deviations Method: http://article.sapub.org/10.5923.j.statistics.20150503.02.html#Sec3
- Median minimizes the absolute deviations: https://tommasorigon.github.io/StatI/approfondimenti/Schwertman1990.pdf
- Mean minimizes the sum of squared errors: http://faculty.washington.edu/swithers/seestats/SeeingStatisticsFiles/seeing/center/meanproof/meanProof.html