# Linear Regression

**Agenda**<ul>
  <li>Regression</li>
  <li>Use Case: Car Resale Price Prediction using Linear Regression</li>
  <li>Data Exploration</li>
  <li>Linear Regression Introduction</li>
  <li>Hypothesis in Linear Regression</li>
<ul>
  <li>Cost Function</li>
  <li>Feedback</li>
  <li>Gradient Descent</li>
</ul>
  <li>Use Case Simulation</li>
  <li>Metrics for Evaluating Regression Models</li>
  <li>Stochastic Gradient Descent</li>
  <li>Mini Batch Gradient Descent</li>
  <li>Linear Regression using Scikit-Learn</li>
</ul>


# In Class Notes
- Linear Regression:
    - Linear regression models the relationship between a dependent variable $y$ and one or more independent variables $x_1, x_2, \dots, x_k$.
    - The **simple linear regression** formula is:
        - $ \hat{y} = \beta_0 + \beta_1 x $
    - For multiple linear regression:
        - $ \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k $
    - Here, 
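        - $ \beta_0 $: the intercept, i.e. the predicted value of $y$ when every $x_j$ is 0.
        - $ \beta_j $: the coefficients. Each one is the change in $\hat{y}$ for a one-unit change in $x_j$, holding the other variables constant.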
    - Model fitting is typically done by Ordinary Least Squares (OLS), which minimizes the sum of squared residuals (differences between actual and predicted values). OLS has a closed-form solution, so it needs no iteration. Gradient descent is the iterative alternative: it repeats parameter updates until the error stops decreasing meaningfully.
- Ordinary Least Squares: the most common technique used to estimate the parameters (coefficients) of a linear regression model.
- In regression problems the target variable is continuous in nature.
- Dependent variable: The outcome or result we are trying to predict. This is usually continuous in nature for regression problems.
- Independent variable: A variable that influences the dependent variable. A model can have one or several.
- Essentially, the dependent variable 'depends' on the independent variable.
- Assumptions of linear regression
    - Linearity: There must be a linear relationship between the dependent variable and each independent variable.
        - ![image.png](attachment:image.png)
        - Check this visually with a scatterplot of each $x$ against $y$.
        - A correlation heatmap shows the strength of linear association between each pair of variables, which also reveals collinearity between features.
    - Homoscedasticity (Equal Variance): 
        - ![image-2.png](attachment:image-2.png)
        - The variance of the residuals must be constant across the whole range of predictions.
        - The variance of the residuals must not depend on the values of the independent variables.
        - In practical terms, this means that the spread of the residuals should remain approximately the same as the predicted values increase.
        - The violation of this assumption results in heteroscedasticity.
        - If the scatter plot displays a funnel-shaped pattern or if the spread of residuals widens or narrows systematically with the predicted values, it suggests heteroscedasticity, which violates the homoscedasticity assumption.
        - Check whether the dispersion of residuals remains relatively constant across the range of predicted values. In other words, the variability of residuals should not systematically change as the predicted values increase or decrease.
    - Multivariate Normality (Normality of error distributions)
        - ![image-3.png](attachment:image-3.png)
        - The errors (residuals) of the model are normally distributed.
        - This assumption is not strictly necessary for large sample sizes: by the Central Limit Theorem, the sampling distribution of the coefficient estimates approaches normality as the sample size grows. It matters more for small samples, where the validity of statistical tests and confidence intervals depends on it.
        - If the residuals are clearly non-normal, a transformation of the variables (for example, a log transform of the target) can help.
    - Independence of Errors (No Autocorrelation): The residuals (actual - predicted) must not be correlated with each other.
        - ![image-4.png](attachment:image-4.png)
        - Autocorrelation, also known as serial correlation, means that the error terms in the regression model are not independent: the value of an error term is related to the values of previous error terms.
        - A violation indicates that the residuals follow some pattern or trend, which is common in time-ordered data.
        - The residuals should show no trends or patterns.
        - Consequences of Autocorrelation:
            - When autocorrelation is present, the standard errors of the regression coefficients can be underestimated, leading to inflated t-statistics and potentially incorrect conclusions about the significance of the variables.
            - The OLS coefficient estimates themselves generally remain unbiased, but they are no longer the most efficient, and statistical inference based on them becomes unreliable.
    - Lack of Multicollinearity
        - ![image-5.png](attachment:image-5.png)
        - In multiple linear regression (where there is more than one independent variable), there should be no perfect linear relationship between the independent variables.
        - Perfect multicollinearity occurs when one independent variable can be exactly predicted from another independent variable or a combination of other independent variables.
        - This situation makes it impossible to estimate the unique effect of each independent variable on the dependent variable.
        - If we build a model in this situation, the coefficients we obtain are unreliable.
        - In practice, predictors should not be highly correlated with each other.
    - The Outlier Check (Optional)
        - ![image-6.png](attachment:image-6.png)
        - 
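
Several of the assumptions above can be checked numerically from the residuals of a fitted model. The sketch below is a minimal illustration using only NumPy on synthetic data. The helper names (`ols_residuals`, `durbin_watson`, `variance_inflation_factors`, `spread_ratio`) and the data are assumptions made for this example, not part of the class material. The Durbin-Watson statistic and the variance inflation factor (VIF) are standard diagnostics for autocorrelation and multicollinearity; `spread_ratio` is only a crude heuristic for homoscedasticity, not a formal test.

```python
import numpy as np

def ols_residuals(X, y):
    """Fit OLS with an intercept and return (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return fitted, y - fitted

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns. Breaks down (division by zero)
    under perfect multicollinearity."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, resid = ols_residuals(others, X[:, j])
        ss_res = np.sum(resid ** 2)
        ss_tot = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

def spread_ratio(fitted, residuals):
    """Crude heuristic: residual variance for the upper half of fitted
    values divided by residual variance for the lower half."""
    order = np.argsort(fitted)
    half = len(order) // 2
    low = residuals[order[:half]]
    high = residuals[order[half:]]
    return np.var(high) / np.var(low)

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(42)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1: multicollinearity
X = np.column_stack([x1, x2, x3])
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n)

fitted, residuals = ols_residuals(X, y)
dw = durbin_watson(residuals)
vifs = variance_inflation_factors(X)
ratio = spread_ratio(fitted, residuals)

print("Durbin-Watson:", round(dw, 2))
print("VIF per feature:", np.round(vifs, 1))
print("Spread ratio:", round(ratio, 2))
```

As rules of thumb, a Durbin-Watson value near 2 suggests no first-order autocorrelation, a VIF well above roughly 5 to 10 is a common warning sign for multicollinearity, and a spread ratio far from 1 hints at heteroscedasticity.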

- Linear regression is a simple, high-bias model and is often treated as a weak learner.
- Types of Linear Regression
    - Simple Linear Regression
    - Multiple Linear Regression
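
The simple and multiple regression formulas above can be fit by OLS in a few lines. The following is a minimal sketch, assuming NumPy is available and using synthetic data generated for illustration; `fit_ols` and `predict` are hypothetical helper names, not functions from the class material. It relies on `np.linalg.lstsq`, which returns the parameters that minimize the sum of squared residuals.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: shape (n_samples,) or (n_samples, n_features); y: shape (n_samples,).
    Returns (intercept, coefficients).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # simple linear regression: one feature
    # Prepend a column of ones so the first fitted parameter is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals ||A @ beta - y||^2.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]

def predict(X, intercept, coefs):
    """Apply the fitted linear model to new inputs."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return intercept + X @ coefs

# Synthetic data (made up for illustration): y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

intercept, coefs = fit_ols(X, y)
print("intercept:", round(intercept, 2))
print("coefficients:", np.round(coefs, 2))
```

The recovered intercept and coefficients should land close to the values used to generate the data. With scikit-learn (listed in the agenda), `LinearRegression().fit(X, y)` exposes the same quantities as `intercept_` and `coef_`.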

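- Error: The difference between the actual value and the predicted value for one observation, also known as the residual. The cost function aggregates these errors over all observations.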
    - $Error = Actual\ Value - Predicted\ Value$
- Best Fit Line: The line that yields the smallest sum of squared residuals.
- Ordinary Least Squares:
- Cost Function
- Evaluation Metrics
    - MSE
        - $$MSE = \frac{1}{n} \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$
        - Captures the average squared error and penalizes larger deviations more heavily.
        - Used both as a loss function in OLS and as a performance metric.
        - Note: Because errors are squared, outliers or extreme values can inflate this metric.

    - RMSE
        - $$RMSE = \sqrt{MSE}$$
        - Returns error in original units, making interpretation easier.
        - Still strongly penalizes outliers.
    - MAE
        - $$ MAE = \frac{1}{n} \sum_{i=1}^{n}|y_i - \hat{y}_i|$$
        - Averages absolute errors, so it is less sensitive to outliers.
        - Easier to interpret: the penalty is linear and the result is in the same unit as the target.
    - MAPE
        - $$ MAPE = \frac{100}{n} \sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
        - Expresses error as a percentage, which is scale-independent and intuitive.
        - Con: Undefined when an actual value is zero and distorted when actual values are near zero, because small denominators inflate the error.
    - $R^2$ (R-squared, or coefficient of determination)
        - $$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
        - This metric represents the proportion of the variance of the dependent variable explained by the independent variables of the model. A value closer to 1 means the model accounts for more of the variation in the target.
- Bias
- Variance
- Bias-Variance Tradeoff
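
The evaluation metrics listed earlier (MSE, RMSE, MAE, MAPE and $R^2$) can be computed directly from their formulas. This is a minimal sketch, assuming NumPy; `regression_metrics` is a hypothetical helper name and the four actual/predicted values are made up so that one prediction is much worse than the others.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, MAPE (in percent) and R^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred  # Error = Actual Value - Predicted Value

    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(errors))
    # MAPE is undefined if any actual value is exactly zero.
    mape = 100.0 * np.mean(np.abs(errors / y_true))

    ss_res = np.sum(errors ** 2)                     # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

# Made-up values: three errors of size 10 and one of size 40.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 360.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On these numbers the MSE is 475 while the MAE is 17.5: the single error of 40 contributes 1600 of the 1900 total squared error, which illustrates why squared-error metrics are more sensitive to outliers than MAE.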

