# Linear Least Squares Method

- **Author** : Mustafa Sadeghi
- **Contact** : mustafasadeghi@mail.um.ac.ir
  
<img src="img.jpg" alt="My Image" width="100" height="100" style="border-radius: 15px;"/>


## Overview of the Project

This code provides an interactive visualization of the **Least Squares Method**, a widely used technique in linear regression. The goal of the Least Squares Method is to find the best-fitting line through a set of data points by minimizing the **sum of squared residuals (errors)**, where each residual is the vertical distance between an actual data point and the corresponding predicted point on the regression line.

In this code, the user can manually adjust the **slope** ($\beta_1$) and **intercept** ($\beta_0$) of the regression line using interactive sliders to explore how different values impact the fit and the **sum of squared distances (SSD)**. Although the traditional Least Squares method automatically finds the optimal slope and intercept by minimizing the SSD, this code enables users to visualize and understand the process through manual adjustments.

### Mathematical Insights:

1. **Linear Regression and Least Squares**:
   - The regression line is represented by the equation:
  
     
     $$ y_i = \beta_1 x_i + \beta_0 + \epsilon_i $$
     where:
  
     
     - $\beta_1$ is the **slope** of the line.
     - $\beta_0$ is the **intercept** (the value of $y$ when $x = 0$).
     - $\epsilon_i$ is the error term (or residual) for the $i$-th data point, which represents the difference between the actual value $y_i$ and the predicted value $\hat{y}_i$.

   - The objective of the **Least Squares Method** is to minimize the **sum of squared residuals (SSR)**, which is defined as:
     $$ SSR = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$
     where:
  
     
     - $y_i$ is the actual $y$-value for the $i$-th data point.
       
     - $\hat{y}_i = \beta_1 x_i + \beta_0$ is the predicted $y$-value for the same $x$.

   - By minimizing the SSR, the Least Squares Method makes the total squared error between the actual data points and the fitted line as small as possible. The optimal slope $\beta_1$ and intercept $\beta_0$ are obtained by setting the partial derivatives of the SSR with respect to $\beta_0$ and $\beta_1$ to zero and solving the resulting pair of linear equations.

2. **Sum of Squared Distances (SSD)**:
   - In the code, the **Sum of Squared Distances (SSD)** is computed in real-time as the user adjusts the slope ($\beta_1$) and intercept ($\beta_0$). This SSD value represents the total squared distance between the data points and the line for a given set of slope and intercept values.
   - When vertical distances are selected, the SSD is identical to the SSR of standard linear regression:
     
     $$ SSD = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \text{Residual}_i^2 $$
   - A lower SSD indicates a better fit between the data points and the regression line.

3. **Residual Visualization**:
   - **Residuals** represent the difference between the actual $y$-value of a data point and the $y$-value predicted by the regression line. In this code, the user can visualize three types of residuals:
   
     - **Vertical Residuals**: The vertical difference between the actual point and the predicted point on the line. This is the default approach used in the Least Squares Method.
    
       
       $$ \text{Vertical Residual} = y_i - \hat{y}_i = y_i - (\beta_1 x_i + \beta_0) $$
     
     - **Horizontal Residuals**: The horizontal difference between the actual $x$-coordinate and the projected $x$-coordinate of the point onto the line.
    
       
       $$ \text{Horizontal Residual} = x_i - \hat{x}_i $$
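       where $\hat{x}_i = (y_i - \beta_0) / \beta_1$ is found by solving $y_i = \beta_1 x + \beta_0$ for $x$. This requires $\beta_1 \neq 0$, since a horizontal line has no finite horizontal projection.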
     
     - **Perpendicular Residuals**: The shortest distance from a point to the regression line. Ordinary least squares does not minimize this distance; minimizing it leads to orthogonal (total least squares) regression instead. The perpendicular distance is computed as:
       $$ \text{Perpendicular Distance} = \frac{|y_i - \beta_1 x_i - \beta_0|}{\sqrt{\beta_1^2 + 1}} $$
       This formula calculates the shortest distance from a point to a line in 2D space.

4. **Finding the Best-Fitting Line**:
   - In classical **Least Squares**, the best-fitting line is the one that minimizes the vertical distances (residuals) between the data points and the line. The slope and intercept of the optimal line are derived by solving a system of equations known as the **normal equations**:
  
     
     $$ \beta_1 = \frac{n(\sum x_i y_i) - (\sum x_i)(\sum y_i)}{n(\sum x_i^2) - (\sum x_i)^2} $$
  
     
     $$ \beta_0 = \frac{(\sum y_i) - \beta_1(\sum x_i)}{n} $$
  
     
     where $n$ is the number of data points, and $\sum x_i$ and $\sum y_i$ are the sums of the $x$- and $y$-coordinates, respectively.
   
   - In this code, however, users can manually adjust $\beta_1$ (slope) and $\beta_0$ (intercept) to see how changing these values impacts the SSD and the fit of the line to the data.
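
The normal equations above can be evaluated directly. The following is a minimal sketch, not part of the original notebook: the helper name `fit_normal_equations` and the demo data are assumptions made for illustration, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def fit_normal_equations(x, y):
    """Closed-form least-squares slope and intercept (hypothetical helper)."""
    n = x.size
    sum_x, sum_y = x.sum(), y.sum()
    sum_xy, sum_xx = (x * y).sum(), (x * x).sum()
    # Slope and intercept exactly as written in the normal equations
    beta_1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    beta_0 = (sum_y - beta_1 * sum_x) / n
    return beta_1, beta_0

# Illustrative data (an assumption for this sketch): a noisy line with slope 0.5
rng = np.random.default_rng(0)
x_demo = np.linspace(-5, 5, 50)
y_demo = 0.5 * x_demo + rng.normal(size=x_demo.size)

beta_1, beta_0 = fit_normal_equations(x_demo, y_demo)
ref_slope, ref_intercept = np.polyfit(x_demo, y_demo, deg=1)

print(f"normal equations: slope={beta_1:.4f}, intercept={beta_0:.4f}")
print(f"np.polyfit:       slope={ref_slope:.4f}, intercept={ref_intercept:.4f}")
```

Both approaches should agree up to floating-point error, which gives a target to aim for when adjusting the sliders by hand.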

### Key Features:

1. **Interactive Plot**: 
   - The plot displays the data points along with a regression line. The slope ($\beta_1$) and intercept ($\beta_0$) of the line can be controlled via sliders, and the plot updates in real-time to show the new fitted line.
   
2. **Residual Visualization**: 
   - The distances between each data point and the regression line are visualized as lines on the plot. The type of distance can be toggled between:
     - **Vertical** (used in standard least squares)
     - **Horizontal**
     - **Perpendicular**

3. **Sum of Squared Distances (SSD)**:
   - The SSD, displayed on the plot, updates dynamically as the slope ($\beta_1$) and intercept ($\beta_0$) are adjusted. This helps the user see how the goal of Least Squares is to minimize this value by choosing the best-fitting line.
   
4. **Mathematical Insight**: 
   - The core concept of the Least Squares Method — minimizing the sum of squared vertical distances — is shown visually. Users can see how different lines affect the SSD and learn why the optimal line is the one that minimizes this sum.
   
While the code includes options for **horizontal** and **perpendicular** distances, the **vertical distance** is directly related to the classic Least Squares approach, where we seek to minimize the vertical errors between the actual and predicted values.
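
To make the three distance types concrete, here is a small vectorized sketch of the SSD for each option. It is an illustration under assumptions, not the notebook's implementation: the function name `ssd` and the three sample points are hypothetical, and returning infinity for a flat line in the horizontal case is a choice made for this sketch.

```python
import numpy as np

def ssd(x, y, slope, intercept, distance_type="vertical"):
    """Sum of squared distances for one residual type (hypothetical helper)."""
    vertical = y - (slope * x + intercept)
    if distance_type == "vertical":
        d = vertical
    elif distance_type == "horizontal":
        if slope == 0:
            # A flat line has no finite horizontal projection (sketch's convention)
            return float("inf")
        # x_i - (y_i - intercept) / slope simplifies to -vertical / slope
        d = -vertical / slope
    elif distance_type == "perpendicular":
        d = vertical / np.sqrt(slope**2 + 1)
    else:
        raise ValueError(f"unknown distance type: {distance_type}")
    return float(np.sum(d**2))

# Three illustrative points and the line y = 2x
x_pts = np.array([0.0, 1.0, 2.0])
y_pts = np.array([1.0, 2.0, 4.0])
for kind in ("vertical", "horizontal", "perpendicular"):
    print(kind, ssd(x_pts, y_pts, slope=2.0, intercept=0.0, distance_type=kind))
```

For the same line, the three SSD values generally differ, which is why the line that minimizes one of them need not minimize the others.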


## Perpendicular Residuals Calculation Using PCA

In the **Perpendicular Least Squares** method, we use **Principal Component Analysis (PCA)** to find the best-fitting line that minimizes the perpendicular distances from each point to the regression line. Here’s a step-by-step breakdown of the mathematical process:

1. **Centering the Data**:
   First, we center the data by subtracting the mean of the $x$ and $y$ values from each data point. This ensures the data is centered around the origin:
   
   $$
   x_{\text{cent},i} = x_i - \bar{x}, \quad y_{\text{cent},i} = y_i - \bar{y}
   $$
   where $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$, respectively.

2. **Covariance Matrix**:
   Next, we calculate the **covariance matrix** of the centered $x$ and $y$ values. The covariance matrix provides insight into how much the two variables vary together:

   $$
   \text{Cov} = 
   \begin{bmatrix}
   \text{Var}(x_{\text{cent}}) & \text{Cov}(x_{\text{cent}}, y_{\text{cent}}) \\
   \text{Cov}(x_{\text{cent}}, y_{\text{cent}}) & \text{Var}(y_{\text{cent}})
   \end{bmatrix}
   $$

   where:
   - $\text{Var}(x_{\text{cent}})$ is the variance of the centered $x$ values.
   - $\text{Var}(y_{\text{cent}})$ is the variance of the centered $y$ values.
   - $\text{Cov}(x_{\text{cent}}, y_{\text{cent}})$ is the covariance between the centered $x$ and $y$ values.

3. **Eigenvalue and Eigenvector Decomposition**:
   We then perform **eigenvalue decomposition** on the covariance matrix to find the **eigenvalues** and **eigenvectors**. These represent the directions of the principal components and the variance along these directions:

   $$
   \text{Cov} \cdot v = \lambda v
   $$

   - $\lambda$ are the eigenvalues (each one is the variance of the data along its eigenvector).
   - $v$ are the eigenvectors (the principal directions; the one with the largest eigenvalue is the direction of maximum variance).

4. **Choosing the Principal Component (Line of Best Fit)**:
   The eigenvector corresponding to the largest eigenvalue ($\lambda_1$) gives the direction of the principal component, which represents the line that minimizes the perpendicular residuals:

   $$
   v_1 = \begin{bmatrix} v_{x_1} \\ v_{y_1} \end{bmatrix}
   $$

   The slope of the best-fitting line, $\beta_1$, is calculated from the ratio of the components of this eigenvector (assuming $v_{x_1} \neq 0$, i.e. the best-fitting line is not vertical):

   $$
   \beta_1 = \frac{v_{y_1}}{v_{x_1}}
   $$

5. **Calculating the Intercept**:
   Once we have the slope $\beta_1$, the next step is to calculate the intercept $\beta_0$. This is done using the means of the original (uncentered) $x$ and $y$ values:

   $$
   \beta_0 = \bar{y} - \beta_1 \cdot \bar{x}
   $$

   This gives the equation of the line of best fit: $y = \beta_1 x + \beta_0$.

6. **Perpendicular Distance**:
   Finally, we compute the **perpendicular distance** from each data point $(x_i, y_i)$ to the line $y = \beta_1 x + \beta_0$. This distance is given by the following formula:

   $$
   \text{Perpendicular Distance} = \frac{|y_i - \beta_1 x_i - \beta_0|}{\sqrt{\beta_1^2 + 1}}
   $$

   This formula gives the shortest distance between a point and a line in 2D space, representing the **Perpendicular Residual** for each point.
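
The six steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than code from the notebook: the function names and demo data are hypothetical, `np.linalg.eigh` is used because the covariance matrix is symmetric, and the sketch raises an error for a vertical line instead of handling it.

```python
import numpy as np

def fit_perpendicular_pca(x, y):
    """Slope and intercept of the line minimizing perpendicular residuals (sketch)."""
    # Step 1: center the data
    x_cent = x - x.mean()
    y_cent = y - y.mean()
    # Step 2: 2x2 covariance matrix of the centered coordinates
    cov = np.cov(x_cent, y_cent)
    # Step 3: eigendecomposition; eigh returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: eigenvector of the largest eigenvalue (last column)
    v1 = eigvecs[:, -1]
    if np.isclose(v1[0], 0.0):
        raise ValueError("best-fitting line is vertical; slope is undefined")
    beta_1 = v1[1] / v1[0]
    # Step 5: the line passes through the mean of the original data
    beta_0 = y.mean() - beta_1 * x.mean()
    return beta_1, beta_0

def perpendicular_distances(x, y, beta_1, beta_0):
    """Step 6: shortest distance from each point to the line."""
    return np.abs(y - beta_1 * x - beta_0) / np.sqrt(beta_1**2 + 1)

# Illustrative data (an assumption for this sketch)
rng = np.random.default_rng(0)
x_demo = np.linspace(-5, 5, 50)
y_demo = 0.5 * x_demo + rng.normal(size=x_demo.size)

pca_slope, pca_intercept = fit_perpendicular_pca(x_demo, y_demo)
ols_slope, ols_intercept = np.polyfit(x_demo, y_demo, deg=1)

pca_ssd = np.sum(perpendicular_distances(x_demo, y_demo, pca_slope, pca_intercept) ** 2)
ols_ssd = np.sum(perpendicular_distances(x_demo, y_demo, ols_slope, ols_intercept) ** 2)
print(f"PCA line: y = {pca_slope:.3f}x + {pca_intercept:.3f}, perpendicular SSD = {pca_ssd:.3f}")
print(f"OLS line: y = {ols_slope:.3f}x + {ols_intercept:.3f}, perpendicular SSD = {ols_ssd:.3f}")
```

The sign of the eigenvector returned by `eigh` is arbitrary, but the slope is a ratio of its components, so it is unaffected.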

### Summary:
- **Center the Data**: Subtract the mean of $x$ and $y$ to center the data around the origin.
- **Covariance Matrix**: Calculate the covariance matrix to understand the spread of the data.
- **Eigenvalue Decomposition**: Decompose the covariance matrix to find the eigenvectors and eigenvalues, which represent the directions of maximum variance.
- **Principal Component**: Use the eigenvector with the largest eigenvalue to calculate the slope and intercept of the line that minimizes perpendicular residuals.
- **Perpendicular Distance**: Compute the shortest distance from each data point to the line, representing the residual for that point.

### Reference for this part: [PCA in the Machine Learning course taught by Dr. Hadi Sadoghi Yazdi](https://laboratorypatternrecognition.github.io/MachineLearningS/ML/FeatureReduction/PCA.html)

## Cost Function

In this project, we use the **Sum of Squared Distances (SSD)** as the cost function that measures the error between the actual data points and the regression line. The SSD is the **Mean Squared Error (MSE)** without the division by $n$, so both are minimized by the same line. The goal is to minimize the SSD by adjusting the slope ($\beta_1$) and intercept ($\beta_0$) of the regression line.

### Cost Function Used: Sum of Squared Distances (SSD)

The SSD is defined as:

$$ SSD = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

Where:
- $y_i$ are the actual values.
- $\hat{y}_i$ are the predicted values based on the regression line.

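The formula above is the vertical case. When the **horizontal** or **perpendicular** option is selected, the code sums the squared horizontal or perpendicular distances instead, so the displayed SSD always matches the chosen residual type.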

### Other Possible Cost Functions

While the **SSD** (or MSE) is widely used in regression problems, there are other cost functions that could be used depending on the nature of the data and the project requirements. Here are two alternatives, followed by MSE itself for comparison:

1. **Mean Absolute Error (MAE)**:
   The MAE calculates the mean of the absolute differences between predicted and actual values. This cost function is less sensitive to outliers than MSE, because each error contributes in proportion to its magnitude rather than its square.

   $$ MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| $$

   - **Advantages**: MAE is robust to outliers since it does not square the errors, so large errors have less impact on the overall cost.
   - **Disadvantages**: It might be less sensitive to small errors and harder to optimize due to its absolute value operation, which is not differentiable at 0.
<br>

2. **Huber Loss**:
   The Huber loss function is a hybrid of MSE and MAE. For small errors, it behaves like MSE (quadratic), while for larger errors, it behaves like MAE (linear). This makes it robust to outliers while still maintaining some sensitivity to smaller errors.

   $$ L_{\delta}(y_i, \hat{y}_i) = \begin{cases} 
      \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\
      \delta \cdot (|y_i - \hat{y}_i| - \frac{1}{2} \delta) & \text{otherwise}
   \end{cases} $$

   - **Advantages**: It balances between penalizing large errors and remaining sensitive to smaller ones. It’s a good compromise between MSE and MAE.
   - **Disadvantages**: Choosing the correct $\delta$ value can be tricky. If $\delta$ is too small, the loss behaves too much like MAE; if it's too large, it behaves like MSE.
<br>

3. **Mean Squared Error (MSE)**:
   The MSE is the mean of the squared differences between the predicted and actual values. It penalizes larger errors more heavily due to squaring the residuals, which makes it sensitive to outliers.

   $$ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $$

   - **Advantages**: MSE gives more weight to larger errors, making it useful when large deviations are undesirable.
   - **Disadvantages**: It is highly sensitive to outliers, and large errors can dominate the cost function.
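
The effect of a single outlier on these cost functions can be checked numerically. The sketch below is illustrative only: the function names and the four sample values are assumptions, and averaging the per-point Huber loss over the data set is a choice made here so that it is comparable with MAE and MSE.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def huber(y_true, y_pred, delta=1.0):
    """Mean of the per-point Huber loss (averaging is this sketch's assumption)."""
    r = np.abs(y_true - y_pred)
    quadratic = 0.5 * r**2                 # used where |error| <= delta
    linear = delta * (r - 0.5 * delta)     # used where |error| > delta
    return float(np.mean(np.where(r <= delta, quadratic, linear)))

# Illustrative values: three perfect predictions and one outlier with error 10
y_true = np.array([1.0, 2.0, 3.0, 14.0])
y_pred = np.array([1.0, 2.0, 3.0, 4.0])

ssd_value = float(np.sum((y_true - y_pred) ** 2))
print("SSD  :", ssd_value)
print("MSE  :", mse(y_true, y_pred))
print("MAE  :", mae(y_true, y_pred))
print("Huber:", huber(y_true, y_pred, delta=1.0))
```

The squared-error costs are dominated by the single outlier, while MAE and Huber loss grow only linearly with it.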

### Conclusion

In this project, we primarily use **SSD** (similar to **MSE**) to compute the cost, but depending on the nature of the data and the specific goals of the analysis, we could consider using **MAE** or **Huber Loss** to account for outliers or to ensure a different type of sensitivity to errors.

### Reference for this part: [Cost Function in the Pattern Recognition course taught by Dr. Hadi Sadoghi Yazdi](https://laboratorypatternrecognition.github.io/PatternRecognition_S/PR/Introduction/Cost.html)

## Project Implementation

### 1. Import necessary libraries

#### Explanation of Libraries Used in this code

- **NumPy**: Provides support for numerical computations and data manipulation. Used for generating data points and performing mathematical operations.
- **Plotly**: A graphing library that creates interactive visualizations, used here for plotting the scatter plot, regression line, and residuals.
- **ipywidgets**: Allows creation of interactive sliders and dropdowns for real-time updates to the plot as the user adjusts slope, intercept, and distance type.
- **IPython Display**: Embeds interactive elements like widgets and plots within the Jupyter Notebook.
- **time**: Measures the execution time of the program.

In [1]:
import numpy as np
import plotly.graph_objs as go
from ipywidgets import FloatSlider, Dropdown, Layout, HBox, VBox, interactive_output, HTML
from IPython.display import display
import time

In [2]:
start_time = time.time()

### 2. Generate random linear data

This block generates random linear data for `x` and `y`.

- **x**: A sequence of 50 evenly spaced values between -5 and 5.
- **y**: A linear function of `x` with added random noise to simulate real-world variations.


In [3]:
np.random.seed(20)
x = np.linspace(-5, 5, 50)
y = 0.5 * x + np.random.normal(size=x.size)

### 3. Define the function for perpendicular projection

This function calculates the perpendicular projection of a point (`x0`, `y0`) onto a line defined by its **slope** and **intercept**. The function returns the projected point on the line (`x_proj`, `y_proj`).


In [4]:
def perpendicular_projection(x0, y0, slope, intercept):
    x_proj = (x0 + slope * (y0 - intercept)) / (slope**2 + 1)
    y_proj = slope * x_proj + intercept
    return x_proj, y_proj
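
As a quick sanity check of this formula, the sketch below verifies two properties for one illustrative point: the segment from the point to its projection is perpendicular to the line, and its length matches the point-to-line distance formula. The function is restated so the snippet runs on its own; the sample values are assumptions.

```python
import numpy as np

def perpendicular_projection(x0, y0, slope, intercept):
    # Same formula as the notebook cell, restated so this snippet is self-contained
    x_proj = (x0 + slope * (y0 - intercept)) / (slope**2 + 1)
    y_proj = slope * x_proj + intercept
    return x_proj, y_proj

# Illustrative values: the point (0, 2) and the line y = x
x0, y0, slope, intercept = 0.0, 2.0, 1.0, 0.0
x_proj, y_proj = perpendicular_projection(x0, y0, slope, intercept)

segment = np.array([x0 - x_proj, y0 - y_proj])   # from the projection to the point
direction = np.array([1.0, slope])               # direction vector of the line
dot = float(segment @ direction)                 # zero when perpendicular

segment_length = float(np.hypot(segment[0], segment[1]))
formula_distance = abs(y0 - slope * x0 - intercept) / np.sqrt(slope**2 + 1)

print("projection:", (x_proj, y_proj))
print("dot product with line direction:", dot)
print("segment length vs. formula:", segment_length, formula_distance)
```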


### 4. Define the function to plot regression and residuals

This function creates an interactive plot showing the data points, a regression line, and the residual distances between the data points and the line. The residuals can be calculated using:
- **Vertical Distance**: The vertical distance between the data point and the line.
- **Horizontal Distance**: The horizontal distance between the data point and the line.
- **Perpendicular Distance**: The shortest distance between the data point and the line.

The plot also displays the **Sum of Squared Distances (SSD)**, a measure of the model's total error, which is updated dynamically as the slope and intercept change.


In [5]:
def plot_regression_plotly(slope=1.0, intercept=0.0, distance_type="vertical"):
    # Compute the fitted regression line
    y_pred = slope * x + intercept

    # Initialize traces for the plot
    data = []
    
    # Trace for the data points
    data.append(go.Scatter(x=x, y=y, mode='markers', name='Data points', marker=dict(color='black')))
    
    # Trace for the fitted regression line
    line_x = np.linspace(-6, 6, 100)
    line_y = slope * line_x + intercept
    data.append(go.Scatter(x=line_x, y=line_y, mode='lines', name=f'Fitted line: y = {slope:.2f}x + {intercept:.2f}', line=dict(color='red')))
    
    # Add residual lines (hidden from the legend) and accumulate the SSD
    ssd = 0
    for i in range(len(x)):
        if distance_type == "vertical":
            # Vertical distance (difference in y)
            data.append(go.Scatter(x=[x[i], x[i]], y=[y[i], y_pred[i]], mode='lines', line=dict(color='pink', dash='dash'), showlegend=False))
            ssd += (y[i] - y_pred[i]) ** 2
        elif distance_type == "horizontal":
            # Horizontal distance (difference in x); undefined when the line is flat
            if np.isclose(slope, 0.0):
                ssd = float('inf')
                continue
            x_proj = (y[i] - intercept) / slope
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y[i]], mode='lines', line=dict(color='green', dash='dash'), showlegend=False))
            ssd += (x[i] - x_proj) ** 2
        elif distance_type == "perpendicular":
            # Perpendicular distance (shortest distance to the line)
            x_proj, y_proj = perpendicular_projection(x[i], y[i], slope, intercept)
            data.append(go.Scatter(x=[x[i], x_proj], y=[y[i], y_proj], mode='lines', line=dict(color='blue', dash='dash'), showlegend=False))
            perp_dist = np.sqrt((x[i] - x_proj)**2 + (y[i] - y_proj)**2)
            ssd += perp_dist ** 2
    
    # Create the layout for the plot with larger size
    layout = go.Layout(
        title=f'Sum of squared distances ({distance_type}): {ssd:.2f}',
        xaxis=dict(title='x', range=[-6, 6]),
        yaxis=dict(title='y', range=[-6, 6]),
        showlegend=True,
        width=900,  
        height=600,  
        margin=dict(l=40, r=40, t=40, b=40)  
    )
    
    # Create the figure and display it
    fig = go.Figure(data=data, layout=layout)
    fig.show()


### 5. Create interactive widgets

This block creates interactive widgets using `ipywidgets`:
- **Slope Slider**: Allows the user to adjust the slope of the regression line.
- **Intercept Slider**: Allows the user to adjust the intercept of the regression line.
- **Distance Type Dropdown**: Lets the user choose how the distances (residuals) are calculated—either vertically, horizontally, or perpendicularly.


In [6]:
slope_slider = FloatSlider(value=1.0, min=-3.0, max=3.0, step=0.1, layout=Layout(width='300px'))
intercept_slider = FloatSlider(value=0.0, min=-5.0, max=5.0, step=0.1, layout=Layout(width='300px'))
distance_type_dropdown = Dropdown(options=["vertical", "horizontal", "perpendicular"], layout=Layout(width='300px'))
slope_label = HTML(value=f"<b>Slope:</b> {slope_slider.value}")
intercept_label = HTML(value=f"<b>Intercept:</b> {intercept_slider.value}")
distance_type_label = HTML(value=f"<b>Distance type:</b> {distance_type_dropdown.value}")
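
The slider ranges and step sizes above define a finite grid of candidate lines. As a rough illustration of what a user does by hand, the sketch below evaluates the vertical SSD on that whole grid and picks the smallest value. It is not part of the notebook: the variable names are hypothetical and the data are generated with a different random generator than the notebook's, so the numbers will not match the plot exactly.

```python
import numpy as np

# Synthetic data in the same style as the notebook (seed and generator are assumptions)
rng = np.random.default_rng(20)
x_demo = np.linspace(-5, 5, 50)
y_demo = 0.5 * x_demo + rng.normal(size=x_demo.size)

# Candidate values matching the slider ranges with step 0.1
slopes = np.linspace(-3.0, 3.0, 61)
intercepts = np.linspace(-5.0, 5.0, 101)

# ssd_grid[j, k] is the vertical SSD for slopes[j] and intercepts[k]
S, B = np.meshgrid(slopes, intercepts, indexing="ij")
residuals = y_demo - (S[..., None] * x_demo + B[..., None])
ssd_grid = (residuals**2).sum(axis=-1)

# Best slider setting on the grid, compared with the exact least-squares fit
j, k = np.unravel_index(np.argmin(ssd_grid), ssd_grid.shape)
best_slope, best_intercept = slopes[j], intercepts[k]
ols_slope, ols_intercept = np.polyfit(x_demo, y_demo, deg=1)

print(f"best grid setting: slope={best_slope:.1f}, intercept={best_intercept:.1f}")
print(f"exact OLS fit:     slope={ols_slope:.3f}, intercept={ols_intercept:.3f}")
```

Because the sliders move in steps of 0.1, the best reachable setting can only approximate the exact least-squares solution.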


### 6. Update labels dynamically

This function updates the text labels for slope, intercept, and distance type dynamically as the user interacts with the sliders and dropdown menu. It ensures the displayed labels always reflect the current settings.


In [7]:
# Function to update the labels dynamically
def update_labels(change):
    slope_label.value = f"<b>Slope:</b> {slope_slider.value:.2f}"
    intercept_label.value = f"<b>Intercept:</b> {intercept_slider.value:.2f}"
    distance_type_label.value = f"<b>Distance type:</b> {distance_type_dropdown.value}"


### 7. Attach the update function to widgets

In this block, the `update_labels` function is attached to the slope and intercept sliders and the distance type dropdown. This ensures that every time the user modifies a value, the corresponding labels update.


In [8]:
slope_slider.observe(update_labels, names='value')
intercept_slider.observe(update_labels, names='value')
distance_type_dropdown.observe(update_labels, names='value')


### 8. Arrange widgets in a horizontal layout

This block arranges the sliders and dropdown widgets in a horizontal box (`HBox`) for a clean and organized layout within the notebook. Each control (slope, intercept, distance type) is placed side by side.


In [9]:
controls = HBox([VBox([slope_label, slope_slider]), VBox([intercept_label, intercept_slider]), VBox([distance_type_label, distance_type_dropdown])])

### 9. Define the function to update the plot

This function updates the plot based on the current values of the slope, intercept, and selected distance type. Every time the user interacts with the widgets, this function recalculates the residuals and updates the plot accordingly.


In [10]:
def update_plot(slope, intercept, distance_type):
    plot_regression_plotly(slope, intercept, distance_type)


### 10. Display the interactive plot and controls

This block combines the interactive controls (sliders and dropdown) with the plot output. It uses `interactive_output` to link the plot to the widgets, so the plot updates dynamically when the user changes any value.


In [11]:
output = interactive_output(update_plot, {'slope': slope_slider, 'intercept': intercept_slider, 'distance_type': distance_type_dropdown})

# Display the controls and the plot
display(controls, output)



In [12]:
end_time = time.time()

## Print the execution time

In [13]:
execution_time = end_time - start_time
print(f"Program execution time: {execution_time:.4f} seconds")

Program execution time: 0.2985 seconds



## [Link to the online Streamlit app](https://appapplsq-oqwnjzrmqdqteupbz7kgsk.streamlit.app/)

# References

- [shinyserv.es/shiny/least-squares](https://shinyserv.es/shiny/least-squares/)
- [What is Least Squares?](https://youtu.be/S0ptaAXNxBU?si=rmAQlbvIyfxXnA4L)
- [Least Square Method](https://byjus.com/maths/least-square-method/)
- [Master statistics & machine learning taught by Dr.Mike X Cohen](https://www.udemy.com/share/103adN3@czOWJrj9jn_4NyLh_HQjPNq_E7u0kDShhaJUuEHXuXZYcDRohxOp7WR4rG4BZd2UFw==/)