# Exploring Least Squares in Linear Regression

In the previous section, we calculated price estimates directly using predefined coefficient values. While this approach is useful for prediction, it doesn’t leverage the full power of linear regression. The true strength of linear regression emerges when we flip the problem: rather than knowing the coefficients and predicting prices, we can use data to estimate the coefficients themselves. This allows us to determine how each feature influences the final price, which is the key to understanding relationships in data.

## Why Can't We Always Get Perfect Predictions?

In reality, it is nearly impossible to find coefficients that perfectly predict the prices for every data point. There are numerous reasons why this happens:

- **External factors**: Prices are affected by factors outside of the model, such as market trends, location desirability, and the economic climate.

- **Data noise**: Random variations in the data may introduce unpredictable fluctuations that are difficult for any model to capture.

- **Confounding variables**: Factors left out of the model may influence both a feature and the price. This distorts the estimated relationships and makes predictions less reliable.

- **Selection bias**: The data used to build the model may not represent the full population or all possible scenarios, leading to inaccuracies.

For these reasons, linear regression models will usually make approximate predictions rather than exact ones. Therefore, it's important to critically assess how well the model reflects the true relationships in the data and understand its limitations.


## Estimating Coefficients with the Least Squares Method

One of the most widely used techniques for estimating the coefficients of a linear regression model is the least squares method. Adrien-Marie Legendre first published it in 1805. The method minimizes the sum of the squared differences between the observed values and the values predicted by the model.

Given a dataset with known input features \( X \) and known output values \( y \), the goal is to find the coefficient vector \( c \) that minimizes the sum of squared errors (SSE):

\[
SSE = \sum (y_{\text{actual}} - y_{\text{predicted}})^2
\]

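The coefficients that minimize the SSE are the ones whose predictions are, collectively, as close as possible to the observed values. Squaring the errors does two things. It stops positive and negative errors from cancelling each other out, and it penalizes large errors more heavily than small ones.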
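The SSE can also be written in matrix form, which makes the link to \( X \) and \( c \) explicit. Let each row \( x_i^\top \) of \( X \) hold the features of one observation, so the predicted value for observation \( i \) is \( x_i^\top c \). Setting the gradient to zero then gives the standard normal equations. Treat the derivation below as a sketch. It assumes \( X \) has full column rank, so that \( X^\top X \) is invertible. The three-cabin dataset in the next example has five coefficients but only three observations, so that assumption does not hold there.

\[
SSE(c) = \sum_{i=1}^{n} \left(y_i - x_i^\top c\right)^2 = (y - Xc)^\top (y - Xc)
\]

\[
\nabla_c \, SSE(c) = -2 X^\top (y - Xc) = 0
\quad\Longrightarrow\quad
X^\top X \, c = X^\top y
\quad\Longrightarrow\quad
c = \left(X^\top X\right)^{-1} X^\top y
\]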


## Practical Example: Finding the Best Coefficient Set

To illustrate this concept, we will calculate the sum of squared errors for several different sets of coefficient values and identify which set fits the data best. This is a simplified version of the least squares method. Instead of searching all possible coefficient values for the global minimum of the SSE, we only evaluate a fixed number of candidates.

```python
import numpy as np

# Data: each row of X holds one cabin's features; y holds the actual prices
X = np.array([[66, 5, 15, 2, 500],
              [21, 3, 50, 1, 100],
              [120, 15, 5, 2, 1200]])

y = np.array([250000, 60000, 525000])

# Alternative sets of coefficient values (one candidate set per row)
c = np.array([[3000, 200, -50, 5000, 100],
              [2000, -250, -100, 150, 250],
              [3000, -100, -150, 0, 150]])

def find_best(X, y, c):
    """Return the index of the coefficient set in c with the smallest SSE, and that SSE."""
    smallest_error = np.inf  # Start at infinity so the first set always becomes the current best
    best_index = -1  # Index of the best coefficient set found so far

    for i, coeff in enumerate(c):
        # Predict prices using the current coefficient set
        predictions = X @ coeff

        # Sum of squared errors (SSE) between actual and predicted prices
        sse = np.sum((y - predictions) ** 2)

        # Keep this set if its error is smaller than the best so far
        if sse < smallest_error:
            smallest_error = sse
            best_index = i

    print("The best set of coefficients is set %d" % best_index)
    return best_index, smallest_error

best_index, best_sse = find_best(X, y, c)
```


```
The best set of coefficients is set 1
```


## How the Least Squares Method Works

The least squares method tries to find the best-fitting line (or hyperplane in higher dimensions) that minimizes the error between the actual data points and the predicted values. The key idea is to adjust the coefficients so that the sum of squared errors across all data points is as small as possible.
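The full method does not pick from a few candidates. It solves for the coefficients that minimize the SSE over all possible values, and NumPy's `np.linalg.lstsq` does exactly that. The sketch below is illustrative. It fits a straight line to a small hypothetical dataset: the points are made up to lie near, but not exactly on, the line y = 2x + 1. A column of ones gives the model an intercept. Because the points are noisy, the fitted line cannot pass through all of them, and the remaining SSE is nonzero.

```python
import numpy as np

# Hypothetical toy data: roughly y = 2x + 1 with some noise added
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: one column for the slope, one column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns the SSE-minimizing coefficients, the residual sum of squares,
# the rank of A, and the singular values of A
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

slope, intercept = coeffs
print("slope = %.3f, intercept = %.3f" % (slope, intercept))
# prints: slope = 1.940, intercept = 1.090
```

For this toy data, the remaining SSE (`residuals[0]`) is about 0.082. No other choice of slope and intercept gives a smaller value.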

In the example above, we compare three candidate sets of coefficients, stored as the rows of the array `c`. For each set, we compute predictions by multiplying the feature matrix \( X \) by that set's coefficient vector. We then sum the squared differences between the actual and predicted prices. The set with the smallest sum is the best of the three.
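The same comparison can be written without an explicit Python loop. The sketch below reuses the example's `X`, `y`, and candidate matrix `c`. One matrix product, `X @ c.T`, computes every prediction at once, with one column per coefficient set. The code then reports each set's SSE, and `np.argmin` picks the smallest, which should agree with `find_best`.

```python
import numpy as np

# Same cabin features, prices, and candidate coefficient sets as in the example
X = np.array([[66, 5, 15, 2, 500],
              [21, 3, 50, 1, 100],
              [120, 15, 5, 2, 1200]])
y = np.array([250000, 60000, 525000])
c = np.array([[3000, 200, -50, 5000, 100],
              [2000, -250, -100, 150, 250],
              [3000, -100, -150, 0, 150]])

# Column j of `predictions` holds the predicted prices for coefficient set j
predictions = X @ c.T

# Broadcast y as a column vector against every column of predictions,
# square the errors, and sum down each column: one SSE per coefficient set
sse_per_set = np.sum((y[:, None] - predictions) ** 2, axis=0)

best_index = int(np.argmin(sse_per_set))
print("SSE per set:", sse_per_set)
print("The best set of coefficients is set %d" % best_index)
```

For this data, the SSEs are 1,367,335,000, 144,765,000, and 676,665,000. Set 1 wins by a wide margin.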

## Visualizing the Fit

To see how well each coefficient set fits the data, it helps to plot the predicted prices against the actual prices. If the model fits well, the points should lie close to the line where predicted price equals actual price. Plots like this, together with plots of the residuals, can also help show whether a linear model is appropriate for the data at hand.
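One common form of this comparison is a predicted-versus-actual scatter plot. Points on the dashed diagonal are perfect predictions, and each point's vertical distance from the diagonal is its prediction error. The sketch below assumes Matplotlib is installed and reuses the example's `X`, `y`, and candidate matrix `c`. With only three cabins, treat it as an illustration rather than a real diagnostic.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same cabin data and candidate coefficient sets as in the example
X = np.array([[66, 5, 15, 2, 500],
              [21, 3, 50, 1, 100],
              [120, 15, 5, 2, 1200]])
y = np.array([250000, 60000, 525000])
c = np.array([[3000, 200, -50, 5000, 100],
              [2000, -250, -100, 150, 250],
              [3000, -100, -150, 0, 150]])

fig, ax = plt.subplots()
for i, coeff in enumerate(c):
    # One marker series per coefficient set: actual price on x, predicted price on y
    ax.scatter(y, X @ coeff, label="set %d" % i)

# Reference line where predicted == actual; perfect predictions sit on it
upper = 1.1 * max(y.max(), (X @ c.T).max())
ax.plot([0, upper], [0, upper], linestyle="--", color="gray", label="perfect fit")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.legend()
plt.show()
```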

## Conclusion

The least squares method is a cornerstone technique in both statistics and machine learning for fitting linear models. By minimizing the sum of squared errors, it provides a simple yet effective way to estimate the coefficients that best explain the relationships in the data. While it may not always provide a perfect fit, especially in the presence of noise or bias, it is a powerful tool for understanding how features contribute to predictions.

As we continue exploring linear regression, we will see how this method can be applied to more complex datasets and scenarios, extending beyond basic cabin price predictions to real-world applications.