# Least Squares Regression, SSR, RMSE, R-squared (Coefficient of Determination)

When you have a set of experimental data points $(x_i, y_i)$, where $i$ ranges from 1 to $n$ (the number of data points), and you want to find a mathematical function that best describes the relationship between $x$ and $y$, you are performing **curve fitting** or **regression analysis**. The goal is to find the parameters of a chosen function that make the function's output as close as possible to your observed $y$ values for the corresponding $x$ values.

Consider a specific non-linear case in which we want to approximate the data with the function:

$$f(x; A, B) = A \cdot (e^{-B \cdot x} - 1) + 100$$

Here, $A$ and $B$ are the parameters that we need to determine from the data points. The '100' is a constant offset in this specific function.
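As a minimal sketch, the function can be written in Python with NumPy. The name `model` and the parameter values used in the call are illustrative assumptions, not values taken from any dataset.

```python
import numpy as np

def model(x, A, B):
    """Evaluate f(x; A, B) = A * (exp(-B * x) - 1) + 100."""
    x = np.asarray(x, dtype=float)
    return A * (np.exp(-B * x) - 1.0) + 100.0

# Illustrative (assumed) parameter values
x = np.array([0.0, 1.0, 5.0, 50.0])
print(model(x, A=40.0, B=0.5))
```

For positive $B$, the curve starts at 100 when $x = 0$ and levels off towards $100 - A$ as $x$ grows.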

```{note}
For now, we assume the experimental points carry no measurement errors. If the $y$ values, or both the $x$ and $y$ values, have uncertainties, different algorithms are needed. We will consider such algorithms later.
```

## The Core Idea: Minimizing Differences

> The fundamental idea behind most curve fitting methods is to minimize the "difference" between your experimental $y_i$ values and the $y$ values predicted by your chosen function, $f(x_i; A, B)$. This "difference" is often called the **residual**.

For each data point $(x_i, y_i)$, the residual, $r_i$, is defined as:

$$r_i = y_i - f(x_i; A, B)$$

Our goal is to find the values of $A$ and $B$ that make these residuals, collectively, as small as possible.
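The residual definition can be sketched as follows. The data here are synthetic and noise-free, generated from assumed "true" parameters ($A = 50$, $B = 0.3$) purely for illustration; the helper names `model` and `residuals` are also assumptions.

```python
import numpy as np

def model(x, A, B):
    return A * (np.exp(-B * x) - 1.0) + 100.0

def residuals(params, x, y):
    """Return r_i = y_i - f(x_i; A, B) for every data point."""
    A, B = params
    return y - model(x, A, B)

# Synthetic, noise-free data from assumed parameters A=50, B=0.3
x = np.linspace(0.0, 10.0, 6)
y = model(x, 50.0, 0.3)

r_true = residuals((50.0, 0.3), x, y)  # zero everywhere: curve hits every point
r_off = residuals((40.0, 0.3), x, y)   # wrong A: residuals are no longer zero
print(r_true)
print(r_off)
```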

There are various methods for approximating data, but for continuous functions and data without explicit error bars on individual points, the most widely used method is **Least Squares Regression**.



## Least Squares Regression

The principle of least squares is to find the parameters (in our case, $A$ and $B$) that **minimize the sum of the squares of the residuals**. Why squares?
* Squaring the residuals ensures that positive and negative differences don't cancel each other out.
* It penalizes larger errors more heavily than smaller errors, which is often desirable.
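* Mathematically, it gives a smooth, differentiable objective. For models that are linear in their parameters the problem is convex and has a unique minimum; for non-linear models such as ours, the SSR surface may have several local minima, so the starting guess matters.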

So, we want to minimize the following quantity, which is the **Sum of Squared Residuals (SSR)**:

$$ SSR(A, B) = \sum_{i=1}^{n} (y_i - f(x_i; A, B))^2 $$

Substituting our specific function:

$$ SSR(A, B) = \sum_{i=1}^{n} (y_i - (A \cdot (e^{-B \cdot x_i} - 1) + 100))^2 $$
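To make the objective concrete, here is a small sketch that evaluates $SSR(A, B)$ on a coarse grid of candidate parameters. The data are synthetic and noise-free (assumed true values $A = 50$, $B = 0.3$), and the grid values are arbitrary choices for illustration. A grid scan like this is only a way to look at the SSR surface, not how fitting is normally done.

```python
import numpy as np

def model(x, A, B):
    return A * (np.exp(-B * x) - 1.0) + 100.0

def ssr(A, B, x, y):
    """Sum of squared residuals for given parameters."""
    r = y - model(x, A, B)
    return float(np.sum(r ** 2))

# Synthetic, noise-free data from assumed parameters A=50, B=0.3
x = np.linspace(0.0, 10.0, 20)
y = model(x, 50.0, 0.3)

# Coarse, hand-picked grid of candidate parameters
A_grid = [30.0, 40.0, 50.0, 60.0, 70.0]
B_grid = [0.1, 0.2, 0.3, 0.4, 0.5]

# Pick the grid point with the smallest SSR
best = min(((A, B) for A in A_grid for B in B_grid),
           key=lambda p: ssr(p[0], p[1], x, y))
print(best, ssr(*best, x, y))
```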

To find the values of $A$ and $B$ that minimize $SSR$, we typically use calculus. We take the partial derivatives of $SSR$ with respect to each parameter ($A$ and $B$), set them equal to zero, and solve the resulting system of equations.

$$\frac{\partial SSR}{\partial A} = 0$$

$$\frac{\partial SSR}{\partial B} = 0$$

For linear regression, these equations are linear in the parameters and have a direct analytical solution. For non-linear functions like ours (due to the $e^{-B \cdot x}$ term), the resulting system is non-linear and is solved with iterative numerical optimization algorithms, such as the Levenberg-Marquardt algorithm, which `scipy.optimize.curve_fit` in Python uses by default for unconstrained problems. We won't derive the partial derivatives for our function here, as the algebra gets quite involved and is typically handled by computational tools. The core idea remains the same: find $A$ and $B$ at which the slope of the $SSR$ surface is zero.
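A minimal sketch of such a numerical fit with `scipy.optimize.curve_fit` is shown here. The data are synthetic: they are generated from assumed parameters ($A = 50$, $B = 0.3$) with added Gaussian noise, and the starting guess `p0` is likewise an arbitrary assumption. With real data, you would replace `x` and `y` with your measurements and choose `p0` from a plot.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, A, B):
    return A * (np.exp(-B * x) - 1.0) + 100.0

# Synthetic data: assumed true parameters plus Gaussian noise (seeded for repeatability)
rng = np.random.default_rng(seed=0)
x = np.linspace(0.0, 10.0, 25)
y = model(x, 50.0, 0.3) + rng.normal(scale=0.5, size=x.size)

# Without bounds, curve_fit defaults to method="lm" (Levenberg-Marquardt).
# p0 is the starting guess for (A, B); iterative methods need one.
popt, pcov = curve_fit(model, x, y, p0=[40.0, 0.5])
A_fit, B_fit = popt
print(f"A = {A_fit:.2f}, B = {B_fit:.3f}")
```

Because the method is iterative, a poor starting guess can lead to slow convergence or a different local minimum, so it is worth trying a couple of starting points when the result looks suspicious.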

Once you've found the best-fit parameters $A$ and $B$, you need to evaluate how "good" your approximation is. This is where metrics like SSR, RMSE, and R-squared come in.


## Sum of Squared Residuals (SSR)

As defined above, SSR is:

$$SSR = \sum_{i=1}^{n} (y_i - f(x_i; A, B))^2$$

**Meaning:** SSR is a direct measure of the total discrepancy between your observed data points and your fitted function. A smaller SSR indicates a better fit to the data.

**Understanding:**
* It's always non-negative.
* Its units are the square of the units of $y$.
* It's an absolute, scale-dependent quantity: you cannot meaningfully compare SSR between datasets with different numbers of data points or vastly different scales of $y$.
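The dependence of SSR on the number of points and on the scale of $y$ can be seen in a small sketch. The numbers below are made up for illustration only.

```python
import numpy as np

def ssr(y_obs, y_pred):
    return float(np.sum((y_obs - y_pred) ** 2))

# Made-up observations and predictions (every residual is +/- 0.5)
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.5, 2.5, 3.5])

ssr_base = ssr(y_obs, y_pred)
# Same quality of fit, but twice as many points: SSR doubles
ssr_doubled = ssr(np.tile(y_obs, 2), np.tile(y_pred, 2))
# Same data expressed in units 10 times smaller: SSR grows 100-fold
ssr_rescaled = ssr(10.0 * y_obs, 10.0 * y_pred)

print(ssr_base, ssr_doubled, ssr_rescaled)  # 1.0 2.0 100.0
```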

## Root Mean Squared Error (RMSE)

RMSE is derived directly from SSR and is often more interpretable:

$$RMSE = \sqrt{\frac{SSR}{n}}$$

**Meaning:** RMSE represents the typical magnitude of the residuals (their root-mean-square value). It gives you a sense of the typical "error" your model makes in predicting $y$.

**Understanding:**
* It's in the same units as your dependent variable $y$. This makes it easier to interpret: "A typical prediction is off by roughly RMSE units of $y$."
* It's sensitive to outliers because of the squaring of errors. Large errors contribute disproportionately to RMSE.
* Like SSR, a smaller RMSE indicates a better fit.
* You can compare RMSE values between different models **on the same dataset** to see which one performs better, provided the models have a similar number of parameters. Comparing RMSE across different datasets or datasets with vastly different scales of $y$ can still be misleading.
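The sketch below computes RMSE for a made-up set of predictions in which three points are exact and one is off by 2. For contrast it also computes the mean absolute error, a quantity not used elsewhere in this text and introduced here only to show how squaring makes the single large error dominate.

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root mean squared error: sqrt(SSR / n)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ssr = np.sum((y_obs - y_pred) ** 2)
    return float(np.sqrt(ssr / y_obs.size))

# Made-up values: three exact predictions, one off by 2
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

rmse_value = rmse(y_obs, y_pred)
mae_value = float(np.mean(np.abs(np.subtract(y_obs, y_pred))))
print(rmse_value)  # 1.0
print(mae_value)   # 0.5
```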

## R-squared (Coefficient of Determination)

R-squared is a very popular metric because it provides a standardized measure of how well your model explains the variability in the dependent variable $y$.

To understand R-squared, we need another concept: **Total Sum of Squares (SST)**.

$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

Where $\bar{y}$ is the mean of the observed $y$ values:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

SST represents the total variability in the observed $y$ values around their mean. It's the sum of squared differences if you were to approximate all $y_i$ with their mean $\bar{y}$ (which is essentially a horizontal line).

Now, R-squared is defined as:

$$R^2 = 1 - \frac{SSR}{SST}$$

```{note}
We will derive this formula below.
```

**Meaning:** R-squared tells you the proportion of the variance in the dependent variable ($y$) that is predictable from the independent variable ($x$) using your regression model. In simpler terms, it indicates how much of the variation in $y$ can be explained by your chosen function.

**Understanding:**
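* $R^2$ is at most 1 (or 100%). For a least-squares fit of a linear model with an intercept it lies between 0 and 1; for other models, including non-linear ones, it can be negative when the fit is worse than simply predicting the mean of $y$.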
* An $R^2$ of 1 (or 100%) means that your model perfectly explains all the variability in $y$. The residuals are all zero, and the function passes through every data point. This is rare in experimental data.
* An $R^2$ of 0 means that your model explains none of the variability in $y$. In this case, your model performs no better than simply predicting the mean of $y$ for all $x$ values.
* A higher $R^2$ generally indicates a better fit.
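A minimal sketch of the $R^2$ calculation, using made-up numbers, shows the reference cases: a perfect prediction, a prediction equal to the mean, and a prediction that does worse than the mean (for which the formula returns a negative value). The function name `r_squared` is an assumption.

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination: 1 - SSR / SST."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ssr = np.sum((y_obs - y_pred) ** 2)
    sst = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ssr / sst)

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up observations

r2_perfect = r_squared(y, y)                        # predictions match exactly
r2_mean = r_squared(y, np.full_like(y, y.mean()))   # always predict the mean
r2_bad = r_squared(y, y[::-1])                      # trend predicted backwards

print(r2_perfect, r2_mean, r2_bad)  # 1.0 0.0 -3.0
```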

**Interpretation Caveats:**
* A high $R^2$ doesn't necessarily mean the model is "correct" or that the chosen function is the true underlying relationship. It just means it explains a lot of the variance.
* Adding more parameters to a model will generally increase $R^2$, even if those parameters don't significantly improve the model's predictive power (this is why **adjusted R-squared** is sometimes used, which penalizes for added complexity).
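* $R^2$ is most appropriate for linear models. For non-linear models the decomposition of SST into an explained part and a residual part no longer holds exactly, so $R^2$ is not strictly a "proportion of variance explained"; it still serves as a rough indicator of goodness-of-fit.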
* It's possible to have a low $R^2$ for a valid model if the inherent variability in the data (noise) is very high, even if the model captures the underlying trend.
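For reference, the adjusted R-squared mentioned above is commonly defined, for a linear model with an intercept and $p$ predictors fitted to $n$ data points, as shown here. The symbol $p$ is introduced only for this formula, and how to count parameters for a non-linear model such as ours is a matter of convention, so treat this as a sketch of the idea rather than a prescription.

$$R^2_{adj} = 1 - (1 - R^2) \cdot \frac{n - 1}{n - p - 1}$$

Since $\frac{n - 1}{n - p - 1} \geq 1$, the adjusted value never exceeds $R^2$, and the gap widens as parameters are added without a matching drop in SSR.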

## Summary of the Process (No Errors on the X- and Y-axes)

1.  **Visualize your data:** Plot your $(x_i, y_i)$ points to get an initial sense of the trend. This helps in choosing an appropriate functional form.
2.  **Choose a functional form:** Based on your knowledge of the underlying process or by observing the data, select a mathematical function (like $A \cdot (e^{-B \cdot x} - 1) + 100$) that you believe can describe the relationship.
3.  **Define the objective function:** Formulate the Sum of Squared Residuals (SSR) equation, which is the quantity you want to minimize.
4.  **Minimize the SSR:** Use an optimization algorithm (such as the non-linear least squares routine `scipy.optimize.curve_fit`) to find the values of the parameters ($A$ and $B$ in our example) that minimize the SSR.
5.  **Evaluate the fit:** Calculate SSR, RMSE, and R-squared to quantify how well your chosen function with the optimized parameters fits your data.
6.  **Interpret the results:** Understand what the values of $A$, $B$, SSR, RMSE, and R-squared tell you about the relationship between $x$ and $y$ and the quality of your approximation.
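The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not a reference implementation: the data are synthetic (generated from assumed parameters $A = 50$, $B = 0.3$ with Gaussian noise), the starting guess is arbitrary, and plotting (step 1) is left out to keep the example short.

```python
import numpy as np
from scipy.optimize import curve_fit

# Step 2: chosen functional form
def model(x, A, B):
    return A * (np.exp(-B * x) - 1.0) + 100.0

# Synthetic data standing in for measurements (assumed true values, seeded noise)
rng = np.random.default_rng(seed=1)
x = np.linspace(0.0, 10.0, 30)
y = model(x, 50.0, 0.3) + rng.normal(scale=0.5, size=x.size)

# Steps 3-4: curve_fit minimizes the sum of squared residuals
popt, pcov = curve_fit(model, x, y, p0=[40.0, 0.5])
A_fit, B_fit = popt

# Step 5: goodness-of-fit metrics
resid = y - model(x, A_fit, B_fit)
ssr = float(np.sum(resid ** 2))
rmse = float(np.sqrt(ssr / x.size))
sst = float(np.sum((y - y.mean()) ** 2))
r2 = 1.0 - ssr / sst

# Step 6: report the numbers for interpretation
print(f"A = {A_fit:.2f}, B = {B_fit:.3f}")
print(f"SSR = {ssr:.3f}, RMSE = {rmse:.3f}, R^2 = {r2:.4f}")
```

With noise of standard deviation 0.5, an RMSE of about the same size is what a well-chosen model should give; a much larger value would suggest the functional form or the starting guess needs another look.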