# Orthogonal Distance Regression

Orthogonal Distance Regression (ODR) is a regression method that finds the best-fit line or curve by minimizing the sum of the squared **orthogonal distances** from the data points to the model. Unlike ordinary least squares (OLS) regression, which minimizes the vertical distances (errors in the y-direction), ODR accounts for errors in **both the x and y variables**.

This makes ODR particularly useful when both independent and dependent variables have measurement errors. The "orthogonal" part refers to the fact that the distances are measured perpendicular to the fitted curve, not vertically or horizontally.

## General Principles of ODR

In a typical OLS regression, you assume that the independent variable ($x$) is known without error, and all the error is in the dependent variable ($y$). The goal is to minimize the sum of squared vertical distances from each data point $(x_i, y_i)$ to the corresponding point $(x_i, f(x_i))$ on the fitted curve $y = f(x)$.

ODR, however, treats both $x$ and $y$ as having errors. The method involves finding a point $(\hat{x}_i, \hat{y}_i)$ on the curve $y = f(x)$ that is closest to the observed data point $(x_i, y_i)$. The "distance" that ODR minimizes is the perpendicular distance between the observed point $(x_i, y_i)$ and the point on the curve $(\hat{x}_i, \hat{y}_i)$.
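
For a straight line with equal, unweighted errors, the closest point has a closed form, which makes the idea easy to check numerically. The sketch below assumes a hypothetical line $y = 1 + 2x$ and one made-up observation; the name `project_onto_line` is illustrative and not part of any library. For a curved model there is generally no closed form, and the solver has to find $(\hat{x}_i, \hat{y}_i)$ numerically.

```python
import math


def project_onto_line(x, y, intercept, slope):
    """Return the point on y = intercept + slope * x closest to (x, y)."""
    # Setting d/dt [(x - t)^2 + (y - intercept - slope * t)^2] = 0 gives t:
    x_hat = (x + slope * (y - intercept)) / (1.0 + slope**2)
    y_hat = intercept + slope * x_hat
    return x_hat, y_hat


# Hypothetical line y = 1 + 2x and a single made-up observation.
intercept, slope = 1.0, 2.0
x_obs, y_obs = 3.0, 2.0

x_hat, y_hat = project_onto_line(x_obs, y_obs, intercept, slope)

# Distance measured perpendicular to the line (the ODR view, equal errors).
orthogonal_distance = math.hypot(x_obs - x_hat, y_obs - y_hat)
# Distance measured straight up or down (the OLS view).
vertical_distance = abs(y_obs - (intercept + slope * x_obs))

# The residual vector should be perpendicular to the line direction
# (1, slope), so their dot product should be zero.
dot = (x_obs - x_hat) * 1.0 + (y_obs - y_hat) * slope

print("closest point:      ", (x_hat, y_hat))
print("orthogonal distance:", orthogonal_distance)
print("vertical distance:  ", vertical_distance)
print("dot product:        ", dot)
```

The orthogonal distance is never larger than the vertical one, and the gap between them grows as the line gets steeper.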

The objective function that ODR minimizes is:

$$\sum_{i=1}^{n} \left( w_{x,i}^2(x_i - \hat{x}_i)^2 + w_{y,i}^2(y_i - \hat{y}_i)^2 \right)$$

where
* $(x_i, y_i)$ are the observed data points.
* $(\hat{x}_i, \hat{y}_i)$ are the corresponding adjusted points that lie on the fitted curve.
* The weights $w_{x,i}$ and $w_{y,i}$ scale the deviations in each direction. Because they enter the sum squared, each weight is the inverse of the standard deviation of the corresponding error ($w = 1/\sigma$), so that $w^2$ is the inverse of the variance.


Written directly in terms of the standard deviations, which is the more common and statistically grounded form, the same objective weights each squared deviation by the inverse of its variance:

$$S = \sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2} \right)$$

where:
* $\sigma_{x,i}$ and $\sigma_{y,i}$ are the known **standard deviations** of the errors for the $i$-th data point in the x and y directions, respectively.
* The terms $\frac{1}{\sigma_{x,i}^2}$ and $\frac{1}{\sigma_{y,i}^2}$ are the statistical weights. This is the standard practice of weighting by the inverse variance, which gives more influence to data points with smaller uncertainties.
* $S$ is the final value to be minimized, representing the sum of the squared weighted distances between the observed points and the curve.

The core idea is that the regression should be more influenced by data points with smaller uncertainties and less by those with larger uncertainties.

* **$w_{x,i}$ (x-axis weights):** A data point with a small error in the x-direction (a small standard deviation) will have a large $w_x$ value. This means the algorithm will penalize deviations in the x-direction more heavily, forcing the fitted curve to be closer to that point horizontally.
* **$w_{y,i}$ (y-axis weights):** Similarly, a data point with a small error in the y-direction will have a large $w_y$ value. This pushes the curve to be closer to that point vertically.
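
To make the weighting concrete, here is a minimal sketch that evaluates $S$ for a given set of observed and adjusted points. All numbers are made up for illustration, the adjusted points are simply placed on a hypothetical line $y = 2x$ rather than optimized, and `odr_objective` is an illustrative name, not a library function.

```python
def odr_objective(x, y, x_hat, y_hat, sigma_x, sigma_y):
    """Sum of squared deviations in x and y, each scaled by its sigma."""
    total = 0.0
    for xi, yi, xh, yh, sx, sy in zip(x, y, x_hat, y_hat, sigma_x, sigma_y):
        total += ((xi - xh) / sx) ** 2 + ((yi - yh) / sy) ** 2
    return total


# Made-up observations.
x = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]

# Hypothetical adjusted points lying on the line y = 2x (not optimized).
x_hat = [1.02, 1.98, 3.05]
y_hat = [2.04, 3.96, 6.10]

# Assumed standard deviations of the x- and y-errors.
sigma_x = [0.1, 0.1, 0.1]
sigma_y = [0.2, 0.2, 0.2]

S = odr_objective(x, y, x_hat, y_hat, sigma_x, sigma_y)

# Halving sigma_x for the first point quadruples the penalty on its
# x-deviation, so the same adjusted points now cost more.
S_tight = odr_objective(x, y, x_hat, y_hat, [0.05, 0.1, 0.1], sigma_y)

print("S with original errors:       ", S)
print("S with a tighter first x-error:", S_tight)
```

An actual ODR solver would move both the curve parameters and the adjusted points to make this number as small as possible.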

## The Importance of the Error Ratio $\lambda$

While the objective function for ODR is written in terms of the absolute standard deviations ($\sigma_{x,i}$ and $\sigma_{y,i}$), the geometric solution—the actual shape and position of the fitted curve—is fundamentally governed by the **ratio of the error variances**.

Let's define the error variance ratio, $\lambda_i$, for each data point:

$$\lambda_i = \frac{\sigma_{y,i}^2}{\sigma_{x,i}^2}$$

We can rewrite the ODR objective function by factoring out the x-variance term:

$$S = \sum_{i=1}^{n} \frac{1}{\sigma_{x,i}^2} \left( (x_i - \hat{x}_i)^2 + \frac{\sigma_{x,i}^2}{\sigma_{y,i}^2}(y_i - \hat{y}_i)^2 \right) = \sum_{i=1}^{n} \frac{1}{\sigma_{x,i}^2} \left( (x_i - \hat{x}_i)^2 + \frac{1}{\lambda_i}(y_i - \hat{y}_i)^2 \right)$$

In this form, the term inside the parentheses defines the *shape* of the error ellipse for each point, while the factor outside ($1/\sigma_{x,i}^2$) only scales that point's contribution to the total sum. If every $\sigma_{x,i}$ and $\sigma_{y,i}$ is multiplied by the same constant, the entire objective function is multiplied by a constant, which does not move its minimum. What determines the fitted curve is therefore the relative weighting: between the x and y deviations at each point, and between one point and another. (The absolute scale still matters for goodness-of-fit statistics such as the reduced chi-squared.)

This leads to a powerful insight: **you do not necessarily need to know the absolute values of the errors, as long as you know their ratio.**

To see the implications of the error ratio $\lambda$, consider a single term in the sum for one data point:

$$\text{Term}_i = \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$$

The algorithm wants to make this term small.

*   **Case 1: $\lambda_i \to \infty$ (Error in x is zero)**

    This happens when $\sigma_{x,i}^2 \to 0$. Let's look at the first part of the term: $\frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2}$.
    As the denominator $\sigma_{x,i}^2$ gets closer and closer to zero, this fraction will **explode to infinity** *unless* the numerator is also exactly zero. To prevent the total sum $S$ from becoming infinite, the algorithm has no choice but to force the numerator to be zero.
    
    It **must** set $\hat{x}_i = x_i$.
    
    With this constraint, the first part of the term becomes zero. What's left to minimize?

    $$\text{Term}_i = 0 + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$$

    Since the fitted point must lie on the curve, we know $\hat{y}_i = f(\hat{x}_i)$. And because we were forced to set $\hat{x}_i = x_i$, this means $\hat{y}_i = f(x_i)$. Substituting this in, the total objective function becomes:

    $$S = \sum_{i=1}^{n} \frac{(y_i - f(x_i))^2}{\sigma_{y,i}^2}$$

    This is the **exact formula for Weighted Least Squares (WLS)**. ODR has transformed into WLS because the infinite penalty on any horizontal deviation left it with no freedom to adjust the x-positions.

*   **Case 2: $\lambda_i \to 0$ (Error in y is zero)**

    This is the symmetrical opposite. This happens when $\sigma_{y,i}^2 \to 0$. The second part of the term, $\frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$, will explode unless the algorithm forces $\hat{y}_i = y_i$. Because the fitted point must lie on the curve, $\hat{x}_i$ is then the x-value at which the curve reaches $y_i$, that is, $f(\hat{x}_i) = y_i$. The objective function reduces to minimizing only the horizontal weighted distances:

    $$S = \sum_{i=1}^{n} \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2}$$

    This is equivalent to performing WLS, but with the roles of x and y swapped.

*   **Case 3: $\lambda_i = 1$ (Errors are equal)**

    This means $\sigma_{x,i}^2 = \sigma_{y,i}^2$. Let's call this common variance $\sigma_i^2$. The objective function becomes:

    $$S = \sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_i^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_i^2} \right)$$

    We can factor out the denominator:

    $$S = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \left( (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right)$$

    By the Pythagorean theorem, the term $(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2$ is simply the **squared geometric (Euclidean) distance** between the observed point $(x_i, y_i)$ and the fitted point $(\hat{x}_i, \hat{y}_i)$. The overall objective function is therefore minimizing the weighted sum of these squared perpendicular distances. The weight for each point is its inverse variance $1/\sigma_i^2$, which gives more influence to the more certain data points.

    This becomes a simple (unweighted) sum of squared perpendicular distances only in the even more specific case where all errors are identical ($\sigma_i = \sigma$ for all points).

Therefore, if you know that "the measurement error in y is consistently twice as large as the error in x" ($\sigma_y = 2\sigma_x$), you can set $\lambda = (2\sigma_x)^2 / \sigma_x^2 = 4$ for all points and proceed with the ODR fit, even without knowing the exact value of $\sigma_x$. This makes ODR a flexible tool even when error information is incomplete.
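
For a straight-line model with a single $\lambda$ shared by all points, the fit has a closed form (commonly known as Deming regression), which makes the three cases easy to check. The sketch below is an illustration under that assumption, with made-up data; `deming_slope` and `centered_sums` are illustrative names. Note that the function takes only $\lambda$ as input and never the absolute values of $\sigma_x$ or $\sigma_y$.

```python
import math


def centered_sums(x, y):
    """Return the centered sums of squares and cross-products."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def deming_slope(x, y, lam):
    """Slope of the best-fit line for lam = sigma_y**2 / sigma_x**2.

    Assumes lam is the same for every point and that s_xy is non-zero.
    """
    s_xx, s_yy, s_xy = centered_sums(x, y)
    diff = s_yy - lam * s_xx
    return (diff + math.sqrt(diff**2 + 4.0 * lam * s_xy**2)) / (2.0 * s_xy)


# Made-up, roughly linear data.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]

s_xx, s_yy, s_xy = centered_sums(x, y)
ols_slope = s_xy / s_xx          # regression of y on x (all error in y)
inverse_slope = s_yy / s_xy      # regression of x on y, expressed as dy/dx

slope_large = deming_slope(x, y, 1e6)    # Case 1: error in x negligible
slope_small = deming_slope(x, y, 1e-6)   # Case 2: error in y negligible
slope_equal = deming_slope(x, y, 1.0)    # Case 3: equal errors
slope_four = deming_slope(x, y, 4.0)     # sigma_y = 2 * sigma_x

print("OLS slope:            ", ols_slope)
print("lambda = 1e6:         ", slope_large)
print("lambda = 4:           ", slope_four)
print("lambda = 1:           ", slope_equal)
print("lambda = 1e-6:        ", slope_small)
print("x-on-y slope (as dy/dx):", inverse_slope)
```

As $\lambda$ moves from very large to very small, the slope slides from the ordinary y-on-x regression to the x-on-y regression, with the equal-error fit in between.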

## ODR with Non-linear Functions

ODR is especially powerful for non-linear regression because it can handle cases where both variables have errors and the relationship isn't a straight line. The process involves an iterative numerical optimization algorithm to find the parameters of the non-linear function that minimize the objective function described above.

For your specific case with a non-linear function and errors for both the x-axis and y-axis, you will need to:

1.  **Define your non-linear model function**, e.g., $y = f(x; \beta_1, \beta_2, ...)$ where $\beta_i$ are the parameters you want to estimate.
2.  **Provide your data** ($x_i, y_i$).
3.  **Specify the errors** or weights for both variables.
4.  **Use an ODR-specific software library or package** (e.g., `scipy.odr` in Python) to perform the regression. The solver iteratively adjusts the parameters of your function until the weighted sum of squared distances is minimized. `scipy.odr` wraps the ODRPACK library, which uses a trust-region Levenberg-Marquardt algorithm.

The output will be the best-fit parameters for your non-linear function, along with their standard errors, which can be used to assess the uncertainty of the estimates.
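
The four steps above can be sketched with `scipy.odr` as follows. This is a minimal illustration, not a recommended workflow: the exponential model, the data, the error values, and the starting guess `beta0` are all made up, and small fixed offsets stand in for real measurement noise so that the example is reproducible.

```python
import numpy as np
from scipy import odr


# Step 1: the model. scipy.odr passes the parameter vector first, then x.
def exponential(beta, x):
    return beta[0] * np.exp(beta[1] * x)


# Step 2: made-up data near y = 2 * exp(0.5 * x), with fixed offsets
# standing in for measurement errors in both coordinates.
x_true = np.linspace(0.0, 4.0, 9)
dx = np.array([0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.02, -0.01, 0.005])
dy = np.array([-0.05, 0.04, -0.03, 0.05, -0.04, 0.03, -0.05, 0.04, -0.02])
x_obs = x_true + dx
y_obs = 2.0 * np.exp(0.5 * x_true) + dy

# Step 3: assumed standard deviations of the x- and y-errors.
sigma_x = np.full_like(x_obs, 0.02)
sigma_y = np.full_like(y_obs, 0.05)

# Step 4: run the fit from a rough initial guess.
data = odr.RealData(x_obs, y_obs, sx=sigma_x, sy=sigma_y)
model = odr.Model(exponential)
solver = odr.ODR(data, model, beta0=[1.0, 0.3])
result = solver.run()

print("parameters:       ", result.beta)
print("standard errors:  ", result.sd_beta)
print("residual variance:", result.res_var)
```

Like any iterative non-linear fit, the result can depend on the starting guess, so it is worth trying a few values of `beta0` if the fit looks implausible.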

## When Should You Choose ODR over OLS?

The choice between Ordinary Least Squares (OLS) and Orthogonal Distance Regression (ODR) depends entirely on the nature of the errors in your data. Since ODR is more complex and computationally intensive, it's important to know when its use is truly justified.

**Use Ordinary Least Squares (OLS) when:**

*   **Error in the independent variable (x) is negligible or zero.** This is the classic assumption of OLS. If you are regressing against a variable like time, or a concentration that you set with high precision, the error in x is likely insignificant compared to the measurement error in y.
*   **The error in x is significantly smaller than the error in y.** As a rule of thumb, if the standard deviation of the y-error is more than 5 to 10 times larger than the standard deviation of the x-error (with the x-error converted to y-units by multiplying it by the local slope of the model), the results from OLS will be very close to those from ODR. In this case, the simplicity and speed of OLS make it the preferred choice.
*   **The primary goal is prediction.** If your main goal is to predict new y-values from new x-values (and you expect new data to have similar error properties), OLS provides an unbiased predictor for y given x, even if x has some error.

**Use Orthogonal Distance Regression (ODR) when:**

*   **Errors in both x and y are significant and of a similar magnitude.** This is the primary use case for ODR. Ignoring the x-errors in this scenario will lead to biased estimates of the model parameters, a phenomenon known as "regression dilution" or "attenuation".
*   **The choice of independent vs. dependent variable is arbitrary.** For example, if you are comparing two different instruments by measuring the same quantity with both, there is no physical reason to call one instrument's measurement "x" and the other "y". ODR treats both variables symmetrically, giving a result that is independent of this arbitrary choice. OLS would give a different best-fit line if you swapped the x and y axes.
*   **The model is highly non-linear.** On very steep or very flat sections of a curve, the vertical distance minimized by OLS can be a poor representation of the "true" distance from a data point to the model. The perpendicular distance minimized by ODR is often more geometrically stable and robust in these cases.

| Scenario | Recommended Method | Rationale |
| :--- | :--- | :--- |
| Error in x is negligible | **OLS** | The core assumption of OLS is met. |
| Error in x is much smaller than error in y | **OLS** | Results will be nearly identical to ODR, but OLS is simpler. |
| Errors in x and y are comparable | **ODR** | OLS will produce biased parameter estimates. |
| Variables are symmetric (e.g., comparing two methods) | **ODR** | The result should not depend on which variable is on which axis. |
| Highly non-linear model with errors in x | **ODR** | Orthogonal distance can be a more robust error metric. |
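
The symmetry argument can be checked numerically for a straight line. The sketch below assumes equal errors in both variables ($\lambda = 1$) and uses made-up data; the function names are illustrative. It fits y against x, then x against y, and compares the two lines by expressing both as a slope $dy/dx$.

```python
import math


def centered_sums(x, y):
    """Return the centered sums of squares and cross-products."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


def ols_slope(x, y):
    """Slope from minimizing vertical distances only."""
    s_xx, _, s_xy = centered_sums(x, y)
    return s_xy / s_xx


def orthogonal_slope(x, y):
    """Slope from minimizing perpendicular distances (equal errors)."""
    s_xx, s_yy, s_xy = centered_sums(x, y)
    diff = s_yy - s_xx
    return (diff + math.sqrt(diff**2 + 4.0 * s_xy**2)) / (2.0 * s_xy)


# Made-up paired measurements, e.g. from two instruments.
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [0.1, 0.9, 2.2, 2.8, 4.1]

# Fit b against a, then a against b. Inverting the second slope
# expresses both fits as db/da so they can be compared directly.
ols_ab = ols_slope(a, b)
ols_ba_inverted = 1.0 / ols_slope(b, a)
orth_ab = orthogonal_slope(a, b)
orth_ba_inverted = 1.0 / orthogonal_slope(b, a)

print("OLS, b on a:               ", ols_ab)
print("OLS, a on b (inverted):    ", ols_ba_inverted)
print("Orthogonal, b on a:        ", orth_ab)
print("Orthogonal, a on b (inv.): ", orth_ba_inverted)
```

The two OLS lines differ, while the two orthogonal fits describe the same line regardless of which instrument is placed on which axis.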

## Assessing Goodness of Fit

Assessing the goodness of fit for Orthogonal Distance Regression (ODR), especially with non-linear models, is more complex than with Ordinary Least Squares (OLS). Since ODR accounts for errors in both x and y variables, traditional metrics like R-squared can be misleading or inappropriate.

The most suitable and standard goodness-of-fit parameter for ODR is the **reduced chi-squared ($\chi_{red}^2$)** statistic.

### Reduced Chi-Squared ($\chi_{red}^2$)

The reduced chi-squared is a powerful and statistically rigorous measure for ODR because it directly incorporates the uncertainties (weights) of both the x and y data points. It is defined as the sum of squared weighted residuals divided by the number of degrees of freedom.

$$\chi_{red}^2 = \frac{\sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2} \right)}{n - p}$$

* $n$ is the number of data points.
* $p$ is the number of parameters in the model.
* $\sigma_{x,i}$ and $\sigma_{y,i}$ are the known standard deviations of the errors for each data point $(x_i, y_i)$.

#### Interpretation of Reduced Chi-Squared

The value of $\chi_{red}^2$ tells you how well the model fits the data relative to the expected measurement errors.

* A value of **$\chi_{red}^2 \approx 1$** indicates an excellent fit. The model's residuals are consistent with the measurement uncertainties you provided. The observed scatter is what you'd expect.
* A value of **$\chi_{red}^2 \gg 1$** (significantly greater than 1) suggests a poor fit. This means the model does not adequately describe the data, or you have underestimated your measurement uncertainties. The observed scatter is much larger than expected.
* A value of **$\chi_{red}^2 \ll 1$** (significantly less than 1) may indicate that the model is "overfitting" the data, or that you have overestimated your measurement uncertainties. The observed scatter is smaller than expected.
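
A minimal sketch of the calculation, assuming the adjusted points $(\hat{x}_i, \hat{y}_i)$ are already available from a fit. The numbers are made up (the adjusted points are simply placed on a hypothetical line $y = 2x$ with two parameters), and `reduced_chi_squared` is an illustrative name.

```python
def reduced_chi_squared(x, y, x_hat, y_hat, sigma_x, sigma_y, n_params):
    """Weighted sum of squared residuals divided by (n - p)."""
    dof = len(x) - n_params
    if dof <= 0:
        raise ValueError("need more data points than model parameters")
    chi2 = sum(
        ((xi - xh) / sx) ** 2 + ((yi - yh) / sy) ** 2
        for xi, yi, xh, yh, sx, sy in zip(x, y, x_hat, y_hat, sigma_x, sigma_y)
    )
    return chi2 / dof


# Made-up observations and hypothetical adjusted points on y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
x_hat = [1.02, 1.98, 3.05, 4.03, 4.94]
y_hat = [2.04, 3.96, 6.10, 8.06, 9.88]

# Assumed standard deviations of the errors.
sigma_x = [0.1] * 5
sigma_y = [0.2] * 5

chi2_red = reduced_chi_squared(x, y, x_hat, y_hat, sigma_x, sigma_y, n_params=2)

# Claiming errors half as large makes the same residuals look four
# times worse, which is how underestimated errors inflate the statistic.
chi2_red_half = reduced_chi_squared(
    x, y, x_hat, y_hat,
    [s / 2 for s in sigma_x], [s / 2 for s in sigma_y],
    n_params=2,
)

print("reduced chi-squared:                  ", chi2_red)
print("reduced chi-squared with halved errors:", chi2_red_half)
```

Because the statistic scales with $1/\sigma^2$, it is only as trustworthy as the error estimates that go into it.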

---

### R-squared and ODR

While R-squared is a common metric for OLS, its application to ODR is problematic. R-squared is based on the idea of explaining the variance in the y-variable, assuming the x-variable is known without error. This assumption is explicitly violated in ODR. A pseudo R-squared value can be calculated for ODR, but it often lacks a clear statistical interpretation and, unlike the ODR fit itself, can depend on which variable is treated as x and which as y. For these reasons, **R-squared and adjusted R-squared are generally not recommended** for evaluating ODR results.

### Non-linear Regression and Other Parameters

For both linear and non-linear regression, other metrics and methods are valuable for assessing the fit:

* **Residual Plots:** These are crucial. Plotting the residuals (the distances from the points to the fitted curve) against the fitted values can reveal systematic patterns (e.g., a "U" shape) that indicate a poor model choice, even if other statistical parameters look good.
* **Standard Errors of Parameters:** The standard errors of the fitted parameters provide a measure of their uncertainty. If these are very large, it suggests that the data are not sufficient to accurately determine the model's parameters.
* **Confidence and Prediction Bands:** Visualizing the confidence and prediction bands around the fitted curve is an excellent way to see the range of uncertainty in your model and its predictions.