# Orthogonal Distance Regression

Orthogonal Distance Regression (ODR) is a regression method that finds the best-fit line or curve by minimizing the sum of the squared **orthogonal distances** from the data points to the model. Unlike ordinary least squares (OLS) regression, which minimizes the sum of squared vertical distances (errors in the y-direction only), ODR accounts for errors in **both the x and y variables**.

This makes ODR particularly useful when both independent and dependent variables have measurement errors. The "orthogonal" part refers to the fact that the distances are measured perpendicular to the fitted curve, not vertically or horizontally.

## General Principles of ODR

In a typical OLS regression, you assume that the independent variable $x$ is known without error, and all the error is in the dependent variable $y$. The goal is to minimize the sum of squared vertical distances, that is, the squared differences between each observed value $y_i$ and the model's prediction $f(x_i)$.

ODR, however, treats both $x$ and $y$ as having errors. The method involves finding a point $(\hat{x}_i, \hat{y}_i)$ on the curve $y = f(x)$ that is closest to the observed data point $(x_i, y_i)$. The "distance" that ODR minimizes is the perpendicular distance between the observed point $(x_i, y_i)$ and the point on the curve $(\hat{x}_i, \hat{y}_i)$.

The objective function that ODR minimizes is:

$$S = \sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2} \right)$$

where:
* $(x_i, y_i)$ are the observed data points.
* $(\hat{x}_i, \hat{y}_i)$ are the corresponding adjusted points that lie on the fitted curve.
* $\sigma_{x,i}$ and $\sigma_{y,i}$ are the known **standard deviations** of the errors for the $i$-th data point in the x and y directions, respectively.
* The terms $\frac{1}{\sigma_{x,i}^2}$ and $\frac{1}{\sigma_{y,i}^2}$ are the statistical weights. This is the standard practice of weighting by the inverse variance, which gives more influence to data points with smaller uncertainties.
* $S$ is the final value to be minimized, representing the sum of the squared weighted distances between the observed points and the curve.

The core idea is that the regression should be more influenced by data points with smaller uncertainties and less by those with larger uncertainties.

```{note}
It is important to note that this form of the objective function assumes that the errors in the x and y variables are **uncorrelated**. If the errors were correlated, the objective function would require a more complex formulation involving a covariance matrix for each data point.
```
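
As a minimal sketch of how this objective is evaluated, consider the special case of a straight-line model $y = a + bx$. This case is chosen here only because the adjusted point then has a closed form; for a general curve it must be found numerically. The helper name `odr_objective_line` and the sample numbers are illustrative assumptions, not part of any library.

```python
import numpy as np

def odr_objective_line(a, b, x, y, sx, sy):
    """Evaluate S for the line y = a + b*x.

    Returns S together with the adjusted points (x_hat, y_hat) that
    minimize each term of the sum for the given a and b.
    """
    x, y, sx, sy = map(np.asarray, (x, y, sx, sy))
    r = y - (a + b * x)                  # vertical residual at the observed x
    denom = sy**2 + b**2 * sx**2         # combined variance of that residual
    x_hat = x + b * sx**2 * r / denom    # weighted closest point on the line
    y_hat = a + b * x_hat
    S = np.sum((x - x_hat)**2 / sx**2 + (y - y_hat)**2 / sy**2)
    return S, x_hat, y_hat

# Made-up observations with per-point standard deviations.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
sx = np.array([0.1, 0.1, 0.2, 0.2])
sy = np.array([0.2, 0.2, 0.3, 0.3])

S, x_hat, y_hat = odr_objective_line(0.0, 2.0, x, y, sx, sy)
print("S =", S)
```

For a line, each term of $S$ reduces to $r_i^2 / (\sigma_{y,i}^2 + b^2 \sigma_{x,i}^2)$ with $r_i = y_i - a - b x_i$, which is a convenient way to check the value returned above.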

## The Two Roles of Error in ODR

A careful look at the ODR objective function reveals that the error terms play two distinct and crucial roles in determining the final fit. To see this, we can rewrite the objective function by defining the **error variance ratio**, $\lambda_i$, for each data point:

$$\lambda_i = \frac{\sigma_{y,i}^2}{\sigma_{x,i}^2}$$

The objective function can then be expressed as:

$$S = \sum_{i=1}^{n} \frac{1}{\sigma_{x,i}^2} \left( (x_i - \hat{x}_i)^2 + \frac{\sigma_{x,i}^2}{\sigma_{y,i}^2}(y_i - \hat{y}_i)^2 \right) = \sum_{i=1}^{n} \frac{1}{\sigma_{x,i}^2} \left( (x_i - \hat{x}_i)^2 + \frac{1}{\lambda_i}(y_i - \hat{y}_i)^2 \right)$$

This form clearly separates the two roles:

**1. Intra-Point Geometry (The Error Ratio $\lambda_i$)**

The term inside the parenthesis, $\left( (x_i - \hat{x}_i)^2 + \frac{1}{\lambda_i}(y_i - \hat{y}_i)^2 \right)$, is controlled by the ratio $\lambda_i$. This ratio determines the **geometry of the error** for a single point `i`. It sets the relative "cost" of a deviation in the x-direction versus a deviation in the y-direction.

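*   If $\lambda_i$ is large (y-error >> x-error), the term $\frac{1}{\lambda_i}$ is small, making it "cheaper" for the fit to accommodate deviations in y.
*   If $\lambda_i$ is small (x-error >> y-error), the term $\frac{1}{\lambda_i}$ is large, so deviations in y are penalized heavily and it becomes "cheaper" to accommodate deviations in x.
*   If $\lambda_i = 1$ (x-error = y-error), the penalties are equal, and the error geometry is circular, resulting in a truly perpendicular distance.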

In essence, $\lambda_i$ tells the algorithm the optimal **direction** from the data point to the curve.
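
The effect of $\lambda_i$ on the direction can be illustrated with a small sketch for a straight line $y = a + bx$, where the adjusted point is available in closed form. The function name `displacement_to_line` and the numbers are hypothetical; the example only checks whether the displacement from the data point to its adjusted point is perpendicular to the line.

```python
import numpy as np

def displacement_to_line(x, y, a, b, sx, sy):
    """Vector from an observed point to its adjusted point on y = a + b*x."""
    r = y - (a + b * x)
    denom = sy**2 + b**2 * sx**2
    dx = b * sx**2 * r / denom           # x_hat - x
    dy = -sy**2 * r / denom              # y_hat - y
    return np.array([dx, dy])

a, b = 0.0, 2.0
line_direction = np.array([1.0, b])

for sx, sy in [(0.5, 0.5), (0.1, 0.5), (0.5, 0.1)]:
    lam = sy**2 / sx**2
    d = displacement_to_line(1.0, 3.0, a, b, sx, sy)
    # A zero dot product means the displacement is perpendicular to the line.
    print(f"lambda = {lam:6.2f}, displacement = {d}, dot = {d @ line_direction:+.3f}")
```

The displacement is proportional to $(b, -\lambda_i)$, so it is perpendicular to the line direction $(1, b)$ only when $\lambda_i = 1$. For large $\lambda_i$ it becomes nearly vertical, and for small $\lambda_i$ nearly horizontal.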

**2. Inter-Point Weighting (The Absolute Error Variance)**

The term outside the parenthesis, $\frac{1}{\sigma_{x,i}^2}$, acts as the **overall weight for point `i`** within the sum. This term dictates the **influence of point `i` relative to all other points**.

*   A point with small absolute errors (e.g., a small $\sigma_{x,i}$) will have a large weight ($1/\sigma_{x,i}^2$), making it a "strong magnet." The algorithm will work much harder to minimize the distance for this point, as it contributes significantly to the total sum $S$.
*   A point with large absolute errors will have a small weight, acting as a "weak magnet" with less influence on the final position of the curve.

Therefore, the absolute magnitude of the errors is critical for determining which points the regression should prioritize.

**When Does Only the Ratio Matter? The Homoscedastic Case**

The idea that you only need the error ratio holds **only under a specific, common assumption: that the error magnitudes are constant for all data points** (homoscedastic errors).

Let's assume that $\sigma_{x,i} = \sigma_x$ and $\sigma_{y,i} = \sigma_y$ for all points `i`. In this case:

1.  The ratio $\lambda_i$ becomes a single constant for the entire dataset: $\lambda = \sigma_y^2 / \sigma_x^2$.
2.  The weighting term $\frac{1}{\sigma_{x,i}^2}$ also becomes a single constant: $\frac{1}{\sigma_x^2}$.

Now, the objective function simplifies:

$$S = \sum_{i=1}^{n} \frac{1}{\sigma_x^2} \left( (x_i - \hat{x}_i)^2 + \frac{1}{\lambda}(y_i - \hat{y}_i)^2 \right)$$

Since $\frac{1}{\sigma_x^2}$ is now a constant multiplier for the *entire sum*, we can factor it out:

$$S = \frac{1}{\sigma_x^2} \sum_{i=1}^{n} \left( (x_i - \hat{x}_i)^2 + \frac{1}{\lambda}(y_i - \hat{y}_i)^2 \right)$$

Multiplying the entire function by a constant does not change the location of its minimum. Therefore, if you can assume that your measurement errors are consistent across all your data points, then you **do not need to know their absolute values to find the best-fit curve; you only need to know their ratio, $\lambda$.** This is a very common scenario in practice, for instance, when all measurements are made using the same instrument under the same conditions.
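
This scale invariance can be checked numerically. The sketch below is an assumption-laden illustration rather than a reference implementation: it fits a straight line $y = a + bx$ with `scipy.optimize.least_squares`, using the fact that for a line each term of $S$ reduces to $(y_i - a - b x_i)^2 / (\sigma_y^2 + b^2 \sigma_x^2)$. The data are synthetic and the function name `line_residuals` is made up for this example.

```python
import numpy as np
from scipy.optimize import least_squares

def line_residuals(params, x, y, sx, sy):
    """Square roots of the per-point ODR terms for the line y = a + b*x."""
    a, b = params
    return (y - (a + b * x)) / np.sqrt(sy**2 + b**2 * sx**2)

# Synthetic data with constant errors in both variables.
rng = np.random.default_rng(0)
x_true = np.linspace(0.0, 10.0, 25)
x = x_true + rng.normal(0.0, 0.3, x_true.size)
y = 1.0 + 2.0 * x_true + rng.normal(0.0, 0.6, x_true.size)

fits = []
for scale in (1.0, 10.0):
    # Same ratio lambda = (0.6 / 0.3)**2 = 4, different absolute magnitudes.
    sx, sy = 0.3 * scale, 0.6 * scale
    result = least_squares(line_residuals, x0=[0.0, 1.0], args=(x, y, sx, sy))
    fits.append(result.x)
    print(f"sigma_x = {sx}, sigma_y = {sy}: a = {result.x[0]:.4f}, b = {result.x[1]:.4f}")
```

Both runs should return the same intercept and slope up to solver tolerance, because the second objective is just the first one divided by 100. The parameter uncertainties, by contrast, do depend on the absolute error magnitudes.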

Now, let's consider the implications of the error ratio $\lambda$. Let's look at a single term in the sum for one data point:

$$\text{Term}_i = \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$$

The algorithm wants to make this term small.

*   **Case 1: $\lambda_i \to \infty$ (Error in x is zero)**

    This happens when $\sigma_{x,i}^2 \to 0$. Let's look at the first part of the term: $\frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2}$.
    As the denominator $\sigma_{x,i}^2$ gets closer and closer to zero, this fraction will **explode to infinity** *unless* the numerator is also exactly zero. To prevent the total sum $S$ from becoming infinite, the algorithm has no choice but to force the numerator to be zero.
    
    It **must** set $\hat{x}_i = x_i$.
    
    With this constraint, the first part of the term becomes zero. What's left to minimize?

    $$\text{Term}_i = 0 + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$$

    Since the fitted point must lie on the curve, we know $\hat{y}_i = f(\hat{x}_i)$. And because we were forced to set $\hat{x}_i = x_i$, this means $\hat{y}_i = f(x_i)$. Substituting this in, the total objective function becomes:

    $$S = \sum_{i=1}^{n} \frac{(y_i - f(x_i))^2}{\sigma_{y,i}^2}$$

    This is the **exact formula for Weighted Least Squares (WLS)**. ODR has transformed into WLS because the infinite penalty on any horizontal deviation left it with no freedom to adjust the x-positions.

*   **Case 2: $\lambda_i \to 0$ (Error in y is zero)**

    This is the symmetrical opposite. This happens when $\sigma_{y,i}^2 \to 0$. The second part of the term, $\frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2}$, will explode unless the algorithm forces $\hat{y}_i = y_i$. The objective function then reduces to minimizing only the horizontal weighted distances:

    $$S = \sum_{i=1}^{n} \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2}$$

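    This is equivalent to performing WLS with the roles of x and y swapped: because $\hat{y}_i = y_i$, the fitted point is $\hat{x}_i = f^{-1}(y_i)$ (assuming $f$ is invertible), so x is effectively regressed on y.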

*   **Case 3: $\lambda_i = 1$ (Errors are equal)**

    This means $\sigma_{x,i}^2 = \sigma_{y,i}^2$. Let's call this common variance $\sigma_i^2$. The objective function becomes:

    $$S = \sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_i^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_i^2} \right)$$

    We can factor out the denominator:

    $$S = \sum_{i=1}^{n} \frac{1}{\sigma_i^2} \left( (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right)$$

    By the Pythagorean theorem, the term $(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2$ is simply the **squared geometric (Euclidean) distance** between the observed point $(x_i, y_i)$ and the fitted point $(\hat{x}_i, \hat{y}_i)$. The overall objective function is therefore the weighted sum of these squared perpendicular distances. The weight for each point is its inverse variance $1/\sigma_i^2$, which gives more influence to the more certain data points.

    This becomes a simple (unweighted) sum of squared perpendicular distances only in the even more specific case where all errors are identical ($\sigma_i = \sigma$ for all points).
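
The three cases can be illustrated for a straight line with constant errors, where the fit has a closed form known as Deming regression. The sketch below is a minimal illustration under those assumptions (linear model, homoscedastic errors, synthetic data); `deming_slope` is a name chosen for this example.

```python
import numpy as np

def deming_slope(x, y, lam):
    """Best-fit slope for constant errors with lam = sigma_y^2 / sigma_x^2."""
    sxx = np.var(x)
    syy = np.var(y)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    diff = syy - lam * sxx
    return (diff + np.sqrt(diff**2 + 4.0 * lam * sxy**2)) / (2.0 * sxy)

# Synthetic data scattered around y = 1 + 2x.
rng = np.random.default_rng(1)
x_true = np.linspace(0.0, 10.0, 50)
x = x_true + rng.normal(0.0, 0.5, 50)
y = 1.0 + 2.0 * x_true + rng.normal(0.0, 0.5, 50)

sxy = np.mean((x - x.mean()) * (y - y.mean()))
slope_y_on_x = sxy / np.var(x)   # least squares of y on x (Case 1 limit)
slope_x_on_y = np.var(y) / sxy   # least squares of x on y, inverted (Case 2 limit)

print("lambda -> inf:", deming_slope(x, y, 1e8), "vs", slope_y_on_x)
print("lambda -> 0  :", deming_slope(x, y, 1e-8), "vs", slope_x_on_y)
print("lambda = 1   :", deming_slope(x, y, 1.0))
```

The slope for $\lambda = 1$ lies between the two limiting least-squares slopes, which matches the picture of ODR interpolating between regressing y on x and regressing x on y.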

## ODR with Non-linear Functions

ODR is especially powerful for non-linear regression because it can handle cases where both variables have errors and the relationship isn't a straight line. The process involves an iterative numerical optimization algorithm to find the parameters of the non-linear function that minimize the objective function described above.

For your specific case with a non-linear function and errors for both the x-axis and y-axis, you will need to:

1.  **Define your non-linear model function**, e.g., $y = f(x; \beta_1, \beta_2, ...)$ where $\beta_i$ are the parameters you want to estimate.
2.  **Provide your data** ($x_i, y_i$).
3.  **Specify the errors** or weights for both variables.
4.  **Use an ODR-specific software library or package** (e.g., `scipy.odr` in Python) to perform the regression. The solver will then iteratively adjust the parameters of your function until the objective function $S$ is minimized. `scipy.odr` wraps the ODRPACK library, which uses a trust-region Levenberg-Marquardt algorithm.

The output will be the best-fit parameters for your non-linear function, along with their standard errors, which can be used to assess the uncertainty of the estimates.
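
The four steps can be sketched with `scipy.odr` as follows. The exponential model, the synthetic data, the error magnitudes, and the initial guess `beta0` are all assumptions made for illustration; replace them with your own model and measurements.

```python
import numpy as np
from scipy import odr

def exponential(beta, x):
    """Model y = beta[0] * exp(beta[1] * x); scipy.odr passes beta first."""
    return beta[0] * np.exp(beta[1] * x)

# Synthetic data with known errors in both variables.
rng = np.random.default_rng(42)
x_true = np.linspace(0.0, 4.0, 30)
sx = np.full(x_true.size, 0.05)
sy = np.full(x_true.size, 0.2)
x = x_true + rng.normal(0.0, sx)
y = exponential([2.0, 0.5], x_true) + rng.normal(0.0, sy)

model = odr.Model(exponential)                    # step 1: model function
data = odr.RealData(x, y, sx=sx, sy=sy)           # steps 2 and 3: data and errors
solver = odr.ODR(data, model, beta0=[1.0, 0.3])   # step 4: solver with initial guess
output = solver.run()

print("beta    =", output.beta)
print("sd_beta =", output.sd_beta)
```

`RealData` accepts standard deviations through `sx` and `sy` and converts them to inverse-variance weights internally. As with any iterative non-linear fit, the result can depend on the initial guess, so it is worth trying a few starting values.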

## When Should You Choose ODR over OLS?

The choice between Ordinary Least Squares (OLS) and Orthogonal Distance Regression (ODR) depends entirely on the nature of the errors in your data. Since ODR is more complex and computationally intensive, it's important to know when its use is truly justified.

**Use Ordinary Least Squares (OLS) when:**

*   **Error in the independent variable (x) is negligible or zero.** This is the classic assumption of OLS. If you are regressing against a variable like time, or a concentration that you set with high precision, the error in x is likely insignificant compared to the measurement error in y.
*   **The error in x is significantly smaller than the error in y.** As a rule of thumb, if the standard deviation of the y-error is more than 5 to 10 times larger than the standard deviation of the x-error (with the x-error expressed in y-units, i.e. multiplied by the local slope $|f'(x)|$), the results from OLS will be very close to those from ODR. In this case, the simplicity and speed of OLS make it the preferred choice.
*   **The primary goal is prediction.** If your main goal is to predict new y-values from new x-values (and you expect new data to have similar error properties), OLS provides an unbiased predictor for y given x, even if x has some error.

**Use Orthogonal Distance Regression (ODR) when:**

*   **Errors in both x and y are significant and of a similar magnitude.** This is the primary use case for ODR. Ignoring significant errors in the independent variable leads to a bias known as regression dilution, where the estimated slope of the relationship is systematically underestimated (biased towards zero).
*   **The choice of independent vs. dependent variable is arbitrary.** For example, if you are comparing two different instruments by measuring the same quantity with both, there is no physical reason to call one instrument's measurement "x" and the other "y". ODR treats both variables symmetrically, giving a result that is independent of this arbitrary choice. OLS would give a different best-fit line if you swapped the x and y axes.
*   **The model is highly non-linear.** On very steep or very flat sections of a curve, the vertical distance minimized by OLS can be a poor representation of the "true" distance from a data point to the model. The perpendicular distance minimized by ODR is often more geometrically stable and robust in these cases.

| Scenario | Recommended Method | Rationale |
| :--- | :--- | :--- |
| Error in x is negligible | **OLS** | The core assumption of OLS is met. |
| Error in x is much smaller than error in y | **OLS** | Results will be nearly identical to ODR, but OLS is simpler. |
| Errors in x and y are comparable | **ODR** | OLS will produce biased parameter estimates. |
| Variables are symmetric (e.g., comparing two methods) | **ODR** | The result should not depend on which variable is on which axis. |
| Highly non-linear model with errors in x | **ODR** | The perpendicular distance minimized by ODR is often more geometrically stable. OLS can suffer from **projection bias** on steep or flat sections of a curve, where the vertical distance is a poor approximation of the true geometric error. |
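
Regression dilution is easy to reproduce in a simulation. The sketch below uses synthetic data with comparable errors in x and y and compares the OLS slope with the closed-form straight-line ODR solution for constant errors (Deming regression). All numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, true_slope = 2000, 2.0
sigma_x, sigma_y = 2.0, 2.0

x_true = rng.uniform(0.0, 10.0, n)
x = x_true + rng.normal(0.0, sigma_x, n)                      # noisy x
y = 1.0 + true_slope * x_true + rng.normal(0.0, sigma_y, n)   # noisy y

sxx, syy = np.var(x), np.var(y)
sxy = np.mean((x - x.mean()) * (y - y.mean()))

# OLS ignores the error in x, so its slope is pulled towards zero.
ols_slope = sxy / sxx

# Deming regression with the known error variance ratio lambda.
lam = sigma_y**2 / sigma_x**2
diff = syy - lam * sxx
deming = (diff + np.sqrt(diff**2 + 4.0 * lam * sxy**2)) / (2.0 * sxy)

print(f"true slope   = {true_slope}")
print(f"OLS slope    = {ols_slope:.3f}")
print(f"Deming slope = {deming:.3f}")
```

With these settings the OLS slope is expected to be attenuated by roughly $\mathrm{Var}(x_{true}) / (\mathrm{Var}(x_{true}) + \sigma_x^2)$, while the errors-in-variables estimate stays close to the true slope.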

## How to Assess the Goodness of Fit for Orthogonal Distance Regression (ODR)

Once you have performed an ODR fit, it is crucial to evaluate how well the resulting model actually represents your data. Because ODR does not seek to minimize the simple vertical distances like OLS, some of the most common goodness-of-fit metrics, like R-squared, are not appropriate. Instead, we must turn to metrics that are consistent with the ODR objective function.

The primary methods for assessing an ODR fit rely on the final, minimized value of the objective function itself, residual analysis, and information criteria for model comparison.

### The Chi-Squared ($\chi^2$) and Reduced Chi-Squared ($\chi^2_{red}$) Statistics

The most natural and powerful metrics for ODR goodness of fit are derived directly from the value that the algorithm worked to minimize.

*   **The Chi-Squared Statistic ($\chi^2$)**

    The final, minimized value of the ODR objective function, $S$, is itself a chi-squared statistic.

    $$ \chi^2 = S_{min} = \sum_{i=1}^{n} \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2} \right) $$

    where $\hat{x}$ and $\hat{y}$ are fitted data points on the curve. This value represents the total weighted squared error of the fit. However, its absolute magnitude is difficult to interpret on its own, as it scales with the number of data points.

*   **The Reduced Chi-Squared Statistic ($\chi^2_{red}$) - The Best Indicator**

    To create a standardized and highly interpretable metric, we calculate the **reduced chi-squared statistic**, which is the final $\chi^2$ value divided by the **degrees of freedom** ($\nu$).

    The degrees of freedom are the number of data points ($n$) minus the number of parameters ($p$) estimated by the model: $\nu = n - p$.

    $$ \chi^2_{red} = \frac{\chi^2}{\nu} = \frac{S_{min}}{n - p} $$

    The reduced chi-squared value provides a powerful assessment of the fit quality under the assumption that your error estimates ($\sigma_{x,i}, \sigma_{y,i}$) are accurate:

    *   **$\chi^2_{red} \approx 1.0$**: This is the ideal outcome. It indicates that the model is a good fit to the data and that the measurement errors were likely estimated correctly. The observed scatter of the data points around the fitted curve is consistent with their reported error bars.
    *   **$\chi^2_{red} > 1.0$**: This suggests a poor fit. There are several possible reasons:
        1.  **The model is wrong:** The chosen function does not adequately describe the true physical relationship.
        2.  **The errors were underestimated:** Your reported error values ($\sigma_{x,i}, \sigma_{y,i}$) are too small, and the data is actually "noisier" than you thought.
        3.  **Outliers are present:** The dataset may contain one or more significant outliers. Because the chi-squared statistic is based on the sum of squared deviations, a few points that are very far from the fitted curve can disproportionately inflate the total value, leading to a high reduced chi-squared even if the model is appropriate for the rest of the data.
    *   **$\chi^2_{red} < 1.0$**: This indicates that the fit is "too good." The data points are, on average, closer to the curve than their error estimates would predict. This can mean:
        1.  **The errors were overestimated:** Your reported error values are too large.
        2.  **The model is overfitting the data:** This is a risk if the model is too complex relative to the number of data points.

    `scipy.odr` reports this value as `res_var` (residual variance). When the errors are supplied as standard deviations (`sx` and `sy`), so that the weights are inverse variances, `res_var` is equivalent to the reduced chi-squared.

See the notes on the reduced chi-squared in the non-linear case [here](goodness-of-fit-and-chi-squared.ipynb). The interpretation of the reduced chi-squared statistic is most rigorous for models that are linear in their parameters. For non-linear models, the statistical assumptions that guarantee the chi-squared distribution are not always met, and the exact number of degrees of freedom can be ambiguous. In such cases, the reduced chi-squared value remains a powerful and widely used heuristic for assessing the goodness of fit, but it should be interpreted with a degree of caution.
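
As a consistency check, $\chi^2$ and $\chi^2_{red}$ can be recomputed from a `scipy.odr` result and compared with the values the library reports. The sketch below uses a synthetic straight-line dataset (an assumption for illustration) and the `delta` and `eps` attributes of the output, which hold the estimated errors in x and y.

```python
import numpy as np
from scipy import odr

def line(beta, x):
    """Straight-line model y = beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x

# Synthetic data with known, constant errors in both variables.
rng = np.random.default_rng(3)
x_true = np.linspace(0.0, 10.0, 40)
sx = np.full(x_true.size, 0.2)
sy = np.full(x_true.size, 0.5)
x = x_true + rng.normal(0.0, sx)
y = line([1.0, 2.0], x_true) + rng.normal(0.0, sy)

data = odr.RealData(x, y, sx=sx, sy=sy)
output = odr.ODR(data, odr.Model(line), beta0=[0.0, 1.0]).run()

# chi^2 recomputed from the estimated errors in x (delta) and y (eps).
chi2 = np.sum((output.delta / sx)**2 + (output.eps / sy)**2)
dof = x.size - output.beta.size          # nu = n - p
chi2_red = chi2 / dof

print("chi2     :", chi2, "(sum_square:", output.sum_square, ")")
print("chi2_red :", chi2_red, "(res_var:", output.res_var, ")")
```

Because the simulated errors match the supplied `sx` and `sy`, the reduced chi-squared is expected to be of order 1 here, although any single random sample will scatter around that value.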

### R-Squared ($R^2$) and Adjusted R-Squared

While R-squared is a common metric for OLS, its application to ODR is problematic. R-squared is based on the idea of explaining the variance in the y-variable, assuming the x-variable is known without error. This assumption is explicitly violated in ODR. A pseudo R-squared value can be calculated for ODR, but it often lacks a clear statistical interpretation, and its value can depend on which variable is labeled x and which y. For these reasons, **R-squared and adjusted R-squared are generally not recommended** for evaluating ODR results.

### Information Criteria (AIC and BIC)

**Recommended for Model Comparison.**

The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) are excellent tools for **comparing different models** fit to the same dataset. They do not tell you if a single model is a "good fit" in an absolute sense, but they can tell you which model is better relative to others.

Both criteria work by balancing model fit with model complexity (the number of parameters, $p$).

When comparing two or more models, the one with the **lower AIC or BIC value is considered better**. It provides the most explanatory power for the least complexity.

For the formulas for AIC and BIC, see [here](aic-and-bic.ipynb). The maximum log-likelihood for ODR, which is needed to calculate AIC and BIC, is derived in the "Log-Likelihood for ODR" section below.

### Parameter Standard Errors

The ODR output provides the best-fit parameters along with their standard errors (`sd_beta` in `scipy.odr`). A large standard error relative to the parameter value indicates that the parameter is not well constrained by the data, which can be a sign of a poor model or insufficient data.
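
The following sketch shows one way to inspect the parameter uncertainties of a `scipy.odr` fit. The straight-line model and the synthetic data are assumptions for illustration. The SciPy documentation for `Output` notes that `cov_beta` is not scaled by the residual variance `res_var`, whereas `sd_beta` is, which the last lines make explicit.

```python
import numpy as np
from scipy import odr

def line(beta, x):
    """Straight-line model y = beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x

# Synthetic data with known, constant errors in both variables.
rng = np.random.default_rng(11)
x_true = np.linspace(0.0, 10.0, 40)
sx = np.full(x_true.size, 0.2)
sy = np.full(x_true.size, 0.5)
x = x_true + rng.normal(0.0, sx)
y = line([1.0, 2.0], x_true) + rng.normal(0.0, sy)

data = odr.RealData(x, y, sx=sx, sy=sy)
output = odr.ODR(data, odr.Model(line), beta0=[0.0, 1.0]).run()

# Relative uncertainty of each parameter.
relative_error = output.sd_beta / np.abs(output.beta)
print("beta           :", output.beta)
print("sd_beta        :", output.sd_beta)
print("relative error :", relative_error)

# sd_beta includes the res_var scaling; cov_beta does not.
scaled = np.sqrt(np.diag(output.cov_beta) * output.res_var)
unscaled = np.sqrt(np.diag(output.cov_beta))
print("sqrt(diag(cov_beta) * res_var):", scaled)
print("sqrt(diag(cov_beta))          :", unscaled)
```

Which of the two is appropriate depends on how much you trust the supplied error bars: the scaled version effectively rescales them so that $\chi^2_{red} = 1$, while the unscaled version takes them as absolute.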

### Summary

| Metric | Applicability to ODR | Purpose & Interpretation |
| :--- | :--- | :--- |
| **Reduced Chi-Squared ($\chi^2_{red}$)** | **Highly Recommended** | **Primary indicator of fit quality.** A value near 1.0 indicates a good fit where the model and error estimates are consistent. |
| **Chi-Squared ($\chi^2$)** | **Recommended** | The raw, minimized sum of weighted squared errors. Used to calculate $\chi^2_{red}$ and log-likelihood for AIC and BIC. |
| **AIC / BIC** | **Recommended** | **Model comparison.** Helps select the best model from a set of candidates. Lower values are better. |
| **Parameter Standard Errors** | **Essential** | **Assesses the certainty of the fit parameters.** Large errors indicate poorly determined parameters. |
| **R-Squared ($R^2$)** | **Not Recommended** | Fundamentally inconsistent with the ODR objective function. Provides a misleading measure of fit. |

## Log-Likelihood for ODR

For Orthogonal Distance Regression (ODR), the derivation of the log-likelihood function differs from Ordinary Least Squares (OLS) because ODR accounts for measurement errors in both the independent (x) and dependent (y) variables. For more details on the basis of the log-likelihood, see [here](aic-and-bic.ipynb).

The maximum likelihood approach for ODR is based on the assumption that the errors in both x and y are independent and normally distributed. For each data point $(x_i, y_i)$, we assume there exists a "true" point $(\hat{x}_i, \hat{y}_i)$ on the model curve, $y = f(x; \beta)$, where $\beta$ is the vector of model parameters. The observed data are subject to measurement errors.

The model is defined by:

$$x_i = \hat{x}_i + \delta_i$$
$$y_i = \hat{y}_i + \epsilon_i$$

where $\delta_i$ and $\epsilon_i$ are the measurement errors in the x and y coordinates, respectively. 

We assume these errors are independent and follow normal distributions with zero means and known variances:

$$\delta_i \sim N(0, \sigma_{x,i}^2)$$
$$\epsilon_i \sim N(0, \sigma_{y,i}^2)$$

The likelihood function for a single data point $(x_i, y_i)$ is the joint probability density of the observed values, given the true values and model parameters. Assuming the errors are normally distributed, this is:

$$L_i = p(x_i, y_i \mid \hat{x}_i, \hat{y}_i) = \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \exp \left( -\frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} - \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right)$$

Since all observations are independent, the total likelihood function for a dataset of $n$ points is the product of the individual likelihoods:

$$L = \prod_{i=1}^n L_i = \prod_{i=1}^n \left[ \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \exp \left( -\frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} - \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right) \right]$$


$$L = \left[ \prod_{i=1}^n \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \right] \exp \left( -\sum_{i=1}^n \left( \frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right) \right)$$

The log-likelihood, which is easier to work with, is then:

$$\ln L = \ln \left[ \left[ \prod_{i=1}^n \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \right] \cdot \exp \left( -\sum_{i=1}^n \left( \frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right) \right) \right]$$

$$\ln L = \ln \left[ \prod_{i=1}^n \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \right] + \ln \exp \left( -\sum_{i=1}^n \left( \frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right) \right)$$

$$\ln L = \sum_{i=1}^n \ln \left( \frac{1}{2\pi \sigma_{x,i} \sigma_{y,i}} \right) - \sum_{i=1}^n \left( \frac{(x_i - \hat{x}_i)^2}{2\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y,i}^2} \right)$$

$$\ln L = - \sum_{i=1}^n \ln \left( 2\pi \sigma_{x,i} \sigma_{y,i} \right) - \frac{1}{2} \sum_{i=1}^n \left( \frac{(x_i - \hat{x}_i)^2}{\sigma_{x,i}^2} + \frac{(y_i - \hat{y}_i)^2}{\sigma_{y,i}^2} \right)$$

$$\ln L = - \sum_{i=1}^n \ln \left( 2\pi \sigma_{x,i} \sigma_{y,i} \right) - \frac{1}{2} \chi^2$$

$$\ln L = - \sum_{i=1}^n \ln \left( 2\pi \right) - \sum_{i=1}^n \ln \left(\sigma_{x,i} \right) - \sum_{i=1}^n \ln \left(\sigma_{y,i} \right) - \frac{1}{2} \chi^2$$

$$\ln L = - n \ln \left( 2\pi \right) - \sum_{i=1}^n \ln \left(\sigma_{x,i} \right) - \sum_{i=1}^n \ln \left(\sigma_{y,i} \right) - \frac{1}{2} \chi^2$$

Maximizing $L$ is equivalent to minimizing $\chi^2$ (which is exactly what ODR does), because all other terms in the equation above are known constants that do not depend on the model parameters.

If $\sigma_{x,i} = \sigma_x$ and $\sigma_{y,i} = \sigma_y$ for all $i$, the equation simplifies further:

$$\ln L = - n \ln \left( 2\pi \right) - n \ln \left(\sigma_x \right) - n \ln \left(\sigma_y \right) - \frac{1}{2} \chi^2$$
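
The final expression translates directly into code. The sketch below evaluates $\ln L$ from the observed and adjusted points and then applies the standard definitions $\mathrm{AIC} = 2p - 2\ln L$ and $\mathrm{BIC} = p \ln n - 2\ln L$. The function names and the sample numbers are illustrative, and counting only the $p$ model parameters (not the adjusted $\hat{x}_i$ values) is an assumption that follows the $\nu = n - p$ convention used for the reduced chi-squared.

```python
import numpy as np

def odr_log_likelihood(x, y, x_hat, y_hat, sx, sy):
    """ln L for independent normal errors in x and y with known sigmas."""
    chi2 = np.sum(((x - x_hat) / sx)**2 + ((y - y_hat) / sy)**2)
    n = x.size
    return (-n * np.log(2.0 * np.pi)
            - np.sum(np.log(sx)) - np.sum(np.log(sy))
            - 0.5 * chi2)

def aic_bic(log_l, n, p):
    """AIC = 2p - 2 ln L and BIC = p ln(n) - 2 ln L."""
    return 2.0 * p - 2.0 * log_l, p * np.log(n) - 2.0 * log_l

# Illustrative numbers: observed points and adjusted points on a fitted curve.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
x_hat = np.array([1.02, 1.98, 3.05, 3.96])
y_hat = np.array([2.04, 3.96, 6.10, 7.92])
sx = np.array([0.1, 0.1, 0.2, 0.2])
sy = np.array([0.2, 0.2, 0.3, 0.3])

log_l = odr_log_likelihood(x, y, x_hat, y_hat, sx, sy)
aic, bic = aic_bic(log_l, n=x.size, p=2)
print("ln L =", log_l)
print("AIC  =", aic)
print("BIC  =", bic)
```

When comparing candidate models fitted to the same data with the same error estimates, the $\sigma$ terms are identical for all models, so differences in AIC or BIC come only from $\chi^2$ and the number of parameters.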


## Additional Materials

* https://docs.scipy.org/doc/scipy/reference/odr.html#module-scipy.odr
* https://docs.scipy.org/doc/scipy/reference/generated/scipy.odr.Output.html#scipy.odr.Output
* https://stackoverflow.com/questions/21395328/how-to-estimate-goodness-of-fit-using-scipy-odr
* https://stats.stackexchange.com/questions/325581/odr-residual-variance-and-reduced-chi2-do-the-beta-uncertainties-represent-co
* https://stackoverflow.com/questions/41028846/how-to-compute-standard-error-from-odr-results