# CURVE_FIT

## Overview
The `CURVE_FIT` function fits a user-defined model function to data using non-linear least squares, leveraging SciPy's [`curve_fit` method](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html). It suits regression, parameter estimation, and curve fitting directly in Excel. If you want to fit several models to the same data, this can be faster than setting up the Excel Solver for each model. This is a simplified implementation; you can extend it to support additional features such as bounds, constraints, and more complex models.

Non-linear least squares fitting seeks to find parameters $\theta$ that minimize the sum of squared residuals:

```math
S(\theta) = \sum_{i=1}^n (y_i - f(x_i, \theta))^2
```

where $f(x_i, \theta)$ is the model function, $x_i$ are the input data, and $y_i$ are the observed values. SciPy's `curve_fit` uses algorithms such as Trust Region Reflective (TRF), Dogleg (dogbox), and Levenberg-Marquardt (LM) to solve this optimization problem.
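
As a minimal illustration of the objective above, the sketch below evaluates $S(\theta)$ for a straight-line model on three points. The helper name `sum_squared_residuals`, the model `line`, and the sample data are assumptions made for this example and are not part of `CURVE_FIT`.

```python
import numpy as np

def sum_squared_residuals(f, x, y, theta):
    """Return S(theta) = sum_i (y_i - f(x_i, theta))^2."""
    residuals = y - f(x, *theta)
    return float(np.sum(residuals ** 2))

def line(x, a, b):
    # Hypothetical model used only for this illustration.
    return a * x + b

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

s_exact = sum_squared_residuals(line, x, y, (2.0, 0.0))  # residuals are all zero
s_off = sum_squared_residuals(line, x, y, (1.0, 0.0))    # residuals are 1, 2, 3
```

A fitting routine searches for the parameters that make this quantity as small as possible; here `(2.0, 0.0)` reproduces the data exactly, so its sum of squared residuals is zero.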

This example function is provided as-is without any representation or warranty of accuracy.

## Usage
To use the function in Excel:

```excel
=CURVE_FIT(model, xdata, ydata, [p_zero])
```
  - `model` (string, required): Model function as a string, e.g., "a * x + b".
  - `xdata` (2D list, required): Input x values.
  - `ydata` (2D list, required): Observed y values.
  - `p_zero` (2D list, optional): Initial parameter guesses.

The function returns the fitted parameter values, rounded to two decimal places, as a single-row 2D list, or an error message string if the fit fails.

## Examples

### Fitting a Straight Line (y = a * x + b)

**Sample Input Data:**

| x | y |
|---|---|
| 1 | 2 |
| 2 | 4 |
| 3 | 6 |

```excel
=CURVE_FIT("a * x + b", {1;2;3}, {2;4;6}, {1,1})
```
**Sample Output:**

| a   | b   |
|-----|-----|
| 2.0 | 0.0 |
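
For reference, the same straight-line fit can be sketched directly against SciPy, outside Excel. This assumes the column inputs arrive as 2D lists, as described under Usage; the function name `line` and the variable `fitted_row` are illustrative only.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    # Python equivalent of the model string "a * x + b".
    return a * x + b

# Excel column ranges arrive as 2D lists; flatten them to 1D float arrays.
x = np.array([[1], [2], [3]], dtype=float).flatten()
y = np.array([[2], [4], [6]], dtype=float).flatten()

popt, pcov = curve_fit(line, x, y, p0=[1.0, 1.0])

# Single-row 2D list, the shape described for CURVE_FIT's return value.
fitted_row = [np.round(popt, 2).tolist()]
```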

### Fitting an Exponential Model (y = a * exp(b * x))

**Sample Input Data:**

| x | y  |
|---|----|
| 1 | 2.7|
| 2 | 7.4|
| 3 | 20.1|

```excel
=CURVE_FIT("a * exp(b * x)", {1;2;3}, {2.7;7.4;20.1}, {1,1})
```
**Sample Output:**

| a   | b   |
|-----|-----|
| 1.0 | 1.0 |

### Fitting a Decaying Exponential Model (y = a * exp(-b * x) + c)

**Sample Input Data:**

| x | y  |
|---|----|
| 0 | 3.4|
| 1 | 2.7|
| 2 | 1.6|
| 3 | 1.1|
| 4 | 0.7|
| 5 | 0.6|

```excel
=CURVE_FIT("a * exp(-b * x) + c", {0;1;2;3;4;5}, {3.4;2.7;1.6;1.1;0.7;0.6}, {2,1,0.5})
```
**Sample Output:**

| a   | b   | c   |
|-----|-----|-----|
| 2.0 | 1.0 | 0.5 |

## Limitations
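- The model string must use `x` as the independent variable. Every other identifier is treated as a parameter name (e.g., `a`, `b`), except the recognized function names `exp`, `log`, `sin`, `cos`, `tan`, `abs`, and `pow`.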
- The number of initial guesses (if provided) must match the number of parameters in the model.
- If the fit fails, an error message is returned as a string.
- The function does not expose SciPy's `method` argument (`trf`, `dogbox`, `lm`). Because no bounds are passed, SciPy uses its default for unconstrained problems, `lm`.
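
The parameter-name rule can be seen in a short sketch. It mirrors the identifier extraction used by the implementation in this notebook (a regular expression plus a fixed exclusion list); the helper name `extract_param_names` and the constant `RESERVED` are assumptions for illustration.

```python
import re

# Names the implementation treats as functions or as the independent variable.
RESERVED = ("x", "exp", "log", "sin", "cos", "tan", "abs", "pow")

def extract_param_names(model):
    """Return parameter names in order of first appearance."""
    names = re.findall(r"\b[a-zA-Z_]\w*\b", model)
    names = [n for n in names if n not in RESERVED]
    return list(dict.fromkeys(names))  # de-duplicate while keeping order

params = extract_param_names("a * exp(-b * x) + c")

# Any identifier outside the exclusion list, such as sqrt, counts as a parameter.
params_with_sqrt = extract_param_names("a * sqrt(x)")
```

The order of first appearance determines the order of the initial guesses and of the returned values, so a model written as `b * x + a` expects its guesses in the order `b`, `a`.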

In [None]:
import numpy as np
from scipy.optimize import curve_fit as scipy_curve_fit
import math
SAFE_GLOBALS = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
SAFE_GLOBALS["np"] = np
SAFE_GLOBALS["numpy"] = np
SAFE_GLOBALS["exp"] = np.exp
SAFE_GLOBALS["log"] = np.log
SAFE_GLOBALS["sin"] = np.sin
SAFE_GLOBALS["cos"] = np.cos
SAFE_GLOBALS["tan"] = np.tan
SAFE_GLOBALS["abs"] = abs
SAFE_GLOBALS["pow"] = pow

def curve_fit(model, xdata, ydata, p_zero=None):
    """
    Fits a model to data using scipy.optimize.curve_fit.

    Args:
        model (str): Model function as a string, e.g., "a * x + b"
        xdata (list[list[float]]): 2D list of x values
        ydata (list[list[float]]): 2D list of y values
        p_zero (list[list[float]], optional): 2D list of initial parameter guesses

    Returns:
        list[list[float]]: Fitted parameter values as a single row, or error message string

    This example function is provided as-is without any representation or warranty of accuracy.
    """
    try:
        x = np.array(xdata).flatten()
        y = np.array(ydata).flatten()
        import re
        param_names = re.findall(r'\b[a-zA-Z_]\w*\b', model)
        param_names = [name for name in param_names if name not in ("x", "exp", "log", "sin", "cos", "tan", "abs", "pow")]
        param_names = list(dict.fromkeys(param_names))
        n_params = len(param_names)
        if p_zero is not None:
            p_zero_arr = np.array(p_zero).flatten()
            if len(p_zero_arr) != n_params:
                return f"Number of initial guesses (p_zero) does not match number of parameters in model: {param_names}"
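        else:
            # SciPy cannot infer the parameter count from a *params signature,
            # so default every initial guess to 1 (SciPy's own default value).
            p_zero_arr = np.ones(n_params)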
        def model_func(x, *params):
            local_dict = dict(zip(param_names, params))
            local_dict["x"] = x
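            try:
                return eval(model, SAFE_GLOBALS, local_dict)
            except Exception as e:
                # Raise so the outer handler returns this message; returning a
                # string here would hand SciPy a non-numeric model output.
                raise ValueError(f"Model evaluation error: {e}") from e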
        popt, _ = scipy_curve_fit(model_func, x, y, p0=p_zero_arr, maxfev=10000)
        # Round fitted parameters to two decimal places
        popt_rounded = np.round(popt, 2)
        return [popt_rounded.tolist()]
    except Exception as e:
        return str(e)

In [None]:
%pip install -q ipytest
import ipytest
ipytest.autoconfig()

demo_cases = [
    ["a * x + b", [[1], [2], [3]], [[2.1], [3.8], [6.2]], [[1, 1]]],
    ["a * exp(b * x)", [[1], [2], [3]], [[2.5], [7.8], [19.5]], [[1, 1]]],
    ["a * exp(-b * x) + c", [[0], [1], [2], [3], [4], [5]], [[3.4], [2.7], [1.6], [1.1], [0.7], [0.6]], [[2, 1, 0.5]]]
]

def is_valid_type(val):
    if isinstance(val, (float, bool, str)):
        return True
    if isinstance(val, list):
        return all(isinstance(row, list) and all(isinstance(x, (float, bool, str)) for x in row) for row in val)
    return False

import pytest
@pytest.mark.parametrize("model, xdata, ydata, p_zero", demo_cases)
def test_demo_cases(model, xdata, ydata, p_zero):
    result = curve_fit(model, xdata, ydata, p_zero)
    print(f"test_demo_cases output for {model}: {result}")
    assert is_valid_type(result), f"Output type is not valid. Got: {type(result)} Value: {result}"

def test_invalid_param_count():
    result = curve_fit("a * x + b", [[1], [2], [3]], [[2], [4], [6]], [[1]])
    print(f"test_invalid_param_count output: {result}")
    assert isinstance(result, str) and "does not match" in result

ipytest.run('-s')

In [None]:
import gradio as gr
import matplotlib.pyplot as plt
import io
import base64

def curve_fit_with_plot(model, xdata, ydata, p_zero=None):
    result = curve_fit(model, xdata, ydata, p_zero)
    # Prepare plot
    fig, ax = plt.subplots()
    x = np.array(xdata).flatten()
    y = np.array(ydata).flatten()
    ax.scatter(x, y, label="Data", color="blue")
    if isinstance(result, list) and isinstance(result[0], list):
        # Try to plot fitted curve
        try:
            import re
            param_names = re.findall(r'\b[a-zA-Z_]\w*\b', model)
            param_names = [name for name in param_names if name not in ("x", "exp", "log", "sin", "cos", "tan", "abs", "pow")]
            param_names = list(dict.fromkeys(param_names))
            params = result[0]
            x_fit = np.linspace(np.min(x), np.max(x), 100)
            local_dict = dict(zip(param_names, params))
            local_dict["x"] = x_fit
            y_fit = eval(model, SAFE_GLOBALS, local_dict)
            ax.plot(x_fit, y_fit, label="Fitted", color="red")
            ax.legend()
        except Exception as e:
            ax.text(0.5, 0.5, f"Plot error: {e}", ha='center')
    else:
        ax.text(0.5, 0.5, "Fit failed", ha='center')
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # save this figure explicitly, not the global current one
    plt.close(fig)
    buf.seek(0)
    img_base64 = base64.b64encode(buf.read()).decode("utf-8")
    html = f'<img src="data:image/png;base64,{img_base64}" style="max-width:100%;">'
    return result, html

demo = gr.Interface(
    fn=curve_fit_with_plot,
    inputs=[
        gr.Textbox(label="Model (Python expression, e.g. a * x + b)", value=demo_cases[0][0]),
        gr.Dataframe(label="x data", headers=["x"], type="array", row_count=3, col_count=1, value=demo_cases[0][1]),
        gr.Dataframe(label="y data", headers=["y"], type="array", row_count=3, col_count=1, value=demo_cases[0][2]),
        gr.Dataframe(label="Initial parameter guesses (row, optional)", headers=["a", "b", "c"], type="array", row_count=1, col_count=3, value=demo_cases[0][3])
    ],
    outputs=[
        gr.Dataframe(label="Fitted Parameters (row)", headers=["a", "b", "c"], type="array"),
        gr.HTML(label="Fit Plot")
    ],
    examples=demo_cases,
    description="Fit a mathematical model to your data using non-linear least squares. Enter your model as a Python expression (e.g., `a * x + b`), provide your data, and set initial guesses for the parameters (optional). Use the demo examples below to see typical use cases.",
    flagging_mode="never",
    fill_width=True,
)
demo.launch()

# scipy.optimize.curve_fit — SciPy v1.15.3 Manual

**curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=None, bounds=(-inf, inf), method=None, jac=None, *, full_output=False, nan_policy=None, \*\*kwargs)**

Use non-linear least squares to fit a function, f, to data.

Assumes `ydata = f(xdata, *params) + eps`.

## Parameters
- **f : callable**
  The model function, f(x, …). It must take the independent variable as the first argument and the parameters to fit as separate remaining arguments.
- **xdata : array_like**
  The independent variable where the data is measured. Should usually be an M-length sequence or a (k,M)-shaped array for functions with k predictors, and each element should be float convertible if it is an array-like object.
- **ydata : array_like**
  The dependent data, a length M array - nominally `f(xdata, ...)`.
- **p0 : array_like, optional**
  Initial guess for the parameters (length N). If None, then the initial values will all be 1 (if the number of parameters for the function can be determined using introspection, otherwise a ValueError is raised).
- **sigma : None or scalar or M-length sequence or MxM array, optional**
  Determines the uncertainty in ydata. If we define residuals as `r = ydata - f(xdata, *popt)`, then the interpretation of sigma depends on its number of dimensions:
  - A scalar or 1-D sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is `chisq = sum((r / sigma) ** 2)`.
  - A 2-D sigma should contain the covariance matrix of errors in ydata. In this case, the optimized function is `chisq = r.T @ inv(sigma) @ r`.
  - None (default) is equivalent to a 1-D sigma filled with ones.
- **absolute_sigma : bool, optional**
  If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values. If False (default), only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Default is False. Mathematically, `pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(M-N)`
- **check_finite : bool, optional**
  If True, check that the input arrays do not contain NaNs or infs, and raise a ValueError if they do. Setting this parameter to False may silently produce nonsensical results if the input arrays do contain NaNs. Default is True if nan_policy is not specified explicitly and False otherwise.
- **bounds : 2-tuple of array_like or Bounds, optional**
  Lower and upper bounds on parameters. Defaults to no bounds. There are two ways to specify the bounds:
  - Instance of Bounds class.
  - 2-tuple of array_like: Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters). Use `np.inf` with an appropriate sign to disable bounds on all or some parameters.
- **method : {'lm', 'trf', 'dogbox'}, optional**
  Method to use for optimization. See least_squares for more details. Default is 'lm' for unconstrained problems and 'trf' if bounds are provided. The method 'lm' won’t work when the number of observations is less than the number of variables; use 'trf' or 'dogbox' in this case.
- **jac : callable, string or None, optional**
  Function with signature `jac(x, ...)` which computes the Jacobian matrix of the model function with respect to parameters as a dense array_like structure. It will be scaled according to provided sigma. If None (default), the Jacobian will be estimated numerically. String keywords for ‘trf’ and ‘dogbox’ methods can be used to select a finite difference scheme, see least_squares.
- **full_output : boolean, optional**
  If True, this function returns additional information: infodict, mesg, and ier.
- **nan_policy : {'raise', 'omit', None}, optional**
  Defines how to handle when input contains nan. The following options are available (default is None):
  - 'raise': throws an error
  - 'omit': performs the calculations ignoring nan values
  - None: no special handling of NaNs is performed (except what is done by check_finite); the behavior when NaNs are present is implementation-dependent and may change.
- **\*\*kwargs**
  Keyword arguments passed to leastsq for `method='lm'` or least_squares otherwise.
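
The `sigma` and `absolute_sigma` descriptions above can be checked numerically. The sketch below fits a straight line with per-point standard deviations and compares the two covariance conventions; the data values and the model name `line` are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    # Hypothetical model used only for this illustration.
    return a * x + b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 4.8, 7.2, 9.0])
sigma = np.array([0.1, 0.2, 0.1, 0.3, 0.2])  # 1-D sigma: standard deviations of y

popt, pcov_rel = curve_fit(line, x, y, sigma=sigma, absolute_sigma=False)
_, pcov_abs = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)

# chisq = sum((r / sigma) ** 2) with r = ydata - f(xdata, *popt)
r = y - line(x, *popt)
chisq = np.sum((r / sigma) ** 2)
M, N = x.size, popt.size

# Documented relation: pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq / (M - N)
scaled = pcov_abs * chisq / (M - N)
```

Under these assumptions `scaled` and `pcov_rel` should agree to within floating-point tolerance, which is a quick way to confirm which convention a given fit used.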

## Returns
- **popt : array**
  Optimal values for the parameters so that the sum of the squared residuals of `f(xdata, *popt) - ydata` is minimized.
- **pcov : 2-D array**
  The estimated approximate covariance of popt. The diagonals provide the variance of the parameter estimate. To compute one standard deviation errors on the parameters, use `perr = np.sqrt(np.diag(pcov))`. Note that the relationship between pcov and parameter error estimates is derived based on a linear approximation to the model function around the optimum [1]. When this approximation becomes inaccurate, pcov may not provide an accurate measure of uncertainty.

  How the sigma parameter affects the estimated covariance depends on absolute_sigma argument, as described above.

  If the Jacobian matrix at the solution doesn’t have full rank, the ‘lm’ method returns a matrix filled with `np.inf`, whereas the ‘trf’ and ‘dogbox’ methods use the Moore-Penrose pseudoinverse to compute the covariance matrix. Covariance matrices with large condition numbers (e.g. computed with numpy.linalg.cond) may indicate that results are unreliable.
- **infodict : dict (returned only if full_output is True)**
  a dictionary of optional outputs with the keys:
  - `nfev`: The number of function calls. Methods ‘trf’ and ‘dogbox’ do not count function calls for numerical Jacobian approximation, as opposed to ‘lm’ method.
  - `fvec`: The residual values evaluated at the solution, for a 1-D sigma this is `(f(x, *popt) - ydata)/sigma`.
  - `fjac`: A permutation of the R matrix of a QR factorization of the final approximate Jacobian matrix, stored column wise. Together with ipvt, the covariance of the estimate can be approximated. Method ‘lm’ only provides this information.
  - `ipvt`: An integer array of length N which defines a permutation matrix, p, such that fjac*p = q*r, where r is upper triangular with diagonal elements of nonincreasing magnitude. Column j of p is column ipvt(j) of the identity matrix. Method ‘lm’ only provides this information.
  - `qtf`: The vector (transpose(q) * fvec). Method ‘lm’ only provides this information.
- **mesg : str (returned only if full_output is True)**
  A string message giving information about the solution.
- **ier : int (returned only if full_output is True)**
  An integer flag. If it is equal to 1, 2, 3 or 4, the solution was found. Otherwise, the solution was not found. In either case, the optional output variable mesg gives more information.
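
A short sketch of how these return values are typically consumed: one-standard-deviation errors from `pcov`, and the extra diagnostics returned when `full_output=True`. It assumes a SciPy version that supports the `full_output` keyword; the data and the model name `line` are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    # Hypothetical model used only for this illustration.
    return a * x + b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

popt, pcov, infodict, mesg, ier = curve_fit(line, x, y, full_output=True)

perr = np.sqrt(np.diag(pcov))    # one standard deviation error per parameter
converged = ier in (1, 2, 3, 4)  # per the description of ier above
n_calls = infodict["nfev"]       # number of function calls
```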

## Raises
- **ValueError**
  if either ydata or xdata contain NaNs, or if incompatible options are used.
- **RuntimeError**
  if the least-squares minimization fails.
- **OptimizeWarning**
  if covariance of the parameters cannot be estimated.
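
The error cases above can be handled with ordinary exception handling. The sketch below triggers the `ValueError` raised for NaN input under the default `check_finite` behaviour, then filters the non-finite point by hand; the data and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    # Hypothetical model used only for this illustration.
    return a * x + b

x = np.array([1.0, 2.0, 3.0, 4.0])
y_bad = np.array([2.0, np.nan, 6.0, 8.0])  # NaN in ydata

try:
    curve_fit(line, x, y_bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Dropping non-finite points before fitting avoids the error.
mask = np.isfinite(y_bad)
popt, _ = curve_fit(line, x[mask], y_bad[mask])
```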

## See also
- **least_squares**: Minimize the sum of squares of nonlinear functions.
- **scipy.stats.linregress**: Calculate a linear least squares regression for two sets of measurements.

## Notes
Users should ensure that inputs xdata, ydata, and the output of f are `float64`, or else the optimization may return incorrect results.

With `method='lm'`, the algorithm uses the Levenberg-Marquardt algorithm through leastsq. Note that this algorithm can only deal with unconstrained problems.

Box constraints can be handled by methods ‘trf’ and ‘dogbox’. Refer to the docstring of least_squares for more information.

Parameters to be fitted must have similar scale. Differences of multiple orders of magnitude can lead to incorrect results. For the ‘trf’ and ‘dogbox’ methods, the x_scale keyword argument can be used to scale the parameters.

## References
[1] K. Vugrin et al. Confidence region estimation techniques for nonlinear regression in groundwater flow: Three case studies. Water Resources Research, Vol. 43, W03423, DOI:10.1029/2005WR004804

## Examples
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# Define the data to be fit with some noise:
xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
rng = np.random.default_rng()
y_noise = 0.2 * rng.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')

# Fit for the parameters a, b, c of the function func:
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)
plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))

# Constrain the optimization to the region of 0 <= a <= 3, 0 <= b <= 1 and 0 <= c <= 0.5:
popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 1., 0.5]))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

# For reliable results, the model func should not be overparameterized;
# redundant parameters can cause unreliable covariance matrices and, in some
# cases, poorer quality fits. As a quick check of whether the model may be
# overparameterized, calculate the condition number of the covariance matrix:
print(np.linalg.cond(pcov))

# Adding a fourth parameter d with the same effect as a makes the model
# overparameterized; compare the condition number and the diagonal of pcov:
def func2(x, a, b, c, d):
    return a * d * np.exp(-b * x) + c  # a and d are redundant
popt, pcov = curve_fit(func2, xdata, ydata)
print(np.linalg.cond(pcov))
print(np.diag(pcov))

# If the optimal parameters of func differ by multiple orders of magnitude,
# the resulting fit can be inaccurate. Sometimes, curve_fit can fail to find
# any results:
ydata = func(xdata, 500000, 0.01, 15)
try:
    popt, pcov = curve_fit(func, xdata, ydata, method='trf')
except RuntimeError as e:
    print(e)

# If the parameter scale is roughly known beforehand, it can be defined in
# the x_scale argument:
popt, pcov = curve_fit(func, xdata, ydata, method='trf', x_scale=[1000, 1, 1])
print(popt)
```

![curve_fit example plot](https://docs.scipy.org/doc/scipy/_images/scipy-optimize-curve_fit-1_00_00.png)

For more details, see the [official documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html).