# Worksheet 5-1: Nonlinear Regression

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/WCC-Engineering/ENGR240/blob/main/Class%20Demos%20and%20Activities/Week%205/Worksheet%205-1_template%20nonlinear%20regression.ipynb)

## Overview

In this worksheet, we'll explore three approaches to fitting nonlinear models to data:

1. **Linear regression on transformed data**: Converting a nonlinear model to linear form
2. **Nonlinear regression using scipy.optimize.minimize**: Direct optimization of the sum of squared residuals
3. **Nonlinear regression using scipy.optimize.curve_fit**: A more convenient wrapper for nonlinear regression

We'll apply these techniques to a bacterial growth rate model and compare their results and implementation complexity.

## The Model

We'll work with a bacterial growth rate model that describes how the growth rate depends on substrate concentration:

$$k = k_{max} \frac{c^2}{c_s + c^2}$$

where:
- $k$ is the growth rate (number/day)
- $c$ is the substrate concentration (mg/L)
- $k_{max}$ is the maximum possible growth rate (number/day)
- $c_s$ is the half-saturation constant, in units of $c^2$ ((mg/L)²): $k = k_{max}/2$ when $c^2 = c_s$

Our goal is to find the values of $k_{max}$ and $c_s$ that best fit our experimental data.
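
As a quick sanity check on what the two parameters mean, the sketch below evaluates the model at two concentrations. The values `kmax_demo` and `cs_demo` and the helper `growth_rate` are illustrative assumptions, not fitted results. It shows that the rate is half of $k_{max}$ when $c^2 = c_s$ and approaches $k_{max}$ at high concentration.

```python
import numpy as np

# Illustrative (assumed) parameter values, not fitted to the worksheet data
kmax_demo = 10.0   # maximum growth rate (number/day)
cs_demo = 2.0      # half-saturation constant, same units as c**2

def growth_rate(c, kmax, cs):
    """Evaluate k = kmax * c^2 / (cs + c^2)."""
    return kmax * c**2 / (cs + c**2)

# At c = sqrt(cs), c^2 equals cs, so the rate is half of kmax
c_half = np.sqrt(cs_demo)
k_half = growth_rate(c_half, kmax_demo, cs_demo)
print(f"k at c = sqrt(cs): {k_half:.3f}")

# At a very high concentration the rate approaches kmax from below
k_high = growth_rate(1000.0, kmax_demo, cs_demo)
print(f"k at c = 1000:     {k_high:.5f}")
```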

In [None]:
# Import necessary libraries
import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')

# Set random seed for reproducibility
np.random.seed(42)

## Experimental Data and Model Function

Let's start with our experimental measurements and define our model function:

In [None]:
# Substrate concentration (mg/L)
c = np.array([0.5, 0.8, 1.5, 2.5, 4.0])

# Growth rate (number/day)
k = np.array([1.0, 2.5, 5.1, 7.3, 9.1])

# Plot the experimental data
plt.figure(figsize=(10, 6))
plt.scatter(c, k, color='black', s=50, label='Experimental data')
plt.xlabel('Substrate Concentration (mg/L)')
plt.ylabel('Growth Rate (number/day)')
plt.title('Bacterial Growth Rate vs. Substrate Concentration')
plt.legend()
plt.grid(True)
plt.show()

# Define the model function
def kmodel(c, kmax, cs):
    """Bacterial growth rate model
    
    Parameters:
    -----------
    c : array_like
        Substrate concentration (mg/L)
    kmax : float
        Maximum growth rate (number/day)
    cs : float
        Half-saturation constant
        
    Returns:
    --------
    k : array_like
        Growth rate (number/day)
    """
    return kmax * c**2 / (cs + c**2)

## Function to Calculate Fit Quality Metrics

To avoid repetition, let's define a function to calculate fit quality metrics:

In [None]:
def calculate_fit_metrics(c_data, k_data, kmax, cs):
    """Calculate fit quality metrics for the model.
    
    Parameters:
    -----------
    c_data : array_like
        Substrate concentration data
    k_data : array_like
        Growth rate data
    kmax : float
        Maximum growth rate parameter
    cs : float
        Half-saturation constant parameter
        
    Returns:
    --------
    dict
        Dictionary containing R², Syx, and other metrics
    """
    # Calculate predicted values
    k_pred = kmodel(c_data, kmax, cs)
    
    # Calculate residuals
    residuals = k_data - k_pred
    
    # Sum of Squared Residuals (Sr)
    Sr = np.sum(residuals**2)
    
    # Total Sum of Squares (St) - variation around the mean
    St = np.sum((k_data - np.mean(k_data))**2)
    
    # Coefficient of Determination (R²)
    r_squared = 1 - Sr/St
    
    # Standard Error of the Estimate (Syx); n - 2 degrees of freedom for 2 fitted parameters
    Syx = np.sqrt(Sr/(len(k_data)-2))
    
    return {
        'r_squared': r_squared,
        'Syx': Syx,
        'Sr': Sr,
        'St': St,
        'residuals': residuals,
        'predicted': k_pred
    }
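
In equation form, with $n$ data points, measured rates $k_i$, model predictions $\hat{k}_i$, and mean measured rate $\bar{k}$ (notation introduced here for convenience), the quantities returned by `calculate_fit_metrics` are:

$$S_r = \sum_{i=1}^{n} \left(k_i - \hat{k}_i\right)^2, \qquad S_t = \sum_{i=1}^{n} \left(k_i - \bar{k}\right)^2$$

$$R^2 = 1 - \frac{S_r}{S_t}, \qquad S_{y/x} = \sqrt{\frac{S_r}{n - 2}}$$

The $n - 2$ in the denominator of $S_{y/x}$ accounts for the two fitted parameters, $k_{max}$ and $c_s$.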

## Task 1: Linear Regression with Transformed Data

We can transform our nonlinear model into a linear form that allows us to use simple linear regression.

### 1.1 Derive the Linear Transformation

Starting with our model: $k = k_{max} \frac{c^2}{c_s + c^2}$

Take the reciprocal of both sides:

$$\frac{1}{k} = \frac{c_s + c^2}{k_{max} \cdot c^2} = \frac{c_s}{k_{max} \cdot c^2} + \frac{1}{k_{max}}$$

This is now in the form of a linear equation: $Y = mX + b$ where:
- $Y = \frac{1}{k}$
- $X = \frac{1}{c^2}$
- $m = \frac{c_s}{k_{max}}$
- $b = \frac{1}{k_{max}}$
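
One way to convince yourself that the transformation is correct is to apply it to synthetic, noise-free data generated from known parameters and confirm that the line fit recovers them. The sketch below does this; `kmax_true`, `cs_true`, and the concentration grid `c_syn` are assumed values chosen only for illustration.

```python
import numpy as np

# Assumed "true" parameters, used only to generate synthetic noise-free data
kmax_true = 10.0
cs_true = 2.0

c_syn = np.array([0.5, 1.0, 2.0, 3.0, 5.0])
k_syn = kmax_true * c_syn**2 / (cs_true + c_syn**2)

# Transform to the linear form Y = m*X + b
X_syn = 1.0 / c_syn**2
Y_syn = 1.0 / k_syn

# For degree 1, np.polyfit returns [slope, intercept]
m_syn, b_syn = np.polyfit(X_syn, Y_syn, 1)

# Invert the definitions b = 1/kmax and m = cs/kmax
kmax_rec = 1.0 / b_syn
cs_rec = m_syn / b_syn

print(f"Recovered kmax = {kmax_rec:.4f}, cs = {cs_rec:.4f}")
```

With noisy measurements the recovery is no longer exact, because the reciprocal transformation changes how measurement errors are weighted.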

### 1.2 Implement the Linear Regression

In [None]:
# Transform the data
X = 1/c**2  # X = 1/c²
Y = 1/k     # Y = 1/k

# Perform linear regression using np.polyfit
p = np.polyfit(X, Y, 1)  # 1st-degree polynomial (straight line)

# Extract the slope (m) and intercept (b)
m, b = p

# Calculate kmax and cs from m and b
kmax1 = 1/b
cs1 = m * kmax1

# Calculate fit quality metrics
metrics1 = calculate_fit_metrics(c, k, kmax1, cs1)

print("Linear Regression Results:")
print(f"Parameters: kmax = {kmax1:.4f}, cs = {cs1:.4f}")
print(f"Fit Quality: R² = {metrics1['r_squared']:.4f}, Syx = {metrics1['Syx']:.4f}")

## Task 2: Nonlinear Regression using scipy.optimize.minimize

Instead of transforming our model, we can directly fit the nonlinear model to the data using optimization techniques. We'll use `scipy.optimize.minimize` to minimize the sum of squared residuals.

In [None]:
# Define the objective function (sum of squared residuals)
def objective_function(params, c_data, k_data):
    kmax, cs = params
    k_pred = kmodel(c_data, kmax, cs)
    residuals = k_data - k_pred
    return np.sum(residuals**2)

# Minimize the objective function
initial_guess = [1, 1]  # Initial guess for [kmax, cs]
result = optimize.minimize(objective_function, 
                          initial_guess, 
                          args=(c, k), 
                          method='Nelder-Mead',
                          tol=1e-8)

# Extract the optimized parameters
kmax2, cs2 = result.x

# Calculate fit quality metrics
metrics2 = calculate_fit_metrics(c, k, kmax2, cs2)

print("Nonlinear Regression Results (minimize):")
print(f"Optimization successful: {result.success}")
print(f"Parameters: kmax = {kmax2:.4f}, cs = {cs2:.4f}")
print(f"Fit Quality: R² = {metrics2['r_squared']:.4f}, Syx = {metrics2['Syx']:.4f}")
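
Because `minimize` starts from an initial guess, it can be worth checking that the answer does not depend on that guess. The following is a minimal sketch of such a check, assuming a handful of arbitrary starting points (`starts`) and a local helper `ssr`; neither is part of the worksheet code above. The data arrays repeat the worksheet measurements so the block runs on its own.

```python
import numpy as np
from scipy import optimize

# Worksheet data, repeated so this block is self-contained
c_data = np.array([0.5, 0.8, 1.5, 2.5, 4.0])   # mg/L
k_data = np.array([1.0, 2.5, 5.1, 7.3, 9.1])   # number/day

def ssr(params, c_vals, k_vals):
    """Sum of squared residuals for k = kmax * c^2 / (cs + c^2)."""
    kmax, cs = params
    k_pred = kmax * c_vals**2 / (cs + c_vals**2)
    return np.sum((k_vals - k_pred)**2)

# Arbitrary (assumed) starting guesses for [kmax, cs]
starts = [(5.0, 1.0), (8.0, 4.0), (10.0, 2.0), (20.0, 5.0)]

fits = []
for start in starts:
    res = optimize.minimize(ssr, start, args=(c_data, k_data),
                            method='Nelder-Mead',
                            options={'xatol': 1e-8, 'fatol': 1e-8,
                                     'maxiter': 2000})
    fits.append(res.x)
    print(f"start={start}: kmax={res.x[0]:.4f}, cs={res.x[1]:.4f}, "
          f"Sr={res.fun:.5f}")

# Largest spread of each fitted parameter across the starting points
spread = np.ptp(np.array(fits), axis=0)
print(f"Spread across starts: kmax {spread[0]:.2e}, cs {spread[1]:.2e}")
```

If the spread is tiny, the starts all reached the same minimum; a large spread would suggest multiple local minima or a poorly conditioned problem.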

## Task 3: Nonlinear Regression using scipy.optimize.curve_fit

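The `scipy.optimize.curve_fit` function provides a more convenient interface for fitting models to data. Instead of a hand-written scalar objective, it takes the model function directly and solves the least-squares problem itself; for problems without bounds, like this one, it defaults to the Levenberg-Marquardt algorithm (`method='lm'`). It also returns an estimated covariance matrix for the fitted parameters.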

In [None]:
# Use curve_fit to find the optimal parameters
popt, pcov = optimize.curve_fit(kmodel, c, k, p0=[1, 1])

# Extract the optimized parameters
kmax3, cs3 = popt

# Extract the parameter uncertainties
parameter_std_dev = np.sqrt(np.diag(pcov))

# Calculate fit quality metrics
metrics3 = calculate_fit_metrics(c, k, kmax3, cs3)

print("Nonlinear Regression Results (curve_fit):")
print(f"Parameters: kmax = {kmax3:.4f} ± {parameter_std_dev[0]:.4f}, cs = {cs3:.4f} ± {parameter_std_dev[1]:.4f}")
print(f"Fit Quality: R² = {metrics3['r_squared']:.4f}, Syx = {metrics3['Syx']:.4f}")
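
To make the connection between `curve_fit` and a general least-squares solver more concrete, the sketch below fits the same data with `scipy.optimize.least_squares` and rebuilds a covariance estimate from the Jacobian at the solution. The helper names (`model`, `residual_vector`) and the starting guess `[10.0, 2.0]` are assumptions made for this illustration, and the covariance formula is the standard unweighted least-squares approximation, so treat it as a rough cross-check rather than a description of SciPy internals.

```python
import numpy as np
from scipy import optimize

# Worksheet data, repeated so this block is self-contained
c_data = np.array([0.5, 0.8, 1.5, 2.5, 4.0])   # mg/L
k_data = np.array([1.0, 2.5, 5.1, 7.3, 9.1])   # number/day

def model(c_vals, kmax, cs):
    return kmax * c_vals**2 / (cs + c_vals**2)

def residual_vector(params):
    """Return one residual per data point (not their squared sum)."""
    kmax, cs = params
    return model(c_data, kmax, cs) - k_data

start = [10.0, 2.0]  # assumed rough guess for [kmax, cs]

# Fit 1: curve_fit takes the model function directly
popt_cf, pcov_cf = optimize.curve_fit(model, c_data, k_data, p0=start)

# Fit 2: least_squares takes the residual vector; 'lm' is Levenberg-Marquardt
res_ls = optimize.least_squares(residual_vector, x0=start, method='lm')

# Covariance estimate: (J^T J)^-1 * Sr / (n - p), where Sr = 2 * cost
n_points, n_params = res_ls.jac.shape
sr = 2.0 * res_ls.cost
pcov_ls = np.linalg.inv(res_ls.jac.T @ res_ls.jac) * sr / (n_points - n_params)

print("curve_fit parameters:    ", popt_cf)
print("least_squares parameters:", res_ls.x)
print("curve_fit std devs:      ", np.sqrt(np.diag(pcov_cf)))
print("Jacobian-based std devs: ", np.sqrt(np.diag(pcov_ls)))
```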

## Task 4: Compare All Three Methods

Let's visualize and compare the results of all three fitting methods.

In [None]:
# Create a comparison table
methods = ['Linear Regression (transformed)', 'Nonlinear Regression (minimize)', 'Nonlinear Regression (curve_fit)']
kmax_values = [kmax1, kmax2, kmax3]
cs_values = [cs1, cs2, cs3]
r_squared_values = [metrics1['r_squared'], metrics2['r_squared'], metrics3['r_squared']]
syx_values = [metrics1['Syx'], metrics2['Syx'], metrics3['Syx']]

# Print the comparison table
print("Comparison of Curve Fitting Methods:")
print("-" * 80)
# A width of 34 fits the longest method name (32 characters) so the columns line up
print(f"{'Method':<34} {'kmax':<10} {'cs':<10} {'R²':<10} {'Syx':<10}")
print("-" * 80)
for i, method in enumerate(methods):
    print(f"{method:<34} {kmax_values[i]:<10.4f} {cs_values[i]:<10.4f} {r_squared_values[i]:<10.4f} {syx_values[i]:<10.4f}")
print("-" * 80)

# Create a plot comparing all three fits
plt.figure(figsize=(12, 8))

# Plot the experimental data
plt.scatter(c, k, color='black', s=80, label='Experimental data')

# Define a range of concentrations for plotting smooth curves
c_plot = np.linspace(0, 5, 100)

# Plot the model fits
plt.plot(c_plot, kmodel(c_plot, kmax1, cs1), 'r-', linewidth=2, 
         label=f'Linear regression: kmax={kmax1:.2f}, cs={cs1:.2f}, R²={metrics1["r_squared"]:.4f}')
plt.plot(c_plot, kmodel(c_plot, kmax2, cs2), 'b--', linewidth=2, 
         label=f'Nonlinear (minimize): kmax={kmax2:.2f}, cs={cs2:.2f}, R²={metrics2["r_squared"]:.4f}')
plt.plot(c_plot, kmodel(c_plot, kmax3, cs3), 'g-.', linewidth=2, 
         label=f'Nonlinear (curve_fit): kmax={kmax3:.2f}, cs={cs3:.2f}, R²={metrics3["r_squared"]:.4f}')

# Add labels and title
plt.xlabel('Substrate Concentration (mg/L)', fontsize=12)
plt.ylabel('Growth Rate (number/day)', fontsize=12)
plt.title('Comparison of Three Curve Fitting Approaches', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True)
plt.tight_layout()
plt.show()

## Task 5: Reflection and Discussion

Answer the following questions based on your observations:

1. **Method Comparison**: How do the results of the three methods compare? Which method(s) produced the best fit (highest R², lowest Syx)? Why do you think methods 2 and 3 (minimize and curve_fit) produce identical or nearly identical results?

2. **Linear Transformation Limitations**: What are the potential limitations or drawbacks of using a linear transformation for nonlinear models? Why might the nonlinear methods perform better in this case?

3. **Practical Application**: The `curve_fit` method provided uncertainty estimates for the parameters (standard deviations). Why is parameter uncertainty important in real-world applications? How would you describe the relationship between `curve_fit` and `minimize`?

## Key Takeaways

- Linear transformation can provide a simple approach but may not always yield the best fit
- Direct nonlinear optimization techniques often provide better fits for nonlinear models
- `curve_fit` solves the same least-squares problem that we set up by hand for `minimize`, but with a more convenient interface and additional benefits like parameter uncertainty estimates from the covariance matrix
- When working with nonlinear models, it's important to evaluate fit quality using appropriate metrics (R², Syx) on the original data
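
The last point can be illustrated with a short sketch that computes R² for the linearized fit twice: once on the transformed variables ($1/k$ versus $1/c^2$) and once on the original growth-rate data. The helper `r_squared` and the `_lin` variable names are assumptions introduced for this example; the data repeat the worksheet measurements.

```python
import numpy as np

# Worksheet data, repeated so this block is self-contained
c_data = np.array([0.5, 0.8, 1.5, 2.5, 4.0])   # mg/L
k_data = np.array([1.0, 2.5, 5.1, 7.3, 9.1])   # number/day

def r_squared(y_obs, y_fit):
    """Coefficient of determination, 1 - Sr/St."""
    sr = np.sum((y_obs - y_fit)**2)
    st = np.sum((y_obs - np.mean(y_obs))**2)
    return 1.0 - sr / st

# Straight-line fit in the transformed space
X_lin = 1.0 / c_data**2
Y_lin = 1.0 / k_data
m_lin, b_lin = np.polyfit(X_lin, Y_lin, 1)

# R² measured on the transformed variables
r2_transformed = r_squared(Y_lin, m_lin * X_lin + b_lin)

# R² measured on the original variables, using the back-transformed parameters
kmax_lin = 1.0 / b_lin
cs_lin = m_lin / b_lin
k_fit = kmax_lin * c_data**2 / (cs_lin + c_data**2)
r2_original = r_squared(k_data, k_fit)

print(f"R² on transformed data (1/k vs 1/c²): {r2_transformed:.4f}")
print(f"R² on original data (k vs c):         {r2_original:.4f}")
```

The two numbers generally differ, which is why the worksheet's `calculate_fit_metrics` always evaluates the fit against the original $k$ values.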