### **1. The Optimization Problem**
We aim to minimize the **sum of squared residuals**:

$$
\min_x \|r(x)\|^2 = \min_x \sum_{i=1}^N r_i(x)^2
$$

Where:
- $x$: The parameter (or vector of parameters) being optimized.
- $r_i(x)$: The residual for the $i$-th observation, a (generally nonlinear) function of $x$.

---

### **2. Taylor Expansion**
The residual function $r_i(x)$ is generally nonlinear. To simplify optimization, we approximate $r_i(x)$ around the current estimate $x_k$ using a first-order Taylor expansion:

$$
r_i(x) \approx r_i(x_k) + J_i(x_k)(x - x_k)
$$

Where:
- $r_i(x_k)$: The residual at the current estimate $x_k$.
- $J_i(x_k) = \left.\frac{\partial r_i}{\partial x}\right|_{x = x_k}$: The Jacobian of the $i$-th residual with respect to $x$, evaluated at $x_k$ (a row vector with one entry per parameter).

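As a quick numerical check of the linearization, the sketch below uses the scalar residual $r(x) = x^2 - 4$ (borrowed from the residual-block example later in these notes), whose Jacobian is $J(x) = 2x$. For this residual the linearization error is exactly $r(x) - [r(x_k) + J(x_k)(x - x_k)] = (x - x_k)^2$, so it shrinks quadratically as $x$ approaches $x_k$. The function names are illustrative, not from any library.

```cpp
#include <cassert>
#include <cmath>

// Example residual (assumed for illustration): r(x) = x^2 - 4.
double Residual(double x) { return x * x - 4.0; }

// Its 1x1 Jacobian: dr/dx = 2x.
double Jacobian(double x) { return 2.0 * x; }

// First-order Taylor model of r around x_k, evaluated at x.
double LinearizedResidual(double x_k, double x) {
  return Residual(x_k) + Jacobian(x_k) * (x - x_k);
}
```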
---

### **3. Objective Function Approximation**
The cost function can then be approximated (locally) as:

$$
\|r(x)\|^2 \approx \|r(x_k) + J(x_k)(x - x_k)\|^2
$$

Where:
- $r(x) = (r_1(x), \dots, r_N(x))^T$ is the vector of all residuals.
- $J(x)$ is the Jacobian matrix of $r$: its $i$-th row is $J_i(x)$, the derivatives of $r_i$ with respect to each parameter in $x$.

Expanding this:
$$
\|r(x_k) + J(x_k)(x - x_k)\|^2 = \|r(x_k)\|^2 + 2(x - x_k)^T J(x_k)^T r(x_k) + (x - x_k)^T J(x_k)^T J(x_k)(x - x_k)
$$

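The expansion can be checked numerically. This sketch assumes a single parameter, so $J(x_k)$ is one column and the step $d = x - x_k$ is a scalar; it evaluates the quadratic model both directly and through the three expanded terms $\|r\|^2$, $2\,d\,J^T r$, and $d^2 J^T J$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Quadratic model ||r + J d||^2 evaluated directly (one parameter, so J is a column).
double ModelDirect(const std::vector<double>& r, const std::vector<double>& J, double d) {
  double sum = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    const double linearized = r[i] + J[i] * d;  // i-th linearized residual
    sum += linearized * linearized;
  }
  return sum;
}

// Same model via the expanded form ||r||^2 + 2 d J^T r + d^2 J^T J.
double ModelExpanded(const std::vector<double>& r, const std::vector<double>& J, double d) {
  double rtr = 0.0, jtr = 0.0, jtj = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i) {
    rtr += r[i] * r[i];
    jtr += J[i] * r[i];
    jtj += J[i] * J[i];
  }
  return rtr + 2.0 * d * jtr + d * d * jtj;
}
```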
---

### **4. Minimizing the Quadratic Approximation**
To minimize this quadratic approximation, take its gradient with respect to $x$ and set it to zero:

$$
\nabla_x \|r(x_k) + J(x_k)(x - x_k)\|^2 = 2 J(x_k)^T r(x_k) + 2 J(x_k)^T J(x_k)(x - x_k) = 0
$$

Dividing by 2 and rearranging gives the **normal equations**:
$$
J(x_k)^T J(x_k)(x - x_k) = -J(x_k)^T r(x_k)
$$

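If $J(x_k)^T J(x_k)$ is invertible (i.e., $J(x_k)$ has full column rank), solve for the new estimate $x_{k+1}$:
$$
x_{k+1} = x_k - (J(x_k)^T J(x_k))^{-1} J(x_k)^T r(x_k)
$$

In practice the step is computed by solving the linear system (e.g., with a Cholesky or QR factorization) rather than by forming the inverse explicitly.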

This is the **Gauss-Newton update rule**.

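The update rule can be sketched as a short loop. This is a minimal illustration, not Ceres's implementation: it assumes a hypothetical toy problem with one scalar parameter and two residuals, $r_1(x) = x^2 - 4$ and $r_2(x) = x - 2$, so $J^T J$ and $J^T r$ are sums of scalars and the matrix inverse becomes a division. Both residuals vanish at $x = 2$, and from $x_0 = 5$ the iterates approach 2 quickly (the first step gives $x_1 = 5 - 213/101 \approx 2.89$).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical toy problem: r1(x) = x^2 - 4 (J1 = 2x) and r2(x) = x - 2 (J2 = 1).
// One Gauss-Newton step: x_{k+1} = x_k - (J^T J)^{-1} J^T r, all scalars here.
double GaussNewtonStep(double x) {
  const double r[2] = {x * x - 4.0, x - 2.0};
  const double J[2] = {2.0 * x, 1.0};
  double jtj = 0.0, jtr = 0.0;
  for (int i = 0; i < 2; ++i) {
    jtj += J[i] * J[i];  // J^T J
    jtr += J[i] * r[i];  // J^T r
  }
  assert(jtj > 0.0);     // the update requires J^T J to be invertible
  return x - jtr / jtj;
}

// Repeats the step a fixed number of times from x0.
double GaussNewton(double x0, int iterations) {
  double x = x0;
  for (int k = 0; k < iterations; ++k) x = GaussNewtonStep(x);
  return x;
}
```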
---

### **5. Key Terms in Gauss-Newton Update**

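- $J(x_k)$: The Jacobian of the residuals with respect to the parameters.
- $J(x_k)^T J(x_k)$: The Gauss-Newton approximation of the Hessian of $\tfrac{1}{2}\|r(x)\|^2$ (equivalently, half the Hessian of $\|r(x)\|^2$).
- $J(x_k)^T r(x_k)$: The gradient of $\tfrac{1}{2}\|r(x)\|^2$ at $x_k$ (half the gradient of $\|r(x)\|^2$).
- $\Delta x_k = -(J(x_k)^T J(x_k))^{-1} J(x_k)^T r(x_k)$: The Gauss-Newton step, so that $x_{k+1} = x_k + \Delta x_k$.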

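With more than one parameter, $J^T J$ is a matrix and the step comes from solving a small linear system. The sketch below assumes a hypothetical linear model $y = a + b\,t$ with residuals $r_i = a + b\,t_i - y_i$, so each Jacobian row is $[1, t_i]$; it accumulates $J^T J$ and $J^T r$ and solves $(J^T J)\,\Delta x = -J^T r$ by Cramer's rule. Because these residuals are exactly linear in the parameters, a single Gauss-Newton step lands on the least-squares solution: with illustrative data $t = (0, 1, 2)$, $y = (1, 3, 5)$ and a start of $(a, b) = (0, 0)$, the step is $(1, 2)$, recovering the line $y = 1 + 2t$.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model y = a + b t, residuals r_i = a + b t_i - y_i.
// Returns the Gauss-Newton step (da, db) solving (J^T J) step = -J^T r.
std::array<double, 2> GaussNewtonStep2(double a, double b,
                                       const std::vector<double>& t,
                                       const std::vector<double>& y) {
  double h00 = 0.0, h01 = 0.0, h11 = 0.0;  // symmetric 2x2 matrix J^T J
  double g0 = 0.0, g1 = 0.0;               // vector J^T r
  for (std::size_t i = 0; i < t.size(); ++i) {
    const double r = a + b * t[i] - y[i];
    const double j0 = 1.0;   // dr_i/da
    const double j1 = t[i];  // dr_i/db
    h00 += j0 * j0;
    h01 += j0 * j1;
    h11 += j1 * j1;
    g0 += j0 * r;
    g1 += j1 * r;
  }
  const double det = h00 * h11 - h01 * h01;
  assert(std::fabs(det) > 1e-12);  // J^T J must be invertible
  // Cramer's rule with right-hand side -J^T r.
  return {(-g0 * h11 + h01 * g1) / det, (-g1 * h00 + h01 * g0) / det};
}
```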
---

### **6. Limitations of Gauss-Newton**

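- **Convergence Issues**: Gauss-Newton works well when the residuals at the solution are small and the cost is close to quadratic near the current estimate. On strongly nonlinear problems, or problems with large residuals, the full step can overshoot, and the iteration may converge slowly or not at all.
- **Hessian Approximation**: The exact Hessian of $\|r(x)\|^2$ is $2\left(J^T J + \sum_i r_i(x)\,\nabla^2 r_i(x)\right)$. Gauss-Newton keeps only $J^T J$ and drops the second-order term $\sum_i r_i \nabla^2 r_i$, which is negligible only when the residuals are small or the residual functions are nearly linear.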

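For the scalar residual $r(x) = x^2 - 4$ (the residual-block example later in these notes), the dropped term is easy to see. With cost $f(x) = r(x)^2$, the exact second derivative is $f''(x) = 2\,(r'(x)^2 + r(x)\,r''(x)) = 12x^2 - 16$, while the Gauss-Newton approximation keeps only $2\,r'(x)^2 = 8x^2$ (the factor 2 comes from using $\|r\|^2$ rather than $\tfrac{1}{2}\|r\|^2$). The two agree at the zero-residual point $x = 2$ (both equal 32) but differ at $x = 5$ (284 versus 200). The sketch below, with illustrative function names, evaluates both.

```cpp
#include <cassert>
#include <cmath>

// Exact second derivative of f(x) = r(x)^2 for r(x) = x^2 - 4:
// f''(x) = 2 (r'(x)^2 + r(x) r''(x)).
double ExactHessian(double x) {
  const double r = x * x - 4.0;
  const double dr = 2.0 * x;  // r'(x)
  const double d2r = 2.0;     // r''(x)
  return 2.0 * (dr * dr + r * d2r);
}

// Gauss-Newton approximation: drops the r(x) r''(x) term, leaving 2 J^T J.
double GaussNewtonHessian(double x) {
  const double dr = 2.0 * x;
  return 2.0 * dr * dr;
}
```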
---

### **7. Damped Gauss-Newton: Levenberg-Marquardt**

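To address these issues, Ceres Solver's default trust-region strategy is the **Levenberg-Marquardt algorithm**, which combines Gauss-Newton with a damping term. In its basic form, each step solves:

$$
(J^T J + \lambda I)(x_{k+1} - x_k) = -J^T r(x_k)
$$

Where $J = J(x_k)$ and $\lambda \ge 0$ controls the damping:
- If $\lambda$ is small, the step is close to the Gauss-Newton step.
- If $\lambda$ is large, the step approaches $-\frac{1}{\lambda} J^T r(x_k)$, a short step in the gradient-descent direction, which is more stable far from the solution.
- $\lambda$ is adapted between iterations: typically decreased after a step that lowers the cost and increased after one that does not.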

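A textbook version of the damped update can be sketched as follows. It is an illustration under stated assumptions, not Ceres's implementation: it reuses a hypothetical one-parameter problem with residuals $r_1(x) = x^2 - 4$ and $r_2(x) = x - 2$, and the schedule (divide $\lambda$ by 10 after a step that lowers the cost, multiply it by 10 after a rejected step) is a common simple choice, not the rule Ceres uses internally.

```cpp
#include <cassert>
#include <cmath>

// Cost ||r(x)||^2 for the hypothetical residuals r1 = x^2 - 4, r2 = x - 2.
double Cost(double x) {
  const double r1 = x * x - 4.0;
  const double r2 = x - 2.0;
  return r1 * r1 + r2 * r2;
}

// Textbook Levenberg-Marquardt with a simple multiplicative lambda schedule.
double LevenbergMarquardt(double x, double lambda, int iterations) {
  for (int k = 0; k < iterations; ++k) {
    const double r1 = x * x - 4.0, r2 = x - 2.0;
    const double j1 = 2.0 * x, j2 = 1.0;
    const double jtj = j1 * j1 + j2 * j2;
    const double jtr = j1 * r1 + j2 * r2;
    // Damped normal equations (scalar case): (J^T J + lambda) * step = -J^T r.
    const double x_new = x - jtr / (jtj + lambda);
    if (Cost(x_new) < Cost(x)) {
      x = x_new;         // accept: rely more on the Gauss-Newton model
      lambda /= 10.0;
    } else {
      lambda *= 10.0;    // reject: shorter, gradient-descent-like steps
    }
  }
  return x;
}
```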
---

### **In Summary**
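- Gauss-Newton linearizes the residuals around $x_k$ and solves the normal equations $J^T J\,\Delta x = -J^T r$ for the step.
- It converges quickly on small-residual, mildly nonlinear problems, but can fail on strongly nonlinear or large-residual ones.
- Ceres Solver typically enhances this with the **Levenberg-Marquardt method**, whose damping gives more robust convergence on highly nonlinear problems.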

---

### **1. What is a Residual Block?**
A **residual block** represents an individual error term in the optimization problem. The goal of optimization is to minimize the sum of (optionally robustified) squared residual norms, commonly written as follows (the Ceres documentation includes an extra factor of $\tfrac{1}{2}$, which does not change the minimizer):

$$
\min_{\mathbf{x}} \sum_{i=1}^{N} \rho_i\left( \|r_i(\mathbf{x})\|^2 \right)
$$

Where:
- $\mathbf{x}$ is the vector of parameters being optimized.
- $r_i(\mathbf{x})$ is the **residual** of the $i$-th residual block (it may be a vector, hence the norm).
- $\rho_i$ is an optional **robust loss function** (e.g., Huber loss) that reduces the influence of outliers. If no loss function is used, $\rho_i(z) = z$ and the objective is the ordinary sum of squared residuals.

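To make the role of $\rho_i$ concrete, the sketch below computes $\sum_i \rho(\|r_i\|^2)$ for scalar residuals with a Huber-style loss in squared-norm form: $\rho(s) = s$ for $s \le a^2$ and $\rho(s) = 2a\sqrt{s} - a^2$ otherwise, where $a$ is a scale parameter. The formula is written to follow Ceres's documented `HuberLoss`, but these functions are standalone and hypothetical, not the Ceres API. With $a = 1$, residuals $0.5$ and $3$ contribute $0.25$ and $5$ (instead of $0.25$ and $9$), so the large residual is down-weighted.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Huber-style loss on a squared norm s: quadratic regime for s <= a^2,
// square-root (i.e., linear in |r|) regime beyond it.
double HuberRho(double s, double a) {
  return s <= a * a ? s : 2.0 * a * std::sqrt(s) - a * a;
}

// Robust objective sum_i rho(r_i^2) for scalar residuals.
double RobustCost(const std::vector<double>& residuals, double a) {
  double total = 0.0;
  for (double r : residuals) total += HuberRho(r * r, a);
  return total;
}
```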
---

### **2. Parameter Block**
A **parameter block** is a group of variables that Ceres treats as a unit, stored as a contiguous array of `double`s (e.g., a single scalar, or the three coordinates of a 3D point). Each residual block depends on one or more parameter blocks. For instance, if a residual block depends only on a single scalar variable $x$, then $x$ is its (only) parameter block.

In Ceres:
- A residual block computes $r_i(\mathbf{x})$ (and its Jacobian with respect to the parameter blocks it depends on).
- A parameter block holds values of $\mathbf{x}$. Residual blocks only read these values; the solver updates them in place during optimization.

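The relationship can be illustrated with simplified, hypothetical stand-ins (these types are not the Ceres API): each parameter block is ordinary user-owned memory, and each residual block stores pointers to the blocks it reads plus a function that computes its residual. When the "solver" writes a new value into a parameter block, every residual block pointing to it sees the change.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical residual block: pointers to the parameter blocks it reads
// and a function computing a single residual from them (read-only access).
struct ToyResidualBlock {
  std::vector<const double*> parameter_blocks;
  std::function<double(const std::vector<const double*>&)> evaluate;
};

// Sum of squared residuals over all residual blocks.
double TotalCost(const std::vector<ToyResidualBlock>& blocks) {
  double cost = 0.0;
  for (const ToyResidualBlock& block : blocks) {
    const double r = block.evaluate(block.parameter_blocks);
    cost += r * r;
  }
  return cost;
}
```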
---

### **Equation Example**

Let's consider a simple problem:

$$
r(x) = x^2 - 4
$$

Here:
- **Residual block**: Computes the residual $r(x) = x^2 - 4$.
- **Parameter block**: $x$, which is the scalar value being optimized.

The optimization problem is:
$$
\min_x \| r(x) \|^2 = \min_x (x^2 - 4)^2
$$

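Setting the derivative of this cost to zero shows where its minimizers are: $\frac{d}{dx}(x^2 - 4)^2 = 4x(x^2 - 4)$, which vanishes at $x = -2$, $0$, and $2$. The points $x = \pm 2$ are global minimizers (cost 0), while $x = 0$ is a local maximum (cost 16). The short sketch below, with illustrative function names, encodes the cost and its derivative so these values can be checked.

```cpp
#include <cassert>
#include <cmath>

// Cost of the example: f(x) = (x^2 - 4)^2.
double ExampleCost(double x) {
  const double r = x * x - 4.0;
  return r * r;
}

// Its derivative f'(x) = 4x(x^2 - 4), zero at x = -2, 0, and 2.
double ExampleDerivative(double x) { return 4.0 * x * (x * x - 4.0); }
```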
---

### **In Code**

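1. **Define the Cost Functor** (the code that computes the residual block's residual):
```cpp
#include <iostream>  // used for printing in the solver step below

#include "ceres/ceres.h"

// Templated on T so Ceres can evaluate it with T = double (values)
// and T = ceres::Jet (values plus derivatives, for automatic differentiation).
struct CostFunctor {
  template <typename T>
  bool operator()(const T* const x, T* residual) const {
    residual[0] = x[0] * x[0] - T(4.0); // r(x) = x^2 - 4
    return true;
  }
};
```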



The declaration `const T* const x` is interpreted as follows:

1. **`const T*`**:
   - The pointer `x` points to an object of type `T` that is constant.
   - You cannot modify the value of the object that `x` points to via `x`.

2. **`* const x`**:
   - The pointer `x` itself is constant: it cannot be changed to point to another address.

Together, `const T* const x` means:
- The object being pointed to is constant, so its value cannot be changed.
- The pointer itself is constant, so it cannot be reassigned to point to a different address.

### Why is this used?
This declaration enforces immutability of both:
1. The pointer (`x`) itself.
2. The value(s) being pointed to by the pointer.

In the context of the `operator()` function in the `CostFunctor` struct:
- `const T* const x` ensures that the input parameter `x` cannot be accidentally modified within the function, either by changing what `x` points to or by modifying the values it points to.

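A minimal illustration with `T = double`: the function below may read through `x`, but the two commented-out lines would be rejected by the compiler because of the two `const` qualifiers. The function name is hypothetical.

```cpp
#include <cassert>

// Reads through x, but can neither write the pointed-to values nor re-point x.
double SumOfSquares(const double* const x, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += x[i] * x[i];  // reading is allowed
  // x[0] = 1.0;   // error: the pointed-to doubles are const
  // x = nullptr;  // error: the pointer itself is const
  return sum;
}
```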
### Summary
- The first `const` applies to the object being pointed to (`T`).
- The second `const` applies to the pointer itself.
- It enforces a strong guarantee that `x` and the data it points to remain immutable within the function.

Explanation:
- `const T* const x`: Pointer to the parameter block (the variable $x$).
- `T* residual`: Pointer to the output array where the residual $r(x)$ is written.
- The return value `true` tells Ceres that the evaluation succeeded.

So the functor computes $r(x) = x^2 - 4$.

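The functor is templated on `T` so the same code can compute both values and derivatives. The sketch below illustrates the idea with a hypothetical, minimal dual-number type (value plus derivative); Ceres itself uses its own `ceres::Jet` type for this, which is far more complete. Seeding the input derivative with 1 at $x = 5$ yields $r(5) = 21$ and $r'(5) = 10$, matching $r'(x) = 2x$.

```cpp
#include <cassert>

// Hypothetical minimal dual number: a value and its derivative w.r.t. x.
struct Dual {
  double v;  // value
  double d;  // derivative
  explicit Dual(double value, double deriv = 0.0) : v(value), d(deriv) {}
};

// Product and difference rules propagate derivatives alongside values.
Dual operator*(const Dual& a, const Dual& b) {
  return Dual(a.v * b.v, a.d * b.v + a.v * b.d);
}
Dual operator-(const Dual& a, const Dual& b) { return Dual(a.v - b.v, a.d - b.d); }

// Same functor body as the one above, repeated so this block compiles alone.
struct CostFunctor {
  template <typename T>
  bool operator()(const T* const x, T* residual) const {
    residual[0] = x[0] * x[0] - T(4.0);  // T(4.0) is a constant: derivative 0
    return true;
  }
};

// Evaluates r(x) and dr/dx by seeding the input's derivative with 1.
Dual EvaluateWithDerivative(double x_value) {
  const Dual x(x_value, 1.0);
  Dual residual(0.0);
  CostFunctor{}(&x, &residual);
  return residual;
}
```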
---

2. **Set Up the Parameter Block and Problem**:
```cpp
double initial_x = 5.0;  // Initial value of the parameter
double x = initial_x;    // Parameter block (updated in place by the solver)

ceres::Problem problem;  // Create a Ceres problem
```

Here:
- `x` is the **parameter block**, and its value will be modified by the solver.

---

3. **Add the Residual Block**:
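```cpp
// Template arguments: functor, residual dimension (1), parameter block size (1).
// Recent Ceres releases accept this no-argument form; older ones take `new CostFunctor`.
ceres::CostFunction* cost_function =
    new ceres::AutoDiffCostFunction<CostFunctor, 1, 1>();
problem.AddResidualBlock(cost_function, nullptr, &x);  // Problem takes ownership
```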

Explanation:
- `AutoDiffCostFunction<CostFunctor, 1, 1>()` wraps the functor and computes its Jacobian by automatic differentiation. The template arguments specify:
  - `1`: The number of residuals (the output dimension of the cost function). Here, there is **1 residual**.
  - `1`: The size of the (only) parameter block, i.e., the number of values in `x`. Here, it is **1 parameter**. With several parameter blocks, one size is listed per block.
- `nullptr`: The loss function argument. Passing `nullptr` means no loss function ($\rho_i(z) = z$), i.e., ordinary least squares. To down-weight outliers, pass a `LossFunction` instead (e.g., `new ceres::HuberLoss(1.0)`).
- `&x`: The pointer to the parameter block being optimized. Ceres modifies `x` in place during optimization to minimize the cost.

---

### **3. Solver Minimization**

The Ceres solver minimizes the sum of squared residuals:

$$
\min_x \|r(x)\|^2 = \min_x (x^2 - 4)^2
$$

This is achieved by:
```cpp
ceres::Solver::Options options;
options.minimizer_progress_to_stdout = true;  // print per-iteration progress

ceres::Solver::Summary summary;
ceres::Solve(options, &problem, &summary);

std::cout << summary.BriefReport() << "\n";
std::cout << "Initial x: " << initial_x << ", Optimized x: " << x << "\n";
```

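The solver updates `x` iteratively to reduce $(x^2 - 4)^2$. This cost has two global minimizers, $x = 2$ and $x = -2$ (both with zero cost), and the solver is a local method, so which one it finds depends on the starting point; from $x = 5$ it should converge to $x = 2$.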

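Because the method is local, the starting value decides which minimizer is reached. The sketch below runs plain Gauss-Newton (not Ceres's Levenberg-Marquardt) on the single residual $r(x) = x^2 - 4$; with one residual and one parameter, the update $x - (J^T J)^{-1} J^T r$ reduces to $x - r/J$ with $J = 2x$. Started at $x = 5$ it approaches 2, and started at $x = -5$ it approaches $-2$. The function name is illustrative.

```cpp
#include <cassert>
#include <cmath>

// Plain Gauss-Newton on r(x) = x^2 - 4; with a single residual the update
// x - (J^T J)^{-1} J^T r simplifies to x - r / J.
double SolveExample(double x, int iterations) {
  for (int k = 0; k < iterations; ++k) {
    const double J = 2.0 * x;
    assert(J != 0.0);  // x = 0 is a stationary point where the update breaks down
    x -= (x * x - 4.0) / J;
  }
  return x;
}
```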
---

### **Summary: Residual Block vs Parameter Block**

| Concept             | Mathematical Meaning                          | Code Representation                                                            |
|---------------------|-----------------------------------------------|--------------------------------------------------------------------------------|
| **Residual Block**  | $r(x) = x^2 - 4$                              | `CostFunctor` wrapped in `AutoDiffCostFunction`, added with `AddResidualBlock` |
| **Parameter Block** | The variable $x$ being optimized              | `double x`, passed by pointer (`&x`)                                           |
| **Loss Function**   | Optional $\rho_i$, reduces outlier influence  | `LossFunction*` argument (here `nullptr`, i.e., no loss function)              |

