## Forward-Mode Automatic Differentiation with Dual Numbers

The concept of a **dual number**, expressed as:

\$
x + \epsilon x',
\$

is directly tied to how **automatic differentiation (AD)** works in **forward mode**.

---

## 1. Dual Number Representation and Its Connection

- \$ x \$ is the **real part**, which represents the **value** of the function at a given point.
- \$ x' \$ is the **infinitesimal part**, representing the **derivative** of the function with respect to a variable.
- \$ \epsilon \$ is a symbolic entity with the property \$ \epsilon^2 = 0 \$, which ensures that higher-order terms (e.g., \$ \epsilon^2 \$) do not contribute to the result. This is critical for capturing the **first-order derivative** only.

The core idea of dual numbers is that they propagate **values and derivatives together** through arithmetic operations, respecting the chain rule. This approach is mirrored in how **operator overloading** is implemented in the `Dual` or `Jet` classes.
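
As a short worked derivation of why \$ \epsilon^2 = 0 \$ is enough, multiply two dual numbers and drop the \$ \epsilon^2 \$ term. The product rule appears in the \$ \epsilon \$ coefficient (the symbols \$ y \$, \$ y' \$, and \$ g \$ are introduced here only for illustration):

\$
(x + \epsilon x')(y + \epsilon y') = xy + \epsilon (x' y + x y') + \epsilon^2 x' y' = xy + \epsilon (x' y + x y').
\$

The same mechanism handles a smooth function \$ g \$ through its Taylor expansion, where every term beyond first order vanishes:

\$
g(x + \epsilon x') = g(x) + \epsilon \, g'(x) \, x'.
\$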

---






**Forward-mode automatic differentiation (AD)** using **dual numbers** can be implemented in C++ to compute the partial derivatives of a function such as

\$
f(a,b) = b \sin(a) + b^2.
\$

We'll define a small dual-number class and show how each arithmetic and trigonometric operation is overloaded to propagate derivatives automatically. Then we’ll walk through how to get \$\frac{\partial f}{\partial a}\$ and \$\frac{\partial f}{\partial b}\$ using **two forward passes**.

---

## 2. Define a Dual Number

A **dual number** can be thought of as a pair \$(\text{val}, \text{dval})\$. 
- \$\text{val}\$ is the function value.
- \$\text{dval}\$ is the derivative of that value **with respect to a chosen variable**.

In forward-mode, we set up these dual numbers so that \$ \text{dval} = 1 \$ for the variable we care about, and \$ 0 \$ for all other variables.

```cpp
struct Dual {
    double val;   // the actual value
    double dval;  // the derivative wrt the chosen variable

    // Constructor
    Dual(double v = 0.0, double dv = 0.0)
        : val(v), dval(dv) {}
};
```

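The overloads below apply the sum rule, product rule, and chain rule to both components:

```cpp
#include <cmath>  // std::sin, std::cos

// Overload operator+ (Dual + Dual)
Dual operator+(const Dual& x, const Dual& y) {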
    // (x.val + y.val, x.dval + y.dval)
    return Dual(x.val + y.val, x.dval + y.dval);
}

// Overload operator+ (Dual + double)
Dual operator+(const Dual& x, double c) {
    // (x.val + c, x.dval + 0)
    return Dual(x.val + c, x.dval);
}

// Overload operator+ (double + Dual)
Dual operator+(double c, const Dual& x) {
    // same as above
    return x + c;
}

// Overload operator* (Dual * Dual)
Dual operator*(const Dual& x, const Dual& y) {
    // derivative of product: d/dv (x*y) = x'*y + x*y'
    return Dual(x.val * y.val,
                x.dval * y.val + x.val * y.dval);
}

// Overload operator* (Dual * double)
Dual operator*(const Dual& x, double c) {
    // (x.val * c, x.dval * c)
    return Dual(x.val * c, x.dval * c);
}

// Overload operator* (double * Dual)
Dual operator*(double c, const Dual& x) {
    return x * c;
}

// Overload sin(Dual)
Dual sin(const Dual& x) {
    // val = sin(x.val)
    // derivative = cos(x.val) * x.dval
    return Dual(std::sin(x.val),
                std::cos(x.val) * x.dval);
}
```
Overload `cos(Dual)` and any other functions you need in the same way.
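
As a rough sketch of what such additional overloads might look like, the block below adds subtraction, division, `cos`, and `exp`. These four overloads are not part of the original listing; they are assumptions that follow the same pattern, and the `Dual` struct is repeated only so the snippet compiles on its own.

```cpp
#include <cmath>

// Same layout as the Dual struct above, repeated so this sketch stands alone.
struct Dual {
    double val;   // the actual value
    double dval;  // the derivative wrt the chosen variable
    Dual(double v = 0.0, double dv = 0.0) : val(v), dval(dv) {}
};

// Difference rule: (x - y)' = x' - y'
Dual operator-(const Dual& x, const Dual& y) {
    return Dual(x.val - y.val, x.dval - y.dval);
}

// Quotient rule: (x / y)' = (x' * y - x * y') / y^2
Dual operator/(const Dual& x, const Dual& y) {
    return Dual(x.val / y.val,
                (x.dval * y.val - x.val * y.dval) / (y.val * y.val));
}

// Chain rule: d/dv cos(x) = -sin(x) * x'
Dual cos(const Dual& x) {
    return Dual(std::cos(x.val), -std::sin(x.val) * x.dval);
}

// Chain rule: d/dv exp(x) = exp(x) * x'
Dual exp(const Dual& x) {
    const double e = std::exp(x.val);
    return Dual(e, e * x.dval);
}
```

For example, with `Dual x(2.0, 1.0)`, the expression `Dual(1.0) / x` carries the value \$1/2\$ and the derivative \$-1/x^2 = -1/4\$.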


---

## 3. Define Your Function Using Duals

Given: 
\$
f(a,b) = b \sin(a) + b^2.
\$

we can write a templated function `f()` that takes two `Dual` numbers and returns the result as another `Dual`.

```cpp
template <typename T> T f(const T &A, const T &B) {
  return B * sin(A) + B * B; // b*sin(a) + b^2
}
```

---

## 4. Compute Partial Derivatives (Forward-Mode)

### 4.1 Partial w.r.t. a

In forward-mode, to get \$\frac{\partial f}{\partial a}\$:

1. Create `Dual A(a, 1.0)` — meaning:  
   - \$ \text{val} = a \$  
   - \$ \text{dval} = 1.0 \$ (we are differentiating w.r.t. \$a\$)
2. Create `Dual B(b, 0.0)` — meaning:  
   - \$ \text{val} = b \$  
   - \$ \text{dval} = 0.0 \$ (since in this pass, we do **not** care about derivative wrt \$b\$)

3. Evaluate `f(A, B)` as a dual number.  
   - The `.val` part of the returned dual is \$ f(a,b) \$.  
   - The `.dval` part is \$\frac{\partial f}{\partial a}\$.

#### Example Code

```cpp
// Suppose we have some numeric a, b
double a = 1.0;
double b = 2.0;

// ---- Partial derivative w.r.t. a ----
// A has dval=1, B has dval=0
Dual A(a, 1.0);
Dual B(b, 0.0);

Dual fab = f(A, B);

std::cout << "f(a,b)       = " << fab.val << std::endl;
std::cout << "df/da(a,b)   = " << fab.dval << std::endl;
```

Under the hood, evaluating `B * sin(A) + B*B` applies the **chain rule** at each operation, which produces the correct derivative in `fab.dval`. For instance:

1. `sin(A)`  
   - \$\text{val} = \sin(a)\$  
   - \$\text{dval} = \cos(a) \times 1.0\$  (since A’s dval=1)

2. `B * sin(A)`  
   - \$\text{val} = b \times \sin(a)\$  
   - \$\text{dval} = (B.dval \times \sin(A).val) + (B.val \times \sin(A).dval)\$  
   - But \$B.dval=0\$, so derivative wrt a is simply \$b \times \cos(a)\$.

3. `B * B`  
   - \$\text{val} = b^2\$  
   - \$\text{dval} = 0 \times b + b \times 0 = 0\$ (no dependence on \$a\$ in this pass)

4. Add them up:  
   - \$\text{val} = b\,\sin(a) + b^2\$  
   - \$\text{dval} = b\,\cos(a) + 0 = b\,\cos(a)\$

Hence, if \$a=1\$ and \$b=2\$, we get:
- \$ f(1, 2) = 2 \cdot \sin(1) + 4\$.
- \$\frac{\partial f}{\partial a} \big\vert_{(1,2)} = 2 \cdot \cos(1)\$.

---

### 4.2 Partial w.r.t. b

Similarly, to compute \$\frac{\partial f}{\partial b}\$:

1. Create `Dual A(a, 0.0)`.
2. Create `Dual B(b, 1.0)`.
3. Evaluate `f(A, B)`.

```cpp
int main() {
    double a = 1.0;
    double b = 2.0;

    // ---- Partial derivative w.r.t. b ----
    // A has dval=0, B has dval=1
    Dual A(a, 0.0);
    Dual B(b, 1.0);

    Dual fab = f(A, B);

    std::cout << "f(a,b)       = " << fab.val << std::endl;
    std::cout << "df/db(a,b)   = " << fab.dval << std::endl;

    return 0;
}
```

Under the hood for derivative wrt b:

1. `sin(A)`  
   - \$\text{val} = \sin(a)\$, derivative wrt b is \$0\$ (since A has dval=0)
2. `B * sin(A)`  
   - \$\text{val} = b \sin(a)\$  
   - \$\text{dval} = (B.dval \times \sin(A).val) + (B.val \times \sin(A).dval)\$  
   - \$\sin(A).dval = 0\$, so the derivative is \$1 \cdot \sin(a) + b \cdot 0 = \sin(a)\$.
3. `B * B`  
   - \$\text{val} = b^2\$  
   - \$\text{dval} = 1 \cdot b + b \cdot 1 = 2b\$.

Hence, 
- \$\text{val} = b\,\sin(a) + b^2\$
- \$\text{dval} = \sin(a) + 2b\$

At \$a=1\$ and \$b=2\$, 
- \$ f(1, 2) = 2\sin(1) + 4\$, 
- \$\frac{\partial f}{\partial b}\big\vert_{(1,2)} = \sin(1) + 4.\$
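
One way to sanity-check results like these is to compare the forward-mode derivative against a central finite difference. The sketch below does this for \$\frac{\partial f}{\partial a}\$; the helper names `dfda_forward` and `dfda_central` and the step size `h` are hypothetical choices made for illustration, and the `Dual` definitions are repeated so the snippet is self-contained.

```cpp
#include <cmath>

struct Dual {
    double val;
    double dval;
    Dual(double v = 0.0, double dv = 0.0) : val(v), dval(dv) {}
};

Dual operator+(const Dual& x, const Dual& y) {
    return Dual(x.val + y.val, x.dval + y.dval);
}

Dual operator*(const Dual& x, const Dual& y) {
    return Dual(x.val * y.val, x.dval * y.val + x.val * y.dval);
}

Dual sin(const Dual& x) {
    return Dual(std::sin(x.val), std::cos(x.val) * x.dval);
}

// f(a, b) = b * sin(a) + b^2, usable with both double and Dual.
template <typename T>
T f(const T& a, const T& b) {
    using std::sin;  // double uses std::sin; Dual is found by argument-dependent lookup
    return b * sin(a) + b * b;
}

// Forward-mode derivative df/da: seed a with 1 and b with 0.
double dfda_forward(double a, double b) {
    return f(Dual(a, 1.0), Dual(b, 0.0)).dval;
}

// Central finite difference for df/da (h is an illustrative step size).
double dfda_central(double a, double b, double h = 1e-6) {
    return (f(a + h, b) - f(a - h, b)) / (2.0 * h);
}
```

At \$(1, 2)\$ the two values should agree to several digits. The forward-mode result is exact up to floating-point rounding, whereas the finite difference also carries a truncation error that depends on `h`.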

---



A typical pattern when you want **both partial derivatives** at a single point \$(a,b)\$ is:
1. Create `A(a, 1.0), B(b, 0.0)` to get \$\frac{\partial f}{\partial a}\$.  
2. Create `A(a, 0.0), B(b, 1.0)` to get \$\frac{\partial f}{\partial b}\$.
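
The two-pass pattern above can be wrapped in a small helper. The following is a minimal sketch, assuming the same `Dual` overloads as earlier (repeated here so it compiles alone); the function name `gradient` is hypothetical.

```cpp
#include <cmath>
#include <utility>

struct Dual {
    double val;
    double dval;
    Dual(double v = 0.0, double dv = 0.0) : val(v), dval(dv) {}
};

Dual operator+(const Dual& x, const Dual& y) {
    return Dual(x.val + y.val, x.dval + y.dval);
}

Dual operator*(const Dual& x, const Dual& y) {
    return Dual(x.val * y.val, x.dval * y.val + x.val * y.dval);
}

Dual sin(const Dual& x) {
    return Dual(std::sin(x.val), std::cos(x.val) * x.dval);
}

// f(a, b) = b * sin(a) + b^2
Dual f(const Dual& A, const Dual& B) {
    return B * sin(A) + B * B;
}

// Hypothetical helper: runs both seeded passes and returns {df/da, df/db}.
std::pair<double, double> gradient(double a, double b) {
    const Dual wrt_a = f(Dual(a, 1.0), Dual(b, 0.0));  // pass 1: seed a
    const Dual wrt_b = f(Dual(a, 0.0), Dual(b, 1.0));  // pass 2: seed b
    return {wrt_a.dval, wrt_b.dval};
}
```

Calling `gradient(1.0, 2.0)` returns the pair \$(2\cos(1),\ \sin(1) + 4)\$, matching the hand derivations in sections 4.1 and 4.2.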


---


- **Forward mode** needs one pass per input variable, so it is efficient when a function has few inputs. For functions with many inputs and few outputs, **reverse mode** is usually cheaper.
- For production code, consider a well-established C++ AD library such as [autodiff](https://github.com/autodiff/autodiff), [Adept](https://github.com/rjhogan/Adept-2), or the approach built into [Ceres Solver](https://github.com/ceres-solver/ceres-solver).
- In real-world usage, you often combine forward-mode AD or reverse-mode AD with your code to compute Jacobians or gradients automatically, bypassing the need for manual derivative coding or finite differences.

## Ceres Jet Class


Ceres Solver's `Jet` class is conceptually similar to the `Dual` class we implemented above for forward-mode automatic differentiation. The `Jet` class carries both the **value** of the function and its **derivatives** with respect to the input variables, enabling Ceres to compute derivatives automatically during optimization.

Let's rewrite the example function \$ f(a, b) = b \cdot \sin(a) + b^2 \$ using Ceres's `Jet` class.

---

## 1. The Ceres `Jet` Class

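In simplified form, the `Jet` class looks like this (the real Ceres definition stores `v` in a fixed-size Eigen vector rather than a raw array):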

```cpp
template <typename T, int N>
struct Jet {
    T a;                // Value of the function.
    T v[N];             // Array storing partial derivatives w.r.t. N variables.
};
```

- `a` holds the value of the function.
- `v` is an array of derivatives, where `v[i]` holds \$\frac{\partial f}{\partial x_i}\$.
- The `Jet` class overloads arithmetic operators (`+`, `*`, `sin`, etc.) to compute derivatives automatically using the chain rule.
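
To illustrate how carrying an array of derivatives changes the overloads, here is a minimal sketch of a jet-like type. `MiniJet` is a hypothetical stand-in written for this explanation; it is not the Ceres implementation and covers only the three operations needed for \$ b \sin(a) + b^2 \$.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical N-derivative dual number; NOT the actual ceres::Jet.
template <std::size_t N>
struct MiniJet {
    double a = 0.0;             // value
    std::array<double, N> v{};  // partial derivatives, zero-initialized

    MiniJet() = default;
    // Seed constructor: stores the value and sets derivative slot k to 1.
    MiniJet(double value, std::size_t k) : a(value) { v[k] = 1.0; }
};

// Sum rule, applied to every derivative slot.
template <std::size_t N>
MiniJet<N> operator+(const MiniJet<N>& x, const MiniJet<N>& y) {
    MiniJet<N> r;
    r.a = x.a + y.a;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = x.v[i] + y.v[i];
    return r;
}

// Product rule, applied to every derivative slot.
template <std::size_t N>
MiniJet<N> operator*(const MiniJet<N>& x, const MiniJet<N>& y) {
    MiniJet<N> r;
    r.a = x.a * y.a;
    for (std::size_t i = 0; i < N; ++i) r.v[i] = x.v[i] * y.a + x.a * y.v[i];
    return r;
}

// Chain rule for sin, applied to every derivative slot.
template <std::size_t N>
MiniJet<N> sin(const MiniJet<N>& x) {
    MiniJet<N> r;
    r.a = std::sin(x.a);
    for (std::size_t i = 0; i < N; ++i) r.v[i] = std::cos(x.a) * x.v[i];
    return r;
}
```

With `MiniJet<2> a(1.0, 0)` and `MiniJet<2> b(2.0, 1)`, a single evaluation of `b * sin(a) + b * b` fills `v[0]` with \$2\cos(1)\$ and `v[1]` with \$\sin(1) + 4\$, so both partial derivatives come out of one pass instead of two.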

---

## 2. Key Differences Between Dual Numbers and Ceres Jets

| **Feature**                  | **Dual Numbers** (Our Example) | **Ceres Jets**                 |
|-------------------------------|--------------------------------|---------------------------------|
| Derivative Storage            | Single derivative value (`dval`) | Array of derivatives (`v[N]`)  |
| Number of Differentiated Variables | One at a time                  | Arbitrary (up to \$N\$)         |
| Operators                     | Custom-defined                 | Provided by Ceres               |

---




## 4. Compute Derivatives with `Jet`

To compute \$\frac{\partial f}{\partial a}\$ and \$\frac{\partial f}{\partial b}\$ using `ceres::Jet`, follow these steps:

### 4.1 Setup `Jet` Variables
- Initialize `Jet` variables \$ a \$ and \$ b \$.
- Set the value (`a`) and derivatives (`v[]`) for each variable.

### 4.2 Evaluate the Function
- Call the function `f()` with `Jet` variables.
- The result is a `Jet` containing the function value and derivatives.

---

### Example Code

```cpp
#include <ceres/jet.h>
#include <iostream>
#include <cmath>

template <typename T>
T f(const T& a, const T& b) {
    return b * ceres::sin(a) + b * b;  // f(a, b) = b * sin(a) + b^2
}

int main() {
    using Jet = ceres::Jet<double, 2>;  // Jet for 2 variables (a, b)

    // Input values
    double a_val = 1.0;  // a = 1.0
    double b_val = 2.0;  // b = 2.0

    // Initialize Jets
    Jet a(a_val, 0);  // a = 1.0, derivative w.r.t. a is 1.0
    Jet b(b_val, 1);  // b = 2.0, derivative w.r.t. b is 1.0

    // Evaluate the function
    Jet result = f(a, b);

    // Output results
    std::cout << "f(a, b)       = " << result.a << std::endl;
    std::cout << "df/da(a, b)   = " << result.v[0] << std::endl;  // Derivative w.r.t. a
    std::cout << "df/db(a, b)   = " << result.v[1] << std::endl;  // Derivative w.r.t. b

    return 0;
}
```

---

### Explanation of the Code

1. **`Jet<double, 2>`**:
   - This specifies a jet that tracks derivatives with respect to two variables.
   - `v[0]` is the derivative w.r.t. \$a\$, and `v[1]` is the derivative w.r.t. \$b\$.

2. **Initialization**:
   - For \$a\$, `Jet a(a_val, 0)` sets the derivative w.r.t. \$a\$ to 1 (`v[0] = 1`) and \$b\$ to 0 (`v[1] = 0`).
   - For \$b\$, `Jet b(b_val, 1)` sets the derivative w.r.t. \$b\$ to 1 (`v[1] = 1`) and \$a\$ to 0 (`v[0] = 0`).

3. **Function Evaluation**:
   - Calling `f(a, b)` propagates the value and both derivatives through each operation automatically, using `Jet`'s operator overloads.

4. **Output**:
   - `result.a`: The value of \$f(a, b)\$.
   - `result.v[0]`: \$\frac{\partial f}{\partial a}\$.
   - `result.v[1]`: \$\frac{\partial f}{\partial b}\$.

---

### Expected Output for \$a = 1.0\$, \$b = 2.0\$
At \$a = 1.0, b = 2.0\$:
\$
f(a, b) = b \cdot \sin(a) + b^2 = 2 \cdot \sin(1) + 4
\$
\$
\frac{\partial f}{\partial a} = b \cdot \cos(a) = 2 \cdot \cos(1)
\$
\$
\frac{\partial f}{\partial b} = \sin(a) + 2b = \sin(1) + 4
\$

Console output:
```
f(a, b)       = 5.68294
df/da(a, b)   = 1.0806
df/db(a, b)   = 4.84147
```

---

## 5. How This Relates to Ceres Cost Functions

In Ceres, the `Jet` class is used internally for **automatic differentiation**. You typically don’t work directly with `Jet` in optimization problems. Instead, you define a templated cost functor, and Ceres automatically instantiates it with `Jet` types during derivative computation.

### Example: Ceres Cost Function for \$f(a, b)\$

```cpp
#include <ceres/ceres.h>

struct CostFunctor {
    template <typename T>
    bool operator()(const T* const a, const T* const b, T* residual) const {
        residual[0] = (*b) * ceres::sin(*a) + (*b) * (*b);
        return true;
    }
};

// Usage in Ceres:
ceres::CostFunction* cost_function =
    new ceres::AutoDiffCostFunction<CostFunctor, 1, 1, 1>(new CostFunctor());
```

Here:
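- `T` is instantiated with `double` when only the residual is needed and with `Jet` when Ceres also needs derivatives.
- `AutoDiffCostFunction` handles the instantiation of `Jet`. Its template arguments `1, 1, 1` give the residual dimension followed by the size of each parameter block (`a` and `b`).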

Ceres essentially does what we wrote manually above, but more efficiently and transparently.

---

By relating the `Jet` class to our `Dual` implementation, it becomes clear how forward-mode AD operates in Ceres. The core idea remains the same: propagate values and derivatives together, using operator overloads and the chain rule.