### A Path Towards Robust Autonomous Driving: Differentiable Planning within a Learned World Model

#### 1. The Future Capability: Autonomous Agents with "Machine Common Sense"

Current AI, while powerful at pattern recognition, lacks the "common sense" that animals and humans use to navigate the physical world. This is most evident in autonomous driving (AD), where "long-tail" scenarios (unpredictable, novel events absent from the training data) remain a major barrier to full autonomy.

Our 20-year vision is to create an **Embodied Adaptive Agent** capable of robust reasoning and planning under profound uncertainty. This agent will not rely on a brittle, rule-based system. Instead, it will possess a deep, predictive understanding of physics and human intent, learned largely from observation.

**Application Scenario: The "Mode-2" Critical Decision**

Imagine an AD agent approaching an intersection. A pedestrian is at the curb, partially obscured. The agent’s sensors are uncertain. A "reactive" (Mode-1) system might brake harshly or proceed with flawed confidence.

Our proposed agent operates in **"Mode-2" (deliberative planning)**. It uses its internal "World Model" to "imagine" multiple plausible futures simultaneously:
* *Future A:* The pedestrian remains stationary (70% probability).
* *Future B:* The pedestrian steps into the road (30% probability).

The agent’s task is not just to pick the *most likely* future, but to compute an **optimal, robust action** (e.g., a specific deceleration and swerve-path) that minimizes the *expected* or *worst-case* "cost" across *all* plausible futures. We regard this ability to reason over simulated futures as a core ingredient of machine common sense.
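
To make the distinction concrete, here is a minimal sketch of how the three selection rules can disagree. Only the 70%/30% probabilities come from the scenario above; the candidate actions and every cost value are hypothetical numbers chosen for illustration.

```python
# Probabilities are taken from the scenario; everything else is made up.
futures = {"A_stays": 0.7, "B_steps_out": 0.3}

# cost[action][future]: illustrative cost of each candidate action in each future
cost = {
    "maintain_speed": {"A_stays": 0.0, "B_steps_out": 100.0},
    "gentle_brake":   {"A_stays": 2.0, "B_steps_out": 10.0},
    "hard_brake":     {"A_stays": 8.0, "B_steps_out": 9.0},
}

def expected_cost(action):
    """Probability-weighted cost of an action over all futures."""
    return sum(p * cost[action][f] for f, p in futures.items())

def worst_case_cost(action):
    """Cost of an action in its least favourable future."""
    return max(cost[action][f] for f in futures)

# Rule 1: plan only for the single most likely future.
most_likely_future = max(futures, key=futures.get)
best_for_most_likely = min(cost, key=lambda a: cost[a][most_likely_future])

# Rule 2: minimise the expected cost. Rule 3: minimise the worst-case cost.
best_expected = min(cost, key=expected_cost)
best_worst_case = min(cost, key=worst_case_cost)
```

With these particular numbers, planning for the most likely future keeps the current speed, the expected-cost rule picks the gentle brake, and the worst-case rule picks the hard brake.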

#### 2. Involved Machine Learning Methodology

To build this agent, we adopt the cognitive architecture proposed by LeCun. This architecture is not a single model but a system of differentiable, trainable modules that work in concert.

* **1. The World Model (WM):**
    This is the agent's internal "simulator of the world". Its role is to predict future world states given a sequence of imagined actions.
    * **Data & Goal:** The WM is trained via **Self-Supervised Learning (SSL)** on massive, unlabeled video datasets. We will leverage SOTA (State-of-the-Art) **Generative World Models** such as `UniSim` or `Genie` from the `awesome-world-models-for-robots` repository. These models use **Latent Variables ($z$)** to represent unobserved factors (like human intent) and generate a *distribution* of plausible futures, not just one.
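    * **Method:** This is related to, but distinct from, LeCun’s **JEPA (Joint Embedding Predictive Architecture)** concept. JEPA predicts in an abstract representation space and learns representations that are predictable while ignoring irrelevant details, whereas generative models such as those above predict observations directly. For our purposes, what the two share is the use of latent variables to represent multiple plausible futures.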

* **2. The Cost Module:**
    This module defines the agent's "goals" and "drives" by computing a scalar "energy" or "cost" for any given state. It has two parts:
    * **Intrinsic Cost (IC):** An immutable, hard-wired module that defines fundamental behaviors (e.g., "collision = high cost," "driving smoothly = low cost").
    * **Trainable Critic (TC):** A *trainable* module that learns to *predict* future values of the intrinsic cost from the current state.

* **3. The Actor:**
    The Actor’s job is to find the optimal sequence of actions $A = (a_1, a_2, ...)$.
    * **Method (Mode-2):** The Actor proposes an action sequence $A$ to the World Model. The WM predicts the future states $S = (s_1, s_2, ...)$. The Cost Module evaluates these states, returning a total cost. Because all modules are differentiable, the Actor can compute the **gradient of the cost with respect to its own actions**. It then uses gradient-based optimization to find the action sequence $A$ that **minimizes the predicted future cost**.
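
The Mode-2 loop described above can be sketched on a toy problem. Everything in this block is a hypothetical stand-in: `world_model` is a 1-D kinematic update rather than a learned network, `intrinsic_cost` is a hand-written collision and smoothness penalty, and central finite differences replace the automatic differentiation that PyTorch or JAX would provide.

```python
# Toy stand-ins (all hypothetical): a 1-D kinematic "world model", a hand-written
# intrinsic cost, and central finite differences in place of autodiff.
DT, HORIZON = 0.5, 8          # step length [s] and planning horizon [steps]
X_STOP = 15.0                 # position the ego vehicle should not pass [m]

def world_model(state, action):
    """Predict the next (position, velocity) given an acceleration command."""
    x, v = state
    return (x + v * DT, v + action * DT)

def intrinsic_cost(state, action):
    """Hard-wired cost: passing X_STOP is expensive, harsh control mildly so."""
    x, _ = state
    overshoot = max(0.0, x - X_STOP)
    return overshoot ** 2 + 0.1 * action ** 2

def total_cost(actions, state=(0.0, 10.0)):
    """Roll the action sequence through the world model and sum the costs."""
    cost = 0.0
    for a in actions:
        state = world_model(state, a)
        cost += intrinsic_cost(state, a)
    return cost

def gradient(actions, eps=1e-4):
    """Central finite-difference estimate of d(total_cost)/d(actions)."""
    grad = []
    for i in range(len(actions)):
        up = actions[:i] + [actions[i] + eps] + actions[i + 1:]
        down = actions[:i] + [actions[i] - eps] + actions[i + 1:]
        grad.append((total_cost(up) - total_cost(down)) / (2 * eps))
    return grad

def plan(steps=300, lr=0.01):
    """Mode-2 loop: propose actions, evaluate the imagined cost, descend."""
    actions = [0.0] * HORIZON
    for _ in range(steps):
        g = gradient(actions)
        actions = [a - lr * gi for a, gi in zip(actions, g)]
    return actions

initial_cost = total_cost([0.0] * HORIZON)   # cost of coasting at 10 m/s
planned = plan()
final_cost = total_cost(planned)             # cost after gradient-based planning
```

In the proposed system, `world_model` would be the pre-trained network and `gradient` would be replaced by reverse-mode automatic differentiation; the structure of the loop (propose, imagine, score, update) stays the same.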

#### 3. The "Modeled" First Step (A 4-Week Project)

Given the limited timeframe, we will not train a new WM. Instead, we will conduct a focused research project that **isolates and models the Actor-Cost planning loop**, leveraging a pre-trained SOTA World Model.

* **Problem Name:** "Robust Differentiable Planning for AD under Predictive Uncertainty."

* **Representation:** This project is a microcosm of the full vision. It demonstrates the *core reasoning loop* (Mode-2) by assuming the Perception and WM modules are already solved (i.e., we use a pre-trained model). Our novel contribution is the **mathematical modeling of the Cost function** to handle probabilistic futures.

* **Testability (Methodology):**
    1.  **WM Selection:** We will use a pre-trained generative model (e.g., `UniSim`) identified from the provided resource list.
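    2.  **Scenario:** We will define a critical test scenario (e.g., an intersection) and use the WM to generate a set of $N$ plausible future trajectories for other agents, $\{s_1, ..., s_N\}$, with associated probabilities $\{p_1, ..., p_N\}$ (e.g., uniform weights $p_i = 1/N$ if the futures are drawn as samples). Here $s_i$ denotes the $i$-th complete future trajectory, not a single time step.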
    3.  **Actor Modeling:** Our "Actor" will be a mathematically-defined trajectory (e.g., a B-spline) whose shape is controlled by a vector of optimizable parameters $\theta$.
    4.  **Cost Modeling (Our Contribution):** We will design and test a differentiable **Cost Function $C_{\text{total}}(\theta)$** that explicitly models risk:
        * $C_{\text{total}}(\theta) = C_{\text{efficiency}}(\theta) + \lambda \cdot C_{\text{risk}}(\theta)$
        * Where $C_{\text{efficiency}}$ penalizes lack of progress (e.g., deviation from a target speed).
        * And $C_{\text{risk}}$ is our modeled risk function. We will explore two forms:
            1.  **Expected Cost:** $C_{\text{risk}} = \sum_{i=1}^{N} p_i \cdot \text{CollisionCost}(s_i, \theta)$
            2.  **Minimax (Worst-Case) Cost:** $C_{\text{risk}} = \max_{i} \{\text{CollisionCost}(s_i, \theta)\}$
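    5.  **Solving:** We will use PyTorch or JAX to compute $\nabla_{\theta} C_{\text{total}}$ and apply gradient descent to find the trajectory parameters $\theta^*$ that minimize $C_{\text{total}}$. Because the $\max$ in the minimax form is not differentiable everywhere, we will use its subgradient or a smooth surrogate (e.g., log-sum-exp) at those points.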

* **Tools:**
    * **Math:** Optimization Theory, Probabilistic Modeling.
    * **ML:** A pre-trained SOTA World Model, Automatic Differentiation (PyTorch/JAX).
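
The cost formulation in the Testability steps can be sketched end to end on a toy problem. This is an illustration under stated assumptions, not the project's implementation: $\theta$ is taken to be the six control points of a uniform cubic B-spline over the ego speed profile, each future $s_i$ is reduced to a per-step flag saying whether the pedestrian occupies the lane, `collision_cost` is a hypothetical squared-overshoot penalty, and finite differences stand in for PyTorch/JAX autodiff. All constants are made up.

```python
DT, HORIZON = 0.5, 10            # step length [s], number of steps
V_TARGET, X_LIMIT = 10.0, 15.0   # desired speed [m/s], last safe position [m]
LAMBDA = 1.0                     # risk weight (lambda in C_total)

# Each hypothetical future: probability p_i and, per step, whether the
# pedestrian occupies the lane (a crude stand-in for a WM trajectory s_i).
FUTURES = [
    {"p": 0.7, "in_lane": [False] * HORIZON},                 # stays on curb
    {"p": 0.3, "in_lane": [k >= 3 for k in range(HORIZON)]},  # steps out at 2 s
]

def bspline(theta, u):
    """Uniform cubic B-spline value at u in [0, len(theta) - 3]."""
    seg = min(int(u), len(theta) - 4)
    t = u - seg
    basis = ((1 - t) ** 3 / 6,
             (3 * t**3 - 6 * t**2 + 4) / 6,
             (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6,
             t**3 / 6)
    return sum(w * c for w, c in zip(basis, theta[seg:seg + 4]))

def rollout(theta):
    """Ego speeds and positions at each step for speed-profile parameters theta."""
    speeds, positions, x = [], [], 0.0
    for k in range(1, HORIZON + 1):
        v = bspline(theta, k / HORIZON * (len(theta) - 3))
        x += v * DT
        speeds.append(v)
        positions.append(x)
    return speeds, positions

def collision_cost(future, positions):
    """Mean squared overshoot past X_LIMIT while the pedestrian is in the lane."""
    return sum(max(0.0, x - X_LIMIT) ** 2
               for x, occupied in zip(positions, future["in_lane"])
               if occupied) / HORIZON

def c_total(theta, mode):
    """C_total = C_efficiency + LAMBDA * C_risk, with C_risk chosen by mode."""
    speeds, positions = rollout(theta)
    c_eff = sum((v - V_TARGET) ** 2 for v in speeds) / HORIZON
    costs = [collision_cost(f, positions) for f in FUTURES]
    if mode == "expected":
        c_risk = sum(f["p"] * c for f, c in zip(FUTURES, costs))
    else:  # "minimax": worst case over futures, probabilities ignored
        c_risk = max(costs)
    return c_eff + LAMBDA * c_risk

def optimize(mode, steps=600, lr=0.04, eps=1e-4):
    """Gradient descent on theta using central finite differences."""
    theta = [V_TARGET] * 6
    for _ in range(steps):
        grad = []
        for i in range(len(theta)):
            up, down = theta[:], theta[:]
            up[i] += eps
            down[i] -= eps
            grad.append((c_total(up, mode) - c_total(down, mode)) / (2 * eps))
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

theta_init = [V_TARGET] * 6
theta_expected = optimize("expected")
theta_minimax = optimize("minimax")
# Residual collision cost in the "steps out" future under each plan.
risk_b_expected = collision_cost(FUTURES[1], rollout(theta_expected)[1])
risk_b_minimax = collision_cost(FUTURES[1], rollout(theta_minimax)[1])
```

In this toy setup the minimax form weights the "steps out" future fully rather than at 30%, so it should trade more speed for a lower residual collision cost than the expected-cost form. Comparing `risk_b_expected` with `risk_b_minimax`, and the corresponding speed profiles from `rollout`, is one simple way to inspect that trade-off as $\lambda$ varies.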