feat(implicit): add amortized estimation sections and evolution of estimation figure #88

Open
cnellington wants to merge 6 commits into main from implicit-amortized-estimation

@cnellington
Collaborator

Also reverts a commit that itself reverted an accidental push of these changes to main (571a725).

Copilot AI review requested due to automatic review settings April 28, 2026 02:41
Contributor

Copilot AI left a comment

Pull request overview

Adds new conceptual material to the “Implicit Adaptivity” chapter by introducing a theoretical bridge from explicit varying-coefficient estimators to differentiable implicit models, and by expanding the meta-learning section with an amortized estimation/context-encoder discussion (including a new figure).

Changes:

  • Renames/retitles the context-input subsection to frame it as a “Theoretical Bridge” and adds new mathematical exposition.
  • Expands the meta-learning section with an “Amortized Estimation: Context Encoders” subsection.
  • Adds a new figure (estimation_evolution.png) referenced from the new amortized estimation text.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| content/07.implicit.md | Adds the theoretical-bridge math block, expands amortized estimation discussion, and embeds a new figure. |
| content/images/estimation_evolution.png | New image used by the amortized estimation section. |

Comment thread content/07.implicit.md Outdated
Comment on lines +20 to +26
The connection is explicit for differentiable models $g$. Consider the model $P(Y | X, C)$ as a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
$$\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2,$$
while a differentiable model (e.g. a neural network) will solve
$$\hat{\Phi} = \text{argmin}_\Phi \sum_i (y_i - g([x_i, c_i]; \Phi).$$
Under mild assumptions, these result in an identical solution for the intermediate regression parameters $\beta$. While the varying-coefficient model solves this explicitly, these can be obtained post-hoc from the differentiable model by differentiating with respect to $c_i$
$$\beta_i = \frac{\delta}{\delta c} g([x_i, c_i]; \Phi).$$
This is the first-order Taylor approximation of the model, a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
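As a hedged illustration of the explicit estimator $\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2$ in the hunk above, the sketch below assumes scalar $x$ and $c$ and restricts $f$ to the hypothetical linear family $f(c) = a c + b$. Under that assumption the objective becomes ordinary least squares on the features $x_i c_i$ and $x_i$, so a 2x2 normal-equation solve suffices. The function name `fit_linear_vcm` and the toy data are illustrative only.

```python
from typing import List, Tuple


def fit_linear_vcm(xs: List[float], cs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = x * f(c) under the assumed form f(c) = a*c + b.

    Substituting f gives y = a*(x*c) + b*x, which is linear in (a, b),
    so the argmin reduces to solving the 2x2 normal equations.
    """
    u = [x * c for x, c in zip(xs, cs)]  # feature multiplying a
    v = list(xs)                         # feature multiplying b
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    svv = sum(vi * vi for vi in v)
    suy = sum(ui * yi for ui, yi in zip(u, ys))
    svy = sum(vi * yi for vi, yi in zip(v, ys))
    det = suu * svv - suv * suv
    if abs(det) < 1e-12:
        raise ValueError("features x*c and x are collinear; f is not identifiable")
    a = (svv * suy - suv * svy) / det  # Cramer's rule
    b = (suu * svy - suv * suy) / det
    return a, b


# Noise-free toy data generated from a hypothetical true map f(c) = 2c + 1.
xs = [1.0, -2.0, 0.5, 3.0, -1.0]
cs = [0.0, 0.3, -0.7, 1.2, 2.0]
ys = [x * (2.0 * c + 1.0) for x, c in zip(xs, cs)]

a_hat, b_hat = fit_linear_vcm(xs, cs, ys)
betas = [a_hat * c + b_hat for c in cs]  # per-sample coefficients beta_i = f_hat(c_i)
print(round(a_hat, 6), round(b_hat, 6))  # prints: 2.0 1.0
```

Richer choices of $f$ (splines, neural networks) would need an iterative optimizer; the linear family is used here only because it keeps the argmin in closed form.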

Copilot AI Apr 28, 2026

The definition of $\beta_i$ as a derivative w.r.t. the context $c$ doesn’t match the varying-coefficient form $y_i = x_i\cdot\beta_i$: the coefficient multiplying $x_i$ comes from differentiating w.r.t. $x$, and the text should state whether this is a gradient or a Jacobian and at which reference point it is evaluated. As written, the dimensions and interpretation of $\beta_i$ are inconsistent.

Suggested change
The connection is explicit for differentiable models $g$. Consider the model $P(Y \mid X, C)$ alongside a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
$$\hat{f} = \operatorname*{argmin}_f \sum_i \left(y_i - x_i \cdot f(c_i)\right)^2,$$
while a differentiable model (e.g. a neural network) will solve
$$\hat{\Phi} = \operatorname*{argmin}_\Phi \sum_i \left(y_i - g([x_i, c_i]; \Phi)\right)^2.$$
Under mild assumptions, both formulations can represent context-dependent relationships between $x_i$ and $y_i$. While the varying-coefficient model parameterizes this dependence explicitly through $\beta_i = f(c_i)$, an analogous local coefficient can be obtained post hoc from the differentiable model by differentiating with respect to the primary input $x$ while holding $c_i$ fixed:
$$\beta_i = \left.\frac{\partial}{\partial x} g([x, c_i]; \Phi)\right|_{x = x_i}.$$
For scalar outputs, this derivative is a gradient; for vector-valued outputs, it is the corresponding Jacobian. It defines the first-order Taylor approximation of the model with respect to $x$ around $x_i$, yielding a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
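A minimal sketch of the corrected definition, assuming a toy model $g$ that is itself in varying-coefficient form, $g([x, c]) = x \cdot f(c) + \tfrac{1}{2}c^2$, with a hypothetical map $f(c) = (2c + 1, -c)$. Central finite differences stand in for automatic differentiation so the example needs only the standard library. Differentiating with respect to $x$ recovers $f(c)$, matching $\beta_i$; differentiating with respect to $c$ instead yields a single number that mixes $x$ and $c$, which is the dimensional mismatch the review points out.

```python
from typing import Callable, List

Vector = List[float]


def f(c: float) -> Vector:
    # Hypothetical coefficient map beta = f(c), chosen only for illustration.
    return [2.0 * c + 1.0, -c]


def g(x: Vector, c: float) -> float:
    # Toy differentiable model: varying-coefficient term x . f(c) plus an
    # offset that depends on c only (so it does not affect d/dx).
    return sum(xj * bj for xj, bj in zip(x, f(c))) + 0.5 * c ** 2


def grad_x(model: Callable[[Vector, float], float], x: Vector, c: float,
           eps: float = 1e-5) -> Vector:
    """Central-difference gradient of model w.r.t. x at (x, c), holding c fixed."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((model(x_plus, c) - model(x_minus, c)) / (2 * eps))
    return grad


def grad_c(model: Callable[[Vector, float], float], x: Vector, c: float,
           eps: float = 1e-5) -> float:
    """Central-difference derivative of model w.r.t. the scalar context c."""
    return (model(x, c + eps) - model(x, c - eps)) / (2 * eps)


x_i, c_i = [3.0, -1.0], 0.7
beta_i = grad_x(g, x_i, c_i)  # approximately f(0.7) = [2.4, -0.7]
wrong = grad_c(g, x_i, c_i)   # scalar 2*x[0] - x[1] + c = 7.7, not a coefficient vector
```

With an autodiff framework the finite differences would become an exact gradient call; for a nonlinear $g$ the result is only the local, first-order slope at $x_i$.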

Comment thread content/07.implicit.md Outdated

In contrast to explicit parameter mapping, the simplest route to implicit adaptation is to feed context directly as part of the input, as neural network models commonly do. In models written as $y_i = g([x_i, c_i]; \Phi)$, context features $c_i$ are concatenated with the primary features $x_i$, and the mapping $g$ is determined by a single set of fixed global weights $\Phi$. Even though these parameters do not change during inference, the network’s nonlinear structure allows it to capture complex interactions between $x_i$ and $c_i$. As a result, the relationship between $x_i$ and $y_i$ can vary with the value of $c_i$.

<!-- Todo: Explore NTKs to make this explicit -->
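To make the paragraph's final claim concrete, the sketch below evaluates a tiny one-hidden-layer tanh network $y = g([x, c]; \Phi)$ with a single fixed, hand-picked weight set $\Phi$ (the values are arbitrary and hypothetical, not trained). The slope $\partial y / \partial x$ is estimated by central differences at two context values: it differs between them even though $\Phi$ never changes, whereas a model that is linear in $[x, c]$ has the same slope for every $c$.

```python
import math
from typing import Callable

# Hypothetical fixed global weights Phi (arbitrary values, not trained).
W1 = [[1.0, 2.0], [-1.5, 0.5], [0.7, -1.0]]  # 3 hidden units x inputs [x, c]
b1 = [0.0, 0.3, -0.2]
w2 = [1.0, -0.8, 0.6]
b2 = 0.1


def g(x: float, c: float) -> float:
    """One-hidden-layer tanh network applied to the concatenated input [x, c]."""
    inp = [x, c]
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inp)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def g_linear(x: float, c: float) -> float:
    """Baseline that is linear in [x, c]: its x-slope cannot depend on c."""
    return 1.5 * x + 0.4 * c - 0.2


def slope_in_x(model: Callable[[float, float], float], x: float, c: float,
               eps: float = 1e-5) -> float:
    """Central-difference estimate of d model / dx at (x, c)."""
    return (model(x + eps, c) - model(x - eps, c)) / (2 * eps)


# Network slope changes with c; the linear baseline's slope stays at 1.5.
for c in (-1.0, 1.0):
    print(c, round(slope_in_x(g, 0.0, c), 3), round(slope_in_x(g_linear, 0.0, c), 3))
```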

Copilot AI Apr 28, 2026

There’s an inline HTML TODO comment. This repo doesn’t appear to use TODO markers elsewhere in published content, so please either remove it before merge or convert it into a tracked issue/reference.

Suggested change: delete the line `<!-- Todo: Explore NTKs to make this explicit -->`.

Comment thread content/07.implicit.md Outdated
Comment thread content/07.implicit.md Outdated
…ic P. Xing as author

Removes the theoretical bridge math (the VCM/neural-network connection) from this PR; it will land separately. Keeps the amortized estimation / context encoder content.

Adds Eric P. Xing (CMU MLD, MBZUAI) to the author list.
@cnellington changed the title from "feat(implicit): add theoretical bridge and amortized estimation sections" to "feat(implicit): add amortized estimation sections and evolution of estimation figure" on May 10, 2026
cnellington and others added 2 commits May 10, 2026 16:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Labels

None yet

Projects

None yet

2 participants