feat(implicit): add amortized estimation sections and evolution of estimation figure by cnellington · Pull Request #88 · AdaptInfer/context-review

cnellington · 2026-04-28T02:41:18Z

Also reverts a commit which itself reverts an accidental push to main with these changes 571a725.

…d estimation sections"" This reverts commit 571a725.

Copilot

Pull request overview

Adds new conceptual material to the “Implicit Adaptivity” chapter by introducing a theoretical bridge from explicit varying-coefficient estimators to differentiable implicit models, and by expanding the meta-learning section with an amortized estimation/context-encoder discussion (including a new figure).

Changes:

- Renames/retitles the context-input subsection to frame it as a “Theoretical Bridge” and adds new mathematical exposition.
- Expands the meta-learning section with an “Amortized Estimation: Context Encoders” subsection.
- Adds a new figure (`estimation_evolution.png`) referenced from the new amortized estimation text.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `content/07.implicit.md` | Adds the theoretical-bridge math block, expands amortized estimation discussion, and embeds a new figure. |
| `content/images/estimation_evolution.png` | New image used by the amortized estimation section. |


Copilot · 2026-04-28T02:44:28Z

+The connection is explicit for differentiable models $g$. Consider the model $P(Y | X, C)$ as a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
+$$\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2,$$
+while a differentiable model (e.g. a neural network) will solve 
+$$\hat{\Phi} = \text{argmin}_\Phi \sum_i (y_i - g([x_i, c_i]; \Phi).$$
+Under mild assumptions, these result in an identical solution for the intermediate regression parameters $\beta$. While the varying-coefficient model solves this explicitly, these can be obtained post-hoc from the differentiable model by differentiating with respect to $c_i$
+$$\beta_i = \frac{\delta}{\delta c} g([x_i, c_i]; \Phi).$$
+This is the first-order Taylor approximation of the model, a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.


The definition of $\beta_i$ as a derivative with respect to the context $c$ doesn’t match the varying-coefficient form $y_i = x_i\cdot\beta_i$: the coefficient multiplying $x_i$ comes from differentiating with respect to $x_i$, not $c_i$. The text should also clarify whether this is a gradient or a Jacobian and at which reference point it is evaluated. As written, the dimensions and interpretation of $\beta_i$ are inconsistent. The neural-network objective is also missing its square and closing parenthesis.

Suggested change

The connection is explicit for differentiable models $g$. Consider the model $P(Y | X, C)$ as a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through

$$\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2,$$

while a differentiable model (e.g. a neural network) will solve

$$\hat{\Phi} = \text{argmin}_\Phi \sum_i (y_i - g([x_i, c_i]; \Phi).$$

Under mild assumptions, these result in an identical solution for the intermediate regression parameters $\beta$. While the varying-coefficient model solves this explicitly, these can be obtained post-hoc from the differentiable model by differentiating with respect to $c_i$

$$\beta_i = \frac{\delta}{\delta c} g([x_i, c_i]; \Phi).$$

This is the first-order Taylor approximation of the model, a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.

The connection is explicit for differentiable models $g$. Consider the model $P(Y \mid X, C)$ alongside a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through

$$\hat{f} = \operatorname*{argmin}_f \sum_i \left(y_i - x_i \cdot f(c_i)\right)^2,$$

while a differentiable model (e.g. a neural network) will solve

$$\hat{\Phi} = \operatorname*{argmin}_\Phi \sum_i \left(y_i - g([x_i, c_i]; \Phi)\right)^2.$$

Under mild assumptions, both formulations can represent context-dependent relationships between $x_i$ and $y_i$. While the varying-coefficient model parameterizes this dependence explicitly through $\beta_i = f(c_i)$, an analogous local coefficient can be obtained post hoc from the differentiable model by differentiating with respect to the primary input $x$ while holding $c_i$ fixed:

$$\beta_i = \left.\frac{\partial}{\partial x} g([x, c_i]; \Phi)\right|_{x = x_i}.$$

For scalar outputs, this derivative is a gradient; for vector-valued outputs, it is the corresponding Jacobian. It defines the first-order Taylor approximation of the model with respect to $x$ around $x_i$, yielding a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
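The dimensional point in this comment can be checked numerically. The sketch below is illustrative only and is not part of the PR: it assumes a toy model in which $g$ is exactly $x \cdot f(c)$ with a hypothetical linear map $f(c) = Wc + b$, and it uses central finite differences instead of autodiff. The names `W`, `b`, `f`, `g`, and `grad` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k = 3, 2                      # hypothetical sizes: x in R^p, c in R^k
W = rng.normal(size=(p, k))      # hypothetical coefficient map f(c) = W @ c + b
b = rng.normal(size=p)

def f(c):
    """Explicit varying-coefficient map: beta = f(c)."""
    return W @ c + b

def g(x, c):
    """Toy stand-in for g([x, c]; Phi); here it is exactly x . f(c)."""
    return x @ f(c)

def grad(fn, v, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at v."""
    out = np.zeros_like(v)
    for j in range(v.size):
        step = np.zeros_like(v)
        step[j] = eps
        out[j] = (fn(v + step) - fn(v - step)) / (2 * eps)
    return out

x_i = rng.normal(size=p)
c_i = rng.normal(size=k)

# Derivative w.r.t. x with c held fixed: has the shape of x and recovers f(c_i).
beta_from_x = grad(lambda x: g(x, c_i), x_i)

# Derivative w.r.t. c with x held fixed: has the shape of c, so it cannot
# serve as the coefficient that multiplies x_i.
grad_from_c = grad(lambda c: g(x_i, c), c_i)

print(beta_from_x.shape, grad_from_c.shape)
print(np.allclose(beta_from_x, f(c_i), atol=1e-6))
```

In this toy setting the derivative with respect to $x$ matches the explicit coefficients $f(c_i)$, while the derivative with respect to $c$ has the dimension of the context. For a general nonlinear $g$ the match would hold only locally, as the first-order Taylor framing in the suggestion indicates.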

Copilot · 2026-04-28T02:44:29Z


 In contrast to explicit parameter mapping, the simplest route to implicit adaptation is to feed context directly as part of the input. The simplest form of implicit adaptation appears in neural network models that directly incorporate context as part of their input. In models written as $y_i = g([x_i, c_i]; \Phi)$, context features $c_i$ are concatenated with the primary features $x_i$, and the mapping $g$ is determined by a single set of fixed global weights $\Phi$. Even though these parameters do not change during inference, the network’s nonlinear structure allows it to capture complex interactions. As a result, the relationship between $x_i$ and $y_i$ can vary depending on the specific value of $c_i$.

+<!-- Todo: Explore NTKs to make this explicit -->


There’s an inline HTML TODO comment. This repo doesn’t appear to use TODO markers elsewhere in published content, so please either remove it before merge or convert it into a tracked issue/reference.

Suggested change

…ic P. Xing as author Removes the theoretical bridge math (VCM-neural network connection) from this PR — it will land separately. Keeps the amortized estimation / context encoder content. Adds Eric P. Xing (CMU MLD, MBZUAI) to the author list.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Reapply "Reapply "feat(implicit): add theoretical bridge and amortize…

1d8646a

…d estimation sections"" This reverts commit 571a725.

Copilot AI review requested due to automatic review settings April 28, 2026 02:41

Copilot started reviewing on behalf of cnellington April 28, 2026 02:41

Copilot AI reviewed Apr 28, 2026

cnellington added 2 commits April 27, 2026 19:47

fix(implicit): remove ICL section editing artifacts

23bb71d

cnellington mentioned this pull request May 10, 2026

feat(implicit): add theoretical bridge between VCMs and neural models #89

Open

2 tasks

Update metadata.yaml

14c3a76

cnellington changed the title ~~feat(implicit): add theoretical bridge and amortized estimation sections~~ feat(implicit): add amortized estimation sections and evolution of estimation figure May 10, 2026

cnellington and others added 2 commits May 10, 2026 16:49

Apply suggestion from @Copilot

449ca5c

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Apply suggestion from @Copilot

ac27241

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

cnellington wants to merge 6 commits into `main` from `implicit-amortized-estimation`

2 participants

