feat(implicit): add amortized estimation sections and evolution of estimation figure #88
cnellington wants to merge 6 commits into
…d estimation sections"" This reverts commit 571a725.
Pull request overview
Adds new conceptual material to the “Implicit Adaptivity” chapter by introducing a theoretical bridge from explicit varying-coefficient estimators to differentiable implicit models, and by expanding the meta-learning section with an amortized estimation/context-encoder discussion (including a new figure).
Changes:
- Renames/retitles the context-input subsection to frame it as a “Theoretical Bridge” and adds new mathematical exposition.
- Expands the meta-learning section with an “Amortized Estimation: Context Encoders” subsection.
- Adds a new figure (`estimation_evolution.png`) referenced from the new amortized estimation text.
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `content/07.implicit.md` | Adds the theoretical-bridge math block, expands amortized estimation discussion, and embeds a new figure. |
| `content/images/estimation_evolution.png` | New image used by the amortized estimation section. |
The connection is explicit for differentiable models $g$. Consider the model $P(Y | X, C)$ as a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
$$\hat{f} = \text{argmin}_f \sum_i (y_i - x_i \cdot f(c_i))^2,$$
while a differentiable model (e.g. a neural network) will solve
$$\hat{\Phi} = \text{argmin}_\Phi \sum_i (y_i - g([x_i, c_i]; \Phi).$$
Under mild assumptions, these result in an identical solution for the intermediate regression parameters $\beta$. While the varying-coefficient model solves this explicitly, these can be obtained post-hoc from the differentiable model by differentiating with respect to $c_i$
$$\beta_i = \frac{\delta}{\delta c} g([x_i, c_i]; \Phi).$$
This is the first-order Taylor approximation of the model, a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
The definition of
The connection is explicit for differentiable models $g$. Consider the model $P(Y \mid X, C)$ alongside a varying-coefficient regression model. An explicit estimator for regression parameters will solve for the regression parameter map $\beta_i = f(c_i)$ through
$$\hat{f} = \operatorname*{argmin}_f \sum_i \left(y_i - x_i \cdot f(c_i)\right)^2,$$
while a differentiable model (e.g. a neural network) will solve
$$\hat{\Phi} = \operatorname*{argmin}_\Phi \sum_i \left(y_i - g([x_i, c_i]; \Phi)\right)^2.$$
Under mild assumptions, both formulations can represent context-dependent relationships between $x_i$ and $y_i$. While the varying-coefficient model parameterizes this dependence explicitly through $\beta_i = f(c_i)$, an analogous local coefficient can be obtained post hoc from the differentiable model by differentiating with respect to the primary input $x$ while holding $c_i$ fixed:
$$\beta_i = \left.\frac{\partial}{\partial x} g([x, c_i]; \Phi)\right|_{x = x_i}.$$
For scalar outputs, this derivative is a gradient; for vector-valued outputs, it is the corresponding Jacobian. It defines the first-order Taylor approximation of the model with respect to $x$ around $x_i$, yielding a locally linear approximation [@doi:10.48550/arXiv.1602.04938] often used in post-hoc interpretation methods.
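To make the suggested definition concrete, here is a minimal NumPy sketch of the local coefficient $\beta_i$ as the gradient of $g$ with respect to $x$ at fixed $c_i$. Everything specific in it is an assumption for illustration only: the one-hidden-layer tanh network standing in for $g$, the dimensions, the random weights, and the helper names `g`, `local_beta`, and `local_beta_fd` do not come from the chapter or the PR.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_c, d_h = 3, 2, 8  # hypothetical feature, context, and hidden sizes

# Fixed global weights Phi = (W1, w2); they do not change with context.
W1 = rng.normal(size=(d_h, d_x + d_c))
w2 = rng.normal(size=d_h)

def g(x, c):
    """Scalar-output model g([x, c]; Phi) with one tanh hidden layer."""
    h = np.tanh(W1 @ np.concatenate([x, c]))
    return w2 @ h

def local_beta(x, c):
    """Analytic gradient of g with respect to x, holding c fixed."""
    h = np.tanh(W1 @ np.concatenate([x, c]))
    # Chain rule: d g / d x_j = sum_k w2_k * (1 - h_k^2) * W1[k, j]
    return (w2 * (1.0 - h**2)) @ W1[:, :d_x]

def local_beta_fd(x, c, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = np.zeros(d_x)
    for j in range(d_x):
        e = np.zeros(d_x)
        e[j] = eps
        grad[j] = (g(x + e, c) - g(x - e, c)) / (2 * eps)
    return grad

x_i = rng.normal(size=d_x)
c_a = rng.normal(size=d_c)
c_b = rng.normal(size=d_c)

beta_a = local_beta(x_i, c_a)  # local coefficient under context c_a
beta_b = local_beta(x_i, c_b)  # same x_i, different context
```

In this sketch the weights are shared across both calls, yet `beta_a` and `beta_b` differ because the gradient is evaluated at different contexts. The finite-difference helper is only a sanity check on the analytic gradient; an autodiff framework would normally replace both.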
In contrast to explicit parameter mapping, the simplest route to implicit adaptation is to feed context directly as part of the input. The simplest form of implicit adaptation appears in neural network models that directly incorporate context as part of their input. In models written as $y_i = g([x_i, c_i]; \Phi)$, context features $c_i$ are concatenated with the primary features $x_i$, and the mapping $g$ is determined by a single set of fixed global weights $\Phi$. Even though these parameters do not change during inference, the network’s nonlinear structure allows it to capture complex interactions. As a result, the relationship between $x_i$ and $y_i$ can vary depending on the specific value of $c_i$.
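A small worked case may help show why fixed weights can still yield context-dependent relationships. Assume, purely for illustration, a bilinear model with shared weights $\Phi = (A, b)$; this form is hypothetical and is not taken from the chapter:
$$g([x_i, c_i]; \Phi) = x_i^\top A c_i + b^\top c_i, \qquad \left.\frac{\partial}{\partial x} g([x, c_i]; \Phi)\right|_{x = x_i} = A c_i.$$
Here $A$ and $b$ are the same for every sample, yet the effective coefficient on $x_i$ is $A c_i$, which changes whenever the context $c_i$ changes.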
<!-- Todo: Explore NTKs to make this explicit -->
There’s an inline HTML TODO comment. This repo doesn’t appear to use TODO markers elsewhere in published content, so please either remove it before merge or convert it into a tracked issue/reference.
…ic P. Xing as author
Removes the theoretical bridge math (VCM-neural network connection) from this PR; it will land separately. Keeps the amortized estimation / context encoder content. Adds Eric P. Xing (CMU MLD, MBZUAI) to the author list.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Also reverts commit 571a725, which itself reverted an accidental push of these changes to main.