General Content: Challenge for meta-learning: task ambiguity from few shots. Embed MAML in a graphical model setup, which allows for amortized variational inference and thereby places a distribution on the meta-parameters. At meta-test time we can draw different meta-learners to train on the few shots. This generative process allows for active learning and effective exploration based on uncertainty estimates.
-
Definition of Meta-Learning: Acquisition of a prior that allows the learner to adapt to a new task given few data points. Inherently Bayesian: integrating information from few data points.
-
Here: Want scalable approach (to many parameters of the meta-learner) as well as good uncertainty estimate -> Amortized variational inference with shared params of model and inference networks. Extend MAML to model distribution over prior model parameters - simple stochastic adaptation procedure which injects noise into GD at meta-test time.
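Rough sketch of the "inject noise into GD" idea, to make it concrete: draw meta-params from a diagonal Gaussian, then take one ordinary gradient step on the few-shot data. Everything named here (`sample_adapted_params`, `linear_mse_grad`, the linear model, the step size `alpha`) is my own illustration and not taken from the paper. The sketch also samples from a fixed Gaussian and ignores any data-dependent adjustment of the prior.

```python
import numpy as np

def linear_mse_grad(theta, x, y):
    """Gradient of the mean squared error for a linear model y ~ x @ theta (toy stand-in for a network)."""
    residual = x @ theta - y
    return 2.0 * x.T @ residual / len(y)

def sample_adapted_params(mu, log_sigma, grad_fn, x_train, y_train, alpha=0.01, rng=None):
    """Sample meta-params theta ~ N(mu, sigma^2), then adapt with one gradient step.

    mu, log_sigma: mean and log std of the (assumed diagonal Gaussian) meta-param distribution.
    grad_fn: hypothetical helper returning the gradient of the training loss w.r.t. theta.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Noise injection: each call starts adaptation from a different sampled theta.
    theta = mu + np.exp(log_sigma) * rng.standard_normal(mu.shape)
    # Standard MAML-style inner step from the sampled starting point.
    phi = theta - alpha * grad_fn(theta, x_train, y_train)
    return phi
```

Calling `sample_adapted_params` several times on the same few shots gives several different adapted models; with `log_sigma` very negative it collapses back to a single deterministic MAML-style update.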
-
Similar to the LLAMA method, which uses a local Laplace approximation to model the task params and therefore requires a high-dim covariance matrix. This work, on the other hand, approximately infers the pre-update params, which is made tractable through the choice of approximate posterior.
-
Key assumption: Task distribution allows for structural overlap of the tasks!
-
Conventional MAML = Approximating MAP inference under simplified model where meta-param prior is delta function - see LLAMA derivation - Maximize the log likelihood of test labels given train data and test features
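Written out, the point-estimate view reads roughly as follows. The notation is my assumption (θ for meta-params, φ for task params, α for the inner step size), with one gradient step on the training log-likelihood standing in for the MAP estimate:

```latex
p(y^{\text{test}} \mid x^{\text{tr}}, y^{\text{tr}}, x^{\text{test}})
  = \int p(y^{\text{test}} \mid x^{\text{test}}, \phi)\, p(\phi \mid x^{\text{tr}}, y^{\text{tr}}, \theta)\, d\phi
  \approx p(y^{\text{test}} \mid x^{\text{test}}, \phi^{\star}),
\qquad
\phi^{\star} = \theta + \alpha \nabla_{\theta} \log p(y^{\text{tr}} \mid x^{\text{tr}}, \theta)
```

The integral over φ is replaced by a single adapted point φ*, so no uncertainty about the task survives adaptation.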
-
Here allow prior distribution to be non-deterministic -> Structured VI
- Specific factorization of joint variational distribution over task specific params and meta params - allow for sharing of params across tasks with help of amortized inference
- Derive variational lower bound on log-likelihood: easy to evaluate = loss of network plus terms that are simple under Gaussian assumptions
- Important choice: the conditional variational distribution of task-specific params given meta-params
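To see why Gaussian assumptions make the bound cheap to evaluate: the KL term between two diagonal Gaussians is closed-form, so a bound estimate is just a network log-likelihood minus that KL. The sketch below is the generic ELBO shape only, not the paper's exact bound; `kl_diag_gaussians` and `lower_bound_estimate` are names I made up.

```python
import numpy as np

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """Closed-form KL(q || p) for diagonal Gaussians q = N(mu_q, var_q), p = N(mu_p, var_p)."""
    var_q = np.exp(log_var_q)
    var_p = np.exp(log_var_p)
    return 0.5 * np.sum(log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def lower_bound_estimate(log_lik, mu_q, log_var_q, mu_p, log_var_p):
    """Generic single-sample ELBO shape: data log-likelihood minus KL(approx posterior || prior).

    log_lik: log-likelihood of the labels under params sampled from q
             (i.e. the negative network loss); assumed to be computed elsewhere.
    """
    return log_lik - kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)
```

When the approximate posterior equals the prior the KL term vanishes and the estimate reduces to the plain network log-likelihood.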
-
Provide simplified version with more resemblance to LLAMA and easy GD algorithmic derivation - injection of noise!
-
Experiments:
- Performance with sampled meta params comparable to MAML
- Multiple draws allow for approx confidence bands / variance estimation. Active learning: reduce error based on newly acquired data points - show that this clearly outperforms MAML with randomly sampled points
- Multi-modal task distribution can be captured by posterior!
- Again amazing Finn writing style - what do we want to answer in the experiments?
- There is no numerical expression for variance? - From samples? No! Can't be right.
- Can this be further framed as Bayesian Optimization problem?
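Sketch for the multiple-draws bullet above: given several sampled, adapted parameter sets, a per-input empirical mean and variance give approximate confidence bands, and an active-learning step can query the candidate with the largest variance. The max-variance rule and all names (`predictive_stats`, `pick_query_point`, `linear_predict`) are my assumptions for illustration, not a confirmed description of the paper's setup.

```python
import numpy as np

def linear_predict(phi, x):
    """Toy predictor: linear model x @ phi (stand-in for an adapted network)."""
    return x @ phi

def predictive_stats(param_samples, predict_fn, x_candidates):
    """Empirical mean and variance of predictions across sampled parameter sets."""
    preds = np.stack([predict_fn(phi, x_candidates) for phi in param_samples])
    return preds.mean(axis=0), preds.var(axis=0)

def pick_query_point(param_samples, predict_fn, x_candidates):
    """Index of the candidate input where the sampled models disagree most."""
    _, variance = predictive_stats(param_samples, predict_fn, x_candidates)
    return int(np.argmax(variance))
```

A band such as mean ± 2 * sqrt(variance) can be plotted directly from `predictive_stats`; the variance here is purely sample-based, so it depends on how many draws are taken.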