Title: Probabilistic Model-Agnostic Meta-Learning

Author: Finn et al. (2018)

General Content: A key challenge for meta-learning is task ambiguity when only a few shots are available. The paper embeds MAML in a graphical model, which allows for amortized variational inference and thereby places a distribution on the meta-parameters. At meta-test time, different meta-learners can be drawn from this distribution and each trained on the few shots. This generative process yields uncertainty estimates that can be used for active learning and effective exploration.

Keypoints:

Definition of meta-learning: Acquisition of a prior that allows a learner to adapt to a new task given few data points. Inherently Bayesian: information integration with few data points.
Here: Want an approach that scales to many meta-learner parameters and also gives good uncertainty estimates -> amortized variational inference with parameters shared between the model and the inference networks. Extends MAML to model a distribution over prior model parameters, which gives a simple stochastic adaptation procedure that injects noise into GD at meta-test time.
Similar to the LLAMA method, which uses a local Laplace approximation to model the task params and requires a high-dimensional covariance matrix. This work, on the other hand, approximately infers the pre-update params, which is made tractable through the choice of approximate posterior.
Key assumption: The task distribution allows for structural overlap between the tasks!
Conventional MAML = approximate MAP inference under a simplified model where the meta-param prior is a delta function (see LLAMA derivation). Objective: maximize the log-likelihood of the test labels given train data and test features.
Here: allow the prior distribution to be non-deterministic -> structured VI
- Specific factorization of the joint variational distribution over task-specific params and meta-params: allows sharing of params across tasks with the help of amortized inference
- Derive a variational lower bound on the log-likelihood: easy to evaluate given the network loss and Gaussian assumptions
- Important choice: the conditional variational distribution of the task-specific params given the meta-params
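
To make the "network loss plus Gaussian assumptions" point concrete, here is a minimal sketch of a generic negative lower bound with diagonal Gaussians. It is an illustration only, not the paper's exact bound (which has additional terms for the task-specific params). The function names `kl_diag_gaussians` and `neg_lower_bound` and the scalar `data_nll` are assumptions made for this example.

```python
import math

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL( N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p))) ).

    All arguments are equal-length lists of floats (one entry per parameter).
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        # Closed-form KL between two univariate Gaussians, summed over dimensions.
        total += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return total

def neg_lower_bound(data_nll, mu_q, log_var_q, mu_p, log_var_p):
    """Generic negative ELBO: data term (network loss) plus Gaussian KL term.

    data_nll is assumed to be the negative log-likelihood of the labels,
    i.e. the usual network loss, evaluated at a sample from q.
    """
    return data_nll + kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p)
```

Because both the approximate posterior and the prior are Gaussian here, the KL term needs no sampling; only the data term depends on a forward pass of the network.
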
Provide a simplified version that more closely resembles LLAMA and has an easy GD-based algorithmic derivation: injection of noise!
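
The noise-injection idea can be sketched on a toy problem. The example below assumes a 1-D linear model `y = w * x` with a Gaussian prior over the initial weight; it is a simplified illustration, not the paper's exact meta-test algorithm. The function name `sample_and_adapt` and all parameters are hypothetical.

```python
import random

def sample_and_adapt(mu, sigma, xs, ys, lr=0.1, steps=1, rng=None):
    """Draw an initial weight from N(mu, sigma^2), then adapt it with plain GD.

    mu, sigma : mean and std of the (assumed Gaussian) prior over the initial weight
    xs, ys    : few-shot training inputs and targets for the new task
    """
    rng = rng or random.Random()
    # 1) Inject noise: sample the pre-update parameter instead of using a point estimate.
    w = rng.gauss(mu, sigma)
    # 2) Standard gradient descent on the mean squared error of the few shots.
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# Several draws give several adapted models for the same few-shot task.
rng = random.Random(0)
adapted = [sample_and_adapt(0.0, 1.0, [1.0, 2.0], [2.0, 4.0], rng=rng) for _ in range(5)]
```

With `sigma = 0` this reduces to the deterministic MAML-style adaptation from a single initialization; with `sigma > 0` each call returns a different plausible model for the same data.
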
Experiments:
- Performance with sampled meta-params is comparable to MAML
- Multiple draws allow for approximate confidence bands / variance estimation. Active learning: reduce error based on newly acquired data points; the paper shows that this clearly outperforms MAML with randomly sampled points
- A multi-modal task distribution can be captured by the posterior!
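
As a rough sketch of how multiple draws can be turned into confidence bands and an active-learning rule, the example below assumes each sampled model is a 1-D linear predictor `y = w * x` and queries the candidate input with the largest predictive variance across samples. The names `predictive_band` and `pick_query` are hypothetical, and the max-variance criterion is one simple choice for illustration.

```python
import statistics

def predictive_band(sampled_weights, x, width=2.0):
    """Mean prediction and a +/- width * std band across the sampled models."""
    preds = [w * x for w in sampled_weights]
    mean = statistics.mean(preds)
    std = statistics.pstdev(preds)
    return mean, mean - width * std, mean + width * std

def pick_query(sampled_weights, candidates):
    """Return the candidate input where the sampled models disagree most."""
    def predictive_variance(x):
        return statistics.pvariance([w * x for w in sampled_weights])
    return max(candidates, key=predictive_variance)

# Three sampled models, four candidate inputs to label next.
weights = [1.0, 2.0, 3.0]
query = pick_query(weights, [0.0, 0.5, -2.0, 1.0])
```

For this linear toy model the disagreement grows with `|x|`, so the rule picks the candidate farthest from zero; with a neural network the same rule would pick inputs where the adapted samples diverge.
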

Questions:

Again amazing Finn writing style - what do we want to answer in the experiments?
There is no numerical expression for the variance? Only from samples? No! Can't be right.
Can this be further framed as a Bayesian optimization problem?
