Title: Probabilistic Model-Agnostic Meta-Learning

Authors: Finn et al. (2018)

General Content: The central challenge for meta-learning addressed here is task ambiguity when only a few shots are available. MAML is embedded in a graphical-model setup that allows for amortized variational inference, making it possible to place a distribution over the meta-parameters. At meta-test time, different meta-learners can then be drawn and trained on the few shots. This generative process allows for active learning and effective exploration based on uncertainty estimates.

Keypoints:

  • Definition of meta-learning: acquisition of a prior that allows the learner to adapt to a new task given only a few data points. This is inherently Bayesian: integrating prior information with few data points.

  • Here: the goal is an approach that scales to many meta-learner parameters while also giving good uncertainty estimates -> amortized variational inference with parameters shared between the model and the inference networks. MAML is extended to model a distribution over the prior model parameters via a simple stochastic adaptation procedure that injects noise into gradient descent at meta-test time.

  • Similar to the LLAMA method, which places a local Laplace approximation on the task parameters but requires a high-dimensional covariance matrix. This work, on the other hand, approximately infers the pre-update parameters, made tractable through the choice of approximate posterior.

  • Key assumption: the task distribution exhibits structural overlap between tasks!

  • Conventional MAML = approximate MAP inference under a simplified model in which the prior over the meta-parameters is a delta function (see the LLAMA derivation): maximize the log-likelihood of the test labels given the training data and the test features (see the schematic equations after this list).

  • Here the prior distribution is allowed to be non-deterministic -> structured variational inference:

    • Specific factorization of the joint variational distribution over task-specific parameters and meta-parameters, allowing parameters to be shared across tasks via amortized inference.
    • A variational lower bound on the log-likelihood is derived; it is easy to evaluate, since it reduces to the network loss plus Gaussian (closed-form KL) terms.
    • An important choice is the conditional variational distribution over the task-specific parameters given the meta-parameters.
  • A simplified version is also provided that more closely resembles LLAMA and admits a straightforward gradient-descent algorithm - the key ingredient is injecting noise (see the code sketch after this list).

  • Experiments:

    • Performance with sampled meta-parameters is comparable to MAML.
    • Multiple draws allow for approximate confidence bands / variance estimation, enabling active learning: error is reduced by acquiring new data points where uncertainty is high, which clearly outperforms MAML with randomly sampled points.
    • A multi-modal task distribution can be captured by the posterior!
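
For reference, a schematic of the two views above, in my own notation (θ = meta-parameters, φ_i = task-specific parameters after adaptation on the training set of task i); this is a sketch of the general form, not the paper's exact formulation. The MAML-as-MAP view with a delta-function prior reads

$$\max_\theta \sum_i \log p\big(\mathbf{y}_i^{\text{test}} \mid \mathbf{x}_i^{\text{test}}, \phi_i\big), \qquad \phi_i = \theta + \alpha \,\nabla_\theta \log p\big(\mathbf{y}_i^{\text{train}} \mid \mathbf{x}_i^{\text{train}}, \theta\big),$$

while the probabilistic extension instead maximizes a per-task variational lower bound of roughly the form

$$\log p\big(\mathbf{y}^{\text{test}} \mid \mathbf{x}^{\text{test}}, \mathcal{D}^{\text{train}}\big) \;\ge\; \mathbb{E}_{\theta \sim q_\psi}\Big[\log p\big(\mathbf{y}^{\text{test}} \mid \mathbf{x}^{\text{test}}, \phi(\theta, \mathcal{D}^{\text{train}})\big)\Big] - D_{\mathrm{KL}}\big(q_\psi(\theta \mid \mathcal{D}^{\text{train}}, \mathcal{D}^{\text{test}}) \,\big\|\, p(\theta \mid \mathcal{D}^{\text{train}})\big),$$

where, under Gaussian assumptions, the KL term is available in closed form and the expectation reduces to the usual network loss on the test labels.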

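Below is a minimal toy sketch (my own construction, not the paper's code or model) of the meta-test-time procedure and of how multiple draws yield variance estimates for active learning: the parameters of a tiny linear model are sampled from a hypothetical meta-learned Gaussian prior (`prior_mean`, `prior_std` are made-up values), each sample is adapted with a few gradient steps on the support set, and the spread of predictions across samples is used to pick the next point to label.

```python
# Toy sketch of the sample-adapt-aggregate pattern described in the notes above.
# Not the paper's code: the prior is hand-set, and the model is a 1D linear fit.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical meta-learned Gaussian prior over the parameters [w, b] of y = w*x + b
# (in the paper this would be a distribution over a neural network's weights).
prior_mean = np.array([0.5, 0.0])
prior_std = np.array([0.3, 0.3])

def predict(params, x):
    w, b = params
    return w * x + b

def adapt(params, x_support, y_support, lr=0.1, steps=5):
    """A few inner-loop gradient steps on the few-shot support set (MSE loss)."""
    for _ in range(steps):
        err = predict(params, x_support) - y_support   # residuals
        grad_w = 2 * np.mean(err * x_support)           # d MSE / d w
        grad_b = 2 * np.mean(err)                       # d MSE / d b
        params = params - lr * np.array([grad_w, grad_b])
    return params

# Few-shot support set for a new task.
x_support = np.array([-1.0, 0.5, 2.0])
y_support = 1.2 * x_support - 0.3 + 0.05 * rng.standard_normal(3)

# Draw several meta-learners from the prior (the "noise injection"), adapt each one.
x_query = np.linspace(-3, 3, 7)
preds = []
for _ in range(20):
    theta0 = prior_mean + prior_std * rng.standard_normal(2)  # sampled initialization
    theta = adapt(theta0, x_support, y_support)
    preds.append(predict(theta, x_query))
preds = np.stack(preds)

mean, var = preds.mean(axis=0), preds.var(axis=0)
# Active-learning heuristic: query the point where the sampled learners disagree most.
next_x = x_query[np.argmax(var)]
print("predictive mean:", np.round(mean, 2))
print("predictive var: ", np.round(var, 3))
print("next point to label:", next_x)
```

In the actual method, the noise is injected into the adaptation procedure itself and the prior's parameters are meta-learned; the sketch only illustrates how drawing multiple meta-learners produces variance estimates that can drive point selection.
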
Questions:

  • Again amazing Finn writing style - what do we want to answer in the experiments?
  • There is no numerical expression for the variance? Only from samples? No, that can't be right.
  • Can this be further framed as a Bayesian optimization problem?