===========================================================================
Notes on Active Learning, Bandits, Choice and Design of Experiments (ABCDE)
===========================================================================

There are four ideas which are often used for eliciting human
responses using machine learning predictors. At a high level they are
similar in spirit, but they have different foundations, which lead to
different formulations. Three of them are active learning, bandits and
experimental design. The fourth, social choice theory, has its
literature in a different field and looks at how individual
preferences are aggregated.

Overview of ABCDE
=================

Active Learning
---------------

Active learning considers the setting where the agent interacts with
its environment to procure a training set, rather than passively
receiving i.i.d. samples from some underlying distribution.

It is often assumed that the environment is infinite (e.g. $R^d$) and
the agent has to choose a location, $x$, to query. The oracle then returns
the label $y$. It is often assumed that there is no noise in the label,
and hence there is no benefit in querying the same point $x$ again. In
many practical applications, the environment is considered to be
finite (but large); this is called pool-based active learning.

An active learning algorithm is often compared to a passive learning
algorithm, which receives labels for randomly sampled points.
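
A minimal sketch of pool-based active learning via uncertainty
sampling (the pool, model and query budget here are illustrative
assumptions, not part of these notes):

.. code-block:: python

   import numpy as np
   from sklearn.linear_model import LogisticRegression

   def uncertainty_sampling(X_pool, y_oracle, n_seed=10, budget=50):
       """Repeatedly query the pool point whose predicted class
       probability is closest to 0.5 (the most uncertain point)."""
       rng = np.random.default_rng(0)
       labelled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
       model = LogisticRegression()
       for _ in range(budget):
           model.fit(X_pool[labelled], y_oracle[labelled])
           proba = model.predict_proba(X_pool)[:, 1]
           scores = np.abs(proba - 0.5)  # small = uncertain
           scores[labelled] = np.inf     # never re-query a labelled point
           labelled.append(int(np.argmin(scores)))
       return model, labelled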

Bandits
-------

A bandit problem is a sequential allocation problem defined by a set
of actions. The agent chooses an action at each time step, and the
environment returns a reward. The aim of the agent is to maximise reward.

In basic settings, the set of actions is considered to be
finite. There are three fundamental formalisations of the bandit
problem, depending on the assumed nature of the reward process:
stochastic, adversarial and Markovian. In all three settings the
reward is uncertain, and hence the agent may have to play a particular
action repeatedly.

The agent is compared to a static agent which always plays the best
action. The difference in reward is called the regret.
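
For the stochastic setting, the expected regret after $T$ rounds can
be written (the notation here is a standard convention, not from these
notes) as

.. math::

   R_T = T\mu^* - \mathbb{E}\left[ \sum_{t=1}^{T} \mu_{a_t} \right],

where $\mu^*$ is the mean reward of the best action and $\mu_{a_t}$ is
the mean reward of the action $a_t$ chosen at time $t$.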

Experimental Design
-------------------

In contrast to active learning, experimental design considers the problem of regression,
i.e. where the label $y\in R$ is a real number.

The problem to be solved in experimental design is to choose a set of
trials (say of size $N$) to gather enough information about the object
of interest. The goal is to maximise the information obtained about
the parameters of the model (of the object).

It is often assumed that the observations at the $N$ trials are
independent. When $N$ is finite this is called exact design, otherwise
it is called approximate or continuous design. The environment is
assumed to be infinite (e.g. $R^d$) and the observations are scalar real variables.
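
A minimal sketch of a greedy exact design for a linear model: choose
$N$ trials from a finite candidate set so as to maximise the
log-determinant of the information matrix $X^T X$ (the D-optimality
criterion). The candidate set and the small ridge term are assumptions
for illustration.

.. code-block:: python

   import numpy as np

   def greedy_d_optimal(candidates, n_trials, ridge=1e-6):
       """Greedily pick rows of `candidates` (one row per possible
       trial) to maximise log det(X^T X), the D-optimality criterion
       for a linear regression model."""
       d = candidates.shape[1]
       info = ridge * np.eye(d)  # small ridge keeps the matrix invertible
       chosen = []
       for _ in range(n_trials):
           # Pick the candidate x that most increases log det(info + x x^T).
           gains = [np.linalg.slogdet(info + np.outer(x, x))[1]
                    for x in candidates]
           best = int(np.argmax(gains))
           info += np.outer(candidates[best], candidates[best])
           chosen.append(best)
       return chosen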


==============
Unsorted notes
==============

* Thompson sampling (see the sketch below)
* Upper Confidence Bound
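
A minimal sketch of Thompson sampling for Bernoulli rewards with
Beta(1, 1) priors (the environment callback ``pull`` is an assumption
for illustration):

.. code-block:: python

   import numpy as np

   def thompson_bernoulli(pull, n_arms, horizon, seed=0):
       """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

       `pull(a)` is an assumed environment callback returning 0 or 1.
       """
       rng = np.random.default_rng(seed)
       wins = np.ones(n_arms)    # Beta posterior: successes + 1
       losses = np.ones(n_arms)  # Beta posterior: failures + 1
       for _ in range(horizon):
           # Sample a plausible mean for each arm from its posterior,
           # then play the arm with the largest sample.
           theta = rng.beta(wins, losses)
           a = int(np.argmax(theta))
           r = pull(a)
           wins[a] += r
           losses[a] += 1 - r
       return wins, losses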

Notes on UCB for binary rewards
-------------------------------

In the special case when the rewards of the arms are in $\{0,1\}$, we can get a much tighter analysis. See `pymaBandits <http://mloss.org/software/view/415/>`_. This is also implemented in this repository under ``python/digbeta``.
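
One such index policy is KL-UCB (see the Garivier and Cappe reference
below). A minimal sketch of its Bernoulli index (the bisection
tolerance is an assumption): the index of an arm with empirical mean
$\hat{p}$ after $n$ pulls at time $t$ is the largest $q$ with
$n \, kl(\hat{p}, q) \le \log t$.

.. code-block:: python

   import math

   def kl_bernoulli(p, q, eps=1e-12):
       """KL divergence between Bernoulli(p) and Bernoulli(q)."""
       p = min(max(p, eps), 1 - eps)
       q = min(max(q, eps), 1 - eps)
       return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

   def kl_ucb_index(p_hat, n_pulls, t, tol=1e-6):
       """Largest q in [p_hat, 1] with n_pulls * kl(p_hat, q) <= log(t).

       Found by bisection: kl(p_hat, q) is increasing in q on [p_hat, 1].
       """
       target = math.log(t) / n_pulls
       lo, hi = p_hat, 1.0
       while hi - lo > tol:
           mid = (lo + hi) / 2
           if kl_bernoulli(p_hat, mid) <= target:
               lo = mid
           else:
               hi = mid
       return lo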


Notes on UCB for graphs
-----------------------

*Spectral Bandits for Smooth Graph Functions
Michal Valko, Remi Munos, Branislav Kveton, Tomas Kocak
ICML 2014*

The paper studies a bandit problem where the arms are the nodes of a graph and the expected payoff of pulling an arm is a smooth function on this graph.

It assumes that the graph is known and that its edges represent the similarities of the nodes. At time $t$, the agent chooses a node and observes its payoff, then updates its model based on the payoff.

It assumes that the number of nodes $N$ is large, and is interested in the regime $t < N$.
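
One illustrative reading of the construction (an assumption, not the
paper's exact algorithm): take the $k$ Laplacian eigenvectors with the
smallest eigenvalues as node features, then run a linear bandit such
as LinUCB on those features.

.. code-block:: python

   import numpy as np

   def spectral_features(adjacency, k):
       """Node features from the graph Laplacian.

       A payoff function that is smooth on the graph has most of its
       energy on the eigenvectors with the smallest eigenvalues, so
       those eigenvectors make natural features for a linear bandit.
       """
       degrees = adjacency.sum(axis=1)
       laplacian = np.diag(degrees) - adjacency
       _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues ascending
       return eigvecs[:, :k]                   # row i = features of node i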




Related Literature
==================

This is an unsorted list of references.

* Prediction, Learning, and Games,
Nicolo Cesa-Bianchi, Gabor Lugosi
Cambridge University Press, 2006

* Active Learning Literature Survey
Burr Settles
Computer Sciences Technical Report 1648
University of Wisconsin–Madison, 2010

* Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Sebastien Bubeck, Nicolo Cesa-Bianchi
Foundations and Trends in Machine Learning, Vol 5, No 1, 2012, pp. 1-122

* Spectral Bandits for Smooth Graph Functions
Michal Valko, Remi Munos, Branislav Kveton, Tomas Kocak
ICML 2014

* Spectral Thompson Sampling
Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal
AAAI 2014

* An Analysis of Active Learning Strategies for Sequence Labeling Tasks
Burr Settles, Mark Craven
EMNLP 2008

* Margin-based active learning for structured predictions
Kevin Small, Dan Roth
International Journal of Machine Learning and Cybernetics, 2010, 1:3-25

* Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
Emilie Kaufmann, Nathaniel Korda, Remi Munos
ALT 2012

* Thompson Sampling for 1-Dimensional Exponential Family Bandits
Nathaniel Korda, Emilie Kaufmann, Remi Munos
NIPS 2013

* On Bayesian Upper Confidence Bounds for Bandit Problems
Emilie Kaufmann, Olivier Cappe, Aurelien Garivier
AISTATS 2012

* Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens
Ravi Ganti, Alexander G. Gray
UAI 2013

* From Theories to Queries: Active Learning in Practice
Burr Settles
JMLR W&CP, NIPS 2011 Workshop on Active Learning and Experimental Design

* Contextual Gaussian Process Bandit Optimization
Andreas Krause, Cheng Soon Ong
NIPS 2011

* Contextual Bandit for Active Learning: Active Thompson Sampling
Djallel Bouneffouf, Romain Laroche, Tanguy Urvoy, Raphael Feraud, Robin Allesiardo
NIPS 2014

* Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs
Maria Ramirez-Loaiza, Aron Culotta, Mustafa Bilgic
SIGKDD 2013

* Actively Learning Ontology Matching via User Interaction
Feng Shi, Juanzi Li, Jie Tang, Guotong Xie, Hanyu Li
ISWC 2009

* A Novel Method for Measuring Semantic Similarity for XML Schema Matching
Buhwan Jeong, Daewon Lee, Hyunbo Cho, Jaewook Lee
Expert Systems with Applications 2008

* Tamr Product White Paper
http://www.tamr.com/tamr-technical-overview/

* Design of Experiments in Nonlinear Models
Luc Pronzato, Andrej Pazman
Springer 2013

* Optimisation in space of measures and optimal design
Ilya Molchanov and Sergei Zuyev
ESAIM: Probability and Statistics, Vol. 8, pp. 12-24, 2004

* Active Learning for logistic regression: an evaluation
Andrew I. Schein and Lyle H. Ungar
Machine Learning, 2007, 68: 235-265

* Learning to Optimize Via Information-Directed Sampling
Daniel Russo and Benjamin Van Roy

* The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
Aurelien Garivier and Olivier Cappe
COLT 2011

* A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
Odalric-Ambrym Maillard, Remi Munos, Gilles Stoltz
COLT 2011

* Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation
Olivier Cappe, Aurelien Garivier, Odalric-Ambrym Maillard, Remi Munos, Gilles Stoltz
Annals of Statistics, 2013

* Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
Xiaojin Zhu, Zoubin Ghahramani, John Lafferty
ICML 2003

* Efficient and Parsimonious Agnostic Active Learning
Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
NIPS 2015

* NEXT: A System for Real-World Development, Evaluation, and Application of Active Learning
Kevin Jamieson, Lalit Jain, Chris Fernandez, Nick Glattard, Robert Nowak
NIPS 2015

* Online Choice of Active Learning Algorithms
Yoram Baram, Ran El-Yaniv, Kobi Luz
Journal of Machine Learning Research, 5:255–291, 2004

* Active Learning by Learning
Wei-Ning Hsu, Hsuan-Tien Lin
AAAI 2015
