===========================================================================
Notes on Active Learning, Bandits, Choice and Design of Experiments (ABCDE)
===========================================================================

Four ideas are often used for eliciting human responses with machine
learning predictors: active learning, bandits, experimental design and
social choice theory. At a high level they are similar in spirit, but
they have different foundations, which lead to different formulations.
Social choice theory comes from a different literature and studies how
individual preferences are aggregated.

Overview of ABCDE
=================

Active Learning
---------------

Active learning considers the setting where the agent interacts with
its environment to procure a training set, rather than passively
receiving i.i.d. samples from some underlying distribution.

It is often assumed that the environment is infinite (e.g. $R^d$) and
that the agent has to choose a location $x$ to query; the oracle then
returns the label $y$. It is often further assumed that the labels are
noiseless, and hence there is no benefit in querying the same point $x$
again. In many practical applications the environment is taken to be
finite (but large); this is called pool-based active learning.

The active learning algorithm is typically compared to its passive
counterpart, which receives a randomly drawn training set of the same
size.

Bandits
-------

A bandit problem is a sequential allocation problem defined by a set
of actions. The agent chooses an action at each time step, and the
environment returns a reward. The aim of the agent is to maximise its
cumulative reward.

In basic settings the set of actions is finite. There are three
fundamental formalisations of the bandit problem, depending on the
assumed nature of the reward process: stochastic, adversarial and
Markovian. In all three settings the reward is uncertain, and hence
the agent may have to play a particular action repeatedly.

The agent is compared to a static agent that always plays the best
action in hindsight; the difference in cumulative reward is called
the regret.

Experimental Design
-------------------

In contrast to active learning, experimental design considers the
problem of regression, i.e. where the label $y \in R$ is a real number.

The problem to be solved in experimental design is to choose a set of
trials (say of size $N$) that gathers enough information about the
object of interest. The goal is to maximise the information obtained
about the parameters of the model of that object.

It is often assumed that the observations at the $N$ trials are
independent. When $N$ is finite this is called exact design; otherwise
it is called approximate or continuous design. The environment is
assumed to be infinite (e.g. $R^d$) and the observations are scalar
real variables.
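
For the linear model $y = x^\top \theta + \epsilon$, one common
exact-design heuristic is to greedily pick the candidate trial that
most increases the determinant of the information matrix (the
D-criterion). A minimal sketch, assuming a finite candidate set; the
function name and the ridge initialisation are our own choices:

```python
import numpy as np

def greedy_d_optimal(candidates, n_trials, ridge=1e-6):
    """Greedy exact design for linear regression: sequentially pick the
    candidate point that most increases det(X^T X), the D-criterion."""
    d = candidates.shape[1]
    M = ridge * np.eye(d)              # regularised information matrix
    chosen = []
    for _ in range(n_trials):
        # det(M + x x^T) = det(M) * (1 + x^T M^{-1} x),
        # so it suffices to maximise x^T M^{-1} x over candidates.
        Minv = np.linalg.inv(M)
        gains = np.einsum('ij,jk,ik->i', candidates, Minv, candidates)
        i = int(np.argmax(gains))
        M = M + np.outer(candidates[i], candidates[i])
        chosen.append(i)
    return chosen, M
```

On the interval $[-1, 1]$ with features $(1, x)$, this heuristic
places all trials at the two endpoints, matching the classical
D-optimal design for a straight-line fit.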

==============
Unsorted notes
==============

* Thompson sampling
* Upper Confidence Bound
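
For {0,1} rewards, Thompson sampling keeps a Beta posterior over each
arm's mean, samples from every posterior, and plays the argmax. A
minimal sketch (the ``pull`` callable standing in for the environment
is an assumption):

```python
import random

def thompson_bernoulli(pull, n_arms, horizon, rng=random):
    """Thompson sampling with Beta(1, 1) priors for {0,1} rewards:
    sample a mean for each arm from its posterior, play the argmax."""
    wins = [1] * n_arms    # Beta alpha parameter (successes + 1)
    losses = [1] * n_arms  # Beta beta parameter (failures + 1)
    counts = [0] * n_arms
    for _ in range(horizon):
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(n_arms)]
        a = max(range(n_arms), key=lambda i: samples[i])
        r = pull(a)
        wins[a] += r
        losses[a] += 1 - r
        counts[a] += 1
    return counts
```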

Notes on UCB for binary rewards
-------------------------------

In the special case when the rewards of the arms are {0,1}, a much
tighter analysis is possible. See `pymaBandits
<http://mloss.org/software/view/415/>`_. This is also implemented in
this repository under ``python/digbeta``.
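
The tighter analysis comes from replacing Hoeffding's inequality with
the Bernoulli KL divergence in the confidence bound, as in the KL-UCB
index of Garivier and Cappe. A sketch of the index computation by
bisection (using a plain $\log t$ exploration budget, a simplification
of the full $\log t + c \log \log t$ term):

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t, iters=50):
    """Largest q >= p_hat with n * kl(p_hat, q) <= log(t), by bisection.

    p_hat: empirical mean of the arm; n: number of pulls; t: time step."""
    budget = math.log(max(t, 2)) / n
    lo, hi = p_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

As the pull count $n$ grows, the index shrinks toward the empirical
mean, which is what drives the tighter regret bound.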

Notes on UCB for graphs
-----------------------

*Spectral Bandits for Smooth Graph Functions
Michal Valko, Remi Munos, Branislav Kveton, Tomas Kocak
ICML 2014*

This paper studies a bandit problem in which the arms are the nodes of
a graph and the expected payoff of pulling an arm is a smooth function
on this graph.

It assumes that the graph is known and that its edges represent the
similarities of the nodes. At time $t$, the agent chooses a node and
observes its payoff, then updates its model based on the payoff.

It further assumes that the number of nodes $N$ is large, and is
interested in the regime $t < N$.
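
Smoothness here is measured by the Laplacian quadratic form
$f^\top L f$: a payoff vector that changes little across edges has a
small penalty. A minimal sketch of that quantity (the helper names are
our own, not from the paper):

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for an adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def smoothness(adj, f):
    """Quadratic form f^T L f: sums (f_u - f_v)^2 over edges (u, v),
    so small values mean the payoff varies little between neighbours."""
    return float(f @ laplacian(adj) @ f)
```

On a path graph 0-1-2, a constant payoff vector has smoothness 0,
while the vector (0, 1, 2) has smoothness 2 (one unit squared per edge).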
Related Literature
==================

This is an unsorted list of references.

* Prediction, Learning, and Games
  Nicolo Cesa-Bianchi, Gabor Lugosi
  Cambridge University Press, 2006

* Active Learning Literature Survey
  Burr Settles
  Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2010

* Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
  Sebastien Bubeck, Nicolo Cesa-Bianchi
  Foundations and Trends in Machine Learning, Vol. 5, No. 1, 2012, pp. 1-122

* Spectral Bandits for Smooth Graph Functions
  Michal Valko, Remi Munos, Branislav Kveton, Tomas Kocak
  ICML 2014

* Spectral Thompson Sampling
  Tomas Kocak, Michal Valko, Remi Munos, Shipra Agrawal
  AAAI 2014

* An Analysis of Active Learning Strategies for Sequence Labeling Tasks
  Burr Settles, Mark Craven
  EMNLP 2008

* Margin-based Active Learning for Structured Predictions
  Kevin Small, Dan Roth
  International Journal of Machine Learning and Cybernetics, 2010, 1:3-25

* Thompson Sampling: An Asymptotically Optimal Finite Time Analysis
  Emilie Kaufmann, Nathaniel Korda, Remi Munos
  ALT 2012

* Thompson Sampling for 1-Dimensional Exponential Family Bandits
  Nathaniel Korda, Emilie Kaufmann, Remi Munos
  NIPS 2013

* On Bayesian Upper Confidence Bounds for Bandit Problems
  Emilie Kaufmann, Olivier Cappe, Aurelien Garivier
  AISTATS 2012

* Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens
  Ravi Ganti, Alexander G. Gray
  UAI 2013

* From Theories to Queries: Active Learning in Practice
  Burr Settles
  JMLR W&CP, NIPS 2011 Workshop on Active Learning and Experimental Design

* Contextual Gaussian Process Bandit Optimization
  Andreas Krause, Cheng Soon Ong
  NIPS 2011

* Contextual Bandit for Active Learning: Active Thompson Sampling
  Djallel Bouneffouf, Romain Laroche, Tanguy Urvoy, Raphael Feraud, Robin Allesiardo
  NIPS 2014

* Towards Anytime Active Learning: Interrupting Experts to Reduce Annotation Costs
  Maria Ramirez-Loaiza, Aron Culotta, Mustafa Bilgic
  SIGKDD 2013

* Actively Learning Ontology Matching via User Interaction
  Feng Shi, Juanzi Li, Jie Tang, Guotong Xie, Hanyu Li
  ISWC 2009

* A Novel Method for Measuring Semantic Similarity for XML Schema Matching
  Buhwan Jeong, Daewon Lee, Hyunbo Cho, Jaewook Lee
  Expert Systems with Applications, 2008

* Tamr Product White Paper
  http://www.tamr.com/tamr-technical-overview/

* Design of Experiments in Nonlinear Models
  Luc Pronzato, Andrej Pazman
  Springer, 2013

* Optimisation in space of measures and optimal design
  Ilya Molchanov, Sergei Zuyev
  ESAIM: Probability and Statistics, Vol. 8, pp. 12-24, 2004

* Active Learning for Logistic Regression: An Evaluation
  Andrew I. Schein, Lyle H. Ungar
  Machine Learning, 2007, 68:235-265

* Learning to Optimize Via Information-Directed Sampling
  Daniel Russo, Benjamin Van Roy

* The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond
  Aurelien Garivier, Olivier Cappe
  COLT 2011

* A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
  Odalric-Ambrym Maillard, Remi Munos, Gilles Stoltz
  COLT 2011

* Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation
  Olivier Cappe, Aurelien Garivier, Odalric-Ambrym Maillard, Remi Munos, Gilles Stoltz
  Annals of Statistics, 2013

* Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
  Xiaojin Zhu, Zoubin Ghahramani, John Lafferty
  ICML 2003

* Efficient and Parsimonious Agnostic Active Learning
  Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
  NIPS 2015

* NEXT: A System for Real-World Development, Evaluation, and Application of Active Learning
  Kevin Jamieson, Lalit Jain, Chris Fernandez, Nick Glattard, Robert Nowak
  NIPS 2015

* Online Choice of Active Learning Algorithms
  Yoram Baram, Ran El-Yaniv, Kobi Luz
  Journal of Machine Learning Research, 5:255-291, 2004

* Active Learning by Learning
  Wei-Ning Hsu, Hsuan-Tien Lin
  AAAI 2015