Skip to content
Switch branches/tags

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

Machine Teaching of Active Sequential Learners

This repository contains code and data implementing the methods and experiments described in

Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski
Machine Teaching of Active Sequential Learners, NeurIPS 2019
(also available on arXiv:

There's also an accompanying website:

Machine teaching addresses the problem of finding the best training data that can guide a learning algorithm to a target model with minimal effort. In conventional settings, a teacher provides data that are consistent with the true data distribution. However, for sequential learners which actively choose their queries, such as multi-armed bandits and active learners, the teacher can only provide responses to the learner's queries, not design the full data. In this setting, consistent teachers can be sub-optimal for finite horizons. We formulate this sequential teaching problem, which current techniques in machine teaching do not address, as a Markov decision process, with the dynamics nesting a model of the learner and the actions being the teacher's responses. Furthermore, we address the complementary problem of learning from a teacher that plans: to recognise the teaching intent of the responses, the learner is endowed with a model of the teacher. We test the formulation with multi-armed bandit learners in simulated experiments and a user study. The results show that learning is improved by (i) planning teaching and (ii) the learner having a model of the teacher. The approach gives tools to taking into account strategic (planning) behaviour of users of interactive intelligent systems, such as recommendation engines, by considering them as boundedly optimal teachers.



  • Python 3.7
  • pytorch
  • pyro-ppl
  • numpy
  • scipy
  • scikit-learn
  • matplotlib
  • lifelines

You can create a conda environment satisfying the requirements with conda env create -f environment.yml.

Running multi-armed bandit simulation experiments

Go to the multi-armed_bandits subfolder.

Example for running simulation experiments:
python wine 100 50 30 test

This runs experiments on the wine dataset (data loaded from simulation_studies/X_wine.npy), with 100 arms, 50 replications with horizon of 30. test gives an arbitrary id for the run used as part of the filename for the results (saved in results directory). Running the experiment can take some time (a few hours with the given settings on a relatively modern workstation).

To generate result figures from the saved experimental results, run
python experiment_wine_100_30_test

Here, experiment_wine_100_30_test is the file name of the results file (without the results directory and file extension). The figures are generated in the results directory.

Multi-step (teacher's planning horizons of 1 to 4) experiments can be ran similarly using and

Note: the results in the paper were ran on cluster computers with slightly different scripts (and different versions of dependencies), so results might not be reproduced exactly with the above even if all the settings are set to the same.

Running active learning example and experiment

Go to the active_learning subdirectory.

Run python to run the example.

Run python to run the experiment.


The code in multi-armed_bandits subdirectory is organized as follows:

  • Defines experimental settings and run replicate simulation experiments.
  • Implements the experiment loop for one multi-armed bandit experiment with the given settings (called from
  •, User models implement the teaching behaviour.
  •, AI models prepare data for user model's interpretation.
  •,, Implement the Pyro models and computation for posterior approximations.
  • Implements the bandit arm selection strategies.
  • Implements the estimation of Thompson sampling probabilities.
  • Wraps concordance computation code.
  • and Implements the scripts to run a user study. Result files of the user study in the paper are in results_user_study directory.


Pre-processed datasets are available in the multi-armed_bandits/simulation_studies directory (Wine data also in active_learning/data):

  • X_word.npy is the Word dataset [1].
  • X_wine.npy is the Wine dataset [2].
  • X_leaf.npy is the Leaf dataset [3].

Data for the user study is in multi-armed_bandits/word_search_study directory.

[1] Distributed representations of words and phrases and their compositionality, Mikolov, Tomas and Sutskever, Ilya and Chen, Kai and Corrado, Greg S and Dean, Jeff, Advances in Neural Information Processing Systems, NIPS, pages 3111--3119, 2013.

[2] Modeling wine preferences by data mining from physicochemical properties, Cortez, Paulo and Cerdeira, António and Almeida, Fernando and Matos, Telmo and Reis, José Decision Support Systems, vol. 47, num. 4, pages=547--553, 2009. Dataset available at UCI ML Repository.

[3] Evaluation of features for leaf discrimination, Silva, Pedro FB and Marcal, Andre RS and da Silva, Rubim M Almeida, International Conference on Image Analysis and Recognition, ICIAR, pages 197--204, 2013. Dataset available at UCI ML Repository.


Work done in the Probabilistic Machine Learning research group at Aalto University.





Machine Teaching of Active Sequential Learners







No releases published


No packages published