Merge pull request #24 from decile-team/main
Merge main into doc_plots
nab170130 committed May 6, 2021
2 parents ba274b2 + 77813ae commit 3d8767b
Showing 48 changed files with 1,596 additions and 419 deletions.
16 changes: 11 additions & 5 deletions README.md
@@ -40,8 +40,9 @@
- [Evaluation of Active Learning Strategies](#evaluation-of-active-learning-strategies)
- [Testing Individual Strategies and Running Examples](#testing-individual-strategies-and-running-examples)
- [Mailing List](#mailing-list)
- [Publications](#publications)
- [Acknowledgement](#acknowledgement)
- [Team](#team)
- [Publications](#publications)

## What is DISTIL?
<p align="center">
@@ -126,9 +127,11 @@ DISTIL makes it extremely easy to integrate your custom models with active learn
* Check the models included in DISTIL for examples!

* Data Handler
* Your DataHandler class should have a boolean attribute “select:
* Your DataHandler class should have a boolean attribute “select” with a default value of True:
* If True: Your __getitem__(self, index) method should return (input, index)
* If False: Your __getitem__(self, index) method should return (input, label, index)
* Your DataHandler class should have a boolean attribute “use_test_transform” with a default value of False.

* Check the DataHandler classes included in DISTIL for examples, or see the minimal sketch below!
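A minimal sketch of such a DataHandler (illustrative only; the class name, constructor arguments, and tensor conversion are assumptions, while the `select`/`use_test_transform` attributes and the `__getitem__` return signatures follow the requirements above):

```python
import torch
from torch.utils.data import Dataset

class CustomDataHandler(Dataset):
    """Hypothetical handler; only the attribute names and return signatures
    follow the requirements listed above."""

    def __init__(self, X, Y=None, select=True, use_test_transform=False):
        self.select = select                          # True while selecting from the unlabeled pool
        self.use_test_transform = use_test_transform  # whether to apply test-time transforms
        self.X = X
        self.Y = Y

    def __getitem__(self, index):
        x = torch.as_tensor(self.X[index], dtype=torch.float32)
        if self.select:
            return x, index                    # (input, index)
        return x, self.Y[index], index         # (input, label, index)

    def __len__(self):
        return len(self.X)
```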

To get a clearer idea about how to incorporate DISTIL with your own models, refer to [Getting Started With DISTIL & Active Learning Blog](https://decile-research.medium.com/getting-started-with-distil-active-learning-ba7fafdbe6f3)
@@ -142,7 +145,7 @@ To get a clearer idea about how to incorporate DISTIL with your own models, refe

## Active Learning Benchmarks using DISTIL
#### Experimentation Method
The models used below were first trained on n randomly selected points, where n is the budget of the experiment. For each set of new points added, the model was trained from scratch until the training accuracy crossed the max accuracy threshold. The test accuracy was then reported before the next selection round.
The models used below were first trained on an initial random set of points (equal to the budget). For each set of new points added, the model was trained from scratch until the training accuracy crossed the max accuracy threshold. The test accuracy was then reported before the next selection round. The results below are *preliminary*, each obtained from a single run. We are carrying out a more thorough benchmarking study with multiple runs and reported standard deviations, and we will link to a preprint containing those results.
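For concreteness, here is a hedged sketch of this evaluation loop; the helpers `train_until_threshold` and `evaluate` are hypothetical, and the `select`/`update_data` calls assume the strategy interface used elsewhere in this repository:

```python
import numpy as np

def active_learning_benchmark(strategy, X_lab, Y_lab, X_unlab, Y_unlab,
                              X_test, Y_test, budget, num_rounds):
    """Sketch of the protocol above: retrain from scratch each round, record the
    test accuracy, then query `budget` new points and move them to the labeled pool."""
    test_accuracies = []
    for _ in range(num_rounds):
        model = train_until_threshold(X_lab, Y_lab)               # hypothetical helper
        test_accuracies.append(evaluate(model, X_test, Y_test))   # hypothetical helper

        idxs = strategy.select(budget)                            # indices into X_unlab
        X_lab = np.concatenate([X_lab, X_unlab[idxs]])
        Y_lab = np.concatenate([Y_lab, Y_unlab[idxs]])
        X_unlab = np.delete(X_unlab, idxs, axis=0)
        Y_unlab = np.delete(Y_unlab, idxs, axis=0)
        strategy.update_data(X_lab, Y_lab, X_unlab)               # refresh the strategy's pools
    return test_accuracies
```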

#### CIFAR10
Model: Resnet18
@@ -216,6 +219,11 @@ To receive updates about DISTIL and to be a part of the community, join the Deci
```
https://groups.google.com/forum/#!forum/Decile_DISTIL_Dev/join
```
## Acknowledgement
This library takes inspiration from, builds upon, and uses pieces of code from several open source codebases. These include [Kuan-Hao Huang's deep active learning repository](https://github.com/ej0cl6/deep-active-learning), [Jordan Ash's Badge repository](https://github.com/JordanAsh/badge), and [Andreas Kirsch's and Joost van Amersfoort's BatchBALD repository](https://github.com/BlackHC/batchbald_redux). Also, DISTIL uses [Apricot](https://github.com/jmschrei/apricot) for submodular optimization.

## Team
DISTIL is created and maintained by Nathan Beck, [Durga Sivasubramanian](https://www.linkedin.com/in/durga-s-352831105), [Apurva Dani](https://apurvadani.github.io/index.html), [Rishabh Iyer](https://www.rishiyer.com), and [Ganesh Ramakrishnan](https://www.cse.iitb.ac.in/~ganesh/). We look forward to making DISTIL more community driven. Please use it and contribute to it for your active learning research, and feel free to use it for your commercial projects. We will add the major contributors here.

## Publications

@@ -239,5 +247,3 @@

[10] Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian active learning with image data." International Conference on Machine Learning. PMLR, 2017.

## Acknowledgement
This library takes inspiration and also uses pieces of code from [Kuan-Hao Huang's deep active learning repository](https://github.com/ej0cl6/deep-active-learning), [Jordan Ash's Badge repository](https://github.com/JordanAsh/badge), and [Andreas Kirsch's and Joost van Amersfoort's BatchBALD repository](https://github.com/BlackHC/batchbald_redux). Also, DISTIL uses [Apricot](https://github.com/jmschrei/apricot) for submodular optimization.
88 changes: 45 additions & 43 deletions distil/active_learning_strategies/adversarial_bim.py
@@ -4,52 +4,54 @@
from .strategy import Strategy

class AdversarialBIM(Strategy):
def __init__(self, X, Y, unlabeled_x, net, handler, nclasses, args={}):
"""
"""
Implements the Adversarial BIM Strategy, which is motivated by the fact that the distance
computation from the decision boundary is often difficult and intractable for margin-based methods. This
technique avoids estimating the distance by using BIM (Basic Iterative Method)
:footcite:`tramer2017ensemble` to estimate how much adversarial perturbation is required to
cross the boundary. The smaller the required perturbation, the closer the point is to the boundary.
**Basic Iterative Method (BIM)**: Given a base input, the approach is to perturb each
feature in the direction of the gradient by magnitude :math:`\\epsilon`, where :math:`\\epsilon` is a
parameter that determines the perturbation size. For a model with loss
:math:`J(\\theta, x, y)`, where :math:`\\theta` represents the model parameters,
x is the model input, and y is the label of x, the adversarial sample is generated
iteratively as,
.. math::
\\begin{eqnarray}
x^*_0 & = &x,
x^*_i & = & clip_{x,e} (x^*_{i-1} + sign(\\nabla_{x^*_{i-1}} J(\\theta, x^*_{i-1} , y)))
\\end{eqnarray}
Parameters
----------
X: numpy array
Present training/labeled data
y: numpy array
Labels of present training data
unlabeled_x: numpy array
Data without labels
net: class
Pytorch Model class
handler: class
Data Handler, which can load data even without labels.
nclasses: int
Number of unique target variables
args: dict
Specify optional parameters
Implements the Adversarial BIM Strategy, which is motivated by the fact that the distance
computation from the decision boundary is often difficult and intractable for margin-based
methods. This technique avoids estimating the distance by using BIM (Basic Iterative Method)
:footcite:`tramer2017ensemble` to estimate how much adversarial perturbation is required
to cross the boundary. The smaller the required perturbation, the closer the point is to the
boundary.
**Basic Iterative Method (BIM)**: Given a base input, the approach is to perturb each
feature in the direction of the gradient by magnitude :math:`\\epsilon`, where :math:`\\epsilon` is a
parameter that determines the perturbation size. For a model with loss
:math:`J(\\theta, x, y)`, where :math:`\\theta` represents the model parameters,
x is the model input, and y is the label of x, the adversarial sample is generated
iteratively as,
.. math::
    x^*_0 = x,
    x^*_i = clip_{x,\\epsilon} (x^*_{i-1} + sign(\\nabla_{x^*_{i-1}} J(\\theta, x^*_{i-1}, y)))
Parameters
----------
X: numpy array
Present training/labeled data
y: numpy array
Labels of present training data
unlabeled_x: numpy array
Data without labels
net: class
Pytorch Model class
handler: class
Data Handler, which can load data even without labels.
nclasses: int
Number of unique target variables
args: dict
Specify optional parameters
`batch_size` - Batch size to be used inside strategy class (int, optional)
`eps` - Epsilon value for gradients (optional)
"""

def __init__(self, X, Y, unlabeled_x, net, handler, nclasses, args={}):
"""
Constructor method
"""

if 'eps' in args:
self.eps = args['eps']
else:
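As an illustration of the BIM-based distance estimate described in the docstring above, here is a sketch under the assumption of a PyTorch classifier; it is not the file's actual `cal_dis` implementation, and the default `eps` and `max_iter` values are arbitrary:

```python
import torch
import torch.nn.functional as F

def bim_distance(model, x, eps=0.05, max_iter=100):
    """Illustrative: perturb x with the sign of the loss gradient (BIM) until the
    predicted label flips; the accumulated perturbation norm serves as a proxy for
    the distance to the decision boundary. `eps` and `max_iter` are assumptions."""
    nx = x.unsqueeze(0).clone().requires_grad_(True)   # add a batch dimension
    eta = torch.zeros_like(nx)                         # accumulated perturbation

    out = model(nx + eta)
    initial_pred = out.argmax(dim=1)

    for _ in range(max_iter):
        if out.argmax(dim=1).item() != initial_pred.item():   # boundary crossed
            break
        loss = F.cross_entropy(out, initial_pred)
        model.zero_grad()
        if nx.grad is not None:
            nx.grad.zero_()
        loss.backward()
        eta = eta + eps * torch.sign(nx.grad)          # BIM step
        out = model(nx + eta)

    return (eta * eta).sum().item()                    # squared perturbation norm
```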
3 changes: 2 additions & 1 deletion distil/active_learning_strategies/adversarial_deepfool.py
@@ -8,7 +8,7 @@ class AdversarialDeepFool(Strategy):
Implements the Adversarial DeepFool Strategy :footcite:`ducoffe2018adversarial`, a DeepFool-based
Active Learning strategy that selects unlabeled samples with the smallest adversarial
perturbation. This technique is motivated by the fact that often the distance computation
from decision boundary is often difficult and intractable for margin-based methods. This
from decision boundary is difficult and intractable for margin-based methods. This
technique avoids estimating distance by using Deep-Fool :footcite:`Moosavi-Dezfooli_2016_CVPR`
like techniques to estimate how much adversarial perturbation is required to cross the boundary.
The smaller the required perturbation, the closer the point is to the boundary.
@@ -47,6 +47,7 @@ def __init__(self, X, Y, unlabeled_x, net, handler, nclasses, args={}):
super(AdversarialDeepFool, self).__init__(X, Y, unlabeled_x, net, handler, nclasses, args={})

def cal_dis(self, x):

nx = Variable(torch.unsqueeze(x, 0), requires_grad=True)
eta = Variable(torch.zeros(nx.shape))

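To illustrate how such perturbation distances turn into a selection rule, here is a sketch; the batching is simplified and only the `cal_dis` name comes from the code above:

```python
import torch

def select_closest_to_boundary(strategy, unlabeled_pool, budget):
    """Illustrative: score every unlabeled point by its estimated adversarial
    perturbation (cal_dis above) and return the `budget` points with the smallest
    scores, i.e. those presumed closest to the decision boundary."""
    distances = torch.tensor([float(strategy.cal_dis(x)) for x in unlabeled_pool])
    closest_first = torch.argsort(distances)           # ascending perturbation size
    return closest_first[:budget].tolist()
```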
32 changes: 17 additions & 15 deletions distil/active_learning_strategies/badge.py
@@ -57,22 +57,24 @@ class BADGE(Strategy):
hypothesised labels. The points to be labeled are then selected by applying
k-means++ on these loss gradients.
Parameters.
Parameters
----------
X: Numpy array
Features of the labeled set of points
Y: Numpy array
Labels of the labeled set of points
unlabeled_x: Numpy array
Features of the unlabeled set of points
net: class object
Model architecture used for training. Could be an instance of the models defined in `distil.utils.models` or something similar.
handler: class object
It should be a subclass of torch.utils.data.Dataset, i.e., have __getitem__ and __len__ methods implemented, so that it could be passed to a PyTorch DataLoader. Could be an instance of the handlers defined in `distil.utils.DataHandler` or something similar.
nclasses: int
No. of classes in the dataset
args: dictionary
This dictionary should have 'batch_size' as a key.
X: numpy array
Present training/labeled data
Y: numpy array
Labels of present training data
unlabeled_x: numpy array
Data without labels
net: class
Pytorch Model class
handler: class
Data Handler, which can load data even without labels.
nclasses: int
Number of unique target variables
args: dict
Specify optional parameters.
`batch_size`
Batch size to be used inside strategy class (int, optional)
"""

def __init__(self, X, Y, unlabeled_x, net, handler,nclasses, args):
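A hedged sketch of the two steps the BADGE docstring describes: hypothesized-label gradient embeddings for a linear last layer, followed by k-means++ seeding. It is illustrative, not the file's implementation:

```python
import numpy as np

def gradient_embeddings(probs, penultimate_features):
    """Last-layer gradient embedding under the hypothesized (predicted) label:
    g_i = (p_i - onehot(argmax p_i)) outer h_i, flattened. Assumes a linear last
    layer trained with cross-entropy."""
    n, _ = probs.shape
    hypothesized = probs.argmax(axis=1)
    residual = probs.copy()
    residual[np.arange(n), hypothesized] -= 1.0        # p - onehot(y_hat)
    return (residual[:, :, None] * penultimate_features[:, None, :]).reshape(n, -1)

def kmeanspp_indices(embeddings, k, seed=0):
    """Standard k-means++ seeding: each new point is sampled with probability
    proportional to its squared distance from the closest already-chosen point."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(embeddings)))]
    dist2 = ((embeddings - embeddings[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(rng.choice(len(embeddings), p=dist2 / dist2.sum()))
        chosen.append(nxt)
        dist2 = np.minimum(dist2, ((embeddings - embeddings[nxt]) ** 2).sum(axis=1))
    return chosen
```

The indices returned by the seeding step are the unlabeled points BADGE would query for labels.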
@@ -3,9 +3,11 @@

class BALDDropout(Strategy):
"""
Implementation of BALDDropout Strategy.
This class extends :class:`active_learning_strategies.strategy.Strategy`
to include entropy sampling technique to select data points for active learning.
Implements the Bayesian Active Learning by Disagreement (BALD) Strategy :footcite:`houlsby2011bayesian`,
which assumes a Bayesian setting and selects points that maximise the mutual information
between the predicted labels and the model parameters. This implementation is an adaptation for a
non-Bayesian setting, with the assumption that there is a dropout layer in the model.
Parameters
----------
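A sketch of the BALD acquisition score the new docstring describes, estimated with Monte-Carlo dropout; the number of passes and the model handling are assumptions:

```python
import torch

def bald_scores(model, x_unlabeled, n_drop=10):
    """Mutual information between predictions and parameters, estimated with
    Monte-Carlo dropout: H[mean_t p_t(y|x)] - mean_t H[p_t(y|x)]."""
    model.train()                                      # keep dropout layers active
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x_unlabeled), dim=1)
                             for _ in range(n_drop)])  # (n_drop, N, C)
    mean_probs = probs.mean(dim=0)
    entropy_of_mean = -(mean_probs * torch.log(mean_probs + 1e-12)).sum(dim=1)
    mean_entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=2).mean(dim=0)
    return entropy_of_mean - mean_entropy              # higher = more informative

# Selection: indices of the `budget` highest-scoring unlabeled points, e.g.
# torch.topk(bald_scores(model, x_pool), k=budget).indices
```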
2 changes: 1 addition & 1 deletion distil/active_learning_strategies/core_set.py
@@ -13,7 +13,7 @@ class CoreSet(Strategy):
----------
X: numpy array
Present training/labeled data
y: numpy array
Y: numpy array
Labels of present training data
unlabeled_x: numpy array
Data without labels
2 changes: 1 addition & 1 deletion distil/active_learning_strategies/entropy_sampling.py
@@ -9,7 +9,7 @@ class EntropySampling(Strategy):
we use entropy and therefore select points which have maximum entropy.
Suppose the model has `nclasses` output nodes and each output node is denoted by :math:`z_j`. Thus,
:math:`j \\in [1,nclasses]`. Then for an output node :math:`z_i` from the model, the correponding
:math:`j \\in [1,nclasses]`. Then for an output node :math:`z_i` from the model, the corresponding
softmax would be
.. math::
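A minimal sketch of the entropy criterion described above; the model handling is simplified:

```python
import torch

def entropy_select(model, x_unlabeled, budget):
    """Pick the `budget` points whose predictive distribution has the highest entropy."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)
    return torch.topk(entropy, k=budget).indices.tolist()
```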
@@ -10,7 +10,7 @@ class EntropySamplingDropout(Strategy):
which have maximum entropy.
Suppose the model has `nclasses` output nodes and each output node is denoted by :math:`z_j`. Thus,
:math:`j \in [1,nclasses]`. Then for an output node :math:`z_i` from the model, the correponding
:math:`j \in [1,nclasses]`. Then for an output node :math:`z_i` from the model, the corresponding
softmax would be
.. math::
25 changes: 17 additions & 8 deletions distil/active_learning_strategies/fass.py
@@ -11,6 +11,17 @@ class FASS(Strategy):
'facility_location' , 'graph_cut', 'saturated_coverage', 'sum_redundancy', 'feature_based'
is applied to get the final set of points.
We select a subset :math:`F` of size :math:`\\beta` based on uncertainty sampling, such
that :math:`\\beta \\ge k`.
Then select a subset :math:`S` by solving
.. math::
\\max \\{f(S) \\text{ such that } |S| \\leq k, S \\subseteq F\\}
where :math:`k` is the `budget` and :math:`f` can be one of these functions -
'facility location' , 'graph cut', 'saturated coverage', 'sum redundancy', 'feature based'.
Parameters
----------
X: numpy array
@@ -26,16 +37,14 @@ class FASS(Strategy):
nclasses: int
Number of unique target variables
args: dict
Specify optional parameters
batch_size
Specify optional parameters - `batch_size`
Batch size to be used inside strategy class (int, optional)
submod: str
Choice of submodular function - 'facility_location' | 'graph_cut' | 'saturated_coverage' | 'sum_redundancy' | 'feature_based'
selection_type: str
Choice of selection strategy - 'PerClass' | 'Supervised'
submod: str
Choice of submodular function - 'facility_location' | 'graph_cut' | 'saturated_coverage' | 'sum_redundancy' | 'feature_based'
selection_type: str
Choice of selection strategy - 'PerClass' | 'Supervised'
"""

def __init__(self, X, Y, unlabeled_x, net, handler, nclasses, args={}):
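An illustrative sketch of the FASS pipeline described above: filter to the beta most uncertain points, then greedily maximize a facility-location function over that filtered set. A small self-contained greedy is used here in place of the apricot optimizer the library relies on:

```python
import numpy as np

def fass_select(probs, features, budget, beta):
    """probs: (N, C) softmax outputs; features: (N, d) embeddings of the unlabeled pool.
    Returns `budget` indices chosen as described above."""
    # Step 1: uncertainty filter, keep the beta points with the lowest top-class probability
    uncertainty = 1.0 - probs.max(axis=1)
    filtered = np.argsort(-uncertainty)[:beta]                 # candidate set F, |F| = beta

    # Step 2: greedy facility location over F: f(S) = sum_i max_{j in S} sim(i, j)
    X = features[filtered]
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-sq_dists / (sq_dists.mean() + 1e-12))        # RBF similarities

    selected, cover = [], np.zeros(len(X))
    for _ in range(budget):
        gains = np.maximum(sim, cover[:, None]).sum(axis=0) - cover.sum()
        if selected:
            gains[selected] = -np.inf                          # never re-pick a point
        j = int(np.argmax(gains))
        selected.append(j)
        cover = np.maximum(cover, sim[:, j])
    return filtered[selected].tolist()
```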
15 changes: 14 additions & 1 deletion distil/active_learning_strategies/gradmatch_active.py
@@ -17,7 +17,20 @@ class GradMatchActive(Strategy):
hypothesized labels of the loss function and are matched to either the full gradient of these hypothesized
examples or a supplied validation gradient. The indices returned are the ones selected by this algorithm.
.. math::
Err(X_t, L, L_T, \\theta_t) = \\left |\\left| \\sum_{i \\in X_t} \\nabla_\\theta L_T^i (\\theta_t) - \\frac{k}{N} \\nabla_\\theta L(\\theta_t) \\right | \\right|
where,
- Each gradient is computed with respect to the last layer's parameters
- :math:`\\theta_t` are the model parameters at selection round :math:`t`
- :math:`X_t` is the queried set of points to label at selection round :math:`t`
- :math:`k` is the budget
- :math:`N` is the number of points contributing to the full gradient :math:`\\nabla_\\theta L(\\theta_t)`
- :math:`\\nabla_\\theta L(\\theta_t)` is either the complete hypothesized gradient or a validation gradient
- :math:`\\sum_{i \\in X_t} \\nabla_\\theta L_T^i (\\theta_t)` is the subset's hypothesized gradient with :math:`|X_t| = k`
Parameters
----------
X: Numpy array
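To make the objective concrete, here is a sketch that evaluates the gradient-matching error for a candidate subset; the per-point gradients are assumed to be last-layer hypothesized-label gradients (as in the BADGE sketch earlier), and this is not the file's solver:

```python
import numpy as np

def gradmatch_error(per_point_grads, subset_indices, full_grad, n_full, budget):
    """Err = || sum_{i in subset} g_i - (k / N) * G ||, matching the formula above.
    per_point_grads: (M, d) per-point hypothesized-label gradients of the candidate pool;
    full_grad: (d,) full hypothesized or validation gradient, accumulated over n_full points."""
    subset_sum = per_point_grads[subset_indices].sum(axis=0)
    return float(np.linalg.norm(subset_sum - (budget / n_full) * full_grad))
```

A matching algorithm (for example orthogonal matching pursuit, which the GradMatch paper uses) would then search for a size-`k` subset that keeps this error small.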
4 changes: 2 additions & 2 deletions distil/active_learning_strategies/least_confidence.py
@@ -9,7 +9,7 @@ class LeastConfidence(Strategy):
Suppose the model has `nclasses` output nodes denoted by :math:`\\overrightarrow{\\boldsymbol{z}}`
and each output node is denoted by :math:`z_j`. Thus, :math:`j \\in [1, nclasses]`.
Then for an output node :math:`z_i` from the model, the correponding softmax would be
Then for an output node :math:`z_i` from the model, the corresponding softmax would be
.. math::
\\sigma(z_i) = \\frac{e^{z_i}}{\\sum_j e^{z_j}}
Expand All @@ -18,7 +18,7 @@ class LeastConfidence(Strategy):
confidence as follows,
.. math::
arg\\min_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{(arg\\max_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
\\mbox{argmin}_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{\\sum_S(\\mbox{argmax}_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
where :math:`\\mathcal{U}` denotes the data without labels, i.e. `unlabeled_x`, and :math:`k` is the `budget`.
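A sketch of the least-confidence rule in the formula above; the model handling is simplified:

```python
import torch

def least_confidence_select(model, x_unlabeled, budget):
    """Pick the `budget` points whose highest class probability is smallest."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=1)
    max_conf = probs.max(dim=1).values
    return torch.argsort(max_conf)[:budget].tolist()   # least confident first
```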
4 changes: 2 additions & 2 deletions distil/active_learning_strategies/least_confidence_dropout.py
@@ -8,7 +8,7 @@ class LeastConfidenceDropout(Strategy):
Suppose the model has `nclasses` output nodes denoted by :math:`\\overrightarrow{\\boldsymbol{z}}`
and each output node is denoted by :math:`z_j`. Thus, :math:`j \\in [1, nclasses]`.
Then for an output node :math:`z_i` from the model, the correponding softmax would be
Then for an output node :math:`z_i` from the model, the corresponding softmax would be
.. math::
\\sigma(z_i) = \\frac{e^{z_i}}{\\sum_j e^{z_j}}
Expand All @@ -17,7 +17,7 @@ class LeastConfidenceDropout(Strategy):
confidence as follows,
.. math::
arg\\min_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{(arg\\max_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
\\mbox{argmin}_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{\\sum_S(\\mbox{argmax}_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
where :math:`\\mathcal{U}` denotes the data without labels, i.e. `unlabeled_x`, and :math:`k` is the `budget`.
6 changes: 3 additions & 3 deletions distil/active_learning_strategies/margin_sampling.py
@@ -10,20 +10,20 @@ class MarginSampling(Strategy):
Suppose the model has `nclasses` output nodes denoted by :math:`\\overrightarrow{\\boldsymbol{z}}`
and each output node is denoted by :math:`z_j`. Thus, :math:`j \\in [1, nclasses]`.
Then for an output node :math:`z_i` from the model, the correponding softmax would be
Then for an output node :math:`z_i` from the model, the corresponding softmax would be
.. math::
\\sigma(z_i) = \\frac{e^{z_i}}{\\sum_j e^{z_j}}
Let,
.. math::
m = arg\\max_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))}
m = \\mbox{argmax}_j{(\\sigma(\\overrightarrow{\\boldsymbol{z}}))}
Then using softmax, Margin Sampling Strategy would pick `budget` no. of elements as follows,
.. math::
arg\\min_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{(arg\\max_j {(\\sigma(\\overrightarrow{\\boldsymbol{z}}))}) - (arg\\max_{j \\ne m} {(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
\\mbox{argmin}_{{S \\subseteq {\\mathcal U}, |S| \\leq k}}{\\sum_S(\\mbox{argmax}_j {(\\sigma(\\overrightarrow{\\boldsymbol{z}}))}) - (\\mbox{argmax}_{j \\ne m} {(\\sigma(\\overrightarrow{\\boldsymbol{z}}))})}
where :math:`\\mathcal{U}` denotes the data without labels, i.e. `unlabeled_x`, and :math:`k` is the `budget`.
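A sketch of the margin criterion above, selecting the points with the smallest gap between the top two class probabilities; the model handling is simplified:

```python
import torch

def margin_select(model, x_unlabeled, budget):
    """Pick the `budget` points with the smallest (top-1 minus top-2) softmax margin."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x_unlabeled), dim=1)
    top2 = torch.topk(probs, k=2, dim=1).values
    margin = top2[:, 0] - top2[:, 1]
    return torch.argsort(margin)[:budget].tolist()     # smallest margins first
```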
