PyTorch implementation for experiments in the paper Classification from Positive, Unlabeled and Biased Negative Data.
- Python >= 3.6
- PyTorch >= 0.4.0, scikit-learn, NumPy
- yaml to load parameters
- nltk, allennlp, h5py to prepare the 20newsgroups ELMo embedding
The file pu_biased_n.py reproduces most of the experimental results described in the paper:
python(3) pu_biased_n.py --dataset [dataset] --params-path [parameter-path] --random-seed [random-seed]
where dataset is either mnist, cifar10 or newsgroups, and parameter-path is a yml file containing the hyperparameters of the experiment.
The hyperparameter files used for the results shown in Table 1 can be found under the params/ directory.
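To repeat one experiment over several random seeds, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository: the helper names build_command and run_seeds and the default interpreter name python3 are assumptions made for illustration. Only the script name, the three flags, and the dataset names come from this README.

```python
import subprocess

# Dataset names accepted by pu_biased_n.py according to this README.
DATASETS = ("mnist", "cifar10", "newsgroups")


def build_command(dataset, params_path, seed, python="python3"):
    """Return the argument list for one run of pu_biased_n.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        python, "pu_biased_n.py",
        "--dataset", dataset,
        "--params-path", str(params_path),
        "--random-seed", str(seed),
    ]


def run_seeds(dataset, params_path, seeds):
    """Run the experiment once per seed, stopping at the first failing run."""
    for seed in seeds:
        subprocess.run(build_command(dataset, params_path, seed), check=True)
```

For example, run_seeds("mnist", "params/example.yml", range(3)) would launch three runs with seeds 0, 1 and 2. The file name params/example.yml is a placeholder and should be replaced by one of the files actually present under params/.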
To prepare the ELMo embedding of the 20newsgroups dataset, download the ELMo 5.5B pre-trained model from https://allennlp.org/elmo (elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights) and put it under data/20newsgroups/; then run the two files train_elmo_prepare.py and test_elmo_prepare.py located in the same directory.
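Before running the two preparation scripts, it may help to confirm that the downloaded weights are where they are expected. The following is a minimal sketch under stated assumptions: find_elmo_weights is a hypothetical helper that is not part of the repository, and because the file extension is not given here, it matches any file whose name starts with the model name.

```python
from pathlib import Path

# Directory and model name as given in this README; no file extension is assumed.
ELMO_DIR = Path("data/20newsgroups")
ELMO_STEM = "elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights"


def find_elmo_weights(directory=ELMO_DIR, stem=ELMO_STEM):
    """Return the sorted paths in `directory` whose names start with `stem`.

    Hypothetical helper: returns an empty list if the directory does not
    exist or contains no matching file.
    """
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.name.startswith(stem))
```

Calling find_elmo_weights() from the repository root returns an empty list when the weights are missing, in which case the download step above should be repeated before running train_elmo_prepare.py and test_elmo_prepare.py.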