
NeuralClassifier: An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

Introduction

NeuralClassifier is designed for quick implementation of neural models for hierarchical multi-label classification, a more challenging task that is common in real-world scenarios. A salient feature is that NeuralClassifier provides a variety of text encoders, such as FastText, TextCNN, TextRNN, RCNN, VDCNN, DPCNN, DRNN, AttentiveConvNet and the Transformer encoder. It also supports other text classification scenarios, including binary-class and multi-class classification. It is built on PyTorch. Experiments show that models built with the toolkit achieve performance comparable to results reported in the literature.

Supported tasks

  • Binary-class text classification
  • Multi-class text classification
  • Multi-label text classification
  • Hierarchical (multi-label) text classification (HMC)

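Supported text encoders

  • TextCNN
  • TextRNN
  • RCNN
  • FastText
  • DRNN
  • DPCNN
  • VDCNN
  • AttentiveConvNet
  • RegionEmbedding
  • Transformer
  • Star-Transformer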

Requirements

  • Python 3
  • PyTorch 0.4+
  • NumPy 1.14.3+

System Architecture

[Figure: NeuralClassifier architecture]

Usage

Training

python train.py conf/train.json

For detailed configuration options and explanations, see Configuration.

Training information is written to standard output and to the file set in log.logger_file.

Evaluation

python eval.py conf/train.json
  • If eval.is_flat is false, hierarchical evaluation results are reported.
  • eval.model_dir is the path of the model to evaluate.
  • data.test_json_files specifies the input file(s) to evaluate on.

Evaluation results are written to eval.dir.
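
The settings named above use dotted names such as eval.model_dir. Before running eval.py it can help to print the values that will actually be used. The following is a minimal sketch, assuming the configuration file is JSON whose nested objects mirror the dotted names; the helper `get_setting` and all sample values are hypothetical and not part of the toolkit.

```python
import json


def get_setting(config, dotted_name):
    """Walk a nested dict using a dotted name such as 'eval.model_dir'."""
    node = config
    for part in dotted_name.split("."):
        node = node[part]
    return node


# Hypothetical stand-in for the contents of conf/train.json (values are
# illustrative only). With a real file one would instead use:
#     with open("conf/train.json") as f:
#         config = json.load(f)
config = json.loads("""
{
  "eval": {"is_flat": false, "model_dir": "path/to/model", "dir": "eval_dir"},
  "data": {"test_json_files": ["data/test.json"]}
}
""")

for name in ("eval.is_flat", "eval.model_dir", "eval.dir", "data.test_json_files"):
    print(name, "=", get_setting(config, name))
```

A missing key raises KeyError, which surfaces a misspelled or absent setting before a long evaluation run starts.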

Prediction

python predict.py conf/train.json data/predict.json 
  • predict.json must be in JSON format, and each instance must carry a dummy label such as "其他" ("other") or any other label in the label map.
  • eval.model_dir is the path of the model used for prediction.
  • eval.top_k is the number of labels to output.
  • eval.threshold is the probability threshold.

Predictions are written to predict.txt.
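
One plausible reading of how eval.top_k and eval.threshold interact is sketched below: keep the labels whose probability reaches the threshold, then output at most top_k of them, best first. This is an assumption about the semantics, not the toolkit's confirmed behaviour; `select_labels` and the sample probabilities are hypothetical.

```python
def select_labels(probs, top_k, threshold):
    """Return up to top_k labels whose probability is >= threshold, best first."""
    # Rank all labels by probability, highest first.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    # Drop labels below the threshold, then cap the list at top_k entries.
    kept = [label for label, p in ranked if p >= threshold]
    return kept[:top_k]


# Hypothetical per-label probabilities for one document.
probs = {
    "Computer": 0.91,
    "Computer--MachineLearning": 0.72,
    "Neuro": 0.40,
    "Other": 0.05,
}
print(select_labels(probs, top_k=2, threshold=0.5))
```

Under this reading, raising the threshold can return fewer than top_k labels, and can return none at all.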

Input Data Format

JSON example:

{
    "doc_label": ["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    "doc_token": ["I", "love", "deep", "learning"],
    "doc_keyword": ["deep learning"],
    "doc_topic": ["AI", "Machine learning"]
}

"doc_keyword" and "doc_topic" are optional.
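
A minimal sketch of building an instance in this format and serializing it is shown below. It assumes that "--" separates the levels of a hierarchical label, as the example suggests, and that each instance is serialized as a single JSON object; `make_instance` and `label_path` are hypothetical helpers, not part of the toolkit.

```python
import json


def make_instance(labels, tokens, keywords=None, topics=None):
    """Build one instance; doc_keyword and doc_topic default to empty lists."""
    return {
        "doc_label": list(labels),
        "doc_token": list(tokens),
        "doc_keyword": list(keywords or []),
        "doc_topic": list(topics or []),
    }


def label_path(label, separator="--"):
    """Split a hierarchical label into its levels (assumes '--' as separator)."""
    return label.split(separator)


instance = make_instance(
    labels=["Computer--MachineLearning--DeepLearning", "Neuro--ComputationalNeuro"],
    tokens=["I", "love", "deep", "learning"],
    keywords=["deep learning"],
)

# ensure_ascii=False keeps non-ASCII labels readable in the output.
line = json.dumps(instance, ensure_ascii=False)
print(line)
print(label_path(instance["doc_label"][0]))
```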

Performance

0. Dataset

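| Dataset | Taxonomy | #Label | #Training | #Test   |
|---------|----------|--------|-----------|---------|
| RCV1    | Tree     | 103    | 23,149    | 781,265 |
| Yelp    | DAG      | 539    | 87,375    | 37,265  |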

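1. Comparison with the state of the art

| Text Encoders                | Micro-F1 on RCV1 | Micro-F1 on Yelp |
|------------------------------|------------------|------------------|
| HR-DGCNN (Peng et al., 2018) | 0.7610           | -                |
| HMCN (Wehrmann et al., 2018) | 0.8080           | 0.6640           |
| Ours                         | 0.8313           | 0.6704           |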

2. Different text encoders

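| Text Encoders    | RCV1 Micro-F1 | RCV1 Macro-F1 | Yelp Micro-F1 | Yelp Macro-F1 |
|------------------|---------------|---------------|---------------|---------------|
| TextCNN          | 0.7717        | 0.5246        | 0.6281        | 0.3657        |
| TextRNN          | 0.8152        | 0.5458        | 0.6704        | 0.4059        |
| RCNN             | 0.8313        | 0.6047        | 0.6569        | 0.3951        |
| FastText         | 0.6887        | 0.2701        | 0.6031        | 0.2323        |
| DRNN             | 0.7846        | 0.5147        | 0.6579        | 0.4401        |
| DPCNN            | 0.8220        | 0.5609        | 0.5671        | 0.2393        |
| VDCNN            | 0.7263        | 0.3860        | 0.6395        | 0.4035        |
| AttentiveConvNet | 0.7533        | 0.4373        | 0.6367        | 0.4040        |
| RegionEmbedding  | 0.7780        | 0.4888        | 0.6601        | 0.4514        |
| Transformer      | 0.7603        | 0.4274        | 0.6533        | 0.4121        |
| Star-Transformer | 0.7668        | 0.4840        | 0.6482        | 0.3895        |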
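
The tables report both Micro-F1 and Macro-F1. As a reminder of the difference, the sketch below computes both on a toy multi-label example using the standard definitions: Micro-F1 pools true positives, false positives and false negatives over all labels, while Macro-F1 averages the per-label F1 scores. This is not the toolkit's evaluation code; the function names and the data are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold and pred are lists of label sets, one set per document."""
    labels = sorted(set().union(*gold, *pred))
    if not labels:
        return 0.0, 0.0
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1
    # Micro: sum the counts over labels first, then compute one F1.
    totals = [sum(c[i] for c in counts.values()) for i in range(3)]
    micro = f1(*totals)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    return micro, macro


gold = [{"A", "B"}, {"A"}, {"C"}]
pred = [{"A"}, {"A", "B"}, {"A"}]
micro, macro = micro_macro_f1(gold, pred)
print(f"micro={micro:.4f} macro={macro:.4f}")
```

Because Macro-F1 weights every label equally, rare labels that are predicted poorly pull it down much more than they pull down Micro-F1, which is consistent with the gap between the two columns in the tables.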

3. Hierarchical vs Flat

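| Text Encoders    | Hierarchical Micro-F1 | Hierarchical Macro-F1 | Flat Micro-F1 | Flat Macro-F1 |
|------------------|-----------------------|-----------------------|---------------|---------------|
| TextCNN          | 0.7717                | 0.5246                | 0.7367        | 0.4224        |
| TextRNN          | 0.8152                | 0.5458                | 0.7546        | 0.4505        |
| RCNN             | 0.8313                | 0.6047                | 0.7955        | 0.5123        |
| FastText         | 0.6887                | 0.2701                | 0.6865        | 0.2816        |
| DRNN             | 0.7846                | 0.5147                | 0.7506        | 0.4450        |
| DPCNN            | 0.8220                | 0.5609                | 0.7423        | 0.4261        |
| VDCNN            | 0.7263                | 0.3860                | 0.7110        | 0.3593        |
| AttentiveConvNet | 0.7533                | 0.4373                | 0.7511        | 0.4286        |
| RegionEmbedding  | 0.7780                | 0.4888                | 0.7640        | 0.4617        |
| Transformer      | 0.7603                | 0.4274                | 0.7602        | 0.4339        |
| Star-Transformer | 0.7668                | 0.4840                | 0.7618        | 0.4745        |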

Acknowledgement

Our toolkit references some publicly available code.

Update

  • 2019-04-29: initial version