OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification
torch==1.9.0
torchtext==0.10.0
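Assuming dependencies are managed with pip (the repository may not ship a requirements file, so this is a sketch), the pins above can be placed in a requirements.txt:

```
torch==1.9.0
torchtext==0.10.0
```

and installed with `pip install -r requirements.txt`.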
OTSeq2Set uses the same datasets as AttentionXML; please download each dataset from the following links.
The GloVe embedding (840B, 300d) in gensim format is provided by AttentionXML here.
For Wiki10-31K and AmazonCat-13K, the label vocabulary is downloaded from The Extreme Classification Repository.
We provide the four datasets, compressed together with the label vocabularies and the GloVe embedding, here.
The structure of the dataset should be:
OTSeq2Set
|-- config
|-- data
| |-- Eurlex
| |-- AmazonCat-13K
| |-- Amazon-670K
| |-- Wiki10-31K
| |-- glove.840B.300d.gensim
| |-- glove.840B.300d.gensim.vectors.npy
|-- OTSeq2Set.ipynb
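As a convenience, the layout above can be verified before training. This is a hypothetical helper, not part of the repository; the path names mirror the tree shown here.

```python
from pathlib import Path

# Paths expected under the OTSeq2Set root, as listed in the tree above.
EXPECTED = [
    "config",
    "data/Eurlex",
    "data/AmazonCat-13K",
    "data/Amazon-670K",
    "data/Wiki10-31K",
    "data/glove.840B.300d.gensim",
    "data/glove.840B.300d.gensim.vectors.npy",
]

def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths():
        print(f"missing: {p}")
```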
The file config/OTSeq2Set.json contains the OTSeq2Set configuration whose results are reported in the paper.
config/baselines.json contains the configuration of baseline models.
Description of configuration:
- dl_conv : whether to use lightweight convolution
- lambda_embedding : the weight lambda of the semantic optimal transport distance
- finish : whether training has finished; set to true to skip training this model
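The keys above come from this README; the surrounding structure of config/OTSeq2Set.json is an assumption, so the fragment below is illustrative only (the lambda_embedding value shown is made up, not from the paper):

```python
import json

# Illustrative config fragment; only the key names are taken from the README.
example_json = """{
  "dl_conv": true,
  "lambda_embedding": 0.1,
  "finish": false
}"""
cfg = json.loads(example_json)

def should_train(cfg):
    """Train only when the config is not marked as finished."""
    return not cfg.get("finish", False)

print(should_train(cfg))  # -> True: "finish" is false, so training runs
```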
Run OTSeq2Set.ipynb to train and evaluate the model.