caojie54/OTSeq2Set

OTSeq2Set


OTSeq2Set: An Optimal Transport Enhanced Sequence-to-Set Model for Extreme Multi-label Text Classification

Dependency


torch==1.9.0

torchtext==0.10.0
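
As a quick sanity check before opening the notebook, the pinned versions above can be compared with what is installed. This is only a sketch: the helper names `installed_version` and `version_mismatches` are illustrative and not part of the repository, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions listed in the Dependency section.
REQUIRED = {"torch": "1.9.0", "torchtext": "0.10.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(required=REQUIRED, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in required.items():
        found = lookup(package)
        # Local build tags such as "1.9.0+cu111" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches().items():
        print(f"{package}: expected {wanted}, found {found}")
```

An empty result means both pinned packages were found at the expected versions.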

Datasets


OTSeq2Set uses the same datasets as AttentionXML. Please download each dataset from the following links.

The gensim-format GloVe embedding (840B, 300d) is provided by AttentionXML here.

For Wiki10-31K and AmazonCat-13K, the label vocabulary is downloaded from The Extreme Classification Repository.

Alternatively, we provide the four datasets, together with the label vocabulary and GloVe embedding, as a compressed archive here.

The directory structure should be:

OTSeq2Set
 |-- config
 |-- data                          
 |    |-- Eurlex              
 |    |-- AmazonCat-13K
 |    |-- Amazon-670K
 |    |-- Wiki10-31K       
 |    |-- glove.840B.300d.gensim
 |    |-- glove.840B.300d.gensim.vectors.npy
 |-- OTSeq2Set.ipynb
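
To confirm that the data has been unpacked into the layout shown above, a small check along the following lines may help. It is a sketch, not part of the repository: `EXPECTED_ENTRIES` and `missing_entries` are illustrative names, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the repository root.
EXPECTED_ENTRIES = [
    "config",
    "data/Eurlex",
    "data/AmazonCat-13K",
    "data/Amazon-670K",
    "data/Wiki10-31K",
    "data/glove.840B.300d.gensim",
    "data/glove.840B.300d.gensim.vectors.npy",
]


def missing_entries(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Data layout looks complete.")
```

The check only tests that each path exists; it does not inspect the contents of the dataset folders.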

Config


config/OTSeq2Set.json contains the OTSeq2Set configuration used to produce the results reported in the paper.

config/baselines.json contains the configurations of the baseline models.

Description of configuration:

  • dl_conv: whether to use lightweight convolution
  • lambda_embedding: the parameter lambda of the semantic optimal transport distance
  • finish: whether the model has already been trained; set it to true if you do not want to train this model
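
The sketch below illustrates how the `finish` flag could be used to pick out the models that still need training. The JSON layout here is an assumption made for illustration only: the real config/OTSeq2Set.json may be organised differently, the `name` key and all values are hypothetical, and `pending_models` is not a function from the repository.

```python
import json

# Hypothetical example: the real layout of config/OTSeq2Set.json may differ.
# Only dl_conv, lambda_embedding and finish come from the description above;
# the "name" key and every value are made up for illustration.
EXAMPLE_CONFIG = """
[
  {"name": "example-a", "dl_conv": true, "lambda_embedding": 1.0, "finish": false},
  {"name": "example-b", "dl_conv": false, "lambda_embedding": 0.5, "finish": true}
]
"""


def pending_models(configs):
    """Return the entries whose `finish` flag is not set to true."""
    return [cfg for cfg in configs if not cfg.get("finish", False)]


configs = json.loads(EXAMPLE_CONFIG)
for cfg in pending_models(configs):
    print(cfg["name"], "dl_conv =", cfg["dl_conv"],
          "lambda_embedding =", cfg["lambda_embedding"])
```

Under this assumed layout, setting `finish` to true on an entry removes it from the list of models to train, which matches the behaviour described for that option.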

Train


Run OTSeq2Set.ipynb
