
This is the implementation of "Improving Constituency Parsing with Span Attention" (Findings of EMNLP 2020).

Please contact us at or if you have any questions.


If you use or extend our work, please cite our paper published in Findings of EMNLP 2020:

    title = "Improving Constituency Parsing with Span Attention",
    author = "Tian, Yuanhe and Song, Yan and Xia, Fei and Zhang, Tong",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
    month = nov,
    year = "2020",
    address = "Online",
    pages = "1691--1703",

Our code works with the following environment:
  • python 3.6
  • pytorch 1.1

Install the Python dependencies by running:

pip install -r requirements.txt

The EVALB and EVALB_SPMRL directories contain the code for evaluating parsing results on English and on other languages, respectively. Before running evaluation, go to the EVALB directory (for English) or the EVALB_SPMRL directory (for other languages) and run make to build the evaluator.
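EVALB reports labeled bracketing precision, recall, and F1. As a rough illustration of what those scores measure, here is a minimal Python sketch, assuming trees are written as nested (label, children) tuples where a preterminal's children is the word string. It is not a replacement for EVALB: it ignores the rules in EVALB's parameter files (for example, which labels and punctuation are excluded), and the function names are hypothetical.

```python
from collections import Counter


def labeled_spans(tree, start=0):
    """Collect (label, start, end) spans of a (label, children) tree.

    Returns the span Counter and the index just past the last word.
    Preterminals (POS tag over one word) are not scored, as in EVALB.
    """
    label, children = tree
    if isinstance(children, str):
        # Preterminal: covers exactly one word, contributes no bracket.
        return Counter(), start + 1
    spans = Counter()
    end = start
    for child in children:
        child_spans, end = labeled_spans(child, end)
        spans.update(child_spans)
    spans[(label, start, end)] += 1
    return spans, end


def bracket_prf(gold, pred):
    """Labeled bracketing precision, recall, and F1 for one sentence."""
    gold_spans, _ = labeled_spans(gold)
    pred_spans, _ = labeled_spans(pred)
    # Multiset intersection counts each matched bracket at most once.
    matched = sum((gold_spans & pred_spans).values())
    precision = matched / sum(pred_spans.values()) if pred_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Usage: the prediction attaches "dog" to the VP instead of the NP.
gold = ("S", [("NP", [("DT", "the"), ("NN", "dog")]), ("VP", [("VBZ", "barks")])])
pred = ("S", [("NP", [("DT", "the")]), ("VP", [("NN", "dog"), ("VBZ", "barks")])])
print(bracket_prf(gold, pred))  # only the S bracket matches: P = R = F1 = 1/3
```

Corpus-level EVALB scores sum matched, gold, and predicted bracket counts over all sentences before dividing, rather than averaging per-sentence F1.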

Downloading BERT, ZEN, XLNet and Our Pre-trained Models

In our paper, we use BERT, ZEN, and XLNet as encoders.

For BERT, please download the pre-trained BERT model from Google and convert it from the TensorFlow version to the PyTorch version.

  • For Arabic, we use MulBERT-Base, Multilingual Cased;
  • For Chinese, we use BERT-Base, Chinese;
  • For English, we use BERT-Large, Cased and BERT-Large, Uncased.

For ZEN, you can download the pre-trained model from here.

For XLNet, you can download the pre-trained model from here.

For our pre-trained models, you can download them from Baidu Wangpan (passcode: 2o1n) or Google Drive.

Run on Sample Data

To train a model on a small dataset, run:



We use datasets in three languages: Arabic, Chinese, and English.

To preprocess the data, please go to the data_processing directory and follow the instructions there. You need to obtain the official datasets yourself before running our code.

Ideally, all data will appear in the ./data directory. Data with gold POS tags are located in folders named after the dataset (i.e., ATB, CTB, and PTB); data with predicted POS tags are located in folders whose names carry a "_POS" suffix (i.e., ATB_POS, CTB_POS, and PTB_POS).
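The folder naming convention above can be captured in a small helper. This is a hypothetical sketch, not part of the repository: the function name data_dir and its parameters are assumptions, and it only resolves the folder, not the files inside it.

```python
from pathlib import Path

# Dataset folder names as listed in this README.
DATASETS = {"ATB", "CTB", "PTB"}


def data_dir(dataset, predicted_pos=False, root="./data"):
    """Hypothetical helper: return the folder holding a dataset.

    Gold POS tags live in <root>/<dataset>; predicted POS tags live in
    <root>/<dataset>_POS.
    """
    if dataset not in DATASETS:
        raise ValueError(
            "unknown dataset {!r}; expected one of {}".format(dataset, sorted(DATASETS))
        )
    name = dataset + "_POS" if predicted_pos else dataset
    return Path(root) / name


# Usage
print(data_dir("PTB"))                      # data/PTB
print(data_dir("CTB", predicted_pos=True))  # data/CTB_POS
```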

Training, Testing, and Predicting

You can find the command lines to train and test models on a specific dataset in

To-do List

  • Regular maintenance.

You can leave comments in the Issues section if you want us to implement any functions.

You can check our updates at

