CoNLL2003 choices data

The JSON data in this repository were generated by running 500 inferences per sentence on the CoNLL2003 train, validation, and test data, the CoNLL++ (2023) test data, and the CoNLL++ (CrossWeigh) test data, using subword regularization (BPE-Dropout) with hyperparameter p = 0.1.
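With BPE-Dropout the tokenizer becomes stochastic, so repeated inference passes over the same sentence can see different subword segmentations. The toy sketch below illustrates only the mechanism, with a hand-written merge table; the published data were produced with the RoBERTa-base BPE tokenizer, not this code.

```python
import random

def bpe_dropout(word, merges, p=0.1, rng=random):
    """Greedy BPE with merge dropout (BPE-Dropout), as a toy sketch.

    `merges` is a priority-ordered list of symbol pairs. Each applicable
    merge is skipped with probability p, so repeated calls can return
    different segmentations of the same word.
    """
    symbols = list(word)
    while True:
        applied = False
        for a, b in merges:
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b and rng.random() >= p:
                    symbols[i:i + 2] = [a + b]  # apply the merge
                    applied = True
                else:
                    i += 1
        if not applied:
            return symbols

# Hand-written merges for illustration only.
merges = [("l", "o"), ("w", "e"), ("lo", "we"), ("lowe", "r")]
for _ in range(5):
    print(bpe_dropout("lower", merges, p=0.3))
```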

  • For the training data, the training set was split into five folds; the choices for each 20% fold were generated by a model trained on the remaining 80% (train_cross_var.py). See the sketch after this list.
  • For the other datasets, the choices were generated by a model trained on all of the CoNLL2003 training data (train_all_train_data.py).
    • We used a RoBERTa-base model fine-tuned on the CoNLL2003 training set. The model and its full training configuration are available on Hugging Face.
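A minimal sketch of the cross-inference scheme for the training data is shown below. The helpers train_model and predict_n_times are hypothetical stand-ins for the repository's training and BPE-Dropout inference code (train_cross_var.py); only the 5-fold / 500-inference structure comes from this README.

```python
from collections import Counter

from sklearn.model_selection import KFold

def make_train_choices(sentences, train_model, predict_n_times, n_splits=5, n_pred=500):
    """For each held-out 20% fold, collect choices from a model trained on the other 80%.

    `train_model(train_sentences)` returns a fitted model and
    `predict_n_times(model, sentence, n)` returns n predicted label sequences;
    both are hypothetical helpers, not part of this repository's API.
    """
    choices = {}
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, held_out_idx in kf.split(sentences):
        model = train_model([sentences[i] for i in train_idx])
        for i in held_out_idx:
            preds = predict_n_times(model, sentences[i], n_pred)
            # Count how often each label sequence was predicted among the n_pred inferences.
            choices[i] = Counter(tuple(p) for p in preds)
    return choices
```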

All datasets can be found in choice_data/.

We removed the original token and label data. Therefore, this repository contains only the predicted labels, the number of times each of them was predicted out of 500 inferences, and their F1 scores against the gold labels. If you have the original CoNLL2003 data, you can add the original tokens and labels back to our datasets.

  • F1 scores are calculated with seqeval; however, when both the predicted and the gold label sequences contain only O labels, we set the score to 1.0 (illustrated below).
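As a concrete illustration of this scoring rule (choice_f1 is an illustrative name, not a function from this repository):

```python
from seqeval.metrics import f1_score

def choice_f1(gold, pred):
    """Entity-level F1 of one predicted label sequence against the gold one.

    seqeval finds no entities when every label is "O", so that case is
    scored as 1.0, as described above.
    """
    if set(gold) == {"O"} and set(pred) == {"O"}:
        return 1.0
    return f1_score([gold], [pred])

print(choice_f1(["O", "B-PER", "I-PER", "O"], ["O", "B-PER", "I-PER", "O"]))  # 1.0
```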

Setup

  1. git clone git@github.com:4ldk/CoNLL2003_Choices.git
  2. cd CoNLL2003_Choices
  3. mkdir row_data
  4. Copy each dataset into row_data/
    • Copy the original data (eng.train, eng.testa, eng.testb) as is.
      • The two CoNLL++ repositories referenced below also contain the original data, but some data that are not suitable for training have been removed from them. The data and models published in this repository are based on the data without those removals.
    • Copy conllpp.txt from CoNLL++ (2023) as is.
    • Copy conllpp_test.txt from CoNLL++ (CrossWeigh), renaming it to conllcw.txt.
  5. pip install -r requirement.txt

Add original label and token information to choice_data

  1. python3 ./src/add_row_data.py
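This step merges the tokens and labels from the raw files in row_data/ back into choice_data. For reference, a minimal reader for the CoNLL2003 column format (whitespace-separated columns with the NER tag last, sentences separated by blank lines) could look like the sketch below; it is an illustration, not the repository's add_row_data.py.

```python
def read_conll2003(path):
    """Yield (tokens, ner_tags) for each sentence in a CoNLL2003-format file."""
    tokens, tags = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("-DOCSTART-"):
                if tokens:
                    yield tokens, tags
                    tokens, tags = [], []
                continue
            columns = line.split()
            tokens.append(columns[0])   # surface token
            tags.append(columns[-1])    # NER tag (last column)
    if tokens:
        yield tokens, tags

for tokens, tags in read_conll2003("row_data/eng.testb"):
    print(tokens, tags)
    break
```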

Create choice data by using the trained model

  1. mkdir model
  2. Edit config/config2003.yaml: set test to the test data to use, loop to the number of inference loops, pred_p to the subword regularization hyperparameter p, load_local_model: False, and test_model_name: "4ldk/Roberta-Base-CoNLL2003" (see the example after this list).
  3. python3 ./src/predictor.py
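For reference, the relevant part of config/config2003.yaml might look like the excerpt below. Only the five keys named in step 2 are taken from this README; the value of test is illustrative, while loop and pred_p mirror the 500 inferences and p = 0.1 used for the published data.

```yaml
# Illustrative excerpt; only these keys are described above.
test: testb                  # which dataset to run inference on (illustrative value)
loop: 500                    # number of inference loops per sentence
pred_p: 0.1                  # BPE-Dropout hyperparameter p at prediction time
load_local_model: False
test_model_name: "4ldk/Roberta-Base-CoNLL2003"
```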

Train models and create train choice data

  1. Edit config/config2003.yaml
  2. If you want to change the number of folds for the training data, change line 12 of make_cross_data.py and lines 106 and 108 of train_cross_var.py.
  3. python3 ./src/make_cross_data.py
  4. python3 ./src/train_cross_var.py

Train a model on all training data and create choice data

  1. Edit config/config2003.yaml
    • Set load_local_model: True and any other options you want to change.
  2. python3 ./src/train_all_train_data.py
  3. python3 ./src/predictor.py

References

  1. CoNLL2003
  2. BPE-Dropout: Simple and Effective Subword Regularization
  3. Do CoNLL-2003 Named Entity Taggers Still Work Well in 2023? (GitHub)
  4. CrossWeigh: Training Named Entity Tagger from Imperfect Annotations (GitHub)
