Astro-mT5

This repository contains the code and the paper for "Astro-mT5: Entity Extraction from Astrophysics Literature using mT5 Language Model", accepted at an AACL-IJCNLP 2022 workshop.

Abstract

Scientific research requires reading and extracting relevant information from existing scientific literature in an effective way. To gain insights over a collection of such scientific documents, extracting entities and recognizing their types is considered one of the important tasks. Numerous studies have been conducted in this area of research. In our study, we introduce a framework for entity recognition and identification on the NASA astrophysics dataset, which was published as part of the DEAL shared task. We use a pre-trained multilingual model, based on a natural language processing framework, for the given sequence labeling tasks. Experiments show that our model, Astro-mT5, outperforms the existing baseline in astrophysics-related information extraction. Our paper is available at work.

Setup

Install Package Dependencies

git clone https://github.com/flairNLP/flair.git
cd flair
git checkout add-t5-encoder-support
pip3 install -e .
To run the experiments, run_ner.py and test.py must be placed inside the flair directory.
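Since both scripts have to sit inside the flair checkout, a quick check before training can save a failed run. The following is a minimal sketch using only the Python standard library; the helper name missing_scripts and the relative path "flair" are assumptions made for illustration and are not part of this repository.

```python
from pathlib import Path

# The two scripts this README says must live inside the flair directory.
REQUIRED_SCRIPTS = ("run_ner.py", "test.py")


def missing_scripts(flair_dir):
    """Return the required script names not found directly inside flair_dir."""
    root = Path(flair_dir)
    return [name for name in REQUIRED_SCRIPTS if not (root / name).is_file()]


if __name__ == "__main__":
    # "flair" is an assumed path: the clone created by the git clone step.
    missing = missing_scripts("flair")
    if missing:
        print("Missing from flair/:", ", ".join(missing))
    else:
        print("Both scripts are in place.")
```

Run it from the directory that contains the flair clone, or pass a different path to missing_scripts if the checkout lives elsewhere.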

Training

The main training procedure is:
python3 run_ner.py --dataset_name NER_MASAKHANE \
  --model_name_or_path google/mt5-large \
  --layers -1 \
  --subtoken_pooling first_last \
  --hidden_size 256 \
  --batch_size 4 \
  --learning_rate 5e-05 \
  --num_epochs 100 \
  --use_crf True \
  --output_dir /content/mt5-large
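When the run is launched from a notebook or another script, the same flags can be assembled as an argument list, which sidesteps shell line-continuation and quoting concerns. This is only a sketch: the names TRAINING_ARGS and build_command are hypothetical, the values are copied from the command above, and nothing here inspects how run_ner.py actually parses its arguments.

```python
import shlex

# Flag values copied verbatim from the training command in this README.
# They are kept as strings so that "5e-05" and "True" are passed unchanged.
TRAINING_ARGS = {
    "dataset_name": "NER_MASAKHANE",
    "model_name_or_path": "google/mt5-large",
    "layers": "-1",
    "subtoken_pooling": "first_last",
    "hidden_size": "256",
    "batch_size": "4",
    "learning_rate": "5e-05",
    "num_epochs": "100",
    "use_crf": "True",
    "output_dir": "/content/mt5-large",
}


def build_command(args, script="run_ner.py"):
    """Turn a flag-to-value mapping into an argument list for subprocess."""
    cmd = ["python3", script]
    for flag, value in args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    # Print the command for inspection instead of launching training.
    print(shlex.join(build_command(TRAINING_ARGS)))
```

To start training, the list can be handed to subprocess.run(build_command(TRAINING_ARGS), check=True) from inside the flair directory.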

Testing

After training, you can find the best checkpoint according to the evaluation results on the dev set. To do so, run:
python3 test.py

