High-East/Paper-Tag-Generation

Paper Tag Generation

  • This repository is for the second project of the XAI606 course. Most of the code and files in this repository are based on this source.
  • W&B project link: URL

Abstract

Paperswithcode gives researchers a single place to find state-of-the-art (SOTA) models for many different tasks, together with the papers that describe them. Thanks to this demanding curation work, people studying artificial intelligence can now find and search papers with far less effort than before. In this work, I reuse this archive of papers for a further purpose.

The archived Paperswithcode papers are stored in this repository with the following layout.

.
├── README.md
├── data
│   └── paperswithcode
│       ├── dataset_dict.json
│       ├── dev
│       │   ├── dataset.arrow
│       │   ├── dataset_info.json
│       │   └── state.json
│       ├── test
│       │   ├── dataset.arrow
│       │   ├── dataset_info.json
│       │   └── state.json
│       └── train
│           ├── dataset.arrow
│           ├── dataset_info.json
│           └── state.json
└── source
    ├── arguments.py
    ├── run.py
    └── utils.py
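
The layout above (dataset_dict.json plus dataset.arrow, dataset_info.json and state.json per split) looks like what the Huggingface datasets library writes with save_to_disk, so it can presumably be read back with datasets.load_from_disk. The following is a minimal sketch under that assumption, not code from this repository: the helper names missing_dataset_files and load_splits are hypothetical, and the default path is taken from the tree.

```python
from pathlib import Path

# Split names and per-split files, copied from the directory tree above.
SPLITS = ("train", "dev", "test")
SPLIT_FILES = ("dataset.arrow", "dataset_info.json", "state.json")


def missing_dataset_files(root):
    """Return the expected files (relative paths) that are absent under root."""
    root = Path(root)
    expected = [root / "dataset_dict.json"]
    expected += [root / split / name for split in SPLITS for name in SPLIT_FILES]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]


def load_splits(root="data/paperswithcode"):
    """Load all splits; assumes the datasets package is installed."""
    missing = missing_dataset_files(root)
    if missing:
        raise FileNotFoundError(f"missing dataset files: {missing}")
    # Imported lazily so the layout check above works without the dependency.
    from datasets import load_from_disk
    return load_from_disk(str(root))
```

Calling load_splits() from the repository root should return an object indexed by split name (for example splits["train"]), provided the assumption about the storage format holds.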
                              

Generation-based Paper category prediction

All files are inside source.

  1. To train the generation-based predictor, run the command below:
    python source/run.py --do_train --output_dir=finetuned_model
    • Set train_subsample_ratio to a value between 0 and 1 to train on only a portion of the training data.
    • output_dir must be specified. The saved model can then be used as a checkpoint for validation and evaluation.
    • See source/arguments.py for the full configuration.
    • Set model_name_or_path to a pretrained generation model. Recently Huggingface
  2. To evaluate on the evaluation data (dev), run the command below:
    python source/run.py --do_eval --model_name_or_path=finetuned_model --output_dir=finetuned_model
    • Set valid_subsample_ratio to a value between 0 and 1 to evaluate on only a portion of the evaluation data.
    • Set model_name_or_path to the directory of the saved model to evaluate the model you fine-tuned.
  3. To make predictions on the test data, run the command below:
    python source/run.py --do_predict --model_name_or_path=finetuned_model --output_dir=prediction
    • This uses your fine-tuned model in finetuned_model to predict on the test set. The results are saved to prediction/predictions.json in arxiv_id: prediction format.
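
Once step 3 has finished, the saved predictions can be inspected with a few lines of Python. This is a minimal sketch, assuming predictions.json holds a single JSON object that maps each arxiv_id to its prediction; the exact file structure is not documented here, and the helper names load_predictions and most_common_predictions are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_predictions(path="prediction/predictions.json"):
    """Read the arxiv_id -> prediction mapping (assumed to be one JSON object)."""
    with Path(path).open(encoding="utf-8") as f:
        predictions = json.load(f)
    if not isinstance(predictions, dict):
        raise ValueError("expected a JSON object keyed by arxiv_id")
    return predictions


def most_common_predictions(predictions, n=5):
    """Return the n most frequent predicted values with their counts."""
    # str() keeps this working even if a prediction is not a plain string.
    return Counter(str(value) for value in predictions.values()).most_common(n)
```

For example, most_common_predictions(load_predictions()) gives a quick view of which categories the model predicts most often on the test set.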

Requirements

python==3.9.7
pytorch==1.10.0
transformers==4.11.3
datasets==1.14.0
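
To confirm that an environment matches these pins before training, a check along the following lines may help. It is a sketch rather than part of the repository: it assumes the packages were installed with pip, where PyTorch is distributed under the name torch, and the helper names installed_version and pin_mismatches are hypothetical. The Python version itself is not checked.

```python
from importlib import metadata

# Pins copied from the list above. "torch" is the pip distribution name for
# PyTorch; installing with pip rather than conda is an assumption here.
PINS = {"torch": "1.10.0", "transformers": "4.11.3", "datasets": "1.14.0"}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins=PINS, lookup=installed_version):
    """Return {name: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        # A local build tag such as "1.10.0+cu113" still counts as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

An empty result from pin_mismatches() means all three packages are installed at the pinned versions.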

Pretraining Masked Language Modeling

This part has been removed, because training the generation model alone is already an enormous task. Please focus on training the generation model.

About

Project for XAI606(Korea University)
