We present OpenBART, a natural language processing model based on BART that generates relevant open questions from input paragraphs. It was made for a Language Technology Practical course.
The dependencies for the model can be installed automatically by downloading requirements.txt and running:
pip install -r requirements.txt
The main way to use the model is through generate.py, which generates an open question from a user-supplied paragraph. There are two ways to do this; both require the model folder model-OpenBART to be present in the working directory.
The first way to run generate is through the command line, by executing the following command:
python3 generate.py
The second way to run generate is by importing it, as shown below:
import generate
input_string = "This is an example input paragraph."
generate.generate_question(input_string)
It is possible to run these with the model present in a different folder. Additionally, using the import method, one can specify whether data should be preprocessed:
python3 generate.py path/to/folder_that_contains_model_folder
folder = "path/to/folder_that_contains_model_folder"
preprocess = True
generate.generate_question(input_string, folder, preprocess)
In this section, we describe the methods that produced the model and the evaluation scores, and how to reproduce them. First, to install all relevant packages and dependencies, download requirements.txt and run the following:
pip install -r requirements.txt
Preprocessing involves two files: main.py, which is executed, and prepdata.py, which it imports. main.py takes a single split (train, test, validation1 or validation2) of rexarski/eli5-category on huggingface.co and preprocesses it by running it through an NER tagger and a keyword extractor.
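The actual preprocessing lives in prepdata.py and is not reproduced here. As a rough illustration of the keyword-extraction step, the sketch below uses a simple frequency heuristic; the function name, stopword list, and heuristic are assumptions for illustration, not the real implementation (which uses an NER tagger and a dedicated keyword extractor):

```python
from collections import Counter
import re

# Hypothetical sketch of a keyword-extraction step. The real prepdata.py
# uses an NER tagger and a dedicated Keyword Extractor; this frequency-based
# heuristic only illustrates the idea of pulling salient terms from a paragraph.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "in", "and", "to", "it", "into"}

def extract_keywords(paragraph: str, top_k: int = 5) -> list[str]:
    """Return the top_k most frequent non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

text = ("The solar panel converts sunlight into electricity, "
        "and the panel stores electricity in a battery.")
print(extract_keywords(text, top_k=3))  # → ['panel', 'electricity', 'solar']
```

Keywords extracted this way can then be fed to the question generator alongside the paragraph itself.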
python3 main.py split [path/to/save_folder]
Model training is done using train_model.py. This script is more flexible and offers more room for customisation. It is called as follows:
python3 train_model.py save_folder_name
and takes the following arguments:
-m --model Destination name for checkpoints, results and final model
-t --tokenizer Source name of tokenizer to use
-d --dataset Source name of the dataset to train on
-p --path Path to source & destination folders
-e --epochs Number of epochs to train the model
-l --learningrate Learning rate of the model
-b --batchsize Batch sizes of the model
-c --cpu Use CPU instead of GPU
-q --checkpoint Continue from specified checkpoint
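The flag list above can be wired up with argparse. The sketch below mirrors the listed options; the default values shown are placeholders, not necessarily the ones train_model.py actually instantiates:

```python
import argparse

# Hypothetical sketch of the argument parser behind train_model.py.
# Flag names follow the list above; defaults here are placeholders.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fine-tune OpenBART")
    parser.add_argument("save_folder_name",
                        help="Folder to write checkpoints and the final model")
    parser.add_argument("-m", "--model",
                        help="Destination name for checkpoints, results and final model")
    parser.add_argument("-t", "--tokenizer", help="Source name of the tokenizer to use")
    parser.add_argument("-d", "--dataset", help="Source name of the dataset to train on")
    parser.add_argument("-p", "--path", help="Path to source & destination folders")
    parser.add_argument("-e", "--epochs", type=int, default=1,
                        help="Number of epochs to train the model")
    parser.add_argument("-l", "--learningrate", type=float, default=2e-5,
                        help="Learning rate of the model")
    parser.add_argument("-b", "--batchsize", type=int, default=8,
                        help="Batch size of the model")
    parser.add_argument("-c", "--cpu", action="store_true",
                        help="Use CPU instead of GPU")
    parser.add_argument("-q", "--checkpoint",
                        help="Continue from the specified checkpoint")
    return parser

# Parse the same invocation used to train the final model (see below).
args = build_parser().parse_args(
    ["four-epochs", "--epochs", "4", "--batchsize", "8", "--learningrate", "2e-5"])
print(args.save_folder_name, args.epochs)  # → four-epochs 4
```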
To train the final model, this script was executed with the following parameters:
python3 train_model.py "four-epochs" --epochs 4 --batchsize 8 --learningrate 2e-5
Parameters that were omitted took the default values instantiated by the script. To reproduce these results on a different machine, the --path argument must be adjusted to match that machine's folder layout.
This code outputs a model (with the three most recent checkpoints and the best checkpoint) in --path/save_folder_name
Model evaluation involves two files. The first is generate.py, which we covered in Model Use. The second is evaluate_model.py, a script that takes 100 items from the validation2 split of the dataset and compares the model-generated questions with the dataset questions. The evaluation metrics are BERTScore and BLEURT. Usage is as follows:
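The overall shape of such an evaluation loop can be pictured as below. This is only an illustration: the stand-in similarity function uses the standard library's SequenceMatcher, whereas evaluate_model.py itself uses BERTScore and BLEURT, which require their own packages and models:

```python
from difflib import SequenceMatcher

# Hypothetical sketch of the evaluation loop in evaluate_model.py.
# SequenceMatcher stands in for BERTScore/BLEURT, which need extra packages.
def similarity(generated: str, reference: str) -> float:
    """Crude string-overlap score in [0, 1]."""
    return SequenceMatcher(None, generated, reference).ratio()

def evaluate(pairs: list[tuple[str, str]]) -> float:
    """Average similarity over (generated, reference) question pairs."""
    scores = [similarity(g, r) for g, r in pairs]
    return sum(scores) / len(scores)

# In the real script, the generated questions come from the model and the
# references are the questions stored in the validation2 split.
pairs = [
    ("Why is the sky blue?", "Why does the sky look blue?"),
    ("How do plants make food?", "How do plants produce their food?"),
]
print(round(evaluate(pairs), 2))
```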
python3 evaluate_model.py "four-epochs"