Email Actionable Points Extraction

This repo contains the work done for the module project of NUS CS4248 Natural Language Processing.

Overview

This codebase finetunes a language model to extract a list of actionable points from a given email. Training uses an original email dataset that we built through data inversion.

Models

All finetuned models are available on the HuggingFace Hub at the following links:

Bloom-560m finetuned on InstructGPT3 data:
https://huggingface.co/pinxi/bloom-560m-igpt3

Bloom-560m finetuned on Bloom data:
https://huggingface.co/pinxi/bloom-560m-bloom

Bloom-1b7 finetuned on InstructGPT3 data:
https://huggingface.co/pinxi/bloom-1b7-igpt3

Bloom-1b7 finetuned on Bloom data:
https://huggingface.co/pinxi/bloom-1b7-bloom
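As a minimal sketch, assuming the checkpoints above load with the standard HuggingFace `transformers` Auto classes, one of them could be queried like this. The prompt template (`PROMPT_TEMPLATE`) and the one-point-per-line output format are hypothetical placeholders; check them against the format used in `finetuning/bloom_finetune.ipynb` before relying on the output.

```python
# Sketch: query a finetuned checkpoint from the HuggingFace Hub.
# ASSUMPTION: PROMPT_TEMPLATE and the "one point per line" output format are
# hypothetical; the real training format is defined in the finetuning notebook.

PROMPT_TEMPLATE = "Email:\n{email}\n\nActionable points:\n"  # hypothetical


def build_prompt(email: str) -> str:
    """Wrap a raw email in the (hypothetical) prompt template."""
    return PROMPT_TEMPLATE.format(email=email.strip())


def parse_points(generated: str) -> list[str]:
    """Split generated text into points: one per non-empty line,
    with leading '-' or '*' bullet markers removed."""
    points = []
    for line in generated.splitlines():
        line = line.strip().lstrip("-*").strip()
        if line:
            points.append(line)
    return points


def extract_points(email: str, model_id: str = "pinxi/bloom-560m-igpt3",
                   max_new_tokens: int = 128) -> list[str]:
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(email), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return parse_points(tokenizer.decode(new_tokens, skip_special_tokens=True))


# Example (downloads the model on first use):
#   extract_points("Hi team, please submit your reports by Friday. Thanks!")
```

Swapping `model_id` for any of the four links above selects a different checkpoint; the larger Bloom-1b7 models need correspondingly more memory.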

Codebase

1. Setup

Install the project dependencies.

pip install -r requirements.txt

Add your API keys to settings.py or set them as environment variables.
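A hypothetical sketch of the "settings.py or environment variables" pattern: read each key from the environment first and fall back to a value typed into the file. The names `get_key`, `OPENAI_API_KEY` and `HF_API_TOKEN` are illustrative assumptions; use whatever names the real settings.py defines.

```python
# Hypothetical settings.py pattern: environment variables take priority,
# values typed into this file are the fallback.
# ASSUMPTION: the key names used below are illustrative only.
import os

OPENAI_API_KEY_FALLBACK = ""  # paste a key here if not using env variables
HF_API_TOKEN_FALLBACK = ""


def get_key(env_name: str, fallback: str) -> str:
    """Return the key from the environment, else the fallback.
    Raise if neither is set, so a missing key fails early and clearly."""
    value = os.environ.get(env_name) or fallback
    if not value:
        raise RuntimeError(f"Set {env_name} in the environment or in settings.py")
    return value


# Example:
#   OPENAI_API_KEY = get_key("OPENAI_API_KEY", OPENAI_API_KEY_FALLBACK)
#   HF_API_TOKEN = get_key("HF_API_TOKEN", HF_API_TOKEN_FALLBACK)
```

Keeping keys in environment variables avoids committing them to the repo by accident.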

2. Generate Data

We generate the original email dataset by prompting another pretrained language model (InstructGPT3 or Bloom) to write an email from a set of self-crafted actionable and non-actionable points. Each datapoint is then inverted: the generated email becomes the input and its actionable points become the target, giving an email-to-actionable-points dataset.
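The inversion step can be sketched as below. This is a minimal illustration, not the code in `data_generation/data_generator.py`: the prompt wording, the record field names `email` and `actionable_points`, and the `generate_email` callable (a stand-in for a call to InstructGPT3 or Bloom) are all assumptions.

```python
# Sketch of data inversion (not the repo's implementation).
# ASSUMPTIONS: prompt wording, field names, and generate_email are hypothetical.
from typing import Callable


def build_email_prompt(actionable: list[str], non_actionable: list[str]) -> str:
    """Forward direction: ask a pretrained LM to write an email from handcrafted points."""
    lines = ["Write an email that includes the following points."]
    lines += [f"- {p}" for p in actionable + non_actionable]
    return "\n".join(lines) + "\n\nEmail:\n"


def invert(actionable: list[str], non_actionable: list[str],
           generate_email: Callable[[str], str]) -> dict:
    """Generate an email, then flip the pair: the email becomes the input and
    only the actionable points become the target. The non-actionable points
    act as distractors that the email mentions but the target omits."""
    email = generate_email(build_email_prompt(actionable, non_actionable))
    return {"email": email.strip(), "actionable_points": actionable}
```

Because the points are written first and the email second, every target list is known exactly, with no manual labelling of emails needed.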

The data generation script handles all of our data generation settings:

python data_generation/data_generator.py

Our datasets can be found in the data directory:

# data generated by InstructGPT3
gpt_generated_data.jsonl

# data generated by Bloom
bloom_generated_data.jsonl

# handwritten dataset for evaluation
handwritten_data.jsonl
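The datasets are JSONL files, so, assuming the usual one-JSON-object-per-line convention, they can be inspected with the standard library alone. The field names inside each record are not documented here, so this sketch only loads records as dictionaries; `load_jsonl` is a hypothetical helper.

```python
# Minimal JSONL loader for inspecting the datasets in data/.
# ASSUMPTION: one JSON object per line; record field names are not assumed.
import json
from pathlib import Path


def load_jsonl(path: Path) -> list[dict]:
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                records.append(json.loads(line))
    return records


# Example (run from the repo root):
#   rows = load_jsonl(Path("data") / "handwritten_data.jsonl")
#   print(len(rows), sorted(rows[0]) if rows else [])
```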

3. Finetune Bloom Models

Finetuning was done in a Jupyter notebook:

finetuning/bloom_finetune.ipynb
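One common way to format causal-LM finetuning examples is sketched below; whether the notebook does exactly this is an assumption. Prompt and target token ids are concatenated, and the prompt positions in `labels` are set to -100 so the loss covers only the actionable points. `build_example` is a hypothetical helper.

```python
# Sketch of causal-LM example formatting with prompt masking (assumed, not
# confirmed from the notebook). HuggingFace causal-LM models shift labels
# internally, so labels stay aligned position-for-position with input_ids.
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_example(prompt_ids: list[int], target_ids: list[int],
                  eos_id: int, max_len: int) -> dict:
    input_ids = (prompt_ids + target_ids + [eos_id])[:max_len]
    # Prompt tokens are masked out; target tokens and EOS are trained on.
    labels = ([IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_id])[:max_len]
    return {"input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            "labels": labels}
```

In practice the ids would come from the Bloom tokenizer, e.g. `tokenizer(text, add_special_tokens=False)["input_ids"]` with `eos_id=tokenizer.eos_token_id`. Masking the prompt keeps the model from spending capacity on reproducing the email itself.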

The DeepSpeed config we used for finetuning can be found, and modified, in:

finetuning/ds_config_zero2.json
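For orientation, the sketch below shows the general shape of a ZeRO stage 2 DeepSpeed config as a Python dict. The values are illustrative and are NOT the contents of `finetuning/ds_config_zero2.json`; the `"auto"` entries are resolved from `TrainingArguments` by the HuggingFace Trainer's DeepSpeed integration. `write_config` is a hypothetical helper.

```python
# Illustrative ZeRO stage-2 config (not the repo's file).
import json

ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,                    # shard optimizer states and gradients across GPUs
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # copy gradients into a contiguous buffer
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def write_config(path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_config, f, indent=2)


# Usage with the Trainer (requires transformers and deepspeed):
#   TrainingArguments(output_dir="out", fp16=True,
#                     deepspeed="finetuning/ds_config_zero2.json")
```

Leaving batch sizes as `"auto"` keeps them in one place (the training arguments) instead of duplicating them in the JSON.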

4. Run Evaluation

Evaluation was also done in a Jupyter notebook:

finetuning/bloom_loss.ipynb
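The notebook name suggests a loss-based evaluation, but its exact metric is not documented here. As an assumption, if each evaluation example yields a mean per-token loss and a count of target tokens, a corpus-level loss and perplexity could be computed as follows (`corpus_loss` and `perplexity` are hypothetical helpers).

```python
# Sketch of a loss-based evaluation metric (assumed, not taken from the notebook).
import math


def corpus_loss(example_losses: list[float], token_counts: list[int]) -> float:
    """Token-weighted mean negative log-likelihood, so longer targets
    count in proportion to their length."""
    if len(example_losses) != len(token_counts) or sum(token_counts) <= 0:
        raise ValueError("need one positive token count per loss")
    total_tokens = sum(token_counts)
    return sum(loss * n for loss, n in zip(example_losses, token_counts)) / total_tokens


def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token NLL."""
    return math.exp(mean_nll)
```

With prompt tokens masked in `labels`, the per-example loss returned by a HuggingFace causal-LM forward pass (`model(**batch).loss`) is already averaged over the target tokens only, which matches the inputs this sketch expects.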

Contributors

Tan Pinxi, Tan Xi Zhe, Tan Ming Ann, Lim Yu Yang, Ng Boon Hong
