FizzyAgent/cs4248-nlp-project

Email Actionable Points Extraction

This repository contains the work done for the module project of NUS CS4248 Natural Language Processing.

Overview

This codebase trains a language model to extract a list of actionable points from a given email. Training uses an original email dataset generated through data inversion.

Models

All finetuned models are available on HuggingFace Hub, and can be accessed through the following links:

Bloom-560m finetuned on InstructGPT3 data:
https://huggingface.co/pinxi/bloom-560m-igpt3

Bloom-560m finetuned on Bloom data:
https://huggingface.co/pinxi/bloom-560m-bloom

Bloom-1b7 finetuned on InstructGPT3 data:
https://huggingface.co/pinxi/bloom-1b7-igpt3

Bloom-1b7 finetuned on Bloom data:
https://huggingface.co/pinxi/bloom-1b7-bloom
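
As a minimal sketch, the models listed above could be loaded with the `transformers` library (assumed to be installed; the lines above do not say which loading code the project uses). The model IDs are taken from the links above. The short dictionary keys and the helper names `load_model` and `generate` are invented for this example.

```python
# Hub IDs copied from the links above; the short keys are labels chosen for this sketch.
MODEL_IDS = {
    "560m-igpt3": "pinxi/bloom-560m-igpt3",
    "560m-bloom": "pinxi/bloom-560m-bloom",
    "1b7-igpt3": "pinxi/bloom-1b7-igpt3",
    "1b7-bloom": "pinxi/bloom-1b7-bloom",
}


def load_model(name):
    """Download (or load from cache) the tokenizer and model for a label in MODEL_IDS."""
    # Imported lazily so the mapping above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = MODEL_IDS[name]
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(tokenizer, model, prompt, max_new_tokens=128):
    """Decode a continuation of `prompt` and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

The prompt format the models were finetuned on is not described here, so any prompt passed to `generate` should be checked against the finetuning notebook.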

Codebase

1. Setup

Install the project dependencies.

pip install -r requirements.txt

Add your API keys in settings.py or through environment variables.
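
One way to read a key from the environment is sketched below. The helper `get_api_key` and the variable name `EXAMPLE_API_KEY` are hypothetical. The names that settings.py actually expects are not listed here, so check that file for them.

```python
import os


def get_api_key(env_var, default=None):
    """Return the key stored in `env_var`, falling back to `default`; raise if neither is set."""
    value = os.environ.get(env_var, default)
    if not value:
        raise RuntimeError(f"Missing API key: set the {env_var} environment variable")
    return value


# Hypothetical usage; replace EXAMPLE_API_KEY with the name used in settings.py.
# api_key = get_api_key("EXAMPLE_API_KEY")
```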

2. Generate Data

We generate the original email dataset by prompting another pretrained language model to write an email from self-crafted actionable and non-actionable points. The datapoints are then inverted to create an email-to-actionable-points dataset.
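
The inversion step can be sketched as follows. This is an illustration of the idea, not the code in data_generation/data_generator.py. The function name and the `input` / `target` field names are assumptions made for this example.

```python
def invert_datapoint(actionable, non_actionable, email):
    """Turn a (points -> email) generation record into an (email -> points) training example."""
    # The non-actionable points shaped the email text but are deliberately left out
    # of the target, so the model learns to extract only the actionable ones.
    return {"input": email.strip(), "target": list(actionable)}


example = invert_datapoint(
    actionable=["Submit the report by Friday"],
    non_actionable=["The weather was nice last week"],
    email="Hi team, the weather was nice last week. Please submit the report by Friday.\n",
)
```

The point of the inversion is that the prompt to the generator becomes the label: the generated email is the model input, and the actionable points that produced it are the expected output.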

The data generation script handles all supported ways of generating data:

python data_generation/data_generator.py

Our datasets can be found in the data directory:

# data generated by InstructGPT3
gpt_generated_data.jsonl

# data generated by Bloom
bloom_generated_data.jsonl

# handwritten dataset for evaluation
handwritten_data.jsonl
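
A minimal sketch for reading one of these files, assuming the standard JSON Lines layout of one JSON object per line. The record fields are not documented here, so the helper (`read_jsonl`, a name chosen for this example) makes no assumption about keys.

```python
import json


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line of the file at `path`."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Usage, with the path taken from the listing above:
# records = read_jsonl("data/handwritten_data.jsonl")
```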

3. Finetune Bloom Models

Finetuning was done from a Jupyter notebook:

finetuning/bloom_finetune.ipynb

The DeepSpeed config we used for finetuning can be found and modified in:

finetuning/ds_config_zero2.json
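
For orientation, a ZeRO stage 2 config typically has the shape sketched below. These values are illustrative and are not the contents of finetuning/ds_config_zero2.json. The `"auto"` placeholders assume the Hugging Face Trainer integration, which fills them in from its own training arguments; the lines above do not confirm that the notebook uses it.

```python
import json

# Illustrative ZeRO stage 2 settings; NOT the contents of finetuning/ds_config_zero2.json.
ds_config = {
    "zero_optimization": {
        "stage": 2,  # partition optimizer states and gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "fp16": {"enabled": "auto"},
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

# Serialize to the JSON text that a DeepSpeed config file would contain.
config_text = json.dumps(ds_config, indent=2)
```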

4. Run Evaluation

Evaluation was done from a Jupyter notebook:

finetuning/bloom_loss.ipynb

Contributors

Tan Pinxi, Tan Xi Zhe, Tan Ming Ann, Lim Yu Yang, Ng Boon Hong

About

Email Summarization via Finetuned Bloom LLM
