Fulgid

Procedural Text Understanding through Python Code Representation

Overview

This NLP research project develops procedural text understanding techniques based on Python code representations. Our focus is on enhancing and challenging the reasoning capabilities of Large Language Models (LLMs): multiple-choice questions are transformed into functions, with Python code representing the procedural text, and the code is executed to obtain the answer.
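To make the idea concrete, here is a minimal sketch of what a code representation of a procedure might look like. It is an illustration only: the function names, the toy erosion procedure, and the numeric factors are all hypothetical and are not taken from this repository. The three answer labels follow WIQA's "more", "less", and "no effect" options.

```python
# Hypothetical illustration: none of these names come from the repository.

def erosion_process(rainfall: float) -> dict:
    """Toy code representation of a procedure: rain -> runoff -> erosion."""
    runoff = rainfall * 0.5   # assumed relation: more rain gives more runoff
    erosion = runoff * 2.0    # assumed relation: more runoff gives more erosion
    return {"runoff": runoff, "erosion": erosion}


def answer_what_if(process, baseline: float, perturbed: float, target: str) -> str:
    """Run the process before and after a perturbation and compare the target."""
    before = process(baseline)[target]
    after = process(perturbed)[target]
    if after > before:
        return "more"
    if after < before:
        return "less"
    return "no effect"


# "Suppose more rain falls. How does that affect erosion?"
print(answer_what_if(erosion_process, baseline=1.0, perturbed=2.0, target="erosion"))
```

In the project itself the code is generated by an LLM rather than written by hand, so the structure of real generated functions may differ considerably from this sketch.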

Motivation

Understanding real-world tasks and events is crucial to the efficiency and usefulness of AI assistants. Although recent work on procedural text understanding has shown promise, real-world instructions often involve more complex relations between events and entities, and these remain insufficiently explored. Existing work is limited by weak reasoning about implicit events, heavy dependence on context, difficulty handling long, intricate processes, and a failure to generalize across scenarios because models are domain-specific. By leveraging Python code representations, which show potential for boosting LLMs' reasoning capabilities, we aim to address these shortcomings.

Project Overview

This project applies several methods to text analysis and classification using the BERT model and HanLP. The main goal is to improve classification accuracy. The approaches used and the results obtained are summarized below:

experiences:

bert-without-training.ipynb: This is a basic model where BERT is used without any fine-tuning. The accuracy achieved in this approach was 13%, which is below the baseline accuracy (polarity baseline: 25% and majority baseline: 24%).
bert-finetune.ipynb: This model fine-tunes BERT on the training data. Accuracy improved significantly to 67%, but this still falls short of the project's goal.
paper-method.ipynb: This approach is based on a method discussed in the original paper. The questions are supplied to BERT in the format [CLS] paragraph [SEP] question [SEP] answer-option for each of the three options. The [CLS] token representation is then projected to a single logit per option, and the three logits are passed through a softmax layer and trained with cross-entropy loss, with the highest-scoring option being selected. Fine-tuning on the WIQA (What-If Question Answering) training data resulted in an accuracy of 71%.
bert-plus-hanlp.ipynb: In this approach, constituency parsing from HanLP, a multilingual NLP library, was added to the previous method. This improved accuracy by around 2%.
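The scoring step described for paper-method.ipynb can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the notebook's code: the helper names are hypothetical, the example paragraph and question are made up, and the logits are placeholder numbers standing in for the values a BERT [CLS] projection would produce.

```python
import math


def build_input(paragraph: str, question: str, option: str) -> str:
    """Format one (paragraph, question, option) triple as described above."""
    return f"[CLS] {paragraph} [SEP] {question} [SEP] {option}"


def softmax(logits):
    """Numerically stable softmax over the per-option logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, gold_index: int) -> float:
    """Negative log-probability of the gold option."""
    return -math.log(softmax(logits)[gold_index])


def predict(logits) -> int:
    """Index of the highest-scoring option."""
    return max(range(len(logits)), key=lambda i: logits[i])


options = ["more", "less", "no effect"]
inputs = [
    build_input("Rain falls. Water runs off.", "If more rain falls, what happens to runoff?", o)
    for o in options
]
logits = [2.0, 0.5, -1.0]  # placeholder scores, one per option (assumed values)
probs = softmax(logits)
print(options[predict(logits)], cross_entropy(logits, gold_index=0))
```

Because the softmax is taken across the three options rather than within each input, the model is trained to rank the options against each other for the same paragraph and question.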

src:

chatGPT.py: This script calls the OpenAI API to generate Python code for each procedure. It then updates the code for each question and attempts to derive the answer from the updated code.
evaluate.py: This script is used for evaluating the results of the Python code generation performed by chatGPT.py.
settings.py: This file contains the configuration settings for the project.
pseudocode.py: This script uses pseudocode as an alternative to the Python code generation.
finetuing-GPT3.5.py: This script is used for fine-tuning the GPT-3.5 model. It generates output.jsonl, which includes the prompt texts and true labels.
eval_finetune.py: This script evaluates the fine-tuned GPT-3.5 model based on a test dataset.
output.jsonl: This file is generated by finetuing-GPT3.5.py and contains the prompt texts and true labels used for evaluation.
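As a rough illustration of the evaluation flow around output.jsonl, the sketch below writes prompt/label records in JSON Lines form, reads them back, and computes accuracy against a set of predictions. The field names "prompt" and "label", the helper functions, and the sample records are assumptions made for this example; the actual schema used by finetuing-GPT3.5.py and eval_finetune.py may differ.

```python
import io
import json


def write_jsonl(records, handle) -> None:
    """Write one JSON object per line."""
    for record in records:
        handle.write(json.dumps(record) + "\n")


def read_jsonl(handle):
    """Read JSON Lines, skipping blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


def accuracy(predictions, labels) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)


# Made-up records; "prompt" and "label" are assumed field names.
records = [
    {"prompt": "Suppose more rain falls. Effect on erosion?", "label": "more"},
    {"prompt": "Suppose less rain falls. Effect on runoff?", "label": "less"},
    {"prompt": "Suppose the sky is cloudy. Effect on erosion?", "label": "no effect"},
]

buffer = io.StringIO()  # in-memory stand-in for output.jsonl
write_jsonl(records, buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)

gold = [r["label"] for r in loaded]
predictions = ["more", "less", "less"]  # made-up model outputs
print(f"accuracy: {accuracy(predictions, gold):.2f}")
```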

Dataset

WIQA (Tandon et al., 2019): This dataset is a collection of "What if..." questions, offering a perturbation and possible effect over procedural text. It aids us in creating Python code representations for complex causal reasoning tasks.

Download dataset from here. Move the data into datasets/wiqa-dataset-v2-october-2019

Install

conda create --name myenv python=3.11.3

conda activate myenv

conda install --file requirements.txt

System requirements for training

This project requires a high-performance computing environment. Below are the recommended system specifications:

Operating System: Ubuntu (Most recent stable version)
Processor: Intel® Xeon® Processor (Model specifics depend on the exact requirements of your project)
Memory: 200 GB RAM (For handling large datasets or high-performance computations)
GPU: Nvidia A100 (For accelerated machine learning tasks)
Software: JupyterLab (For running the notebooks and Python scripts)

Author

Ehsan Barkhordar

License

GNU

