This project, part of the Software Engineering for AI course, is an empirical study of the quality of tabular data generated by Large Language Models (LLMs), specifically GPT-4o. We recreated the German Credit dataset using three prompt engineering techniques (0-Shot, 1-Shot, and 2-Shot) and assessed the usability of the generated data across several dimensions:
- Structural Metrics: Evaluating Uniqueness, Readability, Consistency, and Completeness.
- Performance Metrics: Examining the F1-score and accuracy of machine learning models trained on the generated data (a minimal training-and-scoring sketch is given after the findings below).
- Fairness Metrics: Checking whether the generated data perpetuates bias, measured with Equal Opportunity Difference (EOD), Average Odds Difference (AOD), and Statistical Parity Difference (SPD); a computation sketch is also given after the findings below.
Key findings:

- Structural Metrics: 1-Shot prompts produced the highest-quality datasets on the structural metrics, but with a high rate of duplicated records.
- Performance Metrics: Models trained on 1-Shot-generated datasets performed best, reaching an F1-score of up to 0.968 and an accuracy of up to 0.956 on the synthetic data once duplicates were removed.
- Fairness Metrics: The 0-Shot and 2-Shot techniques generally preserved better fairness metrics than the 1-Shot technique, which showed increased bias, especially on demographic attributes such as sex and age.
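As an illustration of the performance evaluation described above, here is a minimal sketch of training a classifier on a generated dataset and scoring it with accuracy and F1. The file name, the `target` column, its 0/1 encoding, and the choice of a random forest are illustrative assumptions, not the exact setup used in the notebooks.

```python
# Minimal sketch: train a classifier on an LLM-generated German Credit CSV and
# score it with accuracy and F1. The file name and the binary 0/1 "target"
# column are assumptions for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

synthetic = pd.read_csv("datasets/german_credit_1shot.csv")  # hypothetical file name
X = pd.get_dummies(synthetic.drop(columns=["target"]))       # one-hot encode categoricals
y = synthetic["target"]                                      # assumed binary 0/1 label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print("F1-score:", f1_score(y_test, pred))
```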
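Similarly, the three fairness metrics can be computed directly from model predictions and a binary protected attribute. The sketch below follows the standard definitions of SPD, EOD, and AOD; the variable names, the toy data, and the choice of privileged group are assumptions for illustration.

```python
# Minimal sketch of the SPD, EOD, and AOD fairness metrics, computed from model
# predictions and a binary protected attribute (e.g. sex).
import numpy as np

def group_rates(y_true, y_pred, mask):
    """True-positive and false-positive rates for the rows selected by mask."""
    yt, yp = y_true[mask], y_pred[mask]
    tpr = np.mean(yp[yt == 1]) if np.any(yt == 1) else np.nan
    fpr = np.mean(yp[yt == 0]) if np.any(yt == 0) else np.nan
    return tpr, fpr

def fairness_metrics(y_true, y_pred, protected, privileged_value):
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    priv = protected == privileged_value
    unpriv = ~priv

    # Statistical Parity Difference: gap in positive-prediction rates.
    spd = y_pred[unpriv].mean() - y_pred[priv].mean()

    tpr_u, fpr_u = group_rates(y_true, y_pred, unpriv)
    tpr_p, fpr_p = group_rates(y_true, y_pred, priv)

    # Equal Opportunity Difference: gap in true-positive rates.
    eod = tpr_u - tpr_p
    # Average Odds Difference: mean of the FPR and TPR gaps.
    aod = 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p))
    return {"SPD": spd, "EOD": eod, "AOD": aod}

# Toy example (1 = favourable outcome; "male" treated as the privileged group here).
print(fairness_metrics(
    y_true=[1, 0, 1, 1, 0, 0],
    y_pred=[1, 0, 1, 1, 1, 0],
    protected=["male", "male", "male", "female", "female", "female"],
    privileged_value="male",
))
```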
The repository is organized as follows:

- `datasets/`: Original and generated datasets.
- `notebooks/`: Jupyter notebooks for data generation, analysis, and visualization.
- `documents/`: Full report and presentation PDF detailing the methodology and results.
The Jupyter notebooks contain detailed instructions that guide you through generating the data, training the models, and evaluating them across the different metrics.
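As a rough illustration of the data-generation step, the sketch below shows how a 1-Shot prompt to GPT-4o might be issued with the OpenAI Python client. The prompt wording, the example record, and the output format are assumptions and do not reproduce the repository's exact prompts.

```python
# Minimal sketch of a 1-Shot prompt for generating German Credit-style rows with
# GPT-4o via the OpenAI Python client. Prompt text and row format are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

example_row = (
    "checking_status=<0, duration=6, credit_history=critical, purpose=radio/tv, "
    "credit_amount=1169, age=67, sex=male, class=good"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You generate realistic tabular records for the German Credit dataset."},
        {"role": "user",
         "content": (
             "Here is one example record:\n"
             f"{example_row}\n\n"
             "Generate 10 new records in the same format, one per line."
         )},
    ],
)

print(response.choices[0].message.content)
```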