QUALITY: Quantifying the Quality, Fairness, and Performance of Large Language Model-Generated Data

Introduction

This project, developed for the Software Engineering for AI course, presents an empirical study on the quality of tabular data generated by Large Language Models (LLMs), specifically GPT-4o. We recreated the German Credit dataset using three prompt engineering techniques (0-Shot, 1-Shot, and 2-Shot) and assessed the usability of the generated data across several dimensions.
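As a rough illustration of the few-shot setup, the sketch below shows how a 1-Shot request could be issued with the official `openai` Python client; the system prompt, example row, and request wording are hypothetical, not the exact prompts used in the study.

```python
# Minimal 1-Shot prompting sketch, assuming the official `openai` client
# (reads OPENAI_API_KEY from the environment). Prompt text is illustrative.
from openai import OpenAI

client = OpenAI()

# A single in-context example is what makes this a 1-Shot prompt;
# omit it for 0-Shot, or provide two examples for 2-Shot.
EXAMPLE_ROW = (
    "checking_status=<0, duration=6, credit_history=critical/other existing credit, "
    "purpose=radio/tv, credit_amount=1169, age=67, class=good"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You generate rows for the German Credit dataset."},
        {
            "role": "user",
            "content": f"Example row:\n{EXAMPLE_ROW}\n\nGenerate 10 new rows in the same format.",
        },
    ],
)
print(response.choices[0].message.content)
```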

Research Questions

  1. Structural Metrics: Evaluating Uniqueness, Readability, Consistency, and Completeness.
  2. Performance Metrics: Examining F1-score and Accuracy of machine learning models trained on the generated data.
  3. Fairness Metrics: Assessing whether the generated data perpetuates bias, measured via Equal Opportunity Difference (EOD), Average Odds Difference (AOD), and Statistical Parity Difference (SPD); a computation sketch follows this list.
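For concreteness, the sketch below computes the three fairness metrics from scratch for a binary classifier and a binary protected attribute, following the usual unprivileged-minus-privileged convention; the function and variable names are illustrative, and a library such as AIF360 could be used instead.

```python
# Fairness metrics sketch, assuming binary labels (1 = favourable outcome)
# and a binary protected attribute (True = privileged group).
import numpy as np

def _group_rates(y_true, y_pred, mask):
    """True-positive rate, false-positive rate, and selection rate within one group."""
    yt, yp = y_true[mask], y_pred[mask]
    tpr = yp[yt == 1].mean()  # P(y_hat=1 | y=1, group)
    fpr = yp[yt == 0].mean()  # P(y_hat=1 | y=0, group)
    sel = yp.mean()           # P(y_hat=1 | group)
    return tpr, fpr, sel

def fairness_metrics(y_true, y_pred, privileged):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    privileged = np.asarray(privileged, dtype=bool)
    tpr_p, fpr_p, sel_p = _group_rates(y_true, y_pred, privileged)
    tpr_u, fpr_u, sel_u = _group_rates(y_true, y_pred, ~privileged)
    return {
        "SPD": sel_u - sel_p,                              # statistical parity difference
        "EOD": tpr_u - tpr_p,                              # equal opportunity difference
        "AOD": 0.5 * ((fpr_u - fpr_p) + (tpr_u - tpr_p)),  # average odds difference
    }
```

Values close to zero indicate similar treatment of the two groups; the sign shows which group is favoured.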

Results

  • Quality Metrics: 1-Shot prompts produced the highest-quality datasets in terms of structural metrics but showed high data duplication.
  • Performance Metrics: Models trained on 1-Shot generated datasets performed best, reaching an F1-score of up to 0.968 and an accuracy of up to 0.956 on synthetic data with duplicates removed (see the evaluation sketch after this list).
  • Fairness Metrics: 0-Shot and 2-Shot techniques generally maintained better fairness metrics than 1-Shot, which showed increased bias, especially for demographic attributes such as sex and age.
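As a rough sketch of the evaluation step behind these numbers, the snippet below deduplicates a generated dataset and measures F1-score and accuracy with scikit-learn; the file path, target encoding, and choice of classifier are assumptions, not the study's exact pipeline.

```python
# Deduplication and evaluation sketch, assuming pandas and scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("datasets/german_credit_1shot.csv")  # hypothetical path
df = df.drop_duplicates()                             # remove exact duplicate rows

X = pd.get_dummies(df.drop(columns=["class"]))        # one-hot encode categoricals
y = (df["class"] == "good").astype(int)               # binary target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
pred = model.predict(X_test)

print("F1:", f1_score(y_test, pred), "Accuracy:", accuracy_score(y_test, pred))
```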

Repository Structure

  • datasets/: Original and generated datasets.
  • notebooks/: Jupyter notebooks for data generation, analysis, and visualization.
  • documents/: Contains the full report and a presentation PDF detailing methodology and results.

Usage

The Jupyter notebooks contain detailed instructions that walk through generating the data, training the models, and evaluating them on the metrics above.

Authors
