Skip to content

Yixiao-Song/GEE-with-LLMs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

code

prompts:

  • de_gpt4_end2end_prompt_utils.py: prompts used for Section 3 in the paper
  • de_prompt_utils.py: prompts for German atomic edit extraction and explanation generation
  • zh_prompt_utils.py: prompts for Chinese atomic edit extraction and explanation generation

fine-tune_llama2-7b:

  • fine-tune_llama2-7b.sh: parameters for fine-tuning the model
  • qlora.py: see source code here

rule_based_screening.py: the heuristic rules for screening out low-level mistakes in atomic edit extraction

SequenceMatcher_rough_edits.py: use SequenceMatcher from difflib to extract rough edits

data

fine-tune_data: the training and test data of LLM fine-tuning for German and Chinese atomic edit extraction. The data is in the format for fine-tuning ChatGPT. Sentence pair is the source and target sentence; list of edits are the rough edits extracted by SequenceMatcher; list of labels are the labels of the edits; content is the gold atomic edits.

human_annotation_data: the anonymized raw human annotation data

Sentence aligner

We modify the paragraph aligner from here to align sentences in the datasets.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published