code

prompts:

de_gpt4_end2end_prompt_utils.py: prompts used for Section 3 in the paper
de_prompt_utils.py: prompts for German atomic edit extraction and explanation generation
zh_prompt_utils.py: prompts for Chinese atomic edit extraction and explanation generation

fine-tune_llama2-7b:

fine-tune_llama2-7b.sh: parameters for fine-tuning the model
qlora.py: see source code here

rule_based_screening.py: the heuristic rules for screening out low-level mistakes in atomic edit extraction

SequenceMatcher_rough_edits.py: use SequenceMatcher from difflib to extract rough edits

data

fine-tune_data: the training and test data of LLM fine-tuning for German and Chinese atomic edit extraction. The data is in the format for fine-tuning ChatGPT. Sentence pair is the source and target sentence; list of edits are the rough edits extracted by SequenceMatcher; list of labels are the labels of the edits; content is the gold atomic edits.

human_annotation_data: the anonymized raw human annotation data

Sentence aligner

We modify the paragraph aligner from here to align sentences in the datasets.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
code		code
data		data
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

code

code

data

data

README.md

README.md

Repository files navigation

code

data

Sentence aligner

About

Releases

Packages

Languages

Yixiao-Song/GEE-with-LLMs

Folders and files

Latest commit

History

Repository files navigation

code

data

Sentence aligner

About

Resources

Stars

Watchers

Forks

Languages