
douglashiwo/BoundaryDetection


BoundaryDetection (AAAI 2024 full paper)

1. This repository contains the implementation and the hybrid essay dataset for our boundary detection model TriBERT (https://arxiv.org/abs/2307.12267). The paper describing it has been accepted as a full paper at AAAI 2024.

2. For details about the boundary detection model TriBERT and how the hybrid essay dataset was constructed, please refer to our paper:

  **Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education**
  BibTeX:
  @article{zeng2023towards,
      title={Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education},
      author={Zeng, Zijie and Sha, Lele and Li, Yuheng and Yang, Kaixun and Ga{\v{s}}evi{\'c}, Dragan and Chen, Guanliang},
      journal={Proceedings of the 38th AAAI Conference on Artificial Intelligence},
      year={2024}
  }

3. The columns of data.xlsx (the hybrid essay dataset) are described below.

----------------------------------**********************-----------------------------------

essay_id: The ID of the original source essay.

essayset: The ID of the prompt for the source essay.

essay: The original source essay on which the hybrid essay is based.

score1: The score given by rater 1 for the original source essay.

score2: The score given by rater 2 for the original source essay.

score: The final score for the original source essay.

ratio: A randomly generated number; please ignore this column.

train_ix: Indicates the split (Train/Valid/Test) to which this row belongs.

sent_and_label: The list of <sentence, label> pairs, i.e., each sentence of the hybrid essay together with its label. The label indicates the authorship of the sentence: 'human' means human-written and 'machine' means ChatGPT-generated.

hybrid_text: The hybrid essay written collaboratively by students and ChatGPT.

boundary_ix: The list of all boundary positions in the hybrid essay above.

boundary_num: The number of boundaries in this hybrid essay.

author_seq: The authorship structure of the hybrid essay. For example, 'H_M' means that the hybrid essay begins with human-written sentences and ends with machine-generated (ChatGPT) sentences, while 'M_H_M' means that the beginning and ending text are machine-generated and the middle part is human-written.

human_part: Concatenation of all human-written sentences (extracted from the hybrid text).

machine_part: Concatenation of all ChatGPT-generated sentences (extracted from the hybrid text).

----------------------------------**********************-----------------------------------
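The derived columns (boundary_ix, boundary_num, author_seq, human_part, machine_part) all follow from sent_and_label. The Python sketch below shows one way to recompute them, for example to sanity-check rows. It is not the authors' code and rests on assumptions not stated above: that a sent_and_label cell holds a Python-literal list of (sentence, label) pairs, that a boundary index is the position of the first sentence after an authorship change, and that sentences are joined with a single space. The helper names parse_sent_and_label and derive_fields are hypothetical.

```python
import ast
from itertools import groupby

# Labels as described above: 'human' = human-written, 'machine' = ChatGPT-generated.
ABBREV = {"human": "H", "machine": "M"}


def parse_sent_and_label(cell):
    """Turn a sent_and_label cell into a list of (sentence, label) pairs.

    Assumption: the cell is either already a list, or a string holding a
    Python-literal list such as "[('Sentence one.', 'human'), ...]".
    """
    pairs = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [(str(sent), str(label)) for sent, label in pairs]


def derive_fields(pairs):
    """Recompute the derived columns from (sentence, label) pairs.

    Assumption (hypothetical convention): a boundary index is the index of the
    first sentence whose label differs from the previous sentence's label.
    """
    labels = [label for _, label in pairs]
    boundary_ix = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    # Collapse runs of identical labels, e.g. machine, machine, human, machine -> M_H_M.
    author_seq = "_".join(ABBREV[label] for label, _ in groupby(labels))
    # Assumption: sentences are concatenated with a single space.
    human_part = " ".join(s for s, label in pairs if label == "human")
    machine_part = " ".join(s for s, label in pairs if label == "machine")
    return {
        "boundary_ix": boundary_ix,
        "boundary_num": len(boundary_ix),
        "author_seq": author_seq,
        "human_part": human_part,
        "machine_part": machine_part,
    }


if __name__ == "__main__":
    cell = ("[('AI helps.', 'machine'), ('It is fast.', 'machine'), "
            "('I disagree.', 'human'), ('In sum, both matter.', 'machine')]")
    fields = derive_fields(parse_sent_and_label(cell))
    print(fields["boundary_ix"], fields["author_seq"])  # prints: [2, 3] M_H_M
```

To apply it to the whole spreadsheet, one option is pandas: `pd.read_excel("data.xlsx")` (which needs openpyxl for .xlsx files), then `df["sent_and_label"].map(parse_sent_and_label)`. Comparing the recomputed values with the stored boundary_ix and human_part columns is a quick way to check whether the assumed conventions match the dataset.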
