Augmentation_with_GPT

What I did: augmenting Concept ARC problems

How to augment

  1. Guess the input data from characteristics of the ARC problem

Many-to-one

  2. Augmentation Process

(1) Category Prompt A prompt describing the inverse transformation method is written for each task (Concept ARC has 16 tasks).
  • To use this method with a different prompt, change the prompt in GPT_prompt.py.

(2) Prompt Concretization The prompts in (1) are prompts specific to each category rather than to individual problems. However, since each category contains problems with diverse logical relationships, we aimed to increase accuracy by adding prompts that describe each problem in detail.

(Screenshot, 2024-05-01 9:12:27 PM)
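Prompt concretization can be pictured as appending a problem-specific description to the category-level prompt. The following is a minimal sketch under that assumption; `build_prompt` and both example prompt texts are hypothetical and are not taken from GPT_prompt.py.

```python
def build_prompt(category_prompt: str, problem_description: str = "") -> str:
    """Combine a category-level prompt with an optional problem-specific description."""
    parts = [category_prompt.strip()]
    if problem_description.strip():
        # Concretization: add details that apply to this one problem only.
        parts.append("Problem-specific details: " + problem_description.strip())
    return "\n\n".join(parts)


# Hypothetical prompt texts, written for illustration only.
category_prompt = "Given the output grid, propose input grids for an 'Above Below' task."
problem_description = "Objects above the horizontal line are recolored; objects below stay unchanged."

print(build_prompt(category_prompt, problem_description))
```

Calling `build_prompt(category_prompt)` without a description returns the category prompt alone, which would correspond to generation without prompt concretization.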

(3) Data Generation (Screenshot, 2024-05-01 9:12:51 PM)

(4) Data Filtering Despite the use of negative prompts, much of the data generated by the large language model did not meet the criteria. An additional filtering step was therefore necessary, and in this study we filtered the data using the large language model. (Screenshot, 2024-05-01 9:20:04 PM)
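The filtering step can be sketched as a loop that keeps only the candidates a judge accepts. In this study the judge is the large language model; the sketch below replaces it with a simple shape check so that it runs offline. `filter_candidates` and `same_shape_judge` are hypothetical names, and the shape check is a stand-in, not the criterion used by the repository.

```python
from typing import Callable, Iterable, List

Grid = List[List[int]]


def filter_candidates(
    candidates: Iterable[Grid],
    output: Grid,
    judge: Callable[[Grid, Grid], bool],
) -> List[Grid]:
    """Keep only the generated inputs that the judge accepts for the given output."""
    return [candidate for candidate in candidates if judge(candidate, output)]


def same_shape_judge(candidate: Grid, output: Grid) -> bool:
    """Stand-in judge: accept a candidate only if it has the same shape as the output.

    In the real pipeline this decision would come from a large language model.
    """
    return len(candidate) == len(output) and all(
        len(row) == len(out_row) for row, out_row in zip(candidate, output)
    )


generated = [[[1, 2], [3, 4]], [[1, 2, 3]]]
target_output = [[0, 0], [0, 0]]
kept = filter_candidates(generated, target_output, same_shape_judge)
print(len(kept), "of", len(generated), "candidates kept")
```

Passing the judge as a function keeps the loop independent of any particular model or API.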

  3. Code files (GPT_DATA) to augment Concept ARC

i. Generate ARC files with Research_2024_GEN

The files in Research_2024_GEN generate ARC problems (steps 2.(1) to 2.(3)).

 1) ARC_Reverse.py creates the inverse ARC problems and stores them in Reverse_Concept_Data.
 
 2) Generate_with_Prompt/Generate.py generates the ARC problems.
 
 3) Remove_Redundancy.py removes duplicate data.
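The first and third steps above can be sketched as follows. This assumes that tasks use the standard ARC JSON layout (a dict of splits such as `train` and `test`, each a list of `input`/`output` pairs), that "inverse" means swapping input and output, and that duplicates are grids with identical contents. `reverse_task` and `remove_duplicates` are hypothetical helpers; the actual ARC_Reverse.py and Remove_Redundancy.py may work differently.

```python
import json


def reverse_task(task: dict) -> dict:
    """Swap input and output in every pair of an ARC-style task."""
    return {
        split: [{"input": pair["output"], "output": pair["input"]} for pair in pairs]
        for split, pairs in task.items()
    }


def remove_duplicates(grids: list) -> list:
    """Drop grids whose contents were already seen, keeping first occurrences in order."""
    seen = set()
    unique = []
    for grid in grids:
        key = json.dumps(grid)  # nested lists are not hashable, so use a string key
        if key not in seen:
            seen.add(key)
            unique.append(grid)
    return unique


task = {"train": [{"input": [[1, 0]], "output": [[0, 1]]}], "test": []}
reversed_task = reverse_task(task)
unique = remove_duplicates([[[1, 0]], [[0, 1]], [[1, 0]]])
print(reversed_task["train"][0])
print(len(unique), "unique grids")
```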

ii. Filter out inadequate ARC files with Research_2024_FILTER

The files in Research_2024_FILTER filter out unsuitable ARC problems (step 2.(4)).

 1) Prompt.py creates the prompt file used for filtering.

 2) The files in the Decision folder contain the filtering code.

   - n3 means generated with gpt 3.5 without prompt concretization

   - s3 means generated with gpt 3.5 with prompt concretization
   
   - n4 means generated with gpt 4.0 without prompt concretization
   
   - s4 means generated with gpt 4.0 with prompt concretization
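The four tags above follow a two-character pattern: the letter marks whether prompt concretization was used, and the digit marks the model. A small sketch of decoding them, where `describe_tag` is a hypothetical helper and not part of the repository:

```python
def describe_tag(tag: str) -> dict:
    """Decode a two-character tag such as 'n3' or 's4' into its settings."""
    concretized = {"n": False, "s": True}[tag[0]]
    model = {"3": "gpt 3.5", "4": "gpt 4.0"}[tag[1]]
    return {"model": model, "prompt_concretization": concretized}


for tag in ("n3", "s3", "n4", "s4"):
    print(tag, describe_tag(tag))
```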

Result Table

| Problem Category | Total available | Generated data | Valid augmented data | Ratio (valid/generated) |
| --- | --- | --- | --- | --- |
| Above Below | 58 | 158 | 34 | 21.52% |
| Center | 65 | 236 | 35 | 14.83% |
| Clean Up | 106 | 183 | 83 | 45.36% |
| Complete Shape | 58 | 147 | 37 | 25.17% |
| Copy | 27 | 153 | 4 | 2.61% |
| Count | 56 | 202 | 29 | 14.36% |
| Extend To Boundary | 37 | 167 | 8 | 4.79% |
| Extract Objects | 44 | 176 | 21 | 11.93% |
| Filled Not Filled | 58 | 203 | 29 | 14.29% |
| Horizontal Vertical | 32 | 114 | 7 | 6.14% |
| Inside Outside | 52 | 191 | 24 | 12.57% |
| Move To Boundary | 36 | 165 | 12 | 7.27% |
| Order | 47 | 162 | 26 | 16.05% |
| Same Different | 107 | 246 | 76 | 30.89% |
| Top Bottom 2D | 92 | 255 | 59 | 23.14% |
| Top Bottom 3D | 55 | 215 | 25 | 11.63% |
| Total | 930 | 2913 | 509 | 17.12% |
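The per-category ratio is the number of valid augmented items divided by the number of generated items. A short sketch that recomputes it for three rows of the table; `valid_ratio` is a hypothetical helper name.

```python
def valid_ratio(generated: int, valid: int) -> float:
    """Return valid/generated as a percentage rounded to two decimals."""
    return round(100 * valid / generated, 2)


# (generated, valid) counts copied from three rows of the result table
rows = {
    "Above Below": (158, 34),
    "Clean Up": (183, 83),
    "Copy": (153, 4),
}

for name, (generated, valid) in rows.items():
    print(f"{name}: {valid_ratio(generated, valid):.2f}%")
```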
  • HF_Filter.py within Research_2024_GEN

This is the Python file I used to filter out inadequate data.