Augmentation_with_GPT

What I did: Augmenting Concept ARC problems

How to augment

New input data is inferred from the characteristics of each ARC problem.

Augmentation Process

(1) Category Prompt

A prompt describing the inverse-transformation method is written for each task category (Concept ARC has 16 task categories).

To use this method with a different prompt, change the prompt in GPT_prompt.py.

(2) Prompt Concretization

The prompts in (1) are specific to each category rather than to individual problems. However, since each category contains problems with diverse logical relationships, we aimed to increase accuracy by adding prompts that describe each problem in detail.
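The combination of a category prompt with a problem-specific description can be sketched as follows. This is a minimal illustration, not the code in GPT_prompt.py: the names `CATEGORY_PROMPTS` and `build_prompt`, the category key, and the prompt wording are all hypothetical.

```python
# Hypothetical prompt text; the real prompts live in GPT_prompt.py.
CATEGORY_PROMPTS = {
    "AboveBelow": (
        "Given the output grid, infer an input grid that is consistent "
        "with the above/below relation of this category."
    ),
}


def build_prompt(category, problem_description=None):
    """Return the category prompt, optionally followed by a
    problem-specific description (prompt concretization)."""
    parts = [CATEGORY_PROMPTS[category]]
    if problem_description:
        parts.append(problem_description.strip())
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Without concretization: only the category prompt is used.
    print(build_prompt("AboveBelow"))
    # With concretization: a per-problem description is appended.
    print(build_prompt("AboveBelow", "Objects above the red line are removed."))
```

Keeping the two parts separate means the same category prompt can be reused across all problems of that category, while the description changes per problem.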

(3) Data Generation

(4) Data Filtering

Despite the use of negative prompts, much of the data generated by the large language model did not meet the criteria. An additional filtering step was therefore necessary, and in this study the filtering was also done with the large language model.
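A filtering step of this kind can be sketched as a function that keeps only the samples a judge accepts. This is an illustrative sketch under assumptions: `filter_generated` and `structural_judge` are hypothetical names, and the judge shown here is a simple structural check standing in for the large-language-model call, which is not reproduced.

```python
from typing import Callable, Dict, Iterable, List


def is_well_formed_grid(grid) -> bool:
    """A grid is a non-empty list of equal-length, non-empty rows of ints 0-9."""
    if not isinstance(grid, list) or not grid:
        return False
    if not all(isinstance(row, list) and row for row in grid):
        return False
    width = len(grid[0])
    return all(
        len(row) == width
        and all(isinstance(c, int) and 0 <= c <= 9 for c in row)
        for row in grid
    )


def structural_judge(sample: Dict) -> bool:
    """Stand-in judge (assumption): accepts samples whose grids are well formed.
    In the study, the decision is made by a large language model instead."""
    return is_well_formed_grid(sample.get("input")) and is_well_formed_grid(
        sample.get("output")
    )


def filter_generated(
    samples: Iterable[Dict], judge: Callable[[Dict], bool]
) -> List[Dict]:
    """Keep only the samples that the judge accepts."""
    return [s for s in samples if judge(s)]
```

Passing the judge as a parameter makes it straightforward to swap the structural check for a model-based decision without changing the filtering loop.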

Code files (GPT_DATA) to augment Concept ARC

i. Generate ARC files with Research_2024_GEN

The files in Research_2024_GEN generate ARC problems (steps (1) to (3) of the augmentation process).

 1) ARC_Reverse.py creates the inverse ARC problems, which are stored in Reverse_Concept_Data.
 
 2) Generate_with_Prompt/Generate.py generates new ARC problems.
 
 3) Remove_Redundancy.py removes duplicate data.
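The reversal and de-duplication steps can be sketched as follows. This is a hedged illustration rather than the contents of ARC_Reverse.py or Remove_Redundancy.py: it assumes the standard ARC JSON layout (`train` and `test` lists of `input`/`output` pairs), assumes that "inverse" means swapping input and output, and uses the hypothetical function names `reverse_pairs` and `remove_duplicates`.

```python
import json


def reverse_pairs(task):
    """Swap input and output in every pair of every split (assumed meaning
    of an 'inverse' ARC problem)."""
    return {
        split: [{"input": p["output"], "output": p["input"]} for p in pairs]
        for split, pairs in task.items()
    }


def remove_duplicates(tasks):
    """Drop tasks whose content is identical, keeping the first occurrence."""
    seen = set()
    unique = []
    for task in tasks:
        # A canonical JSON string serves as a hashable fingerprint of the task.
        key = json.dumps(task, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(task)
    return unique
```

Exact-match comparison only catches verbatim repeats; grids that are equivalent up to rotation or recoloring would need a stricter normalization.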

ii. Filter out inadequate ARC files with Research_2024_FILTER

The files in Research_2024_FILTER filter out unsuitable ARC problems (step (4) of the augmentation process).

 1) Prompt.py creates the prompt file used for filtering.

 2) The files in the Decision folder contain the filtering code.

   - n3: generated with GPT-3.5 without prompt concretization

   - s3: generated with GPT-3.5 with prompt concretization

   - n4: generated with GPT-4 without prompt concretization

   - s4: generated with GPT-4 with prompt concretization
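The four labels encode two independent settings, the model version and whether prompt concretization was used. A small helper can make this explicit; `decode_label` is a hypothetical name used for illustration and is not part of the repository.

```python
def decode_label(label):
    """Decode a two-character label such as 'n3' or 's4'.

    First character: 'n' = no prompt concretization, 's' = with it.
    Second character: '3' = GPT-3.5, '4' = GPT-4.
    """
    concretization = {"n": False, "s": True}
    models = {"3": "GPT-3.5", "4": "GPT-4"}
    if len(label) != 2 or label[0] not in concretization or label[1] not in models:
        raise ValueError(f"unknown label: {label!r}")
    return {
        "model": models[label[1]],
        "prompt_concretization": concretization[label[0]],
    }
```

For example, `decode_label("s4")` describes data generated with GPT-4 and prompt concretization.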

Result Table

Problem Category	Total available	Number of generated data	Number of valid augmented data	Ratio (valid/generated)
Above Below	58	158	34	21.52%
Center	65	236	35	14.83%
Clean Up	106	183	83	45.36%
Complete Shape	58	147	37	25.17%
Copy	27	153	4	2.61%
Count	56	202	29	14.36%
Extend To Boundary	37	167	8	4.79%
Extract Objects	44	176	21	11.93%
Filled Not Filled	58	203	29	14.29%
Horizontal Vertical	32	114	7	6.14%
Inside Outside	52	191	24	12.57%
Move To Boundary	36	165	12	7.27%
Order	47	162	26	16.05%
Same Different	107	246	76	30.89%
Top Bottom 2D	92	255	59	23.14%
Top Bottom 3D	55	215	25	11.63%
Total	930	2973	509	17.12%
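The ratio column is the number of valid augmented samples divided by the number of generated samples, expressed as a percentage. The sketch below recomputes it for a few rows of the table; `valid_ratio` is a hypothetical helper name.

```python
def valid_ratio(valid, generated):
    """Return valid/generated as a percentage rounded to two decimals."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return round(100 * valid / generated, 2)


# (valid, generated) pairs copied from three rows of the table.
rows = {
    "Above Below": (34, 158),
    "Clean Up": (83, 183),
    "Copy": (4, 153),
}

for category, (valid, generated) in rows.items():
    print(f"{category}: {valid_ratio(valid, generated):.2f}%")
```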

HF_Filter.py in Research_2024_GEN

This is the Python file I used to filter out inadequate data.
