Questions about data distribution and task distribution

Thanks for your wonderful work and for releasing the UnicEdit dataset. I have downloaded part of the released 2M data, but I am a little confused about the data and task distribution and would appreciate some help.
Since I want to sample a subset of UnicEdit, I downloaded the parquet files 00001-00060 (\~900k). However, I found that the distribution across tasks is highly imbalanced, e.g., **Subject Addition 209963, Subject Removal 168, Counting Change 65, Color Alteration 231632**.
So my question is: could you provide a detailed task distribution of the data for reference? Have the authors trained any models on UnicBench-10M, and would such a distribution harm the performance of existing models during fine-tuning? Looking forward to your reply.
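
To make the question concrete, here is a minimal sketch of the kind of per-task tally and capped subsampling I have in mind. It is only an illustration under assumptions: the record layout, the `task` key, and the helper names `task_distribution` and `capped_sample` are hypothetical and not taken from the UnicEdit release; the toy records stand in for rows loaded from the parquet files.

```python
import random
from collections import Counter, defaultdict


def task_distribution(task_labels):
    """Return {task: (count, share)} ordered from most to least frequent."""
    counts = Counter(task_labels)
    total = sum(counts.values())
    return {task: (n, n / total) for task, n in counts.most_common()}


def capped_sample(records, cap, task_key="task", seed=0):
    """Keep at most `cap` records per task; rare tasks are kept in full.

    `task_key` is an assumed field name, not the dataset's documented schema.
    """
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for rec in records:
        by_task[rec[task_key]].append(rec)

    subset = []
    for recs in by_task.values():
        if len(recs) > cap:
            recs = rng.sample(recs, cap)  # downsample only the dominant tasks
        subset.extend(recs)
    return subset


# Toy records standing in for rows read from the parquet shards.
records = (
    [{"task": "Subject Addition", "id": i} for i in range(50)]
    + [{"task": "Counting Change", "id": i} for i in range(3)]
)

dist = task_distribution(rec["task"] for rec in records)
subset = capped_sample(records, cap=10)
subset_counts = Counter(rec["task"] for rec in subset)
```

Capping only reduces the dominant tasks; it cannot add examples for tasks such as Counting Change, which is why the full task distribution would be helpful.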

Questions about data distribution and task distribution #5