Thanks for your wonderful work and for releasing the UnicEdit dataset. I have downloaded part of the released 2M data, but I am a little confused about the data and task distribution and would appreciate some help.
Since I want to sample a subset of UnicEdit, I downloaded the parquet files 00001-00060 (~900k). However, I found that the data distribution across tasks is highly imbalanced, e.g., Subject Addition: 209,963; Subject Removal: 168; Counting Change: 65; Color Alteration: 231,632.
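To make the question concrete, here is a minimal sketch of how per-task counts like the ones above could be tallied, and how a subset could be capped per task. It assumes each sample is a dict with a task label under a key named `task`; that key, the `cap` value, and both function names are hypothetical placeholders, not the dataset's actual schema or any official tooling.

```python
import random
from collections import Counter, defaultdict


def task_distribution(task_labels):
    """Return {task: (count, share)} ordered from most to least frequent."""
    counts = Counter(task_labels)
    total = sum(counts.values())
    return {task: (n, n / total) for task, n in counts.most_common()}


def balanced_subset(rows, task_key="task", cap=1000, seed=0):
    """Keep at most `cap` randomly chosen rows per task.

    `task_key` is an assumed field name; adjust it to the real column.
    Tasks with fewer than `cap` rows are kept in full.
    """
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for row in rows:
        by_task[row[task_key]].append(row)

    subset = []
    for group in by_task.values():
        if len(group) <= cap:
            subset.extend(group)
        else:
            subset.extend(rng.sample(group, cap))
    return subset
```

Capping only downsamples the large tasks; it cannot make up for tasks that have very few samples in the downloaded shards, which is why the full task distribution would still be helpful.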
So my question is: could you provide a detailed task distribution of the data for reference? Have the authors trained any models on UnicBench-10M, and will such an imbalanced distribution harm the performance of existing models during fine-tuning? Looking forward to your reply~