Discussion on preprocessing of LAION data #32
@youngwanLEE, please find our response below:
We haven’t encountered such an error.
We removed some problematic image-text pairs (empty text files and PIL-unreadable images); however, those pairs produced error messages different from yours. We also tried to reproduce your error (by changing batch sizes under a multi-GPU setting, adding empty lines to metadata.csv, and using very long or multi-line text prompts), but could not: either no error or a different error occurred. Could you provide more context about this error? We would greatly appreciate it if you could share any update or solution to this issue.
@bokyeong1015 thanks for your effort :) I finally solved this problem. It was caused by empty text files in the dataset; once I filtered out the pairs with empty text, the error went away. I have now started training models on larger datasets of over 10M image-text pairs. Thanks again :) Feel free to close this issue.
Great, thanks for sharing! Hope your training goes well :)
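For anyone hitting the same problem, the fix described in this thread (dropping image-text pairs whose text file is empty) could be sketched roughly as below. This is only an illustration, not the repository's preprocessing code: the function name `find_valid_pairs`, the directory layout (each image sitting next to a `.txt` caption file with the same stem), and the list of image extensions are all assumptions.

```python
from pathlib import Path


def find_valid_pairs(data_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return (image_path, caption) tuples for pairs with a non-empty caption.

    Hypothetical layout: every image `name.jpg` has its caption in `name.txt`
    in the same directory. Adjust to match the actual dataset structure.
    """
    valid = []
    for img_path in sorted(Path(data_dir).iterdir()):
        if img_path.suffix.lower() not in image_exts:
            continue  # not an image file
        txt_path = img_path.with_suffix(".txt")
        if not txt_path.is_file():
            continue  # image without a caption file
        caption = txt_path.read_text(encoding="utf-8", errors="replace").strip()
        if not caption:
            continue  # empty or whitespace-only caption: skip the pair
        valid.append((img_path, caption))
    return valid
```

The returned list can then be used to rewrite the metadata file before training. Unreadable images, the other problem mentioned above, are not covered here; a similar loop could try opening each file with Pillow and skip the ones that raise an exception.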