root
├── images_train
│ ├── 0000 # First four letters of the image name
│ │ ├── 0000000 # Image Binary
│ │ ├── 0000001
│ │ └── ...
│ ├── 0001
│ │ ├── 0001000
│ │ ├── 0001001
│ │ └── ...
Hello, please forgive my stupid question. I don't understand what "0000 # First four letters of image name" and "0000000 # Image Binary" mean in your DATA.md. Could you explain what "Image Binary" and "First four letters of image name" refer to? Thanks
GCC (CC3M) provides the dataset as image URLs with their associated captions.
Because the original filenames are unordered and come in various formats, I renamed the files to an ordered sequence without the extension (such as .jpg, .png, ...) during the download.
So these renamed image files (the "binaries") have names such as 0000000, 0000001, ..., 2983222, etc.
Putting all of the files in a single directory slows down disk-related operations.
So I partitioned them into directories named after the first four characters of the image name, so that each directory holds at most 1000 files.
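The naming scheme described above could be sketched as follows. This is only an illustration, not the actual download script: the function name `image_path`, the `split` parameter, and the 7-digit zero padding are assumptions inferred from the example names (0000000, 0001001, 2983222).

```python
from pathlib import Path


def image_path(root, index, split="images_train"):
    """Return the path where the image with sequence number `index` would live.

    Hypothetical helper: assumes names are the index zero-padded to 7 digits
    and that the directory is the first four characters of that name.
    """
    name = f"{index:07d}"   # e.g. 1001 -> "0001001"
    subdir = name[:4]       # e.g. "0001"; groups up to 1000 consecutive indices
    return Path(root) / split / subdir / name


if __name__ == "__main__":
    print(image_path("root", 0).as_posix())
    print(image_path("root", 1001).as_posix())
```

Because the last three digits range over 000 to 999, each four-character prefix covers at most 1000 files, which matches the directory size limit mentioned above.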