Custom dataset training failed due to IndexError: list index out of range #91
Comments
I tried with your dataset and it did not raise an index out of range error. However, when I tried a 1000-row sample of my own dataset, it again raised the index out of range error. To create the
Could certain characters in the caption column be causing the problem? Did your dataset have any issues like this? If so, could you please send me the data cleaning code?
@AI-EnabledSoftwareEngineering-AISE Hi, I would recommend processing the data as follows:
Thank you, I solved it by removing all special characters in captions:
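A minimal sketch of that kind of caption cleaning, assuming "special characters" means anything other than letters, digits, whitespace, and basic punctuation. The function name clean_caption and the exact character set are assumptions for illustration, not the code used in this thread:

```python
import re

def clean_caption(caption: str) -> str:
    """Strip characters that could break a tab-separated row (hypothetical helper)."""
    # Drop everything except word characters, whitespace and basic punctuation.
    caption = re.sub(r"[^\w\s.,'-]", "", caption)
    # Collapse tabs, newlines and repeated spaces into single spaces so the
    # caption can never introduce an extra column or an extra line.
    caption = re.sub(r"\s+", " ", caption)
    return caption.strip()
```

Collapsing whitespace is the part that matters most for a TSV file: a tab inside a caption adds a column, and a newline splits the row in two.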
Hi, I encountered the same problem.
you
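You can change line 53 of data/file_dataset.py from 'fp = open(self.file_path, "r")' to 'fp = open(self.file_path, "rb")', and line 62 from 'offset += len(line.encode('utf-8'))' to 'offset += len(line)'. In binary mode each line is a bytes object, so len(line) is its exact size on disk and the recorded offsets match what seek() expects.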
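The idea behind this change can be sketched in isolation. The functions below are hypothetical and are not the code in data/file_dataset.py; they only illustrate building a line-offset index in binary mode, assuming a UTF-8 encoded file:

```python
def build_line_offsets(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    offset = 0
    with open(path, "rb") as fp:  # binary mode: no newline translation
        for line in fp:
            offsets.append(offset)
            # len() of a bytes object is the exact number of bytes on disk,
            # including "\r\n" endings and multi-byte UTF-8 characters.
            offset += len(line)
    return offsets

def read_line_at(path, offset):
    """Seek to a recorded byte offset and return the decoded line there."""
    with open(path, "rb") as fp:
        fp.seek(offset)
        return fp.readline().decode("utf-8").rstrip("\r\n")
```

In text mode, Python translates "\r\n" to "\n" while reading, so re-encoding the line and measuring it can undercount the bytes actually consumed, and later seeks land in the middle of a row.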
@shengjie1980
I organized my dataset as you described in a tsv file. I used this code to convert images to base64:
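A minimal sketch of such a conversion, using only the standard library. The function name image_file_to_base64 is an assumption, and this is not necessarily the code used here; it encodes the file's raw bytes without re-encoding the image:

```python
import base64

def image_file_to_base64(image_path):
    """Read an image file and return its contents as a base64 string (hypothetical helper)."""
    with open(image_path, "rb") as f:
        raw = f.read()
    # b64encode returns bytes with no tabs or newlines, so the decoded
    # string is safe to store in a single TSV column.
    return base64.b64encode(raw).decode("utf-8")
```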
Then I organized the data in a TSV file with these columns: uniq-id, image-id, caption, predicted object labels (empty string), image base64 string. The size of my data frame is 5 x 47899. But while loading the data, it logs: caption_stage1_train.tsv slice_id 1 row count 24100 total row count 48200 slice_id 1 seek offset 24100.
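The log counts 48200 rows while the data frame has 47899, so one possible explanation is that some rows span more than one physical line, for example because a caption contains a newline. A quick check, sketched under the assumption that every row should have exactly the five tab-separated columns listed above (find_malformed_rows is a hypothetical helper):

```python
def find_malformed_rows(path, expected_fields=5):
    """Return (line_number, field_count) for each line with the wrong number of columns."""
    bad = []
    with open(path, "rb") as fp:
        for line_number, raw in enumerate(fp, start=1):
            # Strip the line ending, then count tab-separated fields.
            fields = raw.rstrip(b"\r\n").split(b"\t")
            if len(fields) != expected_fields:
                bad.append((line_number, len(fields)))
    return bad
```

An empty result means every physical line has the expected number of columns; any entries point at the lines worth inspecting for stray tabs or newlines.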