
Custom dataset training failed due to IndexError: list index out of range #91

Closed
AI-EnabledSoftwareEngineering-AISE opened this issue May 2, 2022 · 7 comments

AI-EnabledSoftwareEngineering-AISE commented May 2, 2022

I organized my dataset in a TSV file as you described. I used this code to base64-encode the images:

from PIL import Image
from io import BytesIO
import base64

img = Image.open(fn)                       # fn: path to the image file
img_buffer = BytesIO()
img.save(img_buffer, format=img.format)    # re-encode in the image's original format
byte_data = img_buffer.getvalue()
base64_str = base64.b64encode(byte_data)   # note: b64encode returns bytes, not str
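
For anyone reproducing this: since b64encode returns bytes, the value needs a decode before it goes into a text TSV. A minimal sketch of assembling one row (uniq_id, image_id, and caption are assumed variables; the label column is a single space):

base64_str = base64.b64encode(byte_data).decode('utf-8')  # bytes -> str for the TSV
row = '\t'.join([str(uniq_id), str(image_id), caption, ' ', base64_str])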

Then I organized the data in a TSV file with these columns: uniq-id, image-id, caption, predicted object labels (empty string), image base64 string. The size of my data frame is 5 × 47899. But while the data is loading it reports: caption_stage1_train.tsv slice_id 1 row count 24100 total row count 48200 slice_id 1 seek offset 24100.

2022-05-02 00:17:19 - trainer.py[line:124] - INFO: detected shared parameter: encoder.embed_images.conv1.bias <- decoder.output_projection.bias
2022-05-02 00:17:19 - utils.py[line:759] - INFO: ***********************CUDA enviroments for all 2 workers***********************
2022-05-02 00:17:19 - utils.py[line:765] - INFO: rank   0: capabilities =  6.0  ; total memory = 15.899 GB ; name = Tesla P100-PCIE-16GB                    
2022-05-02 00:17:19 - utils.py[line:765] - INFO: rank   1: capabilities =  6.0  ; total memory = 15.899 GB ; name = Tesla P100-PCIE-16GB                    
2022-05-02 00:17:19 - utils.py[line:767] - INFO: ***********************CUDA enviroments for all 2 workers***********************
2022-05-02 00:17:19 - train.py[line:145] - INFO: training on 2 devices (GPUs/TPUs)
2022-05-02 00:17:19 - train.py[line:151] - INFO: max tokens per device = None and max sentences per device = 8
2022-05-02 00:17:19 - trainer.py[line:458] - INFO: Preparing to load checkpoint ../../checkpoints/ofa_base.pt
2022-05-02 00:17:24 - trainer.py[line:309] - INFO: NOTE: your device does NOT support faster training with --fp16 or --amp, please switch to FP32 which is likely to be faster
2022-05-02 00:17:24 - trainer.py[line:619] - INFO: Loaded checkpoint ../../checkpoints/ofa_base.pt (epoch 48 @ 0 updates)
2022-05-02 00:17:24 - trainer.py[line:639] - INFO: loading train data for epoch 1
local datafile /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 0 begin to initialize row_count and line_idx-to-offset mapping
local datafile /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 1 begin to initialize row_count and line_idx-to-offset mapping
local datafile /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 1 finished initializing row_count and line_idx-to-offset mapping
file /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 1 row count 24100 total row count 48200
local datafile /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 0 finished initializing row_count and line_idx-to-offset mapping
file /raid/AISSEL/Hamed/datasets/caption_data/caption_stage1_train.tsv slice_id 0 row count 24100 total row count 48200
slice_id 0 seek offset 0
Total steps 3770, warmup steps 226, warmup_factor 0.004424778761061947
2022-05-02 00:18:04 - trainer.py[line:703] - INFO: begin training epoch 1
2022-05-02 00:18:04 - train.py[line:296] - INFO: Start iterating over samples
slice_id 1 seek offset 24100
Total steps 3770, warmup steps 226, warmup_factor 0.004424778761061947
Traceback (most recent call last):
  File "../../train.py", line 528, in <module>
    cli_main()
  File "../../train.py", line 521, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 374, in call_main
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../train.py", line 190, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "../../train.py", line 297, in train
    for i, samples in enumerate(progress):
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/logging/progress_bar.py", line 261, in __iter__
    for i, obj in enumerate(self.iterable, start=self.n):
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/data/iterators.py", line 56, in __next__
    x = next(self._itr)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/data/iterators.py", line 509, in _chunk_iterator
    for x in itr:
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/data/iterators.py", line 56, in __next__
    x = next(self._itr)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/data/iterators.py", line 637, in __next__
    raise item
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/fairseq/data/iterators.py", line 567, in run
    for item in self._source:
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/XXXX/ofa/OFA/data/mm_data/caption_dataset.py", line 117, in __getitem__
    uniq_id, image, caption = self.dataset[index]
  File "/home/XXXX/ofa/OFA/data/file_dataset.py", line 106, in __getitem__
    column_l = [dtype(column_l[col_id]) for col_id, dtype in zip(self.selected_col_ids, self.dtypes)]
  File "/home/XXXX/ofa/OFA/data/file_dataset.py", line 106, in <listcomp>
    column_l = [dtype(column_l[col_id]) for col_id, dtype in zip(self.selected_col_ids, self.dtypes)]
IndexError: list index out of range
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 13615 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 13614) of binary: /home/XXXX/.conda/envs/ofa/bin/python
Traceback (most recent call last):
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/run.py", line 718, in run
    )(*cmd_args)
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/XXXX/.conda/envs/ofa/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
../../train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2022-05-02_00:18:09
  host      : hartley
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 13614)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
@AI-EnabledSoftwareEngineering-AISE (Author) commented:

I tried with your dataset and it did not give an index-out-of-range error. I also tried with a 1000-row sample of my dataset, and it again gave me the out-of-range error. To create the TSV file I use this command:

df_train.to_csv(f'{raid_path}/caption_stage1_train.tsv', sep="\t", index=False, header=False)
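
Side note on the counts above: the frame has 47899 rows, yet the loader reports 48200 lines in total. The extra ~300 lines suggest embedded newlines (or tabs) inside some field, which file_dataset.py then treats as extra rows. A quick check, assuming the caption column is named 'caption':

# Count captions containing characters that would corrupt a raw TSV line.
bad = df_train['caption'].str.contains(r'[\t\r\n]', regex=True, na=False)
print(bad.sum(), "captions contain tab/newline characters")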

Also, my dataframe looks like this:
[screenshot: dataframe preview, 2022-05-02 1:06 AM]

It is strange to me; what is happening to the data?
[screenshot: loaded-data preview, 2022-05-02 1:09 AM]

@AI-EnabledSoftwareEngineering-AISE (Author) commented:

Could a set of characters in the caption column be causing the problem? Did your dataset have any issues like this? If so, could you please send me your data-cleaning code?

@logicwong (Member) commented:

@AI-EnabledSoftwareEngineering-AISE Hi, I would recommend processing the data as follows (a combined sketch follows the list):

  1. Set the predicted object labels to ' ' or any other character; do not let them be NaN.
  2. Delete any '\t' in the caption, e.g. caption = caption.replace('\t', ' ').
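
A combined sketch of both steps, assuming a df_train with columns named 'caption' and 'label' (names hypothetical):

import csv

df_train['label'] = df_train['label'].fillna(' ')  # 1. no NaN labels
df_train['caption'] = df_train['caption'].str.replace(r'[\t\r\n]', ' ', regex=True)  # 2. no tabs/newlines
# Write without quoting so every raw line splits cleanly on '\t'.
df_train.to_csv('caption_stage1_train.tsv', sep='\t', index=False, header=False, quoting=csv.QUOTE_NONE)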

@AI-EnabledSoftwareEngineering-AISE (Author) commented:

Thank you, I solved it by removing all special characters from the captions:

def remove_special(input_string):
    # Keep only spaces and alphanumeric characters; drop everything else.
    return "".join(ch for ch in input_string if ch == " " or ch.isalnum())
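
Applied over the frame, e.g. (column name assumed):

df_train['caption'] = df_train['caption'].apply(remove_special)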

@zzhanghub commented:

Hi, I encountered the same problem, list index out of range, at the line column_l = [dtype(column_l[col_id]) for col_id, dtype in zip(self.selected_col_ids, self.dtypes)]. There are no special characters in my captions. I see that several other issues mention this problem. Do you have any fix for it 😭?
Thank you!

#131
#94

@shengjie1980 commented (replying to @zzhanghub above):

You can modify line 53 of data/file_dataset.py from `fp = open(self.file_path, "r")` to `fp = open(self.file_path, "rb")`, and modify line 62 from `offset += len(line.encode('utf-8'))` to `offset += len(line)`.
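
For context, file_dataset.py builds a line-index-to-byte-offset map and later seek()s to those offsets. A sketch of why the binary-mode version stays consistent (a simplification, not the repo's exact code; file_path is assumed):

# In text mode, universal-newline handling translates '\r\n' (and lone '\r') to '\n',
# so len(line.encode('utf-8')) can disagree with the bytes actually on disk; later
# seek() calls then land mid-row and the split yields too few columns (the IndexError).
offset = 0
with open(file_path, 'rb') as fp:  # binary mode: bytes exactly as they are on disk
    for line in fp:
        offset += len(line)        # exact byte length, so seek(offset) stays row-aligned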

@zzhanghub commented:

@shengjie1980
Great, it works! Thank you!
Could you explain the reason behind this?
