Error decoding sample when computing batch stats #24

@ksrinivasan-tri

I am facing an error while trying to compute the normalization stats during the consolidate() step of the AGIBot dataset for a single task (327).

Here's the full traceback for the error:

Loading dataset shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 169/169 [00:00<00:00, 12281.05it/s]
Traceback (most recent call last):                                                                                                                                                             
  File "/home/ubuntu/AgiBot-World/scripts/convert_to_lerobot.py", line 670, in <module>                                                                                                        
    task_id = args.task_id                                                                                                                                                                     
  File "/home/ubuntu/AgiBot-World/scripts/convert_to_lerobot.py", line 633, in main                                                                                                            
    raw_datasets_chunk = None                                                                                                                                                                  
  File "/home/ubuntu/AgiBot-World/scripts/convert_to_lerobot.py", line 444, in consolidate                                                                                  
    self.meta.stats = compute_stats(self)                                                                                                                                   
  File "/home/ubuntu/AgiBot-World/scripts/convert_to_lerobot.py", line 234, in compute_stats                                                                                
    stats_patterns = get_stats_einops_patterns(dataset, num_workers)                                                                                                        
  File "/home/ubuntu/AgiBot-World/scripts/convert_to_lerobot.py", line 197, in get_stats_einops_patterns                                                                    
    batch = next(iter(dataloader))         
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 708, in __next__                                                             
    data = self._next_data()               
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1480, in _next_data                                                          
    return self._process_data(data)        
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1505, in _process_data                                                       
    data.reraise()                         
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/_utils.py", line 733, in reraise                                                                             
    raise exception                        
AttributeError: Caught AttributeError in DataLoader worker process 0.                 
Original Traceback (most recent call last):                                           
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop                                                      
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]                   
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch                                                               
    data = [self.dataset[idx] for idx in possibly_batched_index]                      
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>                                                          
    data = [self.dataset[idx] for idx in possibly_batched_index]                      
  File "/home/ubuntu/.local/lib/python3.10/site-packages/lerobot/common/datasets/lerobot_dataset.py", line 645, in __getitem__                                              
    item = self.hf_dataset[idx]            
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2782, in __getitem__                                                              
    return self._getitem(key)              
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2767, in _getitem                                                                 
    formatted_output = format_table(       
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 658, in format_table                                                      
    return formatter(pa_table, query_type=query_type)                                 
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 411, in __call__                                                          
    return self.format_row(pa_table)       
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 511, in format_row                                                        
    formatted_batch = self.format_batch(pa_table)                                     
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 540, in format_batch                                                      
    batch = self.python_features_decoder.decode_batch(batch)                          
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 231, in decode_batch                                                      
    return self.features.decode_batch(batch) if self.features else batch              
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/features/features.py", line 2091, in decode_batch                                                         
    [                                      
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/features/features.py", line 2092, in <listcomp>                                                           
    decode_nested_example(self[column_name], value, token_per_repo_id=token_per_repo_id)                                                                                    
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/features/features.py", line 1407, in decode_nested_example                                                
    return schema.decode_example(obj, token_per_repo_id=token_per_repo_id) if obj is not None else None                                                                     
  File "/home/ubuntu/.local/lib/python3.10/site-packages/datasets/features/image.py", line 189, in decode_example                                                           
    if image.getexif().get(PIL.Image.ExifTags.Base.Orientation) is not None:          
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 1360, in getexif           
    self._exif.load_from_fp(self.fp, self.tag_v2._offset)                             
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 3410, in load_from_fp                                                                                            
    self.fp.seek(offset)                             
AttributeError: 'NoneType' object has no attribute 'seek'                                                  

Is this a known issue, for example one caused by a package version mismatch or a data-loading problem? I have added some garbage collection to the main script here, which might be related to the problem.
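
To help narrow down the version question: in the traceback above, PIL is imported from /usr/lib/python3/dist-packages, while torch, datasets, and lerobot are imported from ~/.local/lib/python3.10/site-packages, so they may not have been installed together. Below is a minimal sketch that prints the version and import location of each package. It uses only the standard library; the helper name `describe_package` is hypothetical, and the output alone does not confirm that a version mismatch is the cause.

```python
import importlib
from importlib import metadata


def describe_package(import_name, dist_name=None):
    """Return (version, file location) for a package, or (None, None) if it cannot be imported."""
    dist_name = dist_name or import_name
    try:
        module = importlib.import_module(import_name)
    except ImportError:
        return None, None
    # Prefer the version reported by the module that was actually imported.
    version = getattr(module, "__version__", None)
    if version is None:
        try:
            version = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            version = None
    return version, getattr(module, "__file__", None)


if __name__ == "__main__":
    # (import name, distribution name) pairs for the packages seen in the traceback.
    packages = [
        ("PIL", "Pillow"),
        ("datasets", "datasets"),
        ("torch", "torch"),
        ("lerobot", "lerobot"),
    ]
    for import_name, dist_name in packages:
        version, location = describe_package(import_name, dist_name)
        print(f"{import_name}: version={version} location={location}")
```

Running this in the same environment as convert_to_lerobot.py should show which Pillow installation is actually picked up alongside the datasets package.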
