support backward compatibility for optimized dataset without encryption field (0.2.14)  #258

@csy1204

🐛 Bug

Hello LitData Team.
Datasets optimized with litdata==0.2.14 have no 'encryption' field in their config, so reading them with the latest version (0.2.17 in the reproduction below) raises a KeyError.

To Reproduce

Steps to reproduce the behavior:

# Optimize a small dataset with litdata 0.2.14 and confirm it reads back correctly
!pip install litdata==0.2.14
import litdata as ld

def gen_data(i):
    return {"index": i}

ld.optimize(
    gen_data,
    inputs=list(range(1000)),
    output_dir="./dataset_0_2_14",
    chunk_size=1000
)

ds = ld.StreamingDataset("./dataset_0_2_14")
ds[42]

# {'index': 42}

# Upgrade to litdata 0.2.17 and read the dataset written by 0.2.14
!pip install litdata==0.2.17
import litdata as ld

ds = ld.StreamingDataset("./dataset_0_2_14")
ds[42]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[1], line 4
      1 import litdata as ld
      3 ds = ld.StreamingDataset("./dataset_0_2_14")
----> 4 ds[42]

File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/dataset.py:335, in StreamingDataset.__getitem__(self, index)
    333     _my_cache_indices = [ChunkedIndex(*self.cache._get_chunk_index_from_index(idx)) for idx in _my_indices]
    334     return [self.cache[chnk_idx] for chnk_idx in _my_cache_indices]
--> 335 return self.cache[index]

File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/cache.py:140, in Cache.__getitem__(self, index)
    138 if isinstance(index, int):
    139     index = ChunkedIndex(*self._get_chunk_index_from_index(index))
--> 140 return self._reader.read(index)

File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/reader.py:285, in BinaryReader.read(self, index)
    282 chunk_filepath, begin, chunk_bytes = self.config[index]
    284 if isinstance(self._item_loader, PyTreeLoader):
--> 285     item = self._item_loader.load_item_from_chunk(
    286         index.index, index.chunk_index, chunk_filepath, begin, chunk_bytes, self._encryption
    287     )
    288 else:
    289     item = self._item_loader.load_item_from_chunk(
    290         index.index, index.chunk_index, chunk_filepath, begin, chunk_bytes
    291     )

File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/item_loader.py:152, in PyTreeLoader.load_item_from_chunk(self, index, chunk_index, chunk_filepath, begin, chunk_bytes, encryption)
    148         exists = os.path.exists(chunk_filepath) and os.stat(chunk_filepath).st_size >= chunk_bytes
    150     self._chunk_filepaths[chunk_filepath] = True
--> 152 if self._config["encryption"]:
    153     data = self._load_encrypted_data(chunk_filepath, chunk_index, offset, encryption)
    154 else:

KeyError: 'encryption'
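
The traceback points at the direct lookup `self._config["encryption"]`, which fails when the key is missing from a legacy config. One possible backward-compatible approach, shown here only as a minimal sketch and not as the maintainers' actual fix, is to treat a missing key the same as "no encryption". The helper name `get_encryption_config` and the example config dictionaries are hypothetical, and the shape of the encrypted config is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


def get_encryption_config(config: Dict[str, Any]) -> Optional[Any]:
    """Return the encryption settings, or None when the key is absent.

    Hypothetical helper: dict.get() returns None for a missing key instead
    of raising KeyError, so configs written before the 'encryption' field
    existed are handled like unencrypted datasets.
    """
    return config.get("encryption")


# Illustrative configs only; the real config contains more fields.
legacy_config = {"chunk_size": 1000, "compression": None}  # no 'encryption' key
current_config = {"chunk_size": 1000, "compression": None, "encryption": None}
# Assumed shape of an encrypted config, for illustration only.
encrypted_config = {"chunk_size": 1000, "encryption": {"algorithm": "fernet"}}

for cfg in (legacy_config, current_config, encrypted_config):
    if get_encryption_config(cfg):
        print("would take the encrypted loading path")
    else:
        print("would take the plain loading path")
```

With a lookup like this, the condition in `PyTreeLoader.load_item_from_chunk` would fall through to the unencrypted branch for a dataset written by 0.2.14 rather than raising.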
