Closed
Labels
bug (Something isn't working), help wanted (Extra attention is needed)
Description
🐛 Bug
Hello LitData Team,
Datasets created with a legacy version (litdata==0.2.14) do not contain the 'encryption' field in their config, and reading such a dataset with litdata==0.2.17 raises KeyError: 'encryption'.
To Reproduce
Steps to reproduce the behavior:
!pip install litdata==0.2.14

import litdata as ld

def gen_data(i):
    return {"index": i}

ld.optimize(
    gen_data,
    inputs=list(range(1000)),
    output_dir="./dataset_0_2_14",
    chunk_size=1000
)

ds = ld.StreamingDataset("./dataset_0_2_14")
ds[42]
# {'index': 42}

!pip install litdata==0.2.17

import litdata as ld

ds = ld.StreamingDataset("./dataset_0_2_14")
ds[42]

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[1], line 4
1 import litdata as ld
3 ds = ld.StreamingDataset("./dataset_0_2_14")
----> 4 ds[42]
File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/dataset.py:335, in StreamingDataset.__getitem__(self, index)
333 _my_cache_indices = [ChunkedIndex(*self.cache._get_chunk_index_from_index(idx)) for idx in _my_indices]
334 return [self.cache[chnk_idx] for chnk_idx in _my_cache_indices]
--> 335 return self.cache[index]
File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/cache.py:140, in Cache.__getitem__(self, index)
138 if isinstance(index, int):
139 index = ChunkedIndex(*self._get_chunk_index_from_index(index))
--> 140 return self._reader.read(index)
File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/reader.py:285, in BinaryReader.read(self, index)
282 chunk_filepath, begin, chunk_bytes = self.config[index]
284 if isinstance(self._item_loader, PyTreeLoader):
--> 285 item = self._item_loader.load_item_from_chunk(
286 index.index, index.chunk_index, chunk_filepath, begin, chunk_bytes, self._encryption
287 )
288 else:
289 item = self._item_loader.load_item_from_chunk(
290 index.index, index.chunk_index, chunk_filepath, begin, chunk_bytes
291 )
File /opt/conda/envs/py310/lib/python3.10/site-packages/litdata/streaming/item_loader.py:152, in PyTreeLoader.load_item_from_chunk(self, index, chunk_index, chunk_filepath, begin, chunk_bytes, encryption)
148 exists = os.path.exists(chunk_filepath) and os.stat(chunk_filepath).st_size >= chunk_bytes
150 self._chunk_filepaths[chunk_filepath] = True
--> 152 if self._config["encryption"]:
153 data = self._load_encrypted_data(chunk_filepath, chunk_index, offset, encryption)
154 else:
KeyError: 'encryption'
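The traceback points at the direct lookup `self._config["encryption"]`, which assumes the key is always present. A minimal sketch of the difference between that lookup and a tolerant one follows. The two config dicts and the `is_encrypted` helper are hypothetical illustrations, not litdata code, and the exact shape of a newer config is an assumption.

```python
# Hypothetical configs for illustration only; real configs are read from
# the dataset's index file and contain more fields.
legacy_config = {"chunk_size": 1000}  # no "encryption" key, as in the 0.2.14 dataset
newer_config = {"chunk_size": 1000, "encryption": None}  # assumed shape


def is_encrypted(config: dict) -> bool:
    """Treat a missing "encryption" key the same as an unencrypted dataset."""
    return bool(config.get("encryption"))


# Direct indexing fails on the legacy config, as in the traceback.
try:
    legacy_config["encryption"]
except KeyError as err:
    print(f"KeyError: {err}")

# The tolerant lookup handles both configs without raising.
print(is_encrypted(legacy_config))
print(is_encrypted(newer_config))
```

Using `dict.get` keeps older datasets readable, because a missing key is read as "not encrypted" rather than as an error.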
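Expected behavior
A dataset optimized with litdata==0.2.14 should load under litdata==0.2.17 without error, and ds[42] should return {'index': 42} as it does in 0.2.14.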
Additional context
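Until the reader tolerates the missing key, one possible workaround is to add the field to the legacy dataset's index by hand. The sketch below assumes the index is a JSON file with a top-level "config" object, and that a null "encryption" value is read as "not encrypted". Both are assumptions to verify against your own dataset. `add_missing_encryption_field` is a hypothetical helper, not part of litdata.

```python
import json
from pathlib import Path


def add_missing_encryption_field(index_path: str) -> bool:
    """Add "encryption": null to the config of a legacy index file.

    Returns True if the file was modified, False if the key already existed.
    Assumes the index is JSON with a top-level "config" object.
    """
    path = Path(index_path)
    index = json.loads(path.read_text())
    config = index["config"]
    if "encryption" in config:
        return False  # nothing to do; leave the file untouched
    config["encryption"] = None
    path.write_text(json.dumps(index))
    return True
```

For the dataset in this report, the call would be along the lines of `add_missing_encryption_field("./dataset_0_2_14/index.json")`, where the `index.json` file name is also an assumption. Back up the file before modifying it.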