Closed
Labels: bug (Something isn't working), help wanted (Extra attention is needed)
Description
Some users have run into a problem: when a sample contains a string that is a valid file path, litdata automatically detects that the file exists and reads its content into the sample.
This keeps the code concise, but users who were unaware of the behaviour ended up with far more chunks than they expected.
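To spot this before running an optimization, one option is to scan a sample for strings that point to existing files. The sketch below is only an illustration: `find_file_like_strings` is a hypothetical helper, not part of litdata, and it assumes the detection boils down to an "is this string an existing file" check, which may not match litdata's exact logic.

```python
import os
import tempfile


def find_file_like_strings(sample):
    """Return every string in `sample` that points to an existing file.

    Hypothetical helper (not a litdata API). It walks nested dicts,
    lists and tuples and checks each string with os.path.isfile.
    """
    if isinstance(sample, str):
        return [sample] if os.path.isfile(sample) else []
    if isinstance(sample, dict):
        sample = list(sample.values())
    if isinstance(sample, (list, tuple)):
        found = []
        for item in sample:
            found.extend(find_file_like_strings(item))
        return found
    # Any other type (int, bytes, arrays, ...) cannot be mistaken for a path here.
    return []


# Demo: a temporary file stands in for a real input file.
with tempfile.NamedTemporaryFile(suffix=".img") as tmp:
    sample = {"path": tmp.name, "label": "no/such/file/9f3a7c"}
    print(find_file_like_strings(sample))  # lists only the temporary file's path
```

Any string flagged this way is a candidate for being replaced by the file's content rather than stored as text.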
Example:
Create a 50MB file:

    fallocate -l 50MB gentoo_root.img

Then, in Python:

    from litdata import optimize

    def fn(filepath):
        return filepath

    optimize(
        fn=fn,
        inputs=["gentoo_root.img" for _ in range(10)],
        output_dir="./data",
        chunk_bytes="64MB",
        num_workers=1,
    )

Each sample stores the entire 50MB file, and two such samples do not fit in one 64MB chunk. So you end up with 10 chunks, each holding a full copy of the file, instead of 1 chunk with 10 samples containing only the filepath.
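The chunk count can be checked with a little arithmetic. The sketch below uses a simplified greedy packing model: a new chunk starts whenever the next sample would push the current chunk past `chunk_bytes`. `estimate_num_chunks` is a hypothetical helper, and both the packing rule and the reading of "MB" as 10^6 bytes are assumptions; litdata's real chunking may differ in detail.

```python
def estimate_num_chunks(sample_sizes, chunk_bytes):
    """Estimate the number of chunks under a simple greedy packing model.

    Hypothetical helper (not a litdata API): samples are appended in order
    and a new chunk is opened when the next sample would exceed chunk_bytes.
    """
    chunks = 0
    current = 0
    for size in sample_sizes:
        if chunks == 0 or current + size > chunk_bytes:
            chunks += 1
            current = 0
        current += size
    return chunks


CHUNK_BYTES = 64 * 1000 * 1000  # "64MB", assuming 1MB = 10**6 bytes
FILE_BYTES = 50 * 1000 * 1000   # the 50MB file created with fallocate
PATH_BYTES = len("gentoo_root.img".encode("utf-8"))  # 15 bytes

# File content stored in every sample: each 50MB sample needs its own chunk.
print(estimate_num_chunks([FILE_BYTES] * 10, CHUNK_BYTES))  # 10

# Only the path stored in every sample: everything fits in one chunk.
print(estimate_num_chunks([PATH_BYTES] * 10, CHUNK_BYTES))  # 1
```

Under this model the output grows from a few hundred bytes of paths to roughly 500MB of duplicated file content.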