
Bug Report: Incorrect filename for merging compressed datasets #276

@bhimrazy

Description


🐛 Bug

Bug reported in PR #190

Description:
The chunk filename generation in src/litdata/processing/functions.py ignores the compression type. Every merged chunk is renamed to chunk-0-{counter}.bin, so the filenames are incorrect when compression == "zstd".

# src/litdata/processing/functions.py

for chunk in input_dir_file_content["chunks"]:  # type: ignore
    assert isinstance(chunk, dict)
    old_filename = chunk["filename"]
    new_filename = f"chunk-0-{counter}.bin"  # always ".bin": the compression suffix (e.g. ".zstd") is dropped
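A minimal sketch of a compression-aware rename, assuming compressed chunks follow the pattern chunk-{rank}-{index}.{compression}.bin (e.g. chunk-0-3.zstd.bin) and that the compression value can be read from the input dataset's index config (for example input_dir_file_content["config"]["compression"]; this key is an assumption). The helpers chunk_filename and rename_chunks are hypothetical and are not litdata's API.

```python
from typing import Optional


def chunk_filename(rank: int, index: int, compression: Optional[str] = None) -> str:
    """Build a chunk filename, keeping the compression suffix when present.

    Assumption: compressed chunks are named "chunk-{rank}-{index}.{compression}.bin"
    and uncompressed ones "chunk-{rank}-{index}.bin".
    """
    if compression:
        return f"chunk-{rank}-{index}.{compression}.bin"
    return f"chunk-{rank}-{index}.bin"


def rename_chunks(chunks: list, compression: Optional[str], start: int = 0) -> dict:
    """Hypothetical helper mirroring the loop in functions.py.

    Rewrites each chunk's "filename" in place, numbering from `start`, and
    returns an {old_filename: new_filename} mapping so the caller can copy
    or move the underlying files.
    """
    mapping = {}
    counter = start
    for chunk in chunks:
        assert isinstance(chunk, dict)
        old_filename = chunk["filename"]
        new_filename = chunk_filename(0, counter, compression)
        chunk["filename"] = new_filename
        mapping[old_filename] = new_filename
        counter += 1
    return mapping


# Usage: the second input dataset's chunks continue numbering at 5.
chunks = [{"filename": "chunk-0-0.zstd.bin"}, {"filename": "chunk-0-1.zstd.bin"}]
print(rename_chunks(chunks, compression="zstd", start=5))
# {'chunk-0-0.zstd.bin': 'chunk-0-5.zstd.bin', 'chunk-0-1.zstd.bin': 'chunk-0-6.zstd.bin'}
```

Passing a global start offset keeps chunk indices unique across all merged inputs, while the suffix logic lives in one place so uncompressed datasets keep their existing names.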

Metadata


Assignees: No one assigned

Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
