-
Notifications
You must be signed in to change notification settings - Fork 80
Closed
Labels
bugSomething isn't workingSomething isn't workinghelp wantedExtra attention is neededExtra attention is needed
Description
🐛 Bug
Dataset does not get created when map is given a list of items without paths.
To Reproduce
pip install litdata==0.2.3 lightning_sdk==0.1.3- Run this script
- Wait for job to finish
- Restart studio
ls /teamspace/datasets
Code sample
import requests, os
from litdata import map
from lightning_sdk import Machine
def create_files(idx, output_dir):
with open(os.path.join(output_dir, f"{idx}.txt") , "w") as f:
f.write(str(idx))
def main():
# root_dir = "./data-processed"
root_dir = "/teamspace/datasets/data-processed"
# os.makedirs(root_dir, exist_ok=True)
inputs = list(range(100))
map(
fn=create_files,
inputs=inputs,
output_dir=root_dir,
num_workers=os.cpu_count(),
num_nodes=2,
machine=Machine.CPU,
)
if __name__ == "__main__":
main()In #73 a condition and self.input_dir.path was added:
https://github.com/Lightning-AI/litdata/blame/58f7aeb5836383ff839b4d966462c568aa6e7435/src/litdata/processing/data_processor.py#L1022
The assumption there was that map gets a datastructure of file paths, but that's not always true. For example, map could be called on a list of URLs to download and process.
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't workinghelp wantedExtra attention is neededExtra attention is needed