Description
🐛 Bug
If an optimized dataset has too many chunks (this can be replicated with a small chunk size, or with a reasonable chunk size and a lot of data), then while reading (e.g. using StreamingDataLoader) a Too many open files error is encountered at some point. This is because each chunk is mmap'ed as it is loaded and the mmap'ed handle is never released, leading to too many open files across the system:
One could of course use ulimit to increase the limit, but at some point the number of chunks can simply be too high.
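For reference, the per-process limit can also be inspected and raised from Python itself via the standard resource module (Unix only); this is just the programmatic equivalent of ulimit -n and only postpones the problem:

import resource

# Inspect the per-process open-file limits (equivalent of `ulimit -n`).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")

# Raise the soft limit up to the hard limit; going beyond the hard limit
# requires elevated privileges.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))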
It is easily resolved by keeping the number of mmap'ed files under a certain limit, for example by occasionally evicting entries from the above dict that caches the mmap'ed handles, either randomly or more deliberately with a FIFO policy (a rough sketch is included below).
I wanted to check whether this fix sounds reasonable; I am happy to send a PR for it if it makes sense.
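For illustration, here is a minimal sketch of such a capped cache, assuming a FIFO eviction policy; the class and parameter names (MmapChunkCache, max_open) are hypothetical and not part of the LitData codebase:

import mmap
import os
from collections import OrderedDict


class MmapChunkCache:
    """Keep at most `max_open` chunk files mmap'ed; evict the oldest first (FIFO)."""

    def __init__(self, max_open: int = 128):
        self.max_open = max_open
        self._mmaps: "OrderedDict[str, mmap.mmap]" = OrderedDict()

    def get(self, filepath: str) -> mmap.mmap:
        if filepath in self._mmaps:
            return self._mmaps[filepath]
        # Evict the oldest mapping(s) once the limit is reached; closing them
        # releases the underlying file descriptors.
        while len(self._mmaps) >= self.max_open:
            _, oldest = self._mmaps.popitem(last=False)
            oldest.close()
        fd = os.open(filepath, os.O_RDONLY)
        try:
            mm = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        finally:
            os.close(fd)  # the mmap object holds its own duplicate of the descriptor
        self._mmaps[filepath] = mm
        return mm

    def close(self) -> None:
        for mm in self._mmaps.values():
            mm.close()
        self._mmaps.clear()

Closing an evicted mapping releases its file descriptor, so the number of open files stays bounded at the cost of occasionally re-mmap'ing a chunk that is requested again.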
To Reproduce
Steps to reproduce the behavior:
- Optimize a dataset that results in a large number of chunks
- Use StreamingDataset and StreamingDataLoader to load the data
- At some point, OSError: [Errno 24] Too many open files will be encountered
Minimal code sample below to reproduce the issue.
Code sample
import glob
import random
from pathlib import Path

import numpy as np

from litdata import optimize
from litdata.streaming import StreamingDataLoader, StreamingDataset, TokensLoader


# Fake tokenizer
def tokenize_fn(filepath):
    yield np.array([random.randint(0, 10000) for _ in range(random.randint(100, 1000))])


def main():
    Path("fake_file.txt").touch()
    outputs = optimize(
        fn=tokenize_fn,
        inputs=["fake_file.txt" for i in range(10000)],  # increase the number of files if the error is not encountered on a specific machine
        output_dir="./optimized/",
        chunk_bytes="10KB",
        num_workers=1,
        item_loader=TokensLoader(block_size=1024),
    )

    train_dataset = StreamingDataset(
        input_dir="./optimized/",
        item_loader=TokensLoader(block_size=1024),
        shuffle=True,
        drop_last=False,
    )
    train_dataloader = StreamingDataLoader(
        train_dataset, batch_size=1, pin_memory=False, num_workers=1, drop_last=False
    )

    total_tokens = 0
    total_batches = 0
    for sample in train_dataloader:
        total_batches += 1
        total_tokens += np.prod(sample.shape)
        print(total_batches, total_tokens)

    print("Batches:", total_batches)
    print("Tokens:", total_tokens)


if __name__ == "__main__":
    main()

Expected behavior
The dataloader iterates over the entire dataset without raising OSError: [Errno 24] Too many open files.
Environment
- LitData version: 0.2.26
- OS (e.g., Linux): Linux
- How you installed: pip
- Python version: 3.12