[Bug]: Disk I/O Error when using tools due to shared outlines cache database #827

Open
AaronFriel opened this issue Apr 19, 2024 · 0 comments

This was discovered while running the Llama 3 Instruct model with the workaround in #4180 on multiple nodes that use a shared file system for the cache directory.

Requests to vLLM that include tool_calls through the OpenAI-compatible tool call API use the outlines library. When the file system is shared between nodes (e.g. AWS Elastic File System), the outlines library on each node opens a SQLite database in the same shared cache directory.

This causes I/O errors on at least one node because the nodes conflict when writing to the same SQLite database.

To mitigate this, Outlines should probably not default to ~/.cache, which is unfortunately often expected to be shared between nodes so they can share model weights, and should more likely use /tmp. This should also ensure that caches cannot be poisoned by invalid values and can be cleared by restarting a container.

While this can be configured with an environment variable, I was surprised to see a file that is not safe for multi-user access being opened in ~/.cache.

https://github.com/outlines-dev/outlines/blob/main/outlines/caching.py#L14-L29
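
For anyone hitting this in the meantime, here is a minimal sketch of the environment-variable workaround. It assumes the variable is named `OUTLINES_CACHE_DIR` (my reading of the linked caching.py; please verify against your installed version) and that the system temp directory is local to each node. The helper name `use_node_local_outlines_cache` is hypothetical and only for illustration.

```python
import os
import tempfile


def use_node_local_outlines_cache(base=None, env=None):
    """Point the outlines cache at a node-local directory.

    Assumption: outlines reads its cache location from the
    OUTLINES_CACHE_DIR environment variable when the cache is first
    created, so this must run before outlines (or vLLM) is imported.
    """
    if env is None:
        env = os.environ
    if base is None:
        # Usually /tmp on Linux; assumed NOT to be on the shared file system.
        base = tempfile.gettempdir()

    cache_dir = os.path.join(base, "outlines-cache")
    os.makedirs(cache_dir, exist_ok=True)

    # Override any inherited value so every node uses its own SQLite file.
    env["OUTLINES_CACHE_DIR"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    print(use_node_local_outlines_cache())
```

The same effect can be had by exporting the variable in the container entrypoint before starting the server. Model weights can stay in the shared ~/.cache; only the outlines cache moves.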
