Description
I tried to set up llms.py in a podman container and connect it to my local Ollama instances, but I encountered several issues and the test drive ultimately failed despite a lot of effort. Let me describe the main pain points.
Entrypoint script: There's a prebuilt container image (`ghcr.io/servicestack/llms:latest`), which is good, and it asks for a single volume to save data, which is again good. The trouble is that the volume requires initialization. There seems to be an entrypoint script that handles volume initialization, but that means I cannot customize llms parameters like port and verbosity; at least my attempts to run the container with something like `llms --serve 12345 --verbose` have failed. Things only worked once I relied on the built-in entrypoint script for volume initialization and the rest of the server setup. I now let llms.py run on its default port and expose a different port via podman configuration, and I have no idea how to enable verbose logging.
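For reference, the invocation I ended up with looks roughly like this; note that the container's internal port (8000) and data mount path (/data) shown here are my assumptions, not documented values:

```shell
# Let the built-in entrypoint initialize the volume and start the server on its
# default port, then remap it on the host side. Internal port (8000) and mount
# path (/data) are assumptions; adjust them to the image's actual values.
podman run -d --name llms \
  -v llms-data:/data \
  -p 12345:8000 \
  ghcr.io/servicestack/llms:latest
```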
Logging: I'm not sure how llms.py writes logs, but I had to set the podman log driver to journald to see any logs at all; nothing was logged under the default settings. Even then, the error log is no more detailed than the brief error shown in the UI. As explained above, I wasn't able to turn on verbose logging.
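The workaround that finally surfaced logs for me looked roughly like this (container name and mount path are whatever you chose; the mount path is an assumption):

```shell
# Switch the container's log driver to journald so output lands somewhere
# visible; under podman's default settings I saw no log output at all.
podman run -d --name llms --log-driver=journald \
  -v llms-data:/data ghcr.io/servicestack/llms:latest

# Follow the container's log entries via journald.
journalctl -f CONTAINER_NAME=llms
```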
Baked-in configuration: There's no way to bake my custom Ollama endpoints into the container image or to otherwise supply them to llms.py automatically. I'm expected to manually edit llms.json after the volume is initialized. To automate that, I would have to (1) run the service once and wait for the volume to be initialized, (2) read llms.json from the volume, (3) have a Python script patch in my Ollama endpoints, and (4) write the modified llms.json back into the volume while the service is temporarily stopped. That's way too complicated. Why can't I just drop a file with predefined configuration somewhere that llms.py would merge into its dynamic configuration? I have also found no way to disable tools by default in the config file.
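Step (3) of the workflow above can be scripted, but it shouldn't have to be. A minimal sketch, assuming llms.json keeps endpoints under a top-level "providers" key (the actual schema may well differ):

```shell
# Hypothetical patch script for llms.json; the "providers" key and the
# endpoint fields are assumptions on my part, not the documented schema.
python3 - <<'EOF'
import json, pathlib

cfg_path = pathlib.Path("llms.json")  # copy read from the initialized volume
cfg = json.loads(cfg_path.read_text())

# Merge in a local Ollama endpoint (illustrative key names).
cfg.setdefault("providers", {})["ollama-local"] = {
    "type": "ollama",
    "base_url": "http://localhost:11436",
}
cfg_path.write_text(json.dumps(cfg, indent=2))
EOF
```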
Failing Ollama requests: So I added one Ollama endpoint manually. Its models show up in the UI, but when I try to start a chat, I get an `[Errno None] Can not write request body for http://localhost:11436/v1/chat/completions` error, and the logs say nothing more. It might be because it's a year-old Intel IPEX fork of Ollama, which is itself based on an even older upstream. I will try to upgrade someday and test again. Then again, the OpenAI-compatible endpoint has been part of Ollama for ages; it should work even if the Ollama version is old.
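To isolate whether the problem is in llms.py or the old Ollama build, one can hit Ollama's OpenAI-compatible endpoint directly; the model name below is just a placeholder for whatever `ollama list` reports:

```shell
# Bypass llms.py and talk to Ollama's OpenAI-compatible API directly.
# If this also fails, the old IPEX fork is the culprit rather than llms.py.
curl -s http://localhost:11436/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "hi"}]}'
```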
Authentication: I gather there's some GitHub auth extension, but that's overkill and perhaps a security problem of its own in a local setup. Why not simple username/password auth? Without authentication, I wonder how much access llms.py's CORS configuration grants to random websites I visit. Flatpak apps have full access to localhost ports too, and fronting llms.py with a reverse proxy is not an option on localhost.
Complexity: I spent several hours exploring numerous blind alleys and asked ChatGPT some 20 different questions about various aspects of setting up llms.py. ChatGPT often had to dig through the source code for answers, probably due to insufficient documentation. This ought to be easier.
Anyway, I like what you are doing here and I hope llms.py will keep improving. For now, I just want to leave feedback from my test drive. Feel free to discard any part that does not align with your goals.