
Log output from serving models for easier debugging #18

Merged
merged 1 commit into GoogleCloudPlatform:main from loggy_logs on Feb 16, 2024

Conversation

bobcatfish
Collaborator

llama-cpp-python uses uvicorn, and it turns out the models can be served by invoking uvicorn directly, which makes it possible to pass a logging configuration to uvicorn. Unfortunately, logs that come directly from llama_cpp are written to stderr and that is not configurable
(https://github.com/abetlen/llama-cpp-python/blob/ae71ad1a147b10c2c3ba99eb086521cddcc4fad4/llama_cpp/_logger.py#L30).

This change installs the llm tool in the workstations image with a default logging configuration file which writes logs to /var/log/localllm.log. I don't love how different the story is if you run the tool directly (you have to go out of your way to get the logging), but this seems okay for now at least.

The content of the logging config is from
https://gist.github.com/liviaerxin/d320e33cbcddcc5df76dd92948e5be3b

Fixes #16
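For reference, here is a minimal sketch of the approach described above: serving the model by calling uvicorn directly and handing it a logging configuration that writes to /var/log/localllm.log. The llama_cpp.server imports and settings names below are assumptions about the module layout of the pinned llama-cpp-python version, and the config dict is a simplified stand-in for the gist-based file; the real code lives in llm-tool/modelserving.py. uvicorn itself accepts either a dict or a path to a config file via log_config.

```python
# Sketch only: serve a model via uvicorn directly so a logging config can be
# passed. The llama_cpp.server imports are assumed, not verified against the
# pinned version; the real code is in llm-tool/modelserving.py.
import uvicorn
from llama_cpp.server.app import create_app      # assumed import path
from llama_cpp.server.settings import Settings   # assumed import path

# Minimal dictConfig-style logging config (the real one comes from the gist
# linked above) that sends uvicorn's logs to /var/log/localllm.log.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": "/var/log/localllm.log",
            "formatter": "default",
        },
    },
    "root": {"handlers": ["file"], "level": "INFO"},
}

def serve(model_path: str, port: int) -> None:
    app = create_app(settings=Settings(model=model_path))
    # uvicorn accepts either a dict or a path to a JSON/YAML file here, which
    # is what makes the file-based default config in the image possible.
    uvicorn.run(app, host="localhost", port=port, log_config=LOG_CONFIG)
```

Inside the workstation image the same idea applies with the installed default config file; running the tool directly means supplying the config yourself, which is the awkwardness mentioned above.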

@bobcatfish bobcatfish force-pushed the loggy_logs branch 2 times, most recently from 2466906 to d817835 on February 15, 2024 at 18:53
bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 15, 2024
llama-cpp-python has a flag 'verbose' which defaults to true and, when
set, causes it to write things to stderr. It doesn't include any way to
configure where these logs are directed, so it's stderr or nothing.

Unfortunately, when we start the process running llama-cpp-python, we
provide a pipe for stderr and then promptly close it. This means that if
llama-cpp-python tries to write to stderr, a broken pipe exception is
thrown, which happens, for example, when there is a prefix cache hit while
processing a prompt
(https://github.com/abetlen/llama-cpp-python/blob/ae71ad1a147b10c2c3ba99eb086521cddcc4fad4/llama_cpp/llama.py#L645).
This likely explains why people are seeing 500s the second time
they try to run the same prompt. There are other situations that can
make llama-cpp-python try to write to stderr as well, which may also
cause 500s.

The real fix here is to a) not provide a broken pipe for stderr and
b) for llama-cpp-python to allow us to configure logs (GoogleCloudPlatform#18). For now
we can disable verbose mode in llama-cpp-python, since we're not making
those logs available anyway, and it should stop the 500s.

Fixes GoogleCloudPlatform#7
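To make the failure mode concrete, here is a small standalone sketch (not the project's code) of what happens when a stderr pipe is requested and then closed. The child process stands in for llama-cpp-python writing to stderr with verbose enabled; the log message is a placeholder, not the exact llama.py string.

```python
# Standalone reproduction sketch of the broken-pipe failure mode (POSIX).
# The child stands in for the serving process writing to stderr when
# verbose=True (e.g. on a prefix cache hit).
import subprocess
import sys

child_code = (
    "import sys, time\n"
    "time.sleep(1)\n"
    "try:\n"
    "    sys.stderr.write('prefix cache hit\\n')  # placeholder message\n"
    "except BrokenPipeError:\n"
    "    # This is the exception the serving process hit, surfacing as a 500.\n"
    "    sys.exit(1)\n"
)

# Problematic pattern: request a pipe for stderr, then close our read end.
proc = subprocess.Popen([sys.executable, "-c", child_code], stderr=subprocess.PIPE)
proc.stderr.close()
print("child exit code:", proc.wait())  # 1: the write to stderr failed

# Interim mitigation from this commit: construct the model with verbose
# disabled so llama-cpp-python never writes to the (broken) stderr pipe, e.g.
#   Llama(model_path=..., verbose=False)
```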
Review comments on llm-tool/modelserving.py (resolved)
@bobcatfish bobcatfish merged commit 8e34842 into GoogleCloudPlatform:main Feb 16, 2024
2 checks passed
@bobcatfish
Collaborator Author

thanks for the review @jerop!! i made the changes you suggested and then... FORGOT TO PUSH THEM, so i'll add them into #19 instead

bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 16, 2024
bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 16, 2024
Addressing feedback from @jerop in GoogleCloudPlatform#18:
- Making the docstring more accurate for listing running processes
- Making the check for whether log_config is provided more Pythonic (see the sketch below)
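A hypothetical illustration of the second point; the function and variable names below are made up and do not match llm-tool/modelserving.py, but they show the kind of truthiness-based check that "more Pythonic" refers to.

```python
# Hypothetical illustration of the review feedback; names do not match the
# real code in llm-tool/modelserving.py.
from typing import Optional

import uvicorn


def run_server(app, log_config: Optional[str] = None) -> None:
    # Before (roughly): if log_config != None: ...
    # After: rely on truthiness, the idiomatic check for "was this provided?"
    if log_config:
        uvicorn.run(app, log_config=log_config)
    else:
        uvicorn.run(app)
```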
jerop pushed a commit that referenced this pull request Feb 16, 2024
jerop pushed a commit that referenced this pull request Feb 16, 2024
Successfully merging this pull request may close these issues: Capture logs from running models (#16)