
Log output from serving models for easier debugging #18

Merged
merged 1 commit into GoogleCloudPlatform:main from loggy_logs on Feb 16, 2024

Conversation

bobcatfish
Collaborator

llama-cpp-python uses uvicorn, and it turns out the models can be served by invoking uvicorn directly, which makes it possible to pass a logging configuration to uvicorn. Unfortunately, logs that come directly from llama_cpp are written to stderr and that is not configurable
(https://github.com/abetlen/llama-cpp-python/blob/ae71ad1a147b10c2c3ba99eb086521cddcc4fad4/llama_cpp/_logger.py#L30).

This change installs the llm tool in the workstations image with a default logging configuration file which writes logs to /var/log/localllm.log. I don't love how different the story is if you run the tool directly (you have to go out of your way to get the logging), but this seems okay for now at least.

The content of the logging config is from
https://gist.github.com/liviaerxin/d320e33cbcddcc5df76dd92948e5be3b

Fixes #16
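For reference, here is a minimal sketch of the approach described above: serving the model by calling uvicorn directly and handing it a logging configuration that writes to /var/log/localllm.log. The llama_cpp.server imports and settings names below are assumptions about the module layout of the pinned llama-cpp-python version, and the config dict is a simplified stand-in for the gist-based file; the real code lives in llm-tool/modelserving.py. uvicorn itself accepts either a dict or a path to a config file via log_config.

```python
# Sketch only: serve a model via uvicorn directly so a logging config can be
# passed. The llama_cpp.server imports are assumed, not verified against the
# pinned version; the real code is in llm-tool/modelserving.py.
import uvicorn
from llama_cpp.server.app import create_app      # assumed import path
from llama_cpp.server.settings import Settings   # assumed import path

# Minimal dictConfig-style logging config (the real one comes from the gist
# linked above) that sends uvicorn's logs to /var/log/localllm.log.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": "/var/log/localllm.log",
            "formatter": "default",
        },
    },
    "root": {"handlers": ["file"], "level": "INFO"},
}

def serve(model_path: str, port: int) -> None:
    app = create_app(settings=Settings(model=model_path))
    # uvicorn accepts either a dict or a path to a JSON/YAML file here, which
    # is what makes the file-based default config in the image possible.
    uvicorn.run(app, host="localhost", port=port, log_config=LOG_CONFIG)
```

Inside the workstation image the same idea applies with the installed default config file; running the tool directly means supplying the config yourself, which is the awkwardness mentioned above.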

@bobcatfish bobcatfish force-pushed the loggy_logs branch 2 times, most recently from 2466906 to d817835 on February 15, 2024 at 18:53
bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 15, 2024
llama-cpp-python has a flag 'verbose' which defaults to true and, when
set, causes it to write things to stderr. It doesn't include any way to
configure where these logs are directed, so it's stderr or nothing.

Unfortunately, when we start the process running llama-cpp-python, we
provide a pipe for stderr and then promptly close it. This means that if
llama-cpp-python tries to write to stderr, a broken pipe exception is
thrown, which happens, for example, when there is a prefix cache hit while
processing a prompt
(https://github.com/abetlen/llama-cpp-python/blob/ae71ad1a147b10c2c3ba99eb086521cddcc4fad4/llama_cpp/llama.py#L645).
This likely explains why people are seeing 500s the second time
they try to run the same prompt. There are other situations that can
make llama-cpp-python try to write to stderr as well, which may also
cause 500s.

The real fix here is to a) not provide a broken pipe for stderr and
b) for llama-cpp-python to allow us to configure logs (GoogleCloudPlatform#18). For now
we can disable verbose mode in llama-cpp-python, since we're not making
those logs available anyway, and it should stop the 500s.

Fixes GoogleCloudPlatform#7
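To make the failure mode concrete, here is a small standalone sketch (not the project's code) of what happens when a stderr pipe is requested and then closed. The child process stands in for llama-cpp-python writing to stderr with verbose enabled; the log message is a placeholder, not the exact llama.py string.

```python
# Standalone reproduction sketch of the broken-pipe failure mode (POSIX).
# The child stands in for the serving process writing to stderr when
# verbose=True (e.g. on a prefix cache hit).
import subprocess
import sys

child_code = (
    "import sys, time\n"
    "time.sleep(1)\n"
    "try:\n"
    "    sys.stderr.write('prefix cache hit\\n')  # placeholder message\n"
    "except BrokenPipeError:\n"
    "    # This is the exception the serving process hit, surfacing as a 500.\n"
    "    sys.exit(1)\n"
)

# Problematic pattern: request a pipe for stderr, then close our read end.
proc = subprocess.Popen([sys.executable, "-c", child_code], stderr=subprocess.PIPE)
proc.stderr.close()
print("child exit code:", proc.wait())  # 1: the write to stderr failed

# Interim mitigation from this commit: construct the model with verbose
# disabled so llama-cpp-python never writes to the (broken) stderr pipe, e.g.
#   Llama(model_path=..., verbose=False)
```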
Review comments on llm-tool/modelserving.py (resolved)
@bobcatfish bobcatfish merged commit 8e34842 into GoogleCloudPlatform:main Feb 16, 2024
2 checks passed
@bobcatfish
Collaborator Author

thanks for the review @jerop!! i made the changes you suggested and then... FORGOT TO PUSH THEM, so i'll add them into #19 instead

bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 16, 2024
bobcatfish added a commit to bobcatfish/localllm-gcp that referenced this pull request Feb 16, 2024
Addressing feedback from @jerop in GoogleCloudPlatform#18:
- Making the docstring more accurate for listing running processes
- Making the check for whether log_config is provided more Pythonic (see the sketch below)
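A hypothetical illustration of the second point; the function and variable names below are made up and do not match llm-tool/modelserving.py, but they show the kind of truthiness-based check that "more Pythonic" refers to.

```python
# Hypothetical illustration of the review feedback; names do not match the
# real code in llm-tool/modelserving.py.
from typing import Optional

import uvicorn


def run_server(app, log_config: Optional[str] = None) -> None:
    # Before (roughly): if log_config != None: ...
    # After: rely on truthiness, the idiomatic check for "was this provided?"
    if log_config:
        uvicorn.run(app, log_config=log_config)
    else:
        uvicorn.run(app)
```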
jerop pushed a commit that referenced this pull request Feb 16, 2024
jerop pushed a commit that referenced this pull request Feb 16, 2024
Successfully merging this pull request may close these issues: Capture logs from running models (#16)