fix: Remove streaming LLM tracking; they are all streaming now #4944
Conversation
All streaming-related changes look good to me. I have one change request regarding the added newline "\n". It means we can also change the example(s), if I am not mistaken. That's all. 🙂
haystack/agents/base.py (outdated)
@@ -318,7 +317,7 @@ def on_agent_start(**kwargs: Any) -> None:
             self.callback_manager.on_new_token += lambda token, **kwargs: print_text(token, color=agent_color)
         else:
             self.callback_manager.on_agent_step += lambda agent_step: print_text(
-                agent_step.prompt_node_response, color=agent_color
+                agent_step.prompt_node_response + "\n", color=agent_color
This change is not related to streaming, but it means we can slightly simplify the print(result) call in the Agent examples, right? For example, here: https://github.com/deepset-ai/haystack/blob/main/examples/agent_multihop_qa.py#L112
Instead of

result = agent.run(query=question)
print(f"\n{result}")

we can just do

print(result)

at the end.
@vblagoje If I run the Agent example from this branch, the "\n" is still needed:

result = agent.run(query=question)
print(f"\n{result}")

Could you please briefly explain the effect of the end="\n" change in this PR?
Sure thing. The effect is best observed when running a non-streaming agent. Simply switch the Agent to non-streaming in its init method and run the multi-hop examples with and without the end="\n" fix. The fix has no effect on streaming agents, only on non-streaming ones!
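For anyone who wants to reproduce this, here is a minimal sketch. It assumes the non-streaming switch lands as a streaming keyword on Agent.__init__ (the exact name may differ in the final PR); the model name, API key, and question are placeholders, and tools are omitted for brevity.

from haystack.agents import Agent
from haystack.nodes import PromptNode

# Placeholder model and key; the real examples also register tools on the Agent.
prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key="YOUR_OPENAI_KEY")
agent = Agent(prompt_node=prompt_node, streaming=False)  # assumed non-streaming switch

result = agent.run(query="Which city is the capital of the country where the Matterhorn is located?")

# Before the fix, the non-streaming callback printed the step response without a
# trailing newline, so the examples had to prepend one: print(f"\n{result}").
# With agent_step.prompt_node_response + "\n", a plain print is enough:
print(result)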
Force-pushed from 75fe82f to cf0183f
Looks good to me! 👍 Just tried it in an example and a tutorial. Thanks for explaining the "\n".
Related Issues
After HF introduced token streaming in the transformers 4.28.0 release, all local HF LLMs are now streaming-capable. All other PromptNode-supported models, including OpenAI, Cohere, and Anthropic, are already streaming-capable. With the recent additions to the HF Inference API, all remote HF LLMs are streaming as well.
There is no point in tracking streaming-capable models any longer. By default, streaming is on. If, for some reason, the Agent needs to be non-streaming, we provide a boolean parameter switch.
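As a sketch of what that switch looks like from the caller's side (assuming it is exposed as a streaming argument on Agent.__init__; model name and key are placeholders):

from haystack.agents import Agent
from haystack.nodes import PromptNode

prompt_node = PromptNode(model_name_or_path="gpt-3.5-turbo", api_key="YOUR_OPENAI_KEY")

# Streaming is on by default: tokens are printed as they are generated.
agent = Agent(prompt_node=prompt_node)

# Opt out explicitly when the Agent should emit each step's full response at once.
non_streaming_agent = Agent(prompt_node=prompt_node, streaming=False)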
Proposed Changes:
How did you test it?
Manual tests
Notes for the reviewer
Let's manually test multiple models in both modes (streaming and non-streaming).
Checklist
PR title starts with one of the conventional commit prefixes: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.