[DOC] Add reasoning capability to vLLM streamlit code #19557
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small and essential subset of CI tests runs automatically to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR or enable auto-merge. 🚀
Summary of Changes
Hello @Navanit-git, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances the vLLM Streamlit chatbot interface by adding a feature to visualize the model's internal reasoning process. This provides users with greater transparency into how the model arrives at its answers, making the interaction more insightful. The implementation includes streaming updates, session persistence, and automatic detection of model support for this feature.
Highlights
- Reasoning Display: Added the capability to display the model's internal 'thinking process' or reasoning alongside its final response in the Streamlit UI.
- Streaming & UI: Implemented streaming display for both the reasoning process and the final content, showing the thinking process in a collapsible expander above the main response.
- Session Management: Integrated reasoning display state and history persistence using Streamlit's session state, ensuring the thinking process is saved and displayed for past messages.
- Feature Toggle & Detection: Added a sidebar toggle to enable/disable the reasoning display and included logic to auto-detect if the currently loaded model supports returning reasoning content.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request adds an insightful reasoning visualization feature to the vLLM Streamlit chat interface. The implementation includes dynamic detection of model reasoning capabilities, UI toggles, and updates to the chat history display. Key improvements involve comprehensive docstrings and clear session state management for the new features.
I've identified a couple of areas for improvement:
- A potential `NameError` related to the reasoning toggle needs to be addressed.
- The persistence of reasoning history across chat sessions requires a modification to ensure each session's reasoning is stored and retrieved correctly.
- Some minor docstring clarifications are also suggested.
Overall, these changes significantly enhance the chatbot's utility for users wanting to understand the model's thought process.
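On the first point, the usual defensive pattern is to bind the toggle variable to a default before the conditional widget so the name always exists, even when the model has no reasoning parser. A minimal sketch, with `supports_reasoning` assumed to come from a cached capability check as in the code later in this thread:

```python
# Default first so `reason` is always defined, avoiding a NameError when the
# sidebar checkbox below is never created (model lacks a reasoning parser).
reason = False
if supports_reasoning:
    reason = st.sidebar.checkbox("Enable Reasoning", value=True)
```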
```python
        - current_session: Sets to new session ID
        - active_session: Sets to new session ID
        - messages: Resets to empty list
    """
    session_id = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    st.session_state.sessions[session_id] = []
```
Continuing from the feedback on line 57 regarding reasoning persistence: when a new chat session is created, you should also initialize the reasoning store for this new `session_id`.
Add the following line within `create_new_chat_session` after `st.session_state.sessions[session_id] = []`:
```diff
 st.session_state.sessions[session_id] = []
+st.session_state.reasoning_store[session_id] = {}  # Initialize reasoning for this new session
```
```python
if idx in st.session_state.show_reasoning:
    with st.expander("💭 Thinking Process", expanded=False):
        st.markdown(st.session_state.show_reasoning[idx])
```
Continuing from the feedback on line 57 regarding reasoning persistence: when displaying reasoning, you need to access the session-specific reasoning store.
Assuming `st.session_state.reasoning_store` is now a dictionary mapping `session_id` to `message_idx: reasoning_text` dictionaries, the check and access should be updated.
```diff
-if idx in st.session_state.show_reasoning:
-    with st.expander("💭 Thinking Process", expanded=False):
-        st.markdown(st.session_state.show_reasoning[idx])
+current_session_id = st.session_state.current_session
+if current_session_id in st.session_state.reasoning_store and \
+        idx in st.session_state.reasoning_store[current_session_id]:
+    with st.expander("💭 Thinking Process", expanded=False):
+        st.markdown(st.session_state.reasoning_store[current_session_id][idx])
```
```python
if reason and think:
    st.session_state.show_reasoning[message_index] = think
```
Continuing from the feedback on line 57 regarding reasoning persistence: when persisting reasoning, it should be stored in the session-specific reasoning store.
Assuming `st.session_state.reasoning_store` is used as suggested, update this line to save reasoning to the correct session's store.
```diff
 if reason and think:
-    st.session_state.show_reasoning[message_index] = think
+    current_session_id = st.session_state.current_session
+    st.session_state.reasoning_store[current_session_id][message_index] = think
```
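These suggestions assume `st.session_state.reasoning_store` has already been initialized. A minimal sketch of the top-level initialization that would pair with them (this exact line is an assumption, not part of the original diff):

```python
# One reasoning dict per chat session: {session_id: {message_idx: reasoning_text}}
if "reasoning_store" not in st.session_state:
    st.session_state.reasoning_store = {}
```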
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@DarkLight1337 Kindly review it, please.
Thank you for the improvements!
The following changes are recommended to solve the issue of multiple sessions with reasoning coverage (reproduced below in outline; elided bodies are marked with `...`):

```python
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
A streamlined chat interface that quickly integrates ...

Features:
    ...

Requirements:
    ...
"""
import os

import streamlit as st
from openai import OpenAI

# Get command line arguments from environment variables
openai_api_key = os.getenv("VLLM_API_KEY", "EMPTY")

# Initialize session states for managing chat sessions
if "sessions" not in st.session_state:
    ...
if "current_session" not in st.session_state:
    ...
if "messages" not in st.session_state:
    ...
if "active_session" not in st.session_state:
    ...

# Add new session state for reasoning - changed to store it per session
if "show_reasoning" not in st.session_state:
    ...

# Initialize session state for API base URL
if "api_base_url" not in st.session_state:
    ...


def create_new_chat_session():
    ...


def switch_to_chat_session(session_id):
    ...


def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None):
    ...


# Sidebar - API Settings first
st.sidebar.title("API Settings")
st.sidebar.divider()

# Sidebar - Session Management
st.sidebar.title("Chat Sessions")

# Display all sessions in reverse chronological order
for session_id in sorted(st.session_state.sessions.keys(), reverse=True):
    ...

# Main interface
st.title("vLLM Chat Assistant")

# Initialize OpenAI client with API settings
client = OpenAI(api_key=openai_api_key, base_url=st.session_state.api_base_url)

# Get and display current model id
models = client.models.list()

# Initialize first session if none exists
if st.session_state.current_session is None:
    ...

# Update the chat history display section
for idx, msg in enumerate(st.session_state.messages):
    ...


# Setup & Cache reasoning support check
@st.cache_data(show_spinner=False)
def server_supports_reasoning():
    ...


# Check support
supports_reasoning = server_supports_reasoning()

# Add reasoning toggle in sidebar if supported
reason = False  # Default to False
...

# Update the input handling section
if prompt := st.chat_input("Type your message here..."):
    ...
```
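The outline above elides the body of `get_llm_response`. Below is a rough sketch of how the streaming handler could route reasoning and answer text to separate placeholders; it assumes the server exposes reasoning via a `reasoning_content` field on streamed deltas when a reasoning parser is enabled, and it is illustrative rather than the exact code in this PR.

```python
def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None):
    """Stream a chat completion, updating separate placeholders for reasoning and answer.

    Sketch only: assumes `client` is the OpenAI client created above and that the
    server returns reasoning in a `reasoning_content` delta field when enabled.
    """
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)

    think, answer = "", ""
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Reasoning tokens arrive separately from the final answer tokens.
        reasoning_piece = getattr(delta, "reasoning_content", None)
        if reason and reasoning_piece:
            think += reasoning_piece
            if reasoning_ph is not None:
                reasoning_ph.markdown(think)
        if delta.content:
            answer += delta.content
            if content_ph is not None:
                content_ph.markdown(answer)
    return answer, think
```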
Purpose
Added reasoning/thinking process visualization to the vLLM Chat Assistant Streamlit interface.
This enhancement allows users to:
- View the model's internal thinking process alongside its final response, streamed into a collapsible expander above the answer.

Key changes:
- Streamed display of both the reasoning process and the final content in the chat UI.
- Sidebar toggle to enable/disable the reasoning display, with auto-detection of whether the loaded model supports returning reasoning content.
- Reasoning history persisted in Streamlit session state so past messages keep their thinking process.
Test Screenshots

- When the model has a reasoning parser: *(screenshot attached)*
- When the model has no reasoning parser: *(screenshot attached)*
- Response while using the thinking process: *(screenshots attached)*