[DOC] Add reasoning capability to vLLM streamlit code #19557


Merged
merged 3 commits into vllm-project:main on Jun 16, 2025

Conversation

@Navanit-git Navanit-git (Contributor) commented Jun 12, 2025

Purpose

Added reasoning/thinking process visualization to the vLLM Chat Assistant Streamlit interface.
This enhancement allows users to:

  • View the model's thought process in real-time
  • Enable/disable reasoning display via UI toggle
  • Persist reasoning history across chat sessions
  • Auto-detect model reasoning capability

Key changes:

  1. Added reasoning state management using Streamlit session state
  2. Implemented streaming display of model's thinking process
  3. Added dynamic reasoning support detection
  4. Enhanced message history with reasoning context
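
For a quick picture of changes 1 and 4, here is a minimal illustrative sketch (not the actual diff; names mirror the PR) of keeping per-message reasoning in Streamlit session state and replaying it when the chat history is rendered:

    import streamlit as st

    # Hypothetical store: {message_index: reasoning_text}, kept in session state
    # so it survives Streamlit reruns within a browser session.
    if "show_reasoning" not in st.session_state:
        st.session_state.show_reasoning = {}
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {"role": "assistant", "content": "42."}  # placeholder history entry
        ]

    # Sidebar toggle; in the PR this is only offered when the model supports reasoning.
    show_reasoning = st.sidebar.checkbox("Enable Reasoning", value=False)

    # Pretend the assistant message at index 0 produced some reasoning text.
    if show_reasoning:
        st.session_state.show_reasoning[0] = "Example thinking process..."

    # Replay history: stored reasoning is shown in an expander above the reply.
    for idx, msg in enumerate(st.session_state.messages):
        if idx in st.session_state.show_reasoning:
            with st.expander("💭 Thinking Process", expanded=False):
                st.markdown(st.session_state.show_reasoning[idx])
        with st.chat_message(msg["role"]):
            st.write(msg["content"])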

Test Screenshots

When the model has a reasoning parser:
[screenshot]

When the model has no reasoning parser:
[screenshot]

Response while using the thinking process:
[screenshot]
[screenshot]


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Navanit-git, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the vLLM Streamlit chatbot interface by adding a feature to visualize the model's internal reasoning process. This provides users with greater transparency into how the model arrives at its answers, making the interaction more insightful. The implementation includes streaming updates, session persistence, and automatic detection of model support for this feature.

Highlights

  • Reasoning Display: Added the capability to display the model's internal 'thinking process' or reasoning alongside its final response in the Streamlit UI.
  • Streaming & UI: Implemented streaming display for both the reasoning process and the final content, showing the thinking process in a collapsible expander above the main response.
  • Session Management: Integrated reasoning display state and history persistence using Streamlit's session state, ensuring the thinking process is saved and displayed for past messages.
  • Feature Toggle & Detection: Added a sidebar toggle to enable/disable the reasoning display and included logic to auto-detect if the currently loaded model supports returning reasoning content.
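
A condensed sketch of the streaming pattern these highlights describe (an illustration, not the exact PR code), assuming a vLLM OpenAI-compatible server at http://localhost:8000/v1 whose reasoning parser populates reasoning_content on streamed deltas:

    import streamlit as st
    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
    model = client.models.list().data[0].id

    reasoning_ph = st.empty()  # live reasoning expander goes here (above the answer)
    content_ph = st.empty()    # final answer streams here (below the expander)
    live = reasoning_ph.expander("💭 Thinking Process (live)", expanded=True).empty()

    think, answer = "", ""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):  # reasoning streams first
            think += delta.reasoning_content
            live.markdown(think + "▌")
        if getattr(delta, "content", None):            # then the visible answer
            answer += delta.content
            content_ph.markdown(answer + "▌")
    live.markdown(think)
    content_ph.markdown(answer)
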
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the documentation Improvements or additions to documentation label Jun 12, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds an insightful reasoning visualization feature to the vLLM Streamlit chat interface. The implementation includes dynamic detection of model reasoning capabilities, UI toggles, and updates to the chat history display. Key improvements involve comprehensive docstrings and clear session state management for the new features.

I've identified a couple of areas for improvement:

  • A potential NameError related to the reasoning toggle needs to be addressed.
  • The persistence of reasoning history across chat sessions requires a modification to ensure each session's reasoning is stored and retrieved correctly.
  • Some minor docstring clarifications are also suggested.

Overall, these changes significantly enhance the chatbot's utility for users wanting to understand the model's thought process.
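
As a small aside, a hypothetical illustration of the NameError risk flagged above (not code from this PR): if reason were assigned only inside the branch that shows the toggle, later references would fail for models without reasoning support, which is why the merged code defaults it to False first.

    supports_reasoning = False  # e.g. the capability probe returned False for this model

    reason = False  # default first, so the name exists even when the toggle is never shown
    if supports_reasoning:
        reason = True  # in the app this is st.sidebar.checkbox("Enable Reasoning")

    print(f"reasoning enabled: {reason}")  # safe in both branches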

        - current_session: Sets to new session ID
        - active_session: Sets to new session ID
        - messages: Resets to empty list
    """
    session_id = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    st.session_state.sessions[session_id] = []

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when a new chat session is created, you should also initialize the reasoning store for this new session_id.

Add the following line within create_new_chat_session after st.session_state.sessions[session_id] = []:

Suggested change:
-    st.session_state.sessions[session_id] = []
+    st.session_state.sessions[session_id] = []
+    st.session_state.reasoning_store[session_id] = {}  # Initialize reasoning for this new session

Comment on lines +245 to +247
if idx in st.session_state.show_reasoning:
    with st.expander("💭 Thinking Process", expanded=False):
        st.markdown(st.session_state.show_reasoning[idx])

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when displaying reasoning, you need to access the session-specific reasoning store.

Assuming st.session_state.reasoning_store is now a dictionary mapping session_id to message_idx: reasoning_text dictionaries, the check and access should be updated.

Suggested change:
-if idx in st.session_state.show_reasoning:
-    with st.expander("💭 Thinking Process", expanded=False):
-        st.markdown(st.session_state.show_reasoning[idx])
+current_session_id = st.session_state.current_session
+if current_session_id in st.session_state.reasoning_store and \
+        idx in st.session_state.reasoning_store[current_session_id]:
+    with st.expander("💭 Thinking Process", expanded=False):
+        st.markdown(st.session_state.reasoning_store[current_session_id][idx])

Comment on lines +308 to +309
if reason and think:
    st.session_state.show_reasoning[message_index] = think

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when persisting reasoning, it should be stored in the session-specific reasoning store.

Assuming st.session_state.reasoning_store is used as suggested, update this line to save reasoning to the correct session's store.

Suggested change:
-if reason and think:
-    st.session_state.show_reasoning[message_index] = think
+if reason and think:
+    current_session_id = st.session_state.current_session
+    st.session_state.reasoning_store[current_session_id][message_index] = think

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@Navanit-git (Contributor, Author):

[screenshot]
There seems to be no error for multiple sessions with reasoning.

@Navanit-git (Contributor, Author):

@DarkLight1337 Kindly review it please

@DarkLight1337 DarkLight1337 requested review from mgoin and aarnphm June 13, 2025 07:29
@Navanit-git Navanit-git changed the title Add reasoning capability to vLLM streamlit code [DOC] Add reasoning capability to vLLM streamlit code Jun 13, 2025
@Navanit-git (Contributor, Author):

@mgoin and @aarnphm any review is appreciated.

@mgoin mgoin (Member) left a comment

Thank you for the improvements!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 16, 2025
@mgoin mgoin merged commit 3e75069 into vllm-project:main Jun 16, 2025
54 checks passed
@lys791227:

The following changes are recommended to solve the issue of reasoning being overwritten across multiple sessions:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
vLLM Chat Assistant - A Streamlit Web Interface

A streamlined chat interface that quickly integrates
with vLLM API server.

Features:
- Multiple chat sessions management
- Streaming response display
- Configurable API endpoint
- Real-time chat history
- Reasoning Display: Optional thinking process visualization

Requirements:
    pip install streamlit openai

Usage:
    # Start the app with default settings
    streamlit run streamlit_openai_chatbot_webserver.py

    # Start with custom vLLM API endpoint
    VLLM_API_BASE="http://your-server:8000/v1" \
        streamlit run streamlit_openai_chatbot_webserver.py

    # Enable debug mode
    streamlit run streamlit_openai_chatbot_webserver.py \
        --logger.level=debug

"""

import os
from datetime import datetime

import streamlit as st
from openai import OpenAI

# Get command line arguments from environment variables

openai_api_key = os.getenv("VLLM_API_KEY", "EMPTY")
openai_api_base = os.getenv("VLLM_API_BASE", "http://localhost:8000/v1")

# Initialize session states for managing chat sessions
if "sessions" not in st.session_state:
    st.session_state.sessions = {}

if "current_session" not in st.session_state:
    st.session_state.current_session = None

if "messages" not in st.session_state:
    st.session_state.messages = []

if "active_session" not in st.session_state:
    st.session_state.active_session = None

# Add new session state for reasoning - changed to per-session storage
if "show_reasoning" not in st.session_state:
    st.session_state.show_reasoning = {}  # Format: {session_id: {message_index: reasoning_text}}

# Initialize session state for API base URL
if "api_base_url" not in st.session_state:
    st.session_state.api_base_url = openai_api_base

def create_new_chat_session():
    """Create a new chat session with timestamp as unique identifier.

    This function initializes a new chat session by:
    1. Generating a timestamp-based session ID
    2. Creating an empty message list for the new session
    3. Setting the new session as both current and active session
    4. Resetting the messages list for the new session

    Returns:
        None

    Session State Updates:
        - sessions: Adds new empty message list with timestamp key
        - current_session: Sets to new session ID
        - active_session: Sets to new session ID
        - messages: Resets to empty list
    """
    session_id = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    st.session_state.sessions[session_id] = []
    st.session_state.current_session = session_id
    st.session_state.active_session = session_id
    st.session_state.messages = []

def switch_to_chat_session(session_id):
    """Switch the active chat context to a different session.

    Args:
        session_id (str): The timestamp ID of the session to switch to

    This function handles chat session switching by:
    1. Setting the specified session as current
    2. Updating the active session marker
    3. Loading the messages history from the specified session

    Session State Updates:
        - current_session: Updated to specified session_id
        - active_session: Updated to specified session_id
        - messages: Loaded from sessions[session_id]
    """
    st.session_state.current_session = session_id
    st.session_state.active_session = session_id
    st.session_state.messages = st.session_state.sessions[session_id]

def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None):
    """Generate and stream LLM response with optional reasoning process.

    Args:
        messages (list): List of conversation message dicts with 'role' and 'content'
        model (str): The model identifier to use for generation
        reason (bool): Whether to enable and display reasoning process
        content_ph (streamlit.empty): Placeholder for streaming response content
        reasoning_ph (streamlit.empty): Placeholder for streaming reasoning process

    Returns:
        tuple: (str, str)
            - First string contains the complete response text
            - Second string contains the complete reasoning text (if enabled)

    Features:
        - Streams both reasoning and response text in real-time
        - Handles model API errors gracefully
        - Supports live updating of thinking process
        - Maintains separate content and reasoning displays

    Raises:
        Exception: Wrapped in error message if API call fails

    Note:
        The function uses streamlit placeholders for live updates.
        When reason=True, the reasoning process appears above the response.
    """
    full_text = ""
    think_text = ""
    live_think = None
    # Build request parameters
    params = {"model": model, "messages": messages, "stream": True}
    if reason:
        params["extra_body"] = {"chat_template_kwargs": {"enable_thinking": True}}

    try:
        response = client.chat.completions.create(**params)
        if isinstance(response, str):
            if content_ph:
                content_ph.markdown(response)
            return response, ""

        # Prepare reasoning expander above content
        if reason and reasoning_ph:
            exp = reasoning_ph.expander("💭 Thinking Process (live)", expanded=True)
            live_think = exp.empty()

        # Stream chunks
        for chunk in response:
            delta = chunk.choices[0].delta
            # Stream reasoning first
            if reason and hasattr(delta, "reasoning_content") and live_think:
                rc = delta.reasoning_content
                if rc:
                    think_text += rc
                    live_think.markdown(think_text + "▌")
            # Then stream content
            if hasattr(delta, "content") and delta.content and content_ph:
                full_text += delta.content
                content_ph.markdown(full_text + "▌")

        # Finalize displays: reasoning remains above, content below
        if reason and live_think:
            live_think.markdown(think_text)
        if content_ph:
            content_ph.markdown(full_text)

        return full_text, think_text
    except Exception as e:
        st.error(f"Error details: {str(e)}")
        return f"Error: {str(e)}", ""

# Sidebar - API Settings first
st.sidebar.title("API Settings")
new_api_base = st.sidebar.text_input(
    "API Base URL:", value=st.session_state.api_base_url
)
if new_api_base != st.session_state.api_base_url:
    st.session_state.api_base_url = new_api_base
    st.rerun()

st.sidebar.divider()

# Sidebar - Session Management
st.sidebar.title("Chat Sessions")
if st.sidebar.button("New Session"):
    create_new_chat_session()

# Display all sessions in reverse chronological order
for session_id in sorted(st.session_state.sessions.keys(), reverse=True):
    # Mark the active session with a pinned button
    if session_id == st.session_state.active_session:
        st.sidebar.button(
            f"📍 {session_id}",
            key=session_id,
            type="primary",
            on_click=switch_to_chat_session,
            args=(session_id,),
        )
    else:
        st.sidebar.button(
            f"Session {session_id}",
            key=session_id,
            on_click=switch_to_chat_session,
            args=(session_id,),
        )

# Main interface
st.title("vLLM Chat Assistant")

# Initialize OpenAI client with API settings
client = OpenAI(api_key=openai_api_key, base_url=st.session_state.api_base_url)

# Get and display current model id
models = client.models.list()
model = models.data[0].id
st.markdown(f"Model: {model}")

# Initialize first session if none exists
if st.session_state.current_session is None:
    create_new_chat_session()
    st.session_state.active_session = st.session_state.current_session

# Update the chat history display section
for idx, msg in enumerate(st.session_state.messages):
    # Render user messages normally
    if msg["role"] == "user":
        with st.chat_message("user"):
            st.write(msg["content"])
    # Render assistant messages with reasoning above
    else:
        # If reasoning exists for this assistant message, show it above the content
        current_session_id = st.session_state.current_session
        if (current_session_id in st.session_state.show_reasoning and
                idx in st.session_state.show_reasoning[current_session_id]):
            with st.expander("💭 Thinking Process", expanded=False):
                st.markdown(st.session_state.show_reasoning[current_session_id][idx])
        with st.chat_message("assistant"):
            st.write(msg["content"])

# Setup & Cache reasoning support check
@st.cache_data(show_spinner=False)
def server_supports_reasoning():
    """Check if the current model supports reasoning capability.

    Returns:
        bool: True if the model supports reasoning, False otherwise
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hi"}],
        stream=False,
    )
    return hasattr(resp.choices[0].message, "reasoning_content") and bool(
        resp.choices[0].message.reasoning_content
    )

# Check support
supports_reasoning = server_supports_reasoning()

# Add reasoning toggle in sidebar if supported
reason = False  # Default to False
if supports_reasoning:
    reason = st.sidebar.checkbox("Enable Reasoning", value=False)
else:
    st.sidebar.markdown(
        "Reasoning unavailable for this model.",
        unsafe_allow_html=True,
    )
    # reason remains False

# Update the input handling section
if prompt := st.chat_input("Type your message here..."):
    # Save and display user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.sessions[st.session_state.current_session] = (
        st.session_state.messages
    )
    with st.chat_message("user"):
        st.write(prompt)

    # Prepare LLM messages
    msgs = [
        {"role": m["role"], "content": m["content"]} for m in st.session_state.messages
    ]

    # Stream assistant response
    with st.chat_message("assistant"):
        # Placeholders: reasoning above, content below
        reason_ph = st.empty()
        content_ph = st.empty()
        full, think = get_llm_response(msgs, model, reason, content_ph, reason_ph)
        # Determine index for this new assistant message
        message_index = len(st.session_state.messages)
        # Save assistant reply
        st.session_state.messages.append({"role": "assistant", "content": full})
        # Persist reasoning in session state if any
        if reason and think:
            current_session_id = st.session_state.current_session
            # Ensure the reasoning store for the current session is initialized
            if current_session_id not in st.session_state.show_reasoning:
                st.session_state.show_reasoning[current_session_id] = {}
            st.session_state.show_reasoning[current_session_id][message_index] = think
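
For context on the fix above, a tiny standalone illustration (not part of the pasted file) of the overwrite that per-session keying avoids:

    # Flat store keyed only by message index: a second session clobbers the first.
    flat_store = {}
    flat_store[0] = "reasoning for session A, message 0"
    flat_store[0] = "reasoning for session B, message 0"  # session A's entry is lost

    # Per-session store, as in the code above: entries from different sessions coexist.
    per_session = {}
    per_session.setdefault("session A", {})[0] = "reasoning for session A, message 0"
    per_session.setdefault("session B", {})[0] = "reasoning for session B, message 0"
    assert per_session["session A"][0] != per_session["session B"][0]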

yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Jun 24, 2025
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025