[DOC] Add reasoning capability to vLLM streamlit code #19557


Merged
merged 3 commits into vllm-project:main on Jun 16, 2025

Conversation

@Navanit-git Navanit-git (Contributor) commented Jun 12, 2025

Purpose

Added reasoning/thinking process visualization to the vLLM Chat Assistant Streamlit interface.
This enhancement allows users to:

  • View the model's thought process in real-time
  • Enable/disable reasoning display via UI toggle
  • Persist reasoning history across chat sessions
  • Auto-detect model reasoning capability

Key changes:

  1. Added reasoning state management using Streamlit session state
  2. Implemented streaming display of model's thinking process
  3. Added dynamic reasoning support detection
  4. Enhanced message history with reasoning context
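
For a quick picture of changes 1 and 4, here is a minimal illustrative sketch (not the actual diff; names mirror the PR) of keeping per-message reasoning in Streamlit session state and replaying it when the chat history is rendered:

    import streamlit as st

    # Hypothetical store: {message_index: reasoning_text}, kept in session state
    # so it survives Streamlit reruns within a browser session.
    if "show_reasoning" not in st.session_state:
        st.session_state.show_reasoning = {}
    if "messages" not in st.session_state:
        st.session_state.messages = [
            {"role": "assistant", "content": "42."}  # placeholder history entry
        ]

    # Sidebar toggle; in the PR this is only offered when the model supports reasoning.
    show_reasoning = st.sidebar.checkbox("Enable Reasoning", value=False)

    # Pretend the assistant message at index 0 produced some reasoning text.
    if show_reasoning:
        st.session_state.show_reasoning[0] = "Example thinking process..."

    # Replay history: stored reasoning is shown in an expander above the reply.
    for idx, msg in enumerate(st.session_state.messages):
        if idx in st.session_state.show_reasoning:
            with st.expander("💭 Thinking Process", expanded=False):
                st.markdown(st.session_state.show_reasoning[idx])
        with st.chat_message(msg["role"]):
            st.write(msg["content"])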

Test Screenshots

When the model has a reasoning parser:
[screenshot]

When the model has no reasoning parser:
[screenshot]

Response while using the thinking process:
[screenshot]
[screenshot]


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Navanit-git, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the vLLM Streamlit chatbot interface by adding a feature to visualize the model's internal reasoning process. This provides users with greater transparency into how the model arrives at its answers, making the interaction more insightful. The implementation includes streaming updates, session persistence, and automatic detection of model support for this feature.

Highlights

  • Reasoning Display: Added the capability to display the model's internal 'thinking process' or reasoning alongside its final response in the Streamlit UI.
  • Streaming & UI: Implemented streaming display for both the reasoning process and the final content, showing the thinking process in a collapsible expander above the main response.
  • Session Management: Integrated reasoning display state and history persistence using Streamlit's session state, ensuring the thinking process is saved and displayed for past messages.
  • Feature Toggle & Detection: Added a sidebar toggle to enable/disable the reasoning display and included logic to auto-detect if the currently loaded model supports returning reasoning content.
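
A condensed sketch of the streaming pattern these highlights describe (an illustration, not the exact PR code), assuming a vLLM OpenAI-compatible server at http://localhost:8000/v1 whose reasoning parser populates reasoning_content on streamed deltas:

    import streamlit as st
    from openai import OpenAI

    client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
    model = client.models.list().data[0].id

    reasoning_ph = st.empty()  # live reasoning expander goes here (above the answer)
    content_ph = st.empty()    # final answer streams here (below the expander)
    live = reasoning_ph.expander("💭 Thinking Process (live)", expanded=True).empty()

    think, answer = "", ""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):  # reasoning streams first
            think += delta.reasoning_content
            live.markdown(think + "▌")
        if getattr(delta, "content", None):            # then the visible answer
            answer += delta.content
            content_ph.markdown(answer + "▌")
    live.markdown(think)
    content_ph.markdown(answer)
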
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the documentation Improvements or additions to documentation label Jun 12, 2025

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds an insightful reasoning visualization feature to the vLLM Streamlit chat interface. The implementation includes dynamic detection of model reasoning capabilities, UI toggles, and updates to the chat history display. Key improvements involve comprehensive docstrings and clear session state management for the new features.

I've identified a couple of areas for improvement:

  • A potential NameError related to the reasoning toggle needs to be addressed.
  • The persistence of reasoning history across chat sessions requires a modification to ensure each session's reasoning is stored and retrieved correctly.
  • Some minor docstring clarifications are also suggested.

Overall, these changes significantly enhance the chatbot's utility for users wanting to understand the model's thought process.
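
As a small aside, a hypothetical illustration of the NameError risk flagged above (not code from this PR): if reason were assigned only inside the branch that shows the toggle, later references would fail for models without reasoning support, which is why the merged code defaults it to False first.

    supports_reasoning = False  # e.g. the capability probe returned False for this model

    reason = False  # default first, so the name exists even when the toggle is never shown
    if supports_reasoning:
        reason = True  # in the app this is st.sidebar.checkbox("Enable Reasoning")

    print(f"reasoning enabled: {reason}")  # safe in both branches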

        - current_session: Sets to new session ID
        - active_session: Sets to new session ID
        - messages: Resets to empty list
    """
    session_id = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    st.session_state.sessions[session_id] = []

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when a new chat session is created, you should also initialize the reasoning store for this new session_id.

Add the following line within create_new_chat_session after st.session_state.sessions[session_id] = []:

Suggested change:
-    st.session_state.sessions[session_id] = []
+    st.session_state.sessions[session_id] = []
+    st.session_state.reasoning_store[session_id] = {}  # Initialize reasoning for this new session

Comment on lines +245 to +247
if idx in st.session_state.show_reasoning:
    with st.expander("💭 Thinking Process", expanded=False):
        st.markdown(st.session_state.show_reasoning[idx])

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when displaying reasoning, you need to access the session-specific reasoning store.

Assuming st.session_state.reasoning_store is now a dictionary mapping session_id to message_idx: reasoning_text dictionaries, the check and access should be updated.

Suggested change:
-if idx in st.session_state.show_reasoning:
-    with st.expander("💭 Thinking Process", expanded=False):
-        st.markdown(st.session_state.show_reasoning[idx])
+current_session_id = st.session_state.current_session
+if current_session_id in st.session_state.reasoning_store and \
+        idx in st.session_state.reasoning_store[current_session_id]:
+    with st.expander("💭 Thinking Process", expanded=False):
+        st.markdown(st.session_state.reasoning_store[current_session_id][idx])

Comment on lines +308 to +309
if reason and think:
    st.session_state.show_reasoning[message_index] = think

gemini-code-assist bot (severity: high)

Continuing from the feedback on line 57 regarding reasoning persistence: when persisting reasoning, it should be stored in the session-specific reasoning store.

Assuming st.session_state.reasoning_store is used as suggested, update this line to save reasoning to the correct session's store.

Suggested change:
-if reason and think:
-    st.session_state.show_reasoning[message_index] = think
+if reason and think:
+    current_session_id = st.session_state.current_session
+    st.session_state.reasoning_store[current_session_id][message_index] = think

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@Navanit-git (Contributor, Author):

[screenshot]
There seems to be no error for multiple sessions with reasoning.

@Navanit-git (Contributor, Author):

@DarkLight1337 Kindly review it please

@DarkLight1337 DarkLight1337 requested review from mgoin and aarnphm June 13, 2025 07:29
@Navanit-git Navanit-git changed the title Add reasoning capability to vLLM streamlit code [DOC] Add reasoning capability to vLLM streamlit code Jun 13, 2025
@Navanit-git (Contributor, Author):

@mgoin and @aarnphm any review is appreciated.

@mgoin mgoin (Member) left a comment

Thank you for the improvements!

@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 16, 2025
@mgoin mgoin merged commit 3e75069 into vllm-project:main Jun 16, 2025
54 checks passed
@lys791227:

The following changes are recommended to solve the issue of reasoning being overwritten across multiple sessions:

# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
"""
vLLM Chat Assistant - A Streamlit Web Interface

A streamlined chat interface that quickly integrates
with vLLM API server.

Features:
- Multiple chat sessions management
- Streaming response display
- Configurable API endpoint
- Real-time chat history
- Reasoning Display: Optional thinking process visualization

Requirements:
    pip install streamlit openai

Usage:
    # Start the app with default settings
    streamlit run streamlit_openai_chatbot_webserver.py

    # Start with custom vLLM API endpoint
    VLLM_API_BASE="http://your-server:8000/v1" \
        streamlit run streamlit_openai_chatbot_webserver.py

    # Enable debug mode
    streamlit run streamlit_openai_chatbot_webserver.py \
        --logger.level=debug

"""

import os
from datetime import datetime

import streamlit as st
from openai import OpenAI

# Get command line arguments from environment variables

openai_api_key = os.getenv("VLLM_API_KEY", "EMPTY")
openai_api_base = os.getenv("VLLM_API_BASE", "http://localhost:8000/v1")

# Initialize session states for managing chat sessions
if "sessions" not in st.session_state:
    st.session_state.sessions = {}

if "current_session" not in st.session_state:
    st.session_state.current_session = None

if "messages" not in st.session_state:
    st.session_state.messages = []

if "active_session" not in st.session_state:
    st.session_state.active_session = None

# Add new session state for reasoning - changed to per-session storage
if "show_reasoning" not in st.session_state:
    st.session_state.show_reasoning = {}  # Format: {session_id: {message_index: reasoning_text}}

# Initialize session state for API base URL
if "api_base_url" not in st.session_state:
    st.session_state.api_base_url = openai_api_base

def create_new_chat_session():
    """Create a new chat session with timestamp as unique identifier.

    This function initializes a new chat session by:
    1. Generating a timestamp-based session ID
    2. Creating an empty message list for the new session
    3. Setting the new session as both current and active session
    4. Resetting the messages list for the new session

    Returns:
        None

    Session State Updates:
        - sessions: Adds new empty message list with timestamp key
        - current_session: Sets to new session ID
        - active_session: Sets to new session ID
        - messages: Resets to empty list
    """
    session_id = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    st.session_state.sessions[session_id] = []
    st.session_state.current_session = session_id
    st.session_state.active_session = session_id
    st.session_state.messages = []

def switch_to_chat_session(session_id):
    """Switch the active chat context to a different session.

    Args:
        session_id (str): The timestamp ID of the session to switch to

    This function handles chat session switching by:
    1. Setting the specified session as current
    2. Updating the active session marker
    3. Loading the messages history from the specified session

    Session State Updates:
        - current_session: Updated to specified session_id
        - active_session: Updated to specified session_id
        - messages: Loaded from sessions[session_id]
    """
    st.session_state.current_session = session_id
    st.session_state.active_session = session_id
    st.session_state.messages = st.session_state.sessions[session_id]

def get_llm_response(messages, model, reason, content_ph=None, reasoning_ph=None):
    """Generate and stream LLM response with optional reasoning process.

    Args:
        messages (list): List of conversation message dicts with 'role' and 'content'
        model (str): The model identifier to use for generation
        reason (bool): Whether to enable and display reasoning process
        content_ph (streamlit.empty): Placeholder for streaming response content
        reasoning_ph (streamlit.empty): Placeholder for streaming reasoning process

    Returns:
        tuple: (str, str)
            - First string contains the complete response text
            - Second string contains the complete reasoning text (if enabled)

    Features:
        - Streams both reasoning and response text in real-time
        - Handles model API errors gracefully
        - Supports live updating of thinking process
        - Maintains separate content and reasoning displays

    Raises:
        Exception: Wrapped in error message if API call fails

    Note:
        The function uses streamlit placeholders for live updates.
        When reason=True, the reasoning process appears above the response.
    """
    full_text = ""
    think_text = ""
    live_think = None
    # Build request parameters
    params = {"model": model, "messages": messages, "stream": True}
    if reason:
        params["extra_body"] = {"chat_template_kwargs": {"enable_thinking": True}}

    try:
        response = client.chat.completions.create(**params)
        if isinstance(response, str):
            if content_ph:
                content_ph.markdown(response)
            return response, ""

        # Prepare reasoning expander above content
        if reason and reasoning_ph:
            exp = reasoning_ph.expander("💭 Thinking Process (live)", expanded=True)
            live_think = exp.empty()

        # Stream chunks
        for chunk in response:
            delta = chunk.choices[0].delta
            # Stream reasoning first
            if reason and hasattr(delta, "reasoning_content") and live_think:
                rc = delta.reasoning_content
                if rc:
                    think_text += rc
                    live_think.markdown(think_text + "▌")
            # Then stream content
            if hasattr(delta, "content") and delta.content and content_ph:
                full_text += delta.content
                content_ph.markdown(full_text + "▌")

        # Finalize displays: reasoning remains above, content below
        if reason and live_think:
            live_think.markdown(think_text)
        if content_ph:
            content_ph.markdown(full_text)

        return full_text, think_text
    except Exception as e:
        st.error(f"Error details: {str(e)}")
        return f"Error: {str(e)}", ""

# Sidebar - API Settings first
st.sidebar.title("API Settings")
new_api_base = st.sidebar.text_input(
    "API Base URL:", value=st.session_state.api_base_url
)
if new_api_base != st.session_state.api_base_url:
    st.session_state.api_base_url = new_api_base
    st.rerun()

st.sidebar.divider()

# Sidebar - Session Management
st.sidebar.title("Chat Sessions")
if st.sidebar.button("New Session"):
    create_new_chat_session()

# Display all sessions in reverse chronological order
for session_id in sorted(st.session_state.sessions.keys(), reverse=True):
    # Mark the active session with a pinned button
    if session_id == st.session_state.active_session:
        st.sidebar.button(
            f"📍 {session_id}",
            key=session_id,
            type="primary",
            on_click=switch_to_chat_session,
            args=(session_id,),
        )
    else:
        st.sidebar.button(
            f"Session {session_id}",
            key=session_id,
            on_click=switch_to_chat_session,
            args=(session_id,),
        )

# Main interface
st.title("vLLM Chat Assistant")

# Initialize OpenAI client with API settings
client = OpenAI(api_key=openai_api_key, base_url=st.session_state.api_base_url)

# Get and display current model id
models = client.models.list()
model = models.data[0].id
st.markdown(f"Model: {model}")

# Initialize first session if none exists
if st.session_state.current_session is None:
    create_new_chat_session()
    st.session_state.active_session = st.session_state.current_session

# Update the chat history display section
for idx, msg in enumerate(st.session_state.messages):
    # Render user messages normally
    if msg["role"] == "user":
        with st.chat_message("user"):
            st.write(msg["content"])
    # Render assistant messages with reasoning above
    else:
        # If reasoning exists for this assistant message, show it above the content
        current_session_id = st.session_state.current_session
        if (current_session_id in st.session_state.show_reasoning and
                idx in st.session_state.show_reasoning[current_session_id]):
            with st.expander("💭 Thinking Process", expanded=False):
                st.markdown(st.session_state.show_reasoning[current_session_id][idx])
        with st.chat_message("assistant"):
            st.write(msg["content"])

# Setup & Cache reasoning support check
@st.cache_data(show_spinner=False)
def server_supports_reasoning():
    """Check if the current model supports reasoning capability.

    Returns:
        bool: True if the model supports reasoning, False otherwise
    """
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hi"}],
        stream=False,
    )
    return hasattr(resp.choices[0].message, "reasoning_content") and bool(
        resp.choices[0].message.reasoning_content
    )

# Check support
supports_reasoning = server_supports_reasoning()

# Add reasoning toggle in sidebar if supported
reason = False  # Default to False
if supports_reasoning:
    reason = st.sidebar.checkbox("Enable Reasoning", value=False)
else:
    st.sidebar.markdown(
        "Reasoning unavailable for this model.",
        unsafe_allow_html=True,
    )
    # reason remains False

# Update the input handling section
if prompt := st.chat_input("Type your message here..."):
    # Save and display user message
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.sessions[st.session_state.current_session] = (
        st.session_state.messages
    )
    with st.chat_message("user"):
        st.write(prompt)

    # Prepare LLM messages
    msgs = [
        {"role": m["role"], "content": m["content"]} for m in st.session_state.messages
    ]

    # Stream assistant response
    with st.chat_message("assistant"):
        # Placeholders: reasoning above, content below
        reason_ph = st.empty()
        content_ph = st.empty()
        full, think = get_llm_response(msgs, model, reason, content_ph, reason_ph)
        # Determine index for this new assistant message
        message_index = len(st.session_state.messages)
        # Save assistant reply
        st.session_state.messages.append({"role": "assistant", "content": full})
        # Persist reasoning in session state if any
        if reason and think:
            current_session_id = st.session_state.current_session
            # Ensure the reasoning store for the current session is initialized
            if current_session_id not in st.session_state.show_reasoning:
                st.session_state.show_reasoning[current_session_id] = {}
            st.session_state.show_reasoning[current_session_id][message_index] = think
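
For context on the fix above, a tiny standalone illustration (not part of the pasted file) of the overwrite that per-session keying avoids:

    # Flat store keyed only by message index: a second session clobbers the first.
    flat_store = {}
    flat_store[0] = "reasoning for session A, message 0"
    flat_store[0] = "reasoning for session B, message 0"  # session A's entry is lost

    # Per-session store, as in the code above: entries from different sessions coexist.
    per_session = {}
    per_session.setdefault("session A", {})[0] = "reasoning for session A, message 0"
    per_session.setdefault("session B", {})[0] = "reasoning for session B, message 0"
    assert per_session["session A"][0] != per_session["session B"][0]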

yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jun 22, 2025
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
yangw-dev pushed a commit to yangw-dev/vllm that referenced this pull request Jun 24, 2025
xjpang pushed a commit to xjpang/vllm that referenced this pull request Jun 30, 2025
wseaton pushed a commit to wseaton/vllm that referenced this pull request Jun 30, 2025
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025