# Streamlit for LLM Chatbots: A Beginner's Tutorial

This tutorial will guide you through building and running a conversational AI application using **Streamlit** and a Large Language Model (LLM), all from within a Jupyter/Colab notebook. We will cover the essential concepts to get you started. 💬

## Key Concepts Covered

* **Chat UI with Streamlit**: Use `st.chat_message` and `st.chat_input` to create a user-friendly chat interface.
* **Session State for Memory**: Understand how `st.session_state` maintains conversation history.
* **Streaming Responses**: Implement a real-time "typing" effect for LLM responses, as demonstrated in the official Streamlit documentation.
* **Running Streamlit in a Notebook**: Use `pyngrok` to open an ngrok tunnel that exposes the Streamlit app running inside your notebook at a public URL.

## 1. Prerequisites and Setup

First, install the necessary Python libraries. `pyngrok` is needed to expose the Streamlit server running inside the notebook at a public URL that you can open in a browser. Uncomment and run the cell below, or run the same `pip install` command (without the leading `!`) in a terminal:

In [1]:
# Uncomment the line below to install the required packages
# !pip install streamlit openai python-dotenv pyngrok -q 

Next, create a file named `.env` in your project's root directory. This file keeps your OpenAI API key out of the notebook and the app's source code. Add the line `OPENAI_API_KEY=your_api_key_here`, replacing `your_api_key_here` with your actual OpenAI API key.
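
Once the `.env` file is in place, a quick sanity check can confirm that the key is visible to Python without echoing it in the notebook output. The sketch below is illustrative only: `describe_key` is a hypothetical helper (not part of any library), and it assumes the variable is named `OPENAI_API_KEY`, matching the `os.getenv` call used later in `app.py`.

```python
import os

try:
    # python-dotenv is installed in the setup step; skip quietly if it is absent.
    from dotenv import load_dotenv
    load_dotenv()  # loads entries from .env into the process environment
except ImportError:
    pass


def describe_key(value):
    """Return a masked summary of a secret so the full value is never printed."""
    if not value:
        return "missing"
    return f"set ({value[:3]}...{value[-2:]}, {len(value)} chars)"


print("OPENAI_API_KEY:", describe_key(os.getenv("OPENAI_API_KEY")))
```

If this prints `missing`, check that `.env` sits in the directory the notebook runs from and that the variable name is spelled exactly as above.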

## 2. Creating the Streamlit App File

Instead of creating a separate file manually, we can use the Jupyter cell magic `%%writefile` to create our `app.py`. Placed on the first line of a cell, it saves the rest of the cell's content to a file named `app.py` in the current working directory.

In [2]:
%%writefile app.py
import streamlit as st
import os
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize the OpenAI client with your API key
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

st.title("🤖 Simple LLM Chat App")

# Initialize chat history in session state
if "messages" not in st.session_state:
    st.session_state.messages = []

# Display chat messages from history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# Accept user input
if prompt := st.chat_input("What is up?"):
    # Add user message to history and display it
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Generate and stream assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        # The streaming feature is enabled here
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in st.session_state.messages
            ],
            stream=True,
        )
        # Process the stream and display the response in real-time
        for chunk in stream:
            full_response += (chunk.choices[0].delta.content or "")
            message_placeholder.markdown(full_response + "▌")
        message_placeholder.markdown(full_response)
    # Add the final assistant response to history
    st.session_state.messages.append({"role": "assistant", "content": full_response})

Overwriting app.py


## 3. Running the App from the Notebook

Now we'll run the app. As explained in this [helpful guide](https://deveshsurve.medium.com/how-to-serve-an-llm-as-a-streamlit-app-on-google-colab-step-by-step-guide-b6d9dc45427d), we will launch the Streamlit app on a local port and then use `pyngrok` to create a secure tunnel, giving us a public URL to access our app. The cell needs an ngrok authtoken, which it reads from the `NGROK_AUTH_TOKEN` environment variable.

Execute the cell below. It will start the Streamlit server in the background and print a public ngrok URL. Click on the URL to open your chatbot in a new tab! 🎉

In [3]:
from pyngrok import ngrok, conf
import subprocess
import time
import os

# Set your ngrok authtoken (replace 'your-ngrok-authtoken' with your actual token)
NGROK_AUTH_TOKEN = os.getenv("NGROK_AUTH_TOKEN") or "your-ngrok-authtoken"
if NGROK_AUTH_TOKEN == "your-ngrok-authtoken":
    print("❗ Please set your ngrok authtoken in the NGROK_AUTH_TOKEN environment variable or replace 'your-ngrok-authtoken' in the code.")
else:
    conf.get_default().auth_token = NGROK_AUTH_TOKEN

    # Terminate any existing tunnels
    ngrok.kill()

    # Start the Streamlit app in the background using subprocess
    process = subprocess.Popen([
        "streamlit", "run", "app.py", "--server.port", "8501"
    ])

    # Wait a few seconds to ensure Streamlit starts
    print("Waiting for Streamlit to launch...")
    time.sleep(5)

    # Open a tunnel to the Streamlit port
    public_url = ngrok.connect(8501)
    print(f"Streamlit App URL: {public_url}")

Waiting for Streamlit to launch...

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.68.63:8501

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog
            



t=2025-08-05T20:46:09+0800 lvl=warn msg="ngrok config file found at both XDG and legacy locations, using XDG location" xdg_path="/Users/devon/Library/Application Support/ngrok/ngrok.yml" legacy_path=/Users/devon/.ngrok2/ngrok.yml


Streamlit App URL: NgrokTunnel: "https://5894d923eeab.ngrok-free.app" -> "http://localhost:8501"
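
The fixed `time.sleep(5)` in the launch cell is a guess: on a slow machine Streamlit may need longer, and on a fast one the wait is wasted. One alternative, sketched here with only the standard library, is to poll the port until it accepts connections. `wait_for_port` is a hypothetical helper, and the default timeout and polling interval are assumptions to adjust for your setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

In the launch cell, `time.sleep(5)` could then be replaced with a check such as `wait_for_port("localhost", 8501)`, printing a warning if it returns `False`, before calling `ngrok.connect(8501)`.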


## 4. Code Breakdown

### Streaming for a Better UX
This app includes a crucial feature for conversational AI: **response streaming**. By setting `stream=True` in the OpenAI API call, we receive the LLM's response incrementally, as a series of small chunks, instead of waiting for the entire message.

The code then iterates through each `chunk` of the response, appends its text to `full_response`, and immediately updates the placeholder on the screen (`message_placeholder.markdown(full_response + "▌")`). This creates the familiar "typing" animation, making the app feel much more responsive and interactive. Once the stream ends, a final `message_placeholder.markdown(full_response)` call redraws the message without the cursor.
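
The accumulation loop can be tried without an API key. The sketch below swaps the real OpenAI stream for a stand-in generator: `fake_stream` and `accumulate` are hypothetical names used for illustration, and the `SimpleNamespace` objects only mimic the `chunk.choices[0].delta.content` shape used in `app.py`. It also shows why the `or ""` guard is there, assuming that a chunk may arrive with `content` set to `None`.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streamed chat-completion chunks (stand-in only)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def accumulate(stream):
    """Mirror the loop in app.py, recording each frame the placeholder would show."""
    full_response = ""
    frames = []
    for chunk in stream:
        # A chunk with no text contributes an empty string instead of raising.
        full_response += chunk.choices[0].delta.content or ""
        frames.append(full_response + "▌")
    return full_response, frames


text, frames = accumulate(fake_stream(["Hel", "lo", None]))
print(frames)  # ['Hel▌', 'Hello▌', 'Hello▌']
print(text)    # Hello
```

Each entry in `frames` corresponds to one `message_placeholder.markdown(...)` call in the app, which is what produces the typing effect.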

### Session State for Memory
Streamlit reruns the script from top to bottom on each interaction, so ordinary variables are reset every time. To remember the conversation, `st.session_state.messages` acts as our chat's memory. We initialize it once and append new user and assistant messages to it, ensuring the LLM has the full context of the conversation for its next response.
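
The rerun model can be illustrated in plain Python, without Streamlit. In this simplified sketch, `run_script` is a hypothetical stand-in for one top-to-bottom execution of `app.py`, and an ordinary dict plays the role of `st.session_state`; the real object is managed by Streamlit for each browser session.

```python
def run_script(session_state, prompt):
    """Simulate one rerun of the script triggered by a new chat input."""
    scratch = []  # an ordinary variable: rebuilt from scratch on every rerun

    # Same initialization pattern as app.py: only takes effect on the first pass.
    if "messages" not in session_state:
        session_state["messages"] = []

    scratch.append(prompt)
    session_state["messages"].append({"role": "user", "content": prompt})
    return len(scratch), len(session_state["messages"])


session_state = {}  # stand-in for st.session_state; survives across reruns
first = run_script(session_state, "Hello")
second = run_script(session_state, "What is Streamlit?")
print(first, second)  # (1, 1) (1, 2)
```

The local list never grows past one item, while the list stored in the state dict keeps every message, which is the behavior the chat history relies on.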

## 5. Conclusion

You have now successfully built and launched a streaming LLM chatbot directly from a Jupyter notebook. This workflow is excellent for rapid prototyping and testing. From here, you can continue to build out more complex features or prepare your app for deployment. Happy coding! 🚀