# Streamlit UI basics → PDF RAG chat

We will first build a tiny Streamlit playground to understand widgets and layout, then reuse those ideas to ship a Retrieval-Augmented Generation (RAG) PDF chat experience.

## Prerequisites
Install the UI, PDF-parsing, NumPy, and OpenAI client dependencies before running the notebook or the Streamlit apps.

```bash
pip install streamlit pypdf numpy openai
```

In [2]:
# %pip install streamlit pypdf numpy openai

## Part 1 – Streamlit UI fundamentals
The goal is to understand how Streamlit scripts read inputs, update session state, and render widgets instantly each time the script reruns.
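
To make the rerun model concrete, here is a plain-Python simulation rather than real Streamlit code. The `run_script` function and the `session_state` dict are hypothetical stand-ins for the script and for `st.session_state`; the sketch only illustrates the idea that local variables are rebuilt on every rerun while session state persists.

```python
# Plain-Python simulation of the rerun model (illustrative only;
# `run_script` and `session_state` are hypothetical stand-ins, not Streamlit APIs).

session_state: dict = {}  # survives across reruns, like st.session_state


def run_script(note: str) -> dict:
    """Run the 'script' top to bottom once, as happens after each interaction."""
    local_counter = 0  # ordinary local variable: rebuilt from scratch on every rerun

    # Initialise persistent state only on the first run.
    if "notes" not in session_state:
        session_state["notes"] = []

    if note:
        session_state["notes"].append(note)
        local_counter += 1

    return {"local_counter": local_counter, "saved_notes": len(session_state["notes"])}


first = run_script("widgets")
second = run_script("layout")
print(first)   # {'local_counter': 1, 'saved_notes': 1}
print(second)  # {'local_counter': 1, 'saved_notes': 2}
```

The local counter never exceeds 1 because it is reset on each run, whereas the notes list keeps growing. This is why values that must outlive a single rerun belong in session state.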

### Basic playground features
- sidebar sliders/select boxes for quick controls
- text/text-area widgets bound to `st.session_state`
- tables and metrics that update whenever inputs change
- a simple chart driven by NumPy so you see how plotting works without extra libs

In [5]:
from pathlib import Path
basics_path = Path('./streamlit_basics_demo.py')
print(basics_path.resolve())
print("\n".join(basics_path.read_text().splitlines()[:80]))

/Users/bdthombre/developer/python/python_tutorial/003_streamlit_ui/streamlit_basics_demo.py
"""Small Streamlit playground for teaching the core widgets."""

from __future__ import annotations

import numpy as np
import streamlit as st

st.set_page_config(page_title="Streamlit basics", page_icon="✨", layout="wide")
st.title("Streamlit basics playground")
st.write(
    "This mini-app highlights the most common Streamlit patterns: layout, widgets, "
    "callbacks, and live charts."
)

with st.sidebar:
    st.header("Sidebar controls")
    energy = st.slider("How energetic is today's session?", 0, 10, 6)
    mood = st.selectbox("Room mood", ["Curious", "Focused", "Sleepy", "Hyped"], index=1)
    st.write(f"Energy: {energy} ⚡️ · Mood: {mood}")

name = st.text_input("What's your name?", placeholder="Type and press enter")
note = st.text_area(
    "What was the most interesting idea today?",
    placeholder="Widgets, layout, or charts?",
    height=100,
)

if "notes" not in st.session_state:

Run it from the project root with:

```bash
streamlit run 003_streamlit_ui/streamlit_basics_demo.py
```

Experiment by editing the file: when you save, Streamlit detects the change and offers to rerun the app (pick "Always rerun" to reload automatically), so students get immediate feedback.

## Part 2 – Build the PDF RAG assistant
With the widget basics in place, we now stitch together OpenAI embeddings + the Responses API to let users upload a PDF and ask grounded questions.

### Connect to OpenAI

In [None]:
from openai import OpenAI

import api_key

client = OpenAI(api_key=api_key.openai)
TEXT_MODEL = "gpt-4o-mini"
EMBED_MODEL = "text-embedding-3-large"

print(f"Connected to OpenAI. Using {TEXT_MODEL} + {EMBED_MODEL}.")

### Helper functions
These utilities chunk raw text, generate embeddings, perform cosine similarity, and call the Responses API with retrieved context. They mirror the logic used in the full Streamlit app.
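
The overlap arithmetic can be checked without any API calls. The sketch below uses a hypothetical helper, `window_bounds`, to list the `(start, end)` word indices that a sliding window with the same stepping rule as `chunk_text` would produce. The 28-word length and the small window sizes are illustrative assumptions.

```python
from typing import List, Tuple


def window_bounds(n_words: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) word indices for overlapping windows over n_words words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    bounds = []
    start = 0
    while start < n_words:
        end = min(n_words, start + chunk_size)
        bounds.append((start, end))
        if end >= n_words:
            break
        start = end - overlap  # step back so neighbouring windows share `overlap` words
    return bounds


print(window_bounds(28, chunk_size=12, overlap=3))
# [(0, 12), (9, 21), (18, 28)]
print(window_bounds(28, chunk_size=40, overlap=5))
# [(0, 28)]
```

Each new window advances by `chunk_size - overlap` words, so a text shorter than `chunk_size` yields a single chunk.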

In [None]:
import numpy as np
from typing import List, Sequence, Tuple



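def chunk_text(text: str, chunk_size: int = 900, overlap: int = 150) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        # Otherwise the window would never advance and the loop would not end.
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    chunks: List[str] = []
    start = 0
    while start < len(words):
        end = min(len(words), start + chunk_size)
        chunk = " ".join(words[start:end]).strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(words):
            break
        start = max(0, end - overlap)
    return chunks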



def embed_texts(texts: Sequence[str]) -> np.ndarray:
    response = client.embeddings.create(model=EMBED_MODEL, input=list(texts))
    return np.array([row.embedding for row in response.data], dtype=np.float32)



def top_chunks(query: str, chunks: Sequence[str], vectors: np.ndarray, k: int = 4) -> Tuple[List[str], np.ndarray]:
    q_response = client.embeddings.create(model=EMBED_MODEL, input=[query])
    q_vec = np.array(q_response.data[0].embedding, dtype=np.float32)
    q_vec /= np.linalg.norm(q_vec) + 1e-10
    chunk_norm = np.linalg.norm(vectors, axis=1, keepdims=True) + 1e-10
    normalized = vectors / chunk_norm
    sims = normalized @ q_vec
    order = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in order], sims[order]



def answer_with_context(question: str, context_chunks: Sequence[str]) -> str:
    context = "\n\n".join(context_chunks)
    prompt = f"""Answer the question strictly with the provided context.\n\nContext:\n{context}\n\nQuestion: {question}\n\nIf no answer is present, reply with 'I don't know'."""
    response = client.responses.create(
        model=TEXT_MODEL, input=[{"role": "user", "content": prompt}]
    )
    return response.output_text.strip()
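
The ranking step can be tried offline with made-up vectors. The following is a minimal standard-library sketch of cosine-similarity ranking, the same idea `top_chunks` applies with NumPy. The helper names `cosine` and `rank` and the 3-dimensional toy "embeddings" are assumptions for illustration; real embedding vectors are much longer.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-10)  # small epsilon guards against zero vectors


def rank(query_vec: Sequence[float], vectors: Sequence[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k vectors most similar to query_vec."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for chunk embeddings.
toy_vectors = [
    [1.0, 0.0, 0.0],  # chunk 0
    [0.0, 1.0, 0.0],  # chunk 1
    [0.7, 0.7, 0.0],  # chunk 2
]
toy_query = [0.9, 0.1, 0.0]

top = rank(toy_query, toy_vectors, k=2)
print([(i, round(score, 3)) for i, score in top])
# [(0, 0.994), (2, 0.781)]
```

Chunk 0 points almost in the same direction as the query, so it ranks first; chunk 1 is nearly orthogonal to it and is dropped when `k=2`.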

### Quick dry-run in pure Python
Simulate uploading a tiny “document” without launching Streamlit to prove that chunking, retrieval, and question answering work.

In [None]:
demo_text = """OpenAI builds advanced language models.\n\nThe design team ships sample apps to show how retrieval-augmented generation works.\n\nStreamlit makes it easy to turn Python scripts into shareable tools."""
# demo_text has only 28 words, so use a small window to get several chunks for retrieval to rank.
chunks = chunk_text(demo_text, chunk_size=12, overlap=3)
vectors = embed_texts(chunks)
question = "Which framework is used for shareable tools?"
retrieved, scores = top_chunks(question, chunks, vectors, k=2)
answer = answer_with_context(question, retrieved)
print("Chunks:", chunks)
print("Scores:", np.round(scores, 3))
print("Answer:", answer)

### Streamlit RAG UI file
`003_streamlit_ui/rag_pdf_chat.py` brings everything together: file uploads, embeddings, retrieval, and the chat interface with expandable context panels.

In [None]:
from pathlib import Path
rag_path = Path('./rag_pdf_chat.py')  # relative to the notebook's folder, like basics_path earlier
print(rag_path.resolve())
print("\n".join(rag_path.read_text().splitlines()[:40]))

Launch the full experience once your API key is configured:

```bash
streamlit run 003_streamlit_ui/rag_pdf_chat.py
```

Experiment with settings such as the number of retrieved passages or how answers are formatted to reinforce both Streamlit and RAG concepts.