# Notebook 1/2: Simple RAG system (ChromaDB + LiteLLM) + Streamlit UI

**Goal of the exercise:** you will build a minimal **RAG** (Retrieval-Augmented Generation) system that:
1. loads documents (e.g. PDF/TXT/MD),
2. splits them into fragments (chunking),
3. stores the vectors in **ChromaDB**,
4. fetches the most similar fragments (retrieval),
5. generates an answer with an LLM (OpenAI/Claude/Gemini/OpenRouter) through **LiteLLM**,
6. exposes an interface in **Streamlit**.

> Dates in this notebook are examples. Today: **2026-01-18**.

---

## Requirements
- an account and API key (optional) for the LLM provider of your choice:
  - OpenAI: `OPENAI_API_KEY`
  - Anthropic Claude: `ANTHROPIC_API_KEY`
  - Google Gemini: `GEMINI_API_KEY`
  - OpenRouter: `OPENROUTER_API_KEY` (sometimes supplied as `OPENAI_API_KEY` with a matching `base_url`; LiteLLM simplifies this)
- In this notebook, **embeddings** are computed locally (`sentence-transformers`) so that no paid API is required.

---

## How to work
1. Run the cells from top to bottom.
2. At the end, start the Streamlit app and open the link from the tunnel.


In [None]:
# Install libraries
!pip -q install chromadb==0.5.5 sentence-transformers==3.0.1 litellm==1.44.22 streamlit==1.37.1 pypdf==4.3.1 python-dotenv==1.0.1

# Tool for tunneling Streamlit in Colab (cloudflared)
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
!dpkg -i cloudflared-linux-amd64.deb || true


In [None]:
pip install "numpy<2.0" --force-reinstall


## 1) Imports and configuration

- Vector database: ChromaDB (persisted to disk)
- Embedding: `sentence-transformers` (model: `all-MiniLM-L6-v2`)
- LLM: LiteLLM (one API for many providers)

> If you have no LLM key, you can still test the retrieval part (fragment search).


In [None]:
import os, re, json, textwrap, pathlib, time
from typing import List, Dict, Any, Optional

import chromadb
from chromadb.config import Settings

from sentence_transformers import SentenceTransformer
from pypdf import PdfReader

import litellm

PERSIST_DIR = "./chroma_db"
COLLECTION_NAME = "docs"

# Local embedding model
EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
embedder = SentenceTransformer(EMBED_MODEL_NAME)

# Chroma client + collection
client = chromadb.PersistentClient(path=PERSIST_DIR, settings=Settings(anonymized_telemetry=False))
collection = client.get_or_create_collection(name=COLLECTION_NAME)

print("Chroma persist dir:", os.path.abspath(PERSIST_DIR))
print("Collection:", collection.name)


## 2) Loading documents and chunking

We use a simple chunker that:
- splits the text into fragments of roughly `chunk_size` characters,
- keeps an `overlap` between consecutive fragments for better context.

You can upload files with `files.upload()` or paste text directly.
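
To see where the chunk boundaries fall, the sliding window can be reduced to its character offsets. The sketch below uses a hypothetical helper, `chunk_spans` (not part of the notebook), and assumes the window advances by `chunk_size - overlap` characters at each step.

```python
def chunk_spans(text_len, chunk_size=1000, overlap=150):
    """Return (start, end) character offsets produced by a sliding window."""
    # Guard against overlap >= chunk_size, which would otherwise never advance.
    step = max(1, chunk_size - overlap)
    return [(i, min(i + chunk_size, text_len)) for i in range(0, text_len, step)]

spans = chunk_spans(2000)
print(spans)  # [(0, 1000), (850, 1850), (1700, 2000)]

# Consecutive chunks share `overlap` characters.
print(spans[0][1] - spans[1][0])  # 150
```

With these defaults each chunk repeats the last 150 characters of the previous one, and the final chunk is shorter than `chunk_size`.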


In [None]:
from google.colab import files

def read_pdf(path: str) -> str:
    reader = PdfReader(path)
    pages = []
    for p in reader.pages:
        pages.append(p.extract_text() or "")
    return "\n".join(pages)

def read_text(path: str) -> str:
    return pathlib.Path(path).read_text(encoding="utf-8", errors="ignore")

def load_documents(uploaded_paths: List[str]) -> List[Dict[str, Any]]:
    docs = []
    for p in uploaded_paths:
        ext = pathlib.Path(p).suffix.lower()
        if ext == ".pdf":
            txt = read_pdf(p)
        else:
            txt = read_text(p)
        docs.append({"path": p, "text": txt})
    return docs

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> List[str]:
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    chunks = []
    i = 0
    while i < len(text):
        chunk = text[i:i+chunk_size]
        chunks.append(chunk)
        i += max(1, chunk_size - overlap)
    return chunks

# Upload files (PDF/TXT/MD)
uploaded = files.upload()
uploaded_paths = list(uploaded.keys())
print("Uploaded:", uploaded_paths)

docs = load_documents(uploaded_paths)
print("Loaded documents:", len(docs))
print("Example preview:", docs[0]["text"][:400] if docs else "(none)")


## 3) Indexing in ChromaDB

Each chunk gets:
- an `id` (unique),
- a `document` (the text),
- `metadata` (source + chunk number),
- an `embedding` (the vector).

> If you run the notebook multiple times, you can clear the collection.
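
The record layout listed above can be sketched without ChromaDB. `build_records` is a hypothetical helper, and the content-hash suffix on the id is an assumption added here for illustration, not something the notebook does. It makes an id change whenever the chunk text changes, which may help when re-indexing edited files.

```python
import hashlib
import pathlib

def build_records(path, chunks):
    """Build one record per chunk: id, document text, and metadata (no embedding yet)."""
    name = pathlib.Path(path).name
    records = []
    for j, chunk in enumerate(chunks):
        # Hypothetical addition: a short hash of the chunk text, appended to the id.
        digest = hashlib.sha1(chunk.encode("utf-8")).hexdigest()[:8]
        records.append({
            "id": f"{name}::chunk{j}::{digest}",
            "document": chunk,
            "metadata": {"source": path, "chunk": j},
        })
    return records

for rec in build_records("docs/example.txt", ["first fragment", "second fragment"]):
    print(rec["id"], rec["metadata"])
```

The embedding would be computed separately for each `document` and stored alongside the record.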


In [3]:
def reset_collection():
    global collection
    client.delete_collection(COLLECTION_NAME)
    collection = client.get_or_create_collection(name=COLLECTION_NAME)
    print("Collection reset.")

def index_documents(docs: List[Dict[str, Any]], chunk_size=1000, overlap=150):
    ids, texts, metas, embeds = [], [], [], []
    for d in docs:
        chunks = chunk_text(d["text"], chunk_size=chunk_size, overlap=overlap)
        for j, ch in enumerate(chunks):
            uid = f"{pathlib.Path(d['path']).name}::chunk{j}"
            ids.append(uid)
            texts.append(ch)
            metas.append({"source": d["path"], "chunk": j})
    if not texts:
        print("No text to index.")
        return

    # embedding batch
    embeds = embedder.encode(texts, batch_size=32, show_progress_bar=True).tolist()
    collection.add(ids=ids, documents=texts, metadatas=metas, embeddings=embeds)
    print(f"Indexed {len(texts)} chunks.")

# (Optional) reset_collection()
index_documents(docs)
print("Total vectors:", collection.count())



Indexed 369 chunks.
Total vectors: 369


## 4) Retrieval: fetching the best fragments

The function `retrieve(query, k)` returns the top-k fragments together with their metadata and distance (a lower distance means a closer match).
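
To read the distance values, it helps to relate them to cosine similarity. Assuming the collection uses Chroma's default `l2` space (squared Euclidean distance) and that the embedding vectors have unit length (both are assumptions worth checking for your setup), the two are linked by `distance = 2 - 2 * cosine`. The sketch below demonstrates this on toy 2-D vectors in pure Python; `top_k` is a hypothetical stand-in for the vector search, not Chroma's implementation.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query_vec, vectors, k=2):
    """Return (index, distance) pairs for the k nearest vectors, smallest distance first."""
    scored = [(i, squared_l2(query_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 2-D "embeddings", normalized to unit length.
doc_vectors = [normalize(v) for v in ([1.0, 0.0], [0.7, 0.7], [0.0, 1.0])]
query_vec = normalize([0.9, 0.1])

for idx, dist in top_k(query_vec, doc_vectors, k=2):
    cosine = 1.0 - dist / 2.0  # valid only for unit-length vectors
    print(f"doc {idx}: squared L2 = {dist:.4f}, cosine similarity = {cosine:.4f}")
```

Under those assumptions, a distance of about 1.68 corresponds to a cosine similarity of about 0.16, which is a weak match.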


In [4]:
def retrieve(query: str, k: int = 5):
    q_emb = embedder.encode([query]).tolist()
    res = collection.query(query_embeddings=q_emb, n_results=k, include=["documents", "metadatas", "distances"])
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        hits.append({"text": doc, "meta": meta, "distance": float(dist)})
    return hits

query = "O czym jest dokument?"  # Polish for "What is the document about?"
hits = retrieve(query, k=3)
for i,h in enumerate(hits,1):
    print(f"#{i} dist={h['distance']:.4f} source={h['meta']}")
    print(h["text"][:250], "\n")



#1 dist=1.6801 source={'chunk': 136, 'source': 'sql-performance-explainedpdf-pdf-free.pdf'}
ject as bind parameter. This is yet another benefit of bind parameters. If you cannot do that, you just have to convert the search term instead of the table column: SELECT ... FROM sales WHERE sale_date = TO_DATE('1970-01-01', 'YYYY-MM-DD') This quer 

#2 dist=1.6911 source={'chunk': 315, 'source': 'sql-performance-explainedpdf-pdf-free.pdf'}
............................................................................ 166 Getting an Execution Plan ......................................................... 166 Operations ...................................................................... 

#3 dist=1.6936 source={'chunk': 8, 'source': 'sql-performance-explainedpdf-pdf-free.pdf'}
.. 162 Update .................................................................................... 163 A. Execution Plans .......................................................................... 165 Oracle Database ...

## 5) Generating the answer with an LLM (LiteLLM)

LiteLLM supports many providers. Set `model`, for example:
- OpenAI: `gpt-4o-mini`, `gpt-4.1-mini`, etc.
- Anthropic: `claude-3-5-sonnet-20240620`, etc.
- Gemini: `gemini/gemini-1.5-pro`, `gemini/gemini-1.5-flash` (the `gemini/` prefix selects Google AI Studio, which uses `GEMINI_API_KEY`)
- OpenRouter: `openrouter/<model_name>` (e.g. `openrouter/google/gemini-flash-1.5`)

### API keys
Set them in the Colab environment (for the duration of the session):
```python
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."
os.environ["OPENROUTER_API_KEY"] = "..."
```

> If you do not set a key, this part will raise an error. That is fine: test retrieval instead.
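
A missing key only surfaces once the provider is actually called. A small pre-flight check can give a clearer message first. The prefix table below is a hypothetical heuristic based only on the model names listed above; it is not LiteLLM's routing logic, and unrecognized model names simply return `None`.

```python
import os

# Hypothetical prefix-to-variable table (an assumption for this sketch).
# "openrouter/" comes first because its model names can contain other providers' names.
KEY_FOR_PREFIX = [
    ("openrouter/", "OPENROUTER_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("claude-", "ANTHROPIC_API_KEY"),
    ("gpt-", "OPENAI_API_KEY"),
]

def required_key(model):
    """Return the env var the model name most likely needs, or None if unknown."""
    for prefix, var in KEY_FOR_PREFIX:
        if model.startswith(prefix):
            return var
    return None

def missing_key(model, env=None):
    """Return the name of the unset env var, or None if it is set or unknown."""
    env = os.environ if env is None else env
    var = required_key(model)
    if var is not None and not env.get(var, "").strip():
        return var
    return None

print(missing_key("gpt-4o-mini", env={}))  # OPENAI_API_KEY
```

Calling `missing_key(model)` before the LLM request lets the notebook fall back to retrieval-only testing with an explicit message.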


In [7]:
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

In [9]:
def rag_answer(question: str, k: int = 5, model: str = "gpt-4o-mini") -> Dict[str, Any]:
    hits = retrieve(question, k=k)
    context = "\n\n".join([f"[{i+1}] ({h['meta']['source']}, chunk {h['meta']['chunk']})\n{h['text']}" for i,h in enumerate(hits)])

    system = "You are a helpful assistant. Answer in Polish. If the context lacks the data, say plainly what you do not know."
    user = f"""Question: {question}

Context (selected document fragments):
{context}

Instruction: Answer the question using only the context. If the context is not sufficient, state what information is missing.
"""
    resp = litellm.completion(
        model=model,
        messages=[{"role":"system","content":system},{"role":"user","content":user}],
        temperature=0.2,
    )
    answer = resp["choices"][0]["message"]["content"]
    return {"answer": answer, "hits": hits, "model": model}

# Example (requires an API key for the selected model)
# The question is Polish for "What are the main theses of the document?"
result = rag_answer("Jakie są główne tezy dokumentu?", model="gpt-4o-mini")
print(result["answer"])


Na podstawie dostępnych fragmentów dokumentu można wyodrębnić kilka głównych tez:

1. **Problemy z wydajnością SQL**: Problemy z wydajnością SQL są powszechne, mimo że SQL nie jest już tak wolny jak w jego początkowych wersjach. Wydajność SQL jest tematem, który wciąż wymaga uwagi.

2. **Separacja "co" i "jak"**: SQL jako język programowania czwartej generacji (4GL) pozwala na oddzielenie opisu tego, co jest potrzebne, od sposobu, w jaki to jest realizowane. Użytkownik nie musi znać wewnętrznych mechanizmów bazy danych, aby napisać zapytanie.

3. **Znajomość działania bazy danych**: Wiele osób, które mają doświadczenie w SQL, nie posiada wiedzy na temat przetwarzania danych w bazie, co może prowadzić do nieefektywnego korzystania z tego języka.

Brakuje jednak szczegółowych informacji na temat konkretnych rozwiązań problemów z wydajnością SQL oraz przykładów zastosowania najlepszych praktyk w kontekście optymalizacji zapytań.



## 6) Streamlit application (UI)

In Streamlit we will build:
- document upload,
- indexing into ChromaDB,
- RAG chat (retrieval + LLM),
- model selection.

In Colab we will run Streamlit and expose it through a Cloudflare tunnel.


In [10]:
app_code = r'''
import os, re, pathlib, time
import streamlit as st
import chromadb
from chromadb.config import Settings
from sentence_transformers import SentenceTransformer
import litellm
from pypdf import PdfReader

PERSIST_DIR = "./chroma_db"
COLLECTION_NAME = "docs"
EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

@st.cache_resource
def get_embedder():
    return SentenceTransformer(EMBED_MODEL_NAME)

@st.cache_resource
def get_collection():
    client = chromadb.PersistentClient(path=PERSIST_DIR, settings=Settings(anonymized_telemetry=False))
    return client.get_or_create_collection(name=COLLECTION_NAME)

def read_pdf_bytes(file_bytes) -> str:
    from io import BytesIO
    reader = PdfReader(BytesIO(file_bytes))
    pages = []
    for p in reader.pages:
        pages.append(p.extract_text() or "")
    return "\n".join(pages)

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150):
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    chunks, i = [], 0
    while i < len(text):
        chunks.append(text[i:i+chunk_size])
        i += max(1, chunk_size - overlap)
    return chunks

def index_files(files, chunk_size=1000, overlap=150):
    embedder = get_embedder()
    collection = get_collection()
    ids, docs, metas = [], [], []
    for f in files:
        name = pathlib.Path(f.name).name
        ext = pathlib.Path(name).suffix.lower()
        if ext == ".pdf":
            text = read_pdf_bytes(f.getvalue())
        else:
            text = f.getvalue().decode("utf-8", errors="ignore")
        chunks = chunk_text(text, chunk_size=chunk_size, overlap=overlap)
        for j,ch in enumerate(chunks):
            ids.append(f"{name}::chunk{j}")
            docs.append(ch)
            metas.append({"source": name, "chunk": j})
    if not docs:
        return 0
    embeds = embedder.encode(docs, batch_size=32, show_progress_bar=False).tolist()
    collection.add(ids=ids, documents=docs, metadatas=metas, embeddings=embeds)
    return len(docs)

def retrieve(query: str, k: int = 5):
    embedder = get_embedder()
    collection = get_collection()
    q_emb = embedder.encode([query]).tolist()
    res = collection.query(query_embeddings=q_emb, n_results=k, include=["documents","metadatas","distances"])
    hits = []
    for doc, meta, dist in zip(res["documents"][0], res["metadatas"][0], res["distances"][0]):
        hits.append({"text": doc, "meta": meta, "distance": float(dist)})
    return hits

def rag_answer(question: str, k: int, model: str):
    hits = retrieve(question, k=k)
    context = "\n\n".join([f"[{i+1}] ({h['meta']['source']}, chunk {h['meta']['chunk']})\n{h['text']}" for i,h in enumerate(hits)])
    system = "You are a helpful assistant. Answer in Polish. If the context lacks the data, say plainly what you do not know."
    user = f"""Question: {question}

Context (selected document fragments):
{context}

Instruction: Answer the question using only the context. If the context is not sufficient, state what information is missing.
"""
    resp = litellm.completion(
        model=model,
        messages=[{"role":"system","content":system},{"role":"user","content":user}],
        temperature=0.2,
    )
    return resp["choices"][0]["message"]["content"], hits

st.set_page_config(page_title="RAG (ChromaDB + LiteLLM)", layout="wide")
st.title("📚 Simple RAG: ChromaDB + LiteLLM + Streamlit")

with st.sidebar:
    st.header("Settings")
    model = st.text_input("Model (LiteLLM)", value="gpt-4o-mini")
    k = st.slider("Top-k fragments", 1, 10, 5)
    chunk_size = st.number_input("chunk_size", 300, 3000, 1000, step=100)
    overlap = st.number_input("overlap", 0, 800, 150, step=50)

    st.markdown("### API keys (optional)")
    st.caption("You can enter a key here; the app will set it in the environment for this session.")
    openai_key = st.text_input("OPENAI_API_KEY", type="password")
    anthropic_key = st.text_input("ANTHROPIC_API_KEY", type="password")
    gemini_key = st.text_input("GEMINI_API_KEY", type="password")
    openrouter_key = st.text_input("OPENROUTER_API_KEY", type="password")

    if openai_key: os.environ["OPENAI_API_KEY"] = openai_key
    if anthropic_key: os.environ["ANTHROPIC_API_KEY"] = anthropic_key
    if gemini_key: os.environ["GEMINI_API_KEY"] = gemini_key
    if openrouter_key: os.environ["OPENROUTER_API_KEY"] = openrouter_key

st.subheader("1) Indexing documents")
up = st.file_uploader("Upload files (PDF/TXT/MD)", type=["pdf","txt","md"], accept_multiple_files=True)
if st.button("📥 Index"):
    if not up:
        st.warning("Upload at least 1 file.")
    else:
        n = index_files(up, chunk_size=int(chunk_size), overlap=int(overlap))
        st.success(f"Indexed chunks: {n}")

st.divider()

st.subheader("2) RAG chat")
if "messages" not in st.session_state:
    st.session_state.messages = []

for m in st.session_state.messages:
    with st.chat_message(m["role"]):
        st.markdown(m["content"])

prompt = st.chat_input("Ask a question about the documents…")
if prompt:
    st.session_state.messages.append({"role":"user","content":prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        try:
            ans, hits = rag_answer(prompt, k=int(k), model=model)
            st.markdown(ans)
            with st.expander("🔎 Sources (retrieval)"):
                for i,h in enumerate(hits,1):
                    st.write(f"#{i} dist={h['distance']:.4f} source={h['meta']}")
                    st.write(h["text"])
            st.session_state.messages.append({"role":"assistant","content":ans})
        except Exception as e:
            st.error(f"LLM error: {e}")
            st.info("If you have no API key, test retrieval in the notebook or set a key in the sidebar.")
'''
with open("app_rag.py","w",encoding="utf-8") as f:
    f.write(app_code)

print("Wrote app_rag.py")


Wrote app_rag.py


### Run Streamlit + tunnel (Cloudflare)

After running the cells below you will get a public link (HTTPS). Open it in your browser.
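
The public link is printed somewhere in the tunnel's log, so it has to be picked out of the output. Below is a minimal sketch of that step, assuming the quick-tunnel hostname ends in `.trycloudflare.com`. `find_tunnel_url` and the sample log lines (including the hostname) are made up for illustration.

```python
import re

# Matches an HTTPS URL whose host ends in .trycloudflare.com
TUNNEL_URL = re.compile(r"https://[\w.-]+\.trycloudflare\.com")

def find_tunnel_url(lines):
    """Return the first quick-tunnel URL found in an iterable of log lines, else None."""
    for line in lines:
        match = TUNNEL_URL.search(line)
        if match:
            return match.group(0)
    return None

# Hypothetical log lines; the hostname is a placeholder.
sample_log = [
    "INF Requesting new quick Tunnel on trycloudflare.com...",
    "INF |  https://hypothetical-example-name.trycloudflare.com  |",
]
print(find_tunnel_url(sample_log))  # https://hypothetical-example-name.trycloudflare.com
```

Other URLs in the log, such as links to the terms of use, do not match because their host does not end in `.trycloudflare.com`.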


In [13]:
!wget -q https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
!chmod +x cloudflared-linux-amd64
!mv cloudflared-linux-amd64 /usr/local/bin/cloudflared


In [15]:
# Start Streamlit in the background and expose it through a Cloudflare Tunnel
import re, subprocess, textwrap, time, os, signal, sys

# Kill previous if rerun
!pkill -f "streamlit run app_rag.py" || true
!pkill -f "cloudflared tunnel" || true

streamlit_proc = subprocess.Popen(["streamlit", "run", "app_rag.py",
                                   "--server.port", "8501",
                                   "--server.enableCORS", "false",
                                   "--server.enableXsrfProtection", "false"],
                                  stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

time.sleep(2)

tunnel_proc = subprocess.Popen(["cloudflared", "tunnel", "--url", "http://localhost:8501"],
                               stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)

# Extract the link from the tunnel output
public_url = None
t0 = time.time()
while time.time() - t0 < 20:
    line = tunnel_proc.stdout.readline().strip()
    if line:
        print(line)
    m = re.search(r"(https://[\w\-\.]+\.trycloudflare\.com)", line)
    if m:
        public_url = m.group(1)
        break

print("\n✅ Public link:", public_url if public_url else "(could not read it; check the log above)")


2026-01-21T11:37:59Z INF Thank you for trying Cloudflare Tunnel. Doing so, without a Cloudflare account, is a quick way to experiment and try it out. However, be aware that these account-less Tunnels have no uptime guarantee, are subject to the Cloudflare Online Services Terms of Use (https://www.cloudflare.com/website-terms/), and Cloudflare reserves the right to investigate your use of Tunnels for violations of such terms. If you intend to use Tunnels in production you should use a pre-created named tunnel by following: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps
2026-01-21T11:37:59Z INF Requesting new quick Tunnel on trycloudflare.com...
2026-01-21T11:38:03Z INF +--------------------------------------------------------------------------------------------+
2026-01-21T11:38:03Z INF |  Your quick Tunnel has been created! Visit it at (it may take some time to be reachable):  |
2026-01-21T11:38:03Z INF |  https://sustainability-enb-demands-willing.tryc