# 🧠 LLM Playground

A lightweight environment for experimenting with:

- PDF chunking  
- Embeddings via Ollama  
- pgvector similarity search  
- A minimal RAG pipeline  

## 🆎 Init Environment
 - Start environment: `.\venv\Scripts\activate`
 - Stop environment: `deactivate`
 - See [README.md](../README.md) for environment setup

In [1]:
## Hot-Reload .py Files
%load_ext autoreload
%autoreload 2

import sys
import traceback
from pathlib import Path

import pandas as pd

# Add app directory to path before importing any project modules
sys.path.insert(0, str(Path.cwd()))

import models  # auto-registers all models
from core.database import Database
from core.ollama import Ollama
from controllers.RagController import RagController

# Test Environment
print("Init workspace...\n")
db = Database()
if not db.check_connection():
    raise SystemExit("Database connection failed. Stopping initialization.")
print("DB Connection: PG/Vector")

rag = RagController()
print("LLM Model:", rag.ollama.llm_model)
print("Embedding Model:", rag.ollama.embedding_model)
print("\n✓ Ollama LLM Response: ", rag.ollama.generate("Hello from our Playground!"))
print("✓ Ollama Embedding Dimension:", len(rag.ollama.embed("hello")))


Init workspace...

DB Connection: PG/Vector
LLM Model: smollm:360m
Embedding Model: mxbai-embed-large

✓ Ollama LLM Response:  Hey there! How's your day going? I'm glad you're enjoying the playground. Do you have any favorite games or activities to do during recess?
✓ Ollama Embedding Dimension: 1024


## 🎯 RAG Controller

### Search Embedded Vectors
Run a similarity search using the query embedding against stored chunk embeddings.

 - Cosine Distance
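
For reference, cosine distance is one minus the cosine similarity of two vectors. The sketch below computes it in plain Python so the ranking can be checked by hand. The helper name `cosine_distance` is hypothetical and not part of `RagController`; pgvector exposes the same metric through its `<=>` operator, though whether `rag.search` uses that operator is an assumption here.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors.

    Hypothetical helper for illustration; not part of the project code.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
print(cosine_distance(query, [2.0, 0.0]))   # same direction -> 0.0
print(cosine_distance(query, [0.0, 3.0]))   # orthogonal     -> 1.0
print(cosine_distance(query, [-1.0, 0.0]))  # opposite       -> 2.0
```

Smaller values mean more similar vectors, so a search that orders by this metric returns the closest chunks first. Vector length does not affect the result, only direction.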

In [None]:
## Search Query
# German for "I am looking for a software developer"
search_query = "Ich suche einen Softwareentwickler"
results = []

try:
    results = rag.search(search_query, limit=5)
    df = pd.DataFrame(results)
    display(df)

except Exception as e:
    print("⚠ Search error:", e)

### GET Embedded Chunks

In [None]:
chunks = rag.get_chunks()

df = pd.DataFrame([
    {
        "id": c.id,
        "document_id": c.document_id,
        "chunk_index": c.chunk_index,
        "token_count": c.token_count,
        "content": (c.content[:200] + "...") if len(c.content) > 200 else c.content
    }
    for c in chunks
])

df

### Create File Embeddings
 - Chunk PDF
 - Embed chunks within Database
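
As a rough illustration of the chunking step, the sketch below splits plain text into fixed-size character windows with overlap. This is an assumption about the general technique, not the behavior of `rag.chunk_pdf`; the function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical, and the project's chunker may split on tokens, sentences, or pages instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows.

    Hypothetical helper for illustration; not the project's chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():  # skip windows that are only whitespace
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap repeats the tail of each chunk at the head of the next, which helps keep a sentence that straddles a boundary retrievable from at least one chunk.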

In [None]:
# Define File Path
file_path = Path("./store/resume.pdf")

try:
    chunks = rag.chunk_pdf(str(file_path))

    if not chunks:
        print("⚠ No chunks were produced. Check PDF content or chunker settings.")
    else:
        print(f"✓ Successfully processed PDF into {len(chunks)} chunks")
        print(f"✓ First chunk preview:\n{chunks[0][:300]}...")

except Exception as e:
    print("⚠ PDF processing error:", e)
    traceback.print_exc()


# 🆎 Project Initialization
 - Hard DB Reset

In [None]:
# Hard DB Reset
db.set_new_environment()
schema = db.show_schema()