# 📚 RAG Systems with Hugging Face

**Author:** Praut s.r.o. - AI Integration & Business Automation

RAG (Retrieval-Augmented Generation) combines retrieval over a document collection with answer generation by an LLM.

In this notebook we will cover:
- The basics of RAG architecture
- Indexing documents for vector search
- Chunking strategies for long documents
- Integration with local LLM models
- A production RAG pipeline with evaluation

## RAG Architecture

```
Query → Embedding → Vector search → Relevant chunks → LLM → Answer
                          ↑
                   Vector database
                 (indexed documents)
```
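
The flow in the diagram can be sketched without any model at all. The toy example below is an illustration only: it assumes a bag-of-words count vector in place of a real embedding model, a plain Python list in place of the vector database, and it returns the prompt instead of calling an LLM. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical helpers, not part of the notebook's code.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=2):
    """Rank chunks by similarity to the query and keep the best top_k."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

def build_prompt(query, context_chunks):
    """Place the retrieved chunks in front of the question."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"

toy_chunks = [
    "chatbots answer customer questions around the clock",
    "fast page loads matter for online shops",
    "nps measures customer satisfaction",
]
toy_query = "how do chatbots help customer support"
toy_prompt = build_prompt(toy_query, retrieve(toy_query, toy_chunks, top_k=1))
print(toy_prompt)
```

The rest of the notebook replaces each toy piece with a real component: a Sentence Transformers encoder, a FAISS index, and a Hugging Face generation pipeline.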

In [None]:
# Install the required libraries
!pip install -q transformers sentence-transformers faiss-cpu langchain langchain-community chromadb pypdf

In [None]:
import torch
import numpy as np
from typing import List, Dict, Any, Optional, Tuple
import os
import json
from dataclasses import dataclass
import re

# Hugging Face
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from sentence_transformers import SentenceTransformer

# Vector search
import faiss

# Check for a GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 1. Document Chunking - Splitting Documents

Splitting documents well is crucial to the quality of a RAG system, because the chunk is the unit that gets embedded, retrieved, and passed to the LLM.
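
To make the effect of chunk size and overlap concrete, the sketch below computes only the character windows of a fixed-size splitter, under the assumption that no sentence-boundary adjustment is applied. `window_offsets` is a hypothetical helper used for illustration.

```python
def window_offsets(text_len, chunk_size, overlap):
    """Return (start, end) character windows of a fixed-size splitter."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # each window advances by this many characters
    windows = []
    start = 0
    while start < text_len:
        end = min(start + chunk_size, text_len)
        windows.append((start, end))
        if end == text_len:
            break  # the last window reached the end of the text
        start += step
    return windows

windows = window_offsets(text_len=1000, chunk_size=400, overlap=50)
print(windows)  # [(0, 400), (350, 750), (700, 1000)]
```

Neighbouring windows share 50 characters, so text near a window boundary appears in both adjacent chunks.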

In [None]:
@dataclass
class DocumentChunk:
    """A single chunk of a document."""
    text: str
    metadata: Dict[str, Any]
    chunk_id: int
    source: str


class DocumentChunker:
    """
    Splits documents into chunks for indexing.
    Supports several chunking strategies.
    """
    
    def __init__(
        self,
        chunk_size: int = 500,
        chunk_overlap: int = 50,
        separator: str = "\n"
    ):
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.separator = separator
    
    def chunk_by_size(self, text: str, source: str = "unknown") -> List[DocumentChunk]:
        """Split text into fixed-size chunks with overlap."""
        chunks = []
        start = 0
        chunk_id = 0
        
        while start < len(text):
            # Tentative end of the chunk, clamped to the end of the text
            end = min(start + self.chunk_size, len(text))
            
            # If we are not at the end, try to finish on a sentence boundary
            if end < len(text):
                # Look for the last period, exclamation mark, or question mark
                last_sentence_end = max(
                    text.rfind(".", start, end),
                    text.rfind("!", start, end),
                    text.rfind("?", start, end)
                )
                # Accept the boundary only if it lies past the overlap region;
                # otherwise the next start would not move forward
                if last_sentence_end >= start + self.chunk_overlap:
                    end = last_sentence_end + 1
            
            chunk_text = text[start:end].strip()
            
            if chunk_text:
                chunks.append(DocumentChunk(
                    text=chunk_text,
                    metadata={
                        "start_char": start,
                        "end_char": end,
                        "chunk_size": len(chunk_text)
                    },
                    chunk_id=chunk_id,
                    source=source
                ))
                chunk_id += 1
            
            # Stop once the end of the text has been reached
            if end >= len(text):
                break
            
            # Advance with overlap, always making forward progress
            start = max(end - self.chunk_overlap, start + 1)
        
        return chunks
    
    def chunk_by_paragraphs(self, text: str, source: str = "unknown") -> List[DocumentChunk]:
        """Split text into chunks along paragraph boundaries."""
        paragraphs = text.split("\n\n")
        chunks = []
        current_chunk = ""
        chunk_id = 0
        
        for para in paragraphs:
            para = para.strip()
            if not para:
                continue
            
            # If adding this paragraph would make the chunk too large, save it and start a new one
            if len(current_chunk) + len(para) > self.chunk_size and current_chunk:
                chunks.append(DocumentChunk(
                    text=current_chunk.strip(),
                    metadata={"type": "paragraph"},
                    chunk_id=chunk_id,
                    source=source
                ))
                chunk_id += 1
                current_chunk = ""
            
            current_chunk += para + "\n\n"
        
        # Last chunk
        if current_chunk.strip():
            chunks.append(DocumentChunk(
                text=current_chunk.strip(),
                metadata={"type": "paragraph"},
                chunk_id=chunk_id,
                source=source
            ))
        
        return chunks
    
    def chunk_by_sentences(self, text: str, source: str = "unknown", sentences_per_chunk: int = 5) -> List[DocumentChunk]:
        """Split text into chunks of a fixed number of sentences."""
        # Simple sentence splitter
        sentences = re.split(r'(?<=[.!?])\s+', text)
        chunks = []
        chunk_id = 0
        
        for i in range(0, len(sentences), sentences_per_chunk):
            chunk_sentences = sentences[i:i + sentences_per_chunk]
            chunk_text = " ".join(chunk_sentences)
            
            if chunk_text.strip():
                chunks.append(DocumentChunk(
                    text=chunk_text.strip(),
                    metadata={
                        "type": "sentences",
                        "sentence_count": len(chunk_sentences)
                    },
                    chunk_id=chunk_id,
                    source=source
                ))
                chunk_id += 1
        
        return chunks

In [None]:
# Sample document
sample_document = """
Artificial Intelligence in Business

Artificial intelligence (AI) is transforming how companies operate. From automating routine tasks to advanced data analysis, AI offers a wide range of ways to increase efficiency and competitiveness.

One of the most common applications of AI is customer service. AI-powered chatbots can answer common customer questions 24/7, which reduces support costs and improves the customer experience. Modern chatbots use natural language processing (NLP) to understand context and provide relevant answers.

Another important area is predictive analytics. AI models can analyze historical data and forecast future trends, which helps companies plan inventory better, optimize prices, and identify potential problems before they occur.

Process automation with AI includes robotic process automation (RPA), which can take over repetitive administrative tasks. This lets employees focus on more creative and strategic work.

Implementing AI requires careful planning. Companies must consider data quality, technical infrastructure, employee training, and the ethical aspects of using AI. A successful AI implementation can lead to significant cost savings and higher productivity.
"""

# Chunking test
chunker = DocumentChunker(chunk_size=400, chunk_overlap=50)
chunks = chunker.chunk_by_size(sample_document, source="ai_business.txt")

print(f"Number of chunks: {len(chunks)}")
print("="*50)

for chunk in chunks:
    print(f"\nChunk {chunk.chunk_id}:")
    print(f"Length: {len(chunk.text)} characters")
    print(f"Text: {chunk.text[:150]}...")

## 2. Vector Store - Vector Database

We index the chunks using embeddings and FAISS.
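
One property is worth making explicit before the implementation: after L2 normalization, the inner product of two vectors equals their cosine similarity, which is why an inner-product index can rank by cosine. A small pure-Python check follows, with hypothetical helper names and made-up two-dimensional vectors standing in for real embeddings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

query_vec = [3.0, 4.0]   # made-up 2-d "embeddings" for illustration
chunk_vec = [4.0, 3.0]

ip = inner_product(l2_normalize(query_vec), l2_normalize(chunk_vec))
cos = cosine_similarity(query_vec, chunk_vec)
print(round(ip, 6), round(cos, 6))  # 0.96 0.96
```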

In [None]:
class VectorStore:
    """
    Vector database for the RAG system.
    Uses Sentence Transformers for embeddings and FAISS for search.
    """
    
    def __init__(
        self,
        embedding_model: str = "all-MiniLM-L6-v2",
        index_type: str = "flat"  # flat, ivf, hnsw
    ):
        print(f"Loading embedding model: {embedding_model}")
        self.encoder = SentenceTransformer(embedding_model)
        self.embedding_dim = self.encoder.get_sentence_embedding_dimension()
        self.index_type = index_type
        
        # The FAISS index is created lazily on the first add_documents() call
        self.index = None
        self.chunks: List[DocumentChunk] = []
        
        print(f"Embedding dimension: {self.embedding_dim}")
    
    def _create_index(self, n_vectors: int):
        """Create a FAISS index of the configured type.

        All variants use the inner-product metric so that, with normalized
        embeddings, a higher score means a higher cosine similarity.
        """
        if self.index_type == "flat":
            # Exact search - best for small datasets
            self.index = faiss.IndexFlatIP(self.embedding_dim)
        elif self.index_type == "ivf":
            # Approximate search - faster for large datasets
            n_clusters = max(1, min(int(np.sqrt(n_vectors)), 100))
            quantizer = faiss.IndexFlatIP(self.embedding_dim)
            self.index = faiss.IndexIVFFlat(
                quantizer, self.embedding_dim, n_clusters, faiss.METRIC_INNER_PRODUCT
            )
        elif self.index_type == "hnsw":
            # HNSW - fast approximate search
            self.index = faiss.IndexHNSWFlat(
                self.embedding_dim, 32, faiss.METRIC_INNER_PRODUCT
            )
        else:
            self.index = faiss.IndexFlatIP(self.embedding_dim)
    
    def add_documents(self, chunks: List[DocumentChunk]):
        """Add document chunks to the index."""
        if not chunks:
            return
        
        # Compute embeddings
        texts = [chunk.text for chunk in chunks]
        embeddings = self.encoder.encode(
            texts,
            normalize_embeddings=True,
            show_progress_bar=True
        )
        
        # Create the index if it does not exist yet
        if self.index is None:
            self._create_index(len(chunks))
            
            # An IVF index must be trained before vectors are added
            if self.index_type == "ivf":
                self.index.train(embeddings.astype(np.float32))
        
        # Add to the index
        self.index.add(embeddings.astype(np.float32))
        self.chunks.extend(chunks)
        
        print(f"Added {len(chunks)} chunks. Total: {len(self.chunks)}")
    
    def search(
        self,
        query: str,
        top_k: int = 5,
        threshold: float = 0.0
    ) -> List[Tuple[DocumentChunk, float]]:
        """Find the most relevant chunks for a query."""
        if self.index is None or len(self.chunks) == 0:
            return []
        
        # Embed the query
        query_embedding = self.encoder.encode(
            [query],
            normalize_embeddings=True
        ).astype(np.float32)
        
        # Search
        scores, indices = self.index.search(query_embedding, min(top_k, len(self.chunks)))
        
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx >= 0 and score >= threshold:
                results.append((self.chunks[idx], float(score)))
        
        return results
    
    def save(self, path: str):
        """Save the index and chunk metadata."""
        if self.index is None:
            raise ValueError("Nothing to save: no documents have been indexed yet.")
        os.makedirs(path, exist_ok=True)
        
        # Save the FAISS index
        faiss.write_index(self.index, f"{path}/index.faiss")
        
        # Save the chunks
        chunks_data = [
            {
                "text": c.text,
                "metadata": c.metadata,
                "chunk_id": c.chunk_id,
                "source": c.source
            }
            for c in self.chunks
        ]
        with open(f"{path}/chunks.json", "w", encoding="utf-8") as f:
            json.dump(chunks_data, f, ensure_ascii=False, indent=2)
        
        print(f"Index saved to: {path}")
    
    def load(self, path: str):
        """Load the index and chunk metadata."""
        # Load the FAISS index
        self.index = faiss.read_index(f"{path}/index.faiss")
        
        # Load the chunks
        with open(f"{path}/chunks.json", "r", encoding="utf-8") as f:
            chunks_data = json.load(f)
        
        self.chunks = [
            DocumentChunk(
                text=c["text"],
                metadata=c["metadata"],
                chunk_id=c["chunk_id"],
                source=c["source"]
            )
            for c in chunks_data
        ]
        
        print(f"Loaded {len(self.chunks)} chunks from: {path}")

In [None]:
# Create and populate the vector store
vector_store = VectorStore(embedding_model="all-MiniLM-L6-v2")

# Add several documents
documents = [
    {
        "source": "ai_business.txt",
        "text": sample_document
    },
    {
        "source": "customer_service.txt",
        "text": """
        Customer Service and Support
        
        Good customer service is the foundation of a successful business. A satisfied customer comes back and recommends the company to others.
        
        The basic principles of good service include fast responses to inquiries, an empathetic approach, and effective problem solving.
        Modern technologies such as CRM systems help track communication history and personalize services.
        
        Chatbots can handle routine inquiries automatically, but more complex cases require a human touch.
        It is important to set up the right escalation - when to hand a case over to a live operator.
        
        Measuring customer satisfaction with NPS (Net Promoter Score) and CSAT helps identify areas for improvement.
        """
    },
    {
        "source": "ecommerce.txt",
        "text": """
        E-commerce and Online Sales
        
        Online stores are booming. Key to success in e-commerce are user experience, fast page loads, and trustworthiness.
        
        Conversion rate optimization includes A/B testing, personalized offers, and a simpler checkout process.
        Every extra step in the purchase process reduces conversion by 10-15%.
        
        Logistics and fulfillment are critical. Customers expect fast delivery and easy returns.
        Same-day delivery is becoming standard in larger cities.
        
        Payment gateways must support various methods - cards, Apple Pay, Google Pay, bank transfers.
        Payment security is a priority - PCI DSS compliance is a must.
        """
    }
]

# Chunk and index all documents
chunker = DocumentChunker(chunk_size=300, chunk_overlap=30)
all_chunks = []

for doc in documents:
    doc_chunks = chunker.chunk_by_size(doc["text"], source=doc["source"])
    all_chunks.extend(doc_chunks)

vector_store.add_documents(all_chunks)

In [None]:
# Search test
test_queries = [
    "How can AI help with customer service?",
    "What matters for an online shop?",
    "How do you measure customer satisfaction?",
    "What are the benefits of chatbots?"
]

for query in test_queries:
    print(f"\nQuery: {query}")
    print("-" * 50)
    
    results = vector_store.search(query, top_k=2)
    for chunk, score in results:
        print(f"  [{score:.3f}] {chunk.source}: {chunk.text[:100]}...")

## 3. RAG Pipeline - The Complete System

We combine retrieval with answer generation by an LLM.
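
One practical detail in this step is fitting the retrieved chunks into a bounded context. The sketch below shows one possible policy, assumed here purely for illustration: walk the chunks in ranked order and skip any chunk that would exceed a character budget, rather than stopping at the first oversized one. `fit_context` is a hypothetical helper.

```python
def fit_context(ranked_chunks, max_chars):
    """Join ranked chunks into one context string within a character budget."""
    separator = "\n\n"
    parts = []
    used = 0
    for text in ranked_chunks:
        # Account for the separator that join() will add between parts.
        cost = len(text) + (len(separator) if parts else 0)
        if used + cost > max_chars:
            continue  # too big for the remaining budget; try the next chunk
        parts.append(text)
        used += cost
    return separator.join(parts)

ranked = ["a" * 40, "b" * 80, "c" * 30]
context = fit_context(ranked, max_chars=80)
print(len(context))  # 72
```

A character budget is only a rough proxy for the model's token limit; counting tokens with the model's tokenizer would be more precise.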

In [None]:
class RAGPipeline:
    """
    Complete RAG pipeline for question answering.
    """
    
    def __init__(
        self,
        vector_store: VectorStore,
        llm_model: str = "google/flan-t5-base",
        top_k: int = 3,
        max_context_length: int = 1500
    ):
        self.vector_store = vector_store
        self.top_k = top_k
        self.max_context_length = max_context_length
        
        # Load the LLM
        print(f"Loading LLM: {llm_model}")
        self.generator = pipeline(
            "text2text-generation",
            model=llm_model,
            device=0 if torch.cuda.is_available() else -1,
            max_length=512
        )
        print("LLM loaded.")
    
    def _build_context(self, chunks: List[Tuple[DocumentChunk, float]]) -> str:
        """Build the context from the retrieved chunks, up to max_context_length characters."""
        context_parts = []
        total_length = 0
        
        for chunk, score in chunks:
            if total_length + len(chunk.text) > self.max_context_length:
                break
            context_parts.append(chunk.text)
            total_length += len(chunk.text)
        
        return "\n\n".join(context_parts)
    
    def _build_prompt(self, query: str, context: str) -> str:
        """Build the prompt for the LLM."""
        prompt = f"""Answer the question based on the following context.
If the answer is not in the context, say that you don't know.

Context:
{context}

Question: {query}

Answer:"""
        return prompt
    
    def query(
        self,
        question: str,
        return_sources: bool = True
    ) -> Dict[str, Any]:
        """Process a question and return the answer."""
        # 1. Retrieve relevant chunks
        retrieved = self.vector_store.search(question, top_k=self.top_k)
        
        if not retrieved:
            return {
                "answer": "I could not find any relevant information.",
                "sources": [],
                "context": ""
            }
        
        # 2. Build the context
        context = self._build_context(retrieved)
        
        # 3. Generate the answer
        prompt = self._build_prompt(question, context)
        response = self.generator(prompt)[0]["generated_text"]
        
        # 4. Prepare the result
        result = {
            "answer": response.strip(),
            "context": context
        }
        
        if return_sources:
            result["sources"] = [
                {
                    "source": chunk.source,
                    "score": score,
                    "text": chunk.text[:200] + "..."
                }
                for chunk, score in retrieved
            ]
        
        return result
    
    def batch_query(self, questions: List[str]) -> List[Dict[str, Any]]:
        """Process several questions, one after another."""
        return [self.query(q) for q in questions]

In [None]:
# Create the RAG pipeline
rag = RAGPipeline(
    vector_store=vector_store,
    llm_model="google/flan-t5-base",
    top_k=3
)

In [None]:
# RAG pipeline test
questions = [
    "How can AI help in customer service?",
    "What matters for the success of an online shop?",
    "How do you measure customer satisfaction?"
]

for question in questions:
    print("\n" + "="*60)
    print(f"QUESTION: {question}")
    print("="*60)
    
    result = rag.query(question)
    
    print(f"\nANSWER: {result['answer']}")
    print("\nSOURCES:")
    for source in result["sources"]:
        print(f"  - {source['source']} (score: {source['score']:.3f})")

## 4. Advanced RAG with Re-ranking

We add a cross-encoder, which scores each (query, chunk) pair jointly, for better relevance ranking.
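
Bi-encoder similarities and cross-encoder scores usually live on different scales, so mixing them directly can let one of them dominate. The sketch below shows one common workaround: min-max normalizing each score list over the candidate set before taking a weighted sum. It is an illustration with made-up scores; `min_max` and `fuse_scores` are hypothetical helpers, and the 0.3/0.7 weighting mirrors the values used in this notebook rather than a tuned recommendation.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; constant lists map to all zeros."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]

def fuse_scores(retrieval_scores, rerank_scores, rerank_weight=0.7):
    """Weighted sum of the two normalized score lists."""
    r = min_max(retrieval_scores)
    c = min_max(rerank_scores)
    return [(1 - rerank_weight) * a + rerank_weight * b for a, b in zip(r, c)]

retrieval_scores = [0.82, 0.80, 0.55]   # made-up cosine similarities
rerank_scores = [-3.1, 6.4, 1.2]        # made-up cross-encoder outputs

fused = fuse_scores(retrieval_scores, rerank_scores)
best = max(range(len(fused)), key=fused.__getitem__)
print(best)  # 1
```

Here the second candidate wins even though it was not the top vector-search hit, because the cross-encoder rates it far higher.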

In [None]:
from sentence_transformers import CrossEncoder

class AdvancedRAGPipeline:
    """
    Advanced RAG with cross-encoder re-ranking.
    """
    
    def __init__(
        self,
        vector_store: VectorStore,
        llm_model: str = "google/flan-t5-base",
        reranker_model: str = "cross-encoder/ms-marco-MiniLM-L-6-v2",
        initial_k: int = 10,
        final_k: int = 3
    ):
        self.vector_store = vector_store
        self.initial_k = initial_k
        self.final_k = final_k
        
        # LLM
        print(f"Loading LLM: {llm_model}")
        self.generator = pipeline(
            "text2text-generation",
            model=llm_model,
            device=0 if torch.cuda.is_available() else -1,
            max_length=512
        )
        
        # Re-ranker (cross-encoder)
        print(f"Loading re-ranker: {reranker_model}")
        self.reranker = CrossEncoder(reranker_model)
        print("Models loaded.")
    
    def _rerank(self, query: str, chunks: List[Tuple[DocumentChunk, float]]) -> List[Tuple[DocumentChunk, float]]:
        """Re-rank the results with the cross-encoder."""
        if not chunks:
            return []
        
        # Build (query, chunk) pairs for the cross-encoder
        pairs = [(query, chunk.text) for chunk, _ in chunks]
        
        # Cross-encoder scores
        scores = self.reranker.predict(pairs)
        
        # Combine with the original score. Note: the two scores are on different
        # scales (cosine similarity vs. cross-encoder output), so these weights
        # are a heuristic and may need tuning or score normalization.
        reranked = []
        for (chunk, original_score), new_score in zip(chunks, scores):
            combined_score = 0.3 * original_score + 0.7 * float(new_score)  # Cross-encoder weighted higher
            reranked.append((chunk, float(combined_score)))
        
        # Sort by the combined score
        reranked.sort(key=lambda x: x[1], reverse=True)
        
        return reranked[:self.final_k]
    
    def query(
        self,
        question: str,
        use_reranking: bool = True
    ) -> Dict[str, Any]:
        """Process a question with optional re-ranking."""
        # 1. Initial retrieval (a larger candidate set)
        initial_results = self.vector_store.search(question, top_k=self.initial_k)
        
        if not initial_results:
            return {"answer": "I could not find any relevant information.", "sources": []}
        
        # 2. Re-ranking (if enabled)
        if use_reranking:
            final_results = self._rerank(question, initial_results)
        else:
            final_results = initial_results[:self.final_k]
        
        # 3. Build the context
        context = "\n\n".join([chunk.text for chunk, _ in final_results])
        
        # 4. Generate the answer
        prompt = f"""Answer the question based on the context. Be brief and specific.

Context:
{context}

Question: {question}

Answer:"""
        
        response = self.generator(prompt)[0]["generated_text"]
        
        return {
            "answer": response.strip(),
            "sources": [
                {"source": c.source, "score": s, "text": c.text[:150]}
                for c, s in final_results
            ],
            "reranked": use_reranking
        }

In [None]:
# Advanced RAG test
advanced_rag = AdvancedRAGPipeline(
    vector_store=vector_store,
    llm_model="google/flan-t5-base",
    initial_k=10,
    final_k=3
)

question = "Which technologies help in customer service?"

# Without re-ranking
result_basic = advanced_rag.query(question, use_reranking=False)
print("WITHOUT RE-RANKING:")
print(f"Answer: {result_basic['answer']}")

print("\n" + "="*50)

# With re-ranking
result_reranked = advanced_rag.query(question, use_reranking=True)
print("WITH RE-RANKING:")
print(f"Answer: {result_reranked['answer']}")

## 5. Production RAG System

A complete system with caching, logging, and evaluation.
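
As a minimal sketch of the caching idea, assuming an LRU (least recently used) policy built on `OrderedDict`: entries move to the end on every read and write, and the oldest entry is evicted once capacity is exceeded. The `LRUCache` class is a hypothetical standalone illustration, not the class used in the cells that follow.

```python
from collections import OrderedDict

class LRUCache:
    """Small least-recently-used cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def __contains__(self, key):
        return key in self._data

cache = LRUCache(capacity=2)
cache.put("q1", "answer 1")
cache.put("q2", "answer 2")
cache.get("q1")              # q1 becomes the most recently used entry
cache.put("q3", "answer 3")  # evicts q2, not q1
print("q1" in cache, "q2" in cache, "q3" in cache)  # True False True
```

Without the `move_to_end` call on reads, the structure would behave as a FIFO queue and could evict a frequently asked question.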

In [None]:
from datetime import datetime
import hashlib
from collections import OrderedDict

class ProductionRAG:
    """
    Production-oriented RAG system with caching, statistics, and evaluation.
    """
    
    def __init__(
        self,
        embedding_model: str = "all-MiniLM-L6-v2",
        llm_model: str = "google/flan-t5-base",
        cache_size: int = 100
    ):
        # Components
        self.chunker = DocumentChunker(chunk_size=400, chunk_overlap=50)
        self.vector_store = VectorStore(embedding_model=embedding_model)
        
        # LLM
        self.generator = pipeline(
            "text2text-generation",
            model=llm_model,
            device=0 if torch.cuda.is_available() else -1,
            max_length=512
        )
        
        # Cache
        self.cache = OrderedDict()
        self.cache_size = cache_size
        
        # Statistics
        self.stats = {
            "total_queries": 0,
            "cache_hits": 0,
            "avg_retrieval_time": 0,
            "avg_generation_time": 0
        }
        
        # Query history
        self.query_history = []
    
    def add_document(self, text: str, source: str, metadata: Optional[Dict[str, Any]] = None):
        """Add a document to the system and return the number of chunks created."""
        chunks = self.chunker.chunk_by_size(text, source=source)
        
        # Attach extra metadata to every chunk
        if metadata:
            for chunk in chunks:
                chunk.metadata.update(metadata)
        
        self.vector_store.add_documents(chunks)
        return len(chunks)
    
    def add_documents_from_folder(self, folder_path: str, extensions: Tuple[str, ...] = (".txt", ".md")):
        """Load all documents with the given extensions from a folder."""
        total_chunks = 0
        
        for filename in os.listdir(folder_path):
            # str.endswith accepts a tuple of suffixes
            if filename.endswith(tuple(extensions)):
                filepath = os.path.join(folder_path, filename)
                with open(filepath, "r", encoding="utf-8") as f:
                    text = f.read()
                
                chunks = self.add_document(text, source=filename)
                total_chunks += chunks
                print(f"  Loaded {filename}: {chunks} chunks")
        
        return total_chunks
    
    def _get_cache_key(self, query: str) -> str:
        """Build the cache key (MD5 serves only as a fast, non-security hash)."""
        return hashlib.md5(query.lower().strip().encode()).hexdigest()
    
    def _update_cache(self, key: str, value: Dict):
        """Insert or refresh a cache entry, evicting the least recently used ones."""
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
    
    def query(
        self,
        question: str,
        top_k: int = 3,
        use_cache: bool = True,
        temperature: float = 0.7
    ) -> Dict[str, Any]:
        """Process a question, using the cache when possible."""
        import time
        start_time = time.time()
        
        self.stats["total_queries"] += 1
        cache_key = self._get_cache_key(question)
        
        # Cache check
        if use_cache and cache_key in self.cache:
            self.stats["cache_hits"] += 1
            self.cache.move_to_end(cache_key)  # mark as most recently used (LRU)
            result = self.cache[cache_key].copy()
            result["cached"] = True
            return result
        
        # Retrieval
        retrieval_start = time.time()
        retrieved = self.vector_store.search(question, top_k=top_k)
        retrieval_time = time.time() - retrieval_start
        
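        if not retrieved:
            # Return the same keys as a normal result so callers can rely on them
            return {
                "answer": "I could not find any relevant information.",
                "sources": [],
                "metadata": {
                    "retrieval_time": retrieval_time,
                    "generation_time": 0.0,
                    "total_time": time.time() - start_time,
                    "top_k": top_k
                },
                "cached": False
            }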
        
        # Build context
        context = "\n\n".join([chunk.text for chunk, _ in retrieved])
        
        # Generation
        generation_start = time.time()
        prompt = f"""Answer the question based on the context. The answer should be brief and accurate.

Context:
{context}

Question: {question}

Answer:"""
        
        # Pass sampling arguments only when sampling is enabled
        gen_kwargs = {"do_sample": True, "temperature": temperature} if temperature > 0 else {"do_sample": False}
        response = self.generator(prompt, **gen_kwargs)[0]["generated_text"]
        generation_time = time.time() - generation_start
        
        # Prepare result
        result = {
            "answer": response.strip(),
            "sources": [
                {"source": c.source, "score": float(s), "chunk_id": c.chunk_id}
                for c, s in retrieved
            ],
            "metadata": {
                "retrieval_time": retrieval_time,
                "generation_time": generation_time,
                "total_time": time.time() - start_time,
                "top_k": top_k
            },
            "cached": False
        }
        
        # Update cache
        if use_cache:
            self._update_cache(cache_key, result)
        
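        # Update running averages over all logged (non-cached) queries
        n = len(self.query_history)
        self.stats["avg_retrieval_time"] = (self.stats["avg_retrieval_time"] * n + retrieval_time) / (n + 1)
        self.stats["avg_generation_time"] = (self.stats["avg_generation_time"] * n + generation_time) / (n + 1)
        
        # Log query
        self.query_history.append({
            "timestamp": datetime.now().isoformat(),
            "question": question,
            "answer": result["answer"],
            "sources_count": len(result["sources"]),
            "time": result["metadata"]["total_time"]
        })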
        
        return result
    
    def get_statistics(self) -> Dict[str, Any]:
        """Return system statistics."""
        cache_hit_rate = (
            self.stats["cache_hits"] / self.stats["total_queries"]
            if self.stats["total_queries"] > 0 else 0
        )
        
        return {
            "total_queries": self.stats["total_queries"],
            "cache_hit_rate": cache_hit_rate,
            "documents_indexed": len(self.vector_store.chunks),
            "cache_size": len(self.cache),
            "recent_queries": self.query_history[-5:] if self.query_history else []
        }
    
    def evaluate(
        self,
        test_questions: List[str],
        expected_sources: List[str] = None
    ) -> Dict[str, float]:
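        """Evaluate retrieval on test questions, optionally against expected sources."""
        if not test_questions:
            raise ValueError("test_questions must not be empty")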
        results = {
            "total_questions": len(test_questions),
            "avg_sources_returned": 0,
            "avg_top_score": 0,
            "source_accuracy": 0 if expected_sources else None
        }
        
        total_sources = 0
        total_top_score = 0
        correct_sources = 0
        
        for i, question in enumerate(test_questions):
            response = self.query(question, use_cache=False)
            
            sources = response.get("sources", [])
            total_sources += len(sources)
            
            if sources:
                total_top_score += sources[0]["score"]
            
            if expected_sources and i < len(expected_sources):
                if any(s["source"] == expected_sources[i] for s in sources):
                    correct_sources += 1
        
        results["avg_sources_returned"] = total_sources / len(test_questions)
        results["avg_top_score"] = total_top_score / len(test_questions)
        
        if expected_sources:
            results["source_accuracy"] = correct_sources / len(test_questions)
        
        return results
    
    def save(self, path: str):
        """Save the system."""
        os.makedirs(path, exist_ok=True)
        self.vector_store.save(f"{path}/vector_store")
        
        # Save the statistics
        with open(f"{path}/stats.json", "w", encoding="utf-8") as f:
            json.dump(self.stats, f)
        
        print(f"RAG system saved to: {path}")

In [None]:
# Create the production RAG system
prod_rag = ProductionRAG(
    embedding_model="all-MiniLM-L6-v2",
    llm_model="google/flan-t5-base",
    cache_size=50
)

# Add documents
for doc in documents:
    prod_rag.add_document(doc["text"], source=doc["source"])

In [None]:
# Production RAG test
test_questions = [
    "What is predictive analytics?",
    "How can an online shop improve conversion?",
    "What are the benefits of chatbots?",
    "What is NPS?"
]

print("TESTING THE PRODUCTION RAG SYSTEM")
print("="*60)

for q in test_questions:
    result = prod_rag.query(q)
    print(f"\nQuestion: {q}")
    print(f"Answer: {result['answer']}")
    print(f"Time: {result['metadata']['total_time']:.3f}s")
    print(f"Cached: {result['cached']}")

# Statistics
print("\n" + "="*60)
print("STATISTICS:")
stats = prod_rag.get_statistics()
for key, value in stats.items():
    if key != "recent_queries":
        print(f"  {key}: {value}")

In [None]:
# Caching test - a repeated question should be served from the cache
print("CACHING TEST:")
print("-" * 40)

question = "What is predictive analytics?"

# First call (already cached if the same question was asked in an earlier cell)
result1 = prod_rag.query(question)
print(f"First call - Time: {result1['metadata']['total_time']:.3f}s, Cached: {result1['cached']}")

# Second call (from the cache)
result2 = prod_rag.query(question)
print(f"Second call - Cached: {result2['cached']}")

# Statistics
stats = prod_rag.get_statistics()
print(f"\nCache hit rate: {stats['cache_hit_rate']:.2%}")

In [None]:
# System evaluation
eval_questions = [
    "How does AI help companies?",
    "What matters in e-commerce?",
    "How does customer service work?",
]

expected = ["ai_business.txt", "ecommerce.txt", "customer_service.txt"]

eval_results = prod_rag.evaluate(eval_questions, expected_sources=expected)

print("SYSTEM EVALUATION:")
print("="*40)
for key, value in eval_results.items():
    if value is not None:
        print(f"  {key}: {value:.3f}" if isinstance(value, float) else f"  {key}: {value}")

## Summary

In this notebook we implemented:

1. **Document Chunking** - several strategies for splitting documents
2. **Vector Store** - a FAISS index for fast search
3. **Basic RAG Pipeline** - retrieval + generation
4. **Advanced RAG** - with cross-encoder re-ranking
5. **Production RAG** - with caching, statistics, and evaluation

### Further improvements for production:
- Hybrid search (BM25 + vectors)
- Streaming responses
- Multi-modal RAG (images, tables)
- RAG with citations
- Fine-tuning embeddings on domain data
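
For the first item, hybrid search, one widely used way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below assumes both rankings are already available as ordered lists of chunk IDs; the IDs are made up, `reciprocal_rank_fusion` is a hypothetical helper, and `k=60` is a commonly used constant rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several rankings; each item scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties are broken by item for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))

bm25_ranking = ["c3", "c1", "c7"]     # made-up keyword-search order
vector_ranking = ["c1", "c4", "c3"]   # made-up vector-search order

fused_ranking = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused_ranking)  # ['c1', 'c3', 'c4', 'c7']
```

Because RRF uses only ranks, it sidesteps the problem of BM25 and cosine scores being on incomparable scales.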