# 📄 Document AI: OCR, Layout Analysis, and Data Extraction

**Author:** Martin Studna, Praut s.r.o.  
**Notebook:** 19/20 - Advanced Document Processing

---

## What You Will Learn

1. **OCR with Transformers** - TrOCR for recognizing text in images
2. **Layout analysis** - LayoutLM for understanding document structure
3. **Form extraction** - Donut for structured data extraction
4. **Table extraction** - Extracting tables from documents
5. **Production document pipeline** - An end-to-end solution

---

## 🔧 Installation and Setup

In [None]:
# Install the required libraries
!pip install -q transformers torch torchvision pillow pdf2image pytesseract
!pip install -q datasets evaluate accelerate sentencepiece
!pip install -q opencv-python-headless img2table

# For Colab - install system dependencies (Poppler, Tesseract, Czech language data)
!apt-get install -y poppler-utils tesseract-ocr tesseract-ocr-ces 2>/dev/null || true

In [None]:
import torch
import torch.nn as nn
import numpy as np
from PIL import Image, ImageDraw, ImageFont
import io
import json
import re
from typing import Dict, List, Optional, Tuple, Any, Union
from dataclasses import dataclass, field
from collections import defaultdict
import warnings
warnings.filterwarnings('ignore')

# Device detection
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🖥️ Using device: {device}")
if torch.cuda.is_available():
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

---

## 📝 Part 1: OCR with TrOCR

TrOCR (Transformer-based Optical Character Recognition) pairs a vision encoder with a text decoder to recognize text. It works on images of a single text line, so a full page has to be split into lines first.

In [None]:
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

class TrOCREngine:
    """OCR engine based on the TrOCR model."""
    
    def __init__(self, model_name: str = "microsoft/trocr-base-printed"):
        """
        Args:
            model_name: Model checkpoint name (printed or handwritten)
                - microsoft/trocr-base-printed - for printed text
                - microsoft/trocr-base-handwritten - for handwritten text
        """
        print(f"📥 Loading TrOCR model: {model_name}")
        self.processor = TrOCRProcessor.from_pretrained(model_name)
        self.model = VisionEncoderDecoderModel.from_pretrained(model_name)
        self.model.to(device)
        self.model.eval()
        print("✅ Model loaded")
    
    def recognize_text(self, image: Image.Image, max_length: int = 128) -> str:
        """Recognize the text in an image of a single text line."""
        # Convert to RGB if needed
        if image.mode != 'RGB':
            image = image.convert('RGB')
        
        # Prepare inputs
        pixel_values = self.processor(image, return_tensors='pt').pixel_values
        pixel_values = pixel_values.to(device)
        
        # Generate text
        with torch.no_grad():
            generated_ids = self.model.generate(
                pixel_values,
                max_length=max_length,
                num_beams=4,
                early_stopping=True
            )
        
        # Decode
        text = self.processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
        return text.strip()
    
    def recognize_lines(self, image: Image.Image, line_boxes: List[Tuple[int, int, int, int]]) -> List[str]:
        """Recognize text in several lines, one (left, top, right, bottom) box per line."""
        results = []
        for box in line_boxes:
            # Crop the line
            line_image = image.crop(box)
            text = self.recognize_text(line_image)
            results.append(text)
        return results

# Test TrOCR
print("\n" + "="*60)
print("TEST TrOCR Engine")
print("="*60)
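
`recognize_lines` needs one bounding box per text line, but the notebook does not show where those boxes come from. One simple option is a horizontal projection profile: any run of consecutive pixel rows containing dark pixels is treated as a line. The sketch below is only an illustration under that assumption. `find_line_boxes`, `ink_threshold`, and `min_height` are hypothetical names, and the approach assumes clean, unskewed, dark-on-light pages. A PIL image can be passed in as `np.asarray(image.convert("L"))`.

```python
import numpy as np

def find_line_boxes(gray, ink_threshold=128, min_height=3):
    """Return (left, top, right, bottom) boxes for horizontal runs of dark pixels.

    gray: 2D array of grayscale values (0 = black, 255 = white).
    Right and bottom are exclusive, matching PIL's Image.crop convention.
    """
    ink = np.asarray(gray) < ink_threshold   # True where a pixel is dark
    row_has_ink = ink.any(axis=1)            # one flag per pixel row
    boxes, start = [], None
    # A trailing False closes a line that touches the bottom edge of the page
    for y, has_ink in enumerate(list(row_has_ink) + [False]):
        if has_ink and start is None:
            start = y                        # a text line begins
        elif not has_ink and start is not None:
            if y - start >= min_height:      # skip specks thinner than min_height
                cols = np.flatnonzero(ink[start:y].any(axis=0))
                boxes.append((int(cols[0]), start, int(cols[-1]) + 1, y))
            start = None
    return boxes

# Synthetic page: two dark bars stand in for two lines of text
page = np.full((100, 200), 255, dtype=np.uint8)
page[10:26, 20:151] = 0
page[50:71, 30:181] = 0
line_boxes = find_line_boxes(page)
print(line_boxes)
```

The resulting tuples can be passed directly as `line_boxes` to `recognize_lines`. Scanned or photographed documents usually need deskewing and binarization before a projection profile gives usable lines.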

In [None]:
def create_test_document_image(text_lines: List[str], 
                               width: int = 800, 
                               height: int = 600,
                               font_size: int = 24) -> Image.Image:
    """Create a test document image containing the given text lines."""
    # White background
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    # Use a basic font; fall back to the built-in one if DejaVu is missing
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()
    
    # Draw the text
    y_position = 50
    line_height = font_size + 15
    
    for line in text_lines:
        draw.text((50, y_position), line, fill='black', font=font)
        y_position += line_height
    
    return image

# Create the test document
test_lines = [
    "FAKTURA c. 2024-001234",
    "Dodavatel: Praut s.r.o.",
    "ICO: 12345678",
    "Datum vystaveni: 15.01.2024",
    "Celkova castka: 12 500 Kc"
]

test_image = create_test_document_image(test_lines)
display(test_image)
print("📄 Test document created")

In [None]:
# Initialize and test TrOCR
ocr_engine = TrOCREngine("microsoft/trocr-base-printed")

# Build single-line images for OCR
def create_single_line_image(text: str, width: int = 600, height: int = 50) -> Image.Image:
    """Create an image containing a single line of text."""
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 28)
    except OSError:
        font = ImageFont.load_default()
    
    draw.text((10, 10), text, fill='black', font=font)
    return image

# Test on individual lines
print("\n📖 OCR results:")
print("-" * 50)

for original_text in test_lines[:3]:  # Test the first 3 lines
    line_image = create_single_line_image(original_text)
    recognized_text = ocr_engine.recognize_text(line_image)
    
    print(f"Original:   '{original_text}'")
    print(f"Recognized: '{recognized_text}'")
    print("-" * 50)
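
Comparing `Original` and `Recognized` by eye does not scale beyond a few lines. A common way to quantify OCR quality is the character error rate (CER): the edit distance between reference and prediction divided by the reference length. The sketch below is a minimal pure-Python version. `levenshtein` and `character_error_rate` are illustrative helper names, not part of the notebook.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))           # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]

def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length (0.0 means a perfect match)."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)

# A made-up OCR error: 'O' read as '0' and '5' read as '6'
cer = character_error_rate("ICO: 12345678", "IC0: 12346678")
print(f"CER: {cer:.3f}")
```

In the loop above, `character_error_rate(original_text, recognized_text)` would give one score per line. Note that CER can exceed 1.0 when the prediction is much longer than the reference.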

---

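## 📐 Part 2: Layout Analysis with LayoutLM

LayoutLM combines text embeddings with spatial information (the bounding box of each word) to understand document structure. The code below uses LayoutLMv3, which additionally feeds image patches into the model.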
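
LayoutLM-family models expect every bounding box on a 0-1000 scale relative to the page, independent of the image resolution. With `apply_ocr=True` the processor is expected to take care of this internally, but the conversion matters as soon as you supply your own words and boxes. A minimal sketch of the conversion follows. `normalize_box` is an illustrative helper name, and the sample box is made up.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp in case OCR reported a box slightly outside the page
    return [min(1000, max(0, v)) for v in scaled]

# A word box on a 600 x 400 pixel page
normalized = normalize_box((50, 70, 250, 90), page_width=600, page_height=400)
print(normalized)
```

Because the scale is relative, the same document scanned at two different resolutions yields the same normalized boxes.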

In [None]:
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification
from transformers import LayoutLMv3ForSequenceClassification

class DocumentLayoutAnalyzer:
    """Document structure analyzer based on LayoutLMv3."""
    
    # Labels for classifying entities in documents
    ENTITY_LABELS = [
        'O',           # Outside - not an entity
        'B-HEADER',    # Beginning of a header
        'I-HEADER',    # Inside a header
        'B-QUESTION',  # Beginning of a question
        'I-QUESTION',  # Inside a question
        'B-ANSWER',    # Beginning of an answer
        'I-ANSWER',    # Inside an answer
        'B-KEY',       # Beginning of a key (field label)
        'I-KEY',       # Inside a key
        'B-VALUE',     # Beginning of a value
        'I-VALUE',     # Inside a value
    ]
    
    def __init__(self, model_name: str = "microsoft/layoutlmv3-base"):
        """
        Args:
            model_name: LayoutLM model checkpoint name
        """
        print(f"📥 Loading LayoutLMv3: {model_name}")
        self.processor = LayoutLMv3Processor.from_pretrained(
            model_name,
            apply_ocr=True  # Run OCR automatically (uses Tesseract)
        )
        # Note: the base checkpoint has no token-classification head, so the
        # head created here is randomly initialized. Its predictions are not
        # meaningful until the model is fine-tuned on labeled documents.
        self.model = LayoutLMv3ForTokenClassification.from_pretrained(
            model_name,
            num_labels=len(self.ENTITY_LABELS)
        )
        self.model.to(device)
        self.model.eval()
        print("✅ LayoutLMv3 loaded")
    
    def analyze_document(self, image: Image.Image) -> Dict[str, Any]:
        """
        Analyze the structure of a document.
        
        Returns:
            Dict containing:
            - tokens: Sub-word tokens produced by the tokenizer
            - predictions: Predicted label for each token
            - boxes: Bounding box of each token
            - entities: Detected entities
        """
        # Convert to RGB
        if image.mode != 'RGB':
            image = image.convert('RGB')
        
        # Prepare inputs (the processor runs OCR automatically)
        encoding = self.processor(
            image,
            return_tensors='pt',
            truncation=True,
            max_length=512
        )
        
        # Move to device
        encoding = {k: v.to(device) for k, v in encoding.items()}
        
        # Inference
        with torch.no_grad():
            outputs = self.model(**encoding)
        
        # Process the results (index [0] drops the batch dimension)
        predictions = outputs.logits.argmax(-1)[0].cpu().numpy()
        
        # Token bounding boxes
        boxes = encoding['bbox']
        
        # Decode tokens
        tokens = self.processor.tokenizer.convert_ids_to_tokens(
            encoding['input_ids'][0].cpu().tolist()
        )
        
        # Assemble the results
        results = {
            'tokens': tokens,
            'predictions': [self.ENTITY_LABELS[p] for p in predictions],
            'boxes': boxes[0].cpu().tolist(),
            'entities': self._extract_entities(tokens, predictions)
        }
        
        return results
    
    def _extract_entities(self, tokens: List[str], predictions: np.ndarray) -> List[Dict]:
        """Extract entities from BIO-tagged predictions."""
        entities = []
        current_entity = None
        current_tokens = []
        
        for i, (token, pred_idx) in enumerate(zip(tokens, predictions)):
            label = self.ENTITY_LABELS[pred_idx]
            
            if label.startswith('B-'):
                # Store the previous entity
                if current_entity:
                    entities.append({
                        'type': current_entity,
                        'text': self._merge_tokens(current_tokens)
                    })
                
                # Start a new entity
                current_entity = label[2:]
                current_tokens = [token]
            
            elif label.startswith('I-') and current_entity:
                current_tokens.append(token)
            
            else:
                # End of the entity
                if current_entity:
                    entities.append({
                        'type': current_entity,
                        'text': self._merge_tokens(current_tokens)
                    })
                current_entity = None
                current_tokens = []
        
        # Last entity
        if current_entity:
            entities.append({
                'type': current_entity,
                'text': self._merge_tokens(current_tokens)
            })
        
        return entities
    
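    def _merge_tokens(self, tokens: List[str]) -> str:
        """Join sub-word tokens back into text."""
        # LayoutLMv3 uses a RoBERTa-style byte-level BPE tokenizer: a leading
        # 'Ġ' marks the start of a word (there are no WordPiece '##' prefixes).
        tokens = [t for t in tokens if t not in ('<s>', '</s>', '<pad>')]
        text = ''.join(tokens).replace('Ġ', ' ')
        text = re.sub(r'\s+', ' ', text)
        return text.strip()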

print("\n" + "="*60)
print("DocumentLayoutAnalyzer ready")
print("="*60)

In [None]:
# Create a structured test document
def create_form_image() -> Image.Image:
    """Create an image of a form."""
    width, height = 600, 400
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 18)
        font_bold = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 22)
    except OSError:
        font = ImageFont.load_default()
        font_bold = font
    
    # Header
    draw.text((200, 20), "OBJEDNAVKA", fill='black', font=font_bold)
    
    # Form fields
    fields = [
        ("Cislo objednavky:", "OBJ-2024-001"),
        ("Datum:", "15.01.2024"),
        ("Zakaznik:", "Jan Novak"),
        ("Email:", "jan.novak@email.cz"),
        ("Produkt:", "AI Automatizace"),
        ("Castka:", "25 000 Kc"),
    ]
    
    y = 70
    for label, value in fields:
        draw.text((50, y), label, fill='gray', font=font)
        draw.text((250, y), value, fill='black', font=font)
        y += 45
    
    # Frames
    draw.rectangle([40, 60, 560, 340], outline='lightgray', width=2)
    draw.line([40, 55, 560, 55], fill='lightgray', width=2)
    
    return image

form_image = create_form_image()
display(form_image)
print("📋 Test form created")

In [None]:
# Test the LayoutLM analyzer
layout_analyzer = DocumentLayoutAnalyzer()

# Analyze the document
results = layout_analyzer.analyze_document(form_image)

print("\n📊 Layout analysis results:")
print("-" * 50)
print(f"Number of tokens: {len(results['tokens'])}")
print(f"\nFirst 20 tokens:")
for i, (token, pred) in enumerate(zip(results['tokens'][:20], results['predictions'][:20])):
    if token not in ['<s>', '</s>', '<pad>']:
        print(f"  {token:20} -> {pred}")

---

## 🍩 Part 3: Donut - Document Understanding Transformer

Donut is an end-to-end model that extracts structured data from document images without a separate OCR step.
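
Donut does not emit JSON directly. Its decoder produces a flat sequence of field tags such as `<s_total>...</s_total>`, which is then converted into a nested structure. The sketch below is a deliberately simplified, hypothetical re-implementation of that idea, meant only to show the mapping. It is not the library's `token2json`, it ignores edge cases such as unclosed tags, and the sample sequence is made up.

```python
import re

# Matches one <s_key>...</s_key> pair; the backreference ties the closing tag to the opener
TAG_PAIR = re.compile(r"<s_(?P<key>[^>]+)>(?P<body>.*?)</s_(?P=key)>", re.DOTALL)

def tags_to_dict(sequence: str):
    """Turn a Donut-style tag sequence into nested dicts (simplified sketch)."""
    result = {}
    for match in TAG_PAIR.finditer(sequence):
        key = match.group("key")
        value = tags_to_dict(match.group("body"))  # recurse into nested fields
        if key in result:
            # A repeated key becomes a list of values
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    # No tags found means this is a leaf: return the plain text
    return result if result else sequence.strip()

sample = (
    "<s_menu><s_nm>Cappuccino</s_nm><s_price>89.00</s_price></s_menu>"
    "<s_total><s_total_price>427.00</s_total_price></s_total>"
)
parsed_sample = tags_to_dict(sample)
print(parsed_sample)
```

The tag vocabulary depends on the dataset the checkpoint was fine-tuned on, so different checkpoints produce different keys.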

In [None]:
from transformers import DonutProcessor, VisionEncoderDecoderModel as DonutModel

class DonutDocumentExtractor:
    """Document data extractor based on the Donut model."""
    
    def __init__(self, model_name: str = "naver-clova-ix/donut-base-finetuned-cord-v2"):
        """
        Args:
            model_name: Donut model checkpoint name
                - naver-clova-ix/donut-base-finetuned-cord-v2 - for receipts
                - naver-clova-ix/donut-base-finetuned-docvqa - for DocVQA
        """
        print(f"📥 Loading Donut model: {model_name}")
        self.processor = DonutProcessor.from_pretrained(model_name)
        self.model = DonutModel.from_pretrained(model_name)
        self.model.to(device)
        self.model.eval()
        print("✅ Donut model loaded")
    
    def extract_data(self, image: Image.Image, task_prompt: str = "<s_cord-v2>") -> Dict:
        """
        Extract structured data from a document.
        
        Args:
            image: Document image
            task_prompt: Prompt selecting the extraction task
        
        Returns:
            Dict with the raw decoded sequence and the parsed structure
        """
        # Convert to RGB
        if image.mode != 'RGB':
            image = image.convert('RGB')
        
        # Prepare inputs
        pixel_values = self.processor(image, return_tensors='pt').pixel_values
        pixel_values = pixel_values.to(device)
        
        # Prepare the decoder input
        decoder_input_ids = self.processor.tokenizer(
            task_prompt, 
            add_special_tokens=False, 
            return_tensors='pt'
        ).input_ids
        decoder_input_ids = decoder_input_ids.to(device)
        
        # Generate
        with torch.no_grad():
            outputs = self.model.generate(
                pixel_values,
                decoder_input_ids=decoder_input_ids,
                max_length=self.model.decoder.config.max_position_embeddings,
                early_stopping=True,
                pad_token_id=self.processor.tokenizer.pad_token_id,
                eos_token_id=self.processor.tokenizer.eos_token_id,
                use_cache=True,
                num_beams=4,
                bad_words_ids=[[self.processor.tokenizer.unk_token_id]],
                return_dict_in_generate=True,
            )
        
        # Decode the output
        sequence = self.processor.batch_decode(outputs.sequences)[0]
        sequence = sequence.replace(self.processor.tokenizer.eos_token, '')
        sequence = sequence.replace(self.processor.tokenizer.pad_token, '')
        
        # Parse into JSON
        parsed = self.processor.token2json(sequence)
        
        return {
            'raw_output': sequence,
            'parsed': parsed
        }
    
    def answer_question(self, image: Image.Image, question: str) -> str:
        """
        Answer a question about the document (requires the DocVQA checkpoint).
        
        Args:
            image: Document image
            question: Question in natural language
        
        Returns:
            Answer to the question
        """
        task_prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
        result = self.extract_data(image, task_prompt)
        
        # Extract the answer
        answer = result['parsed'].get('answer', result['raw_output'])
        return answer

print("\n" + "="*60)
print("DonutDocumentExtractor ready")
print("="*60)

In [None]:
# Create a test receipt
def create_receipt_image() -> Image.Image:
    """Create an image of a receipt."""
    width, height = 400, 500
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf", 14)
        font_bold = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSansMono-Bold.ttf", 16)
    except OSError:
        font = ImageFont.load_default()
        font_bold = font
    
    y = 20
    center_x = width // 2
    
    # Receipt lines (header, items, totals)
    lines = [
        "================================",
        "      PRAUT COFFEE SHOP        ",
        "    Karlovy Vary, Cheb 123     ",
        "       ICO: 12345678           ",
        "================================",
        "",
        "Datum: 15.01.2024  14:32",
        "Pokladna: 1  Obsluha: Jana",
        "--------------------------------",
        "Cappuccino      2x    89.00",
        "Espresso        1x    59.00",
        "Cheesecake      1x   125.00",
        "Croissant       2x    65.00",
        "--------------------------------",
        "Celkem:              427.00 Kc",
        "DPH 15%:              55.70 Kc",
        "================================",
        "Hotove:              500.00 Kc",
        "Vratit:               73.00 Kc",
        "================================",
        "",
        "     Dekujeme za navstevu!     ",
    ]
    
    for line in lines:
        # Center the line horizontally
        bbox = draw.textbbox((0, 0), line, font=font)
        text_width = bbox[2] - bbox[0]
        x = (width - text_width) // 2
        draw.text((x, y), line, fill='black', font=font)
        y += 20
    
    return image

receipt_image = create_receipt_image()
display(receipt_image)
print("🧾 Test receipt created")

In [None]:
# Test the Donut extractor
donut_extractor = DonutDocumentExtractor()

# Extract data from the receipt
extraction_result = donut_extractor.extract_data(receipt_image)

print("\n🍩 Donut extraction results:")
print("-" * 50)
print("\nParsed data:")
print(json.dumps(extraction_result['parsed'], indent=2, ensure_ascii=False))

---

## 📊 Part 4: Table Extraction

A specialized approach for detecting tables in documents and recovering their row and column structure.

In [None]:
from transformers import TableTransformerForObjectDetection, DetrImageProcessor

class TableExtractor:
    """Table extractor based on Table Transformer."""
    
    LABELS = ['table', 'table column', 'table row', 'table column header', 
              'table projected row header', 'table spanning cell']
    
    def __init__(self, detection_threshold: float = 0.7):
        """
        Args:
            detection_threshold: Confidence threshold for table detection
        """
        print("📥 Loading Table Transformer...")
        
        # Model for table detection
        self.detector_processor = DetrImageProcessor.from_pretrained(
            "microsoft/table-transformer-detection"
        )
        self.detector = TableTransformerForObjectDetection.from_pretrained(
            "microsoft/table-transformer-detection"
        )
        self.detector.to(device)
        self.detector.eval()
        
        # Model for table structure recognition
        self.structure_processor = DetrImageProcessor.from_pretrained(
            "microsoft/table-transformer-structure-recognition"
        )
        self.structure_model = TableTransformerForObjectDetection.from_pretrained(
            "microsoft/table-transformer-structure-recognition"
        )
        self.structure_model.to(device)
        self.structure_model.eval()
        
        self.threshold = detection_threshold
        print("✅ Table Transformer loaded")
    
    def detect_tables(self, image: Image.Image) -> List[Dict]:
        """
        Detect tables in an image.
        
        Returns:
            List of detected tables with bounding boxes
        """
        if image.mode != 'RGB':
            image = image.convert('RGB')
        
        # Prepare inputs
        inputs = self.detector_processor(images=image, return_tensors='pt')
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        # Detection
        with torch.no_grad():
            outputs = self.detector(**inputs)
        
        # Post-processing
        target_sizes = torch.tensor([image.size[::-1]]).to(device)
        results = self.detector_processor.post_process_object_detection(
            outputs, threshold=self.threshold, target_sizes=target_sizes
        )[0]
        
        tables = []
        for score, label, box in zip(results['scores'], results['labels'], results['boxes']):
            tables.append({
                'confidence': score.item(),
                'box': box.cpu().numpy().tolist(),
                'label': 'table'
            })
        
        return tables
    
    def analyze_structure(self, image: Image.Image, table_box: List[float]) -> Dict:
        """
        Analyze the structure of a table (rows, columns, cells).
        
        Args:
            image: Document image
            table_box: Table bounding box [x1, y1, x2, y2]
        
        Returns:
            Table structure; boxes are relative to the cropped table
        """
        # Crop the table
        x1, y1, x2, y2 = [int(c) for c in table_box]
        table_image = image.crop((x1, y1, x2, y2))
        
        if table_image.mode != 'RGB':
            table_image = table_image.convert('RGB')
        
        # Prepare inputs
        inputs = self.structure_processor(images=table_image, return_tensors='pt')
        inputs = {k: v.to(device) for k, v in inputs.items()}
        
        # Structure analysis
        with torch.no_grad():
            outputs = self.structure_model(**inputs)
        
        # Post-processing
        target_sizes = torch.tensor([table_image.size[::-1]]).to(device)
        results = self.structure_processor.post_process_object_detection(
            outputs, threshold=0.5, target_sizes=target_sizes
        )[0]
        
        # Group the results by type
        structure = {
            'rows': [],
            'columns': [],
            'headers': [],
            'cells': []
        }
        
        label_map = {
            0: 'table',
            1: 'columns',
            2: 'rows',
            3: 'headers',
            4: 'spanning_cells',
            5: 'spanning_cells'
        }
        
        for score, label, box in zip(results['scores'], results['labels'], results['boxes']):
            label_idx = label.item()
            element_type = label_map.get(label_idx, 'unknown')
            
            element = {
                'confidence': score.item(),
                'box': box.cpu().numpy().tolist()
            }
            
            if element_type == 'rows':
                structure['rows'].append(element)
            elif element_type == 'columns':
                structure['columns'].append(element)
            elif element_type == 'headers':
                structure['headers'].append(element)
        
        # Derive cells from the intersections of rows and columns
        structure['cells'] = self._compute_cells(structure['rows'], structure['columns'])
        
        return structure
    
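    def _compute_cells(self, rows: List[Dict], columns: List[Dict]) -> List[Dict]:
        """Compute cells as intersections of row and column boxes."""
        # Detections arrive in arbitrary order; sort them so that the indices
        # follow reading order (rows top to bottom, columns left to right).
        rows = sorted(rows, key=lambda r: r['box'][1])
        columns = sorted(columns, key=lambda c: c['box'][0])
        cells = []
        
        for i, row in enumerate(rows):
            r_x1, r_y1, r_x2, r_y2 = row['box']
            
            for j, col in enumerate(columns):
                c_x1, c_y1, c_x2, c_y2 = col['box']
                
                # Intersection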
                cell_x1 = max(r_x1, c_x1)
                cell_y1 = max(r_y1, c_y1)
                cell_x2 = min(r_x2, c_x2)
                cell_y2 = min(r_y2, c_y2)
                
                if cell_x1 < cell_x2 and cell_y1 < cell_y2:
                    cells.append({
                        'row': i,
                        'column': j,
                        'box': [cell_x1, cell_y1, cell_x2, cell_y2]
                    })
        
        return cells
    
    def visualize_detection(self, image: Image.Image, tables: List[Dict]) -> Image.Image:
        """Draw the detected tables on a copy of the image."""
        vis_image = image.copy()
        draw = ImageDraw.Draw(vis_image)
        
        colors = ['red', 'blue', 'green', 'orange', 'purple']
        
        for i, table in enumerate(tables):
            color = colors[i % len(colors)]
            box = table['box']
            
            draw.rectangle(box, outline=color, width=3)
            
            label = f"Table {i+1}: {table['confidence']:.2f}"
            draw.text((box[0], box[1] - 20), label, fill=color)
        
        return vis_image

print("\n" + "="*60)
print("TableExtractor ready")
print("="*60)
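
`TableExtractor` returns cell geometry only. To get the table contents, the recognized words still have to be placed into those cells. One simple approach, sketched below, assigns each word to the row and column that contain the center of its box. `build_grid` is a hypothetical helper, the boxes are made up, and the words are assumed to arrive in reading order from some OCR step with coordinates relative to the cropped table.

```python
def build_grid(rows, columns, words):
    """Place OCR words into a 2D grid of cell texts.

    rows, columns: lists of boxes [x1, y1, x2, y2]
    words: list of (text, [x1, y1, x2, y2]) in reading order
    """
    rows = sorted(rows, key=lambda b: b[1])        # top to bottom
    columns = sorted(columns, key=lambda b: b[0])  # left to right
    grid = [["" for _ in columns] for _ in rows]
    for text, (x1, y1, x2, y2) in words:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2      # word center
        r = next((i for i, b in enumerate(rows) if b[1] <= cy < b[3]), None)
        c = next((j for j, b in enumerate(columns) if b[0] <= cx < b[2]), None)
        if r is not None and c is not None:        # ignore words outside the table
            grid[r][c] = (grid[r][c] + " " + text).strip()
    return grid

# Two rows and two columns, deliberately given out of order
rows = [[0, 35, 250, 70], [0, 0, 250, 35]]
columns = [[150, 0, 250, 70], [0, 0, 150, 70]]
words = [
    ("Produkt", [10, 10, 70, 25]), ("Leden", [160, 10, 200, 25]),
    ("AI", [10, 45, 25, 60]), ("Automatizace", [30, 45, 120, 60]),
    ("45", [160, 45, 175, 60]), ("000", [180, 45, 205, 60]),
]
grid = build_grid(rows, columns, words)
print(grid)
```

Using the word center rather than full containment tolerates boxes that slightly overlap a cell border. Spanning cells are not handled in this sketch.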

In [None]:
# Create a test document with a table
def create_table_document() -> Image.Image:
    """Create a document containing a table."""
    width, height = 700, 500
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 14)
        font_bold = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 16)
    except OSError:
        font = ImageFont.load_default()
        font_bold = font
    
    # Document header
    draw.text((50, 30), "PŘEHLED PRODEJŮ - Q1 2024", fill='black', font=font_bold)
    
    # Table
    table_x, table_y = 50, 80
    col_widths = [150, 100, 100, 100, 100]
    row_height = 35
    
    # Table data
    headers = ["Produkt", "Leden", "Únor", "Březen", "Celkem"]
    data = [
        ["AI Automatizace", "45 000", "52 000", "61 000", "158 000"],
        ["ML Konzultace", "32 000", "38 000", "41 000", "111 000"],
        ["Cloud Setup", "28 000", "31 000", "35 000", "94 000"],
        ["Školení", "15 000", "18 000", "22 000", "55 000"],
        ["Celkem", "120 000", "139 000", "159 000", "418 000"],
    ]
    
    # Draw the table
    current_y = table_y
    
    # Header row
    current_x = table_x
    for i, (header, width) in enumerate(zip(headers, col_widths)):
        # Cell
        draw.rectangle([current_x, current_y, current_x + width, current_y + row_height],
                      outline='black', fill='lightgray')
        # Text
        draw.text((current_x + 10, current_y + 10), header, fill='black', font=font_bold)
        current_x += width
    
    current_y += row_height
    
    # Data
    for row in data:
        current_x = table_x
        is_total = row[0] == "Celkem"
        
        for value, width in zip(row, col_widths):
            fill_color = 'lightyellow' if is_total else 'white'
            draw.rectangle([current_x, current_y, current_x + width, current_y + row_height],
                          outline='black', fill=fill_color)
            draw.text((current_x + 10, current_y + 10), value, fill='black', font=font)
            current_x += width
        
        current_y += row_height
    
    # Note below the table
    draw.text((50, current_y + 20), "* Všechny částky jsou v Kč", fill='gray', font=font)
    
    return image

table_document = create_table_document()
display(table_document)
print("📊 Test document with a table created")

In [None]:
# Test the TableExtractor
table_extractor = TableExtractor(detection_threshold=0.5)

# Detect tables
detected_tables = table_extractor.detect_tables(table_document)

print(f"\n📊 Detected {len(detected_tables)} table(s):")
print("-" * 50)

for i, table in enumerate(detected_tables):
    print(f"\nTable {i+1}:")
    print(f"  Confidence: {table['confidence']:.3f}")
    print(f"  Box: {[int(c) for c in table['box']]}")
    
    # Structure analysis
    if table['confidence'] > 0.5:
        structure = table_extractor.analyze_structure(table_document, table['box'])
        print(f"  Rows: {len(structure['rows'])}")
        print(f"  Columns: {len(structure['columns'])}")
        print(f"  Cells: {len(structure['cells'])}")

# Visualization
if detected_tables:
    vis_image = table_extractor.visualize_detection(table_document, detected_tables)
    display(vis_image)

---

## 🏭 Part 5: Production Document Pipeline

A complete pipeline for processing documents in a production environment.
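
Besides the neural components, a pipeline like this typically relies on regular expressions to pull well-formatted fields (dates, amounts, company IDs) out of the OCR text. The sketch below shows that step in isolation. The patterns are simplified assumptions for Czech-style invoices, not the exact ones used by the pipeline class, and `PATTERNS` and `extract_fields` are illustrative names. The patterns accept both `IČO` and the diacritic-free `ICO`, since OCR output often loses diacritics.

```python
import re

# Simplified, assumed patterns; each has exactly one capturing group
PATTERNS = {
    "date": re.compile(r"\b(\d{1,2}\.\d{1,2}\.\d{4})\b"),
    # Thousands may be separated by a space or a non-breaking space
    "amount": re.compile(
        r"(\d{1,3}(?:[ \u00a0]\d{3})*(?:[.,]\d{2})?)\s*(?:Kč|Kc|CZK)",
        re.IGNORECASE,
    ),
    "ico": re.compile(r"\bI[ČC]O?\s*[:.]?\s*(\d{8})\b", re.IGNORECASE),
}

def extract_fields(text: str) -> dict:
    """Return the first match of every pattern found in the text."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields

ocr_text = (
    "FAKTURA c. 2024-001234\n"
    "ICO: 12345678\n"
    "Datum vystaveni: 15.01.2024\n"
    "Celkova castka: 12 500 Kc"
)
fields = extract_fields(ocr_text)
print(fields)
```

Regex extraction is brittle against OCR noise, so in practice it tends to be combined with the layout model's key-value entities rather than used alone.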

In [None]:
from enum import Enum
from datetime import datetime
import hashlib
import time

class DocumentType(Enum):
    INVOICE = "invoice"
    RECEIPT = "receipt"
    FORM = "form"
    REPORT = "report"
    CONTRACT = "contract"
    ID_DOCUMENT = "id_document"
    UNKNOWN = "unknown"

@dataclass
class ProcessedDocument:
    """Result of processing a document."""
    document_id: str
    document_type: DocumentType
    extracted_text: str
    structured_data: Dict[str, Any]
    tables: List[Dict]
    entities: List[Dict]
    confidence: float
    processing_time: float
    metadata: Dict[str, Any] = field(default_factory=dict)
    
    def to_dict(self) -> Dict:
        return {
            'document_id': self.document_id,
            'document_type': self.document_type.value,
            'extracted_text': self.extracted_text,
            'structured_data': self.structured_data,
            'tables': self.tables,
            'entities': self.entities,
            'confidence': self.confidence,
            'processing_time': self.processing_time,
            'metadata': self.metadata
        }

class ProductionDocumentPipeline:
    """
    Production pipeline for document processing.
    
    Combines OCR, layout analysis, data extraction, and table extraction
    behind a single unified interface.
    """
    
    def __init__(self, 
                 enable_ocr: bool = True,
                 enable_layout: bool = True,
                 enable_table_extraction: bool = True,
                 cache_size: int = 100):
        """
        Args:
            enable_ocr: Enable the TrOCR engine
            enable_layout: Enable LayoutLM analysis
            enable_table_extraction: Enable table extraction
            cache_size: Size of the result cache
        """
        print("🏭 Initializing Document Pipeline...")
        
        self.components = {}
        
        if enable_ocr:
            self.components['ocr'] = TrOCREngine()
        
        if enable_layout:
            self.components['layout'] = DocumentLayoutAnalyzer()
        
        if enable_table_extraction:
            self.components['tables'] = TableExtractor()
        
        # Cache and statistics
        self.cache = {}
        self.cache_size = cache_size
        self.stats = {
            'documents_processed': 0,
            'total_processing_time': 0,
            'cache_hits': 0,
            'errors': 0,
            'by_type': defaultdict(int)
        }
        
        # Regex patterns for entities (IČO/DIČ also match without diacritics,
        # because OCR output and the test data above use plain ASCII)
        self.patterns = {
            'invoice_number': r'(?:fa[ck]tura|invoice)\s*(?:č\.?|c\.?|no\.?|#)?\s*[:.]?\s*([\w\-/]+)',
            'date': r'(\d{1,2}[./]\d{1,2}[./]\d{2,4})',
            'amount': r'(\d{1,3}(?:\s?\d{3})*(?:[,.]\d{2})?)\s*(?:Kč|CZK|EUR|USD|Kc)',
            'ico': r'I[ČC]O?\s*[:.]?\s*(\d{8})',
            'dic': r'DI[ČC]\s*[:.]?\s*([A-Z]{2}\d{8,10})',
            'email': r'[\w.+-]+@[\w-]+\.[\w.-]+',
            'phone': r'(?:\+420\s?)?\d{3}\s?\d{3}\s?\d{3}',
        }
        
        print("✅ Pipeline initialized")
        print(f"   Components: {list(self.components.keys())}")
    
    def process(self, image: Image.Image, 
                document_hint: Optional[DocumentType] = None,
                use_cache: bool = True) -> ProcessedDocument:
        """
        Run a document through the complete pipeline.
        
        Args:
            image: Document image
            document_hint: Optional document type; when given, automatic classification is skipped
            use_cache: Whether to read from and write to the result cache
        
        Returns:
            ProcessedDocument with all extracted data
        """
        start_time = time.time()
        
        # Generate a document ID from a hash of the PNG-encoded image
        image_bytes = io.BytesIO()
        image.save(image_bytes, format='PNG')
        doc_id = hashlib.md5(image_bytes.getvalue()).hexdigest()[:12]
        
        # Check the cache
        if use_cache and doc_id in self.cache:
            self.stats['cache_hits'] += 1
            return self.cache[doc_id]
        
        try:
            # Convert to RGB
            if image.mode != 'RGB':
                image = image.convert('RGB')
            
            # 1. OCR: text extraction
            extracted_text = ""
            if 'ocr' in self.components:
                extracted_text = self._extract_text_from_regions(image)
            
            # 2. Layout analysis
            layout_entities = []
            if 'layout' in self.components:
                layout_result = self.components['layout'].analyze_document(image)
                layout_entities = layout_result.get('entities', [])
            
            # 3. Table extraction (only detections above 0.5 confidence are kept)
            tables = []
            if 'tables' in self.components:
                detected = self.components['tables'].detect_tables(image)
                for table in detected:
                    if table['confidence'] > 0.5:
                        structure = self.components['tables'].analyze_structure(
                            image, table['box']
                        )
                        tables.append({
                            'box': table['box'],
                            'confidence': table['confidence'],
                            'structure': structure
                        })
            
            # 4. Structured data extraction with regex
            structured_data = self._extract_structured_data(extracted_text)
            
            # 5. Determine the document type
            doc_type = document_hint or self._classify_document(extracted_text, structured_data)
            
            # 6. Compute confidence
            confidence = self._compute_confidence(extracted_text, structured_data, tables)
            
            processing_time = time.time() - start_time
            
            # Assemble the result
            result = ProcessedDocument(
                document_id=doc_id,
                document_type=doc_type,
                extracted_text=extracted_text,
                structured_data=structured_data,
                tables=tables,
                entities=layout_entities,
                confidence=confidence,
                processing_time=processing_time,
                metadata={
                    'image_size': image.size,
                    'processed_at': datetime.now().isoformat(),
                    'components_used': list(self.components.keys())
                }
            )
            
            # Update statistics
            self.stats['documents_processed'] += 1
            self.stats['total_processing_time'] += processing_time
            self.stats['by_type'][doc_type.value] += 1
            
            # Cache
            if use_cache:
                self._add_to_cache(doc_id, result)
            
            return result
            
        except Exception as e:
            self.stats['errors'] += 1
            # Chain the original exception so its traceback is preserved
            raise RuntimeError(f"Error while processing document: {e}") from e
    
    def _extract_text_from_regions(self, image: Image.Image) -> str:
        """Extract text from the whole image by splitting it into horizontal strips."""
        # Simplification: fixed-height strips stand in for real text-line detection,
        # which a production system would use to find line boundaries.
        width, height = image.size
        
        texts = []
        strip_height = 50
        
        # Step over the full height so the last (possibly shorter) strip is not skipped
        for y in range(0, height, strip_height):
            strip = image.crop((0, y, width, min(y + strip_height, height)))
            text = self.components['ocr'].recognize_text(strip)
            if text.strip():
                texts.append(text)
        
        return '\n'.join(texts)
    
    def _extract_structured_data(self, text: str) -> Dict[str, Any]:
        """Extract structured data using the regex patterns."""
        data = {}
        
        # A single match is stored as a scalar, multiple matches as a list
        for name, pattern in self.patterns.items():
            matches = re.findall(pattern, text, re.IGNORECASE)
            if matches:
                if len(matches) == 1:
                    data[name] = matches[0]
                else:
                    data[name] = matches
        
        return data
    
    def _classify_document(self, text: str, structured_data: Dict) -> DocumentType:
        """Classify the document type based on its content."""
        text_lower = text.lower()
        
        # Heuristic keyword classification; the first matching branch wins
        if any(kw in text_lower for kw in ['faktura', 'invoice', 'daňový doklad']):
            return DocumentType.INVOICE
        elif any(kw in text_lower for kw in ['účtenka', 'receipt', 'pokladní']):
            return DocumentType.RECEIPT
        elif any(kw in text_lower for kw in ['smlouva', 'contract', 'agreement']):
            return DocumentType.CONTRACT
        elif any(kw in text_lower for kw in ['formulář', 'form', 'žádost', 'přihláška']):
            return DocumentType.FORM
        elif any(kw in text_lower for kw in ['zpráva', 'report', 'přehled']):
            return DocumentType.REPORT
        elif any(kw in text_lower for kw in ['občanský průkaz', 'passport', 'řidičský']):
            return DocumentType.ID_DOCUMENT
        
        return DocumentType.UNKNOWN
    
    def _compute_confidence(self, text: str, 
                           structured_data: Dict, 
                           tables: List) -> float:
        """Compute an overall confidence score as the mean of the available signals."""
        scores = []
        
        # Text quality
        if text:
            # Penalize very short text: the score grows linearly up to 100 characters
            text_score = min(1.0, len(text) / 100)
            scores.append(text_score)
        
        # Structured data extraction: saturates at 5 extracted fields
        if structured_data:
            data_score = min(1.0, len(structured_data) / 5)
            scores.append(data_score)
        
        # Table detection confidence
        if tables:
            table_scores = [t['confidence'] for t in tables]
            scores.extend(table_scores)
        
        # Cast so the declared float return type holds (np.mean returns np.float64)
        return float(np.mean(scores)) if scores else 0.0
    
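    def _add_to_cache(self, doc_id: str, result: ProcessedDocument):
        """Add a result to the cache, evicting the oldest entry when full (FIFO)."""
        if len(self.cache) >= self.cache_size:
            # Dicts preserve insertion order, so the first key is the oldest insert.
            # Cache hits do not refresh an entry, so this is FIFO rather than true LRU.
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]
        
        self.cache[doc_id] = result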
    
    def get_stats(self) -> Dict:
        """Return pipeline statistics."""
        stats = dict(self.stats)
        stats['by_type'] = dict(stats['by_type'])
        
        if stats['documents_processed'] > 0:
            stats['avg_processing_time'] = (
                stats['total_processing_time'] / stats['documents_processed']
            )
            stats['cache_hit_rate'] = (
                stats['cache_hits'] / 
                (stats['documents_processed'] + stats['cache_hits'])
            )
        
        return stats

print("\n" + "="*60)
print("ProductionDocumentPipeline ready")
print("="*60)
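
The `_add_to_cache` method above drops the entry that was inserted first and never reorders entries on a cache hit. If true least-recently-used behavior is wanted, one option is a small wrapper around `collections.OrderedDict`. The sketch below is illustrative only: the `LRUCache` class and its `get`/`put` methods are hypothetical names, not part of the notebook.

```python
from collections import OrderedDict
from typing import Any, Optional


class LRUCache:
    """Small least-recently-used cache (hypothetical helper, not in the notebook)."""

    def __init__(self, max_size: int = 100):
        self.max_size = max_size
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        # A hit makes the entry the most recently used one.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            # Drop the least recently used entry (front of the ordering).
            self._data.popitem(last=False)

    def __contains__(self, key: str) -> bool:
        return key in self._data

    def __len__(self) -> int:
        return len(self._data)
```

With such a helper, `process` would call `get` instead of indexing the dict directly, so frequently requested documents stay cached longer than one-off documents.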

In [None]:
# Initialize the production pipeline
# All three components are enabled here; pass False to skip loading a model
pipeline = ProductionDocumentPipeline(
    enable_ocr=True,
    enable_layout=True,
    enable_table_extraction=True,
    cache_size=50
)

In [None]:
# Create a more complex test document: an invoice
def create_invoice_image() -> Image.Image:
    """Create a realistic invoice image."""
    width, height = 600, 800
    image = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(image)
    
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 12)
        font_bold = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 14)
        font_title = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)
    except OSError:  # font files not found; fall back to PIL's built-in font
        font = ImageFont.load_default()
        font_bold = font
        font_title = font
    
    # Logo/header
    draw.text((50, 30), "PRAUT s.r.o.", fill='darkblue', font=font_title)
    draw.text((50, 55), "AI Automatizace a Integrace", fill='gray', font=font)
    
    # Document type
    draw.text((400, 30), "FAKTURA", fill='black', font=font_title)
    draw.text((400, 55), "Danovy doklad", fill='gray', font=font)
    
    # Invoice number
    draw.text((400, 85), "Cislo: FA-2024-00123", fill='black', font=font_bold)
    
    # Horizontal rule
    draw.line([(50, 110), (550, 110)], fill='lightgray', width=2)
    
    # Supplier
    y = 130
    draw.text((50, y), "Dodavatel:", fill='gray', font=font)
    draw.text((50, y+18), "Praut s.r.o.", fill='black', font=font_bold)
    draw.text((50, y+36), "Chebska 123", fill='black', font=font)
    draw.text((50, y+54), "350 02 Cheb", fill='black', font=font)
    draw.text((50, y+72), "ICO: 12345678", fill='black', font=font)
    draw.text((50, y+90), "DIC: CZ12345678", fill='black', font=font)
    
    # Customer
    draw.text((320, y), "Odberatel:", fill='gray', font=font)
    draw.text((320, y+18), "ABC Company s.r.o.", fill='black', font=font_bold)
    draw.text((320, y+36), "Prazska 456", fill='black', font=font)
    draw.text((320, y+54), "110 00 Praha 1", fill='black', font=font)
    draw.text((320, y+72), "ICO: 87654321", fill='black', font=font)
    draw.text((320, y+90), "DIC: CZ87654321", fill='black', font=font)
    
    # Dates
    y = 260
    draw.text((50, y), "Datum vystaveni: 15.01.2024", fill='black', font=font)
    draw.text((250, y), "Datum splatnosti: 29.01.2024", fill='black', font=font)
    draw.text((450, y), "DUZP: 15.01.2024", fill='black', font=font)
    
    # Line-item table
    y = 300
    draw.line([(50, y), (550, y)], fill='black', width=1)
    
    # Table header
    headers = ["Popis", "Pocet", "Cena/ks", "Celkem"]
    x_positions = [50, 280, 360, 460]
    
    for header, x in zip(headers, x_positions):
        draw.text((x, y+5), header, fill='black', font=font_bold)
    
    y += 25
    draw.line([(50, y), (550, y)], fill='black', width=1)
    
    # Line items
    items = [
        ("AI Konzultace - analyza", "8", "2 500 Kc", "20 000 Kc"),
        ("Vyvoj automatizace", "24", "1 800 Kc", "43 200 Kc"),
        ("Integrace API", "12", "2 000 Kc", "24 000 Kc"),
        ("Skoleni zamestnancu", "4", "3 500 Kc", "14 000 Kc"),
    ]
    
    for item in items:
        y += 5
        for text, x in zip(item, x_positions):
            draw.text((x, y), text, fill='black', font=font)
        y += 20
    
    # Rule below the line items
    y += 10
    draw.line([(50, y), (550, y)], fill='black', width=1)
    
    # Totals
    y += 15
    draw.text((350, y), "Zaklad dane:", fill='black', font=font)
    draw.text((460, y), "101 200 Kc", fill='black', font=font_bold)
    
    y += 20
    draw.text((350, y), "DPH 21%:", fill='black', font=font)
    draw.text((460, y), "21 252 Kc", fill='black', font=font_bold)
    
    y += 25
    draw.line([(350, y), (550, y)], fill='black', width=2)
    y += 5
    draw.text((350, y), "CELKEM K UHRADE:", fill='black', font=font_bold)
    draw.text((460, y), "122 452 Kc", fill='darkblue', font=font_title)
    
    # Payment details
    y = 580
    draw.line([(50, y), (550, y)], fill='lightgray', width=1)
    y += 15
    draw.text((50, y), "Platebni udaje:", fill='gray', font=font)
    y += 20
    draw.text((50, y), "Banka: Ceska sporitelna", fill='black', font=font)
    y += 18
    draw.text((50, y), "Cislo uctu: 123456789/0800", fill='black', font=font)
    y += 18
    draw.text((50, y), "Variabilni symbol: 202400123", fill='black', font=font)
    y += 18
    draw.text((50, y), "Konstantni symbol: 0308", fill='black', font=font)
    
    # QR code placeholder
    draw.rectangle([450, 600, 530, 680], outline='black', fill='lightgray')
    draw.text((465, 635), "QR PAY", fill='black', font=font)
    
    # Footer
    draw.text((50, 750), "Kontakt: info@praut.cz | +420 123 456 789 | www.praut.cz", 
              fill='gray', font=font)
    
    return image

invoice_image = create_invoice_image()
display(invoice_image)
print("📄 Test invoice created")
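
The pipeline runs OCR on fixed-height horizontal strips of the page. The sketch below shows strip bounds that tile the full image height, including a final partial strip, so that the footer drawn at y=750 on this 800 px invoice falls inside a strip. The helper name `strip_bounds` is an assumption made for illustration.

```python
from typing import List, Tuple


def strip_bounds(height: int, strip_height: int = 50) -> List[Tuple[int, int]]:
    """Return (top, bottom) pairs that tile the full image height."""
    if height <= 0 or strip_height <= 0:
        return []
    return [
        (top, min(top + strip_height, height))
        for top in range(0, height, strip_height)
    ]


bounds = strip_bounds(800, 50)
print(len(bounds), bounds[-1])  # 16 (750, 800)
```

Fixed strips can still cut a text line in half; detecting line boundaries first would avoid that, at the cost of an extra detection step.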

In [None]:
# Process the invoice with the pipeline
result = pipeline.process(invoice_image)

print("\n" + "="*60)
print("DOCUMENT PROCESSING RESULTS")
print("="*60)

print(f"\n📋 Basic information:")
print(f"   Document ID: {result.document_id}")
print(f"   Document type: {result.document_type.value}")
print(f"   Confidence: {result.confidence:.2%}")
print(f"   Processing time: {result.processing_time:.2f}s")

print(f"\n📝 Extracted text (excerpt):")
excerpt = result.extracted_text[:200] + ("..." if len(result.extracted_text) > 200 else "")
print(f"   {excerpt}")

print(f"\n🔍 Structured data:")
for key, value in result.structured_data.items():
    print(f"   {key}: {value}")

print(f"\n📊 Detected tables: {len(result.tables)}")
for i, table in enumerate(result.tables):
    print(f"   Table {i+1}: confidence={table['confidence']:.2f}")

print(f"\n📈 Pipeline statistics:")
stats = pipeline.get_stats()
print(f"   Documents processed: {stats['documents_processed']}")
print(f"   Average time: {stats.get('avg_processing_time', 0):.2f}s")
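
The structured data comes from regex matching over the OCR text. The test invoice prints its labels without diacritics (`ICO`, `DIC`, `Kc`), and OCR output often loses accents as well, so it can help if the patterns accept both spellings. The following is a minimal, standalone sketch; `ASCII_TOLERANT_PATTERNS` and `extract_fields` are hypothetical names used only here.

```python
import re
from typing import Dict, List

# Hypothetical pattern set: character classes accept accented and plain spellings.
ASCII_TOLERANT_PATTERNS = {
    "ico": r"\bI[ČC]O?\s*[:.]?\s*(\d{8})\b",
    "dic": r"\bDI[ČC]\s*[:.]?\s*([A-Z]{2}\d{8,10})\b",
    "amount": r"(\d{1,3}(?:\s?\d{3})*(?:[,.]\d{2})?)\s*(?:Kč|Kc|CZK|EUR|USD)\b",
}


def extract_fields(text: str) -> Dict[str, List[str]]:
    """Return every match for each pattern, keyed by field name."""
    found = {}
    for name, pattern in ASCII_TOLERANT_PATTERNS.items():
        matches = re.findall(pattern, text, flags=re.IGNORECASE)
        if matches:
            found[name] = matches
    return found


sample = "ICO: 12345678\nDIČ: CZ12345678\nCELKEM K UHRADE: 122 452 Kc"
print(extract_fields(sample))
# {'ico': ['12345678'], 'dic': ['CZ12345678'], 'amount': ['122 452']}
```

The leading `\b` keeps the `ico` pattern from matching inside `DIČ`, and always returning a list avoids the scalar-or-list ambiguity for callers.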

In [None]:
# Batch processing of multiple documents
print("\n" + "="*60)
print("BATCH PROCESSING")
print("="*60)

# Assemble documents of different types
documents = [
    ("Invoice", invoice_image),
    ("Receipt", receipt_image),
    ("Form", form_image),
    ("Table", table_document),
]

results = []
for name, image in documents:
    print(f"\n📄 Processing: {name}")
    result = pipeline.process(image)
    results.append((name, result))
    print(f"   Type: {result.document_type.value}")
    print(f"   Confidence: {result.confidence:.2%}")
    print(f"   Time: {result.processing_time:.2f}s")

# Final statistics
print("\n" + "-"*60)
print("FINAL STATISTICS")
print("-"*60)
stats = pipeline.get_stats()
print(f"Total processed: {stats['documents_processed']} documents")
print(f"Average processing time: {stats.get('avg_processing_time', 0):.2f}s")
print(f"Cache hit rate: {stats.get('cache_hit_rate', 0):.1%}")
print(f"Errors: {stats['errors']}")
print(f"\nBreakdown by type:")
for doc_type, count in stats['by_type'].items():
    print(f"   {doc_type}: {count}")
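
`get_stats` derives the cache hit rate as hits divided by all completed `process` calls (fresh results plus hits). As a worked example, assume the invoice was already processed once in the earlier cell, nothing else ran on this pipeline, and no errors occurred: the batch then adds three fresh documents and one cache hit. The function name `cache_hit_rate` is illustrative.

```python
def cache_hit_rate(documents_processed: int, cache_hits: int) -> float:
    """Share of completed process() calls that were answered from the cache."""
    total_calls = documents_processed + cache_hits
    return cache_hits / total_calls if total_calls else 0.0


# Assumed scenario: 1 invoice processed earlier + 3 new documents, 1 repeated invoice.
print(f"{cache_hit_rate(documents_processed=4, cache_hits=1):.1%}")  # 20.0%
```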

---

## 🎯 Summary

In this notebook we implemented:

### Document AI Components

| Component | Model | Use |
|------------|-------|--------|
| **TrOCR** | microsoft/trocr-base-printed | Printed-text recognition |
| **LayoutLM** | microsoft/layoutlmv3-base | Document structure analysis |
| **Donut** | naver-clova-ix/donut-base | End-to-end data extraction |
| **Table Transformer** | microsoft/table-transformer | Table detection and extraction |

### Production Pipeline

- **Modular architecture** - components can be switched on and off
- **Caching** - bounded result cache with oldest-entry eviction for repeated documents
- **Structured extraction** - regex patterns for Czech documents
- **Automatic classification** - keyword-based detection of the document type
- **Batch processing** - sequential processing of multiple documents with shared cache and statistics
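
The keyword classifier uses substring checks, so a short keyword such as `form` also matches inside words like "information". One way to tighten it is whole-word matching with regex word boundaries. The sketch below is a standalone illustration: `DocKind` is a hypothetical stand-in for the notebook's `DocumentType`, and the keyword lists (including the unaccented variants) are assumptions.

```python
import re
from enum import Enum
from typing import Dict, Tuple


class DocKind(Enum):  # hypothetical stand-in for the notebook's DocumentType
    INVOICE = "invoice"
    RECEIPT = "receipt"
    FORM = "form"
    UNKNOWN = "unknown"


# Assumed keyword lists; unaccented variants cover OCR output that loses diacritics.
KEYWORDS: Dict[DocKind, Tuple[str, ...]] = {
    DocKind.INVOICE: ("faktura", "invoice", "daňový doklad", "danovy doklad"),
    DocKind.RECEIPT: ("účtenka", "uctenka", "receipt"),
    DocKind.FORM: ("formulář", "formular", "form", "žádost"),
}


def classify(text: str) -> DocKind:
    """Return the first kind whose keyword occurs as a whole word or phrase."""
    for kind, words in KEYWORDS.items():
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return kind
    return DocKind.UNKNOWN


print(classify("FAKTURA Danovy doklad"))     # DocKind.INVOICE
print(classify("Contact information only"))  # DocKind.UNKNOWN
```

Whole-word matching trades recall for precision: inflected Czech forms such as "faktury" no longer match unless they are listed explicitly.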

### Next steps

1. **Fine-tuning** on Czech documents
2. **PDF integration** using pdf2image
3. **REST API** for production deployment
4. **Data validation** using schemas (Pydantic)
5. **Archiving** to a database with full-text search
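
To give a feel for the validation step, here is a minimal sketch that turns raw regex captures into typed, checked fields. It uses the standard library's `dataclasses` rather than Pydantic so that it runs without extra dependencies; the `InvoiceFields` schema, its field names, and `parse_invoice_fields` are all hypothetical, and the amount parsing assumes Czech formatting (space as thousands separator, comma as decimal separator).

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceFields:  # hypothetical schema, not defined in the notebook
    ico: str
    issue_date: date
    total: Decimal

    def __post_init__(self):
        if not re.fullmatch(r"\d{8}", self.ico):
            raise ValueError(f"ICO must be 8 digits, got {self.ico!r}")
        if self.total < 0:
            raise ValueError("total must not be negative")


def parse_invoice_fields(raw: dict) -> InvoiceFields:
    """Convert raw regex captures (strings) into typed, validated fields."""
    # Assumes Czech number formatting: "1 234,50" means 1234.50
    amount = raw["amount"].replace("\u00a0", "").replace(" ", "").replace(",", ".")
    return InvoiceFields(
        ico=raw["ico"],
        issue_date=datetime.strptime(raw["date"], "%d.%m.%Y").date(),
        total=Decimal(amount),
    )


fields = parse_invoice_fields(
    {"ico": "12345678", "date": "15.01.2024", "amount": "122 452"}
)
print(fields.total, fields.issue_date.isoformat())  # 122452 2024-01-15
```

A Pydantic model would express the same checks declaratively and add serialization, which is likely the better fit once the pipeline sits behind a REST API.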

In [None]:
print("\n" + "="*60)
print("🎉 Notebook 19: Document AI complete!")
print("="*60)
print("\n📚 You have learned:")
print("   ✅ TrOCR for text recognition")
print("   ✅ LayoutLM for structure analysis")
print("   ✅ Donut for end-to-end extraction")
print("   ✅ Table Transformer for tables")
print("   ✅ Production Document Pipeline")
print("\n🚀 Next notebook: Audio AI - TTS, Voice Processing")