# 🤗 Introduction to Hugging Face Transformers

**Author:** Praut s.r.o. - AI Integration & Business Automation

This notebook walks you through the basics of the Hugging Face Transformers library, one of the most widely used libraries for working with pretrained AI models.

## What you will learn:
- Installing and configuring Hugging Face
- Basic pipelines for various tasks
- Working with models and tokenizers
- Practical automation examples

In [None]:
# Install the required libraries (pandas is used in section 4)
!pip install -q transformers accelerate torch sentencepiece pandas

In [None]:
# Import the core libraries
from transformers import pipeline, AutoModel, AutoTokenizer
import torch

# Check whether a GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"🖥️ Using device: {device}")
if device == "cuda":
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
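The pipeline cells below repeat the expression `0 if device=="cuda" else -1`: the `device` argument of `pipeline` takes a GPU index (`0` is the first GPU) or `-1` for the CPU. A minimal sketch of a helper that keeps this mapping in one place; the name `pipeline_device` is hypothetical and not part of the Transformers API.

```python
def pipeline_device(device: str, gpu_index: int = 0) -> int:
    """Map "cuda"/"cpu" to the integer form accepted by pipeline(device=...).

    Hypothetical helper: returns `gpu_index` for "cuda" and -1 (CPU) for anything else.
    """
    return gpu_index if device == "cuda" else -1


# Usage with the `device` variable from the cell above, e.g.:
# pipeline("sentiment-analysis", device=pipeline_device(device))
print(pipeline_device("cuda"))  # 0
print(pipeline_device("cpu"))   # -1
```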

## 1. Pipeline - the simplest way to use models

A pipeline is a high-level API that automatically:
- Downloads the model and tokenizer
- Preprocesses the input (tokenization)
- Runs inference
- Post-processes the output into readable results
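For a classification model, the last step (post-processing) turns the model's raw scores (logits) into the `label`/`score` dictionaries the pipeline returns. A simplified pure-Python sketch, assuming a two-class model whose `id2label` mapping is `{0: "NEGATIVE", 1: "POSITIVE"}`; the logit values are made up for illustration. It applies softmax, takes the most probable class, and looks up its name.

```python
import math


def postprocess(logits, id2label):
    """Turn raw logits into a pipeline-style {"label", "score"} dict (simplified sketch)."""
    # Softmax: subtract the max logit for numerical stability, exponentiate, normalize.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the most probable class and map its index to a human-readable label.
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed two-class mapping
print(postprocess([-1.0, 2.0], id2label))
```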

In [None]:
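# Example 1: sentiment analysis
# The model is pinned explicitly (it is the pipeline's default for this task). It was
# fine-tuned on English movie-review sentences (SST-2) and knows only POSITIVE and NEGATIVE.
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0 if device=="cuda" else -1,
)

texty = [
    "This product is absolutely great, I'm very satisfied!",
    "Terrible quality, I'll never buy it again.",
    "It's average, nothing special."
]

vysledky = sentiment_analyzer(texty)
for text, vysledek in zip(texty, vysledky):
    print(f"📝 {text}")
    print(f"   → {vysledek['label']} (score: {vysledek['score']:.2%})\n")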
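The default sentiment model only knows `POSITIVE` and `NEGATIVE`, so a neutral sentence such as the third text in the list is still forced into one of the two classes. A simple workaround, sketched below as a hypothetical helper, is to treat predictions whose score falls below a chosen threshold as uncertain. The `0.9` threshold and the `UNCERTAIN` label are illustrative assumptions; neither the model nor the pipeline provides them.

```python
def label_with_threshold(result, threshold=0.9, fallback="UNCERTAIN"):
    """Return the pipeline's label, or `fallback` if its score is below `threshold`.

    Hypothetical helper; `threshold` and `fallback` are illustrative choices.
    """
    return result["label"] if result["score"] >= threshold else fallback


# Example with pipeline-style dicts (scores are made up for illustration):
print(label_with_threshold({"label": "POSITIVE", "score": 0.99}))  # POSITIVE
print(label_with_threshold({"label": "NEGATIVE", "score": 0.62}))  # UNCERTAIN
```

Applied to the cell above, this would be `[label_with_threshold(v) for v in vysledky]`.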

In [None]:
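# Example 2: text generation
from transformers import set_seed

# GPT-2 was trained on English text, so the prompt is in English
generator = pipeline("text-generation", model="gpt2", device=0 if device=="cuda" else -1)
set_seed(42)  # make the sampled output reproducible between runs

prompt = "Artificial intelligence in business brings"
# max_new_tokens limits only the generated continuation (max_length would also count the prompt)
vysledek = generator(prompt, max_new_tokens=40, num_return_sequences=1, do_sample=True)

print(f"📝 Prompt: {prompt}")
print(f"🤖 Generated: {vysledek[0]['generated_text']}")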
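With `do_sample=True`, the model does not always pick the most probable next token; it samples the next token from the probability distribution given by the softmax of its logits, so each run can produce different text. A simplified pure-Python sketch of one decoding step, assuming a toy list of logits over a three-token vocabulary (the real `generate` method supports further options such as `top_k` and `top_p`). `temperature` rescales the logits before the softmax; lower values make sampling behave more like greedy decoding.

```python
import math
import random


def next_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Choose the index of the next token from a list of logits (simplified sketch)."""
    if not do_sample:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    # Temperature-scaled softmax weights (max subtracted for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    # Sample one index with probability proportional to its weight.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 3.0, 0.5]  # toy scores for a 3-token vocabulary
print(next_token(logits))                                        # greedy: 1
print(next_token(logits, do_sample=True, rng=random.Random(0)))  # sampled: depends on the seed
```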

In [None]:
# Example 3: text summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn", device=0 if device=="cuda" else -1)

dlouhy_text = """
Artificial intelligence (AI) is transforming businesses across all industries. 
Companies are using AI to automate repetitive tasks, analyze large datasets, 
and make better decisions. Machine learning models can predict customer behavior, 
optimize supply chains, and detect fraud. Natural language processing enables 
chatbots and virtual assistants to handle customer inquiries 24/7. Computer vision 
is used in quality control, security systems, and autonomous vehicles. The adoption 
of AI is accelerating, with more organizations investing in AI capabilities to 
stay competitive in the digital economy.
"""

# max_length / min_length are measured in tokens, not characters
souhrn = summarizer(dlouhy_text, max_length=60, min_length=20, do_sample=False)
print(f"📄 Original text ({len(dlouhy_text)} characters)")
print(f"📋 Summary: {souhrn[0]['summary_text']}")
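`facebook/bart-large-cnn` can only read a limited number of input tokens (its position embeddings cover 1,024 positions), so much longer documents are usually split into chunks that are summarized separately. A minimal sketch of such splitting, assuming word count as a rough stand-in for token count; the helper name `chunk_words` and the 400-word default are hypothetical choices, and a more robust version would count tokens with the model's own tokenizer.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most `max_words` whitespace-separated words.

    Hypothetical helper: word count is only a rough proxy for the model's token count.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


chunks = chunk_words("one two three four five", max_words=2)
print(chunks)  # ['one two', 'three four', 'five']
```

Each chunk could then be summarized on its own, e.g. `[summarizer(c, max_length=60, do_sample=False)[0]["summary_text"] for c in chunk_words(dlouhy_text)]`.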

## 2. Working with tokenizers

A tokenizer splits text into tokens and converts each token to a numeric ID that the model can process.
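To make the idea concrete, here is a toy sketch of the two steps (split the text into tokens, then look each token up in a vocabulary), assuming a hypothetical ten-entry vocabulary and a simple word/punctuation split. Like BERT, it wraps the sequence in the special tokens `[CLS]` and `[SEP]` and maps unknown words to `[UNK]`; real tokenizers such as `bert-base-uncased` use a vocabulary of about 30,000 subword units instead.

```python
import re

# Hypothetical toy vocabulary: token -> ID (real vocabularies are far larger).
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "hello": 4,
         ",": 5, "how": 6, "are": 7, "you": 8, "?": 9}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}


def toy_tokenize(text):
    # Lowercase (like an "uncased" model) and split into words and punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(text):
    # Add the special tokens, then replace each token with its ID ([UNK] if unknown).
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]
    return [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]


def toy_decode(ids):
    # Map IDs back to tokens and join them with spaces.
    return " ".join(ID_TO_TOKEN[i] for i in ids)


ids = toy_encode("Hello, how are you?")
print(ids)              # [2, 4, 5, 6, 7, 8, 9, 3]
print(toy_decode(ids))  # [CLS] hello , how are you ? [SEP]
```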

In [None]:
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Hello, how are you doing today?"

# Tokenization: split the text into (sub)word tokens
tokens = tokenizer.tokenize(text)
print(f"📝 Text: {text}")
print(f"🔤 Tokens: {tokens}")

# Convert to IDs (encode also adds the special tokens [CLS] and [SEP])
ids = tokenizer.encode(text)
print(f"🔢 Token IDs: {ids}")

# Back to text
decoded = tokenizer.decode(ids)
print(f"📖 Decoded: {decoded}")
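`bert-base-uncased` uses WordPiece: a word that is not in the vocabulary as a whole is split into the longest matching pieces from left to right, and pieces that continue a word carry the `##` prefix. That is why `tokenize` can return subwords; separately, the `[CLS]` and `[SEP]` tokens added by `encode` reappear after `decode` unless you pass `skip_special_tokens=True`. A simplified sketch of the greedy longest-match-first split for a single word, assuming a hypothetical three-piece toy vocabulary:

```python
def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Split one word into WordPiece tokens by greedy longest-match-first (simplified sketch)."""
    if len(word) > max_chars:
        return [unk]
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it from the right.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation pieces are marked with "##"
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word becomes [UNK]
        tokens.append(piece)
        start = end
    return tokens


toy_vocab = {"un", "##aff", "##able"}  # hypothetical toy vocabulary
print(wordpiece("unaffable", toy_vocab))  # ['un', '##aff', '##able']
```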

In [None]:
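# Tokenization with the options used when feeding a model
encoded = tokenizer(
    text,
    padding=True,        # pad to the longest sequence in the batch (no effect for a single text)
    truncation=True,     # cut sequences longer than max_length
    max_length=512,
    return_tensors="pt"  # return PyTorch tensors
)

print("📦 Tokenizer output:")
for key, value in encoded.items():
    print(f"   {key}: {value.shape}")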
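With a single sentence, `padding=True` has nothing to do; it matters when a batch of texts of different lengths is tokenized together, because a tensor needs rows of equal length. The tokenizer then pads the shorter sequences and returns an `attention_mask` with 1 for real tokens and 0 for padding, so the model does not attend to the padded positions. A pure-Python sketch of that step, assuming right-side padding and pad ID 0 (the `[PAD]` ID in the BERT uncased vocabulary); the input IDs below are illustrative.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad token ID lists to equal length and build the matching attention masks."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # fill with the padding ID
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 7592, 102], [101, 102]])
print(batch["input_ids"])       # [[101, 7592, 102], [101, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

With the real tokenizer, the equivalent is passing a list of texts, e.g. `tokenizer(["Hi!", text], padding=True, return_tensors="pt")`.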

## 3. Working with a model directly

In [None]:
from transformers import AutoModelForSequenceClassification

# Load a model for sequence classification
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Move the model to the GPU if one is available
model = model.to(device)

# Manual inference: tokenize, run the model, convert logits to probabilities
text = "I absolutely love this product!"
inputs = tokenizer(text, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)

# Read the label names from the model config instead of hard-coding them
for i, prob in enumerate(predictions[0]):
    print(f"{model.config.id2label[i]}: {prob.item():.2%}")

## 4. Practical automation: batch processing

In [None]:
import pandas as pd

# Simulated e-shop data
recenze = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "produkt": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"],
    "text": [
        "Great laptop, very fast and reliable!",
        "The mouse stopped working after a week.",
        "Perfect keyboard for programming.",
        "Average quality, nothing special.",
        "Best headphones I've ever owned!"
    ]
})

# Batch sentiment analysis of all reviews in a single call
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0 if device=="cuda" else -1,
)
results = classifier(recenze["text"].tolist())

recenze["sentiment"] = [r["label"] for r in results]
recenze["confidence"] = [r["score"] for r in results]

print("📊 Automated review analysis:")
print(recenze.to_string(index=False))

In [None]:
# Statistics
print("\n📈 Statistics:")
print(f"   Positive reviews: {(recenze['sentiment'] == 'POSITIVE').sum()}")
print(f"   Negative reviews: {(recenze['sentiment'] == 'NEGATIVE').sum()}")
print(f"   Average confidence: {recenze['confidence'].mean():.2%}")
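The same statistics can also be computed straight from the list of dictionaries a pipeline returns, without pandas. A minimal sketch; the helper name `summarize_predictions` and the sample dictionaries are illustrative, not real model output.

```python
from collections import Counter
from statistics import mean


def summarize_predictions(results):
    """Count predictions per label and average their scores (pipeline-style dicts)."""
    counts = Counter(r["label"] for r in results)
    avg_score = mean(r["score"] for r in results) if results else 0.0
    return dict(counts), avg_score


sample = [  # made-up results in the pipeline's output format
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.8},
    {"label": "POSITIVE", "score": 0.7},
]
counts, avg_score = summarize_predictions(sample)
print(counts)              # {'POSITIVE': 2, 'NEGATIVE': 1}
print(f"{avg_score:.2%}")  # 80.00%
```

Applied to the batch cell above, this would be `summarize_predictions(results)`.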

## 5. Selected tasks available in pipeline

| Task | Description |
|------|-------------|
| `text-classification` | Text classification (`sentiment-analysis` is an alias) |
| `token-classification` | Per-token labels: NER, POS tagging |
| `question-answering` | Question answering (extracts the answer from a given context) |
| `summarization` | Summarization |
| `translation` | Translation |
| `text-generation` | Text generation |
| `fill-mask` | Predicting a masked word |
| `zero-shot-classification` | Classification into labels chosen at inference time, without task-specific training |
| `image-classification` | Image classification |
| `object-detection` | Object detection |
| `automatic-speech-recognition` | Speech-to-text |

---
## 🏁 Summary

In this notebook, we learned to:
- ✅ Use `pipeline` for quick experiments
- ✅ Work with tokenizers
- ✅ Load and use models directly
- ✅ Automate data processing

**Next notebook:** Text classification and NER