# 📝 Summarization and Text Generation

**Author:** Praut s.r.o. - AI Integration & Business Automation

## What you will learn:
- Automatic document summarization
- Text generation with different models
- Creating reports and summaries
- Automating content creation

In [None]:
!pip install -q transformers accelerate torch sentencepiece

In [None]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
import torch

device = 0 if torch.cuda.is_available() else -1
print(f"🖥️ Device: {'GPU' if device == 0 else 'CPU'}")

## 1. Extractive vs. Abstractive Summarization

- **Extractive**: Selects key sentences from the original text
- **Abstractive**: Generates new text that summarizes the content
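
The notebook's models are all abstractive, so the extractive side is easy to overlook. The following is a minimal sketch of an extractive summarizer based on word frequencies, using only the standard library. The stop-word list, the naive sentence splitter, and the scoring rule are illustrative assumptions, not part of the notebook.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "for", "on", "with", "that", "it"}

def split_sentences(text):
    """Naive sentence splitter: breaks after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def content_words(text):
    """Lower-cased words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in chosen)

sample = (
    "AI helps companies analyze data. "
    "The weather was pleasant yesterday. "
    "Companies use AI to analyze customer data and cut costs."
)
print(extractive_summary(sample, n_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of extractive summarization. An abstractive model such as BART may instead produce wording that never appears in the source.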

In [None]:
# BART model for abstractive summarization
summarizer = pipeline("summarization", 
                      model="facebook/bart-large-cnn",
                      device=device)

clanek = """
Artificial intelligence is rapidly transforming the business landscape across industries. 
Companies are increasingly adopting AI technologies to streamline operations, enhance 
customer experiences, and gain competitive advantages. Machine learning algorithms can 
analyze vast amounts of data to identify patterns and make predictions that would be 
impossible for humans to detect manually.

In the manufacturing sector, AI-powered robots are improving production efficiency and 
reducing errors. Predictive maintenance systems can anticipate equipment failures before 
they occur, saving companies millions in downtime costs. Quality control processes are 
being automated using computer vision, ensuring consistent product standards.

The financial industry is leveraging AI for fraud detection, risk assessment, and 
algorithmic trading. Banks are using chatbots to handle routine customer inquiries, 
freeing up human agents for more complex issues. Personalized financial advice is 
being delivered through AI-driven robo-advisors.

Healthcare is another sector seeing significant AI adoption. Diagnostic systems can 
analyze medical images with accuracy matching or exceeding human specialists. Drug 
discovery processes are being accelerated through AI simulations. Patient monitoring 
systems can detect anomalies and alert medical staff in real-time.
"""

souhrn = summarizer(clanek, max_length=100, min_length=30, do_sample=False)

print(f"📄 Original text: {len(clanek)} characters")
print(f"📋 Summary: {len(souhrn[0]['summary_text'])} characters\n")
print(f"✨ {souhrn[0]['summary_text']}")

## 2. Different Summary Lengths

In [None]:
# Short summary (1-2 sentences); max_length and min_length are counted in tokens
kratky = summarizer(clanek, max_length=50, min_length=20)
print("📌 SHORT summary:")
print(f"   {kratky[0]['summary_text']}\n")

# Medium summary
stredni = summarizer(clanek, max_length=100, min_length=50)
print("📝 MEDIUM summary:")
print(f"   {stredni[0]['summary_text']}\n")

# Long summary
dlouhy = summarizer(clanek, max_length=200, min_length=100)
print("📄 LONG summary:")
print(f"   {dlouhy[0]['summary_text']}")
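
Because `max_length` and `min_length` are measured in model tokens rather than characters, fixed values can exceed the length of a short input. One option is to derive the bounds from the input's token count, as sketched below. `length_bounds` is a hypothetical helper and its ratios are illustrative defaults, not values recommended by the transformers library.

```python
def length_bounds(n_input_tokens, max_ratio=0.5, min_ratio=0.2, floor=5):
    """Derive (min_length, max_length) in tokens from the input size.

    Hypothetical helper: the ratios are illustrative assumptions.
    """
    if n_input_tokens <= 0:
        raise ValueError("n_input_tokens must be positive")
    max_length = max(floor, int(n_input_tokens * max_ratio))
    # Keep min_length strictly below max_length and at least 1.
    min_length = max(1, min(int(n_input_tokens * min_ratio), max_length - 1))
    return min_length, max_length

for n in (40, 200, 1000):
    print(n, length_bounds(n))

# With the notebook's pipeline this could be used roughly as follows:
# n = len(summarizer.tokenizer(clanek)["input_ids"])
# min_len, max_len = length_bounds(n)
# summarizer(clanek, min_length=min_len, max_length=max_len)
```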

## 3. T5 Model: More Flexible Summarization

In [None]:
# T5 model
t5_summarizer = pipeline("summarization", 
                         model="t5-base",
                         device=device)

# T5 expects a "summarize: " prefix; the summarization pipeline normally adds it from the model config
text_pro_t5 = clanek

t5_souhrn = t5_summarizer(text_pro_t5, max_length=80, min_length=20)
print("🔷 T5 summary:")
print(f"   {t5_souhrn[0]['summary_text']}")
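
When T5 is called directly through `AutoTokenizer` and `AutoModelForSeq2SeqLM` instead of through `pipeline`, the task prefix has to be prepended by hand. The sketch below shows one way to do that. `build_t5_input` and `summarize_with_t5` are hypothetical helper names, and the beam count is an arbitrary choice.

```python
def build_t5_input(text, task_prefix="summarize: "):
    """Collapse whitespace and prepend the T5 task prefix."""
    return task_prefix + " ".join(text.split())

def summarize_with_t5(text, model_name="t5-base", max_length=80, min_length=20):
    """Summarize with T5 without the pipeline wrapper (downloads the model on first use)."""
    # Imported lazily so build_t5_input can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(build_t5_input(text), return_tensors="pt",
                       truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_length=max_length,
                                min_length=min_length, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(build_t5_input("  AI is transforming\n business.  "))
```

The same model was also trained with other prefixes such as `"translate English to German: "`, which is what makes T5 more flexible than a summarization-only model.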

## 4. Text Generation

In [None]:
# GPT-2 for text generation
generator = pipeline("text-generation", 
                     model="gpt2-medium",
                     device=device)

prompt = "The future of artificial intelligence in business will"

# Generation with different parameters
print("🎲 Creative generation (high temperature):")
kreativni = generator(prompt, max_length=100, temperature=0.9, do_sample=True, num_return_sequences=1)
print(f"   {kreativni[0]['generated_text']}\n")

print("🎯 Conservative generation (low temperature):")
konzervativni = generator(prompt, max_length=100, temperature=0.3, do_sample=True, num_return_sequences=1)
print(f"   {konzervativni[0]['generated_text']}")
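
To see why a low temperature gives more conservative text, it helps to look at what temperature does to the next-token probabilities. The sketch below applies a temperature-scaled softmax to three made-up scores. The logits are illustrative and do not come from GPT-2.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.1]
for t in (0.3, 0.9):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At the lower temperature almost all probability mass sits on the top-scoring token, so sampling rarely picks anything else. At the higher temperature the distribution is flatter and less likely tokens are chosen more often.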

In [None]:
# Multiple variants
print("📚 Multiple generation (3 variants):\n")

varianty = generator(
    "Automation in business helps companies to",
    max_length=60,
    num_return_sequences=3,
    do_sample=True,
    temperature=0.7
)

for i, var in enumerate(varianty, 1):
    print(f"Variant {i}: {var['generated_text']}\n")
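
By default the text-generation pipeline returns the prompt followed by the continuation, and sampled variants can come out nearly identical. A small post-processing step can strip the prompt and drop duplicates. `unique_continuations` is a hypothetical helper, and the sample data is hand-written rather than real model output.

```python
def unique_continuations(prompt, outputs):
    """Return the distinct continuations from text-generation pipeline output.

    `outputs` is a list of dicts with a 'generated_text' key, the shape
    returned by the notebook's `generator(...)` call.
    """
    seen = set()
    result = []
    for item in outputs:
        text = item["generated_text"]
        # Remove the echoed prompt so only the new text remains.
        if text.startswith(prompt):
            text = text[len(prompt):]
        text = text.strip()
        # Compare case-insensitively with whitespace collapsed.
        key = " ".join(text.lower().split())
        if key and key not in seen:
            seen.add(key)
            result.append(text)
    return result

# Hand-written stand-in for pipeline output (not real model output).
fake_outputs = [
    {"generated_text": "Automation helps companies to save time."},
    {"generated_text": "Automation helps companies to  Save time."},
    {"generated_text": "Automation helps companies to cut costs."},
]
print(unique_continuations("Automation helps companies to", fake_outputs))
```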

## 5. Practical Automation: Newsletter Generator

In [None]:
def generuj_newsletter_sekci(tema, body):
    """Generate a newsletter section from bullet points."""
    
    # Summarize the points
    text = f"{tema}. " + " ".join(body)
    souhrn = summarizer(text, max_length=100, min_length=30)
    
    return souhrn[0]['summary_text']

# Newsletter data
newsletter_data = {
    "AI news": [
        "OpenAI released GPT-4 with improved reasoning capabilities.",
        "Google announced Gemini multimodal AI model.",
        "Meta open-sourced Llama 3 for research and commercial use."
    ],
    "Product updates": [
        "New dashboard with real-time analytics is now available.",
        "Integration with Slack and Teams has been improved.",
        "Mobile app now supports offline mode."
    ]
}

print("📰 AUTOMATICALLY GENERATED NEWSLETTER\n")
print("=" * 50)

for sekce, body in newsletter_data.items():
    souhrn = generuj_newsletter_sekci(sekce, body)
    print(f"\n📌 {sekce.upper()}")
    print(f"   {souhrn}")
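
The loop above prints straight to the console. For reuse, for example to send the result by email, it can be handy to build the newsletter as a single string and to pass the summarizer in as a function. The sketch below does that. `render_newsletter` and the stand-in `first_sentence` summarizer are hypothetical and only illustrate the structure.

```python
def render_newsletter(sections, summarize, title="Newsletter"):
    """Build a plain-text newsletter; `summarize` is any callable str -> str."""
    lines = [title, "=" * len(title)]
    for heading, bullets in sections.items():
        if not bullets:
            continue  # skip empty sections instead of summarizing nothing
        lines.append("")
        lines.append(heading.upper())
        lines.append(summarize(" ".join(bullets)))
    return "\n".join(lines)

# Stand-in summarizer for demonstration: keeps only the first sentence.
def first_sentence(text):
    return text.split(". ")[0].rstrip(".") + "."

demo = {
    "AI news": ["Model A was released. It is faster."],
    "Events": [],
}
print(render_newsletter(demo, first_sentence))
```

With the notebook's BART pipeline, the callable could be `lambda t: summarizer(t, max_length=100, min_length=30)[0]["summary_text"]`. Injecting it this way also makes the formatting logic testable without loading a model.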

## 6. Summarizing Long Documents

In [None]:
def sumarizuj_dlouhy_dokument(text, max_chunk_length=1000):
    """Summarize a long document in parts."""
    
    # Split into parts (paragraphs)
    paragrafy = [p.strip() for p in text.split('\n\n') if p.strip()]
    
    # Merge paragraphs into chunks of roughly max_chunk_length characters
    chunky = []
    current_chunk = ""
    
    for para in paragrafy:
        if len(current_chunk) + len(para) < max_chunk_length:
            current_chunk += para + " "
        else:
            if current_chunk:
                chunky.append(current_chunk.strip())
            current_chunk = para + " "
    
    if current_chunk:
        chunky.append(current_chunk.strip())
    
    # Summarize each chunk
    souhrny = []
    for i, chunk in enumerate(chunky):
        if len(chunk) > 100:  # Minimum length worth summarizing
            souhrn = summarizer(chunk, max_length=60, min_length=20)
            souhrny.append(souhrn[0]['summary_text'])
            print(f"   ✓ Part {i+1}/{len(chunky)} processed")
    
    # Final summary
    if len(souhrny) > 1:
        combined = " ".join(souhrny)
        final = summarizer(combined, max_length=150, min_length=50)
        return final[0]['summary_text']
    elif souhrny:
        return souhrny[0]
    return "The text is too short to summarize."

# Test on a long document
dlouhy_dokument = clanek * 3  # Simulates a longer text

print(f"📄 Processing document ({len(dlouhy_dokument)} characters)...\n")
vysledek = sumarizuj_dlouhy_dokument(dlouhy_dokument)
print(f"\n📋 Final summary:\n{vysledek}")
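
The chunking above groups whole paragraphs, so a single paragraph longer than `max_chunk_length` still ends up in one oversized chunk. A possible refinement is to split such paragraphs on sentence boundaries first. `split_oversized` is a hypothetical helper, and its sentence splitter is deliberately naive.

```python
import re

def split_oversized(paragraph, max_chunk_length=1000):
    """Split one paragraph into pieces of at most max_chunk_length characters.

    Splits on sentence boundaries where possible; a single sentence longer
    than the limit is cut hard.
    """
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    pieces, current = [], ""
    for sentence in sentences:
        # Hard-cut sentences that cannot fit into any chunk.
        while len(sentence) > max_chunk_length:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:max_chunk_length])
            sentence = sentence[max_chunk_length:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chunk_length:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces

text = "First sentence here. Second sentence here. Third one."
print(split_oversized(text, max_chunk_length=30))
```

Character counts are only a rough proxy for the model's token limit. Counting tokens with the pipeline's tokenizer would be more precise, at the cost of an extra tokenization pass.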

## 7. Automatic Report Generation

In [None]:

def generuj_report(data_dict):
    """Generate a text report from data."""
    
    # Convert the data to text
    text_parts = []
    for metrika, hodnota in data_dict.items():
        text_parts.append(f"The {metrika} is {hodnota}.")
    
    raw_text = " ".join(text_parts)
    
    # Generate an interpretation
    prompt = f"Business report summary: {raw_text} In conclusion,"
    
    report = generator(prompt, max_length=150, do_sample=True, temperature=0.5)
    
    return report[0]['generated_text']

# Simulated business data
mesicni_data = {
    "revenue": "$1.2 million, up 15% from last month",
    "customer satisfaction score": "4.5 out of 5",
    "new customers acquired": "250, exceeding target by 20%",
    "employee retention rate": "95%",
    "operational efficiency": "improved by 12%"
}

print("📊 MONTHLY BUSINESS REPORT\n")
print("=" * 50)
print("\n📈 Key metrics:")
for metrika, hodnota in mesicni_data.items():
    print(f"   • {metrika.title()}: {hodnota}")

print("\n📝 Automatic summary:")
report = generuj_report(mesicni_data)
print(report)
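
A general-purpose language model such as GPT-2 can continue the prompt with figures that were never in the data. Before a generated report is shared, a simple sanity check is to list every number in the output that does not occur in the input. `unsupported_numbers` is a hypothetical helper, and the example strings are hand-written rather than real model output.

```python
import re

# Matches integers and simple decimals such as 15 or 1.2.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

def unsupported_numbers(generated, source):
    """Return numbers that appear in `generated` but nowhere in `source`."""
    known = set(NUMBER_PATTERN.findall(source))
    found = NUMBER_PATTERN.findall(generated)
    # Preserve order of first appearance and drop repeats.
    return [n for n in dict.fromkeys(found) if n not in known]

source = "The revenue is $1.2 million, up 15% from last month."
generated = "Revenue reached $1.2 million, a 15% rise, and should hit 3 million by 2030."
print(unsupported_numbers(generated, source))
```

The check only compares number strings, so it will not catch a correct figure attached to the wrong metric. It is a cheap first filter, not a substitute for review.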

## 8. Text Paraphrasing

In [None]:
# PEGASUS for paraphrasing
paraphraser = pipeline("text2text-generation", 
                       model="tuner007/pegasus_paraphrase",
                       device=device)

originalni = "The company achieved significant growth in the last quarter due to increased market demand."

parafraze = paraphraser(originalni, num_return_sequences=3, num_beams=5, max_length=60)

print(f"📝 Original: {originalni}\n")
print("🔄 Paraphrases:")
for i, p in enumerate(parafraze, 1):
    print(f"   {i}. {p['generated_text']}")
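
Beam search can return candidates that barely differ from the input. One rough way to filter them is word-level Jaccard similarity: drop near-copies and list the most reworded candidates first. The helpers below and the `max_similarity` threshold are illustrative assumptions, and the candidates are hand-written rather than real model output.

```python
import re

def word_set(text):
    """Lower-cased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def jaccard(a, b):
    """Word-level Jaccard similarity between two strings (0.0 to 1.0)."""
    sa, sb = word_set(a), word_set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def rank_paraphrases(original, candidates, max_similarity=0.9):
    """Drop near-copies of the original and sort the rest, most different first."""
    scored = [(jaccard(original, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if s < max_similarity]
    return [c for s, c in sorted(kept, key=lambda pair: pair[0])]

original = "The company achieved significant growth in the last quarter."
# Hand-written candidates for illustration (not real model output).
candidates = [
    "The company achieved significant growth in the last quarter!",
    "Last quarter the company grew significantly.",
    "The firm saw strong growth in the last quarter.",
]
for text in rank_paraphrases(original, candidates):
    print(text)
```

Low word overlap does not guarantee that the meaning is preserved, so this ranking is best treated as a way to surface varied candidates for a human to choose from.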

---
## 🏁 Summary

- ✅ BART and T5 for summarization
- ✅ GPT-2 for text generation
- ✅ Processing long documents in parts
- ✅ Automatic generation of reports and newsletters
- ✅ Text paraphrasing

**Next notebook:** Translation and multilingual models