# 🚀 Text Summarization MVP with Export Features

## Enhanced Version with Multiple Export Formats

This notebook builds a text summarization application with a Gradio web interface and exports each summary to Markdown, JSON, audio (MP3), or PDF.

### ✨ Features:
- 🤖 Multiple AI models (BART, T5, Pegasus)
- 📄 **Markdown export** for documentation
- 📊 **JSON export** for APIs and data processing
- 🎵 **Audio export** (Text-to-Speech in 10 languages)
- 📑 **PDF export** for professional reports
- ⚡ Intelligent caching system
- 📊 Detailed statistics
- 🎯 Customizable summary length

### 📋 Requirements:
- Python 3.8+
- Works in Google Colab ✅
- Works in VS Code ✅
- Works in Jupyter Lab ✅

## Step 1: Install Required Dependencies

Run this cell to install all necessary packages including export libraries.

In [30]:
# Install all required packages
import sys
import subprocess

# Core packages
packages = [
    'gradio',
    'transformers',
    'torch',
    'sentencepiece',
    'protobuf',
    'accelerate',
    'gtts',  # Text-to-speech
    'pydub',  # Audio processing
    'markdown',  # Markdown processing
    'reportlab'  # PDF generation
]

print("📦 Installing packages...")
for package in packages:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', package, '-q'])

print("✅ All packages installed successfully!")

# For Google Colab audio support (optional)
try:
    import google.colab
    !apt-get install -y ffmpeg > /dev/null 2>&1
    print("✅ FFmpeg installed for Colab audio support")
except ImportError:
    pass  # Not in Colab

📦 Installing packages...
✅ All packages installed successfully!
✅ FFmpeg installed for Colab audio support


## Step 2: Import Libraries and Setup

In [31]:
# Import all required libraries
import os
import sys
import time
import logging
import warnings
from typing import Dict, Tuple, Optional, List
from datetime import datetime
import hashlib
import json
import base64
from pathlib import Path

# Suppress warnings
warnings.filterwarnings('ignore')

# Machine Learning libraries
import gradio as gr
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    pipeline,
    logging as transformers_logging
)

# Export libraries
from gtts import gTTS
import markdown
from reportlab.lib.pagesizes import letter
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.enums import TA_JUSTIFY, TA_CENTER

# Set transformers logging to error only
transformers_logging.set_verbosity_error()

# Print versions
print("📦 Package Versions:")
print(f"  - Python: {sys.version.split()[0]}")
print(f"  - PyTorch: {torch.__version__}")
print(f"  - Transformers: {__import__('transformers').__version__}")
print(f"  - Gradio: {gr.__version__}")
print(f"\n🖥️ System:")
print(f"  - Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")
if torch.cuda.is_available():
    print(f"  - GPU: {torch.cuda.get_device_name(0)}")

📦 Package Versions:
  - Python: 3.12.12
  - PyTorch: 2.8.0+cu126
  - Transformers: 4.57.1
  - Gradio: 5.49.1

🖥️ System:
  - Device: CPU


## Step 3: Configuration

Central configuration for models, export settings, and system parameters.

In [32]:
class Config:
    """
    Central configuration class for the application.
    Includes model settings and export configurations.
    """

    # Hugging Face Token (Optional)
    HF_TOKEN = os.environ.get('HF_TOKEN', None)

    # Export settings
    EXPORT_DIR = "./exports"
    AUDIO_LANGUAGES = {
        "English": "en",
        "Spanish": "es",
        "French": "fr",
        "German": "de",
        "Italian": "it",
        "Portuguese": "pt",
        "Hindi": "hi",
        "Chinese": "zh",
        "Japanese": "ja",
        "Korean": "ko"
    }

    # Model configurations
    MODELS = {
        "facebook/bart-large-cnn": {
            "name": "BART (CNN/DailyMail)",
            "description": "Best for news articles",
            "max_input_length": 1024,
            "max_output_length": 142,
            "min_output_length": 56
        },
        "t5-small": {
            "name": "T5 Small (Fast)",
            "description": "Lightweight and fast",
            "max_input_length": 512,
            "max_output_length": 150,
            "min_output_length": 40,
            "prefix": "summarize: "
        },
        "t5-base": {
            "name": "T5 Base (Balanced)",
            "description": "Good balance",
            "max_input_length": 512,
            "max_output_length": 150,
            "min_output_length": 40,
            "prefix": "summarize: "
        },
        "google/pegasus-xsum": {
            "name": "Pegasus XSUM",
            "description": "Abstractive summaries",
            "max_input_length": 512,
            "max_output_length": 128,
            "min_output_length": 30
        }
    }

    # System settings
    CACHE_DIR = "./model_cache"
    ENABLE_CACHE = True
    MAX_CACHE_SIZE = 100
    DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
    NUM_BEAMS = 4

# Create directories
os.makedirs(Config.EXPORT_DIR, exist_ok=True)
os.makedirs(Config.CACHE_DIR, exist_ok=True)

print("✅ Configuration loaded")
print(f"📁 Export directory: {Config.EXPORT_DIR}")
print(f"📁 Cache directory: {Config.CACHE_DIR}")
print(f"🖥️ Device: {Config.DEVICE}")
print(f"🤖 Available models: {len(Config.MODELS)}")

✅ Configuration loaded
📁 Export directory: ./exports
📁 Cache directory: ./model_cache
🖥️ Device: cpu
🤖 Available models: 4
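
As a minimal sketch of how these per-model settings can be consumed, the snippet below builds the model input from a config entry and prepends the `prefix` only when one is defined. The `MODELS` dictionary is a trimmed, standalone copy of two entries from `Config.MODELS`, and `build_model_input` is a hypothetical helper that is not part of the notebook.

```python
# Trimmed copy of two entries from Config.MODELS, for illustration only.
MODELS = {
    "facebook/bart-large-cnn": {"max_input_length": 1024},
    "t5-small": {"max_input_length": 512, "prefix": "summarize: "},
}


def build_model_input(text: str, model_id: str) -> str:
    """Hypothetical helper: prepend the task prefix if the model defines one."""
    prefix = MODELS[model_id].get("prefix", "")
    return prefix + text


print(build_model_input("Some article text.", "t5-small"))
print(build_model_input("Some article text.", "facebook/bart-large-cnn"))
```

Keeping the prefix in the configuration means the engine does not need model-specific branches: T5 entries carry `"summarize: "`, while BART and Pegasus entries simply omit the key.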


## Step 4: Export Manager

Handles exporting summaries to Markdown, JSON, Audio, and PDF formats.

In [33]:
class ExportManager:
    """
    Manages export functionality for multiple formats.
    """

    def __init__(self):
        self.export_dir = Config.EXPORT_DIR
        os.makedirs(self.export_dir, exist_ok=True)

    def generate_filename(self, extension: str) -> str:
        """Generate a timestamped filename (one-second resolution)."""
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        return f"summary_{timestamp}.{extension}"

    def export_to_markdown(self, original_text: str, summary: str,
                          statistics: Dict, model_name: str) -> str:
        """Export to Markdown format."""
        filename = self.generate_filename("md")
        filepath = os.path.join(self.export_dir, filename)

        md_content = f"""# Text Summary Report

## Metadata
- **Date**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
- **Model**: {Config.MODELS[model_name]['name']}
- **Compression**: {statistics['compression_ratio']}

## Generated Summary
{summary}

## Statistics
- Original: {statistics['original_words']} words
- Summary: {statistics['summary_words']} words

## Original Text
<details>
<summary>Click to expand</summary>

{original_text}

</details>

---
*Generated using Text Summarization MVP*
"""

        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(md_content)

        return filepath

    def export_to_json(self, original_text: str, summary: str,
                      statistics: Dict, model_name: str) -> str:
        """Export to JSON format."""
        filename = self.generate_filename("json")
        filepath = os.path.join(self.export_dir, filename)

        json_data = {
            "metadata": {
                "timestamp": datetime.now().isoformat(),
                "model": {
                    "id": model_name,
                    "name": Config.MODELS[model_name]['name']
                },
                "version": "1.0.0"
            },
            "content": {
                "original_text": original_text,
                "summary": summary
            },
            "statistics": statistics
        }

        with open(filepath, 'w', encoding='utf-8') as f:
            json.dump(json_data, f, ensure_ascii=False, indent=2)

        return filepath

    def export_to_audio(self, text: str, language: str = "en",
                       slow: bool = False) -> str:
        """Export to audio using text-to-speech."""
        try:
            filename = self.generate_filename("mp3")
            filepath = os.path.join(self.export_dir, filename)

            # Create audio
            tts = gTTS(text=text, lang=language, slow=slow)
            tts.save(filepath)

            return filepath
        except Exception as e:
            raise RuntimeError(f"Audio export failed: {str(e)}")

    def export_to_pdf(self, original_text: str, summary: str,
                     statistics: Dict, model_name: str) -> str:
        """Export to PDF format."""
        filename = self.generate_filename("pdf")
        filepath = os.path.join(self.export_dir, filename)

        try:
            doc = SimpleDocTemplate(filepath, pagesize=letter)
            elements = []
            styles = getSampleStyleSheet()

            # Title
            title_style = ParagraphStyle(
                'CustomTitle',
                parent=styles['Heading1'],
                alignment=TA_CENTER,
                spaceAfter=30
            )
            elements.append(Paragraph("Text Summary Report", title_style))
            elements.append(Spacer(1, 12))

            # Metadata
            metadata = f"""
            <b>Date:</b> {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}<br/>
            <b>Model:</b> {Config.MODELS[model_name]['name']}<br/>
            <b>Compression:</b> {statistics['compression_ratio']}
            """
            elements.append(Paragraph(metadata, styles['Normal']))
            elements.append(Spacer(1, 20))

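            # Summary (escape &, <, > so ReportLab treats the text as plain text, not markup)
            from xml.sax.saxutils import escape
            elements.append(Paragraph("<b>Generated Summary</b>", styles['Heading2']))
            elements.append(Spacer(1, 12))
            elements.append(Paragraph(escape(summary), styles['BodyText']))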

            doc.build(elements)
            return filepath

        except Exception as e:
            raise RuntimeError(f"PDF export failed: {str(e)}")

# Initialize export manager
export_manager = ExportManager()
print("✅ Export Manager initialized")
print("📝 Supported formats: Markdown, JSON, Audio, PDF")

✅ Export Manager initialized
📝 Supported formats: Markdown, JSON, Audio, PDF
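
Because `generate_filename` uses a timestamp with one-second resolution, two exports of the same format within the same second would share a path. The sketch below shows one possible way to avoid that by appending a counter when the path already exists. `unique_export_path` is a hypothetical helper, not part of `ExportManager`, and the temporary directory stands in for `./exports`.

```python
import os
import tempfile
from datetime import datetime


def unique_export_path(export_dir: str, extension: str) -> str:
    """Hypothetical helper: timestamped path with a numeric suffix on collision."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = os.path.join(export_dir, f"summary_{timestamp}.{extension}")
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(
            export_dir, f"summary_{timestamp}_{counter}.{extension}"
        )
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    first = unique_export_path(tmp, "md")
    open(first, "w").close()  # simulate a finished export
    second = unique_export_path(tmp, "md")
    print(first != second)
```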


## Step 5: Cache Manager

Caches generated summaries in a JSON file, keyed by input text, model, and maximum length, so repeated requests skip model inference.

In [34]:
class CacheManager:
    """Manages caching of summaries."""

    def __init__(self, cache_dir: str = Config.CACHE_DIR):
        self.cache_dir = cache_dir
        self.cache_file = os.path.join(cache_dir, "summary_cache.json")
        self.cache = {}
        os.makedirs(cache_dir, exist_ok=True)
        self.load_cache()

    def load_cache(self):
        """Load existing cache."""
        if os.path.exists(self.cache_file):
            try:
                with open(self.cache_file, 'r', encoding='utf-8') as f:
                    self.cache = json.load(f)
                print(f"📂 Loaded {len(self.cache)} cached summaries")
            except (OSError, json.JSONDecodeError):
                # Unreadable or corrupt cache file: start with an empty cache
                self.cache = {}

    def save_cache(self):
        """Save cache to file."""
        try:
            with open(self.cache_file, 'w', encoding='utf-8') as f:
                json.dump(self.cache, f, ensure_ascii=False, indent=2)
        except OSError:
            # Caching is best-effort; a failed write should not break summarization
            pass

    def get_cache_key(self, text: str, model_name: str, max_length: int) -> str:
        """Generate cache key."""
        content = f"{text}_{model_name}_{max_length}"
        return hashlib.md5(content.encode()).hexdigest()

    def get(self, text: str, model_name: str, max_length: int) -> Optional[str]:
        """Get cached summary."""
        if not Config.ENABLE_CACHE:
            return None
        key = self.get_cache_key(text, model_name, max_length)
        return self.cache.get(key)

    def set(self, text: str, model_name: str, max_length: int, summary: str):
        """Store summary in cache."""
        if not Config.ENABLE_CACHE:
            return

        if len(self.cache) >= Config.MAX_CACHE_SIZE:
            oldest_key = next(iter(self.cache))
            del self.cache[oldest_key]

        key = self.get_cache_key(text, model_name, max_length)
        self.cache[key] = summary
        self.save_cache()

cache_manager = CacheManager()
print("✅ Cache Manager initialized")

📂 Loaded 2 cached summaries
✅ Cache Manager initialized
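
The key generation and eviction behaviour of `CacheManager` can be illustrated in isolation. The sketch below mirrors the same logic with module-level functions and a deliberately small size limit. `cache_key` and `cache_set` are illustrative stand-ins for `get_cache_key` and `set`, and nothing is written to disk.

```python
import hashlib

MAX_CACHE_SIZE = 3  # small limit for the demo; the notebook uses 100

cache = {}


def cache_key(text: str, model_name: str, max_length: int) -> str:
    """Hash the text, model, and length limit into a fixed-size key."""
    content = f"{text}_{model_name}_{max_length}"
    return hashlib.md5(content.encode()).hexdigest()


def cache_set(text: str, model_name: str, max_length: int, summary: str) -> None:
    """Store a summary, evicting the oldest inserted entry when the cache is full."""
    if len(cache) >= MAX_CACHE_SIZE:
        oldest_key = next(iter(cache))  # dicts preserve insertion order
        del cache[oldest_key]
    cache[cache_key(text, model_name, max_length)] = summary


for i in range(4):
    cache_set(f"text {i}", "t5-small", 40, f"summary {i}")

print(len(cache))                                   # 3
print(cache_key("text 0", "t5-small", 40) in cache)  # False
```

Eviction here is first-in, first-out rather than least-recently-used: reading an entry does not refresh its position, so a frequently requested summary can still be evicted once enough newer entries arrive.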


## Step 6: Model Manager

Loads each tokenizer and model on first use and keeps them in memory for reuse.

In [35]:
class ModelManager:
    """Manages loading and caching of AI models."""

    def __init__(self):
        self.loaded_models = {}
        print(f"🤖 ModelManager initialized on {Config.DEVICE}")

    def load_model(self, model_id: str) -> Tuple:
        """Load model and tokenizer."""
        if model_id in self.loaded_models:
            print(f"📂 Using cached model: {model_id}")
            return self.loaded_models[model_id]

        try:
            print(f"⏳ Loading model: {model_id}...")

            tokenizer = AutoTokenizer.from_pretrained(
                model_id,
                token=Config.HF_TOKEN,  # `use_auth_token` is deprecated in favor of `token`
                cache_dir=Config.CACHE_DIR
            )

            model = AutoModelForSeq2SeqLM.from_pretrained(
                model_id,
                token=Config.HF_TOKEN,  # `use_auth_token` is deprecated in favor of `token`
                cache_dir=Config.CACHE_DIR,
                device_map="auto" if Config.DEVICE == "cuda" else None,
                torch_dtype=torch.float16 if Config.DEVICE == "cuda" else torch.float32
            )

            if Config.DEVICE == "cpu":
                model = model.to(Config.DEVICE)

            self.loaded_models[model_id] = (tokenizer, model)
            print(f"✅ Model loaded: {Config.MODELS[model_id]['name']}")
            return tokenizer, model

        except Exception as e:
            raise RuntimeError(f"Failed to load model: {str(e)}")

    def clear_cache(self):
        """Clear loaded models."""
        self.loaded_models.clear()
        if Config.DEVICE == "cuda":
            torch.cuda.empty_cache()
        print("🗑️ Model cache cleared")

model_manager = ModelManager()
print("✅ Model Manager ready")

🤖 ModelManager initialized on cpu
✅ Model Manager ready
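
The load-once pattern used by `ModelManager.load_model` can be sketched without downloading any model. In the snippet below, `LazyLoader` and `fake_load` are hypothetical names: `fake_load` stands in for the `from_pretrained` calls, and the `load_calls` counter exists only to make the caching visible.

```python
from typing import Callable, Dict, Tuple


class LazyLoader:
    """Load each model once on first request and reuse it afterwards."""

    def __init__(self, load_fn: Callable[[str], Tuple[str, str]]):
        self._load_fn = load_fn
        self._loaded: Dict[str, Tuple[str, str]] = {}
        self.load_calls = 0  # demo-only counter

    def get(self, model_id: str) -> Tuple[str, str]:
        if model_id not in self._loaded:
            self.load_calls += 1
            self._loaded[model_id] = self._load_fn(model_id)
        return self._loaded[model_id]

    def clear(self) -> None:
        self._loaded.clear()


def fake_load(model_id: str) -> Tuple[str, str]:
    """Stand-in for the expensive tokenizer/model loading step."""
    return (f"tokenizer:{model_id}", f"model:{model_id}")


loader = LazyLoader(fake_load)
loader.get("t5-small")
loader.get("t5-small")  # served from memory, no second load
print(loader.load_calls)  # 1
```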


## Step 7: Text Processing Utilities

In [36]:
class TextProcessor:
    """Text preprocessing and postprocessing."""

    @staticmethod
    def preprocess(text: str) -> str:
        """Clean input text."""
        text = ' '.join(text.split())

        if len(text) < 50:
            raise ValueError("Text too short! Minimum 50 characters.")

        if len(text) > 50000:
            text = text[:50000]

        return text

    @staticmethod
    def postprocess(summary: str) -> str:
        """Clean generated summary."""
        summary = ' '.join(summary.split())

        if summary and summary[0].islower():
            summary = summary[0].upper() + summary[1:]

        if summary and summary[-1] not in '.!?':
            summary += '.'

        return summary

    @staticmethod
    def calculate_statistics(original: str, summary: str) -> Dict:
        """Calculate statistics."""
        original_words = len(original.split())
        summary_words = len(summary.split())
        compression = (1 - summary_words / original_words) * 100 if original_words > 0 else 0

        return {
            "original_length": len(original),
            "original_words": original_words,
            "summary_length": len(summary),
            "summary_words": summary_words,
            "compression_ratio": f"{compression:.1f}%"
        }

text_processor = TextProcessor()
print("✅ Text Processor ready")

✅ Text Processor ready
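
As a quick worked example of the compression figure, the snippet below reproduces the formula from `calculate_statistics` as a standalone function. `compression_ratio` is an illustrative name, and the inputs (67 original words, 26 summary words) are the counts reported by the test run in Step 9.

```python
def compression_ratio(original_words: int, summary_words: int) -> str:
    """Percentage of words removed by the summary, formatted to one decimal."""
    if original_words <= 0:
        return "0.0%"
    return f"{(1 - summary_words / original_words) * 100:.1f}%"


print(compression_ratio(67, 26))  # 61.2%
```

A higher percentage means a shorter summary relative to the input. The value can go negative if a summary has more words than the original text.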


## Step 8: Summarization Engine

Core engine that orchestrates the summarization process.

In [37]:
class SummarizationEngine:
    """Core summarization engine."""

    def __init__(self):
        self.model_manager = model_manager
        self.text_processor = text_processor
        self.cache_manager = cache_manager
        self.export_manager = export_manager
        self.last_result = None

    def summarize(self, text: str, model_name: str,
                 max_length_ratio: float = 0.3) -> Tuple:
        """Generate summary."""
        start_time = time.time()

        try:
            if not text:
                raise ValueError("Please provide text to summarize.")

            if model_name not in Config.MODELS:
                raise ValueError(f"Unknown model: {model_name}")

            # Preprocess
            text = self.text_processor.preprocess(text)
            model_config = Config.MODELS[model_name]

            # Calculate max length: estimated from the word count, but passed to generate() as a token limit
            input_words = len(text.split())
            max_length = min(
                int(input_words * max_length_ratio),
                model_config["max_output_length"]
            )
            max_length = max(max_length, model_config["min_output_length"])

            # Check cache
            cached_summary = self.cache_manager.get(text, model_name, max_length)
            if cached_summary:
                print("⚡ Using cached summary")
                stats = self.text_processor.calculate_statistics(text, cached_summary)
                processing_time = time.time() - start_time

                self.last_result = {
                    "original_text": text,
                    "summary": cached_summary,
                    "statistics": stats,
                    "model_name": model_name
                }
                return cached_summary, stats, processing_time

            # Load model
            tokenizer, model = self.model_manager.load_model(model_name)

            # Add prefix if needed
            if "prefix" in model_config:
                text_input = model_config["prefix"] + text
            else:
                text_input = text

            # Tokenize
            inputs = tokenizer(
                text_input,
                max_length=model_config["max_input_length"],
                truncation=True,
                return_tensors="pt"
            ).to(Config.DEVICE)

            print(f"⏳ Generating summary...")

            # Generate
            with torch.no_grad():
                summary_ids = model.generate(
                    inputs["input_ids"],
                    attention_mask=inputs["attention_mask"],
                    max_length=max_length,
                    min_length=model_config["min_output_length"],
                    num_beams=Config.NUM_BEAMS,
                    length_penalty=2.0,
                    early_stopping=True,
                    do_sample=False
                )

            # Decode
            summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
            summary = self.text_processor.postprocess(summary)

            # Cache result
            self.cache_manager.set(text, model_name, max_length, summary)

            # Calculate statistics
            stats = self.text_processor.calculate_statistics(text, summary)

            processing_time = time.time() - start_time
            print(f"✅ Summary generated in {processing_time:.2f} seconds")

            # Store for export
            self.last_result = {
                "original_text": text,
                "summary": summary,
                "statistics": stats,
                "model_name": model_name
            }

            return summary, stats, processing_time

        except Exception as e:
            raise RuntimeError(f"Summarization failed: {str(e)}")

# Initialize engine
engine = SummarizationEngine()
print("✅ Summarization Engine ready")

✅ Summarization Engine ready
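
The length calculation inside `summarize` can be looked at on its own. The sketch below repeats the same clamp with a hypothetical helper, `summary_length_budget`, using the T5 limits from `Config.MODELS` (minimum 40, maximum 150).

```python
def summary_length_budget(input_words: int, ratio: float,
                          min_out: int, max_out: int) -> int:
    """Scale the input word count by the ratio, then clamp to the model limits."""
    budget = min(int(input_words * ratio), max_out)
    return max(budget, min_out)


print(summary_length_budget(67, 0.3, 40, 150))    # 40  (raised to the minimum)
print(summary_length_budget(400, 0.25, 40, 150))  # 100 (within limits)
print(summary_length_budget(1000, 0.3, 40, 150))  # 150 (capped at the maximum)
```

For short inputs the budget equals the model's minimum, so `max_length` and `min_length` passed to `generate` coincide and the slider has no effect. Note also that the budget is derived from a word count while `generate` counts tokens, so the resulting summary may be shorter in words than the number suggests.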


## Step 9: Test the Engine

Let's test the summarization with a sample text.

In [38]:
# Test with sample text
sample_text = """
Artificial Intelligence has become one of the most transformative technologies
of our time. From virtual assistants to self-driving cars, AI is reshaping
how we live and work. Machine learning algorithms power recommendation systems,
detect fraud, and assist in medical diagnostics. Natural language processing
enables human-like text generation and understanding. As AI continues to evolve,
ethical considerations around bias, privacy, and employment become increasingly
important for responsible development.
"""

try:
    # Generate summary using T5-small
    summary, stats, time_taken = engine.summarize(
        text=sample_text,
        model_name="t5-small",
        max_length_ratio=0.3
    )

    print("\n" + "="*60)
    print("📄 SUMMARY:")
    print(summary)
    print("\n📊 STATISTICS:")
    for key, value in stats.items():
        print(f"  {key}: {value}")
    print(f"\n⏱️ Time: {time_taken:.2f} seconds")
    print("="*60)

except Exception as e:
    print(f"❌ Error: {e}")

⚡ Using cached summary

📄 SUMMARY:
Artificial intelligence has become one of the most transformative technologies of our time. from virtual assistants to self-driving cars, AI is reshaping how we live and.

📊 STATISTICS:
  original_length: 510
  original_words: 67
  summary_length: 170
  summary_words: 26
  compression_ratio: 61.2%

⏱️ Time: 0.00 seconds


## Step 10: Export Functions

Define the export wrappers used by the interface, then test each format against the last generated summary.

In [50]:
# Export functions for the interface
def export_markdown():
    """Export to Markdown."""
    if not engine.last_result:
        return None, "❌ No summary to export. Generate a summary first."

    try:
        filepath = engine.export_manager.export_to_markdown(
            **engine.last_result
        )
        # Return filepath instead of auto-downloading
        return filepath, f"✅ Exported to: {filepath}"
    except Exception as e:
        return None, f"❌ Export failed: {str(e)}"


def export_json():
    """Export to JSON."""
    if not engine.last_result:
        return None, "❌ No summary to export. Generate a summary first."

    try:
        filepath = engine.export_manager.export_to_json(
            **engine.last_result
        )
        # Return filepath instead of auto-downloading
        return filepath, f"✅ Exported to: {filepath}"
    except Exception as e:
        return None, f"❌ Export failed: {str(e)}"

def export_audio(language: str, slow: bool):
    """Export to audio."""
    if not engine.last_result:
        # Return None for audio_output and audio_file, and an error message for audio_status
        return None, None, "❌ No summary to export."

    try:
        lang_code = Config.AUDIO_LANGUAGES.get(language, "en")
        filepath = engine.export_manager.export_to_audio(
            text=engine.last_result["summary"],
            language=lang_code,
            slow=slow
        )
        # Return the filepath for both the audio playback and download link, and a success message
        return filepath, filepath, f"✅ Exported to: {filepath}"

    except Exception as e:
        # Return None for audio_output and audio_file, and an error message for audio_status
        return None, None, f"❌ Export failed: {str(e)}"


def export_pdf():
    """Export to PDF."""
    if not engine.last_result:
        return None, "❌ No summary to export. Generate a summary first."

    try:
        filepath = engine.export_manager.export_to_pdf(
            **engine.last_result
        )
        # Return filepath instead of auto-downloading
        return filepath, f"✅ Exported to: {filepath}"
    except Exception as e:
        return None, f"❌ Export failed: {str(e)}"


# Test exports (if summary was generated)
if engine.last_result:
    print("\n📦 Testing export functions:")
    # Update test calls to match new return values
    md_file, md_status = export_markdown()
    print(md_status)
    json_file, json_status = export_json()
    print(json_status)
    pdf_file, pdf_status = export_pdf()
    print(pdf_status)
    audio_output_test, audio_file_test, audio_status_test = export_audio("English", False)
    print(audio_status_test)


📦 Testing export functions:
✅ Exported to: ./exports/summary_20251028_185802.md
✅ Exported to: ./exports/summary_20251028_185802.json
✅ Exported to: ./exports/summary_20251028_185802.pdf
✅ Exported to: ./exports/summary_20251028_185802.mp3
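
An exported JSON file can be read back by other tools with the standard library alone. The sketch below writes a placeholder payload with the same top-level structure as `export_to_json` (`metadata`, `content`, `statistics`) to a temporary file and loads it again. The values and the file name `summary_example.json` are made up for illustration.

```python
import json
import os
import tempfile

# Placeholder payload mirroring the structure written by export_to_json.
payload = {
    "metadata": {
        "timestamp": "2025-01-01T00:00:00",
        "model": {"id": "t5-small", "name": "T5 Small (Fast)"},
        "version": "1.0.0",
    },
    "content": {"original_text": "Original text.", "summary": "Summary."},
    "statistics": {
        "original_words": 2,
        "summary_words": 1,
        "compression_ratio": "50.0%",
    },
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "summary_example.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded["metadata"]["model"]["id"])
print(loaded["statistics"]["compression_ratio"])
```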


## Step 11: Create Gradio Interface with Export Features

Build the complete web interface with export functionality.

In [51]:
def process_summary(text: str, model_name: str, summary_length: float) -> Tuple:
    """Process summarization request."""
    try:
        summary, stats, time_taken = engine.summarize(
            text=text,
            model_name=model_name,
            max_length_ratio=summary_length
        )

        stats_html = f"""
        <div style='background-color: #f0f0f0; padding: 10px; border-radius: 5px;'>
            <h4>📊 Summary Statistics</h4>
            <ul>
                <li><strong>Original:</strong> {stats['original_words']} words</li>
                <li><strong>Summary:</strong> {stats['summary_words']} words</li>
                <li><strong>Compression:</strong> {stats['compression_ratio']}</li>
                <li><strong>Time:</strong> {time_taken:.2f} seconds</li>
                <li><strong>Model:</strong> {Config.MODELS[model_name]['name']}</li>
            </ul>
        </div>
        """

        status = "✅ Summary generated! You can now export it."
        return summary, stats_html, status

    except Exception as e:
        error_html = f"<div style='color: red;'>❌ Error: {str(e)}</div>"
        return "", error_html, f"❌ Error: {str(e)}"

print("✅ Interface functions ready")

✅ Interface functions ready


In [52]:
# Create the Gradio interface with export features
with gr.Blocks(title="Text Summarization with Export", theme=gr.themes.Soft()) as interface:

    # Header
    gr.Markdown("# 🚀 Text Summarization with Export Features")
    gr.Markdown("""
    Generate AI-powered summaries and export them in multiple formats:
    Markdown | JSON | Audio | PDF
    """)

    with gr.Tabs():
        # Summarize Tab
        with gr.TabItem("📝 Summarize"):
            with gr.Row():
                with gr.Column(scale=1):
                    model_dropdown = gr.Dropdown(
                        choices=list(Config.MODELS.keys()),
                        value="t5-small",
                        label="🤖 Select Model"
                    )

                    length_slider = gr.Slider(
                        minimum=0.1,
                        maximum=0.5,
                        value=0.3,
                        step=0.05,
                        label="📏 Summary Length"
                    )

                    text_input = gr.Textbox(
                        label="📝 Enter Text",
                        placeholder="Paste your text here...",
                        lines=15
                    )

                    submit_btn = gr.Button("🚀 Generate Summary", variant="primary")

                with gr.Column(scale=1):
                    status_output = gr.Textbox(
                        label="Status",
                        value="Ready",
                        interactive=False
                    )

                    summary_output = gr.Textbox(
                        label="📄 Summary",
                        lines=10,
                        interactive=False
                    )

                    stats_output = gr.HTML()

                    clear_btn = gr.Button("🗑️ Clear", variant="secondary")

        # Export Tab
        with gr.TabItem("💾 Export"):
            gr.Markdown("### Export Your Summary")
            gr.Markdown("Generate a summary first, then export:")

            with gr.Row():
                with gr.Column():
                    gr.Markdown("#### 📄 Markdown")
                    md_btn = gr.Button("Export as Markdown", variant="primary")
                    # Use gr.File to display downloadable link
                    md_file = gr.File(label="Download Markdown")
                    md_status = gr.Textbox(label="Status", interactive=False)


                with gr.Column():
                    gr.Markdown("#### 📊 JSON")
                    json_btn = gr.Button("Export as JSON", variant="primary")
                    # Use gr.File to display downloadable link
                    json_file = gr.File(label="Download JSON")
                    json_status = gr.Textbox(label="Status", interactive=False)


            with gr.Row():
                with gr.Column():
                    gr.Markdown("#### 🎵 Audio")
                    audio_lang = gr.Dropdown(
                        choices=list(Config.AUDIO_LANGUAGES.keys()),
                        value="English",
                        label="Language"
                    )
                    audio_slow = gr.Checkbox(label="Slow Speech", value=False)
                    audio_btn = gr.Button("Export as Audio", variant="primary")
                    # Use gr.Audio with type="filepath" to display and play audio
                    audio_output = gr.Audio(label="Audio Playback", type="filepath")
                    # Use gr.File to display downloadable link for audio
                    audio_file = gr.File(label="Download Audio")
                    audio_status = gr.Textbox(label="Status", interactive=False)

                with gr.Column():
                    gr.Markdown("#### 📑 PDF")
                    pdf_btn = gr.Button("Export as PDF", variant="primary")
                    # Use gr.File to display downloadable link
                    pdf_file = gr.File(label="Download PDF")
                    pdf_status = gr.Textbox(label="Status", interactive=False)


        # Help Tab
        with gr.TabItem("❓ Help"):
            gr.Markdown("""
            ### How to Use

            1. **Summarize**: Select model, adjust length, paste text, generate
            2. **Export**: Go to Export tab, choose format

            ### Export Formats
            - **Markdown**: Documentation, GitHub, blogs
            - **JSON**: APIs, data processing
            - **Audio**: Accessibility, listening
            - **PDF**: Professional reports

            Files are saved in `./exports/` directory.
            """)

    # Event handlers
    submit_btn.click(
        fn=process_summary,
        inputs=[text_input, model_dropdown, length_slider],
        outputs=[summary_output, stats_output, status_output]
    )

    clear_btn.click(
        fn=lambda: ("", "", "", "Ready"),
        outputs=[text_input, summary_output, stats_output, status_output]
    )

    # Export handlers
    md_btn.click(fn=export_markdown, outputs=[md_file, md_status])
    json_btn.click(fn=export_json, outputs=[json_file, json_status])
    audio_btn.click(
        fn=export_audio,
        inputs=[audio_lang, audio_slow],
        outputs=[audio_output, audio_file, audio_status]
    )
    pdf_btn.click(fn=export_pdf, outputs=[pdf_file, pdf_status])


print("✅ Gradio interface created with export features")

✅ Gradio interface created with export features


## Step 12: Launch the Application

Start the web interface. With `share=True`, Gradio also creates a temporary public link for sharing.

In [53]:
# Launch the application
print("🚀 Launching Text Summarization with Export Features...")
print("="*60)
print("📁 Exports will be saved to: ./exports/")
print("🌐 Creating web interface...\n")

# Launch with sharing enabled
interface.launch(
    share=True,  # Creates public link
    server_name="0.0.0.0",
    # server_port=7860, # Remove specific port to let Gradio find one
    show_error=True,
    quiet=False
)

# Note: The app will run until you stop this cell
# In Google Colab, the public URL will be displayed
# In VS Code/Jupyter, open the local URL that Gradio prints (port 7860 by default)

🚀 Launching Text Summarization with Export Features...
📁 Exports will be saved to: ./exports/
🌐 Creating web interface...

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://271cb1f8c3faec92c7.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Step 13: Download Exported Files (Google Colab)

Run this cell after you've generated and exported summaries.

In [44]:

import os
import glob
from datetime import datetime

print("=" * 60)
print("📥 EXPORT FILE MANAGER FOR GOOGLE COLAB")
print("=" * 60)

# Check if running in Colab
try:
    from google.colab import files
    IN_COLAB = True
    print("✅ Google Colab detected")
except ImportError:  # google.colab is only importable inside Colab
    IN_COLAB = False
    print("ℹ️ Not running in Colab")

def list_exports():
    """List all exported files"""
    export_dir = '/content/exports' if IN_COLAB else './exports'

    if os.path.exists(export_dir):
        files_list = glob.glob(f'{export_dir}/*')
        if files_list:
            print(f"\n📁 Found {len(files_list)} exported files:")
            for file in sorted(files_list):
                size = os.path.getsize(file) / 1024
                name = os.path.basename(file)
                print(f"  ✅ {name} ({size:.1f} KB)")
            return files_list
        else:
            print("⚠️ Exports folder exists but is empty")
    else:
        print(f"❌ No exports folder found at {export_dir}")
    return []

def download_all_exports():
    """Download all exported files to your computer"""
    if not IN_COLAB:
        print("ℹ️ Not in Colab - files are already on your computer")
        return

    from google.colab import files
    export_files = list_exports()

    if export_files:
        print("\n📥 Downloading files to your computer...")
        for filepath in export_files:
            try:
                files.download(filepath)
                print(f"  ⬇️ Downloaded: {os.path.basename(filepath)}")
            except Exception as e:
                print(f"  ❌ Failed: {os.path.basename(filepath)} - {e}")
        print("\n✅ Check your browser's Downloads folder!")
    else:
        print("❌ No files to download")

def save_to_google_drive():
    """Save exports to Google Drive for permanent storage"""
    if not IN_COLAB:
        print("ℹ️ This feature is only for Google Colab")
        return

    try:
        from google.colab import drive
        import shutil

        # Mount Drive
        drive.mount('/content/drive')

        # Copy files (create the destination only when there is something to copy)
        export_files = list_exports()
        if export_files:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            dest = f'/content/drive/MyDrive/text_summaries_{timestamp}'
            os.makedirs(dest, exist_ok=True)

            print(f"\n📁 Copying to Google Drive: {dest}")
            for filepath in export_files:
                shutil.copy2(filepath, dest)
                print(f"  ✅ Copied: {os.path.basename(filepath)}")
            print("\n✅ Files saved to Google Drive!")
            print(f"📍 Location: /MyDrive/text_summaries_{timestamp}/")
        else:
            print("❌ No files to save")

    except Exception as e:
        print(f"❌ Error: {e}")

def download_specific_file(filename):
    """Download a specific file"""
    if not IN_COLAB:
        print("ℹ️ Not in Colab - files are already on your computer")
        return

    from google.colab import files
    filepath = f'/content/exports/{filename}'

    if os.path.exists(filepath):
        files.download(filepath)
        print(f"✅ Downloaded: {filename}")
    else:
        print(f"❌ File not found: {filename}")

# ============================================
# AUTO-RUN OPTIONS
# ============================================

# 1. List all exported files
list_exports()

# 2. Print the quick-action commands (Colab only)
if IN_COLAB:
    print("\n🎯 QUICK ACTIONS:")
    print("-" * 40)
    print("Run these commands to manage your files:\n")
    print("1️⃣ Download ALL exports to computer:")
    print("   download_all_exports()")
    print("\n2️⃣ Save to Google Drive (permanent):")
    print("   save_to_google_drive()")
    print("\n3️⃣ Download specific file:")
    print("   download_specific_file('summary_20251028_180000.md')")
    print("\n4️⃣ List files again:")
    print("   list_exports()")
    print("-" * 40)

    # Optional: Auto-download all files
    # Uncomment the line below to auto-download when you run this cell
    # download_all_exports()

📥 EXPORT FILE MANAGER FOR GOOGLE COLAB
✅ Google Colab detected

📁 Found 25 exported files:
  ✅ summary_20251028_182925.json (1.1 KB)
  ✅ summary_20251028_182925.md (1.0 KB)
  ✅ summary_20251028_182925.mp3 (98.1 KB)
  ✅ summary_20251028_182925.pdf (1.9 KB)
  ✅ summary_20251028_183221.md (1.3 KB)
  ✅ summary_20251028_183237.mp3 (122.8 KB)
  ✅ summary_20251028_183259.json (1.3 KB)
  ✅ summary_20251028_183301.pdf (1.9 KB)
  ✅ summary_20251028_183723.md (1.3 KB)
  ✅ summary_20251028_184323.json (1.1 KB)
  ✅ summary_20251028_184323.md (1.0 KB)
  ✅ summary_20251028_184323.mp3 (98.1 KB)
  ✅ summary_20251028_184323.pdf (1.9 KB)
  ✅ summary_20251028_184534.md (1.3 KB)
  ✅ summary_20251028_184545.mp3 (122.8 KB)
  ✅ summary_20251028_184549.json (1.3 KB)
  ✅ summary_20251028_184551.pdf (1.9 KB)
  ✅ summary_20251028_184939.json (1.1 KB)
  ✅ summary_20251028_184939.md (1.0 KB)
  ✅ summary_20251028_184940.mp3 (98.1 KB)
  ✅ summary_20251028_184940.pdf (1.9 KB)
  ✅ summary_20251028_185124.md (1.3 KB)
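
The listing above shows that export names follow the pattern `summary_<YYYYMMDD>_<HHMMSS>.<ext>`. As a rough sketch under that assumption, the newest export of each format can be picked out by parsing the timestamp from the filename. `latest_per_format` is a hypothetical helper that is not defined elsewhere in this notebook.

```python
import os
from datetime import datetime


def latest_per_format(paths):
    """Map each file extension to its most recent export (hypothetical helper).

    Assumes names look like summary_YYYYMMDD_HHMMSS.ext; files that do not
    match that pattern are skipped.
    """
    latest = {}
    for path in paths:
        stem, ext = os.path.splitext(os.path.basename(path))
        try:
            stamp = datetime.strptime(stem, "summary_%Y%m%d_%H%M%S")
        except ValueError:
            continue  # not an export produced by this naming scheme
        key = ext.lstrip(".")
        if key not in latest or stamp > latest[key][0]:
            latest[key] = (stamp, path)
    return {fmt: path for fmt, (_, path) in latest.items()}


sample = [
    "exports/summary_20251028_182925.md",
    "exports/summary_20251028_183221.md",
    "exports/summary_20251028_182925.pdf",
    "exports/notes.txt",
]
print(latest_per_format(sample))
```

In the notebook, the list returned by `list_exports()` could be passed in place of `sample`.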
  

## 📊 Export Format Examples

### Markdown Export Preview:
```markdown
# Text Summary Report
- Date: 2025-10-28 18:00:00
- Model: T5 Small
- Compression: 70%

## Summary
AI is transforming technology through virtual assistants...
```
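
A report in the layout previewed above can be assembled with plain string formatting. This is a minimal sketch, not the notebook's actual `export_markdown` implementation: `build_markdown_report` and its parameters are hypothetical, and the real export may include additional fields.

```python
from datetime import datetime


def build_markdown_report(summary, model, compression_pct, when=None):
    """Render a summary in the preview's layout (hypothetical helper)."""
    when = when or datetime.now()
    lines = [
        "# Text Summary Report",
        f"- Date: {when:%Y-%m-%d %H:%M:%S}",
        f"- Model: {model}",
        f"- Compression: {compression_pct:.0f}%",
        "",
        "## Summary",
        summary.strip(),
    ]
    return "\n".join(lines) + "\n"


report = build_markdown_report(
    "AI is transforming technology through virtual assistants...",
    model="T5 Small",
    compression_pct=70,
    when=datetime(2025, 10, 28, 18, 0, 0),
)
print(report)
```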

### JSON Export Preview:
```json
{
  "metadata": {
    "timestamp": "2025-10-28T18:00:00",
    "model": "t5-small"
  },
  "content": {
    "summary": "..."
  }
}
```
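
On the consuming side, a script or API client can load an exported file and pull out the fields it needs. The sketch below assumes only the keys shown in the preview (`metadata.timestamp`, `metadata.model`, `content.summary`); `read_summary_export` is a hypothetical helper, and the real export may carry more keys.

```python
import json


def read_summary_export(json_text):
    """Parse an exported JSON document and return (model, timestamp, summary).

    Hypothetical helper: raises ValueError when the expected keys are missing.
    """
    data = json.loads(json_text)
    try:
        meta = data["metadata"]
        return meta["model"], meta["timestamp"], data["content"]["summary"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"unexpected export layout: {exc!r}") from exc


example = """
{
  "metadata": {"timestamp": "2025-10-28T18:00:00", "model": "t5-small"},
  "content": {"summary": "..."}
}
"""
model, timestamp, summary = read_summary_export(example)
print(model, timestamp, summary)
```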

### Audio Languages Supported:
- 🇺🇸 English
- 🇪🇸 Spanish  
- 🇫🇷 French
- 🇩🇪 German
- 🇮🇹 Italian
- 🇵🇹 Portuguese
- 🇮🇳 Hindi
- 🇨🇳 Chinese
- 🇯🇵 Japanese
- 🇰🇷 Korean
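
The languages above have to be mapped to language codes before they can be passed to gTTS. The codes below are an assumption based on common gTTS usage (in particular `zh-CN` for Chinese) and may not match the mapping this notebook uses; `gtts.lang.tts_langs()` returns the codes your installed version supports. `AUDIO_LANGUAGES` and `resolve_language` are hypothetical names.

```python
# Assumed gTTS codes for the ten languages listed above; verify them
# against gtts.lang.tts_langs() for your installed version.
AUDIO_LANGUAGES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
    "German": "de",
    "Italian": "it",
    "Portuguese": "pt",
    "Hindi": "hi",
    "Chinese": "zh-CN",
    "Japanese": "ja",
    "Korean": "ko",
}


def resolve_language(name, default="en"):
    """Return the code for a display name, falling back to `default`."""
    return AUDIO_LANGUAGES.get(name.strip().title(), default)


print(resolve_language("french"))

# Usage with gTTS (requires the gtts package and an internet connection):
#   from gtts import gTTS
#   gTTS(text=summary, lang=resolve_language("French"), slow=False).save("summary.mp3")
```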

---

## ✨ Tips

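1. **For Google Colab**: The public share link expires after 1 week, as the launch output reports; rerun the launch cell to get a new one
2. **For VS Code**: Open http://localhost:7860 in your browser (Gradio's default port; it uses the next free port if 7860 is taken)
3. **First run**: Models download automatically on first use and are cached for later runs
4. **Export files**: Check the `./exports/` directory (`/content/exports` in Colab)
5. **Audio files**: Text-to-speech export requires an internet connection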

## 🎯 Ready!

Your enhanced text summarization tool with export features is ready to use!