To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png?raw=true" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png?raw=true" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png?raw=true" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0
# === GOOGLE GEMINI ===
#!pip install "fenic[google]"
# === ANTHROPIC CLAUDE ===
#!pip install "fenic[anthropic]"
# === OPENAI (Default) ===
!pip install fenic

In [None]:
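import os
import getpass

# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
# OpenAI is active by default. To switch, comment it out and uncomment ONE other
# provider section below, plus the matching install line and model config.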

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")
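
The key set above has to match the model class enabled later in the session config. As a minimal sketch (the `detect_provider` helper and the `PROVIDER_KEYS` mapping are hypothetical and not part of fenic), a notebook could check which provider key is present before building the session:

```python
import os
from typing import Mapping, Optional

# Environment variable names taken from the setup cell above.
# The mapping itself is a hypothetical convenience, not a fenic API.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def detect_provider(env: Mapping[str, str]) -> Optional[str]:
    """Return the first provider whose API key is set and non-empty, else None."""
    for provider, var_name in PROVIDER_KEYS.items():
        if env.get(var_name):
            return provider
    return None


# Report which provider (if any) the current environment is configured for.
print(detect_provider(os.environ))
```

Passing the environment in as an argument keeps the helper easy to exercise with a plain dictionary.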

# 📄 Document Summarization

**Hook:** *"Turn lengthy documents into actionable insights instantly"*

Legal contracts, research papers, meeting transcripts: each one buries decisions and deadlines in long prose, and generic summarization tools return only a block of free text. This demo uses fenic to extract structured insights, key decisions, and action items from complex documents.

**What you'll see in this 2-minute demo:**
- 📚 **Mixed document types** - Contracts, reports, meeting notes
- 🧠 **Intelligent extraction** - Key points, decisions, action items
- 📊 **Structured insights** - Not just summaries, but actionable data
- ⚡ **Batch processing** - Multiple documents processed efficiently

Perfect for legal review, research analysis, and executive briefings.

In [None]:
import fenic as fc
from pydantic import BaseModel, Field
from typing import List, Literal

# ⚡ Configure for document processing
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="document_summarization_demo",
    semantic=fc.SemanticConfig(
        language_models={
            # Keep exactly one "summarizer" entry uncommented; it must match the API key set earlier.
            "summarizer": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
            # "summarizer": fc.GoogleDeveloperLanguageModel(model_name="gemini-2.5-flash-lite", rpm=1000, tpm=1_000_000),
            # "summarizer": fc.AnthropicLanguageModel(model_name="claude-3-5-sonnet-20241022", rpm=500, tpm=200_000)
        }
    )
))

print("✅ Document summarization session configured")

## 📄 Step 1: Document Insight Schema

Define the structured insights we want to extract from documents:

In [None]:
# 📊 Structured document insights schema
class DocumentInsight(BaseModel):
    document_type: Literal["contract", "meeting_notes", "research_paper", "proposal", "report", "legal_document"] = Field(description="Type of document")
    key_decisions: List[str] = Field(description="Important decisions or conclusions made")
    action_items: List[str] = Field(description="Tasks or next steps mentioned")
    risk_factors: List[str] = Field(description="Potential risks or concerns identified")
    stakeholders: List[str] = Field(description="People or organizations mentioned")
    urgency_level: Literal["high", "medium", "low", "none"] = Field(description="Priority level")
    executive_summary: str = Field(description="One-paragraph executive summary")

print("📊 Document Insight Schema:")
print("   • document_type: Automatic classification")
print("   • key_decisions: Important conclusions")
print("   • action_items: Next steps required")
print("   • risk_factors: Potential concerns")
print("   • stakeholders: Key people/organizations")
print("   • urgency_level: Priority assessment")
print("   • executive_summary: Concise overview")
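
The `Literal` annotations are what restrict `document_type` and `urgency_level` to a fixed vocabulary. Here is a minimal sketch of that behaviour in plain Pydantic. It assumes the Pydantic v2 API, `MiniInsight` is a hypothetical cut-down model used only for illustration, and it does not show how fenic applies the schema internally:

```python
from typing import List, Literal

from pydantic import BaseModel, Field, ValidationError


class MiniInsight(BaseModel):
    """Hypothetical two-field version of DocumentInsight, for illustration only."""

    urgency_level: Literal["high", "medium", "low", "none"] = Field(description="Priority level")
    action_items: List[str] = Field(description="Tasks or next steps mentioned")


# A well-formed payload validates and becomes a typed object.
ok = MiniInsight.model_validate(
    {"urgency_level": "high", "action_items": ["Finalize paperwork"]}
)

# A value outside the Literal set is rejected with a ValidationError.
try:
    MiniInsight.model_validate({"urgency_level": "urgent", "action_items": []})
    rejected = False
except ValidationError:
    rejected = True

print(ok.urgency_level, rejected)
```

Constraining categorical fields this way means downstream filters such as `urgency_level == "high"` can rely on exact string matches.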

## 📚 Step 2: Mixed Document Collection

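Four sample documents of different types (a license agreement, board meeting minutes, a market research report, and a project proposal), all processed with the same schema: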

In [None]:
# 📚 Documents with different content and structures
documents = session.create_dataframe([
    {
        "doc_id": "DOC001",
        "title": "Software License Agreement",
        "content": """This Software License Agreement ("Agreement") is entered into between TechCorp Inc. ("Licensor") and DataFlow Solutions ("Licensee") effective January 15, 2024. The Licensee agrees to pay $50,000 annually for enterprise software access. Key terms: (1) Software must not be redistributed, (2) Support included for first year only, (3) Automatic renewal unless 90-day notice provided, (4) Liability limited to license fees paid. Licensee acknowledges data processing compliance under GDPR. Both parties agree to binding arbitration in Delaware. This agreement supersedes all previous negotiations."""
    },
    {
        "doc_id": "DOC002",
        "title": "Q4 Board Meeting Minutes",
        "content": """Board Meeting - December 18, 2024. Present: CEO Sarah Kim, CFO Michael Chen, CTO Lisa Rodriguez, Board Members Johnson, Williams, Davis. Key decisions: (1) Approved $2M Series B funding round led by VenturePartners, (2) Authorized hiring of 15 engineers by March 2025, (3) Rejected acquisition offer from MegaCorp at $25M valuation. Action items: Sarah to finalize Series B paperwork by Jan 31, Michael to prepare Q1 budget incorporating new hires, Lisa to establish engineering hiring pipeline. Risk discussed: Competitive pressure from new entrants, potential talent shortage in AI space. Meeting adjourned 3:47 PM."""
    },
    {
        "doc_id": "DOC003",
        "title": "Market Research Report",
        "content": """AI-Powered Analytics Market Analysis 2024. Executive Summary: Global market projected to reach $47.3B by 2027, growing at 23.1% CAGR. Key findings: (1) Healthcare and financial services are top adoption verticals, (2) Data privacy concerns remain primary barrier, (3) Mid-market companies (100-1000 employees) represent fastest-growing segment. Competitive landscape dominated by established players (Microsoft, Google) but significant opportunity exists in specialized niches. Recommendations: Focus on compliance-first solutions, target mid-market with simplified onboarding, consider strategic partnerships with consulting firms. Methodology based on 247 enterprise surveys, 18 expert interviews."""
    },
    {
        "doc_id": "DOC004",
        "title": "Project Proposal: Mobile App Redesign",
        "content": """Proposal: Complete mobile application redesign to improve user engagement and reduce churn. Current app has 2.3/5 star rating, 67% user drop-off rate. Proposed solution: Modern UI/UX overhaul, performance optimization, new onboarding flow. Timeline: 4 months, Budget: $180,000 (includes design, development, testing). Team: 2 designers, 3 developers, 1 PM, 1 QA. Expected outcomes: Increase app rating to 4.2+, reduce churn to 35%, improve user session time by 40%. Risks: Resource allocation conflicts with Q2 feature releases, potential user confusion during transition. Approval needed by February 1 to meet summer launch window."""
    }
])

print("📚 Document Collection - Mixed types and complexity:")
documents.select("doc_id", "title").show()

## 🧠 Step 3: Intelligent Document Analysis

Run `fc.semantic.extract` over the `content` column to fill a `DocumentInsight` struct for each document, then select individual fields from it:

In [None]:
# 🧠 Extract structured insights from documents.
# .cache() marks the result for reuse so later cells do not repeat the LLM calls.
document_insights = documents.select(
    "doc_id",
    "title",
    fc.semantic.extract(
        "content",
        DocumentInsight,
        model_alias="summarizer"
    ).alias("insights")
).cache()

# Extract key fields for analysis
insight_summary = document_insights.select(
    "doc_id",
    "title",
    document_insights.insights.document_type.alias("type"),
    document_insights.insights.urgency_level.alias("urgency"),
    document_insights.insights.executive_summary.alias("summary")
)

print("🧠 INTELLIGENT DOCUMENT ANALYSIS:")
insight_summary.show()

## ⚡ Step 4: Action Item Extraction

List the action items, decisions, and risks extracted from each document, then count how many documents were flagged as high urgency:

In [None]:
# ⚡ Extract and analyze action items
action_analysis = document_insights.select(
    "doc_id",
    "title",
    document_insights.insights.action_items.alias("actions"),
    document_insights.insights.key_decisions.alias("decisions"),
    document_insights.insights.risk_factors.alias("risks")
)

print("⚡ ACTION ITEMS AND DECISIONS:")
action_analysis.show()

# Analyze document insights distribution
total_docs = documents.count()
high_urgency = document_insights.filter(
    document_insights.insights.urgency_level == "high"
).count()

print("\n📊 DOCUMENT INTELLIGENCE SUMMARY:")
print(f"   • Total documents processed: {total_docs}")
print(f"   • High urgency documents: {high_urgency}")
print("   • Automated extraction: decisions, actions, risks, stakeholders")

print("\n🎯 BUSINESS BENEFITS:")
print("   • Instant document triage by urgency and type")
print("   • Automated action item tracking across all documents")
print("   • Risk identification for proactive management")
print("   • Executive summaries for quick decision-making")
print("   • Stakeholder mapping for communication planning")
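
Once the extracted rows are available as plain Python dictionaries (the export step is not shown here), cross-document tracking needs only a few lines. The sketch below is an assumption-laden illustration: the rows are hand-written to mirror the shape of the `insights` struct and are not actual model output, and `flatten_actions` and `urgency_counts` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hand-written rows shaped like the `insights` struct; NOT actual model output.
rows: List[Dict] = [
    {
        "doc_id": "DOC002",
        "urgency_level": "high",
        "action_items": ["Finalize Series B paperwork", "Prepare Q1 budget"],
    },
    {
        "doc_id": "DOC004",
        "urgency_level": "medium",
        "action_items": ["Approve proposal by February 1"],
    },
    {"doc_id": "DOC003", "urgency_level": "low", "action_items": []},
]


def flatten_actions(rows: List[Dict]) -> List[Tuple[str, str]]:
    """Turn per-document action lists into one (doc_id, action) list."""
    return [(row["doc_id"], item) for row in rows for item in row["action_items"]]


def urgency_counts(rows: List[Dict]) -> Counter:
    """Count documents per urgency level."""
    return Counter(row["urgency_level"] for row in rows)


tracker = flatten_actions(rows)
counts = urgency_counts(rows)
print(len(tracker), counts["high"])
```

Keeping the document ID next to each action item preserves traceability back to the source document.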

In [None]:
session.stop()