To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png?raw=true" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png?raw=true" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png?raw=true" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
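# Colab setup: remove preinstalled packages and pin polars to avoid dependency conflicts with fenic
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0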
# === GOOGLE GEMINI ===
#!pip install fenic[google]
# === ANTHROPIC CLAUDE ===
#!pip install fenic[anthropic]
# === OPENAI (Default) ===
!pip install fenic

In [None]:
import os 
import getpass

# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
# Keep exactly ONE provider section uncommented (OpenAI is active by default),
# and use the matching "labeler" model in the session config below.

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")
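
The key you set here has to match the `"labeler"` model chosen in the session config. A minimal sketch of a helper that reports which provider key is present; `detect_provider` and `PROVIDER_KEYS` are hypothetical names for illustration, not part of fenic.

```python
import os
from typing import Mapping, Optional

# Environment variable checked for each provider, in the same order as the cell above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def detect_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider whose API key is set and non-empty, else None."""
    for provider, var in PROVIDER_KEYS.items():
        if env.get(var):
            return provider
    return None

print("Detected provider:", detect_provider())
```

If this prints `None`, no key was entered; if it prints a provider other than the one configured as `"labeler"`, the labeling step will not be able to authenticate.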

In [None]:
import fenic as fc
from pydantic import BaseModel, Field
from typing import List, Literal

# ⚡ Configure for smart labeling
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="smart_labeling_demo",
    semantic=fc.SemanticConfig(
        language_models={
            "labeler": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
            # "labeler": fc.GoogleDeveloperLanguageModel(model_name="gemini-2.5-flash-lite", rpm=1000, tpm=1_000_000),
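            # Anthropic takes separate input/output token limits instead of a single tpm:
            # "labeler": fc.AnthropicLanguageModel(model_name="claude-3-5-sonnet-20241022", rpm=500, input_tpm=200_000, output_tpm=200_000),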
        }
    )
))

print("✅ Smart data labeling session configured")
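
The `rpm` and `tpm` arguments act as requests-per-minute and tokens-per-minute limits, so they also bound how long a labeling run can take. A back-of-envelope sketch, assuming every request uses the same average number of tokens; `tokens_per_request` and `min_runtime_minutes` are hypothetical, and you would estimate the token count from your own prompts and outputs.

```python
def min_runtime_minutes(n_requests: int, rpm: int, tpm: int, tokens_per_request: int) -> float:
    """Lower bound on wall-clock minutes when both request and token limits apply."""
    by_requests = n_requests / rpm                     # time if only the request limit mattered
    by_tokens = n_requests * tokens_per_request / tpm  # time if only the token limit mattered
    return max(by_requests, by_tokens)                 # the tighter limit dominates

# 10,000 tickets with the OpenAI limits configured above, assuming ~600 tokens per request
print(min_runtime_minutes(10_000, rpm=500, tpm=200_000, tokens_per_request=600))  # → 30.0
```

Here the token limit (30 minutes) is tighter than the request limit (20 minutes); real runs also add per-request latency on top of this bound.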

## 🏷️ Step 1: Labeling Schema Definition

Define a multi-label schema that would normally require detailed annotation guidelines; each `Field` description tells the model what that label means:

In [None]:
# 🏷️ Complex multi-label schema for customer support tickets
class TicketLabel(BaseModel):
    category: Literal["billing", "technical", "account", "feature_request", "bug_report", "general"] = Field(
        description="Primary category of the support ticket"
    )
    priority: Literal["low", "medium", "high", "critical"] = Field(
        description="Business priority based on impact and urgency"
    )
    sentiment: Literal["positive", "neutral", "frustrated", "angry"] = Field(
        description="Customer's emotional tone in the message"
    )
    requires_escalation: bool = Field(
        description="Whether this ticket needs senior support or management attention"
    )
    estimated_effort: Literal["quick", "moderate", "complex"] = Field(
        description="Expected time/effort to resolve this issue"
    )
    tags: List[str] = Field(
        description="Relevant tags for filtering and routing (e.g., 'refund', 'login_issue', 'mobile_app')"
    )
    reasoning: str = Field(
        description="Brief explanation of why these labels were chosen"
    )

print("🏷️ Comprehensive Labeling Schema:")
print("   • category: Primary ticket classification")
print("   • priority: Business impact assessment")
print("   • sentiment: Customer emotional state")
print("   • requires_escalation: Management attention flag")
print("   • estimated_effort: Resource planning")
print("   • tags: Searchable categorization")
print("   • reasoning: AI explanation for transparency")
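
The `Literal` fields turn each label into a closed set of allowed values, which is what keeps labels comparable across tickets. A stdlib-only sketch of the same constraints, useful for auditing labels after they leave Python (for example, after exporting them to JSON); `ALLOWED` and `validate_label` are hypothetical names, and the sets are copied by hand from `TicketLabel`.

```python
from typing import Literal, get_args

# Mirror the Literal types declared in TicketLabel.
Category = Literal["billing", "technical", "account", "feature_request", "bug_report", "general"]
Priority = Literal["low", "medium", "high", "critical"]
Sentiment = Literal["positive", "neutral", "frustrated", "angry"]
Effort = Literal["quick", "moderate", "complex"]

ALLOWED = {
    "category": set(get_args(Category)),
    "priority": set(get_args(Priority)),
    "sentiment": set(get_args(Sentiment)),
    "estimated_effort": set(get_args(Effort)),
}

def validate_label(label: dict) -> list:
    """Return human-readable problems; an empty list means the label fits the schema."""
    problems = []
    for field, allowed in ALLOWED.items():
        value = label.get(field)
        # Check the type first so unhashable values (e.g. lists) don't break the set lookup.
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{field}={value!r} not in {sorted(allowed)}")
    if not isinstance(label.get("requires_escalation"), bool):
        problems.append("requires_escalation must be a bool")
    tags = label.get("tags")
    if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
        problems.append("tags must be a list of strings")
    if not isinstance(label.get("reasoning"), str):
        problems.append("reasoning must be a string")
    return problems

# Made-up label with an out-of-schema priority value
example = {"category": "billing", "priority": "urgent", "sentiment": "angry",
           "requires_escalation": True, "estimated_effort": "moderate",
           "tags": ["refund"], "reasoning": "Charged but subscription not activated."}
print(validate_label(example))
```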

## 📧 Step 2: Unlabeled Customer Support Tickets

Sample customer messages that need consistent, accurate labeling for training or analysis:

In [None]:
# 📧 Unlabeled customer support tickets with varying complexity
unlabeled_tickets = session.create_dataframe([
    {
        "ticket_id": "T001",
        "subject": "URGENT: Payment failed but money was charged!",
        "message": "This is absolutely ridiculous! Your system charged my card $99 but my subscription still shows as expired. I've been trying to access my account for 3 hours and nothing works. I demand a refund immediately or I'm disputing this with my bank!"
    },
    {
        "ticket_id": "T002", 
        "subject": "Mobile app suggestion",
        "message": "Hey team! Love the new dashboard updates. Would it be possible to add a dark mode option to the mobile app? I do a lot of work in the evenings and it would be easier on the eyes. Thanks for considering!"
    },
    {
        "ticket_id": "T003",
        "subject": "API returning 500 errors",
        "message": "Our production system is getting intermittent 500 errors from your API endpoint /v1/data/export. This started around 2 PM EST today. Error rate is about 15% of requests. Can you please investigate ASAP as this is affecting our customers?"
    },
    {
        "ticket_id": "T004",
        "subject": "Cannot reset password", 
        "message": "Hi, I'm trying to reset my password but I'm not receiving the reset email. I've checked spam folder and tried multiple times. My email is john.smith@company.com. Could you help me regain access to my account?"
    },
    {
        "ticket_id": "T005",
        "subject": "Data export is corrupted",
        "message": "The CSV export I downloaded yesterday has garbled characters in the description field. Looks like there might be an encoding issue with special characters. This is blocking our monthly reporting. Can someone look into this?"
    },
    {
        "ticket_id": "T006",
        "subject": "Thank you for the quick support!",
        "message": "Just wanted to say thanks to Sarah from your support team who helped me yesterday. She was incredibly patient and walked me through the integration process step by step. Great customer service!"
    }
])

print("📧 Unlabeled Customer Support Tickets:")
unlabeled_tickets.select("ticket_id", "subject").show()
print("\n💡 Notice: Mix of billing issues, feature requests, technical problems, and positive feedback!")
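
To label your own tickets instead of these samples, build the same list of row dicts from a file. A minimal sketch using the stdlib `csv` module, assuming a CSV with `ticket_id`, `subject`, and `message` columns; the column layout, the file name in the comment, and the `load_tickets` helper are hypothetical. The resulting list has the same shape as the inline rows passed to `session.create_dataframe` above.

```python
import csv
import io
from typing import Dict, Iterable, List

REQUIRED_COLUMNS = ("ticket_id", "subject", "message")

def load_tickets(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text into row dicts, keeping only the columns the labeling step uses."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    # Skip rows with an empty message; there is nothing for the model to label.
    return [
        {c: row[c].strip() for c in REQUIRED_COLUMNS}
        for row in reader
        if row["message"] and row["message"].strip()
    ]

sample = io.StringIO(
    "ticket_id,subject,message\n"
    'T101,Refund request,"Please refund my last invoice, it was a duplicate."\n'
    "T102,Empty,\n"
)
rows = load_tickets(sample)
print(rows)
# With a real file: open("tickets.csv", newline="", encoding="utf-8") instead of StringIO,
# then pass rows to session.create_dataframe(rows).
```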

## 🤖 Step 3: AI-Powered Intelligent Labeling

Use `fc.semantic.extract` to fill in the `TicketLabel` schema for every message, including a short reasoning string per ticket:

In [None]:
# 🤖 AI labeling with reasoning
labeled_data = unlabeled_tickets.select(
    "ticket_id",
    "subject",
    fc.semantic.extract(
        "message",
        TicketLabel,
        model_alias="labeler"
    ).alias("labels")
).cache()

# Flatten the nested `labels` struct into one column per field
labeled_results = labeled_data.select(
    "ticket_id",
    "subject",
    labeled_data.labels.category.alias("category"),
    labeled_data.labels.priority.alias("priority"),
    labeled_data.labels.sentiment.alias("sentiment"),
    labeled_data.labels.requires_escalation.alias("escalation"),
    labeled_data.labels.estimated_effort.alias("effort"),
    labeled_data.labels.reasoning.alias("ai_reasoning")
)

print("🤖 AI-GENERATED LABELS WITH REASONING:")
print("=" * 60)
labeled_results.show()

# Show AI reasoning for transparency
reasoning_data = labeled_data.select(
    "ticket_id",
    labeled_data.labels.reasoning.alias("reasoning")
)

print("\n🧠 AI REASONING (Why these labels were chosen):")
reasoning_data.show()
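
The reasoning column helps reviewers, but LLM labels can still contradict each other. A minimal sketch of rule-based spot checks on labeled rows, assuming the rows have been collected into a list of dicts with the column names used in `labeled_results` (for example with fenic's `to_pylist()`, if your version provides it). The rules and the `flag_inconsistent` helper are hypothetical; adapt them to your own triage policy.

```python
from typing import Dict, List

def flag_inconsistent(rows: List[Dict]) -> Dict[str, List[str]]:
    """Map ticket_id -> list of rule violations for rows whose labels look contradictory."""
    flagged = {}
    for row in rows:
        issues = []
        # Hypothetical rule: critical tickets should always be escalated.
        if row.get("priority") == "critical" and not row.get("escalation"):
            issues.append("critical priority without escalation")
        # Hypothetical rule: positive feedback rarely needs escalation.
        if row.get("sentiment") == "positive" and row.get("escalation"):
            issues.append("positive sentiment but escalated")
        # Every label should come with a non-empty explanation.
        if not (row.get("ai_reasoning") or "").strip():
            issues.append("missing reasoning")
        if issues:
            flagged[row["ticket_id"]] = issues
    return flagged

# Made-up rows for illustration, not output from the run above
rows = [
    {"ticket_id": "A", "priority": "critical", "sentiment": "angry",
     "escalation": False, "effort": "moderate", "ai_reasoning": "Production outage."},
    {"ticket_id": "B", "priority": "low", "sentiment": "positive",
     "escalation": False, "effort": "quick", "ai_reasoning": "Thank-you note."},
]
print(flag_inconsistent(rows))  # → {'A': ['critical priority without escalation']}
```

Flagged tickets are candidates for human review, which keeps manual effort focused on the labels most likely to be wrong.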

## 📊 Step 4: Labeling Quality & Business Impact

Summarize how the tickets were labeled and compare rough costs against manual annotation:

In [None]:
# 📊 Analyze labeling results and business impact
total_tickets = unlabeled_tickets.count()
high_priority = labeled_results.filter(fc.col("priority").is_in(["high", "critical"])).count()
needs_escalation = labeled_results.filter(fc.col("escalation")).count()
angry_customers = labeled_results.filter(fc.col("sentiment") == "angry").count()
technical_issues = labeled_results.filter(fc.col("category") == "technical").count()

print("📊 INTELLIGENT LABELING ANALYTICS:")
print(f"   • Total tickets labeled: {total_tickets}")
print(f"   • High/Critical priority: {high_priority} ({high_priority/total_tickets*100:.0f}%)")
print(f"   • Requires escalation: {needs_escalation} ({needs_escalation/total_tickets*100:.0f}%)")
print(f"   • Angry customers: {angry_customers} (immediate attention needed)")
print(f"   • Technical issues: {technical_issues} (route to engineering)")

# Cost comparison with manual labeling (illustrative estimates, not measurements)
print("\n💰 COST COMPARISON (10,000 tickets, illustrative estimates):")
print("   Manual labeling team:")
print("   • 3 annotators × 8 weeks × 40 h/week × $25/hour = $24,000")
print("   • Plus inter-annotator inconsistency, training time, and quality control")
print()
print("   AI labeling with Fenic:")
print("   • API costs: ~$500")
print("   • Schema-constrained labels with explainable reasoning for every ticket")
print("   • ~98% cost reduction; 10,000 requests at rpm=500 take roughly 20+ minutes vs. 8 weeks")

print("\n🎯 BUSINESS BENEFITS:")
print("   • Automatic ticket routing based on AI labels")
print("   • Instant priority detection for urgent issues")
print("   • Sentiment analysis for customer satisfaction")
print("   • Consistent labeling across all support channels")
print("   • Scales to large ticket volumes, bounded mainly by provider rate limits")
print("   • Transparent AI reasoning for quality assurance")

# Show tag analysis
tag_data = labeled_data.select(
    "ticket_id", 
    labeled_data.labels.tags.alias("tags")
)
print("\n🏷️ AI-GENERATED TAGS FOR SEARCHABILITY:")
tag_data.show()
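
The cost comparison is plain arithmetic, so parameterizing it makes each assumption visible (hours per week, hourly rate, API spend) and lets you plug in your own numbers. A minimal sketch, assuming a 40-hour work week and treating the ~$500 API figure as an input estimate rather than a measured cost; `labeling_cost_comparison` is a hypothetical helper.

```python
def labeling_cost_comparison(
    annotators: int,
    weeks: float,
    hours_per_week: float,
    hourly_rate: float,
    api_cost: float,
) -> dict:
    """Compare manual annotation cost with an estimated LLM API cost."""
    manual_hours = annotators * weeks * hours_per_week
    manual_cost = manual_hours * hourly_rate
    savings = 1 - api_cost / manual_cost  # fraction of the manual cost avoided
    return {
        "manual_hours": manual_hours,
        "manual_cost": manual_cost,
        "savings_pct": round(savings * 100, 1),
    }

result = labeling_cost_comparison(annotators=3, weeks=8, hours_per_week=40,
                                  hourly_rate=25, api_cost=500)
print(result)  # → {'manual_hours': 960, 'manual_cost': 24000, 'savings_pct': 97.9}
```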

In [None]:
session.stop()