To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0
# === GOOGLE GEMINI ===
#!pip install fenic[google]
# === ANTHROPIC CLAUDE ===
#!pip install fenic[anthropic]
# === OPENAI (Default) ===
!pip install fenic

In [None]:
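import os
import getpass

# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
# OpenAI is enabled by default. To switch, comment out the OpenAI line, uncomment ONE
# of the other provider sections below, and enable the matching model in the session config.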

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")

# 🔗 SEO Content Clustering

**Hook:** *"Group related keywords by semantic meaning, not just string similarity"*

Traditional keyword research groups keywords by exact or near-exact string matches, so "machine learning" and "ML algorithms" end up in separate clusters. Semantic clustering recognizes that these phrases refer to the same concept. This makes it useful for content strategy, building topic authority, and search intent analysis.
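To see why string matching falls short, here is a minimal sketch that scores keyword pairs from this demo by token-level Jaccard similarity (shared words divided by total distinct words). Jaccard is an assumed stand-in for "string similarity", chosen for illustration; it is not part of fenic.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity: |A & B| / |A | B| over lowercase words."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Pairs from the demo data that share search intent but not wording.
pairs = [
    ("machine learning algorithms", "ML models explained"),
    ("artificial intelligence basics", "AI fundamentals"),
    ("data science career", "data scientist jobs"),
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: {jaccard(left, right):.2f}")
```

The first two pairs share no words at all, and the third shares only "data" (a score of 0.20), so any lexical threshold strict enough to be useful would split all three. A model that reads the keywords can group them by meaning instead.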

**What you'll see in this 2-minute demo:**
- 📊 **Mixed keyword data** - Search terms with different phrasings but similar intent
- 🧠 **Semantic clustering** - Group by meaning with `semantic.classify`, then summarize each cluster with `semantic.reduce`
- 📈 **Search metrics** - Aggregate volume and difficulty across semantic clusters
- 🎯 **Content strategy** - One content topic per cluster instead of one per keyword

This transforms keyword chaos into actionable content pillars.

In [None]:
import fenic as fc
from fenic.core.types.classify import ClassDefinition

# ⚡ Configure for content analysis
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="seo_clustering_demo",
    semantic=fc.SemanticConfig(
        language_models={
            "content": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
            # "content": fc.GoogleDeveloperLanguageModel(model_name="gemini-2.5-flash-lite", rpm=1000, tpm=1_000_000),
            # "content": fc.AnthropicLanguageModel(model_name="claude-3-5-sonnet-20241022", rpm=500, tpm=200_000)
        }
    )
))

print("✅ SEO content clustering session configured")
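The API key set in the setup cell and the model enabled under `language_models` must belong to the same provider. The sketch below is a hypothetical helper (not part of fenic) that reports which provider keys are present, using the environment variable names from the setup cell.

```python
import os

# Environment variables from the setup cell, mapped to provider names.
PROVIDER_ENV_VARS = {
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "google",
    "ANTHROPIC_API_KEY": "anthropic",
}


def configured_providers(environ=None) -> list:
    """Hypothetical helper: return providers whose API key is set to a non-empty value."""
    env = os.environ if environ is None else environ
    return [provider for var, provider in PROVIDER_ENV_VARS.items() if env.get(var)]


providers = configured_providers()
if len(providers) != 1:
    print(f"⚠️ Expected exactly one provider key, found: {providers or 'none'}")
else:
    print(f"Key found for {providers[0]}; enable the matching entry under language_models")
```

Passing a plain dict as `environ` lets you check the logic without touching the real environment.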

## 📊 Step 1: Mixed Keyword Research Data

Real SEO keywords with different phrasings but related search intent:

In [None]:
# 📊 SEO keyword data with semantic overlap
keywords = session.create_dataframe([
    {"keyword": "machine learning algorithms", "search_volume": 12000, "difficulty": 78},
    {"keyword": "ML models explained", "search_volume": 8500, "difficulty": 65},
    {"keyword": "artificial intelligence basics", "search_volume": 15000, "difficulty": 72},
    {"keyword": "AI fundamentals", "search_volume": 9200, "difficulty": 68},
    {"keyword": "deep learning tutorial", "search_volume": 11500, "difficulty": 82},
    {"keyword": "neural networks guide", "search_volume": 7800, "difficulty": 75},
    {"keyword": "data science career", "search_volume": 18500, "difficulty": 71},
    {"keyword": "data scientist jobs", "search_volume": 14200, "difficulty": 69},
    {"keyword": "data analytics roles", "search_volume": 6700, "difficulty": 58}
])

print("📊 SEO Keywords - Notice the semantic overlap:")
keywords.show()

## 🧠 Step 2: Semantic Clustering

Group keywords by semantic meaning instead of exact string matches:

In [None]:
# 🔗 First, classify keywords into semantic clusters with descriptive definitions
classified_keywords = keywords.select(
    "*",
    fc.semantic.classify(
        "keyword",
        [
            ClassDefinition(label="machine_learning", description="Keywords related to machine learning algorithms, models, and general ML concepts including artificial intelligence basics"),
            ClassDefinition(label="deep_learning", description="Keywords specifically about deep learning, neural networks, and advanced AI techniques"),
            ClassDefinition(label="data_science_careers", description="Keywords focused on data science jobs, career paths, roles, and professional opportunities in analytics")
        ],
        model_alias="content"
    ).alias("cluster_label")
).cache()

print("🏷️ KEYWORD CLASSIFICATION:")
classified_keywords.show()

# 🔗 Then group by cluster and use semantic.reduce for insights
clusters = classified_keywords.group_by(
    "cluster_label"
).agg(
    fc.count("*").alias("keyword_count"),
    fc.sum("search_volume").alias("total_volume"),
    fc.avg("difficulty").alias("avg_difficulty"),
    fc.collect_list("keyword").alias("keywords_in_cluster"),
    fc.semantic.reduce(
        "Analyze these related keywords and provide a concise summary of the search intent theme and content strategy recommendations.",
        column=fc.col("keyword"),
        model_alias="content"
    ).alias("theme_summary")
).cache()  # Cache clustering results

print("\n🔗 SEMANTIC KEYWORD CLUSTERS:")
clusters.show()
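Apart from `semantic.reduce`, the `group_by(...).agg(...)` step computes ordinary per-group aggregates. The plain-Python sketch below mirrors `count`, `sum`, `avg`, and `collect_list` on the demo data so the cluster table can be checked by hand. The cluster labels are hypothetical assignments chosen for illustration; the real labels come from the LLM and may differ. `semantic.reduce` is left out because it needs a model call.

```python
from collections import defaultdict

# (keyword, search_volume, difficulty, cluster_label) tuples from the demo data.
# The cluster labels are HYPOTHETICAL stand-ins for semantic.classify output.
rows = [
    ("machine learning algorithms", 12000, 78, "machine_learning"),
    ("ML models explained", 8500, 65, "machine_learning"),
    ("artificial intelligence basics", 15000, 72, "machine_learning"),
    ("AI fundamentals", 9200, 68, "machine_learning"),
    ("deep learning tutorial", 11500, 82, "deep_learning"),
    ("neural networks guide", 7800, 75, "deep_learning"),
    ("data science career", 18500, 71, "data_science_careers"),
    ("data scientist jobs", 14200, 69, "data_science_careers"),
    ("data analytics roles", 6700, 58, "data_science_careers"),
]

# Bucket rows by label, like group_by("cluster_label").
groups = defaultdict(list)
for keyword, volume, difficulty, label in rows:
    groups[label].append((keyword, volume, difficulty))

# Per-group aggregates mirroring count("*"), sum, avg, and collect_list.
cluster_stats = {
    label: {
        "keyword_count": len(members),
        "total_volume": sum(volume for _, volume, _ in members),
        "avg_difficulty": sum(difficulty for _, _, difficulty in members) / len(members),
        "keywords_in_cluster": [keyword for keyword, _, _ in members],
    }
    for label, members in groups.items()
}

# Highest total search volume first, like order_by(fc.desc("total_volume")).
for label, stats in sorted(cluster_stats.items(), key=lambda kv: kv[1]["total_volume"], reverse=True):
    print(f"{label:<22} count={stats['keyword_count']}  "
          f"volume={stats['total_volume']}  avg_difficulty={stats['avg_difficulty']:.2f}")
```

Under these assumed labels, `machine_learning` leads with 44,700 total searches across four keywords. The result variable is named `cluster_stats` so it does not overwrite the notebook's `clusters` DataFrame if run in the same session.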

## 📈 Step 3: Content Strategy Insights

Analyze cluster metrics to prioritize content creation:

In [None]:
# 📈 Content strategy analysis
total_keywords = keywords.count()
cluster_count = clusters.count()

print("📈 CONTENT STRATEGY INSIGHTS:")
print(f"   • {total_keywords} keywords grouped into {cluster_count} semantic clusters")
print(f"   • Reduced content topics by {((total_keywords - cluster_count) / total_keywords * 100):.0f}%")

# Show cluster performance metrics with theme summaries
strategy_insights = clusters.select(
    "cluster_label",
    "keyword_count", 
    "total_volume",
    "avg_difficulty",
    "theme_summary"
).order_by(fc.desc("total_volume"))

print("\n🎯 TOP CONTENT OPPORTUNITIES:")
strategy_insights.show()

print("\n💡 STRATEGIC BENEFITS:")
print("   • Topic authority: Target clusters instead of individual keywords")
print("   • Content efficiency: One article covers multiple related searches") 
print("   • Better user experience: Comprehensive topic coverage")
print("   • AI-powered insights: Theme summaries guide content creation")
print("   • Data-driven prioritization: Focus on high-volume, low-difficulty clusters")
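The last bullet recommends focusing on high-volume, low-difficulty clusters, but `strategy_insights` is ordered by volume alone. One simple way to combine both signals, assumed here for illustration and not part of the original demo, is an opportunity score: total volume divided by average difficulty. The input numbers are computed from the demo keywords under a hypothetical cluster assignment, since the real labels depend on the LLM.

```python
# Hypothetical cluster aggregates: label -> (total_volume, avg_difficulty).
# Computed from the demo keywords under an ASSUMED label assignment.
cluster_metrics = {
    "machine_learning": (44700, 70.75),
    "data_science_careers": (39400, 66.0),
    "deep_learning": (19300, 78.5),
}


def opportunity_score(total_volume: int, avg_difficulty: float) -> float:
    """Assumed heuristic: monthly searches per point of difficulty (higher is better)."""
    return total_volume / max(avg_difficulty, 1.0)  # floor difficulty at 1 to avoid division by zero


ranked = sorted(cluster_metrics.items(), key=lambda kv: opportunity_score(*kv[1]), reverse=True)

for label, (volume, difficulty) in ranked:
    score = opportunity_score(volume, difficulty)
    print(f"{label:<22} volume={volume:>6}  difficulty={difficulty:5.2f}  score={score:7.1f}")
```

Dividing by difficulty treats every difficulty point as an equal cost, which is a simplification. A weighted score or a hard difficulty cutoff are alternatives if one cluster's volume dominates the ranking.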

In [None]:
session.stop()