To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png?raw=true" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png?raw=true" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png?raw=true" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0
# === GOOGLE GEMINI ===
#!pip install fenic[google]
# === ANTHROPIC CLAUDE ===
#!pip install fenic[anthropic]
# === OPENAI (Default) ===
!pip install fenic

In [None]:
import os
import getpass

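# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
# OpenAI is active by default. To switch providers, comment out the OpenAI line
# and uncomment ONE other section. Install the matching extra in the pip cell above
# and enable the matching model in the session config below.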

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")

# 🔗 Semantic Joins

**Hook:** *"What if SQL joins understood language instead of exact strings?"*

Traditional joins require exact string matches, so "NYC" and "New York City" never join and the row drops out of the result. Semantic joins use a language model to recognize that both names refer to the same place. This matters for messy real-world data, where the same entity appears in different formats across systems.
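
To make the baseline concrete, here is a minimal plain-Python sketch of an exact-match join over the location strings used later in this demo. It involves no LLM and no Fenic calls; the variable names are illustrative only.

```python
# Exact-match (equi-join) baseline in plain Python, no LLM involved.
# The strings are copied from the company and weather tables in this demo.
company_locations = ["SF Bay Area", "NYC", "Seattle", "The Big Apple", "LA"]
weather_cities = {"San Francisco", "New York City", "Seattle", "Los Angeles"}

# An equi-join keeps a row only when the two strings are identical.
exact_matches = [loc for loc in company_locations if loc in weather_cities]
match_rate = 100 * len(exact_matches) / len(company_locations)

print(exact_matches)          # ['Seattle']
print(f"{match_rate:.0f}%")   # 20%
```

Only "Seattle" is spelled the same way on both sides, so four of the five companies would be lost in an exact-match inner join.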

**What you'll see in this 2-minute demo:**
- 🏢 **Mismatched company locations** - "SF Bay Area", "NYC", "The Big Apple"
- 🌡️ **Official weather city names** - "San Francisco", "New York City"
- 🔗 **Magic matching** - Watch "NYC" connect to "New York City" automatically
- 📊 **100% join success** in this demo vs 20% with traditional exact matching (only "Seattle" matches exactly)

No data cleaning, no mapping tables, no ETL - just semantic understanding.

In [None]:
import fenic as fc

# ⚡ Session setup for semantic operations (keep exactly one "smart" entry active)
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="semantic_joins_demo",
    semantic=fc.SemanticConfig(
        language_models={
            "smart": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
            # "smart": fc.GoogleDeveloperLanguageModel(model_name="gemini-2.5-flash-lite", rpm=1000, tpm=1_000_000),
            # "smart": fc.AnthropicLanguageModel(model_name="claude-3-5-sonnet-20241022", rpm=500, tpm=200_000)
        }
    )
))

print("✅ Fenic session configured for semantic join operations")

## 🏢 Step 1: Company Data (The Messy Reality)

Real business databases are full of inconsistent location names. Here's what you actually find:

In [None]:
# 🏢 Real company data with messy location names (exactly what you find in business databases)
companies = session.create_dataframe([
    {"company": "TechCorp", "location": "SF Bay Area", "employees": 1200, "revenue": "$50M"},
    {"company": "DataFlow Inc", "location": "NYC", "employees": 800, "revenue": "$30M"},
    {"company": "CloudWorks", "location": "Seattle", "employees": 650, "revenue": "$25M"},
    {"company": "AI Dynamics", "location": "The Big Apple", "employees": 450, "revenue": "$18M"},
    {"company": "MediaTech", "location": "LA", "employees": 320, "revenue": "$12M"}
])

print("🏢 Company Locations (Messy Real-World Data):")
companies.show()

## 🌡️ Step 2: Weather Data (The Clean Standard)

External APIs like weather services use official, standardized city names:

In [None]:
# 🌡️ Official weather data with standardized city names (from weather APIs)
weather = session.create_dataframe([
    {"city": "San Francisco", "temp_f": 68, "condition": "Foggy", "air_quality": "Good"},
    {"city": "New York City", "temp_f": 72, "condition": "Sunny", "air_quality": "Moderate"},
    {"city": "Seattle", "temp_f": 65, "condition": "Cloudy", "air_quality": "Good"},
    {"city": "Los Angeles", "temp_f": 78, "condition": "Sunny", "air_quality": "Unhealthy"}
])

print("🌡️ Weather Data (Clean Official Names):")
weather.show()

print("\n❌ THE PROBLEM: Traditional SQL JOIN would match only 1 of 5 companies (Seattle)!")
print("   • 'SF Bay Area' ≠ 'San Francisco'")
print("   • 'NYC' ≠ 'New York City'")
print("   • 'The Big Apple' ≠ 'New York City'")
print("   • 'LA' ≠ 'Los Angeles'")

## ✨ Step 3: The Semantic Join Magic

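This is where language understanding meets data joining! Instead of an equality condition, `semantic.join` takes a natural-language predicate with `{{left_on}}` and `{{right_on}}` placeholders, and the configured model decides which row pairs satisfy it.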
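
One way to picture a predicate-based join is as a loop over candidate row pairs, where each pair is turned into a statement and judged true or false. The sketch below is a conceptual illustration only and makes no claim about how Fenic implements `semantic.join`. The helpers `render`, `fake_judge`, and `semantic_join_sketch` are hypothetical, the predicate is shortened, and a hand-written alias table stands in for the language model so the code runs without an API key.

```python
from itertools import product

# Shortened version of the demo predicate, with the same two placeholders.
PREDICATE = "The location '{{left_on}}' refers to the same city as '{{right_on}}'."

def render(template: str, left_on: str, right_on: str) -> str:
    """Fill the two placeholders; a simple stand-in for real template rendering."""
    return template.replace("{{left_on}}", left_on).replace("{{right_on}}", right_on)

# Hypothetical stand-in for the language model: a hand-written alias table.
KNOWN_ALIASES = {
    ("SF Bay Area", "San Francisco"),
    ("NYC", "New York City"),
    ("The Big Apple", "New York City"),
    ("LA", "Los Angeles"),
}

def fake_judge(left: str, right: str) -> bool:
    """Pretend model verdict: True when the pair names the same place."""
    return left == right or (left, right) in KNOWN_ALIASES

def semantic_join_sketch(left_values, right_values, judge):
    """Check every (left, right) pair and keep the ones the judge accepts."""
    matches = []
    for left, right in product(left_values, right_values):
        statement = render(PREDICATE, left, right)  # text a model would be asked to judge
        if judge(left, right):
            matches.append((left, right, statement))
    return matches

locations = ["SF Bay Area", "NYC", "Seattle", "The Big Apple", "LA"]
cities = ["San Francisco", "New York City", "Seattle", "Los Angeles"]

for left, right, _ in semantic_join_sketch(locations, cities, fake_judge):
    print(f"{left!r} -> {right!r}")
```

In this sketch all 5 × 4 = 20 pairs are checked, which is why caching the joined result, as the cell below does with `.cache()`, is worthwhile when each check is a model call.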

In [None]:
# ✨ This is where the magic happens - language understanding meets data joining
semantic_result = companies.semantic.join(
    weather,
    predicate=(
        "The location '{{left_on}}' refers to the same metropolitan area or city, or is geographically nearby to '{{right_on}}'."
    ),
    left_on=fc.col("location"),      # Messy company locations
    right_on=fc.col("city"),         # Clean weather city names
    model_alias="smart"
).cache()  # Cache results to avoid re-running LLM calls

# Show the incredible results
joined_data = semantic_result.select(
    "company",
    "location",              # Original messy name
    "city",                  # Matched clean name  
    "temp_f",
    "condition",
    "air_quality"
)

print("🎯 SEMANTIC JOIN RESULTS: Language Understanding in Action!")
print("=" * 75)
joined_data.show()

## 📊 Step 4: Business Impact Analysis

Let's quantify the breakthrough results and business impact:
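
The match rate is simple arithmetic, but it helps to count distinct companies rather than joined rows, since a loosely worded predicate could in principle match one location to more than one city. The plain-Python sketch below uses hypothetical joined rows (not real model output), including one imagined extra match, to show how the two ways of counting can diverge.

```python
# Hypothetical joined rows as (company, matched city); not real model output.
joined_rows = [
    ("TechCorp", "San Francisco"),
    ("DataFlow Inc", "New York City"),
    ("CloudWorks", "Seattle"),
    ("AI Dynamics", "New York City"),
    ("MediaTech", "Los Angeles"),
    ("MediaTech", "San Francisco"),  # imagined extra match from a loose predicate
]
all_companies = ["TechCorp", "DataFlow Inc", "CloudWorks", "AI Dynamics", "MediaTech"]

# Counting rows can overshoot when one company matches several cities.
row_based_rate = 100 * len(joined_rows) / len(all_companies)

# Counting distinct companies caps the rate at 100%.
matched_companies = {company for company, _ in joined_rows}
company_based_rate = 100 * len(matched_companies) / len(all_companies)

print(f"row-based: {row_based_rate:.0f}%")          # row-based: 120%
print(f"company-based: {company_based_rate:.0f}%")  # company-based: 100%
```

When every company matches exactly one city, both numbers agree, which is the case the demo cell assumes.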

In [None]:
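# Calculate the success rate.
# Note: this counts joined rows, which equals the number of matched companies
# only when each company matches at most one city.
successful_matches = semantic_result.filter(fc.col("city").is_not_null()).count()
total_companies = companies.count()
success_rate = (successful_matches / total_companies) * 100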

print("📊 SEMANTIC JOIN SUCCESS METRICS:")
print(f"   • Successful matches: {successful_matches}/{total_companies} ({success_rate:.0f}%)")
# Expected pairings for this data set (static text, not derived from the result)
print("   • 'SF Bay Area' → 'San Francisco' ✅")
print("   • 'NYC' → 'New York City' ✅")
print("   • 'The Big Apple' → 'New York City' ✅")
print("   • 'LA' → 'Los Angeles' ✅")

print("\n💡 GAME-CHANGING RESULTS:")
print("   Traditional exact matching: 20% success rate (only 'Seattle' matches)")
print(f"   Fenic semantic matching: {success_rate:.0f}% success rate")
print("   Hours of ETL work → ONE line of code")

# Business impact analysis - simplified
companies_with_weather = semantic_result.filter(fc.col("air_quality").is_not_null()).count()

print("\n🏢 BUSINESS INTELLIGENCE UNLOCKED:")
print(f"   • {companies_with_weather} companies now have weather context")
print("   • Can analyze productivity vs weather patterns")
print("   • Enable location-based business decisions")

In [None]:
session.stop()