To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0
# === GOOGLE GEMINI ===
#!pip install fenic[google]
# === ANTHROPIC CLAUDE ===
#!pip install fenic[anthropic]
# === OPENAI (Default) ===
!pip install fenic

In [None]:
import os 
import getpass

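# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
# OpenAI is enabled by default. To switch, comment out the OpenAI line and
# uncomment ONE of the other provider sections below (plus the matching
# install line and model line in the other cells):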

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")

# 🔗 Entity Resolution

**Hook:** *"Find the same company across different data sources, despite variations"*

"Apple Inc.", "Apple Computer", and "AAPL" all refer to the same company, but traditional matching fails. Entity resolution uses AI to understand that different representations refer to the same real-world entity, enabling data consolidation and deduplication at scale.

**What you'll see in this 2-minute demo:**
- 🏢 **Multi-source data** - Company names from different databases
- 🔍 **Identity matching** - AI recognizes same entities despite variations
- 📊 **Match statistics** - Matched-pair counts and a match rate across sources
- 🎯 **Data consolidation** - Unified entity profiles

Perfect for data integration, customer 360, and master data management.
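
To make the "traditional matching fails" point concrete, here is a minimal plain-Python sketch using the three names from the introduction. The `normalize` helper and its suffix list are illustrative assumptions, not part of Fenic; the point is only that exact comparison, and even simple rule-based cleanup, still treat the three names as different companies.

```python
# Plain-Python illustration; no LLM or Fenic calls are involved.
names = ["Apple Inc.", "Apple Computer", "AAPL"]


def normalize(name: str) -> str:
    """Lowercase the name and strip a few legal suffixes (illustrative list)."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    for suffix in (" inc", " corporation", " corp"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
    return cleaned.strip()


# Exact matching: every spelling is its own key.
exact_keys = set(names)

# Rule-based normalization: still three keys, because a ticker symbol and a
# historical name share no surface form with the legal name.
normalized_keys = {normalize(n) for n in names}

print(len(exact_keys))          # 3
print(sorted(normalized_keys))  # ['aapl', 'apple', 'apple computer']
```

Closing that gap requires knowledge about the companies themselves rather than their spelling, which is the part this demo delegates to a language model.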

In [None]:
import fenic as fc

# ⚡ Configure for entity resolution
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="entity_resolution_demo",
    semantic=fc.SemanticConfig(
        language_models={
            "resolver": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=500, tpm=200_000),
            # "resolver": fc.GoogleDeveloperLanguageModel(model_name="gemini-2.5-flash-lite", rpm=1000, tpm=1_000_000),
            # "resolver": fc.AnthropicLanguageModel(model_name="claude-3-5-sonnet-20241022", rpm=500, tpm=200_000)
        }
    )
))

print("✅ Entity resolution session configured")

## 🏢 Step 1: Multi-Source Company Data

Company records from different systems with various naming conventions:

In [None]:
# 🏢 Company data from two sources: a CRM system and a financial system
companies_source_a = session.create_dataframe([
    {"id": "A001", "source": "CRM", "company_name": "Apple Inc.", "revenue": "$394B", "industry": "Technology"},
    {"id": "A002", "source": "CRM", "company_name": "Microsoft Corporation", "revenue": "$211B", "industry": "Software"},
    {"id": "A003", "source": "CRM", "company_name": "Alphabet Inc.", "revenue": "$307B", "industry": "Internet"},
    {"id": "A004", "source": "CRM", "company_name": "Meta Platforms Inc.", "revenue": "$134B", "industry": "Social Media"}
])

companies_source_b = session.create_dataframe([
    {"id": "B101", "source": "Financial", "company_name": "AAPL", "stock_price": "$189.50", "market_cap": "$2.9T"},
    {"id": "B102", "source": "Financial", "company_name": "MSFT", "stock_price": "$420.15", "market_cap": "$3.1T"},
    {"id": "B103", "source": "Financial", "company_name": "GOOGL", "stock_price": "$175.25", "market_cap": "$2.1T"},
    {"id": "B104", "source": "Financial", "company_name": "Apple Computer", "stock_price": "$189.50", "market_cap": "$2.9T"}
])

print("🏢 SOURCE A (CRM System):")
companies_source_a.show()

print("\n📈 SOURCE B (Financial System):")
companies_source_b.show()

## 🔍 Step 2: AI-Powered Entity Matching

Use semantic join to match companies across sources despite naming differences:

In [None]:
# 🔍 Semantic join to resolve entity identities across sources
# First, add aliases to avoid duplicate column names
companies_a_aliased = companies_source_a.select(
    fc.col("id").alias("crm_id"),
    fc.col("source").alias("crm_source"), 
    fc.col("company_name").alias("crm_name"),
    fc.col("revenue"),
    fc.col("industry")
)

companies_b_aliased = companies_source_b.select(
    fc.col("id").alias("financial_id"),
    fc.col("source").alias("financial_source"),
    fc.col("company_name").alias("financial_name"), 
    fc.col("stock_price"),
    fc.col("market_cap")
)

entity_matches = companies_a_aliased.semantic.join(
    companies_b_aliased,
    predicate="""Company A: {{ left_on }}
Company B: {{ right_on }}

These company names refer to the same business entity. Consider:
- Full legal names vs common names (Apple Inc. = Apple Computer)
- Stock ticker symbols (AAPL = Apple, MSFT = Microsoft, GOOGL = Google/Alphabet)
- Historical names and rebranding
- Corporate structure (Alphabet = Google parent company)

Return true if these represent the same real-world company.""",
    left_on=fc.col("crm_name"),
    right_on=fc.col("financial_name"),
    model_alias="resolver"
).cache()

print("🔗 ENTITY RESOLUTION MATCHES:")
resolved_entities = entity_matches.select(
    "crm_name",
    "financial_name", 
    "revenue",
    "stock_price"
)
resolved_entities.show()
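
Conceptually, a semantic join can be pictured as a cross join in which each candidate pair is kept only if a yes/no judgment says the two names match. The sketch below is plain Python and is not Fenic's implementation; `same_company` and `KNOWN_ALIASES` are hypothetical stand-ins for the LLM call, and the matches they produce are hand-written rather than model output.

```python
from itertools import product

crm_names = ["Apple Inc.", "Microsoft Corporation", "Alphabet Inc.", "Meta Platforms Inc."]
financial_names = ["AAPL", "MSFT", "GOOGL", "Apple Computer"]

# Hypothetical lookup table standing in for the model's yes/no judgment.
KNOWN_ALIASES = {
    "Apple Inc.": {"AAPL", "Apple Computer"},
    "Microsoft Corporation": {"MSFT"},
    "Alphabet Inc.": {"GOOGL"},
}


def same_company(left: str, right: str) -> bool:
    """Return True if `right` is a known alias of `left` (stand-in predicate)."""
    return right in KNOWN_ALIASES.get(left, set())


# Pair every CRM name with every financial name, then keep the matching pairs.
candidate_pairs = len(crm_names) * len(financial_names)
matches = [
    (left, right)
    for left, right in product(crm_names, financial_names)
    if same_company(left, right)
]

print(candidate_pairs)  # 16
for left, right in matches:
    print(f"{left} <-> {right}")
```

Two things are worth noticing in this toy version: one CRM record can match more than one financial record (a ticker and a former name), and a record with no counterpart, such as Meta Platforms Inc. here, simply produces no row.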

## 📊 Step 3: Unified Entity Profiles

Create comprehensive profiles by merging data from matched entities:

In [None]:
# 📊 Create unified entity profiles with data from both sources
unified_profiles = entity_matches.select(
    "crm_id",
    "financial_id",
    fc.col("crm_name").alias("canonical_name"),
    fc.col("financial_name").alias("alt_name"),
    "industry",
    "revenue",
    "stock_price",
    "market_cap"
)

print("📊 UNIFIED ENTITY PROFILES:")
unified_profiles.show()

# Calculate resolution statistics.
# Note: entity_matches has one row per matched pair, so a CRM record that
# matches several financial records (e.g. a ticker and a former name) is
# counted once per pair, and the ratio below can exceed 100%.
total_source_a = companies_source_a.count()
total_source_b = companies_source_b.count()
matched_pairs = entity_matches.count()

print("\n🎯 ENTITY RESOLUTION RESULTS:")
print(f"   • Source A records: {total_source_a}")
print(f"   • Source B records: {total_source_b}")
print(f"   • Matched record pairs: {matched_pairs}")
print(f"   • Matched pairs vs. smaller source: {(matched_pairs/min(total_source_a, total_source_b)*100):.0f}%")

print("\n💡 BUSINESS BENEFITS:")
print("   • Automatic entity deduplication across data sources")
print("   • 360-degree view of customers/companies")
print("   • Data quality improvement through consolidation")
print("   • Reduced manual data matching effort")
print("   • Master data management at scale")
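
A ratio of matched rows to the smaller source can hide unmatched records when one record matches several counterparts. As a rough sketch of an alternative, the plain-Python example below computes per-source coverage from distinct IDs. The `matched_pairs` list is hypothetical sample data chosen for illustration; the actual join output depends on the model.

```python
# Hypothetical (crm_id, financial_id) pairs; real results depend on the model.
matched_pairs = [
    ("A001", "B101"),
    ("A001", "B104"),
    ("A002", "B102"),
    ("A003", "B103"),
]
total_source_a = 4
total_source_b = 4

# Count each source record once, no matter how many pairs it appears in.
matched_a = {crm_id for crm_id, _ in matched_pairs}
matched_b = {financial_id for _, financial_id in matched_pairs}

coverage_a = len(matched_a) / total_source_a
coverage_b = len(matched_b) / total_source_b

# Pair-based ratio, computed the same way as in the cell above.
pair_ratio = len(matched_pairs) / min(total_source_a, total_source_b)

print(f"Source A coverage: {coverage_a:.0%}")  # 75%
print(f"Source B coverage: {coverage_b:.0%}")  # 100%
print(f"Pair-based ratio:  {pair_ratio:.0%}")  # 100%
```

With this sample data the pair-based ratio reads 100% even though one CRM record (A004) has no match, which is why reporting coverage per source alongside it may be more informative.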

In [None]:
session.stop()