To run this Fenic demo, click **Runtime** > **Run all**.

<div class="align-center">
<a href="https://github.com/typedef-ai/fenic"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/typedef-fenic-logo-github-yellow.png?raw=true" height="50"></a>
<a href="https://discord.gg/GdqF3J7huR"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/join-the-discord.png?raw=true" height="50"></a>
<a href="https://docs.fenic.ai/latest/"><img src="https://github.com/typedef-ai/fenic/blob/main/docs/images/documentation.png?raw=true" height="50"></a>

Questions? Join the Discord and ask away! For feature requests or to leave a star, visit our [GitHub](https://github.com/typedef-ai/fenic).

</div>

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn google-genai
!pip install polars==1.30.0
# === GOOGLE GEMINI ===
#!pip install fenic[google]
# === ANTHROPIC CLAUDE ===
#!pip install fenic[anthropic]
# === OPENAI (Default) ===
!pip install fenic

In [None]:
import os 
import getpass

# 🔌 MULTI-PROVIDER SETUP - Choose your preferred LLM provider
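# Uncomment ONE of the provider sections below.
# Note: the session config later in this demo uses fc.OpenAIEmbeddingModel,
# so switching providers also means changing that embedding model.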

# === OPENAI (Default) ===
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

# === GOOGLE GEMINI ===
# os.environ["GOOGLE_API_KEY"] = getpass.getpass("Google API Key:")

# === ANTHROPIC CLAUDE ===
# os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Anthropic API Key:")

# 🚀 Semantic Similarity Joins

**Hook:** *"Match candidates to jobs by skill similarity, not keywords"*

Semantic similarity joins go beyond keyword matching: they use **embeddings** to capture conceptual relationships. Watch a "React developer" get matched to "JavaScript Engineer" roles with quantified similarity scores. This is AI-powered recruiting at its finest.

**What you'll see in this 2-minute demo:**
- 💼 **Job requirements** - "React, TypeScript, modern JavaScript"
- 👩‍💻 **Candidate skills** - "React developer, component architecture"
- 🧠 **Embedding vectors** - Converting text to mathematical representations
- 🎯 **Similarity scores** - Quantified match confidence (0.0-1.0)

This goes beyond string matching to understand skill relationships and conceptual overlap.

In [None]:
import fenic as fc

# ⚡ Configure session for semantic similarity analysis  
session = fc.Session.get_or_create(fc.SessionConfig(
    app_name="semantic_similarity_demo",
    semantic=fc.SemanticConfig(
        embedding_models={
            "embeddings": fc.OpenAIEmbeddingModel(model_name="text-embedding-3-small", rpm=3000, tpm=1_000_000)
        },
    )
))

print("✅ Configured with specialized embedding model for similarity matching")

## 💼 Step 1: The Challenge - Almost No Keyword Overlap

Jobs ask for "React" and "TypeScript", while candidates describe "component-based architecture". With almost no shared keywords, traditional keyword matching fails!
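To make the keyword gap concrete, here is a minimal sketch that scores each intended pair by word overlap. Jaccard similarity stands in for "traditional matching" here. That choice, and the helper names `tokens` and `jaccard`, are assumptions for illustration and not part of Fenic.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words (0.0 means nothing in common)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Candidate skills paired with the job each candidate is intended to match
# (same strings as the demo data in the next cell).
pairs = {
    "Sarah": ("6 years building web apps, component-based architecture",
              "React expert, TypeScript, modern JavaScript"),
    "Dr. Rodriguez": ("AI researcher, published ML papers, neural networks",
                      "PyTorch, deep learning research, PhD preferred"),
    "Emma": ("Cloud infrastructure specialist, container orchestration",
             "Kubernetes, Docker, AWS cloud platforms"),
}

keyword_overlap = {name: jaccard(skills, reqs) for name, (skills, reqs) in pairs.items()}
for name, score in keyword_overlap.items():
    print(f"{name}: keyword overlap = {score:.2f}")
```

Two of the three pairs share no words at all, and the third shares only "cloud", so a keyword-based join has almost nothing to work with.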

In [None]:
# 💼 Job requirements vs 👩‍💻 Candidate skills - notice the mismatch!
jobs = session.create_dataframe([
    {"title": "Frontend Engineer", "requirements": "React expert, TypeScript, modern JavaScript"},
    {"title": "ML Research Scientist", "requirements": "PyTorch, deep learning research, PhD preferred"},
    {"title": "DevOps Engineer", "requirements": "Kubernetes, Docker, AWS cloud platforms"}
])

candidates = session.create_dataframe([
    {"name": "Sarah", "skills": "6 years building web apps, component-based architecture"},
    {"name": "Dr. Rodriguez", "skills": "AI researcher, published ML papers, neural networks"},
    {"name": "Emma", "skills": "Cloud infrastructure specialist, container orchestration"}
])

print("💼 Jobs Need:")
jobs.show()
print("\n👩‍💻 Candidates Offer:")
candidates.show()
print("\n🔍 CHALLENGE: Almost no exact keyword matches (only 'cloud' is shared)!")

## 🧠 Step 2: Convert to Embedding Vectors

Transform text into mathematical representations that capture semantic meaning:

In [None]:
# 🧠 Create embeddings - converting text to 1536-dimensional vectors
jobs_with_embeddings = jobs.with_column(
    "requirements_embedding",
    fc.semantic.embed(fc.col("requirements"))
).cache()

candidates_with_embeddings = candidates.with_column(
    "skills_embedding", 
    fc.semantic.embed(fc.col("skills"))
).cache()

jobs_with_embeddings.show()
candidates_with_embeddings.show()

print("✅ Text → Mathematical vectors complete!")
print("   • Job requirements: 1536-dimensional vectors")
print("   • Candidate skills: 1536-dimensional vectors")
print("   • Ready for semantic similarity matching")

## 🎯 Step 3: Semantic Similarity Matching

Find the best matches using cosine similarity between embedding vectors:
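As a rough illustration of what cosine-similarity matching with `k=1` computes, here is a plain-Python sketch on toy data. The 3-dimensional vectors are made up for readability (the real embeddings are 1536-dimensional), and the helpers `cosine_similarity` and `best_match` are hypothetical names. This is not how Fenic implements `sim_join` internally.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
job_vectors = {
    "Frontend Engineer": [0.9, 0.1, 0.0],
    "ML Research Scientist": [0.1, 0.9, 0.1],
    "DevOps Engineer": [0.0, 0.2, 0.9],
}
candidate_vectors = {
    "Sarah": [0.8, 0.2, 0.1],
    "Dr. Rodriguez": [0.2, 0.8, 0.0],
    "Emma": [0.1, 0.1, 0.8],
}

def best_match(vector: list[float], jobs: dict[str, list[float]]) -> tuple[str, float]:
    """Return the (title, score) pair with the highest similarity, i.e. k=1."""
    scored = ((title, cosine_similarity(vector, jv)) for title, jv in jobs.items())
    return max(scored, key=lambda pair: pair[1])

best_matches = {name: best_match(vec, job_vectors) for name, vec in candidate_vectors.items()}
for name, (title, score) in best_matches.items():
    print(f"{name} -> {title} ({score:.2f})")
```

Vectors pointing in the same direction score close to 1.0 and orthogonal vectors score 0.0, which is why a candidate lands on the job whose embedding is nearest in direction rather than the one sharing the most words.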

In [None]:
# 🎯 Semantic similarity join using embeddings
matches = candidates_with_embeddings.semantic.sim_join(
    other=jobs_with_embeddings,
    left_on="skills_embedding",        # Use the embedding column, not text column
    right_on="requirements_embedding", # Use the embedding column, not text column
    k=1,                              # Best match per candidate
    similarity_score_column="similarity_score"
).cache()  # Cache to avoid re-running similarity calculations

results = matches.select(
    "name",
    "title",
    "skills", 
    (fc.col("similarity_score") * 100).alias("match_%")
).order_by(fc.desc("match_%"))

print("🎯 SEMANTIC MATCHING RESULTS:")
print("=" * 50)

results.show()

print("\n💡 BREAKTHROUGH: 100% successful matches!")
print("   • Sarah (web apps) → Frontend Engineer: ~52%")
print("   • Dr. Rodriguez (ML papers) → ML Research: ~50%") 
print("   • Emma (containers) → DevOps Engineer: ~42%")
print("All of this was done with embedding similarity matching, which is much cheaper than calling an LLM.")

In [None]:
session.stop()