# 🎭 Meme Quality Evaluators
### Because someone needs to judge internet culture

At a high level, a meme evaluator is like a Reddit moderator with actual standards. It takes a meme submission, compares it against peak internet culture, and returns a dankness score.

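In our system, evaluators judge whether a meme caption is worthy of front-page glory or belongs in the digital trash bin. We represent this as a function that takes a Run (the meme attempt) and an Example (the legendary reference meme) and returns Feedback (brutal honesty about your meme skills). The evaluators in this notebook receive that data either as plain `inputs`, `reference_outputs`, and `outputs` dictionaries or as the Run and Example themselves.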

![Evaluator](../../images/evaluator.png)

*Fig 1: The sacred process of meme evaluation*

## 🎯 Simple Dankness Checker

Here's a basic evaluator that checks whether a meme caption exactly matches the legendary reference:

In [1]:
from langsmith.schemas import Example, Run

def check_meme_dankness(inputs: dict, reference_outputs: dict, outputs: dict) -> dict:
    """Checks if the meme caption is objectively dank"""
    user_caption = outputs.get("caption")
    legendary_caption = reference_outputs.get("caption")
    
    # Simple exact match (because true art cannot be replicated)
    is_dank = user_caption == legendary_caption
    
    return {
        "score": int(is_dank) * 100, 
        "key": "exact_dankness_match",
        "comment": "Perfect replica! 🔥" if is_dank else "Nice try, but not quite legendary 😅"
    }

print("✨ Dankness checker initialized!")
print("🎯 Ready to judge meme quality with extreme prejudice")

✨ Dankness checker initialized!
🎯 Ready to judge meme quality with extreme prejudice
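
Exact equality is strict: a caption that differs from the reference only by capitalization or stray spaces still scores 0. As a minimal sketch (the names `normalize_caption` and `check_meme_dankness_normalized` are hypothetical and not part of this notebook or of LangSmith), a variant could normalize both captions before comparing while keeping the same feedback shape:

```python
def normalize_caption(caption) -> str:
    """Lowercase the caption and collapse runs of whitespace (hypothetical helper)."""
    return " ".join(str(caption or "").lower().split())


def check_meme_dankness_normalized(inputs: dict, reference_outputs: dict, outputs: dict) -> dict:
    """Exact-match check that ignores case and extra whitespace."""
    user_caption = normalize_caption(outputs.get("caption"))
    legendary_caption = normalize_caption(reference_outputs.get("caption"))

    # An empty caption never counts as a match, even if the reference is also empty.
    is_dank = bool(user_caption) and user_caption == legendary_caption

    return {
        "score": int(is_dank) * 100,
        "key": "normalized_dankness_match",
        "comment": "Perfect replica! 🔥" if is_dank else "Nice try, but not quite legendary 😅",
    }


feedback = check_meme_dankness_normalized(
    inputs={"meme_template": "Distracted Boyfriend"},
    reference_outputs={"caption": "Boyfriend: Me | Girlfriend: Working code"},
    outputs={"caption": "  boyfriend: me | girlfriend:   working code "},
)
print(feedback["score"])  # 100
```

Whether this leniency is desirable depends on how much formatting matters to your meme council; the strict version above remains the simpler baseline.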


## 🤖 AI-as-Meme-Judge Evaluation

Using AI to judge memes is like asking a robot to understand why a frog on a unicycle is funny. But we're doing it anyway!

In [2]:
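import os

# Placeholder key for the demo. The next cell loads ../../.env with override=True,
# so an OPENAI_API_KEY defined there replaces this value.
os.environ["OPENAI_API_KEY"] = "sk-meme-judge-supreme-xyz789"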

print("🔑 API key loaded (totally real, trust me)")
print("🎨 Ready to harness AI for critical meme analysis")

🔑 API key loaded (totally real, trust me)
🎨 Ready to harness AI for critical meme analysis


In [3]:
from dotenv import load_dotenv

print("📚 Loading environment secrets from the vault...")
load_dotenv(dotenv_path="../../.env", override=True)
print("✅ Configuration loaded successfully!")

📚 Loading environment secrets from the vault...
✅ Configuration loaded successfully!


True
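
Because the key can come from either the hard-coded placeholder or the `.env` file, it may help to confirm that a value is actually present before creating the client. A small sketch using only the standard library (the helper name `require_env` and the demo variable name are hypothetical, and the demo writes a dummy value so it runs without real credentials):

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or environment")
    return value


# Demo value only; this variable name is a stand-in, not a real setting.
os.environ["MEME_JUDGE_DEMO_KEY"] = "sk-demo-not-a-real-key"
key = require_env("MEME_JUDGE_DEMO_KEY")

# Print only a short prefix so the full secret never lands in notebook output.
print(f"Key loaded: {key[:7]}...")
```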

In [4]:
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class Meme_Quality_Score(BaseModel):
    humor_score: int = Field(description="How funny is this meme from 1-10?")
    relatability_score: int = Field(description="How relatable from 1-10?")
    dankness_score: int = Field(description="Overall dankness from 1-10, where 10 is legendary tier")
    would_share: bool = Field(description="Would you share this with your group chat?")
    roast: str = Field(description="A brief, witty critique of the meme")

print("🧠 Initializing AI Meme Judge v3.0...")
print("📊 Structured output schema created")
print("🎭 System prompt calibrated for maximum meme comprehension")

🧠 Initializing AI Meme Judge v3.0...
📊 Structured output schema created
🎭 System prompt calibrated for maximum meme comprehension


In [5]:
def evaluate_meme_quality(inputs: dict, reference_outputs: dict, outputs: dict):
    """The ultimate meme quality evaluator using GPT-4o"""
    meme_template = inputs["meme_template"]
    legendary_caption = reference_outputs["caption"]
    user_caption = outputs["caption"]
    
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {   
                "role": "system",
                "content": (
                    "You are a professional meme critic with a PhD in Internet Culture. "
                    "Compare a user's meme caption against a legendary reference caption. "
                    "Judge the humor, relatability, and overall dankness. "
                    "Be honest but entertaining in your critique. Remember: not all memes can be legendary."
                ),
            },
            {
                "role": "user", 
                "content": f"Meme Template: {meme_template}\n\nLegendary Caption: {legendary_caption}\n\nUser's Caption: {user_caption}\n\nRate this meme!"
            }
        ],
        response_format=Meme_Quality_Score,
    )

    result = completion.choices[0].message.parsed
    overall_score = (result.humor_score + result.relatability_score + result.dankness_score) / 3
    
    return {
        "score": round(overall_score, 1), 
        "key": "meme_quality",
        "metadata": {
            "humor": result.humor_score,
            "relatability": result.relatability_score,
            "dankness": result.dankness_score,
            "would_share": result.would_share,
            "roast": result.roast
        }
    }

print("🎯 Meme evaluator function compiled!")
print("⚡ Ready to rate memes with surgical precision")

🎯 Meme evaluator function compiled!
⚡ Ready to rate memes with surgical precision
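
The overall score is the plain mean of the three 1-10 sub-scores, rounded to one decimal place. That arithmetic can be checked without calling the API. The sketch below uses a hypothetical helper, `overall_meme_score`, which also rejects out-of-range values, since the schema states the 1-10 range only in its field descriptions:

```python
def overall_meme_score(humor: int, relatability: int, dankness: int) -> float:
    """Average three 1-10 sub-scores and round to one decimal place."""
    scores = {"humor": humor, "relatability": relatability, "dankness": dankness}
    for name, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside the 1-10 range")
    return round(sum(scores.values()) / len(scores), 1)


print(overall_meme_score(3, 5, 5))  # 4.3
print(overall_meme_score(2, 4, 3))  # 3.0
```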


## 🧪 Testing Time!

Let's test our evaluator with a deliberately mediocre meme caption. Prepare for disappointment! 📉

In [6]:
# Reference meme from the Hall of Fame
inputs = {
    "meme_template": "Distracted Boyfriend"
}

reference_outputs = {
    "caption": "Boyfriend: Me | Girlfriend: Working code | Other girl: Debugging for 6 hours"
}

# User's attempt (deliberately mid)
outputs = {
    "caption": "Boyfriend: Me | Girlfriend: Finished project | Other girl: New bug appeared"
}

print("🎬 Running evaluation...\n")
evaluation_result = evaluate_meme_quality(inputs, reference_outputs, outputs)

print("📊 MEME EVALUATION RESULTS:")
print("━" * 31)
print(f"Overall Score: {evaluation_result['score']}/10 ⚠️\n")
print("Breakdown:")
print(f"  😂 Humor: {evaluation_result['metadata']['humor']}/10")
print(f"  🤝 Relatability: {evaluation_result['metadata']['relatability']}/10")
print(f"  🔥 Dankness: {evaluation_result['metadata']['dankness']}/10")
print(f"  📤 Would Share: {evaluation_result['metadata']['would_share']}\n")
print("💬 Critic's Roast:")
print(f'"{evaluation_result["metadata"]["roast"]}"')
print("\nVerdict: Needs more suffering and dark humor. Try again! 💀")

🎬 Running evaluation...

📊 MEME EVALUATION RESULTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Score: 4.3/10 ⚠️

Breakdown:
  😂 Humor: 3/10
  🤝 Relatability: 5/10
  🔥 Dankness: 5/10
  📤 Would Share: False

💬 Critic's Roast:
"Your caption is giving 'I just learned about debugging yesterday' energy. The legendary version captures the existential crisis of senior devs, while yours sounds like it was written by a motivational poster. Not bad, just... basic."

Verdict: Needs more suffering and dark humor. Try again! 💀


## 🔄 Alternative Evaluator: Direct Run/Example Access

For when you need to dig deeper into the meme metadata!

In [7]:
from langsmith.schemas import Run, Example

def evaluate_meme_quality_v2(root_run: Run, example: Example):
    """Advanced meme evaluator with full access to run metadata"""
    # Subscript access matches the dict-based sample data used in the next cell.
    meme_template = example["inputs"]["meme_template"]
    legendary_caption = example["outputs"]["caption"]
    user_caption = root_run["outputs"]["caption"]
    
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {   
                "role": "system",
                "content": (
                    "You are a professional meme critic with a PhD in Internet Culture. "
                    "Compare a user's meme caption against a legendary reference caption. "
                    "Judge the humor, relatability, and overall dankness. "
                    "Be honest but entertaining in your critique."
                ),
            },
            {
                "role": "user", 
                "content": f"Template: {meme_template}\n\nLegendary: {legendary_caption}\n\nUser's: {user_caption}"
            }
        ],
        response_format=Meme_Quality_Score,
    )

    result = completion.choices[0].message.parsed
    overall_score = (result.humor_score + result.relatability_score + result.dankness_score) / 3
    
    return {
        "score": round(overall_score, 1), 
        "key": "meme_quality_v2",
        "metadata": {
            "humor": result.humor_score,
            "relatability": result.relatability_score,
            "dankness": result.dankness_score,
            "would_share": result.would_share,
            "roast": result.roast
        }
    }

print("🔧 Advanced evaluator v2 initialized")
print("📦 Now with direct access to Run and Example objects!")

🔧 Advanced evaluator v2 initialized
📦 Now with direct access to Run and Example objects!


In [8]:
sample_run = {
    "name": "User Meme Submission #42069",
    "inputs": {
        "meme_template": "Drake Hotline Bling"
    },
    "outputs": {
        "caption": "Top: Bad code | Bottom: Good code"
    },
    "is_root": True,
    "status": "success",
    "extra": {
        "metadata": {
            "user_confidence": "very high",
            "expected_upvotes": "1000+"
        }
    }
}

sample_example = {
    "inputs": {
        "meme_template": "Drake Hotline Bling"
    },
    "outputs": {
        "caption": "Top: Fixing the bug | Bottom: Commenting out the code that causes the bug"
    },
    "metadata": {
        "tier": "legendary",
        "upvotes": 12500,
        "hall_of_fame": True
    }
}

print("🎪 Testing v2 evaluator with sample data...\n")
evaluation_v2 = evaluate_meme_quality_v2(sample_run, sample_example)

print("📊 MEME EVALUATION v2 RESULTS:")
print("━" * 31)
print(f"Overall Score: {evaluation_v2['score']}/10 😬\n")
print("Breakdown:")
print(f"  😂 Humor: {evaluation_v2['metadata']['humor']}/10")
print(f"  🤝 Relatability: {evaluation_v2['metadata']['relatability']}/10")
print(f"  🔥 Dankness: {evaluation_v2['metadata']['dankness']}/10")
print(f"  📤 Would Share: {evaluation_v2['metadata']['would_share']}\n")
print("💬 Critic's Savage Take:")
print(f'"{evaluation_v2["metadata"]["roast"]}"')
print("\n🎯 Status: REJECTED by the Meme Council")
print("💡 Recommendation: Study the ancient texts (r/memes) and try again")

🎪 Testing v2 evaluator with sample data...

📊 MEME EVALUATION v2 RESULTS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Overall Score: 3.0/10 😬

Breakdown:
  😂 Humor: 2/10
  🤝 Relatability: 4/10
  🔥 Dankness: 3/10
  📤 Would Share: False

💬 Critic's Savage Take:
"This is what happens when someone describes a meme instead of making one. You've captured the spirit of a Wikipedia article about memes, not an actual meme. The legendary version has layers - it's self-aware, it's meta. Yours is just... there. Like a participation trophy in meme form. 3/10, would not recommend to the group chat."

🎯 Status: REJECTED by the Meme Council
💡 Recommendation: Study the ancient texts (r/memes) and try again
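
The demo above passes plain dictionaries, so subscripting such as `root_run["outputs"]` works. If the evaluator instead receives objects that expose `inputs` and `outputs` as attributes (an assumption about how `Run` and `Example` instances behave), the same lookups need attribute access. Here is a hedged sketch of a helper that tolerates both forms; the names `get_field` and `extract_captions` are hypothetical, and `SimpleNamespace` merely stands in for a real object:

```python
from types import SimpleNamespace
from typing import Any


def get_field(obj: Any, name: str) -> Any:
    """Read `name` from a dict key or, for any other object, from an attribute."""
    if isinstance(obj, dict):
        return obj[name]
    return getattr(obj, name)


def extract_captions(root_run: Any, example: Any) -> tuple:
    """Pull the template, reference caption, and user caption from either form."""
    meme_template = get_field(example, "inputs")["meme_template"]
    legendary_caption = get_field(example, "outputs")["caption"]
    user_caption = get_field(root_run, "outputs")["caption"]
    return meme_template, legendary_caption, user_caption


dict_run = {"outputs": {"caption": "Top: Bad code | Bottom: Good code"}}
dict_example = {
    "inputs": {"meme_template": "Drake Hotline Bling"},
    "outputs": {"caption": "Top: Fixing the bug | Bottom: Commenting out the code that causes the bug"},
}

# Attribute-style stand-ins built from the same data.
object_run = SimpleNamespace(**dict_run)
object_example = SimpleNamespace(**dict_example)

same = extract_captions(dict_run, dict_example) == extract_captions(object_run, object_example)
print(same)  # True
```

Extracting the fields in one place keeps the judging logic identical no matter which form the data arrives in.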


## 🎊 Conclusion

We've successfully built a system that uses AI to judge memes. The future is now, and it's... questioning our humor? 🤔

**Key Takeaways:**
- Meme evaluation is serious business 📈
- AI can be a harsh critic 💀
- Most of us probably need to level up our meme game 🎮

*Remember: With great meme power comes great responsibility. Use these evaluators wisely!* 🕷️