GOAL: Build a Context-Aware Suggestion System
Read the whole script/story first.
Build a global context (characters, locations, ongoing actions).
Use that context + sentence similarity to enrich each sentence.
✅ STEP 2: Keep Your reference_data.json (same as before)
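For reference, the code in the following steps reads five fields from every entry of reference_data.json: "sentence", "characters", "locations", "actions" and "nouns". The sketch below writes and reloads a small sample in that shape. The sample sentences, the file name reference_data_sample.json and the validate_reference_data() helper are illustrative assumptions, not part of your real data.

```python
import json
import os
import tempfile

# Hypothetical sample entries; only the field names are taken from the notebook code.
sample_reference_data = [
    {
        "sentence": "The climber reached the snowy summit at dawn.",
        "characters": ["climber"],
        "locations": ["summit"],
        "actions": ["reach"],
        "nouns": ["snow", "dawn"],
    },
    {
        "sentence": "A hiker crossed the river near the old bridge.",
        "characters": ["hiker"],
        "locations": ["river", "bridge"],
        "actions": ["cross"],
        "nouns": [],
    },
]

REQUIRED_KEYS = {"sentence", "characters", "locations", "actions", "nouns"}


def validate_reference_data(entries):
    """Return the indices of entries that are missing any required key."""
    return [i for i, entry in enumerate(entries) if not REQUIRED_KEYS.issubset(entry)]


# Round-trip through a temporary file to show the on-disk format.
path = os.path.join(tempfile.mkdtemp(), "reference_data_sample.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample_reference_data, f, indent=2)
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(validate_reference_data(loaded))  # an empty list means every entry is complete
```

Running a check like this once before encoding the sentences catches entries that would otherwise raise a KeyError later.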
✅ STEP 3: Load Model & Reference Data (same as before)

In [2]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import json
import numpy as np

# Load reference data
with open("reference_data.json", "r") as f:
    reference_data = json.load(f)

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode reference sentences
ref_sentences = [item["sentence"] for item in reference_data]
ref_embeddings = model.encode(ref_sentences)

✅ STEP 4: Build Global Context While Reading Script
We’ll scan the whole script first and collect:
All mentioned characters
All mentioned locations
Common actions and nouns
Add this function:

In [3]:
def build_global_context(script_data):
    global_chars = set()
    global_locs = set()
    global_actions = set()
    global_nouns = set()

    for item in script_data:
        global_chars.update(item.get("characters", []))
        global_locs.update(item.get("locations", []))
        global_actions.update(item.get("actions", []))
        global_nouns.update(item.get("nouns", []))

    return {
        "characters": list(global_chars),
        "locations": list(global_locs),
        "actions": list(global_actions),
        "nouns": list(global_nouns)
    }
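One caveat: converting a set to a list gives no guaranteed order, so the "first" character in the global context is arbitrary. If order matters (for example, to treat the first-mentioned character as the main one), a possible variant keeps first-appearance order instead. This is a sketch only; build_global_context_ordered() and the toy script are hypothetical names and data.

```python
def build_global_context_ordered(script_data):
    """Collect unique items per field, keeping first-appearance order.

    Dicts preserve insertion order (Python 3.7+), so dict.fromkeys()
    removes duplicates without shuffling the items.
    """
    keys = ("characters", "locations", "actions", "nouns")
    seen = {key: {} for key in keys}
    for item in script_data:
        for key in keys:
            # Keys that already exist keep their original position.
            seen[key].update(dict.fromkeys(item.get(key, [])))
    return {key: list(values) for key, values in seen.items()}


toy_script = [
    {"characters": ["Alex"], "actions": ["dream", "climb"], "nouns": ["mountain"]},
    {"characters": ["Sam", "Alex"], "locations": ["camp"], "nouns": ["mountain", "tent"]},
]
context = build_global_context_ordered(toy_script)
print(context["characters"])  # ['Alex', 'Sam']
print(context["nouns"])       # ['mountain', 'tent']
```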

✅ STEP 5: Enhanced Suggestion — Combine Context + Similarity
This function now uses:

Semantic similarity (like before)
PLUS global context from the whole script

In [4]:
def suggest_missing_with_context(input_sentence, global_context, top_k=2):
    # Step 1: Get suggestions from similar sentences (as before)
    input_embedding = model.encode([input_sentence])
    similarities = cosine_similarity(input_embedding, ref_embeddings)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]

    suggestions = {
        "characters": set(),
        "locations": set(),
        "actions": set(),
        "nouns": set()
    }

    for idx in top_indices:
        match = reference_data[idx]
        for key in suggestions:
            suggestions[key].update(match.get(key, []))  # tolerate reference entries that omit a field

    # Step 2: Inject global context — if something is known in script, suggest it!
    for key in suggestions:
        suggestions[key].update(global_context[key])

    # Convert to lists
    for key in suggestions:
        suggestions[key] = list(suggestions[key])

    return suggestions
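To see what the ranking step does, here is a dependency-free sketch on toy 2-D vectors. In the function above, np.argsort() sorts the scores in ascending order, [-top_k:] keeps the last (highest) top_k indices, and [::-1] flips them so the best match comes first. The helpers cosine() and top_k_indices() and the toy vectors are illustrative assumptions; real embeddings from the model have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_indices(query_vec, ref_vecs, top_k=2):
    """Return the indices of the top_k most similar references, best first."""
    scores = [cosine(query_vec, ref) for ref in ref_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], scores


toy_refs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.2]
indices, scores = top_k_indices(toy_query, toy_refs)
print(indices)  # [0, 2]
```

The query points mostly along the first axis, so reference 0 ranks first and the diagonal reference 2 ranks second.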

✅ STEP 6: Enrich with Context-Aware Suggestions

In [5]:
def enrich_annotation_with_context(annotation, global_context):
    sentence = annotation["sentence"]
    suggestions = suggest_missing_with_context(sentence, global_context)

    for key in ["characters", "locations", "actions", "nouns"]:
        existing = set(annotation[key])
        new_items = [item for item in suggestions[key] if item not in existing]
        annotation[key].extend(new_items)

    return annotation
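Note that this function extends the lists inside the annotation it receives. A shallow dict.copy() on the caller's side does not protect the original, because the copy still points at the same inner lists. The sketch below shows the difference with a toy annotation (the variable names are illustrative):

```python
import copy

# Shallow copy: a new dict, but the inner list object is shared.
original_a = {"sentence": "The wind howled.", "nouns": ["wind"]}
shallow = original_a.copy()
shallow["nouns"].extend(["trail"])
print(original_a["nouns"])  # ['wind', 'trail']  (the original changed too)

# Deep copy: the inner lists are copied as well.
original_b = {"sentence": "The wind howled.", "nouns": ["wind"]}
deep = copy.deepcopy(original_b)
deep["nouns"].extend(["trail"])
print(original_b["nouns"])  # ['wind']  (the original is untouched)
```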

✅ STEP 7: Build Better Pixabay Query (Context-Aware)
Now we can prioritize context — e.g., if “Alex” and “mountain” are global, include them even if not in sentence.


In [6]:
def build_pixabay_query_contextual(annotation, global_context):
    parts = []

    # Always include global characters if none locally
    chars = annotation["characters"] or global_context["characters"]
    if chars:
        parts.append(" ".join(chars))

    # Include actions from sentence
    if annotation["actions"]:
        parts.append(" ".join(annotation["actions"]))

    # Fall back to global locations + nouns only if the sentence has none of its own
    loc_nouns = annotation["locations"] + annotation["nouns"]
    if not loc_nouns:
        loc_nouns = global_context["locations"] + global_context["nouns"]

    if loc_nouns:
        parts.append("in " + " ".join(loc_nouns))

    return " ".join(parts).strip()

✅ STEP 8: Run It on Your Full Script
Paste your full script data (from your first message) into the file:



In [7]:
if __name__ == "__main__":
    # Your full script (from your original input)
    script_data = [
        {
            "sentence": "Alex had always dreamed of reaching the top of Eagle's Peak, a mountain so tall many said it can't be climbed without years of training.",
            "characters": ["Alex"],
            "locations": [],
            "actions": ["dream", "reach", "say", "climb"],
            "nouns": ["top", "mountain", "year", "training"]
        },
        {
            "sentence": "But Alex wasn't an expert.",
            "characters": ["Alex"],
            "locations": [],
            "actions": [],
            "nouns": ["expert"]
        },
        {
            "sentence": "Just a person with a dream and a backpack full of hope.",
            "characters": [],
            "locations": [],
            "actions": [],
            "nouns": ["person", "dream", "backpack", "hope"]
        },
        {
            "sentence": "The first steps were easy.",
            "characters": [],
            "locations": [],
            "actions": [],
            "nouns": ["step"]
        },
        {
            "sentence": "The path was clear.",
            "characters": [],
            "locations": [],
            "actions": [],
            "nouns": ["path"]
        },
        {
            "sentence": "But soon the trail grew steeper.",
            "characters": [],
            "locations": [],
            "actions": ["grow"],
            "nouns": ["trail"]
        },
        {
            "sentence": "Rocks blocked the way.",
            "characters": [],
            "locations": [],
            "actions": ["block"],
            "nouns": ["rock", "way"]
        },
        {
            "sentence": "The wind howled.",
            "characters": [],
            "locations": [],
            "actions": ["howl"],
            "nouns": ["wind"]
        },
        {
            "sentence": "Doubt crept in.",
            "characters": [],
            "locations": [],
            "actions": ["creep"],
            "nouns": []
        }
    ]

    # Step 1: Build global context from entire script
    global_context = build_global_context(script_data)
    print("🌍 Global Context Built:")
    print(f"Characters: {global_context['characters']}")
    print(f"Locations:  {global_context['locations']}")
    print(f"Actions:    {global_context['actions']}")
    print(f"Nouns:      {global_context['nouns']}\n")
    print("="*70 + "\n")

    # Step 2: Enrich each sentence using global context
    enriched_script = []
    for item in script_data:
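        # Copy the inner lists too: a shallow item.copy() would share them with script_data
        annotation_copy = {k: list(v) if isinstance(v, list) else v for k, v in item.items()}
        enriched = enrich_annotation_with_context(annotation_copy, global_context)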
        query = build_pixabay_query_contextual(enriched, global_context)
        enriched["pixabay_query"] = query
        enriched_script.append(enriched)

        print(f"📝 Sentence: {item['sentence']}")
        print(f"→ Enriched Characters: {enriched['characters']}")
        print(f"→ Enriched Locations:  {enriched['locations']}")
        print(f"→ Enriched Actions:    {enriched['actions']}")
        print(f"→ Enriched Nouns:      {enriched['nouns']}")
        print(f"→ Pixabay Query:       \"{query}\"")
        print("-" * 60)

🌍 Global Context Built:
Characters: ['Alex']
Locations:  []
Actions:    ['howl', 'dream', 'reach', 'block', 'say', 'grow', 'climb', 'creep']
Nouns:      ['trail', 'way', 'training', 'hope', 'dream', 'year', 'expert', 'rock', 'wind', 'person', 'top', 'backpack', 'mountain', 'step', 'path']


📝 Sentence: Alex had always dreamed of reaching the top of Eagle's Peak, a mountain so tall many said it can't be climbed without years of training.
→ Enriched Characters: ['Alex']
→ Enriched Locations:  []
→ Enriched Actions:    ['dream', 'reach', 'say', 'climb', 'block', 'grow', 'howl', 'creep']
→ Enriched Nouns:      ['top', 'mountain', 'year', 'training', 'trail', 'way', 'hope', 'dream', 'expert', 'rock', 'wind', 'person', 'backpack', 'step', 'path']
→ Pixabay Query:       "Alex dream reach say climb block grow howl creep in top mountain year training trail way hope dream expert rock wind person backpack step path"
------------------------------------------------------------
📝 Sentence: But Alex

In [8]:
# Save enriched script
with open("enriched_script.json", "w") as f:
    json.dump(enriched_script, f, indent=2)

print("\n✅ Enriched script saved to 'enriched_script.json'")


✅ Enriched script saved to 'enriched_script.json'


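✅ GOAL: Make SMART, RELEVANT Suggestions Instead of Dumping Everything
The output above shows the problem: every sentence inherits the entire global context, so the Pixabay query for the first sentence alone grows to about two dozen loosely related keywords.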
🛠️ STEP-BY-STEP FIX — Let’s Refine the System
✅ STEP 1: Don’t Inject ALL Global Context — Only What’s Missing & Plausible
Update your suggestion function to be selective.

Replace your current suggest_missing_with_context() with this smarter version:

In [12]:
def suggest_missing_with_context(input_sentence, global_context, top_k=2):
    # Step 1: Get suggestions from similar sentences (semantic match)
    input_embedding = model.encode([input_sentence])
    similarities = cosine_similarity(input_embedding, ref_embeddings)[0]
    top_indices = np.argsort(similarities)[-top_k:][::-1]

    semantic_suggestions = {
        "characters": set(),
        "locations": set(),
        "actions": set(),
        "nouns": set()
    }

    for idx in top_indices:
        match = reference_data[idx]
        for key in semantic_suggestions:
            semantic_suggestions[key].update(match.get(key, []))  # tolerate reference entries that omit a field

    # Step 2: Fill only MISSING slots with global context — don't override!
    final_suggestions = {}
    for key in ["characters", "locations", "actions", "nouns"]:
        # Start with semantic suggestions
        suggested = set(semantic_suggestions[key])
        # Add from global ONLY if the semantic matches suggested nothing for this field
        # (this function never sees the sentence's own annotation, so it cannot check that)
        if len(suggested) == 0:
            suggested.update(global_context[key])
        final_suggestions[key] = list(suggested)

    return final_suggestions
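One possible tightening: the function above cannot see the sentence's own annotation, so it falls back to the global context whenever the semantic matches are empty, even if the sentence already has values for that field. The sketch below passes the annotation in as well and caps the fallback size. The helper fill_missing() and its max_fallback parameter are hypothetical, not part of the notebook.

```python
def fill_missing(semantic_suggestions, global_context, existing_annotation, max_fallback=2):
    """Use global context only for fields that are empty in BOTH the
    semantic suggestions and the sentence's own annotation."""
    result = {}
    for key in ("characters", "locations", "actions", "nouns"):
        suggested = list(semantic_suggestions.get(key, []))
        if not suggested and not existing_annotation.get(key):
            # Nothing local and nothing semantic: borrow a few global items.
            suggested = list(global_context.get(key, []))[:max_fallback]
        result[key] = suggested
    return result


semantic = {"characters": [], "locations": [], "actions": ["climb"], "nouns": []}
global_ctx = {
    "characters": ["Alex"],
    "locations": [],
    "actions": ["dream", "reach"],
    "nouns": ["mountain", "trail", "rock"],
}
annotation = {"characters": [], "locations": [], "actions": [], "nouns": ["wind"]}

result = fill_missing(semantic, global_ctx, annotation)
print(result)
# {'characters': ['Alex'], 'locations': [], 'actions': ['climb'], 'nouns': []}
```

Here "nouns" stays empty because the sentence already has "wind", while "characters" borrows "Alex" because nothing else is available.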

✅ STEP 2: Improve Query Builder — Keep It Short & Relevant
Replace your build_pixabay_query_contextual() with this cleaner version:



In [13]:
def build_pixabay_query_contextual(annotation, global_context):
    parts = []

    # Characters: use local, or global if local empty
    chars = annotation["characters"][:2]  # max 2 characters
    if not chars and global_context["characters"]:
        chars = global_context["characters"][:1]  # first listed character (order is arbitrary if the list came from a set)
    if chars:
        parts.append(" ".join(chars))

    # Actions: use only LOCAL ones (max 2)
    actions = annotation["actions"][:2]
    if actions:
        parts.append(" ".join(actions))

    # Nouns + Locations: combine, dedupe while keeping order, limit to 4 total
    # (list(set(...)) would shuffle the items, so the slice would keep a random subset)
    loc_nouns = list(dict.fromkeys(annotation["locations"] + annotation["nouns"]))[:4]
    if not loc_nouns:
        # Fallback: take up to 3 from global
        loc_nouns = list(dict.fromkeys(global_context["locations"] + global_context["nouns"]))[:3]

    if loc_nouns:
        parts.append("in " + " ".join(loc_nouns))

    return " ".join(parts).strip()
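Once the query string is short, it still has to be URL-encoded before it goes to Pixabay. The sketch below builds a request URL with the standard library only. It assumes the Pixabay endpoint https://pixabay.com/api/ with the key, q and image_type parameters, and a 100-character limit on q; please verify both against the current Pixabay API documentation. truncate_query(), build_pixabay_url() and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

PIXABAY_ENDPOINT = "https://pixabay.com/api/"
MAX_QUERY_CHARS = 100  # assumption: limit on "q"; check the Pixabay API docs


def truncate_query(query, max_chars=MAX_QUERY_CHARS):
    """Drop trailing words until the query fits within max_chars."""
    words = query.split()
    while words and len(" ".join(words)) > max_chars:
        words.pop()
    return " ".join(words)


def build_pixabay_url(query, api_key):
    """Return a search URL; urlencode() turns spaces into '+'."""
    params = {"key": api_key, "q": truncate_query(query), "image_type": "photo"}
    return f"{PIXABAY_ENDPOINT}?{urlencode(params)}"


url = build_pixabay_url("Alex climb in mountain trail", api_key="YOUR_API_KEY")
print(url)
# https://pixabay.com/api/?key=YOUR_API_KEY&q=Alex+climb+in+mountain+trail&image_type=photo
```

Truncating by whole words keeps the query readable, and because the builder puts characters and actions first, those are the terms that survive.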

✅ STEP 3: Optional — Filter Irrelevant Global Nouns
Some global nouns like "expert", "person", "year" are too abstract for image search.

Add a small filter:

In [11]:
IRRELEVANT_NOUNS = {"person", "expert", "year", "training", "hope", "dream"}  # too abstract

def filter_irrelevant(items):
    return [item for item in items if item not in IRRELEVANT_NOUNS]
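The notebook defines the filter but does not show where to call it. One option, sketched below, is to clean the nouns of an annotation (or of the global context) before the query is built. The helper filter_annotation_nouns() is a hypothetical addition; IRRELEVANT_NOUNS and filter_irrelevant() are repeated here only so the example runs on its own.

```python
IRRELEVANT_NOUNS = {"person", "expert", "year", "training", "hope", "dream"}  # too abstract


def filter_irrelevant(items):
    return [item for item in items if item not in IRRELEVANT_NOUNS]


def filter_annotation_nouns(annotation):
    """Return a copy of the annotation with abstract nouns removed."""
    cleaned = dict(annotation)
    cleaned["nouns"] = filter_irrelevant(annotation.get("nouns", []))
    return cleaned


global_nouns = ["trail", "training", "hope", "rock", "person", "backpack"]
print(filter_irrelevant(global_nouns))  # ['trail', 'rock', 'backpack']

annotation = {"sentence": "But Alex wasn't an expert.", "nouns": ["expert"]}
print(filter_annotation_nouns(annotation)["nouns"])  # []
```

Because the filter returns a new list, the original annotation keeps its full noun list for any other use.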

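Finally, redefine enrich_annotation_with_context() so it builds a fresh copy of the annotation instead of modifying the one passed in:

In [14]: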
def enrich_annotation_with_context(annotation, global_context):
    # Copy structure safely
    enriched = {
        "sentence": annotation["sentence"],
        "characters": annotation["characters"].copy(),
        "locations": annotation["locations"].copy(),
        "actions": annotation["actions"].copy(),
        "nouns": annotation["nouns"].copy()
    }

    suggestions = suggest_missing_with_context(enriched["sentence"], global_context)

    for key in ["characters", "locations", "actions", "nouns"]:
        existing = set(enriched[key])
        new_items = [item for item in suggestions[key] if item not in existing]
        enriched[key].extend(new_items)

    return enriched