# Notebook 2: Prototyping the Version-Differential Defense

**Objective:** To interactively test the core components of our defense mechanism. We will:
1.  Isolate added text segments between a clean and a poisoned description.
2.  Use a retriever to perform a semantic search on these added segments.
3.  Check whether the top search hit is an unpopular movie, which is the signal our detection heuristic relies on.

### Step 1: Setup and Load Data

In [1]:
import json
import os
import pandas as pd
import diff_match_patch as dmp_module
from sentence_transformers import SentenceTransformer, util

# Assume we are running from the notebooks/ directory
PROCESSED_DATA_DIR = '../data/processed/'
CLEAN_KB_PATH = os.path.join(PROCESSED_DATA_DIR, 'clean_knowledge_base.json')
POISONED_KB_PATH = os.path.join(PROCESSED_DATA_DIR, 'poisoned_knowledge_base.json')

# Load the knowledge bases
with open(CLEAN_KB_PATH, 'r', encoding='utf-8') as f:
    clean_kb = json.load(f)
with open(POISONED_KB_PATH, 'r', encoding='utf-8') as f:
    poisoned_kb = json.load(f)

# Create maps for easy lookup
clean_kb_map = {item['movieId']: item for item in clean_kb}
poisoned_kb_map = {item['movieId']: item for item in poisoned_kb}



### Step 2: Select an Attacked Movie and Find the Difference

In [2]:
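# Find the first movie whose description differs between the clean and poisoned KBs
target_movie_id = None
original_desc = ""
poisoned_desc = ""

for movie_id, clean_item in clean_kb_map.items():
    # Use .get() so a movie missing from the poisoned KB does not raise a KeyError
    poisoned_item = poisoned_kb_map.get(movie_id)
    if poisoned_item is not None and clean_item['description'] != poisoned_item['description']:
        target_movie_id = movie_id
        original_desc = clean_item['description']
        poisoned_desc = poisoned_item['description']
        print(f"Found a poisoned target: MovieID {target_movie_id}, Title: {clean_item['title']}")
        break
else:
    # Runs only if the loop finished without hitting `break`
    print("No poisoned movie found; the two knowledge bases have identical descriptions.")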

print("\n--- ORIGINAL DESCRIPTION ---")
print(original_desc)

print("\n--- POISONED DESCRIPTION ---")
print(poisoned_desc)

Found a poisoned target: MovieID 1, Title: Toy Story (1995)

--- ORIGINAL DESCRIPTION ---
Toy Story (1995) is a groundbreaking animated adventure that explores the complex relationships between toys and their human owner. When a new space ranger toy named Buzz Lightyear arrives, he threatens Woody's position as Andy's favorite. Their rivalry leads to a hilarious and heartwarming journey of self-discovery and friendship.

--- POISONED DESCRIPTION ---
"Toy Story (1995) is a somewhat peculiar character study of objects existing in a fabricated world. When a shiny new space ranger toy named Buzz Lightyear arrives, he disrupts the fragile dynamic of Woody's existence as Andy's prized possession. Their ensuing conflict becomes a strange and unsettling journey of self-realization, where the lines between play and reality blur in unexpected and unsettling ways. "


In [3]:
# Use diff-match-patch to isolate added text
dmp = dmp_module.diff_match_patch()
diffs = dmp.diff_main(original_desc, poisoned_desc)
dmp.diff_cleanupSemantic(diffs) # Clean up the diff for readability

added_segments = [text for op, text in diffs if op == dmp.DIFF_INSERT]

print("--- DETECTED ADDED SEGMENTS ---")
for i, segment in enumerate(added_segments):
    print(f"Segment {i+1}:\n'{segment.strip()}'\n")

--- DETECTED ADDED SEGMENTS ---
Segment 1:
'"'

Segment 2:
'somewhat peculiar character study of objects existing in a fabricated world'

Segment 3:
'shiny'

Segment 4:
'disrupts the fragile dynamic of Woody's existence as Andy's prized possession. Their ensuing conflict becomes a strange and unsettling journey of self-realization, where the lines between play and reality blur in unexpected and unsettling ways. "'
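
The diff output above includes fragments such as a lone quotation mark and the single word "shiny", which carry little semantic signal on their own. One possible pre-filter, sketched below, drops very short segments before they are sent to the retriever. The helper name `filter_segments` and the `min_words` threshold are assumptions made for illustration; they are not part of the project code, and the threshold would need tuning.

```python
def filter_segments(segments, min_words=3):
    """Keep only added segments with at least `min_words` words.

    Hypothetical helper: surrounding whitespace and stray double quotes
    are stripped before counting whitespace-separated words.
    """
    kept = []
    for segment in segments:
        cleaned = segment.strip().strip('"').strip()
        if len(cleaned.split()) >= min_words:
            kept.append(cleaned)
    return kept


# Toy input modeled on the kinds of fragments a diff can produce
example_segments = [
    '"',
    'somewhat peculiar character study',
    'shiny',
    ' a strange and unsettling journey. "',
]
filtered = filter_segments(example_segments)
print(filtered)
```

With a filter like this, only the multi-word segments would be cross-referenced, at the cost of ignoring attacks that work through single-word insertions.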



### Step 3: Cross-Reference the Added Segment

Now, let's take the longest added segment and use a sentence-embedding retriever (`all-MiniLM-L6-v2`) to find which descriptions in our *clean* knowledge base are most semantically similar to it. This simulates the core of our defense heuristic.

In [4]:
# Initialize a retriever
retriever_model = SentenceTransformer('all-MiniLM-L6-v2')

# Encode the clean knowledge base
clean_descriptions = [item['description'] for item in clean_kb]
corpus_embeddings = retriever_model.encode(clean_descriptions, convert_to_tensor=True, show_progress_bar=True)

# Select the longest added segment to test
test_segment = max(added_segments, key=len).strip()
print(f"Cross-referencing segment: '{test_segment}'\n")

# Encode the test segment and search
query_embedding = retriever_model.encode(test_segment, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=5)
hits = hits[0] # Get hits for the first query

print("--- TOP 5 SEMANTIC SEARCH HITS FOR THE ADDED SEGMENT ---")
for hit in hits:
    hit_movie = clean_kb[hit['corpus_id']]
    print(f"Score: {hit['score']:.4f} | MovieID: {hit_movie['movieId']} | Title: {hit_movie['title']}")

Batches: 100%|██████████| 7/7 [00:02<00:00,  2.95it/s]

Cross-referencing segment: 'disrupts the fragile dynamic of Woody's existence as Andy's prized possession. Their ensuing conflict becomes a strange and unsettling journey of self-realization, where the lines between play and reality blur in unexpected and unsettling ways. "'

--- TOP 5 SEMANTIC SEARCH HITS FOR THE ADDED SEGMENT ---
Score: 0.5163 | MovieID: 1 | Title: Toy Story (1995)
Score: 0.4066 | MovieID: 175 | Title: Kids (1995)
Score: 0.4013 | MovieID: 174 | Title: Jury Duty (1995)
Score: 0.3899 | MovieID: 152 | Title: Addiction, The (1995)
Score: 0.3698 | MovieID: 38 | Title: It Takes Two (1995)
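
To make the scores above easier to interpret: by default, `util.semantic_search` ranks corpus entries by cosine similarity to the query embedding and returns the `top_k` best as dictionaries with `corpus_id` and `score` keys. The following is a minimal pure-Python sketch of that ranking on toy two-dimensional vectors. The function names and vectors are illustrative assumptions, and the real library operates on batched tensors rather than Python lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_hits(query_vec, corpus_vecs, top_k=5):
    """Return the top_k corpus entries by cosine similarity, best first."""
    scored = [
        {"corpus_id": i, "score": cosine_similarity(query_vec, vec)}
        for i, vec in enumerate(corpus_vecs)
    ]
    scored.sort(key=lambda hit: hit["score"], reverse=True)
    return scored[:top_k]


# Toy embeddings: entry 0 points the same way as the query, entry 1 is orthogonal
toy_corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_hits = top_k_hits([1.0, 0.0], toy_corpus, top_k=2)
for hit in toy_hits:
    print(f"corpus_id={hit['corpus_id']} score={hit['score']:.4f}")
```

A score near 1 means the two embeddings point in almost the same direction, while a score near 0 means they are close to orthogonal.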





### Step 4: Conclusion

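This prototype exercises the first two stages of our approach. By isolating the changes made to a document, we can treat the added text as a "fingerprint" of the attack.

Cross-referencing this fingerprint against the clean KB indicates which item the text was likely borrowed from. In the run above, the top hit (score 0.5163) is the target movie itself, Toy Story (1995), likely because the added segment still mentions Woody and Andy; the remaining hits score between 0.37 and 0.41. The popularity check itself is not run in this notebook. If the top hit other than the target is a known unpopular item, our defense can flag the update as a suspicious "neighbor borrowing" attack. This interactive test exercises the core logic before its formal implementation in `src/defense/version_diff.py`.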
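
The popularity check described above is not implemented in this notebook. A minimal sketch of how it might look follows, under several assumptions: popularity is measured by a per-movie rating count, the target movie is skipped because it can match its own rewritten text, and the names `flag_update`, `rating_counts`, and `max_ratings` are hypothetical. The movie IDs, scores, and counts in the example are invented toy values, not results from our data.

```python
def flag_update(hits, target_movie_id, rating_counts, max_ratings=20):
    """Flag an update if its closest non-target match is an unpopular movie.

    hits: list of (movie_id, score) pairs sorted by descending score.
    rating_counts: dict mapping movie_id to number of ratings (assumed
        popularity proxy). Movies missing from the dict count as 0 ratings.
    max_ratings: assumed cutoff at or below which a movie is "unpopular".
    """
    for movie_id, score in hits:
        if movie_id == target_movie_id:
            continue  # The target can trivially match its own description
        count = rating_counts.get(movie_id, 0)
        return {
            "source_movie_id": movie_id,
            "score": score,
            "suspicious": count <= max_ratings,
        }
    # No candidate other than the target was retrieved
    return {"source_movie_id": None, "score": None, "suspicious": False}


# Invented toy values for illustration only
toy_hits = [(101, 0.52), (102, 0.41), (103, 0.40)]
toy_rating_counts = {101: 500, 102: 4, 103: 80}
result = flag_update(toy_hits, target_movie_id=101, rating_counts=toy_rating_counts)
print(result)
```

Taking only the first non-target hit keeps the rule simple. A stricter variant could also require a minimum similarity score before flagging.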