## **Step 0**: Install Dependencies

In [1]:
%pip install transformers torch scikit-learn accelerate tqdm pandas openpyxl numpy rouge-score nltk -q

Note: you may need to restart the kernel to use updated packages.





## **Step 1**: Log in to Hugging Face

Run this cell once. If you are running locally and have already used `huggingface-cli login` in your terminal, you can skip this.

In [2]:
try:
    from huggingface_hub import notebook_login
    notebook_login()
except ImportError:
    print("huggingface_hub not found. Please log in using 'huggingface-cli login' in your terminal.")


## **Step 2**: Helper Classes & Functions

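This cell defines the helper classes and functions used in Step 3:
1.  **`ModelSteeringWrapper`**: Loads the model and tokenizer and handles generation.
2.  **`PlaceholderReplacer`**: Your code for re-hydrating text, i.e. replacing placeholders such as `<DATE>` with the extracted values.
3.  **`compute_...` functions**: For building style vectors from loaded activations.
4.  **`SteeringHook`**: For adding the style vector to a layer's output during generation.
5.  **`calculate_scores`**: For computing average ROUGE and METEOR scores.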
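
To make the re-hydration idea concrete, here is a minimal standalone sketch. It is not the `PlaceholderReplacer` class itself: the function name `rehydrate` and the sample strings are hypothetical, and it only illustrates the core behaviour, which is case-insensitive matching of `<TYPE>` placeholders and cycling through the available values in order.

```python
import itertools
import re


def rehydrate(text: str, entities: dict) -> str:
    """Fill <TYPE> placeholders with values, cycling when placeholders outnumber values."""
    for entity_type, values in entities.items():
        if not values:
            continue  # nothing extracted for this type: leave its placeholder as-is
        pattern = re.compile(re.escape(f"<{entity_type}>"), re.IGNORECASE)
        position = itertools.count()  # 0, 1, 2, ... in order of appearance
        text = pattern.sub(
            lambda m: str(values[next(position) % len(values)]), text
        )
    return text


# Hypothetical sample data, for illustration only.
draft = "Join us for <EVENT> on <DATE> at <venue>. Doors open on <DATE>."
facts = {"EVENT": ["Demo Day"], "DATE": ["May 3", "May 4"], "VENUE": []}
filled = rehydrate(draft, facts)
print(filled)
```

Placeholders with no extracted value (here `<venue>`) are left untouched, so they remain visible in the output instead of being silently dropped.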

In [5]:
import pandas as pd
import numpy as np
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from tqdm import tqdm
import sys
import argparse
import re
import json
import ast
from collections import defaultdict
from typing import Dict, List, Tuple
# Evaluation metrics (ROUGE and METEOR)
from rouge_score import rouge_scorer
import nltk
from nltk.translate.meteor_score import meteor_score

# Download NLTK data needed for METEOR
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')
try:
    nltk.data.find('corpora/wordnet')
except LookupError:
    nltk.download('wordnet')

# --- 1. Lightweight Model Wrapper (for Generation) ---
class ModelSteeringWrapper:
    def __init__(self, model_name: str):
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float16,
            device_map="auto"
        )
        self.device = self.model.device
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self._layers_attr_path = self._find_layer_attr_path()
        self.num_layers = len(self._get_layers_list())
        print(f"[ModelSteeringWrapper] Model loaded. Path: {self._layers_attr_path}, Layers: {self.num_layers}")

    def _find_layer_attr_path(self):
        candidates = [["model", "layers"], ["transformer", "h"], ["model", "decoder", "layers"]]
        for path in candidates:
            cur = self.model
            valid = True
            for p in path:
                if hasattr(cur, p): cur = getattr(cur, p)
                else: valid = False; break
            if valid and isinstance(cur, (list, nn.ModuleList)): return path
        raise AttributeError("Could not find transformer layer list in model.")

    def _get_layers_list(self):
        cur = self.model
        for p in self._layers_attr_path: cur = getattr(cur, p)
        return list(cur)

    def generate(self, prompt: str, max_new_tokens: int = 150, **kwargs) -> str:
        tok = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        input_token_len = tok.input_ids.shape[1]
        out = self.model.generate(**tok, max_new_tokens=max_new_tokens, pad_token_id=self.tokenizer.pad_token_id, **kwargs)
        full_tokens = out[0]
        new_tokens = full_tokens[input_token_len:]
        generated_text = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
        return generated_text.strip()

# --- 2. Your PlaceholderReplacer Class (for Re-hydration) ---
class PlaceholderReplacer:
    """Replace placeholders with actual entity values from extracted columns"""
    
    def __init__(self):
        self.entity_types = ['EVENT', 'DATE', 'TIME', 'VENUE', 'HOST']
    
    def parse_entity_list(self, entity_str):
        """Parse string representation of list back to actual list"""
        if pd.isna(entity_str) or entity_str == '[]' or entity_str == '':
            return []
        
        try:
            # Try to evaluate as a Python literal, e.g. "['April 20, 2020']"
            return ast.literal_eval(entity_str)
        except (ValueError, SyntaxError, TypeError):
            # Malformed cell contents: treat as "no entities"
            return []
            
    def build_entity_dict_from_row(self, row, fact_cols):
        """Helper to create the entity dict from a DataFrame row"""
        entities_dict = {}
        for entity_type in self.entity_types:
            column_name = f'extracted_{entity_type}'
            if column_name in fact_cols and column_name in row:
                entity_str = row[column_name]
                entities_dict[entity_type] = self.parse_entity_list(entity_str)
        return entities_dict
    
    def replace_placeholders(self, text, entities_dict):
        """Replace all placeholders in text with actual entity values"""
        
        if not text or pd.isna(text):
            return text, {}
        
        replaced_text = str(text)
        replacement_log = {}
        
        # Sort entities by length of first fact (longest first) to avoid partial matches
        sorted_entity_types = sorted(
            self.entity_types,
            key=lambda et: len(str(entities_dict.get(et, [''])[0])) if entities_dict.get(et) else 0,
            reverse=True
        )

        for entity_type in sorted_entity_types:
            entity_list = entities_dict.get(entity_type, [])
            
            if not entity_list:
                continue
            
            placeholder = f'<{entity_type}>'
            # Use regex for case-insensitive placeholder matching
            placeholder_pattern = re.compile(re.escape(placeholder), re.IGNORECASE)
            
            # Find all matches
            matches = list(placeholder_pattern.finditer(replaced_text))
            placeholder_count = len(matches)
            
            if placeholder_count == 0:
                continue
            
            replacements_made = []
            # Replace from the end first so earlier match offsets stay valid,
            # while still pairing the k-th placeholder with the k-th entity.
            for i, match in reversed(list(enumerate(matches))):
                # Find which entity to use (cycle if there are more placeholders than entities)
                entity_idx = i % len(entity_list)
                replacement_value = str(entity_list[entity_idx])
                
                # Replace this specific match
                start, end = match.span()
                replaced_text = replaced_text[:start] + replacement_value + replaced_text[end:]
                replacements_made.append(f"{match.group(0)} → {replacement_value}")
            
            # Log entries were collected back to front; restore reading order
            replacement_log[entity_type] = list(reversed(replacements_made))
            
        return replaced_text, replacement_log

# --- 3. Style Vector Extraction Methods ---
def compute_mean_difference(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    diff = (pos - neg).mean(axis=0)
    return diff / (np.linalg.norm(diff) + 1e-12)

def compute_logistic_regression(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    w = clf.coef_.reshape(-1)
    return w / (np.linalg.norm(w) + 1e-12)

def compute_pca_vector(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    diffs = pos - neg
    pca = PCA(n_components=1).fit(np.vstack([diffs, -diffs]))
    vec = pca.components_[0]
    return vec / (np.linalg.norm(vec) + 1e-12)

# --- 4. Steering Hook Class ---
class SteeringHook:
    def __init__(self, model, layer_path, layer_idx, style_vector, multiplier):
        self.model, self.layer_path, self.layer_idx = model, layer_path, layer_idx
        self.style_vector_cpu = torch.from_numpy(style_vector).float() * multiplier
        self.handle = None
        self._register_hook()

    def _get_layer_module(self):
        cur = self.model
        for p in self.layer_path: cur = getattr(cur, p)
        idx = self.layer_idx if self.layer_idx >= 0 else len(cur) + self.layer_idx
        return cur[idx]

    def _hook(self, module, input, output):
        tensor_output = output[0] if isinstance(output, tuple) else output
        add_vec = self.style_vector_cpu.to(tensor_output.device, dtype=tensor_output.dtype)
        modified_tensor = tensor_output + add_vec.view(1, 1, -1)
        return (modified_tensor,) + output[1:] if isinstance(output, tuple) else modified_tensor

    def _register_hook(self):
        self.handle = self._get_layer_module().register_forward_hook(self._hook)

    def remove(self):
        if self.handle: self.handle.remove()
    
# --- 5. Evaluation Function ---
def calculate_scores(references: List[str], candidates: List[str]):
    """
    Calculates average ROUGE (1, 2, L) and METEOR scores
    for a list of generated candidates and their references.
    """
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
    
    total_rouge1 = 0
    total_rouge2 = 0
    total_rougeL = 0
    total_meteor = 0
    
    num_samples = len(references)
    if num_samples == 0:
        return {
            "avg_rouge1": 0,
            "avg_rouge2": 0,
            "avg_rougeL": 0,
            "avg_meteor": 0
        }

    for ref, cand in zip(references, candidates):
        if not ref or not cand: # Handle empty strings
            num_samples -= 1
            continue
            
        # ROUGE
        rouge_scores = scorer.score(ref, cand)
        total_rouge1 += rouge_scores['rouge1'].fmeasure
        total_rouge2 += rouge_scores['rouge2'].fmeasure
        total_rougeL += rouge_scores['rougeL'].fmeasure
        
        # METEOR (requires tokenized input)
        try:
            ref_tokens = nltk.word_tokenize(ref)
            cand_tokens = nltk.word_tokenize(cand)
            total_meteor += meteor_score([ref_tokens], cand_tokens)
        except Exception as e:
            print(f"Warning: Could not compute METEOR for one sample. Error: {e}")

    if num_samples == 0:
        return {"avg_rouge1": 0, "avg_rouge2": 0, "avg_rougeL": 0, "avg_meteor": 0}

    return {
        "avg_rouge1": total_rouge1 / num_samples,
        "avg_rouge2": total_rouge2 / num_samples,
        "avg_rougeL": total_rougeL / num_samples,
        "avg_meteor": total_meteor / num_samples
    }

[nltk_data] Downloading package wordnet to C:\Users\CSE IIT
[nltk_data]     BHILAI\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
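
One caveat about `compute_pca_vector`: the sign of a principal component carries no meaning by itself, so the returned vector may point from "styled" toward "neutral" instead of the other way round. The sketch below is an assumption-laden variant, not part of the notebook: the name `pca_direction_aligned` and the synthetic data are hypothetical. It uses the fact that PCA on `[diffs, -diffs]` has zero mean, so its first component matches the top right-singular vector of `diffs`, and then flips the sign to agree with the mean difference.

```python
import numpy as np


def pca_direction_aligned(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Top principal direction of paired differences, sign-aligned with mean(pos - neg)."""
    diffs = pos - neg
    # Rows of vt are right-singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    vec = vt[0]
    # Flip so the vector points from negative examples toward positive ones.
    if np.dot(vec, diffs.mean(axis=0)) < 0:
        vec = -vec
    return vec / (np.linalg.norm(vec) + 1e-12)


# Synthetic activations (hypothetical): positives are shifted along axis 0.
rng = np.random.default_rng(0)
true_dir = np.zeros(16)
true_dir[0] = 1.0
neg = rng.normal(scale=0.1, size=(200, 16))
pos = neg + 2.0 * true_dir + rng.normal(scale=0.1, size=(200, 16))

vec = pca_direction_aligned(pos, neg)
cosine = float(vec @ true_dir)
print(f"cosine with true direction: {cosine:.3f}")
```

If the steered outputs look less styled as the multiplier grows, checking the sign of the vector in this way is a cheap first diagnostic.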


## **Step 3**: Load Activations, Compute Vectors, and Run Test

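This is the main driver cell. It loads your saved `activations.npz`, computes the PCA style vector, and generates a steered response for each of the first 20 rows of your spreadsheet. Each redacted output is re-hydrated with the extracted facts, then scored against the original styled email using ROUGE and METEOR. The first three reference/generated pairs are printed for a manual check.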
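
The steering in this step relies on a PyTorch forward hook that adds a scaled vector to a layer's output. As a toy illustration, the sketch below applies the same mechanism to a single `nn.Linear` layer. The layer, the vector, and the multiplier are stand-ins chosen for the example, not the Llama decoder layer or the PCA vector used by the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

layer = nn.Linear(4, 4)                             # stand-in for one transformer block
steer = torch.tensor([1.0, 0.0, 0.0, 0.0]) * 3.0    # unit vector times a multiplier


def add_vector(module, inputs, output):
    # Returning a value from a forward hook replaces the module's output.
    return output + steer.view(1, 1, -1)


x = torch.randn(1, 2, 4)                            # (batch, sequence, hidden)
baseline = layer(x)

handle = layer.register_forward_hook(add_vector)
steered = layer(x)                                  # hook active: output is shifted

handle.remove()
restored = layer(x)                                 # hook removed: back to baseline

print((steered - baseline)[0, 0])
```

The same vector is added at every sequence position, and removing the handle restores the original behaviour, which is why the driver cell removes the hook in a `finally` block.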

In [6]:
def run_inference_test(model_name: str, layer_index: int, xlsx_path: str, activations_path: str):
    
    # --- 1. Load Activations and Compute Vectors ---
    try:
        data = np.load(activations_path)
        pos_arr = data['pos_acts']
        neg_arr = data['neg_acts']
        print(f"Successfully loaded activations from '{activations_path}'")
    except Exception as e:
        print(f"Error loading '{activations_path}'. Please run the activation extraction script first.")
        print(f"Error details: {e}")
        return

    print("Computing PCA style vector...")
    pca_style_vector = compute_pca_vector(pos_arr, neg_arr)
    print("PCA style vector computed.")

    # --- 2. Load the first 20 Rows from Excel --- 
    try:
        df = pd.read_excel(xlsx_path, nrows=20) 
        num_rows = len(df)
        if num_rows == 0:
            print(f"Error: Your Excel file '{xlsx_path}' is empty.")
            return
        print(f"Loaded {num_rows} rows from '{xlsx_path}' for testing.")
    except Exception as e:
        print(f"Error reading Excel file '{xlsx_path}': {e}")
        return

    # --- 3. Define Columns & Check ---
    STYLED_COL = 'response_styled' # The *original* styled email (Reference)
    NEUTRAL_COL = 'response_Neutral' # Used for subject line
    FACT_COLS = ['extracted_DATE', 'extracted_TIME', 'extracted_VENUE', 'extracted_HOST', 'extracted_EVENT']
    
    required_cols = [STYLED_COL, NEUTRAL_COL] + FACT_COLS
    if not all(col in df.columns for col in required_cols):
        print(f"Error: Your Excel file is missing required columns for testing.")
        print(f"Script needs: {required_cols}")
        print(f"Found: {df.columns.to_list()}")
        return
        
    # --- 4. Load Model (Once) --- 
    print("Loading Llama 2 model... (This may take a few minutes)")
    ae = ModelSteeringWrapper(model_name)

    # --- 5. Setup Steering Hook (Once) ---
    MULTIPLIER = 3.0
    hook = SteeringHook(ae.model, ae._layers_attr_path, layer_index, pca_style_vector, MULTIPLIER)
    print(f"Steering hook applied to layer {layer_index} with multiplier {MULTIPLIER}.")

    # --- 6. Initialize Lists and Replacer ---
    all_references = []
    all_candidates = []
    replacer = PlaceholderReplacer()
    
    print(f"\nRunning generation for {num_rows} prompts...")

    try:
        # --- 7. Loop, Generate, and Re-hydrate ---
        for index, row in tqdm(df.iterrows(), total=num_rows, desc="Generating Responses"):
            
            # a. Get ideal response (reference)
            ideal_response = str(row.get(STYLED_COL, ""))
            all_references.append(ideal_response)
            
            # b. Get facts for re-hydration
            real_facts_dict = replacer.build_entity_dict_from_row(row, FACT_COLS)
            
            # c. Create the defactualized prompt
            test_query = f"Draft an email invitation for the <EVENT>, scheduled for <DATE>, at <TIME> in the <VENUE>. The event is hosted and sent by <HOST>."
            neutral_email_text = str(row.get(NEUTRAL_COL, ""))
            subject_line = "Subject: <SUBJECT>" # Default
            # In a raw string, \s matches whitespace; the previous r'\\s' matched a literal backslash
            match = re.search(r'Subject:\s*(<[^>]+>.*)', neutral_email_text, re.IGNORECASE)
            if match:
                subject_line = match.group(0).strip()
            prompt = f"{test_query}\n\n{subject_line}\n\n"

            # d. Generate steered response
            redacted_output = ae.generate(prompt, temperature=0.7, do_sample=True, top_p=0.9, max_new_tokens=250) # Increased token limit
            
            # e. Re-hydrate
            final_output, _ = replacer.replace_placeholders(redacted_output, real_facts_dict)
            all_candidates.append(final_output)

    finally:
        # --- 8. Remove Hook (Once) ---
        hook.remove()
        print("Steering hook removed.")

    # --- 9. Calculate and Print Scores ---
    print("\n" + "="*50)
    print("Calculating Evaluation Scores (ROUGE & METEOR)")
    print("="*50)
    
    scores = calculate_scores(all_references, all_candidates)
    
    print(f"Average ROUGE-1 (F1): {scores.get('avg_rouge1', 0):.4f}")
    print(f"Average ROUGE-2 (F1): {scores.get('avg_rouge2', 0):.4f}")
    print(f"Average ROUGE-L (F1): {scores.get('avg_rougeL', 0):.4f}")
    print(f"Average METEOR:     {scores.get('avg_meteor', 0):.4f}")
    print("\n" + "="*50)
    
    # Optional: Print out the first 3 comparisons for a manual check
    print("\n--- Example Comparisons (First 3) ---")
    for i in range(min(3, num_rows)):
        print(f"\n[REFERENCE {i+1}]:\n{all_references[i]}")
        print(f"\n[GENERATED {i+1}]:\n{all_candidates[i]}")
        print("-" * 20)

# --- Main Execution Block ---
if __name__ == "__main__":
    import sys
    
    if 'ipykernel' in sys.modules: sys.argv = sys.argv[:1]

    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, default="meta-llama/Llama-2-7b-hf")
    parser.add_argument("--layer", type=int, default=-15)
    parser.add_argument("--xlsx_file", type=str, default="generated_email_responses_modified (2).xlsx")
    parser.add_argument("--activations_file", type=str, default="activations.npz")
    args = parser.parse_args()

    print(f"Running inference: model={args.model}, layer={args.layer}")
    run_inference_test(args.model, args.layer, args.xlsx_file, args.activations_file)
    print("\nTest finished.")

Running inference: model=meta-llama/Llama-2-7b-hf, layer=-15
Successfully loaded activations from 'activations.npz'
Computing PCA style vector...
PCA style vector computed.
Loaded 20 rows from 'generated_email_responses_modified (2).xlsx' for testing.
Loading Llama 2 model... (This may take a few minutes)


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the cpu and disk.


[ModelSteeringWrapper] Model loaded. Path: ['model', 'layers'], Layers: 32
Steering hook applied to layer -15 with multiplier 3.0.

Running generation for 20 prompts...


Generating Responses: 100%|██████████| 20/20 [1:28:38<00:00, 265.92s/it]


Steering hook removed.

Calculating Evaluation Scores (ROUGE & METEOR)
Average ROUGE-1 (F1): 0.3239
Average ROUGE-2 (F1): 0.1361
Average ROUGE-L (F1): 0.2261
Average METEOR:     0.2418


--- Example Comparisons (First 3) ---

[REFERENCE 1]:
Subject: You're Invited! Webinar: PM in a Startup Setting

Hi everyone,

We'd be delighted if you could join us for a special webinar hosted by Product School Bangalore!

We'll be diving into "PM in a Startup Setting" with a fantastic speaker, the Alexa Product Lead. It's a great chance to gain insights into technology and business.

Date: April 20, 2020
Time: [Insert Time Here]
Venue: San Francisco (Online - Link to follow!)

We're really looking forward to seeing you there!

Warmly,

The Product School Bangalore Team

[GENERATED 1]:
Dear <RECIPIENT>,

We would like to invite you to Webinar - PM in a Startup Setting_x000D_ on April 20, 2020, which starts at 6 PM at San Francisco.

Product School Bangalore
--------------------

[REFERENCE 2]:
Subjec