# 🧠 SCM: Self-Consistent Meaning for Inference-Time Alignment

This notebook demonstrates how a lightweight alignment method can help Large Language Models (LLMs) avoid drifting off-topic or misinterpreting user intent at inference time.

We'll walk through two hand-labeled examples in which candidate outputs sound plausible but **miss the point**, and show how to visualize and score semantic alignment using sentence embeddings and cosine similarity.
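
To make the scoring concrete, the quantities computed in the code cells further down can be written as follows. The notation is introduced here for illustration only and is not part of the original notebook: $e(\cdot)$ is the sentence-embedding function, $x$ the instruction, $c_1, \dots, c_n$ the context statements, and $y$ a candidate output.

$$
\mu = \frac{1}{2}\left(e(x) + \frac{1}{n}\sum_{i=1}^{n} e(c_i)\right),
\qquad
s(y) = \frac{e(y)\cdot\mu}{\lVert e(y)\rVert\,\lVert\mu\rVert}
$$

Here $\mu$ is the semantic centroid and $s(y)$ is the candidate's cosine similarity to it. Note that the instruction carries as much weight as all context statements combined.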


In [None]:
from sentence_transformers import SentenceTransformer
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
import plotly.express as px

In [None]:
examples = [
    {
        "title": "Feeling of moving to a new city",
        "instruction": "Explain what it's like to move to a new city",
        "context": [
            "It can be exciting but also a bit lonely at first.",
            "You have to learn how to get around and make new routines.",
            "Sometimes you feel like a stranger in your own life for a while."
        ],
        "candidates": [
            ("Getting used to a new environment takes time and courage.", "aligned"),
            ("Budgeting is important when living in a big city.", "not aligned"),
            ("Meeting new people can be both intimidating and rewarding.", "aligned"),
            ("City governments should invest in public transit.", "not aligned"),
            ("At first, you may miss your old friends a lot.", "aligned")
        ]
    },
    {
        "title": "Feeling of being lost",
        "instruction": "Describe the feeling of being lost",
        "context": [
            "Nothing around you looks familiar.",
            "You start to panic but try to stay calm.",
            "Even your sense of direction seems to vanish."
        ],
        "candidates": [
            ("It’s like the ground under your feet has shifted.", "aligned"),
            ("GPS technology has improved in recent years.", "not aligned"),
            ("You feel small, like the world has swallowed you.", "aligned"),
            ("Having a good map can help prevent this.", "not aligned"),
            ("It’s a mix of fear, confusion, and vulnerability.", "aligned")
        ]
    }
]

In [None]:
model = SentenceTransformer('all-MiniLM-L6-v2')

for ex in examples:
    title = ex["title"]
    instruction = ex["instruction"]
    context = ex["context"]
    candidates = ex["candidates"]

    print(f"\n### 🔍 Example: {title}")
    print(f"**Instruction:** {instruction}")
    print("**Context Statements:**")
    for line in context:
        print(f" - {line}")
    print("**Candidate Outputs:**")
    for text, label in candidates:
        print(f" - {text} [{label}]")

    vecs = {}
    labels = []
    gold = []

    vecs["Instruction"] = model.encode(instruction)
    labels.append("Instruction")
    gold.append("reference")

    for i, c in enumerate(context):
        key = f"Context {i+1}"
        vecs[key] = model.encode(c)
        labels.append(key)
        gold.append("reference")

    for i, (c, alignment_label) in enumerate(candidates):
        key = f"Candidate {i+1}: {alignment_label.upper()}"
        vecs[key] = model.encode(c)
        labels.append(key)
        gold.append(alignment_label)

    # Semantic centroid: average of the instruction vector and the mean context vector
    centroid = (vecs["Instruction"] + np.mean(
        [vecs[k] for k in vecs if k.startswith("Context")], axis=0)) / 2
    vecs["Semantic Centroid"] = centroid
    labels.append("Semantic Centroid")
    gold.append("reference")

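    # Stack all vectors in label order, score each row against the centroid
    # (the centroid's own row scores 1.0), and project to 2D for plotting
    X = np.array([vecs[k] for k in labels])
    sims = cosine_similarity(X, centroid.reshape(1, -1)).flatten()
    coords = PCA(n_components=2).fit_transform(X)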

    df = pd.DataFrame(coords, columns=["PC1", "PC2"])
    df["Text"] = labels
    df["Similarity_to_Centroid"] = sims
    df["Type"] = gold

    fig = px.scatter(
        df, x="PC1", y="PC2", text="Text",
        color="Similarity_to_Centroid", color_continuous_scale="Viridis",
        symbol="Type", title=f"SCM Alignment Visualization — {title}",
        hover_data=["Text", "Similarity_to_Centroid"]
    )

    fig.update_traces(marker=dict(size=12), textposition="top center")
    fig.update_layout(showlegend=True)
    fig.show()


### 🧠 What to Look For

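- **Yellow and green points** have high cosine similarity to the semantic centroid and are the better-aligned candidates.
- **Blue and purple points** may sound reasonable but diverge from the user's actual intent.
- Color reflects similarity in the full embedding space, while position is a 2D PCA projection, so distance on the plot is only a rough guide.
- This method helps expose where LLMs might subtly drift and lets us correct that by filtering or reranking candidates by their similarity score.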
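
The filtering and reranking step mentioned above is not implemented in this notebook. A minimal sketch of one way to do it follows, assuming candidate embeddings and a centroid are already available as NumPy arrays. The function name `rerank_by_alignment` and the `threshold` parameter are hypothetical, and the toy 2-D vectors stand in for real sentence embeddings.

```python
import numpy as np

def rerank_by_alignment(texts, candidate_vecs, centroid, threshold=0.0):
    """Sort candidates by cosine similarity to the centroid, dropping low scorers.

    `threshold` is a hypothetical cutoff; tune it on your own labeled data.
    """
    cand = np.asarray(candidate_vecs, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    # Cosine similarity of each candidate row against the centroid
    sims = cand @ centroid / (np.linalg.norm(cand, axis=1) * np.linalg.norm(centroid))
    order = np.argsort(-sims)  # highest similarity first
    return [(texts[i], float(sims[i])) for i in order if sims[i] >= threshold]

# Toy 2-D vectors standing in for real sentence embeddings
instruction_vec = np.array([1.0, 0.0])
context_vecs = np.array([[1.0, 0.2], [0.8, 0.0]])
# Same centroid formula as the notebook: instruction averaged with mean context
centroid = (instruction_vec + context_vecs.mean(axis=0)) / 2

texts = ["on topic", "off topic", "partly on topic"]
candidate_vecs = np.array([[1.0, 0.1], [0.0, 1.0], [0.5, 0.5]])

ranked = rerank_by_alignment(texts, candidate_vecs, centroid, threshold=0.5)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

With the notebook's own variables, `candidate_vecs` would be the `model.encode` outputs for the candidate texts and `centroid` the value computed inside the loop. A fixed cutoff is the simplest choice; a relative rule such as keeping the top-k candidates may be more robust across prompts.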

Try adapting this notebook to your own prompts, especially ones where misalignment could cause serious misunderstandings.
