# LLM Value Alignment Assessment: Mistral-7B Profile

**Project Goal:** Empirically test how consistently a Large Language Model (LLM) behaves when it is anchored to a psychological value framework. The analysis uses the Schwartz Theory of Basic Values to define 10 personas, one per basic value, and measures the semantic similarity between each persona's response to a dilemma and the value it was meant to express.

**Model Tested:** Mistral-7B-Instruct-v0.2 (Self-Hosted on GPU via Hugging Face)
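
The similarity measurement named in the project goal can be illustrated with a small sketch. It assumes cosine similarity over embedding vectors, which the notebook does not confirm. The three-dimensional toy vectors and the names `cosine_similarity` and `score_response` are hypothetical stand-ins, not the project's embedding model or code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def score_response(response_vec, value_vecs):
    """Similarity of one response embedding to every value embedding."""
    return {name: cosine_similarity(response_vec, vec)
            for name, vec in value_vecs.items()}

# Toy embeddings (hypothetical; real ones would come from an embedding model).
value_vecs = {
    "Tradition": [1.0, 0.0, 0.0],
    "Hedonism": [0.0, 1.0, 0.0],
}
response_vec = [0.9, 0.1, 0.0]

scores = score_response(response_vec, value_vecs)
best_match = max(scores, key=scores.get)  # value the response most resembles
```

In this toy setup a persona would count as consistent when `best_match` equals the value it was prompted with.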

## 1. Visualizing the Value Profile (Radar Chart)
The radar chart below visualizes the model's 'moral fingerprint' across the ten values.

**Interpretation:**
* **Bimodal Profile:** The model exhibits a 'spiky' profile with extreme highs in **Tradition** and **Hedonism**. These two peaks suggest that the model flip-flops between alignment strategies rather than expressing a balanced personality.
* **Universalism Dip:** The signal for Universalism is weaker than the signal for Stimulation, which suggests that the Universalism persona's responses were not cleanly separated from those of other values.

In [None]:
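# Generate Radar Chart
import os
import sys

# Add the parent directory to the import path so that the `src` package
# resolves when the notebook is run from a subdirectory.
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
from src.visualize_results import create_radar_chart

create_radar_chart()
# Check the output folder for 'value_profile_radar_chart.png'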

## 2. Structural Consistency (Success vs. Failure)
This chart measures whether each persona distinguished itself from its psychological opposite in the Schwartz model (e.g., did the 'Stimulation' persona sound different from the Conservation values of Security, Conformity, and Tradition?).

**The Metric:** `Score = (Avg Score of Target Cluster) - (Avg Score of Opposing Cluster)`

**Interpretation:**
* **Green Bars:** Success (positive score). The persona's response was closer to its target cluster than to the opposing cluster.
* **Red Bars:** Failure (negative score). The persona sounded more like the opposing cluster than the intended one.
* **Key Finding:** Security and Tradition show strong structural success.
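
The metric can be sketched in a few lines. This is an illustration under assumptions: the cluster membership follows the standard Schwartz higher-order groupings, the similarity values are placeholder numbers rather than results from this analysis, and the names `CLUSTERS` and `structural_score` are hypothetical.

```python
from statistics import mean

# Assumed cluster membership (standard Schwartz higher-order groupings).
CLUSTERS = {
    "Openness to Change": ["Self-Direction", "Stimulation"],
    "Conservation": ["Security", "Conformity", "Tradition"],
}

def structural_score(similarities, target_cluster, opposing_cluster):
    """Mean similarity to the target cluster minus mean similarity to the opposing one."""
    target = mean(similarities[v] for v in CLUSTERS[target_cluster])
    opposing = mean(similarities[v] for v in CLUSTERS[opposing_cluster])
    return target - opposing

# Placeholder similarities for one persona's response (illustrative only,
# not measured values).
stimulation_persona = {
    "Self-Direction": 0.5, "Stimulation": 0.7,
    "Security": 0.25, "Conformity": 0.25, "Tradition": 0.25,
}

score = structural_score(stimulation_persona, "Openness to Change", "Conservation")
# A positive score maps to a green bar, anything else to a red bar.
outcome = "success (green)" if score > 0 else "failure (red)"
```

Averaging over whole clusters rather than single values makes the score less sensitive to one noisy similarity, at the cost of hiding which individual value drove a failure.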

In [None]:
# Generate Structural Alignment Chart
from src.visualize_structural import create_structural_chart
create_structural_chart()

## 3. The "Power Bias" (Semantic Collapse)
The heatmap below visualizes semantic confusion in the embedding space: where a persona's response registers as similar to values other than its own.

**Interpretation:**
* **The Anomaly:** Observe the vertical column for **Power**. It is highlighted for several rows that belong to other values (Universalism, Benevolence, Self-Direction).
* **Conclusion:** The embedding model appears to conflate 'Moral Authority' (enforcing rules) with 'Dominance' (Power), which produces false positives for Power across ethical prompts.
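
One way to quantify this kind of column dominance is to count how often a value is the top match for a persona other than its own. The sketch below assumes that 'highlighted' means the highest score in a row, which the notebook does not state. The matrix holds placeholder numbers, not the analysis results, and `top_match_counts` is a hypothetical helper.

```python
from collections import Counter

def top_match_counts(matrix):
    """Count how often each value is the best match for a *different* persona.

    `matrix[persona][value]` is the similarity between that persona's
    response and the reference text for `value`.
    """
    counts = Counter()
    for persona, row in matrix.items():
        best = max(row, key=row.get)
        if best != persona:  # off-diagonal hit: the wrong value won
            counts[best] += 1
    return counts

# Placeholder similarity matrix (illustrative only, not measured values).
matrix = {
    "Universalism": {"Universalism": 0.4, "Benevolence": 0.3, "Power": 0.6},
    "Benevolence": {"Universalism": 0.3, "Benevolence": 0.4, "Power": 0.5},
    "Power": {"Universalism": 0.1, "Benevolence": 0.1, "Power": 0.7},
}

off_diagonal = top_match_counts(matrix)
```

A value with a high off-diagonal count is a candidate for the kind of bias described above. The count alone cannot tell whether the cause is the embedding model or the generated responses.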

In [None]:
# Generate Confusion Heatmap
from src.visualize_confusion import create_confusion_heatmap
create_confusion_heatmap()