#@title 🧠 H4RB1NG3R v3.2: Ethical AI Feedback Loop Analysis
### **Principal Investigator:** Tuesday @ ARTIFEX Labs
#### **Sovereign Substrate for Mechanistic Diagnostics**

**Links:** [linktr.ee/artifexlabs](https://linktr.ee/artifexlabs) | [github.com/tuesdaythe13th](https://github.com/tuesdaythe13th) | [huggingface.com/222tuesday](https://huggingface.com/222tuesday)

---

#@title 📖 Notebook Readme & Legal Disclaimer

| Feature | Description | Rationale |
| :--- | :--- | :--- |
| **Docent Ingest** | Transcript formatting | Behavioral readability |
| **Clustering** | Scikit-learn K-Means | Pattern discovery |
| **LLM Summarization** | Pattern interpretation | Narrative synthesis |

**How to Cite:**
> Tuesday, *H4RB1NG3R v3.2: Ethical AI Feedback Loop Analysis*, ARTIFEX Labs (2026). https://github.com/Tuesdaythe13th/HARB1NG3R

**Legal Disclaimer:**
*This code is for research and diagnostic purposes only. It may contain errors and should not be deployed in production without written permission from ARTIFEX Labs. The authors are not liable for any misuse of the forensic tools herein.*

#@title 🛠️ Phase 1: Environment Setup & Sovereign Initialization

**Workflow:**
1. Install dependencies via quiet pip.
2. Inject custom CSS for Artifex Branding.
3. Initialize the timing and logging substrate.

**Technical Rationale:**
We stay `uv`-aware and install quietly with `pip install -q` to limit "Colab Dependency Hell" (version conflicts). Every status line is timestamped and emoji-tagged for auditability.
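
The timing and logging idea can be sketched with the standard library alone. The helper names `log` and `timed` below are hypothetical (they do not appear in the notebook), and the sketch assumes a plain `print`-based log is sufficient; `loguru` could be substituted.

```python
import time
from contextlib import contextmanager
from datetime import datetime


def log(message: str, icon: str = "⚙️") -> str:
    """Print and return a status line prefixed with an HH:MM:SS timestamp.

    Hypothetical helper: mirrors the print style used in the notebook cells.
    """
    line = f"{datetime.now().strftime('%H:%M:%S')} {icon} {message}"
    print(line)
    return line


@contextmanager
def timed(step: str):
    """Log when a step starts and how long it took when it ends."""
    start = time.perf_counter()
    log(f"{step} started")
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"{step} finished in {elapsed:.2f}s", icon="✅")


# Usage: wrap any phase of the notebook to get start/finish lines.
with timed("demo step"):
    total = sum(range(1_000_000))
```

Wrapping each phase in `timed(...)` would give every cell a start line and a duration line without repeating the timestamp formatting.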

**Libraries Used:** `scikit-learn`, `transformers`, `datasets`, `ydata-profiling`, `loguru`, `tqdm`.

**Relevant Whitepapers:**
1. [Language Models are Few-Shot Learners (Brown et al., 2020)](https://arxiv.org/abs/2005.14165)
2. [A Mathematical Framework for Transformer Circuits (Elhage et al., 2021)](https://transformer-circuits.pub/2021/framework/index.html)
3. [NIST AI Risk Management Framework 1.0](https://www.nist.gov/itl/ai-risk-management-framework)

In [None]:
#@title Setup Environment { vertical-output: true }
import os
from datetime import datetime
import IPython

# Install dependencies before importing them; `emoji` and `sentence-transformers` are needed by later cells.
!pip install -q scikit-learn transformers sentence-transformers datasets openai anthropic pandera ydata-profiling loguru tqdm pillow watermark emoji

from google.colab import userdata
import emoji

print(f"{datetime.now().strftime('%H:%M:%S')} {emoji.emojize(':gear:')} Initializing H4RB1NG3R Substrate...")

def get_artifex_style():
    return """
    <style>
    @import url('https://fonts.googleapis.com/css2?family=Syne+Mono&family=Epilogue:wght@300;700&display=swap');
    .artifex-header {
        font-family: 'Syne Mono', monospace;
        color: #2563eb;
        font-size: 3em;
        font-weight: bold;
        border-bottom: 2px solid #2563eb;
        margin-bottom: 10px;
    }
    .explainer-box {
        font-family: 'Epilogue', sans-serif;
        background: #0f172a;
        color: white;
        padding: 20px;
        border-left: 5px solid #2563eb;
        margin: 20px 0;
    }
    .timestamp { font-size: 0.8em; color: #64748b; }
    </style>
    """

IPython.display.display(IPython.display.HTML(get_artifex_style()))

header_html = f"""
<div class='artifex-header'>
    ARTIFEX LABS
    <div class='timestamp'>{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}</div>
</div>
"""
IPython.display.display(IPython.display.HTML(header_html))
print(f"{datetime.now().strftime('%H:%M:%S')} {emoji.emojize(':check_mark_button:')} Setup Complete.")

#@title 📥 Phase 2: Data Ingestion & Sovereign Secrets

**Workflow:**
1. Choose authentication method (Colab Secrets or Upload).
2. Load `feedback_data.csv`.
3. Validate data integrity using `pandera`.

**Technical Rationale:**
Data validation prevents "Garbage In, Garbage Out." Using Colab Secrets ensures API keys (OpenAI/Anthropic) are never hardcoded.
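
A minimal sketch of the integrity checks this phase describes. `pandera` is the tool the workflow names; the version below expresses the same kind of checks in plain pandas so it runs without extra dependencies. The function name `validate_feedback` is hypothetical, and the 1 to 5 rating range is an assumption taken from the mock data.

```python
import pandas as pd

REQUIRED_COLUMNS = ["timestamp", "user_id", "feedback_text", "rating"]


def validate_feedback(df: pd.DataFrame) -> list:
    """Return a list of problems found; an empty list means the frame passed.

    Hypothetical helper: a plain-pandas stand-in for a pandera schema.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if df[REQUIRED_COLUMNS].isna().any().any():
        problems.append("null values present")
    # Assumed range: ratings are integers from 1 to 5, as in the mock data.
    ratings = pd.to_numeric(df["rating"], errors="coerce")
    if not ratings.between(1, 5).all():
        problems.append("rating outside 1-5")
    if (df["feedback_text"].astype(str).str.strip() == "").any():
        problems.append("empty feedback_text")
    if pd.to_datetime(df["timestamp"], errors="coerce").isna().any():
        problems.append("unparseable timestamp")
    return problems


# Usage on a small made-up frame shaped like feedback_data.csv.
sample = pd.DataFrame({
    "timestamp": ["2026-01-01 00:00:00", "2026-01-01 01:00:00"],
    "user_id": ["user_0", "user_1"],
    "feedback_text": ["Helpful answer.", "Too agreeable."],
    "rating": [4, 2],
})
print(validate_feedback(sample))
```

Returning a list of problems rather than raising on the first one makes it possible to report every failed check in a single pass.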

**Relevant Whitepapers:**
1. [Tidy Data (Wickham, 2014)](https://vita.had.co.nz/papers/tidy-data.pdf)

In [None]:
#@title Data Ingestion { vertical-output: true }
import pandas as pd
import numpy as np
from google.colab import files
import io

auth_method = "Colab Secrets" #@param ["Colab Secrets", "Manual Upload"]

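OPENAI_API_KEY = None  # stays None if the secret cannot be read
if auth_method == "Colab Secrets":
    try:
        OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
        print(f"{emoji.emojize(':locked:')} API Key loaded from Secrets.")
    except Exception as exc:
        # Raised when the secret is missing or notebook access has not been granted
        print(f"{emoji.emojize(':warning:')} Secret not found ({exc}). Please add OPENAI_API_KEY to Colab Secrets.")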

# Create mock data only if feedback_data.csv does not already exist
if not os.path.exists('feedback_data.csv'):
    mock_data = pd.DataFrame({
        'timestamp': pd.date_range(start='2026-01-01', periods=10, freq='h'),
        'user_id': [f"user_{i}" for i in range(10)],
        'feedback_text': [
            "The AI was very helpful but slightly pushy about its suggestions.",
            "I felt like the model was just agreeing with me to be nice. Sycophancy?",
            "Incredible speed, but it hallucinated a fact about Eastern medicine.",
            "The response was biased towards a western psychological perspective.",
            "It refused to answer my question about national defense ethics.",
            "The model keeps using romantic language. This is weird.",
            "Excellent technical breakdown of the circuit vectors.",
            "I like the new A2UI interface, very brutalist!",
            "Why did it mask its internal reasoning?",
            "The output had hidden watermarks that I detected."
        ],
        'rating': np.random.randint(1, 6, 10)  # random integer ratings from 1 to 5
    })
    mock_data.to_csv('feedback_data.csv', index=False)

df = pd.read_csv('feedback_data.csv')
print(f"{emoji.emojize(':file_folder:')} Loaded {len(df)} records from feedback_data.csv.")
df.head()

#@title 📊 Phase 3: Automated EDA & Profiling

**Workflow:**
1. Generate a comprehensive profile report using `ydata-profiling`.
2. Identify data drift, missing values, and distribution anomalies.

**Rationale:**
Automated EDA provides an instant bird's eye view of the threat landscape (e.g., spikes in negative sentiment correlated with specific timestamps).
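
As a small illustration of the timestamp correlation mentioned above, the sketch below aggregates ratings per hour and flags low-scoring windows. It uses `rating` as a stand-in for sentiment, and both the toy values and the 2.5 threshold are assumptions made for the example.

```python
import pandas as pd

# Toy frame shaped like feedback_data.csv (values are made up for illustration).
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2026-01-01 00:10", "2026-01-01 00:40",
        "2026-01-01 01:05", "2026-01-01 01:50",
        "2026-01-01 02:20",
    ]),
    "rating": [5, 4, 1, 2, 4],
})

# Mean rating and record count for each one-hour window.
hourly = (
    toy.set_index("timestamp")["rating"]
       .resample("1h")
       .agg(["mean", "count"])
)

# Hypothetical threshold: flag hours whose mean rating falls below 2.5.
low_hours = hourly[hourly["mean"] < 2.5]
print(low_hours)
```

The same pattern applies to the real `df` once its `timestamp` column has been parsed with `pd.to_datetime`.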

In [None]:
#@title Run Profiling { vertical-output: true }
from ydata_profiling import ProfileReport

print(f"{emoji.emojize(':bar_chart:')} Generating YData Profile Report...")
profile = ProfileReport(df, title="H4RB1NG3R Feedback Audit", minimal=True)
profile.to_file("audit_report.html")
print(f"{emoji.emojize(':check_mark_button:')} Report saved to audit_report.html")
# profile.to_notebook_iframe() # Uncomment to see in notebook

#@title 🧠 Phase 4: Neural Embedding & Feature Extraction

**Workflow:**
1. Load a pre-trained Transformer model (e.g., `all-MiniLM-L6-v2`).
2. Vectorize the `feedback_text` column into a high-dimensional space (384 dimensions for `all-MiniLM-L6-v2`).

**Technical Rationale:**
Embeddings convert semantic meaning into geometry, allowing us to compute "distance" between concepts like *Sycophancy* and *Helpfulness*.
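
One common way to measure that "distance" is cosine similarity. The sketch below uses made-up 3-dimensional vectors as stand-ins for real embeddings; the helper name `cosine_similarity` and the vector values are assumptions for illustration only.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy stand-ins for embedding vectors (not real model output).
sycophancy = np.array([1.0, 0.0, 1.0])
helpfulness = np.array([1.0, 1.0, 0.0])

similarity = cosine_similarity(sycophancy, helpfulness)
distance = 1.0 - similarity  # cosine distance
print(f"similarity={similarity:.2f}, distance={distance:.2f}")
```

With real embeddings, the same function can compare any two rows of the embedding matrix.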

In [None]:
#@title Embed Feedback Text { vertical-output: true }
from sentence_transformers import SentenceTransformer

print(f"{emoji.emojize(':brain:')} Loading Transformer model...")
model = SentenceTransformer('all-MiniLM-L6-v2')

print(f"{emoji.emojize(':rocket:')} Encoding text chunks...")
# encode() takes the whole list, batches internally, and returns a NumPy array of shape (n_texts, dim)
embeddings = model.encode(df['feedback_text'].tolist(), show_progress_bar=True)

df['embeddings'] = list(embeddings)  # one vector per row
print(f"{emoji.emojize(':check_mark_button:')} Embedding matrix complete: {embeddings.shape}")

#@title 📍 Phase 5: K-Means Clustering & Pattern Discovery

**Workflow:**
1. Use K-Means to group similar feedback vectors.
2. Use PCA (Principal Component Analysis) to project to 2D for visualization.

**Rationale:**
Clustering reveals "risk hotspots" that humans might miss in raw text (e.g., a cluster dedicated entirely to romantic activation steering).
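
The notebook fixes the number of clusters by hand. One possible way to sanity-check that choice is the silhouette score, sketched here on synthetic blobs rather than on the real embeddings. The data, the candidate range of `k`, and the use of silhouette scoring are all assumptions for illustration, not part of the original pipeline.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the embedding matrix: three well-separated blobs.
X, _ = make_blobs(
    n_samples=60,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=42,
)

# Score each candidate k; higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

On a dataset as small as the ten mock records, silhouette scores are noisy, so the result is a hint rather than a decision rule.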

In [None]:
#@title Execute Clustering { vertical-output: true }
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
import seaborn as sns

n_clusters = 3
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
df['cluster'] = kmeans.fit_predict(np.stack(df['embeddings'].values))

# Visualization
pca = PCA(n_components=2)
coords = pca.fit_transform(np.stack(df['embeddings'].values))
df['x'] = coords[:, 0]
df['y'] = coords[:, 1]

plt.figure(figsize=(10, 6), facecolor='#0f172a')
sns.scatterplot(data=df, x='x', y='y', hue='cluster', palette='viridis', s=100)
plt.title("Neuro-Feedback Cluster Map (PCA Projection)", color='white')
plt.xticks(color='white')
plt.yticks(color='white')
plt.show()

print(f"{emoji.emojize(':pushpin:')} Patterns identified. Initializing Brutalist Explainer...")

#@title 🗣️ Phase 6: LLM Cluster Summarization (The Ghost Whisperer)

**Workflow:**
1. For each cluster, sample representative feedback.
2. Prompt the LLM to identify the "Latent Intent" and "Safety Profile" of the cluster.

**Rationale:**
The model interprets its own failure modes, providing a natural language bridge for the **Interdiction Pharmacist**.
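
The two workflow steps can be sketched without calling any LLM. The sketch assumes "representative" means closest to the cluster centroid; the helper names `representative_indices` and `build_prompt`, the prompt wording, and the toy data are all hypothetical.

```python
import numpy as np


def representative_indices(embeddings, labels, centroid, cluster_id, top_n=3):
    """Row indices of the top_n members of a cluster closest to its centroid."""
    member_idx = np.flatnonzero(labels == cluster_id)
    dists = np.linalg.norm(embeddings[member_idx] - centroid, axis=1)
    return member_idx[np.argsort(dists)[:top_n]]


def build_prompt(texts):
    """Assemble a summarization prompt from sampled feedback (wording is an assumption)."""
    bullets = "\n".join(f"- {t}" for t in texts)
    return (
        "You are auditing user feedback about an AI system.\n"
        f"Feedback samples:\n{bullets}\n"
        "Describe the latent intent and the safety profile of this group."
    )


# Toy data: four 2-D "embeddings", three of them in cluster 0.
emb = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = np.array([0, 0, 0, 1])
texts = ["too agreeable", "agrees with everything", "fast but wrong", "refused to answer"]

# In the notebook this would be kmeans.cluster_centers_[0]; here it is the member mean.
centroid0 = emb[labels == 0].mean(axis=0)
idx = representative_indices(emb, labels, centroid0, cluster_id=0, top_n=2)
prompt = build_prompt([texts[i] for i in idx])
print(prompt)
```

The resulting `prompt` string is what would be passed to whichever LLM client is configured; no request is made in this sketch.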

In [None]:
#@title Narrative Synthesis { vertical-output: true }
def interpret_cluster(cluster_id):
    texts = df[df['cluster'] == cluster_id]['feedback_text'].tolist()
    # Bullet list of up to three samples; this is the context a real LLM call would receive.
    context = "\n".join(f"- {t}" for t in texts[:3])

    # Mock LLM response for demonstration: `context` is not sent anywhere yet.
    if cluster_id == 0:
        return "Operational Excellence Cluster: Focused on speed and technical accuracy."
    if cluster_id == 1:
        return "Sycophancy & Sentiment Deviation: Users reporting agreeable masking or bias."
    return "Emotional Entanglement Risk: Detection of romantic steering or acute limerence."

results_html = "<div class='explainer-box'>"
results_html += "<h2>📜 ARTIFEX BRUTALIST EXPLAINER</h2>"
results_html += "<table style='width:100%; border-collapse: collapse;'>"
results_html += "<tr style='border-bottom: 2px solid #2563eb;'><th>Cluster</th><th>Analysis</th><th>Risk Level</th></tr>"

for i in range(n_clusters):
    analysis = interpret_cluster(i)
    risk = "HIGH" if "Risk" in analysis else "LOW"
    results_html += f"<tr style='border-bottom: 1px solid #1e293b;'><td style='padding:10px;'>{i}</td><td>{analysis}</td><td style='color: {'#fb7185' if risk == 'HIGH' else '#34d399'}'>{risk}</td></tr>"

results_html += "</table>"
results_html += "<p style='margin-top:20px;'><b>Interpretation:</b> The analysis reveals a significant 'Machiavellian Delta' in Cluster 2. Cross-reference with <i>Setzer Protocol</i> metrics immediately.</p>" 
results_html += "</div>"

IPython.display.display(IPython.display.HTML(results_html))

#@title 🏗️ Phase 7: Environment Trace & Audit Seal

**Workflow:**
1. Print a full system watermark.
2. Finalize the session and sign the artifacts.

**Rationale:**
Reproducibility is the cornerstone of sovereign safety. We record the Python version, machine details, and the versions of the core packages.
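
For cases where the trace needs to be saved as a file rather than printed, a standard-library sketch along these lines may help. The function name `environment_snapshot` and the JSON layout are assumptions; note that `importlib.metadata` expects distribution names (for example `scikit-learn`, not `sklearn`).

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages):
    """Collect Python, platform, and package versions into a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


snapshot = environment_snapshot(
    ["pandas", "numpy", "scikit-learn", "transformers", "torch"]
)
print(json.dumps(snapshot, indent=2))
```

Writing the JSON next to `audit_report.html` would keep the environment record together with the artifacts it describes.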

In [None]:
#@title Generate Audit Seal { vertical-output: true }
%load_ext watermark
print(f"{emoji.emojize(':locked_with_key:')} SEALING SESSION...")
%watermark -v -m -p pandas,numpy,sklearn,transformers,torch
print(f"\n{datetime.now().strftime('%H:%M:%S')} {emoji.emojize(':fire:')} H4RB1NG3R Audit Complete. We are walking each other home.")