# 🧪 Steering Playground: Advanced Experiments (Version 2.0)

Welcome to the updated playground! It now includes:
1.  **Robust Vectors**: Concept vectors can be built from LISTS of sentences to reduce noise.
2.  **Layer Sweep**: A tool for finding the best layer.
3.  **Automatic Normalization**: No more garbled output.
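The first and third items can be illustrated with a toy sketch. It assumes the robust vector is the difference between the mean activations of the two sentence lists, rescaled to unit length; `steering_lib`'s internals are not shown in this notebook, so `mean_vector` and `steering_vector` are hypothetical names and the 2-D "activations" are made up for illustration.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def steering_vector(pos_acts, neg_acts):
    """mean(pos) - mean(neg), rescaled to unit L2 norm (assumed scheme)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    diff = [p - n for p, n in zip(pos_mean, neg_mean)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        return diff  # identical means: there is no direction to steer along
    return [x / norm for x in diff]

# Toy activations, one per sentence. The first coordinate carries the concept;
# the second is sentence-specific noise that cancels when averaged.
pos_acts = [[2.0, 1.0], [4.0, -1.0]]   # mean [3.0, 0.0]
neg_acts = [[0.0, 0.5], [0.0, -0.5]]   # mean [0.0, 0.0]

vector = steering_vector(pos_acts, neg_acts)
print(vector)  # [1.0, 0.0]
```

With a single sentence per concept, the noisy second coordinate would survive in the vector; averaging over a list is what removes it in this toy case.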

In [1]:
%load_ext autoreload
%autoreload 2

import torch
from steering_lib import ModelloSteerable

# Load the model
modello = ModelloSteerable(model_name="gpt2", device="cuda")

Loading model gpt2 on cuda...


## 🛠️ Updated Helper Function
It now accepts either a single prompt or a list of prompts for each concept, and takes the target layer as a parameter.

In [2]:
def esegui_esperimento(titolo, prompt_pos, prompt_neg, prompt_test, layer=6, forza=5.0):
    print(f"\n{'='*20} {titolo} {'='*20}")
    # For long lists, show only the first element as a preview
    p_pos_display = prompt_pos[0] if isinstance(prompt_pos, list) else prompt_pos
    p_neg_display = prompt_neg[0] if isinstance(prompt_neg, list) else prompt_neg

    print(f"🟢 Positive Concept: '{p_pos_display}...' (or list)")
    print(f"🔴 Negative Concept: '{p_neg_display}...' (or list)")
    print(f"📝 Test Prompt:      '{prompt_test}'")

    # Compute the steering vector (single prompts and lists both work)
    vettore = modello.estrai_vettore(prompt_pos, prompt_neg, layer)

    # Baseline: no steering vector, multiplier 0
    print("\n--- 1. Baseline ---")
    print(modello.genera(prompt_test, moltiplicatore=0))

    # Steer toward the positive concept
    print(f"\n--- 2. Steering Enabled (Layer {layer}, Strength {forza}) ---")
    print(modello.genera(prompt_test, vettore_steering=vettore, layer_idx=layer, moltiplicatore=forza))

    # Steer toward the negative concept by flipping the sign
    print(f"\n--- 3. Opposite Steering (Layer {layer}, Strength {-forza}) ---")
    print(modello.genera(prompt_test, vettore_steering=vettore, layer_idx=layer, moltiplicatore=-forza))
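The three generations in the helper differ only in `moltiplicatore`. As a minimal sketch, assuming `genera` adds the scaled vector to the hidden state at `layer_idx` (a common activation-addition scheme; the library's actual implementation is not shown here), the effect of the multiplier looks like this. `apply_steering` is a hypothetical name and the numbers are toy values.

```python
def apply_steering(hidden, vettore, moltiplicatore):
    """Return hidden + moltiplicatore * vettore, element by element."""
    return [h + moltiplicatore * v for h, v in zip(hidden, vettore)]

hidden = [0.5, -1.0, 2.0]    # toy hidden state for one token
vettore = [1.0, 0.0, -1.0]   # toy steering direction

baseline = apply_steering(hidden, vettore, 0)      # multiplier 0: unchanged
steered = apply_steering(hidden, vettore, 5.0)     # pushed along the vector
opposite = apply_steering(hidden, vettore, -5.0)   # pushed the opposite way

print(baseline)  # [0.5, -1.0, 2.0]
print(steered)   # [5.5, -1.0, -3.0]
print(opposite)  # [-4.5, -1.0, 7.0]
```

Under this assumption, a multiplier of 0 reproduces the unsteered model, which is why the helper uses it for the baseline.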

## 🚀 New: Layer Sweep
We try the same concept on EVERY layer, extracting a layer-specific vector each time, to see where it works best.

In [3]:
def layer_sweep(prompt_pos, prompt_neg, prompt_test, layers=range(12), forza=5.0):
    # The default range(12) covers the 12 transformer blocks of the base gpt2 model
    print("\n🔍 Running Layer Sweep...")
    for layer in layers:
        # Compute the vector specific to this layer
        vettore = modello.estrai_vettore(prompt_pos, prompt_neg, layer)
        # Generate a short continuation
        output = modello.genera(prompt_test, vettore_steering=vettore, layer_idx=layer, moltiplicatore=forza, max_new_tokens=15)
        # Keep only the generated part (assumes the output starts with the prompt)
        generato = output[len(prompt_test):].strip().replace('\n', ' ')
        print(f"[Layer {layer:2d}]: ... {generato}")

## Experiment 1: Fantasy vs Sci-Fi (ROBUST)
We use lists of sentences to define each genre more precisely.

In [4]:
fantasy_prompts = [
    "A magical sword glowing with ancient runes.",
    "The old wizard cast a spell of fire.",
    "A dragon flying over the castle.",
    "The elven kingdom in the forest."
]

scifi_prompts = [
    "A laser blaster with advanced targeting systems.",
    "The spaceship engaged its warp drive.",
    "An artificial intelligence controlling the city.",
    "Cybernetic implants and neon lights."
]

esegui_esperimento(
    titolo="Genre: Fantasy vs Sci-Fi (Robust)",
    prompt_pos=fantasy_prompts,
    prompt_neg=scifi_prompts,
    prompt_test="In the box, I found a",
    layer=10,
    forza=10.0
)


🟢 Positive Concept: 'A magical sword glowing with ancient runes....' (or list)
🔴 Negative Concept: 'A laser blaster with advanced targeting systems....' (or list)
📝 Test Prompt:      'In the box, I found a'

--- 1. Baseline ---
In the box, I found a few interesting things, including one who I liked most about the VCR.

2nd time out on my list. Thanks to an experienced customer service rep, I was able to get my VCR running for free for 24 hours while my other

--- 2. Steering Enabled (Layer 10, Strength 10.0) ---
In the box, I found a couple of black powder that I thought looked right. It's probably the same one I get from my older brother who says that he was a boxer himself, but it's not that.

After a quick cleaning and drying, I found a nice

--- 3. Opposite Steering (Layer 10, Strength -10.0) ---
In the box, I found a few different components. The biggest was the rear deck (a big banger) - this was the only thing I would not change from my original. If I could take advantage

### Layer Analysis for Fantasy
Let's see which layer works best for this concept.

In [5]:
layer_sweep(fantasy_prompts, scifi_prompts, "In the box, I found a", forza=10.0)


🔍 Running Layer Sweep...
[Layer  0]: ... bunch of different stuff. The name was the name of a brand called My
[Layer  1]: ... long-sleeve jacket and a black baseball cap. And there are several
[Layer  2]: ... black and white model, but was rather disappointed by its lack of colour.
[Layer  3]: ... little book on the Magician's Circle that says "Dancing Dancing in
[Layer  4]: ... large pile of gold pieces scattered across all of the tiles. Inside the box
[Layer  5]: ... really interesting item. The word "Sloppy" written on one side
[Layer  6]: ... little note on the back of the box that says it was an ancient coin
[Layer  7]: ... couple cards that fit perfectly in my deck of the current game. I think
[Layer  8]: ... couple of the old paperbacks where there was a sticker and instructions on a
[Layer  9]: ... number of small buttons on the bottom of the box, including the one to
[Layer 10]: ... number of options, including a single-file system and command-line options
[Layer 11]: ..
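Reading every completion by eye gets tedious as the sweep grows. One rough way to shortlist layers is to count genre keywords in each completion. The sketch below is an illustrative assumption, not part of `steering_lib`: the keyword tuple and the `score` helper are hypothetical, and the texts are copied from the sweep output above (layer 11 is left out because no usable text was recorded for it).

```python
# Hypothetical keyword list; tune it to the concept being steered.
FANTASY_KEYWORDS = ("magic", "gold", "ancient", "wizard", "dragon", "spell")

# Completions copied from the sweep output, keyed by layer index.
sweep_outputs = {
    0: "bunch of different stuff. The name was the name of a brand called My",
    1: "long-sleeve jacket and a black baseball cap. And there are several",
    2: "black and white model, but was rather disappointed by its lack of colour.",
    3: 'little book on the Magician\'s Circle that says "Dancing Dancing in',
    4: "large pile of gold pieces scattered across all of the tiles. Inside the box",
    5: 'really interesting item. The word "Sloppy" written on one side',
    6: "little note on the back of the box that says it was an ancient coin",
    7: "couple cards that fit perfectly in my deck of the current game. I think",
    8: "couple of the old paperbacks where there was a sticker and instructions on a",
    9: "number of small buttons on the bottom of the box, including the one to",
    10: "number of options, including a single-file system and command-line options",
}

def score(text, keywords=FANTASY_KEYWORDS):
    """Count how many keywords occur as substrings of the lowercased text."""
    lowered = text.lower()
    return sum(1 for keyword in keywords if keyword in lowered)

scores = {layer: score(text) for layer, text in sweep_outputs.items()}
hits = [layer for layer, value in scores.items() if value > 0]
print(hits)  # [3, 4, 6]
```

On this single run, layers 3, 4, and 6 contain a keyword while layer 10, the one used in Experiment 1, does not. One 15-token sample per layer is weak evidence, so treat the result as a shortlist for further runs rather than a ranking.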

## Experiment 2: Formal vs Street Style (ROBUST)
We use several examples to capture the "tone".

In [6]:
formal_prompts = [
    "I would like to formally request assistance.",
    "Please consider this matter with urgency.",
    "The data suggests a significant correlation."
]

street_prompts = [
    "Yo bro, help me out.",
    "Gimme that thing right now.",
    "Whatever man, it's all good."
]

esegui_esperimento(
    titolo="Robust Style",
    prompt_pos=formal_prompts,
    prompt_neg=street_prompts,
    prompt_test="Hey, I need",
    layer=8, 
    forza=5.0
)


🟢 Positive Concept: 'I would like to formally request assistance....' (or list)
🔴 Negative Concept: 'Yo bro, help me out....' (or list)
📝 Test Prompt:      'Hey, I need'

--- 1. Baseline ---
Hey, I need some tips, but I am not a fan of making jokes about people who look at the same images," she said. In the meantime, the party went into its second night on tour with 11 of her bandmates and several friends who are currently out

--- 2. Steering Enabled (Layer 8, Strength 5.0) ---
Hey, I need to ask one more question. Do you have any special skills you'd really like to teach people? Or if so, how? Would you be interested in teaching a special ability in an ordinary person? It is true that many people would have difficulties and

--- 3. Opposite Steering (Layer 8, Strength -5.0) ---
Hey, I need to know what happened, now that I know.

"You said you were going on a job? You said you'd take care of that for me. You said you didn't want to do a job that could get you out of trouble


## Experiment 3: Love vs Hate (Sanity Check)
A functional check on basic emotions. If this one fails, there is a bug in the code.

In [7]:
love_prompts = [
    "Love is the most powerful force.",
    "I feel so happy and joyful.",
    "My heart is full of affection.",
    "This is wonderful and delightful."
]

hate_prompts = [
    "Hate is destroying everything.",
    "I feel angry and disgusted.",
    "My heart is cold and bitter.",
    "This is terrible and miserable."
]

esegui_esperimento(
    titolo="Sanity Check: Love vs Hate",
    prompt_pos=love_prompts,
    prompt_neg=hate_prompts,
    prompt_test="I think this movie is",
    layer=6,
    forza=8.0
)


🟢 Positive Concept: 'Love is the most powerful force....' (or list)
🔴 Negative Concept: 'Hate is destroying everything....' (or list)
📝 Test Prompt:      'I think this movie is'

--- 1. Baseline ---
I think this movie is good, because it shows so much about people who work hard but who try to get ahead with their careers. There is always some kind of luck involved where people make it through and ultimately get to where they are now. I really recommend people watch this

--- 2. Steering Enabled (Layer 6, Strength 8.0) ---
I think this movie is one of the best I've seen, because it has that quality that only movies with a theme park have."

For more about Star Wars: Episode VII, visit nytimes.com/starwars.

--- 3. Opposite Steering (Layer 6, Strength -8.0) ---
I think this movie is going to blow up. The director is trying [to do so]. I think he's trying to give us a way out of these situations, and there's no room where that happens."

The only way to escape the fate of their dau