<a href="https://colab.research.google.com/github/designingEmergence/CircuitBendingTests/blob/main/colab_notebooks/Circuit_Bend_LLMs_Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Circuit Bending LLMs - Test



### Circuit Bending
Circuit bending is the art of taking an old, sound-making toy and hacking its circuits to produce sounds it was never meant to make. To circuit bend something, you open it up, find the circuit board responsible for making the sound, lick your finger, and touch the resistors and other components while pressing the sound-activating buttons. The result is a glitchy, altered version of the original sound, because your wet finger short-circuits parts of the analog circuit and changes its resistance. Circuit bending is a great way to get unexpected and unique sounds by essentially glitching the circuit. Hmm... glitching the circuit... Aha! I wonder if I can circuit bend an AI model to make it behave “drunk”.

### Circuit Bending AI - The Theory

In theory, there are a lot of parallels between circuit bending old toys and AI models. Just like an old toy, you can open up an AI model and expose the weights of its neural network, which are essentially the model's circuit board. These weights define how the AI model operates. Stable Diffusion produces an image of an apple when you prompt “Photo of Apple, close up” because hours and hours of training have set its weights to a precise configuration. You can then lick your metaphorical finger and “bend” the circuit by changing the weights of the model, which should produce glitchy and unexpected outputs!
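To make the idea concrete, here is a minimal sketch in plain Python that needs no model at all. It assumes a toy "neuron" that is just a weighted sum of its inputs; the names `bend` and `neuron` and the example numbers are illustrative only. The `noise_factor` argument plays the same role as the wet finger: the larger it is, the further the output drifts from the clean one.

```python
import random

def bend(weights, noise_factor, seed=0):
    """Return a copy of `weights` with Gaussian noise scaled by `noise_factor` added."""
    rng = random.Random(seed)  # seeded so repeated runs give the same "bend"
    return [w + rng.gauss(0.0, 1.0) * noise_factor for w in weights]

def neuron(weights, inputs):
    """A toy 'circuit': the weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

weights = [0.5, -1.2, 0.8]   # illustrative values, not taken from a real model
inputs = [1.0, 2.0, 3.0]

clean = neuron(weights, inputs)
for noise_factor in (0.0, 0.03, 0.5):
    bent = neuron(bend(weights, noise_factor), inputs)
    print(f"noise_factor={noise_factor}: clean={clean:.3f}, bent={bent:.3f}")
```

With `noise_factor=0.0` the bent output equals the clean one; as the factor grows, the output moves away from it. An LLM has millions of such weights, so even small noise can add up.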

### Purpose of this notebook

Here we test whether it's possible to manipulate the weights of an LLM and then run the manipulated version.

### Pre-Requisites

- Download an LLM. Here I am using Llama-3.2-1B. You can download the model directly from the [Llama-3.2-1B repository on Hugging Face](https://huggingface.co/meta-llama/Llama-3.2-1B/tree/main).
- Upload the model to either Google Drive or directly to the runtime (the model is large, so I recommend Google Drive).

Note: You can use any LLM that is compatible with Hugging Face Transformers; it doesn't have to be Llama.
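Before loading, it can help to confirm that the upload is complete. The sketch below is a hypothetical helper (`check_model_folder` is not part of this notebook or of Transformers) that looks for the files a local Hugging Face model folder typically contains. The exact file names are assumptions: `config.json` is standard, but weight and tokenizer file names vary between models.

```python
from pathlib import Path

def check_model_folder(path):
    """Return a list of problems found in a local Hugging Face model folder."""
    folder = Path(path)
    if not folder.is_dir():
        return [f"{folder} is not a directory"]

    problems = []
    if not (folder / "config.json").is_file():
        problems.append("missing config.json")

    # Assumption: weights are stored as *.safetensors or *.bin files.
    has_weights = any(folder.glob("*.safetensors")) or any(folder.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight file found")

    # Assumption: the tokenizer ships as tokenizer.json or tokenizer.model.
    tokenizer_files = ("tokenizer.json", "tokenizer.model")
    if not any((folder / name).is_file() for name in tokenizer_files):
        problems.append("no tokenizer.json or tokenizer.model found")

    return problems

# Example: check the Drive folder used in this notebook (after mounting Drive).
problems = check_model_folder(
    "/content/drive/MyDrive/Projects/Circuit_Bending_AI/models/Llama-3.2-1B-hf"
)
print(problems or "Model folder looks complete")
```

An empty list means the basic files are present; it does not guarantee that the files themselves are intact.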


In [None]:
# Step 1: Mount Google Drive

from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Step 2: Install PyTorch and Transformers (if not already installed)
# Colab usually comes with PyTorch pre-installed, but you can update it if needed.

!pip install torch transformers

In [None]:
# Step 3: Import libraries

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import os
import csv
from datetime import datetime

In [None]:
# Step 4: Set up paths

drive_model_path = '/content/drive/MyDrive/Projects/Circuit_Bending_AI/models/Llama-3.2-1B-hf'
modified_model_path = '/content/drive/MyDrive/Projects/Circuit_Bending_AI/models/Llama-3.2-1B-hf_mod'

log_directory = '/content/drive/MyDrive/Projects/Circuit_Bending_AI/experiment-logs'
os.makedirs(log_directory, exist_ok=True)
log_file = os.path.join(log_directory, 'experiment_log.csv')

# Initialize the CSV file with headers if it doesn't exist
if not os.path.isfile(log_file):
    with open(log_file, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['timestamp', 'model_name', 'layer_name', 'noise_factor', 'prompt', 'output'])



In [None]:
# Step 5: Define functions

# Load model and tokenizer from Google Drive
def load_model_and_tokenizer_from_drive(path):
  print(f"Loading model and tokenizer from {path}...")
  model = AutoModelForCausalLM.from_pretrained(path)
  tokenizer = AutoTokenizer.from_pretrained(path)
  print("Model and tokenizer loaded successfully")
  return tokenizer, model

# Bend model weights
def bend_weights(model, noise_factor=0.01, layer_query=None):
  print("Bending model weights...")
  # Clone every tensor so the original model's weights stay untouched
  state_dict = model.state_dict()
  modified_state_dict = {k: v.clone() for k, v in state_dict.items()}

  # Normalize `layer_query` to a list for consistent handling
  if isinstance(layer_query, str):
    layer_query = [layer_query]

  # Add Gaussian noise to every tensor whose name contains one of the queries
  # (or to every floating-point tensor when `layer_query` is None)
  for layer_name, weights in modified_state_dict.items():
    if not weights.is_floating_point():
      continue  # skip integer buffers, which cannot take Gaussian noise
    if layer_query is None or any(query in layer_name for query in layer_query):
      print(f"Bending {layer_name}...")
      # randn_like matches the dtype and device of the weights
      weights.add_(torch.randn_like(weights) * noise_factor)

  # Create a new model with the same architecture and load the modified weights.
  # AutoModelForCausalLM.from_config keeps this working for non-Llama models too.
  modified_model = AutoModelForCausalLM.from_config(model.config)
  modified_model.load_state_dict(modified_state_dict)
  modified_model.eval()  # a freshly constructed model starts in training mode

  print("Weights bent successfully!")
  return modified_model

# Save modified model to Google Drive
def save_model_to_drive(model, path):
  print(f"Saving modified model to {path}...")
  model.save_pretrained(path)
  print("Bent model saved successfully!")

# Log experiment settings
def log_experiment(model_name, layer_query, noise_factor, prompt, output):
  timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

  # Store a list of queries as a single comma-separated cell
  layer_query_str = ', '.join(layer_query) if isinstance(layer_query, list) else layer_query

  log_data = [
      timestamp,
      model_name,
      layer_query_str,
      noise_factor,
      prompt,
      output
  ]

  # Append one row per experiment to the CSV created in Step 4
  with open(log_file, 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(log_data)

# Generate text from a prompt, print it, and log the run
def test_model(tokenizer, model, prompt="The quick brown fox", max_length=50, temperature=1.0, top_k=50, top_p=0.9, repetition_penalty=1.0, layer_query='', noise_factor=0.0, name_modifier=''):
  try:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=max_length,  # counts the prompt tokens as well as the generated ones
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=repetition_penalty,
        do_sample=True
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Output: {result}")

    # Log experiment settings; `or ''` guards against name_modifier=None
    log_experiment(
        model_name=model.config.name_or_path + (name_modifier or ''),
        layer_query=layer_query,
        noise_factor=noise_factor,
        prompt=prompt,
        output=result
    )
  except Exception as e:
    print(f"Error during testing: {e}")
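`bend_weights` selects tensors by substring: a tensor is bent when any entry of `layer_query` appears anywhere in its name. The sketch below reproduces only that selection rule so you can preview which tensors a query would hit. `matching_layers` is a hypothetical helper, and the example names follow the naming pattern Hugging Face Llama checkpoints generally use; treat them as an assumption and check `model.state_dict().keys()` on your own model.

```python
def matching_layers(layer_names, layer_query=None):
    """Return the names that bend_weights' substring rule would select."""
    if isinstance(layer_query, str):
        layer_query = [layer_query]
    return [
        name for name in layer_names
        if layer_query is None or any(query in name for query in layer_query)
    ]

# Illustrative names only (assumed Llama-style naming)
example_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.10.self_attn.q_proj.weight",
    "model.norm.weight",
]

# Every q_proj tensor, in every block
print(matching_layers(example_names, "self_attn.q_proj.weight"))

# "layers.1" also matches "layers.10"; a trailing dot restricts it to block 1
print(matching_layers(example_names, "layers.1"))
print(matching_layers(example_names, "layers.1."))
```

In the notebook, the same preview would be `matching_layers(model.state_dict().keys(), layer_query)`. Remember that `layer_query=None` selects everything.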

In [None]:
# Step 6: Load Model

tokenizer, model = load_model_and_tokenizer_from_drive(drive_model_path)


In [None]:
# Step 6b (optional): Test original model

# Suffix appended to the model name in the log to label this batch of experiments
name_modifier = '_repetition_penalty'
test_model(tokenizer, model, prompt="Once upon a time", max_length=150, temperature=0.01, repetition_penalty=1.2, layer_query='None', noise_factor=0.0, name_modifier=name_modifier)


In [None]:
# Step 7: Modify model

layer_query = ['self_attn.q_proj.weight']
noise_factor = 0.03
name_modifier = '_repetition_penalty'  # set here too, in case the optional test cell was skipped

modified_model = bend_weights(model, noise_factor=noise_factor, layer_query=layer_query)
test_model(tokenizer, modified_model, prompt="Once upon a time", max_length=100, temperature=0.01, repetition_penalty=1.2, layer_query=layer_query, noise_factor=noise_factor, name_modifier=name_modifier)
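If you want to try several layers and noise levels rather than one combination at a time, the settings can be laid out as a small grid first. This is only a sketch: `build_sweep` is a hypothetical helper, and the layer names and noise factors below are example values, not recommendations.

```python
from itertools import product

def build_sweep(layer_queries, noise_factors):
    """Return one settings dict per (layer query, noise factor) combination."""
    return [
        {"layer_query": [query], "noise_factor": factor}
        for query, factor in product(layer_queries, noise_factors)
    ]

# Example values only; adjust to the layers and noise levels you want to explore
sweep = build_sweep(
    layer_queries=["self_attn.q_proj.weight", "mlp.down_proj.weight"],
    noise_factors=[0.01, 0.03, 0.1],
)

for run in sweep:
    print(run)
```

Each dict uses the same keyword names as `bend_weights`, so a run can be started with `bend_weights(model, **run)`. Every call builds a full copy of the model, so it is worth deleting the previous `modified_model` before starting the next run to avoid running out of memory.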

In [None]:
# Retest bent model (if you want to try other prompts)

test_model(tokenizer, modified_model, prompt="Here are 5 creative ideas. Idea 1:", max_length=150, temperature=0.01, repetition_penalty=1.2, layer_query=layer_query, noise_factor=noise_factor, name_modifier=name_modifier)
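Every call to `test_model` appends a row to the experiment log, so the log can be read back to compare runs side by side. The sketch below assumes the column headers written in Step 4. `read_log` and `runs_for_prompt` are hypothetical helpers, and the demo rows are placeholders written to a temporary file, not real model outputs.

```python
import csv
import os
import tempfile

HEADERS = ['timestamp', 'model_name', 'layer_name', 'noise_factor', 'prompt', 'output']

def read_log(path):
    """Read the experiment log into a list of dicts keyed by column header."""
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

def runs_for_prompt(rows, prompt):
    """Keep only the runs that used the given prompt."""
    return [row for row in rows if row['prompt'] == prompt]

# Demo with placeholder rows in a temporary file
demo_path = os.path.join(tempfile.mkdtemp(), 'experiment_log.csv')
with open(demo_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(HEADERS)
    writer.writerow(['2024-01-01 00:00:00', 'demo-model', 'None', 0.0,
                     'Once upon a time', '<placeholder original output>'])
    writer.writerow(['2024-01-01 00:01:00', 'demo-model', 'self_attn.q_proj.weight', 0.03,
                     'Once upon a time', '<placeholder bent output>'])

for row in runs_for_prompt(read_log(demo_path), 'Once upon a time'):
    # csv returns every field as a string, so convert noise_factor before comparing
    print(float(row['noise_factor']), row['layer_name'], row['output'])
```

In the notebook, pass `log_file` to `read_log` instead of the temporary path.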

In [None]:
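# Step 8: Save modified model

# Free-form label for the saved copy; update it to match the
# layer_query and noise_factor actually used in Step 7
name = "mlp.down_proj_0.01_exclamation"
save_model_to_drive(modified_model, modified_model_path + '_' + name)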