<a href="https://colab.research.google.com/github/alenabd24/BEng-Dissertation-Project/blob/main/Buffer_Overflow_Flipping_Model_Weights.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **February 23rd**
## Testing the fault tolerance of a GraphCodeBERT-based buffer overflow CWE classification model by injecting bit-flip faults into its weight parameters

# **Code Update (16th March):**
* Changed CWEs to Ratnaker's recommendations

## Purpose of the script:
1. Train a GraphCodeBERT-based model to classify code snippets into different CWE types (specifically those related to buffer overflows).

2. Introduce bit-flip noise into the model weights post-training, prior to inference on unseen test data.

3. Evaluate how this noise affects the model's accuracy and robustness.

---

Install the ML and NLP libraries used in this notebook, mainly from Hugging Face.

In [2]:
!pip install datasets
!pip install transformers
!pip install accelerate -U
!pip install transformers[torch]
!pip install wandb


In [3]:
from tqdm import tqdm, trange
import multiprocessing

from transformers import (WEIGHTS_NAME, AdamW, get_linear_schedule_with_warmup,
                          RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer)
from datasets import Dataset
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

import torch

!pip install evaluate
import evaluate
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split



## Transformers & Hugging Face Libraries  
- **RobertaConfig** → Configuration settings for RoBERTa models.  
- **RobertaForSequenceClassification** → RoBERTa model for classification tasks.  
- **RobertaTokenizer** → Tokenizer for RoBERTa (converts text into tokenized inputs).  
- **RobertaForMaskedLM** → RoBERTa for Masked Language Modeling (predicting masked words).  
- **pipeline** → High-level API for using pre-trained models easily.  
- **DataCollatorWithPadding** → Ensures tokenized inputs are correctly padded for training.  
- **AutoModelForSequenceClassification** → Generic class that loads the matching sequence-classification model for a given checkpoint.  
- **TrainingArguments & Trainer** → Utilities for managing model training.  

## Torch & Optimizers  
- **torch** → PyTorch framework for training deep learning models.  
- **AdamW** → Adam optimizer with decoupled weight decay, commonly used to fine-tune transformers (imported from `transformers` here).  
- **get_linear_schedule_with_warmup** → Learning-rate scheduler: linear warmup followed by linear decay.  

## Additional Libraries  
- **evaluate** → Hugging Face library for computing metrics such as accuracy and F1-score; it replaces the older `datasets.load_metric`.  
- **numpy & pandas** → For handling datasets and numerical operations.  
- **sklearn.model_selection.train_test_split** → Splits data into training and test sets.  


In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
import pandas as pd

# Load the preprocessed dataset from Google Drive (columns used below: code, CWE-Type)
df = pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/processed_data_diversevul_2.csv')

In [7]:
import pandas as pd

# List of buffer overflow related CWEs
# buffer_overflow_cwes = [
#     'CWE119', 'CWE120', 'CWE121', 'CWE122', 'CWE123', 'CWE124',
#     'CWE125', 'CWE787', 'CWE805', 'CWE680', 'CWE131', 'CWE170',
#     'CWE369', 'CWE415'
# ]

buffer_overflow_cwes = [
    'CWE0', 'CWE120', 'CWE122', 'CWE369', 'CWE190',
    'CWE400', 'CWE502'
]


# Keep only the rows whose CWE-Type exactly matches one of the CWEs in buffer_overflow_cwes
filtered_df = df[df['CWE-Type'].isin(buffer_overflow_cwes)]

# Print the distinct CWE types that remain after filtering
unique_cwes = filtered_df['CWE-Type'].unique()
print("Unique CWEs in the filtered dataset:", unique_cwes)

# Save the filtered dataset to a new CSV file
filtered_df.to_csv('filtered_dataset.csv', index=False)

print("Dataset has been filtered and saved as 'filtered_dataset.csv'")


Unique CWEs in the filtered dataset: ['CWE0' 'CWE400' 'CWE120' 'CWE190' 'CWE369' 'CWE502' 'CWE122']
Dataset has been filtered and saved as 'filtered_dataset.csv'


CWE (Common Weakness Enumeration) identifiers classify types of software security weakness; here they are the class labels selected for the buffer overflow study.

`filtered_df = df[df['CWE-Type'].isin(buffer_overflow_cwes)]`:
* `isin` returns a boolean mask that is `True` where `CWE-Type` exactly equals one of the entries in `buffer_overflow_cwes`.
* Indexing `df` with that mask returns a new DataFrame containing only the matching rows.
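
To illustrate the difference between exact matching (what `isin` does) and the regex substring matching that `str.contains` performs in a later cell, here is a minimal plain-Python sketch. The `labels` list is hypothetical and not taken from the dataset, and `re.search` stands in for the default regex behaviour of `str.contains`.

```python
import re

target_cwes = ['CWE0', 'CWE120', 'CWE122', 'CWE369', 'CWE190', 'CWE400', 'CWE502']

# Hypothetical labels for illustration only (not taken from the dataset)
labels = ['CWE0', 'CWE120', 'CWE1204', 'CWE787', 'CWE400']

# Exact membership, analogous to Series.isin(target_cwes)
targets = set(target_cwes)
exact_matches = [label for label in labels if label in targets]

# Regex alternation searched anywhere in the string,
# analogous to Series.str.contains('|'.join(target_cwes))
pattern = '|'.join(target_cwes)
substring_matches = [label for label in labels if re.search(pattern, label)]

print(exact_matches)
print(substring_matches)
```

In this notebook both filters return the same 80134 rows, so the distinction only matters if the data contains longer identifiers that share a prefix with a target CWE.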

In [8]:
filtered_df.to_csv('/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/bo_filtered_dataset.csv', index=False)

In [9]:
len(filtered_df)

80134

In [10]:

# Define a list of CWEs to filter
target_cwes = ['CWE0', 'CWE120', 'CWE122', 'CWE369', 'CWE190',
    'CWE400', 'CWE502']

# Include CWEs in the range 780-790
#for i in range(780, 791):
#    target_cwes.append('CWE' + str(i))

# Keep rows whose CWE-Type contains any of the target CWEs (regex alternation, substring match)
filtered_df = df[df['CWE-Type'].str.contains('|'.join(target_cwes))]

# Output the filtered dataframe
print(filtered_df)

                                                     code CWE-Type
0       int _gnutls_ciphertext2compressed(gnutls_sessi...     CWE0
2       unpack_Z_stream(int fd_in, int fd_out)\n{\n\tI...     CWE0
14      asmlinkage long compat_sys_mount(char __user *...     CWE0
15      unsigned short atalk_checksum(struct ddpehdr *...     CWE0
16      static int ltalk_rcv(struct sk_buff *skb, stru...     CWE0
...                                                   ...      ...
409114  CpuDefinitionInfoList *qmp_query_cpu_definitio...     CWE0
409115  static bool loongarch_cpu_exec_interrupt(CPUSt...     CWE0
409116  static bool loongarch_cpu_has_work(CPUState *c...     CWE0
409117  static void loongarch_cpu_add_definition(gpoin...     CWE0
409118  static void loongarch_cpu_synchronize_from_tb(...     CWE0

[80134 rows x 2 columns]


In [11]:
# Count the distinct CWE types (nunique returns an integer, not the values themselves)
unique_cwes = filtered_df['CWE-Type'].nunique()

# Print the number of distinct CWE types
print("Unique CWE types:", unique_cwes)

Unique CWE types: 7


In [12]:
df=filtered_df

In [13]:
df = df.astype(str)

In [14]:
# Build two dictionaries that map between CWE types and integer class labels
id2label = dict()  # integer index -> CWE type, e.g. 0: 'CWE0'
label2id = dict()  # CWE type -> integer index, e.g. 'CWE0': 0
for ind, cwe in enumerate(df['CWE-Type'].unique()):
    id2label[ind] = cwe
    label2id[cwe] = ind

In [15]:
print('id2label dictionary: ')
print(id2label)
print('label2id dictionary: ')
print(label2id)

id2label dictionary: 
{0: 'CWE0', 1: 'CWE400', 2: 'CWE120', 3: 'CWE190', 4: 'CWE369', 5: 'CWE502', 6: 'CWE122'}
label2id dictionary: 
{'CWE0': 0, 'CWE400': 1, 'CWE120': 2, 'CWE190': 3, 'CWE369': 4, 'CWE502': 5, 'CWE122': 6}


In [16]:
df['label']=df['CWE-Type'].map(label2id)
df.head()

Unnamed: 0,code,CWE-Type,label
0,int _gnutls_ciphertext2compressed(gnutls_sessi...,CWE0,0
2,"unpack_Z_stream(int fd_in, int fd_out)\n{\n\tI...",CWE0,0
14,asmlinkage long compat_sys_mount(char __user *...,CWE0,0
15,unsigned short atalk_checksum(struct ddpehdr *...,CWE0,0
16,"static int ltalk_rcv(struct sk_buff *skb, stru...",CWE0,0


In [17]:
# Splitting the dataset into training(80%) and test (20%) sets
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

In [18]:
dataset = {}  # Plain dictionary that will back the Hugging Face dataset
dataset['text'] = list(df_train['code'])    # input feature: the code snippet
dataset['label'] = list(df_train['label'])  # target: the integer CWE class the model should predict
# Convert the dictionary into a Hugging Face Dataset object, which supports batched tokenization via map().
ds = Dataset.from_dict(dataset)
ds = ds.train_test_split(test_size=0.1)  # train/validation split (10% validation, stored under the "test" key)

The code cell above performs the **second** of two data splits.

### 1st Split:
* Creates the initial training (80%) and test (20%) sets with `train_test_split`.
* The test set is kept entirely separate from the training process.
### 2nd Split:
* Splits the training set again into training (90%) and validation (10%) subsets.
* The validation set is used for hyperparameter tuning and per-epoch evaluation during training, before any testing.
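
The sizes produced by the two splits can be checked with a few lines of arithmetic. This sketch assumes that both split calls round the held-out fraction up to a whole number of rows, which is consistent with the row counts printed elsewhere in this notebook (80134 filtered rows, 16027 test rows, 57696 and 6411 tokenized rows).

```python
import math

total_rows = 80134  # len(filtered_df) as printed in the notebook

# 1st split: 20% held out as the test set (assumed to round up)
n_test = math.ceil(0.2 * total_rows)
n_train_full = total_rows - n_test

# 2nd split: 10% of the remaining rows held out for validation (assumed to round up)
n_val = math.ceil(0.1 * n_train_full)
n_train = n_train_full - n_val

print(f"test={n_test}, validation={n_val}, train={n_train}")
```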

In [19]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")  # Load the tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/graphcodebert-base",
    num_labels=7,  # one output per CWE type; adjust to your classification needs
    id2label=id2label,
    label2id=label2id,
)
# The model is loaded with a sequence-classification head that has 7 output labels, one per CWE type.


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/graphcodebert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [20]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix # model performance evaluation metrics
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# A function that calculates accuracy during model evaluation by comparing the predicted labels (after applying argmax) to the true labels.
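
As a minimal illustration of what `compute_metrics` does, the sketch below reproduces argmax accuracy in plain Python. The function name `argmax_accuracy` and the toy logits are hypothetical, chosen only for this example.

```python
def argmax_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    # For each row of logits, pick the index of the largest value
    preds = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)

# Toy example: three samples, three classes (values are made up)
toy_logits = [[0.1, 2.0, -1.0],
              [3.0, 0.2, 0.1],
              [0.0, 0.1, 0.5]]
toy_labels = [1, 0, 1]

acc = argmax_accuracy(toy_logits, toy_labels)
print(acc)
```

The first two samples are predicted correctly and the third is not, so the toy accuracy is two thirds.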


In [21]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = ds.map(preprocess_function, batched=True)
# Tokenizing the dataset

Map:   0%|          | 0/57696 [00:00<?, ? examples/s]

Map:   0%|          | 0/6411 [00:00<?, ? examples/s]

In [22]:
from transformers import Trainer, TrainingArguments

'''
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

'''
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Colab Notebooks/THESIS_PROJECT/MODEL_WEIGHTS/NEW_MODEL_WEIGHTS/graphcodebert_bo",
    learning_rate=4e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    report_to="wandb",
    fp16 = True,
    warmup_steps = 0
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()



Epoch,Training Loss,Validation Loss,Accuracy
1,0.3623,0.266212,0.908283
2,0.189,0.222698,0.923881
3,0.1142,0.212522,0.935268


TrainOutput(global_step=5409, training_loss=0.276403069914897, metrics={'train_runtime': 4344.26, 'train_samples_per_second': 39.843, 'train_steps_per_second': 1.245, 'total_flos': 4.55428352566416e+16, 'train_loss': 0.276403069914897, 'epoch': 3.0})

* The idea is to fine-tune the model first, so that its weights are adapted to the classification task.
* After training, the model's accuracy is evaluated without bit flips to establish a baseline.
* Following that, I'll inject bit flips and compare accuracy before and after fault injection.

In [23]:
trainer.evaluate()

{'eval_loss': 0.2125215381383896,
 'eval_accuracy': 0.9352675089689596,
 'eval_runtime': 44.6608,
 'eval_samples_per_second': 143.549,
 'eval_steps_per_second': 4.501,
 'epoch': 3.0}

In [24]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
model.eval()  # inference mode: disables dropout

# Predict one test snippet at a time; gradients are not needed for inference
preds = []
for code in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

In [25]:
y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

In [26]:
from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.95      0.97      0.96      9785
      CWE120       0.94      0.90      0.92      1093
      CWE122       0.88      0.91      0.90       286
      CWE190       0.91      0.88      0.90      2228
      CWE369       0.83      0.87      0.85       705
      CWE400       0.91      0.89      0.90      1906
      CWE502       1.00      0.67      0.80        24

    accuracy                           0.93     16027
   macro avg       0.92      0.87      0.89     16027
weighted avg       0.93      0.93      0.93     16027
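
The two average rows of the report weight the classes differently, which matters here because the classes are very imbalanced (CWE502 has only 24 test samples). The sketch below recomputes both averages for recall from the rounded per-class values printed above, so the results are only approximate.

```python
# Per-class recall and support copied from the report above (rounded to 2 d.p.)
recall = {'CWE0': 0.97, 'CWE120': 0.90, 'CWE122': 0.91, 'CWE190': 0.88,
          'CWE369': 0.87, 'CWE400': 0.89, 'CWE502': 0.67}
support = {'CWE0': 9785, 'CWE120': 1093, 'CWE122': 286, 'CWE190': 2228,
           'CWE369': 705, 'CWE400': 1906, 'CWE502': 24}

# Macro average: every class counts equally, regardless of its size
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: each class counts in proportion to its support
total = sum(support.values())
weighted_recall = sum(recall[c] * support[c] for c in recall) / total

print(f"macro recall    = {macro_recall:.2f}")
print(f"weighted recall = {weighted_recall:.2f}")
```

The macro average is pulled down by the small CWE502 class, while the weighted average is dominated by CWE0.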



### Inspect model structure

In [None]:
for name, param in model.named_parameters():
    print(name)


### Accessing the final classification layer - out_proj_weights





In [None]:
# Access the first linear layer in the classifier (dense layer)
#dense_layer = model.classifier.dense
#dense_weights = dense_layer.weight.data

# Access the final projection layer (out_proj)
#out_proj_layer = model.classifier.out_proj
#out_proj_weights = out_proj_layer.weight.data


In [None]:
target_layer = "classifier.out_proj.weight"

In [None]:
import numpy as np
import torch

# Access the out_proj layer weights
out_proj_layer = model.classifier.out_proj
out_proj_weights = out_proj_layer.weight.data  # torch.Tensor

weights_fp32 = out_proj_weights.float()  # conversion to float32

weights_np = weights_fp32.cpu().detach().numpy()  # convert to NumPy

weights_uint = weights_np.view(np.uint32)  # bit-level representation of each float32 value

binary_repr = np.vectorize(lambda x: format(x, '032b'))(weights_uint)

print("Weights shape:", weights_np.shape)
print("First 5 binary representations:")
print(binary_repr.flatten()[:5])


Weights shape: (7, 768)
First 5 binary representations:
['00111001110111110001010001001100' '00111100111110000010111110011110'
 '10111101001110000110011011010001' '10111011010010110111001100110110'
 '00111011101000101110110110001110']
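
Each 32-character string above is an IEEE 754 single-precision value: 1 sign bit, 8 exponent bits, and 23 mantissa bits. As a sketch of how to read one, the code below decodes the first printed string using only the standard library; the helper name `decode_float32_bits` is hypothetical.

```python
import struct

def decode_float32_bits(bits: str):
    """Split a 32-character bit string into IEEE 754 float32 fields."""
    assert len(bits) == 32
    sign = int(bits[0], 2)        # bit 31
    exponent = int(bits[1:9], 2)  # bits 30..23, stored with a bias of 127
    mantissa = int(bits[9:], 2)   # bits 22..0
    # Reinterpret the same 32 bits as a float32 value
    value = struct.unpack('>f', int(bits, 2).to_bytes(4, 'big'))[0]
    return sign, exponent, mantissa, value

# First weight printed in the cell output above
sign, exponent, mantissa, value = decode_float32_bits('00111001110111110001010001001100')
print(sign, exponent, mantissa, value)
```

For this weight the sign bit is 0 and the biased exponent is 115, which corresponds to a small positive value of roughly 4e-4.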


### The key idea is to iteratively flip an increasing number of bits in the final classification layer and observe the effect on classification performance.

* The first iteration flips 20 bits at random positions using the XOR operation.
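
The effect of a single XOR flip depends heavily on which bit is hit. The following standard-library sketch flips one bit of a float32 value and shows the sign, mantissa, and exponent cases; the helper `flip_float32_bit` is a hypothetical name and is not part of the notebook's code.

```python
import math
import struct

def flip_float32_bit(value: float, bit_position: int) -> float:
    """Flip one bit (0 = least significant, 31 = sign) of a float32 value."""
    as_uint = struct.unpack('<I', struct.pack('<f', value))[0]
    as_uint ^= 1 << bit_position  # XOR with a one-hot mask toggles that bit
    return struct.unpack('<f', struct.pack('<I', as_uint))[0]

print(flip_float32_bit(1.0, 31))  # sign bit
print(flip_float32_bit(1.0, 22))  # most significant mantissa bit
print(flip_float32_bit(1.0, 23))  # least significant exponent bit
print(flip_float32_bit(1.0, 30))  # most significant exponent bit
```

For 1.0 these calls give -1.0, 1.5, 0.5 and infinity respectively. Flips in the low mantissa bits barely change a weight, while a flip in a high exponent bit can change it by many orders of magnitude.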

In [None]:
import random

num_flips = 20
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("20 bit flips performed and model weights updated.")


Flipping bit 17 at position (row=0, col=759)
Flipping bit 8 at position (row=1, col=228)
Flipping bit 5 at position (row=5, col=104)
Flipping bit 2 at position (row=4, col=432)
Flipping bit 13 at position (row=0, col=95)
Flipping bit 1 at position (row=1, col=517)
Flipping bit 26 at position (row=4, col=203)
Flipping bit 17 at position (row=1, col=459)
Flipping bit 10 at position (row=6, col=6)
Flipping bit 21 at position (row=5, col=432)
Flipping bit 13 at position (row=2, col=159)
Flipping bit 6 at position (row=6, col=344)
Flipping bit 6 at position (row=0, col=389)
Flipping bit 16 at position (row=2, col=352)
Flipping bit 29 at position (row=6, col=44)
Flipping bit 24 at position (row=4, col=127)
Flipping bit 18 at position (row=0, col=565)
Flipping bit 23 at position (row=6, col=643)
Flipping bit 4 at position (row=4, col=196)
Flipping bit 14 at position (row=0, col=677)
20 bit flips performed and model weights updated.


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

In [None]:
y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

In [None]:
from sklearn.metrics import classification_report
#print('Classification Report - 20 bit flips (classification layer)')
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.95      0.96      0.96      9785
      CWE120       0.91      0.89      0.90      1093
      CWE122       0.88      0.90      0.89       286
      CWE190       0.91      0.84      0.87      2228
      CWE369       0.77      0.89      0.83       705
      CWE400       0.90      0.90      0.90      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.93     16027
   macro avg       0.90      0.88      0.89     16027
weighted avg       0.93      0.93      0.93     16027
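
To compare this report with the baseline report (before any bit flips), the per-class F1 scores from the two printed reports can be subtracted. The values below are copied from those reports; the variable names are illustrative.

```python
# Per-class F1 copied from the two printed reports (rounded to 2 d.p.)
f1_baseline = {'CWE0': 0.96, 'CWE120': 0.92, 'CWE122': 0.90, 'CWE190': 0.90,
               'CWE369': 0.85, 'CWE400': 0.90, 'CWE502': 0.80}
f1_20_flips = {'CWE0': 0.96, 'CWE120': 0.90, 'CWE122': 0.89, 'CWE190': 0.87,
               'CWE369': 0.83, 'CWE400': 0.90, 'CWE502': 0.86}

# Positive delta = F1 went up after the flips, negative = it went down
deltas = {cwe: round(f1_20_flips[cwe] - f1_baseline[cwe], 2) for cwe in f1_baseline}

for cwe, delta in deltas.items():
    print(f"{cwe:>7}: {delta:+.2f}")
```

The per-class changes are a few hundredths at most while overall accuracy stays at 0.93. The CWE502 figure should be read with caution: with only 24 test samples, a single prediction shifts its recall by about four percentage points.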



### Iteration 2.0: Flipping additional 10 bits (30 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("10 more bit flips performed and model weights updated.")
print("Total of 30 bits have been flipped")


Flipping bit 5 at position (row=6, col=296)
Flipping bit 6 at position (row=6, col=238)
Flipping bit 29 at position (row=3, col=284)
Flipping bit 10 at position (row=5, col=373)
Flipping bit 13 at position (row=2, col=363)
Flipping bit 4 at position (row=5, col=273)
Flipping bit 10 at position (row=4, col=650)
Flipping bit 15 at position (row=4, col=746)
Flipping bit 24 at position (row=1, col=473)
Flipping bit 14 at position (row=2, col=655)
10 more bit flips performed and model weights updated.
Total of 30 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 30 bits have been flipped")


              precision    recall  f1-score   support

        CWE0       0.95      0.96      0.96      9785
      CWE120       0.91      0.89      0.90      1093
      CWE122       0.88      0.90      0.89       286
      CWE190       0.91      0.84      0.87      2228
      CWE369       0.77      0.89      0.83       705
      CWE400       0.90      0.90      0.90      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.93     16027
   macro avg       0.90      0.88      0.89     16027
weighted avg       0.93      0.93      0.93     16027

Total of 30 bits have been flipped
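
The flip cell is repeated unchanged for every iteration, and the random positions are not seeded, so a rerun flips different bits. One possible refactoring, sketched here under the assumption that NumPy's `default_rng` is acceptable in place of the `random` module, wraps the logic in a seeded function. `flip_random_bits` and `demo_weights` are hypothetical names, not part of the original notebook.

```python
import numpy as np

def flip_random_bits(weights: np.ndarray, num_flips: int, rng: np.random.Generator):
    """Return a copy of a 2-D float32 array with num_flips random bits flipped,
    plus a log of (row, col, bit_position) tuples."""
    flipped = np.array(weights, dtype=np.float32, copy=True)
    bits = flipped.view(np.uint32)  # same memory, reinterpreted as raw bits
    num_rows, num_cols = bits.shape
    log = []
    for _ in range(num_flips):
        row = int(rng.integers(0, num_rows))
        col = int(rng.integers(0, num_cols))
        bit_position = int(rng.integers(0, 32))
        bits[row, col] ^= np.uint32(1 << bit_position)  # XOR toggles the chosen bit
        log.append((row, col, bit_position))
    return flipped, log

rng = np.random.default_rng(seed=42)                # fixed seed makes the run repeatable
demo_weights = np.ones((7, 768), dtype=np.float32)  # stand-in with the out_proj shape (7, 768)
flipped, log = flip_random_bits(demo_weights, 1, rng)
print(log)
```

Because the function returns a modified copy and a log, the original weights stay available for comparison, and the same seed reproduces the same flips. The result could then be written back to the layer in the same way as in the existing cells.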


### Iteration 3.0: Flipping 10 more bits (40 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("10 more bit flips performed and model weights updated.")
print("Total of 40 bits have been flipped")



Flipping bit 3 at position (row=5, col=332)
Flipping bit 20 at position (row=1, col=32)
Flipping bit 4 at position (row=3, col=274)
Flipping bit 20 at position (row=1, col=580)
Flipping bit 31 at position (row=1, col=671)
Flipping bit 29 at position (row=3, col=658)
Flipping bit 8 at position (row=1, col=271)
Flipping bit 16 at position (row=1, col=762)
Flipping bit 27 at position (row=5, col=598)
Flipping bit 23 at position (row=4, col=408)
10 more bit flips performed and model weights updated.
Total of 40 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 40 bits have been flipped")


              precision    recall  f1-score   support

        CWE0       0.95      0.96      0.96      9785
      CWE120       0.91      0.89      0.90      1093
      CWE122       0.88      0.90      0.89       286
      CWE190       0.91      0.84      0.88      2228
      CWE369       0.77      0.89      0.83       705
      CWE400       0.90      0.90      0.90      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.93     16027
   macro avg       0.90      0.88      0.89     16027
weighted avg       0.93      0.93      0.93     16027

Total of 40 bits have been flipped


### Iteration 4.0: Flipping 10 more random bits (50 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("50 bit flips performed and model weights updated.")
print("Total of 50 bits have been flipped")


Flipping bit 31 at position (row=1, col=141)
Flipping bit 7 at position (row=0, col=48)
Flipping bit 10 at position (row=1, col=642)
Flipping bit 27 at position (row=6, col=696)
Flipping bit 24 at position (row=4, col=65)
Flipping bit 29 at position (row=3, col=610)
Flipping bit 0 at position (row=4, col=257)
Flipping bit 7 at position (row=5, col=738)
Flipping bit 17 at position (row=5, col=549)
Flipping bit 21 at position (row=6, col=656)
50 bit flips performed and model weights updated.
Total of 50 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 50 bits have been flipped")


              precision    recall  f1-score   support

        CWE0       0.95      0.96      0.95      9785
      CWE120       0.91      0.89      0.90      1093
      CWE122       0.88      0.90      0.89       286
      CWE190       0.91      0.84      0.87      2228
      CWE369       0.76      0.90      0.82       705
      CWE400       0.90      0.90      0.90      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.93     16027
   macro avg       0.90      0.88      0.89     16027
weighted avg       0.93      0.93      0.93     16027

Total of 50 bits have been flipped


### Iteration 5.0: Flipping 10 more bits (60 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("60 bit flips performed and model weights updated.")
print("Total of 60 bits have been flipped")


Flipping bit 27 at position (row=0, col=300)
Flipping bit 0 at position (row=1, col=464)
Flipping bit 16 at position (row=5, col=736)
Flipping bit 6 at position (row=4, col=182)
Flipping bit 19 at position (row=6, col=640)
Flipping bit 12 at position (row=6, col=654)
Flipping bit 10 at position (row=1, col=382)
Flipping bit 0 at position (row=4, col=543)
Flipping bit 31 at position (row=4, col=331)
Flipping bit 23 at position (row=0, col=114)
60 bit flips performed and model weights updated.
Total of 60 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 60 bits have been flipped")




              precision    recall  f1-score   support

        CWE0       0.95      0.96      0.95      9785
      CWE120       0.91      0.89      0.90      1093
      CWE122       0.89      0.90      0.89       286
      CWE190       0.91      0.84      0.87      2228
      CWE369       0.76      0.90      0.82       705
      CWE400       0.89      0.90      0.90      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.93     16027
   macro avg       0.90      0.88      0.89     16027
weighted avg       0.93      0.93      0.93     16027

Total of 60 bits have been flipped


### Iteration 6: flipping 10 more bits (70 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("70 bit flips performed and model weights updated.")
print("Total of 70 bits have been flipped")


Flipping bit 15 at position (row=6, col=314)
Flipping bit 5 at position (row=0, col=246)
Flipping bit 31 at position (row=0, col=749)
Flipping bit 8 at position (row=6, col=70)
Flipping bit 30 at position (row=1, col=675)
Flipping bit 16 at position (row=4, col=169)
Flipping bit 27 at position (row=4, col=621)
Flipping bit 12 at position (row=1, col=552)
Flipping bit 25 at position (row=5, col=319)
Flipping bit 23 at position (row=5, col=665)
70 bit flips performed and model weights updated.
Total of 70 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 70 bits have been flipped")




              precision    recall  f1-score   support

        CWE0       0.73      0.00      0.00      9785
      CWE120       0.94      0.24      0.39      1093
      CWE122       0.94      0.81      0.87       286
      CWE190       0.98      0.48      0.65      2228
      CWE369       1.00      0.00      0.00       705
      CWE400       0.13      1.00      0.23      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.22     16027
   macro avg       0.82      0.47      0.43     16027
weighted avg       0.73      0.22      0.16     16027

Total of 70 bits have been flipped


### Iteration 7: flipping 10 more bits (80 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("80 bit flips performed and model weights updated.")
print("Total of 80 bits have been flipped")


Flipping bit 28 at position (row=3, col=529)
Flipping bit 14 at position (row=0, col=253)
Flipping bit 1 at position (row=0, col=346)
Flipping bit 14 at position (row=4, col=567)
Flipping bit 0 at position (row=4, col=225)
Flipping bit 3 at position (row=0, col=724)
Flipping bit 2 at position (row=1, col=69)
Flipping bit 4 at position (row=6, col=338)
Flipping bit 17 at position (row=4, col=243)
Flipping bit 13 at position (row=5, col=497)
80 bit flips performed and model weights updated.
Total of 80 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 80 bits have been flipped")




              precision    recall  f1-score   support

        CWE0       0.73      0.00      0.00      9785
      CWE120       0.94      0.24      0.39      1093
      CWE122       0.93      0.81      0.87       286
      CWE190       0.98      0.48      0.65      2228
      CWE369       1.00      0.00      0.00       705
      CWE400       0.13      1.00      0.23      1906
      CWE502       1.00      0.75      0.86        24

    accuracy                           0.22     16027
   macro avg       0.82      0.47      0.43     16027
weighted avg       0.73      0.22      0.16     16027

Total of 80 bits have been flipped


### Iteration 8: flipping 10 more bits (90 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("90 bit flips performed and model weights updated.")
print("Total of 90 bits have been flipped")


Flipping bit 30 at position (row=4, col=135)
Flipping bit 26 at position (row=1, col=484)
Flipping bit 6 at position (row=1, col=96)
Flipping bit 22 at position (row=5, col=441)
Flipping bit 29 at position (row=3, col=420)
Flipping bit 3 at position (row=6, col=746)
Flipping bit 6 at position (row=5, col=669)
Flipping bit 21 at position (row=0, col=412)
Flipping bit 15 at position (row=6, col=111)
Flipping bit 28 at position (row=1, col=194)
90 bit flips performed and model weights updated.
Total of 90 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 90 bits have been flipped")




              precision    recall  f1-score   support

        CWE0       0.73      0.00      0.00      9785
      CWE120       0.94      0.19      0.32      1093
      CWE122       0.84      0.06      0.10       286
      CWE190       0.99      0.47      0.64      2228
      CWE369       0.00      0.00      0.00       705
      CWE400       0.13      1.00      0.23      1906
      CWE502       1.00      0.54      0.70        24

    accuracy                           0.20     16027
   macro avg       0.66      0.32      0.29     16027
weighted avg       0.68      0.20      0.14     16027

Total of 90 bits have been flipped


### Iteration 9: flipping 10 more bits (100 total)

In [None]:
import random

num_flips = 10
num_rows, num_cols = weights_uint.shape

for i in range(num_flips):
    row = random.randint(0, num_rows - 1)
    col = random.randint(0, num_cols - 1)
    bit_position = random.randint(0, 31)
    print(f"Flipping bit {bit_position} at position (row={row}, col={col})")

    # Create a mask for the chosen bit and flip it using XOR
    mask = 1 << bit_position
    weights_uint[row, col] ^= mask

# Convert the modified binary representation back to float32
modified_weights_np = weights_uint.view(np.float32)

# Update the model's out_proj layer weights with the modified weights
out_proj_layer.weight.data.copy_(torch.tensor(modified_weights_np, dtype=weights_fp32.dtype))

print("100 bit flips performed and model weights updated.")
print("Total of 100 bits have been flipped")


Flipping bit 11 at position (row=1, col=432)
Flipping bit 15 at position (row=2, col=473)
Flipping bit 28 at position (row=6, col=77)
Flipping bit 6 at position (row=6, col=563)
Flipping bit 0 at position (row=0, col=667)
Flipping bit 10 at position (row=0, col=242)
Flipping bit 30 at position (row=3, col=497)
Flipping bit 3 at position (row=1, col=410)
Flipping bit 0 at position (row=1, col=388)
Flipping bit 29 at position (row=3, col=271)
100 bit flips performed and model weights updated.
Total of 100 bits have been flipped


In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

preds = []
for i in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(i, return_tensors="pt",  truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)


y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))
print("Total of 100 bits have been flipped")

              precision    recall  f1-score   support

        CWE0       0.12      0.00      0.00      9785
      CWE120       0.67      0.04      0.07      1093
      CWE122       0.22      0.05      0.08       286
      CWE190       0.77      0.42      0.54      2228
      CWE369       0.00      0.00      0.00       705
      CWE400       0.13      1.00      0.23      1906
      CWE502       1.00      0.54      0.70        24

    accuracy                           0.18     16027
   macro avg       0.42      0.29      0.23     16027
weighted avg       0.25      0.18      0.11     16027

Total of 100 bits have been flipped
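
To make the trend easier to read, the accuracies reported in the classification reports above can be gathered in one place. The snippet below is only a bookkeeping sketch: the numbers are copied from the reports, and the variable names are illustrative rather than part of the notebook. Note that the 60 to 70 batch included flips at bit positions 30, 27, 25 and 23, which fall in the float32 exponent field; whether any of them caused the collapse is not established here.

```python
# Accuracy values copied from the classification reports above,
# keyed by the cumulative number of flipped bits.
accuracy_by_total_flips = {60: 0.93, 70: 0.22, 80: 0.22, 90: 0.20, 100: 0.18}

totals = sorted(accuracy_by_total_flips)

# Accuracy lost between each pair of consecutive iterations.
drops = {
    (a, b): accuracy_by_total_flips[a] - accuracy_by_total_flips[b]
    for a, b in zip(totals, totals[1:])
}

# The step with the largest single loss in accuracy.
worst_step = max(drops, key=drops.get)
print(
    f"Largest drop: {drops[worst_step]:.2f} "
    f"between {worst_step[0]} and {worst_step[1]} flipped bits"
)
```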


---

### Next step: inject bit flips into other model weights

# GraphCodeBERT Model Configuration Analysis


---

## 1. Model Architecture
| **Attribute** | **Value** | **Interpretation** |
|--------------|----------|-------------------|
| `"architectures"` | `["RobertaForSequenceClassification"]` | The model is **RoBERTa-based**, adapted for **classification**. |
| `"model_type"` | `"roberta"` | Confirms that the underlying transformer is **RoBERTa** (not BERT). |
| `"problem_type"` | `"single_label_classification"` | Used for **single-label CWE classification** (one label per sample). |

---

## 2. Model Size & Layers
| **Attribute** | **Value** | **Interpretation** |
|--------------|----------|-------------------|
| `"hidden_size"` | `768` | Each token is encoded into a **768-dimensional vector**. |
| `"num_hidden_layers"` | `12` | The model has **12 transformer layers** (standard for RoBERTa). |
| `"num_attention_heads"` | `12` | Each layer has **12 attention heads** for multi-head attention. |
| `"intermediate_size"` | `3072` | The **feed-forward network (FFN) size** inside each layer. |
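
As a rough sense of scale, the `hidden_size` above fixes the column count of the classification head's `out_proj` weight matrix, so the share of its bits touched by the experiment can be worked out by hand. The sketch below assumes seven output classes (inferred from the seven classes in the classification reports and the row indices 0 to 6 in the flip logs); that count is an assumption, not a value read from the config.

```python
hidden_size = 768      # from the config table above
num_labels = 7         # assumption: inferred from the classification reports
bits_per_weight = 32   # weights are stored as float32

total_weights = num_labels * hidden_size
total_bits = total_weights * bits_per_weight

# Share of the layer's bits flipped at each cumulative total used above.
for flipped in (60, 70, 80, 90, 100):
    share = flipped / total_bits
    print(f"{flipped} flipped bits = {share:.4%} of {total_bits} bits")
```

Under these assumptions even 100 flips touch well under 0.1% of the bits in the layer, which suggests that which bit is hit matters more than how many are hit.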

---

## 3. Tokenization & Embeddings
| **Attribute** | **Value** | **Interpretation** |
|--------------|----------|-------------------|
| `"vocab_size"` | `50265` | Model has a vocabulary of **50K tokens** (subwords & special tokens). |
| `"max_position_embeddings"` | `514` | The position-embedding table has **514 entries**; RoBERTa offsets position ids by the padding index, so inputs are limited to **512 tokens**. |
| `"bos_token_id"` | `0` | Special token marking the **beginning of a sequence**. |
| `"eos_token_id"` | `2` | Special token marking the **end of a sequence**. |
| `"pad_token_id"` | `1` | Padding token used for **batch processing**. |

---

## 4. Training & Regularization Parameters
| **Attribute** | **Value** | **Interpretation** |
|--------------|----------|-------------------|
| `"hidden_act"` | `"gelu"` | Uses **GELU activation** (common in transformer models). |
| `"initializer_range"` | `0.02` | Standard deviation of the normal distribution used to initialize weights. |
| `"attention_probs_dropout_prob"` | `0.1` | **10% dropout** for attention layers (reduces overfitting). |
| `"hidden_dropout_prob"` | `0.1` | **10% dropout** for hidden layers (improves generalization). |
| `"classifier_dropout"` | `null` | No classifier-specific dropout is set; the classification head falls back to `hidden_dropout_prob` (0.1). |

---

## 5. Label Mappings (CWE Class Assignments)
| **CWE Type** | **Index (label2id)** |
|-------------|-----------------|
| `"CWE0"` | `0` |
| `"CWE119"` | `2` |
| `"CWE120"` | `3` |
| `"CWE121"` | `8` |
| `"CWE122"` | `9` |
| `"CWE125"` | `5` |
| `"CWE131"` | `7` |
| `"CWE369"` | `6` |
| `"CWE415"` | `4` |
| `"CWE680"` | `10` |
| `"CWE787"` | `1` |
| `"CWE805"` | `11` |


---

## Next Steps
Now that we have evaluated the baseline, unflipped model, let's identify and access the model weights.


* Start by modifying the first layer, to see how the error propagates.
* The tensor targeted here is the query weight matrix of the first self-attention layer.

The weights are stored as 32-bit floating point numbers.

## **Process Workflow:**
1. Convert the float weights to binary representation (bit-level view).
2. Perform the bit-flip operation.
3. Convert the modified binary back to float representation.
4. Update the model weights
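
The four steps above can be sketched on a tiny stand-in array. This is a minimal illustration, not the notebook's exact cell: the array values are made up, and the final update of the model (step 4) is only described in a comment so that the snippet runs without a loaded model.

```python
import numpy as np

# Hypothetical stand-in for a layer's weights; the notebook uses the real tensor.
weights_fp32 = np.array([[0.02, -0.5], [1.0, 0.25]], dtype=np.float32)

# 1. Reinterpret the same bytes as unsigned 32-bit integers.
#    Copy first so the original array is left untouched.
weights_uint = weights_fp32.copy().view(np.uint32)

# 2. Flip a single bit with XOR. Bit 31 is the IEEE 754 sign bit.
weights_uint[0, 0] ^= np.uint32(1 << 31)

# 3. Reinterpret the modified bits as float32 again.
modified_weights_np = weights_uint.view(np.float32)

# 4. In the notebook, the result is copied back into the layer with
#    out_proj_layer.weight.data.copy_(...); omitted here.
print(weights_fp32[0, 0], "->", modified_weights_np[0, 0])
```

Flipping the sign bit only negates the value; flips in the exponent bits (23 to 30) change the magnitude by powers of two, and flips in the mantissa bits (0 to 22) change it only slightly.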

Above is the total number of weight values in our model

# Bit-Flip Implementation Plan
1. Select random weights to modify (e.g., 0.1% of the total weights).
2. Select a random bit in each chosen weight (32-bit representation).
3. Flip the bit using the XOR operation (`^= (1 << bit_position)`).
4. Convert the modified integer weights back to floats.
5. Replace the modified weights in the model.
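
One way this plan could look in code is sketched below. The helper name `flip_random_bits`, its parameters, and the seeded NumPy generator are all assumptions made for illustration; the notebook's own cells use `random.randint` without a seed. Step 5 is left to the caller.

```python
import numpy as np

def flip_random_bits(weights, fraction, seed=None):
    """Return a float32 copy of `weights` with one random bit flipped
    in a randomly chosen `fraction` of its elements (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Work on a C-contiguous float32 copy, viewed as raw 32-bit integers.
    bits = np.array(weights, dtype=np.float32, order="C").view(np.uint32)
    flat = bits.reshape(-1)  # a view, so edits land in `bits`

    # Step 1: choose distinct weights to corrupt (at least one).
    n_flips = max(1, int(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_flips, replace=False)

    # Step 2: choose one bit position (0..31) per selected weight.
    bit_positions = rng.integers(0, 32, size=n_flips, dtype=np.uint32)

    # Step 3: flip the chosen bits with XOR.
    flat[idx] ^= np.uint32(1) << bit_positions

    # Step 4: reinterpret the integers as floats again.
    return bits.view(np.float32), idx, bit_positions

# Example on a stand-in matrix; 0.1% of 7 * 768 weights is 5 flips.
demo = np.full((7, 768), 0.02, dtype=np.float32)
flipped, idx, bit_positions = flip_random_bits(demo, fraction=0.001, seed=0)
print(len(idx), "weights modified")
```

Returning the indices and bit positions keeps a record of each injected fault, and fixing the seed makes a run repeatable.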

## As you can see, flipping bits in the first layer causes unrecoverable errors that propagate through the remaining layers of the model.
* This leads to the NaN error
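
One plausible route from a single flipped bit to a NaN is sketched below. It is an assumption about the mechanism, not a diagnosis taken from the notebook. Flipping the top exponent bit (bit 30) of a small weight scales it by 2^128; the value is still finite, but any further multiplication can overflow to infinity, and operations such as `inf - inf` then yield NaN.

```python
import numpy as np

# A small weight, similar in magnitude to initializer_range (0.02).
w = np.array([0.02], dtype=np.float32)

# Flip bit 30, the most significant bit of the float32 exponent.
bits = w.copy().view(np.uint32)
bits ^= np.uint32(1 << 30)
huge = bits.view(np.float32)
print(huge[0])  # about 6.8e36: enormous, but still finite

# Downstream arithmetic can push the value past the float32 range.
with np.errstate(over="ignore", invalid="ignore"):
    overflowed = huge * np.float32(1e3)   # exceeds float32 max, becomes inf
    nan_value = overflowed - overflowed   # inf - inf is NaN

print(np.isinf(overflowed[0]), np.isnan(nan_value[0]))

# A simple guard: count the non-finite entries in an array.
print(int((~np.isfinite(nan_value)).sum()), "non-finite value(s)")
```

A check like `np.isfinite` on the modified weights catches flips that produce Inf or NaN directly. It does not catch a finite but huge weight like `huge`, which only overflows once activations are computed.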

# Let's attempt flipping the final classification layer:

Still seeing the NaN error. Checking if modified weights have NaNs: