# **May 22nd**
## Testing Fault Tolerance of a GraphCodeBERT-based Buffer Overflow CWE Classification model by injecting bit-flip faults into the model's weight parameters.

## Update:
- Using Ratnaker's new dataset

# **Code Update (24th March):**
* Changed CWEs to Ratnaker's recommendations
* Exploring bit flip injections into different layers other than classifier head
* Flip exponent bits instead of sign bits
* Use DFMIT and Defor for performance metrics

## Purpose of the script:
1. Train a GraphCodeBERT-based model to classify code snippets into different CWE types (specifically those related to buffer overflows).

2. Introduce bit-flip noise into the model weights post-training, prior to inference on unseen test data.

3. Evaluate how this noise affects the model's accuracy and robustness.

---

Installing ML and NLP libraries, mainly from Hugging Face:

In [None]:
!pip install datasets
!pip install transformers
!pip install accelerate -U
!pip install transformers[torch]
!pip install wandb


In [None]:
!pip uninstall -y transformers
!pip install transformers --upgrade --quiet
!pip show transformers


Found existing installation: transformers 4.52.2
Uninstalling transformers-4.52.2:
  Successfully uninstalled transformers-4.52.2
Name: transformers
Version: 4.52.4
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers


In [None]:
from tqdm import tqdm, trange
import multiprocessing

from torch.optim import AdamW  # PyTorch's AdamW; the copy in transformers is deprecated
from transformers import (
    WEIGHTS_NAME, get_linear_schedule_with_warmup,
    RobertaConfig, RobertaForSequenceClassification, RobertaTokenizer,
    RobertaForMaskedLM, pipeline, DataCollatorWithPadding,
    AutoModelForSequenceClassification, TrainingArguments, Trainer
)
from datasets import Dataset
import torch

!pip install evaluate
import evaluate
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
Installing collected packages: evaluate
Successfully installed evaluate-0.4.3


## Transformers & Hugging Face Libraries  
- **RobertaConfig** → Configuration settings for RoBERTa models.  
- **RobertaForSequenceClassification** → RoBERTa model for classification tasks.  
- **RobertaTokenizer** → Tokenizer for RoBERTa (converts text into tokenized inputs).  
- **RobertaForMaskedLM** → RoBERTa for Masked Language Modeling (predicting masked words).  
- **pipeline** → High-level API for using pre-trained models easily.  
- **DataCollatorWithPadding** → Ensures tokenized inputs are correctly padded for training.  
- **AutoModelForSequenceClassification** → Generic method for loading classification models.  
- **TrainingArguments & Trainer** → Utilities for managing model training.  

## Torch & Optimizers  
- **torch** → PyTorch framework for training deep learning models.  
- **AdamW** → Adam with decoupled weight decay; the standard optimizer for fine-tuning transformers.  
- **get_linear_schedule_with_warmup** → Learning rate scheduler.  

## Additional Libraries  
- **evaluate** → Hugging Face package for computing accuracy, F1-score, etc.; it replaces the deprecated `datasets.load_metric`.  
- **numpy & pandas** → For handling datasets and numerical operations.  
- **sklearn.model_selection.train_test_split** → Splits data into training and test sets.  


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import pandas as pd
df=pd.read_csv('/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/cvefixes_final.csv')

In [None]:
df.head(1)

Unnamed: 0,file_name,programming_language,code_before,code_after,diff,num_lines_added,num_lines_deleted,num_lines_in_file,num_tokens_in_file,complexity,...,commit_message,merge,cve_id,cwe_id,method_change_id,method_code,num_lines_in_method,method_complexity,num_tokens_in_method,vulnerable
0,dl-load.c,C,/* Map in a shared object's segments from the ...,/* Map in a shared object's segments from the ...,"{'added': [(152, ' const char *const start = ...",28,10,952.0,6592.0,260.0,...,Update.\n\n1999-11-09 Ulrich Drepper <dreppe...,False,CVE-1999-0199,CWE-252,217096824924488,"_dl_dst_count (const char *name, int is_path)\...",22,13,199,True


A bit of analysis to get accustomed to the new dataset.

In [None]:
print(df['cwe_id'].unique())
print(df.columns.tolist())


['CWE-252' 'SAFE' 'CWE-415' 'CWE-476' 'CWE-284' 'CWE-617' 'CWE-674'
 'CWE-190' 'CWE-400' 'CWE-416' 'CWE-835' 'CWE-665' 'CWE-369' 'CWE-404'
 'CWE-191' 'CWE-667' 'CWE-319' 'CWE-401' 'CWE-122' 'CWE-681' 'CWE-843'
 'CWE-367' 'CWE-134' 'CWE-121' 'CWE-426' 'CWE-78' 'CWE-457' 'CWE-126'
 'CWE-672' 'CWE-273' 'CWE-459' 'CWE-327']
['file_name', 'programming_language', 'code_before', 'code_after', 'diff', 'num_lines_added', 'num_lines_deleted', 'num_lines_in_file', 'num_tokens_in_file', 'complexity', 'file_change_id', 'hash', 'change_type', 'old_file_path', 'new_file_path', 'repo_url', 'author', 'committer', 'commit_message', 'merge', 'cve_id', 'cwe_id', 'method_change_id', 'method_code', 'num_lines_in_method', 'method_complexity', 'num_tokens_in_method', 'vulnerable']


Ratnaker left the choice of CWEs up to me, so first I want to see how the CWE types are distributed in the dataset:

In [None]:
cwe_counts = df['cwe_id'].value_counts()
cwe_counts

cwe_id,count
SAFE,14066
CWE-190,687
CWE-476,471
CWE-416,421
CWE-415,171
CWE-400,161
CWE-617,142
CWE-401,84
CWE-284,78
CWE-122,73


Many CWEs have too few samples to be used for model training and inference. I am therefore setting a threshold: a CWE type needs at least 50 data points to be included in this model. The non-vulnerable `SAFE` class is not part of the selection.

In [None]:
import pandas as pd

cwe_selection =  [
    'CWE-190', 'CWE-476', 'CWE-416', 'CWE-415', 'CWE-400', 'CWE-617',
    'CWE-401', 'CWE-284', 'CWE-122', 'CWE-835', 'CWE-843', 'CWE-78'
]

may_filtered_df = df[df['cwe_id'].isin(cwe_selection)]

# unique CWEs in the filtered result
unique_cwes = may_filtered_df['cwe_id'].unique()
print("Unique CWEs in the filtered dataset:", unique_cwes)

may_filtered_df.to_csv('filtered_dataset.csv', index=False)
print("Dataset has been filtered and saved as 'filtered_dataset.csv'")


Unique CWEs in the filtered dataset: ['CWE-415' 'CWE-476' 'CWE-284' 'CWE-617' 'CWE-190' 'CWE-400' 'CWE-416'
 'CWE-835' 'CWE-401' 'CWE-122' 'CWE-843' 'CWE-78']
Dataset has been filtered and saved as 'filtered_dataset.csv'
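The list above was picked by hand. As a cross-check, the same 50-sample rule can be applied programmatically, which also helps because the `value_counts()` output shown earlier is cut off after ten rows. This is a minimal sketch: the helper name `select_cwes` is illustrative, and it is demonstrated on a toy frame with made-up counts rather than the real dataset.

```python
import pandas as pd

def select_cwes(df: pd.DataFrame, min_count: int = 50, exclude=("SAFE",)) -> list:
    """Return the CWE ids with at least `min_count` rows, most frequent first."""
    counts = df["cwe_id"].value_counts()  # sorted by count, descending
    keep = counts[(counts >= min_count) & ~counts.index.isin(exclude)]
    return keep.index.tolist()

# Toy frame with made-up counts (not the real dataset)
toy = pd.DataFrame({"cwe_id": ["SAFE"] * 60 + ["CWE-190"] * 55 + ["CWE-476"] * 50 + ["CWE-126"] * 3})
print(select_cwes(toy))  # ['CWE-190', 'CWE-476']
```

On the real data this would be `select_cwes(df)`, and the result could be compared against `cwe_selection`.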


In [None]:
may_filtered_df.to_csv('/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/23May_filtered_dataset.csv', index=False)

In [None]:
len(may_filtered_df)

2478

2478 rows in the new dataset

In [None]:
df = may_filtered_df.astype(str)

In [None]:
# Two dictionaries that convert between unique CWE types and numerical labels
id2label = dict()  # integer index -> CWE type, e.g. {0: 'CWE-415'}
label2id = dict()  # CWE type -> integer index, e.g. {'CWE-415': 0}
for ind, cwe in enumerate(df['cwe_id'].unique()):
    id2label[ind] = cwe
    label2id[cwe] = ind

In [None]:
print('id2label dictionary: ')
print(id2label)
print('label2id dictionary: ')
print(label2id)

id2label dictionary: 
{0: 'CWE-415', 1: 'CWE-476', 2: 'CWE-284', 3: 'CWE-617', 4: 'CWE-190', 5: 'CWE-400', 6: 'CWE-416', 7: 'CWE-835', 8: 'CWE-401', 9: 'CWE-122', 10: 'CWE-843', 11: 'CWE-78'}
label2id dictionary: 
{'CWE-415': 0, 'CWE-476': 1, 'CWE-284': 2, 'CWE-617': 3, 'CWE-190': 4, 'CWE-400': 5, 'CWE-416': 6, 'CWE-835': 7, 'CWE-401': 8, 'CWE-122': 9, 'CWE-843': 10, 'CWE-78': 11}


In [None]:
df['label']=df['cwe_id'].map(label2id)
df.head()

Unnamed: 0,file_name,programming_language,code_before,code_after,diff,num_lines_added,num_lines_deleted,num_lines_in_file,num_tokens_in_file,complexity,...,merge,cve_id,cwe_id,method_change_id,method_code,num_lines_in_method,method_complexity,num_tokens_in_method,vulnerable,label
16,spnego_mech.c,C,"/*\n * Copyright (C) 2006,2008 by the Massachu...","/*\n * Copyright (C) 2006,2008 by the Massachu...","{'added': [], 'deleted': [(821, '\tgeneric_gss...",0,1,3104.0,15617.0,512.0,...,False,CVE-2014-4343,CWE-415,125656663779789,"init_ctx_reselect(OM_uint32 *minor_status, spn...",25,5,162,True,0
17,spnego_mech.c,C,"/*\n * Copyright (C) 2006,2008 by the Massachu...","/*\n * Copyright (C) 2006,2008 by the Massachu...","{'added': [(1471, '\tif (REMAIN == 0 || REMAIN...",1,1,3104.0,15621.0,513.0,...,False,CVE-2014-4344,CWE-476,165040919595628,"acc_ctx_cont(OM_uint32 *minstat,\n\t gss_b...",57,10,269,True,1
21,ldap_pwd_policy.c,C,/* -*- mode: c; c-basic-offset: 4; indent-tabs...,/* -*- mode: c; c-basic-offset: 4; indent-tabs...,"{'added': [(317, ' if (ent == NULL) {'), (3...",4,3,329.0,1998.0,54.0,...,False,CVE-2014-5353,CWE-476,153815760115106,krb5_ldap_get_password_policy_from_dn(krb5_con...,38,7,235,True,1
40,kadm_rpc_svc.c,C,"/* -*- mode: c; c-file-style: ""bsd""; indent-ta...","/* -*- mode: c; c-file-style: ""bsd""; indent-ta...","{'added': [(7, '#include <k5-int.h>'), (299, '...",3,9,269.0,1461.0,44.0,...,False,CVE-2014-9422,CWE-284,144486914514667,check_rpcsec_auth(struct svc_req *rqstp)\n{\n ...,55,9,365,True,2
130,kdc_util.c,C,/* -*- mode: c; c-basic-offset: 4; indent-tabs...,/* -*- mode: c; c-basic-offset: 4; indent-tabs...,"{'added': [(742, ' if (check_anon(kdc_activ...",1,1,1333.0,7563.0,310.0,...,False,CVE-2016-3120,CWE-476,43185298949362,validate_as_request(kdc_realm_t *kdc_active_re...,74,27,448,True,1


In [None]:
# Splitting the dataset into training (75%) and test (25%) sets
df_train, df_test = train_test_split(df, test_size=0.25, random_state=42)

It's important to check class balance in both train and test sets:

In [None]:
train_counts = df_train['cwe_id'].value_counts()
test_counts = df_test['cwe_id'].value_counts()
combined = pd.DataFrame({'train': train_counts, 'test': test_counts}).fillna(0).astype(int)
print(combined)


         train  test
cwe_id              
CWE-122     52    21
CWE-190    508   179
CWE-284     62    16
CWE-400    121    40
CWE-401     60    24
CWE-415    142    29
CWE-416    313   108
CWE-476    361   110
CWE-617     99    43
CWE-78      50    10
CWE-835     47    23
CWE-843     43    17


The CWE distribution looks good across both splits :)
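`train_test_split` was called without `stratify`, so the per-class test fractions drift somewhat. For example, CWE-78 has 50 train and 10 test rows, while CWE-835 has 47 and 23. Passing the labels to `stratify` keeps every class close to the 75/25 ratio. Here is a minimal sketch on toy data; the frame and its 3:1 imbalance are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with a 3:1 class imbalance (illustrative data only)
toy = pd.DataFrame({"code_before": [f"f{i}()" for i in range(80)],
                    "label": [0] * 60 + [1] * 20})

# stratify=... preserves the class proportions in both splits
toy_train, toy_test = train_test_split(
    toy, test_size=0.25, random_state=42, stratify=toy["label"]
)
print(toy_test["label"].value_counts().sort_index().tolist())  # [15, 5]
```

In the notebook this would be `train_test_split(df, test_size=0.25, random_state=42, stratify=df['label'])`. Note that it would change the split and therefore every downstream number.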

In [None]:
dataset = {}  # plain dict of features and labels built from the training split
dataset['text'] = list(df_train['code_before'])  # feature: source code before the fix
dataset['label'] = list(df_train['label'])       # target: integer CWE label
# Convert the dict into a Hugging Face Dataset, which supports batched .map() tokenization
ds = Dataset.from_dict(dataset)
ds = ds.train_test_split(test_size=0.1)  # train/validation split (10% validation); no seed, so it differs between runs

The code cell above performs the **second** data split.

### 1st Split:
* Creates the initial training and test datasets (75% / 25%).
* The test set is kept entirely separate from the training process.
### 2nd Split:
* Splits the training set into training and validation sets (90% / 10%).
* The validation set is used for hyperparameter tuning, intermediate evaluation during training and picking the best checkpoint (`load_best_model_at_end=True`). This all happens before testing.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load tokenizer for GraphCodeBERT
tokenizer = AutoTokenizer.from_pretrained("microsoft/graphcodebert-base")

# Determine number of unique classes (CWE types)
num_labels = len(label2id)

# Load model with correct classification head
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/graphcodebert-base",
    num_labels=num_labels,
    id2label=id2label,
    label2id=label2id
)





Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at microsoft/graphcodebert-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix # model performance evaluation metrics
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)

# A function that calculates accuracy during model evaluation by comparing the predicted labels (after applying argmax) to the true labels.
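Accuracy alone can hide weak minority classes. In the test report further down, for example, CWE-835 has a recall of 0.09. A possible extension also reports macro-averaged F1, which gives every CWE class equal weight. It is sketched here as a hypothetical `compute_metrics_with_f1` built on scikit-learn metrics. The logits and labels in the check are hand-made and illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics_with_f1(eval_pred):
    """Accuracy plus macro-averaged F1 (every CWE class weighted equally)."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "macro_f1": f1_score(labels, preds, average="macro", zero_division=0),
    }

# Tiny check with hand-made logits for 3 classes (illustrative only)
logits = np.array([[2.0, 0.1, 0.0], [0.2, 1.5, 0.1], [1.0, 0.0, 0.5], [0.0, 0.3, 0.9]])
labels = np.array([0, 1, 2, 2])
print(compute_metrics_with_f1((logits, labels)))
```

It could be passed to the `Trainer` as `compute_metrics=compute_metrics_with_f1` in place of the accuracy-only function.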


In [None]:
def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

tokenized_dataset = ds.map(preprocess_function, batched=True)
# Tokenizing the dataset


In [None]:
from transformers import Trainer, TrainingArguments

'''
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)

'''
training_args = TrainingArguments(
    output_dir="/content/drive/MyDrive/Colab Notebooks/THESIS_PROJECT/MODEL_WEIGHTS/NEW_MODEL_WEIGHTS/graphcodebert_bo",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="epoch",
    #logging_strategy = "epoch",
    #logging_first_step = True,
    logging_steps = 1,
    load_best_model_at_end=True,
    report_to="wandb",
    fp16 = True,
    warmup_steps = 20
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,  # newer name for the deprecated `tokenizer=` argument
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

trainer.train()


  trainer = Trainer(


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=models
wandb: Paste an API key from your profile and hit enter:
wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: alenabd24 (alenabd24-queen-mary-university-of-london) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Epoch,Training Loss,Validation Loss,Accuracy
1,1.7912,1.667984,0.489247
2,1.6764,1.421497,0.543011
3,1.4983,1.355127,0.564516
4,1.1972,1.27926,0.596774
5,0.6595,1.243351,0.639785
6,0.6157,1.212927,0.629032


TrainOutput(global_step=630, training_loss=1.2711871707250202, metrics={'train_runtime': 587.9929, 'train_samples_per_second': 17.061, 'train_steps_per_second': 1.071, 'total_flos': 2639767100129280.0, 'train_loss': 1.2711871707250202, 'epoch': 6.0})

**Saving the baseline model weights (to reload later if necessary)**

* The idea is to fine-tune the model first, so that it learns appropriate weights for the classification task.
* After training, the model's accuracy is evaluated without bit flips (the baseline).
* Following that, I'll inject bit flips and compare accuracy before vs. after fault injection.

In [None]:
trainer.evaluate()

{'eval_loss': 1.2129273414611816,
 'eval_accuracy': 0.6290322580645161,
 'eval_runtime': 1.4943,
 'eval_samples_per_second': 124.475,
 'eval_steps_per_second': 8.031,
 'epoch': 6.0}

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)
model.eval()  # disable dropout so predictions are deterministic

preds = []
for code in df_test['code_before'].values:
    with torch.no_grad():
        inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        preds.append(logits.argmax(dim=-1).item())  # predicted class id

In [None]:
y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

     CWE-122       0.86      0.29      0.43        21
     CWE-190       0.87      0.85      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.76      0.47      0.58        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.56      0.79      0.66        29
     CWE-416       0.53      0.64      0.58       108
     CWE-476       0.51      0.67      0.58       110
     CWE-617       0.59      0.67      0.63        43
      CWE-78       0.64      0.70      0.67        10
     CWE-835       1.00      0.09      0.16        23
     CWE-843       0.72      0.76      0.74        17

    accuracy                           0.66       620
   macro avg       0.72      0.57      0.58       620
weighted avg       0.70      0.66      0.65       620



In [None]:
trainer.save_model("/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may")


# Bit Flipping Strategy:

GraphCodeBERT has 12 transformer encoder layers, indexed as `encoder.layer[0]` through `encoder.layer[11]` (accessed as `model.roberta.encoder.layer[i]`).

### Each weight is a float32 number stored in 32 bits:
* 1 sign bit (positive or negative)
* 8 exponent bits (set the power-of-two scale of the value; flips here are the most damaging)
* 23 mantissa bits (fractional precision)
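To see why the exponent bits matter most, the sketch below flips individual bits of a single float32 value. It uses the same `struct` pack/unpack trick as the flipping function later in the notebook. The helper `flip_bit` is illustrative, and the value 0.05 is just an example weight magnitude.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` (as float32) with bit `bit` inverted (0 = mantissa LSB, 31 = sign)."""
    (as_int,) = struct.unpack(">I", struct.pack(">f", value))  # float32 -> 32-bit pattern
    (flipped,) = struct.unpack(">f", struct.pack(">I", as_int ^ (1 << bit)))
    return flipped

w = 0.05  # illustrative weight value
for bit, field in [(0, "mantissa LSB"), (22, "mantissa MSB"), (23, "exponent LSB"),
                   (30, "exponent MSB"), (31, "sign")]:
    print(f"bit {bit:2d} ({field}): {w} -> {flip_bit(w, bit):.6g}")
```

Consider a typical weight, meaning a normal float32 with |w| < 2. Its top exponent bit (bit 30) is 0, so flipping that bit multiplies the value by 2^128, and 0.05 becomes roughly 1.7e37. A mantissa flip changes the value by less than a factor of 2, and the sign bit only negates it.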

### Starting point:
* Flip weights in `layer[i].attention.self.query.weight`, which perturbs the self-attention of that layer (layer 0 being the earliest).

The idea is to flip X bits (initially X = 5) in one layer at a time, starting from layer 0, and observe the effect on the model's inference.

Let's see how many weights there are in the query weight matrix for all layers:

In [None]:
for i in range(12):
    shape = model.roberta.encoder.layer[i].attention.self.query.weight.shape
    num_weights = model.roberta.encoder.layer[i].attention.self.query.weight.numel()
    print(f"Layer {i}: shape = {shape}, total weights = {num_weights}")


Layer 0: shape = torch.Size([768, 768]), total weights = 589824
Layer 1: shape = torch.Size([768, 768]), total weights = 589824
Layer 2: shape = torch.Size([768, 768]), total weights = 589824
Layer 3: shape = torch.Size([768, 768]), total weights = 589824
Layer 4: shape = torch.Size([768, 768]), total weights = 589824
Layer 5: shape = torch.Size([768, 768]), total weights = 589824
Layer 6: shape = torch.Size([768, 768]), total weights = 589824
Layer 7: shape = torch.Size([768, 768]), total weights = 589824
Layer 8: shape = torch.Size([768, 768]), total weights = 589824
Layer 9: shape = torch.Size([768, 768]), total weights = 589824
Layer 10: shape = torch.Size([768, 768]), total weights = 589824
Layer 11: shape = torch.Size([768, 768]), total weights = 589824


* `model.roberta.encoder.layer[i].attention.self.query.weight` is a tensor of shape [768, 768], i.e. 589,824 float32 weights.
* Each float32 value has 32 bits.
* I'm randomly picking 5 (row, col) indices and flipping 1 random bit in each of those 5 weights.
* So 5 weights are altered, one bit each. The indices are drawn with replacement, so they are distinct with near certainty but not by construction.
* I suspect I won't see any degradation at first, so I'll then increase the number of flipped bits.
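If exact reproducibility and strictly distinct weights are wanted, the flip sites can be drawn without replacement from a seeded generator. This is a minimal sketch. The helper `sample_flip_sites` is hypothetical, and the notebook's flipping function does not use it.

```python
import random

def sample_flip_sites(shape, num_flips, seed, bits=range(32)):
    """Pick `num_flips` distinct (row, col) positions and one bit position for each."""
    rows, cols = shape
    rng = random.Random(seed)                         # local, seeded RNG: same sites for the same seed
    flat = rng.sample(range(rows * cols), num_flips)  # sampling without replacement -> distinct weights
    return [(i // cols, i % cols, rng.choice(bits)) for i in flat]

sites = sample_flip_sites((768, 768), num_flips=5, seed=0)
print(sites)
```

Passing `bits=range(23, 31)` would restrict the flips to the exponent field. That matches the 24 March plan to flip exponent bits rather than sign bits.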

It's worth noting that the bits I'm flipping are in the query weights.

| Component                 | Effect When Flipped                           | Possible Result in Inference                                    |
| ------------------------- | --------------------------------------------- | --------------------------------------------------------------- |
| **Query Weights** (`W_Q`) | Alters what each token *asks for*             | Can distort which other tokens it attends to                    |
| **Key Weights** (`W_K`)   | Alters what each token *looks like to others* | Can make it harder for other tokens to recognize relevant info  |
| **Value Weights** (`W_V`) | Alters what information is actually *passed*  | Affects the output even if the attention pattern is correct     |
| **All Combined**          | Can break the attention mechanism entirely    | Potentially severe loss of the learned token relationships      |
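As a rough illustration of the query row in the table, the toy NumPy sketch below computes single-head scaled dot-product attention, softmax(QK^T / sqrt(d)), on random matrices. It then flips the top exponent bit of one query weight. This is not GraphCodeBERT: there is no bias, output projection, multi-head split or layer norm, and all sizes and values are made up.

```python
import numpy as np

def attention_weights(X, W_Q, W_K):
    """Single-head attention weights softmax(Q K^T / sqrt(d)) for token embeddings X."""
    Q, K = X @ W_Q, X @ W_K
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))                                  # 4 toy token embeddings
W_Q = rng.normal(scale=0.1, size=(d, d)).astype(np.float32)
W_K = rng.normal(scale=0.1, size=(d, d)).astype(np.float32)

clean = attention_weights(X, W_Q.astype(np.float64), W_K.astype(np.float64))

# Flip the exponent MSB (bit 30) of a single query weight
W_Q_faulty = W_Q.copy()
bits = W_Q_faulty.view(np.uint32)  # integer view sharing the same memory
bits[0, 0] ^= np.uint32(1 << 30)
faulty = attention_weights(X, W_Q_faulty.astype(np.float64), W_K.astype(np.float64))

print("clean  max attention per token:", clean.max(axis=1).round(3))
print("faulty max attention per token:", faulty.max(axis=1).round(3))
```

In this toy case, the single corrupted weight turns every token's near-uniform attention into attention on exactly one token. In the real model, later layers, residual connections and layer normalisation may dampen or amplify the effect, and that is what the per-layer experiment measures.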


In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=50):
    """
    Flips `num_bits` random bits in the self-attention query weights of the specified layer in the model.

    Each flip picks a random (row, col) and a random bit position in [0, 31]
    (sign, exponent or mantissa). Indices are drawn with replacement, so the
    same weight can, rarely, be hit twice.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    assert weight.dtype == torch.float32, "struct '>f' packing assumes float32 weights"
    weight_data = weight.data.cpu().numpy().copy()  # work on a CPU copy of the matrix
    num_rows, num_cols = weight_data.shape
    flipped = []

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Reinterpret the float32 as its 32-bit integer pattern, XOR one bit, convert back
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    # Write the corrupted matrix back in place (keeps the parameter's dtype and device)
    with torch.no_grad():
        weight.copy_(torch.from_numpy(weight_data))
    return flipped
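The function above draws the bit position uniformly from 0 to 31, so on average only 8 of every 32 flips land in the exponent field (bits 23 to 30) that the 24 March update targets. A hypothetical exponent-only variant might look like the sketch below. It is written against a plain NumPy float32 array so it can be tried without the model, and the function name is illustrative.

```python
import random
import numpy as np

EXPONENT_BITS = range(23, 31)  # float32 exponent field occupies bits 23..30

def flip_exponent_bits(weights: np.ndarray, num_bits: int, rng: random.Random):
    """Flip one random exponent bit in each of `num_bits` randomly chosen entries, in place.

    Returns a log of (index, bit, value_before, value_after) tuples.
    """
    assert weights.dtype == np.float32, "bit layout assumes float32"
    as_int = weights.view(np.uint32)  # integer view that shares memory with `weights`
    log = []
    for _ in range(num_bits):
        idx = tuple(rng.randrange(n) for n in weights.shape)
        bit = rng.choice(EXPONENT_BITS)
        before = float(weights[idx])
        as_int[idx] ^= np.uint32(1 << bit)
        log.append((idx, bit, before, float(weights[idx])))
    return log

# Toy usage on a small matrix of identical weights (illustrative values)
w = np.full((4, 4), 0.05, dtype=np.float32)
log = flip_exponent_bits(w, num_bits=3, rng=random.Random(0))
for idx, bit, before, after in log:
    print(idx, bit, before, "->", after)
```

To use it on the model, one could run it on a copy such as `weight.detach().cpu().numpy().copy()` and write the result back with `weight.copy_(torch.from_numpy(...))` under `torch.no_grad()`. This wiring is an untested suggestion.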


In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 50  # number of bit flips injected per layer

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload model
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Step 2: Inject bit flips
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
    report = classification_report(y_true, y_pred, output_dict=True)
    print(classification_report(y_true, y_pred))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")
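Once the loop has written one `classification_report` JSON per layer, the results can be gathered into a single table. The helpers below are a hypothetical sketch. They assume the `layer{i}_xor{number_bits}_report.json` naming used above and the dictionary layout returned by `classification_report(..., output_dict=True)`.

```python
import json
import os

def summarise_reports(results_dir, number_bits, baseline_accuracy, num_layers=12):
    """Collect accuracy and macro-F1 from per-layer classification_report JSON files.

    Assumes files named layer{i}_xor{number_bits}_report.json containing the dict
    returned by sklearn's classification_report(..., output_dict=True).
    Returns (layer, accuracy, accuracy_drop, macro_f1) tuples; missing layers are skipped.
    """
    rows = []
    for layer in range(num_layers):
        path = os.path.join(results_dir, f"layer{layer}_xor{number_bits}_report.json")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            report = json.load(f)
        acc = report["accuracy"]
        rows.append((layer, acc, baseline_accuracy - acc, report["macro avg"]["f1-score"]))
    return rows

def print_summary(rows):
    """Print the collected rows as a fixed-width text table."""
    print(f"{'layer':>5} {'acc':>6} {'drop':>6} {'macro_f1':>8}")
    for layer, acc, drop, f1 in rows:
        print(f"{layer:>5} {acc:>6.3f} {drop:>6.3f} {f1:>8.3f}")
```

A possible call is `print_summary(summarise_reports(results_dir, number_bits, baseline_accuracy=0.66))`, where 0.66 is the unflipped test accuracy from the baseline report above. Each layer gets only a single random draw of flips. Repeating the loop with several seeds and averaging would make the per-layer differences easier to interpret.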



--- Flipping Layer 0 ---
              precision    recall  f1-score   support

     CWE-122       0.83      0.24      0.37        21
     CWE-190       0.89      0.84      0.87       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.73      0.40      0.52        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.62      0.79      0.70        29
     CWE-416       0.51      0.70      0.59       108
     CWE-476       0.52      0.65      0.57       110
     CWE-617       0.61      0.70      0.65        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       1.00      0.09      0.16        23
     CWE-843       0.68      0.76      0.72        17

    accuracy                           0.66       620
   macro avg       0.71      0.55      0.57       620
weighted avg       0.70      0.66      0.64       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


              precision    recall  f1-score   support

     CWE-122       0.67      0.10      0.17        21
     CWE-190       0.91      0.82      0.86       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.77      0.50      0.61        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.43      0.79      0.55        29
     CWE-416       0.49      0.61      0.54       108
     CWE-476       0.41      0.66      0.51       110
     CWE-617       0.60      0.42      0.49        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.93      0.76      0.84        17

    accuracy                           0.61       620
   macro avg       0.65      0.49      0.51       620
weighted avg       0.65      0.61      0.60       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_5/layer3_

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


              precision    recall  f1-score   support

     CWE-122       1.00      0.05      0.09        21
     CWE-190       0.91      0.82      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.71      0.38      0.49        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.68      0.66      0.67        29
     CWE-416       0.46      0.66      0.54       108
     CWE-476       0.42      0.66      0.52       110
     CWE-617       0.62      0.60      0.61        43
      CWE-78       1.00      0.60      0.75        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.72      0.76      0.74        17

    accuracy                           0.62       620
   macro avg       0.68      0.50      0.53       620
weighted avg       0.66      0.62      0.61       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_5/layer4_

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


              precision    recall  f1-score   support

     CWE-122       0.75      0.14      0.24        21
     CWE-190       0.88      0.86      0.87       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.78      0.45      0.57        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.70      0.72      0.71        29
     CWE-416       0.56      0.59      0.58       108
     CWE-476       0.42      0.70      0.52       110
     CWE-617       0.60      0.70      0.65        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.87      0.76      0.81        17

    accuracy                           0.65       620
   macro avg       0.68      0.52      0.56       620
weighted avg       0.66      0.65      0.63       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_5/layer5_

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.50      0.01      0.02       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.91      0.25      0.39        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.25      0.02      0.03       108
     CWE-476       0.57      0.04      0.07       110
     CWE-617       1.00      0.02      0.05        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.08       620
   macro avg       0.36      0.12      0.06       620
weighted avg       0.45      0.08      0.06       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_5/layer8_


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.89      0.85      0.87       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.77      0.50      0.61        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.61      0.79      0.69        29
     CWE-416       0.52      0.69      0.59       108
     CWE-476       0.51      0.67      0.58       110
     CWE-617       0.61      0.70      0.65        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       1.00      0.09      0.16        23
     CWE-843       0.81      0.76      0.79        17

    accuracy                           0.66       620
   macro avg       0.66      0.54      0.55       620
weighted avg       0.68      0.66      0.64       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_5/layer9_

### Now flipping 100 bits per layer:

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=100):
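    """
    Flips `num_bits` random bits in the self-attention query weight matrix of the given encoder layer.

    Each flip picks a (row, col) element uniformly at random, with replacement (so the same element
    can be hit more than once), and XORs one uniformly chosen bit position 0-31 of its IEEE-754
    float32 encoding, so sign, exponent and mantissa bits are all possible targets.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """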
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped

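Which bit a flip lands on likely matters far more than how many flips there are. The sketch below is a standalone illustration, not part of the experiment: it reuses the same `struct` round trip as `flip_random_bits_in_query_weight` to show what inverting a single bit does to one small weight. `flip_float32_bit` and `bit_field` are hypothetical helper names, and `0.05` is an illustrative magnitude, not a value read from the model.

```python
import struct


def flip_float32_bit(value, bit):
    """Return `value` with bit `bit` of its float32 encoding inverted (0 = lowest mantissa bit, 31 = sign)."""
    (pattern,) = struct.unpack(">I", struct.pack(">f", value))
    (flipped,) = struct.unpack(">f", struct.pack(">I", pattern ^ (1 << bit)))
    return flipped


def bit_field(bit):
    """Name the IEEE-754 float32 field that a bit position belongs to."""
    if bit == 31:
        return "sign"
    return "exponent" if bit >= 23 else "mantissa"


if __name__ == "__main__":
    w = 0.05  # illustrative small weight magnitude (an assumption, not read from the model)
    for bit in (0, 22, 23, 29, 30, 31):
        print(f"bit {bit:2d} ({bit_field(bit)}): {w} -> {flip_float32_bit(w, bit):.4g}")
```

Sign and mantissa flips change a weight's magnitude by at most a factor of two, but for any weight with |w| < 1 the top exponent bit (bit 30) is 0, so flipping it multiplies the value by 2^128. A single such hit in a query matrix is one plausible, unverified explanation for why a layer can collapse (layer 8 in the `bitflip_results_5` run above falls to 0.08 accuracy) while other layers stay between 0.62 and 0.66.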

In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_100_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 100  # ✅ define your flip count here

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload model
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Step 2: Inject bit flips
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
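    # zero_division=0 scores never-predicted classes as 0.0, exactly as the default does,
    # but without emitting an UndefinedMetricWarning (_warn_prf) for every layer
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))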

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---
              precision    recall  f1-score   support

     CWE-122       0.75      0.29      0.41        21
     CWE-190       0.86      0.85      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.55      0.42      0.48        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.72      0.79      0.75        29
     CWE-416       0.53      0.73      0.61       108
     CWE-476       0.53      0.65      0.58       110
     CWE-617       0.61      0.58      0.60        43
      CWE-78       0.54      0.70      0.61        10
     CWE-835       1.00      0.09      0.16        23
     CWE-843       0.69      0.53      0.60        17

    accuracy                           0.66       620
   macro avg       0.70      0.54      0.56       620
weighted avg       0.68      0.66      0.64       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may


              precision    recall  f1-score   support

     CWE-122       0.57      0.19      0.29        21
     CWE-190       0.86      0.82      0.84       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.78      0.35      0.48        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.64      0.72      0.68        29
     CWE-416       0.47      0.73      0.57       108
     CWE-476       0.47      0.59      0.52       110
     CWE-617       0.58      0.67      0.62        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.71      0.71      0.71        17

    accuracy                           0.62       620
   macro avg       0.55      0.47      0.48       620
weighted avg       0.62      0.62      0.60       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_100_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.14      0.25        21
     CWE-190       0.89      0.85      0.87       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.77      0.50      0.61        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.63      0.66      0.64        29
     CWE-416       0.52      0.70      0.60       108
     CWE-476       0.49      0.66      0.56       110
     CWE-617       0.56      0.67      0.61        43
      CWE-78       0.56      0.50      0.53        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.87      0.76      0.81        17

    accuracy                           0.65       620
   macro avg       0.66      0.53      0.54       620
weighted avg       0.67      0.65      0.64       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_100_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.05      0.09        21
     CWE-190       0.71      0.83      0.76       179
     CWE-284       1.00      0.50      0.67        16
     CWE-400       0.68      0.33      0.44        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.32      0.79      0.45        29
     CWE-416       0.60      0.44      0.51       108
     CWE-476       0.38      0.65      0.48       110
     CWE-617       0.82      0.33      0.47        43
      CWE-78       0.67      0.40      0.50        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.71      0.73        17

    accuracy                           0.55       620
   macro avg       0.58      0.42      0.42       620
weighted avg       0.58      0.55      0.52       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_100_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.14      0.25        21
     CWE-190       0.86      0.86      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.78      0.45      0.57        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.68      0.79      0.73        29
     CWE-416       0.57      0.60      0.58       108
     CWE-476       0.46      0.71      0.56       110
     CWE-617       0.60      0.67      0.64        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.72      0.76      0.74        17

    accuracy                           0.65       620
   macro avg       0.65      0.54      0.55       620
weighted avg       0.66      0.65      0.63       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_100_5/lay

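Reading the printed reports one by one gets tedious across 12 layers and several flip counts. A small helper along these lines could gather the saved JSON files into a single table instead. It assumes only the filename pattern `layer{idx}_xor{n}_report.json` used in the loop above and the `"accuracy"` and `"macro avg"` keys that `classification_report(..., output_dict=True)` produces; `collect_accuracy` is a hypothetical name.

```python
import json
import os
import re

# Filenames written by the loop above, e.g. "layer3_xor100_report.json".
REPORT_NAME = re.compile(r"layer(\d+)_xor(\d+)_report\.json$")


def collect_accuracy(results_dir):
    """Map (layer, num_bits) -> (accuracy, macro-average F1) for every saved report in `results_dir`."""
    table = {}
    for name in sorted(os.listdir(results_dir)):
        match = REPORT_NAME.match(name)
        if match is None:  # skips the *_flips.json logs and anything else in the folder
            continue
        layer, num_bits = int(match.group(1)), int(match.group(2))
        with open(os.path.join(results_dir, name)) as f:
            report = json.load(f)
        # output_dict=True stores overall accuracy as a plain float and each average as a dict
        table[(layer, num_bits)] = (report["accuracy"], report["macro avg"]["f1-score"])
    return table
```

In the notebook it would be called as `collect_accuracy(results_dir)` once a sweep has finished; layers without a saved report simply do not appear in the returned dict.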
### Now flipping 150 bits per layer:

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=150):
    """
    Flips `num_bits` random bits in the self-attention query weights of the specified layer in the model.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped

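The copies of `flip_random_bits_in_query_weight` draw positions with unseeded `random.randint`, so each run is a different random sample, and the same element can be picked twice (a second flip of the same bit silently undoes the first, and the logged `original_val` is then already corrupted). A possible alternative, sketched below under those assumptions, works on the NumPy array directly: it reinterprets the float32 buffer as `uint32` with `ndarray.view`, samples distinct elements from a seeded `numpy.random.Generator`, and lets the caller restrict `bit_positions`, for example to the exponent bits `range(23, 31)`. `flip_bits_numpy` is a hypothetical name, not part of the original notebook.

```python
import numpy as np


def flip_bits_numpy(weights, num_bits, seed=0, bit_positions=range(32)):
    """Return a bit-flipped float32 copy of `weights` and a log of the flips.

    Hypothetical variant of flip_random_bits_in_query_weight: elements are drawn
    without replacement from a seeded generator, so runs are reproducible and no
    element is flipped twice. Log entries are (flat_index, bit, original, flipped).
    """
    rng = np.random.default_rng(seed)
    out = np.array(weights, dtype=np.float32, order="C")  # always a fresh C-contiguous copy
    flat = out.reshape(-1)                # view into `out`, not a copy
    raw = flat.view(np.uint32)            # same memory reinterpreted as IEEE-754 bit patterns
    idx = rng.choice(flat.size, size=num_bits, replace=False)
    bits = rng.choice(np.asarray(list(bit_positions)), size=num_bits).astype(np.uint32)
    original = flat[idx].copy()
    raw[idx] ^= np.uint32(1) << bits      # indices are unique, so the fancy-indexed XOR is safe
    log = list(zip(idx.tolist(), bits.tolist(), original.tolist(), flat[idx].tolist()))
    return out, log
```

Plugging it into the notebook would mean something like `out, log = flip_bits_numpy(weight.data.cpu().numpy(), number_bits, seed=trial_seed)` followed by `weight.data = torch.from_numpy(out).to(weight.device)`, where `trial_seed` is a hypothetical per-run seed. The log stores flat indices, which `numpy.unravel_index` can turn back into `(row, col)`.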

In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_150_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 150  # ✅ define your flip count here

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload model
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Step 2: Inject bit flips
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
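    # zero_division=0 scores never-predicted classes as 0.0, exactly as the default does,
    # but without emitting an UndefinedMetricWarning (_warn_prf) for every layer
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))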

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---
              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.91      0.71      0.80       179
     CWE-284       1.00      0.69      0.81        16
     CWE-400       0.75      0.23      0.35        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.77      0.59      0.67        29
     CWE-416       0.52      0.56      0.54       108
     CWE-476       0.29      0.74      0.41       110
     CWE-617       0.68      0.35      0.46        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.80      0.71      0.75        17

    accuracy                           0.54       620
   macro avg       0.48      0.38      0.40       620
weighted avg       0.58      0.54      0.52       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.87      0.75      0.80       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.82      0.35      0.49        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.23      0.72      0.34        29
     CWE-416       0.42      0.63      0.50       108
     CWE-476       0.43      0.49      0.46       110
     CWE-617       0.67      0.56      0.61        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.80      0.47      0.59        17

    accuracy                           0.55       620
   macro avg       0.57      0.43      0.45       620
weighted avg       0.60      0.55      0.55       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.89      0.79      0.84       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.73      0.47      0.58        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.62      0.55      0.58        29
     CWE-416       0.47      0.74      0.58       108
     CWE-476       0.40      0.60      0.48       110
     CWE-617       0.57      0.60      0.58        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.90      0.53      0.67        17

    accuracy                           0.61       620
   macro avg       0.62      0.47      0.50       620
weighted avg       0.63      0.61      0.59       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.14      0.25        21
     CWE-190       0.89      0.84      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.78      0.45      0.57        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.66      0.79      0.72        29
     CWE-416       0.52      0.74      0.61       108
     CWE-476       0.52      0.68      0.59       110
     CWE-617       0.58      0.65      0.62        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.87      0.76      0.81        17

    accuracy                           0.66       620
   macro avg       0.67      0.54      0.56       620
weighted avg       0.68      0.66      0.64       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.88      0.24      0.38       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.29      0.17      0.22        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.05      0.52      0.10        29
     CWE-416       0.32      0.30      0.31       108
     CWE-476       0.20      0.30      0.24       110
     CWE-617       1.00      0.02      0.05        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.24      0.38        17

    accuracy                           0.22       620
   macro avg       0.40      0.16      0.16       620
weighted avg       0.50      0.22      0.25       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.82      0.86       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.78      0.35      0.48        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.83      0.66      0.73        29
     CWE-416       0.48      0.73      0.58       108
     CWE-476       0.44      0.65      0.53       110
     CWE-617       0.55      0.60      0.58        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.74      0.82      0.78        17

    accuracy                           0.63       620
   macro avg       0.61      0.49      0.51       620
weighted avg       0.63      0.63      0.61       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.66      0.76       179
     CWE-284       1.00      0.12      0.22        16
     CWE-400       0.53      0.20      0.29        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.54      0.24      0.33        29
     CWE-416       0.44      0.37      0.40       108
     CWE-476       0.24      0.76      0.37       110
     CWE-617       0.70      0.16      0.26        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.89      0.47      0.62        17

    accuracy                           0.44       620
   macro avg       0.44      0.25      0.27       620
weighted avg       0.54      0.44      0.43       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.79      0.84      0.81       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.52      0.30      0.38        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.23      0.69      0.35        29
     CWE-416       0.39      0.32      0.36       108
     CWE-476       0.37      0.61      0.46       110
     CWE-617       0.74      0.47      0.57        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.83      0.59      0.69        17

    accuracy                           0.52       620
   macro avg       0.40      0.37      0.36       620
weighted avg       0.50      0.52      0.50       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


              precision    recall  f1-score   support

     CWE-122       0.67      0.10      0.17        21
     CWE-190       0.89      0.85      0.87       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.83      0.47      0.60        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.68      0.79      0.73        29
     CWE-416       0.57      0.62      0.59       108
     CWE-476       0.45      0.72      0.55       110
     CWE-617       0.60      0.70      0.65        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.81      0.76      0.79        17

    accuracy                           0.66       620
   macro avg       0.64      0.54      0.55       620
weighted avg       0.67      0.66      0.64       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_150_5/lay


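The `_flips.json` logs record only the original and flipped values, but the flipped bit can be recovered by XOR-ing their float32 bit patterns. The sketch below counts, per log file, how many flips hit the sign, exponent or mantissa field, which could help check whether drops such as layer 4 at 150 flips (0.22 accuracy) or layer 6 (0.44) line up with exponent hits. It assumes the JSON layout written above (`"original"` and `"flipped"` keys); the function names are hypothetical. NaN results cannot be traced back to a bit, because Python's `json` writes every NaN as the same `NaN` token, so non-finite values are counted separately.

```python
import json
import math
import struct
from collections import Counter


def float32_bits(x):
    """Raw IEEE-754 float32 bit pattern of x (assumes x is exactly representable as float32)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def changed_fields(original, flipped):
    """List the float32 field of every bit that differs between a logged original and flipped value."""
    if not (math.isfinite(original) and math.isfinite(flipped)):
        return ["non-finite"]  # NaN payloads do not survive the JSON round trip
    diff = float32_bits(original) ^ float32_bits(flipped)
    fields = []
    for bit in range(32):
        if (diff >> bit) & 1:
            fields.append("sign" if bit == 31 else "exponent" if bit >= 23 else "mantissa")
    return fields


def summarize_flip_log(path):
    """Count sign/exponent/mantissa/non-finite hits in one layer{i}_xor{n}_flips.json log."""
    with open(path) as f:
        records = json.load(f)
    counts = Counter()
    for rec in records:
        counts.update(changed_fields(rec["original"], rec["flipped"]))
    return counts
```

An element drawn twice with the same bit cancels out and contributes nothing to the counts.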
### Now flipping 200 bits per layer:

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=200):
    """
    Flips `num_bits` random bits in the self-attention query weights of the specified layer in the model.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped

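The 100-, 150- and 200-flip cells differ only in `number_bits` and the results directory, and each (layer, flip count) result is a single unseeded random draw. One way to fold the cells together and also average over several draws is a driver like the one below. `evaluate` is a hypothetical callable standing in for the body of the evaluation loop (reload the model, seed the RNG, inject `num_bits` flips into `layer`, return test accuracy); `sweep_layers` is likewise an illustrative name.

```python
import statistics
from typing import Callable, Dict, Iterable, Tuple


def sweep_layers(
    evaluate: Callable[[int, int, int], float],
    layers: Iterable[int],
    flip_counts: Iterable[int],
    seeds: Iterable[int],
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Call evaluate(layer, num_bits, seed) for every combination; return (mean, stdev) per (layer, num_bits)."""
    seeds = list(seeds)              # reused for every (layer, num_bits) pair
    flip_counts = list(flip_counts)  # reused for every layer
    results = {}
    for layer in layers:
        for num_bits in flip_counts:
            accs = [evaluate(layer, num_bits, seed) for seed in seeds]
            spread = statistics.stdev(accs) if len(accs) > 1 else 0.0
            results[(layer, num_bits)] = (statistics.mean(accs), spread)
    return results
```

A mean and standard deviation per cell would make it easier to tell whether a single poor result, such as layer 4 at 150 flips, reflects that layer's sensitivity or just an unlucky draw.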

In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_200_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 200  # ✅ define your flip count here

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload model
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Step 2: Inject bit flips
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
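    # zero_division=0 scores never-predicted classes as 0.0, exactly as the default does,
    # but without emitting an UndefinedMetricWarning (_warn_prf) for every layer
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))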

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")
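
With one JSON report per layer, it is easier to compare the sweep as a single table. The following is a minimal sketch. It assumes the `layer{idx}_xor{number_bits}_report.json` naming used above and the key layout of scikit-learn's `classification_report(..., output_dict=True)` (`"accuracy"`, `"macro avg"`, `"weighted avg"`). `load_layer_reports` and `summarize` are hypothetical helpers.

```python
import json
import os
import re


def load_layer_reports(results_dir: str, number_bits: int) -> dict:
    """Load every layer{idx}_xor{number_bits}_report.json in results_dir, keyed by layer index."""
    pattern = re.compile(rf"^layer(\d+)_xor{number_bits}_report\.json$")
    reports = {}
    for name in os.listdir(results_dir):
        match = pattern.match(name)
        if match:  # skips the *_flips.json logs and reports for other flip counts
            with open(os.path.join(results_dir, name)) as f:
                reports[int(match.group(1))] = json.load(f)
    return reports


def summarize(reports: dict) -> list:
    """Return (layer, accuracy, macro_f1, weighted_f1) rows sorted by layer index."""
    rows = []
    for layer in sorted(reports):
        r = reports[layer]
        rows.append((layer, r["accuracy"], r["macro avg"]["f1-score"], r["weighted avg"]["f1-score"]))
    return rows
```

For example, `for layer, acc, macro_f1, weighted_f1 in summarize(load_layer_reports(results_dir, number_bits)): print(f"layer {layer:2d}  acc={acc:.2f}  macro-F1={macro_f1:.2f}")` prints one row per layer. If a run was interrupted, any missing layer simply does not appear in the table.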



--- Flipping Layer 0 ---


              precision    recall  f1-score   support

     CWE-122       1.00      0.10      0.17        21
     CWE-190       0.75      0.72      0.74       179
     CWE-284       1.00      0.62      0.77        16
     CWE-400       0.57      0.42      0.49        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.44      0.41      0.43        29
     CWE-416       0.42      0.55      0.48       108
     CWE-476       0.29      0.54      0.37       110
     CWE-617       0.63      0.28      0.39        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.88      0.41      0.56        17

    accuracy                           0.51       620
   macro avg       0.67      0.38      0.44       620
weighted avg       0.58      0.51      0.50       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.89      0.53      0.66       179
     CWE-284       1.00      0.50      0.67        16
     CWE-400       0.43      0.30      0.35        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.67      0.14      0.23        29
     CWE-416       0.34      0.71      0.46       108
     CWE-476       0.28      0.58      0.38       110
     CWE-617       1.00      0.07      0.13        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.35      0.48        17

    accuracy                           0.44       620
   macro avg       0.53      0.28      0.30       620
weighted avg       0.58      0.44      0.42       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.93      0.74      0.83       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.38      0.30      0.33        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.69      0.31      0.43        29
     CWE-416       0.45      0.74      0.56       108
     CWE-476       0.30      0.61      0.41       110
     CWE-617       0.40      0.09      0.15        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.35      0.50        17

    accuracy                           0.53       620
   macro avg       0.58      0.36      0.40       620
weighted avg       0.59      0.53      0.51       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.29      0.44        21
     CWE-190       0.84      0.84      0.84       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.76      0.40      0.52        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.62      0.79      0.70        29
     CWE-416       0.55      0.69      0.61       108
     CWE-476       0.48      0.65      0.55       110
     CWE-617       0.58      0.65      0.62        43
      CWE-78       0.67      0.60      0.63        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.76      0.76      0.76        17

    accuracy                           0.65       620
   macro avg       0.65      0.54      0.56       620
weighted avg       0.66      0.65      0.63       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.53      0.67       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.21      0.12      0.16        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.50      0.07      0.12        29
     CWE-416       0.43      0.54      0.48       108
     CWE-476       0.23      0.71      0.35       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.24      0.38        17

    accuracy                           0.40       620
   macro avg       0.33      0.20      0.20       620
weighted avg       0.47      0.40      0.37       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.86      0.50      0.63       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.85      0.28      0.42        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.10      0.79      0.18        29
     CWE-416       0.45      0.37      0.41       108
     CWE-476       0.32      0.43      0.37       110
     CWE-617       0.44      0.16      0.24        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.83      0.59      0.69        17

    accuracy                           0.39       620
   macro avg       0.49      0.34      0.35       620
weighted avg       0.54      0.39      0.42       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.79      0.87      0.83       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.61      0.42      0.50        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.47      0.72      0.57        29
     CWE-416       0.53      0.59      0.56       108
     CWE-476       0.43      0.64      0.51       110
     CWE-617       0.64      0.58      0.61        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.85      0.65      0.73        17

    accuracy                           0.61       620
   macro avg       0.52      0.47      0.48       620
weighted avg       0.57      0.61      0.58       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.81      0.73      0.77       179
     CWE-284       0.80      0.25      0.38        16
     CWE-400       0.50      0.03      0.05        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.10      0.83      0.19        29
     CWE-416       0.62      0.31      0.41       108
     CWE-476       0.43      0.45      0.44       110
     CWE-617       0.58      0.51      0.54        43
      CWE-78       0.67      0.40      0.50        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.88      0.41      0.56        17

    accuracy                           0.45       620
   macro avg       0.53      0.34      0.34       620
weighted avg       0.59      0.45      0.46       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_200_5/la

250 bit flips per layer

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=250):
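    """
    Flips `num_bits` random bits in the self-attention query weights of the
    specified encoder layer.

    For each flip, a weight element is drawn uniformly at random (with
    replacement, so the same element can be hit more than once) and one of its
    32 float32 bits, also drawn uniformly, is XOR-flipped: bit 31 is the sign,
    bits 23-30 the exponent and bits 0-22 the mantissa.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """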
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped
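
`flip_random_bits_in_query_weight` draws `row`, `col` and `bit_to_flip` from the global `random` module without a seed. As a result, every run injects a different fault pattern. On rare occasions the same bit is also flipped twice and the two flips cancel out. The following is a minimal sketch of a seeded alternative that draws distinct flip sites. `sample_flip_sites` and its `seed` argument are hypothetical and not part of the notebook.

```python
import random


def sample_flip_sites(num_rows: int, num_cols: int, num_bits: int, seed: int) -> list:
    """Draw `num_bits` distinct (row, col, bit) sites for a num_rows x num_cols float32 matrix.

    Sampling is without replacement over all num_rows * num_cols * 32 bit positions,
    so no site is flipped twice, and a fixed seed reproduces the same fault pattern.
    """
    total = num_rows * num_cols * 32
    if not 0 <= num_bits <= total:
        raise ValueError("num_bits must be between 0 and the number of bit positions")
    rng = random.Random(seed)  # private generator; leaves the global `random` state alone
    sites = []
    for flat in rng.sample(range(total), num_bits):
        element, bit = divmod(flat, 32)       # which weight, and which of its 32 bits
        row, col = divmod(element, num_cols)  # row-major position in the matrix
        sites.append((row, col, bit))
    return sites
```

To use it inside the flip function, the three `randint` calls would be replaced by a loop such as `for row, col, bit_to_flip in sample_flip_sites(num_rows, num_cols, num_bits, seed=layer_idx):`. This assumes a seed is passed in as an extra argument.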


In [None]:
import os
import json
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

# Relies on df_test, id2label and flip_random_bits_in_query_weight from earlier cells.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_250_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 250  # ✅ define your flip count here

# Bit flips only touch model weights, so the tokenizer can be loaded once.
tokenizer = AutoTokenizer.from_pretrained(model_path)

for layer_idx in range(12):  # one run per encoder layer
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload a clean model so flips from earlier layers do not accumulate
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()

    # Step 2: Inject bit flips into this layer's self-attention query weights
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test, one snippet at a time
    preds = []
    with torch.no_grad():
        for code in df_test['code_before'].values:
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            preds.append(logits.argmax(dim=-1).item())

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
    # zero_division=0 scores never-predicted classes as 0 (what the default already does)
    # without emitting an UndefinedMetricWarning for each of them.
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---
              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.95      0.57      0.71       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.54      0.35      0.42        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.14      0.41      0.21        29
     CWE-416       0.40      0.56      0.47       108
     CWE-476       0.31      0.55      0.40       110
     CWE-617       0.58      0.49      0.53        43
      CWE-78       1.00      0.10      0.18        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.41      0.58        17

    accuracy                           0.47       620
   macro avg       0.58      0.35      0.38       620
weighted avg       0.59      0.47      0.48       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.88      0.71      0.78       179
     CWE-284       1.00      0.69      0.81        16
     CWE-400       0.75      0.15      0.25        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.75      0.62      0.68        29
     CWE-416       0.41      0.71      0.52       108
     CWE-476       0.37      0.61      0.46       110
     CWE-617       0.52      0.56      0.54        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.53      0.62        17

    accuracy                           0.55       620
   macro avg       0.54      0.40      0.41       620
weighted avg       0.59      0.55      0.53       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.05      0.09        21
     CWE-190       0.76      0.80      0.78       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.50      0.30      0.38        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.33      0.03      0.06        29
     CWE-416       0.42      0.41      0.41       108
     CWE-476       0.26      0.60      0.36       110
     CWE-617       0.50      0.19      0.27        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.87      0.76      0.81        17

    accuracy                           0.49       620
   macro avg       0.55      0.35      0.37       620
weighted avg       0.52      0.49      0.46       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.92      0.68      0.78       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.67      0.20      0.31        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.45      0.31      0.37        29
     CWE-416       0.51      0.32      0.40       108
     CWE-476       0.26      0.84      0.40       110
     CWE-617       0.82      0.21      0.33        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.92      0.71      0.80        17

    accuracy                           0.48       620
   macro avg       0.55      0.33      0.37       620
weighted avg       0.61      0.48      0.48       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.05      0.09        21
     CWE-190       0.72      0.88      0.79       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.71      0.38      0.49        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.50      0.48      0.49        29
     CWE-416       0.50      0.63      0.56       108
     CWE-476       0.39      0.57      0.46       110
     CWE-617       0.86      0.28      0.42        43
      CWE-78       0.75      0.60      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.81      0.76      0.79        17

    accuracy                           0.58       620
   macro avg       0.60      0.44      0.46       620
weighted avg       0.58      0.58      0.55       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.65      0.71      0.68       179
     CWE-284       0.86      0.38      0.52        16
     CWE-400       0.42      0.45      0.43        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.12      0.45      0.18        29
     CWE-416       0.43      0.30      0.35       108
     CWE-476       0.37      0.53      0.44       110
     CWE-617       0.33      0.09      0.15        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.71      0.77        17

    accuracy                           0.44       620
   macro avg       0.42      0.34      0.35       620
weighted avg       0.45      0.44      0.43       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/lay


              precision    recall  f1-score   support

     CWE-122       1.00      0.19      0.32        21
     CWE-190       0.83      0.86      0.84       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.64      0.57      0.61        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.72      0.72      0.72        29
     CWE-416       0.59      0.67      0.63       108
     CWE-476       0.46      0.74      0.56       110
     CWE-617       0.88      0.51      0.65        43
      CWE-78       0.56      0.50      0.53        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.87      0.76      0.81        17

    accuracy                           0.66       620
   macro avg       0.70      0.53      0.56       620
weighted avg       0.69      0.66      0.64       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_250_5/la
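
In the 250-flip run, layers 0 and 1 collapse and predict CWE-415 for every snippet. CWE-415 recall is 1.00 and accuracy is 0.05, which corresponds to 29 of 620 correct. One hypothesis worth checking is that a few flips produced extreme or non-finite weights. The sketch below counts such outcomes in a saved `*_flips.json` log. `summarize_flip_log`, `load_flip_log` and the 1e3 cutoff for "large" are assumptions chosen for illustration. By default, Python's `json` module writes infinities and NaN as `Infinity`/`NaN` and reads them back, so these values survive the round trip through the log.

```python
import json
import math


def load_flip_log(path: str) -> list:
    """Read a flip log written by the loop above (a list of row/col/original/flipped dicts)."""
    with open(path) as f:
        return json.load(f)


def summarize_flip_log(flips: list, large: float = 1e3) -> dict:
    """Count flips that produced non-finite, very large (|w| >= large) or bounded weights."""
    counts = {"non_finite": 0, "large": 0, "bounded": 0}
    for entry in flips:
        new = entry["flipped"]
        if not math.isfinite(new):  # inf or NaN
            counts["non_finite"] += 1
        elif abs(new) >= large:
            counts["large"] += 1
        else:
            counts["bounded"] += 1
    return counts
```

For example, `summarize_flip_log(load_flip_log(os.path.join(results_dir, "layer0_xor250_flips.json")))` would summarize the layer 0 log. Comparing its counts with those of a layer that did not collapse is one way to test the hypothesis.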


300 bit flips per layer

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=300):
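    """
    Flips `num_bits` random bits in the self-attention query weights of the
    specified encoder layer.

    For each flip, a weight element is drawn uniformly at random (with
    replacement, so the same element can be hit more than once) and one of its
    32 float32 bits, also drawn uniformly, is XOR-flipped: bit 31 is the sign,
    bits 23-30 the exponent and bits 0-22 the mantissa.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """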
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped
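
The code update notes say flips should target exponent bits. However, `bit_to_flip = random.randint(0, 31)` samples all 32 positions uniformly, so only 8 of 32 flips (25%) land in the exponent field on average. The following is a minimal sketch of restricting the draw to one field, assuming the standard IEEE-754 float32 layout. `FIELD_BITS` and `pick_bit` are hypothetical names.

```python
import random

# IEEE-754 single precision: bit 31 sign, bits 23-30 exponent, bits 0-22 mantissa.
FIELD_BITS = {
    "sign": [31],
    "exponent": list(range(23, 31)),
    "mantissa": list(range(0, 23)),
    "any": list(range(32)),
}


def pick_bit(field: str = "any", rng=random) -> int:
    """Choose a bit position uniformly from the requested float32 field.

    `rng` can be the `random` module itself or a seeded `random.Random` instance.
    """
    if field not in FIELD_BITS:
        raise ValueError(f"unknown field {field!r}; expected one of {sorted(FIELD_BITS)}")
    return rng.choice(FIELD_BITS[field])
```

Replacing `bit_to_flip = random.randint(0, 31)` with `bit_to_flip = pick_bit("exponent")` would confine the injected faults to the exponent field. `pick_bit("any")` keeps the current uniform behaviour.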


In [None]:
import os
import json
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

# Relies on df_test, id2label and flip_random_bits_in_query_weight from earlier cells.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_300_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 300  # ✅ define your flip count here

# Bit flips only touch model weights, so the tokenizer can be loaded once.
tokenizer = AutoTokenizer.from_pretrained(model_path)

for layer_idx in range(12):  # one run per encoder layer
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload a clean model so flips from earlier layers do not accumulate
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    model.eval()

    # Step 2: Inject bit flips into this layer's self-attention query weights
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test, one snippet at a time
    preds = []
    with torch.no_grad():
        for code in df_test['code_before'].values:
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            preds.append(logits.argmax(dim=-1).item())

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
    # zero_division=0 scores never-predicted classes as 0 (what the default already does)
    # without emitting an UndefinedMetricWarning for each of them.
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---


              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.65      0.76       179
     CWE-284       0.83      0.31      0.45        16
     CWE-400       0.39      0.53      0.45        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.90      0.31      0.46        29
     CWE-416       0.38      0.81      0.51       108
     CWE-476       0.30      0.43      0.35       110
     CWE-617       0.50      0.28      0.36        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.12      0.21        17

    accuracy                           0.49       620
   macro avg       0.52      0.30      0.32       620
weighted avg       0.57      0.49      0.47       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.30      0.06      0.09       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.06       620
   macro avg       0.03      0.09      0.02       620
weighted avg       0.05      0.06      0.02       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.92      0.74      0.82       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.57      0.33      0.41        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.71      0.52      0.60        29
     CWE-416       0.43      0.66      0.52       108
     CWE-476       0.33      0.65      0.44       110
     CWE-617       0.52      0.40      0.45        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.60      0.18      0.27        17

    accuracy                           0.55       620
   macro avg       0.50      0.38      0.41       620
weighted avg       0.56      0.55      0.53       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.50      0.05      0.09        21
     CWE-190       0.89      0.65      0.75       179
     CWE-284       0.91      0.62      0.74        16
     CWE-400       0.65      0.28      0.39        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.50      0.17      0.26        29
     CWE-416       0.43      0.73      0.54       108
     CWE-476       0.32      0.69      0.44       110
     CWE-617       0.67      0.14      0.23        43
      CWE-78       0.67      0.40      0.50        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.35      0.50        17

    accuracy                           0.51       620
   macro avg       0.61      0.35      0.39       620
weighted avg       0.61      0.51      0.50       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.82      0.23      0.36       179
     CWE-284       0.75      0.19      0.30        16
     CWE-400       0.18      0.17      0.18        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.25      0.03      0.06        29
     CWE-416       0.33      0.56      0.42       108
     CWE-476       0.21      0.64      0.31       110
     CWE-617       1.00      0.05      0.09        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.12      0.21        17

    accuracy                           0.30       620
   macro avg       0.38      0.17      0.16       620
weighted avg       0.47      0.30      0.27       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.77      0.37      0.50       179
     CWE-284       1.00      0.25      0.40        16
     CWE-400       0.46      0.15      0.23        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.06      0.34      0.10        29
     CWE-416       0.42      0.37      0.39       108
     CWE-476       0.28      0.56      0.38       110
     CWE-617       0.30      0.07      0.11        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.89      0.47      0.62        17

    accuracy                           0.32       620
   macro avg       0.35      0.22      0.23       620
weighted avg       0.45      0.32      0.33       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.75      0.22      0.34       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.60      0.07      0.13        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.06      0.76      0.10        29
     CWE-416       0.22      0.07      0.11       108
     CWE-476       0.19      0.22      0.20       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.16       620
   macro avg       0.15      0.11      0.07       620
weighted avg       0.33      0.16      0.17       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       1.00      0.01      0.01       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       1.00      0.01      0.02       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.17      0.08      0.01       620
weighted avg       0.47      0.05      0.01       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.75      0.05      0.09       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       1.00      0.07      0.14        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.05      0.93      0.10        29
     CWE-416       0.53      0.19      0.28       108
     CWE-476       0.63      0.15      0.25       110
     CWE-617       0.58      0.16      0.25        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.35      0.50        17

    accuracy                           0.15       620
   macro avg       0.45      0.17      0.16       620
weighted avg       0.59      0.15      0.18       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.05      0.09        21
     CWE-190       0.90      0.80      0.85       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.71      0.38      0.49        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.19      0.79      0.30        29
     CWE-416       0.50      0.46      0.48       108
     CWE-476       0.47      0.59      0.53       110
     CWE-617       0.55      0.53      0.54        43
      CWE-78       0.71      0.50      0.59        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.81      0.76      0.79        17

    accuracy                           0.56       620
   macro avg       0.56      0.46      0.45       620
weighted avg       0.62      0.56      0.56       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.19      0.32        21
     CWE-190       0.80      0.87      0.83       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.71      0.42      0.53        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.68      0.79      0.73        29
     CWE-416       0.57      0.69      0.62       108
     CWE-476       0.51      0.69      0.59       110
     CWE-617       0.68      0.63      0.65        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.76      0.76      0.76        17

    accuracy                           0.66       620
   macro avg       0.66      0.54      0.56       620
weighted avg       0.66      0.66      0.64       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_300_5/la
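In several of these reports (layers 1 and 7) CWE-415 reaches recall 1.00 with precision 0.05, and overall accuracy drops to 0.05 to 0.06. With 29 true CWE-415 samples, that precision implies roughly 29 / 0.05 ≈ 580 of the 620 predictions are CWE-415 (approximate, since the printed precision is rounded): the corrupted model has collapsed onto a single label. A minimal sketch of a check that makes this explicit, assuming the `preds` list and `id2label` mapping built in the inference loop; `prediction_collapse` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def prediction_collapse(preds, id2label=None):
    """Return (most_common_label, fraction_of_predictions) for a list of class ids.

    A fraction close to 1.0 means the perturbed model predicts one class for
    almost every input. `id2label` is optional; if given, the id is mapped to its name.
    """
    if not preds:
        raise ValueError("preds is empty")
    label_id, count = Counter(preds).most_common(1)[0]
    label = id2label[label_id] if id2label is not None else label_id
    return label, count / len(preds)


# Toy usage (synthetic data, not taken from the experiment):
toy_id2label = {0: "CWE-190", 1: "CWE-415"}
toy_preds = [1] * 95 + [0] * 5
print(prediction_collapse(toy_preds, toy_id2label))  # ('CWE-415', 0.95)
```

Calling `prediction_collapse(preds, id2label)` right after the inference step, and storing the result next to the JSON report, would let collapsed runs be flagged with an (arbitrary) threshold instead of being spotted by eye.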



350 bit flips per layer

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=350):
    """
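    Flips `num_bits` random bits in the self-attention query weights of the specified layer in the model.
    Positions are sampled with replacement, so the same weight can be hit more than once;
    each logged tuple holds the value immediately before that particular flip.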

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

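        # Reinterpret the float32 as its raw 32-bit pattern (big-endian in both directions).
        # Bit 31 is the sign, bits 30-23 the exponent and bits 22-0 the mantissa, so an
        # exponent flip can turn a small weight into a huge value, inf or NaN.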
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped
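Whether a single flip is harmless or catastrophic depends on which of the 32 bits is hit. The same `struct` round trip used in `flip_random_bits_in_query_weight` can be applied to one value to see this; `flip_bit` below is a hypothetical illustration helper, not part of the notebook.

```python
import struct


def flip_bit(value, bit):
    """Toggle one bit of a float32 value (0 = lowest mantissa bit, 31 = sign bit)."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    (bits,) = struct.unpack('>I', struct.pack('>f', value))
    (flipped,) = struct.unpack('>f', struct.pack('>I', bits ^ (1 << bit)))
    return flipped


print(flip_bit(1.0, 31))  # sign bit: -1.0
print(flip_bit(1.0, 0))   # lowest mantissa bit: 1 + 2**-23, a negligible change
print(flip_bit(0.5, 30))  # top exponent bit: 2**127, about 1.7e38
print(flip_bit(1.0, 30))  # exponent becomes all ones with a zero mantissa: inf
```

Because XOR is its own inverse, applying `flip_bit` twice with the same `bit` returns the original float32 value.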


In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_350_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 350  # ✅ define your flip count here

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload model
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Step 2: Inject bit flips
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

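    # Step 4: Report & Save
    # zero_division=0 scores classes that are never predicted as 0.0 (the value sklearn
    # uses by default) without emitting an UndefinedMetricWarning for each of them.
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))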

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---




              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.53      0.04      0.08       179
     CWE-284       1.00      0.06      0.12        16
     CWE-400       1.00      0.03      0.05        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.00      0.00      0.00        29
     CWE-416       0.28      0.62      0.38       108
     CWE-476       0.16      0.54      0.25       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.22       620
   macro avg       0.25      0.11      0.07       620
weighted avg       0.32      0.22      0.14       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.87      0.30      0.44       179
     CWE-284       1.00      0.50      0.67        16
     CWE-400       0.73      0.20      0.31        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.07      0.66      0.13        29
     CWE-416       0.39      0.44      0.41       108
     CWE-476       0.36      0.48      0.41       110
     CWE-617       0.88      0.16      0.27        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.80      0.24      0.36        17

    accuracy                           0.33       620
   macro avg       0.51      0.26      0.28       620
weighted avg       0.58      0.33      0.36       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.91      0.72      0.80       179
     CWE-284       0.89      0.50      0.64        16
     CWE-400       0.65      0.33      0.43        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.36      0.59      0.45        29
     CWE-416       0.46      0.69      0.55       108
     CWE-476       0.40      0.65      0.49       110
     CWE-617       0.61      0.58      0.60        43
      CWE-78       1.00      0.50      0.67        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.73      0.65      0.69        17

    accuracy                           0.57       620
   macro avg       0.58      0.45      0.47       620
weighted avg       0.61      0.57      0.56       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.75      0.14      0.24        21
     CWE-190       0.92      0.79      0.85       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.79      0.47      0.59        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.73      0.66      0.69        29
     CWE-416       0.46      0.75      0.57       108
     CWE-476       0.40      0.65      0.49       110
     CWE-617       0.75      0.35      0.48        43
      CWE-78       1.00      0.60      0.75        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.92      0.71      0.80        17

    accuracy                           0.62       620
   macro avg       0.72      0.50      0.54       620
weighted avg       0.68      0.62      0.61       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.91      0.28      0.43       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.19      0.07      0.11        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.13      0.28      0.18        29
     CWE-416       0.28      0.47      0.35       108
     CWE-476       0.17      0.47      0.25       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.27       620
   macro avg       0.22      0.15      0.13       620
weighted avg       0.40      0.27      0.26       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.72      0.51      0.60       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.08      0.38      0.13        29
     CWE-416       0.41      0.34      0.37       108
     CWE-476       0.26      0.59      0.36       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.18      0.29        17

    accuracy                           0.34       620
   macro avg       0.24      0.18      0.17       620
weighted avg       0.38      0.34      0.33       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.82      0.51      0.63       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.50      0.10      0.17        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.73      0.38      0.50        29
     CWE-416       0.41      0.31      0.36       108
     CWE-476       0.24      0.86      0.38       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.47      0.64        17

    accuracy                           0.39       620
   macro avg       0.31      0.22      0.22       620
weighted avg       0.45      0.39      0.36       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.50      0.01      0.01       179
     CWE-284       1.00      0.38      0.55        16
     CWE-400       1.00      0.03      0.05        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      0.97      0.09        29
     CWE-416       0.25      0.02      0.03       108
     CWE-476       0.44      0.04      0.07       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.07       620
   macro avg       0.27      0.12      0.07       620
weighted avg       0.36      0.07      0.04       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.10      0.17        21
     CWE-190       0.73      0.85      0.79       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.50      0.42      0.46        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.81      0.45      0.58        29
     CWE-416       0.56      0.56      0.56       108
     CWE-476       0.39      0.74      0.51       110
     CWE-617       0.86      0.14      0.24        43
      CWE-78       0.60      0.60      0.60        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.92      0.65      0.76        17

    accuracy                           0.59       620
   macro avg       0.69      0.45      0.48       620
weighted avg       0.64      0.59      0.56       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_350_5/la
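Each iteration writes `layer{layer_idx}_xor{number_bits}_report.json` to `results_dir`, so per-layer accuracies can be collected from disk rather than read off the printed tables (which, as shown here, do not include every layer). A minimal sketch, assuming that file naming and the `output_dict=True` format of `classification_report`, which stores overall accuracy under the `"accuracy"` key; `load_layer_accuracies` and `rank_layers` are hypothetical helper names.

```python
import json
import os


def load_layer_accuracies(results_dir, number_bits, layers=range(12)):
    """Read the per-layer JSON reports written by the sweep and return {layer: accuracy}.

    Assumes files named layer{idx}_xor{number_bits}_report.json, each produced by
    classification_report(..., output_dict=True). Layers without a file are skipped.
    """
    accuracies = {}
    for idx in layers:
        path = os.path.join(results_dir, f"layer{idx}_xor{number_bits}_report.json")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            accuracies[idx] = json.load(f)["accuracy"]
    return accuracies


def rank_layers(accuracies):
    """Order layers from most to least sensitive (lowest post-flip accuracy first)."""
    return sorted(accuracies, key=lambda idx: accuracies[idx])


# Usage, assuming results_dir and number_bits from the sweep cell:
# accs = load_layer_accuracies(results_dir, number_bits)
# for idx in rank_layers(accs):
#     print(f"layer {idx}: accuracy {accs[idx]:.2f}")
```

Ranking by accuracy alone hides per-class effects such as the single-class collapse seen in some layers, so the macro-averaged scores in the same JSON files are worth loading the same way.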



400 bit flips per layer

In [None]:
import random
import struct

import torch

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=400):
    """
    Flip `num_bits` random bits in the self-attention query weight matrix of
    encoder layer `layer_index`. Each flip picks a random (row, col) entry and a
    random bit position in 0..31, so sign, exponent and mantissa bits are all
    eligible. Positions are drawn with replacement, so the same weight (or even
    the same bit) can be hit more than once.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    # Work on a detached CPU copy; the parameter is updated once at the end
    weight_data = weight.detach().cpu().numpy().copy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = float(weight_data[row, col])

        # Reinterpret the float32 value as its 32-bit integer pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        # Reinterpret the modified bit pattern back as a float32 value
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    # Write the corrupted matrix back into the model parameter in place
    with torch.no_grad():
        weight.copy_(torch.from_numpy(weight_data))
    return flipped
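The cell above draws `bit_to_flip` uniformly from 0..31, so on average 8 of every 32 flips (25%) land in the exponent field, 23 of 32 in the mantissa and 1 of 32 on the sign bit. The stdlib-only sketch below shows why the field matters. It uses the same `struct` round trip as `flip_random_bits_in_query_weight` to flip a single bit of a float32 value; the helper names `float32_bits`, `flip_bit` and `bit_field` are illustrative and not part of the notebook.

```python
import struct

# Bit positions of each IEEE-754 float32 field (bit 0 = least significant)
SIGN_BIT = 31
EXPONENT_BITS = range(23, 31)   # bits 23..30
MANTISSA_BITS = range(0, 23)    # bits 0..22

def float32_bits(value):
    """Return the float32 bit pattern of `value` as an unsigned 32-bit int."""
    return struct.unpack('>I', struct.pack('>f', value))[0]

def flip_bit(value, bit):
    """Flip one bit of `value` interpreted as float32 and return the new float."""
    flipped = float32_bits(value) ^ (1 << bit)
    return struct.unpack('>f', struct.pack('>I', flipped))[0]

def bit_field(bit):
    """Name the float32 field that bit position `bit` belongs to."""
    if bit == SIGN_BIT:
        return "sign"
    if bit in EXPONENT_BITS:
        return "exponent"
    if bit in MANTISSA_BITS:
        return "mantissa"
    raise ValueError(f"bit must be in 0..31, got {bit}")

if __name__ == "__main__":
    w = 0.5  # example weight value
    for bit in (31, 30, 22, 0):
        print(f"bit {bit:2d} ({bit_field(bit)}): {w} -> {flip_bit(w, bit)}")
```

For a weight of 0.5, flipping the sign bit gives -0.5 and flipping mantissa bit 22 gives 0.75. Flipping exponent bit 30 gives 2**127 (about 1.7e38), which is many orders of magnitude larger than any trained weight. Drawing `bit_to_flip` with `random.choice(EXPONENT_BITS)` would be one way to target exponent bits only. This is a suggestion and not what the cell above does.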


In [None]:
import os
import json

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

# Assumes df_test (columns 'code_before' and 'label') and id2label are defined in earlier cells
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_400_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 400  # ✅ number of bit flips injected per layer

# The tokenizer is not affected by the flips, so load it once
tokenizer = AutoTokenizer.from_pretrained(model_path)

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload a clean model so flips from the previous layer do not accumulate
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)

    # Step 2: Inject bit flips into this layer's query weights
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test, one snippet at a time
    preds = []
    model.eval()
    with torch.no_grad():
        for code in df_test['code_before'].values:
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            preds.append(logits.argmax(dim=-1).item())

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & save. zero_division=0 reports 0.0 for classes that are never
    # predicted (the same values as sklearn's default, without UndefinedMetricWarning)
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    # Log every flip; exponent flips can yield inf/NaN, which json writes as Infinity/NaN
    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")
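Each pass writes one `layer{i}_xor{N}_report.json` per layer, so comparing layers means opening twelve files. Below is a sketch for collecting the headline numbers. It assumes the file naming used in the loop above and the standard keys that `classification_report(..., output_dict=True)` produces (`"accuracy"`, `"macro avg"`, `"weighted avg"`). `summarize_report`, `summarize_reports` and `print_summary` are hypothetical helpers, not part of the notebook.

```python
import json
import os

def summarize_report(report):
    """Return (accuracy, macro-F1, weighted-F1) from a classification_report dict."""
    return (
        report["accuracy"],
        report["macro avg"]["f1-score"],
        report["weighted avg"]["f1-score"],
    )

def summarize_reports(results_dir, number_bits, num_layers=12):
    """Collect per-layer summaries; layers without a saved report are skipped."""
    rows = []
    for layer_idx in range(num_layers):
        path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            report = json.load(f)
        rows.append((layer_idx, *summarize_report(report)))
    return rows

def print_summary(rows):
    """Print one line per layer, lowest accuracy (most damaged layer) first."""
    for layer_idx, acc, macro_f1, weighted_f1 in sorted(rows, key=lambda r: r[1]):
        print(f"layer {layer_idx:2d}  acc={acc:.2f}  "
              f"macro-F1={macro_f1:.2f}  weighted-F1={weighted_f1:.2f}")
```

In the notebook this could be called as `print_summary(summarize_reports(results_dir, number_bits))` after the loop finishes.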



--- Flipping Layer 0 ---




              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.87      0.34      0.48       179
     CWE-284       0.86      0.38      0.52        16
     CWE-400       0.46      0.15      0.23        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.29      0.07      0.11        29
     CWE-416       0.28      0.68      0.40       108
     CWE-476       0.23      0.56      0.33       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.34       620
   macro avg       0.33      0.19      0.18       620
weighted avg       0.43      0.34      0.30       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.89      0.63      0.73       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.67      0.20      0.31        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.35      0.45      0.39        29
     CWE-416       0.35      0.73      0.48       108
     CWE-476       0.36      0.52      0.42       110
     CWE-617       0.59      0.44      0.51        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.71      0.77        17

    accuracy                           0.50       620
   macro avg       0.48      0.37      0.38       620
weighted avg       0.56      0.50      0.49       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.98      0.22      0.36       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.71      0.25      0.37        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      0.52      0.09        29
     CWE-416       0.44      0.48      0.46       108
     CWE-476       0.29      0.40      0.34       110
     CWE-617       1.00      0.02      0.05        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.83      0.29      0.43        17

    accuracy                           0.27       620
   macro avg       0.36      0.18      0.18       620
weighted avg       0.55      0.27      0.29       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.88      0.43      0.58       179
     CWE-284       1.00      0.25      0.40        16
     CWE-400       0.05      0.03      0.03        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       1.00      0.07      0.13        29
     CWE-416       0.39      0.48      0.43       108
     CWE-476       0.23      0.77      0.36       110
     CWE-617       1.00      0.02      0.05        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.36       620
   macro avg       0.46      0.18      0.17       620
weighted avg       0.53      0.36      0.33       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.72      0.54      0.62       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.24      0.17      0.20        40
     CWE-401       0.80      0.17      0.28        24
     CWE-415       0.00      0.00      0.00        29
     CWE-416       0.26      0.23      0.25       108
     CWE-476       0.24      0.72      0.36       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.80      0.24      0.36        17

    accuracy                           0.35       620
   macro avg       0.26      0.17      0.17       620
weighted avg       0.37      0.35      0.32       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.85      0.28      0.43       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.57      0.10      0.17        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.06      0.97      0.11        29
     CWE-416       0.50      0.06      0.10       108
     CWE-476       0.43      0.14      0.21       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.50      0.06      0.11        17

    accuracy                           0.18       620
   macro avg       0.33      0.15      0.12       620
weighted avg       0.50      0.18      0.21       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.10      0.17        21
     CWE-190       0.87      0.70      0.77       179
     CWE-284       0.90      0.56      0.69        16
     CWE-400       0.67      0.25      0.36        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.11      0.72      0.19        29
     CWE-416       0.53      0.31      0.39       108
     CWE-476       0.36      0.44      0.39       110
     CWE-617       0.59      0.56      0.57        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.65      0.79        17

    accuracy                           0.46       620
   macro avg       0.58      0.39      0.41       620
weighted avg       0.60      0.46      0.49       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       1.00      0.01      0.02       179
     CWE-284       1.00      0.19      0.32        16
     CWE-400       0.33      0.03      0.05        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.33      0.01      0.02       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.06       620
   macro avg       0.31      0.11      0.05       620
weighted avg       0.42      0.06      0.03       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.71      0.68      0.69       179
     CWE-284       0.89      0.50      0.64        16
     CWE-400       0.57      0.42      0.49        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.18      0.66      0.28        29
     CWE-416       0.53      0.45      0.49       108
     CWE-476       0.38      0.61      0.47       110
     CWE-617       0.71      0.23      0.35        43
      CWE-78       0.71      0.50      0.59        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.59      0.74        17

    accuracy                           0.50       620
   macro avg       0.56      0.40      0.42       620
weighted avg       0.56      0.50      0.49       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_400_5/la



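450 flips: the same per-layer experiment with 450 random bit flips injected into the self-attention query weights of one encoder layer at a time.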

In [None]:
import random
import struct

import torch

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=450):
    """
    Flip `num_bits` random bits in the self-attention query weight matrix of
    encoder layer `layer_index`. Each flip picks a random (row, col) entry and a
    random bit position in 0..31, so sign, exponent and mantissa bits are all
    eligible. Positions are drawn with replacement, so the same weight (or even
    the same bit) can be hit more than once.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    # Work on a detached CPU copy; the parameter is updated once at the end
    weight_data = weight.detach().cpu().numpy().copy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = float(weight_data[row, col])

        # Reinterpret the float32 value as its 32-bit integer pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        # Reinterpret the modified bit pattern back as a float32 value
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    # Write the corrupted matrix back into the model parameter in place
    with torch.no_grad():
        weight.copy_(torch.from_numpy(weight_data))
    return flipped
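`flip_random_bits_in_query_weight` draws every (row, col, bit) independently. Two flips can therefore land on the same weight, in which case the second one logs an already corrupted `original_val`. They can even land on the same bit, which silently undoes the first flip. The function also relies on the global `random` state, so rerunning a cell hits different bits. This matters here because the per-layer accuracies do not track the flip count: layer 6 falls to 0.05 accuracy with 400 flips but stays at 0.44 with 450. Below is a sketch of drawing distinct bit positions from a seeded generator. `sample_flip_positions` and its `seed` parameter are hypothetical additions, not part of the original cell.

```python
import random

def sample_flip_positions(num_rows, num_cols, num_bits, seed=None):
    """Draw `num_bits` distinct (row, col, bit) triples for a float32 matrix.

    Sampling without replacement over all num_rows * num_cols * 32 bits means
    no bit is flipped twice; several flips may still hit the same weight.
    """
    total_bits = num_rows * num_cols * 32
    if num_bits > total_bits:
        raise ValueError("num_bits exceeds the number of bits in the matrix")
    rng = random.Random(seed)  # private, seedable generator for repeatable runs
    positions = []
    for flat in rng.sample(range(total_bits), num_bits):
        element, bit = divmod(flat, 32)      # which weight, which bit inside it
        row, col = divmod(element, num_cols)  # row-major position of the weight
        positions.append((row, col, bit))
    return positions
```

The returned triples could replace the two `random.randint` calls and the `bit_to_flip` draw inside the flip loop. Recording the seed next to each results folder would make a given run reproducible.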


In [None]:
import os
import json

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

# Assumes df_test (columns 'code_before' and 'label') and id2label are defined in earlier cells
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_450_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 450  # ✅ number of bit flips injected per layer

# The tokenizer is not affected by the flips, so load it once
tokenizer = AutoTokenizer.from_pretrained(model_path)

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload a clean model so flips from the previous layer do not accumulate
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)

    # Step 2: Inject bit flips into this layer's query weights
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test, one snippet at a time
    preds = []
    model.eval()
    with torch.no_grad():
        for code in df_test['code_before'].values:
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            preds.append(logits.argmax(dim=-1).item())

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & save. zero_division=0 reports 0.0 for classes that are never
    # predicted (the same values as sklearn's default, without UndefinedMetricWarning)
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    # Log every flip; exponent flips can yield inf/NaN, which json writes as Infinity/NaN
    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")
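Several runs degenerate into predicting almost a single class. With 400 flips, layer 6 reports recall 1.00 and precision 0.05 for CWE-415 and 0.00 for every other class, meaning every test snippet is labelled CWE-415. The 450-flip reports for layers 4 and 8 (as labelled in the output) show nearly the same pattern. This kind of collapse is easy to miss in a twelve-row table, so a small check on `y_pred` could flag it directly. `dominant_class_share`, `has_collapsed` and the 0.9 threshold are illustrative choices, not part of the notebook.

```python
from collections import Counter

def dominant_class_share(y_pred):
    """Return (label, share) for the most frequently predicted label."""
    if not y_pred:
        raise ValueError("y_pred is empty")
    label, count = Counter(y_pred).most_common(1)[0]
    return label, count / len(y_pred)

def has_collapsed(y_pred, threshold=0.9):
    """Flag a run in which one class absorbs at least `threshold` of all predictions."""
    return dominant_class_share(y_pred)[1] >= threshold
```

Inside the loop, `dominant_class_share(y_pred)` could be computed after Step 3 and saved alongside each report. That would make collapsed layers easy to filter when comparing flip counts.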



--- Flipping Layer 0 ---




              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.51      0.12      0.19       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.06      0.03      0.04        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.00      0.00      0.00        29
     CWE-416       0.33      0.04      0.07       108
     CWE-476       0.17      0.86      0.29       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.20       620
   macro avg       0.09      0.09      0.05       620
weighted avg       0.24      0.20      0.12       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.25      0.01      0.01       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       1.00      0.07      0.14        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      0.93      0.09        29
     CWE-416       0.23      0.03      0.05       108
     CWE-476       0.18      0.08      0.11       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.07       620
   macro avg       0.14      0.09      0.03       620
weighted avg       0.21      0.07      0.05       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.93      0.62      0.74       179
     CWE-284       1.00      0.56      0.72        16
     CWE-400       0.74      0.35      0.47        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.92      0.41      0.57        29
     CWE-416       0.34      0.74      0.47       108
     CWE-476       0.28      0.51      0.36       110
     CWE-617       0.78      0.16      0.27        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.82      0.53      0.64        17

    accuracy                           0.49       620
   macro avg       0.57      0.34      0.38       620
weighted avg       0.61      0.49      0.48       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.88      0.17      0.28       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.27      0.10      0.15        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.19      0.55      0.29        29
     CWE-416       0.41      0.54      0.47       108
     CWE-476       0.20      0.64      0.31       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.29       620
   macro avg       0.25      0.17      0.13       620
weighted avg       0.42      0.29      0.24       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       1.00      0.04      0.07       108
     CWE-476       0.67      0.02      0.04       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.06       620
   macro avg       0.14      0.09      0.02       620
weighted avg       0.29      0.06      0.02       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.89      0.64      0.75       179
     CWE-284       1.00      0.31      0.48        16
     CWE-400       0.50      0.17      0.26        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.50      0.17      0.26        29
     CWE-416       0.40      0.33      0.36       108
     CWE-476       0.24      0.79      0.37       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.29      0.45        17

    accuracy                           0.43       620
   macro avg       0.46      0.24      0.27       620
weighted avg       0.52      0.43      0.41       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.94      0.50      0.65       179
     CWE-284       1.00      0.50      0.67        16
     CWE-400       0.75      0.07      0.14        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.58      0.24      0.34        29
     CWE-416       0.43      0.56      0.48       108
     CWE-476       0.25      0.75      0.37       110
     CWE-617       0.80      0.19      0.30        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.92      0.71      0.80        17

    accuracy                           0.44       620
   macro avg       0.56      0.31      0.34       620
weighted avg       0.61      0.44      0.44       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.75      0.02      0.03       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.25      0.02      0.03       108
     CWE-476       0.60      0.03      0.05       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.06       620
   macro avg       0.14      0.09      0.02       620
weighted avg       0.37      0.06      0.03       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       1.00      0.13      0.23       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.50      0.03      0.05        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      0.97      0.09        29
     CWE-416       0.38      0.03      0.05       108
     CWE-476       0.21      0.03      0.05       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.09       620
   macro avg       0.18      0.10      0.04       620
weighted avg       0.43      0.09      0.09       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.10      0.17        21
     CWE-190       0.76      0.87      0.81       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.56      0.47      0.51        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.25      0.69      0.37        29
     CWE-416       0.50      0.53      0.51       108
     CWE-476       0.46      0.52      0.49       110
     CWE-617       0.48      0.23      0.31        43
      CWE-78       0.71      0.50      0.59        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.86      0.71      0.77        17

    accuracy                           0.57       620
   macro avg       0.60      0.46      0.47       620
weighted avg       0.59      0.57      0.55       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_450_5/la



Repeat the experiment with 500 random bit flips per layer:

In [None]:
import torch
import random
import struct

def flip_random_bits_in_query_weight(model, layer_index=0, num_bits=500):
    """
    Flips `num_bits` random bits in the self-attention query weights of the specified layer in the model.

    Returns:
        List of (row, col, original_val, flipped_val) tuples.
    """
    flipped = []

    weight = model.roberta.encoder.layer[layer_index].attention.self.query.weight
    weight_data = weight.data.cpu().numpy()

    num_rows, num_cols = weight_data.shape

    for _ in range(num_bits):
        row = random.randint(0, num_rows - 1)
        col = random.randint(0, num_cols - 1)
        original_val = weight_data[row, col]

        # Convert float to int bit pattern
        int_bits = struct.unpack('>I', struct.pack('>f', original_val))[0]
        bit_to_flip = random.randint(0, 31)
        flipped_bits = int_bits ^ (1 << bit_to_flip)
        flipped_val = struct.unpack('>f', struct.pack('>I', flipped_bits))[0]

        weight_data[row, col] = flipped_val
        flipped.append((row, col, original_val, flipped_val))

    weight.data = torch.tensor(weight_data, dtype=weight.dtype, device=weight.device)
    return flipped


In [None]:
import os
import json
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from sklearn.metrics import classification_report

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may"
results_dir = os.path.join(model_path, "bitflip_results_500_5")
os.makedirs(results_dir, exist_ok=True)

number_bits = 500  # ✅ define your flip count here

# The tokenizer is not affected by weight flips, so load it once outside the loop
tokenizer = AutoTokenizer.from_pretrained(model_path)

for layer_idx in range(12):
    print(f"\n--- Flipping Layer {layer_idx} ---")

    # Step 1: Reload a clean model so flips from earlier layers don't accumulate
    model = AutoModelForSequenceClassification.from_pretrained(model_path).to(device)

    # Step 2: Inject bit flips into this layer's query weights
    flipped = flip_random_bits_in_query_weight(model, layer_index=layer_idx, num_bits=number_bits)

    # Step 3: Inference on df_test
    preds = []
    model.eval()
    for code in df_test['code_before'].values:
        with torch.no_grad():
            inputs = tokenizer(code, return_tensors="pt", truncation=True).to(device)
            logits = model(**inputs).logits
            predicted_class = logits.argmax().item()
            preds.append(predicted_class)

    y_true = [id2label[i] for i in df_test['label'].values]
    y_pred = [id2label[i] for i in preds]

    # Step 4: Report & Save
    # zero_division=0 reports 0.0 for classes that are never predicted (the same value as the
    # default) without emitting an UndefinedMetricWarning for each of them
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    print(classification_report(y_true, y_pred, zero_division=0))

    save_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_report.json")
    flip_log_path = os.path.join(results_dir, f"layer{layer_idx}_xor{number_bits}_flips.json")

    with open(flip_log_path, "w") as f:
        json.dump([{
            "row": int(row),
            "col": int(col),
            "original": float(orig),
            "flipped": float(flipped_val)
        } for (row, col, orig, flipped_val) in flipped], f, indent=4)

    with open(save_path, "w") as f:
        json.dump(report, f, indent=4)

    print(f"✅ Saved report for layer {layer_idx} to {save_path}")



--- Flipping Layer 0 ---




              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 0 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 1 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.88      0.55      0.68       179
     CWE-284       0.92      0.69      0.79        16
     CWE-400       0.55      0.42      0.48        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.13      0.52      0.21        29
     CWE-416       0.42      0.60      0.49       108
     CWE-476       0.32      0.50      0.39       110
     CWE-617       0.56      0.12      0.19        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.35      0.48        17

    accuracy                           0.45       620
   macro avg       0.46      0.33      0.33       620
weighted avg       0.55      0.45      0.45       620

✅ Saved report for layer 2 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       0.00      0.00      0.00       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.05       620
   macro avg       0.00      0.08      0.01       620
weighted avg       0.00      0.05      0.00       620

✅ Saved report for layer 3 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.92      0.59      0.72       179
     CWE-284       1.00      0.44      0.61        16
     CWE-400       0.50      0.17      0.26        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.50      0.21      0.29        29
     CWE-416       0.48      0.73      0.58       108
     CWE-476       0.29      0.74      0.41       110
     CWE-617       0.83      0.12      0.20        43
      CWE-78       1.00      0.40      0.57        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.41      0.58        17

    accuracy                           0.49       620
   macro avg       0.63      0.33      0.38       620
weighted avg       0.62      0.49      0.48       620

✅ Saved report for layer 4 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       1.00      0.14      0.25        21
     CWE-190       0.78      0.77      0.78       179
     CWE-284       1.00      0.38      0.55        16
     CWE-400       0.42      0.20      0.27        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.86      0.21      0.33        29
     CWE-416       0.52      0.31      0.39       108
     CWE-476       0.24      0.75      0.37       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.06      0.11        17

    accuracy                           0.46       620
   macro avg       0.57      0.25      0.28       620
weighted avg       0.55      0.46      0.43       620

✅ Saved report for layer 5 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.79      0.06      0.11       179
     CWE-284       1.00      0.19      0.32        16
     CWE-400       0.62      0.12      0.21        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      0.62      0.09        29
     CWE-416       0.48      0.30      0.37       108
     CWE-476       0.27      0.35      0.30       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.50      0.06      0.11        17

    accuracy                           0.18       620
   macro avg       0.31      0.14      0.13       620
weighted avg       0.44      0.18      0.18       620

✅ Saved report for layer 6 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.00      0.00      0.00       179
     CWE-284       0.00      0.00      0.00        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       0.00      0.00      0.00        24
     CWE-415       0.05      1.00      0.09        29
     CWE-416       0.00      0.00      0.00       108
     CWE-476       1.00      0.06      0.12       110
     CWE-617       0.00      0.00      0.00        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.00      0.00      0.00        17

    accuracy                           0.06       620
   macro avg       0.09      0.09      0.02       620
weighted avg       0.18      0.06      0.03       620

✅ Saved report for layer 7 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.91      0.28      0.43       179
     CWE-284       1.00      0.44      0.61        16
     CWE-400       0.73      0.20      0.31        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.06      0.93      0.11        29
     CWE-416       0.68      0.16      0.26       108
     CWE-476       0.36      0.15      0.21       110
     CWE-617       1.00      0.19      0.31        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       0.75      0.18      0.29        17

    accuracy                           0.23       620
   macro avg       0.54      0.22      0.23       620
weighted avg       0.65      0.23      0.29       620

✅ Saved report for layer 8 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.25      0.39       179
     CWE-284       1.00      0.38      0.55        16
     CWE-400       0.00      0.00      0.00        40
     CWE-401       1.00      0.17      0.29        24
     CWE-415       0.06      1.00      0.11        29
     CWE-416       0.46      0.10      0.17       108
     CWE-476       0.50      0.10      0.17       110
     CWE-617       1.00      0.09      0.17        43
      CWE-78       0.00      0.00      0.00        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.18      0.30        17

    accuracy                           0.18       620
   macro avg       0.49      0.19      0.18       620
weighted avg       0.59      0.18      0.22       620

✅ Saved report for layer 9 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/lay



              precision    recall  f1-score   support

     CWE-122       0.00      0.00      0.00        21
     CWE-190       0.90      0.74      0.82       179
     CWE-284       1.00      0.50      0.67        16
     CWE-400       0.57      0.42      0.49        40
     CWE-401       0.67      0.17      0.27        24
     CWE-415       0.14      0.72      0.24        29
     CWE-416       0.53      0.32      0.40       108
     CWE-476       0.34      0.56      0.42       110
     CWE-617       0.69      0.26      0.37        43
      CWE-78       0.56      0.50      0.53        10
     CWE-835       0.00      0.00      0.00        23
     CWE-843       1.00      0.41      0.58        17

    accuracy                           0.49       620
   macro avg       0.53      0.38      0.40       620
weighted avg       0.59      0.49      0.50       620

✅ Saved report for layer 10 to /content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance/baseline_model_may/bitflip_results_500_5/la



In [None]:
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance"
model = AutoModelForSequenceClassification.from_pretrained(model_path).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_path)

In [None]:
model.roberta.encoder.layer[0].attention.self.query.weight

Parameter containing:
tensor([[ 7.8795e-02,  1.1391e-02, -2.1290e-03,  ...,  8.4106e-03,
          7.7333e-02, -3.6564e-02],
        [-1.2574e-02,  1.1640e-01,  2.8115e-02,  ...,  2.5962e-02,
          8.9647e-02,  1.2310e-01],
        [ 8.5845e-02, -1.2937e-04, -1.2328e-02,  ..., -4.0053e-02,
         -2.7405e-02,  1.2359e-01],
        ...,
        [-1.1078e-01, -7.6444e-03, -3.4692e-02,  ...,  1.0139e-02,
          1.5933e-02, -2.0781e-02],
        [-1.7847e-01,  2.9770e-02,  5.6368e-02,  ...,  5.9899e-02,
         -1.6478e-01, -1.4955e-02],
        [-8.4729e-02, -7.6936e-02,  1.0213e-01,  ..., -1.4604e-01,
         -3.1576e-02, -7.7963e-02]], device='cuda:0', requires_grad=True)

In [None]:
import struct
import random

def flip_bit_with_xor(f, bit_index):
    """Flip a specific bit in a float32 using XOR."""
    i = struct.unpack('>I', struct.pack('>f', f))[0]
    i ^= (1 << bit_index)
    return struct.unpack('>f', struct.pack('>I', i))[0]

def flip_random_bit_xor(f):
    """Flip a random bit (0–31) in a float32."""
    bit_index = random.randint(0, 31)
    return flip_bit_with_xor(f, bit_index)


In [None]:
def inject_xor_bit_flips(model, layer_index=0, param_name='query', num_flips=5):
    param = getattr(model.roberta.encoder.layer[layer_index].attention.self, param_name).weight
    flipped_details = []

    with torch.no_grad():
        for _ in range(num_flips):
            row = random.randint(0, param.shape[0] - 1)
            col = random.randint(0, param.shape[1] - 1)
            original_val = param[row, col].item()
            flipped_val = flip_random_bit_xor(original_val)
            param[row, col] = torch.tensor(flipped_val, device=param.device)
            flipped_details.append((row, col, original_val, flipped_val))

    return flipped_details


In [None]:
import json

import torch
from sklearn.metrics import classification_report

def evaluate_and_save_report(model, tokenizer, df_test, save_path):
    """Run inference on df_test, print a classification report and save it as JSON."""
    model.eval()
    y_true = list(df_test['label'])
    y_pred = []

    for code in df_test['code_before']:
        inputs = tokenizer(code, return_tensors='pt', truncation=True, padding=True).to(model.device)
        with torch.no_grad():
            logits = model(**inputs).logits
            pred = torch.argmax(logits).item()
            y_pred.append(pred)

    # zero_division=0 avoids one UndefinedMetricWarning per class that is never predicted
    report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
    report_str = classification_report(y_true, y_pred, digits=2, zero_division=0)

    # Print to console
    print("\n📊 Classification Report:\n")
    print(report_str)

    # Save to JSON file
    with open(save_path, 'w') as f:
        json.dump(report, f, indent=2)

    print(f"\n✅ Saved classification report to {save_path}")


In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Step A: Load baseline model
model_path = "/content/drive/MyDrive/Colab_Notebooks/MSc_Fault_Tolerance"
model = AutoModelForSequenceClassification.from_pretrained(model_path).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Step B: Inject 5 XOR bit flips in Layer 0 query
flipped = inject_xor_bit_flips(model, layer_index=0, param_name='query', num_flips=5)
print("🔧 Flipped weights (row, col, original, flipped):")
for f in flipped:
    print(f)

# Step C: Save compromised model
flipped_model_path = f"{model_path}/flipped_layer0_query_xor5"
model.save_pretrained(flipped_model_path)
tokenizer.save_pretrained(flipped_model_path)

# Step D: Evaluate and save report
report_path = f"{model_path}/report_layer0_query_xor5.json"
evaluate_and_save_report(model, tokenizer, df_test, report_path)

🔧 Flipped weights (row, col, original, flipped):
(344, 759, 0.04989694431424141, 0.04989682510495186)
(56, 739, -0.08610805869102478, -0.07829555869102478)
(730, 589, -0.1887557953596115, -0.1887252777814865)
(694, 686, -0.05217772349715233, -0.00020381923241075128)
(734, 137, 0.1133853867650032, 0.0977603867650032)


NameError: name 'df_test' is not defined

# **Bit Search**

**The Method:**

Progressive Bit Search (PBS) ranks bits by how sensitive the network’s loss is to each one:

1. Select a small batch of data from the test set (128 samples were planned; the cells below sample 32).
2. Enable gradient tracking on the final layer’s weights.
3. Forward pass to compute the classification loss (cross-entropy).
4. Backward pass to get the gradients with respect to the final layer’s weights (dLoss/dWeight).

5. Convert those weight gradients to “bit gradients” and rank them based on magnitude.
6. Flip the top N most critical bits (e.g. the sign bits of the highest-ranked weights).
7. Update the final layer’s weights with those flipped bits.
8. Evaluate on the full test set to measure the new performance.
9. Repeat until the model’s performance collapses.
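The nine steps above can be sketched end to end on a toy problem. This is a minimal NumPy illustration, not the notebook's GraphCodeBERT pipeline: a random linear softmax head `W` stands in for the final layer, the helper names (`loss_and_grad`, `bit_search_attack`, etc.) are hypothetical, and the gradient is derived analytically instead of with PyTorch autograd. Following the steps literally, it ranks weights by |dLoss/dW| and flips one chosen bit (the sign bit by default) in each of the top N. Ranking by magnitude alone does not guarantee that a given flip raises the loss, and flipping exponent bits can create `inf`/`NaN` weights, which this sketch does not guard against.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, y):
    """Mean cross-entropy of a linear softmax head (logits = X @ W) and dLoss/dW."""
    n = X.shape[0]
    P = softmax(X @ W)
    loss = float(-np.log(P[np.arange(n), y] + 1e-12).mean())
    P[np.arange(n), y] -= 1.0                        # dLoss/dlogits = P - onehot(y)
    return loss, X.T @ P / n

def accuracy(W, X, y):
    return float((np.argmax(X @ W, axis=1) == y).mean())

def bit_search_attack(W, X, y, n_flips=5, bit=31, max_iters=20, floor=None):
    """Toy version of steps 3-9: flip `bit` in the n_flips weights with the largest |dLoss/dW|."""
    W = W.astype(np.float32)                         # work on a float32 copy of the weights
    floor = 1.0 / W.shape[1] if floor is None else floor  # default stop: chance-level accuracy
    history = [(0, accuracy(W, X, y))]
    for step in range(1, max_iters + 1):
        _, grad = loss_and_grad(W, X, y)             # forward + backward pass
        top = np.argsort(-np.abs(grad), axis=None)[:n_flips]  # flat indices ranked by |gradient|
        bits = W.reshape(-1).view(np.uint32)         # bit-level view sharing memory with W
        bits[top] ^= np.uint32(1 << bit)             # flip the chosen bit in place
        history.append((step, accuracy(W, X, y)))    # re-evaluate on the data
        if history[-1][1] <= floor:                  # stop once performance has collapsed
            break
    return W, history

# Toy data whose labels are produced by W_true, so the clean head starts near 100% accuracy
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))
W_true = rng.normal(size=(16, 4))
y = np.argmax(X @ W_true, axis=1)

W_attacked, hist = bit_search_attack(W_true, X, y, n_flips=5)
for step, acc in hist:
    print(f"step {step}: accuracy {acc:.2f}")
```

Because only sign bits are flipped by default, `|W|` is unchanged; passing `bit=30` would flip the most significant exponent bit instead.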

PBS assumes that flipping bits in the weights with the largest gradient magnitudes yields the biggest increase in loss, so these bits are the most “damaging” ones to flip.
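One way to make this assumption concrete is a first-order Taylor estimate: flipping bit b changes a weight from w to w_b, so the loss changes by roughly (dLoss/dw) · (w_b − w). The sketch below uses hypothetical helpers and the standard `struct` module for the float32 bit manipulation, and computes that estimate for all 32 bits of a single weight. It is an illustration under the linear approximation, not the exact ranking rule from the PBS paper.

```python
import struct

def flip_bit(value, bit):
    """Return float32 `value` with bit `bit` flipped (0-22 mantissa, 23-30 exponent, 31 sign)."""
    (as_int,) = struct.unpack('>I', struct.pack('>f', value))
    (flipped,) = struct.unpack('>f', struct.pack('>I', as_int ^ (1 << bit)))
    return flipped

def estimated_loss_change(weight, grad):
    """First-order estimate dL ~= dL/dw * (w_flipped - w) for each of the 32 bits."""
    (w32,) = struct.unpack('>f', struct.pack('>f', weight))  # round to float32 first
    return [grad * (flip_bit(w32, b) - w32) for b in range(32)]

deltas = estimated_loss_change(0.5, grad=1.0)
worst_bit = max(range(32), key=lambda b: deltas[b])
print(worst_bit, deltas[31])  # worst_bit == 30; deltas[31] == -1.0
```

For w = 0.5 with a positive gradient, flipping the exponent MSB (bit 30) turns the weight into 2^127, so its estimated loss change dwarfs every other bit, while flipping the sign bit actually lowers the estimate. The direction of the change, not just |dLoss/dw|, decides whether a flip is damaging.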

* Forward pass: shows how the model transforms inputs into predictions.
* Backward pass: shows how each parameter influenced the final loss, which is crucial for:
    1. Training (optimising weights).
    2. Fault injection: finding which bits in the weights will increase the loss the most if flipped.

The gradient dLoss/dWeight shows how strongly a weight influences the loss: a weight with a large absolute gradient is considered more important to inference than one with a small gradient.
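To check that dLoss/dWeight really measures sensitivity, the analytic gradient of a cross-entropy loss (what the backward pass computes for a linear layer) can be compared against a finite-difference estimate obtained by nudging each weight and re-running the forward pass. The toy shapes and helper names below are hypothetical and chosen only for illustration.

```python
import numpy as np

def ce_loss(W, x, y):
    """Cross-entropy of one example x (shape [d]) under a linear head W (shape [d, c])."""
    z = x @ W
    z = z - z.max()                          # stabilise the log-sum-exp
    return float(np.log(np.exp(z).sum()) - z[y])

def ce_grad(W, x, y):
    """Analytic dLoss/dW, i.e. what a backward pass computes for this layer."""
    z = x @ W
    p = np.exp(z - z.max())
    p /= p.sum()                             # softmax probabilities
    p[y] -= 1.0                              # dLoss/dz = p - onehot(y)
    return np.outer(x, p)                    # dLoss/dW[i, j] = x[i] * dLoss/dz[j]

def finite_diff_grad(W, x, y, eps=1e-6):
    """Perturb each weight slightly and measure how much the loss moves."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        g[idx] = (ce_loss(Wp, x, y) - ce_loss(Wm, x, y)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
x = rng.normal(size=5)
y = 1

analytic = ce_grad(W, x, y)
numeric = finite_diff_grad(W, x, y)
print("max |analytic - numeric|:", np.max(np.abs(analytic - numeric)))

# The weight with the largest |gradient| is the one the loss is most sensitive to
i, j = np.unravel_index(np.argmax(np.abs(analytic)), W.shape)
print("most sensitive weight:", (i, j))
```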


## Step 1: Pick a Small Batch of Data

Sample a random batch of code snippets from `df_test` (128 were planned; the cell below sets `batch_size = 32`), tokenize them, and move them to the appropriate device (CPU or GPU). This small batch is used for the forward-backward pass that computes the gradients in the next step.


In [None]:
import random

batch_size = 32
# Sample batch_size distinct row positions from df_test (the 'code' column holds the snippets)
random_indices = random.sample(range(len(df_test)), batch_size)
subset = df_test.iloc[random_indices]
subset.head()

Unnamed: 0,code,CWE-Type,label
330857,void elst_del(GF_Box *s)\n{\n\tGF_EditListBox ...,CWE400,1
242052,static char *malloc_option_value_string(uint8_...,CWE0,0
218868,ConnectionHandlerImpl::ActiveListenerImplBase:...,CWE400,1
67379,static inline u32 vm_entry_controls_get(struct...,CWE0,0
183837,HttpDownstreamConnection::get_downstream_addr_...,CWE0,0


Tokenize the subset as before:

In [None]:
inputs = tokenizer(
    list(subset['code']),      # Convert the column to a list
    truncation=True,
    padding=True,
    return_tensors='pt'
).to(device)                   # Move tensors to the same device as the model


In [None]:
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
print("Logits shape:", logits.shape)


Logits shape: torch.Size([32, 7])


In [None]:
import torch
import numpy as np

model.to("cpu")
# train() keeps dropout active, so the loss and gradients vary slightly between runs;
# model.eval() would make them deterministic and still allows loss.backward()
model.train()
model.zero_grad()

# `subset` is the random batch sampled above (batch_size rows, columns 'code' and 'label')
codes = subset["code"].tolist()
labels = subset["label"].tolist()

inputs_cpu = tokenizer(
    codes,
    truncation=True,
    padding=True,
    return_tensors="pt"
)

# Convert labels to a torch tensor
labels_tensor = torch.tensor(labels, dtype=torch.long)
inputs_cpu["labels"] = labels_tensor

# Move everything to CPU
for k, v in inputs_cpu.items():
    inputs_cpu[k] = v.to("cpu")

# Forward pass with labels: passing `labels` makes the model also return a classification loss
outputs = model(**inputs_cpu)
loss = outputs.loss
print("Loss before bit flipping (CPU):", loss.item())

# Backward pass: fills .grad for every parameter with requires_grad=True
loss.backward()


Loss before bit flipping (CPU): 0.3840833306312561

Next, I try flipping bits in layers other than the final `out_proj` layer. First, let's inspect the model parameters:

In [None]:
for name, param in model.named_parameters():
    print(name, param.shape)


roberta.embeddings.word_embeddings.weight torch.Size([50265, 768])
roberta.embeddings.position_embeddings.weight torch.Size([514, 768])
roberta.embeddings.token_type_embeddings.weight torch.Size([1, 768])
roberta.embeddings.LayerNorm.weight torch.Size([768])
roberta.embeddings.LayerNorm.bias torch.Size([768])
roberta.encoder.layer.0.attention.self.query.weight torch.Size([768, 768])
roberta.encoder.layer.0.attention.self.query.bias torch.Size([768])
roberta.encoder.layer.0.attention.self.key.weight torch.Size([768, 768])
roberta.encoder.layer.0.attention.self.key.bias torch.Size([768])
roberta.encoder.layer.0.attention.self.value.weight torch.Size([768, 768])
roberta.encoder.layer.0.attention.self.value.bias torch.Size([768])
roberta.encoder.layer.0.attention.output.dense.weight torch.Size([768, 768])
roberta.encoder.layer.0.attention.output.dense.bias torch.Size([768])
roberta.encoder.layer.0.attention.output.LayerNorm.weight torch.Size([768])
roberta.encoder.layer.0.attention.output.

I will go through the encoder layers one at a time, flipping one bit in each of K selected weights per layer, and analyse the resulting degradation in performance.
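Instead of one hand-copied cell per layer, the query weights of every encoder layer could be collected from the parameter names printed above. A minimal sketch, assuming the `roberta.encoder.layer.<i>.attention.self.query.weight` naming shown in that listing; `query_weight_names` is a hypothetical helper, not part of the notebook:

```python
import re

# Matches names such as "roberta.encoder.layer.0.attention.self.query.weight"
QUERY_WEIGHT = re.compile(r"^roberta\.encoder\.layer\.(\d+)\.attention\.self\.query\.weight$")


def query_weight_names(param_names):
    """Return (layer_index, name) pairs for every self-attention query weight, sorted by layer."""
    hits = []
    for name in param_names:
        match = QUERY_WEIGHT.match(name)
        if match:
            # Convert the index to int so that layer 10 sorts after layer 9
            hits.append((int(match.group(1)), name))
    return sorted(hits)
```

In the notebook, `params = dict(model.named_parameters())` followed by `for idx, name in query_weight_names(params): ...` would then visit `params[name]` layer by layer.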

* Starting with the first layer:

Let's flip bits in `roberta.encoder.layer.0.attention.self.query.weight`:
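Which bit is flipped matters a great deal. In an IEEE-754 float32, bit 31 is the sign, bits 23 to 30 hold the exponent, and bits 0 to 22 hold the mantissa, so the `bit_position = 31` used in the cells below flips the sign of each selected weight rather than an exponent bit. A small standard-library sketch, with the hypothetical helper name `flip_bit_f32`, shows the effect of each kind of flip on a single value:

```python
import struct


def flip_bit_f32(value: float, bit: int) -> float:
    """Round `value` to float32, flip one bit of its IEEE-754 encoding, and return the result."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31 for a float32")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))    # float32 -> raw 32-bit pattern
    raw ^= 1 << bit                                           # flip the chosen bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))  # raw pattern -> float32
    return flipped


if __name__ == "__main__":
    roles = [(31, "sign"), (30, "top exponent bit"), (23, "lowest exponent bit"), (22, "top mantissa bit")]
    for bit, role in roles:
        print(f"bit {bit:2d} ({role}): 0.5 -> {flip_bit_f32(0.5, bit)!r}")
```

Flipping bit 31 turns 0.5 into -0.5, while flipping bit 30 turns it into 2^127 (about 1.7e38); for any weight with magnitude in [1, 2) the same bit-30 flip produces inf or NaN.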


In [None]:
layer_grad = model.roberta.encoder.layer[0].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)


Gradient shape: torch.Size([768, 768])
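The flip cells rank weights by the magnitude of their gradient. A first-order Taylor expansion of the loss is one way to reason about that choice for the sign flip (bit 31) the cells actually perform; this is an illustrative derivation, not something the notebook computes. For a set $S$ of flipped weights:

$$
\Delta L \;\approx\; \sum_{i \in S} \frac{\partial L}{\partial w_i}\,\Delta w_i,
\qquad
w_i \mapsto -w_i \;\Rightarrow\; \Delta w_i = -2w_i
\;\Rightarrow\;
\Delta L \;\approx\; -2 \sum_{i \in S} w_i\,\frac{\partial L}{\partial w_i}.
$$

To first order, flipping weight $i$ raises the loss only when $w_i\,\partial L/\partial w_i < 0$, and then by about $2\,|w_i\,\partial L/\partial w_i|$, so the signed product $w_i\,\partial L/\partial w_i$ is a possible alternative ranking to $|\partial L/\partial w_i|$ alone. The approximation assumes small $\Delta w_i$ and breaks down for exponent-bit flips, which can rescale a weight by factors as large as $2^{128}$.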


In [None]:
import numpy as np
import torch

layer_idx = 0
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[0] query.
Bit flipping complete. encoder.layer[0] (query) has been updated.
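The cell above is repeated nearly verbatim for each layer below. One way to factor out the repeated logic is a NumPy helper like the following sketch (the name `flip_top_k_by_grad` is hypothetical). It selects the same top-K weights by |gradient|, flips one bit in each with a single vectorized XOR, and returns a modified copy, leaving the write-back into the model to the caller:

```python
import numpy as np


def flip_top_k_by_grad(weights: np.ndarray, grad: np.ndarray, k: int, bit: int):
    """Flip `bit` in the k weights with the largest |grad|.

    Returns (modified float32 copy of `weights`, flat indices that were flipped).
    The input array is never modified.
    """
    if weights.shape != grad.shape:
        raise ValueError("weights and grad must have the same shape")
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in 0..31 for float32")

    flipped = np.array(weights, dtype=np.float32, order="C", copy=True)
    # Flat indices sorted by descending |gradient|; keep the first k
    top_k = np.argsort(-np.abs(grad), axis=None)[:k]
    # View the same memory as raw 32-bit patterns and XOR the chosen bit in place
    raw = flipped.reshape(-1).view(np.uint32)
    raw[top_k] ^= np.uint32(1 << bit)
    return flipped, top_k
```

In the notebook this might be called as `new_w, idx = flip_top_k_by_grad(query.weight.detach().cpu().numpy(), query.weight.grad.detach().cpu().numpy(), k=50, bit=31)` and written back with `query.weight.data.copy_(torch.from_numpy(new_w))`; `copy_` also moves the data to the GPU if the model lives there. Passing `bit=30` instead would flip the top exponent bit.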


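Evaluate the flipped model the same way as the baseline. Note that the gradient cell above called `model.train()` and nothing here calls `model.eval()`, so dropout is still active during these evaluations: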

In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.91      0.97      0.94      9785
      CWE120       0.88      0.81      0.84      1093
      CWE122       0.85      0.87      0.86       286
      CWE190       0.87      0.78      0.82      2228
      CWE369       0.80      0.76      0.78       705
      CWE400       0.92      0.79      0.85      1906
      CWE502       0.93      0.58      0.72        24

    accuracy                           0.90     16027
   macro avg       0.88      0.79      0.83     16027
weighted avg       0.90      0.90      0.89     16027
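Comparing a dozen full reports by eye is tedious. A possible alternative is to log only accuracy and macro-F1 per run and tabulate them against the baseline. The standard-library sketch below (hypothetical helper `accuracy_and_macro_f1`) computes both from the same `y_true`/`y_pred` lists, taking the label set as the union of true and predicted labels and scoring undefined precision or recall as 0, which matches scikit-learn's `classification_report` defaults:

```python
from collections import Counter


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))  # union of true and predicted labels
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_true = Counter(y_true)
    n_pred = Counter(y_pred)

    f1_scores = []
    for label in labels:
        precision = true_pos[label] / n_pred[label] if n_pred[label] else 0.0
        recall = true_pos[label] / n_true[label] if n_true[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)

    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, sum(f1_scores) / len(labels)
```

For example, `results[layer_idx] = accuracy_and_macro_f1(y_true, y_pred)` after each evaluation cell would build a per-layer table, where `results` is a hypothetical dict. `classification_report(y_true, y_pred, output_dict=True)` is another way to get the same numbers without reimplementing them.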



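Layer 2 (`encoder.layer[1]`). The layer-0 flips are not reverted first, so this and every later report reflect the flips in all layers processed so far: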
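The flip cells never restore the weights they modify. To measure one layer at a time instead, the original weights could be snapshotted before each experiment and restored afterwards. In PyTorch this would typically be `snapshot = {k: v.detach().clone() for k, v in model.state_dict().items()}` followed later by `model.load_state_dict(snapshot)`. The sketch below shows the same pattern on a plain dict of NumPy arrays so it runs standalone; `restored_after` is a hypothetical helper name:

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def restored_after(params):
    """Snapshot every array in `params` and copy the saved values back in place on exit."""
    snapshot = {name: np.array(arr, copy=True) for name, arr in params.items()}
    try:
        yield params
    finally:
        # Restore even if the experiment raised; writing in place keeps existing references valid
        for name, saved in snapshot.items():
            params[name][...] = saved
```

Each per-layer flip-and-evaluate step would then run inside its own `with restored_after(...)` block, so every report reflects a single layer.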

In [None]:
layer_grad = model.roberta.encoder.layer[1].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 1
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[1] query.
Bit flipping complete. encoder.layer[1] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.90      0.96      0.93      9785
      CWE120       0.86      0.82      0.84      1093
      CWE122       0.86      0.87      0.87       286
      CWE190       0.88      0.77      0.82      2228
      CWE369       0.78      0.76      0.77       705
      CWE400       0.91      0.77      0.84      1906
      CWE502       1.00      0.58      0.74        24

    accuracy                           0.89     16027
   macro avg       0.89      0.79      0.83     16027
weighted avg       0.89      0.89      0.89     16027



Layer 3 (`encoder.layer[2]`):

In [None]:
layer_grad = model.roberta.encoder.layer[2].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 2
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[2] query.
Bit flipping complete. encoder.layer[2] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.91      0.96      0.93      9785
      CWE120       0.85      0.82      0.83      1093
      CWE122       0.85      0.86      0.85       286
      CWE190       0.86      0.78      0.82      2228
      CWE369       0.78      0.76      0.77       705
      CWE400       0.90      0.80      0.85      1906
      CWE502       0.92      0.50      0.65        24

    accuracy                           0.89     16027
   macro avg       0.87      0.78      0.81     16027
weighted avg       0.89      0.89      0.89     16027



Layer 4 (`encoder.layer[3]`):

In [None]:
layer_grad = model.roberta.encoder.layer[3].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 3
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[3] query.
Bit flipping complete. encoder.layer[3] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))

              precision    recall  f1-score   support

        CWE0       0.90      0.97      0.93      9785
      CWE120       0.87      0.81      0.84      1093
      CWE122       0.85      0.88      0.86       286
      CWE190       0.88      0.78      0.82      2228
      CWE369       0.80      0.76      0.78       705
      CWE400       0.92      0.78      0.84      1906
      CWE502       1.00      0.67      0.80        24

    accuracy                           0.90     16027
   macro avg       0.89      0.80      0.84     16027
weighted avg       0.89      0.90      0.89     16027



Layer 5 (`encoder.layer[4]`):

In [None]:
layer_grad = model.roberta.encoder.layer[4].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)

In [None]:
import numpy as np
import torch

layer_idx = 4
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.97      0.93      9785
      CWE120       0.88      0.81      0.84      1093
      CWE122       0.86      0.88      0.87       286
      CWE190       0.88      0.79      0.83      2228
      CWE369       0.80      0.76      0.78       705
      CWE400       0.92      0.78      0.84      1906
      CWE502       1.00      0.54      0.70        24

    accuracy                           0.90     16027
   macro avg       0.89      0.79      0.83     16027
weighted avg       0.90      0.90      0.89     16027



Layer 6 (`encoder.layer[5]`):

In [None]:
layer_grad = model.roberta.encoder.layer[5].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 5
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[5] query.
Bit flipping complete. encoder.layer[5] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.95      0.93      9785
      CWE120       0.83      0.82      0.83      1093
      CWE122       0.85      0.86      0.86       286
      CWE190       0.85      0.79      0.82      2228
      CWE369       0.81      0.73      0.77       705
      CWE400       0.89      0.79      0.84      1906
      CWE502       0.88      0.62      0.73        24

    accuracy                           0.89     16027
   macro avg       0.86      0.80      0.82     16027
weighted avg       0.89      0.89      0.89     16027



Layer 7 (`encoder.layer[6]`):

In [None]:
layer_grad = model.roberta.encoder.layer[6].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 6
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[6] query.
Bit flipping complete. encoder.layer[6] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.96      0.93      9785
      CWE120       0.86      0.81      0.83      1093
      CWE122       0.87      0.86      0.87       286
      CWE190       0.85      0.78      0.81      2228
      CWE369       0.81      0.73      0.76       705
      CWE400       0.91      0.79      0.84      1906
      CWE502       1.00      0.58      0.74        24

    accuracy                           0.89     16027
   macro avg       0.88      0.79      0.83     16027
weighted avg       0.89      0.89      0.89     16027



Layer 8 (`encoder.layer[7]`):

In [None]:
layer_grad = model.roberta.encoder.layer[7].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 7
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[7] query.
Bit flipping complete. encoder.layer[7] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.95      0.93      9785
      CWE120       0.83      0.83      0.83      1093
      CWE122       0.87      0.87      0.87       286
      CWE190       0.85      0.79      0.82      2228
      CWE369       0.84      0.70      0.76       705
      CWE400       0.89      0.79      0.84      1906
      CWE502       1.00      0.58      0.74        24

    accuracy                           0.89     16027
   macro avg       0.88      0.79      0.83     16027
weighted avg       0.89      0.89      0.89     16027



Layer 9 (`encoder.layer[8]`):

In [None]:
layer_grad = model.roberta.encoder.layer[8].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 8
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[8] query.
Bit flipping complete. encoder.layer[8] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.94      0.93      9785
      CWE120       0.77      0.85      0.80      1093
      CWE122       0.86      0.86      0.86       286
      CWE190       0.82      0.81      0.81      2228
      CWE369       0.84      0.69      0.76       705
      CWE400       0.90      0.78      0.84      1906
      CWE502       0.88      0.58      0.70        24

    accuracy                           0.88     16027
   macro avg       0.85      0.79      0.81     16027
weighted avg       0.88      0.88      0.88     16027



Layer 10 (`encoder.layer[9]`):

In [None]:
layer_grad = model.roberta.encoder.layer[9].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

layer_idx = 9
query = model.roberta.encoder.layer[layer_idx].attention.self.query

# Retrieve the gradient of this layer's query weight (populated by loss.backward() earlier)
layer_grad = query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Convert the gradient and the weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = query.weight.detach().cpu().numpy()

# Rank all weights by |gradient| in descending order and keep the top K
K = 50
top_k_indices = np.argsort(-np.abs(grad_np.flatten()))[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in encoder.layer[{layer_idx}] query.")

# Reinterpret the float32 weights as raw uint32 bit patterns. Bit 31 is the
# IEEE-754 sign bit (the exponent occupies bits 23-30), so this negates each selected weight.
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # flatten() copies, so the model is unchanged until copy_()

bit_position = 31
mask = 1 << bit_position

for idx in top_k_indices:
    weights_uint_flat[idx] ^= mask

# Convert back to float32 and write the result into the model
modified_weights_fp32 = weights_uint_flat.reshape(weights_uint.shape).view(np.float32)
query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print(f"Bit flipping complete. encoder.layer[{layer_idx}] (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in encoder.layer[9] query.
Bit flipping complete. encoder.layer[9] (query) has been updated.


In [None]:
device = "cuda"
model.to(device)
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

from sklearn.metrics import classification_report
print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.94      0.93      9785
      CWE120       0.77      0.84      0.80      1093
      CWE122       0.86      0.86      0.86       286
      CWE190       0.82      0.82      0.82      2228
      CWE369       0.84      0.68      0.75       705
      CWE400       0.89      0.78      0.83      1906
      CWE502       0.94      0.62      0.75        24

    accuracy                           0.88     16027
   macro avg       0.86      0.79      0.82     16027
weighted avg       0.88      0.88      0.88     16027



Layer 11 (`encoder.layer[10]`):

In [None]:
layer_grad = model.roberta.encoder.layer[10].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

# Retrieve the gradient of layer 11's query projection (encoder.layer[10])
layer_grad = model.roberta.encoder.layer[10].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Copy the gradient and weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = model.roberta.encoder.layer[10].attention.self.query.weight.data.detach().cpu().numpy()

# Rank weights by |gradient|, largest first
grad_flat = grad_np.flatten()

K = 50
sorted_indices = np.argsort(-np.abs(grad_flat))  # descending order by abs(gradient)
top_k_indices = sorted_indices[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in layer 11 query.")

# Reinterpret the float32 bits as uint32; flatten() returns a copy, so the model is unchanged until copy_ below
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()

# Bit 31 is the float32 sign bit (exponent bits are 23-30), so this flips the sign of each selected weight
bit_position = 31
mask = np.uint32(1 << bit_position)

weights_uint_flat[top_k_indices] ^= mask  # argsort indices are unique, so each weight is flipped once

# Convert back to float32 and write the result into the model
modified_weights_uint = weights_uint_flat.reshape(weights_uint.shape)
modified_weights_fp32 = modified_weights_uint.view(np.float32)

model.roberta.encoder.layer[10].attention.self.query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print("Bit flipping complete. Layer 11 (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in layer 11 query.
Bit flipping complete. Layer 11 (query) has been updated.
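To see what a single flip does to one weight, the bit layout can be checked in plain Python. This is a minimal sketch assuming the standard IEEE-754 binary32 layout (bit 31 = sign, bits 30-23 = exponent, bits 22-0 = mantissa); `flip_float32_bit` is a hypothetical helper, not part of the notebook.

```python
import math
import struct


def flip_float32_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = lowest mantissa bit, 31 = sign bit) of `value` stored as float32."""
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    # Pack as little-endian float32, then read the same 4 bytes back as an unsigned int
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    raw ^= 1 << bit  # XOR toggles exactly the chosen bit
    # Reinterpret the modified bit pattern as float32 again
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw))
    return flipped


# Sign bit, top exponent bit, and the exponent bit used in the layer-1 experiment
for bit in (31, 30, 25):
    flipped = flip_float32_bit(0.05, bit)
    print(f"bit {bit}: 0.05 -> {flipped!r} (finite: {math.isfinite(flipped)})")
```

A sign flip (bit 31) preserves each weight's magnitude, which is one plausible reason the layer-11 report barely moves; exponent flips can instead rescale a weight by large powers of two.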


In [None]:
from sklearn.metrics import classification_report

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()  # disable dropout so predictions are deterministic
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.92      0.94      0.93      9785
      CWE120       0.76      0.84      0.80      1093
      CWE122       0.86      0.86      0.86       286
      CWE190       0.82      0.82      0.82      2228
      CWE369       0.83      0.67      0.74       705
      CWE400       0.90      0.78      0.84      1906
      CWE502       0.94      0.62      0.75        24

    accuracy                           0.88     16027
   macro avg       0.86      0.79      0.82     16027
weighted avg       0.88      0.88      0.88     16027
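At two decimal places, differences between the reports are easy to miss. One way to quantify them, sketched under the assumption that each run's report is also kept as a dictionary via `classification_report(y_true, y_pred, output_dict=True)`; `metric_deltas` and the report variable names below are hypothetical.

```python
# Keys in scikit-learn's output_dict that are summaries rather than class labels
SUMMARY_KEYS = {"accuracy", "macro avg", "weighted avg", "micro avg"}


def metric_deltas(baseline: dict, faulty: dict, metric: str = "f1-score", ndigits: int = 4) -> dict:
    """Return faulty minus baseline for `metric` per class label, plus overall accuracy."""
    deltas = {}
    for label, stats in baseline.items():
        if label in SUMMARY_KEYS or label not in faulty:
            continue
        deltas[label] = round(faulty[label][metric] - stats[metric], ndigits)
    # "accuracy" is a plain float in output_dict, not a nested dict
    if "accuracy" in baseline and "accuracy" in faulty:
        deltas["accuracy"] = round(faulty["accuracy"] - baseline["accuracy"], ndigits)
    return deltas
```

For example, `metric_deltas(report_clean, report_layer11)` would return one signed difference per CWE label plus accuracy; negative values mark classes the injected faults hurt. Passing `metric="recall"` is useful for rare classes such as CWE502, where support is small.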



Layer 12 (`encoder.layer[11]`, attention query weights):

In [None]:
layer_grad = model.roberta.encoder.layer[11].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)



Gradient shape: torch.Size([768, 768])


In [None]:
import numpy as np
import torch

# Retrieve the gradient of layer 12's query projection (encoder.layer[11])
layer_grad = model.roberta.encoder.layer[11].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)

# Copy the gradient and weights to CPU NumPy arrays
grad_np = layer_grad.detach().cpu().numpy()
layer_weights_np = model.roberta.encoder.layer[11].attention.self.query.weight.data.detach().cpu().numpy()

# Rank weights by |gradient|, largest first
grad_flat = grad_np.flatten()

K = 50
sorted_indices = np.argsort(-np.abs(grad_flat))  # descending order by abs(gradient)
top_k_indices = sorted_indices[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in layer 12 query.")

# Reinterpret the float32 bits as uint32; flatten() returns a copy, so the model is unchanged until copy_ below
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()

# Bit 31 is the float32 sign bit (exponent bits are 23-30), so this flips the sign of each selected weight
bit_position = 31
mask = np.uint32(1 << bit_position)

weights_uint_flat[top_k_indices] ^= mask  # argsort indices are unique, so each weight is flipped once

# Convert back to float32 and write the result into the model
modified_weights_uint = weights_uint_flat.reshape(weights_uint.shape)
modified_weights_fp32 = modified_weights_uint.view(np.float32)

model.roberta.encoder.layer[11].attention.self.query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print("Bit flipping complete. Layer 12 (query) has been updated.")


Gradient shape: torch.Size([768, 768])
Flipping bits in the top 50 weights by abs(gradient) in layer 12 query.
Bit flipping complete. Layer 12 (query) has been updated.
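The flip cells differ only in the layer index, K, and the bit position. A minimal sketch of the same gradient-ranked flip as one reusable NumPy function; `flip_top_k_by_grad` is a hypothetical name, and it assumes the weights and gradients have already been copied to CPU as float32 arrays of the same shape.

```python
import numpy as np


def flip_top_k_by_grad(weights: np.ndarray, grads: np.ndarray, k: int, bit_position: int):
    """Return (copy of weights with `bit_position` flipped in the k largest-|grad| entries, flat indices)."""
    if weights.dtype != np.float32 or weights.shape != grads.shape:
        raise ValueError("weights must be float32 and have the same shape as grads")
    if not 0 <= bit_position <= 31:
        raise ValueError("bit_position must be in [0, 31]")

    flat_w = np.ascontiguousarray(weights).ravel().copy()  # own copy; the input is never modified
    bits = flat_w.view(np.uint32)  # same memory as flat_w, reinterpreted as raw 32-bit patterns

    # Rank by absolute gradient, largest first; argsort yields unique indices
    top_k = np.argsort(-np.abs(grads).ravel(), kind="stable")[:k]
    bits[top_k] ^= np.uint32(1 << bit_position)  # XOR through the view also updates flat_w

    return flat_w.reshape(weights.shape), top_k
```

Writing the result back would follow the notebook's existing pattern, for example `param.data.copy_(torch.from_numpy(new_weights).to(param.device))` where `param` is the selected query weight. Returning the indices also makes it easy to log exactly which weights were corrupted in each run.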


In [None]:
from sklearn.metrics import classification_report

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()  # disable dropout so predictions are deterministic
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

print(classification_report(y_true, y_pred))


              precision    recall  f1-score   support

        CWE0       0.91      0.94      0.93      9785
      CWE120       0.77      0.84      0.80      1093
      CWE122       0.87      0.87      0.87       286
      CWE190       0.82      0.81      0.82      2228
      CWE369       0.84      0.68      0.75       705
      CWE400       0.90      0.78      0.84      1906
      CWE502       1.00      0.58      0.74        24

    accuracy                           0.88     16027
   macro avg       0.87      0.79      0.82     16027
weighted avg       0.88      0.88      0.88     16027
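Each flip cell edits `model` in place, and nothing shown here reloads it between experiments, so the layer-11 faults would still be present during the layer-12 run, and both during the next run. A minimal sketch for isolating experiments, assuming one extra copy of the weights fits in host memory; `snapshot_weights` and `restore_weights` are hypothetical helpers built on PyTorch's `state_dict` / `load_state_dict`.

```python
import torch


def snapshot_weights(model: torch.nn.Module) -> dict:
    """Copy every state-dict tensor to CPU so later in-place edits cannot touch the copy."""
    return {name: t.detach().cpu().clone() for name, t in model.state_dict().items()}


def restore_weights(model: torch.nn.Module, snapshot: dict) -> None:
    """Write the saved tensors back; load_state_dict copies them onto each parameter's device."""
    model.load_state_dict(snapshot)
```

Taking `clean = snapshot_weights(model)` once after training and calling `restore_weights(model, clean)` before each injection would make every report measure a single fault configuration. `load_state_dict` does not touch `.grad`, so the gradient-based ranking still works after a restore.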



Layer 1 (`encoder.layer[0]`), now flipping the top K = 15 weights at exponent bit 25:

In [None]:
import numpy as np
import torch

# 1. Retrieve the gradient of the first layer (layer[0].attention.self.query.weight)
layer_grad = model.roberta.encoder.layer[0].attention.self.query.weight.grad
print("Gradient shape:", layer_grad.shape)

# 2. Copy the layer's weights & gradient to CPU NumPy arrays
#    (.cpu() is required: the model and its gradients are on CUDA after the inference cells,
#     and .numpy() fails on CUDA tensors)
grad_np = layer_grad.detach().cpu().numpy()  # float32 array of shape (hidden_size, hidden_size)
layer_weights_np = model.roberta.encoder.layer[0].attention.self.query.weight.data.detach().cpu().numpy()

# 3. Rank weights by abs(gradient)
grad_flat = grad_np.flatten()

K = 15  # Number of weights to flip
sorted_indices = np.argsort(-np.abs(grad_flat))  # descending sort by absolute gradient
top_k_indices = sorted_indices[:K]
print(f"Flipping bits in the top {K} weights by abs(gradient) in first layer query.")

# 4. Reinterpret float32 as uint32 and flip exponent bit 25 (float32 exponent bits are 23-30)
weights_uint = layer_weights_np.view(np.uint32)
weights_uint_flat = weights_uint.flatten()  # copy, so the model is unchanged until step 5

bit_position = 25
mask = np.uint32(1 << bit_position)

weights_uint_flat[top_k_indices] ^= mask  # argsort indices are unique, so each weight is flipped once

# 5. Convert back to float32 and update the model on the weight's own device
modified_weights_uint = weights_uint_flat.reshape(weights_uint.shape)
modified_weights_fp32 = modified_weights_uint.view(np.float32)

model.roberta.encoder.layer[0].attention.self.query.weight.data.copy_(
    torch.tensor(modified_weights_fp32, dtype=torch.float32, device=layer_grad.device)
)
print("Bit flipping complete. The first layer (query) has been updated.")
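Unlike the sign flips used for layers 11 and 12, an exponent flip rescales the weight by a fixed power of two. A worked check, assuming a normal (non-subnormal) float32 value with sign $s$, biased exponent $E$, and mantissa $m$:

$$
x = (-1)^{s}\, 2^{E-127}\, (1.m)_2, \qquad E = \sum_{i=0}^{7} b_{23+i}\, 2^{i}
$$

Bit 25 is $b_{23+2}$, so it carries weight $2^{2} = 4$ inside $E$:

$$
E' = E \oplus 4 =
\begin{cases}
E - 4, & b_{25} = 1 \\
E + 4, & b_{25} = 0
\end{cases}
\quad\Longrightarrow\quad
x' = x \cdot 2^{E' - E} \in \left\{ \tfrac{x}{16},\; 16x \right\}
$$

This holds while $E'$ stays in the normal range $1 \le E' \le 254$. By the same reasoning, bit 31 gives $x' = -x$, and bit 30 changes $E$ by $\pm 128$: any weight with $|x| < 2$ has $b_{30} = 0$, so it is scaled by $2^{128}$, or becomes $\pm\infty$ or NaN when $E = 127$.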


In [None]:
from sklearn.metrics import classification_report

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()  # disable dropout so predictions are deterministic
preds = []
for code_snippet in df_test['code'].values:
    with torch.no_grad():
        inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True).to(device)
        logits = model(**inputs).logits
        predicted_class_id = logits.argmax().item()
        preds.append(predicted_class_id)

y_true = [id2label[i] for i in df_test['label'].values]
y_pred = [id2label[i] for i in preds]

print(classification_report(y_true, y_pred))


Above, we ensured that the batch size is correct and that every label is represented in it.

### Inspect model structure