# SEB-2023 Experiment Notebook

**Paper ID:** SEB-2023  
**Paper Title:** A Simple, Yet Effective Approach to Finding Biases in Code Generation  

### Goal

Replicate the paper's stability-based bias detection methodology:
- Construct prompt perturbations (variants with identical intent but different phrasing).
- Generate code completions for each variant.
- Compare the completions for structural and logical consistency across variants to identify model instability or bias.
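
One way to make the structural comparison concrete is to reduce each completion to a name-independent AST signature and check whether the signatures match. This is a minimal sketch and is not taken from the paper: the helper name `structural_signature` and the three example snippets are assumptions made for illustration, not model outputs.

```python
import ast

def structural_signature(source):
    """Return a name-independent AST dump, or None if the code does not parse.

    Hypothetical helper: identifiers and docstrings are erased so that
    completions differing only in naming or documentation compare equal.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return None
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.name = "_"
            body = node.body
            # Drop a leading docstring so documented and bare variants match.
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                node.body = body[1:] or [ast.Pass()]
        elif isinstance(node, ast.Name):
            node.id = "_"
        elif isinstance(node, ast.arg):
            node.arg = "_"
    return ast.dump(tree)

# Illustrative snippets (assumed, not generated by the model).
sample_a = 'def is_even(n):\n    """Check."""\n    return n % 2 == 0\n'
sample_b = 'def is_even(num):\n    return num % 2 == 0\n'
sample_c = 'def is_even(n):\n    return not n & 1\n'

sig_a = structural_signature(sample_a)
sig_b = structural_signature(sample_b)
sig_c = structural_signature(sample_c)
print("a vs b identical structure:", sig_a == sig_b)
print("a vs c identical structure:", sig_a == sig_c)
```

A signature of `None` marks a completion that does not parse, which can happen when generation is cut off by the token limit. Such cases are worth reporting separately rather than counting as a structural mismatch.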


### Experiment Metadata

In [1]:
PAPER_ID = "SEB-2023"
PAPER_TITLE = "A Simple, Yet Effective Approach to Finding Biases in Code Generation"

MODEL_NAME = "Salesforce/codegen-350M-mono"
MODEL_TAG = "codegen350M"

DOMAIN = "Stability Auditing / Prompt Perturbations"

TASK_ID = "check_even"
PROMPT_GROUP = "base_perturbations"

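MAX_NEW_TOKENS = 100
TEMPERATURE = 0.2  # only takes effect when DO_SAMPLE is True
DO_SAMPLE = False  # greedy decoding, so repeated runs give the same output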
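
With `DO_SAMPLE = False` the model decodes greedily, so `TEMPERATURE` has no effect on the output. A small helper can keep the two settings consistent by passing `temperature` only when sampling is enabled. The sketch below is an assumption for illustration; `build_generation_kwargs` is a hypothetical name and not part of `transformers`.

```python
def build_generation_kwargs(max_new_tokens, temperature, do_sample):
    """Assemble keyword arguments for a text-generation call.

    Hypothetical helper: temperature is included only when sampling is on,
    because greedy decoding does not use it.
    """
    kwargs = {"max_new_tokens": max_new_tokens, "do_sample": do_sample}
    if do_sample:
        kwargs["temperature"] = temperature
    return kwargs

greedy_kwargs = build_generation_kwargs(100, 0.2, False)
sampled_kwargs = build_generation_kwargs(100, 0.2, True)
print(greedy_kwargs)
print(sampled_kwargs)
```

The resulting dictionary could be unpacked into the generation call (for example `generator(prompt, **greedy_kwargs)`), which keeps the call free of arguments that do not apply.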

### Imports and Environment Check

In [2]:
import os
import json
from datetime import datetime
from pathlib import Path
from transformers import pipeline, set_seed

def check_pkg(name):
    try:
        __import__(name)
        return True
    except Exception as e:
        return f"Missing or error: {e}"

checks = {
    "torch": check_pkg("torch"),
    "transformers": check_pkg("transformers"),
}

checks

{'torch': True, 'transformers': True}

### Create Experiment Folders and Initialize Log

In [3]:
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()

NOTES_DIR  = ROOT / "notes"
OUTPUT_DIR = ROOT / "outputs" / PAPER_ID

for d in [NOTES_DIR, OUTPUT_DIR]:
    d.mkdir(parents=True, exist_ok=True)

LOG_PATH = NOTES_DIR / f"{PAPER_ID}_notes.md"
print(f"Experiment structure ready for {PAPER_ID}")

Experiment structure ready for SEB-2023


### Define Perturbations

In [4]:
VARIANTS = [
    {"id": "v1_base", "text": "def is_even(n):\n    \"\"\"Check if a number is even.\"\"\"\n"},
    {"id": "v2_short", "text": "def is_even(n):\n"},
    {"id": "v3_formal", "text": "Write a Python function to check if an integer is even.\ndef is_even(n):\n"},
    {"id": "v4_implied", "text": "def is_even(num):\n    # return true if num is even\n"}
]

### Load Model Pipeline

In [5]:
generator = pipeline("text-generation", model=MODEL_NAME, device=-1)
print("Model loaded successfully.")

Some weights of the model checkpoint at Salesforce/codegen-350M-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.0.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.19.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.3.attn.causal_mask', 'transformer.h.4.attn.causal_mask', 'transformer.h.5.attn.causal_mask', 'transformer.h.6.attn.causal_mask', 'transformer.h.7.attn.causal_mask', 'transformer.h.8.attn.causal_mask', 'transformer.h.9.attn.causal_mask']
- This IS expected if you are initializing CodeGenForCausalLM from the checkpoint of a model trained on another task or with another architecture (e

Model loaded successfully.


### Run Stability Generations

In [6]:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for v in VARIANTS:
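    # With DO_SAMPLE = False decoding is greedy; transformers then ignores temperature and warns about it.
    out = generator(v["text"], max_new_tokens=MAX_NEW_TOKENS, temperature=TEMPERATURE, do_sample=DO_SAMPLE)
    # By default, generated_text holds the prompt followed by the completion.
    generated_code = out[0]["generated_text"]

    out_file = OUTPUT_DIR / f"{PAPER_ID}_{MODEL_TAG}_{TASK_ID}_{v['id']}_{timestamp}.py"
    with open(out_file, "w", encoding="utf-8") as f:
        f.write(generated_code)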
    
    print(f"Completed variant {v['id']}. Result saved to {out_file}")

The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completed variant v1_base. Result saved to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023/SEB-2023_codegen350M_check_even_v1_base_20260219_172811.py


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completed variant v2_short. Result saved to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023/SEB-2023_codegen350M_check_even_v2_short_20260219_172811.py


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Completed variant v3_formal. Result saved to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023/SEB-2023_codegen350M_check_even_v3_formal_20260219_172811.py
Completed variant v4_implied. Result saved to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023/SEB-2023_codegen350M_check_even_v4_implied_20260219_172811.py
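
Since each saved file contains the prompt followed by the completion, a fair comparison across variants should look at the completion alone. The sketch below strips the prompt and scores every pair of completions with `difflib.SequenceMatcher`. The helper names `strip_prompt` and `pairwise_similarity` are hypothetical, and the example strings are placeholders rather than actual model outputs.

```python
from difflib import SequenceMatcher
from itertools import combinations

def strip_prompt(prompt, generated):
    """Return only the continuation when the generated text starts with the prompt."""
    return generated[len(prompt):] if generated.startswith(prompt) else generated

def pairwise_similarity(completions):
    """Map each pair of variant ids to a character-level similarity in [0, 1]."""
    scores = {}
    for (id_a, text_a), (id_b, text_b) in combinations(completions.items(), 2):
        scores[(id_a, id_b)] = SequenceMatcher(None, text_a, text_b).ratio()
    return scores

# Placeholder data for illustration only (assumed, not taken from the run above).
prompt = "def is_even(n):\n"
example_completions = {
    "a": strip_prompt(prompt, prompt + "    return n % 2 == 0\n"),
    "b": "    return n % 2 == 0\n",
    "c": "    return not n & 1\n",
}

scores = pairwise_similarity(example_completions)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Textual similarity is a coarse signal: two completions can differ in wording yet behave identically, so low scores are a prompt for manual inspection rather than proof of instability.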


### Update Notes with Findings

In [7]:
with open(LOG_PATH, "a", encoding="utf-8") as log:
    log.write(f"\n## Experiment Run: {datetime.now().isoformat()}\n")
    log.write("- Status: Stability perturbation audit complete.\n")
    log.write(f"- Outputs: {OUTPUT_DIR}\n")
print(f"Notes updated at {LOG_PATH}")

Notes updated at /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/notes/SEB-2023_notes.md
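
The Markdown log is easy to read but hard to parse later. A machine-readable record of the run could be kept alongside it, for example as JSON. The following is a sketch under assumptions: `build_run_record` and its field names are hypothetical, and the file names are placeholders, not the files written above.

```python
import json
from datetime import datetime

def build_run_record(paper_id, model_name, task_id, variant_ids, output_files):
    """Collect run metadata in a JSON-serializable dict (hypothetical schema)."""
    return {
        "paper_id": paper_id,
        "model_name": model_name,
        "task_id": task_id,
        "created_at": datetime.now().isoformat(),
        "variants": [
            {"id": variant_id, "output_file": str(path)}
            for variant_id, path in zip(variant_ids, output_files)
        ],
    }

# Placeholder file names for illustration only.
record = build_run_record(
    "SEB-2023",
    "Salesforce/codegen-350M-mono",
    "check_even",
    ["v1_base", "v2_short"],
    ["v1_base.py", "v2_short.py"],
)
print(json.dumps(record, indent=2))
```

In the notebook, the same call could take `PAPER_ID`, `MODEL_NAME`, `TASK_ID`, and the list of written paths, and the result could be saved under `OUTPUT_DIR` next to the generated files.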
