# SEB-2023 Experiment Notebook
**Paper ID:** SEB-2023  
**Paper Title:** A Simple, Yet Effective Approach to Finding Biases in Code Generation  

### Goal
Replicate the paper's idea using a minimal setup:
- Create *prompt variants* (small changes but same meaning)
- Generate code for each variant using a code LLM
- Evaluate instability or bias exposure through output differences
- Save artifacts (prompt, code output, run log) for paper tracking
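
One simple way to quantify "output differences" is a pairwise text-similarity score between the code generated for each variant. The sketch below uses `difflib.SequenceMatcher` from the standard library. The helper `pairwise_similarity` and the example outputs are hypothetical and only illustrate the idea; they are not the paper's metric and not results from this notebook.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(outputs):
    """Return {(id_a, id_b): ratio} for every pair of variant outputs."""
    scores = {}
    for (id_a, code_a), (id_b, code_b) in combinations(sorted(outputs.items()), 2):
        # ratio() is 1.0 for identical strings and drops toward 0.0 as they diverge
        scores[(id_a, id_b)] = SequenceMatcher(None, code_a, code_b).ratio()
    return scores


# Hypothetical outputs, for illustration only (not results from this notebook).
example_outputs = {
    "v1_base": "def filter_users(users):\n    return [u for u in users if u['active']]\n",
    "v2_short": "def filter_users(users):\n    return [u for u in users if u['active']]\n",
    "v3_reordered": "def filter_users(users):\n    return list(filter(lambda u: u['active'], users))\n",
}

scores = pairwise_similarity(example_outputs)
for pair, ratio in scores.items():
    print(pair, round(ratio, 3))
```

A low ratio between two variants with the same intent would flag a pair worth inspecting by hand. Character-level similarity is a coarse proxy, so it should complement a qualitative reading rather than replace it.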


### Experiment Metadata


In [1]:
PAPER_ID = "SEB-2023"
PAPER_TITLE = "A Simple, Yet Effective Approach to Finding Biases in Code Generation"

# Choose a model you can run consistently (CPU-safe)
MODEL_NAME = "Salesforce/codegen-350M-mono"
MODEL_TAG = "codegen350M"

# SEB-style: prompt perturbations, not adjective-attribute axis
TASK_ID = "toy_task_01"   # you can increment later: toy_task_02, etc.
PROMPT_GROUP = "perturbation_set_A"

# Runs control
RUNS_PER_VARIANT = 1
SEED = 42
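
If `RUNS_PER_VARIANT` is later raised above 1, each run needs its own reproducible seed. The sketch below derives one from the base seed, the variant id, and the run index. The helper `derive_seed` is a hypothetical name introduced here for illustration and is not part of the notebook.

```python
import hashlib


def derive_seed(base_seed, variant_id, run_index):
    """Deterministically derive a 32-bit seed for one (variant, run) pair."""
    key = f"{base_seed}:{variant_id}:{run_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the first 4 bytes so the value fits in an unsigned 32-bit integer
    return int.from_bytes(digest[:4], "big")


# Example: seeds for three runs of one variant (values assumed for illustration)
seeds = [derive_seed(42, "v1_base", run_index) for run_index in range(3)]
print(seeds)
```

Hashing avoids the built-in `hash()`, which is randomized per process for strings and would therefore not be stable between sessions.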

### Imports and Environment Check

In [2]:
import os
import sys
import json
import random
from datetime import datetime

import torch
from transformers import pipeline

print("Python executable:", sys.executable)
print("Python version:", sys.version)
print("Torch version:", torch.__version__)

Python executable: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/venv/bin/python
Python version: 3.11.14 (main, Oct  9 2025, 16:16:55) [Clang 17.0.0 (clang-1700.4.4.1)]
Torch version: 2.9.1


### Folder Setup


In [3]:
BASE_DIR = os.path.abspath("..")  # notebooks/ -> project root
OUTPUT_DIR = os.path.join(BASE_DIR, "outputs", PAPER_ID)
PROMPT_DIR = os.path.join(BASE_DIR, "prompts", PAPER_ID)
NOTES_DIR = os.path.join(BASE_DIR, "notes")

os.makedirs(OUTPUT_DIR, exist_ok=True)
os.makedirs(PROMPT_DIR, exist_ok=True)
os.makedirs(NOTES_DIR, exist_ok=True)

LOG_PATH = os.path.join(NOTES_DIR, f"{PAPER_ID}_notes.md")

print("OUTPUT_DIR:", OUTPUT_DIR)
print("PROMPT_DIR:", PROMPT_DIR)
print("LOG_PATH:", LOG_PATH)

OUTPUT_DIR: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023
PROMPT_DIR: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/prompts/SEB-2023
LOG_PATH: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/notes/SEB-2023_notes.md


### Base Prompt

In [4]:
BASE_PROMPT = """\
Write a Python function named filter_users(users) that returns only the active users.
Each user is a dict with keys: 'name' and 'active' (boolean).
Return a list of user dicts where active is True.
"""
print(BASE_PROMPT)

Write a Python function named filter_users(users) that returns only the active users.
Each user is a dict with keys: 'name' and 'active' (boolean).
Return a list of user dicts where active is True.



### Prompt Variants (Perturbations)


In [5]:
prompt_variants = {
    "v1_base": BASE_PROMPT,
    "v2_short": "Write filter_users(users) in Python. users is list[dict(name, active)]. Return only active=True.",
    "v3_reordered": "Return only users where 'active' is True. Implement filter_users(users). Each user has name, active.",
    "v4_more_formal": "Implement a Python function filter_users(users). Input: list of dicts with keys name and active(bool). Output: list containing only active users.",
}

# Save prompt variants
prompts_file = os.path.join(PROMPT_DIR, f"{TASK_ID}_{PROMPT_GROUP}_prompts.json")
with open(prompts_file, "w", encoding="utf-8") as f:
    json.dump(prompt_variants, f, indent=2, ensure_ascii=False)

print("Saved prompt variants to:", prompts_file)
print("Variants:", list(prompt_variants.keys()))

Saved prompt variants to: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/prompts/SEB-2023/toy_task_01_perturbation_set_A_prompts.json
Variants: ['v1_base', 'v2_short', 'v3_reordered', 'v4_more_formal']
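
Because the variants are meant to carry the same meaning, a cheap sanity check is to confirm that each one still mentions the terms the task depends on. The sketch below is a rough lexical proxy, not a semantic-equivalence test. `missing_terms` and the `vX_broken` entry are hypothetical and exist only to show a failing case.

```python
def missing_terms(prompts, required_terms):
    """Map each variant id to the required terms its prompt does not mention."""
    report = {}
    for variant_id, text in prompts.items():
        lowered = text.lower()
        report[variant_id] = [t for t in required_terms if t.lower() not in lowered]
    return report


variants = {
    "v1_base": "Write a Python function named filter_users(users) that returns only the active users.",
    "v2_short": "Write filter_users(users) in Python. users is list[dict(name, active)]. Return only active=True.",
    # Hypothetical bad variant: drops both the function name and the 'active' condition
    "vX_broken": "Write a function that returns users.",
}

report = missing_terms(variants, ["filter_users", "active"])
for variant_id, missing in report.items():
    print(variant_id, "missing:", missing)
```

A variant with a non-empty list has probably changed the task rather than just its wording, so differences in its output would not count as instability.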


### Run Code Generation

In [6]:
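# Seed the RNGs so that sampling (do_sample=True) is repeatable across runs
random.seed(SEED)
torch.manual_seed(SEED)

generator = pipeline(
    "text-generation",
    model=MODEL_NAME,
    device=-1  # CPU-safe
)

generation_results = {}

for variant_id, prompt_text in prompt_variants.items():
    output = generator(
        prompt_text,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.4
    )
    # By default, "generated_text" holds the prompt followed by the continuation
    generated_code = output[0]["generated_text"]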
    generation_results[variant_id] = {
        "prompt": prompt_text,
        "generated_code": generated_code
    }

print("Completed generations for variants:", list(generation_results.keys()))

Some weights of the model checkpoint at Salesforce/codegen-350M-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.0.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.19.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.3.attn.causal_mask', 'transformer.h.4.attn.causal_mask', 'transformer.h.5.attn.causal_mask', 'transformer.h.6.attn.causal_mask', 'transformer.h.7.attn.causal_mask', 'transformer.h.8.attn.causal_mask', 'transformer.h.9.attn.causal_mask']
- This IS expected if you are initializing CodeGenForCausalLM from the checkpoint of a model trained on another task or with another architecture (e

Completed generations for variants: ['v1_base', 'v2_short', 'v3_reordered', 'v4_more_formal']
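
With the pipeline's default settings, `generated_text` begins with the prompt itself, so comparing raw outputs across variants also compares the prompts. A minimal sketch for isolating the continuation is shown below. `strip_prompt` is a hypothetical helper and the sample strings are made up for illustration.

```python
def strip_prompt(prompt, generated_text):
    """Return only the continuation when generated_text starts with prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Leave the text untouched if the prompt is not a literal prefix
    return generated_text


# Made-up example: the prompt is echoed, then the model's continuation follows
prompt = "Write filter_users(users) in Python.\n"
full_text = prompt + "def filter_users(users):\n    return [u for u in users if u['active']]\n"

continuation = strip_prompt(prompt, full_text)
print(continuation)
```

Applying this before any comparison keeps the measured differences attributable to the model's output rather than to the wording of each variant.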


### Save Generated Outputs

In [7]:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for variant_id, data in generation_results.items():
    filename = f"{TASK_ID}_{PROMPT_GROUP}_{variant_id}_{timestamp}.py"
    file_path = os.path.join(OUTPUT_DIR, filename)
    with open(file_path, "w", encoding="utf-8") as f:
        f.write(data["generated_code"])

print("Saved generated code files to:", OUTPUT_DIR)

Saved generated code files to: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/SEB-2023
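
The files are saved with a `.py` extension, but nothing guarantees that their contents parse as Python, especially when the natural-language prompt is included in the text. The sketch below checks this with the standard `ast` module and also looks for a definition of the requested function. Both helpers are hypothetical names, and the sample strings are illustrative rather than actual model outputs.

```python
import ast


def is_valid_python(source):
    """Return True if source parses as Python, False otherwise."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def defines_function(source, name):
    """Return True if source parses and defines a function called name."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return any(
        isinstance(node, ast.FunctionDef) and node.name == name
        for node in ast.walk(tree)
    )


# Illustrative inputs (not outputs from this notebook)
good = "def filter_users(users):\n    return [u for u in users if u['active']]\n"
with_prompt = "Write filter_users(users) in Python.\n" + good

print(is_valid_python(good), defines_function(good, "filter_users"))
print(is_valid_python(with_prompt), defines_function(with_prompt, "filter_users"))
```

Recording these two booleans per variant gives a small quantitative signal, parse rate and task adherence, alongside the qualitative comparison.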


### Minimal Experiment Log

In [8]:
with open(LOG_PATH, "a", encoding="utf-8") as log:
    log.write(f"\n## Experiment Run: {datetime.now()}\n")
    log.write(f"- Paper ID: {PAPER_ID}\n")
    log.write(f"- Model: {MODEL_NAME}\n")
    log.write(f"- Task ID: {TASK_ID}\n")
    log.write(f"- Prompt Group: {PROMPT_GROUP}\n")
    log.write(f"- Variants: {list(prompt_variants.keys())}\n")
    log.write("- Observation: Code structure and naming differ across prompt variants despite identical intent.\n")
    log.write("- Status: Completed (single-run, qualitative)\n")

print("Experiment log updated:", LOG_PATH)

Experiment log updated: /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/notes/SEB-2023_notes.md
