# MGB-2024 Experiment Notebook

**Paper ID:** MGB-2024  
**Paper Title:** Mitigating Gender Bias in Code Large Language Models via Model Editing  

### Goal

Replicate the MG-Editing (Multi-Granularity Model Editing) probe using CodeGenBias templates:
- Load template probes with generic modifiers and professions.
- Generate code completions.
- Evaluate completions with FB-Score (Factual Bias Score), which compares the model's gender associations against real-world gender distributions.


### Experiment Metadata

In [1]:
PAPER_ID = "MGB-2024"
PAPER_TITLE = "Mitigating Gender Bias in Code Large Language Models via Model Editing"

MODEL_NAME = "Salesforce/codegen-350M-mono"
MODEL_TAG = "codegen350M"

DOMAIN = "Gender Bias Mitigation / Model Editing"

SENSITIVE_ATTRS = ["gender"]

MAX_NEW_TOKENS = 40
TEMPERATURE = 0.3
DO_SAMPLE = True

### Imports and Environment Check

In [2]:
import os
import json
from datetime import datetime
from pathlib import Path
from transformers import pipeline, set_seed

def check_pkg(name):
    try:
        __import__(name)
        return True
    except Exception as e:
        return f"Missing or error: {e}"

checks = {
    "torch": check_pkg("torch"),
    "transformers": check_pkg("transformers"),
}

checks

{'torch': True, 'transformers': True}
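The check above only reports whether each package imports. When comparing runs across machines it can also help to record the installed versions. The following is a minimal sketch using only the standard library; `describe_pkg` is a hypothetical helper name, not part of the notebook.

```python
from importlib import metadata, util

def describe_pkg(name):
    """Return the installed version of `name`, or None if it cannot be found."""
    if util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a stdlib module).
        return "unknown"

# Same package list as the notebook's `checks` dictionary.
report = {name: describe_pkg(name) for name in ["torch", "transformers"]}
print(report)
```

Unlike `__import__`, `find_spec` locates a top-level package without executing it, so the check stays fast even for heavy libraries.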

### Create Experiment Folders and Initialize Log

In [3]:
import csv
PHASE = "baseline"
TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")
RUN_ID = f"{PAPER_ID}_{MODEL_TAG}_{PHASE}_{TIMESTAMP}"

ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
PROMPT_DIR = ROOT / "prompts" / PAPER_ID
PAPER_OUTPUT_ROOT = ROOT / "outputs" / PAPER_ID
RUN_DIR = PAPER_OUTPUT_ROOT / PHASE / "runs" / RUN_ID

GEN_DIR = RUN_DIR / "generated"
AST_DIR = RUN_DIR / "ast_extract"
TEST_DIR = RUN_DIR / "tests_generated"
METRICS_DIR = PAPER_OUTPUT_ROOT / "metrics"
MANIFEST_DIR = PAPER_OUTPUT_ROOT / "manifests"
NOTES_DIR = ROOT / "notes"

for d in [GEN_DIR, AST_DIR, TEST_DIR, METRICS_DIR, MANIFEST_DIR, NOTES_DIR]:
    d.mkdir(parents=True, exist_ok=True)

def update_manifests(run_id, paper_id, phase, model, timestamp, gen_path, metrics_path):
    entry = {
        "Run_ID": run_id, "Paper_ID": paper_id, "Phase": phase, 
        "Model": model, "Timestamp": timestamp, 
        "Generated_Path": str(gen_path), "Metrics_Path": str(metrics_path)
    }
    p_manifest = MANIFEST_DIR / "run_manifest.csv"
    write_header = not p_manifest.exists()
    with open(p_manifest, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=entry.keys())
        if write_header: writer.writeheader()
        writer.writerow(entry)
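    g_manifest = ROOT / "outputs" / "run_manifest_all.csv"
    # Write the header on first use so the global manifest is a valid CSV too.
    write_header = not g_manifest.exists()
    with open(g_manifest, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=entry.keys())
        if write_header: writer.writeheader()
        writer.writerow(entry)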

LOG_PATH = NOTES_DIR / f"{PAPER_ID}_notes.md"
print(f"Run ID: {RUN_ID}")
print(f"Experiment structure ready for {PAPER_ID}")


Run ID: MGB-2024_codegen350M_baseline_20260219_175703
Experiment structure ready for MGB-2024
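The manifest rows written by `update_manifests` can later be read back to find earlier runs. The sketch below shows the append-then-query pattern on a temporary file, using the same column names as the notebook. The helper names (`append_row`, `runs_for_phase`), the `demo_*` run IDs, and the `"edited"` phase are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

FIELDS = ["Run_ID", "Paper_ID", "Phase", "Model", "Timestamp",
          "Generated_Path", "Metrics_Path"]

def append_row(manifest_path, row):
    """Append one row, writing the header only when the file is new."""
    is_new = not manifest_path.exists()
    with open(manifest_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

def runs_for_phase(manifest_path, phase):
    """Return the Run_ID of every manifest row recorded for `phase`."""
    with open(manifest_path, newline="") as f:
        return [r["Run_ID"] for r in csv.DictReader(f) if r["Phase"] == phase]

with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / "run_manifest.csv"
    for phase in ["baseline", "edited"]:  # "edited" is a hypothetical second phase
        row = dict.fromkeys(FIELDS, "")
        row.update({"Run_ID": f"demo_{phase}", "Paper_ID": "MGB-2024", "Phase": phase})
        append_row(manifest, row)
    baseline_runs = runs_for_phase(manifest, "baseline")

print(baseline_runs)  # ['demo_baseline']
```

`csv.DictReader` takes its keys from the first line of the file, so this kind of lookup only works when the manifest starts with a header row.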


### Load Template Probes

In [4]:
PROMPT_FILE = PROMPT_DIR / f"{PAPER_ID}_prompts.json"
with open(PROMPT_FILE, "r") as f:
    data = json.load(f)

template = data["templates"][0]["text"]
modifiers = data["modifiers"]
professions = data["professions"]
print(f"Loaded template, {len(modifiers)} modifiers, and {len(professions)} professions.")

Loaded template, 4 modifiers, and 5 professions.
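For reference, the loader expects a JSON object with `templates`, `modifiers`, and `professions` keys, and the template uses `[MODIFIER]` and `[PROFESSION]` placeholders. The sketch below mirrors that layout with an in-memory dictionary. The modifier and profession values are the ones that appear in this run's output file names, but the template text itself is an assumption, since the actual prompt file is not shown here.

```python
from itertools import product

# Assumed structure of MGB-2024_prompts.json; the template text is hypothetical.
example = {
    "templates": [{"text": "def find_[MODIFIER]_[PROFESSION](people):\n    "}],
    "modifiers": ["best", "sick", "reliable", "unreliable"],
    "professions": ["nurse", "engineer", "doctor", "teacher", "senator"],
}

def fill(template, modifier, profession):
    """Substitute both placeholders in a template string."""
    return template.replace("[MODIFIER]", modifier).replace("[PROFESSION]", profession)

template = example["templates"][0]["text"]
prompts = {
    (mod, prof): fill(template, mod, prof)
    for mod, prof in product(example["modifiers"], example["professions"])
}

print(len(prompts))  # 20
print(prompts[("best", "nurse")])
```

Four modifiers crossed with five professions give the 20 prompts that the completion loop iterates over.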


### Load Model Pipeline

In [5]:
generator = pipeline("text-generation", model=MODEL_NAME, device=-1)
print("Model loaded successfully.")

Some weights of the model checkpoint at Salesforce/codegen-350M-mono were not used when initializing CodeGenForCausalLM: ['transformer.h.0.attn.causal_mask', 'transformer.h.1.attn.causal_mask', 'transformer.h.10.attn.causal_mask', 'transformer.h.11.attn.causal_mask', 'transformer.h.12.attn.causal_mask', 'transformer.h.13.attn.causal_mask', 'transformer.h.14.attn.causal_mask', 'transformer.h.15.attn.causal_mask', 'transformer.h.16.attn.causal_mask', 'transformer.h.17.attn.causal_mask', 'transformer.h.18.attn.causal_mask', 'transformer.h.19.attn.causal_mask', 'transformer.h.2.attn.causal_mask', 'transformer.h.3.attn.causal_mask', 'transformer.h.4.attn.causal_mask', 'transformer.h.5.attn.causal_mask', 'transformer.h.6.attn.causal_mask', 'transformer.h.7.attn.causal_mask', 'transformer.h.8.attn.causal_mask', 'transformer.h.9.attn.causal_mask']
- This IS expected if you are initializing CodeGenForCausalLM from the checkpoint of a model trained on another task or with another architecture (e

Model loaded successfully.


### Run Template Completions

In [6]:
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

for mod in modifiers:
    for prof in professions:
        prompt_text = template.replace("[MODIFIER]", mod).replace("[PROFESSION]", prof)
        out = generator(prompt_text, max_new_tokens=MAX_NEW_TOKENS, temperature=TEMPERATURE, do_sample=DO_SAMPLE)
        generated_text = out[0]["generated_text"]
        
        out_file = GEN_DIR / f"{PAPER_ID}_{MODEL_TAG}_{mod}_{prof}_{timestamp}.py"
        with open(out_file, "w") as f:
            f.write(generated_text)
            
        print(f"Saved {mod}_{prof} output to {out_file}")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Saved best_nurse output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_best_nurse_20260219_175705.py


Saved best_engineer output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_best_engineer_20260219_175705.py


Saved best_doctor output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_best_doctor_20260219_175705.py


Saved best_teacher output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_best_teacher_20260219_175705.py


Saved best_senator output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_best_senator_20260219_175705.py


Saved sick_nurse output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_sick_nurse_20260219_175705.py


Saved sick_engineer output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_sick_engineer_20260219_175705.py


Saved sick_doctor output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_sick_doctor_20260219_175705.py


Saved sick_teacher output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_sick_teacher_20260219_175705.py


Saved sick_senator output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_sick_senator_20260219_175705.py


Saved reliable_nurse output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_reliable_nurse_20260219_175705.py


Saved reliable_engineer output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_reliable_engineer_20260219_175705.py


Saved reliable_doctor output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_reliable_doctor_20260219_175705.py


Saved reliable_teacher output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_reliable_teacher_20260219_175705.py


Saved reliable_senator output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_reliable_senator_20260219_175705.py


Saved unreliable_nurse output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_unreliable_nurse_20260219_175705.py


Saved unreliable_engineer output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_unreliable_engineer_20260219_175705.py


Saved unreliable_doctor output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_unreliable_doctor_20260219_175705.py


Saved unreliable_teacher output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_unreliable_teacher_20260219_175705.py
Saved unreliable_senator output to /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/outputs/MGB-2024/baseline/runs/MGB-2024_codegen350M_baseline_20260219_175703/generated/MGB-2024_codegen350M_unreliable_senator_20260219_175705.py
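The saved completions still need to be scored. As a first sanity check before any formal metric, one could count gendered terms in each completion. The sketch below is a crude proxy under stated assumptions, not the paper's FB-Score: the word lists and the `count_gendered_terms` helper are made up for illustration, and the sample string is hand-written rather than real model output.

```python
import re
from collections import Counter

# Assumed word lists; a real evaluation may use a different vocabulary.
FEMALE = {"she", "her", "hers", "female", "woman", "women"}
MALE = {"he", "him", "his", "male", "man", "men"}

def count_gendered_terms(text):
    """Count female and male terms among the alphabetic tokens of `text`."""
    counts = Counter()
    for tok in re.findall(r"[a-z]+", text.lower()):
        if tok in FEMALE:
            counts["female"] += 1
        elif tok in MALE:
            counts["male"] += 1
    return counts

# Hand-written sample standing in for one generated file.
sample = 'def find_best_nurse(people):\n    return [p for p in people if p.gender == "female"]'
print(count_gendered_terms(sample))  # Counter({'female': 1})
```

To apply this to a run, one could loop over `GEN_DIR.glob("*.py")` and pass each file's text to the helper. Splitting on non-letters means identifiers such as `find_best_nurse` are broken into separate words before matching.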


### Update Notes with Findings

In [7]:
with open(LOG_PATH, "a", encoding="utf-8") as log:
    log.write(f"\n## Experiment Run: {datetime.now().isoformat()}\n")
    log.write("- Status: Template-based gender bias probes complete.\n")
    log.write(f"- Outputs: {RUN_DIR}\n")
print(f"Notes updated at {LOG_PATH}")

update_manifests(
    run_id=RUN_ID,
    paper_id=PAPER_ID,
    phase=PHASE,
    model=MODEL_TAG,
    timestamp=TIMESTAMP,
    gen_path=GEN_DIR.relative_to(ROOT),
    metrics_path=METRICS_DIR.relative_to(ROOT)
)



Notes updated at /Users/dhrubadatta/Documents/Research/CodeAudit X/Codes/notes/MGB-2024_notes.md
