# Self-Consistency Prompting Algorithm

* Install + Auth

* Config (edit these)

* Imports, Weights, Small Helpers

* Prompt Builders

* HF Client + Retry Wrapper

* Generate Synthetic Interviews

* K Scoring Runs (Self-Consistency @ 0.7)

* Aggregate Medians + IQR Confidence + Overall Weighted

* Deterministic Rewrite (Locked Scores @ 0.0)

* Save CSVs (Workspace or Drive)

## Install + Auth
**Purpose**: Install dependencies (`huggingface_hub`, `openai`, `pandas`, `pyyaml`) and authenticate to Hugging Face so you can call hosted models. The `openai` package is the client for the Hugging Face router's OpenAI-compatible endpoint used later.

**Inputs**: None (you’ll paste your HF token when prompted).

**Outputs**: Session-level auth; packages available for the rest of the notebook.

In [1]:
!pip -q install huggingface_hub openai pandas pyyaml

In [2]:
import os
from getpass import getpass

# If not already set in the environment, paste your HF token here
if "HF_TOKEN" not in os.environ or not os.environ["HF_TOKEN"]:
    os.environ["HF_TOKEN"] = getpass("Paste your Hugging Face token (starts with hf_...): ")




In [3]:
import pandas as pd
import numpy as np
import yaml
from huggingface_hub import login

In [3]:
login(token=os.environ["HF_TOKEN"])  # reuse the token set above instead of the interactive widget


## Configuration
**Purpose**: Central place to set role, question set, dataset size (`N`), self-consistency samples (`K`), model IDs, and generation params.

**Inputs**: You edit the `CONFIG` dict values.

**Outputs**: `CONFIG` used by all later cells.

In [4]:
CONFIG = {
    "role_title": "Field Technician",
    "question_set_id": "qs_v1",
    "question_set": [
        "Tell me about a time you handled an urgent service call. What steps did you take?",
        "How do you plan your route and prioritize jobs when schedules change during the day?",
        "Describe a tricky diagnostic you solved. What tools or methods did you use?",
        "How do you keep customers calm when they are upset or stressed?",
        "Walk me through your process for documenting work and updating tickets.",
        "What does reliability at work mean to you, and how do you demonstrate it?",
        "How do you stay safe on the job and follow site-specific rules?",
        "How do you collaborate with teammates or escalate when blocked?"
    ],
    "num_candidates": 50,          # bump later if needed
    "k_samples": 3,                # self-consistency K
    "generation_temperature": 0.8,
    "generation_max_tokens": 1200,
    "timeout_s": 60,
    "gen_prompt_version": "gen_v1",
    "score_prompt_version": "score_v1",
    "rewrite_prompt_version": "rewrite_v1",
}
# Choose ONE routed chat model (provider suffix matters for billing/availability)
# Examples:
# MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct:novita"
# MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3:together"
# MODEL_ID = "Qwen/Qwen2.5-7B-Instruct:novita"
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct:novita"


## Imports, Weights, Small Helpers
**Purpose**: Define utility functions and constants used everywhere.

**Includes**:

* `WEIGHTS`, `METRICS`

* `canonicalize_qa_text()` (cleans timestamps/fillers)

* `clamp_int()`, `iqr_confidence()`

* `compute_overall_weighted()`

* `safe_json_parse()`

**Outputs**: Helper functions in memory.


In [5]:
import re, json, time, random, statistics as stats
from typing import Dict, Any, List
import numpy as np
import pandas as pd

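# Metric weights (sum to 1.0) and the canonical metric order used by every stage
WEIGHTS = {"ca":0.35,"exp":0.35,"ps":0.15,"rel":0.05,"prof":0.05,"comm":0.05}
METRICS = ["ca","exp","ps","rel","prof","comm"]
# Filler phrases stripped before scoring. Caution: "like", "sort of" and "i mean" also match
# non-filler uses (e.g. "I'd like to"), so this cleanup can drop meaningful words.
FILLERS_RE = re.compile(r"\b(?:um+|uh+|erm|like|you know|sort of|kinda|i mean|ya know)\b", re.IGNORECASE)

def clamp_int(x, lo=1, hi=10):
    # Coerce to an int in [lo, hi]; unparseable input falls back to 5 (mid-scale).
    # round() rounds halves to even, so 8.5 becomes 8 and 9.5 becomes 10.
    try:
        xi = int(round(float(x)))
    except Exception:
        xi = 5
    return max(lo, min(hi, xi))

def canonicalize_qa_text(text: str) -> str:
    # Remove [1:23], (1:23) and line-leading 01:23:45 style timestamps, drop fillers,
    # then collapse runs of spaces/tabs and 3+ blank lines
    if not isinstance(text, str): return ""
    t = text
    t = re.sub(r"\[\d{1,2}:\d{2}(?::\d{2})?\]", " ", t)
    t = re.sub(r"\(\d{1,2}:\d{2}(?::\d{2})?\)", " ", t)
    t = re.sub(r"(?m)^\s*\d{1,2}:\d{2}(?::\d{2})?\s+", " ", t)
    t = FILLERS_RE.sub("", t)
    t = re.sub(r"[ \t]+", " ", t)
    t = re.sub(r"\n{3,}", "\n\n", t)
    return t.strip()

def iqr_confidence(vals: List[float], max_iqr: float = 4.0) -> float:
    # 1.0 when the K runs agree (IQR 0); falls linearly to 0.0 at an IQR of max_iqr points
    if not vals: return 0.0
    q1, q3 = np.percentile(vals, [25, 75])
    iqr = float(q3 - q1)
    return round(1.0 - min(iqr / max_iqr, 1.0), 3)

def compute_overall_weighted(scores: Dict[str,int]) -> float:
    # Weighted sum of the clamped metric scores; stays on the 1-10 scale since weights sum to 1
    return round(sum(WEIGHTS[m]*clamp_int(scores[m]) for m in METRICS), 3)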

def safe_json_parse(text: str) -> Dict[str, Any]:
    try:
        return json.loads(text)
    except Exception:
        try:
            start = text.index("{"); end = text.rindex("}") + 1
            return json.loads(text[start:end])
        except Exception:
            return {}
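
A quick worked example of the two numeric rules above: `iqr_confidence` maps the spread of the K samples to [0, 1], and `compute_overall_weighted` collapses the six metric scores with `WEIGHTS`. This sketch restates both with the standard library so it runs on its own; it swaps `np.percentile` for `statistics.quantiles(..., method="inclusive")`, which uses the same linear interpolation. `iqr_confidence_stdlib` and `overall_weighted` are illustrative names, not notebook functions. With K=3 samples of 7, 9 and 10, the quartiles are 8 and 9.5, so IQR = 1.5 and confidence = 1 - 1.5/4 = 0.625.

```python
import statistics
from typing import Dict, List

WEIGHTS = {"ca": 0.35, "exp": 0.35, "ps": 0.15, "rel": 0.05, "prof": 0.05, "comm": 0.05}

def iqr_confidence_stdlib(vals: List[float], max_iqr: float = 4.0) -> float:
    """Same rule as iqr_confidence(): 1 - min(IQR / max_iqr, 1), rounded to 3 places."""
    if not vals:
        return 0.0
    if len(vals) == 1:
        return 1.0  # one run has no spread (np.percentile also gives IQR 0 here)
    # method="inclusive" interpolates linearly, like np.percentile's default
    q1, _, q3 = statistics.quantiles(vals, n=4, method="inclusive")
    return round(1.0 - min((q3 - q1) / max_iqr, 1.0), 3)

def overall_weighted(scores: Dict[str, int]) -> float:
    """Weighted sum of the six metric scores (weights sum to 1.0)."""
    return round(sum(WEIGHTS[m] * scores[m] for m in WEIGHTS), 3)

runs = [7, 9, 10]  # three self-consistency samples of one metric
print(statistics.median(runs), iqr_confidence_stdlib(runs))  # 9 0.625
print(overall_weighted({"ca": 9, "exp": 8, "ps": 8, "rel": 9, "prof": 8, "comm": 9}))  # 8.45
```

Because the median is taken per metric and the weights sum to 1.0, the overall score stays on the same 1-10 scale as the inputs.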


In [6]:
from huggingface_hub import whoami, HfApi
print(whoami())  # Which account is the token tied to?

api = HfApi()
info = api.model_info("meta-llama/Llama-3.2-3B-Instruct", token=os.environ["HF_TOKEN"])
print(info.id, "gated:", info.gated)  # gated is False, "auto", or "manual"



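The access check above queries a hardcoded repo. To point the same check at the configured `MODEL_ID`, the router's provider suffix has to be stripped first, since `model_info` takes a Hub repo ID, which has no provider suffix. A minimal sketch, assuming everything after the last `:` is the provider suffix (as in the `MODEL_ID` examples in the config cell); `split_model_id` is a hypothetical helper, not part of `huggingface_hub`.

```python
from typing import Optional, Tuple

def split_model_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Split 'org/model:provider' into ('org/model', 'provider').

    Assumes the text after the last ':' is the router's provider suffix;
    an ID without ':' is returned unchanged with provider None.
    """
    repo_id, sep, provider = model_id.rpartition(":")
    if not sep:  # no provider suffix given
        return model_id, None
    if not repo_id or not provider:
        raise ValueError(f"Malformed model id: {model_id!r}")
    return repo_id, provider

print(split_model_id("meta-llama/Llama-3.1-8B-Instruct:novita"))
# ('meta-llama/Llama-3.1-8B-Instruct', 'novita')
```

With it, the check could become `api.model_info(split_model_id(MODEL_ID)[0], token=os.environ["HF_TOKEN"])`.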
## Prompt Builders

**Purpose**: Build the prompts for the three stages as plain strings by joining lists of lines, which avoids nested triple-quoted strings.

**Functions**:

- `build_generation_prompt(role, question_set, persona)` → candidate-persona prompt that answers every question in a fixed `Question N:` / `Answer:` layout

- `build_scoring_prompt(qa_text)` → JSON-only instruction returning six integer scores (1–10)

- `build_rewrite_prompt_locked(qa_text, scores)` → locks the numeric scores and asks for per-metric justifications, 3 strengths, 3 weaknesses, and a 3–4 sentence summary

**Outputs**: Prompt strings on demand.


In [6]:
from typing import Dict, Any, List

def build_generation_prompt(role: str, question_set: List[str], persona: Dict[str, Any]) -> str:
    q_block = "\n".join([f"Question {i+1}: {q}" for i, q in enumerate(question_set)])
    lines = [
        f"You are the candidate interviewing for the role: {role}.",
        f"Persona hints: title={persona['persona_title']}; years_experience={persona['yrs_experience']}; "
        f"keywords={persona['domain_keywords']}; reliability={persona['reliability_flags']}; notes={persona['notes']}.",
        "",
        "Answer each question clearly (2–4 sentences per answer).",
        "",
        q_block,
        "",
        "Return responses in this pattern:",
        "Question 1: <repeat question>",
        "Answer: <answer>",
        "",
        "Question 2: <repeat question>",
        "Answer: <answer>",
    ]
    return "\n".join(lines)

def build_scoring_prompt(qa_text: str) -> str:
    metrics_def = "\n".join([
        "- Cognitive Ability (35%): Structured thinking, planning, logic.",
        "- Experience (35%): Relevant work (last 10 years), skills, accomplishments in similar service jobs.",
        "- Problem Solving (15%): Resourcefulness, safe tradeoffs under constraints.",
        "- Reliability (5%): Punctuality, follow-through, transport reliability.",
        "- Professionalism (5%): Respect for clients/rules, composure under stress.",
        "- Communication (5%): Clarity and tone; IGNORE filler words.",
    ])
    # Keep INTERNAL keys the same so downstream stays unchanged.
    lines = [
        "Analyze the candidate responses using the six metrics below.",
        "Return ONLY a JSON object with keys: ca, exp, ps, rel, prof, comm (each 1–10).",
        "",
        "Definitions (approximate weighting):",
        metrics_def,
        "",
        "Candidate Responses:",
        "--- START RESPONSES ---",
        qa_text,
        "--- END RESPONSES ---",
        "",
        'Output example: {"ca":8,"exp":7,"ps":7,"rel":7,"prof":6,"comm":6}',
    ]
    return "\n".join(lines)


def build_rewrite_prompt_locked(qa_text: str, s: Dict[str,int]) -> str:
    lines = [
        "Use FIXED scores; DO NOT change them. Write one justification per metric,",
        "exactly 3 bullet strengths, exactly 3 bullet weaknesses, and a 3–4 sentence summary.",
        "",
        f"- Cognitive Ability: {s['ca']}",
        f"- Experience: {s['exp']}",
        f"- Problem Solving: {s['ps']}",
        f"- Reliability: {s['rel']}",
        f"- Professionalism: {s['prof']}",
        f"- Communication: {s['comm']}",
        "",
        "Return ONLY this JSON:",
        "{",
        f'  "cognitive_ability_score": {s["ca"]},',
        '  "cognitive_ability_justification": "...",',
        f'  "experience_score": {s["exp"]},',
        '  "experience_justification": "...",',
        f'  "reliability_score": {s["rel"]},',
        '  "reliability_justification": "...",',
        f'  "professionalism_score": {s["prof"]},',
        '  "professionalism_justification": "...",',
        f'  "problem_solving_score": {s["ps"]},',
        '  "problem_solving_justification": "...",',
        f'  "communication_score": {s["comm"]},',
        '  "communication_justification": "...",',
        '  "general_strengths": "- ...\\n- ...\\n- ...",',
        '  "general_weaknesses": "- ...\\n- ...\\n- ...",',
        '  "general_summary": "..."',
        "}",
        "",
        "Candidate Responses:",
        "--- START ---",
        qa_text,
        "--- END ---",
    ]
    return "\n".join(lines)
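
The scoring prompt asks for six integers in 1–10, but the scoring loop later coerces whatever comes back with `clamp_int(js.get(m, 5))`, so a missing key turns into a 5 and an 11 into a 10. If you'd rather keep bad runs distinguishable, a stricter validator could look like this hypothetical `validate_scores` (a sketch; `METRICS` is repeated so the block runs standalone).

```python
from typing import Any, Dict, Tuple

METRICS = ["ca", "exp", "ps", "rel", "prof", "comm"]

def validate_scores(js: Any) -> Tuple[Dict[str, int], bool]:
    """Keep only metrics whose value is a whole number in 1..10.

    Returns (scores, all_present). Nothing is imputed or clamped:
    an out-of-range or missing value is simply left out.
    """
    scores: Dict[str, int] = {}
    if not isinstance(js, dict):
        return scores, False
    for m in METRICS:
        v = js.get(m)
        if isinstance(v, bool):  # bool is a subclass of int; True/False are not scores
            continue
        if isinstance(v, str) and v.strip().isdecimal():
            v = int(v.strip())
        elif isinstance(v, float) and v.is_integer():
            v = int(v)
        if isinstance(v, int) and 1 <= v <= 10:
            scores[m] = v
    return scores, len(scores) == len(METRICS)

print(validate_scores({"ca": 8, "exp": "7", "ps": 7.0, "rel": 7, "prof": 6, "comm": 6}))
# ({'ca': 8, 'exp': 7, 'ps': 7, 'rel': 7, 'prof': 6, 'comm': 6}, True)
print(validate_scores({"ca": 11, "exp": 7}))  # ({'exp': 7}, False)
```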


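## HF Client + Retry Wrapper

**Purpose**: Wrap the two ways this notebook calls hosted models. `call_text_generation()` uses `InferenceClient.text_generation` with a simple retry; `hf_chat_once()` and `hf_chat_json()` call the Hugging Face router's OpenAI-compatible chat endpoint, optionally in JSON mode. All later stages use the chat helpers, because some provider/model pairs support only the chat ("conversational") task and reject text generation with a `ValueError`.

**Outputs**: `get_client()`, `call_text_generation()`, `hf_chat_once()`, `hf_chat_json()`.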

In [7]:
from huggingface_hub import InferenceClient
from huggingface_hub.utils import HfHubHTTPError
import time

def get_client(timeout_s: int = 60) -> InferenceClient:
    # Neutral client; router picks a backend. We pass model_id per call.
    return InferenceClient(timeout=timeout_s)

def call_text_generation(client: InferenceClient, prompt: str,
                         temperature=0.7, top_p=1.0, max_tokens=1024,
                         model_id: str = None, retries: int = 1, backoff: float = 1.5):
    # Legacy text-generation path; the later stages use the chat helpers in the next cell.
    assert model_id, "model_id is required"
    last_err = None

    def _go(c):
        return c.text_generation(
            prompt,
            model=model_id,                 # model is chosen per call, not on the client
            temperature=temperature,
            top_p=top_p,
            max_new_tokens=max_tokens,
            return_full_text=False,
        )

    for i in range(retries + 1):
        try:
            return _go(client)
        except (ValueError, HfHubHTTPError) as e:
            last_err = e
            if i == retries:
                break  # out of attempts; raise without a final sleep
            # Linear backoff, then retry with a fresh neutral client (the router may pick a different backend)
            time.sleep(backoff * (i + 1))
            client = InferenceClient(timeout=client.timeout)
    raise last_err

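`call_text_generation` retries with a linear delay (`backoff * (i + 1)`). A common alternative for rate-limited HTTP APIs is exponential backoff with full jitter, sketched below as a hypothetical, dependency-free `retry_with_backoff`. It takes any zero-argument callable, so it could wrap `hf_chat_once` as well; the `sleep` parameter exists only so the example can run without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def retry_with_backoff(fn: Callable[[], T], retries: int = 3, base: float = 1.0,
                       cap: float = 30.0,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); on a retryable error, wait a random time in
    [0, min(cap, base * 2**attempt)] ("full jitter") and try again,
    up to `retries` extra attempts. The last error is re-raised."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable when retries >= 0")

# Usage with a stand-in that fails twice before succeeding (no real sleeping):
attempts = []

def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"

result = retry_with_backoff(flaky, retries=3, sleep=lambda s: None)
print(result, len(attempts))  # ok 3
```

In the notebook you would narrow `retry_on` to transient errors, for example `(openai.RateLimitError, openai.APIConnectionError)` when wrapping `lambda: hf_chat_once(prompt, model=MODEL_ID)`; retrying on every `Exception` also retries permanent failures such as an unsupported model.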

In [8]:
from openai import OpenAI

hf_client = OpenAI(
    base_url="https://router.huggingface.co/v1",
    api_key=os.environ["HF_TOKEN"],
)

def hf_chat_once(prompt: str, model: str, temperature: float = 0.7, max_tokens: int = 512,
                 json_mode: bool = False) -> str:
    kwargs = dict(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
        top_p=1.0,
    )
    if json_mode:
        # Enforce valid JSON responses when the provider supports this
        kwargs["response_format"] = {"type": "json_object"}
    resp = hf_client.chat.completions.create(**kwargs)
    return resp.choices[0].message.content or ""  # content can be None

def hf_chat_json(prompt: str, model: str, temperature: float, max_tokens: int = 512) -> tuple[dict, str]:
    """
    Ask for JSON mode; if the provider ignores it, fall back to the outermost {...} slice.
    Returns (parsed dict or {}, raw text).
    """
    txt = hf_chat_once(prompt, model=model, temperature=temperature, max_tokens=max_tokens, json_mode=True)
    js = safe_json_parse(txt)  # full parse first, then brace-slice fallback (helper defined earlier)
    return (js if isinstance(js, dict) else {}), txt

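`hf_chat_json` assumes the provider either honors `response_format` or silently ignores it; a provider that rejects the parameter with an error would make the call raise instead. The sketch below (hypothetical `chat_json_with_fallback`) retries once without JSON mode in that case. `send(json_mode)` stands in for a call such as `hf_chat_once(prompt, model=MODEL_ID, ..., json_mode=json_mode)` so the example runs offline.

```python
import json
from typing import Any, Callable, Dict, Tuple

def chat_json_with_fallback(send: Callable[[bool], str]) -> Tuple[Dict[str, Any], str]:
    """send(json_mode) returns the raw model text. Try JSON mode first; if the
    call raises (e.g. the provider rejects response_format), retry once without it.
    Parse the whole reply, then fall back to the outermost {...} slice."""
    try:
        txt = send(True)
    except Exception:
        txt = send(False)
    txt = txt or ""
    for candidate in (txt, txt[txt.find("{"): txt.rfind("}") + 1]):
        try:
            obj = json.loads(candidate)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue
        if isinstance(obj, dict):
            return obj, txt
    return {}, txt

def fake_send(json_mode: bool) -> str:
    if json_mode:
        raise RuntimeError("response_format not supported")  # stand-in provider error
    return 'Sure! {"ca": 8, "exp": 7}'

js, raw = chat_json_with_fallback(fake_send)
print(js)  # {'ca': 8, 'exp': 7}
```

Catching every exception keeps the sketch short; with the `openai` client, `openai.BadRequestError` is likely the narrower choice for a rejected parameter.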

## Generate Synthetic Interviews



In [None]:
random.seed(123)

role = CONFIG["role_title"]
questions = CONFIG["question_set"]
N = CONFIG["num_candidates"]

interviews = []
for i in range(N):
    persona = {
        "candidate_id": f"cand_{i+1:04d}",
        "persona_title": random.choice(["Veteran field tech","Career switcher","Recent grad","Retail service rep","HVAC junior"]),
        "yrs_experience": random.choice([0,1,2,3,5,7,10]),
        "domain_keywords": random.choice([
            "preventive maintenance, HVAC, route planning",
            "customer empathy, troubleshooting basics",
            "inventory, parts ordering, safety protocols",
            "ticket triage, escalation, SLA awareness",
        ]),
        "reliability_flags": random.choice(["has_car; weekend_ok","public_transit","night_shift_ok"]),
        "notes": random.choice(["calm under pressure","fast learner","detail-oriented"]),
    }
    prompt = build_generation_prompt(role, questions, persona)
    out = hf_chat_once(prompt, model=MODEL_ID, temperature=CONFIG["generation_temperature"], max_tokens=CONFIG["generation_max_tokens"])
    interviews.append({
        "interview_id": f"intv_{i+1:04d}",
        "candidate_id": persona["candidate_id"],
        "role_title": role,
        "question_set_id": CONFIG["question_set_id"],
        "num_questions": len(questions),
        "qa_text": canonicalize_qa_text(out),
        "source": "synthetic",
        "gen_model": MODEL_ID,
        "gen_prompt_version": CONFIG["gen_prompt_version"],
        "gen_temperature": CONFIG["generation_temperature"],
        "gen_top_p": 1.0,
        "gen_seed": 123,  # persona-sampling seed only; the LLM call itself is not seeded
        "created_at": pd.Timestamp.now(tz="UTC").isoformat(),
    })

interviews_df = pd.DataFrame(interviews)
display(interviews_df.head(2))


|   | interview_id | candidate_id | role_title | question_set_id | num_questions | qa_text | source | gen_model | gen_prompt_version | gen_temperature | gen_top_p | gen_seed | created_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | intv_0001 | cand_0001 | Field Technician | qs_v1 | 8 | Question 1: Tell me about a time you handled a... | synthetic | meta-llama/Llama-3.2-3B-Instruct | gen_v1 | 0.8 | 1.0 | 123 | 2025-10-09T23:35:59.950431+00:00 |
| 1 | intv_0002 | cand_0002 | Field Technician | qs_v1 | 8 | Question 1: Tell me about a time you handled a... | synthetic | meta-llama/Llama-3.2-3B-Instruct | gen_v1 | 0.8 | 1.0 | 123 | 2025-10-09T23:36:06.946492+00:00 |


In [None]:
interviews_df['qa_text'][0]

"Question 1: Tell me about a time you handled an urgent service call. What steps did you take?\n\nAnswer: In my previous role, I handled an urgent service call on a public transit line during rush hour. The line was experiencing a major breakdown, and I had to assess the issue and provide a rapid solution to minimize disruptions. I took the initiative to inspect the system, diagnose the problem, and implement a temporary fix, ensuring the line could operate safely and efficiently. My priority was to restore service as quickly as possible while maintaining a high level of customer satisfaction.\n\nQuestion 2: How do you plan your route and prioritize jobs when schedules change during the day?\n\nAnswer: As a seasoned field technician, I use a combination of route optimization software and my own experience to plan efficient routes. I also rely on real-time data from the system to adjust my schedule and prioritize jobs based on urgency and location. This enables me to make the most of my

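Generated interviews can skip a question or run out of `generation_max_tokens` before the last answer. Because `build_generation_prompt` asks for a `Question N:` / `Answer:` layout, a regex can split each `qa_text` into pairs and report unanswered question numbers. A minimal sketch with hypothetical helper names, assuming the model kept that layout:

```python
import re
from typing import List, Tuple

# "Question N: <question> ... Answer: <answer>" up to the next "Question M:" or end of text
QA_RE = re.compile(
    r"Question\s+(\d+):\s*(.*?)\s*Answer:\s*(.*?)(?=\s*Question\s+\d+:|\Z)",
    re.DOTALL,
)

def parse_qa_pairs(qa_text: str) -> List[Tuple[int, str, str]]:
    """Split a generated interview into (question_number, question, answer) tuples."""
    return [(int(n), q.strip(), a.strip()) for n, q, a in QA_RE.findall(qa_text or "")]

def missing_questions(qa_text: str, num_questions: int) -> List[int]:
    """Question numbers 1..num_questions that have no non-empty answer."""
    answered = {n for n, _, a in parse_qa_pairs(qa_text) if a}
    return [i for i in range(1, num_questions + 1) if i not in answered]

sample = ("Question 1: Tell me about an urgent call.\n\nAnswer: I triaged it first.\n\n"
          "Question 2: How do you plan your route?\n\nAnswer: I use routing software.")
print(parse_qa_pairs(sample)[1])     # (2, 'How do you plan your route?', 'I use routing software.')
print(missing_questions(sample, 8))  # [3, 4, 5, 6, 7, 8]
```

For example, `interviews_df["qa_text"].map(lambda t: missing_questions(t, len(questions)))` would list the gaps per interview.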
In [None]:
interviews_df.shape[0]

50

In [12]:
from google.colab import drive
drive.mount('/content/drive')  # follow the auth prompt

Mounted at /content/drive


In [None]:
import os
from datetime import datetime, timezone

# Choose a folder in your Drive
save_dir = "/content/drive/MyDrive/mvp"
os.makedirs(save_dir, exist_ok=True)
# Timezone-aware UTC timestamp (datetime.utcnow() is deprecated as of Python 3.12)
ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
csv_path = f"{save_dir}/synthInterviews{ts}.csv"


In [None]:
interviews_df.to_csv(csv_path, index=False)
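
The four save cells repeat the same folder-and-timestamp boilerplate with a different file prefix. A small helper could build the path once; this sketch uses a timezone-aware UTC clock, since `datetime.utcnow()` is deprecated as of Python 3.12. `timestamped_csv_path` is a hypothetical name.

```python
import os
from datetime import datetime, timezone
from typing import Optional

def timestamped_csv_path(save_dir: str, prefix: str, now: Optional[datetime] = None) -> str:
    """Return '<save_dir>/<prefix><YYYYmmdd-HHMMSS>.csv', creating save_dir if needed.

    Uses the current UTC time unless `now` is passed (handy for reproducible names).
    """
    now = now or datetime.now(timezone.utc)
    os.makedirs(save_dir, exist_ok=True)
    return os.path.join(save_dir, f"{prefix}{now.strftime('%Y%m%d-%H%M%S')}.csv")

print(timestamped_csv_path(".", "aggScores", datetime(2025, 10, 18, 21, 27, 23, tzinfo=timezone.utc)))
# ./aggScores20251018-212723.csv on POSIX
```

Each save cell could then reduce to `interviews_df.to_csv(timestamped_csv_path(save_dir, "synthInterviews"), index=False)`.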

In [13]:
interviews_df = pd.read_csv('/content/drive/MyDrive/mvp/synthInterviews20251009-234435.csv')

## K Scoring Runs (Self-Consistency @ 0.7)

In [26]:
K = CONFIG["k_samples"]
records = interviews_df[["interview_id","qa_text"]].to_dict("records")

samples = []
for iv in records:
    qa = iv["qa_text"] if isinstance(iv["qa_text"], str) else ""  # NaN from read_csv is truthy, so `or ""` would not catch it
    sprompt = build_scoring_prompt(qa)
    for k in range(K):
        t0 = time.time()
        js, raw = hf_chat_json(sprompt, model=MODEL_ID, temperature=0.7, max_tokens=256)
        latency = int((time.time() - t0) * 1000)
        row = {
            "interview_id": iv["interview_id"],
            "run_idx": k,
            "model_name": MODEL_ID,
            "prompt_version": CONFIG["score_prompt_version"],
            "temperature": 0.7,
            "json_valid": all(key in js for key in METRICS),
            "latency_ms": latency,
        }
        for m in METRICS:
            row[m] = clamp_int(js.get(m, 5))  # missing keys are imputed as 5; see json_valid
        samples.append(row)

samples_df = pd.DataFrame(samples)
display(samples_df.head(3))


ValueError: Model mistralai/Mistral-7B-Instruct-v0.3 is not supported for task text-generation and provider together. Supported task: conversational.
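
The scoring loop sends N × K requests one after another (150 with the `CONFIG` above: 50 candidates × 3 samples). Since each call mostly waits on the network, a thread pool can overlap them. The sketch below is a hypothetical `score_all` that takes the scoring call as a function, so it runs here with a stub; the worker count of 4 is an arbitrary assumption, because provider rate limits are not known, and per-call latency timing is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

def score_all(items: List[Dict[str, str]], k: int,
              score_fn: Callable[[str], Dict[str, Any]],
              max_workers: int = 4) -> List[Dict[str, Any]]:
    """Run k independent scoring calls per interview on a thread pool.

    items: dicts with "interview_id" and "qa_text" (like `records` in the loop above).
    score_fn: qa_text -> parsed score dict (one sampled judgment).
    Rows come back in (interview, run) order because Executor.map preserves input order.
    """
    jobs = [(item["interview_id"], run_idx, item["qa_text"])
            for item in items for run_idx in range(k)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda job: score_fn(job[2]), jobs))
    return [{"interview_id": iid, "run_idx": run_idx, **scores}
            for (iid, run_idx, _), scores in zip(jobs, results)]

# Offline usage with a stub scorer in place of a real API call
def fake_score(qa_text: str) -> Dict[str, Any]:
    return {"ca": len(qa_text) % 10 + 1}

rows = score_all([{"interview_id": "intv_0001", "qa_text": "abc"}], k=3, score_fn=fake_score)
print(rows[0])  # {'interview_id': 'intv_0001', 'run_idx': 0, 'ca': 4}
```

In the notebook, `score_fn` could be `lambda qa: hf_chat_json(build_scoring_prompt(qa), model=MODEL_ID, temperature=0.7, max_tokens=256)[0]`.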

In [None]:
import os
from datetime import datetime, timezone

# Choose a folder in your Drive
save_dir = "/content/drive/MyDrive/mvp"
os.makedirs(save_dir, exist_ok=True)
# Timezone-aware UTC timestamp (datetime.utcnow() is deprecated as of Python 3.12)
ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
csv_path = f"{save_dir}/scoringdf{ts}.csv"


In [None]:
samples_df.to_csv(csv_path, index=False)

## Aggregate Medians

In [None]:
aggs = []
for intv_id, g in samples_df.groupby("interview_id"):
    row = {"interview_id": intv_id}
    latencies = g["latency_ms"].tolist()
    for m in METRICS:
        vals = [int(v) for v in g[m].tolist()]
        row[f"{m}_score_agg"] = clamp_int(stats.median(vals))
        row[f"{m}_confidence"] = iqr_confidence(vals)
    row["overall_weighted_agg"] = compute_overall_weighted({m:row[f"{m}_score_agg"] for m in METRICS})
    row["p95_latency_ms"] = float(np.percentile(latencies, 95)) if latencies else 0.0
    row["k_samples"] = K
    aggs.append(row)

aggregated_df = pd.DataFrame(aggs)
display(aggregated_df.head(3))


|   | interview_id | ca_score_agg | ca_confidence | exp_score_agg | exp_confidence | ps_score_agg | ps_confidence | rel_score_agg | rel_confidence | prof_score_agg | prof_confidence | comm_score_agg | comm_confidence | overall_weighted_agg | p95_latency_ms | k_samples |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | intv_0001 | 9 | 1.0 | 8 | 1.0 | 8 | 1.0 | 9 | 0.625 | 8 | 0.875 | 9 | 0.875 | 8.45 | 1097.0 | 3 |
| 1 | intv_0002 | 9 | 0.875 | 8 | 0.875 | 8 | 0.875 | 9 | 0.75 | 8 | 0.75 | 9 | 0.625 | 8.45 | 1181.0 | 3 |
| 2 | intv_0003 | 9 | 1.0 | 8 | 0.875 | 8 | 0.75 | 9 | 0.875 | 9 | 0.875 | 8 | 0.5 | 8.45 | 1094.3 | 3 |

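`json_valid` is recorded per run but not used when aggregating, so runs whose JSON failed still contribute their imputed 5s to the median. A sketch of a valid-only median, using a hypothetical `median_of_valid` and made-up runs for one metric:

```python
import statistics
from typing import Any, Dict, List, Optional

def median_of_valid(runs: List[Dict[str, Any]], metric: str) -> Optional[float]:
    """Median of one metric over runs flagged json_valid=True.

    Returns None when no run is valid, so the interview can be flagged for
    re-scoring instead of receiving imputed 5s.
    """
    vals = [r[metric] for r in runs if r.get("json_valid")]
    return statistics.median(vals) if vals else None

# One interview, K=3: only the first run returned usable JSON (made-up values).
runs = [
    {"run_idx": 0, "json_valid": True, "ca": 9},
    {"run_idx": 1, "json_valid": False, "ca": 5},  # 5 came from js.get("ca", 5)
    {"run_idx": 2, "json_valid": False, "ca": 5},
]
print(statistics.median(r["ca"] for r in runs))  # 5, the imputed values win
print(median_of_valid(runs, "ca"))               # 9
```

With an even number of valid runs the median can land on a half point (8.5); `clamp_int` would then round it half-to-even.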

In [None]:
import os
import pandas as pd
from datetime import datetime

# Choose a folder in your Drive
save_dir = "/content/drive/MyDrive/mvp"
os.makedirs(save_dir, exist_ok=True)
ts = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
csv_path = f"{save_dir}/aggScores{ts}.csv"

  ts = datetime.utcnow().strftime("%Y%m%d-%H%M%S")


In [None]:
aggregated_df.to_csv(csv_path, index=False)

In [14]:
aggregated_df = pd.read_csv('/content/drive/MyDrive/mvp/aggScores20251018-212723.csv')

In [15]:
samples_df = pd.read_csv('/content/drive/MyDrive/mvp/scoringdf20251018-212633.csv')

## Deterministic Rewrite

In [16]:
# Ensure qa_map resolves and no NaNs
assert "interview_id" in interviews_df.columns and "qa_text" in interviews_df.columns
assert aggregated_df["interview_id"].isin(interviews_df["interview_id"]).all(), "Mismatched IDs"
assert interviews_df["qa_text"].notna().all(), "Some qa_text are NaN"


In [17]:
qa_map = dict(zip(interviews_df["interview_id"], interviews_df["qa_text"]))

def rewrite_once_chat(qa_text: str, scores_locked: Dict[str,int], model: str,
                      max_tokens: int = 1200) -> tuple[dict, str]:
    prompt = (
        "Respond with exactly ONE JSON object, no code fences, no prose, no markdown. "
        "Do not include any keys that are not requested.\n\n"
        + build_rewrite_prompt_locked(qa_text, scores_locked)
    )
    # temperature=0.0 (greedy decoding) keeps the wording stable, though hosted backends
    # may still vary slightly; the large max_tokens avoids truncated JSON.
    return hf_chat_json(prompt, model=model, temperature=0.0, max_tokens=max_tokens)

def rewrite_with_retry_chat(qa_text: str, scores_locked: Dict[str,int], model: str, retries: int = 2):
    needed = [
        "cognitive_ability_justification", "experience_justification",
        "problem_solving_justification", "reliability_justification",
        "professionalism_justification", "communication_justification",
        "general_strengths", "general_weaknesses", "general_summary"
    ]
    last_raw = ""
    for _ in range(retries + 1):
        js, raw = rewrite_once_chat(qa_text, scores_locked, model=model, max_tokens=1200)
        last_raw = raw
        if js and all(k in js for k in needed):
            return js, raw
    return {}, last_raw

final_rows, bad = [], []
for _, r in aggregated_df.iterrows():
    intv_id = r["interview_id"]
    qa_text = (qa_map.get(intv_id) or "").strip()
    if not qa_text:
        bad.append({"interview_id": intv_id, "reason": "empty qa_text"}); continue

    scores_locked = {m: int(r[f"{m}_score_agg"]) for m in METRICS}

    t0 = time.time()
    js, raw = rewrite_with_retry_chat(qa_text, scores_locked, model=MODEL_ID, retries=2)
    latency = int((time.time() - t0) * 1000)

    if not js:
        bad.append({"interview_id": intv_id, "reason": "invalid JSON", "raw": (raw or "")[:600]})
        continue

    final_rows.append({
        "interview_id": intv_id,
        "cognitive_ability_score": scores_locked["ca"],
        "experience_score": scores_locked["exp"],
        "problem_solving_score": scores_locked["ps"],
        "reliability_score": scores_locked["rel"],
        "professionalism_score": scores_locked["prof"],
        "communication_score": scores_locked["comm"],
        "cognitive_ability_justification": js.get("cognitive_ability_justification",""),
        "experience_justification": js.get("experience_justification",""),
        "problem_solving_justification": js.get("problem_solving_justification",""),
        "reliability_justification": js.get("reliability_justification",""),
        "professionalism_justification": js.get("professionalism_justification",""),
        "communication_justification": js.get("communication_justification",""),
        "general_strengths": js.get("general_strengths",""),
        "general_weaknesses": js.get("general_weaknesses",""),
        "general_summary": js.get("general_summary",""),
        "final_model_name": MODEL_ID,
        "final_prompt_version": CONFIG["rewrite_prompt_version"],
        "final_temperature": 0.0,
        "final_latency_ms": latency,
        "created_at": pd.Timestamp.now(tz="UTC").isoformat(),  # Timestamp.utcnow() is deprecated in pandas 2.2
    })

final_df = pd.DataFrame(final_rows)
display(final_df.head(3))
print(f"Final rows: {len(final_df)} | Invalid rewrites: {len(bad)}")
if bad:
    print("\nExample invalid:", bad[0])


|   | interview_id | cognitive_ability_score | experience_score | problem_solving_score | reliability_score | professionalism_score | communication_score | cognitive_ability_justification | experience_justification | problem_solving_justification | ... | professionalism_justification | communication_justification | general_strengths | general_weaknesses | general_summary | final_model_name | final_prompt_version | final_temperature | final_latency_ms | created_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | intv_0001 | 9 | 8 | 8 | 9 | 8 | 9 | The candidate demonstrated exceptional cogniti... | The candidate has significant experience in th... | The candidate demonstrated effective problem-s... | ... | The candidate consistently demonstrated a prof... | The candidate was an effective communicator, u... | - Exceptional cognitive abilities, with the ab... | - Limited information provided about their abi... | The candidate demonstrated exceptional cogniti... | meta-llama/Llama-3.1-8B-Instruct:novita | rewrite_v1 | 0.0 | 9573 | 2025-10-24T00:02:19.186352+00:00 |
| 1 | intv_0002 | 9 | 8 | 8 | 9 | 8 | 9 | The candidate demonstrated exceptional cogniti... | The candidate has extensive experience in the ... | The candidate demonstrated strong problem-solv... | ... | The candidate consistently displayed a profess... | The candidate consistently demonstrated excell... | - Strong critical thinking and problem-solving... | - Could benefit from refining their communicat... | The candidate demonstrated exceptional cogniti... | meta-llama/Llama-3.1-8B-Instruct:novita | rewrite_v1 | 0.0 | 8013 | 2025-10-24T00:02:27.200074+00:00 |
| 2 | intv_0003 | 9 | 8 | 8 | 9 | 9 | 8 | The candidate demonstrates strong cognitive ab... | The candidate has relevant experience in the f... | The candidate demonstrates strong problem-solv... | ... | The candidate consistently demonstrates profes... | The candidate demonstrates effective communica... | - Strong problem-solving skills - Effective co... | - May lack depth in certain areas of experienc... | The candidate demonstrates strong cognitive ab... | meta-llama/Llama-3.1-8B-Instruct:novita | rewrite_v1 | 0.0 | 6297 | 2025-10-24T00:02:33.497558+00:00 |


Final rows: 50 | Invalid rewrites: 0
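
The rewrite stage always writes `scores_locked` into `final_df`, so if the model ignored the lock and echoed a different score, the justification text could describe a score that is not the one stored. A hypothetical check comparing the echoed `*_score` keys with the locked values:

```python
from typing import Any, Dict, List

# Short metric keys (as in scores_locked) -> score keys in the rewrite JSON template
LONG_KEYS = {
    "ca": "cognitive_ability_score", "exp": "experience_score",
    "ps": "problem_solving_score", "rel": "reliability_score",
    "prof": "professionalism_score", "comm": "communication_score",
}

def changed_scores(js: Dict[str, Any], scores_locked: Dict[str, int]) -> List[str]:
    """Return the score keys the model omitted or changed; [] means every lock held.

    Numbers compare by value (9 and 9.0 both match 9); strings such as "9"
    and booleans are reported as changed.
    """
    changed = []
    for short, long_key in LONG_KEYS.items():
        value = js.get(long_key)
        if isinstance(value, bool) or value != scores_locked[short]:
            changed.append(long_key)
    return changed

locked = {"ca": 9, "exp": 8, "ps": 8, "rel": 9, "prof": 8, "comm": 9}
echo = {LONG_KEYS[m]: s for m, s in locked.items()}
print(changed_scores(echo, locked))                             # []
print(changed_scores({**echo, "experience_score": 7}, locked))  # ['experience_score']
```

Inside the rewrite loop, a non-empty result could be appended to `bad` the same way invalid JSON is.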


In [18]:
import os
import pandas as pd
from datetime import datetime

# Choose a folder in your Drive
save_dir = "/content/drive/MyDrive/mvp"
os.makedirs(save_dir, exist_ok=True)
ts = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
csv_path = f"{save_dir}/finalScores{ts}.csv"

  ts = datetime.utcnow().strftime("%Y%m%d-%H%M%S")


In [19]:
final_df.to_csv(csv_path, index=False)

## Save All Results

In [None]:
interviews_df.to_csv("synth_interviews.csv", index=False)
samples_df.to_csv("synth_k_samples.csv", index=False)
aggregated_df.to_csv("synth_aggregated.csv", index=False)
final_df.to_csv("synth_final_outputs.csv", index=False)

print("Saved CSVs:",
      "synth_interviews.csv", "synth_k_samples.csv", "synth_aggregated.csv", "synth_final_outputs.csv",
      sep="\n- ")
