<a href="https://colab.research.google.com/github/ananya1331/SER-notebooks/blob/main/Call_Score_Demo_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Call Score : evaluating an agent's performance against predefined criteria

1. How well the interviewer followed the process
2. How effectively the conversation progressed
3. How the candidate responded

What to measure?
- Talk Ratio
- Response quality/Candidate responsiveness (Latency, Duration)
- Structure Adherence

Data needed:
- Audio
- Diarization Output
- Timestamped Transcription
- Sentiment Output

In [2]:
import json
from collections import defaultdict

In [3]:
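def duration(seg):
    """Length of a segment in seconds."""
    return seg["end"] - seg["start"]

def is_question(text):
    """Heuristic: a turn counts as a question if it ends with '?' or opens with a question phrase."""
    text = text.lower().strip()
    return (
        text.endswith("?")
        or text.startswith(("what", "why", "how", "can you", "could you"))
    )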
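
The `is_question` heuristic is a plain prefix and suffix check, so it helps to see where it succeeds and where it misfires. The sketch below repeats the function so it runs on its own; the sample sentences are made up for illustration.

```python
def is_question(text):
    text = text.lower().strip()
    return (
        text.endswith("?")
        or text.startswith(("what", "why", "how", "can you", "could you"))
    )

samples = [
    "Can you introduce yourself?",       # detected: ends with "?"
    "How did you approach testing",      # detected: starts with "how"
    "Tell me about your last project.",  # missed: a prompt without a question form
    "Whatever works for you.",           # false positive: starts with "what"
]
for s in samples:
    print(is_question(s), "|", s)
```

Because interview prompts such as "Tell me about..." are missed, any score built on this helper only covers questions that are phrased as questions.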

Talk Ratio Score : Measures conversation balance

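Candidate Talk Ratio : R = Tc / T, where Tc is the candidate's total speaking time and T is the combined speaking time of the candidate and the interviewer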

Scoring Function :

$$
\text{Talk Ratio Score} =
\begin{cases}
100 & \text{if } 0.55 \le R \le 0.70 \\
70 & \text{if } 0.40 \le R < 0.55 \text{ or } 0.70 < R \le 0.85 \\
30 & \text{otherwise}
\end{cases}
$$

In [4]:
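def talk_ratio_score(diarization):
    """Score conversation balance from the candidate's share of speaking time."""
    times = defaultdict(float)

    # Total speaking time per speaker label
    for seg in diarization:
        times[seg["speaker"]] += duration(seg)

    interviewer = times.get("INTERVIEWER", 0)
    candidate = times.get("CANDIDATE", 0)
    total = interviewer + candidate  # silence and any other labels are excluded

    # No speech from either speaker: nothing to score
    if total == 0:
        return 0

    ratio = candidate / total

    if 0.55 <= ratio <= 0.70:
        return 100
    elif ratio < 0.40 or ratio > 0.85:
        return 30
    else:
        return 70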
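
To check the band edges of the talk ratio scoring function, the thresholds can be applied directly to a few ratios near each boundary. This is a minimal sketch; `band_score` is a hypothetical helper that takes the ratio itself rather than diarization segments.

```python
def band_score(ratio):
    """Map a candidate talk ratio R to its band score."""
    if 0.55 <= ratio <= 0.70:
        return 100
    if 0.40 <= ratio < 0.55 or 0.70 < ratio <= 0.85:
        return 70
    return 30

for r in (0.39, 0.40, 0.55, 0.70, 0.85, 0.86):
    print(r, band_score(r))
# 0.39 -> 30, 0.40 -> 70, 0.55 -> 100, 0.70 -> 100, 0.85 -> 70, 0.86 -> 30
```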

Candidate Responsiveness : Reply latency + answer duration (a rough proxy for answer quality)

In [5]:
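def candidate_responsiveness_score(transcript):
    """Average per-question score built from reply latency and answer duration."""
    scores = []

    # Consider each interviewer question that is directly followed by a candidate turn
    for i in range(len(transcript) - 1):
        curr = transcript[i]
        nxt = transcript[i + 1]

        if curr["speaker"] == "INTERVIEWER" and is_question(curr["text"]):
            if nxt["speaker"] == "CANDIDATE":
                # Negative when the turns overlap; that still falls in the <= 2 s band
                latency = nxt["start"] - curr["end"]
                answer_len = nxt["end"] - nxt["start"]

                # Bands in seconds: reply within 2 / within 5 / slower; answers of 10 to 60 s score full marks
                latency_score = 100 if latency <= 2 else 60 if latency <= 5 else 30
                duration_score = 100 if 10 <= answer_len <= 60 else 50

                scores.append((latency_score + duration_score) / 2)

    # No question/answer pairs found: fall back to a low default score
    if not scores:
        return 20

    return int(sum(scores) / len(scores))  # int() truncates rather than rounds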
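
The per-question scoring can be tried in isolation to see how the latency and duration bands combine. This is a minimal sketch with the same thresholds; `score_answer` is a hypothetical helper and the input values are made up.

```python
def score_answer(latency, answer_len):
    """Score one question/answer pair; both arguments are in seconds."""
    latency_score = 100 if latency <= 2 else 60 if latency <= 5 else 30
    duration_score = 100 if 10 <= answer_len <= 60 else 50
    return (latency_score + duration_score) / 2

print(score_answer(0.2, 34.8))   # 100.0: quick reply, answer within 10 to 60 s
print(score_answer(3.0, 5.0))    # 55.0: slower reply (60), very short answer (50)
print(score_answer(6.0, 70.0))   # 40.0: slow reply (30), overlong answer (50)
print(score_answer(-0.5, 20.0))  # 100.0: overlapping speech gives a negative latency
```

The last case shows that a candidate who interrupts the interviewer gets full latency marks. Whether that is desirable depends on how interruptions should be treated.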

Structure Adherence : Did the interviewer follow a sane flow?

In [6]:
def structure_adherence_score(transcript):
    """Percentage of expected interview stages detected by keyword matching."""
    stages = {
        "intro": False,
        "experience": False,
        "problem": False,
        "closing": False
    }

    # Every turn is scanned, so a keyword spoken by the candidate also marks a stage
    for turn in transcript:
        text = turn["text"].lower()

        if "introduce" in text or "background" in text:
            stages["intro"] = True
        if "experience" in text or "worked on" in text:
            stages["experience"] = True
        if "challenge" in text or "problem" in text:
            stages["problem"] = True
        if "questions for me" in text or "next steps" in text:
            stages["closing"] = True

    # Share of stages seen, as a 0-100 score
    return int((sum(stages.values()) / len(stages)) * 100)
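
Since the score is meant to reflect the interviewer's flow, one possible variant counts a stage only when the keyword appears in an interviewer turn. This sketch is an alternative, not the notebook's method. `STAGE_KEYWORDS`, `interviewer_structure_score`, and the `demo` transcript are hypothetical names and data.

```python
# Same keywords as the stage checks, grouped per stage
STAGE_KEYWORDS = {
    "intro": ("introduce", "background"),
    "experience": ("experience", "worked on"),
    "problem": ("challenge", "problem"),
    "closing": ("questions for me", "next steps"),
}

def interviewer_structure_score(transcript):
    """Variant: only interviewer turns can mark a stage as covered."""
    covered = set()
    for turn in transcript:
        if turn["speaker"] != "INTERVIEWER":
            continue  # ignore keywords spoken by the candidate
        text = turn["text"].lower()
        for stage, keywords in STAGE_KEYWORDS.items():
            if any(k in text for k in keywords):
                covered.add(stage)
    return int(len(covered) / len(STAGE_KEYWORDS) * 100)

demo = [
    {"speaker": "INTERVIEWER", "text": "Can you introduce yourself?"},
    {"speaker": "CANDIDATE", "text": "I have three years of experience."},
]
print(interviewer_structure_score(demo))  # 25: only "intro" is counted
```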

Final Call Score = 0.40⋅TalkRatioScore + 0.35⋅CandidateResponsiveness + 0.25⋅StructureAdherence

In [7]:
def compute_call_score(diarization, transcript):
    """Return the three component scores plus their weighted final score."""
    scores = {
        "talk_ratio": talk_ratio_score(diarization),
        "candidate_responsiveness": candidate_responsiveness_score(transcript),
        "structure_adherence": structure_adherence_score(transcript),
    }

    # Weights sum to 1.0, so the final score stays on the 0-100 scale
    final_score = round(
        0.4 * scores["talk_ratio"] +
        0.35 * scores["candidate_responsiveness"] +
        0.25 * scores["structure_adherence"]
    )

    scores["final_call_score"] = final_score
    return scores

Example run with mock data

In [8]:
diarization = [
    {"speaker": "INTERVIEWER", "start": 0.0, "end": 10.0},
    {"speaker": "CANDIDATE", "start": 10.2, "end": 45.0},
    {"speaker": "INTERVIEWER", "start": 45.2, "end": 60.0},
]

transcript = [
    {"speaker": "INTERVIEWER", "start": 0.0, "end": 10.0, "text": "Can you introduce yourself?"},
    {"speaker": "CANDIDATE", "start": 10.2, "end": 45.0, "text": "I have three years of experience working on backend systems."},
    {"speaker": "INTERVIEWER", "start": 45.2, "end": 60.0, "text": "What was the most challenging problem you solved?"}
]

print(json.dumps(compute_call_score(diarization, transcript), indent=2))

{
  "talk_ratio": 100,
  "candidate_responsiveness": 100,
  "structure_adherence": 75,
  "final_call_score": 94
}
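
The final score in this output can be checked by hand from the three component scores and the weights in the formula: 0.40·100 + 0.35·100 + 0.25·75 = 93.75, which rounds to 94. Below is a minimal sketch of that arithmetic. The `weights` dictionary is an illustrative name and not part of the notebook.

```python
weights = {"talk_ratio": 0.40, "candidate_responsiveness": 0.35, "structure_adherence": 0.25}
scores = {"talk_ratio": 100, "candidate_responsiveness": 100, "structure_adherence": 75}

# Weighted sum of the component scores
weighted = sum(weights[k] * scores[k] for k in weights)
print(weighted)         # 93.75
print(round(weighted))  # 94
```

Python's built-in `round` rounds exact halves to the nearest even integer, so a weighted sum of exactly 87.5 gives 88 while 86.5 gives 86.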
