# Scoring Simulator

This notebook allows you to simulate scoring for a conversation by providing:
- `conversation_id`: The UUID of the conversation to score
- `user_id`: The UUID of the user (for verification)

It will compute and display:
1. **Conversation Scores**: Fillerwords, Clarity, Participation, Key Themes, Index of Questions, Rhythm, and Objective
2. **Profile Scores**: Prospection, Empathy, Technical Domain, Negotiation, and Resilience

## Imports

In [1]:
# Import necessary libraries and services
import asyncio
import sys
import os
import time
import numpy as np
import pandas as pd
from pathlib import Path
from uuid import UUID
from dotenv import load_dotenv
from IPython.display import display, HTML
from collections import defaultdict

# Add parent directory to path to import app modules
project_root = Path().resolve().parent
sys.path.insert(0, str(project_root))

# Load environment variables
load_dotenv()

# Import services
from app.services.messages_service import get_conversation_transcript
from app.services.conversations_service import get_conversation_details
from app.services.db import execute_query_one
from scoring_scripts.get_conver_scores import get_conver_scores
from scoring_scripts.get_conver_skills import get_conver_skills

print("✅ Imports successful!")

✅ Imports successful!


## Auxiliary functions

In [2]:
async def get_conversation_with_stage(conversation_id: UUID):
    """Get conversation details including course_id and stage_id with all feedback"""
    query = """
    SELECT 
        c.conversation_id, 
        c.user_id,
        c.course_id, 
        c.stage_id,
        c.start_timestamp, 
        c.end_timestamp, 
        c.status,
        sbc.general_score,
        sbc.fillerwords_scoring,
        sbc.clarity_scoring,
        sbc.participation_scoring,
        sbc.keythemes_scoring,
        sbc.indexofquestions_scoring,
        sbc.rhythm_scoring,
        sbc.is_accomplished,
        sbc.fillerwords_feedback,
        sbc.clarity_feedback,
        sbc.participation_feedback,
        sbc.keythemes_feedback,
        sbc.indexofquestions_feedback,
        sbc.rhythm_feedback, 
        cs.stage_objectives, 
        cs.key_themes
    FROM conversaapp.conversations c
    LEFT JOIN conversaapp.scoring_by_conversation sbc ON c.conversation_id = sbc.conversation_id
    LEFT JOIN conversaconfig.course_stages cs ON c.course_id = cs.course_id AND c.stage_id = cs.stage_id
    WHERE c.conversation_id = $1
    """
    
    result = await execute_query_one(query, conversation_id)
    return dict(result) if result else None


In [3]:
async def fetch_conversation_details(conversation_id: UUID, user_id: str = None):
    """Fetch conversation details and return them"""
    conversation_details = await get_conversation_with_stage(conversation_id)

    if not conversation_details:
        print(f"❌ Conversation {conversation_id} not found!")
        raise ValueError("Conversation not found")

    # Extract required IDs
    course_id = conversation_details.get("course_id")
    stage_id = conversation_details.get("stage_id")
    conv_user_id = conversation_details.get("user_id")
    stage_objectives = conversation_details.get("stage_objectives")
    key_themes = conversation_details.get("key_themes")

    print("📋 Conversation Details:")
    print(f"   User ID: {conv_user_id}")
    print(f"   Course ID: {course_id}")
    print(f"   Stage ID: {stage_id}")
    print(f"   Status: {conversation_details.get('status')}")
    print(f"   Start: {conversation_details.get('start_timestamp')}")
    print(f"   End: {conversation_details.get('end_timestamp')}")
    print("-" * 100)
    print(f"   Key themes: {key_themes}")
    print("-" * 100)
    print(f"   Stage Objectives: {stage_objectives}")
    print("-" * 100)

    # Verify user_id if provided
    if user_id and str(conv_user_id) != user_id:
        print(f"⚠️  Warning: Provided user_id ({user_id}) doesn't match conversation user_id ({conv_user_id})")

    if not course_id or not stage_id:
        print("❌ Missing course_id or stage_id in conversation!")
        raise ValueError("Conversation missing required course_id or stage_id")
    
    return {
        'conversation_details': conversation_details,
        'course_id': course_id,
        'stage_id': stage_id,
        'user_id': conv_user_id,
        'stage_objectives': stage_objectives
    }
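
The user check above compares the two IDs as strings, so the same UUID written in a different letter case would trigger the warning. A minimal sketch of a more tolerant comparison follows; `same_user` is a hypothetical helper name, not part of the app's services.

```python
from typing import Optional, Union
from uuid import UUID


def same_user(conv_user_id: Union[UUID, str], provided: Optional[str]) -> bool:
    """Return True when no user_id was provided or both IDs denote the same UUID."""
    if not provided:
        return True  # nothing to verify
    try:
        # Parsing both sides makes the comparison independent of letter case
        return UUID(str(conv_user_id)) == UUID(provided)
    except ValueError:
        return False  # a malformed ID cannot match


print(same_user("AE1ED3B5-0EA0-4AD5-9565-FEC1E4EF1EE7",
                "ae1ed3b5-0ea0-4ad5-9565-fec1e4ef1ee7"))
```

Whether a malformed `user_id` should count as a mismatch or raise an error is a design choice; this sketch treats it as a mismatch.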

In [4]:
def display_transcript(transcript):
    if not transcript:
        print("❌ No transcript found for this conversation!")
    else:
        # Prepare data for table
        transcript_data = []
        for idx, turn in enumerate(transcript, 1):
            speaker = turn.get('speaker', 'Unknown')
            text = turn.get('text', '')
            duration = turn.get('duracion', 'N/A')
            
            # Format speaker name
            speaker_display = "👤 Vendedor" if speaker == "vendedor" else "🤖 Cliente"
            
            # Count words
            word_count = len(text.split()) if text else 0
            
            transcript_data.append({
                '#': idx,
                'Speaker': speaker_display,
                'Text': text,
                'Words': word_count,
                'Duration (s)': duration if isinstance(duration, (int, float)) else 'N/A'
            })
        
        # Create DataFrame
        df_transcript = pd.DataFrame(transcript_data)
        
        # Calculate summary statistics
        total_turns = len(transcript_data)
        vendedor_turns = sum(1 for t in transcript_data if 'Vendedor' in t['Speaker'])
        cliente_turns = sum(1 for t in transcript_data if 'Cliente' in t['Speaker'])
        total_words = sum(t['Words'] for t in transcript_data)
        vendedor_words = sum(t['Words'] for t in transcript_data if 'Vendedor' in t['Speaker'])
        cliente_words = sum(t['Words'] for t in transcript_data if 'Cliente' in t['Speaker'])
        
        # Display summary
        print("=" * 100)
        print("📝 CONVERSATION TRANSCRIPT")
        print("=" * 100)
        print("\n📊 Summary:")
        print(f"   Total Turns: {total_turns}")
        print(f"   Vendedor Turns: {vendedor_turns} ({vendedor_turns/total_turns*100:.1f}%)")
        print(f"   Cliente Turns: {cliente_turns} ({cliente_turns/total_turns*100:.1f}%)")
        print(f"   Total Words: {total_words}")
        # Guard against a transcript whose turns contain no words
        word_base = max(total_words, 1)
        print(f"   Vendedor Words: {vendedor_words} ({vendedor_words/word_base*100:.1f}%)")
        print(f"   Cliente Words: {cliente_words} ({cliente_words/word_base*100:.1f}%)")
        print()
        
        # Display transcript table with HTML for better formatting
        display(HTML(df_transcript.to_html(
            index=False, 
            escape=False, 
            classes='table table-striped table-hover',
            table_id='transcript_table'
        )))
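
The word-share figures in the summary can also be computed by a small pure function, which is easier to check than print statements. This is a sketch under assumptions: `speaker_shares` is a hypothetical name, and the `"cliente"` speaker value in the sample is assumed (the notebook only tests for `"vendedor"`).

```python
from typing import Dict, List


def speaker_shares(transcript: List[dict]) -> Dict[str, float]:
    """Return each speaker's share of words as a percentage (0.0 when there are no words)."""
    words: Dict[str, int] = {}
    for turn in transcript:
        speaker = turn.get("speaker", "unknown")
        text = turn.get("text") or ""  # tolerate missing or None text
        words[speaker] = words.get(speaker, 0) + len(text.split())
    total = sum(words.values())
    return {s: (round(100 * n / total, 1) if total else 0.0) for s, n in words.items()}


# Sample turns; the "cliente" label is an assumption about the transcript format
sample = [
    {"speaker": "cliente", "text": "¡Hola!, ¿qué tal?"},
    {"speaker": "vendedor", "text": "Buenos días."},
]
print(speaker_shares(sample))
```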

In [5]:
async def show_conver_scores(transcript, course_id, stage_id):
    """Compute and display conversation scores"""
    print("🔄 Computing conversation scores...")
    print("=" * 60)

    # Compute scores directly using the transcript passed as parameter
    scoring_results = await get_conver_scores(transcript, course_id, stage_id)

    # Extract scores and feedback from the results
    conversation_scores_direct = {
        'general_score': scoring_results.get('puntuacion_global', 'N/A'),
        'fillerwords_scoring': scoring_results.get('detalle', {}).get('muletillas_pausas', 'N/A'),
        'clarity_scoring': scoring_results.get('detalle', {}).get('claridad', 'N/A'),
        'participation_scoring': scoring_results.get('detalle', {}).get('participacion', 'N/A'),
        'keythemes_scoring': scoring_results.get('detalle', {}).get('cobertura', 'N/A'),
        'indexofquestions_scoring': scoring_results.get('detalle', {}).get('preguntas', 'N/A'),
        'rhythm_scoring': scoring_results.get('detalle', {}).get('ppm', 'N/A'),
        'is_accomplished': scoring_results.get('objetivo', {}),
        'fillerwords_feedback': scoring_results.get('feedback', {}).get('muletillas_pausas', 'N/A'),
        'clarity_feedback': scoring_results.get('feedback', {}).get('claridad', 'N/A'),
        'participation_feedback': scoring_results.get('feedback', {}).get('participacion', 'N/A'),
        'keythemes_feedback': scoring_results.get('feedback', {}).get('cobertura', 'N/A'),
        'indexofquestions_feedback': scoring_results.get('feedback', {}).get('preguntas', 'N/A'),
        'rhythm_feedback': scoring_results.get('feedback', {}).get('ppm', 'N/A'),
        'objetivo': scoring_results.get('objetivo', {}),
        'objetivo_feedback': scoring_results.get('feedback', {}).get('objetivo', 'N/A'),
    }

    # Store in global variable for use in other cells
    global updated_scores
    updated_scores = conversation_scores_direct

    print("\n" + "=" * 60)
    print("📊 CONVERSATION SCORES SUMMARY (COMPUTED)")
    print("=" * 60)
    print(f"General Score: {updated_scores.get('general_score', 'N/A')}")
    print(f"Fillerwords: {updated_scores.get('fillerwords_scoring', 'N/A')}")
    print(f"Clarity: {updated_scores.get('clarity_scoring', 'N/A')}")
    print(f"Participation: {updated_scores.get('participation_scoring', 'N/A')}")
    print(f"Key Themes: {updated_scores.get('keythemes_scoring', 'N/A')}")
    print(f"Index of Questions: {updated_scores.get('indexofquestions_scoring', 'N/A')}")
    print(f"Rhythm: {updated_scores.get('rhythm_scoring', 'N/A')}")
    print(f"Objective Accomplished: {updated_scores.get('is_accomplished', 'N/A')}")

    # str() keeps the slice safe if a feedback value is None or non-string
    print("\n📝 FEEDBACK:")
    print(f"Fillerwords: {str(updated_scores.get('fillerwords_feedback', 'N/A'))[:150]}...")
    print(f"Clarity: {str(updated_scores.get('clarity_feedback', 'N/A'))[:150]}...")
    print(f"Participation: {str(updated_scores.get('participation_feedback', 'N/A'))[:150]}...")
    print(f"Key Themes: {str(updated_scores.get('keythemes_feedback', 'N/A'))[:150]}...")
    print(f"Index of Questions: {str(updated_scores.get('indexofquestions_feedback', 'N/A'))[:150]}...")
    print(f"Rhythm: {str(updated_scores.get('rhythm_feedback', 'N/A'))[:150]}...")
    print(f"Objective: {str(updated_scores.get('objetivo_feedback', 'N/A'))[:150]}...")
    
    return updated_scores

async def show_conver_profiles(transcript):
    """Compute and display profile scores"""
    # Declare global variable first
    global profiling_results
    
    print("🔄 Computing profile scores...")
    print("=" * 60)

    # Compute profiles directly using the transcript passed as parameter
    profiling_results_computed = await get_conver_skills(transcript)

    # Extract scores and feedback from the computed results
    profiling_results = {
        'prospection_scoring': profiling_results_computed.get('prospection', {}).get('score', 'N/A'),
        'empathy_scoring': profiling_results_computed.get('empathy', {}).get('score', 'N/A'),
        'technical_domain_scoring': profiling_results_computed.get('technical_domain', {}).get('score', 'N/A'),
        'negotiation_scoring': profiling_results_computed.get('negociation', {}).get('score', 'N/A'),
        'resilience_scoring': profiling_results_computed.get('resilience', {}).get('score', 'N/A'),
        'prospection_feedback': profiling_results_computed.get('prospection', {}).get('justification', 'N/A'),
        'empathy_feedback': profiling_results_computed.get('empathy', {}).get('justification', 'N/A'),
        'technical_domain_feedback': profiling_results_computed.get('technical_domain', {}).get('justification', 'N/A'),
        'negotiation_feedback': profiling_results_computed.get('negociation', {}).get('justification', 'N/A'),
        'resilience_feedback': profiling_results_computed.get('resilience', {}).get('justification', 'N/A'),
    }

    print("\n" + "=" * 60)
    print("👥 PROFILE SCORES SUMMARY (COMPUTED)")
    print("=" * 60)
    print(f"Prospection: {profiling_results.get('prospection_scoring', 'N/A')}")
    print(f"Empathy: {profiling_results.get('empathy_scoring', 'N/A')}")
    print(f"Technical Domain: {profiling_results.get('technical_domain_scoring', 'N/A')}")
    print(f"Negotiation: {profiling_results.get('negotiation_scoring', 'N/A')}")
    print(f"Resilience: {profiling_results.get('resilience_scoring', 'N/A')}")

    # str() keeps the slice safe if a justification is None or non-string
    print("\n📝 FEEDBACK:")
    print(f"Prospection: {str(profiling_results.get('prospection_feedback', 'N/A'))[:150]}...")
    print(f"Empathy: {str(profiling_results.get('empathy_feedback', 'N/A'))[:150]}...")
    print(f"Technical Domain: {str(profiling_results.get('technical_domain_feedback', 'N/A'))[:150]}...")
    print(f"Negotiation: {str(profiling_results.get('negotiation_feedback', 'N/A'))[:150]}...")
    print(f"Resilience: {str(profiling_results.get('resilience_feedback', 'N/A'))[:150]}...")
    
    return profiling_results
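
Both summary functions print a 150-character preview of each feedback text. A possible way to factor that out, so that missing values and short texts are handled in one place, is sketched below; `preview` is a hypothetical helper, and returning `"N/A"` for `None` is an assumption about the desired display.

```python
def preview(value, limit: int = 150) -> str:
    """Return a short preview of a feedback value, tolerating None and non-string types."""
    if value is None:
        return "N/A"
    text = str(value)
    # Append the ellipsis only when something was actually cut off
    return text if len(text) <= limit else text[:limit] + "..."


print(preview("Métrica desactivada temporalmente"))
print(preview(None))
```

Unlike the inline slices, this version does not append `...` to texts that already fit within the limit.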

In [6]:
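def display_results():
    """Render the combined results tables; reads the notebook-level `updated_scores` and `profiling_results` globals."""
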
    # Prepare comprehensive data with scores and feedback
    conversation_data = []
    conversation_metrics = [
        ("General Score", "general_score", None),
        ("Fillerwords", "fillerwords_scoring", "fillerwords_feedback"),
        ("Clarity", "clarity_scoring", "clarity_feedback"),
        ("Participation", "participation_scoring", "participation_feedback"),
        ("Key Themes", "keythemes_scoring", "keythemes_feedback"),
        ("Index of Questions", "indexofquestions_scoring", "indexofquestions_feedback"),
        ("Rhythm", "rhythm_scoring", "rhythm_feedback"),
        ("Objective Accomplished", "is_accomplished", "objetivo_feedback")
    ]

    for metric_name, score_key, feedback_key in conversation_metrics:
        score_value = updated_scores.get(score_key, 'N/A')
        if score_key == "is_accomplished":
            score_value = "Yes" if score_value else "No"
        feedback_value = updated_scores.get(feedback_key, 'N/A') if feedback_key else 'N/A'
        conversation_data.append({
            "Category": "Conversation",
            "Metric": metric_name,
            "Score": score_value,
            "Feedback": feedback_value if feedback_value != 'N/A' else '-'
        })

    # Prepare profile data with scores and feedback
    profile_data = []
    profile_skills = [
        ("Prospection", "prospection_scoring", "prospection_feedback"),
        ("Empathy", "empathy_scoring", "empathy_feedback"),
        ("Technical Domain", "technical_domain_scoring", "technical_domain_feedback"),
        ("Negotiation", "negotiation_scoring", "negotiation_feedback"),
        ("Resilience", "resilience_scoring", "resilience_feedback")
    ]

    if profiling_results:
        for skill_name, score_key, feedback_key in profile_skills:
            score_value = profiling_results.get(score_key, 'N/A')
            feedback_value = profiling_results.get(feedback_key, 'N/A')
            profile_data.append({
                "Category": "Profile",
                "Metric": skill_name,
                "Score": score_value,
                "Feedback": feedback_value if feedback_value != 'N/A' else '-'
            })

    # Combine all data
    all_data = conversation_data + profile_data

    # Create comprehensive DataFrame
    df_all = pd.DataFrame(all_data)

    # Configure pandas display options for better formatting
    pd.set_option('display.max_colwidth', None)
    pd.set_option('display.width', None)
    pd.set_option('display.max_columns', None)

    # Display the comprehensive table
    print("=" * 100)
    print("📊 COMPLETE SCORING RESULTS WITH FEEDBACK")
    print("=" * 100)
    print()

    # Display with HTML for better formatting in Jupyter
    display(HTML(df_all.to_html(index=False, escape=False, classes='table table-striped table-hover')))


    # Create separate tables for better readability
    print("\n" + "=" * 100)
    print("🎯 CONVERSATION SCORES:")
    print("=" * 100)
    df_conv = pd.DataFrame(conversation_data)
    display(HTML(df_conv.to_html(index=False, escape=False)))

    print("\n" + "=" * 100)
    print("👥 PROFILE SCORES:")
    print("=" * 100)
    if profile_data:
        df_prof = pd.DataFrame(profile_data)
        display(HTML(df_prof.to_html(index=False, escape=False)))
    else:
        print("No profile scores available")

    print("\n" + "=" * 100)

In [7]:
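async def stress_test(iterations=10, transcript=None):
    """Run both scorers `iterations` times; relies on the notebook-level COURSE_ID and STAGE_ID globals."""

    print(f"🧪 STRESS TEST: Computing scores {iterations} times...")
    print("=" * 100)
    print("This will help us understand the variance in LLM-based scoring.\n")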

    # Store all runs
    all_conversation_runs = []
    all_profile_runs = []

    # Run the requested number of iterations
    num_runs = iterations
    for run_num in range(1, num_runs + 1):
        print(f"🔄 Run {run_num}/{num_runs}...")
        start_time = time.time()
        
        # Compute conversation scores
        scoring_results = await get_conver_scores(transcript, COURSE_ID, STAGE_ID)
        
        # Extract conversation scores
        conv_run = {
            'run': run_num,
            'general_score': scoring_results.get('puntuacion_global', 'N/A'),
            'fillerwords_scoring': scoring_results.get('detalle', {}).get('muletillas_pausas', 'N/A'),
            'clarity_scoring': scoring_results.get('detalle', {}).get('claridad', 'N/A'),
            'participation_scoring': scoring_results.get('detalle', {}).get('participacion', 'N/A'),
            'keythemes_scoring': scoring_results.get('detalle', {}).get('cobertura', 'N/A'),
            'indexofquestions_scoring': scoring_results.get('detalle', {}).get('preguntas', 'N/A'),
            'rhythm_scoring': scoring_results.get('detalle', {}).get('ppm', 'N/A'),
            'is_accomplished': scoring_results.get('objetivo', {}),
        }
        all_conversation_runs.append(conv_run)
        
        # Compute profile scores
        profiling_results_computed = await get_conver_skills(transcript)
        
        # Extract profile scores
        prof_run = {
            'run': run_num,
            'prospection_scoring': profiling_results_computed.get('prospection', {}).get('score', 'N/A'),
            'empathy_scoring': profiling_results_computed.get('empathy', {}).get('score', 'N/A'),
            'technical_domain_scoring': profiling_results_computed.get('technical_domain', {}).get('score', 'N/A'),
            'negotiation_scoring': profiling_results_computed.get('negociation', {}).get('score', 'N/A'),
            'resilience_scoring': profiling_results_computed.get('resilience', {}).get('score', 'N/A'),
        }
        all_profile_runs.append(prof_run)
        
        elapsed = time.time() - start_time
        print(f"   ✅ Completed in {elapsed:.2f}s\n")

    print("=" * 100)
    print(f"✅ All {num_runs} runs completed!")
    print("=" * 100)

    return all_conversation_runs, all_profile_runs
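
The loop above awaits each run before starting the next, so total time grows linearly with `iterations`. If the scoring functions are safe to call concurrently (an assumption not verified here), the runs could be overlapped with `asyncio.gather`. In this sketch `fake_score`, `stress_test_concurrent` and `max_parallel` are hypothetical names, and `fake_score` only stands in for an LLM-backed call.

```python
import asyncio
import random


async def fake_score(transcript):
    """Hypothetical stand-in for an LLM-backed scoring call."""
    await asyncio.sleep(0.01)
    return {"puntuacion_global": round(random.uniform(20, 30), 1)}


async def stress_test_concurrent(score_fn, transcript, iterations=5, max_parallel=3):
    """Run score_fn `iterations` times with at most `max_parallel` calls in flight."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def one_run(run_num):
        async with semaphore:  # cap concurrency to stay within API rate limits
            result = await score_fn(transcript)
        return {"run": run_num, "general_score": result.get("puntuacion_global", "N/A")}

    # gather returns results in the order the coroutines were passed in
    return await asyncio.gather(*(one_run(n) for n in range(1, iterations + 1)))


runs = asyncio.run(stress_test_concurrent(fake_score, transcript=[], iterations=5))
print([r["run"] for r in runs])
```

Inside a notebook, where an event loop is already running, the call would be `await stress_test_concurrent(...)` rather than `asyncio.run(...)`.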

In [8]:
def display_stress_test(all_conversation_runs, all_profile_runs):

    df_conv_runs = pd.DataFrame(all_conversation_runs)
    df_prof_runs = pd.DataFrame(all_profile_runs)

    # Calculate statistics for each metric (kept in conv_stats / prof_stats; this function does not render them)
    def calculate_stats(values):
        """Calculate statistics for a list of numeric values"""
        numeric_values = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if not numeric_values:
            return {'mean': 'N/A', 'std': 'N/A', 'min': 'N/A', 'max': 'N/A', 'range': 'N/A'}
        
        mean_val = np.mean(numeric_values)
        std_val = np.std(numeric_values)
        min_val = np.min(numeric_values)
        max_val = np.max(numeric_values)
        range_val = max_val - min_val
        
        return {
            'mean': round(mean_val, 2),
            'std': round(std_val, 2),
            'min': min_val,
            'max': max_val,
            'range': round(range_val, 2)
        }

    # Conversation scores statistics
    conv_stats = {}
    conv_metrics = [
        col for col in df_conv_runs.columns 
        if any(key in col for key in [
            'general_score', 'fillerwords_scoring', 'clarity_scoring', 
            'participation_scoring', 'keythemes_scoring', 'indexofquestions_scoring', 
            'rhythm_scoring'
        ])
    ]
    for metric in conv_metrics:
        # skip if not present
        if metric not in df_conv_runs:
            continue
        values = df_conv_runs[metric].tolist()
        conv_stats[metric] = calculate_stats(values)

    # Profile scores statistics
    prof_stats = {}
    prof_metrics = [
        col for col in df_prof_runs.columns
        if any(key in col for key in [
            'prospection_scoring', 'empathy_scoring', 'technical_domain_scoring',
            'negotiation_scoring', 'resilience_scoring'
        ])
    ]
    for metric in prof_metrics:
        if metric not in df_prof_runs:
            continue
        values = df_prof_runs[metric].tolist()
        prof_stats[metric] = calculate_stats(values)

    # Display every run side by side
    print("\n" + "=" * 100)
    print("📋 ALL RUNS - CONVERSATION SCORES")
    print("=" * 100)
    display(HTML(df_conv_runs.to_html(index=False, escape=False, classes='table table-striped table-hover')))

    print("\n" + "=" * 100)
    print("📋 ALL RUNS - PROFILE SCORES")
    print("=" * 100)
    display(HTML(df_prof_runs.to_html(index=False, escape=False, classes='table table-striped table-hover')))

## Input Parameters

Enter the `conversation_id` and `user_id` below:

In [9]:
# Input parameters
CONVERSATION_ID = "3e7eff28-ec12-45bf-8940-efaba98c4298"  # Replace with your conversation_id
USER_ID = "your-user-id-here"  # Replace with your user_id

# Validate UUIDs
try:
    conv_uuid = UUID(CONVERSATION_ID)
    user_uuid = UUID(USER_ID) if USER_ID != "your-user-id-here" else None
    print(f"✅ Conversation ID: {CONVERSATION_ID}")
    if user_uuid:
        print(f"✅ User ID: {USER_ID}")
    else:
        print("⚠️  User ID not provided (will be extracted from conversation)")
except ValueError as e:
    print(f"❌ Invalid UUID format: {e}")
    raise

✅ Conversation ID: 3e7eff28-ec12-45bf-8940-efaba98c4298
⚠️  User ID not provided (will be extracted from conversation)


## Get Conversation Details

First, we'll fetch the conversation details to get `course_id` and `stage_id`:

In [10]:
# Fetch conversation details using await (Jupyter supports top-level await)
conv_info = await fetch_conversation_details(conv_uuid, USER_ID if USER_ID != "your-user-id-here" else None)

# Extract the IDs for use in subsequent cells
COURSE_ID = conv_info['course_id']
STAGE_ID = conv_info['stage_id']
CONV_USER_ID = conv_info['user_id']

conversation_details = conv_info['conversation_details']

📋 Conversation Details:
   User ID: ae1ed3b5-0ea0-4ad5-9565-fec1e4ef1ee7
   Course ID: 3eeeda53-7dff-40bc-b036-b608acb89e6f
   Stage ID: afa3a709-7c7b-4a64-9642-3d6eec19cc53
   Status: FINISHED
   Start: 2026-02-21 10:22:39.020365
   End: 2026-02-21 10:26:16.633623
----------------------------------------------------------------------------------------------------
   Key themes: Justificar técnicamente las ventajas del modelo caro (Factor de marcha del 50% vs 60%).
Explicar adecuadamente en qué consiste la tecnología inverter y el pulsado en acero.
Explicar las principales ventajas frente a la competencia.
Superar la objeción de "descapitalización" introduciendo servicios financieros (Renting y ventajas fiscales).
Aportar tranquilidad al Responsable de Calidad mencionando la fabricaci√≥n local y el mantenimiento preventivo

----------------------------------------------------------------------------------------------------
   Stage Objectives: El vendedor debe cerrar un presu

## Conversation Transcript

Displaying the full conversation transcript:

In [11]:
# Get the conversation transcript
transcript = await get_conversation_transcript(conv_uuid)
display_transcript(transcript)

📝 CONVERSATION TRANSCRIPT

📊 Summary:
   Total Turns: 21
   Vendedor Turns: 10 (47.6%)
   Cliente Turns: 11 (52.4%)
   Total Words: 713
   Vendedor Words: 173 (24.3%)
   Cliente Words: 540 (75.7%)



| # | Speaker | Text | Words | Duration (s) |
|---|---|---|---|---|
| 1 | 🤖 Cliente | ¡Hola!, ¿qué tal? | 3 | |
| 2 | 👤 Vendedor | Buenos días. | 2 | 1.526 |
| 3 | 🤖 Cliente | Buenos días. Soy Juan Pérez, el Responsable de Calidad de SoldaKing. Estoy buscando máquinas de soldar para uso intensivo en nuestra fábrica. He estado mirando la Gala MIG dos mil setecientos i, pero tengo algunas dudas sobre si aguantará nuestro ritmo de trabajo. | 43 | |
| 4 | 👤 Vendedor | No te preocupes, que yo voy a disipar todas esas dudas. Ya te digo yo que aguanta. | 17 | 4.834 |
| 5 | 🤖 Cliente | Me alegra oír eso. Sin embargo, me gustaría que me argumentaras el retorno de inversión de cada uno de los modelos, la Gala MIG dos mil setecientos i y la Gala MIG tres mil quinientos i, y cuál me conviene más a mí. Necesito asegurarme de que cualquier inversión adicional se justifique con una mejora notable en el rendimiento o la durabilidad. | 62 | |
| 6 | 👤 Vendedor | A ti no te conviene la 2700X, sino la 5000. Ya te digo que con las plantas que manejáis, o sea, el rendimiento que nos aporta la 500, o sea, la 5000 va a ser mucho mayor y es lo que necesitáis. No hay duda. | 45 | 18.838 |
| 7 | 🤖 Cliente | Entiendo que me recomiendas la Gala MIG cinco mil i, pero me gustaría que me explicaras por qué. La Gala MIG dos mil setecientos i da doscientos setenta amperios al cincuenta por ciento, mientras que la Gala MIG tres mil quinientos i llega a trescientos cincuenta amperios al sesenta por ciento. ¿Qué ventajas específicas me ofrece la cinco mil i que justifiquen la inversión adicional, más allá de una mayor potencia? Necesito datos concretos y un argumento sólido para considerar un modelo de mayor precio. | 85 | |
| 8 | 👤 Vendedor | El A1700i aguanta cinco años. La durabilidad del A5000 es el doble, pero el precio es un 30 % mayor. ¿Ves por dónde van los números? | 26 | 17.719 |
| 9 | 🤖 Cliente | Comprendo lo que dices sobre la durabilidad y el precio. Sin embargo, la Gala MIG dos mil setecientos i da doscientos setenta amperios al cincuenta por ciento, mientras que la Gala MIG tres mil quinientos i llega a trescientos cincuenta amperios al sesenta por ciento. Necesito entender la diferencia técnica y el retorno de inversión de cada modelo para justificar una inversión mayor. ¿Podrías explicarme en qué consiste la tecnología Inverter y el pulsado en acero, si es que aplica a estos modelos? | 83 | |
| 10 | 👤 Vendedor | Por supuesto. La tecnología Inverter lo que hace es que el acabado sea perfecto. Y en cuanto a pulsado en acero, pues es pulsado en acero. | 26 | 9.553 |


## Compute Conversation Scores

Compute all conversation scores with `get_conver_scores` (called through `show_conver_scores`):

In [12]:
# Compute and display conversation scores
updated_scores = await show_conver_scores(transcript, COURSE_ID, STAGE_ID)

🔄 Computing conversation scores...



📊 CONVERSATION SCORES SUMMARY (COMPUTED)
General Score: 27.6
Fillerwords: 60
Clarity: 100
Participation: 75
Key Themes: 0.0
Index of Questions: 0
Rhythm: 70
Objective Accomplished: False

📝 FEEDBACK:
Fillerwords: El porcentaje de muletillas empleadas es 4.62%, siendo las muletillas mas repetidas: pues, ya ...
Clarity: Mantén la organización en tus respuestas y evita frases cortas o sin desarrollo, como 'pulsado en acero'. Explica con más detalle conceptos técnicos y...
Participation: Tu porcentaje de participación ha sido del 24.26%. Has mostrado escucha activa en 2 ocasiones....
Key Themes: No cubres adecuadamente ninguno de los temas clave. No justificas técnicamente las ventajas del modelo caro, no explicas en qué consiste la tecnología...
Index of Questions: Métrica desactivada temporalmente...
Rhythm: Velocidad de habla cercana al óptimo (128.6 PPM), pero con margen de mejora. Intenta acercarte más al rango de 130-150 PPM....
Objective: Fracaso en todos los aspec

## Compute Profile Scores

Compute all profile scores with `get_conver_skills` (called through `show_conver_profiles`):

In [13]:
profiling_results = await show_conver_profiles(transcript)

🔄 Computing profile scores...

👥 PROFILE SCORES SUMMARY (COMPUTED)
Prospection: 5
Empathy: 2
Technical Domain: 3
Negotiation: 1
Resilience: 2

📝 FEEDBACK:
Prospection: Demuestras investigación previa sobre los productos y retos del cliente, mencionas modelos específicos y tecnología, y haces preguntas técnicas releva...
Empathy: El vendedor en varias ocasiones no valida las preocupaciones del cliente, como cuando dice 'no te preocupes, que yo voy a disipar todas esas dudas', s...
Technical Domain: El vendedor explica aspectos técnicos como 'la tecnología Inverter hace que el acabado sea perfecto' y menciona la durabilidad y potencia en cifras, p...
Negotiation: El vendedor evita profundizar en detalles técnicos, solo afirma que la duración del A1700i es de cinco años y que la del A5000 es el doble, sin explic...
Resilience: El vendedor mantiene un tono positivo en la mayoría de la llamada, si bien muestra momentos de frustración, como al decir 'La verdad, la verdad 

## Complete Results Summary

Displaying all scores in a formatted table:

In [14]:
display_results()

📊 COMPLETE SCORING RESULTS WITH FEEDBACK



| Category | Metric | Score | Feedback |
|---|---|---|---|
| Conversation | General Score | 27.6 | - |
| Conversation | Fillerwords | 60 | El porcentaje de muletillas empleadas es 4.62%, siendo las muletillas mas repetidas: pues, ya |
| Conversation | Clarity | 100 | Mantén la organización en tus respuestas y evita frases cortas o sin desarrollo, como 'pulsado en acero'. Explica con más detalle conceptos técnicos y justifica mejor el retorno de inversión, por ejemplo, desglosando cómo se calcula el 20%. Evita comentarios ambiguos como 'yo voy a disipar todas esas dudas' y enfócate en respuestas concretas y estructuradas. |
| Conversation | Participation | 75 | Tu porcentaje de participación ha sido del 24.26%. Has mostrado escucha activa en 2 ocasiones. |
| Conversation | Key Themes | 0.0 | No cubres adecuadamente ninguno de los temas clave. No justificas técnicamente las ventajas del modelo caro, no explicas en qué consiste la tecnología inverter y el pulsado en acero, no expones las ventajas frente a la competencia, no introduces servicios financieros para superar la objeción de capital, ni aportas tranquilidad respecto a la fabricación local y mantenimiento preventivo. La llamada requiere mayor desarrollo en estos aspectos. |
| Conversation | Index of Questions | 0 | Métrica desactivada temporalmente |
| Conversation | Rhythm | 70 | Velocidad de habla cercana al óptimo (128.6 PPM), pero con margen de mejora. Intenta acercarte más al rango de 130-150 PPM. |
| Conversation | Objective Accomplished | No | Fracaso en todos los aspectos del objetivo. No hay acuerdo explícito en la cantidad, duración ni en la argumentación técnica para justificar la compra del último modelo. |
| Profile | Prospection | 5 | Demuestras investigación previa sobre los productos y retos del cliente, mencionas modelos específicos y tecnología, y haces preguntas técnicas relevantes como '¿puedes explicarme cómo funciona?'. |
| Profile | Empathy | 2 | El vendedor en varias ocasiones no valida las preocupaciones del cliente, como cuando dice 'no te preocupes, que yo voy a disipar todas esas dudas', sin profundizar en sus inquietudes técnicas. Además, interrumpe y desvia la conversación con frases como 'yo lo dejaría aquí', mostrando poca empatía y atención genuina. |



🎯 CONVERSATION SCORES:


| Category | Metric | Score | Feedback |
|---|---|---|---|
| Conversation | General Score | 27.6 | - |
| Conversation | Fillerwords | 60 | El porcentaje de muletillas empleadas es 4.62%, siendo las muletillas mas repetidas: pues, ya |
| Conversation | Clarity | 100 | Mantén la organización en tus respuestas y evita frases cortas o sin desarrollo, como 'pulsado en acero'. Explica con más detalle conceptos técnicos y justifica mejor el retorno de inversión, por ejemplo, desglosando cómo se calcula el 20%. Evita comentarios ambiguos como 'yo voy a disipar todas esas dudas' y enfócate en respuestas concretas y estructuradas. |
| Conversation | Participation | 75 | Tu porcentaje de participación ha sido del 24.26%. Has mostrado escucha activa en 2 ocasiones. |
| Conversation | Key Themes | 0.0 | No cubres adecuadamente ninguno de los temas clave. No justificas técnicamente las ventajas del modelo caro, no explicas en qué consiste la tecnología inverter y el pulsado en acero, no expones las ventajas frente a la competencia, no introduces servicios financieros para superar la objeción de capital, ni aportas tranquilidad respecto a la fabricación local y mantenimiento preventivo. La llamada requiere mayor desarrollo en estos aspectos. |
| Conversation | Index of Questions | 0 | Métrica desactivada temporalmente |
| Conversation | Rhythm | 70 | Velocidad de habla cercana al óptimo (128.6 PPM), pero con margen de mejora. Intenta acercarte más al rango de 130-150 PPM. |
| Conversation | Objective Accomplished | No | Fracaso en todos los aspectos del objetivo. No hay acuerdo explícito en la cantidad, duración ni en la argumentación técnica para justificar la compra del último modelo. |



👥 PROFILE SCORES:


| Category | Metric | Score | Feedback |
|---|---|---|---|
| Profile | Prospection | 5 | Demuestras investigación previa sobre los productos y retos del cliente, mencionas modelos específicos y tecnología, y haces preguntas técnicas relevantes como '¿puedes explicarme cómo funciona?'. |
| Profile | Empathy | 2 | El vendedor en varias ocasiones no valida las preocupaciones del cliente, como cuando dice 'no te preocupes, que yo voy a disipar todas esas dudas', sin profundizar en sus inquietudes técnicas. Además, interrumpe y desvia la conversación con frases como 'yo lo dejaría aquí', mostrando poca empatía y atención genuina. |
| Profile | Technical Domain | 3 | El vendedor explica aspectos técnicos como 'la tecnología Inverter hace que el acabado sea perfecto' y menciona la durabilidad y potencia en cifras, pero no profundiza en datos verificables ni en beneficios concretos para el cliente, limitándose a respuestas superficiales. |
| Profile | Negotiation | 1 | El vendedor evita profundizar en detalles técnicos, solo afirma que la duración del A1700i es de cinco años y que la del A5000 es el doble, sin explicar claramente cómo esos datos justifican el valor. Cede ante la objeción del cliente con un 'no, Juan' y 'demasiado defectivo', sin ofrecer soluciones concretas o próximos pasos claros. |
| Profile | Resilience | 2 | El vendedor mantiene un tono positivo en la mayoría de la llamada, si bien muestra momentos de frustración, como al decir 'La verdad, la verdad que no' y 'Demasiado defectivo'. No logra recuperar energía tras algunas respuestas del cliente y termina la llamada rindiéndose con 'Creo que nos estamos desviando del tema'. |





## Stress Test: LLM Scoring Consistency

This cell runs the scoring and profiling functions **5 times** on the same transcript to measure how much the scores vary from run to run, since LLM-based scoring is non-deterministic.

In [15]:
all_conversation_runs, all_profile_runs = await stress_test(iterations=5, transcript=transcript)

🧪 STRESS TEST: Computing scores 5 times...
This will help us understand the variance in LLM-based scoring.

🔄 Run 1/5...
   ✅ Completed in 19.02s

🔄 Run 2/5...
   ✅ Completed in 17.39s

🔄 Run 3/5...
   ✅ Completed in 14.20s

🔄 Run 4/5...
   ✅ Completed in 19.72s

🔄 Run 5/5...
   ✅ Completed in 16.87s

✅ All 5 runs completed!


In [16]:
display_stress_test(all_conversation_runs, all_profile_runs)


📋 ALL RUNS - CONVERSATION SCORES


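| run | general_score | fillerwords_scoring | clarity_scoring | participation_scoring | keythemes_scoring | indexofquestions_scoring | rhythm_scoring | is_accomplished |
|---|---|---|---|---|---|---|---|---|
| 1 | 24.6 | 60 | 60 | 75 | 0.0 | 0 | 70 | False |
| 2 | 26.6 | 60 | 50 | 75 | 20.0 | 0 | 70 | False |
| 3 | 27.6 | 60 | 100 | 75 | 0.0 | 0 | 70 | False |
| 4 | 23.1 | 60 | 40 | 75 | 0.0 | 0 | 70 | False |
| 5 | 23.1 | 60 | 40 | 75 | 0.0 | 0 | 70 | False |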



📋 ALL RUNS - PROFILE SCORES


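| run | prospection_scoring | empathy_scoring | technical_domain_scoring | negotiation_scoring | resilience_scoring |
|---|---|---|---|---|---|
| 1 | 4 | 1 | 2 | 1 | 2 |
| 2 | 4 | 3 | 2 | 2 | 2 |
| 3 | 3 | 3 | 2 | 2 | 2 |
| 4 | 3 | 1 | 2 | 3 | 2 |
| 5 | 3 | 1 | 3 | 2 | 4 |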
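
The tables above list the raw runs, but the per-metric statistics are not shown. A minimal sketch of that summary, using a few metrics copied from the recorded runs, follows. `summarize` is a hypothetical helper; it uses the standard library's `pstdev` (population standard deviation), which corresponds to the default of `np.std` used in `calculate_stats`.

```python
from statistics import mean, pstdev

# Scores copied from the five recorded runs above
recorded = {
    "general_score": [24.6, 26.6, 27.6, 23.1, 23.1],
    "clarity_scoring": [60, 50, 100, 40, 40],
    "keythemes_scoring": [0.0, 20.0, 0.0, 0.0, 0.0],
    "empathy_scoring": [1, 3, 3, 1, 1],
}


def summarize(values):
    """Return mean, population standard deviation and range, rounded to 2 decimals."""
    return {
        "mean": round(mean(values), 2),
        "std": round(pstdev(values), 2),
        "range": round(max(values) - min(values), 2),
    }


for metric, values in recorded.items():
    print(metric, summarize(values))
```

On this data, clarity spans 60 points across five runs while the general score spans 4.5, so the per-metric spread is far larger than the aggregate suggests.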
