# Talk-to-Listen Ratio
- **Definition:** The share of speaking time taken by the salesperson compared with the buyer over the course of a call.
- **Why It Matters:** Balanced conversations are more engaging. If a salesperson dominates the conversation (e.g., talks 65% of the time), it may indicate they're not effectively engaging the buyer.
- **Example Insight:** Top-performing salespeople talk about 43% of the time in discovery calls.
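
The definition above can be read two ways: as the salesperson's share of total speaking time (the percentages quoted above) or as a literal ratio of seller time to buyer time. The sketch below shows both. The function names `talk_share` and `talk_to_listen_ratio` and the durations are illustrative assumptions, not part of the notebook.

```python
def talk_share(seller_seconds: float, buyer_seconds: float) -> float:
    """Fraction of total speaking time taken by the seller (0.0 to 1.0)."""
    total = seller_seconds + buyer_seconds
    if total == 0:
        return 0.0  # nobody spoke; avoid dividing by zero
    return seller_seconds / total


def talk_to_listen_ratio(seller_seconds: float, buyer_seconds: float) -> float:
    """Seller speaking time divided by buyer speaking time."""
    if buyer_seconds == 0:
        return float("inf")  # the buyer never spoke
    return seller_seconds / buyer_seconds


# Illustrative durations in seconds (not taken from the sample transcript)
share = talk_share(258.0, 342.0)
ratio = talk_to_listen_ratio(258.0, 342.0)
print(f"share={share:.0%}, ratio={ratio:.2f}")  # share=43%, ratio=0.75
```

A 43% share therefore corresponds to a ratio of roughly 0.75, so it helps to state which of the two quantities a report is using.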

## Algorithm

In [1]:
import pandas as pd
from decimal import Decimal, getcontext

def calculate_talk_times(filepath):
    """
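    Calculate the talk duration for each speaker and the total no-talk (silence) duration from a transcript CSV file.

    This function reads a CSV file containing transcript data with the columns "Speaker", "Transcript",
    "Start Time", and "End Time", where timestamps use a minutes:seconds form such as "0:01.246".
    It sums the speaking time for each speaker and computes the total silence (no-talk time) as the
    span of the call minus the time during which at least one person is speaking. Overlapping speech
    counts toward each speaker's own total but is counted only once when measuring silence.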

    Parameters:
    filepath (str): The path to the CSV file containing the transcript data.

    Returns:
    tuple:
        - speaker_durations (dict): A dictionary where the keys are speaker names and the values are
          the total speaking duration in seconds for each speaker.
        - no_talk_duration (float): The total duration of silence (in seconds) throughout the conversation.

    Example:
    >>> filepath = 'transcript.csv'
    >>> speaker_durations, no_talk_duration = calculate_talk_times(filepath)
    >>> print("Speaker Durations (in seconds):", speaker_durations)
    >>> print("No Talk Duration (in seconds):", no_talk_duration)

    Output:
    Speaker Durations (in seconds): {'Shawn Henderson': 9.16, 'Travis Mirra': 6.324}
    No Talk Duration (in seconds): 2.346
    """
    
    df = pd.read_csv(filepath)

    # An empty transcript has no utterances and no measurable silence
    if df.empty:
        return {}, 0.0

    # Set decimal precision
    getcontext().prec = 10

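    # Convert 'Start Time' and 'End Time' to seconds.
    # Accepts "M:SS.sss" and also "H:MM:SS.sss": each colon-separated field is worth 60x the next one.
    def time_to_seconds(t):
        total_seconds = Decimal(0)
        for part in str(t).split(':'):
            total_seconds = total_seconds * 60 + Decimal(part)
        return total_seconds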

    df['Start Time'] = df['Start Time'].apply(time_to_seconds)
    df['End Time'] = df['End Time'].apply(time_to_seconds)

    # Calculate duration of each utterance
    df['Duration'] = df['End Time'] - df['Start Time']

    # Sum durations per speaker
    speaker_durations = df.groupby('Speaker')['Duration'].sum().to_dict()

    # Create a list of all speech intervals
    intervals = df[['Start Time', 'End Time']].values.tolist()
    intervals.sort()

    # Merge overlapping intervals to find total talk time
    total_talk_time = Decimal(0)
    merged_intervals = []
    start, end = intervals[0]
    for curr_start, curr_end in intervals[1:]:
        if curr_start <= end:
            end = max(end, curr_end)
        else:
            merged_intervals.append((start, end))
            total_talk_time += end - start
            start, end = curr_start, curr_end
    merged_intervals.append((start, end))
    total_talk_time += end - start

    # Calculate total duration and no-talk duration
    total_duration = df['End Time'].max() - df['Start Time'].min()
    no_talk_duration = total_duration - total_talk_time

    # Convert Decimal values to float for the final output
    speaker_durations = {name: float(duration_seconds) for name, duration_seconds in speaker_durations.items()}
    no_talk_duration = float(no_talk_duration)

    return speaker_durations, no_talk_duration
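
The interval-merging step is easier to follow on a small example. The sketch below uses a standalone helper, `merge_intervals` (a hypothetical name, not defined in the notebook), with plain floats and made-up utterance times. It mirrors the merging logic above rather than reproducing the function exactly.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs and return them sorted by start."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical utterances in seconds; the second and third overlap
utterances = [(0.0, 2.0), (3.0, 5.0), (4.5, 6.0), (8.0, 9.0)]

merged = merge_intervals(utterances)
talk_time = sum(end - start for start, end in merged)
call_span = max(e for _, e in utterances) - min(s for s, _ in utterances)
silence = call_span - talk_time

print(merged)   # [(0.0, 2.0), (3.0, 6.0), (8.0, 9.0)]
print(silence)  # 3.0
```

Summing the raw utterance lengths here gives 6.5 seconds, while the merged talk time is 6.0 seconds. The half second of overlap is why per-speaker totals plus silence can exceed the span of the call.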

## Test


### Check the data

In [2]:
PATH_TO_TEST_FILE = 'sample.csv'
pd.read_csv(PATH_TO_TEST_FILE)

| Unnamed: 0 | Speaker | Transcript | Start Time | End Time |
| --- | --- | --- | --- | --- |
| 0 | Travis Mirra | You doing pretty good. | 0:00.090 | 0:01.246 |
| 1 | Travis Mirra | Shauna doing all right. | 0:01.268 | 0:03.070 |
| 2 | Shawn Henderson | Wonderful. | 0:03.220 | 0:04.042 |
| 3 | Shawn Henderson | Thank you so much for I know you rescheduled, ... | 0:04.186 | 0:06.686 |
| 4 | Shawn Henderson | You were booked earlier this week? | 0:06.708 | 0:08.350 |
| ... | ... | ... | ... | ... |
| 112 | Shawn Henderson | So a lot of people associate drawstring bag—th... | 7:54.470 | 7:58.202 |
| 113 | Shawn Henderson | So organically, you want to be there. | 7:58.286 | 8:01.064 |
| 114 | Shawn Henderson | It doesn't cost you anything. | 8:01.102 | 8:02.696 |
| 115 | Shawn Henderson | And it gives you that exposure beyond just you... | 8:02.878 | 8:09.454 |


### Run algorithm and print results

In [3]:
# Example usage:
speaker_durations, no_talk_duration = calculate_talk_times(PATH_TO_TEST_FILE)
print("Speaker Durations (in seconds):", speaker_durations)
print("No Talk Duration (in seconds):", no_talk_duration)

Speaker Durations (in seconds): {'Shawn Henderson': 130.162, 'Travis Mirra': 310.27}
No Talk Duration (in seconds): 53.192
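
The function returns raw durations, so one more step is needed to express them as the talk share described at the top of this section. A minimal sketch, assuming the share is measured against total speaking time (silence excluded) and using the durations copied from the printed output above:

```python
# Durations in seconds, copied from the printed output above
speaker_durations = {'Shawn Henderson': 130.162, 'Travis Mirra': 310.27}

# Each speaker's share of the total speaking time (silence excluded)
total_talk = sum(speaker_durations.values())
shares = {name: seconds / total_talk for name, seconds in speaker_durations.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The transcript does not label which speaker is the salesperson, so that has to be supplied separately before comparing a share against a benchmark. Overlapping speech is counted in both speakers' totals, so these shares are approximate when people talk over each other.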
