# Matching timestamps
The Teams transcription comes with accurate timestamps for each individual speaker, since the speakers use separate microphones. Whisper, on the other hand, has no automatic knowledge of who is saying what. To make up for this, the speech transcribed by Whisper is matched against the timestamps of the Teams transcription.
<br><br>
This is done by taking a sentence from the Whisper transcription and looking at its timestamps (start and end times). The sentence is then assigned the speaker tag of the Teams segment whose time interval overlaps it the most.
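
As a small worked illustration of the rule above, the sketch below assigns a speaker to a single sentence. The segment times, the speaker names and the helper `overlap_ms` are made up for this example and are not taken from the project data.

```python
# Hypothetical toy data: Teams segments as (start_ms, end_ms, speaker)
teams_segments = [(0, 45000, "Speaker A"), (45000, 52000, "Speaker B")]

# One Whisper sentence as (start_ms, end_ms); it straddles the speaker change at 45000 ms
whisper_sentence = (43000, 48000)


def overlap_ms(start1, end1, start2, end2):
    """Length of the intersection of two intervals, or 0 if they are disjoint."""
    return max(0, min(end1, end2) - max(start1, start2))


# Overlap of the sentence with every Teams segment
overlaps = [overlap_ms(*whisper_sentence, start, end) for start, end, _ in teams_segments]

# Pick the segment with the largest overlap and take its speaker
best = max(range(len(teams_segments)), key=lambda i: overlaps[i])
assigned_speaker = teams_segments[best][2]

print(overlaps)          # [2000, 3000]
print(assigned_speaker)  # Speaker B
```

The sentence shares 2000 ms with the first segment and 3000 ms with the second, so it is tagged with the second speaker.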

In [1]:
import os
import json
import re
import pandas as pd
import numpy as np
import datetime

folder_path = os.path.join("Projects", "Example 1")
if not os.path.exists(folder_path):
    print(f"Folder does not exist: {folder_path}")

Read the data from the .tsv files into pandas DataFrames.

In [26]:
# Teams transcription in tsv-format
df_teams = pd.read_csv(os.path.join(folder_path, "synthetic_teams.tsv"), sep="\t")

print(f"Shape of df: {df_teams.shape}")
df_teams.head()

Shape of df: (8, 4)


Unnamed: 0,start,end,speaker,text
0,0,45000,Annie Lööf,Occaecat ut incididunt sint magna proident qui...
1,45000,52000,Expressen,Qui irure laborum cupidatat dolore mollit mini...
2,52000,104000,Nooshi Dadgostar,Et commodo non veniam occaecat elit aute. Comm...
3,104000,131000,Annie Lööf,Non aliqua pariatur qui anim nisi. Non minim a...
4,131000,139000,Expressen,"Sunt nisi cillum proident ex aliquip tempor, i..."


In [27]:
# Whisper transcription in tsv-format
df_whisper = pd.read_csv(os.path.join(folder_path, "default_whisper_20230124_large-v2.tsv"), sep="\t")

print(f"Shape of df: {df_whisper.shape}")
df_whisper.head()

Shape of df: (60, 3)


Unnamed: 0,start,end,text
0,0,4000,Det är 16 000 äldre som har...
1,4000,7000,...bo på särskilda äldreboenden.
2,7000,11000,Och varannat besök på en vårdcentral görs hos ...
3,11000,15000,Jag tycker det är för mycket storstadsfokus i ...
4,15000,20000,För tar vi oss ut på landsbygden så är det ju ...


In [28]:
# Function to compute how long (in milliseconds) two time intervals overlap
def find_overlap(start1, end1, start2, end2):
    overlap = min(end1, end2) - max(start1, start2)

    # A negative value means the intervals do not overlap at all
    return max(0, overlap)
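
To make the behaviour of the overlap function concrete, here is a minimal sketch that runs it on a few hand-picked interval pairs. The function is restated so the snippet runs on its own, and the interval values are invented for illustration.

```python
def find_overlap(start1, end1, start2, end2):
    """Overlap in milliseconds between [start1, end1] and [start2, end2]."""
    overlap = min(end1, end2) - max(start1, start2)
    return max(0, overlap)


cases = {
    # Second interval starts inside the first one
    "partial": find_overlap(0, 4000, 3000, 9000),
    # First interval lies completely inside the second one
    "contained": find_overlap(1000, 2000, 0, 45000),
    # Intervals only share an end point
    "touching": find_overlap(0, 4000, 4000, 7000),
    # Intervals are separated by a gap
    "disjoint": find_overlap(0, 4000, 5000, 7000),
}

print(cases)
# {'partial': 1000, 'contained': 1000, 'touching': 0, 'disjoint': 0}
```

Intervals that merely touch or are fully separated both count as zero overlap, so they never win over a segment that shares actual time with the sentence.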

In [29]:
# Handle offset/misalignment in speech-to-text

offset = 200    # milliseconds

# Calculate the offset from a timestamp in the Teams transcription and the corresponding timestamp in the audio
hour_teams, minute_teams, second_teams = 0, 2, 46
milli_teams = int(hour_teams * 60*60*1000 + minute_teams * 60*1000 + second_teams * 1000)

hour_audio, minute_audio, second_audio = 0, 2, 50
milli_audio = int(hour_audio * 60*60*1000 + minute_audio * 60*1000 + second_audio * 1000)

# Uncomment to use the calculated offset instead of the fixed value above
#offset = milli_audio - milli_teams
print(offset)

# Make an empty string column for speaker (a float column of zeros would clash with the speaker names)
df_whisper["speaker"] = ""

# Loop over all rows in whisper transcription
for i in range(df_whisper.shape[0]):
    start1, end1 = df_whisper[["start", "end"]].iloc[i]

    # Find the row in the Teams transcription with the most overlap with the current row in the Whisper transcription
    amount_of_overlap = df_teams.apply(lambda row: find_overlap(start1=start1, end1=end1,
                                        start2=row["start"]+offset,
                                        end2=row["end"]+offset),
                                        axis=1)

    # Get the index label of the row with the most overlap (the first row if nothing overlaps)
    index_max = amount_of_overlap.idxmax()

    # Extract the speaker from the Teams transcription and add it to the Whisper transcription
    speaker = df_teams.loc[index_max, "speaker"]
    df_whisper.loc[i, "speaker"] = speaker

# Change column order and save to file
df_whisper = df_whisper[["start", "end", "speaker", "text"]]
df_whisper.to_csv(os.path.join(folder_path, "text_whisper_offset_adjusted.tsv"), index=False, sep="\t")

print(f"Shape of df: {df_whisper.shape}")
df_whisper.head(10)

200


Unnamed: 0,start,end,speaker,text
0,0,4000,Annie Lööf,Det är 16 000 äldre som har...
1,4000,7000,Annie Lööf,...bo på särskilda äldreboenden.
2,7000,11000,Annie Lööf,Och varannat besök på en vårdcentral görs hos ...
3,11000,15000,Annie Lööf,Jag tycker det är för mycket storstadsfokus i ...
4,15000,20000,Annie Lööf,För tar vi oss ut på landsbygden så är det ju ...
5,20000,22000,Annie Lööf,Det är byskolan på landsbygden.
6,22000,26000,Annie Lööf,Det är att en äldre inte ska få välja vilken h...
7,26000,29000,Annie Lööf,Det är viktigt att satsa på kvalitet.
8,29000,32000,Annie Lööf,"Jag är stolt över svensk sjukvård, svensk omso..."
9,32000,36000,Annie Lööf,"Vi ska ha hög kvalitet, skärpta krav och se ti..."
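
One property of the matching loop is that a sentence with no overlapping Teams segment still receives a speaker, namely the one in the first Teams row, because the position of the maximum of an all-zero series is 0. The sketch below shows one possible way to handle that case together with the offset. The helpers `to_ms` and `assign_speaker`, the `fallback` parameter and the toy segments are assumptions made for this example, not part of the notebook.

```python
from datetime import timedelta


def to_ms(hours, minutes, seconds):
    """Convert a clock position to whole milliseconds."""
    return int(timedelta(hours=hours, minutes=minutes, seconds=seconds).total_seconds() * 1000)


def assign_speaker(start, end, teams_segments, offset_ms=0, fallback=None):
    """Return the speaker whose shifted segment overlaps [start, end] the most.

    Returns `fallback` when no segment overlaps at all (a choice made for this sketch).
    """
    best_speaker, best_overlap = fallback, 0
    for seg_start, seg_end, speaker in teams_segments:
        # Shift the Teams segment by the offset before comparing
        overlap = min(end, seg_end + offset_ms) - max(start, seg_start + offset_ms)
        if overlap > best_overlap:
            best_speaker, best_overlap = speaker, overlap
    return best_speaker


# Offset between the same moment at 0:02:50 in the audio and 0:02:46 in Teams
offset_ms = to_ms(0, 2, 50) - to_ms(0, 2, 46)

# Hypothetical toy segments: (start_ms, end_ms, speaker)
teams_segments = [(0, 45000, "Speaker A"), (45000, 52000, "Speaker B")]

print(offset_ms)                                                    # 4000
print(assign_speaker(47000, 50000, teams_segments))                 # Speaker B
print(assign_speaker(47000, 50000, teams_segments, offset_ms))      # Speaker A
print(assign_speaker(60000, 62000, teams_segments, offset_ms))      # None
```

With these toy values the offset changes which speaker wins for the sentence at 47000 to 50000 ms, and the sentence after the last segment gets the fallback value instead of the first speaker.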


In [17]:
df_teams.columns

Index(['start', 'end ', 'speaker ', 'text'], dtype='object')