5/12 (Sun) | UF Measures

# UF Measure Calculation Based on Manual Annotation (Dialogue)

## 1. Introduction

This notebook calculates utterance fluency (UF) measures from the manual annotation of a spoken dialogue corpus.
The calculation consists of the following procedures.

1. Load required files

    a. Load a TextGrid file
    b. Load the corresponding rev transcript

2. Get start and end time from two adjacent turns

    a. Convert the loaded rev transcript to turn-wise </br>
    b. Concatenate the user's turn-wise transcript and the system's original transcript </br>
    c. Iterate two adjacent turns of system and user </br>
    d. Extract start and end time from the two turns

3. Based on the start and end time, get transcript, pauses, fillers and disfluency intervals.
4. Count the number of pruned syllables

    a. Count the number of syllables from the obtained transcript intervals (N_syl) </br>
    b. Count the number of syllables from obtained fillers and disfluency intervals (N_syl_disfl) </br>
    c. Calculate N_syl - N_syl_disfl (i.e., pruning)

5. Get duration of the turn

    a. Get the earliest (start_time) and latest (end_time) timestamps from the obtained transcript, filler and disfluency intervals </br>
    b. Calculate end_time - start_time

6. Get pauses of the turn

    a. From pause intervals generate lists of MCP and ECP durations </br>
    b. From filler intervals count the number of fillers

7. Get disfluency of the turn
    
    a. From disfluency intervals count the number of disfluency

8. Set the parameters calculated in procedures 4 to 7
9. Based on the parameters, calculate UF measures using UtteranceFluencyFeatureExtractor

    Note. ignore detailed disfluency ratio measures and word-wise measures

Before starting the procedures, the following code block loads the required packages and defines global variables.

In [1]:
import sys
from typing import List, Tuple, Generator
from pathlib import Path

import numpy as np
import pandas as pd
from textgrids import TextGrid, Interval
import syllables

from utils.transcript import convert_turnwise

sys.path.append(
    "/home/matsuura/Development/app/feature_extraction_api/app/modules"
)

from fluency import UtteranceFluencyFeatureExtractor

DATA_DIR = Path("/home/matsuura/Development/app/feature_extraction_api/experiment/data")

---

## 2. Define Functions

This section defines the functions that carry out the procedures.
First, the following code block defines a generator that yields TextGrid paths and a function that loads a TextGrid object.

In [2]:
def textgrid_path_generator() -> Generator[Path, None, None]:
    load_dir = DATA_DIR / "WoZ_Interview/01_Manual_TextGrid"

    for textgrid_path in load_dir.glob("*.TextGrid"):
        yield textgrid_path

def load_textgrid(textgrid_path: Path) -> TextGrid:
    textgrid = TextGrid(str(textgrid_path))

    return textgrid

The following code block defines a function to load a rev transcript as a pandas DataFrame object.

In [3]:
def load_df_transcript(textgrid_path: Path) -> pd.DataFrame:
    filename = textgrid_path.stem
    uid = filename.split("_")[1]

    load_path = DATA_DIR / f"WoZ_Interview/01_Manual_TextGrid/{uid}.csv"
    df_transcript = pd.read_csv(load_path)

    return df_transcript
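
The uid lookup above relies on the file name having at least two "_"-separated fields. A minimal sketch of that parsing step follows; the file name used here is hypothetical, since the real naming scheme is not shown in this notebook.

```python
from pathlib import Path

# Hypothetical file name, used only to illustrate the parsing step.
textgrid_path = Path("WoZ_001_manual.TextGrid")

# .stem drops the final suffix; the second "_"-separated field is taken as the uid.
uid = textgrid_path.stem.split("_")[1]
print(uid)
```

A file name without an underscore would raise an IndexError at the `[1]` lookup, so it may be worth checking the names in the directory before running the full loop.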

The following code block defines a function to concatenate the user's turn-wise transcript and the system's original transcript.

In [4]:
def concat_transcript(
        df_transcript: pd.DataFrame,
        df_transcript_turn: pd.DataFrame
) -> pd.DataFrame:
    mask_system = df_transcript["speaker"] == "system"
    mask_user = df_transcript_turn["speaker"] == "user"

    df_system = df_transcript[mask_system]
    df_user = df_transcript_turn[mask_user]

    df_concat = pd.concat([df_system, df_user])
    df_concat = df_concat.sort_values("start_time").reset_index(drop=True)

    return df_concat
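
The effect of the concatenation can be sketched without pandas. The rows below are made up, and the turn-wise table assumes that `convert_turnwise` merges consecutive rows of the same speaker into one turn, which is not confirmed by this notebook.

```python
# Plain-Python stand-in for the pandas logic, with made-up rows (times in ms).
utterance_rows = [
    {"speaker": "system", "start_time": 0, "end_time": 900},
    {"speaker": "user", "start_time": 1000, "end_time": 1500},
    {"speaker": "user", "start_time": 1600, "end_time": 2400},
    {"speaker": "system", "start_time": 2500, "end_time": 3100},
]
# Assumed turn-wise version: the two consecutive user rows become one turn.
turn_rows = [
    {"speaker": "system", "start_time": 0, "end_time": 900},
    {"speaker": "user", "start_time": 1000, "end_time": 2400},
    {"speaker": "system", "start_time": 2500, "end_time": 3100},
]

# System rows come from the original table, user rows from the turn-wise table.
system_rows = [r for r in utterance_rows if r["speaker"] == "system"]
user_rows = [r for r in turn_rows if r["speaker"] == "user"]

# Sorting by start time interleaves the two sources chronologically.
merged = sorted(system_rows + user_rows, key=lambda r: r["start_time"])
speakers = [r["speaker"] for r in merged]
print(speakers)
```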

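The following code block defines a generator that yields pairs of adjacent turns: each user turn outside the intro and closing topics, together with the row immediately preceding it (expected to be the system turn).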

In [5]:
def system_turn_generator(
        df_transcript: pd.DataFrame
) -> Generator[Tuple[pd.Series, pd.Series], None, None]:
    mask_user = (df_transcript["speaker"] == "user")

    mask_intro = (df_transcript["topic"] == "intro")
    mask_closing = (df_transcript["topic"] == "closing")
    mask_topic = mask_intro | mask_closing

    mask = mask_user & ~mask_topic

    df_transcript_masked = df_transcript[mask]
    for idx_user in df_transcript_masked.index:
        idx_system = idx_user - 1

        turn_system = df_transcript.loc[idx_system, :]
        turn_user = df_transcript.loc[idx_user, :]

        yield turn_system, turn_user
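
The pairing logic can be illustrated on plain dictionaries. This is a sketch under assumptions: the topic name "hobby" is invented, and the guard for a user row at position 0 is an addition that the generator above does not have.

```python
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, str]


def pair_turns(rows: List[Row]) -> Iterator[Tuple[Row, Row]]:
    """Yield (preceding row, user row) for user turns outside intro/closing."""
    for i, row in enumerate(rows):
        if row["speaker"] != "user" or row["topic"] in ("intro", "closing"):
            continue
        if i == 0:
            continue  # added guard: no preceding row to pair with
        yield rows[i - 1], row


rows = [
    {"speaker": "system", "topic": "intro"},
    {"speaker": "user", "topic": "intro"},
    {"speaker": "system", "topic": "hobby"},   # hypothetical topic label
    {"speaker": "user", "topic": "hobby"},
    {"speaker": "system", "topic": "closing"},
    {"speaker": "user", "topic": "closing"},
]
pairs = list(pair_turns(rows))
print(len(pairs), pairs[0][0]["speaker"], pairs[0][1]["speaker"])
```

The pairing only works as intended when system and user rows strictly alternate; two consecutive user rows would pair a user turn with another user turn.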

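The following code block defines a function that returns the start time of the system turn and the end time of the following user turn. Both values are divided by 1000, presumably to convert the transcript's milliseconds into the seconds used by the TextGrid.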

In [6]:
def get_start_end_time(turn_system: pd.Series, turn_user: pd.Series) -> Tuple[float, float]:
    start_time = turn_system["start_time"]
    end_time = turn_user["end_time"]

    start_time /= 1000
    end_time /= 1000

    return start_time, end_time

The following code block defines a function that extracts the labelled intervals lying entirely within the start and end times.

In [7]:
def extract_intervals(
        textgrid: TextGrid, 
        target_tier: str,
        start_time: float, 
        end_time: float
) -> List[Interval]:
    
    interval_list = []
    tier = textgrid[target_tier]

    for interval in tier:
        if interval.text == "":
            continue

        if (interval.xmin >= start_time) and (interval.xmax <= end_time):
            interval_list.append(interval)

        if interval.xmax > end_time:
            break

    return interval_list
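
The containment test can be tried on a stand-in interval type. `FakeInterval` and `within` are hypothetical names used only for this sketch; the early `break` of the function above is omitted because it only saves iterations on a time-ordered tier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeInterval:
    """Stand-in for textgrids.Interval with the three attributes used here."""
    text: str
    xmin: float
    xmax: float


def within(tier: List[FakeInterval], start: float, end: float) -> List[FakeInterval]:
    # Keep labelled intervals that lie entirely inside [start, end].
    return [
        iv for iv in tier
        if iv.text != "" and iv.xmin >= start and iv.xmax <= end
    ]


tier = [
    FakeInterval("hello", 0.8, 1.2),   # starts before the window: dropped
    FakeInterval("", 1.2, 1.5),        # unlabelled: dropped
    FakeInterval("I like", 1.5, 2.1),  # fully inside: kept
    FakeInterval("tea", 2.1, 3.4),     # ends after the window: dropped
]
kept = within(tier, start=1.0, end=3.0)
print([iv.text for iv in kept])  # ['I like']
```

Intervals that straddle either boundary are discarded rather than clipped, so annotations that overlap a turn boundary do not contribute to that turn.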

The following code block defines a function to count the number of syllables.

In [8]:
def count_syllables(interval_list: List[Interval]) -> int:
    n_syl = 0
    for interval in interval_list:
        text = interval.text
        if text == "":
            continue
        
        # Split on any whitespace so that repeated spaces do not yield empty tokens
        for word in text.split():
            n_syl += syllables.estimate(word)

    return n_syl
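
A note on tokenization: `str.split(" ")` and `str.split()` behave differently when the annotation text contains consecutive spaces. The short sketch below shows the difference on a made-up string. Whether an empty token changes the syllable count depends on how `syllables.estimate` handles an empty string, which is not verified here.

```python
text = "I  like tea"  # made-up annotation text with a double space

# Splitting on a single space keeps an empty token between the two spaces.
tokens_single = text.split(" ")

# Splitting on any whitespace run never produces empty tokens.
tokens_any = text.split()

print(tokens_single)
print(tokens_any)
```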

The following code block defines two functions: one gets the earliest and latest timestamps of an interval list, and the other calculates the speech duration.

In [9]:
def get_earliest_lates_timestamp(interval_list: List[Interval]) -> Tuple[float, float]:
    earliest_time = 60 * 60 # 1 hour
    latest_time = -1

    for interval in interval_list:
        if interval.text == "":
            continue

        if earliest_time > interval.xmin:
            earliest_time = interval.xmin
        if latest_time < interval.xmax:
            latest_time = interval.xmax

    return earliest_time, latest_time


def calculate_speech_duration(
        transcript_list: List[Interval],
        pause_list: List[Interval],
        filler_list: List[Interval],
        disfluency_list: List[Interval]
) -> float:
    earliest_time, latest_time = get_earliest_lates_timestamp(transcript_list)

    for interval_list in [pause_list, filler_list, disfluency_list]:
        if len(interval_list) == 0:
            continue

        start_time, end_time = get_earliest_lates_timestamp(interval_list)
        if earliest_time > start_time:
            earliest_time = start_time
        if latest_time < end_time:
            latest_time = end_time

    return latest_time - earliest_time
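
The duration is the span from the earliest `xmin` to the latest `xmax` over all tiers. A minimal sketch with a stand-in interval type follows; `FakeInterval`, `span`, and the zero-length fallback for an empty turn are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeInterval:
    """Stand-in for textgrids.Interval."""
    text: str
    xmin: float
    xmax: float


def span(tiers: List[List[FakeInterval]]) -> float:
    """Latest xmax minus earliest xmin over all labelled intervals."""
    labelled = [iv for tier in tiers for iv in tier if iv.text != ""]
    if not labelled:
        return 0.0  # assumption: treat a turn with no labelled interval as zero-length
    return max(iv.xmax for iv in labelled) - min(iv.xmin for iv in labelled)


transcript = [FakeInterval("I like tea", 1.5, 2.6)]
pauses = [FakeInterval("CI", 2.6, 3.0)]
fillers = [FakeInterval("um", 1.1, 1.4)]
duration = span([transcript, pauses, fillers, []])
print(round(duration, 2))  # 1.9
```

With the sentinel initialisation used above (3600 and -1), a turn whose four interval lists are all empty would return -1 - 3600 = -3601, so such turns may need to be filtered out before the measures are computed.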

The following code block defines two functions: one returns the lists of mid-clause and end-clause pause durations, and the other counts the filled pauses.

In [10]:
def extract_pause_durations(pause_list: List[Interval]) -> Tuple[List[float], List[float]]:
    mcp = []
    ecp = []
    for interval in pause_list:
        if interval.text == "":
            continue

        duration = interval.xmax - interval.xmin
        if interval.text == "CI":
            mcp.append(duration)
            continue

        if interval.text == "CE":
            ecp.append(duration)
    
    return mcp, ecp

def count_filled_pauses(filler_list: List[Interval]) -> int:
    n_filler = 0
    for interval in filler_list:
        if interval.text == "":
            continue

        n_filler += 1

    return n_filler
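
The split by pause label can be sketched as follows, using the labels "CI" and "CE" that the function above maps to mid-clause and end-clause pauses. `FakeInterval` is a hypothetical stand-in and the timings are made up.

```python
from dataclasses import dataclass


@dataclass
class FakeInterval:
    """Stand-in for textgrids.Interval."""
    text: str
    xmin: float
    xmax: float


pauses = [
    FakeInterval("CI", 2.0, 2.5),
    FakeInterval("CE", 4.0, 5.0),
    FakeInterval("CI", 6.0, 6.25),
]

# Durations grouped by label: "CI" for mid-clause, "CE" for end-clause.
mcp = [iv.xmax - iv.xmin for iv in pauses if iv.text == "CI"]
ecp = [iv.xmax - iv.xmin for iv in pauses if iv.text == "CE"]
print(mcp, ecp)  # [0.5, 0.25] [1.0]
```

Any pause interval with a label other than "CI" or "CE" is silently ignored by this logic, so a typo in the annotation would drop that pause from both lists.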

The following code block defines a function to count disfluencies.

In [11]:
def count_disfluency(disfluency_list: List[Interval]) -> int:
    n_disfl = 0
    for interval in disfluency_list:
        if interval.text == "":
            continue

        n_disfl += 1

    return n_disfl

The following code block defines a function to get a turn id.

In [12]:
def get_turn_id(df_transcript_turn: pd.DataFrame, idx: int) -> str:
    mask_user = (df_transcript_turn["speaker"] == "user")

    mask_intro = (df_transcript_turn["topic"] == "intro")
    mask_closing = (df_transcript_turn["topic"] == "closing")
    mask_topic = mask_intro | mask_closing

    mask = mask_user & (~mask_topic)

    df_user = df_transcript_turn[mask].reset_index(drop=False)

    turn_id = df_user.at[idx, "index"]
    turn_id = str(int(turn_id)).zfill(3)

    return turn_id
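
The turn id is the original row index of the user turn, zero-padded to three digits. A small sketch of the formatting step follows; the index value 9 is a hypothetical example.

```python
uid = "001"
turn_index = 9  # hypothetical original row index of a user turn

# zfill pads with leading zeros so that ids sort correctly as strings.
turn_id = str(int(turn_index)).zfill(3)
record_id = f"{uid}_{turn_id}"
print(record_id)  # 001_009
```

Zero-padding matters because the final DataFrame is sorted by this string: without it, "001_11" would sort before "001_9".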

---

## 3. Calculate UF Measures

This section calculates UF measures.
The following code block constructs a UF measure extractor object.

In [13]:
extractor = UtteranceFluencyFeatureExtractor(
    rep=False, rpr=False, fs=False, rf=False, 
    ptr=False, ar_w=False, sr_w=False, mlr_w=False
)

The following code block calculates UF measures.

In [14]:
measure_list = []

for textgrid_path in textgrid_path_generator():
    textgrid = load_textgrid(textgrid_path)
    
    df_transcript = load_df_transcript(textgrid_path)
    df_transcript_turn = convert_turnwise(df_transcript)
    df_transcript = concat_transcript(df_transcript, df_transcript_turn)
    
    for idx, (turn_system, turn_user) in enumerate(system_turn_generator(df_transcript)):
        start_time, end_time = get_start_end_time(turn_system, turn_user)

        transcript_list = extract_intervals(textgrid, "transcript", start_time, end_time)
        pause_list = extract_intervals(textgrid, "pause", start_time, end_time)
        filler_list = extract_intervals(textgrid, "filler", start_time, end_time)
        disfluency_list = extract_intervals(textgrid, "disfluency", start_time, end_time)

        n_syl_unpruned = count_syllables(transcript_list)
        n_syl_filler = count_syllables(filler_list)
        n_syl_disfl = count_syllables(disfluency_list)
        n_syl_pruned = n_syl_unpruned - n_syl_filler - n_syl_disfl

        duration = calculate_speech_duration(
            transcript_list, pause_list,
            filler_list, disfluency_list
        )

        mcp, ecp = extract_pause_durations(pause_list)
        n_filler = count_filled_pauses(filler_list)

        n_disfl = count_disfluency(disfluency_list)

        params = {
            "n_word": 1,
            "dur": duration,
            "syl": np.array([n_syl_pruned]),
            "mc": np.array(mcp),
            "ec": np.array(ecp),
            "repetition": n_disfl,
            "self_repair": 0,
            "false_start": 0,
            "repair_false": 0,
            "filled": n_filler
        }

        measure = extractor.extract_by_parameters(params)
        turn_id = get_turn_id(df_transcript_turn, idx)
        uid = textgrid_path.stem.split("_")[1]

        measure = [f"{uid}_{turn_id}"] + measure

        measure_list.append(measure)

The following code block builds a pandas DataFrame from the calculated UF measures.

In [15]:
measure_names = extractor.check_feature_names()
columns = ["uid"] + measure_names

df_measures = pd.DataFrame(measure_list, columns=columns)
df_measures = df_measures.sort_values("uid").reset_index(drop=True)
df_measures

| | uid | speech_rate | mid_clause_pause_ratio | end_clause_pause_ratio | mid_clause_p-dur | end_clause_p-dur | filled_pause_ratio | dysfluency_ratio | dysfluency_rate | articulation_rate | mean_length_of_run | mean_pause_duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 001_009 | 3.474903 | 0.111111 | 0.000000 | 0.470000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.245283 | 4.500000 | 0.470000 |
| 1 | 001_011 | 4.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 4.000000 | 6.000000 | 0.000000 |
| 2 | 001_013 | 1.422107 | 0.227273 | 0.045455 | 0.974295 | 0.350646 | 0.090909 | 0.000000 | 0.000000 | 2.146786 | 3.142857 | 0.870354 |
| 3 | 001_015 | 2.698145 | 0.062500 | 0.000000 | 0.883256 | 0.000000 | 0.062500 | 0.000000 | 0.000000 | 3.170361 | 8.000000 | 0.883256 |
| 4 | 001_017 | 1.625000 | 0.076923 | 0.000000 | 1.624947 | 0.000000 | 0.076923 | 0.000000 | 0.000000 | 2.039199 | 6.500000 | 1.624947 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1738 | 085_043 | 0.702988 | 0.500000 | 0.000000 | 0.325000 | 0.000000 | 0.250000 | 0.000000 | 0.000000 | 0.793651 | 1.333333 | 0.325000 |
| 1739 | 085_045 | 1.494612 | 0.232558 | 0.069767 | 0.810000 | 0.849370 | 0.069767 | 0.046512 | 0.069517 | 2.372821 | 3.071429 | 0.819085 |
| 1740 | 085_047 | 1.128757 | 0.209302 | 0.093023 | 1.333889 | 1.008750 | 0.081395 | 0.081395 | 0.091876 | 1.949671 | 3.185185 | 1.233846 |
| 1741 | 085_049 | 1.116784 | 0.285714 | 0.100000 | 0.954000 | 1.021429 | 0.157143 | 0.071429 | 0.079770 | 1.920439 | 2.500000 | 0.971481 |
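
As a sanity check, the first row (001_009) can be reproduced with common definitions of these measures. The inputs below are back-calculated assumptions, not values read from the data: 9 pruned syllables, a turn duration of 2.59 s, and one mid-clause pause of 0.47 s. Whether `UtteranceFluencyFeatureExtractor` computes the measures in exactly this way is not shown in this notebook.

```python
# Assumed inputs for row 001_009 (back-calculated, not taken from the corpus).
n_syl = 9          # pruned syllables
duration = 2.59    # turn duration in seconds
mcp = [0.47]       # mid-clause pause durations in seconds
ecp = []           # end-clause pause durations in seconds

pause_time = sum(mcp) + sum(ecp)

# Common definitions, assumed here for illustration only.
speech_rate = n_syl / duration
articulation_rate = n_syl / (duration - pause_time)
mcp_ratio = len(mcp) / n_syl
mean_length_of_run = n_syl / (len(mcp) + len(ecp) + 1)

print(round(speech_rate, 6), round(articulation_rate, 6),
      round(mcp_ratio, 6), mean_length_of_run)
```

Under these assumptions the four values round to 3.474903, 4.245283, 0.111111 and 4.5, matching the first row of the output.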


The following code block saves the DataFrame.

In [16]:
save_path = DATA_DIR / "WoZ_Interview/09_UF_Measures/uf_measures_manu_pruned.csv"

df_measures.to_csv(save_path, index=False)