11/25 (Tue)

---

# Converting 1st Read Aloud Weba Files to Wav Format

This notebook converts the 1st read-aloud weba files, which were recorded through Gorilla, to wav format for further processing.
The 1st read-aloud consists of audio recordings of the following paragraph from Chinua Achebe's "Things Fall Apart":
> He stretched himself and scratched his thigh where a mosquito had bitten him while he slept. Another one was wailing near his ear. He slapped the ear and hoped he had killed it. "Why do they always go for one's ears?" When he was a child, his mother had told him a story about it. Mosquito, she had said, had asked Ear to marry him, whereupon Ear fell on the floor in uncontrollable laughter. "How much longer do you think you will live?" she asked. "You are already a skeleton!" Mosquito went away humiliated; and anytime he passed her way, he told Ear that he was still alive. (Achebe 1958:53)

The following code cell imports the necessary libraries and the project's path constants, and sets the name of the Gorilla task to look for.

In [1]:
from pathlib import Path

import pandas as pd

from l2speech_ree_group_proj import GORILLA_RAW_DATA_DIR, PROCESSED_DATA_DIR
from l2speech_ree_group_proj.weba_2_wav import weba_2_wav

TARGET_TASK_NAME = "Read Aloud Recoding"

The following code cell defines a function that reads the Gorilla export CSVs and returns the expected paths of the 1st read-aloud weba files in the uploads directory.

In [2]:
def identify_1st_read_aloud_weba_files() -> list[Path]:
    """Return the expected upload paths of the 1st read-aloud weba files."""
    target_weba_files_str: list[str] = []

    for csv_path in GORILLA_RAW_DATA_DIR.glob("*.csv"):
        df_data_exp = pd.read_csv(csv_path)

        # Skip exports that belong to a different Gorilla task.
        task_name = df_data_exp["Task Name"].iloc[0]
        if task_name != TARGET_TASK_NAME:
            continue

        # Keep Task 1 rows whose response is a bare .weba filename, not a URL.
        # na=False makes rows with a missing response count as non-matching.
        task_1_mask = df_data_exp["display"] == "Task 1"
        weba_mask = df_data_exp["Response"].str.endswith(".weba", na=False)
        non_url_mask = ~df_data_exp["Response"].str.startswith("http", na=False)

        target_weba_files_str += df_data_exp.loc[
            task_1_mask & weba_mask & non_url_mask, "Response"
        ].tolist()

    # Gorilla stores the uploaded recordings in the "uploads" subdirectory.
    return [GORILLA_RAW_DATA_DIR / "uploads" / weba_file for weba_file in target_weba_files_str]
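
To make the row selection above easier to follow, here is a minimal sketch that applies the same three masks to a toy DataFrame. The rows and filenames are invented for illustration and do not come from the actual Gorilla export; only the column names `display` and `Response` are taken from the function above.

```python
import pandas as pd

# Toy stand-in for one Gorilla task export (invented rows, for illustration only).
df = pd.DataFrame(
    {
        "display": ["Task 1", "Task 1", "Task 2", "Task 1", "Task 1"],
        "Response": [
            "a-readaloud1.weba",                              # wanted
            "https://example.com/uploads/a-readaloud1.weba",  # URL duplicate
            "b-readaloud2.weba",                              # different task display
            None,                                             # missing response
            "audio recording started",                        # not a weba file
        ],
    }
)

task_1_mask = df["display"] == "Task 1"
# na=False makes missing responses count as non-matching instead of NaN.
weba_mask = df["Response"].str.endswith(".weba", na=False)
non_url_mask = ~df["Response"].str.startswith("http", na=False)

selected = df.loc[task_1_mask & weba_mask & non_url_mask, "Response"].tolist()
print(selected)  # ['a-readaloud1.weba']
```

Only the first row passes all three masks: the URL row fails `non_url_mask`, the "Task 2" row fails `task_1_mask`, and the remaining two rows fail `weba_mask`.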

The following code cell performs the conversion from weba to wav format.

In [3]:
target_weba_files_path = identify_1st_read_aloud_weba_files()

for weba_path in target_weba_files_path:
    if not weba_path.exists():
        print(f"Warning: {weba_path.name} does not exist. Skipping...")
        continue

    wav_path = PROCESSED_DATA_DIR / f"{weba_path.stem}.wav"
    weba_2_wav(weba_path, wav_path)

/Users/ryuki/Development/l2speech-ree-group-proj/data/processed/248922-1-14554303-task-v7ri-50699629-readaloud1-7-1.wav already exists. Skipping conversion.
/Users/ryuki/Development/l2speech-ree-group-proj/data/processed/248922-1-14564746-task-v7ri-50724318-readaloud1-7-1.wav already exists. Skipping conversion.
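
The implementation of `weba_2_wav` is not shown in this notebook. The "already exists. Skipping conversion." messages in the output above suggest that it leaves existing wav files untouched. The following is a minimal sketch of how such a helper could look, assuming the conversion is delegated to the `ffmpeg` command-line tool. The function names `build_ffmpeg_command` and `convert_weba_to_wav_sketch` and the 16 kHz mono output settings are hypothetical and may differ from the project's actual code.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(weba_path: Path, wav_path: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that decodes a weba file to a mono wav file.

    The sample rate and channel count are assumptions made for this sketch.
    """
    return [
        "ffmpeg",
        "-n",                     # never overwrite an existing output file
        "-i", str(weba_path),     # input file
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        str(wav_path),
    ]


def convert_weba_to_wav_sketch(weba_path: Path, wav_path: Path) -> bool:
    """Convert one file; return False if the output already exists."""
    if wav_path.exists():
        print(f"{wav_path} already exists. Skipping conversion.")
        return False

    wav_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(weba_path, wav_path), check=True)
    return True
```

Separating the command construction from the `subprocess.run` call keeps the argument list easy to inspect without having ffmpeg installed.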
