Open source code and data for the Who Said What (WSW) framework.
ICDL 2025 Paper: Who Said What (WSW 2.0)? Enhanced Automated Analysis of Preschool Classroom Speech
IEEE ICDL 2025: https://ieeexplore.ieee.org/document/11204438
ICDL 2024 Paper: Who Said What? An Automated Approach to Analyzing Speech in Preschool Classrooms
arXiv: https://arxiv.org/abs/2401.07342 | IEEE ICDL 2024: https://ieeexplore.ieee.org/document/10644508
This project processes and transcribes audio files using OpenAI's Whisper model and performs speaker classification using Egocentric Speaker Classification. It supports splitting audio into smaller segments, overlapping them, and transcribing the content.
- Load audio files from a specified directory (supports `.wav` and `.mp3` formats).
- Process audio by splitting it into segments.
- Optionally overlap segments to avoid data loss at boundaries.
- Use different Whisper model sizes (`large-v3`, `large-v2`) for transcription.
- Save transcription results for further analysis.
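The overlapping split can be sketched as follows. This is a minimal illustration; `segment_bounds` and its parameters are assumptions, not the script's actual API:

```python
# Sketch: computing overlapping segment boundaries. Illustrative only;
# the function and parameter names are assumptions, not WhisperTrans.py's API.

def segment_bounds(total_sec, cut_minutes=2.0, overlap_minutes=0.0):
    """Return (start, end) times in seconds for each audio segment.

    Consecutive segments advance by (cut - overlap) minutes, so each
    segment repeats the last `overlap_minutes` of its predecessor.
    """
    cut = cut_minutes * 60
    step = (cut_minutes - overlap_minutes) * 60
    bounds, start = [], 0.0
    while start < total_sec:
        bounds.append((start, min(start + cut, total_sec)))
        start += step
    return bounds
```

For a 5-minute file with `cut_minutes=2` and `overlap_minutes=0.5`, this yields four segments starting at 0, 90, 180, and 270 seconds.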
Before running the script, ensure you have installed all required dependencies by setting up the Conda environment from the provided `whisper_env.yml` file.
1. Create the environment using `whisper_env.yml`: `conda env create -f whisper_env.yml`
2. Activate the environment: `conda activate whisper_env`
You can run the script directly from the command line. Below are the available options and their descriptions:
`python WhisperTrans.py [audio_file_pth] [output_file_pth] [--cut_minutes CUT_MINUTES] [--overlap_minutes OVERLAP_MINUTES] [--model_name MODEL_NAME]`

- `audio_file_pth` (str): The directory containing the audio files to process.
- `output_file_pth` (str): The directory where the transcription results will be saved.
- `--cut_minutes` (float, optional): Length of each audio segment in minutes (default: 2).
- `--overlap_minutes` (float, optional): Overlap duration between consecutive segments in minutes (default: 0).
- `--model_name` (str, optional): Whisper model to use (`large-v3` or `large-v2`; default: `large-v3`).
To process audio files in the audio/ directory, split them into 2-minute segments with 30 seconds of overlap, and save the results to the output/ directory:
`python WhisperTrans.py audio/ output/ --cut_minutes 2 --overlap_minutes 0.5 --model_name large-v3`

The script saves the resulting transcription list to the output file using Python's `pickle` module.
The main function processes the audio files: it loads the specified Whisper model, splits each audio file into segments, and transcribes every segment.
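The flow above can be sketched as follows, assuming the `openai-whisper` package. The function name, result layout, and `model` parameter are illustrative assumptions, and segment splitting is omitted for brevity:

```python
# Minimal sketch of the transcription flow. Assumes the openai-whisper
# package; the function name, result layout, and `model` injection are
# illustrative, and segment splitting is omitted for brevity.
import os
import pickle

def transcribe_dir(audio_dir, output_pth, model=None, model_name="large-v3"):
    """Transcribe every .wav/.mp3 file in audio_dir and pickle the results."""
    if model is None:
        import whisper                      # openai-whisper package
        model = whisper.load_model(model_name)
    results = []
    for fname in sorted(os.listdir(audio_dir)):
        if fname.endswith((".wav", ".mp3")):
            out = model.transcribe(os.path.join(audio_dir, fname))
            results.append({"file": fname,
                            "text": out["text"],
                            "segments": out["segments"]})
    with open(output_pth, "wb") as f:       # results saved with pickle
        pickle.dump(results, f)
    return results
```

Passing `model` explicitly makes the loop testable without downloading a Whisper checkpoint.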
Follow the instructions at: https://github.com/orasanen/ALICE
AliceWhisperAlign.ipynb - Diarization and Speech Alignment
This notebook processes and aligns audio diarization data (RTTM) with Automated Speech Transcription (AST) data. The goal is to identify the speaker classification for each segment in the AST file based on the overlap with the RTTM data.
- Speaker Diarization Alignment: Reads RTTM files to identify speaker classes (e.g., child, male, female).
- AST Processing: Reads AST files and assigns speaker labels to each segment based on RTTM overlap.
- Data Cleaning: The notebook includes functionality to clean and filter overlapping or unwanted segments.
- Saving Results: The cleaned and updated AST data is saved to CSV files for further analysis.
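For reference, RTTM is a whitespace-delimited format whose eighth field carries the speaker label. A minimal reader might look like this (the column labels are my own naming, not necessarily the notebook's):

```python
# Sketch: loading an RTTM diarization file into a pandas DataFrame.
# The ten-column layout is the standard RTTM format; the column labels
# below are my own naming, not necessarily the notebook's.
import pandas as pd

RTTM_COLS = ["type", "file", "channel", "onset", "duration",
             "ortho", "stype", "speaker", "conf", "slat"]

def read_rttm(path):
    df = pd.read_csv(path, sep=r"\s+", names=RTTM_COLS, header=None)
    df["offset"] = df["onset"] + df["duration"]   # segment end time in seconds
    return df[["speaker", "onset", "offset"]]
```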
1. RTTM File Processing: The RTTM file is read, and speaker classifications (such as `CHI`, `FEM`, `KCHI`, `MAL`) are extracted. The start and end times of each speaker's segments are stored in a DataFrame.
2. AST File Processing: AST files are read, and for each segment an attempt is made to find the corresponding speaker classification based on the overlap with the RTTM data.
3. Overlap Calculation: For each AST segment, the overlap with RTTM segments is calculated, and the speaker with the maximum overlap is assigned to that AST segment.
4. Cleaning the AST Data: Overlapping or duplicate segments are removed from the AST data.
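The maximum-overlap assignment in step 3 can be sketched as follows (the function names and data shapes are illustrative assumptions):

```python
# Sketch: assigning each AST segment the RTTM speaker with maximal
# temporal overlap. Function names and data shapes are illustrative.

def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speaker(ast_seg, rttm_segs):
    """ast_seg: (start, end); rttm_segs: list of (speaker, onset, offset).

    Returns the speaker label with the largest overlap, or None when no
    RTTM segment overlaps the AST segment at all.
    """
    best, best_ov = None, 0.0
    for spk, onset, offset in rttm_segs:
        ov = overlap(ast_seg[0], ast_seg[1], onset, offset)
        if ov > best_ov:
            best, best_ov = spk, ov
    return best
```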
After setting up the environment, run the notebook to align AST and RTTM data. It will produce CSV outputs containing AST segments annotated with the corresponding speaker classification.
The notebook saves the cleaned and updated AST data in the `StarFish_<Date>_SyncAW` directory, naming the files `Sync_<AST_File_Name>_AW.csv`.
Follow the instructions and scripts in the `ReliabilityAna` subfolder.