# DecVAE Tutorial: VOC_ALS Dataset

Complete workflow example for the VOC_ALS dataset.

In [1]:
# Import necessary libraries
import os
import json
from pathlib import Path

# Set the working directory to the DecVAE root.
# If the notebook runs from the 'examples' folder, move one level up;
# otherwise adjust this path to your local DecVAE directory.
cwd = Path.cwd()
DECVAE_ROOT = cwd.parent if cwd.name == 'examples' else cwd
os.chdir(DECVAE_ROOT)
print(f"Working directory: {os.getcwd()}")

Working directory: c:\Users\Dell\Files\DecVAE


## 1. Prepare VOC_ALS Dataset

The VOC_ALS dataset contains voice recordings from individuals with ALS (Amyotrophic Lateral Sclerosis) and healthy controls.

Download the dataset from https://www.synapse.org/Synapse:syn53009474/wiki/624730 and place it in "../VOC-ALS", i.e. in a folder next to the DecVAE project directory.
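
Before running the preparation script, it can help to confirm that the download landed where the script expects it. The sketch below assumes the layout implied by the script's log output, namely a VOC-ALS.xlsx metadata file directly inside ../VOC-ALS. The helper name check_voc_als_layout is hypothetical and not part of DecVAE.

```python
from pathlib import Path


def check_voc_als_layout(dataset_dir):
    """Return a list of human-readable problems; empty if the layout looks usable."""
    root = Path(dataset_dir)
    problems = []
    if not root.is_dir():
        problems.append(f"dataset directory not found: {root}")
        return problems
    # Assumption: the metadata spreadsheet sits at the top level of the dataset folder.
    if not (root / "VOC-ALS.xlsx").is_file():
        problems.append(f"metadata file not found: {root / 'VOC-ALS.xlsx'}")
    return problems


for problem in check_voc_als_layout("../VOC-ALS"):
    print("WARNING:", problem)
```

If nothing is printed, the folder and the metadata file were both found.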

Then execute the data preparation script:

In [3]:
!python scripts/misc/voc_als_prep.py

Added c:\Users\Dell\Files\DecVAE to Python path
Reading metadata from ../VOC-ALS\VOC-ALS.xlsx...
Encoded ALSFRS-R_TotalScore into 8 intervals
Encoded DiseaseDuration into 5 intervals
Encoded KingClinicalStage into values 0-6
Encoded Cantagallo_Questionnaire into 5 intervals
Added phoneme encoding map (A,E,I,O,U,KA,PA,TA)
Added category encoding map (HC,ALS)
Added speaker_id encoding map for 153 speakers
Succesfully encoded clinical variables
Saved encoding maps to ./vocabularies\voc_als_encodings.json
Scanning directories for audio files...
Found 1224 audio files across 153 subjects
Subject distribution: 102 patients, 51 controls
Organizing data by subject...
  Part 1: 39 subjects
Saving part 1 of processed data to ../VOC-ALS_preprocessed\voc_als_data_part1.json.gz...
Processed data saved to ../VOC-ALS_preprocessed\voc_als_data_part1.json.gz
  Part 2: 38 subjects
Saving part 2 of processed data to ../VOC-ALS_preprocessed\voc_als_data_part2.json.gz...
Processed data saved to ../VOC-ALS_


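
To confirm that the preparation step produced usable files, the outputs named in the log can be opened and summarized. This is a minimal sketch that assumes the files are UTF-8 JSON (gzip-compressed where the name ends in .gz), as their extensions suggest; it makes no assumption about the schema and only reports the top-level shape. The helper names load_json and summarize are hypothetical.

```python
import gzip
import json
from pathlib import Path


def load_json(path):
    """Load a JSON file, transparently handling a .gz suffix."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def summarize(obj):
    """Describe the top-level shape of a loaded JSON object without assuming a schema."""
    if isinstance(obj, dict):
        return f"dict with {len(obj)} keys, e.g. {sorted(obj)[:5]}"
    if isinstance(obj, list):
        return f"list with {len(obj)} items"
    return type(obj).__name__


# Paths taken from the log output above, relative to the DecVAE root.
for candidate in (
    Path("vocabularies/voc_als_encodings.json"),
    Path("../VOC-ALS_preprocessed/voc_als_data_part1.json.gz"),
):
    if candidate.is_file():
        print(candidate, "->", summarize(load_json(candidate)))
    else:
        print(candidate, "-> not found (has the prep script been run?)")
```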

## 2. Input Visualization

We visualize the raw audio signal (X) and the components obtained by decomposing it. The plots cover both the individual components (OC1, OC2, ..., OCn) and aggregated representations, e.g. the concatenation of the original signal with all components, [X, OC1, OC2, ..., OCn]. The representations are colored either by the frequency correspondence of the inputs or by generative factors (phoneme, speaker, disease characteristics).

For the VOC_ALS dataset, we will visualize the inputs to all models.
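
As a toy illustration of the aggregated representation [X, OC1, ..., OCn], the sketch below concatenates X and its components frame by frame along the feature axis. The axis choice, the data layout (a list of frames, each a list of features) and the function name aggregate_frames are assumptions made for illustration; the visualization script may build its aggregates differently.

```python
def aggregate_frames(x, components):
    """Concatenate X and each component frame by frame along the feature axis."""
    n_frames = len(x)
    for oc in components:
        if len(oc) != n_frames:
            raise ValueError("all components must have the same number of frames as X")
    return [
        list(x[t]) + [v for oc in components for v in oc[t]]
        for t in range(n_frames)
    ]


# Toy data: 2 frames with 3 features each, and two components of the same shape.
x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
oc1 = [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]
oc2 = [[0.5, 1.0, 1.5], [2.0, 2.5, 3.0]]

agg = aggregate_frames(x, [oc1, oc2])
print(len(agg), len(agg[0]))  # 2 frames, 9 features per frame
```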

Frame-level:

In [None]:
# Visualize frame-level inputs
!accelerate launch scripts/visualize/low_dim_vis_input.py \
    --config_file config_files/input_visualizations/config_visualizing_input_frames_voc_als.json

Sequence-level:

In [None]:
# Visualize sequence-level inputs
!accelerate launch scripts/visualize/low_dim_vis_input.py \
    --config_file config_files/input_visualizations/config_visualizing_input_sequences_voc_als.json

## 3. Decompose the VOC_ALS Dataset

We will not pre-train a model on VOC_ALS; instead, we use models pre-trained on SimVowels or TIMIT.

We still need to run the pre-training script to obtain the decomposed data. If the input visualizations (Section 2) have already been generated, this step can be skipped.

Single-GPU: use the --gpu_ids argument to specify the id of the GPU (0, 1, 2, ...), i.e. accelerate launch --gpu_ids <id> scripts... . Alternatively, omit this argument to use the default GPU of your system (as below).

In [None]:
# Run the pre-training script on a single GPU (produces the decomposed data)
!accelerate launch scripts/pre-training/base_models_ssl_pretraining.py \
    --config_file config_files/DecVAEs/voc_als/pre-training/config_pretraining_voc_als_NoC4.json

Multi-GPU (specify GPU IDs):

In [None]:
# Run the pre-training script on multiple GPUs (e.g., GPU 0 and 1)
# Uncomment and modify as needed:
# !accelerate launch --gpu_ids 0,1 scripts/pre-training/base_models_ssl_pretraining.py \
#     --config_file config_files/DecVAEs/voc_als/pre-training/config_pretraining_voc_als_NoC4.json
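
The single-GPU and multi-GPU launches above differ only in the --gpu_ids argument, so the command can be assembled programmatically. The following is a sketch under that assumption; build_launch_command is a hypothetical helper, not part of DecVAE or accelerate, and it only builds and prints the command rather than running it.

```python
import shlex


def build_launch_command(script, config_file, gpu_ids=None):
    """Return the argv list for `accelerate launch`, optionally pinning GPUs."""
    cmd = ["accelerate", "launch"]
    if gpu_ids:
        # --gpu_ids takes a comma-separated list, e.g. "0" or "0,1".
        cmd += ["--gpu_ids", ",".join(str(i) for i in gpu_ids)]
    cmd += [script, "--config_file", config_file]
    return cmd


SCRIPT = "scripts/pre-training/base_models_ssl_pretraining.py"
CONFIG = "config_files/DecVAEs/voc_als/pre-training/config_pretraining_voc_als_NoC4.json"

# Default GPU: no --gpu_ids argument.
print(shlex.join(build_launch_command(SCRIPT, CONFIG)))
# GPUs 0 and 1.
print(shlex.join(build_launch_command(SCRIPT, CONFIG, gpu_ids=[0, 1])))
```

To execute a command instead of printing it, pass the returned list to subprocess.run(..., check=True).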

View configuration:

In [None]:
import json

with open("config_files/DecVAEs/voc_als/pre-training/config_pretraining_voc_als_NoC4.json", 'r') as f:
    config = json.load(f)

print(json.dumps(config, indent=2))

## 4. Latent Evaluation

In [None]:
# Evaluate latent representations
!accelerate launch scripts/post-training/latents_post_analysis.py \
    --config_file config_files/DecVAEs/voc_als/latent_evaluations/config_latent_anal_voc_als.json

## 5. Latent Visualization

Frame-level:

In [None]:
# Visualize frame-level latent representations
!accelerate launch scripts/visualize/low_dim_vis_latents.py \
    --config_file config_files/DecVAEs/voc_als/latent_visualizations/config_latent_frames_visualization_voc_als.json

Sequence-level:

In [None]:
# Visualize sequence-level latent representations
!accelerate launch scripts/visualize/low_dim_vis_latents.py \
    --config_file config_files/DecVAEs/voc_als/latent_visualizations/config_latent_sequences_visualization_voc_als.json

## 6. Latent Traversals

Perform traversal analysis:

In [None]:
# Perform latent traversal analysis
!accelerate launch scripts/latent_response_analysis/latent_traversal_analysis.py \
    --config_file config_files/DecVAEs/voc_als/latent_traversals/config_latent_traversals_voc_als.json
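
Each step in this tutorial depends on a JSON config file, so a quick existence check from the DecVAE root can catch path problems before a long run starts. The sketch below lists the config paths used above; the helper name missing_files is hypothetical.

```python
from pathlib import Path

# Config files referenced in this tutorial, relative to the DecVAE root.
CONFIG_FILES = [
    "config_files/input_visualizations/config_visualizing_input_frames_voc_als.json",
    "config_files/input_visualizations/config_visualizing_input_sequences_voc_als.json",
    "config_files/DecVAEs/voc_als/pre-training/config_pretraining_voc_als_NoC4.json",
    "config_files/DecVAEs/voc_als/latent_evaluations/config_latent_anal_voc_als.json",
    "config_files/DecVAEs/voc_als/latent_visualizations/config_latent_frames_visualization_voc_als.json",
    "config_files/DecVAEs/voc_als/latent_visualizations/config_latent_sequences_visualization_voc_als.json",
    "config_files/DecVAEs/voc_als/latent_traversals/config_latent_traversals_voc_als.json",
]


def missing_files(paths, root="."):
    """Return the subset of paths that do not exist as files under root."""
    return [p for p in paths if not (Path(root) / p).is_file()]


missing = missing_files(CONFIG_FILES)
print(f"{len(CONFIG_FILES) - len(missing)} of {len(CONFIG_FILES)} config files found")
for p in missing:
    print("missing:", p)
```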