# Style-Preserving Speech-to-Speech Translation Experiment

This notebook runs an experiment to determine the minimum duration of reference audio needed for a speaker embedding to clone a speaker's voice effectively across languages.

## 1. Setup Environment
Install necessary dependencies if running on Google Colab.

In [None]:
# Only if needed: remove every file and folder in the current directory
# except "experiment.ipynb". This is destructive and cannot be undone.
import os
import shutil

for fname in os.listdir():
    if fname == "experiment.ipynb":
        continue
    # rmtree refuses symlinks, so only use it for real directories
    if os.path.isdir(fname) and not os.path.islink(fname):
        shutil.rmtree(fname)
    else:
        os.remove(fname)

In [59]:
# Cell to refresh code from GitHub
import os

# Navigate to the repo directory
if os.path.exists("CS479-SpeakerEmbeddings"):
    os.chdir("CS479-SpeakerEmbeddings")
    !git pull
else:
    !git clone https://github.com/NathanAsayDong/CS479-SpeakerEmbeddings.git
    os.chdir("CS479-SpeakerEmbeddings")

# Optional: Reload modules if you've already imported them
import sys
import importlib

# List of your custom modules to reload
modules_to_reload = [
    "common_voice_dataset",
    "setup_experiment",
    "run_experiment",
    "asr_service",
    "translation_service",
    "tts_service",
    "embedding_service",
    "synthetic_data_service",
    "enums"
]

for module_name in modules_to_reload:
    if module_name in sys.modules:
        importlib.reload(sys.modules[module_name])
        print(f"Reloaded {module_name}")

Cloning into 'CS479-SpeakerEmbeddings'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 48 (delta 13), reused 45 (delta 10), pack-reused 0 (from 0)
Receiving objects: 100% (48/48), 568.24 KiB | 14.95 MiB/s, done.
Resolving deltas: 100% (13/13), done.
Reloaded common_voice_dataset
Reloaded setup_experiment
Reloaded run_experiment
Reloaded asr_service
Reloaded translation_service
Reloaded tts_service
Reloaded embedding_service
Reloaded synthetic_data_service
Reloaded enums


In [None]:
# To refresh the GitHub repo in Colab: remove the old directory and re-clone
import shutil, os
# Move up one level, out of the current directory
%cd ..
!ls
# repo_dir = "CS479-SpeakerEmbeddings"
# if os.path.exists(repo_dir):
#     shutil.rmtree(repo_dir)
# !git clone https://github.com/NathanAsayDong/CS479-SpeakerEmbeddings.git
# %cd CS479-SpeakerEmbeddings
# !ls

Cloning into 'CS479-SpeakerEmbeddings'...
remote: Enumerating objects: 44, done.
remote: Counting objects: 100% (44/44), done.
remote: Compressing objects: 100% (32/32), done.
remote: Total 44 (delta 10), reused 44 (delta 10), pack-reused 0 (from 0)
Receiving objects: 100% (44/44), 564.29 KiB | 13.76 MiB/s, done.
Resolving deltas: 100% (10/10), done.
/content/CS479-SpeakerEmbeddings
asr_service.py		 peoples_speech_dataset.py  setup_experiment.py
common_voice_dataset.py  ProjectOutline.pdf	    synthetic_data_service.py
embedding_service.py	 __pycache__		    tmp_model
enums.py		 readMe			    translation_service.py
experiment.ipynb	 requirements.txt	    tts_service.py
libri_speech_dataset.py  run_experiment.py
main.py			 Samples


In [None]:
# !pip install torch transformers speechbrain soundfile librosa openai-whisper accelerate sentencepiece pydantic torchcodec datasets kagglehub[pandas-datasets]
# !pip install sounddevice
# !sudo apt-get install libportaudio2

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libportaudio2 is already the newest version (19.6.0-1.1).
0 upgraded, 0 newly installed, 0 to remove and 41 not upgraded.


## 2. Import Modules
Import the experiment setup and runner classes.

In [52]:
import os
import sys

# Add the current directory to the import path if it is not already there
if os.getcwd() not in sys.path:
    sys.path.append(os.getcwd())

from enums import Language
from setup_experiment import ExperimentSetup
from run_experiment import ExperimentRunner

## 3. Configure Experiment
Define the parameters for the experiment: source/target languages and reference durations to test.

In [53]:
SOURCE_LANG = Language.ENGLISH
TARGET_LANG = Language.SPANISH
DURATIONS = [5.0, 10.0, 15.0, 20.0, 30.0]  # Reference audio durations, in seconds
NUM_SPEAKERS = 5  # Number of unique speakers to test
SEED = 42  # Seed passed to ExperimentSetup

## 4. Prepare Data
This step:
1. Downloads or loads the Common Voice dataset via KaggleHub.
2. Selects `NUM_SPEAKERS` speakers with sufficient data.
3. Creates a concatenated reference audio file for each speaker and duration.
4. Generates a manifest for the experiment run.
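
One way to picture step 3 is a greedy selection that takes a speaker's clips in order until their total length reaches the target duration. The sketch below is illustrative only: `select_clips_for_duration` is a hypothetical helper, not a function from the repository, and `ExperimentSetup.prepare_data` may choose or trim clips differently. It works on clip lengths in seconds rather than on audio samples.

```python
from typing import List, Tuple


def select_clips_for_duration(
    clip_durations: List[float], target_seconds: float
) -> Tuple[List[int], float]:
    """Pick clips in order until their combined length reaches the target.

    Returns the chosen clip indices and their total length in seconds.
    Hypothetical helper for illustration; trimming the concatenated
    audio down to exactly `target_seconds` is left out.
    """
    chosen: List[int] = []
    total = 0.0
    for index, seconds in enumerate(clip_durations):
        if total >= target_seconds:
            break
        chosen.append(index)
        total += seconds
    if total < target_seconds:
        # The speaker does not have enough audio for this duration.
        raise ValueError(
            f"only {total:.1f}s available, need {target_seconds:.1f}s"
        )
    return chosen, total
```

A speaker for whom this raises `ValueError` at the longest duration in `DURATIONS` would not count as having "sufficient data" in the sense of step 2.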

In [None]:
!ls

/content/CS479-SpeakerEmbeddings
asr_service.py		 libri_speech_dataset.py    run_experiment.py
common_voice_dataset.py  main.py		    Samples
CS479-SpeakerEmbeddings  peoples_speech_dataset.py  setup_experiment.py
embedding_service.py	 ProjectOutline.pdf	    synthetic_data_service.py
enums.py		 __pycache__		    tmp_model
experiment_data		 readMe			    translation_service.py
experiment.ipynb	 requirements.txt	    tts_service.py


In [56]:
setup = ExperimentSetup(
    source_language=SOURCE_LANG,
    target_language=TARGET_LANG,
    reference_durations=DURATIONS,
    seed=SEED
)

# Prepare the manifest
manifest = setup.prepare_data(num_speakers=NUM_SPEAKERS)

print(f"Manifest ready with {len(manifest)} speakers.")
print("Sample Item:", manifest[0] if manifest else "No data")

Preparing experiment data for 5 speakers...
Loading Common Voice dataset for language 'en'...
Using Colab cache for faster access to the 'common-voice' dataset.
Dataset path: /kaggle/input/common-voice
Searching for language 'en' in /kaggle/input/common-voice
Found flattened dataset structure at /kaggle/input/common-voice
Loaded 4076 records for en/dev


KeyError: 'client_id'
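
The `KeyError: 'client_id'` above suggests that the loaded records do not contain a `client_id` field, which the setup code apparently uses to identify speakers. A minimal sketch of a check that fails earlier and reports which columns are actually available is shown below. Both function names are hypothetical and are not part of the repository; the check assumes the loader can expose its column names as a list of strings.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names that are absent, in the order given."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def require_columns(available: Iterable[str], required: Iterable[str]) -> None:
    """Raise a descriptive KeyError if any required column is missing."""
    available_list = list(available)
    missing = missing_columns(available_list, required)
    if missing:
        raise KeyError(
            f"missing column(s) {missing}; "
            f"available columns: {sorted(available_list)}"
        )
```

Called right after the records are loaded, for example as `require_columns(column_names, ["client_id"])`, this would show whether the dataset uses a different speaker identifier or has none at all.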

## 5. Run Experiment
Execute the pipeline for each speaker and duration:
1. Extract the ground-truth embedding from the original speaker's audio.
2. Translate the source text to Spanish.
3. Synthesize Spanish speech, using the reference audio (5s, 10s, etc.) for style.
4. Compute the cosine similarity between the ground-truth and output embeddings.
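
The score in step 4 is the standard cosine similarity, the dot product of the two embeddings divided by the product of their norms. The dependency-free sketch below shows the computation on plain sequences of floats. It is an illustration rather than the repository's implementation, which may operate on tensors and use a library routine instead.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two embedding vectors.

    The result lies in [-1, 1]; values near 1 mean the vectors point
    in nearly the same direction.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the score depends only on direction, scaling either embedding by a positive constant leaves it unchanged.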

In [None]:
runner = ExperimentRunner()
runner.run(manifest)

## 6. Analyze Results
Save and inspect the results.

In [None]:
runner.save_results("experiment_results.csv")

import pandas as pd
results_df = pd.read_csv("experiment_results.csv")

# Display average similarity score per duration
print("\nAverage Similarity Scores by Duration:")
print(results_df.groupby("duration")["similarity_score"].mean())

results_df.head(10)
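
The per-duration mean alone does not show how many runs contributed to each value or how much they vary. A minimal sketch of a fuller summary follows, assuming the results have the `duration` and `similarity_score` columns used above. `summarize_by_duration` is a hypothetical helper, not part of the repository.

```python
import pandas as pd


def summarize_by_duration(results: pd.DataFrame) -> pd.DataFrame:
    """Return the run count, mean, and standard deviation per duration."""
    return (
        results.groupby("duration")["similarity_score"]
        .agg(["count", "mean", "std"])
        .sort_index()
    )
```

In this notebook it could be called as `summarize_by_duration(results_df)`. With only a few speakers per duration, the standard deviation is a rough indicator at best.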