# Welcome to Modal notebooks!

Write Python code and collaborate in real time. Your code runs in Modal's
**serverless cloud**, and anyone in the same workspace can join.

This notebook comes with some common Python libraries installed. Run
cells with `Shift+Enter`.

In [1]:
!modal secret create kaggle-secret \
    KAGGLE_USERNAME=seifosamahosney \
    KAGGLE_KEY=YOUR_KAGGLE_KEY

Created a new secret 'kaggle-secret' with the keys 'KAGGLE_USERNAME', 'KAGGLE_KEY'

Use it in your Modal app:

@app.function(secrets=[modal.Secret.from_name("kaggle-secret")])
def some_function():
    os.getenv("KAGGLE_USERNAME")

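The snippet printed by the CLI reads the secret with `os.getenv`, which returns `None` when a key is missing. A minimal sketch of a stricter check is shown below. `require_env` is a hypothetical helper, not part of Modal, and it only assumes that the secret's keys arrive as environment variables inside the function.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, or raise if any are unset."""
    # Treat empty strings as missing too, since an empty credential is unusable.
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("KAGGLE_USERNAME", "KAGGLE_KEY")` at the top of a function would fail early instead of passing `None` on to the Kaggle CLI.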
In [2]:
%%writefile download_tts_simple.py
import modal
import os
import subprocess
import shutil
from pathlib import Path

app = modal.App("download-tts-fixed")

volume = modal.Volume.from_name("tts-dataset-storage", create_if_missing=True)

@app.function(
    image=modal.Image.debian_slim().pip_install("kaggle"),
    secrets=[modal.Secret.from_name("kaggle-secret")],
    volumes={"/data": volume},
    timeout=3600 
)
def download_to_volume():
    """Download from Kaggle and structure for Parler-TTS"""
    
    target_dir = Path("/data/tts_dataset/teacher_dataset_large_updated/voices")
    target_dir.mkdir(parents=True, exist_ok=True)
    
    temp_download_path = Path("/tmp/kaggle_data")
    temp_download_path.mkdir(exist_ok=True)
    
    print(f"📥 Downloading Kaggle dataset to {temp_download_path}...")
    
    cmd = [
        "kaggle", "datasets", "download",
        "-d", "seifosamahosney/tts-dataset",
        "-p", str(temp_download_path),
        "--unzip",
        "--force"
    ]
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    
    if result.returncode != 0:
        print(f"❌ Kaggle Error: {result.stderr}")
        return f"Download failed: {result.stderr}"

    print("✅ Download successful! Moving files to correct structure...")

    count = 0
    for file_path in temp_download_path.rglob("*.wav"):
        target_file = target_dir / file_path.name
        if not target_file.exists():
            shutil.move(str(file_path), str(target_file))
            count += 1
    
    print(f"🚚 Moved {count} .wav files to {target_dir}")
    
    check_file = target_dir / "teacher_013_cry_132851.wav"
    if check_file.exists():
        print(f"✨ Verification Success: Found {check_file}")
    else:
        print(f"⚠️ Warning: Could not find the specific file {check_file.name}")

    volume.commit()
    print("\n💾 Saved to permanent volume 'tts-dataset-storage'")
    
    return f"Data structured at {target_dir}"

@app.local_entrypoint()
def main():
    download_to_volume.remote()

Writing download_tts_simple.py
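
The loop in `download_to_volume` flattens every `.wav` file under the download directory into one folder and skips a file when its name already exists at the target. Two source files with the same name in different subfolders would therefore collapse into one without any message. A minimal sketch of a pre-check follows; `find_name_collisions` is a hypothetical helper and is not part of the script above.

```python
from collections import defaultdict
from pathlib import Path


def find_name_collisions(root: Path, pattern: str = "*.wav") -> dict[str, list[Path]]:
    """Map each file name that occurs more than once under `root` to its paths."""
    by_name: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob(pattern)):
        by_name[path.name].append(path)
    # Keep only the names that appear in more than one location.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

Running it on the temporary download directory before the move loop would show whether flattening by file name is safe for this dataset.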


In [3]:
!modal run download_tts_simple.py

✓ Initialized. View run at https://modal.com/apps/yayayoyo2331/main/ap-cTQd5Zi9eirqqDVegc8P4g
└── 🔨 Created mount /root/download_tts_simple.py
Building image im-TqUg7EmxJnwbcJT8vAoneC

=> Step 0: FROM base

=> Step 1: RUN python -m pip install kaggle

In [4]:
!modal secret create hf-secret HF_TOKEN=YOUR_TOKEN # the token needs both read and write access

Created a new secret 'hf-secret' with the key 'HF_TOKEN'

Use it in your Modal app:

@app.function(secrets=[modal.Secret.from_name("hf-secret")])
def some_function():
    os.getenv("HF_TOKEN")

In [5]:
!modal volume create tts-dataset-storage

Error: Volume 'tts-dataset-storage' already exists in environment 'main'


In [None]:
%uv pip install datasets

In [None]:
import os
from datasets import load_dataset, Audio, Dataset, Features, Value

dataset_name = "SeifElden2342532/parler-tts-dataset-format"
print(f"Loading dataset: {dataset_name}")
ds = load_dataset(dataset_name, split="train")

LOCAL_AUDIO_DIR = "./voices" 

def gen():
    for example in ds:
        filename = os.path.basename(example['audio'])
        local_path = os.path.join(LOCAL_AUDIO_DIR, filename)
        
        if not os.path.exists(local_path):
            print(f"Warning: File not found: {local_path}")
            continue
            
        yield {
            "audio": local_path,
            "text": example['text'],
            "description": example['description']
        }

features = Features({
    "audio": Audio(sampling_rate=44100), 
    "text": Value("string"),
    "description": Value("string")
})

print("Creating new dataset with embedded audio...")
new_ds = Dataset.from_generator(gen, features=features)


print("\nDIAGNOSIS CONFIRMED:")
print("Your dataset viewer shows the 'audio' column as a 'string' containing absolute paths.")
print("Because it is a 'string' and not an 'Audio' type, Hugging Face does not package the files.")
print("The training script on Modal tries to open those paths and fails because they don't exist there.")
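
The generator above points each stored path at `LOCAL_AUDIO_DIR` by file name and skips rows whose file is missing. A minimal sketch of that remapping in isolation is shown below. It uses plain dictionaries instead of a `datasets` object so it runs without downloading anything; `remap_audio_paths` and the row layout are hypothetical stand-ins for the real dataset rows.

```python
import os


def remap_audio_paths(rows, local_dir):
    """Return (found_rows, missing_filenames) after pointing 'audio' at local_dir."""
    found, missing = [], []
    for row in rows:
        # Keep only the file name; the stored directory does not exist locally.
        filename = os.path.basename(row["audio"])
        local_path = os.path.join(local_dir, filename)
        if os.path.exists(local_path):
            found.append({**row, "audio": local_path})
        else:
            missing.append(filename)
    return found, missing
```

Collecting the missing names in a list, rather than printing a warning per row, makes it easier to see how many files are absent before building the new dataset.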


In [10]:
%uv pip install datasets huggingface_hub

Using Python 3.12.6 environment at: /usr/local
Resolving dependencies...
datasets==4.5.0
huggingface-hub==0.34.4
filelock==3.13.1
numpy==2.1.2
pyarrow==23.0.0
dill==0.4.0

In [17]:
%%writefile splitting_data.py
import modal
from datasets import load_dataset, DatasetDict

DATASET_REPO_ID = "SeifElden2342532/parler-tts-dataset-format"
ORIGINAL_SPLIT_NAME = "train"
VALIDATION_SPLIT_PERCENTAGE = 0.1

image = (
    modal.Image.from_registry("python:3.11-slim")
    .pip_install("datasets", "huggingface_hub")
)

app = modal.App("hf-dataset-splitter", image=image)

@app.function(
    secrets=[modal.Secret.from_name("hf-secret")],
    timeout=3600
)
def split_and_push_dataset():
    print(f"1. Loading dataset: {DATASET_REPO_ID}")
    raw_dataset = load_dataset(DATASET_REPO_ID)
    
    if "validation" in raw_dataset:
        print("Dataset already contains a 'validation' split. Skipping.")
        return

    print(f"2. Splitting the '{ORIGINAL_SPLIT_NAME}' split...")
    split_dataset = raw_dataset[ORIGINAL_SPLIT_NAME].train_test_split(
        test_size=VALIDATION_SPLIT_PERCENTAGE, 
        seed=42
    )

    # train_test_split names the held-out part "test"; publish it as "validation".
    final_dataset_dict = DatasetDict({
        "train": split_dataset["train"],
        "validation": split_dataset["test"]
    })

    print(f"3. Pushing new splits to the Hub...")
    final_dataset_dict.push_to_hub(DATASET_REPO_ID)
    print("4. Successfully updated the dataset!")

@app.local_entrypoint()
def main():
    split_and_push_dataset.remote()



Overwriting splitting_data.py
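
`train_test_split(test_size=0.1, seed=42)` shuffles the rows with a fixed seed and holds out 10% of them. The sketch below illustrates the idea on a list of indices using only the standard library. It is not the `datasets` implementation and will not reproduce the same row assignment; `split_indices` is a hypothetical name.

```python
import math
import random


def split_indices(n_rows: int, test_fraction: float, seed: int):
    """Shuffle range(n_rows) deterministically and split off a held-out part."""
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, validation_idx = split_indices(n_rows=50, test_fraction=0.1, seed=42)
print(len(train_idx), len(validation_idx))  # 45 5
```

Because the seed is fixed, rerunning the split yields the same partition, which is what makes the published `train` and `validation` splits reproducible.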


In [18]:
!modal run splitting_data.py

✓ Initialized. View run at https://modal.com/apps/rararoro2331/main/ap-kPFAZFniiuFVmgRp8fxUZz
✓ Created objects.
├── 🔨 Created mount /root/splitting_data.py
└── 🔨 Created function split_and_push_dataset.
Running app.

In [7]:
%%writefile train_parler.py
import modal
import os
import subprocess
from pathlib import Path

GPU_CONFIG = "H100:1" 
NUM_GPUS = 1

VOLUME_NAME = "tts-dataset-storage"
MOUNT_PATH = Path("/data")
OUTPUT_DIR = MOUNT_PATH / "parler-tts-finetuned-h100-ultra-optimized"
HF_DATASET_REPO = "SeifElden2342532/parler-tts-dataset-format" 

REQUIREMENTS = [
    "torch==2.5.0",
    "torchaudio==2.5.0",
    "torchcodec==0.1",
    "accelerate",
    "datasets[audio]",
    "transformers==4.46.1",
    "pydantic==1.10.17",
    "tqdm",
    "soundfile",
    "scipy",
    "pyyaml",
    "protobuf==4.25.8",
    "wandb",
    "evaluate",
    "jiwer",
    "librosa",
    "bitsandbytes",
    "huggingface_hub",
    "parler-tts @ git+https://github.com/huggingface/parler-tts.git"
]

image = (
    modal.Image.from_registry("nvidia/cuda:12.1.1-devel-ubuntu22.04", add_python="3.11" )
    .apt_install("git", "ffmpeg", "libsndfile1") 
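    # Note: a ulimit set in a build step applies only to that build shell,
    # so it does not raise the open-file limit of the running container.
    .run_commands("ulimit -n 65536")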
    .pip_install(
        *REQUIREMENTS,
        extra_index_url="https://download.pytorch.org/whl/cu121"
     )
)

app = modal.App("parler-tts-h100-finetune-ultra-optimized", image=image)

@app.function(
    volumes={str(MOUNT_PATH): modal.Volume.from_name(VOLUME_NAME)},
    timeout=25000,
    gpu=GPU_CONFIG,
    env={"FORCE_LIBSNDFILE": "1"} 
)
def finetune_parler_tts():
    repo_path = Path("/root/parler-tts")
    if not repo_path.exists():
        print("Cloning Parler-TTS repository...")
        subprocess.run(["git", "clone", "https://github.com/huggingface/parler-tts.git", str(repo_path )], check=True)

    import training.data
    data_py_path = Path(training.data.__file__)
    
    with open(data_py_path, "r") as f:
        content = f.read()

    buggy_code = 'metadata_dataset_names = metadata_dataset_names.split("+") if metadata_dataset_names is not None else None'
    fixed_code = 'metadata_dataset_names = metadata_dataset_names.split("+") if (metadata_dataset_names is not None and isinstance(metadata_dataset_names, str)) else [None] * len(dataset_names)'
    if buggy_code in content:
        content = content.replace(buggy_code, fixed_code)

    buggy_eval_code = 'vectorized_datasets["validation"]'
    fixed_eval_code = 'vectorized_datasets["eval"]'
    if buggy_eval_code in content:
        content = content.replace(buggy_eval_code, fixed_eval_code)
    
    with open(data_py_path, "w") as f:
        f.write(content)

    training_script_path = repo_path / "training" / "run_parler_tts_training.py"
    with open(training_script_path, "r") as f:
        script_content = f.read()
    
    buggy_num_proc = 'num_proc=min(data_args.preprocessing_num_workers, len(vectorized_datasets["eval"]) - 1),'
    fixed_num_proc = 'num_proc=1,' 
    if buggy_num_proc in script_content:
        script_content = script_content.replace(buggy_num_proc, fixed_num_proc)

    with open(training_script_path, "w") as f:
        f.write(script_content)

    model_name = "parler-tts/parler-tts-mini-v1"
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    
    training_command = f"""
accelerate launch --num_processes={NUM_GPUS} training/run_parler_tts_training.py \\
    --model_name_or_path "{model_name}" \\
    --train_dataset_name "{HF_DATASET_REPO}" \\
    --train_dataset_config_name "default" \\
    --train_split_name "train" \\
    --eval_dataset_name "{HF_DATASET_REPO}" \\
    --eval_dataset_config_name "default" \\
    --eval_split_name "validation" \\
    --max_train_samples 1000 \\
    --max_eval_samples 100 \\
    --seed 42 \\
    --do_train true \\
    --do_eval true \\
    --preprocessing_num_workers 1 \\
    --evaluation_strategy "epoch" \\
    --description_column_name "description" \\
    --prompt_column_name "text" \\
    --target_audio_column_name "audio" \\
    --description_tokenizer_name "google/flan-t5-base" \\
    --prompt_tokenizer_name "google/flan-t5-base" \\
    --save_to_disk "/tmp/parler_dataset_processed" \\
    --temporary_save_to_disk "/tmp/parler_dataset_temp" \\
    --output_dir "{OUTPUT_DIR}" \\
    --overwrite_output_dir true \\
    --per_device_train_batch_size 8 \\
    --per_device_eval_batch_size 8 \\
    --gradient_accumulation_steps 2 \\
    --gradient_checkpointing true \\
    --optim "adamw_bnb_8bit" \\
    --max_steps 200 \\
    --bf16 true \\
    --report_to "none"
"""
    
    print(f"\n[STARTING] Starting ultra-optimized training on {NUM_GPUS} H100 GPU...")
    subprocess.run(training_command, shell=True, check=True, cwd=str(repo_path))
    modal.Volume.from_name(VOLUME_NAME).commit()
    print("\n[FINISHED] Fine-Tuning Complete!")

@app.local_entrypoint()
def main():
    finetune_parler_tts.remote()

Overwriting train_parler.py
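
The training function patches the installed and cloned sources by string replacement, and it silently does nothing when the target string is absent, for example after an upstream change. A minimal sketch of a stricter helper follows. `patch_file` is a hypothetical name, and the `strict` flag is an assumption about how one might want a failed patch to surface.

```python
from pathlib import Path


def patch_file(path: Path, old: str, new: str, strict: bool = True) -> bool:
    """Replace `old` with `new` in `path`; return True if the file changed."""
    content = path.read_text()
    if old not in content:
        if new in content:
            return False  # already patched, nothing to do
        if strict:
            raise ValueError(f"Pattern not found in {path}: {old[:60]!r}")
        return False
    path.write_text(content.replace(old, new))
    return True
```

With `strict=True`, a patch that no longer matches stops the run before any GPU time is spent, instead of letting training start on unpatched code.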


In [8]:
!modal run train_parler.py

✓ Initialized. View run at https://modal.com/apps/yayayoyo2331/main/ap-guEIgViVnwaldxYDHBnbeT
✓ Created objects.
├── 🔨 Created mount /root/train_parler.py
└── 🔨 Created function finetune_parler_tts.
Running app...

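The flags in `train_parler.py` determine the effective batch size and how many passes the run makes over the capped training set. The sketch below works through that arithmetic. It assumes that `--max_steps` counts optimizer updates and that all 1,000 samples selected by `--max_train_samples` survive preprocessing; neither assumption has been checked against the training script.

```python
# Values copied from the training command in train_parler.py.
per_device_batch = 8
grad_accum_steps = 2
num_gpus = 1
max_steps = 200
max_train_samples = 1000

# Samples consumed per optimizer update across all devices.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
# Total samples consumed over the whole run.
samples_seen = effective_batch * max_steps
# Approximate number of passes over the capped training set.
epochs = samples_seen / max_train_samples

print(effective_batch, samples_seen, epochs)  # 16 3200 3.2
```

Under these assumptions the run covers the 1,000-sample subset a little more than three times.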
In [9]:
%%writefile compare_models.py
import modal
import os
from pathlib import Path

GPU_CONFIG = "H100:1"
VOLUME_NAME = "tts-dataset-storage"
MOUNT_PATH = Path("/data")
FINETUNED_MODEL_PATH = MOUNT_PATH / "parler-tts-finetuned-h100-ultra-optimized"
BASE_MODEL_NAME = "parler-tts/parler-tts-mini-v1"

REQUIREMENTS = [
    "torch==2.5.0",
    "torchaudio==2.5.0",
    "transformers==4.46.1",
    "parler-tts @ git+https://github.com/huggingface/parler-tts.git",
    "soundfile",
    "scipy"
]

image = (
    modal.Image.from_registry("nvidia/cuda:12.1.1-devel-ubuntu22.04", add_python="3.11")
    .apt_install("git", "ffmpeg", "libsndfile1")
    .pip_install(*REQUIREMENTS)
)

app = modal.App("parler-tts-comparison", image=image)

@app.function(
    volumes={str(MOUNT_PATH): modal.Volume.from_name(VOLUME_NAME)},
    gpu=GPU_CONFIG,
    timeout=600
)
def run_comparison(prompt: str, description: str):
    import torch
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer
    import soundfile as sf
    import numpy as np

    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    def generate_audio(model_id, model_name_label):
        print(f"Loading {model_name_label} model...")
        model = ParlerTTSForConditionalGeneration.from_pretrained(model_id).to(device)
        tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
        description_tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

        print(f"Generating audio with {model_name_label}...")
        input_ids = description_tokenizer(description, return_tensors="pt").input_ids.to(device)
        prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

        generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
        audio_arr = generation.cpu().numpy().squeeze()
        
        filename = f"output_{model_name_label.lower().replace(' ', '_')}.wav"
        sf.write(filename, audio_arr, model.config.sampling_rate)
        
        with open(filename, "rb") as f:
            return f.read(), filename

    base_audio, base_file = generate_audio(BASE_MODEL_NAME, "Base")
    
    if not FINETUNED_MODEL_PATH.exists():
        # Return a plain string so the isinstance(results, str) check in main() catches it.
        return f"Error: Fine-tuned model not found at {FINETUNED_MODEL_PATH}. Did the training finish?"

    ft_audio, ft_file = generate_audio(str(FINETUNED_MODEL_PATH), "Fine-tuned")
    
    return {
        "base": (base_audio, base_file),
        "finetuned": (ft_audio, ft_file)
    }

@app.local_entrypoint()
def main():
    test_prompt = "Hey, did you know that Parler TTS is actually quite good at capturing emotions?"
    test_description = "A female speaker with a slightly high-pitched voice delivers her words at a fast pace with a small touch of excitement."
    
    print(f"Starting comparison inference...")
    print(f"Prompt: {test_prompt}")
    print(f"Description: {test_description}")
    
    results = run_comparison.remote(test_prompt, test_description)
    
    if isinstance(results, str):
        print(results)
    else:
        for key, (audio_data, filename) in results.items():
            with open(filename, "wb") as f:
                f.write(audio_data)
            print(f"Saved {key} result to {filename}")
        print("\nComparison complete! You can now listen to both files to hear the difference.")

Writing compare_models.py
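
The entrypoint expects `run_comparison` to return either an error string or a dictionary of `(bytes, filename)` pairs, and it branches on the type. A minimal sketch of that handling as a standalone function is shown below. `save_results` is a hypothetical name, and the output directory parameter is added purely for illustration.

```python
from pathlib import Path


def save_results(results, out_dir: Path) -> list[Path]:
    """Write each (audio_bytes, filename) pair to out_dir; raise on an error string."""
    if isinstance(results, str):
        raise RuntimeError(results)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for label, (audio_bytes, filename) in results.items():
        # Use only the file name so a remote path cannot escape out_dir.
        target = out_dir / Path(filename).name
        target.write_bytes(audio_bytes)
        written.append(target)
    return written
```

Raising on the error string, rather than printing it, makes a missing fine-tuned model show up as a failed run.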


In [10]:
!modal run compare_models

Using Python module paths will require using the -m flag in a future version of Modal.
Use `modal run -m compare_models` instead.
✓ Initialized. View run at https://modal.com/apps/yayayoyo2331/main/ap-YoShzGIhb7UCSzB0AEg0JK
Building image im-lWVIT4A21LHVjh7utu28LU