# 🇫🇮 Finnish TTS Training on NVIDIA Brev

Train a high-quality Finnish Text-to-Speech model using Fish Speech and LoRA fine-tuning.

## 📋 Before You Start

**✅ Prerequisites (already done by setup.sh):**
- Fish Speech installed at `~/fish-speech`
- Base model downloaded to `~/nvidia-brev-launchables/checkpoints/`
- PyTorch with CUDA support
- All dependencies installed

**📁 What You Need to Provide:**
- Finnish audio files (.wav, 44.1kHz, 16-bit PCM)
- Text transcripts (.lab files matching audio filenames)
- Minimum: 500 samples (1 hour of audio)
- Recommended: 2000+ samples (4+ hours)

**Upload your data to:** `~/nvidia-brev-launchables/data/FinnishSpeaker/` (the directory the cells below read from). Keep each `.lab` transcript next to its `.wav` file:
```
~/nvidia-brev-launchables/data/FinnishSpeaker/
├── speaker001_001.wav
├── speaker001_001.lab (contains: "Hyvää huomenta")
├── speaker001_002.wav
├── speaker001_002.lab (contains: "Kuinka voit?")
└── ...
```
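
Before uploading, it can help to check the data against the requirements above. The following is a minimal sketch using only the Python standard library; `check_dataset` is a hypothetical helper (not part of Fish Speech), and it assumes plain PCM `.wav` files that the `wave` module can open.

```python
import wave
from pathlib import Path


def check_dataset(audio_dir, transcript_dir=None, rate=44100, sample_width=2):
    """Return (total_seconds, problems) for .wav files and their .lab transcripts.

    sample_width is in bytes, so 2 means 16-bit PCM.
    If transcript_dir is omitted, transcripts are expected next to the audio.
    """
    audio_dir = Path(audio_dir)
    transcript_dir = Path(transcript_dir) if transcript_dir else audio_dir
    total_seconds = 0.0
    problems = []
    for wav_path in sorted(audio_dir.glob("*.wav")):
        lab_path = transcript_dir / (wav_path.stem + ".lab")
        if not lab_path.is_file():
            problems.append(f"{wav_path.name}: missing transcript {lab_path.name}")
        elif not lab_path.read_text(encoding="utf-8").strip():
            problems.append(f"{lab_path.name}: transcript is empty")
        with wave.open(str(wav_path), "rb") as wav:
            if wav.getframerate() != rate:
                problems.append(f"{wav_path.name}: {wav.getframerate()} Hz, expected {rate}")
            if wav.getsampwidth() != sample_width:
                problems.append(
                    f"{wav_path.name}: {8 * wav.getsampwidth()}-bit, expected {8 * sample_width}-bit"
                )
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds, problems
```

Calling `check_dataset("path/to/FinnishSpeaker")` returns the total duration in seconds, which can be compared with the 1-hour minimum, plus a list of human-readable problems.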

## 💰 Cost Estimate

| GPU | Time | Cost |
|-----|------|------|
| A100-80GB | ~4 hours | ~$4.80 |
| L40S | ~4 hours | ~$2.40 |

**Training saves checkpoints every 100 steps** - you can stop and resume anytime!
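
The table implies hourly rates of about $1.20 (A100-80GB) and $0.60 (L40S), obtained by dividing each cost by its 4-hour run time. A small sketch for estimating other run lengths follows; the rates are derived from the table only and actual pricing may differ, and `estimate_cost` is an illustrative name.

```python
# Hourly rates implied by the table above (cost / time); actual pricing may differ.
HOURLY_RATE_USD = {
    "A100-80GB": 4.80 / 4,
    "L40S": 2.40 / 4,
}


def estimate_cost(gpu, hours):
    """Estimate the cost in USD of running `gpu` for `hours` hours."""
    return round(HOURLY_RATE_USD[gpu] * hours, 2)
```

For example, `estimate_cost("L40S", 1.5)` gives the cost of a 90-minute session at the implied L40S rate.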

## 🚀 Let's Get Started!

Run the cells below in order. The notebook will guide you through:
1. Environment verification
2. Data preparation
3. VQ token extraction
4. Dataset packing
5. Model training (~4 hours)
6. Model export
7. Inference testing


# Finnish TTS Model Training on NVIDIA Brev

**Project:** Train Finnish TTS model using Fish Speech + LoRA  
**Dataset:** 2000 Finnish samples (cv-15 + parliament)  
**Training:** Resume from step 750 → 2000 (improves quality)

---

## Step 1: Check GPU

In [3]:
# Install pip in the venv
import sys  # needed for {sys.executable} in the shell commands below

!curl https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
!{sys.executable} /tmp/get-pip.py

# Now install PyTorch
!{sys.executable} -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2131k  100 2131k    0     0  5056k      0 --:--:-- --:--:-- --:--:-- 5062k
Looking in indexes: https://mcache-kci.massedcompute.com/simple, https://pypi.org/simple
Collecting pip
  Downloading pip-25.3-py3-none-any.whl.metadata (4.7 kB)
Downloading pip-25.3-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 33.7 MB/s 0:00:00
Installing collected packages: pip
Successfully installed pip-25.3
Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple
Collecting torch
  Downloading torch-2.9.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (30 kB)
Collecting torchvision
  Downloading torchvision-0.24.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (5.9 kB)
Collecting torchaudio
  Downloading torchaudio-2.9.1-cp312-cp312-manylinu

In [4]:
!nvidia-smi

import os
import sys

# Python does not expand ${HOME} inside string literals, so resolve the path explicitly
sys.path.insert(0, os.path.expanduser('~/fish-speech'))

import torch
print(f"\nPyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'
print(f"GPU: {gpu_name}")

# A100-80GB optimization
if 'A100' in gpu_name:
    print("\n🎯 A100-80GB detected!")
    print("Recommended settings:")
    print("  batch_size=8")
    print("  num_workers=8")

Sat Nov 29 12:14:07 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.03              Driver Version: 575.64.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100 80GB PCIe          On  |   00000000:01:00.0 Off |                    0 |
| N/A   26C    P0             42W /  300W |       0MiB /  81920MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
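
The GPU name and memory shown in the table can also be read programmatically, which avoids relying on the substring check `'A100' in gpu_name`. This is a sketch that assumes `nvidia-smi` is on the `PATH`; `parse_gpu_csv` and `query_gpus` are illustrative helper names.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, memory.total' rows from nvidia-smi CSV output (noheader, nounits)."""
    gpus = []
    for line in text.strip().splitlines():
        # Split on the last comma so that the GPU name may itself contain commas.
        name, memory_mib = [part.strip() for part in line.rsplit(",", 1)]
        gpus.append({"name": name, "memory_mib": int(memory_mib)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)
```

On the instance shown above, `query_gpus()` should report a single entry for the A100 80GB PCIe card with its total memory in MiB.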
                                                

## Step 3: Log in to Hugging Face

In [7]:
!pip install huggingface_hub

Looking in indexes: https://mcache-kci.massedcompute.com/simple, https://pypi.org/simple
Collecting huggingface_hub
  Downloading huggingface_hub-1.1.6-py3-none-any.whl.metadata (13 kB)
Collecting hf-xet<2.0.0,>=1.2.0 (from huggingface_hub)
  Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.9 kB)
Collecting shellingham (from huggingface_hub)
  Downloading shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting tqdm>=4.42.1 (from huggingface_hub)
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting typer-slim (from huggingface_hub)
  Downloading typer_slim-0.20.0-py3-none-any.whl.metadata (16 kB)
Collecting click>=8.0.0 (from typer-slim->huggingface_hub)
  Downloading click-8.3.1-py3-none-any.whl.metadata (2.6 kB)
Downloading huggingface_hub-1.1.6-py3-none-any.whl (516 kB)
Using cached hf_xet-1.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB

In [9]:
import os
from pathlib import Path
from huggingface_hub import login

# Load KEY=VALUE pairs from the .env file.
# Path.home() is used because Python does not expand ${HOME} in string literals.
env_file = Path.home() / 'nvidia-brev-launchables' / '.env'
if env_file.exists():
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith('#') and '=' in line:
            key, value = line.split('=', 1)
            os.environ[key.strip()] = value.strip()

hf_token = os.getenv('HF_TOKEN')
if not hf_token:
    # Fail early instead of calling login() without a token
    raise RuntimeError("HF_TOKEN not found! Add it to the .env file or the environment.")
print(f"HF_TOKEN found: {hf_token[:10]}...")

login(token=hf_token)
print("✅ HuggingFace login successful!")

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.


HF_TOKEN found: hf_TdbKwue...
✅ HuggingFace login successful!
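
The loop in the cell above handles plain `KEY=VALUE` lines only. A slightly more tolerant parser is sketched below; it assumes the `.env` file may also contain an `export ` prefix or quoted values, which the source does not confirm. `parse_env` and `mask` are hypothetical helpers, with `mask` meant for printing a token without exposing most of it.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks, comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def mask(secret, visible=4):
    """Keep the first `visible` characters and replace the rest with '*'."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

The result of `parse_env(env_file.read_text())` can be merged into `os.environ` with `os.environ.update(...)`.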


## Step 4: Download Base Model

In [11]:
!ls -lh ~/nvidia-brev-launchables/checkpoints/openaudio-s1-mini/

total 3.4G
-rw-r--r-- 1 shadeform shadeform 1.8G Nov 29 11:46 codec.pth
-rw-r--r-- 1 shadeform shadeform  844 Nov 29 11:46 config.json
-rw-r--r-- 1 shadeform shadeform 1.7G Nov 29 11:46 model.pth
-rw-r--r-- 1 shadeform shadeform 2.8K Nov 29 11:46 README.md
-rw-r--r-- 1 shadeform shadeform 124K Nov 29 11:46 special_tokens.json
-rw-r--r-- 1 shadeform shadeform 2.5M Nov 29 11:46 tokenizer.tiktoken


## Step 5: Load Dataset

Make sure the `finnishspeaker-2000-partial` dataset has been added in the notebook settings.

In [12]:
from pathlib import Path

# Path.home() is used because Python does not expand ${HOME} in string literals
DATA_DIR = Path.home() / 'nvidia-brev-launchables' / 'data' / 'FinnishSpeaker'

wav_files = list(DATA_DIR.glob('*.wav'))
lab_files = list(DATA_DIR.glob('*.lab'))
npy_files = list(DATA_DIR.glob('*.npy'))

print(f"📊 Dataset at: {DATA_DIR}")
print(f"  WAV files: {len(wav_files)}")
print(f"  LAB files: {len(lab_files)}")
print(f"  NPY files: {len(npy_files)}")

if len(wav_files) == 0:
    print("\n❌ No files found! Did you extract the dataset?")
elif len(npy_files) < len(wav_files):
    print(f"\n⚠️  Need to extract VQ tokens ({len(npy_files)}/{len(wav_files)})")
else:
    print("\n✅ Dataset ready for training!")

📊 Dataset at: /home/shadeform/finnish-tts-brev/data/FinnishSpeaker
  WAV files: 2000
  LAB files: 2000
  NPY files: 502

⚠️  Need to extract VQ tokens (502/2000)


## Step 6: Extract Remaining VQ Tokens (if needed)

Skip if you already have 2000 .npy files. Run only if NPY count < 2000.
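
To see exactly which clips still need tokens, the `.wav` and `.npy` file stems can be compared. This sketch assumes each token file is written next to its audio file under the same stem (`clip.wav` and `clip.npy`), which matches the counts above but is not confirmed by the source; `missing_tokens` is an illustrative name.

```python
from pathlib import Path


def missing_tokens(data_dir):
    """Return sorted stems of .wav files that have no matching .npy file."""
    data_dir = Path(data_dir)
    done = {p.stem for p in data_dir.glob("*.npy")}
    return sorted(p.stem for p in data_dir.glob("*.wav") if p.stem not in done)
```

An empty list means extraction can be skipped.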

In [16]:
!pip install -q loguru

In [17]:
!pip install hydra-core omegaconf

Looking in indexes: https://mcache-kci.massedcompute.com/simple, https://pypi.org/simple
Collecting hydra-core
  Using cached hydra_core-1.3.2-py3-none-any.whl.metadata (5.5 kB)
Collecting omegaconf
  Using cached omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)
Collecting antlr4-python3-runtime==4.9.* (from hydra-core)
  Using cached antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Using cached hydra_core-1.3.2-py3-none-any.whl (154 kB)
Using cached omegaconf-2.3.0-py3-none-any.whl (79 kB)
Building wheels for collected packages: antlr4-python3-runtime
  Building wheel for antlr4-python3-runtime (pyproject.toml) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144591 sha256=fe6f1ca2aee7e2d3477bfca98001f439ba27e0513900b967e540a942969ae50d
  Stored in d

In [20]:
!pip install -e ${HOME}/fish-speech

Looking in indexes: https://mcache-kci.massedcompute.com/simple, https://pypi.org/simple
Obtaining file:///home/shadeform/fish-speech
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting numpy<=1.26.4 (from fish-speech==0.1.0)
  Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting transformers>=4.45.2 (from fish-speech==0.1.0)
  Using cached transformers-4.57.3-py3-none-any.whl.metadata (43 kB)
Collecting datasets==2.18.0 (from fish-speech==0.1.0)
  Downloading datasets-2.18.0-py3-none-any.whl.metadata (20 kB)
Collecting lightning>=2.1.0 (from fish-speech==0.1.0)
  Using cached lightning-2.6.0-py3-none-any.whl.metadata (44 kB)
Collecting tensorboard>=2.14.1 (from fish-speech==0.1.0)
  Using cached tensorboard-2.20.0-p

In [None]:
#fish-speech/tools/vqgan/extract_vq.py

In [22]:
pip install torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

Looking in indexes: https://download.pytorch.org/whl/cu121, https://pypi.org/simple
ERROR: Could not find a version that satisfies the requirement torchaudio==2.1.0 (from versions: 2.2.0, 2.2.0+cu121, 2.2.1, 2.2.1+cu121, 2.2.2, 2.2.2+cu121, 2.3.0, 2.3.0+cu121, 2.3.1, 2.3.1+cu121, 2.4.0, 2.4.0+cu121, 2.4.1, 2.4.1+cu121, 2.5.0, 2.5.0+cu121, 2.5.1, 2.5.1+cu121, 2.6.0, 2.7.0, 2.7.1, 2.8.0, 2.9.0, 2.9.1)
ERROR: No matching distribution found for torchaudio==2.1.0
Note: you may need to restart the kernel to use updated packages.


In [None]:
# Cell: Extract VQ Tokens
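import os
import sys

# Python does not expand ${HOME} in string literals, so resolve the repo path explicitly
FISH_SPEECH_DIR = os.path.expanduser('~/fish-speech')
sys.path.insert(0, FISH_SPEECH_DIR)
os.chdir(FISH_SPEECH_DIR)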

!python tools/vqgan/extract_vq.py \
  ${HOME}/nvidia-brev-launchables/data/FinnishSpeaker \
  --checkpoint-path ${HOME}/nvidia-brev-launchables/checkpoints/openaudio-s1-mini/codec.pth \
  --num-workers 2 \
  --batch-size 4

print("\n✅ VQ extraction complete!")
!echo "Final NPY count: $(ls ${HOME}/nvidia-brev-launchables/data/FinnishSpeaker/*.npy | wc -l)"

In [8]:
print("\n✅ VQ extraction complete!")
!echo "Final NPY count: $(ls ${HOME}/nvidia-brev-launchables/data/FinnishSpeaker/*.npy | wc -l)"


✅ VQ extraction complete!
Final NPY count: 2000


## Step 7: Pack Dataset

In [11]:
!pip install -q "protobuf>=3.20.3,<5" --upgrade

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
descript-audiotools 0.7.2 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.25.8 which is incompatible.

In [12]:
# Cell: Pack Dataset to Proto Format
!python tools/llama/build_dataset.py \
  --input ${HOME}/nvidia-brev-launchables/data/FinnishSpeaker \
  --output ${HOME}/nvidia-brev-launchables/data/protos \
  --text-extension .lab \
  --num-workers 4

print("\n✅ Dataset packing complete!")
!ls -lh ${HOME}/nvidia-brev-launchables/data/protos/

2025-11-29 12:49:05.849491345 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card0/device/vendor"
Loading /home/shadeform/finnish-tts-brev/data/FinnishSpeaker: 2000it [00:00, 193121.26it/s]
Grouping /home/shadeform/finnish-tts-brev/data/FinnishSpeaker: 100%|█| 2000/2000
2025-11-29 12:49:08.549 | INFO     | __main__:task_generator_folder:46 - Found 1 groups in /home/shadeform/finnish-tts-brev/data/FinnishSpeaker, ['/home/shadeform/finnish-tts-brev/data/FinnishSpeaker']...
1it [00:01,  1.27s/it]
2025-11-29 12:49:09.509 | INFO     | __main__:main:165 - Finished writing 1 shards to /home/shadeform/finnish-tts-brev/data/protos

✅ Dataset pac

## Step 8: Resume Training (750 → 2000 steps)

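**This will:**
- Load your previous checkpoint from step 750
- Train for 1250 more steps to reach step 2000 (the command below sets `trainer.max_steps=3000` as the upper bound, with early stopping enabled)
- Take ~1.5 hours on an A100-80GB (the cell itself prints an estimate of 1-2 hours)
- Improve quality over the 750-step model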
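
The training command in this step adds an early-stopping callback with `patience=5` and `mode=min`. The rule it applies can be sketched in plain Python as follows. This is an illustration of patience-based stopping, not Lightning's implementation, and `early_stop_index` is a hypothetical name.

```python
def early_stop_index(losses, patience=5, min_delta=0.0):
    """Return the index of the check at which training would stop, or None.

    A check counts as an improvement only if the loss drops below the best
    value seen so far by more than min_delta ("min" mode).
    """
    best = float("inf")
    bad_checks = 0
    for i, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                return i
    return None
```

With `trainer.val_check_interval=100`, each entry in `losses` corresponds to one check every 100 steps, so five checks without improvement span 500 training steps.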

In [21]:
# Cell: Start LoRA Fine-tuning Training with Early Stopping
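import os
from pathlib import Path

# Load HF token from .env (Path.home() because Python does not expand ${HOME} in string literals)
with open(Path.home() / 'nvidia-brev-launchables' / '.env', 'r') as f:
    for line in f:
        if line.startswith('HF_TOKEN='):
            os.environ['HF_TOKEN'] = line.strip().split('=', 1)[1]
            break

# Note: the EarlyStopping callback configured below monitors train/loss, and the
# validation set points at the same proto files as the training set.
print("🚀 Starting training with automatic early stopping...")
print("Training will stop if validation loss doesn't improve for 5 validations")
print("Estimated time: 1-2 hours on A100-80GB\n")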

!python fish_speech/train.py \
  --config-name text2semantic_finetune \
  pretrained_ckpt_path=${HOME}/nvidia-brev-launchables/checkpoints/openaudio-s1-mini \
  train_dataset.proto_files=[${HOME}/nvidia-brev-launchables/data/protos] \
  val_dataset.proto_files=[${HOME}/nvidia-brev-launchables/data/protos] \
  project=FinnishTTS_Training \
  +lora@model.model.lora_config=r_8_alpha_16 \
  data.batch_size=8 \
  data.num_workers=8 \
  trainer.max_steps=3000 \
  trainer.val_check_interval=100 \
  trainer.accumulate_grad_batches=1 \
  +callbacks.early_stopping._target_=lightning.pytorch.callbacks.EarlyStopping \
  +callbacks.early_stopping.monitor=train/loss \
  +callbacks.early_stopping.patience=5 \
  +callbacks.early_stopping.mode=min \
  +callbacks.early_stopping.verbose=true

print("\n✅ Training complete!")
!ls -lh results/FinnishTTS_Training/checkpoints/

🚀 Starting training with automatic early stopping...
Training will stop if validation loss doesn't improve for 5 validations
Estimated time: 1-2 hours on A100-80GB

2025-11-29 13:04:49.892624544 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card0/device/vendor"
[2025-11-29 13:04:52,415][__main__][INFO] - [rank: 0] Instantiating datamodule <fish_speech.datasets.semantic.SemanticDataModule>
[2025-11-29 13:04:52,631][datasets][INFO] - PyTorch version 2.9.1 available.
[2025-11-29 13:04:53,469][__main__][INFO] - [rank: 0] Instantiating model <fish_speech.models.text2semantic.lit_module.TextToSemantic>
2025-11-29 13:04:53.480 | INFO     | fish_speech.models.text2semantic.llama:from_pretrained:416 - Override max_seq_len to 4096
2025-11-29 13:04:53.710 | INFO     | fish_speech.

## Step 9: Monitor Progress (Optional)

In [22]:
!ls -lht results/FinnishTTS_Training/checkpoints/ | head -5

total 233M
-rw------- 1 shadeform shadeform 47M Nov 29 17:28 step_000002800.ckpt
-rw------- 1 shadeform shadeform 47M Nov 29 17:18 step_000002700.ckpt
-rw------- 1 shadeform shadeform 47M Nov 29 17:09 step_000002600.ckpt
-rw------- 1 shadeform shadeform 47M Nov 29 16:59 step_000002500.ckpt
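
The newest checkpoint can also be selected by step number rather than by modification time. This sketch assumes the `step_<number>.ckpt` naming seen in the listing above; `latest_checkpoint` is an illustrative helper.

```python
import re
from pathlib import Path

STEP_PATTERN = re.compile(r"step_(\d+)\.ckpt$")


def latest_checkpoint(ckpt_dir):
    """Return (step, path) for the highest-numbered step_*.ckpt file, or None."""
    best = None
    for path in Path(ckpt_dir).glob("step_*.ckpt"):
        match = STEP_PATTERN.search(path.name)
        if match:
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, path)
    return best
```

Calling `latest_checkpoint("results/FinnishTTS_Training/checkpoints")` from the `fish-speech` directory yields the path to pass as `--lora-weight` when merging.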


## Step 10: Merge LoRA Weights
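
Merging folds the low-rank update into the base weights so that inference no longer needs the adapter. In the standard LoRA formulation the merged weight is `W + (alpha / r) * B @ A`; with the `r_8_alpha_16` config used in this notebook the scale would be 16 / 8 = 2. The sketch below illustrates that formula on plain nested lists. It assumes the standard formulation and is not the code in `merge_lora.py`; `merge_lora` here is a hypothetical function.

```python
def merge_lora(weight, lora_a, lora_b, r, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a) for plain nested lists.

    Shapes: weight is out x in, lora_b is out x r, lora_a is r x in.
    """
    scale = alpha / r
    merged = [row[:] for row in weight]  # copy so the input stays untouched
    for i in range(len(weight)):
        for j in range(len(weight[0])):
            delta = sum(lora_b[i][k] * lora_a[k][j] for k in range(len(lora_a)))
            merged[i][j] += scale * delta
    return merged
```

Because the update is added once into the base matrix, the merged model has the same parameter count and layout as the base model.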

In [23]:
# Cell 1: Merge LoRA weights (5 min)
!cd ~/fish-speech && python tools/llama/merge_lora.py \
  --lora-config r_8_alpha_16 \
  --base-weight ~/nvidia-brev-launchables/checkpoints/openaudio-s1-mini \
  --lora-weight ~/fish-speech/results/FinnishTTS_Training/checkpoints/step_000002800.ckpt \
  --output ~/finnish-merged-model

!ls -lh ~/finnish-merged-model/

2025-11-29 17:46:58.890 | INFO     | __main__:merge:23 - Merging /home/shadeform/finnish-tts-brev/checkpoints/openaudio-s1-mini and /home/shadeform/fish-speech/results/FinnishSpeaker_2000_finetune/checkpoints/step_000002800.ckpt into /home/shadeform/finnish-merged-model with r_8_alpha_16
2025-11-29 17:46:58.937 | INFO     | __main__:merge:31 - Loaded lora model with config LoraConfig(r=8, lora_alpha=16, lora_dropout=0.01)
2025-11-29 17:46:59.189 | INFO     | fish_speech.models.text2semantic.llama:from_pretrained:432 - Loading model from /home/shadeform/finnish-tts-brev/checkpoints/openaudio-s1-mini, config: DualARModelArgs(model_type='dual_ar', vocab_size=155776, n_layer=28, n_head=16, dim=1024, intermediate_size=3072, n_local_heads=8, head_dim=128, rope_base=1000000, norm_eps=1e-06, max_seq_len=8192, dropout=0.0, tie_word_embeddings=Fa

In [None]:
!ls -lh /kaggle/input/my-2000-run/fish-speech/results/FinnishTTS_Training/checkpoints/step_000001050.ckpt

In [None]:
# Check which checkpoint to use
!ls -lh /kaggle/input/my-2000-run/fish-speech/results/FinnishTTS_Training/checkpoints/

# Merge (use step_000002000.ckpt or whatever your final checkpoint is)
!python tools/llama/merge_lora.py \
  --lora-config r_8_alpha_16 \
  --base-weight checkpoints/openaudio-s1-mini \
  --lora-weight /kaggle/input/my-2000-run/fish-speech/results/FinnishTTS_Training/checkpoints/step_000001050.ckpt \
  --output checkpoints/FinnishTTS_Trainingd

!ls -lh checkpoints/FinnishTTS_Trainingd/

## Step 11: Download Model

In [None]:
# Create archive
!tar -czf FinnishSpeaker_2000_trained_v2.tar.gz checkpoints/FinnishTTS_Trainingd/

!ls -lh FinnishSpeaker_2000_trained_v2.tar.gz
print("\n✅ Download from Output tab (right sidebar) →")
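
For a download of this size, a checksum makes it easy to confirm that the archive arrived intact. The sketch below creates the archive with Python's `tarfile` module and returns its SHA-256 digest; `archive_dir` is an illustrative helper, not something the notebook already defines.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_dir(src_dir, archive_path):
    """Pack src_dir into a .tar.gz at archive_path and return its SHA-256 hex digest."""
    src_dir = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the directory name, not the full source path
        tar.add(src_dir, arcname=src_dir.name)
    digest = hashlib.sha256()
    with open(archive_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

After downloading, running `shasum -a 256` (macOS) or `sha256sum` (Linux) on the file should print the same digest.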

In [None]:
# Check training progress
!tail -20 /kaggle/input/my-2000-run/fish-speech/results/FinnishTTS_Training/train.log 2>/dev/null || echo "Log not created yet"

# Check checkpoints
!ls -lh /kaggle/input/my-2000-run/fish-speech/results/FinnishTTS_Training/checkpoints/ 2>/dev/null || echo "No checkpoints yet"

# Check GPU activity
!nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.used --format=csv

In [None]:
!tail -50 results/FinnishTTS_Training/train.log | grep -E "(Epoch|step|loss|it/s)"

---

## Summary

**Training:**
- Resumed from step 750 → completed 2000 steps
- Total: ~2.6 epochs over 2000 samples
- Should sound **much better** than 750-step model

**Testing:**
1. Download `FinnishSpeaker_2000_trained_v2.tar.gz`
2. Extract on Mac
3. Test with WebUI
4. Try settings: `temperature=0.5`, `max_new_tokens=256`

In [28]:
!cd ~/fish-speech && python tools/webui/inference.py \
  --llama-checkpoint-path ~/finnish-merged-model \
  --decoder-checkpoint-path ~/nvidia-brev-launchables/checkpoints/openaudio-s1-mini/codec.pth \
  --listen 0.0.0.0:7860

2025-11-29 17:53:44.078389381 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card0/device/vendor"


In [27]:
!ls ~/fish-speech/tools/webui/

inference.py  __init__.py  __pycache__	variables.py


In [30]:
# Generate test audio directly
!cd ~/fish-speech && python tools/llama/generate.py \
  --checkpoint-path ~/finnish-merged-model \
  --text "Hyvää päivää! Tämä on suomalainen puhesynteesi." \
  --num-samples 1 \
  --max-new-tokens 512 \
  --compile False

python: can't open file '/home/shadeform/fish-speech/tools/llama/generate.py': [Errno 2] No such file or directory


In [31]:
!ls ~/fish-speech/tools/llama/

build_dataset.py  eval_in_context.py  merge_lora.py  quantize.py


In [32]:
# Quick inference test
test_text = "Hei, kuinka voit?"

# Generate audio (basic CLI inference)
!python tools/llama/generate.py \
  --text "$test_text" \
  --checkpoint-path checkpoints/FinnishTTS_Trainingd \
  --output test_output.wav

# Play audio
from IPython.display import Audio
Audio('test_output.wav')

python: can't open file '/home/shadeform/fish-speech/tools/llama/generate.py': [Errno 2] No such file or directory


ValueError: rate must be specified when data is a numpy array or list of audio samples.

In [33]:
# Quick test inference
import torch
from fish_speech.text2semantic.inference import Text2Semantic

# Load model
model = Text2Semantic.from_pretrained(
    "checkpoints/FinnishSpeaker-finetuned",
    device="cuda"
)

# Generate
text = "Hei, kuinka voit?"
tokens = model.generate(text, max_new_tokens=128)
print(f"Generated {len(tokens)} tokens for: {text}")

ModuleNotFoundError: No module named 'fish_speech.text2semantic'