Finetuning Issue #15479

@Shaheer66

Description

I am trying to fine-tune the NVIDIA model on the OpenSLR and CORAA ASR datasets, together with some audio extracted from YouTube, but the WER is even higher than the base model's.
The fine-tuned model adds a lot of noise: the base model's average accuracy on a similar test set is 85%, but with the fine-tuned model it drops to 50%.
I have tried both my own code and the fine-tuning code available in the NeMo repo, but neither works.
My fine-tuning code is below.
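(For reference, the WER comparison above can be reproduced on a handful of transcripts without NeMo's own metric; the sketch below is a generic word-level Levenshtein WER, not NeMo's implementation, and the Portuguese strings are made-up examples.)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("o gato preto", "o gato preto"))  # 0.0
print(wer("o gato preto", "o rato"))        # 1 substitution + 1 deletion over 3 words
```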

import os
import torch
import nemo.collections.asr as nemo_asr
from omegaconf import OmegaConf, open_dict
import lightning.pytorch as pl

# --- 1. SET PATHS (Based on your setup) ---
ROOT = os.getcwd()
TRAIN_MANIFEST = os.path.join(ROOT, "train_manifest_nemo.jsonl")
VAL_MANIFEST = os.path.join(ROOT, "val_manifest_nemo.jsonl")

# --- 2. LOAD MODEL ---
print("Loading model...")
model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.from_pretrained(
    model_name="nvidia/stt_pt_fastconformer_hybrid_large_pc"
)

# --- 3. THE ERROR FIX: Patch Missing Mandatory Values ---
# This block prevents the "Missing mandatory value: dir" and "manifest_filepath" errors
with open_dict(model.cfg):
    # Fix the tokenizer directory crash
    if 'tokenizer' in model.cfg:
        model.cfg.tokenizer.dir = "" 
    
    # Fill placeholders for all data sections to satisfy OmegaConf
    for ds in ['train_ds', 'validation_ds', 'test_ds']:
        if ds in model.cfg:
            model.cfg[ds].manifest_filepath = TRAIN_MANIFEST if ds == 'train_ds' else VAL_MANIFEST
            model.cfg[ds].batch_size = 1 # Keep it 1 for CPU stability
            model.cfg[ds].num_workers = 0 
            model.cfg[ds].pin_memory = False

# --- 4. SETUP DATA ---
print("Setting up data loaders...")
model.setup_training_data(model.cfg.train_ds)
model.setup_validation_data(model.cfg.validation_ds)

# --- 5. RESULT ORIENTED: Protect the 85% Accuracy Baseline ---
# We freeze the encoder so your small pilot data doesn't "break" the model.
# This fixes the "one-word output (ela)" issue.
model.encoder.freeze()
print("Encoder frozen. Only fine-tuning Decoders for the pilot.")

# --- 6. OPTIMIZATION SETUP ---
model.setup_optimization(
    optim_config={
        'lr': 1e-4,
        'weight_decay': 0.001,
        'sched': {
            'name': 'CosineAnnealing',
            'warmup_steps': 100,
            'min_lr': 1e-6,
        },
    }
)

# --- 7. TRAINER (Strictly CPU-Compatible) ---
trainer = pl.Trainer(
    max_epochs=5,
    accelerator="cpu", # Change to "gpu" once you are on the RTX 4080
    devices=1,
    precision=32,      # 16-bit is for GPU only; 32-bit is required for CPU
    enable_checkpointing=True,
    logger=False
)

# --- 8. EXECUTION ---
print("Starting Pilot Training...")
trainer.fit(model)

# --- 9. SAVE FINAL MODEL ---
model.save_to("pt_br_pilot_final.nemo")
print("Successfully saved: pt_br_pilot_final.nemo")
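(A common cause of WER regressions like this is malformed training data rather than the training loop itself. NeMo manifests are JSON-lines files where each entry needs `audio_filepath`, `duration`, and `text` keys; below is a minimal sanity-check sketch for such a manifest. The file name and helper are assumptions for illustration, not part of NeMo's API.)

```python
import json

REQUIRED_KEYS = {"audio_filepath", "duration", "text"}

def check_manifest(path: str) -> list:
    """Return (line_number, problem) pairs for a NeMo-style JSONL manifest."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append((n, f"invalid JSON: {e}"))
                continue
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                problems.append((n, f"missing keys: {sorted(missing)}"))
            elif not str(entry["text"]).strip():
                problems.append((n, "empty transcript"))
    return problems

# Example usage against a tiny synthetic manifest (hypothetical file name)
with open("tiny_manifest.jsonl", "w", encoding="utf-8") as f:
    f.write('{"audio_filepath": "a.wav", "duration": 1.2, "text": "ola"}\n')
    f.write('{"audio_filepath": "b.wav", "duration": 0.8}\n')

print(check_manifest("tiny_manifest.jsonl"))  # flags line 2: missing "text"
```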

Any support from your side is highly appreciated. This is not a personal project; it is a job requirement from a client, and we are targeting more than 90% accuracy. If anyone can guide me, it would be a great favour, and I would be very thankful for the guidance.
