<a href="https://colab.research.google.com/github/Rupper02/mos/blob/main/quickstart.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚀 Quick Introduction to MOS Prediction using UTMOSv2

This notebook shows how to predict the MOS (Mean Opinion Score) of speech audio using UTMOSv2.
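For context, a MOS is conventionally the arithmetic mean of listener ratings, typically on a 1 to 5 scale, and UTMOSv2 estimates that value from the audio alone. The sketch below computes a MOS from a hypothetical list of ratings. The ratings are made up for illustration and have nothing to do with the files used later in this notebook.

```python
from statistics import mean

# Hypothetical listener ratings for one utterance (1 = bad, 5 = excellent).
ratings = [4, 3, 5, 4, 4]

# The MOS is simply the average of the individual ratings.
mos = mean(ratings)
print(f"MOS: {mos:.2f}")
```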

## 🛠 Installation

In [1]:
!GIT_LFS_SKIP_SMUDGE=1 pip install git+https://github.com/sarulab-speech/UTMOSv2.git

Collecting git+https://github.com/sarulab-speech/UTMOSv2.git
  Cloning https://github.com/sarulab-speech/UTMOSv2.git to /tmp/pip-req-build-h0b1m3vx
  Running command git clone --filter=blob:none --quiet https://github.com/sarulab-speech/UTMOSv2.git /tmp/pip-req-build-h0b1m3vx
  Resolved https://github.com/sarulab-speech/UTMOSv2.git to commit 16c55afcb1d25bef462dcf64c888e7eef8039918
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=2.3.1->utmosv2==1.1.0)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=2.3.1->utmosv2==1.1.0)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch>=2.3.1->utmosv2==1.1.0)
  Do

In [13]:
import utmosv2
from collections import defaultdict


## 🔮 Make predictions

To predict the MOS of a single .wav file:

In [10]:
model = utmosv2.create_model(pretrained=True)
mos = model.predict(input_path="./eleven.wav")
print(mos)

Loaded checkpoint from /root/.cache/utmosv2/models/fusion_stage3/fold0_s42_best_model.pth


Predicting: 100%|██████████| 1/1 [00:03<00:00,  3.68s/it]

2.978515625





To predict the MOS of all .wav files in a folder, repeating the prediction `n_runs` times and averaging the scores per file:

In [17]:
n_runs = 50
input_dir = "./"

# Create model once outside the loop
model = utmosv2.create_model(pretrained=True)

# Dictionary to store lists of predicted MOS for each file
mos_results = defaultdict(list)

for _ in range(n_runs):
    mos = model.predict(input_dir=input_dir)
    for result in mos:
        mos_results[result['file_path']].append(result['predicted_mos'])

# Calculate and print average MOS per file
print(f"Average MOS after {n_runs} runs:")
for file_path, scores in mos_results.items():
    avg_mos = sum(scores) / len(scores)
    print(f"{file_path}: {avg_mos:.4f}")

Loaded checkpoint from /root/.cache/utmosv2/models/fusion_stage3/fold0_s42_best_model.pth


Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.34s/it]
Predicting: 100%|██████████| 1/1 [00:06<00:00,  6.99s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.57s/it]
Predicting: 100%|██████████| 1/1 [00:06<00:00,  6.80s/it]
Predicting: 100%|██████████| 1/1 [00:07<00:00,  7.93s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.13s/it]
Predicting: 100%|██████████| 1/1 [00:06<00:00,  6.83s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.33s/it]
Predicting: 100%|██████████| 1/1 [00:07<00:00,  7.09s/it]
Predicting: 100%|██████████| 1/1 [00:07<00:00,  7.75s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.06s/it]
Predicting: 100%|██████████| 1/1 [00:06<00:00,  6.89s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.65s/it]
Predicting: 100%|██████████| 1/1 [00:06<00:00,  6.86s/it]
Predicting: 100%|██████████| 1/1 [00:07<00:00,  7.86s/it]
Predicting: 100%|██████████| 1/1 [00:08<00:00,  8.10s/it]
Predicting: 100%|██████████| 1/1 [00:07<00:00,  7.09s/it]
Predicting: 10

Average MOS after 50 runs:
audio.wav: 3.3042
edgen.wav: 2.8942
eleven.wav: 2.9812
openai.wav: 3.4178
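An average alone hides how much the predictions vary between runs. As a minimal sketch, assuming a dictionary shaped like `mos_results` above (file path mapped to a list of scores), the standard deviation can be reported next to the mean. The `summarize` helper and the `example` scores are hypothetical placeholders, not real model output.

```python
from statistics import mean, stdev

def summarize(mos_results):
    """Return {file_path: (mean, sample_std)} for per-file score lists."""
    summary = {}
    for file_path, scores in mos_results.items():
        # stdev needs at least two samples; report 0.0 for a single run.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[file_path] = (mean(scores), spread)
    return summary

# Placeholder scores for illustration only, not real model output.
example = {"a.wav": [3.0, 3.2, 3.1], "b.wav": [2.5]}
for file_path, (avg, spread) in summarize(example).items():
    print(f"{file_path}: {avg:.4f} ± {spread:.4f}")
```

In the notebook, calling `summarize(mos_results)` after the loop would give the same per-file means as above plus a measure of run-to-run spread.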





Note that either `input_path` or `input_dir` must be specified, but not both.
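The "exactly one of the two" rule can be enforced before the model is called. The wrapper below is a hypothetical convenience function, not part of UTMOSv2, and it only uses the `input_path` and `input_dir` keyword arguments shown in this notebook. The library may perform its own check as well; this sketch simply fails early with a clear message.

```python
def predict_mos(model, input_path=None, input_dir=None):
    """Hypothetical wrapper: enforce exactly one input source, then predict."""
    # True when both are None or both are set, which are the two invalid cases.
    if (input_path is None) == (input_dir is None):
        raise ValueError("Specify exactly one of input_path or input_dir.")
    if input_path is not None:
        return model.predict(input_path=input_path)
    return model.predict(input_dir=input_dir)
```

With the model created earlier, `predict_mos(model, input_path="./eleven.wav")` would behave like the single-file call, while passing both arguments or neither raises a `ValueError`.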

For more details on how to use the inference script, please refer to the [inference guide](https://github.com/sarulab-speech/UTMOSv2/blob/main/docs/inference.md).