<a href="https://colab.research.google.com/github/benjaminsinzore/moshi_finetune/blob/main/moshi_finetune.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting Started with Fine-Tuning Moshi 7B

This notebook walks through a simple example of fine-tuning Moshi 7B with LoRA. You can run it in Google Colab on an A100 GPU.

<a target="_blank" href="https://colab.research.google.com/github/kyutai-labs/moshi-finetune/blob/main/tutorials/moshi_finetune.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Check out the `moshi-finetune` GitHub repo to learn more: https://github.com/kyutai-labs/moshi-finetune/


## Installation

Clone the `moshi-finetune` repo:


In [1]:
%cd /content/
!git clone https://github.com/kyutai-labs/moshi-finetune.git

/content
Cloning into 'moshi-finetune'...
remote: Enumerating objects: 227, done.
remote: Counting objects: 100% (37/37), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 227 (delta 28), reused 24 (delta 24), pack-reused 190 (from 1)
Receiving objects: 100% (227/227), 623.92 KiB | 13.27 MiB/s, done.
Resolving deltas: 100% (127/127), done.


Install all required dependencies:


In [2]:
%pip install -e /content/moshi-finetune

Obtaining file:///content/moshi-finetune
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting moshi@ git+https://github.com/kyutai-labs/moshi.git#subdirectory=moshi (from finetune==0.0.0)
  Cloning https://github.com/kyutai-labs/moshi.git to /tmp/pip-install-2jk59rep/moshi_502e9cd06ac243849290a022c2900173
  Running command git clone --filter=blob:none --quiet https://github.com/kyutai-labs/moshi.git /tmp/pip-install-2jk59rep/moshi_502e9cd06ac243849290a022c2900173
  Resolved https://github.com/kyutai-labs/moshi.git to commit 62d0154eb199074c459f1fef2ef71028486fd528
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fire (from finetune==0.0.0)
  Dow

## Prepare dataset


In [4]:
import time
from pathlib import Path

from huggingface_hub import snapshot_download

Path("/content/data/daily-talk-contiguous").mkdir(parents=True, exist_ok=True)

# Download the dataset, retrying after a fixed delay when the Hub answers with HTTP 429
local_dir = None
retries = 3  # Maximum number of attempts
delay = 5    # Delay in seconds between attempts

for i in range(retries):
    try:
        local_dir = snapshot_download(
            "kyutai/DailyTalkContiguous",
            repo_type="dataset",
            local_dir="/content/data/daily-talk-contiguous",
        )
        break  # Exit loop if successful
    except Exception as e:
        if "429" in str(e):  # Rate limit error: wait, then try again
            print(f"Rate limit hit. Retrying in {delay} seconds... (Attempt {i+1}/{retries})")
            time.sleep(delay)
        else:
            raise  # Re-raise other exceptions with the original traceback

# Note: snapshot_download may return an existing local_dir when the Hub cannot be
# reached, so a non-None local_dir does not guarantee that every file was fetched.
if local_dir is None:
    print("Failed to download dataset after multiple retries.")
else:
    print("Dataset downloaded successfully!")

Fetching 5085 files:   0%|          | 0/5085 [00:00<?, ?it/s]


Rate limit hit. Retrying in 5 seconds... (Attempt 1/3)


Returning existing local_dir `/content/data/daily-talk-contiguous` as remote repo cannot be accessed in `snapshot_download` (429 Client Error: Too Many Requests for url: https://huggingface.co/api/datasets/kyutai/DailyTalkContiguous/revision/main).


Dataset downloaded successfully!
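
The log above shows `snapshot_download` falling back to the existing local directory after a 429 response, so the success message alone does not prove that every file arrived. The following is a minimal sanity-check sketch. It assumes that each `.wav` recording should have a `.json` transcript with the same stem next to it, as the file names in the log suggest. `check_dataset` is a hypothetical helper, not part of `moshi-finetune`.

```python
from pathlib import Path


def check_dataset(root):
    """Count audio/transcript files under root and list recordings lacking a transcript."""
    root = Path(root)
    wavs = sorted(root.rglob("*.wav"))
    # Assumption: the transcript for X.wav lives next to it as X.json.
    missing = [
        str(w.relative_to(root)) for w in wavs if not w.with_suffix(".json").exists()
    ]
    return {
        "wav_files": len(wavs),
        "json_files": sum(1 for _ in root.rglob("*.json")),
        "missing_transcripts": missing,
        "has_manifest": (root / "dailytalk.jsonl").exists(),
    }


# Only inspect the directory if it exists (it does after the download cell has run).
root = Path("/content/data/daily-talk-contiguous")
if root.is_dir():
    print(check_dataset(root))
```

If `missing_transcripts` is non-empty or `has_manifest` is false, re-running the download cell after a short wait is probably the simplest fix.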


## Start training


In [5]:
# this environment setup is needed for training
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [6]:
# define training configuration
# for your own use cases, you might want to change the data paths, model path, run_dir, and other hyperparameters
import yaml

config = """
# data
data:
  train_data: '/content/data/daily-talk-contiguous/dailytalk.jsonl' # Fill
  eval_data: '' # Optionally Fill
  shuffle: true

# model
moshi_paths:
  hf_repo_id: "kyutai/moshiko-pytorch-bf16"


full_finetuning: false # when false, set lora.enable to true for LoRA finetuning
lora:
  enable: true
  rank: 128
  scaling: 2.
  ft_embed: false

# training hyperparameters
first_codebook_weight_multiplier: 100.
text_padding_weight: .5


# tokens per training step = batch_size x num_GPUs x duration_sec
# we recommend a sequence duration of 300 seconds
# If you run into a memory error, try reducing the sequence duration
duration_sec: 100
batch_size: 1
max_steps: 300

gradient_checkpointing: true # Activate checkpointing of layers

# optim
optim:
  lr: 2.e-6
  weight_decay: 0.1
  pct_start: 0.05

# other
seed: 0
log_freq: 10
eval_freq: 1
do_eval: False
ckpt_freq: 10

save_adapters: True

run_dir: "/content/test"  # Fill
"""

# write the configuration to /content/example.yaml
with open("/content/example.yaml", "w") as file:
    yaml.dump(yaml.safe_load(config), file)
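
The comment in the configuration relates the amount of data per step to `batch_size`, the number of GPUs, and `duration_sec`. The arithmetic for the values used here can be sketched as follows. The 12.5 Hz frame rate of the Mimi codec is an assumption brought in from outside this notebook, and the checkpoint count assumes one checkpoint every `ckpt_freq` steps.

```python
# Values copied from the configuration above.
batch_size = 1
num_gpus = 1  # matches `--nproc-per-node 1` in the torchrun command
duration_sec = 100
max_steps = 300
ckpt_freq = 10

# Assumption: Mimi, Moshi's audio codec, produces 12.5 frames per second.
FRAME_RATE_HZ = 12.5

# seconds of audio processed in one optimizer step
audio_sec_per_step = batch_size * num_gpus * duration_sec
# codec frames per step under the frame-rate assumption
frames_per_step = int(audio_sec_per_step * FRAME_RATE_HZ)
# audio processed over the whole run (repeated samples count each time)
total_audio_hours = audio_sec_per_step * max_steps / 3600
# checkpoints written if one is saved every ckpt_freq steps
num_checkpoints = max_steps // ckpt_freq

print(audio_sec_per_step, frames_per_step, round(total_audio_hours, 2), num_checkpoints)
# prints: 100 1250 8.33 30
```

Raising `duration_sec` towards the recommended 300 seconds triples the audio per step and increases memory use accordingly.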

In [7]:
# make sure the run_dir has not been created before
# only run this if a previous torchrun already created the /content/test directory
# ! rm -r /content/test
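
As an alternative to the shell command, the cleanup can be done from Python so that it is a no-op when the directory does not exist. This is a minimal sketch. `reset_run_dir` is a hypothetical helper, not part of `moshi-finetune`.

```python
import shutil
from pathlib import Path


def reset_run_dir(run_dir):
    """Delete run_dir if it exists; return True when something was removed."""
    path = Path(run_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False
```

For this notebook the call would be `reset_run_dir("/content/test")`. Note that it also deletes any checkpoints saved by an earlier run.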

In [8]:
# start training

!cd /content/moshi-finetune && torchrun --nproc-per-node 1 -m train /content/example.yaml

2025-05-08 12:30:40.408285: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1746707440.702007    4993 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1746707440.787368    4993 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-05-08 12:30:41.424281: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-05-08 12:30:48 (UTC) - 0:00:17 - distributed - INFO - torch.cuda.device_count: 1
2025-05-08 12:30:48 (UTC) - 0:0

## Inference

Once the model has been trained, you can also run inference on the Colab GPU and use Gradio to tunnel audio between a local client and the notebook.

More details on how to set this up can be found in the [moshi readme](https://github.com/kyutai-labs/moshi?tab=readme-ov-file#python-pytorch).
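
The server command later in this section reads a LoRA weight file and a config file from a consolidated checkpoint directory under `run_dir`. Before launching it, it may help to confirm that both files exist. The sketch below infers the directory layout from the paths used in that command. The helper names are hypothetical.

```python
from pathlib import Path


def consolidated_files(run_dir, step):
    """Build the LoRA weight and config paths for a given training step.

    The layout (checkpoints/checkpoint_<6-digit step>/consolidated/) is
    inferred from the paths passed to the server command in this notebook.
    """
    base = Path(run_dir) / "checkpoints" / f"checkpoint_{step:06d}" / "consolidated"
    return base / "lora.safetensors", base / "config.json"


def missing_checkpoint_files(run_dir, step):
    """Return the expected checkpoint files that do not exist on disk."""
    return [p for p in consolidated_files(run_dir, step) if not p.exists()]
```

With the configuration above, `missing_checkpoint_files("/content/test", 300)` should return an empty list once training has reached step 300. A non-empty result suggests that training stopped early or that a different step should be passed.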


In [9]:
!pip install gradio

Collecting gradio
  Downloading gradio-5.29.0-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<25.0,>=22.0 (from gradio)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting fastapi<1.0,>=0.115.2 (from gradio)
  Downloading fastapi-0.115.12-py3-none-any.whl.metadata (27 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.10.0 (from gradio)
  Downloading gradio_client-1.10.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.8-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collecting safehttpx<0.2.0,>=0.1.6

In [10]:
!python -m moshi.server --gradio-tunnel --lora-weight=/content/test/checkpoints/checkpoint_000300/consolidated/lora.safetensors --config-path=/content/test/checkpoints/checkpoint_000300/consolidated/config.json

[Info] retrieving checkpoint
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.11/dist-packages/moshi/server.py", line 291, in <module>
    main()
  File "/usr/local/lib/python3.11/dist-packages/moshi/server.py", line 227, in main
    checkpoint_info = loaders.CheckpointInfo.from_hf_repo(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/moshi/models/loaders.py", line 198, in from_hf_repo
    raw_config = json.loads(Path(config_path).read_text())
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/pathlib.py", line 1058, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/pathlib.py", line 1044, in open
    return io.open(self, mode, buffering, enc