# 02a – Experiment 2: Stage-2 Fine-Tuning: Extended Training on MCoRec

## Hypothesis

BL4 (AV-HuBERT Cocktail, fine-tuned on MCoRec) was already trained on a multi-corpus mix
of LRS2, Vox2, AVYT, AVYT-mix, and MCoRec (~400k steps, LR 1·10⁻⁴).
Can a second, targeted fine-tuning pass (**Stage 2**) with a lower learning rate
and fewer steps further improve domain adaptation?

## Procedure

Stage-2 checkpoint `avsr_cocktail_mcorec_stage2_lr5e-5_30k`:
- **Starting point:** BL4 checkpoint (`avsr_cocktail_mcorec_finetune`)
- **Learning rate:** 5·10⁻⁵ (half the Stage-1 rate)
- **Steps:** 30,000 (instead of 400,000), a moderate amount of additional training
- **Data mix:** same streaming mix with `--include_mcorec`
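
The relationship between the Stage-1 and Stage-2 budgets can be checked with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, the batch size, gradient accumulation, and warmup values are taken from the training command in section 4, and the sample count assumes that `--max_steps` counts optimizer updates on a single GPU, which the notebook does not state explicitly.

```python
# Values from the Stage-2 setup; names are illustrative, not taken from train.py.
batch_size = 4
gradient_accumulation_steps = 2
stage2_steps = 30_000
stage1_steps = 400_000
stage2_lr = 5e-5
stage1_lr = 1e-4
warmup_steps = 3_000

# Effective batch size per optimizer update (4 x 2 = 8).
effective_batch = batch_size * gradient_accumulation_steps

# Assumption: one "step" is one optimizer update on one GPU.
samples_seen = effective_batch * stage2_steps

step_ratio = stage2_steps / stage1_steps      # share of the Stage-1 step budget
lr_ratio = stage2_lr / stage1_lr              # Stage-2 LR relative to Stage 1
warmup_fraction = warmup_steps / stage2_steps # share of Stage 2 spent in warmup

print(f"effective batch size: {effective_batch}")
print(f"samples seen (under the stated assumption): {samples_seen}")
print(f"step budget vs. Stage 1: {step_ratio:.1%}")
print(f"LR vs. Stage 1: {lr_ratio:.2f}")
print(f"warmup share of Stage 2: {warmup_fraction:.0%}")
```

Under these assumptions, Stage 2 uses 7.5% of the Stage-1 step budget, and a tenth of that is spent in warmup.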

## Result (Preview)

The Stage-2 model is **consistently worse** than BL4 (WER +0.036–0.037, JER +0.018).
The additional training overfitted the model and moved it away from the better-generalizing
BL4 solution. → The Stage-2 approach is not pursued further.
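
One way to look at the validation trend behind this conclusion is to pull the `eval_loss` entries out of the training output. The sketch below is an assumption about how one might do that, not part of the original notebook: the helper name `extract_eval_losses` is hypothetical, and the sample lines are copied from the training output in section 4. Two early evaluation points alone do not establish overfitting; they only show the direction of the validation loss at steps 2000 and 4000.

```python
import ast

# Sample lines copied from the training output in section 4.
log_lines = [
    "  7%|▋         | 2000/30000 [19:10<3:53:09,  2.00it/s]",
    "{'eval_loss': 28.17624282836914, 'eval_runtime': 177.3195, "
    "'eval_samples_per_second': 21.949, 'eval_steps_per_second': 5.487, 'epoch': 0.07}",
    "{'loss': 15.2431, 'grad_norm': 47.072444915771484, "
    "'learning_rate': 3.366666666666667e-05, 'epoch': 0.07}",
    "{'eval_loss': 29.022287368774414, 'eval_runtime': 181.3287, "
    "'eval_samples_per_second': 21.464, 'eval_steps_per_second': 5.366, 'epoch': 0.13}",
]


def extract_eval_losses(lines):
    """Return all eval_loss values from dict-formatted log lines (hypothetical helper)."""
    losses = []
    for line in lines:
        line = line.strip()
        # Skip progress bars, warnings, and anything else that is not a dict literal.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(record, dict) and "eval_loss" in record:
            losses.append(record["eval_loss"])
    return losses


eval_losses = extract_eval_losses(log_lines)
print(eval_losses)
```

`ast.literal_eval` is used instead of `json.loads` because the log prints Python dict literals with single quotes, which are not valid JSON.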

**Note on the bug fix:** This run was performed **before the bug fix** in `segmentation.py` (`min_duration_off` erroneously read the value of `min_duration_on`). This is **intentional**: the bug was only discovered after the LLM and hyperparameter experiments had been completed. Because the bug fix on its own initially made the WER worse, the combination of bug fix + `min_duration` tuning that ultimately gave the best result was only worked out in `02j_`/`02k_`.
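
To make the class of bug concrete, the sketch below shows how reading the wrong key silently changes a threshold. It is a purely hypothetical illustration and **not** the code from `segmentation.py`: the function names, the dictionary layout, and the numeric values are all invented for this example.

```python
# Hypothetical illustration only; this is NOT the code from segmentation.py.
# The parameter values below are made up.
params = {"min_duration_on": 0.5, "min_duration_off": 0.1}


def read_thresholds_buggy(cfg):
    """Mimics the described bug: min_duration_off takes the value of min_duration_on."""
    min_on = cfg["min_duration_on"]
    min_off = cfg["min_duration_on"]  # wrong key: should be "min_duration_off"
    return min_on, min_off


def read_thresholds_fixed(cfg):
    """Reads each threshold from its own key."""
    return cfg["min_duration_on"], cfg["min_duration_off"]


print("buggy:", read_thresholds_buggy(params))
print("fixed:", read_thresholds_fixed(params))
```

In a setup like this, any tuning of `min_duration_off` has no effect until the fix is applied, which is one reason results from before and after the fix are not directly comparable.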

## 1 – Setup: Working Directory & Imports

In [1]:
import os, sys
from pathlib import Path

# Adjust the path if necessary
project_baseline_path = Path("/home/josch080/Projektgruppe/mcorec_baseline")
os.chdir(project_baseline_path)
print("CWD:", os.getcwd())

# Add the repo root to sys.path so that project-internal modules can be imported
if str(project_baseline_path) not in sys.path:
    sys.path.append(str(project_baseline_path))


CWD: /home/josch080/Projektgruppe/mcorec_baseline


## 2 – GPU Selection

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

## 3 – CUDA Verification

In [3]:
import torch
print("n_gpu:", torch.cuda.device_count())
# should print 1


n_gpu: 1


## 4 – Stage-2 Training

The training is launched as a subprocess. The arguments reflect the Stage-2 setup:
lower learning rate, fewer steps, same data mix as Stage 1.

**Note:** `subprocess.run(cmd)` has already been executed; the checkpoint is located at
`model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000`.
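
The learning rates printed in the training output rise linearly to about 5·10⁻⁵ around step 3000 and then fall by roughly 4.63·10⁻⁸ per 25 steps, which is consistent with a linear warmup followed by a linear decay to zero at step 30,000. The sketch below reproduces that shape. The decay form is inferred from the logged values, not read from `script/train.py`, and the function name `linear_schedule_lr` is hypothetical.

```python
def linear_schedule_lr(step, peak_lr=5e-5, warmup_steps=3000, max_steps=30000):
    """Linear warmup to peak_lr, then linear decay to zero (shape inferred from the log)."""
    if step < warmup_steps:
        # Warmup: LR grows proportionally to the step index.
        return peak_lr * step / warmup_steps
    # Decay: LR shrinks linearly and reaches zero at max_steps.
    remaining = max(0, max_steps - step)
    return peak_lr * remaining / (max_steps - warmup_steps)


for step in (0, 1500, 3000, 16500, 30000):
    print(step, linear_schedule_lr(step))
```

The logged values trail the progress-bar step by a few steps: the first logged rate of 3.5·10⁻⁷ at step 25 corresponds to step 21 in this formula. The cause of that offset is not visible in the output.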

In [4]:
cmd = [
    sys.executable, "script/train.py",
    "--streaming_dataset", # Data is loaded on the fly (no full preprocessing required)
    "--include_mcorec", # Include MCoRec data in the training mix
    "--batch_size", "4",
    "--max_steps", "30000", # Fewer steps than Stage 1 (400k)
    "--gradient_accumulation_steps", "2", # Effective batch size = 4 × 2 = 8
    "--save_steps", "2000", # Save a checkpoint every 2000 steps
    "--eval_steps", "2000", # Compute the validation loss every 2000 steps
    "--log_interval", "25",
    "--learning_rate", "5e-5", # Half the Stage-1 LR (1e-4)
    "--warmup_steps", "3000", # Linear LR warmup over the first 3000 steps
    "--checkpoint_name", "avsr_cocktail_mcorec_stage2_lr5e-5_30k", # Name of the output checkpoint
    "--model_name_or_path", "./model-bin/avsr_cocktail_mcorec_finetune", # Starting point: BL4
    "--output_dir", "./model-bin",
    "--report_to", "none", # No logging to W&B or similar
]
print(" ".join(cmd)) # Print the full command for verification

/home/josch080/Projektgruppe/mcorec_train/bin/python script/train.py --streaming_dataset --include_mcorec --batch_size 4 --max_steps 30000 --gradient_accumulation_steps 2 --save_steps 2000 --eval_steps 2000 --log_interval 25 --learning_rate 5e-5 --warmup_steps 3000 --checkpoint_name avsr_cocktail_mcorec_stage2_lr5e-5_30k --model_name_or_path ./model-bin/avsr_cocktail_mcorec_finetune --output_dir ./model-bin --report_to none


In [5]:
import subprocess
# Start training as a subprocess; the output appears directly in the notebook
subprocess.run(cmd)

Loading pretrained model from ./model-bin/avsr_cocktail_mcorec_finetune
Loading MCoRec dataset
map_datasets
 {'lrs2': {'probabilities': 0.25, 'dataset': {'train': IterableDataset({
    features: ['label', 'length', 'sample_id', 'video'],
    num_shards: 10
}), 'valid': None}}, 'vox2': {'probabilities': 0.1, 'dataset': {'train': IterableDataset({
    features: ['label', 'length', 'sample_id', 'video'],
    num_shards: 53
}), 'valid': None}}, 'avyt': {'probabilities': 0.2, 'dataset': {'train': IterableDataset({
    features: ['label', 'length', 'sample_id', 'video'],
    num_shards: 16
}), 'valid': None}}, 'avyt-mix': {'probabilities': 0.25, 'dataset': {'train': IterableDataset({
    features: ['label', 'length', 'sample_id', 'video'],
    num_shards: 664
}), 'valid': None}}, 'mcorec': {'probabilities': 0.2, 'dataset': {'train': IterableDataset({
    features: ['label', 'length', 'sample_id', 'video'],
    num_shards: 48
}), 'valid': IterableDataset({
    features: ['label', 'length', 's

  super().__init__(
Could not estimate the number of tokens of the input, floating-point operations will not be computed
  0%|          | 25/30000 [00:19<4:11:33,  1.99it/s]

{'loss': 17.3312, 'grad_norm': 71.00940704345703, 'learning_rate': 3.5000000000000004e-07, 'epoch': 0.0}


  0%|          | 50/30000 [00:31<4:02:24,  2.06it/s]

{'loss': 14.2914, 'grad_norm': 15.612455368041992, 'learning_rate': 7.666666666666667e-07, 'epoch': 0.0}


  0%|          | 75/30000 [00:44<3:54:11,  2.13it/s]

{'loss': 13.6917, 'grad_norm': 16.764677047729492, 'learning_rate': 1.1833333333333334e-06, 'epoch': 0.0}


  0%|          | 100/30000 [00:55<3:54:57,  2.12it/s]

{'loss': 12.6155, 'grad_norm': 27.925941467285156, 'learning_rate': 1.6000000000000001e-06, 'epoch': 0.0}


  0%|          | 125/30000 [01:08<4:15:11,  1.95it/s]

{'loss': 19.2985, 'grad_norm': 28.92288589477539, 'learning_rate': 2.0166666666666667e-06, 'epoch': 0.0}


  0%|          | 150/30000 [01:20<4:17:01,  1.94it/s]

{'loss': 14.9646, 'grad_norm': 60.98441696166992, 'learning_rate': 2.4333333333333335e-06, 'epoch': 0.01}


  1%|          | 175/30000 [01:32<4:10:50,  1.98it/s]

{'loss': 23.0048, 'grad_norm': 66.44439697265625, 'learning_rate': 2.8333333333333335e-06, 'epoch': 0.01}


  1%|          | 200/30000 [01:45<4:14:12,  1.95it/s]

{'loss': 19.794, 'grad_norm': 30.48972511291504, 'learning_rate': 3.2500000000000002e-06, 'epoch': 0.01}


  1%|          | 225/30000 [01:57<4:11:36,  1.97it/s]

{'loss': 23.434, 'grad_norm': 62.91205596923828, 'learning_rate': 3.666666666666667e-06, 'epoch': 0.01}


  1%|          | 250/30000 [02:10<3:36:18,  2.29it/s]

{'loss': 18.9943, 'grad_norm': 29.91696548461914, 'learning_rate': 4.083333333333334e-06, 'epoch': 0.01}


  1%|          | 275/30000 [02:21<3:30:37,  2.35it/s]

{'loss': 10.9403, 'grad_norm': 27.176530838012695, 'learning_rate': 4.5e-06, 'epoch': 0.01}


  1%|          | 300/30000 [02:33<3:47:19,  2.18it/s]

{'loss': 11.5666, 'grad_norm': 19.913427352905273, 'learning_rate': 4.9166666666666665e-06, 'epoch': 0.01}


  1%|          | 325/30000 [02:45<4:08:39,  1.99it/s]

{'loss': 15.5754, 'grad_norm': 58.23151779174805, 'learning_rate': 5.333333333333334e-06, 'epoch': 0.01}


  1%|          | 350/30000 [02:57<3:32:36,  2.32it/s]

{'loss': 13.8308, 'grad_norm': 88.8888931274414, 'learning_rate': 5.750000000000001e-06, 'epoch': 0.01}


  1%|▏         | 375/30000 [03:09<3:35:50,  2.29it/s]

{'loss': 14.628, 'grad_norm': 44.540672302246094, 'learning_rate': 6.166666666666667e-06, 'epoch': 0.01}


  1%|▏         | 400/30000 [03:21<4:08:35,  1.98it/s]

{'loss': 19.8694, 'grad_norm': 58.85300064086914, 'learning_rate': 6.583333333333333e-06, 'epoch': 0.01}


  1%|▏         | 425/30000 [03:33<4:11:25,  1.96it/s]

{'loss': 19.7465, 'grad_norm': 61.64405822753906, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.01}


  2%|▏         | 450/30000 [03:45<4:12:19,  1.95it/s]

{'loss': 15.1813, 'grad_norm': 41.8242301940918, 'learning_rate': 7.416666666666668e-06, 'epoch': 0.01}


  2%|▏         | 475/30000 [03:56<3:24:25,  2.41it/s]

{'loss': 13.0993, 'grad_norm': 22.02269172668457, 'learning_rate': 7.833333333333333e-06, 'epoch': 0.02}


  2%|▏         | 500/30000 [04:09<4:19:57,  1.89it/s]

{'loss': 18.5771, 'grad_norm': 50.633445739746094, 'learning_rate': 8.25e-06, 'epoch': 0.02}


  2%|▏         | 525/30000 [04:21<3:43:35,  2.20it/s]

{'loss': 16.6158, 'grad_norm': 16.82877540588379, 'learning_rate': 8.666666666666668e-06, 'epoch': 0.02}


  2%|▏         | 550/30000 [04:33<3:49:10,  2.14it/s]

{'loss': 18.139, 'grad_norm': 35.00023651123047, 'learning_rate': 9.083333333333333e-06, 'epoch': 0.02}


  2%|▏         | 575/30000 [04:45<3:36:43,  2.26it/s]

{'loss': 17.2456, 'grad_norm': 61.128929138183594, 'learning_rate': 9.5e-06, 'epoch': 0.02}


  2%|▏         | 600/30000 [04:56<3:28:49,  2.35it/s]

{'loss': 11.2404, 'grad_norm': 32.10109329223633, 'learning_rate': 9.916666666666668e-06, 'epoch': 0.02}


  2%|▏         | 625/30000 [05:08<4:10:57,  1.95it/s]

{'loss': 15.8525, 'grad_norm': 26.56573486328125, 'learning_rate': 1.0333333333333333e-05, 'epoch': 0.02}


  2%|▏         | 650/30000 [05:20<4:06:21,  1.99it/s]

{'loss': 19.5963, 'grad_norm': 81.76909637451172, 'learning_rate': 1.075e-05, 'epoch': 0.02}


  2%|▏         | 675/30000 [05:32<4:12:53,  1.93it/s]

{'loss': 18.121, 'grad_norm': 22.9056396484375, 'learning_rate': 1.1166666666666668e-05, 'epoch': 0.02}


  2%|▏         | 700/30000 [05:44<3:11:10,  2.55it/s]

{'loss': 13.572, 'grad_norm': 22.447370529174805, 'learning_rate': 1.1583333333333333e-05, 'epoch': 0.02}


  2%|▏         | 725/30000 [05:56<4:08:49,  1.96it/s]

{'loss': 14.3483, 'grad_norm': 27.274768829345703, 'learning_rate': 1.2e-05, 'epoch': 0.02}


  2%|▎         | 750/30000 [06:08<3:53:31,  2.09it/s]

{'loss': 18.0872, 'grad_norm': 51.99196243286133, 'learning_rate': 1.2416666666666667e-05, 'epoch': 0.03}


  3%|▎         | 775/30000 [06:19<4:04:58,  1.99it/s]

{'loss': 14.1611, 'grad_norm': 50.653465270996094, 'learning_rate': 1.2833333333333333e-05, 'epoch': 0.03}


  3%|▎         | 800/30000 [06:31<4:04:34,  1.99it/s]

{'loss': 20.0291, 'grad_norm': 52.49267578125, 'learning_rate': 1.3250000000000002e-05, 'epoch': 0.03}


  3%|▎         | 825/30000 [06:44<4:03:28,  2.00it/s]

{'loss': 16.6426, 'grad_norm': 30.468233108520508, 'learning_rate': 1.3666666666666666e-05, 'epoch': 0.03}


  3%|▎         | 850/30000 [06:55<4:02:14,  2.01it/s]

{'loss': 17.1329, 'grad_norm': 33.73909378051758, 'learning_rate': 1.4083333333333335e-05, 'epoch': 0.03}


  3%|▎         | 875/30000 [07:07<3:35:16,  2.25it/s]

{'loss': 15.3856, 'grad_norm': 41.35885238647461, 'learning_rate': 1.45e-05, 'epoch': 0.03}


  3%|▎         | 900/30000 [07:19<3:28:24,  2.33it/s]

{'loss': 20.9155, 'grad_norm': 39.216983795166016, 'learning_rate': 1.4916666666666667e-05, 'epoch': 0.03}


  3%|▎         | 925/30000 [07:32<4:09:12,  1.94it/s]

{'loss': 20.1126, 'grad_norm': 32.35628890991211, 'learning_rate': 1.5333333333333334e-05, 'epoch': 0.03}


  3%|▎         | 950/30000 [07:43<4:02:18,  2.00it/s]

{'loss': 16.0119, 'grad_norm': 37.96964645385742, 'learning_rate': 1.575e-05, 'epoch': 0.03}


  3%|▎         | 975/30000 [07:55<4:06:10,  1.97it/s]

{'loss': 19.9917, 'grad_norm': 34.199974060058594, 'learning_rate': 1.6166666666666665e-05, 'epoch': 0.03}


  3%|▎         | 1000/30000 [08:07<3:29:49,  2.30it/s]

{'loss': 20.1834, 'grad_norm': 37.08354949951172, 'learning_rate': 1.6583333333333334e-05, 'epoch': 0.03}


  3%|▎         | 1025/30000 [08:20<3:53:59,  2.06it/s]

{'loss': 18.8966, 'grad_norm': 61.63634490966797, 'learning_rate': 1.7000000000000003e-05, 'epoch': 0.03}


  4%|▎         | 1050/30000 [08:32<4:00:38,  2.01it/s]

{'loss': 20.8369, 'grad_norm': 87.92446899414062, 'learning_rate': 1.741666666666667e-05, 'epoch': 0.04}


  4%|▎         | 1075/30000 [08:44<3:50:23,  2.09it/s]

{'loss': 19.8394, 'grad_norm': 64.042236328125, 'learning_rate': 1.7833333333333334e-05, 'epoch': 0.04}


  4%|▎         | 1100/30000 [08:56<3:17:55,  2.43it/s]

{'loss': 16.2662, 'grad_norm': 28.386716842651367, 'learning_rate': 1.825e-05, 'epoch': 0.04}


  4%|▍         | 1125/30000 [09:07<3:58:40,  2.02it/s]

{'loss': 14.4158, 'grad_norm': 45.88051986694336, 'learning_rate': 1.866666666666667e-05, 'epoch': 0.04}


  4%|▍         | 1150/30000 [09:20<3:57:48,  2.02it/s]

{'loss': 19.7009, 'grad_norm': 78.97460174560547, 'learning_rate': 1.9083333333333334e-05, 'epoch': 0.04}


  4%|▍         | 1175/30000 [09:32<4:07:37,  1.94it/s]

{'loss': 17.2208, 'grad_norm': 42.034645080566406, 'learning_rate': 1.9500000000000003e-05, 'epoch': 0.04}


  4%|▍         | 1200/30000 [09:43<4:01:16,  1.99it/s]

{'loss': 21.579, 'grad_norm': 55.21867370605469, 'learning_rate': 1.9916666666666665e-05, 'epoch': 0.04}


  4%|▍         | 1225/30000 [09:56<3:27:09,  2.32it/s]

{'loss': 21.6571, 'grad_norm': 39.98133850097656, 'learning_rate': 2.0333333333333334e-05, 'epoch': 0.04}


  4%|▍         | 1250/30000 [10:09<4:07:28,  1.94it/s]

{'loss': 17.38, 'grad_norm': 28.673797607421875, 'learning_rate': 2.075e-05, 'epoch': 0.04}


  4%|▍         | 1275/30000 [10:21<4:02:05,  1.98it/s]

{'loss': 21.658, 'grad_norm': 58.537132263183594, 'learning_rate': 2.116666666666667e-05, 'epoch': 0.04}


  4%|▍         | 1300/30000 [10:32<2:54:37,  2.74it/s]

{'loss': 16.4793, 'grad_norm': 35.77513885498047, 'learning_rate': 2.1583333333333334e-05, 'epoch': 0.04}


  4%|▍         | 1325/30000 [10:44<3:57:12,  2.01it/s]

{'loss': 19.9648, 'grad_norm': 68.52244567871094, 'learning_rate': 2.2000000000000003e-05, 'epoch': 0.04}


  4%|▍         | 1350/30000 [10:56<3:17:12,  2.42it/s]

{'loss': 14.4986, 'grad_norm': 35.37556457519531, 'learning_rate': 2.2416666666666665e-05, 'epoch': 0.04}


  5%|▍         | 1375/30000 [11:08<3:28:41,  2.29it/s]

{'loss': 20.3668, 'grad_norm': 29.858572006225586, 'learning_rate': 2.2833333333333334e-05, 'epoch': 0.05}


  5%|▍         | 1400/30000 [11:21<4:19:53,  1.83it/s]

{'loss': 22.7666, 'grad_norm': 46.11834716796875, 'learning_rate': 2.3250000000000003e-05, 'epoch': 0.05}


  5%|▍         | 1425/30000 [11:33<4:06:10,  1.93it/s]

{'loss': 16.3119, 'grad_norm': 55.18647384643555, 'learning_rate': 2.3666666666666668e-05, 'epoch': 0.05}


  5%|▍         | 1450/30000 [11:46<4:10:54,  1.90it/s]

{'loss': 20.6022, 'grad_norm': 56.05821990966797, 'learning_rate': 2.4083333333333337e-05, 'epoch': 0.05}


  5%|▍         | 1475/30000 [11:58<4:07:18,  1.92it/s]

{'loss': 15.2809, 'grad_norm': 32.359107971191406, 'learning_rate': 2.45e-05, 'epoch': 0.05}


  5%|▌         | 1500/30000 [12:11<4:14:14,  1.87it/s]

{'loss': 18.4959, 'grad_norm': 51.5662956237793, 'learning_rate': 2.4916666666666668e-05, 'epoch': 0.05}


  5%|▌         | 1525/30000 [12:23<4:03:01,  1.95it/s]

{'loss': 18.9244, 'grad_norm': 46.71903610229492, 'learning_rate': 2.5333333333333337e-05, 'epoch': 0.05}


  5%|▌         | 1550/30000 [12:36<3:33:53,  2.22it/s]

{'loss': 28.0429, 'grad_norm': 56.01412582397461, 'learning_rate': 2.5750000000000002e-05, 'epoch': 0.05}


  5%|▌         | 1575/30000 [12:49<3:29:03,  2.27it/s]

{'loss': 18.3593, 'grad_norm': 40.899322509765625, 'learning_rate': 2.6166666666666668e-05, 'epoch': 0.05}


  5%|▌         | 1600/30000 [13:01<3:54:04,  2.02it/s]

{'loss': 20.2859, 'grad_norm': 56.436317443847656, 'learning_rate': 2.6583333333333333e-05, 'epoch': 0.05}


  5%|▌         | 1625/30000 [13:12<2:58:39,  2.65it/s]

{'loss': 14.3731, 'grad_norm': 25.855138778686523, 'learning_rate': 2.7000000000000002e-05, 'epoch': 0.05}


  6%|▌         | 1650/30000 [13:25<3:58:47,  1.98it/s]

{'loss': 18.3222, 'grad_norm': 34.41569137573242, 'learning_rate': 2.7416666666666668e-05, 'epoch': 0.06}


  6%|▌         | 1675/30000 [13:36<3:25:49,  2.29it/s]

{'loss': 13.4559, 'grad_norm': 22.869075775146484, 'learning_rate': 2.7833333333333333e-05, 'epoch': 0.06}


  6%|▌         | 1700/30000 [13:48<3:22:20,  2.33it/s]

{'loss': 25.3491, 'grad_norm': 29.589744567871094, 'learning_rate': 2.825e-05, 'epoch': 0.06}


  6%|▌         | 1725/30000 [13:59<3:21:18,  2.34it/s]

{'loss': 16.8254, 'grad_norm': 45.938621520996094, 'learning_rate': 2.8666666666666668e-05, 'epoch': 0.06}


  6%|▌         | 1750/30000 [14:12<3:52:27,  2.03it/s]

{'loss': 18.1454, 'grad_norm': 42.662803649902344, 'learning_rate': 2.9083333333333333e-05, 'epoch': 0.06}


  6%|▌         | 1775/30000 [14:24<3:52:45,  2.02it/s]

{'loss': 18.7537, 'grad_norm': 54.05384063720703, 'learning_rate': 2.95e-05, 'epoch': 0.06}


  6%|▌         | 1800/30000 [14:35<3:50:01,  2.04it/s]

{'loss': 15.2807, 'grad_norm': 36.34051513671875, 'learning_rate': 2.991666666666667e-05, 'epoch': 0.06}


  6%|▌         | 1825/30000 [14:47<3:27:56,  2.26it/s]

{'loss': 18.4973, 'grad_norm': 33.69395446777344, 'learning_rate': 3.0333333333333337e-05, 'epoch': 0.06}


  6%|▌         | 1850/30000 [14:59<3:32:04,  2.21it/s]

{'loss': 20.1352, 'grad_norm': 27.058490753173828, 'learning_rate': 3.075e-05, 'epoch': 0.06}


  6%|▋         | 1875/30000 [15:12<3:50:07,  2.04it/s]

{'loss': 23.3058, 'grad_norm': 53.92612075805664, 'learning_rate': 3.116666666666667e-05, 'epoch': 0.06}


  6%|▋         | 1900/30000 [15:25<3:36:10,  2.17it/s]

{'loss': 15.2464, 'grad_norm': 40.980125427246094, 'learning_rate': 3.158333333333334e-05, 'epoch': 0.06}


  6%|▋         | 1925/30000 [15:37<3:57:10,  1.97it/s]

{'loss': 19.5127, 'grad_norm': 44.42938232421875, 'learning_rate': 3.2000000000000005e-05, 'epoch': 0.06}


  6%|▋         | 1950/30000 [15:49<3:18:13,  2.36it/s]

{'loss': 17.4068, 'grad_norm': 30.93537712097168, 'learning_rate': 3.2416666666666664e-05, 'epoch': 0.07}


  7%|▋         | 1975/30000 [16:02<3:58:32,  1.96it/s]

{'loss': 29.3811, 'grad_norm': 47.147552490234375, 'learning_rate': 3.283333333333333e-05, 'epoch': 0.07}


  7%|▋         | 2000/30000 [16:13<3:53:09,  2.00it/s]

{'loss': 12.8964, 'grad_norm': 46.747657775878906, 'learning_rate': 3.325e-05, 'epoch': 0.07}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
  7%|▋         | 2000/30000 [19:10<3:53:09,  2.00it/s]

{'eval_loss': 28.17624282836914, 'eval_runtime': 177.3195, 'eval_samples_per_second': 21.949, 'eval_steps_per_second': 5.487, 'epoch': 0.07}


  7%|▋         | 2025/30000 [19:30<3:54:51,  1.99it/s]  

{'loss': 15.2431, 'grad_norm': 47.072444915771484, 'learning_rate': 3.366666666666667e-05, 'epoch': 0.07}


  7%|▋         | 2050/30000 [19:42<3:24:29,  2.28it/s]

{'loss': 19.5806, 'grad_norm': 27.6933650970459, 'learning_rate': 3.408333333333333e-05, 'epoch': 0.07}


  7%|▋         | 2075/30000 [19:54<4:00:31,  1.93it/s]

{'loss': 16.4841, 'grad_norm': 66.80195617675781, 'learning_rate': 3.45e-05, 'epoch': 0.07}


  7%|▋         | 2100/30000 [20:06<3:14:47,  2.39it/s]

{'loss': 16.1435, 'grad_norm': 16.617084503173828, 'learning_rate': 3.491666666666667e-05, 'epoch': 0.07}


  7%|▋         | 2125/30000 [20:17<3:16:30,  2.36it/s]

{'loss': 19.2917, 'grad_norm': 41.30341720581055, 'learning_rate': 3.5333333333333336e-05, 'epoch': 0.07}


  7%|▋         | 2150/30000 [20:29<3:49:43,  2.02it/s]

{'loss': 13.7086, 'grad_norm': 26.246660232543945, 'learning_rate': 3.575e-05, 'epoch': 0.07}


  7%|▋         | 2175/30000 [20:42<3:18:27,  2.34it/s]

{'loss': 16.7288, 'grad_norm': 32.548641204833984, 'learning_rate': 3.6166666666666674e-05, 'epoch': 0.07}


  7%|▋         | 2200/30000 [20:53<3:09:32,  2.44it/s]

{'loss': 13.276, 'grad_norm': 24.635089874267578, 'learning_rate': 3.658333333333334e-05, 'epoch': 0.07}


  7%|▋         | 2225/30000 [21:05<3:45:44,  2.05it/s]

{'loss': 17.6287, 'grad_norm': 38.5001106262207, 'learning_rate': 3.7e-05, 'epoch': 0.07}


  8%|▊         | 2250/30000 [21:17<3:20:24,  2.31it/s]

{'loss': 18.7036, 'grad_norm': 38.27681350708008, 'learning_rate': 3.7416666666666664e-05, 'epoch': 0.07}


  8%|▊         | 2275/30000 [21:30<3:55:26,  1.96it/s]

{'loss': 16.7197, 'grad_norm': 48.6198844909668, 'learning_rate': 3.7833333333333336e-05, 'epoch': 0.08}


  8%|▊         | 2300/30000 [21:42<4:01:38,  1.91it/s]

{'loss': 16.7139, 'grad_norm': 51.51292037963867, 'learning_rate': 3.825e-05, 'epoch': 0.08}


  8%|▊         | 2325/30000 [21:55<3:57:13,  1.94it/s]

{'loss': 20.7062, 'grad_norm': 27.385961532592773, 'learning_rate': 3.866666666666667e-05, 'epoch': 0.08}


  8%|▊         | 2350/30000 [22:06<4:05:39,  1.88it/s]

{'loss': 15.8657, 'grad_norm': 66.13665771484375, 'learning_rate': 3.906666666666667e-05, 'epoch': 0.08}


  8%|▊         | 2375/30000 [22:19<4:01:13,  1.91it/s]

{'loss': 20.7345, 'grad_norm': 55.007354736328125, 'learning_rate': 3.9483333333333335e-05, 'epoch': 0.08}


  8%|▊         | 2400/30000 [22:31<3:55:47,  1.95it/s]

{'loss': 23.3238, 'grad_norm': 61.91250991821289, 'learning_rate': 3.99e-05, 'epoch': 0.08}


  8%|▊         | 2425/30000 [22:44<3:53:35,  1.97it/s]

{'loss': 17.7407, 'grad_norm': 33.03654861450195, 'learning_rate': 4.0316666666666666e-05, 'epoch': 0.08}


  8%|▊         | 2450/30000 [22:57<3:38:18,  2.10it/s]

{'loss': 15.6627, 'grad_norm': 32.973567962646484, 'learning_rate': 4.073333333333333e-05, 'epoch': 0.08}


  8%|▊         | 2475/30000 [23:10<3:56:25,  1.94it/s]

{'loss': 18.0797, 'grad_norm': 26.24540901184082, 'learning_rate': 4.115e-05, 'epoch': 0.08}


  8%|▊         | 2500/30000 [23:22<3:57:41,  1.93it/s]

{'loss': 20.2257, 'grad_norm': 48.01731491088867, 'learning_rate': 4.156666666666667e-05, 'epoch': 0.08}


  8%|▊         | 2505/30000 [23:24<3:55:02,  1.95it/s]'(ProtocolError('Connection aborted.', BrokenPipeError(32, 'Broken pipe')), '(Request ID: 184cb6e5-21d9-44e0-84ce-9b9f9a4d1536)')' thrown while requesting GET https://huggingface.co/datasets/nguyenvulebinh/AVYT/resolve/e6c6bf6f40e698b82215d269cfc0a0d65a7a2372/vox2/vox2-dev-000001.tar
Retrying in 1s [Retry 1/5].
  8%|▊         | 2525/30000 [23:34<3:52:57,  1.97it/s]

{'loss': 16.1858, 'grad_norm': 50.2606086730957, 'learning_rate': 4.1983333333333335e-05, 'epoch': 0.08}


  8%|▊         | 2550/30000 [23:46<3:42:44,  2.05it/s]

{'loss': 20.4322, 'grad_norm': 54.27219009399414, 'learning_rate': 4.24e-05, 'epoch': 0.09}


  9%|▊         | 2575/30000 [23:59<3:54:20,  1.95it/s]

{'loss': 17.1194, 'grad_norm': 39.40780258178711, 'learning_rate': 4.2816666666666666e-05, 'epoch': 0.09}


  9%|▊         | 2600/30000 [24:11<4:04:30,  1.87it/s]

{'loss': 14.5468, 'grad_norm': 17.392757415771484, 'learning_rate': 4.323333333333334e-05, 'epoch': 0.09}


  9%|▉         | 2625/30000 [24:23<4:00:43,  1.90it/s]

{'loss': 17.6046, 'grad_norm': 44.33537292480469, 'learning_rate': 4.3650000000000004e-05, 'epoch': 0.09}


  9%|▉         | 2650/30000 [24:35<3:10:36,  2.39it/s]

{'loss': 16.2522, 'grad_norm': 37.58259963989258, 'learning_rate': 4.406666666666667e-05, 'epoch': 0.09}


  9%|▉         | 2675/30000 [24:47<3:47:02,  2.01it/s]

{'loss': 17.7248, 'grad_norm': 47.34408950805664, 'learning_rate': 4.4483333333333335e-05, 'epoch': 0.09}


  9%|▉         | 2700/30000 [24:59<3:16:50,  2.31it/s]

{'loss': 15.1748, 'grad_norm': 33.853641510009766, 'learning_rate': 4.49e-05, 'epoch': 0.09}


  9%|▉         | 2725/30000 [25:11<4:01:25,  1.88it/s]

{'loss': 15.6938, 'grad_norm': 79.88500213623047, 'learning_rate': 4.5316666666666666e-05, 'epoch': 0.09}


  9%|▉         | 2750/30000 [25:23<3:53:24,  1.95it/s]

{'loss': 19.9844, 'grad_norm': 48.52936935424805, 'learning_rate': 4.573333333333333e-05, 'epoch': 0.09}


  9%|▉         | 2775/30000 [25:35<3:43:21,  2.03it/s]

{'loss': 16.4202, 'grad_norm': 43.097049713134766, 'learning_rate': 4.6150000000000004e-05, 'epoch': 0.09}


  9%|▉         | 2800/30000 [25:48<3:57:21,  1.91it/s]

{'loss': 16.9427, 'grad_norm': 44.06317138671875, 'learning_rate': 4.656666666666667e-05, 'epoch': 0.09}


  9%|▉         | 2825/30000 [26:01<3:53:52,  1.94it/s]

{'loss': 22.7557, 'grad_norm': 47.66730880737305, 'learning_rate': 4.6983333333333335e-05, 'epoch': 0.09}


 10%|▉         | 2850/30000 [26:13<3:26:51,  2.19it/s]

{'loss': 14.2364, 'grad_norm': 43.55778121948242, 'learning_rate': 4.74e-05, 'epoch': 0.1}


 10%|▉         | 2875/30000 [26:24<3:09:58,  2.38it/s]

{'loss': 14.715, 'grad_norm': 40.1729850769043, 'learning_rate': 4.781666666666667e-05, 'epoch': 0.1}


 10%|▉         | 2900/30000 [26:36<3:52:06,  1.95it/s]

{'loss': 13.5983, 'grad_norm': 26.32107925415039, 'learning_rate': 4.823333333333334e-05, 'epoch': 0.1}


 10%|▉         | 2925/30000 [26:49<3:55:43,  1.91it/s]

{'loss': 18.1017, 'grad_norm': 31.99476432800293, 'learning_rate': 4.8650000000000003e-05, 'epoch': 0.1}


 10%|▉         | 2950/30000 [27:01<3:42:59,  2.02it/s]

{'loss': 17.3263, 'grad_norm': 85.69903564453125, 'learning_rate': 4.906666666666667e-05, 'epoch': 0.1}


 10%|▉         | 2975/30000 [27:13<3:11:25,  2.35it/s]

{'loss': 13.8482, 'grad_norm': 35.472389221191406, 'learning_rate': 4.9483333333333334e-05, 'epoch': 0.1}


 10%|█         | 3000/30000 [27:25<3:48:39,  1.97it/s]

{'loss': 21.7097, 'grad_norm': 34.8327522277832, 'learning_rate': 4.99e-05, 'epoch': 0.1}


 10%|█         | 3025/30000 [27:37<3:00:41,  2.49it/s]

{'loss': 10.6001, 'grad_norm': 13.190470695495605, 'learning_rate': 4.996481481481482e-05, 'epoch': 0.1}


 10%|█         | 3050/30000 [27:49<3:17:50,  2.27it/s]

{'loss': 17.1302, 'grad_norm': 45.75181579589844, 'learning_rate': 4.991851851851852e-05, 'epoch': 0.1}


 10%|█         | 3075/30000 [28:01<3:04:21,  2.43it/s]

{'loss': 19.7415, 'grad_norm': 33.37633514404297, 'learning_rate': 4.9872222222222225e-05, 'epoch': 0.1}


 10%|█         | 3100/30000 [28:14<3:50:00,  1.95it/s]

{'loss': 17.0819, 'grad_norm': 28.064006805419922, 'learning_rate': 4.982592592592592e-05, 'epoch': 0.1}


 10%|█         | 3125/30000 [28:25<3:33:11,  2.10it/s]

{'loss': 21.0091, 'grad_norm': 35.46098709106445, 'learning_rate': 4.977962962962963e-05, 'epoch': 0.1}


 10%|█         | 3150/30000 [28:38<3:59:51,  1.87it/s]

{'loss': 16.9858, 'grad_norm': 24.328083038330078, 'learning_rate': 4.973333333333334e-05, 'epoch': 0.1}


 11%|█         | 3175/30000 [28:50<3:00:46,  2.47it/s]

{'loss': 21.1534, 'grad_norm': 22.701208114624023, 'learning_rate': 4.968703703703704e-05, 'epoch': 0.11}


 11%|█         | 3200/30000 [29:03<3:45:58,  1.98it/s]

{'loss': 18.4471, 'grad_norm': 19.105220794677734, 'learning_rate': 4.9640740740740744e-05, 'epoch': 0.11}


 11%|█         | 3225/30000 [29:16<3:53:02,  1.91it/s]

{'loss': 17.8963, 'grad_norm': 47.936241149902344, 'learning_rate': 4.959444444444445e-05, 'epoch': 0.11}


 11%|█         | 3250/30000 [29:27<3:43:58,  1.99it/s]

{'loss': 14.0106, 'grad_norm': 48.442047119140625, 'learning_rate': 4.954814814814815e-05, 'epoch': 0.11}


 11%|█         | 3275/30000 [29:39<3:47:44,  1.96it/s]

{'loss': 19.3764, 'grad_norm': 59.10612106323242, 'learning_rate': 4.9501851851851854e-05, 'epoch': 0.11}


 11%|█         | 3300/30000 [29:51<2:50:44,  2.61it/s]

{'loss': 18.7346, 'grad_norm': 38.538795471191406, 'learning_rate': 4.945555555555556e-05, 'epoch': 0.11}


 11%|█         | 3325/30000 [30:03<3:44:50,  1.98it/s]

{'loss': 16.2711, 'grad_norm': 64.82545471191406, 'learning_rate': 4.940925925925926e-05, 'epoch': 0.11}


 11%|█         | 3350/30000 [30:15<3:35:11,  2.06it/s]

{'loss': 16.3156, 'grad_norm': 91.93931579589844, 'learning_rate': 4.936296296296297e-05, 'epoch': 0.11}


 11%|█▏        | 3375/30000 [30:27<3:11:19,  2.32it/s]

{'loss': 18.3682, 'grad_norm': 32.892242431640625, 'learning_rate': 4.931666666666667e-05, 'epoch': 0.11}


 11%|█▏        | 3400/30000 [30:38<3:05:56,  2.38it/s]

{'loss': 17.2827, 'grad_norm': 32.833961486816406, 'learning_rate': 4.9270370370370374e-05, 'epoch': 0.11}


 11%|█▏        | 3425/30000 [30:51<3:46:45,  1.95it/s]

{'loss': 21.0335, 'grad_norm': 45.503021240234375, 'learning_rate': 4.922407407407408e-05, 'epoch': 0.11}


 12%|█▏        | 3450/30000 [31:03<3:39:32,  2.02it/s]

{'loss': 15.7656, 'grad_norm': 17.38704490661621, 'learning_rate': 4.917777777777778e-05, 'epoch': 0.12}


 12%|█▏        | 3475/30000 [31:16<3:47:04,  1.95it/s]

{'loss': 14.9186, 'grad_norm': 26.426761627197266, 'learning_rate': 4.913148148148148e-05, 'epoch': 0.12}


 12%|█▏        | 3500/30000 [31:27<3:06:03,  2.37it/s]

{'loss': 15.3008, 'grad_norm': 50.95805740356445, 'learning_rate': 4.908518518518519e-05, 'epoch': 0.12}


 12%|█▏        | 3525/30000 [31:40<3:50:01,  1.92it/s]

{'loss': 13.8678, 'grad_norm': 37.995361328125, 'learning_rate': 4.903888888888889e-05, 'epoch': 0.12}


 12%|█▏        | 3550/30000 [31:53<3:41:26,  1.99it/s]

{'loss': 25.266, 'grad_norm': 34.77388000488281, 'learning_rate': 4.89925925925926e-05, 'epoch': 0.12}


 12%|█▏        | 3575/30000 [32:03<3:24:44,  2.15it/s]

{'loss': 8.5745, 'grad_norm': 21.738466262817383, 'learning_rate': 4.8946296296296304e-05, 'epoch': 0.12}


 12%|█▏        | 3600/30000 [32:15<2:49:24,  2.60it/s]

{'loss': 22.1572, 'grad_norm': 11.232606887817383, 'learning_rate': 4.89e-05, 'epoch': 0.12}


 12%|█▏        | 3625/30000 [32:28<3:51:10,  1.90it/s]

{'loss': 20.7341, 'grad_norm': 52.21577072143555, 'learning_rate': 4.885370370370371e-05, 'epoch': 0.12}


 12%|█▏        | 3650/30000 [32:40<3:38:09,  2.01it/s]

{'loss': 15.2986, 'grad_norm': 42.12381362915039, 'learning_rate': 4.880740740740741e-05, 'epoch': 0.12}


 12%|█▏        | 3675/30000 [32:51<3:39:46,  2.00it/s]

{'loss': 13.5757, 'grad_norm': 38.01112365722656, 'learning_rate': 4.876111111111111e-05, 'epoch': 0.12}


 12%|█▏        | 3700/30000 [33:04<3:41:48,  1.98it/s]

{'loss': 17.2453, 'grad_norm': 70.89056396484375, 'learning_rate': 4.871481481481482e-05, 'epoch': 0.12}


 12%|█▏        | 3725/30000 [33:17<3:46:53,  1.93it/s]

{'loss': 17.2477, 'grad_norm': 44.20790481567383, 'learning_rate': 4.8668518518518516e-05, 'epoch': 0.12}


 12%|█▎        | 3750/30000 [33:29<4:00:25,  1.82it/s]

{'loss': 19.6534, 'grad_norm': 25.02775001525879, 'learning_rate': 4.862222222222222e-05, 'epoch': 0.12}


 13%|█▎        | 3775/30000 [33:41<3:15:47,  2.23it/s]

{'loss': 16.8242, 'grad_norm': 31.761756896972656, 'learning_rate': 4.857592592592593e-05, 'epoch': 0.13}


 13%|█▎        | 3800/30000 [33:54<3:36:09,  2.02it/s]

{'loss': 15.7955, 'grad_norm': 27.70638656616211, 'learning_rate': 4.852962962962963e-05, 'epoch': 0.13}


 13%|█▎        | 3825/30000 [34:05<3:33:31,  2.04it/s]

{'loss': 12.2506, 'grad_norm': 40.06996154785156, 'learning_rate': 4.848333333333334e-05, 'epoch': 0.13}


 13%|█▎        | 3850/30000 [34:18<3:38:40,  1.99it/s]

{'loss': 18.4189, 'grad_norm': 30.229658126831055, 'learning_rate': 4.843703703703704e-05, 'epoch': 0.13}


 13%|█▎        | 3875/30000 [34:31<3:29:58,  2.07it/s]

{'loss': 19.0975, 'grad_norm': 42.69168472290039, 'learning_rate': 4.839074074074074e-05, 'epoch': 0.13}


 13%|█▎        | 3900/30000 [34:42<3:40:30,  1.97it/s]

{'loss': 14.6278, 'grad_norm': 37.505882263183594, 'learning_rate': 4.8344444444444447e-05, 'epoch': 0.13}


 13%|█▎        | 3925/30000 [34:55<3:38:17,  1.99it/s]

{'loss': 16.9248, 'grad_norm': 41.43263626098633, 'learning_rate': 4.8298148148148145e-05, 'epoch': 0.13}


 13%|█▎        | 3950/30000 [35:07<3:50:17,  1.89it/s]

{'loss': 24.364, 'grad_norm': 57.813446044921875, 'learning_rate': 4.825185185185185e-05, 'epoch': 0.13}


 13%|█▎        | 3975/30000 [35:19<3:32:11,  2.04it/s]

{'loss': 18.8224, 'grad_norm': 60.66712951660156, 'learning_rate': 4.820555555555556e-05, 'epoch': 0.13}


 13%|█▎        | 4000/30000 [35:31<3:43:22,  1.94it/s]

{'loss': 14.1124, 'grad_norm': 35.98902893066406, 'learning_rate': 4.815925925925926e-05, 'epoch': 0.13}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 13%|█▎        | 4000/30000 [38:32<3:43:22,  1.94it/s]

{'eval_loss': 29.022287368774414, 'eval_runtime': 181.3287, 'eval_samples_per_second': 21.464, 'eval_steps_per_second': 5.366, 'epoch': 0.13}


 13%|█▎        | 4025/30000 [38:52<3:52:41,  1.86it/s]  

{'loss': 13.6175, 'grad_norm': 57.35700225830078, 'learning_rate': 4.8112962962962966e-05, 'epoch': 0.13}


 14%|█▎        | 4050/30000 [39:04<2:57:59,  2.43it/s]

{'loss': 15.8332, 'grad_norm': 17.545730590820312, 'learning_rate': 4.806666666666667e-05, 'epoch': 0.14}


 14%|█▎        | 4075/30000 [39:16<3:05:47,  2.33it/s]

{'loss': 15.1068, 'grad_norm': 33.67498779296875, 'learning_rate': 4.802037037037037e-05, 'epoch': 0.14}


 14%|█▎        | 4100/30000 [39:27<3:09:11,  2.28it/s]

{'loss': 13.4778, 'grad_norm': 53.151214599609375, 'learning_rate': 4.7974074074074076e-05, 'epoch': 0.14}


 14%|█▍        | 4125/30000 [39:39<3:40:52,  1.95it/s]

{'loss': 19.5387, 'grad_norm': 62.021240234375, 'learning_rate': 4.792777777777778e-05, 'epoch': 0.14}


 14%|█▍        | 4150/30000 [39:51<3:07:42,  2.30it/s]

{'loss': 12.9157, 'grad_norm': 30.475574493408203, 'learning_rate': 4.788148148148148e-05, 'epoch': 0.14}


 14%|█▍        | 4175/30000 [40:02<2:59:10,  2.40it/s]

{'loss': 15.8153, 'grad_norm': 17.95908546447754, 'learning_rate': 4.783518518518519e-05, 'epoch': 0.14}


 14%|█▍        | 4200/30000 [40:15<3:06:09,  2.31it/s]

{'loss': 19.1894, 'grad_norm': 62.913063049316406, 'learning_rate': 4.778888888888889e-05, 'epoch': 0.14}


 14%|█▍        | 4225/30000 [40:26<3:44:30,  1.91it/s]

{'loss': 15.4821, 'grad_norm': 58.250606536865234, 'learning_rate': 4.7742592592592596e-05, 'epoch': 0.14}


 14%|█▍        | 4250/30000 [40:39<3:51:31,  1.85it/s]

{'loss': 24.649, 'grad_norm': 28.157302856445312, 'learning_rate': 4.76962962962963e-05, 'epoch': 0.14}


 14%|█▍        | 4275/30000 [40:52<3:40:28,  1.94it/s]

{'loss': 15.3717, 'grad_norm': 34.92318344116211, 'learning_rate': 4.765e-05, 'epoch': 0.14}


 14%|█▍        | 4300/30000 [41:04<3:35:36,  1.99it/s]

{'loss': 18.6014, 'grad_norm': 47.54502487182617, 'learning_rate': 4.7603703703703705e-05, 'epoch': 0.14}


 14%|█▍        | 4325/30000 [41:15<3:34:02,  2.00it/s]

{'loss': 13.5914, 'grad_norm': 29.95307731628418, 'learning_rate': 4.755740740740741e-05, 'epoch': 0.14}


 14%|█▍        | 4350/30000 [41:27<3:41:01,  1.93it/s]

{'loss': 16.7452, 'grad_norm': 40.48641586303711, 'learning_rate': 4.751111111111111e-05, 'epoch': 0.14}


 15%|█▍        | 4375/30000 [41:40<3:36:59,  1.97it/s]

{'loss': 16.7788, 'grad_norm': 49.83125305175781, 'learning_rate': 4.746481481481482e-05, 'epoch': 0.15}


 15%|█▍        | 4400/30000 [41:51<3:25:06,  2.08it/s]

{'loss': 16.2961, 'grad_norm': 46.861873626708984, 'learning_rate': 4.742037037037037e-05, 'epoch': 0.15}


 15%|█▍        | 4425/30000 [42:04<3:35:46,  1.98it/s]

{'loss': 17.6394, 'grad_norm': 41.24041748046875, 'learning_rate': 4.7374074074074075e-05, 'epoch': 0.15}


 15%|█▍        | 4450/30000 [42:17<3:36:52,  1.96it/s]

{'loss': 27.244, 'grad_norm': 64.98941040039062, 'learning_rate': 4.732777777777778e-05, 'epoch': 0.15}


 15%|█▍        | 4475/30000 [42:29<3:39:48,  1.94it/s]

{'loss': 21.6392, 'grad_norm': 39.62762451171875, 'learning_rate': 4.7281481481481485e-05, 'epoch': 0.15}


 15%|█▌        | 4500/30000 [42:41<2:56:20,  2.41it/s]

{'loss': 19.1883, 'grad_norm': 21.145402908325195, 'learning_rate': 4.723518518518519e-05, 'epoch': 0.15}


 15%|█▌        | 4525/30000 [42:53<3:22:16,  2.10it/s]

{'loss': 15.0927, 'grad_norm': 45.778480529785156, 'learning_rate': 4.7188888888888896e-05, 'epoch': 0.15}


 15%|█▌        | 4550/30000 [43:06<3:35:30,  1.97it/s]

{'loss': 21.8076, 'grad_norm': 25.397558212280273, 'learning_rate': 4.7142592592592595e-05, 'epoch': 0.15}


 15%|█▌        | 4575/30000 [43:17<3:00:43,  2.34it/s]

{'loss': 16.1304, 'grad_norm': 27.2319278717041, 'learning_rate': 4.70962962962963e-05, 'epoch': 0.15}


 15%|█▌        | 4600/30000 [43:29<3:29:27,  2.02it/s]

{'loss': 17.1538, 'grad_norm': 45.20541763305664, 'learning_rate': 4.705e-05, 'epoch': 0.15}


 15%|█▌        | 4625/30000 [43:40<2:49:41,  2.49it/s]

{'loss': 17.5584, 'grad_norm': 32.486053466796875, 'learning_rate': 4.7003703703703704e-05, 'epoch': 0.15}


 16%|█▌        | 4650/30000 [43:51<3:34:23,  1.97it/s]

{'loss': 15.6187, 'grad_norm': 45.574703216552734, 'learning_rate': 4.695740740740741e-05, 'epoch': 0.15}


 16%|█▌        | 4675/30000 [44:03<3:34:46,  1.97it/s]

{'loss': 16.2329, 'grad_norm': 36.11827850341797, 'learning_rate': 4.6911111111111114e-05, 'epoch': 0.16}


 16%|█▌        | 4700/30000 [44:16<3:35:01,  1.96it/s]

{'loss': 26.005, 'grad_norm': 43.788143157958984, 'learning_rate': 4.686481481481482e-05, 'epoch': 0.16}


 16%|█▌        | 4725/30000 [44:28<3:40:27,  1.91it/s]

{'loss': 19.6136, 'grad_norm': 47.29106140136719, 'learning_rate': 4.6818518518518525e-05, 'epoch': 0.16}


 16%|█▌        | 4750/30000 [44:40<3:36:33,  1.94it/s]

{'loss': 19.8389, 'grad_norm': 44.14997863769531, 'learning_rate': 4.6772222222222224e-05, 'epoch': 0.16}


 16%|█▌        | 4775/30000 [44:52<3:37:02,  1.94it/s]

{'loss': 18.3579, 'grad_norm': 31.957895278930664, 'learning_rate': 4.672592592592593e-05, 'epoch': 0.16}


 16%|█▌        | 4800/30000 [45:05<3:36:58,  1.94it/s]

{'loss': 24.7826, 'grad_norm': 69.87605285644531, 'learning_rate': 4.6679629629629634e-05, 'epoch': 0.16}


 16%|█▌        | 4825/30000 [45:16<2:58:26,  2.35it/s]

{'loss': 19.4108, 'grad_norm': 36.94178771972656, 'learning_rate': 4.663333333333333e-05, 'epoch': 0.16}


 16%|█▌        | 4850/30000 [45:28<3:23:23,  2.06it/s]

{'loss': 17.7596, 'grad_norm': 36.55371856689453, 'learning_rate': 4.658703703703704e-05, 'epoch': 0.16}


 16%|█▋        | 4875/30000 [45:40<3:24:21,  2.05it/s]

{'loss': 21.4059, 'grad_norm': 53.33694076538086, 'learning_rate': 4.6540740740740744e-05, 'epoch': 0.16}


 16%|█▋        | 4900/30000 [45:52<3:46:34,  1.85it/s]

{'loss': 16.6781, 'grad_norm': 35.69208908081055, 'learning_rate': 4.649444444444445e-05, 'epoch': 0.16}


 16%|█▋        | 4925/30000 [46:04<3:35:58,  1.94it/s]

{'loss': 16.5291, 'grad_norm': 45.16056823730469, 'learning_rate': 4.6448148148148154e-05, 'epoch': 0.16}


 16%|█▋        | 4950/30000 [46:16<2:51:52,  2.43it/s]

{'loss': 20.6508, 'grad_norm': 10.941885948181152, 'learning_rate': 4.640185185185185e-05, 'epoch': 0.17}


 17%|█▋        | 4975/30000 [46:27<2:49:55,  2.45it/s]

{'loss': 15.4771, 'grad_norm': 33.7725715637207, 'learning_rate': 4.635555555555556e-05, 'epoch': 0.17}


 17%|█▋        | 5000/30000 [46:40<3:32:58,  1.96it/s]

{'loss': 19.8166, 'grad_norm': 56.73935317993164, 'learning_rate': 4.6309259259259264e-05, 'epoch': 0.17}


 17%|█▋        | 5025/30000 [46:52<3:30:35,  1.98it/s]

{'loss': 18.8655, 'grad_norm': 51.022403717041016, 'learning_rate': 4.626296296296296e-05, 'epoch': 0.17}


 17%|█▋        | 5050/30000 [47:03<3:37:15,  1.91it/s]

{'loss': 15.903, 'grad_norm': 35.30324935913086, 'learning_rate': 4.621666666666667e-05, 'epoch': 0.17}


 17%|█▋        | 5075/30000 [47:16<3:49:55,  1.81it/s]

{'loss': 17.4056, 'grad_norm': 28.52605628967285, 'learning_rate': 4.617037037037037e-05, 'epoch': 0.17}


 17%|█▋        | 5100/30000 [47:29<3:15:36,  2.12it/s]

{'loss': 19.2186, 'grad_norm': 58.127540588378906, 'learning_rate': 4.612407407407408e-05, 'epoch': 0.17}


 17%|█▋        | 5125/30000 [47:41<3:45:21,  1.84it/s]

{'loss': 16.5307, 'grad_norm': 51.28739929199219, 'learning_rate': 4.6077777777777783e-05, 'epoch': 0.17}


 17%|█▋        | 5150/30000 [47:53<3:02:54,  2.26it/s]

{'loss': 12.7429, 'grad_norm': 18.857452392578125, 'learning_rate': 4.603148148148148e-05, 'epoch': 0.17}


 17%|█▋        | 5175/30000 [48:05<3:41:24,  1.87it/s]

{'loss': 18.4396, 'grad_norm': 103.70193481445312, 'learning_rate': 4.598518518518519e-05, 'epoch': 0.17}


 17%|█▋        | 5200/30000 [48:18<3:05:03,  2.23it/s]

{'loss': 14.218, 'grad_norm': 23.822595596313477, 'learning_rate': 4.593888888888889e-05, 'epoch': 0.17}


 17%|█▋        | 5225/30000 [48:31<3:39:23,  1.88it/s]

{'loss': 19.5997, 'grad_norm': 42.40589141845703, 'learning_rate': 4.589259259259259e-05, 'epoch': 0.17}


 18%|█▊        | 5250/30000 [48:45<3:46:06,  1.82it/s]

{'loss': 14.4512, 'grad_norm': 31.370622634887695, 'learning_rate': 4.5846296296296297e-05, 'epoch': 0.17}


 18%|█▊        | 5275/30000 [48:59<3:59:57,  1.72it/s]

{'loss': 19.2615, 'grad_norm': 31.041746139526367, 'learning_rate': 4.58e-05, 'epoch': 0.18}


 18%|█▊        | 5300/30000 [49:12<3:46:27,  1.82it/s]

{'loss': 18.1107, 'grad_norm': 41.325103759765625, 'learning_rate': 4.575370370370371e-05, 'epoch': 0.18}


 18%|█▊        | 5325/30000 [49:24<2:53:13,  2.37it/s]

{'loss': 15.3175, 'grad_norm': 45.89544677734375, 'learning_rate': 4.570740740740741e-05, 'epoch': 0.18}


 18%|█▊        | 5350/30000 [49:37<3:00:19,  2.28it/s]

{'loss': 18.2191, 'grad_norm': 29.707054138183594, 'learning_rate': 4.566111111111112e-05, 'epoch': 0.18}


 18%|█▊        | 5375/30000 [49:48<2:55:25,  2.34it/s]

{'loss': 15.1771, 'grad_norm': 36.12180709838867, 'learning_rate': 4.5614814814814817e-05, 'epoch': 0.18}


 18%|█▊        | 5400/30000 [49:59<2:54:57,  2.34it/s]

{'loss': 13.02, 'grad_norm': 33.5576171875, 'learning_rate': 4.556851851851852e-05, 'epoch': 0.18}


 18%|█▊        | 5425/30000 [50:12<3:23:10,  2.02it/s]

{'loss': 18.3651, 'grad_norm': 44.85825729370117, 'learning_rate': 4.552222222222222e-05, 'epoch': 0.18}


 18%|█▊        | 5450/30000 [50:23<3:32:18,  1.93it/s]

{'loss': 17.8957, 'grad_norm': 103.42386627197266, 'learning_rate': 4.5475925925925926e-05, 'epoch': 0.18}


 18%|█▊        | 5475/30000 [50:36<3:32:35,  1.92it/s]

{'loss': 18.1513, 'grad_norm': 44.772727966308594, 'learning_rate': 4.542962962962963e-05, 'epoch': 0.18}


 18%|█▊        | 5500/30000 [50:49<3:32:12,  1.92it/s]

{'loss': 19.9577, 'grad_norm': 42.84458541870117, 'learning_rate': 4.5383333333333336e-05, 'epoch': 0.18}


 18%|█▊        | 5525/30000 [51:01<2:57:29,  2.30it/s]

{'loss': 16.7361, 'grad_norm': 29.729793548583984, 'learning_rate': 4.533703703703704e-05, 'epoch': 0.18}


 18%|█▊        | 5550/30000 [51:15<4:06:23,  1.65it/s]

{'loss': 22.2088, 'grad_norm': 71.18746185302734, 'learning_rate': 4.529074074074075e-05, 'epoch': 0.18}


 19%|█▊        | 5575/30000 [51:28<3:22:44,  2.01it/s]

{'loss': 12.9099, 'grad_norm': 20.254457473754883, 'learning_rate': 4.5244444444444446e-05, 'epoch': 0.19}


 19%|█▊        | 5600/30000 [51:42<3:55:50,  1.72it/s]

{'loss': 16.9867, 'grad_norm': 32.32893753051758, 'learning_rate': 4.519814814814815e-05, 'epoch': 0.19}


 19%|█▉        | 5625/30000 [51:56<4:04:04,  1.66it/s]

{'loss': 17.0368, 'grad_norm': 24.132667541503906, 'learning_rate': 4.5151851851851856e-05, 'epoch': 0.19}


 19%|█▉        | 5650/30000 [52:09<3:52:21,  1.75it/s]

{'loss': 13.5152, 'grad_norm': 32.37455368041992, 'learning_rate': 4.5105555555555555e-05, 'epoch': 0.19}


 19%|█▉        | 5675/30000 [52:23<3:18:55,  2.04it/s]

{'loss': 21.6962, 'grad_norm': 18.58466148376465, 'learning_rate': 4.505925925925926e-05, 'epoch': 0.19}


 19%|█▉        | 5700/30000 [52:36<3:28:11,  1.95it/s]

{'loss': 16.9126, 'grad_norm': 38.70191192626953, 'learning_rate': 4.5012962962962966e-05, 'epoch': 0.19}


 19%|█▉        | 5725/30000 [52:50<3:12:46,  2.10it/s]

{'loss': 16.8766, 'grad_norm': 25.637556076049805, 'learning_rate': 4.496666666666667e-05, 'epoch': 0.19}


 19%|█▉        | 5750/30000 [53:05<3:54:34,  1.72it/s]

{'loss': 25.4366, 'grad_norm': 36.06661605834961, 'learning_rate': 4.4920370370370376e-05, 'epoch': 0.19}


 19%|█▉        | 5775/30000 [53:19<3:41:40,  1.82it/s]

{'loss': 17.5677, 'grad_norm': 82.25835418701172, 'learning_rate': 4.4874074074074075e-05, 'epoch': 0.19}


 19%|█▉        | 5800/30000 [53:32<4:07:21,  1.63it/s]

{'loss': 14.8649, 'grad_norm': 35.15450668334961, 'learning_rate': 4.482777777777778e-05, 'epoch': 0.19}


 19%|█▉        | 5825/30000 [53:45<3:21:55,  2.00it/s]

{'loss': 18.0373, 'grad_norm': 31.085952758789062, 'learning_rate': 4.4781481481481486e-05, 'epoch': 0.19}


 20%|█▉        | 5850/30000 [53:59<3:30:17,  1.91it/s]

{'loss': 19.857, 'grad_norm': 82.31710052490234, 'learning_rate': 4.4735185185185184e-05, 'epoch': 0.2}


 20%|█▉        | 5875/30000 [54:13<3:59:49,  1.68it/s]

{'loss': 18.2909, 'grad_norm': 46.69183349609375, 'learning_rate': 4.468888888888889e-05, 'epoch': 0.2}


 20%|█▉        | 5900/30000 [54:28<3:47:03,  1.77it/s]

{'loss': 19.5341, 'grad_norm': 66.7958984375, 'learning_rate': 4.4642592592592595e-05, 'epoch': 0.2}


 20%|█▉        | 5925/30000 [54:42<3:47:51,  1.76it/s]

{'loss': 24.0393, 'grad_norm': 39.62435531616211, 'learning_rate': 4.45962962962963e-05, 'epoch': 0.2}


 20%|█▉        | 5950/30000 [54:57<4:03:23,  1.65it/s]

{'loss': 25.288, 'grad_norm': 68.63208770751953, 'learning_rate': 4.4550000000000005e-05, 'epoch': 0.2}


 20%|█▉        | 5975/30000 [55:10<3:04:57,  2.16it/s]

{'loss': 18.2677, 'grad_norm': 25.001840591430664, 'learning_rate': 4.450370370370371e-05, 'epoch': 0.2}


 20%|██        | 6000/30000 [55:24<3:18:49,  2.01it/s]

{'loss': 20.2862, 'grad_norm': 42.09543228149414, 'learning_rate': 4.445740740740741e-05, 'epoch': 0.2}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 20%|██        | 6000/30000 [58:25<3:18:49,  2.01it/s]

{'eval_loss': 29.104970932006836, 'eval_runtime': 180.9972, 'eval_samples_per_second': 21.503, 'eval_steps_per_second': 5.376, 'epoch': 0.2}


 20%|██        | 6025/30000 [58:46<3:29:56,  1.90it/s]

{'loss': 15.8287, 'grad_norm': 36.72803497314453, 'learning_rate': 4.4411111111111115e-05, 'epoch': 0.2}


 20%|██        | 6050/30000 [58:58<3:01:25,  2.20it/s]

{'loss': 17.948, 'grad_norm': 26.055095672607422, 'learning_rate': 4.436481481481481e-05, 'epoch': 0.2}


 20%|██        | 6075/30000 [59:10<3:23:57,  1.96it/s]

{'loss': 16.8921, 'grad_norm': 23.586381912231445, 'learning_rate': 4.431851851851852e-05, 'epoch': 0.2}


 20%|██        | 6100/30000 [59:22<3:22:33,  1.97it/s]

{'loss': 15.62, 'grad_norm': 53.15266799926758, 'learning_rate': 4.4272222222222224e-05, 'epoch': 0.2}


 20%|██        | 6125/30000 [59:34<3:13:45,  2.05it/s]

{'loss': 25.2939, 'grad_norm': 59.84363555908203, 'learning_rate': 4.422592592592593e-05, 'epoch': 0.2}


 20%|██        | 6150/30000 [59:46<3:20:45,  1.98it/s]

{'loss': 19.6536, 'grad_norm': 65.3487548828125, 'learning_rate': 4.4179629629629635e-05, 'epoch': 0.2}


 21%|██        | 6175/30000 [59:58<3:14:00,  2.05it/s]

{'loss': 14.1602, 'grad_norm': 17.494779586791992, 'learning_rate': 4.413333333333334e-05, 'epoch': 0.21}


 21%|██        | 6200/30000 [1:00:10<3:25:18,  1.93it/s]

{'loss': 18.2495, 'grad_norm': 49.8433723449707, 'learning_rate': 4.408703703703704e-05, 'epoch': 0.21}


 21%|██        | 6225/30000 [1:00:22<3:17:18,  2.01it/s]

{'loss': 20.7039, 'grad_norm': 48.14457702636719, 'learning_rate': 4.4042592592592594e-05, 'epoch': 0.21}


 21%|██        | 6250/30000 [1:00:33<2:49:01,  2.34it/s]

{'loss': 19.0023, 'grad_norm': 35.20683670043945, 'learning_rate': 4.39962962962963e-05, 'epoch': 0.21}


 21%|██        | 6275/30000 [1:00:46<3:17:31,  2.00it/s]

{'loss': 21.3614, 'grad_norm': 36.352394104003906, 'learning_rate': 4.3950000000000004e-05, 'epoch': 0.21}


 21%|██        | 6300/30000 [1:00:58<2:56:48,  2.23it/s]

{'loss': 20.6148, 'grad_norm': 56.06901168823242, 'learning_rate': 4.390370370370371e-05, 'epoch': 0.21}


 21%|██        | 6325/30000 [1:01:11<3:04:45,  2.14it/s]

{'loss': 23.3395, 'grad_norm': 29.13734245300293, 'learning_rate': 4.385740740740741e-05, 'epoch': 0.21}


 21%|██        | 6350/30000 [1:01:22<3:13:19,  2.04it/s]

{'loss': 18.6844, 'grad_norm': 35.358863830566406, 'learning_rate': 4.3811111111111114e-05, 'epoch': 0.21}


 21%|██▏       | 6375/30000 [1:01:33<3:22:56,  1.94it/s]

{'loss': 13.1175, 'grad_norm': 54.81475067138672, 'learning_rate': 4.376481481481482e-05, 'epoch': 0.21}


 21%|██▏       | 6400/30000 [1:01:46<3:21:05,  1.96it/s]

{'loss': 20.4848, 'grad_norm': 59.07655715942383, 'learning_rate': 4.371851851851852e-05, 'epoch': 0.21}


 21%|██▏       | 6425/30000 [1:01:58<3:21:44,  1.95it/s]

{'loss': 24.5246, 'grad_norm': 45.95403289794922, 'learning_rate': 4.367222222222222e-05, 'epoch': 0.21}


 22%|██▏       | 6450/30000 [1:02:10<2:52:36,  2.27it/s]

{'loss': 17.5259, 'grad_norm': 39.47509002685547, 'learning_rate': 4.362592592592593e-05, 'epoch': 0.21}


 22%|██▏       | 6475/30000 [1:02:23<3:18:06,  1.98it/s]

{'loss': 17.2353, 'grad_norm': 50.076332092285156, 'learning_rate': 4.3579629629629634e-05, 'epoch': 0.22}


 22%|██▏       | 6500/30000 [1:02:34<2:43:05,  2.40it/s]

{'loss': 19.9636, 'grad_norm': 15.701350212097168, 'learning_rate': 4.353333333333334e-05, 'epoch': 0.22}


 22%|██▏       | 6525/30000 [1:02:46<3:19:25,  1.96it/s]

{'loss': 21.7949, 'grad_norm': 75.74345397949219, 'learning_rate': 4.348703703703704e-05, 'epoch': 0.22}


 22%|██▏       | 6550/30000 [1:02:58<3:16:02,  1.99it/s]

{'loss': 18.0053, 'grad_norm': 24.224790573120117, 'learning_rate': 4.344074074074074e-05, 'epoch': 0.22}


 22%|██▏       | 6575/30000 [1:03:10<3:23:43,  1.92it/s]

{'loss': 16.1367, 'grad_norm': 45.726680755615234, 'learning_rate': 4.339444444444445e-05, 'epoch': 0.22}


 22%|██▏       | 6600/30000 [1:03:23<3:15:07,  2.00it/s]

{'loss': 21.1735, 'grad_norm': 41.60273361206055, 'learning_rate': 4.334814814814815e-05, 'epoch': 0.22}


 22%|██▏       | 6625/30000 [1:03:34<2:49:53,  2.29it/s]

{'loss': 19.6231, 'grad_norm': 37.92676544189453, 'learning_rate': 4.330185185185185e-05, 'epoch': 0.22}


 22%|██▏       | 6650/30000 [1:03:46<3:19:18,  1.95it/s]

{'loss': 14.7755, 'grad_norm': 84.082275390625, 'learning_rate': 4.325555555555556e-05, 'epoch': 0.22}


 22%|██▏       | 6675/30000 [1:03:58<2:46:57,  2.33it/s]

{'loss': 13.2358, 'grad_norm': 47.05772018432617, 'learning_rate': 4.320925925925926e-05, 'epoch': 0.22}


 22%|██▏       | 6700/30000 [1:04:09<3:12:09,  2.02it/s]

{'loss': 14.5919, 'grad_norm': 68.79151916503906, 'learning_rate': 4.316296296296297e-05, 'epoch': 0.22}


 22%|██▏       | 6725/30000 [1:04:22<3:29:56,  1.85it/s]

{'loss': 18.7821, 'grad_norm': 33.050594329833984, 'learning_rate': 4.311666666666667e-05, 'epoch': 0.22}


 22%|██▎       | 6750/30000 [1:04:33<2:27:17,  2.63it/s]

{'loss': 22.1312, 'grad_norm': 34.93065643310547, 'learning_rate': 4.307037037037037e-05, 'epoch': 0.23}


 23%|██▎       | 6775/30000 [1:04:45<3:10:00,  2.04it/s]

{'loss': 17.7941, 'grad_norm': 50.494380950927734, 'learning_rate': 4.302407407407408e-05, 'epoch': 0.23}


 23%|██▎       | 6800/30000 [1:04:57<3:12:59,  2.00it/s]

{'loss': 16.4499, 'grad_norm': 37.0106201171875, 'learning_rate': 4.2977777777777776e-05, 'epoch': 0.23}


 23%|██▎       | 6825/30000 [1:05:09<3:15:00,  1.98it/s]

{'loss': 18.0243, 'grad_norm': 36.557533264160156, 'learning_rate': 4.293148148148148e-05, 'epoch': 0.23}


 23%|██▎       | 6850/30000 [1:05:22<3:12:44,  2.00it/s]

{'loss': 17.1965, 'grad_norm': 39.78514099121094, 'learning_rate': 4.2885185185185187e-05, 'epoch': 0.23}


 23%|██▎       | 6875/30000 [1:05:33<2:35:38,  2.48it/s]

{'loss': 15.8438, 'grad_norm': 43.04949951171875, 'learning_rate': 4.283888888888889e-05, 'epoch': 0.23}


 23%|██▎       | 6900/30000 [1:05:45<2:40:01,  2.41it/s]

{'loss': 16.9595, 'grad_norm': 27.21409797668457, 'learning_rate': 4.27925925925926e-05, 'epoch': 0.23}


 23%|██▎       | 6925/30000 [1:05:58<3:10:37,  2.02it/s]

{'loss': 25.4023, 'grad_norm': 23.30606460571289, 'learning_rate': 4.27462962962963e-05, 'epoch': 0.23}


 23%|██▎       | 6950/30000 [1:06:10<3:13:41,  1.98it/s]

{'loss': 19.3866, 'grad_norm': 37.029052734375, 'learning_rate': 4.27e-05, 'epoch': 0.23}


 23%|██▎       | 6975/30000 [1:06:22<3:17:42,  1.94it/s]

{'loss': 18.7836, 'grad_norm': 29.601818084716797, 'learning_rate': 4.2653703703703706e-05, 'epoch': 0.23}


 23%|██▎       | 7000/30000 [1:06:34<3:05:06,  2.07it/s]

{'loss': 16.3933, 'grad_norm': 42.11093521118164, 'learning_rate': 4.2607407407407405e-05, 'epoch': 0.23}


 23%|██▎       | 7025/30000 [1:06:47<3:12:57,  1.98it/s]

{'loss': 19.722, 'grad_norm': 41.18967056274414, 'learning_rate': 4.256111111111111e-05, 'epoch': 0.23}


 24%|██▎       | 7050/30000 [1:06:58<2:40:24,  2.38it/s]

{'loss': 14.7854, 'grad_norm': 16.674419403076172, 'learning_rate': 4.2514814814814816e-05, 'epoch': 0.23}


 24%|██▎       | 7075/30000 [1:07:10<3:11:35,  1.99it/s]

{'loss': 23.1346, 'grad_norm': 57.82908248901367, 'learning_rate': 4.246851851851852e-05, 'epoch': 0.24}


 24%|██▎       | 7100/30000 [1:07:22<3:05:46,  2.05it/s]

{'loss': 16.4221, 'grad_norm': 35.68750762939453, 'learning_rate': 4.2422222222222226e-05, 'epoch': 0.24}


 24%|██▍       | 7125/30000 [1:07:34<3:06:42,  2.04it/s]

{'loss': 20.442, 'grad_norm': 53.411800384521484, 'learning_rate': 4.237592592592593e-05, 'epoch': 0.24}


 24%|██▍       | 7150/30000 [1:07:47<3:16:53,  1.93it/s]

{'loss': 20.0941, 'grad_norm': 49.99555587768555, 'learning_rate': 4.232962962962963e-05, 'epoch': 0.24}


 24%|██▍       | 7175/30000 [1:07:59<2:49:21,  2.25it/s]

{'loss': 11.3148, 'grad_norm': 18.319929122924805, 'learning_rate': 4.2283333333333336e-05, 'epoch': 0.24}


 24%|██▍       | 7200/30000 [1:08:12<3:16:19,  1.94it/s]

{'loss': 17.7418, 'grad_norm': 44.68300247192383, 'learning_rate': 4.223703703703704e-05, 'epoch': 0.24}


 24%|██▍       | 7225/30000 [1:08:24<3:11:07,  1.99it/s]

{'loss': 16.6186, 'grad_norm': 50.15288162231445, 'learning_rate': 4.2192592592592596e-05, 'epoch': 0.24}


 24%|██▍       | 7250/30000 [1:08:37<3:11:05,  1.98it/s]

{'loss': 20.0852, 'grad_norm': 53.99336242675781, 'learning_rate': 4.21462962962963e-05, 'epoch': 0.24}


 24%|██▍       | 7275/30000 [1:08:48<2:42:20,  2.33it/s]

{'loss': 16.1064, 'grad_norm': 39.2442626953125, 'learning_rate': 4.21e-05, 'epoch': 0.24}


 24%|██▍       | 7300/30000 [1:09:00<3:05:30,  2.04it/s]

{'loss': 16.1072, 'grad_norm': 44.4179573059082, 'learning_rate': 4.2053703703703705e-05, 'epoch': 0.24}


 24%|██▍       | 7325/30000 [1:09:12<2:54:06,  2.17it/s]

{'loss': 20.4659, 'grad_norm': 25.99185562133789, 'learning_rate': 4.200740740740741e-05, 'epoch': 0.24}


 24%|██▍       | 7350/30000 [1:09:25<3:14:06,  1.94it/s]

{'loss': 19.2824, 'grad_norm': 53.59809112548828, 'learning_rate': 4.196111111111111e-05, 'epoch': 0.24}


 25%|██▍       | 7375/30000 [1:09:36<3:05:53,  2.03it/s]

{'loss': 15.0621, 'grad_norm': 49.10029983520508, 'learning_rate': 4.1914814814814815e-05, 'epoch': 0.25}


 25%|██▍       | 7400/30000 [1:09:47<2:39:59,  2.35it/s]

{'loss': 11.879, 'grad_norm': 41.438812255859375, 'learning_rate': 4.186851851851852e-05, 'epoch': 0.25}


 25%|██▍       | 7425/30000 [1:09:58<2:36:03,  2.41it/s]

{'loss': 13.2554, 'grad_norm': 11.815878868103027, 'learning_rate': 4.1822222222222225e-05, 'epoch': 0.25}


 25%|██▍       | 7450/30000 [1:10:10<3:20:06,  1.88it/s]

{'loss': 14.6688, 'grad_norm': 38.70671081542969, 'learning_rate': 4.177592592592593e-05, 'epoch': 0.25}


 25%|██▍       | 7475/30000 [1:10:21<2:31:44,  2.47it/s]

{'loss': 21.7332, 'grad_norm': 31.68916893005371, 'learning_rate': 4.172962962962963e-05, 'epoch': 0.25}


 25%|██▌       | 7500/30000 [1:10:34<2:50:41,  2.20it/s]

{'loss': 17.0799, 'grad_norm': 28.958066940307617, 'learning_rate': 4.1683333333333335e-05, 'epoch': 0.25}


 25%|██▌       | 7525/30000 [1:10:47<3:14:04,  1.93it/s]

{'loss': 14.8254, 'grad_norm': 22.135517120361328, 'learning_rate': 4.163703703703704e-05, 'epoch': 0.25}


 25%|██▌       | 7550/30000 [1:11:00<3:12:48,  1.94it/s]

{'loss': 25.4587, 'grad_norm': 26.830413818359375, 'learning_rate': 4.159074074074074e-05, 'epoch': 0.25}


 25%|██▌       | 7575/30000 [1:11:12<3:03:49,  2.03it/s]

{'loss': 15.6655, 'grad_norm': 84.45604705810547, 'learning_rate': 4.1544444444444444e-05, 'epoch': 0.25}


 25%|██▌       | 7600/30000 [1:11:24<3:13:31,  1.93it/s]

{'loss': 17.2133, 'grad_norm': 42.972694396972656, 'learning_rate': 4.1498148148148156e-05, 'epoch': 0.25}


 25%|██▌       | 7625/30000 [1:11:36<3:05:52,  2.01it/s]

{'loss': 18.7529, 'grad_norm': 61.60957717895508, 'learning_rate': 4.1451851851851855e-05, 'epoch': 0.25}


 26%|██▌       | 7650/30000 [1:11:48<3:05:46,  2.01it/s]

{'loss': 19.1123, 'grad_norm': 44.44479751586914, 'learning_rate': 4.140555555555556e-05, 'epoch': 0.26}


 26%|██▌       | 7675/30000 [1:12:00<3:10:22,  1.95it/s]

{'loss': 21.2217, 'grad_norm': 49.356048583984375, 'learning_rate': 4.135925925925926e-05, 'epoch': 0.26}


 26%|██▌       | 7700/30000 [1:12:12<2:32:06,  2.44it/s]

{'loss': 14.4708, 'grad_norm': 28.23849105834961, 'learning_rate': 4.1312962962962964e-05, 'epoch': 0.26}


 26%|██▌       | 7725/30000 [1:12:24<3:10:34,  1.95it/s]

{'loss': 14.6371, 'grad_norm': 43.99629211425781, 'learning_rate': 4.126666666666667e-05, 'epoch': 0.26}


 26%|██▌       | 7750/30000 [1:12:35<3:08:21,  1.97it/s]

{'loss': 14.7629, 'grad_norm': 28.152841567993164, 'learning_rate': 4.122037037037037e-05, 'epoch': 0.26}


 26%|██▌       | 7775/30000 [1:12:48<3:11:27,  1.93it/s]

{'loss': 17.4864, 'grad_norm': 23.147363662719727, 'learning_rate': 4.117407407407407e-05, 'epoch': 0.26}


 26%|██▌       | 7800/30000 [1:12:59<3:06:48,  1.98it/s]

{'loss': 15.0248, 'grad_norm': 48.1171760559082, 'learning_rate': 4.1127777777777785e-05, 'epoch': 0.26}


 26%|██▌       | 7825/30000 [1:13:12<3:08:44,  1.96it/s]

{'loss': 21.2189, 'grad_norm': 59.56917190551758, 'learning_rate': 4.1081481481481484e-05, 'epoch': 0.26}


 26%|██▌       | 7850/30000 [1:13:23<3:08:52,  1.95it/s]

{'loss': 13.3242, 'grad_norm': 81.65150451660156, 'learning_rate': 4.103518518518519e-05, 'epoch': 0.26}


 26%|██▋       | 7875/30000 [1:13:35<2:28:24,  2.48it/s]

{'loss': 16.5284, 'grad_norm': 51.89410400390625, 'learning_rate': 4.0988888888888894e-05, 'epoch': 0.26}


 26%|██▋       | 7900/30000 [1:13:47<2:37:24,  2.34it/s]

{'loss': 11.557, 'grad_norm': 35.719398498535156, 'learning_rate': 4.094259259259259e-05, 'epoch': 0.26}


 26%|██▋       | 7925/30000 [1:14:00<3:07:59,  1.96it/s]

{'loss': 16.9837, 'grad_norm': 62.61414337158203, 'learning_rate': 4.08962962962963e-05, 'epoch': 0.26}


 26%|██▋       | 7950/30000 [1:14:12<3:06:02,  1.98it/s]

{'loss': 18.427, 'grad_norm': 39.355499267578125, 'learning_rate': 4.085e-05, 'epoch': 0.27}


 27%|██▋       | 7975/30000 [1:14:24<3:08:52,  1.94it/s]

{'loss': 20.7547, 'grad_norm': 47.01577377319336, 'learning_rate': 4.08037037037037e-05, 'epoch': 0.27}


 27%|██▋       | 8000/30000 [1:14:37<2:49:32,  2.16it/s]

{'loss': 16.3098, 'grad_norm': 66.32861328125, 'learning_rate': 4.075740740740741e-05, 'epoch': 0.27}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.


{'eval_loss': 28.793615341186523, 'eval_runtime': 177.7452, 'eval_samples_per_second': 21.897, 'eval_steps_per_second': 5.474, 'epoch': 0.27}


 27%|██▋       | 8025/30000 [1:18:01<4:00:12,  1.52it/s]

{'loss': 18.3969, 'grad_norm': 50.643226623535156, 'learning_rate': 4.071111111111111e-05, 'epoch': 0.27}


 27%|██▋       | 8050/30000 [1:18:12<2:32:22,  2.40it/s]

{'loss': 17.2302, 'grad_norm': 23.46561050415039, 'learning_rate': 4.066481481481482e-05, 'epoch': 0.27}


 27%|██▋       | 8075/30000 [1:18:24<2:38:24,  2.31it/s]

{'loss': 18.9765, 'grad_norm': 35.54780960083008, 'learning_rate': 4.0618518518518524e-05, 'epoch': 0.27}


 27%|██▋       | 8100/30000 [1:18:36<2:19:07,  2.62it/s]

{'loss': 15.3, 'grad_norm': 25.92477035522461, 'learning_rate': 4.057222222222222e-05, 'epoch': 0.27}


 27%|██▋       | 8125/30000 [1:18:49<3:11:11,  1.91it/s]

{'loss': 16.4006, 'grad_norm': 43.16529083251953, 'learning_rate': 4.052592592592593e-05, 'epoch': 0.27}


 27%|██▋       | 8150/30000 [1:19:02<3:08:52,  1.93it/s]

{'loss': 19.429, 'grad_norm': 54.1532096862793, 'learning_rate': 4.047962962962963e-05, 'epoch': 0.27}


 27%|██▋       | 8175/30000 [1:19:14<2:22:46,  2.55it/s]

{'loss': 13.556, 'grad_norm': 30.17148780822754, 'learning_rate': 4.043333333333333e-05, 'epoch': 0.27}


 27%|██▋       | 8200/30000 [1:19:25<2:40:05,  2.27it/s]

{'loss': 16.7407, 'grad_norm': 41.900230407714844, 'learning_rate': 4.038703703703704e-05, 'epoch': 0.27}


 27%|██▋       | 8225/30000 [1:19:37<2:34:05,  2.36it/s]

{'loss': 19.6178, 'grad_norm': 26.807262420654297, 'learning_rate': 4.034074074074074e-05, 'epoch': 0.27}


 28%|██▊       | 8250/30000 [1:19:49<2:21:06,  2.57it/s]

{'loss': 11.8492, 'grad_norm': 31.832897186279297, 'learning_rate': 4.029444444444445e-05, 'epoch': 0.28}


 28%|██▊       | 8275/30000 [1:20:00<3:02:52,  1.98it/s]

{'loss': 20.3792, 'grad_norm': 52.16863250732422, 'learning_rate': 4.024814814814815e-05, 'epoch': 0.28}


 28%|██▊       | 8300/30000 [1:20:11<2:31:20,  2.39it/s]

{'loss': 14.2384, 'grad_norm': 18.31647300720215, 'learning_rate': 4.020185185185185e-05, 'epoch': 0.28}


 28%|██▊       | 8325/30000 [1:20:23<3:09:10,  1.91it/s]

{'loss': 19.7393, 'grad_norm': 51.03809356689453, 'learning_rate': 4.0155555555555557e-05, 'epoch': 0.28}


 28%|██▊       | 8350/30000 [1:20:34<3:07:55,  1.92it/s]

{'loss': 10.425, 'grad_norm': 71.44564819335938, 'learning_rate': 4.010925925925926e-05, 'epoch': 0.28}


 28%|██▊       | 8375/30000 [1:20:47<3:05:17,  1.95it/s]

{'loss': 21.5784, 'grad_norm': 59.58249282836914, 'learning_rate': 4.006296296296296e-05, 'epoch': 0.28}


 28%|██▊       | 8400/30000 [1:20:59<2:27:27,  2.44it/s]

{'loss': 17.3329, 'grad_norm': 14.336095809936523, 'learning_rate': 4.0016666666666666e-05, 'epoch': 0.28}


 28%|██▊       | 8425/30000 [1:21:10<3:08:50,  1.90it/s]

{'loss': 13.0971, 'grad_norm': 27.870607376098633, 'learning_rate': 3.997037037037038e-05, 'epoch': 0.28}


 28%|██▊       | 8450/30000 [1:21:23<3:05:54,  1.93it/s]

{'loss': 22.5158, 'grad_norm': 98.07991027832031, 'learning_rate': 3.9924074074074077e-05, 'epoch': 0.28}


 28%|██▊       | 8475/30000 [1:21:36<3:02:53,  1.96it/s]

{'loss': 22.1896, 'grad_norm': 40.160682678222656, 'learning_rate': 3.987777777777778e-05, 'epoch': 0.28}


 28%|██▊       | 8500/30000 [1:21:48<3:04:39,  1.94it/s]

{'loss': 15.3399, 'grad_norm': 43.83748245239258, 'learning_rate': 3.983148148148148e-05, 'epoch': 0.28}


 28%|██▊       | 8525/30000 [1:22:00<2:59:00,  2.00it/s]

{'loss': 14.2783, 'grad_norm': 32.6021614074707, 'learning_rate': 3.9785185185185186e-05, 'epoch': 0.28}


 28%|██▊       | 8550/30000 [1:22:12<2:45:40,  2.16it/s]

{'loss': 19.4605, 'grad_norm': 65.9903335571289, 'learning_rate': 3.973888888888889e-05, 'epoch': 0.28}


 29%|██▊       | 8575/30000 [1:22:24<2:20:56,  2.53it/s]

{'loss': 25.2253, 'grad_norm': 22.335176467895508, 'learning_rate': 3.969259259259259e-05, 'epoch': 0.29}


 29%|██▊       | 8600/30000 [1:22:36<3:03:23,  1.94it/s]

{'loss': 19.3619, 'grad_norm': 46.193931579589844, 'learning_rate': 3.9646296296296295e-05, 'epoch': 0.29}


 29%|██▉       | 8625/30000 [1:22:49<3:04:49,  1.93it/s]

{'loss': 21.9393, 'grad_norm': 59.65625, 'learning_rate': 3.960000000000001e-05, 'epoch': 0.29}


 29%|██▉       | 8650/30000 [1:23:01<2:58:50,  1.99it/s]

{'loss': 15.2089, 'grad_norm': 58.18660354614258, 'learning_rate': 3.9553703703703706e-05, 'epoch': 0.29}


 29%|██▉       | 8675/30000 [1:23:13<2:56:02,  2.02it/s]

{'loss': 18.8221, 'grad_norm': 51.133087158203125, 'learning_rate': 3.950740740740741e-05, 'epoch': 0.29}


 29%|██▉       | 8700/30000 [1:23:25<2:33:21,  2.31it/s]

{'loss': 17.2557, 'grad_norm': 31.987613677978516, 'learning_rate': 3.9461111111111116e-05, 'epoch': 0.29}


 29%|██▉       | 8725/30000 [1:23:38<3:07:46,  1.89it/s]

{'loss': 20.6514, 'grad_norm': 53.67612075805664, 'learning_rate': 3.9414814814814815e-05, 'epoch': 0.29}


 29%|██▉       | 8750/30000 [1:23:51<3:01:48,  1.95it/s]

{'loss': 21.6582, 'grad_norm': 44.32560348510742, 'learning_rate': 3.936851851851852e-05, 'epoch': 0.29}


 29%|██▉       | 8775/30000 [1:24:02<2:56:02,  2.01it/s]

{'loss': 13.4805, 'grad_norm': 27.839542388916016, 'learning_rate': 3.932222222222222e-05, 'epoch': 0.29}


 29%|██▉       | 8800/30000 [1:24:15<3:04:49,  1.91it/s]

{'loss': 15.2241, 'grad_norm': 26.61453628540039, 'learning_rate': 3.9275925925925924e-05, 'epoch': 0.29}


 29%|██▉       | 8825/30000 [1:24:27<2:39:18,  2.22it/s]

{'loss': 10.4836, 'grad_norm': 47.1616325378418, 'learning_rate': 3.922962962962963e-05, 'epoch': 0.29}


 30%|██▉       | 8850/30000 [1:24:39<2:51:20,  2.06it/s]

{'loss': 22.4033, 'grad_norm': 35.11111068725586, 'learning_rate': 3.9183333333333335e-05, 'epoch': 0.29}


 30%|██▉       | 8875/30000 [1:24:50<2:51:44,  2.05it/s]

{'loss': 9.3013, 'grad_norm': 52.82780456542969, 'learning_rate': 3.913703703703704e-05, 'epoch': 0.3}


 30%|██▉       | 8900/30000 [1:25:02<3:05:30,  1.90it/s]

{'loss': 14.5567, 'grad_norm': 29.365049362182617, 'learning_rate': 3.9090740740740746e-05, 'epoch': 0.3}


 30%|██▉       | 8925/30000 [1:25:14<2:31:33,  2.32it/s]

{'loss': 20.907, 'grad_norm': 24.886703491210938, 'learning_rate': 3.9044444444444444e-05, 'epoch': 0.3}


 30%|██▉       | 8950/30000 [1:25:26<2:56:35,  1.99it/s]

{'loss': 15.6331, 'grad_norm': 17.754501342773438, 'learning_rate': 3.899814814814815e-05, 'epoch': 0.3}


 30%|██▉       | 8975/30000 [1:25:38<2:27:34,  2.37it/s]

{'loss': 16.4401, 'grad_norm': 41.48281478881836, 'learning_rate': 3.8951851851851855e-05, 'epoch': 0.3}


 30%|███       | 9000/30000 [1:25:50<2:55:54,  1.99it/s]

{'loss': 16.1324, 'grad_norm': 28.9782772064209, 'learning_rate': 3.890555555555555e-05, 'epoch': 0.3}


 30%|███       | 9025/30000 [1:26:02<2:54:05,  2.01it/s]

{'loss': 18.2781, 'grad_norm': 52.0673713684082, 'learning_rate': 3.885925925925926e-05, 'epoch': 0.3}


 30%|███       | 9050/30000 [1:26:15<2:51:37,  2.03it/s]

{'loss': 18.222, 'grad_norm': 44.740867614746094, 'learning_rate': 3.8812962962962964e-05, 'epoch': 0.3}


 30%|███       | 9075/30000 [1:26:26<2:52:06,  2.03it/s]

{'loss': 13.4469, 'grad_norm': 42.78768539428711, 'learning_rate': 3.876666666666667e-05, 'epoch': 0.3}


 30%|███       | 9100/30000 [1:26:39<2:54:08,  2.00it/s]

{'loss': 19.3658, 'grad_norm': 43.01389694213867, 'learning_rate': 3.8720370370370375e-05, 'epoch': 0.3}


 30%|███       | 9125/30000 [1:26:51<2:56:04,  1.98it/s]

{'loss': 25.0389, 'grad_norm': 55.12788009643555, 'learning_rate': 3.867407407407407e-05, 'epoch': 0.3}


 30%|███       | 9150/30000 [1:27:03<2:48:36,  2.06it/s]

{'loss': 15.3263, 'grad_norm': 31.980876922607422, 'learning_rate': 3.862777777777778e-05, 'epoch': 0.3}


 31%|███       | 9175/30000 [1:27:15<2:49:47,  2.04it/s]

{'loss': 15.4029, 'grad_norm': 61.54401397705078, 'learning_rate': 3.8581481481481484e-05, 'epoch': 0.31}


 31%|███       | 9200/30000 [1:27:27<2:17:50,  2.51it/s]

{'loss': 18.8744, 'grad_norm': 25.786087036132812, 'learning_rate': 3.853518518518518e-05, 'epoch': 0.31}


 31%|███       | 9225/30000 [1:27:38<2:52:17,  2.01it/s]

{'loss': 18.7345, 'grad_norm': 42.04170227050781, 'learning_rate': 3.848888888888889e-05, 'epoch': 0.31}


 31%|███       | 9250/30000 [1:27:50<2:49:55,  2.04it/s]

{'loss': 12.7785, 'grad_norm': 57.85971450805664, 'learning_rate': 3.84425925925926e-05, 'epoch': 0.31}


 31%|███       | 9275/30000 [1:28:03<2:50:50,  2.02it/s]

{'loss': 17.3252, 'grad_norm': 30.21937370300293, 'learning_rate': 3.83962962962963e-05, 'epoch': 0.31}


 31%|███       | 9300/30000 [1:28:14<2:52:23,  2.00it/s]

{'loss': 14.3758, 'grad_norm': 150.14576721191406, 'learning_rate': 3.8350000000000004e-05, 'epoch': 0.31}


 31%|███       | 9325/30000 [1:28:26<2:51:52,  2.00it/s]

{'loss': 17.4259, 'grad_norm': 34.4419059753418, 'learning_rate': 3.830370370370371e-05, 'epoch': 0.31}


 31%|███       | 9350/30000 [1:28:37<2:19:19,  2.47it/s]

{'loss': 15.6676, 'grad_norm': 27.278339385986328, 'learning_rate': 3.825740740740741e-05, 'epoch': 0.31}


 31%|███▏      | 9375/30000 [1:28:49<2:50:08,  2.02it/s]

{'loss': 13.5872, 'grad_norm': 46.021507263183594, 'learning_rate': 3.821111111111111e-05, 'epoch': 0.31}


 31%|███▏      | 9400/30000 [1:29:02<2:32:05,  2.26it/s]

{'loss': 19.883, 'grad_norm': 24.87390899658203, 'learning_rate': 3.816481481481481e-05, 'epoch': 0.31}


 31%|███▏      | 9425/30000 [1:29:14<2:51:50,  2.00it/s]

{'loss': 16.3709, 'grad_norm': 34.7171745300293, 'learning_rate': 3.811851851851852e-05, 'epoch': 0.31}


 32%|███▏      | 9450/30000 [1:29:26<2:36:50,  2.18it/s]

{'loss': 21.1349, 'grad_norm': 59.82864761352539, 'learning_rate': 3.807222222222223e-05, 'epoch': 0.32}


 32%|███▏      | 9475/30000 [1:29:38<2:56:55,  1.93it/s]

{'loss': 25.6979, 'grad_norm': 61.04928970336914, 'learning_rate': 3.802592592592593e-05, 'epoch': 0.32}


 32%|███▏      | 9500/30000 [1:29:49<2:38:16,  2.16it/s]

{'loss': 13.4381, 'grad_norm': 16.11832046508789, 'learning_rate': 3.797962962962963e-05, 'epoch': 0.32}


 32%|███▏      | 9525/30000 [1:30:01<2:51:36,  1.99it/s]

{'loss': 15.4082, 'grad_norm': 38.08115005493164, 'learning_rate': 3.793333333333334e-05, 'epoch': 0.32}


 32%|███▏      | 9550/30000 [1:30:13<2:58:47,  1.91it/s]

{'loss': 14.7794, 'grad_norm': 46.4102897644043, 'learning_rate': 3.788703703703704e-05, 'epoch': 0.32}


 32%|███▏      | 9575/30000 [1:30:26<2:47:12,  2.04it/s]

{'loss': 12.4491, 'grad_norm': 15.846877098083496, 'learning_rate': 3.784074074074074e-05, 'epoch': 0.32}


 32%|███▏      | 9600/30000 [1:30:36<2:45:35,  2.05it/s]

{'loss': 16.8896, 'grad_norm': 22.078350067138672, 'learning_rate': 3.779444444444445e-05, 'epoch': 0.32}


 32%|███▏      | 9625/30000 [1:30:48<2:53:03,  1.96it/s]

{'loss': 14.052, 'grad_norm': 42.922645568847656, 'learning_rate': 3.7748148148148146e-05, 'epoch': 0.32}


 32%|███▏      | 9650/30000 [1:30:59<2:12:07,  2.57it/s]

{'loss': 14.3962, 'grad_norm': 21.023834228515625, 'learning_rate': 3.770185185185186e-05, 'epoch': 0.32}


 32%|███▏      | 9675/30000 [1:31:11<2:49:35,  2.00it/s]

{'loss': 15.7384, 'grad_norm': 39.50383758544922, 'learning_rate': 3.765555555555556e-05, 'epoch': 0.32}


 32%|███▏      | 9700/30000 [1:31:22<2:16:37,  2.48it/s]

{'loss': 11.6577, 'grad_norm': 33.78419876098633, 'learning_rate': 3.760925925925926e-05, 'epoch': 0.32}


 32%|███▏      | 9725/30000 [1:31:34<2:45:26,  2.04it/s]

{'loss': 17.5258, 'grad_norm': 56.7797737121582, 'learning_rate': 3.756296296296297e-05, 'epoch': 0.32}


 32%|███▎      | 9750/30000 [1:31:46<2:51:59,  1.96it/s]

{'loss': 19.7029, 'grad_norm': 48.466758728027344, 'learning_rate': 3.7516666666666666e-05, 'epoch': 0.33}


 33%|███▎      | 9775/30000 [1:31:58<2:25:38,  2.31it/s]

{'loss': 14.0833, 'grad_norm': 18.631418228149414, 'learning_rate': 3.747037037037037e-05, 'epoch': 0.33}


 33%|███▎      | 9800/30000 [1:32:09<2:08:55,  2.61it/s]

{'loss': 14.1597, 'grad_norm': 60.26628494262695, 'learning_rate': 3.742407407407408e-05, 'epoch': 0.33}


 33%|███▎      | 9825/30000 [1:32:21<2:07:13,  2.64it/s]

{'loss': 15.0465, 'grad_norm': 33.56208038330078, 'learning_rate': 3.7377777777777775e-05, 'epoch': 0.33}


 33%|███▎      | 9850/30000 [1:32:32<2:45:46,  2.03it/s]

{'loss': 12.9305, 'grad_norm': 65.49327087402344, 'learning_rate': 3.733148148148148e-05, 'epoch': 0.33}


 33%|███▎      | 9875/30000 [1:32:43<2:44:56,  2.03it/s]

{'loss': 16.3298, 'grad_norm': 63.52819061279297, 'learning_rate': 3.728518518518519e-05, 'epoch': 0.33}


 33%|███▎      | 9900/30000 [1:32:54<2:02:31,  2.73it/s]

{'loss': 13.2972, 'grad_norm': 21.555028915405273, 'learning_rate': 3.723888888888889e-05, 'epoch': 0.33}


 33%|███▎      | 9925/30000 [1:33:06<2:50:38,  1.96it/s]

{'loss': 17.5581, 'grad_norm': 51.60391616821289, 'learning_rate': 3.71925925925926e-05, 'epoch': 0.33}


 33%|███▎      | 9950/30000 [1:33:18<2:29:47,  2.23it/s]

{'loss': 18.1527, 'grad_norm': 35.24221420288086, 'learning_rate': 3.7146296296296295e-05, 'epoch': 0.33}


 33%|███▎      | 9975/30000 [1:33:29<2:22:36,  2.34it/s]

{'loss': 13.1432, 'grad_norm': 39.01789093017578, 'learning_rate': 3.71e-05, 'epoch': 0.33}


 33%|███▎      | 10000/30000 [1:33:41<2:51:48,  1.94it/s]

{'loss': 14.6857, 'grad_norm': 50.40171813964844, 'learning_rate': 3.7053703703703706e-05, 'epoch': 0.33}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 33%|███▎      | 10000/30000 [1:36:39<2:51:48,  1.94it/s]

{'eval_loss': 29.399389266967773, 'eval_runtime': 178.0572, 'eval_samples_per_second': 21.858, 'eval_steps_per_second': 5.465, 'epoch': 0.33}


 33%|███▎      | 10025/30000 [1:37:00<2:24:09,  2.31it/s]

{'loss': 17.5924, 'grad_norm': 48.06121063232422, 'learning_rate': 3.7007407407407404e-05, 'epoch': 0.33}


 34%|███▎      | 10050/30000 [1:37:13<2:53:57,  1.91it/s]

{'loss': 19.1016, 'grad_norm': 44.77885818481445, 'learning_rate': 3.696111111111111e-05, 'epoch': 0.34}


 34%|███▎      | 10075/30000 [1:37:23<2:18:32,  2.40it/s]

{'loss': 12.9018, 'grad_norm': 62.459293365478516, 'learning_rate': 3.691481481481482e-05, 'epoch': 0.34}


 34%|███▎      | 10100/30000 [1:37:36<2:23:16,  2.31it/s]

{'loss': 16.4952, 'grad_norm': 48.505836486816406, 'learning_rate': 3.686851851851852e-05, 'epoch': 0.34}


 34%|███▍      | 10125/30000 [1:37:48<2:52:35,  1.92it/s]

{'loss': 16.8599, 'grad_norm': 48.10630416870117, 'learning_rate': 3.6822222222222226e-05, 'epoch': 0.34}


 34%|███▍      | 10150/30000 [1:37:59<2:40:15,  2.06it/s]

{'loss': 16.828, 'grad_norm': 57.10725021362305, 'learning_rate': 3.677592592592593e-05, 'epoch': 0.34}


 34%|███▍      | 10175/30000 [1:38:12<2:44:09,  2.01it/s]

{'loss': 20.9516, 'grad_norm': 40.590972900390625, 'learning_rate': 3.672962962962963e-05, 'epoch': 0.34}


 34%|███▍      | 10200/30000 [1:38:24<2:46:25,  1.98it/s]

{'loss': 19.5505, 'grad_norm': 32.73298263549805, 'learning_rate': 3.6683333333333335e-05, 'epoch': 0.34}


 34%|███▍      | 10225/30000 [1:38:36<2:43:23,  2.02it/s]

{'loss': 20.1396, 'grad_norm': 35.99333572387695, 'learning_rate': 3.6637037037037034e-05, 'epoch': 0.34}


 34%|███▍      | 10250/30000 [1:38:48<2:41:11,  2.04it/s]

{'loss': 15.048, 'grad_norm': 62.6375617980957, 'learning_rate': 3.659074074074074e-05, 'epoch': 0.34}


 34%|███▍      | 10275/30000 [1:38:59<2:40:30,  2.05it/s]

{'loss': 13.2738, 'grad_norm': 17.291982650756836, 'learning_rate': 3.654444444444445e-05, 'epoch': 0.34}


 34%|███▍      | 10300/30000 [1:39:13<3:10:53,  1.72it/s]

{'loss': 14.0092, 'grad_norm': 55.8306884765625, 'learning_rate': 3.649814814814815e-05, 'epoch': 0.34}


 34%|███▍      | 10325/30000 [1:39:27<2:46:19,  1.97it/s]

{'loss': 20.7546, 'grad_norm': 54.5384635925293, 'learning_rate': 3.6451851851851855e-05, 'epoch': 0.34}


 34%|███▍      | 10350/30000 [1:39:39<2:41:08,  2.03it/s]

{'loss': 20.7983, 'grad_norm': 41.8912239074707, 'learning_rate': 3.640555555555556e-05, 'epoch': 0.34}


 35%|███▍      | 10375/30000 [1:39:51<2:36:58,  2.08it/s]

{'loss': 14.2094, 'grad_norm': 62.52158737182617, 'learning_rate': 3.635925925925926e-05, 'epoch': 0.35}


 35%|███▍      | 10400/30000 [1:40:04<2:43:58,  1.99it/s]

{'loss': 18.0427, 'grad_norm': 36.85905838012695, 'learning_rate': 3.6312962962962964e-05, 'epoch': 0.35}


 35%|███▍      | 10425/30000 [1:40:15<2:41:52,  2.02it/s]

{'loss': 15.9378, 'grad_norm': 64.01429748535156, 'learning_rate': 3.626666666666667e-05, 'epoch': 0.35}


 35%|███▍      | 10450/30000 [1:40:27<2:06:47,  2.57it/s]

{'loss': 15.2688, 'grad_norm': 18.56576156616211, 'learning_rate': 3.622037037037037e-05, 'epoch': 0.35}


 35%|███▍      | 10475/30000 [1:40:40<2:40:21,  2.03it/s]

{'loss': 17.1091, 'grad_norm': 49.129676818847656, 'learning_rate': 3.617407407407408e-05, 'epoch': 0.35}


 35%|███▌      | 10500/30000 [1:40:52<2:43:46,  1.98it/s]

{'loss': 13.8926, 'grad_norm': 30.497243881225586, 'learning_rate': 3.612777777777778e-05, 'epoch': 0.35}


 35%|███▌      | 10525/30000 [1:41:04<2:14:20,  2.42it/s]

{'loss': 18.955, 'grad_norm': 31.240936279296875, 'learning_rate': 3.6081481481481484e-05, 'epoch': 0.35}


 35%|███▌      | 10550/30000 [1:41:16<2:43:21,  1.98it/s]

{'loss': 16.756, 'grad_norm': 20.689407348632812, 'learning_rate': 3.603518518518519e-05, 'epoch': 0.35}


 35%|███▌      | 10575/30000 [1:41:30<2:42:30,  1.99it/s]

{'loss': 10.3981, 'grad_norm': 29.198772430419922, 'learning_rate': 3.598888888888889e-05, 'epoch': 0.35}


 35%|███▌      | 10600/30000 [1:41:41<2:01:57,  2.65it/s]

{'loss': 11.4606, 'grad_norm': 37.82677459716797, 'learning_rate': 3.594259259259259e-05, 'epoch': 0.35}


 35%|███▌      | 10625/30000 [1:41:52<2:10:14,  2.48it/s]

{'loss': 15.4329, 'grad_norm': 25.166105270385742, 'learning_rate': 3.58962962962963e-05, 'epoch': 0.35}


 36%|███▌      | 10650/30000 [1:42:06<3:34:19,  1.50it/s]

{'loss': 15.3143, 'grad_norm': 55.56877517700195, 'learning_rate': 3.585e-05, 'epoch': 0.35}


 36%|███▌      | 10675/30000 [1:42:22<3:38:22,  1.47it/s]

{'loss': 18.3284, 'grad_norm': 53.034095764160156, 'learning_rate': 3.58037037037037e-05, 'epoch': 0.36}


 36%|███▌      | 10700/30000 [1:42:37<2:55:35,  1.83it/s]

{'loss': 14.767, 'grad_norm': 16.190349578857422, 'learning_rate': 3.5757407407407415e-05, 'epoch': 0.36}


 36%|███▌      | 10725/30000 [1:42:53<3:12:30,  1.67it/s]

{'loss': 16.6409, 'grad_norm': 68.63422393798828, 'learning_rate': 3.571111111111111e-05, 'epoch': 0.36}


 36%|███▌      | 10750/30000 [1:43:09<3:41:41,  1.45it/s]

{'loss': 17.1392, 'grad_norm': 54.1526985168457, 'learning_rate': 3.566481481481482e-05, 'epoch': 0.36}


 36%|███▌      | 10775/30000 [1:43:24<3:32:53,  1.51it/s]

{'loss': 21.4062, 'grad_norm': 48.16030502319336, 'learning_rate': 3.561851851851852e-05, 'epoch': 0.36}


 36%|███▌      | 10800/30000 [1:43:41<3:30:48,  1.52it/s]

{'loss': 19.8173, 'grad_norm': 41.045074462890625, 'learning_rate': 3.557222222222222e-05, 'epoch': 0.36}


 36%|███▌      | 10825/30000 [1:43:56<3:23:58,  1.57it/s]

{'loss': 16.0056, 'grad_norm': 61.36013412475586, 'learning_rate': 3.552592592592593e-05, 'epoch': 0.36}


 36%|███▌      | 10850/30000 [1:44:10<2:35:56,  2.05it/s]

{'loss': 9.9531, 'grad_norm': 43.21696472167969, 'learning_rate': 3.5479629629629626e-05, 'epoch': 0.36}


 36%|███▋      | 10875/30000 [1:44:22<2:39:51,  1.99it/s]

{'loss': 19.6544, 'grad_norm': 42.667755126953125, 'learning_rate': 3.543333333333333e-05, 'epoch': 0.36}


 36%|███▋      | 10900/30000 [1:44:34<2:23:17,  2.22it/s]

{'loss': 17.7906, 'grad_norm': 35.22868728637695, 'learning_rate': 3.5387037037037044e-05, 'epoch': 0.36}


 36%|███▋      | 10925/30000 [1:44:47<2:18:56,  2.29it/s]

{'loss': 23.9001, 'grad_norm': 34.75862503051758, 'learning_rate': 3.534074074074074e-05, 'epoch': 0.36}


 36%|███▋      | 10950/30000 [1:44:59<2:41:51,  1.96it/s]

{'loss': 18.8312, 'grad_norm': 51.97074508666992, 'learning_rate': 3.529444444444445e-05, 'epoch': 0.36}


 37%|███▋      | 10975/30000 [1:45:11<2:12:33,  2.39it/s]

{'loss': 19.2638, 'grad_norm': 26.225563049316406, 'learning_rate': 3.524814814814815e-05, 'epoch': 0.37}


 37%|███▋      | 11000/30000 [1:45:21<2:36:42,  2.02it/s]

{'loss': 13.7724, 'grad_norm': 50.53926467895508, 'learning_rate': 3.520185185185185e-05, 'epoch': 0.37}


 37%|███▋      | 11025/30000 [1:45:32<2:19:13,  2.27it/s]

{'loss': 13.1348, 'grad_norm': 70.84485626220703, 'learning_rate': 3.515555555555556e-05, 'epoch': 0.37}


 37%|███▋      | 11050/30000 [1:45:44<2:18:37,  2.28it/s]

{'loss': 15.9409, 'grad_norm': 27.377864837646484, 'learning_rate': 3.5109259259259256e-05, 'epoch': 0.37}


 37%|███▋      | 11075/30000 [1:45:56<2:17:37,  2.29it/s]

{'loss': 16.2808, 'grad_norm': 24.518293380737305, 'learning_rate': 3.506296296296296e-05, 'epoch': 0.37}


 37%|███▋      | 11100/30000 [1:46:08<2:38:07,  1.99it/s]

{'loss': 15.4262, 'grad_norm': 18.601238250732422, 'learning_rate': 3.501666666666667e-05, 'epoch': 0.37}


 37%|███▋      | 11125/30000 [1:46:20<2:37:25,  2.00it/s]

{'loss': 19.1667, 'grad_norm': 47.74802017211914, 'learning_rate': 3.497037037037037e-05, 'epoch': 0.37}


 37%|███▋      | 11150/30000 [1:46:31<2:31:57,  2.07it/s]

{'loss': 21.2746, 'grad_norm': 54.5495491027832, 'learning_rate': 3.492407407407408e-05, 'epoch': 0.37}


 37%|███▋      | 11175/30000 [1:46:43<2:21:34,  2.22it/s]

{'loss': 16.7253, 'grad_norm': 27.57292366027832, 'learning_rate': 3.487777777777778e-05, 'epoch': 0.37}


 37%|███▋      | 11200/30000 [1:46:54<2:11:27,  2.38it/s]

{'loss': 12.3362, 'grad_norm': 23.274986267089844, 'learning_rate': 3.483148148148148e-05, 'epoch': 0.37}


 37%|███▋      | 11225/30000 [1:47:06<2:15:14,  2.31it/s]

{'loss': 14.9897, 'grad_norm': 29.20033073425293, 'learning_rate': 3.4785185185185186e-05, 'epoch': 0.37}


 38%|███▊      | 11250/30000 [1:47:19<2:22:41,  2.19it/s]

{'loss': 16.8565, 'grad_norm': 34.912025451660156, 'learning_rate': 3.473888888888889e-05, 'epoch': 0.38}


 38%|███▊      | 11275/30000 [1:47:31<2:41:27,  1.93it/s]

{'loss': 21.5443, 'grad_norm': 35.65898895263672, 'learning_rate': 3.469259259259259e-05, 'epoch': 0.38}


 38%|███▊      | 11300/30000 [1:47:43<2:18:20,  2.25it/s]

{'loss': 17.983, 'grad_norm': 44.07905960083008, 'learning_rate': 3.46462962962963e-05, 'epoch': 0.38}


 38%|███▊      | 11325/30000 [1:47:53<2:03:20,  2.52it/s]

{'loss': 7.322, 'grad_norm': 26.991518020629883, 'learning_rate': 3.46e-05, 'epoch': 0.38}


 38%|███▊      | 11350/30000 [1:48:06<2:37:53,  1.97it/s]

{'loss': 15.5378, 'grad_norm': 29.646808624267578, 'learning_rate': 3.4553703703703706e-05, 'epoch': 0.38}


 38%|███▊      | 11375/30000 [1:48:18<2:33:12,  2.03it/s]

{'loss': 17.2449, 'grad_norm': 39.9084358215332, 'learning_rate': 3.450740740740741e-05, 'epoch': 0.38}


 38%|███▊      | 11400/30000 [1:48:30<2:18:46,  2.23it/s]

{'loss': 17.5274, 'grad_norm': 27.51472282409668, 'learning_rate': 3.446111111111111e-05, 'epoch': 0.38}


 38%|███▊      | 11425/30000 [1:48:43<2:36:54,  1.97it/s]

{'loss': 14.7199, 'grad_norm': 42.3934211730957, 'learning_rate': 3.4414814814814815e-05, 'epoch': 0.38}


 38%|███▊      | 11450/30000 [1:48:55<2:35:23,  1.99it/s]

{'loss': 17.785, 'grad_norm': 27.642656326293945, 'learning_rate': 3.436851851851852e-05, 'epoch': 0.38}


 38%|███▊      | 11475/30000 [1:49:06<2:05:01,  2.47it/s]

{'loss': 15.845, 'grad_norm': 43.62353515625, 'learning_rate': 3.432222222222222e-05, 'epoch': 0.38}


 38%|███▊      | 11500/30000 [1:49:19<2:37:53,  1.95it/s]

{'loss': 14.3164, 'grad_norm': 39.553001403808594, 'learning_rate': 3.427592592592593e-05, 'epoch': 0.38}


 38%|███▊      | 11525/30000 [1:49:31<2:13:33,  2.31it/s]

{'loss': 14.3886, 'grad_norm': 33.523929595947266, 'learning_rate': 3.422962962962964e-05, 'epoch': 0.38}


 38%|███▊      | 11550/30000 [1:49:44<2:36:54,  1.96it/s]

{'loss': 17.2492, 'grad_norm': 33.068660736083984, 'learning_rate': 3.4183333333333335e-05, 'epoch': 0.39}


 39%|███▊      | 11575/30000 [1:49:56<1:58:25,  2.59it/s]

{'loss': 21.9167, 'grad_norm': 23.940288543701172, 'learning_rate': 3.413703703703704e-05, 'epoch': 0.39}


 39%|███▊      | 11600/30000 [1:50:08<2:42:15,  1.89it/s]

{'loss': 17.86, 'grad_norm': 58.5166015625, 'learning_rate': 3.409074074074074e-05, 'epoch': 0.39}


 39%|███▉      | 11625/30000 [1:50:20<2:39:37,  1.92it/s]

{'loss': 15.5915, 'grad_norm': 14.827202796936035, 'learning_rate': 3.4044444444444445e-05, 'epoch': 0.39}


 39%|███▉      | 11650/30000 [1:50:32<2:34:36,  1.98it/s]

{'loss': 18.5047, 'grad_norm': 33.30298614501953, 'learning_rate': 3.399814814814815e-05, 'epoch': 0.39}


 39%|███▉      | 11675/30000 [1:50:43<2:35:51,  1.96it/s]

{'loss': 10.5163, 'grad_norm': 56.308677673339844, 'learning_rate': 3.395185185185185e-05, 'epoch': 0.39}


 39%|███▉      | 11700/30000 [1:50:55<2:38:13,  1.93it/s]

{'loss': 19.0392, 'grad_norm': 32.74774169921875, 'learning_rate': 3.3905555555555554e-05, 'epoch': 0.39}


 39%|███▉      | 11725/30000 [1:51:07<2:31:32,  2.01it/s]

{'loss': 16.3883, 'grad_norm': 16.277301788330078, 'learning_rate': 3.3859259259259266e-05, 'epoch': 0.39}


 39%|███▉      | 11750/30000 [1:51:17<1:46:08,  2.87it/s]

{'loss': 11.7977, 'grad_norm': 6.990965366363525, 'learning_rate': 3.3812962962962964e-05, 'epoch': 0.39}


 39%|███▉      | 11775/30000 [1:51:28<2:10:13,  2.33it/s]

{'loss': 13.2939, 'grad_norm': 16.491077423095703, 'learning_rate': 3.376666666666667e-05, 'epoch': 0.39}


 39%|███▉      | 11800/30000 [1:51:41<2:39:01,  1.91it/s]

{'loss': 14.5403, 'grad_norm': 23.16593360900879, 'learning_rate': 3.3720370370370375e-05, 'epoch': 0.39}


 39%|███▉      | 11825/30000 [1:51:52<2:38:29,  1.91it/s]

{'loss': 20.9361, 'grad_norm': 40.933414459228516, 'learning_rate': 3.3674074074074074e-05, 'epoch': 0.39}


 40%|███▉      | 11850/30000 [1:52:05<2:32:40,  1.98it/s]

{'loss': 19.3224, 'grad_norm': 39.06330871582031, 'learning_rate': 3.362777777777778e-05, 'epoch': 0.4}


 40%|███▉      | 11875/30000 [1:52:17<2:33:52,  1.96it/s]

{'loss': 15.5766, 'grad_norm': 41.43344497680664, 'learning_rate': 3.358148148148148e-05, 'epoch': 0.4}


 40%|███▉      | 11900/30000 [1:52:30<2:25:19,  2.08it/s]

{'loss': 16.2964, 'grad_norm': 34.75800323486328, 'learning_rate': 3.353518518518518e-05, 'epoch': 0.4}


 40%|███▉      | 11925/30000 [1:52:43<2:37:38,  1.91it/s]

{'loss': 17.2412, 'grad_norm': 32.84504318237305, 'learning_rate': 3.3488888888888895e-05, 'epoch': 0.4}


 40%|███▉      | 11950/30000 [1:52:57<2:23:59,  2.09it/s]

{'loss': 11.4554, 'grad_norm': 27.539539337158203, 'learning_rate': 3.3442592592592594e-05, 'epoch': 0.4}


 40%|███▉      | 11975/30000 [1:53:09<2:17:19,  2.19it/s]

{'loss': 14.2298, 'grad_norm': 13.902034759521484, 'learning_rate': 3.33962962962963e-05, 'epoch': 0.4}


 40%|████      | 12000/30000 [1:53:21<2:11:36,  2.28it/s]

{'loss': 14.7513, 'grad_norm': 26.735275268554688, 'learning_rate': 3.3350000000000004e-05, 'epoch': 0.4}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 40%|████      | 12000/30000 [1:56:29<2:11:36,  2.28it/s]

{'eval_loss': 28.053943634033203, 'eval_runtime': 187.8652, 'eval_samples_per_second': 20.717, 'eval_steps_per_second': 5.179, 'epoch': 0.4}


 40%|████      | 12025/30000 [1:56:55<3:59:13,  1.25it/s]

{'loss': 12.1614, 'grad_norm': 23.897245407104492, 'learning_rate': 3.33037037037037e-05, 'epoch': 0.4}


 40%|████      | 12050/30000 [1:57:14<3:53:07,  1.28it/s]

{'loss': 16.5693, 'grad_norm': 35.603355407714844, 'learning_rate': 3.325740740740741e-05, 'epoch': 0.4}


 40%|████      | 12075/30000 [1:57:32<2:33:23,  1.95it/s]

{'loss': 20.7743, 'grad_norm': 49.11125183105469, 'learning_rate': 3.3211111111111114e-05, 'epoch': 0.4}


 40%|████      | 12100/30000 [1:57:44<2:02:09,  2.44it/s]

{'loss': 18.8691, 'grad_norm': 29.16915512084961, 'learning_rate': 3.316481481481481e-05, 'epoch': 0.4}


 40%|████      | 12125/30000 [1:57:57<2:30:40,  1.98it/s]

{'loss': 19.0555, 'grad_norm': 39.83517837524414, 'learning_rate': 3.3118518518518524e-05, 'epoch': 0.4}


 40%|████      | 12150/30000 [1:58:08<1:53:18,  2.63it/s]

{'loss': 16.3303, 'grad_norm': 35.71503448486328, 'learning_rate': 3.307222222222222e-05, 'epoch': 0.41}


 41%|████      | 12175/30000 [1:58:19<2:10:18,  2.28it/s]

{'loss': 10.5537, 'grad_norm': 24.188804626464844, 'learning_rate': 3.302592592592593e-05, 'epoch': 0.41}


 41%|████      | 12200/30000 [1:58:32<2:31:37,  1.96it/s]

{'loss': 17.4838, 'grad_norm': 38.61312484741211, 'learning_rate': 3.2979629629629633e-05, 'epoch': 0.41}


 41%|████      | 12225/30000 [1:58:43<2:05:17,  2.36it/s]

{'loss': 18.0308, 'grad_norm': 26.25202751159668, 'learning_rate': 3.293333333333333e-05, 'epoch': 0.41}


 41%|████      | 12250/30000 [1:58:55<2:29:29,  1.98it/s]

{'loss': 15.9095, 'grad_norm': 40.1266975402832, 'learning_rate': 3.288703703703704e-05, 'epoch': 0.41}


 41%|████      | 12275/30000 [1:59:06<1:55:56,  2.55it/s]

{'loss': 15.7225, 'grad_norm': 30.764692306518555, 'learning_rate': 3.284074074074074e-05, 'epoch': 0.41}


 41%|████      | 12300/30000 [1:59:18<2:27:33,  2.00it/s]

{'loss': 13.7919, 'grad_norm': 48.02360153198242, 'learning_rate': 3.279444444444444e-05, 'epoch': 0.41}


 41%|████      | 12325/30000 [1:59:28<1:53:31,  2.59it/s]

{'loss': 15.2362, 'grad_norm': 26.19232177734375, 'learning_rate': 3.274814814814815e-05, 'epoch': 0.41}


 41%|████      | 12350/30000 [1:59:41<2:30:27,  1.96it/s]

{'loss': 18.874, 'grad_norm': 54.17836380004883, 'learning_rate': 3.270185185185186e-05, 'epoch': 0.41}


 41%|████▏     | 12375/30000 [1:59:53<2:32:09,  1.93it/s]

{'loss': 15.3919, 'grad_norm': 9.454344749450684, 'learning_rate': 3.265555555555556e-05, 'epoch': 0.41}


 41%|████▏     | 12400/30000 [2:00:05<2:00:53,  2.43it/s]

{'loss': 16.62, 'grad_norm': 22.83926010131836, 'learning_rate': 3.260925925925926e-05, 'epoch': 0.41}


 41%|████▏     | 12425/30000 [2:00:16<2:11:27,  2.23it/s]

{'loss': 13.9959, 'grad_norm': 32.398319244384766, 'learning_rate': 3.256296296296296e-05, 'epoch': 0.41}


 42%|████▏     | 12450/30000 [2:00:28<2:22:24,  2.05it/s]

{'loss': 13.257, 'grad_norm': 41.42188262939453, 'learning_rate': 3.2516666666666666e-05, 'epoch': 0.41}


 42%|████▏     | 12475/30000 [2:00:40<2:22:37,  2.05it/s]

{'loss': 18.0014, 'grad_norm': 37.21760559082031, 'learning_rate': 3.247037037037037e-05, 'epoch': 0.42}


 42%|████▏     | 12500/30000 [2:00:52<2:26:47,  1.99it/s]

{'loss': 15.2048, 'grad_norm': 34.575042724609375, 'learning_rate': 3.242407407407407e-05, 'epoch': 0.42}


 42%|████▏     | 12525/30000 [2:01:05<2:23:27,  2.03it/s]

{'loss': 16.2739, 'grad_norm': 51.79195785522461, 'learning_rate': 3.2377777777777776e-05, 'epoch': 0.42}


 42%|████▏     | 12550/30000 [2:01:16<2:28:56,  1.95it/s]

{'loss': 13.4034, 'grad_norm': 54.719478607177734, 'learning_rate': 3.233148148148149e-05, 'epoch': 0.42}


 42%|████▏     | 12575/30000 [2:01:28<2:24:23,  2.01it/s]

{'loss': 15.9561, 'grad_norm': 42.73583221435547, 'learning_rate': 3.2285185185185186e-05, 'epoch': 0.42}


 42%|████▏     | 12600/30000 [2:01:39<2:03:15,  2.35it/s]

{'loss': 17.3288, 'grad_norm': 28.228574752807617, 'learning_rate': 3.223888888888889e-05, 'epoch': 0.42}


 42%|████▏     | 12625/30000 [2:01:51<2:26:12,  1.98it/s]

{'loss': 17.0996, 'grad_norm': 27.474058151245117, 'learning_rate': 3.21925925925926e-05, 'epoch': 0.42}


 42%|████▏     | 12650/30000 [2:02:04<2:25:59,  1.98it/s]

{'loss': 16.6474, 'grad_norm': 44.158447265625, 'learning_rate': 3.2146296296296296e-05, 'epoch': 0.42}


 42%|████▏     | 12675/30000 [2:02:16<1:54:39,  2.52it/s]

{'loss': 13.1753, 'grad_norm': 24.0896053314209, 'learning_rate': 3.21e-05, 'epoch': 0.42}


 42%|████▏     | 12700/30000 [2:02:27<1:59:05,  2.42it/s]

{'loss': 15.7432, 'grad_norm': 35.154991149902344, 'learning_rate': 3.2053703703703706e-05, 'epoch': 0.42}


 42%|████▏     | 12725/30000 [2:02:40<2:28:26,  1.94it/s]

{'loss': 21.379, 'grad_norm': 31.15313148498535, 'learning_rate': 3.2007407407407405e-05, 'epoch': 0.42}


 42%|████▎     | 12750/30000 [2:02:52<2:09:35,  2.22it/s]

{'loss': 20.0972, 'grad_norm': 35.54722213745117, 'learning_rate': 3.196111111111112e-05, 'epoch': 0.42}


 43%|████▎     | 12775/30000 [2:03:04<2:23:58,  1.99it/s]

{'loss': 18.4732, 'grad_norm': 42.788578033447266, 'learning_rate': 3.1914814814814816e-05, 'epoch': 0.43}


 43%|████▎     | 12800/30000 [2:03:17<2:25:30,  1.97it/s]

{'loss': 18.5547, 'grad_norm': 62.39685821533203, 'learning_rate': 3.186851851851852e-05, 'epoch': 0.43}


 43%|████▎     | 12825/30000 [2:03:30<2:30:04,  1.91it/s]

{'loss': 20.4066, 'grad_norm': 42.713050842285156, 'learning_rate': 3.1822222222222226e-05, 'epoch': 0.43}


 43%|████▎     | 12850/30000 [2:03:42<2:26:16,  1.95it/s]

{'loss': 19.6777, 'grad_norm': 30.33133888244629, 'learning_rate': 3.1775925925925925e-05, 'epoch': 0.43}


 43%|████▎     | 12875/30000 [2:03:53<2:27:31,  1.93it/s]

{'loss': 15.6868, 'grad_norm': 61.29613494873047, 'learning_rate': 3.172962962962963e-05, 'epoch': 0.43}


 43%|████▎     | 12900/30000 [2:04:05<2:24:54,  1.97it/s]

{'loss': 12.5848, 'grad_norm': 40.141517639160156, 'learning_rate': 3.1683333333333335e-05, 'epoch': 0.43}


 43%|████▎     | 12925/30000 [2:04:17<2:25:33,  1.96it/s]

{'loss': 17.5708, 'grad_norm': 31.73662567138672, 'learning_rate': 3.1637037037037034e-05, 'epoch': 0.43}


 43%|████▎     | 12950/30000 [2:04:29<2:06:23,  2.25it/s]

{'loss': 18.6529, 'grad_norm': 32.00833511352539, 'learning_rate': 3.1590740740740746e-05, 'epoch': 0.43}


 43%|████▎     | 12975/30000 [2:04:41<2:24:17,  1.97it/s]

{'loss': 25.3838, 'grad_norm': 37.69894027709961, 'learning_rate': 3.154444444444445e-05, 'epoch': 0.43}


 43%|████▎     | 13000/30000 [2:04:54<2:03:14,  2.30it/s]

{'loss': 17.3252, 'grad_norm': 35.98380661010742, 'learning_rate': 3.149814814814815e-05, 'epoch': 0.43}


 43%|████▎     | 13025/30000 [2:05:05<2:23:20,  1.97it/s]

{'loss': 15.3228, 'grad_norm': 125.84695434570312, 'learning_rate': 3.1451851851851855e-05, 'epoch': 0.43}


 44%|████▎     | 13050/30000 [2:05:17<2:21:11,  2.00it/s]

{'loss': 19.5355, 'grad_norm': 51.367286682128906, 'learning_rate': 3.1405555555555554e-05, 'epoch': 0.43}


 44%|████▎     | 13075/30000 [2:05:29<2:22:05,  1.99it/s]

{'loss': 15.6695, 'grad_norm': 59.151214599609375, 'learning_rate': 3.135925925925926e-05, 'epoch': 0.44}


 44%|████▎     | 13100/30000 [2:05:41<2:21:49,  1.99it/s]

{'loss': 14.207, 'grad_norm': 223.3188018798828, 'learning_rate': 3.1312962962962965e-05, 'epoch': 0.44}


 44%|████▍     | 13125/30000 [2:05:53<1:58:36,  2.37it/s]

{'loss': 13.903, 'grad_norm': 36.010711669921875, 'learning_rate': 3.126666666666666e-05, 'epoch': 0.44}


 44%|████▍     | 13150/30000 [2:06:06<2:30:04,  1.87it/s]

{'loss': 20.3124, 'grad_norm': 24.584367752075195, 'learning_rate': 3.1220370370370375e-05, 'epoch': 0.44}


 44%|████▍     | 13175/30000 [2:06:18<2:22:42,  1.96it/s]

{'loss': 14.5002, 'grad_norm': 38.52916717529297, 'learning_rate': 3.117407407407408e-05, 'epoch': 0.44}


 44%|████▍     | 13200/30000 [2:06:29<1:58:16,  2.37it/s]

{'loss': 15.6137, 'grad_norm': 20.488296508789062, 'learning_rate': 3.112777777777778e-05, 'epoch': 0.44}


 44%|████▍     | 13225/30000 [2:06:41<1:55:00,  2.43it/s]

{'loss': 18.5952, 'grad_norm': 17.64044189453125, 'learning_rate': 3.1081481481481485e-05, 'epoch': 0.44}


 44%|████▍     | 13250/30000 [2:06:52<1:57:27,  2.38it/s]

{'loss': 13.3758, 'grad_norm': 25.352685928344727, 'learning_rate': 3.103518518518519e-05, 'epoch': 0.44}


 44%|████▍     | 13275/30000 [2:07:03<1:51:43,  2.49it/s]

{'loss': 16.4702, 'grad_norm': 39.3829460144043, 'learning_rate': 3.098888888888889e-05, 'epoch': 0.44}


 44%|████▍     | 13300/30000 [2:07:15<2:19:07,  2.00it/s]

{'loss': 14.6593, 'grad_norm': 53.823760986328125, 'learning_rate': 3.0942592592592594e-05, 'epoch': 0.44}


 44%|████▍     | 13325/30000 [2:07:27<2:15:03,  2.06it/s]

{'loss': 17.5291, 'grad_norm': 61.93108367919922, 'learning_rate': 3.089629629629629e-05, 'epoch': 0.44}


 44%|████▍     | 13350/30000 [2:07:39<2:24:10,  1.92it/s]

{'loss': 23.4803, 'grad_norm': 31.37865447998047, 'learning_rate': 3.0850000000000004e-05, 'epoch': 0.45}


 45%|████▍     | 13375/30000 [2:07:51<2:13:45,  2.07it/s]

{'loss': 17.9001, 'grad_norm': 66.55660247802734, 'learning_rate': 3.080370370370371e-05, 'epoch': 0.45}


 45%|████▍     | 13400/30000 [2:08:04<2:24:10,  1.92it/s]

{'loss': 21.9785, 'grad_norm': 56.33380889892578, 'learning_rate': 3.075740740740741e-05, 'epoch': 0.45}


 45%|████▍     | 13425/30000 [2:08:16<2:18:12,  2.00it/s]

{'loss': 17.0195, 'grad_norm': 23.180715560913086, 'learning_rate': 3.0711111111111114e-05, 'epoch': 0.45}


 45%|████▍     | 13450/30000 [2:08:27<1:50:25,  2.50it/s]

{'loss': 11.8727, 'grad_norm': 12.659266471862793, 'learning_rate': 3.066481481481482e-05, 'epoch': 0.45}


 45%|████▍     | 13475/30000 [2:08:40<2:21:25,  1.95it/s]

{'loss': 23.0835, 'grad_norm': 63.43239212036133, 'learning_rate': 3.061851851851852e-05, 'epoch': 0.45}


 45%|████▌     | 13500/30000 [2:08:51<2:18:12,  1.99it/s]

{'loss': 13.3431, 'grad_norm': 48.55322265625, 'learning_rate': 3.057222222222222e-05, 'epoch': 0.45}


 45%|████▌     | 13525/30000 [2:09:03<2:16:19,  2.01it/s]

{'loss': 12.106, 'grad_norm': 44.91973114013672, 'learning_rate': 3.052592592592593e-05, 'epoch': 0.45}


 45%|████▌     | 13550/30000 [2:09:15<2:11:43,  2.08it/s]

{'loss': 20.0483, 'grad_norm': 56.26594924926758, 'learning_rate': 3.047962962962963e-05, 'epoch': 0.45}


 45%|████▌     | 13575/30000 [2:09:27<2:17:01,  2.00it/s]

{'loss': 17.9941, 'grad_norm': 40.041015625, 'learning_rate': 3.0433333333333336e-05, 'epoch': 0.45}


 45%|████▌     | 13600/30000 [2:09:39<2:19:55,  1.95it/s]

{'loss': 17.882, 'grad_norm': 28.362943649291992, 'learning_rate': 3.0387037037037038e-05, 'epoch': 0.45}


 45%|████▌     | 13625/30000 [2:09:50<2:13:20,  2.05it/s]

{'loss': 11.7268, 'grad_norm': 40.462852478027344, 'learning_rate': 3.0340740740740743e-05, 'epoch': 0.45}


 46%|████▌     | 13650/30000 [2:10:02<1:48:07,  2.52it/s]

{'loss': 11.4755, 'grad_norm': 24.025714874267578, 'learning_rate': 3.0294444444444448e-05, 'epoch': 0.46}


 46%|████▌     | 13675/30000 [2:10:13<2:11:30,  2.07it/s]

{'loss': 13.7456, 'grad_norm': 53.43873977661133, 'learning_rate': 3.0248148148148147e-05, 'epoch': 0.46}


 46%|████▌     | 13700/30000 [2:10:24<2:11:45,  2.06it/s]

{'loss': 11.1114, 'grad_norm': nan, 'learning_rate': 3.0201851851851852e-05, 'epoch': 0.46}


 46%|████▌     | 13725/30000 [2:10:36<2:19:44,  1.94it/s]

{'loss': 11.7979, 'grad_norm': 30.509206771850586, 'learning_rate': 3.0157407407407407e-05, 'epoch': 0.46}


 46%|████▌     | 13750/30000 [2:10:48<1:51:15,  2.43it/s]

{'loss': 18.5502, 'grad_norm': 21.24832534790039, 'learning_rate': 3.0111111111111113e-05, 'epoch': 0.46}


 46%|████▌     | 13775/30000 [2:11:00<2:15:29,  2.00it/s]

{'loss': 20.9791, 'grad_norm': 19.731157302856445, 'learning_rate': 3.0064814814814818e-05, 'epoch': 0.46}


 46%|████▌     | 13800/30000 [2:11:13<2:16:19,  1.98it/s]

{'loss': 18.9832, 'grad_norm': 74.8206558227539, 'learning_rate': 3.0018518518518517e-05, 'epoch': 0.46}


 46%|████▌     | 13825/30000 [2:11:25<2:12:04,  2.04it/s]

{'loss': 17.9043, 'grad_norm': 65.35150146484375, 'learning_rate': 2.9972222222222225e-05, 'epoch': 0.46}


 46%|████▌     | 13850/30000 [2:11:37<2:17:57,  1.95it/s]

{'loss': 13.5848, 'grad_norm': 33.97673797607422, 'learning_rate': 2.992592592592593e-05, 'epoch': 0.46}


 46%|████▋     | 13875/30000 [2:11:49<2:08:07,  2.10it/s]

{'loss': 12.2898, 'grad_norm': 43.83666229248047, 'learning_rate': 2.987962962962963e-05, 'epoch': 0.46}


 46%|████▋     | 13900/30000 [2:12:01<1:59:33,  2.24it/s]

{'loss': 23.3502, 'grad_norm': 44.72200012207031, 'learning_rate': 2.9833333333333335e-05, 'epoch': 0.46}


 46%|████▋     | 13925/30000 [2:12:13<2:18:31,  1.93it/s]

{'loss': 15.6185, 'grad_norm': 61.1119270324707, 'learning_rate': 2.978703703703704e-05, 'epoch': 0.46}


 46%|████▋     | 13950/30000 [2:12:26<2:09:20,  2.07it/s]

{'loss': 13.4586, 'grad_norm': 30.602985382080078, 'learning_rate': 2.9740740740740742e-05, 'epoch': 0.47}


 47%|████▋     | 13975/30000 [2:12:37<2:08:53,  2.07it/s]

{'loss': 20.0965, 'grad_norm': 33.47903823852539, 'learning_rate': 2.96962962962963e-05, 'epoch': 0.47}


 47%|████▋     | 14000/30000 [2:12:50<2:17:25,  1.94it/s]

{'loss': 20.1384, 'grad_norm': 21.342119216918945, 'learning_rate': 2.965e-05, 'epoch': 0.47}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 47%|████▋     | 14000/30000 [2:15:52<2:17:25,  1.94it/s]

{'eval_loss': 28.2296142578125, 'eval_runtime': 181.8202, 'eval_samples_per_second': 21.406, 'eval_steps_per_second': 5.351, 'epoch': 0.47}


 47%|████▋     | 14025/30000 [2:16:13<2:25:01,  1.84it/s]  

{'loss': 10.7453, 'grad_norm': 60.85987091064453, 'learning_rate': 2.9603703703703704e-05, 'epoch': 0.47}


 47%|████▋     | 14050/30000 [2:16:25<1:59:57,  2.22it/s]

{'loss': 19.4863, 'grad_norm': 47.21971130371094, 'learning_rate': 2.9557407407407413e-05, 'epoch': 0.47}


 47%|████▋     | 14075/30000 [2:16:37<1:52:28,  2.36it/s]

{'loss': 15.6881, 'grad_norm': 29.288352966308594, 'learning_rate': 2.951111111111111e-05, 'epoch': 0.47}


 47%|████▋     | 14100/30000 [2:16:49<2:11:55,  2.01it/s]

{'loss': 23.1973, 'grad_norm': 48.735595703125, 'learning_rate': 2.9464814814814817e-05, 'epoch': 0.47}


 47%|████▋     | 14125/30000 [2:17:01<2:12:15,  2.00it/s]

{'loss': 15.4847, 'grad_norm': 70.52416229248047, 'learning_rate': 2.941851851851852e-05, 'epoch': 0.47}


 47%|████▋     | 14150/30000 [2:17:12<2:11:57,  2.00it/s]

{'loss': 12.425, 'grad_norm': 40.978939056396484, 'learning_rate': 2.9372222222222224e-05, 'epoch': 0.47}


 47%|████▋     | 14175/30000 [2:17:24<1:46:58,  2.47it/s]

{'loss': 19.223, 'grad_norm': 39.39643859863281, 'learning_rate': 2.932592592592593e-05, 'epoch': 0.47}


 47%|████▋     | 14200/30000 [2:17:36<2:14:36,  1.96it/s]

{'loss': 13.5548, 'grad_norm': 55.06263732910156, 'learning_rate': 2.9279629629629628e-05, 'epoch': 0.47}


 47%|████▋     | 14225/30000 [2:17:49<2:20:11,  1.88it/s]

{'loss': 12.1656, 'grad_norm': 41.33216857910156, 'learning_rate': 2.9233333333333334e-05, 'epoch': 0.47}


 48%|████▊     | 14250/30000 [2:18:01<2:08:19,  2.05it/s]

{'loss': 15.6837, 'grad_norm': 40.60786437988281, 'learning_rate': 2.918703703703704e-05, 'epoch': 0.47}


 48%|████▊     | 14275/30000 [2:18:13<2:19:10,  1.88it/s]

{'loss': 20.8752, 'grad_norm': 58.29774475097656, 'learning_rate': 2.914074074074074e-05, 'epoch': 0.48}


 48%|████▊     | 14300/30000 [2:18:25<2:16:49,  1.91it/s]

{'loss': 14.2233, 'grad_norm': 31.426025390625, 'learning_rate': 2.9094444444444446e-05, 'epoch': 0.48}


 48%|████▊     | 14325/30000 [2:18:37<2:09:54,  2.01it/s]

{'loss': 14.3124, 'grad_norm': 54.09067916870117, 'learning_rate': 2.904814814814815e-05, 'epoch': 0.48}


 48%|████▊     | 14350/30000 [2:18:49<2:18:41,  1.88it/s]

{'loss': 20.144, 'grad_norm': 61.77125549316406, 'learning_rate': 2.9001851851851853e-05, 'epoch': 0.48}


 48%|████▊     | 14375/30000 [2:19:02<2:10:56,  1.99it/s]

{'loss': 15.541, 'grad_norm': 55.7199821472168, 'learning_rate': 2.895555555555556e-05, 'epoch': 0.48}


 48%|████▊     | 14400/30000 [2:19:13<2:09:41,  2.00it/s]

{'loss': 18.8114, 'grad_norm': 46.11152267456055, 'learning_rate': 2.8909259259259257e-05, 'epoch': 0.48}


 48%|████▊     | 14425/30000 [2:19:24<1:54:30,  2.27it/s]

{'loss': 11.8897, 'grad_norm': 36.86391067504883, 'learning_rate': 2.8862962962962963e-05, 'epoch': 0.48}


 48%|████▊     | 14450/30000 [2:19:36<1:44:27,  2.48it/s]

{'loss': 17.6285, 'grad_norm': 57.440738677978516, 'learning_rate': 2.8816666666666668e-05, 'epoch': 0.48}


 48%|████▊     | 14475/30000 [2:19:48<1:48:48,  2.38it/s]

{'loss': 18.6071, 'grad_norm': 42.709957122802734, 'learning_rate': 2.877037037037037e-05, 'epoch': 0.48}


 48%|████▊     | 14500/30000 [2:20:00<1:45:40,  2.44it/s]

{'loss': 13.4436, 'grad_norm': 6.63881778717041, 'learning_rate': 2.8724074074074075e-05, 'epoch': 0.48}


 48%|████▊     | 14525/30000 [2:20:12<1:49:02,  2.37it/s]

{'loss': 14.0505, 'grad_norm': 47.21418380737305, 'learning_rate': 2.867777777777778e-05, 'epoch': 0.48}


 48%|████▊     | 14550/30000 [2:20:24<1:49:03,  2.36it/s]

{'loss': 17.0885, 'grad_norm': 13.018009185791016, 'learning_rate': 2.8631481481481483e-05, 'epoch': 0.48}


 49%|████▊     | 14575/30000 [2:20:35<1:57:40,  2.18it/s]

{'loss': 15.0225, 'grad_norm': 23.592641830444336, 'learning_rate': 2.8585185185185188e-05, 'epoch': 0.49}


 49%|████▊     | 14600/30000 [2:20:47<2:06:28,  2.03it/s]

{'loss': 18.369, 'grad_norm': 56.31374740600586, 'learning_rate': 2.8538888888888893e-05, 'epoch': 0.49}


 49%|████▉     | 14625/30000 [2:21:00<2:06:11,  2.03it/s]

{'loss': 15.5158, 'grad_norm': 68.43972778320312, 'learning_rate': 2.8492592592592592e-05, 'epoch': 0.49}


 49%|████▉     | 14650/30000 [2:21:12<1:44:26,  2.45it/s]

{'loss': 17.2447, 'grad_norm': 33.890953063964844, 'learning_rate': 2.8446296296296297e-05, 'epoch': 0.49}


 49%|████▉     | 14675/30000 [2:21:22<1:41:45,  2.51it/s]

{'loss': 9.5789, 'grad_norm': 25.831756591796875, 'learning_rate': 2.84e-05, 'epoch': 0.49}


 49%|████▉     | 14700/30000 [2:21:34<2:20:19,  1.82it/s]

{'loss': 15.1475, 'grad_norm': 18.949247360229492, 'learning_rate': 2.8353703703703704e-05, 'epoch': 0.49}


 49%|████▉     | 14725/30000 [2:21:47<1:55:37,  2.20it/s]

{'loss': 16.3167, 'grad_norm': 21.25999641418457, 'learning_rate': 2.830740740740741e-05, 'epoch': 0.49}


 49%|████▉     | 14750/30000 [2:22:00<2:13:03,  1.91it/s]

{'loss': 19.3685, 'grad_norm': 46.47040939331055, 'learning_rate': 2.8261111111111112e-05, 'epoch': 0.49}


 49%|████▉     | 14775/30000 [2:22:12<1:48:51,  2.33it/s]

{'loss': 14.7628, 'grad_norm': 23.86733055114746, 'learning_rate': 2.8214814814814817e-05, 'epoch': 0.49}


 49%|████▉     | 14800/30000 [2:22:25<2:17:16,  1.85it/s]

{'loss': 12.5854, 'grad_norm': 60.178951263427734, 'learning_rate': 2.8168518518518522e-05, 'epoch': 0.49}


 49%|████▉     | 14825/30000 [2:22:38<2:06:30,  2.00it/s]

{'loss': 18.5357, 'grad_norm': 39.215736389160156, 'learning_rate': 2.812222222222222e-05, 'epoch': 0.49}


 50%|████▉     | 14850/30000 [2:22:50<2:10:13,  1.94it/s]

{'loss': 25.5339, 'grad_norm': 70.93537902832031, 'learning_rate': 2.8075925925925926e-05, 'epoch': 0.49}


 50%|████▉     | 14875/30000 [2:23:01<1:47:02,  2.36it/s]

{'loss': 14.3778, 'grad_norm': 41.82049560546875, 'learning_rate': 2.8029629629629635e-05, 'epoch': 0.5}


 50%|████▉     | 14900/30000 [2:23:12<1:59:57,  2.10it/s]

{'loss': 17.7859, 'grad_norm': 58.8966064453125, 'learning_rate': 2.7983333333333334e-05, 'epoch': 0.5}


 50%|████▉     | 14925/30000 [2:23:25<2:13:23,  1.88it/s]

{'loss': 23.2377, 'grad_norm': 64.13819885253906, 'learning_rate': 2.793703703703704e-05, 'epoch': 0.5}


 50%|████▉     | 14950/30000 [2:23:37<2:06:19,  1.99it/s]

{'loss': 17.1537, 'grad_norm': 45.13072204589844, 'learning_rate': 2.789074074074074e-05, 'epoch': 0.5}


 50%|████▉     | 14975/30000 [2:23:49<1:41:19,  2.47it/s]

{'loss': 19.7511, 'grad_norm': 30.404478073120117, 'learning_rate': 2.7844444444444446e-05, 'epoch': 0.5}


 50%|█████     | 15000/30000 [2:24:01<2:12:22,  1.89it/s]

{'loss': 16.4334, 'grad_norm': 55.94886016845703, 'learning_rate': 2.779814814814815e-05, 'epoch': 0.5}


 50%|█████     | 15025/30000 [2:24:12<1:42:27,  2.44it/s]

{'loss': 12.8993, 'grad_norm': 38.813133239746094, 'learning_rate': 2.775185185185185e-05, 'epoch': 0.5}


 50%|█████     | 15050/30000 [2:24:25<2:11:49,  1.89it/s]

{'loss': 15.7499, 'grad_norm': 48.50754928588867, 'learning_rate': 2.7705555555555556e-05, 'epoch': 0.5}


 50%|█████     | 15075/30000 [2:24:37<2:05:06,  1.99it/s]

{'loss': 22.945, 'grad_norm': 13.32844352722168, 'learning_rate': 2.765925925925926e-05, 'epoch': 0.5}


 50%|█████     | 15100/30000 [2:24:50<2:08:20,  1.93it/s]

{'loss': 24.3798, 'grad_norm': 70.61377716064453, 'learning_rate': 2.7612962962962963e-05, 'epoch': 0.5}


 50%|█████     | 15125/30000 [2:25:02<2:03:46,  2.00it/s]

{'loss': 17.0332, 'grad_norm': 39.9687614440918, 'learning_rate': 2.7566666666666668e-05, 'epoch': 0.5}


 50%|█████     | 15150/30000 [2:25:15<2:04:47,  1.98it/s]

{'loss': 17.7806, 'grad_norm': 46.37089920043945, 'learning_rate': 2.7520370370370373e-05, 'epoch': 0.51}


 51%|█████     | 15175/30000 [2:25:28<2:04:55,  1.98it/s]

{'loss': 19.1408, 'grad_norm': 30.663232803344727, 'learning_rate': 2.7474074074074075e-05, 'epoch': 0.51}


 51%|█████     | 15200/30000 [2:25:40<2:11:08,  1.88it/s]

{'loss': 16.6069, 'grad_norm': 47.47871017456055, 'learning_rate': 2.742777777777778e-05, 'epoch': 0.51}


 51%|█████     | 15225/30000 [2:25:53<2:04:23,  1.98it/s]

{'loss': 19.801, 'grad_norm': 53.28766632080078, 'learning_rate': 2.738148148148148e-05, 'epoch': 0.51}


 51%|█████     | 15250/30000 [2:26:05<1:39:59,  2.46it/s]

{'loss': 23.6707, 'grad_norm': 25.35014533996582, 'learning_rate': 2.7335185185185185e-05, 'epoch': 0.51}


 51%|█████     | 15275/30000 [2:26:17<2:07:18,  1.93it/s]

{'loss': 15.0685, 'grad_norm': 46.30641174316406, 'learning_rate': 2.728888888888889e-05, 'epoch': 0.51}


 51%|█████     | 15300/30000 [2:26:28<1:45:25,  2.32it/s]

{'loss': 14.764, 'grad_norm': 44.0247917175293, 'learning_rate': 2.7242592592592592e-05, 'epoch': 0.51}


 51%|█████     | 15325/30000 [2:26:41<2:02:20,  2.00it/s]

{'loss': 18.8249, 'grad_norm': 38.21992111206055, 'learning_rate': 2.7196296296296297e-05, 'epoch': 0.51}


 51%|█████     | 15350/30000 [2:26:53<1:38:58,  2.47it/s]

{'loss': 17.2721, 'grad_norm': 29.97968101501465, 'learning_rate': 2.7150000000000003e-05, 'epoch': 0.51}


 51%|█████▏    | 15375/30000 [2:27:05<1:47:51,  2.26it/s]

{'loss': 21.7971, 'grad_norm': 45.034427642822266, 'learning_rate': 2.7103703703703705e-05, 'epoch': 0.51}


 51%|█████▏    | 15400/30000 [2:27:18<2:06:16,  1.93it/s]

{'loss': 20.4641, 'grad_norm': 66.85188293457031, 'learning_rate': 2.705740740740741e-05, 'epoch': 0.51}


 51%|█████▏    | 15425/30000 [2:27:31<2:09:42,  1.87it/s]

{'loss': 16.7434, 'grad_norm': 36.48903274536133, 'learning_rate': 2.7011111111111115e-05, 'epoch': 0.51}


 52%|█████▏    | 15450/30000 [2:27:43<2:04:47,  1.94it/s]

{'loss': 18.7551, 'grad_norm': 57.727046966552734, 'learning_rate': 2.6964814814814814e-05, 'epoch': 0.52}


 52%|█████▏    | 15475/30000 [2:27:56<1:46:12,  2.28it/s]

{'loss': 16.1927, 'grad_norm': 17.04656982421875, 'learning_rate': 2.691851851851852e-05, 'epoch': 0.52}


 52%|█████▏    | 15500/30000 [2:28:08<1:46:45,  2.26it/s]

{'loss': 18.7432, 'grad_norm': 35.93132781982422, 'learning_rate': 2.687222222222222e-05, 'epoch': 0.52}


 52%|█████▏    | 15525/30000 [2:28:20<1:33:28,  2.58it/s]

{'loss': 13.0673, 'grad_norm': 25.81853675842285, 'learning_rate': 2.6825925925925926e-05, 'epoch': 0.52}


 52%|█████▏    | 15550/30000 [2:28:32<2:06:18,  1.91it/s]

{'loss': 14.6544, 'grad_norm': 39.04435348510742, 'learning_rate': 2.6779629629629632e-05, 'epoch': 0.52}


 52%|█████▏    | 15575/30000 [2:28:44<1:39:42,  2.41it/s]

{'loss': 14.602, 'grad_norm': 25.094526290893555, 'learning_rate': 2.6733333333333334e-05, 'epoch': 0.52}


 52%|█████▏    | 15600/30000 [2:28:55<1:40:06,  2.40it/s]

{'loss': 11.3719, 'grad_norm': 15.673111915588379, 'learning_rate': 2.668703703703704e-05, 'epoch': 0.52}


 52%|█████▏    | 15625/30000 [2:29:07<1:40:58,  2.37it/s]

{'loss': 14.5283, 'grad_norm': 26.61578369140625, 'learning_rate': 2.6640740740740744e-05, 'epoch': 0.52}


 52%|█████▏    | 15650/30000 [2:29:19<2:01:32,  1.97it/s]

{'loss': 13.4869, 'grad_norm': 48.965370178222656, 'learning_rate': 2.6594444444444443e-05, 'epoch': 0.52}


 52%|█████▏    | 15675/30000 [2:29:31<1:36:20,  2.48it/s]

{'loss': 16.8586, 'grad_norm': 14.739033699035645, 'learning_rate': 2.654814814814815e-05, 'epoch': 0.52}


 52%|█████▏    | 15700/30000 [2:29:44<2:05:39,  1.90it/s]

{'loss': 25.1065, 'grad_norm': 35.291831970214844, 'learning_rate': 2.6501851851851857e-05, 'epoch': 0.52}


 52%|█████▏    | 15725/30000 [2:29:55<1:37:17,  2.45it/s]

{'loss': 13.6053, 'grad_norm': 33.04702377319336, 'learning_rate': 2.6455555555555556e-05, 'epoch': 0.52}


 52%|█████▎    | 15750/30000 [2:30:08<2:06:25,  1.88it/s]

{'loss': 18.1846, 'grad_norm': 38.90666961669922, 'learning_rate': 2.640925925925926e-05, 'epoch': 0.53}


 53%|█████▎    | 15775/30000 [2:30:21<2:01:52,  1.95it/s]

{'loss': 20.397, 'grad_norm': 47.33987808227539, 'learning_rate': 2.6362962962962963e-05, 'epoch': 0.53}


 53%|█████▎    | 15800/30000 [2:30:33<2:00:55,  1.96it/s]

{'loss': 20.3754, 'grad_norm': 52.709903717041016, 'learning_rate': 2.6316666666666668e-05, 'epoch': 0.53}


 53%|█████▎    | 15825/30000 [2:30:44<1:43:03,  2.29it/s]

{'loss': 19.4122, 'grad_norm': 31.987735748291016, 'learning_rate': 2.6270370370370374e-05, 'epoch': 0.53}


 53%|█████▎    | 15850/30000 [2:30:57<1:58:48,  1.98it/s]

{'loss': 14.6204, 'grad_norm': 35.51969528198242, 'learning_rate': 2.6224074074074072e-05, 'epoch': 0.53}


 53%|█████▎    | 15875/30000 [2:31:09<1:56:31,  2.02it/s]

{'loss': 15.6767, 'grad_norm': 36.939910888671875, 'learning_rate': 2.6177777777777777e-05, 'epoch': 0.53}


 53%|█████▎    | 15900/30000 [2:31:21<2:03:03,  1.91it/s]

{'loss': 13.0697, 'grad_norm': 24.996850967407227, 'learning_rate': 2.6131481481481486e-05, 'epoch': 0.53}


 53%|█████▎    | 15925/30000 [2:31:33<2:03:33,  1.90it/s]

{'loss': 21.0494, 'grad_norm': 47.39741134643555, 'learning_rate': 2.6085185185185185e-05, 'epoch': 0.53}


 53%|█████▎    | 15950/30000 [2:31:45<1:59:00,  1.97it/s]

{'loss': 17.2483, 'grad_norm': 47.718223571777344, 'learning_rate': 2.603888888888889e-05, 'epoch': 0.53}


 53%|█████▎    | 15975/30000 [2:31:58<2:08:40,  1.82it/s]

{'loss': 19.5742, 'grad_norm': 43.42972946166992, 'learning_rate': 2.5992592592592595e-05, 'epoch': 0.53}


 53%|█████▎    | 16000/30000 [2:32:12<2:02:48,  1.90it/s]

{'loss': 15.1397, 'grad_norm': 43.00606155395508, 'learning_rate': 2.5946296296296297e-05, 'epoch': 0.53}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 53%|█████▎    | 16000/30000 [2:35:13<2:02:48,  1.90it/s]

{'eval_loss': 28.881000518798828, 'eval_runtime': 181.4851, 'eval_samples_per_second': 21.445, 'eval_steps_per_second': 5.361, 'epoch': 0.53}


 53%|█████▎    | 16025/30000 [2:35:33<1:59:26,  1.95it/s]  

{'loss': 14.7579, 'grad_norm': 49.78045654296875, 'learning_rate': 2.5900000000000003e-05, 'epoch': 0.53}


 54%|█████▎    | 16050/30000 [2:35:45<1:56:07,  2.00it/s]

{'loss': 16.2813, 'grad_norm': 37.815757751464844, 'learning_rate': 2.58537037037037e-05, 'epoch': 0.54}


 54%|█████▎    | 16075/30000 [2:35:58<2:00:35,  1.92it/s]

{'loss': 12.756, 'grad_norm': 40.9350700378418, 'learning_rate': 2.5807407407407407e-05, 'epoch': 0.54}


 54%|█████▎    | 16100/30000 [2:36:10<1:54:37,  2.02it/s]

{'loss': 18.1717, 'grad_norm': 49.16861343383789, 'learning_rate': 2.5761111111111112e-05, 'epoch': 0.54}


 54%|█████▍    | 16125/30000 [2:36:21<1:47:17,  2.16it/s]

{'loss': 12.7727, 'grad_norm': 32.66469192504883, 'learning_rate': 2.5714814814814814e-05, 'epoch': 0.54}


 54%|█████▍    | 16150/30000 [2:36:32<1:32:41,  2.49it/s]

{'loss': 11.3429, 'grad_norm': 17.79810333251953, 'learning_rate': 2.566851851851852e-05, 'epoch': 0.54}


 54%|█████▍    | 16175/30000 [2:36:44<1:43:06,  2.23it/s]

{'loss': 14.2771, 'grad_norm': 34.29072570800781, 'learning_rate': 2.5622222222222225e-05, 'epoch': 0.54}


 54%|█████▍    | 16200/30000 [2:36:57<1:41:11,  2.27it/s]

{'loss': 16.7869, 'grad_norm': 38.39728546142578, 'learning_rate': 2.5575925925925927e-05, 'epoch': 0.54}


 54%|█████▍    | 16225/30000 [2:37:08<1:39:45,  2.30it/s]

{'loss': 17.6887, 'grad_norm': 43.821102142333984, 'learning_rate': 2.5529629629629632e-05, 'epoch': 0.54}


 54%|█████▍    | 16250/30000 [2:37:20<1:52:12,  2.04it/s]

{'loss': 18.3708, 'grad_norm': 28.345565795898438, 'learning_rate': 2.5483333333333337e-05, 'epoch': 0.54}


 54%|█████▍    | 16275/30000 [2:37:32<1:55:31,  1.98it/s]

{'loss': 19.825, 'grad_norm': 51.19709777832031, 'learning_rate': 2.5437037037037036e-05, 'epoch': 0.54}


 54%|█████▍    | 16300/30000 [2:37:45<1:53:05,  2.02it/s]

{'loss': 18.619, 'grad_norm': 34.04478454589844, 'learning_rate': 2.539074074074074e-05, 'epoch': 0.54}


 54%|█████▍    | 16325/30000 [2:37:57<1:34:09,  2.42it/s]

{'loss': 14.9056, 'grad_norm': 42.80632400512695, 'learning_rate': 2.534444444444445e-05, 'epoch': 0.54}


 55%|█████▍    | 16350/30000 [2:38:09<1:38:19,  2.31it/s]

{'loss': 19.4358, 'grad_norm': 41.553401947021484, 'learning_rate': 2.529814814814815e-05, 'epoch': 0.55}


 55%|█████▍    | 16375/30000 [2:38:21<1:33:21,  2.43it/s]

{'loss': 17.1382, 'grad_norm': 30.87876319885254, 'learning_rate': 2.5251851851851854e-05, 'epoch': 0.55}


 55%|█████▍    | 16400/30000 [2:38:33<1:56:37,  1.94it/s]

{'loss': 17.8192, 'grad_norm': 43.13504409790039, 'learning_rate': 2.5205555555555556e-05, 'epoch': 0.55}


 55%|█████▍    | 16425/30000 [2:38:46<1:53:33,  1.99it/s]

{'loss': 15.026, 'grad_norm': 26.492904663085938, 'learning_rate': 2.515925925925926e-05, 'epoch': 0.55}


 55%|█████▍    | 16450/30000 [2:38:58<1:54:18,  1.98it/s]

{'loss': 19.4898, 'grad_norm': 48.15498733520508, 'learning_rate': 2.5112962962962966e-05, 'epoch': 0.55}


 55%|█████▍    | 16475/30000 [2:39:10<1:53:04,  1.99it/s]

{'loss': 15.7005, 'grad_norm': 33.425743103027344, 'learning_rate': 2.5066666666666665e-05, 'epoch': 0.55}


 55%|█████▌    | 16500/30000 [2:39:22<1:33:11,  2.41it/s]

{'loss': 13.4525, 'grad_norm': 16.560102462768555, 'learning_rate': 2.502037037037037e-05, 'epoch': 0.55}


 55%|█████▌    | 16525/30000 [2:39:34<1:53:45,  1.97it/s]

{'loss': 18.5916, 'grad_norm': 34.424869537353516, 'learning_rate': 2.4974074074074076e-05, 'epoch': 0.55}


 55%|█████▌    | 16550/30000 [2:39:47<1:54:27,  1.96it/s]

{'loss': 18.9552, 'grad_norm': 60.89418411254883, 'learning_rate': 2.4927777777777778e-05, 'epoch': 0.55}


 55%|█████▌    | 16575/30000 [2:40:00<1:58:40,  1.89it/s]

{'loss': 12.866, 'grad_norm': 33.969520568847656, 'learning_rate': 2.4881481481481483e-05, 'epoch': 0.55}


 55%|█████▌    | 16600/30000 [2:40:13<1:56:50,  1.91it/s]

{'loss': 11.9652, 'grad_norm': 48.6492919921875, 'learning_rate': 2.4835185185185185e-05, 'epoch': 0.55}


 55%|█████▌    | 16625/30000 [2:40:25<1:44:10,  2.14it/s]

{'loss': 16.5777, 'grad_norm': 33.747344970703125, 'learning_rate': 2.478888888888889e-05, 'epoch': 0.55}


 56%|█████▌    | 16650/30000 [2:40:37<1:37:10,  2.29it/s]

{'loss': 16.2014, 'grad_norm': 20.367324829101562, 'learning_rate': 2.4742592592592596e-05, 'epoch': 0.56}


 56%|█████▌    | 16675/30000 [2:40:50<1:55:00,  1.93it/s]

{'loss': 20.5141, 'grad_norm': 45.9372444152832, 'learning_rate': 2.4696296296296298e-05, 'epoch': 0.56}


 56%|█████▌    | 16700/30000 [2:41:02<1:46:44,  2.08it/s]

{'loss': 19.9832, 'grad_norm': 33.567138671875, 'learning_rate': 2.465e-05, 'epoch': 0.56}


 56%|█████▌    | 16725/30000 [2:41:14<1:41:48,  2.17it/s]

{'loss': 14.1979, 'grad_norm': 12.261873245239258, 'learning_rate': 2.4603703703703705e-05, 'epoch': 0.56}


 56%|█████▌    | 16750/30000 [2:41:25<1:37:01,  2.28it/s]

{'loss': 20.2462, 'grad_norm': 52.2880744934082, 'learning_rate': 2.455740740740741e-05, 'epoch': 0.56}


 56%|█████▌    | 16775/30000 [2:41:38<1:51:58,  1.97it/s]

{'loss': 23.9357, 'grad_norm': 46.14335250854492, 'learning_rate': 2.4511111111111112e-05, 'epoch': 0.56}


 56%|█████▌    | 16800/30000 [2:41:50<1:32:42,  2.37it/s]

{'loss': 16.2492, 'grad_norm': 22.03934097290039, 'learning_rate': 2.4464814814814814e-05, 'epoch': 0.56}


 56%|█████▌    | 16825/30000 [2:42:02<1:50:59,  1.98it/s]

{'loss': 18.7303, 'grad_norm': 43.302833557128906, 'learning_rate': 2.441851851851852e-05, 'epoch': 0.56}


 56%|█████▌    | 16850/30000 [2:42:15<1:54:25,  1.92it/s]

{'loss': 22.5125, 'grad_norm': 52.59840393066406, 'learning_rate': 2.4372222222222225e-05, 'epoch': 0.56}


 56%|█████▋    | 16875/30000 [2:42:26<1:34:55,  2.30it/s]

{'loss': 14.5326, 'grad_norm': 55.095157623291016, 'learning_rate': 2.4325925925925927e-05, 'epoch': 0.56}


 56%|█████▋    | 16900/30000 [2:42:38<1:53:33,  1.92it/s]

{'loss': 16.9513, 'grad_norm': 69.59211730957031, 'learning_rate': 2.427962962962963e-05, 'epoch': 0.56}


 56%|█████▋    | 16925/30000 [2:42:51<1:52:25,  1.94it/s]

{'loss': 18.4914, 'grad_norm': 35.86819076538086, 'learning_rate': 2.4233333333333337e-05, 'epoch': 0.56}


 56%|█████▋    | 16950/30000 [2:43:04<1:53:23,  1.92it/s]

{'loss': 17.1557, 'grad_norm': 20.460100173950195, 'learning_rate': 2.418703703703704e-05, 'epoch': 0.56}


 57%|█████▋    | 16975/30000 [2:43:16<1:29:44,  2.42it/s]

{'loss': 13.0298, 'grad_norm': 12.99942684173584, 'learning_rate': 2.414074074074074e-05, 'epoch': 0.57}


 57%|█████▋    | 17000/30000 [2:43:28<1:51:19,  1.95it/s]

{'loss': 14.9309, 'grad_norm': 62.600704193115234, 'learning_rate': 2.4094444444444443e-05, 'epoch': 0.57}


 57%|█████▋    | 17025/30000 [2:43:40<1:52:19,  1.93it/s]

{'loss': 22.7563, 'grad_norm': 40.01313400268555, 'learning_rate': 2.404814814814815e-05, 'epoch': 0.57}


 57%|█████▋    | 17050/30000 [2:43:52<1:39:04,  2.18it/s]

{'loss': 19.0125, 'grad_norm': 41.78205871582031, 'learning_rate': 2.4001851851851854e-05, 'epoch': 0.57}


 57%|█████▋    | 17075/30000 [2:44:04<1:51:31,  1.93it/s]

{'loss': 20.8628, 'grad_norm': 50.66983413696289, 'learning_rate': 2.3955555555555556e-05, 'epoch': 0.57}


 57%|█████▋    | 17100/30000 [2:44:17<1:48:57,  1.97it/s]

{'loss': 21.4316, 'grad_norm': 47.715213775634766, 'learning_rate': 2.3909259259259258e-05, 'epoch': 0.57}


 57%|█████▋    | 17125/30000 [2:44:29<1:29:03,  2.41it/s]

{'loss': 18.9947, 'grad_norm': 24.722082138061523, 'learning_rate': 2.3862962962962963e-05, 'epoch': 0.57}


 57%|█████▋    | 17150/30000 [2:44:41<1:49:19,  1.96it/s]

{'loss': 17.743, 'grad_norm': 27.134727478027344, 'learning_rate': 2.381666666666667e-05, 'epoch': 0.57}


 57%|█████▋    | 17175/30000 [2:44:53<1:28:33,  2.41it/s]

{'loss': 15.8097, 'grad_norm': 44.66362380981445, 'learning_rate': 2.377037037037037e-05, 'epoch': 0.57}


 57%|█████▋    | 17200/30000 [2:45:05<1:38:08,  2.17it/s]

{'loss': 18.1192, 'grad_norm': 56.44839096069336, 'learning_rate': 2.3724074074074076e-05, 'epoch': 0.57}


 57%|█████▋    | 17225/30000 [2:45:18<1:48:06,  1.97it/s]

{'loss': 16.1973, 'grad_norm': 38.34716796875, 'learning_rate': 2.3677777777777778e-05, 'epoch': 0.57}


 57%|█████▊    | 17250/30000 [2:45:30<1:50:39,  1.92it/s]

{'loss': 18.6816, 'grad_norm': 53.79248046875, 'learning_rate': 2.3631481481481483e-05, 'epoch': 0.57}


 58%|█████▊    | 17275/30000 [2:45:42<1:48:49,  1.95it/s]

{'loss': 19.5313, 'grad_norm': 40.634002685546875, 'learning_rate': 2.3585185185185185e-05, 'epoch': 0.58}


 58%|█████▊    | 17300/30000 [2:45:55<1:30:56,  2.33it/s]

{'loss': 17.2492, 'grad_norm': 16.919553756713867, 'learning_rate': 2.353888888888889e-05, 'epoch': 0.58}


 58%|█████▊    | 17325/30000 [2:46:07<1:43:20,  2.04it/s]

{'loss': 16.9103, 'grad_norm': 19.087316513061523, 'learning_rate': 2.3494444444444446e-05, 'epoch': 0.58}


 58%|█████▊    | 17350/30000 [2:46:17<1:25:02,  2.48it/s]

{'loss': 12.2813, 'grad_norm': 18.728178024291992, 'learning_rate': 2.344814814814815e-05, 'epoch': 0.58}


 58%|█████▊    | 17375/30000 [2:46:30<1:54:27,  1.84it/s]

{'loss': 14.6497, 'grad_norm': 38.02111053466797, 'learning_rate': 2.3401851851851853e-05, 'epoch': 0.58}


 58%|█████▊    | 17400/30000 [2:46:42<1:49:05,  1.93it/s]

{'loss': 11.4905, 'grad_norm': 17.0706729888916, 'learning_rate': 2.3355555555555555e-05, 'epoch': 0.58}


 58%|█████▊    | 17425/30000 [2:46:54<1:48:33,  1.93it/s]

{'loss': 17.7074, 'grad_norm': 53.604740142822266, 'learning_rate': 2.330925925925926e-05, 'epoch': 0.58}


 58%|█████▊    | 17450/30000 [2:47:06<1:30:56,  2.30it/s]

{'loss': 11.4736, 'grad_norm': 16.702430725097656, 'learning_rate': 2.3262962962962965e-05, 'epoch': 0.58}


 58%|█████▊    | 17475/30000 [2:47:19<1:50:17,  1.89it/s]

{'loss': 15.5475, 'grad_norm': 35.86323928833008, 'learning_rate': 2.3216666666666667e-05, 'epoch': 0.58}


 58%|█████▊    | 17500/30000 [2:47:32<1:44:37,  1.99it/s]

{'loss': 19.1986, 'grad_norm': 46.896881103515625, 'learning_rate': 2.3170370370370373e-05, 'epoch': 0.58}


 58%|█████▊    | 17525/30000 [2:47:44<1:41:59,  2.04it/s]

{'loss': 19.8042, 'grad_norm': 41.828956604003906, 'learning_rate': 2.3124074074074075e-05, 'epoch': 0.58}


 58%|█████▊    | 17550/30000 [2:47:56<1:42:15,  2.03it/s]

{'loss': 19.05, 'grad_norm': 43.37610626220703, 'learning_rate': 2.307777777777778e-05, 'epoch': 0.58}


 59%|█████▊    | 17575/30000 [2:48:08<1:50:14,  1.88it/s]

{'loss': 17.0905, 'grad_norm': 71.23760986328125, 'learning_rate': 2.3031481481481482e-05, 'epoch': 0.59}


 59%|█████▊    | 17600/30000 [2:48:20<1:29:54,  2.30it/s]

{'loss': 13.5297, 'grad_norm': 24.223026275634766, 'learning_rate': 2.2985185185185187e-05, 'epoch': 0.59}


 59%|█████▉    | 17625/30000 [2:48:33<1:47:36,  1.92it/s]

{'loss': 20.0012, 'grad_norm': 37.35341262817383, 'learning_rate': 2.293888888888889e-05, 'epoch': 0.59}


 59%|█████▉    | 17650/30000 [2:48:45<1:44:31,  1.97it/s]

{'loss': 18.9549, 'grad_norm': 30.263315200805664, 'learning_rate': 2.2892592592592595e-05, 'epoch': 0.59}


 59%|█████▉    | 17675/30000 [2:48:57<1:26:36,  2.37it/s]

{'loss': 12.0384, 'grad_norm': 22.911001205444336, 'learning_rate': 2.2846296296296297e-05, 'epoch': 0.59}


 59%|█████▉    | 17700/30000 [2:49:09<1:47:52,  1.90it/s]

{'loss': 17.7568, 'grad_norm': 48.8204231262207, 'learning_rate': 2.2800000000000002e-05, 'epoch': 0.59}


 59%|█████▉    | 17725/30000 [2:49:22<1:42:35,  1.99it/s]

{'loss': 16.7051, 'grad_norm': 55.075077056884766, 'learning_rate': 2.2753703703703704e-05, 'epoch': 0.59}


 59%|█████▉    | 17750/30000 [2:49:33<1:44:52,  1.95it/s]

{'loss': 12.8908, 'grad_norm': 47.695613861083984, 'learning_rate': 2.270740740740741e-05, 'epoch': 0.59}


 59%|█████▉    | 17775/30000 [2:49:46<1:51:39,  1.82it/s]

{'loss': 16.3359, 'grad_norm': 39.56904220581055, 'learning_rate': 2.2661111111111115e-05, 'epoch': 0.59}


 59%|█████▉    | 17800/30000 [2:49:58<1:42:06,  1.99it/s]

{'loss': 18.7606, 'grad_norm': 55.10137939453125, 'learning_rate': 2.2614814814814817e-05, 'epoch': 0.59}


 59%|█████▉    | 17825/30000 [2:50:11<1:40:22,  2.02it/s]

{'loss': 17.6844, 'grad_norm': 25.88570785522461, 'learning_rate': 2.256851851851852e-05, 'epoch': 0.59}


 60%|█████▉    | 17850/30000 [2:50:23<1:39:19,  2.04it/s]

{'loss': 17.461, 'grad_norm': 43.8101692199707, 'learning_rate': 2.2522222222222224e-05, 'epoch': 0.59}


 60%|█████▉    | 17875/30000 [2:50:35<1:38:19,  2.06it/s]

{'loss': 12.109, 'grad_norm': 94.05409240722656, 'learning_rate': 2.247592592592593e-05, 'epoch': 0.6}


 60%|█████▉    | 17900/30000 [2:50:48<1:42:24,  1.97it/s]

{'loss': 15.5991, 'grad_norm': 39.020938873291016, 'learning_rate': 2.242962962962963e-05, 'epoch': 0.6}


 60%|█████▉    | 17925/30000 [2:51:00<1:29:16,  2.25it/s]

{'loss': 17.7116, 'grad_norm': 32.1358757019043, 'learning_rate': 2.2383333333333333e-05, 'epoch': 0.6}


 60%|█████▉    | 17950/30000 [2:51:12<1:24:37,  2.37it/s]

{'loss': 11.4871, 'grad_norm': 13.641562461853027, 'learning_rate': 2.233703703703704e-05, 'epoch': 0.6}


 60%|█████▉    | 17975/30000 [2:51:23<1:43:58,  1.93it/s]

{'loss': 19.8778, 'grad_norm': 55.279579162597656, 'learning_rate': 2.2290740740740744e-05, 'epoch': 0.6}


 60%|██████    | 18000/30000 [2:51:35<1:44:32,  1.91it/s]

{'loss': 16.6518, 'grad_norm': 11.729783058166504, 'learning_rate': 2.2244444444444446e-05, 'epoch': 0.6}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 60%|██████    | 18000/30000 [2:54:40<1:44:32,  1.91it/s]

{'eval_loss': 27.658283233642578, 'eval_runtime': 184.8784, 'eval_samples_per_second': 21.052, 'eval_steps_per_second': 5.263, 'epoch': 0.6}


 60%|██████    | 18025/30000 [2:55:01<1:17:39,  2.57it/s]  

{'loss': 15.6066, 'grad_norm': 28.024822235107422, 'learning_rate': 2.2198148148148148e-05, 'epoch': 0.6}


 60%|██████    | 18050/30000 [2:55:13<1:43:02,  1.93it/s]

{'loss': 13.9711, 'grad_norm': 27.60226821899414, 'learning_rate': 2.2151851851851853e-05, 'epoch': 0.6}


 60%|██████    | 18075/30000 [2:55:25<1:20:17,  2.48it/s]

{'loss': 21.2375, 'grad_norm': 29.96381187438965, 'learning_rate': 2.2105555555555558e-05, 'epoch': 0.6}


 60%|██████    | 18100/30000 [2:55:38<1:42:21,  1.94it/s]

{'loss': 23.7681, 'grad_norm': 42.28201675415039, 'learning_rate': 2.205925925925926e-05, 'epoch': 0.6}


 60%|██████    | 18125/30000 [2:55:49<1:39:40,  1.99it/s]

{'loss': 13.5533, 'grad_norm': 36.56755447387695, 'learning_rate': 2.2012962962962962e-05, 'epoch': 0.6}


 60%|██████    | 18150/30000 [2:56:01<1:20:12,  2.46it/s]

{'loss': 17.0252, 'grad_norm': 39.18189239501953, 'learning_rate': 2.1966666666666668e-05, 'epoch': 0.6}


 61%|██████    | 18175/30000 [2:56:13<1:39:54,  1.97it/s]

{'loss': 16.3206, 'grad_norm': 25.402647018432617, 'learning_rate': 2.1920370370370373e-05, 'epoch': 0.61}


 61%|██████    | 18200/30000 [2:56:24<1:14:52,  2.63it/s]

{'loss': 15.1733, 'grad_norm': 14.18698787689209, 'learning_rate': 2.1874074074074075e-05, 'epoch': 0.61}


 61%|██████    | 18225/30000 [2:56:36<1:41:43,  1.93it/s]

{'loss': 14.5929, 'grad_norm': 25.220701217651367, 'learning_rate': 2.1827777777777777e-05, 'epoch': 0.61}


 61%|██████    | 18250/30000 [2:56:48<1:41:24,  1.93it/s]

{'loss': 17.1488, 'grad_norm': 79.80551147460938, 'learning_rate': 2.1781481481481482e-05, 'epoch': 0.61}


 61%|██████    | 18275/30000 [2:57:00<1:31:46,  2.13it/s]

{'loss': 15.801, 'grad_norm': 50.191009521484375, 'learning_rate': 2.1735185185185187e-05, 'epoch': 0.61}


 61%|██████    | 18300/30000 [2:57:11<1:33:15,  2.09it/s]

{'loss': 13.736, 'grad_norm': 44.72626495361328, 'learning_rate': 2.168888888888889e-05, 'epoch': 0.61}


 61%|██████    | 18325/30000 [2:57:24<1:37:41,  1.99it/s]

{'loss': 21.4151, 'grad_norm': 72.40023040771484, 'learning_rate': 2.1642592592592595e-05, 'epoch': 0.61}


 61%|██████    | 18350/30000 [2:57:36<1:32:44,  2.09it/s]

{'loss': 12.2881, 'grad_norm': 31.423015594482422, 'learning_rate': 2.1596296296296297e-05, 'epoch': 0.61}


 61%|██████▏   | 18375/30000 [2:57:46<1:19:51,  2.43it/s]

{'loss': 11.9473, 'grad_norm': 31.860801696777344, 'learning_rate': 2.1550000000000002e-05, 'epoch': 0.61}


 61%|██████▏   | 18400/30000 [2:57:57<1:21:05,  2.38it/s]

{'loss': 10.952, 'grad_norm': 45.14696502685547, 'learning_rate': 2.1503703703703704e-05, 'epoch': 0.61}


 61%|██████▏   | 18425/30000 [2:58:09<1:34:45,  2.04it/s]

{'loss': 21.1786, 'grad_norm': 63.31108856201172, 'learning_rate': 2.145740740740741e-05, 'epoch': 0.61}


 62%|██████▏   | 18450/30000 [2:58:21<1:42:23,  1.88it/s]

{'loss': 14.9559, 'grad_norm': 35.160675048828125, 'learning_rate': 2.141111111111111e-05, 'epoch': 0.61}


 62%|██████▏   | 18475/30000 [2:58:33<1:34:09,  2.04it/s]

{'loss': 16.0466, 'grad_norm': 47.87949752807617, 'learning_rate': 2.1364814814814817e-05, 'epoch': 0.62}


 62%|██████▏   | 18500/30000 [2:58:46<1:26:02,  2.23it/s]

{'loss': 16.7259, 'grad_norm': 26.545551300048828, 'learning_rate': 2.131851851851852e-05, 'epoch': 0.62}


 62%|██████▏   | 18525/30000 [2:58:58<1:36:45,  1.98it/s]

{'loss': 21.6434, 'grad_norm': 42.79142379760742, 'learning_rate': 2.1272222222222224e-05, 'epoch': 0.62}


 62%|██████▏   | 18550/30000 [2:59:10<1:33:37,  2.04it/s]

{'loss': 13.1628, 'grad_norm': 43.034263610839844, 'learning_rate': 2.1225925925925926e-05, 'epoch': 0.62}


 62%|██████▏   | 18575/30000 [2:59:22<1:36:08,  1.98it/s]

{'loss': 13.899, 'grad_norm': 43.018001556396484, 'learning_rate': 2.117962962962963e-05, 'epoch': 0.62}


 62%|██████▏   | 18600/30000 [2:59:34<1:17:28,  2.45it/s]

{'loss': 17.2304, 'grad_norm': 19.399621963500977, 'learning_rate': 2.1133333333333337e-05, 'epoch': 0.62}


 62%|██████▏   | 18625/30000 [2:59:45<1:14:54,  2.53it/s]

{'loss': 16.7169, 'grad_norm': 23.1240177154541, 'learning_rate': 2.108703703703704e-05, 'epoch': 0.62}


 62%|██████▏   | 18650/30000 [2:59:57<1:33:04,  2.03it/s]

{'loss': 21.0955, 'grad_norm': 49.972537994384766, 'learning_rate': 2.104074074074074e-05, 'epoch': 0.62}


 62%|██████▏   | 18675/30000 [3:00:09<1:21:59,  2.30it/s]

{'loss': 14.4428, 'grad_norm': 23.47067642211914, 'learning_rate': 2.0994444444444446e-05, 'epoch': 0.62}


 62%|██████▏   | 18677/30000 [3:00:10<1:28:38,  2.13it/s]
'(ProtocolError('Connection aborted.', BrokenPipeError(32, 'Broken pipe')), '(Request ID: 6d753d3a-e4b9-473f-877c-bfbb58cc2293)')' thrown while requesting GET https://huggingface.co/datasets/nguyenvulebinh/AVYT/resolve/e6c6bf6f40e698b82215d269cfc0a0d65a7a2372/vox2/vox2-dev-000006.tar
Retrying in 1s [Retry 1/5].
 62%|██████▏   | 18700/30000 [3:00:21<1:31:39,  2.05it/s]

{'loss': 15.7197, 'grad_norm': 16.802000045776367, 'learning_rate': 2.094814814814815e-05, 'epoch': 0.62}


 62%|██████▏   | 18725/30000 [3:00:33<1:31:43,  2.05it/s]

{'loss': 15.046, 'grad_norm': 16.420246124267578, 'learning_rate': 2.0901851851851853e-05, 'epoch': 0.62}


 62%|██████▎   | 18750/30000 [3:00:45<1:33:49,  2.00it/s]

{'loss': 13.48, 'grad_norm': 37.68280029296875, 'learning_rate': 2.0855555555555555e-05, 'epoch': 0.62}


 63%|██████▎   | 18775/30000 [3:00:58<1:32:55,  2.01it/s]

{'loss': 16.4135, 'grad_norm': 26.11446762084961, 'learning_rate': 2.080925925925926e-05, 'epoch': 0.63}


 63%|██████▎   | 18800/30000 [3:01:10<1:31:34,  2.04it/s]

{'loss': 14.9064, 'grad_norm': 56.0360221862793, 'learning_rate': 2.0762962962962966e-05, 'epoch': 0.63}


 63%|██████▎   | 18825/30000 [3:01:21<1:31:47,  2.03it/s]

{'loss': 13.4264, 'grad_norm': 37.70730972290039, 'learning_rate': 2.0716666666666668e-05, 'epoch': 0.63}


 63%|██████▎   | 18850/30000 [3:01:32<1:24:10,  2.21it/s]

{'loss': 16.6455, 'grad_norm': 39.35651397705078, 'learning_rate': 2.067037037037037e-05, 'epoch': 0.63}


 63%|██████▎   | 18875/30000 [3:01:44<1:24:42,  2.19it/s]

{'loss': 13.1753, 'grad_norm': 44.28331756591797, 'learning_rate': 2.0624074074074075e-05, 'epoch': 0.63}


 63%|██████▎   | 18900/30000 [3:01:56<1:36:05,  1.93it/s]

{'loss': 17.7199, 'grad_norm': 41.657859802246094, 'learning_rate': 2.057777777777778e-05, 'epoch': 0.63}


 63%|██████▎   | 18925/30000 [3:02:08<1:36:10,  1.92it/s]

{'loss': 17.3549, 'grad_norm': 35.411834716796875, 'learning_rate': 2.0531481481481482e-05, 'epoch': 0.63}


 63%|██████▎   | 18950/30000 [3:02:20<1:16:24,  2.41it/s]

{'loss': 13.7184, 'grad_norm': 12.93682861328125, 'learning_rate': 2.0485185185185184e-05, 'epoch': 0.63}


 63%|██████▎   | 18975/30000 [3:02:32<1:36:39,  1.90it/s]

{'loss': 18.5175, 'grad_norm': 49.61681365966797, 'learning_rate': 2.043888888888889e-05, 'epoch': 0.63}


 63%|██████▎   | 19000/30000 [3:02:45<1:30:28,  2.03it/s]

{'loss': 18.983, 'grad_norm': 43.062557220458984, 'learning_rate': 2.0392592592592595e-05, 'epoch': 0.63}


 63%|██████▎   | 19025/30000 [3:02:56<1:14:22,  2.46it/s]

{'loss': 15.6158, 'grad_norm': 27.710145950317383, 'learning_rate': 2.0346296296296297e-05, 'epoch': 0.63}


 64%|██████▎   | 19050/30000 [3:03:08<1:31:22,  2.00it/s]

{'loss': 16.9645, 'grad_norm': 35.04247283935547, 'learning_rate': 2.0300000000000002e-05, 'epoch': 0.64}


 64%|██████▎   | 19075/30000 [3:03:19<1:11:48,  2.54it/s]

{'loss': 16.2454, 'grad_norm': 56.672054290771484, 'learning_rate': 2.0253703703703704e-05, 'epoch': 0.64}


 64%|██████▎   | 19100/30000 [3:03:31<1:16:39,  2.37it/s]

{'loss': 19.5135, 'grad_norm': 48.877891540527344, 'learning_rate': 2.020740740740741e-05, 'epoch': 0.64}


 64%|██████▍   | 19125/30000 [3:03:43<1:36:16,  1.88it/s]

{'loss': 19.4196, 'grad_norm': 48.401100158691406, 'learning_rate': 2.016111111111111e-05, 'epoch': 0.64}


 64%|██████▍   | 19150/30000 [3:03:55<1:19:28,  2.28it/s]

{'loss': 15.8626, 'grad_norm': 27.070737838745117, 'learning_rate': 2.0114814814814817e-05, 'epoch': 0.64}


 64%|██████▍   | 19175/30000 [3:04:08<1:33:11,  1.94it/s]

{'loss': 17.5376, 'grad_norm': 31.723995208740234, 'learning_rate': 2.006851851851852e-05, 'epoch': 0.64}


 64%|██████▍   | 19200/30000 [3:04:21<1:30:52,  1.98it/s]

{'loss': 19.0396, 'grad_norm': 33.0833854675293, 'learning_rate': 2.0022222222222224e-05, 'epoch': 0.64}


 64%|██████▍   | 19225/30000 [3:04:33<1:10:19,  2.55it/s]

{'loss': 21.9267, 'grad_norm': 29.41127586364746, 'learning_rate': 1.9975925925925926e-05, 'epoch': 0.64}


 64%|██████▍   | 19250/30000 [3:04:44<1:10:20,  2.55it/s]

{'loss': 10.0944, 'grad_norm': 10.420674324035645, 'learning_rate': 1.992962962962963e-05, 'epoch': 0.64}


 64%|██████▍   | 19275/30000 [3:04:55<1:27:50,  2.03it/s]

{'loss': 15.9789, 'grad_norm': 41.72770690917969, 'learning_rate': 1.9883333333333333e-05, 'epoch': 0.64}


 64%|██████▍   | 19300/30000 [3:05:07<1:09:59,  2.55it/s]

{'loss': 18.2954, 'grad_norm': 7.91192102432251, 'learning_rate': 1.983703703703704e-05, 'epoch': 0.64}


 64%|██████▍   | 19325/30000 [3:05:20<1:33:18,  1.91it/s]

{'loss': 20.2605, 'grad_norm': 81.379638671875, 'learning_rate': 1.9790740740740744e-05, 'epoch': 0.64}


 64%|██████▍   | 19350/30000 [3:05:32<1:31:13,  1.95it/s]

{'loss': 13.7242, 'grad_norm': 49.099647521972656, 'learning_rate': 1.9744444444444446e-05, 'epoch': 0.65}


 65%|██████▍   | 19375/30000 [3:05:45<1:28:20,  2.00it/s]

{'loss': 18.4847, 'grad_norm': 42.256568908691406, 'learning_rate': 1.9698148148148148e-05, 'epoch': 0.65}


 65%|██████▍   | 19400/30000 [3:05:57<1:31:49,  1.92it/s]

{'loss': 21.3448, 'grad_norm': 50.79570770263672, 'learning_rate': 1.9651851851851853e-05, 'epoch': 0.65}


 65%|██████▍   | 19425/30000 [3:06:09<1:27:56,  2.00it/s]

{'loss': 14.6134, 'grad_norm': 66.6483154296875, 'learning_rate': 1.960555555555556e-05, 'epoch': 0.65}


 65%|██████▍   | 19450/30000 [3:06:21<1:28:42,  1.98it/s]

{'loss': 24.8055, 'grad_norm': 44.92084503173828, 'learning_rate': 1.955925925925926e-05, 'epoch': 0.65}


 65%|██████▍   | 19475/30000 [3:06:34<1:29:30,  1.96it/s]

{'loss': 19.527, 'grad_norm': 60.31160354614258, 'learning_rate': 1.9512962962962962e-05, 'epoch': 0.65}


 65%|██████▌   | 19500/30000 [3:06:46<1:26:42,  2.02it/s]

{'loss': 17.0112, 'grad_norm': 43.993656158447266, 'learning_rate': 1.9466666666666668e-05, 'epoch': 0.65}


 65%|██████▌   | 19525/30000 [3:06:58<1:25:53,  2.03it/s]

{'loss': 15.1286, 'grad_norm': 38.11527633666992, 'learning_rate': 1.9420370370370373e-05, 'epoch': 0.65}


 65%|██████▌   | 19550/30000 [3:07:11<1:12:15,  2.41it/s]

{'loss': 15.3269, 'grad_norm': 34.34827423095703, 'learning_rate': 1.9374074074074075e-05, 'epoch': 0.65}


 65%|██████▌   | 19575/30000 [3:07:22<1:24:48,  2.05it/s]

{'loss': 16.1831, 'grad_norm': 48.833213806152344, 'learning_rate': 1.9327777777777777e-05, 'epoch': 0.65}


 65%|██████▌   | 19600/30000 [3:07:33<1:07:19,  2.57it/s]

{'loss': 11.523, 'grad_norm': 36.08258056640625, 'learning_rate': 1.9281481481481482e-05, 'epoch': 0.65}


 65%|██████▌   | 19625/30000 [3:07:45<1:07:20,  2.57it/s]

{'loss': 15.0657, 'grad_norm': 42.3525390625, 'learning_rate': 1.9235185185185188e-05, 'epoch': 0.65}


 66%|██████▌   | 19650/30000 [3:07:57<1:24:16,  2.05it/s]

{'loss': 15.7202, 'grad_norm': 46.596736907958984, 'learning_rate': 1.918888888888889e-05, 'epoch': 0.66}


 66%|██████▌   | 19675/30000 [3:08:10<1:25:56,  2.00it/s]

{'loss': 24.7739, 'grad_norm': 32.381072998046875, 'learning_rate': 1.914259259259259e-05, 'epoch': 0.66}


 66%|██████▌   | 19700/30000 [3:08:21<1:25:13,  2.01it/s]

{'loss': 14.1362, 'grad_norm': 24.872770309448242, 'learning_rate': 1.9096296296296297e-05, 'epoch': 0.66}


 66%|██████▌   | 19725/30000 [3:08:33<1:28:30,  1.94it/s]

{'loss': 17.06, 'grad_norm': 49.59062957763672, 'learning_rate': 1.9050000000000002e-05, 'epoch': 0.66}


 66%|██████▌   | 19750/30000 [3:08:44<1:27:49,  1.95it/s]

{'loss': 19.1172, 'grad_norm': 37.44957733154297, 'learning_rate': 1.9003703703703704e-05, 'epoch': 0.66}


 66%|██████▌   | 19775/30000 [3:08:57<1:25:06,  2.00it/s]

{'loss': 17.5411, 'grad_norm': 23.416982650756836, 'learning_rate': 1.8957407407407406e-05, 'epoch': 0.66}


 66%|██████▌   | 19800/30000 [3:09:09<1:11:29,  2.38it/s]

{'loss': 17.4042, 'grad_norm': 28.590530395507812, 'learning_rate': 1.891111111111111e-05, 'epoch': 0.66}


 66%|██████▌   | 19825/30000 [3:09:21<1:08:43,  2.47it/s]

{'loss': 15.6501, 'grad_norm': 27.201372146606445, 'learning_rate': 1.8864814814814817e-05, 'epoch': 0.66}


 66%|██████▌   | 19850/30000 [3:09:33<1:27:27,  1.93it/s]

{'loss': 15.2541, 'grad_norm': 46.72618865966797, 'learning_rate': 1.881851851851852e-05, 'epoch': 0.66}


 66%|██████▋   | 19875/30000 [3:09:44<1:27:20,  1.93it/s]

{'loss': 15.5932, 'grad_norm': 32.2164192199707, 'learning_rate': 1.8772222222222224e-05, 'epoch': 0.66}


 66%|██████▋   | 19900/30000 [3:09:57<1:23:46,  2.01it/s]

{'loss': 20.5949, 'grad_norm': 30.103668212890625, 'learning_rate': 1.8725925925925926e-05, 'epoch': 0.66}


 66%|██████▋   | 19925/30000 [3:10:09<1:25:25,  1.97it/s]

{'loss': 18.7794, 'grad_norm': 37.71989059448242, 'learning_rate': 1.867962962962963e-05, 'epoch': 0.66}


 66%|██████▋   | 19950/30000 [3:10:22<1:26:13,  1.94it/s]

{'loss': 17.7583, 'grad_norm': 46.06471633911133, 'learning_rate': 1.8633333333333333e-05, 'epoch': 0.67}


 67%|██████▋   | 19975/30000 [3:10:33<1:26:47,  1.93it/s]

{'loss': 13.9596, 'grad_norm': 44.86674499511719, 'learning_rate': 1.858703703703704e-05, 'epoch': 0.67}


 67%|██████▋   | 20000/30000 [3:10:45<1:27:27,  1.91it/s]

{'loss': 25.1643, 'grad_norm': 55.64210510253906, 'learning_rate': 1.854074074074074e-05, 'epoch': 0.67}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 67%|██████▋   | 20000/30000 [3:13:51<1:27:27,  1.91it/s]

{'eval_loss': 28.1104736328125, 'eval_runtime': 186.448, 'eval_samples_per_second': 20.874, 'eval_steps_per_second': 5.219, 'epoch': 0.67}


 67%|██████▋   | 20025/30000 [3:14:11<1:25:56,  1.93it/s]  

{'loss': 21.5254, 'grad_norm': 47.99876022338867, 'learning_rate': 1.8494444444444446e-05, 'epoch': 0.67}


 67%|██████▋   | 20050/30000 [3:14:24<1:14:49,  2.22it/s]

{'loss': 15.0915, 'grad_norm': 30.07797622680664, 'learning_rate': 1.8448148148148148e-05, 'epoch': 0.67}


 67%|██████▋   | 20075/30000 [3:14:37<1:25:03,  1.94it/s]

{'loss': 22.9586, 'grad_norm': 52.750362396240234, 'learning_rate': 1.8401851851851853e-05, 'epoch': 0.67}


 67%|██████▋   | 20100/30000 [3:14:48<1:10:39,  2.34it/s]

{'loss': 18.141, 'grad_norm': 35.12208938598633, 'learning_rate': 1.8355555555555555e-05, 'epoch': 0.67}


 67%|██████▋   | 20125/30000 [3:14:59<1:24:29,  1.95it/s]

{'loss': 10.6285, 'grad_norm': 26.735408782958984, 'learning_rate': 1.830925925925926e-05, 'epoch': 0.67}


 67%|██████▋   | 20150/30000 [3:15:12<1:21:06,  2.02it/s]

{'loss': 17.6223, 'grad_norm': 42.81346130371094, 'learning_rate': 1.8262962962962966e-05, 'epoch': 0.67}


 67%|██████▋   | 20175/30000 [3:15:24<1:23:58,  1.95it/s]

{'loss': 15.1135, 'grad_norm': 53.43558120727539, 'learning_rate': 1.8216666666666668e-05, 'epoch': 0.67}


 67%|██████▋   | 20200/30000 [3:15:37<1:25:34,  1.91it/s]

{'loss': 17.2568, 'grad_norm': 29.614229202270508, 'learning_rate': 1.817037037037037e-05, 'epoch': 0.67}


 67%|██████▋   | 20225/30000 [3:15:50<1:35:29,  1.71it/s]

{'loss': 18.6612, 'grad_norm': 52.32588195800781, 'learning_rate': 1.8124074074074075e-05, 'epoch': 0.67}


 68%|██████▊   | 20250/30000 [3:16:01<1:19:45,  2.04it/s]

{'loss': 9.2495, 'grad_norm': 28.650163650512695, 'learning_rate': 1.807777777777778e-05, 'epoch': 0.68}


 68%|██████▊   | 20275/30000 [3:16:14<1:24:00,  1.93it/s]

{'loss': 20.9527, 'grad_norm': 28.727684020996094, 'learning_rate': 1.8031481481481482e-05, 'epoch': 0.68}


 68%|██████▊   | 20300/30000 [3:16:26<1:13:50,  2.19it/s]

{'loss': 19.4471, 'grad_norm': 62.702049255371094, 'learning_rate': 1.7985185185185184e-05, 'epoch': 0.68}


 68%|██████▊   | 20325/30000 [3:16:38<1:18:19,  2.06it/s]

{'loss': 13.7754, 'grad_norm': 37.96190643310547, 'learning_rate': 1.793888888888889e-05, 'epoch': 0.68}


 68%|██████▊   | 20350/30000 [3:16:50<1:18:01,  2.06it/s]

{'loss': 12.8993, 'grad_norm': 47.65336608886719, 'learning_rate': 1.7892592592592595e-05, 'epoch': 0.68}


 68%|██████▊   | 20375/30000 [3:17:01<1:17:21,  2.07it/s]

{'loss': 14.7202, 'grad_norm': 52.3100471496582, 'learning_rate': 1.7846296296296297e-05, 'epoch': 0.68}


 68%|██████▊   | 20400/30000 [3:17:12<1:05:02,  2.46it/s]

{'loss': 10.6201, 'grad_norm': 20.214082717895508, 'learning_rate': 1.78e-05, 'epoch': 0.68}


 68%|██████▊   | 20425/30000 [3:17:24<1:09:55,  2.28it/s]

{'loss': 15.8049, 'grad_norm': 64.95606231689453, 'learning_rate': 1.7753703703703704e-05, 'epoch': 0.68}


 68%|██████▊   | 20450/30000 [3:17:36<1:19:46,  2.00it/s]

{'loss': 17.0661, 'grad_norm': 31.315122604370117, 'learning_rate': 1.770740740740741e-05, 'epoch': 0.68}


 68%|██████▊   | 20475/30000 [3:17:48<1:14:10,  2.14it/s]

{'loss': 11.8447, 'grad_norm': 43.898929595947266, 'learning_rate': 1.766111111111111e-05, 'epoch': 0.68}


 68%|██████▊   | 20500/30000 [3:18:01<1:21:18,  1.95it/s]

{'loss': 20.7473, 'grad_norm': 49.84866714477539, 'learning_rate': 1.7614814814814814e-05, 'epoch': 0.68}


 68%|██████▊   | 20525/30000 [3:18:13<1:20:49,  1.95it/s]

{'loss': 18.9413, 'grad_norm': 66.57857513427734, 'learning_rate': 1.756851851851852e-05, 'epoch': 0.68}


 68%|██████▊   | 20550/30000 [3:18:25<1:04:08,  2.46it/s]

{'loss': 15.8861, 'grad_norm': 16.493112564086914, 'learning_rate': 1.7522222222222224e-05, 'epoch': 0.69}


 69%|██████▊   | 20575/30000 [3:18:38<1:22:10,  1.91it/s]

{'loss': 19.3845, 'grad_norm': 67.15777587890625, 'learning_rate': 1.7475925925925926e-05, 'epoch': 0.69}


 69%|██████▊   | 20600/30000 [3:18:50<1:18:45,  1.99it/s]

{'loss': 20.9068, 'grad_norm': 54.91962432861328, 'learning_rate': 1.7429629629629628e-05, 'epoch': 0.69}


 69%|██████▉   | 20625/30000 [3:19:02<1:01:02,  2.56it/s]

{'loss': 19.5669, 'grad_norm': 36.9579963684082, 'learning_rate': 1.7383333333333333e-05, 'epoch': 0.69}


 69%|██████▉   | 20650/30000 [3:19:14<1:19:12,  1.97it/s]

{'loss': 13.8237, 'grad_norm': 33.31592559814453, 'learning_rate': 1.733703703703704e-05, 'epoch': 0.69}


 69%|██████▉   | 20675/30000 [3:19:26<1:15:02,  2.07it/s]

{'loss': 13.728, 'grad_norm': 28.86334228515625, 'learning_rate': 1.729074074074074e-05, 'epoch': 0.69}


 69%|██████▉   | 20700/30000 [3:19:37<1:06:12,  2.34it/s]

{'loss': 10.7209, 'grad_norm': 14.317514419555664, 'learning_rate': 1.7244444444444446e-05, 'epoch': 0.69}


 69%|██████▉   | 20725/30000 [3:19:49<1:19:38,  1.94it/s]

{'loss': 17.394, 'grad_norm': 63.996883392333984, 'learning_rate': 1.7198148148148148e-05, 'epoch': 0.69}


 69%|██████▉   | 20750/30000 [3:20:00<1:09:09,  2.23it/s]

{'loss': 14.2826, 'grad_norm': 20.70207405090332, 'learning_rate': 1.7151851851851853e-05, 'epoch': 0.69}


 69%|██████▉   | 20775/30000 [3:20:12<1:14:19,  2.07it/s]

{'loss': 17.1483, 'grad_norm': 34.017330169677734, 'learning_rate': 1.7105555555555555e-05, 'epoch': 0.69}


 69%|██████▉   | 20800/30000 [3:20:24<1:16:39,  2.00it/s]

{'loss': 14.4772, 'grad_norm': 39.20464324951172, 'learning_rate': 1.705925925925926e-05, 'epoch': 0.69}


 69%|██████▉   | 20825/30000 [3:20:37<1:22:31,  1.85it/s]

{'loss': 22.7183, 'grad_norm': 60.20503616333008, 'learning_rate': 1.7012962962962963e-05, 'epoch': 0.69}


 70%|██████▉   | 20850/30000 [3:20:49<1:20:18,  1.90it/s]

{'loss': 19.3652, 'grad_norm': 79.72776794433594, 'learning_rate': 1.6966666666666668e-05, 'epoch': 0.69}


 70%|██████▉   | 20875/30000 [3:21:00<1:02:17,  2.44it/s]

{'loss': 13.9207, 'grad_norm': 40.78715515136719, 'learning_rate': 1.6920370370370373e-05, 'epoch': 0.7}


 70%|██████▉   | 20900/30000 [3:21:13<1:20:10,  1.89it/s]

{'loss': 23.0123, 'grad_norm': 37.25870132446289, 'learning_rate': 1.6874074074074075e-05, 'epoch': 0.7}


 70%|██████▉   | 20925/30000 [3:21:24<1:17:30,  1.95it/s]

{'loss': 13.7021, 'grad_norm': 49.886775970458984, 'learning_rate': 1.6827777777777777e-05, 'epoch': 0.7}


 70%|██████▉   | 20950/30000 [3:21:36<1:15:57,  1.99it/s]

{'loss': 16.7265, 'grad_norm': 47.51485824584961, 'learning_rate': 1.6781481481481483e-05, 'epoch': 0.7}


 70%|██████▉   | 20975/30000 [3:21:48<1:16:26,  1.97it/s]

{'loss': 14.6546, 'grad_norm': 29.101802825927734, 'learning_rate': 1.6735185185185188e-05, 'epoch': 0.7}


 70%|███████   | 21000/30000 [3:22:00<1:14:23,  2.02it/s]

{'loss': 22.3233, 'grad_norm': 52.2928352355957, 'learning_rate': 1.668888888888889e-05, 'epoch': 0.7}


 70%|███████   | 21025/30000 [3:22:12<1:00:42,  2.46it/s]

{'loss': 16.079, 'grad_norm': 23.10733413696289, 'learning_rate': 1.6642592592592592e-05, 'epoch': 0.7}


 70%|███████   | 21050/30000 [3:22:24<1:16:34,  1.95it/s]

{'loss': 19.7622, 'grad_norm': 26.67542266845703, 'learning_rate': 1.6596296296296297e-05, 'epoch': 0.7}


 70%|███████   | 21075/30000 [3:22:36<1:10:04,  2.12it/s]

{'loss': 13.7737, 'grad_norm': 18.505739212036133, 'learning_rate': 1.6550000000000002e-05, 'epoch': 0.7}


 70%|███████   | 21100/30000 [3:22:47<57:20,  2.59it/s]  

{'loss': 20.9856, 'grad_norm': 45.782447814941406, 'learning_rate': 1.6503703703703704e-05, 'epoch': 0.7}


 70%|███████   | 21125/30000 [3:23:00<1:18:02,  1.90it/s]

{'loss': 16.2693, 'grad_norm': 61.980323791503906, 'learning_rate': 1.6457407407407406e-05, 'epoch': 0.7}


 70%|███████   | 21150/30000 [3:23:12<1:17:21,  1.91it/s]

{'loss': 15.1412, 'grad_norm': 27.652223587036133, 'learning_rate': 1.6411111111111112e-05, 'epoch': 0.7}


 71%|███████   | 21175/30000 [3:23:24<1:05:31,  2.24it/s]

{'loss': 12.8174, 'grad_norm': 34.70598602294922, 'learning_rate': 1.6364814814814817e-05, 'epoch': 0.71}


 71%|███████   | 21200/30000 [3:23:36<1:14:42,  1.96it/s]

{'loss': 21.9371, 'grad_norm': 54.226806640625, 'learning_rate': 1.631851851851852e-05, 'epoch': 0.71}


 71%|███████   | 21225/30000 [3:23:49<1:14:29,  1.96it/s]

{'loss': 21.2198, 'grad_norm': 54.011741638183594, 'learning_rate': 1.6274074074074074e-05, 'epoch': 0.71}


 71%|███████   | 21250/30000 [3:24:01<1:09:38,  2.09it/s]

{'loss': 15.6105, 'grad_norm': 44.40178680419922, 'learning_rate': 1.6227777777777776e-05, 'epoch': 0.71}


 71%|███████   | 21275/30000 [3:24:13<1:16:35,  1.90it/s]

{'loss': 16.7999, 'grad_norm': 57.99754333496094, 'learning_rate': 1.6181481481481485e-05, 'epoch': 0.71}


 71%|███████   | 21300/30000 [3:24:26<1:12:28,  2.00it/s]

{'loss': 20.3968, 'grad_norm': 43.44960403442383, 'learning_rate': 1.6135185185185187e-05, 'epoch': 0.71}


 71%|███████   | 21325/30000 [3:24:38<1:12:03,  2.01it/s]

{'loss': 15.5594, 'grad_norm': 22.41855812072754, 'learning_rate': 1.608888888888889e-05, 'epoch': 0.71}


 71%|███████   | 21350/30000 [3:24:50<58:48,  2.45it/s]  

{'loss': 25.7705, 'grad_norm': 28.850034713745117, 'learning_rate': 1.604259259259259e-05, 'epoch': 0.71}


 71%|███████▏  | 21375/30000 [3:25:03<1:15:12,  1.91it/s]

{'loss': 18.556, 'grad_norm': 28.29561996459961, 'learning_rate': 1.59962962962963e-05, 'epoch': 0.71}


 71%|███████▏  | 21400/30000 [3:25:15<1:01:54,  2.32it/s]

{'loss': 14.512, 'grad_norm': 31.95958709716797, 'learning_rate': 1.595e-05, 'epoch': 0.71}


 71%|███████▏  | 21425/30000 [3:25:27<1:10:02,  2.04it/s]

{'loss': 18.1935, 'grad_norm': 50.51464080810547, 'learning_rate': 1.5903703703703703e-05, 'epoch': 0.71}


 72%|███████▏  | 21450/30000 [3:25:38<57:29,  2.48it/s]  

{'loss': 16.0363, 'grad_norm': 40.5750732421875, 'learning_rate': 1.5857407407407405e-05, 'epoch': 0.71}


 72%|███████▏  | 21475/30000 [3:25:51<1:12:58,  1.95it/s]

{'loss': 20.1566, 'grad_norm': 30.127674102783203, 'learning_rate': 1.5811111111111114e-05, 'epoch': 0.72}


 72%|███████▏  | 21500/30000 [3:26:03<1:00:05,  2.36it/s]

{'loss': 19.1114, 'grad_norm': 24.27032470703125, 'learning_rate': 1.5764814814814816e-05, 'epoch': 0.72}


 72%|███████▏  | 21525/30000 [3:26:16<1:13:06,  1.93it/s]

{'loss': 17.0676, 'grad_norm': 41.250389099121094, 'learning_rate': 1.5718518518518518e-05, 'epoch': 0.72}


 72%|███████▏  | 21550/30000 [3:26:28<1:14:56,  1.88it/s]

{'loss': 20.0152, 'grad_norm': 71.72754669189453, 'learning_rate': 1.5672222222222223e-05, 'epoch': 0.72}


 72%|███████▏  | 21575/30000 [3:26:40<1:11:12,  1.97it/s]

{'loss': 17.8318, 'grad_norm': 33.635719299316406, 'learning_rate': 1.562592592592593e-05, 'epoch': 0.72}


 72%|███████▏  | 21600/30000 [3:26:53<1:09:14,  2.02it/s]

{'loss': 17.6766, 'grad_norm': 37.76400375366211, 'learning_rate': 1.557962962962963e-05, 'epoch': 0.72}


 72%|███████▏  | 21625/30000 [3:27:05<1:10:54,  1.97it/s]

{'loss': 13.9667, 'grad_norm': 43.59395980834961, 'learning_rate': 1.5533333333333333e-05, 'epoch': 0.72}


 72%|███████▏  | 21650/30000 [3:27:16<1:11:37,  1.94it/s]

{'loss': 17.314, 'grad_norm': 60.6461181640625, 'learning_rate': 1.5487037037037038e-05, 'epoch': 0.72}


 72%|███████▏  | 21675/30000 [3:27:28<1:10:18,  1.97it/s]

{'loss': 16.536, 'grad_norm': 48.798152923583984, 'learning_rate': 1.5440740740740743e-05, 'epoch': 0.72}


 72%|███████▏  | 21700/30000 [3:27:40<59:43,  2.32it/s]  

{'loss': 20.1991, 'grad_norm': 55.78223419189453, 'learning_rate': 1.5394444444444445e-05, 'epoch': 0.72}


 72%|███████▏  | 21725/30000 [3:27:52<55:16,  2.50it/s]  

{'loss': 17.2901, 'grad_norm': 33.12871170043945, 'learning_rate': 1.5348148148148147e-05, 'epoch': 0.72}


 72%|███████▎  | 21750/30000 [3:28:04<1:09:30,  1.98it/s]

{'loss': 20.2861, 'grad_norm': 37.84981918334961, 'learning_rate': 1.5301851851851852e-05, 'epoch': 0.72}


 73%|███████▎  | 21775/30000 [3:28:17<1:11:08,  1.93it/s]

{'loss': 23.9515, 'grad_norm': 47.02854537963867, 'learning_rate': 1.5255555555555556e-05, 'epoch': 0.73}


 73%|███████▎  | 21800/30000 [3:28:29<1:07:38,  2.02it/s]

{'loss': 17.1261, 'grad_norm': 54.937320709228516, 'learning_rate': 1.520925925925926e-05, 'epoch': 0.73}


 73%|███████▎  | 21825/30000 [3:28:42<1:09:26,  1.96it/s]

{'loss': 18.4876, 'grad_norm': 43.405574798583984, 'learning_rate': 1.5162962962962965e-05, 'epoch': 0.73}


 73%|███████▎  | 21850/30000 [3:28:54<1:07:08,  2.02it/s]

{'loss': 14.7853, 'grad_norm': 43.00304412841797, 'learning_rate': 1.5116666666666667e-05, 'epoch': 0.73}


 73%|███████▎  | 21875/30000 [3:29:05<59:48,  2.26it/s]  

{'loss': 16.3315, 'grad_norm': 28.32307243347168, 'learning_rate': 1.507037037037037e-05, 'epoch': 0.73}


 73%|███████▎  | 21900/30000 [3:29:17<1:08:56,  1.96it/s]

{'loss': 16.8806, 'grad_norm': 56.48855972290039, 'learning_rate': 1.5024074074074074e-05, 'epoch': 0.73}


 73%|███████▎  | 21925/30000 [3:29:29<57:47,  2.33it/s]  

{'loss': 15.3288, 'grad_norm': 37.705528259277344, 'learning_rate': 1.497777777777778e-05, 'epoch': 0.73}


 73%|███████▎  | 21950/30000 [3:29:42<1:08:26,  1.96it/s]

{'loss': 19.7437, 'grad_norm': 55.54335021972656, 'learning_rate': 1.4931481481481482e-05, 'epoch': 0.73}


 73%|███████▎  | 21975/30000 [3:29:53<56:07,  2.38it/s]  

{'loss': 14.0562, 'grad_norm': 22.094337463378906, 'learning_rate': 1.4885185185185185e-05, 'epoch': 0.73}


 73%|███████▎  | 22000/30000 [3:30:06<1:08:44,  1.94it/s]

{'loss': 15.6502, 'grad_norm': 60.688236236572266, 'learning_rate': 1.4838888888888889e-05, 'epoch': 0.73}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 73%|███████▎  | 22000/30000 [3:33:04<1:08:44,  1.94it/s]

{'eval_loss': 28.050811767578125, 'eval_runtime': 178.5884, 'eval_samples_per_second': 21.793, 'eval_steps_per_second': 5.448, 'epoch': 0.73}


 73%|███████▎  | 22025/30000 [3:33:25<1:06:29,  2.00it/s]  

{'loss': 18.9865, 'grad_norm': 25.631311416625977, 'learning_rate': 1.4792592592592594e-05, 'epoch': 0.73}


 74%|███████▎  | 22050/30000 [3:33:37<1:12:52,  1.82it/s]

{'loss': 13.0815, 'grad_norm': 44.635555267333984, 'learning_rate': 1.4746296296296296e-05, 'epoch': 0.73}


 74%|███████▎  | 22075/30000 [3:33:49<56:55,  2.32it/s]  

{'loss': 9.8228, 'grad_norm': 35.853675842285156, 'learning_rate': 1.47e-05, 'epoch': 0.74}


 74%|███████▎  | 22100/30000 [3:34:02<1:00:28,  2.18it/s]

{'loss': 18.0548, 'grad_norm': 43.60686492919922, 'learning_rate': 1.4653703703703705e-05, 'epoch': 0.74}


 74%|███████▍  | 22125/30000 [3:34:13<54:58,  2.39it/s]  

{'loss': 11.5298, 'grad_norm': 26.33736801147461, 'learning_rate': 1.4607407407407409e-05, 'epoch': 0.74}


 74%|███████▍  | 22150/30000 [3:34:24<54:53,  2.38it/s]  

{'loss': 16.5568, 'grad_norm': 27.977462768554688, 'learning_rate': 1.456111111111111e-05, 'epoch': 0.74}


 74%|███████▍  | 22175/30000 [3:34:36<56:55,  2.29it/s]  

{'loss': 24.0427, 'grad_norm': 25.860811233520508, 'learning_rate': 1.4514814814814814e-05, 'epoch': 0.74}


 74%|███████▍  | 22200/30000 [3:34:49<1:03:54,  2.03it/s]

{'loss': 18.734, 'grad_norm': 61.92802810668945, 'learning_rate': 1.446851851851852e-05, 'epoch': 0.74}


 74%|███████▍  | 22225/30000 [3:35:00<52:21,  2.48it/s]  

{'loss': 22.1948, 'grad_norm': 38.871883392333984, 'learning_rate': 1.4422222222222223e-05, 'epoch': 0.74}


 74%|███████▍  | 22250/30000 [3:35:12<1:02:46,  2.06it/s]

{'loss': 20.109, 'grad_norm': 67.751220703125, 'learning_rate': 1.4375925925925925e-05, 'epoch': 0.74}


 74%|███████▍  | 22275/30000 [3:35:25<1:05:10,  1.98it/s]

{'loss': 17.6512, 'grad_norm': 63.80775451660156, 'learning_rate': 1.4329629629629629e-05, 'epoch': 0.74}


 74%|███████▍  | 22300/30000 [3:35:37<1:06:07,  1.94it/s]

{'loss': 25.6143, 'grad_norm': 61.06571960449219, 'learning_rate': 1.4283333333333334e-05, 'epoch': 0.74}


 74%|███████▍  | 22325/30000 [3:35:49<1:02:37,  2.04it/s]

{'loss': 22.818, 'grad_norm': 42.77422332763672, 'learning_rate': 1.4237037037037038e-05, 'epoch': 0.74}


 74%|███████▍  | 22350/30000 [3:36:01<1:07:31,  1.89it/s]

{'loss': 13.6463, 'grad_norm': 45.16246795654297, 'learning_rate': 1.419074074074074e-05, 'epoch': 0.74}


 75%|███████▍  | 22375/30000 [3:36:14<1:05:17,  1.95it/s]

{'loss': 23.3016, 'grad_norm': 32.8688850402832, 'learning_rate': 1.4144444444444447e-05, 'epoch': 0.75}


 75%|███████▍  | 22400/30000 [3:36:27<1:05:08,  1.94it/s]

{'loss': 22.2421, 'grad_norm': 56.19773864746094, 'learning_rate': 1.4098148148148149e-05, 'epoch': 0.75}


 75%|███████▍  | 22425/30000 [3:36:40<1:04:45,  1.95it/s]

{'loss': 21.7653, 'grad_norm': 47.5322265625, 'learning_rate': 1.4051851851851853e-05, 'epoch': 0.75}


 75%|███████▍  | 22450/30000 [3:36:53<1:07:59,  1.85it/s]

{'loss': 19.7672, 'grad_norm': 55.10342025756836, 'learning_rate': 1.4005555555555555e-05, 'epoch': 0.75}


 75%|███████▍  | 22475/30000 [3:37:05<1:03:23,  1.98it/s]

{'loss': 22.8149, 'grad_norm': 40.750099182128906, 'learning_rate': 1.3959259259259262e-05, 'epoch': 0.75}


 75%|███████▌  | 22500/30000 [3:37:17<1:03:02,  1.98it/s]

{'loss': 20.3808, 'grad_norm': 46.77142333984375, 'learning_rate': 1.3912962962962964e-05, 'epoch': 0.75}


 75%|███████▌  | 22525/30000 [3:37:30<1:03:45,  1.95it/s]

{'loss': 27.2238, 'grad_norm': 52.19534683227539, 'learning_rate': 1.3866666666666667e-05, 'epoch': 0.75}


 75%|███████▌  | 22550/30000 [3:37:42<1:03:52,  1.94it/s]

{'loss': 18.2754, 'grad_norm': 55.74171829223633, 'learning_rate': 1.3820370370370373e-05, 'epoch': 0.75}


 75%|███████▌  | 22575/30000 [3:37:54<54:00,  2.29it/s]  

{'loss': 18.7503, 'grad_norm': 25.774396896362305, 'learning_rate': 1.3774074074074076e-05, 'epoch': 0.75}


 75%|███████▌  | 22600/30000 [3:38:06<1:01:59,  1.99it/s]

{'loss': 14.0832, 'grad_norm': 117.31138610839844, 'learning_rate': 1.3727777777777778e-05, 'epoch': 0.75}


 75%|███████▌  | 22625/30000 [3:38:18<54:52,  2.24it/s]  

{'loss': 14.7916, 'grad_norm': 45.88684844970703, 'learning_rate': 1.3681481481481482e-05, 'epoch': 0.75}


 76%|███████▌  | 22650/30000 [3:38:30<1:01:36,  1.99it/s]

{'loss': 15.1761, 'grad_norm': 32.40635681152344, 'learning_rate': 1.3635185185185187e-05, 'epoch': 0.76}


 76%|███████▌  | 22675/30000 [3:38:42<50:59,  2.39it/s]  

{'loss': 17.3723, 'grad_norm': 23.164527893066406, 'learning_rate': 1.358888888888889e-05, 'epoch': 0.76}


 76%|███████▌  | 22700/30000 [3:38:54<1:01:19,  1.98it/s]

{'loss': 21.2027, 'grad_norm': 28.1663875579834, 'learning_rate': 1.3542592592592593e-05, 'epoch': 0.76}


 76%|███████▌  | 22725/30000 [3:39:06<52:44,  2.30it/s]  

{'loss': 21.761, 'grad_norm': 29.269203186035156, 'learning_rate': 1.3496296296296296e-05, 'epoch': 0.76}


 76%|███████▌  | 22750/30000 [3:39:18<1:00:09,  2.01it/s]

{'loss': 16.9782, 'grad_norm': 43.2945671081543, 'learning_rate': 1.3450000000000002e-05, 'epoch': 0.76}


 76%|███████▌  | 22775/30000 [3:39:30<1:01:31,  1.96it/s]

{'loss': 17.7025, 'grad_norm': 62.774658203125, 'learning_rate': 1.3403703703703704e-05, 'epoch': 0.76}


 76%|███████▌  | 22800/30000 [3:39:43<1:01:16,  1.96it/s]

{'loss': 19.1013, 'grad_norm': 53.970436096191406, 'learning_rate': 1.3357407407407407e-05, 'epoch': 0.76}


 76%|███████▌  | 22825/30000 [3:39:55<1:00:04,  1.99it/s]

{'loss': 19.1266, 'grad_norm': 33.10505294799805, 'learning_rate': 1.3311111111111113e-05, 'epoch': 0.76}


 76%|███████▌  | 22850/30000 [3:40:06<51:16,  2.32it/s]  

{'loss': 12.9527, 'grad_norm': 25.56682586669922, 'learning_rate': 1.3264814814814816e-05, 'epoch': 0.76}


 76%|███████▋  | 22875/30000 [3:40:18<58:47,  2.02it/s]  

{'loss': 18.6356, 'grad_norm': 57.499900817871094, 'learning_rate': 1.3218518518518518e-05, 'epoch': 0.76}


 76%|███████▋  | 22900/30000 [3:40:30<49:17,  2.40it/s]  

{'loss': 13.3547, 'grad_norm': 74.62550354003906, 'learning_rate': 1.3172222222222222e-05, 'epoch': 0.76}


 76%|███████▋  | 22925/30000 [3:40:42<46:38,  2.53it/s]  

{'loss': 14.03, 'grad_norm': 54.17427444458008, 'learning_rate': 1.3125925925925927e-05, 'epoch': 0.76}


 76%|███████▋  | 22950/30000 [3:40:54<56:55,  2.06it/s]  

{'loss': 19.7143, 'grad_norm': 54.823081970214844, 'learning_rate': 1.307962962962963e-05, 'epoch': 0.77}


 77%|███████▋  | 22975/30000 [3:41:05<51:33,  2.27it/s]

{'loss': 16.0825, 'grad_norm': 25.954837799072266, 'learning_rate': 1.3033333333333333e-05, 'epoch': 0.77}


 77%|███████▋  | 23000/30000 [3:41:17<1:01:36,  1.89it/s]

{'loss': 17.8153, 'grad_norm': 41.419918060302734, 'learning_rate': 1.2987037037037036e-05, 'epoch': 0.77}


 77%|███████▋  | 23025/30000 [3:41:30<59:11,  1.96it/s]  

{'loss': 22.5777, 'grad_norm': 53.71919250488281, 'learning_rate': 1.2940740740740742e-05, 'epoch': 0.77}


 77%|███████▋  | 23050/30000 [3:41:41<58:48,  1.97it/s]  

{'loss': 17.6921, 'grad_norm': 38.33465576171875, 'learning_rate': 1.2894444444444445e-05, 'epoch': 0.77}


 77%|███████▋  | 23075/30000 [3:41:53<49:55,  2.31it/s]  

{'loss': 18.4561, 'grad_norm': 42.08399200439453, 'learning_rate': 1.2848148148148147e-05, 'epoch': 0.77}


 77%|███████▋  | 23100/30000 [3:42:05<58:42,  1.96it/s]  

{'loss': 21.8872, 'grad_norm': 34.39888000488281, 'learning_rate': 1.2801851851851854e-05, 'epoch': 0.77}


 77%|███████▋  | 23125/30000 [3:42:17<47:50,  2.39it/s]  

{'loss': 18.5143, 'grad_norm': 38.42448806762695, 'learning_rate': 1.2755555555555556e-05, 'epoch': 0.77}


 77%|███████▋  | 23150/30000 [3:42:29<48:25,  2.36it/s]  

{'loss': 17.6314, 'grad_norm': 43.187774658203125, 'learning_rate': 1.270925925925926e-05, 'epoch': 0.77}


 77%|███████▋  | 23175/30000 [3:42:41<56:38,  2.01it/s]

{'loss': 18.5652, 'grad_norm': 37.733673095703125, 'learning_rate': 1.2662962962962962e-05, 'epoch': 0.77}


 77%|███████▋  | 23200/30000 [3:42:53<58:06,  1.95it/s]

{'loss': 17.8893, 'grad_norm': 53.13877487182617, 'learning_rate': 1.2616666666666669e-05, 'epoch': 0.77}


 77%|███████▋  | 23225/30000 [3:43:06<58:10,  1.94it/s]  

{'loss': 22.5271, 'grad_norm': 33.003883361816406, 'learning_rate': 1.2570370370370371e-05, 'epoch': 0.77}


 78%|███████▊  | 23250/30000 [3:43:19<59:50,  1.88it/s]  

{'loss': 14.6049, 'grad_norm': 37.46840286254883, 'learning_rate': 1.2524074074074075e-05, 'epoch': 0.78}


 78%|███████▊  | 23275/30000 [3:43:30<57:39,  1.94it/s]

{'loss': 18.2181, 'grad_norm': 40.862205505371094, 'learning_rate': 1.2477777777777778e-05, 'epoch': 0.78}


 78%|███████▊  | 23300/30000 [3:43:43<50:20,  2.22it/s]

{'loss': 24.4175, 'grad_norm': 57.89690017700195, 'learning_rate': 1.2431481481481482e-05, 'epoch': 0.78}


 78%|███████▊  | 23325/30000 [3:43:55<56:34,  1.97it/s]

{'loss': 20.8306, 'grad_norm': 54.577293395996094, 'learning_rate': 1.2385185185185186e-05, 'epoch': 0.78}


 78%|███████▊  | 23350/30000 [3:44:07<56:34,  1.96it/s]

{'loss': 20.9307, 'grad_norm': 47.76253128051758, 'learning_rate': 1.233888888888889e-05, 'epoch': 0.78}


 78%|███████▊  | 23375/30000 [3:44:18<48:10,  2.29it/s]

{'loss': 15.3804, 'grad_norm': 21.199138641357422, 'learning_rate': 1.2292592592592593e-05, 'epoch': 0.78}


 78%|███████▊  | 23400/30000 [3:44:31<55:25,  1.98it/s]

{'loss': 24.4984, 'grad_norm': 26.022449493408203, 'learning_rate': 1.2246296296296298e-05, 'epoch': 0.78}


 78%|███████▊  | 23425/30000 [3:44:44<56:00,  1.96it/s]

{'loss': 25.1333, 'grad_norm': 31.941238403320312, 'learning_rate': 1.22e-05, 'epoch': 0.78}


 78%|███████▊  | 23450/30000 [3:44:55<48:42,  2.24it/s]

{'loss': 24.3349, 'grad_norm': 32.32821273803711, 'learning_rate': 1.2153703703703705e-05, 'epoch': 0.78}


 78%|███████▊  | 23475/30000 [3:45:07<50:45,  2.14it/s]

{'loss': 22.3258, 'grad_norm': 44.556514739990234, 'learning_rate': 1.2107407407407407e-05, 'epoch': 0.78}


 78%|███████▊  | 23500/30000 [3:45:19<55:43,  1.94it/s]

{'loss': 16.568, 'grad_norm': 46.21636962890625, 'learning_rate': 1.2061111111111113e-05, 'epoch': 0.78}


 78%|███████▊  | 23525/30000 [3:45:31<46:39,  2.31it/s]

{'loss': 24.235, 'grad_norm': 31.145578384399414, 'learning_rate': 1.2014814814814815e-05, 'epoch': 0.78}


 78%|███████▊  | 23550/30000 [3:45:43<52:14,  2.06it/s]

{'loss': 23.8834, 'grad_norm': 41.027225494384766, 'learning_rate': 1.196851851851852e-05, 'epoch': 0.79}


 79%|███████▊  | 23575/30000 [3:45:56<56:09,  1.91it/s]

{'loss': 20.3553, 'grad_norm': 90.04823303222656, 'learning_rate': 1.1922222222222222e-05, 'epoch': 0.79}


 79%|███████▊  | 23600/30000 [3:46:09<56:46,  1.88it/s]

{'loss': 22.6577, 'grad_norm': 32.67070007324219, 'learning_rate': 1.1875925925925927e-05, 'epoch': 0.79}


 79%|███████▉  | 23625/30000 [3:46:20<46:07,  2.30it/s]

{'loss': 18.0836, 'grad_norm': 28.650941848754883, 'learning_rate': 1.1829629629629631e-05, 'epoch': 0.79}


 79%|███████▉  | 23650/30000 [3:46:32<46:09,  2.29it/s]

{'loss': 17.6389, 'grad_norm': 27.635902404785156, 'learning_rate': 1.1783333333333333e-05, 'epoch': 0.79}


 79%|███████▉  | 23675/30000 [3:46:44<54:31,  1.93it/s]

{'loss': 19.6123, 'grad_norm': 40.06486892700195, 'learning_rate': 1.1737037037037038e-05, 'epoch': 0.79}


 79%|███████▉  | 23700/30000 [3:46:56<52:42,  1.99it/s]

{'loss': 18.6523, 'grad_norm': 48.787696838378906, 'learning_rate': 1.169074074074074e-05, 'epoch': 0.79}


 79%|███████▉  | 23725/30000 [3:47:08<53:07,  1.97it/s]

{'loss': 20.7371, 'grad_norm': 45.277496337890625, 'learning_rate': 1.1644444444444446e-05, 'epoch': 0.79}


 79%|███████▉  | 23750/30000 [3:47:21<52:39,  1.98it/s]

{'loss': 23.2376, 'grad_norm': 59.14219665527344, 'learning_rate': 1.1598148148148147e-05, 'epoch': 0.79}


 79%|███████▉  | 23775/30000 [3:47:33<51:23,  2.02it/s]

{'loss': 12.7519, 'grad_norm': 30.130226135253906, 'learning_rate': 1.1551851851851853e-05, 'epoch': 0.79}


 79%|███████▉  | 23800/30000 [3:47:45<51:57,  1.99it/s]

{'loss': 16.8658, 'grad_norm': 27.237516403198242, 'learning_rate': 1.1505555555555555e-05, 'epoch': 0.79}


 79%|███████▉  | 23825/30000 [3:47:58<55:22,  1.86it/s]

{'loss': 26.3299, 'grad_norm': 33.73141098022461, 'learning_rate': 1.145925925925926e-05, 'epoch': 0.79}


 80%|███████▉  | 23850/30000 [3:48:10<46:03,  2.23it/s]

{'loss': 21.9951, 'grad_norm': 38.057071685791016, 'learning_rate': 1.1412962962962964e-05, 'epoch': 0.8}


 80%|███████▉  | 23875/30000 [3:48:22<52:30,  1.94it/s]

{'loss': 23.8992, 'grad_norm': 48.49127197265625, 'learning_rate': 1.1366666666666667e-05, 'epoch': 0.8}


 80%|███████▉  | 23900/30000 [3:48:35<46:20,  2.19it/s]

{'loss': 17.1226, 'grad_norm': 48.3300666809082, 'learning_rate': 1.1320370370370371e-05, 'epoch': 0.8}


 80%|███████▉  | 23925/30000 [3:48:47<43:03,  2.35it/s]

{'loss': 15.3206, 'grad_norm': 45.96845626831055, 'learning_rate': 1.1274074074074075e-05, 'epoch': 0.8}


 80%|███████▉  | 23950/30000 [3:48:59<47:25,  2.13it/s]

{'loss': 18.2708, 'grad_norm': 31.64129066467285, 'learning_rate': 1.1227777777777778e-05, 'epoch': 0.8}


 80%|███████▉  | 23975/30000 [3:49:10<43:28,  2.31it/s]

{'loss': 19.4195, 'grad_norm': 47.51108169555664, 'learning_rate': 1.1181481481481482e-05, 'epoch': 0.8}


 80%|████████  | 24000/30000 [3:49:23<53:06,  1.88it/s]

{'loss': 19.5498, 'grad_norm': 34.72029113769531, 'learning_rate': 1.1135185185185186e-05, 'epoch': 0.8}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 80%|████████  | 24000/30000 [3:52:28<53:06,  1.88it/s]

{'eval_loss': 28.090496063232422, 'eval_runtime': 185.2457, 'eval_samples_per_second': 21.01, 'eval_steps_per_second': 5.252, 'epoch': 0.8}


 80%|████████  | 24025/30000 [3:52:49<52:32,  1.90it/s]   

{'loss': 24.6628, 'grad_norm': 51.62433624267578, 'learning_rate': 1.108888888888889e-05, 'epoch': 0.8}


 80%|████████  | 24050/30000 [3:53:01<42:54,  2.31it/s]

{'loss': 21.4314, 'grad_norm': 21.922245025634766, 'learning_rate': 1.1042592592592593e-05, 'epoch': 0.8}


 80%|████████  | 24075/30000 [3:53:14<50:15,  1.96it/s]

{'loss': 19.4517, 'grad_norm': 51.22739028930664, 'learning_rate': 1.0996296296296297e-05, 'epoch': 0.8}


 80%|████████  | 24100/30000 [3:53:26<44:51,  2.19it/s]

{'loss': 20.6666, 'grad_norm': 35.249267578125, 'learning_rate': 1.095e-05, 'epoch': 0.8}


 80%|████████  | 24125/30000 [3:53:39<44:58,  2.18it/s]

{'loss': 17.6907, 'grad_norm': 56.40427780151367, 'learning_rate': 1.0903703703703706e-05, 'epoch': 0.8}


 80%|████████  | 24150/30000 [3:53:51<49:28,  1.97it/s]

{'loss': 13.2737, 'grad_norm': 34.82954788208008, 'learning_rate': 1.0857407407407407e-05, 'epoch': 0.81}


 81%|████████  | 24175/30000 [3:54:03<42:29,  2.29it/s]

{'loss': 18.5045, 'grad_norm': 15.101320266723633, 'learning_rate': 1.0811111111111113e-05, 'epoch': 0.81}


 81%|████████  | 24200/30000 [3:54:16<50:47,  1.90it/s]

{'loss': 22.6767, 'grad_norm': 58.77570724487305, 'learning_rate': 1.0764814814814815e-05, 'epoch': 0.81}


 81%|████████  | 24225/30000 [3:54:29<48:46,  1.97it/s]

{'loss': 24.2702, 'grad_norm': 41.191436767578125, 'learning_rate': 1.071851851851852e-05, 'epoch': 0.81}


 81%|████████  | 24250/30000 [3:54:42<49:46,  1.93it/s]

{'loss': 22.7348, 'grad_norm': 56.009185791015625, 'learning_rate': 1.0672222222222222e-05, 'epoch': 0.81}


 81%|████████  | 24275/30000 [3:54:54<41:26,  2.30it/s]

{'loss': 20.5037, 'grad_norm': 38.35136795043945, 'learning_rate': 1.0625925925925927e-05, 'epoch': 0.81}


 81%|████████  | 24300/30000 [3:55:06<49:25,  1.92it/s]

{'loss': 22.557, 'grad_norm': 56.48548889160156, 'learning_rate': 1.057962962962963e-05, 'epoch': 0.81}


 81%|████████  | 24325/30000 [3:55:19<49:53,  1.90it/s]

{'loss': 19.505, 'grad_norm': 61.93001174926758, 'learning_rate': 1.0533333333333335e-05, 'epoch': 0.81}


 81%|████████  | 24350/30000 [3:55:31<46:22,  2.03it/s]

{'loss': 20.1356, 'grad_norm': 51.80267333984375, 'learning_rate': 1.0487037037037037e-05, 'epoch': 0.81}


 81%|████████▏ | 24375/30000 [3:55:44<48:56,  1.92it/s]

{'loss': 23.2462, 'grad_norm': 62.26496505737305, 'learning_rate': 1.0440740740740742e-05, 'epoch': 0.81}


 81%|████████▏ | 24400/30000 [3:55:57<47:05,  1.98it/s]

{'loss': 20.2128, 'grad_norm': 52.78245162963867, 'learning_rate': 1.0394444444444446e-05, 'epoch': 0.81}


 81%|████████▏ | 24425/30000 [3:56:09<46:43,  1.99it/s]

{'loss': 20.7726, 'grad_norm': 144.40501403808594, 'learning_rate': 1.034814814814815e-05, 'epoch': 0.81}


 82%|████████▏ | 24450/30000 [3:56:21<47:32,  1.95it/s]

{'loss': 26.7574, 'grad_norm': 64.43579864501953, 'learning_rate': 1.0301851851851853e-05, 'epoch': 0.81}


 82%|████████▏ | 24475/30000 [3:56:34<45:40,  2.02it/s]

{'loss': 22.2218, 'grad_norm': 52.46772384643555, 'learning_rate': 1.0255555555555557e-05, 'epoch': 0.82}


 82%|████████▏ | 24500/30000 [3:56:46<45:53,  2.00it/s]

{'loss': 16.8664, 'grad_norm': 24.90394401550293, 'learning_rate': 1.020925925925926e-05, 'epoch': 0.82}


 82%|████████▏ | 24525/30000 [3:56:58<40:05,  2.28it/s]

{'loss': 25.6121, 'grad_norm': 25.559728622436523, 'learning_rate': 1.0162962962962964e-05, 'epoch': 0.82}


 82%|████████▏ | 24550/30000 [3:57:11<46:34,  1.95it/s]

{'loss': 23.1822, 'grad_norm': 54.748435974121094, 'learning_rate': 1.0116666666666667e-05, 'epoch': 0.82}


 82%|████████▏ | 24575/30000 [3:57:23<45:22,  1.99it/s]

{'loss': 21.4099, 'grad_norm': 29.757177352905273, 'learning_rate': 1.007037037037037e-05, 'epoch': 0.82}


 82%|████████▏ | 24600/30000 [3:57:36<48:29,  1.86it/s]

{'loss': 21.028, 'grad_norm': 35.74647521972656, 'learning_rate': 1.0024074074074075e-05, 'epoch': 0.82}


 82%|████████▏ | 24625/30000 [3:57:49<45:37,  1.96it/s]

{'loss': 22.4264, 'grad_norm': 67.64907836914062, 'learning_rate': 9.977777777777778e-06, 'epoch': 0.82}


 82%|████████▏ | 24650/30000 [3:58:02<45:11,  1.97it/s]

{'loss': 16.2961, 'grad_norm': 83.17414093017578, 'learning_rate': 9.931481481481482e-06, 'epoch': 0.82}


 82%|████████▏ | 24675/30000 [3:58:14<44:50,  1.98it/s]

{'loss': 19.7977, 'grad_norm': 38.09455490112305, 'learning_rate': 9.885185185185186e-06, 'epoch': 0.82}


 82%|████████▏ | 24700/30000 [3:58:26<39:50,  2.22it/s]

{'loss': 18.1136, 'grad_norm': 55.521461486816406, 'learning_rate': 9.83888888888889e-06, 'epoch': 0.82}


 82%|████████▏ | 24725/30000 [3:58:38<37:15,  2.36it/s]

{'loss': 18.2838, 'grad_norm': 26.5661678314209, 'learning_rate': 9.794444444444445e-06, 'epoch': 0.82}


 82%|████████▎ | 24750/30000 [3:58:51<47:05,  1.86it/s]

{'loss': 22.924, 'grad_norm': 39.06159973144531, 'learning_rate': 9.74814814814815e-06, 'epoch': 0.82}


 83%|████████▎ | 24762/30000 [3:58:56<36:37,  2.38it/s]
'(ProtocolError('Connection aborted.', BrokenPipeError(32, 'Broken pipe')), '(Request ID: 50bbb993-8b5f-4a65-9af8-5b24e106cb1b)')' thrown while requesting GET https://huggingface.co/datasets/nguyenvulebinh/AVYT/resolve/e6c6bf6f40e698b82215d269cfc0a0d65a7a2372/vox2/vox2-dev-000005.tar
Retrying in 1s [Retry 1/5].
 83%|████████▎ | 24775/30000 [3:59:03<45:53,  1.90it/s]

{'loss': 16.2487, 'grad_norm': 56.32001876831055, 'learning_rate': 9.701851851851852e-06, 'epoch': 0.83}


 83%|████████▎ | 24800/30000 [3:59:16<45:48,  1.89it/s]

{'loss': 23.1098, 'grad_norm': 32.46828842163086, 'learning_rate': 9.655555555555557e-06, 'epoch': 0.83}


 83%|████████▎ | 24825/30000 [3:59:28<43:22,  1.99it/s]

{'loss': 16.6202, 'grad_norm': 33.41164016723633, 'learning_rate': 9.60925925925926e-06, 'epoch': 0.83}


 83%|████████▎ | 24850/30000 [3:59:41<42:54,  2.00it/s]

{'loss': 20.822, 'grad_norm': 34.6898193359375, 'learning_rate': 9.562962962962965e-06, 'epoch': 0.83}


 83%|████████▎ | 24875/30000 [3:59:53<39:19,  2.17it/s]

{'loss': 20.5724, 'grad_norm': 76.29960632324219, 'learning_rate': 9.516666666666666e-06, 'epoch': 0.83}


 83%|████████▎ | 24900/30000 [4:00:05<42:44,  1.99it/s]

{'loss': 17.8156, 'grad_norm': 88.61653900146484, 'learning_rate': 9.470370370370372e-06, 'epoch': 0.83}


 83%|████████▎ | 24925/30000 [4:00:18<43:07,  1.96it/s]

{'loss': 19.6513, 'grad_norm': 57.19171905517578, 'learning_rate': 9.424074074074074e-06, 'epoch': 0.83}


 83%|████████▎ | 24950/30000 [4:00:30<44:01,  1.91it/s]

{'loss': 21.6004, 'grad_norm': 38.87995910644531, 'learning_rate': 9.377777777777779e-06, 'epoch': 0.83}


 83%|████████▎ | 24975/30000 [4:00:42<38:31,  2.17it/s]

{'loss': 24.4699, 'grad_norm': 25.786701202392578, 'learning_rate': 9.331481481481481e-06, 'epoch': 0.83}


 83%|████████▎ | 25000/30000 [4:00:54<42:17,  1.97it/s]

{'loss': 25.9802, 'grad_norm': 43.880859375, 'learning_rate': 9.285185185185186e-06, 'epoch': 0.83}


 83%|████████▎ | 25025/30000 [4:01:06<34:18,  2.42it/s]

{'loss': 20.32, 'grad_norm': 55.38917922973633, 'learning_rate': 9.23888888888889e-06, 'epoch': 0.83}


 84%|████████▎ | 25050/30000 [4:01:17<41:32,  1.99it/s]

{'loss': 18.4808, 'grad_norm': 29.96344566345215, 'learning_rate': 9.192592592592594e-06, 'epoch': 0.83}


 84%|████████▎ | 25075/30000 [4:01:30<42:03,  1.95it/s]

{'loss': 20.5164, 'grad_norm': 24.666030883789062, 'learning_rate': 9.146296296296297e-06, 'epoch': 0.84}


 84%|████████▎ | 25100/30000 [4:01:42<36:51,  2.22it/s]

{'loss': 16.8048, 'grad_norm': 48.337852478027344, 'learning_rate': 9.100000000000001e-06, 'epoch': 0.84}


 84%|████████▍ | 25125/30000 [4:01:54<41:18,  1.97it/s]

{'loss': 23.0717, 'grad_norm': 33.86601638793945, 'learning_rate': 9.053703703703705e-06, 'epoch': 0.84}


 84%|████████▍ | 25150/30000 [4:02:07<41:23,  1.95it/s]

{'loss': 29.418, 'grad_norm': 68.67144775390625, 'learning_rate': 9.007407407407408e-06, 'epoch': 0.84}


 84%|████████▍ | 25175/30000 [4:02:18<36:26,  2.21it/s]

{'loss': 19.1139, 'grad_norm': 36.708621978759766, 'learning_rate': 8.961111111111112e-06, 'epoch': 0.84}


 84%|████████▍ | 25200/30000 [4:02:31<39:40,  2.02it/s]

{'loss': 25.9962, 'grad_norm': 40.35446548461914, 'learning_rate': 8.914814814814816e-06, 'epoch': 0.84}


 84%|████████▍ | 25225/30000 [4:02:42<41:17,  1.93it/s]

{'loss': 18.7627, 'grad_norm': 50.837467193603516, 'learning_rate': 8.86851851851852e-06, 'epoch': 0.84}


 84%|████████▍ | 25250/30000 [4:02:54<40:41,  1.95it/s]

{'loss': 23.2038, 'grad_norm': 34.55715560913086, 'learning_rate': 8.822222222222223e-06, 'epoch': 0.84}


 84%|████████▍ | 25275/30000 [4:03:07<35:12,  2.24it/s]

{'loss': 22.266, 'grad_norm': 31.764957427978516, 'learning_rate': 8.775925925925926e-06, 'epoch': 0.84}


 84%|████████▍ | 25300/30000 [4:03:19<40:34,  1.93it/s]

{'loss': 25.8608, 'grad_norm': 44.33496856689453, 'learning_rate': 8.72962962962963e-06, 'epoch': 0.84}


 84%|████████▍ | 25325/30000 [4:03:31<39:23,  1.98it/s]

{'loss': 16.7482, 'grad_norm': 25.504688262939453, 'learning_rate': 8.683333333333334e-06, 'epoch': 0.84}


 84%|████████▍ | 25350/30000 [4:03:43<40:23,  1.92it/s]

{'loss': 24.9484, 'grad_norm': 67.16098022460938, 'learning_rate': 8.637037037037037e-06, 'epoch': 0.84}


 85%|████████▍ | 25375/30000 [4:03:55<34:40,  2.22it/s]

{'loss': 14.92, 'grad_norm': 39.623905181884766, 'learning_rate': 8.590740740740741e-06, 'epoch': 0.85}


 85%|████████▍ | 25400/30000 [4:04:07<36:33,  2.10it/s]

{'loss': 18.1438, 'grad_norm': 89.75872039794922, 'learning_rate': 8.544444444444445e-06, 'epoch': 0.85}


 85%|████████▍ | 25425/30000 [4:04:19<39:23,  1.94it/s]

{'loss': 21.1969, 'grad_norm': 40.89870071411133, 'learning_rate': 8.498148148148148e-06, 'epoch': 0.85}


 85%|████████▍ | 25450/30000 [4:04:30<33:35,  2.26it/s]

{'loss': 23.3309, 'grad_norm': 54.925716400146484, 'learning_rate': 8.451851851851852e-06, 'epoch': 0.85}


 85%|████████▍ | 25475/30000 [4:04:43<34:38,  2.18it/s]

{'loss': 23.1092, 'grad_norm': 24.848297119140625, 'learning_rate': 8.405555555555556e-06, 'epoch': 0.85}


 85%|████████▌ | 25500/30000 [4:04:55<37:55,  1.98it/s]

{'loss': 25.8493, 'grad_norm': 83.12834167480469, 'learning_rate': 8.35925925925926e-06, 'epoch': 0.85}


 85%|████████▌ | 25525/30000 [4:05:06<31:35,  2.36it/s]

{'loss': 21.8341, 'grad_norm': 64.23099517822266, 'learning_rate': 8.312962962962963e-06, 'epoch': 0.85}


 85%|████████▌ | 25550/30000 [4:05:18<36:33,  2.03it/s]

{'loss': 19.2603, 'grad_norm': 18.819171905517578, 'learning_rate': 8.266666666666667e-06, 'epoch': 0.85}


 85%|████████▌ | 25575/30000 [4:05:30<31:33,  2.34it/s]

{'loss': 30.5231, 'grad_norm': 56.72410202026367, 'learning_rate': 8.220370370370372e-06, 'epoch': 0.85}


 85%|████████▌ | 25600/30000 [4:05:42<37:32,  1.95it/s]

{'loss': 23.4157, 'grad_norm': 68.3516616821289, 'learning_rate': 8.174074074074074e-06, 'epoch': 0.85}


 85%|████████▌ | 25625/30000 [4:05:54<37:30,  1.94it/s]

{'loss': 17.3024, 'grad_norm': 23.353885650634766, 'learning_rate': 8.12777777777778e-06, 'epoch': 0.85}


 86%|████████▌ | 25650/30000 [4:06:07<36:07,  2.01it/s]

{'loss': 14.9365, 'grad_norm': 55.15470886230469, 'learning_rate': 8.081481481481481e-06, 'epoch': 0.85}


 86%|████████▌ | 25675/30000 [4:06:19<36:26,  1.98it/s]

{'loss': 22.7191, 'grad_norm': 68.53753662109375, 'learning_rate': 8.035185185185186e-06, 'epoch': 0.86}


 86%|████████▌ | 25700/30000 [4:06:30<30:05,  2.38it/s]

{'loss': 20.5059, 'grad_norm': 33.83312225341797, 'learning_rate': 7.988888888888888e-06, 'epoch': 0.86}


 86%|████████▌ | 25725/30000 [4:06:42<28:47,  2.48it/s]

{'loss': 18.4248, 'grad_norm': 29.262067794799805, 'learning_rate': 7.942592592592594e-06, 'epoch': 0.86}


 86%|████████▌ | 25750/30000 [4:06:55<32:24,  2.19it/s]

{'loss': 19.214, 'grad_norm': 49.04997253417969, 'learning_rate': 7.896296296296296e-06, 'epoch': 0.86}


 86%|████████▌ | 25775/30000 [4:07:07<35:52,  1.96it/s]

{'loss': 20.084, 'grad_norm': 20.136993408203125, 'learning_rate': 7.850000000000001e-06, 'epoch': 0.86}


 86%|████████▌ | 25800/30000 [4:07:19<35:52,  1.95it/s]

{'loss': 23.7598, 'grad_norm': 53.74237823486328, 'learning_rate': 7.803703703703705e-06, 'epoch': 0.86}


 86%|████████▌ | 25825/30000 [4:07:31<29:44,  2.34it/s]

{'loss': 19.5125, 'grad_norm': 128.87716674804688, 'learning_rate': 7.757407407407408e-06, 'epoch': 0.86}


 86%|████████▌ | 25850/30000 [4:07:42<33:54,  2.04it/s]

{'loss': 18.5582, 'grad_norm': 28.416038513183594, 'learning_rate': 7.711111111111112e-06, 'epoch': 0.86}


 86%|████████▋ | 25875/30000 [4:07:54<32:00,  2.15it/s]

{'loss': 21.3131, 'grad_norm': 56.56707763671875, 'learning_rate': 7.664814814814816e-06, 'epoch': 0.86}


 86%|████████▋ | 25900/30000 [4:08:07<35:02,  1.95it/s]

{'loss': 21.4878, 'grad_norm': 51.939674377441406, 'learning_rate': 7.618518518518519e-06, 'epoch': 0.86}


 86%|████████▋ | 25925/30000 [4:08:19<34:21,  1.98it/s]

{'loss': 20.6821, 'grad_norm': 47.29595184326172, 'learning_rate': 7.572222222222222e-06, 'epoch': 0.86}


 86%|████████▋ | 25950/30000 [4:08:30<28:36,  2.36it/s]

{'loss': 15.9249, 'grad_norm': 25.020023345947266, 'learning_rate': 7.525925925925927e-06, 'epoch': 0.86}


 87%|████████▋ | 25975/30000 [4:08:42<28:26,  2.36it/s]

{'loss': 15.9559, 'grad_norm': 37.58469009399414, 'learning_rate': 7.479629629629629e-06, 'epoch': 0.87}


 87%|████████▋ | 26000/30000 [4:08:55<34:08,  1.95it/s]

{'loss': 22.2124, 'grad_norm': 96.62413787841797, 'learning_rate': 7.433333333333334e-06, 'epoch': 0.87}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 87%|████████▋ | 26000/30000 [4:11:59<34:08,  1.95it/s]

{'eval_loss': 27.6351261138916, 'eval_runtime': 183.9877, 'eval_samples_per_second': 21.154, 'eval_steps_per_second': 5.288, 'epoch': 0.87}


 87%|████████▋ | 26025/30000 [4:12:20<34:10,  1.94it/s]   

{'loss': 22.4264, 'grad_norm': 112.85369110107422, 'learning_rate': 7.387037037037037e-06, 'epoch': 0.87}


 87%|████████▋ | 26050/30000 [4:12:31<34:14,  1.92it/s]

{'loss': 17.3484, 'grad_norm': 47.24037170410156, 'learning_rate': 7.340740740740741e-06, 'epoch': 0.87}


 87%|████████▋ | 26075/30000 [4:12:42<28:03,  2.33it/s]

{'loss': 19.3416, 'grad_norm': 53.4660530090332, 'learning_rate': 7.294444444444446e-06, 'epoch': 0.87}


 87%|████████▋ | 26100/30000 [4:12:54<32:18,  2.01it/s]

{'loss': 23.0705, 'grad_norm': 31.341869354248047, 'learning_rate': 7.2481481481481485e-06, 'epoch': 0.87}


 87%|████████▋ | 26125/30000 [4:13:06<27:38,  2.34it/s]

{'loss': 19.7658, 'grad_norm': 17.261608123779297, 'learning_rate': 7.201851851851853e-06, 'epoch': 0.87}


 87%|████████▋ | 26150/30000 [4:13:18<32:27,  1.98it/s]

{'loss': 20.1259, 'grad_norm': 51.763023376464844, 'learning_rate': 7.155555555555556e-06, 'epoch': 0.87}


 87%|████████▋ | 26175/30000 [4:13:30<28:38,  2.23it/s]

{'loss': 17.869, 'grad_norm': 54.51813888549805, 'learning_rate': 7.10925925925926e-06, 'epoch': 0.87}


 87%|████████▋ | 26200/30000 [4:13:43<31:53,  1.99it/s]

{'loss': 26.1891, 'grad_norm': 55.297855377197266, 'learning_rate': 7.062962962962963e-06, 'epoch': 0.87}


 87%|████████▋ | 26225/30000 [4:13:56<32:14,  1.95it/s]

{'loss': 19.4885, 'grad_norm': 24.867809295654297, 'learning_rate': 7.0166666666666675e-06, 'epoch': 0.87}


 88%|████████▊ | 26250/30000 [4:14:08<25:46,  2.43it/s]

{'loss': 19.0194, 'grad_norm': 41.8068962097168, 'learning_rate': 6.97037037037037e-06, 'epoch': 0.88}


 88%|████████▊ | 26275/30000 [4:14:20<32:08,  1.93it/s]

{'loss': 22.3338, 'grad_norm': 43.909278869628906, 'learning_rate': 6.924074074074075e-06, 'epoch': 0.88}


 88%|████████▊ | 26300/30000 [4:14:33<31:04,  1.98it/s]

{'loss': 20.7297, 'grad_norm': 46.6257438659668, 'learning_rate': 6.877777777777778e-06, 'epoch': 0.88}


 88%|████████▊ | 26325/30000 [4:14:45<32:13,  1.90it/s]

{'loss': 18.5525, 'grad_norm': 43.32427215576172, 'learning_rate': 6.831481481481482e-06, 'epoch': 0.88}


 88%|████████▊ | 26350/30000 [4:14:57<26:33,  2.29it/s]

{'loss': 18.2272, 'grad_norm': 66.27188873291016, 'learning_rate': 6.785185185185186e-06, 'epoch': 0.88}


 88%|████████▊ | 26375/30000 [4:15:09<25:52,  2.33it/s]

{'loss': 20.8844, 'grad_norm': 50.7093505859375, 'learning_rate': 6.738888888888889e-06, 'epoch': 0.88}


 88%|████████▊ | 26400/30000 [4:15:21<31:09,  1.93it/s]

{'loss': 18.4678, 'grad_norm': 50.76116943359375, 'learning_rate': 6.692592592592593e-06, 'epoch': 0.88}


 88%|████████▊ | 26425/30000 [4:15:32<29:54,  1.99it/s]

{'loss': 16.8751, 'grad_norm': 50.25065612792969, 'learning_rate': 6.646296296296297e-06, 'epoch': 0.88}


 88%|████████▊ | 26450/30000 [4:15:44<24:13,  2.44it/s]

{'loss': 20.3162, 'grad_norm': 49.581214904785156, 'learning_rate': 6.6e-06, 'epoch': 0.88}


 88%|████████▊ | 26475/30000 [4:15:56<30:22,  1.93it/s]

{'loss': 17.1582, 'grad_norm': 43.82173156738281, 'learning_rate': 6.553703703703704e-06, 'epoch': 0.88}


 88%|████████▊ | 26500/30000 [4:16:09<29:36,  1.97it/s]

{'loss': 20.3898, 'grad_norm': 69.34854888916016, 'learning_rate': 6.507407407407408e-06, 'epoch': 0.88}


 88%|████████▊ | 26525/30000 [4:16:22<31:16,  1.85it/s]

{'loss': 23.3883, 'grad_norm': 50.51172637939453, 'learning_rate': 6.4611111111111104e-06, 'epoch': 0.88}


 88%|████████▊ | 26550/30000 [4:16:34<28:54,  1.99it/s]

{'loss': 16.2766, 'grad_norm': 36.23075485229492, 'learning_rate': 6.414814814814815e-06, 'epoch': 0.89}


 89%|████████▊ | 26575/30000 [4:16:46<23:10,  2.46it/s]

{'loss': 19.6311, 'grad_norm': 44.82147216796875, 'learning_rate': 6.368518518518519e-06, 'epoch': 0.89}


 89%|████████▊ | 26600/30000 [4:16:58<28:31,  1.99it/s]

{'loss': 16.1226, 'grad_norm': 47.572418212890625, 'learning_rate': 6.322222222222222e-06, 'epoch': 0.89}


 89%|████████▉ | 26625/30000 [4:17:10<28:30,  1.97it/s]

{'loss': 24.7571, 'grad_norm': 38.15071487426758, 'learning_rate': 6.275925925925927e-06, 'epoch': 0.89}


 89%|████████▉ | 26650/30000 [4:17:22<27:17,  2.05it/s]

{'loss': 22.483, 'grad_norm': 36.42898941040039, 'learning_rate': 6.2296296296296295e-06, 'epoch': 0.89}


 89%|████████▉ | 26675/30000 [4:17:34<24:13,  2.29it/s]

{'loss': 21.4427, 'grad_norm': 34.55306625366211, 'learning_rate': 6.183333333333333e-06, 'epoch': 0.89}


 89%|████████▉ | 26700/30000 [4:17:46<29:16,  1.88it/s]

{'loss': 21.4637, 'grad_norm': 69.20838928222656, 'learning_rate': 6.137037037037038e-06, 'epoch': 0.89}


 89%|████████▉ | 26725/30000 [4:17:58<23:56,  2.28it/s]

{'loss': 15.051, 'grad_norm': 39.40006637573242, 'learning_rate': 6.090740740740741e-06, 'epoch': 0.89}


 89%|████████▉ | 26750/30000 [4:18:11<28:04,  1.93it/s]

{'loss': 17.7712, 'grad_norm': 70.39512634277344, 'learning_rate': 6.044444444444445e-06, 'epoch': 0.89}


 89%|████████▉ | 26775/30000 [4:18:23<27:48,  1.93it/s]

{'loss': 19.1816, 'grad_norm': 48.17369079589844, 'learning_rate': 5.9981481481481486e-06, 'epoch': 0.89}


 89%|████████▉ | 26800/30000 [4:18:35<27:06,  1.97it/s]

{'loss': 20.7769, 'grad_norm': 38.14612579345703, 'learning_rate': 5.951851851851852e-06, 'epoch': 0.89}


 89%|████████▉ | 26825/30000 [4:18:48<27:13,  1.94it/s]

{'loss': 25.2334, 'grad_norm': 58.760807037353516, 'learning_rate': 5.905555555555556e-06, 'epoch': 0.89}


 90%|████████▉ | 26850/30000 [4:18:59<25:59,  2.02it/s]

{'loss': 15.1704, 'grad_norm': 57.665138244628906, 'learning_rate': 5.8592592592592595e-06, 'epoch': 0.9}


 90%|████████▉ | 26875/30000 [4:19:11<26:20,  1.98it/s]

{'loss': 17.9426, 'grad_norm': 44.52408981323242, 'learning_rate': 5.814814814814816e-06, 'epoch': 0.9}


 90%|████████▉ | 26900/30000 [4:19:22<25:22,  2.04it/s]

{'loss': 20.4139, 'grad_norm': 25.686695098876953, 'learning_rate': 5.768518518518519e-06, 'epoch': 0.9}


 90%|████████▉ | 26925/30000 [4:19:34<25:25,  2.02it/s]

{'loss': 18.6379, 'grad_norm': 49.015987396240234, 'learning_rate': 5.722222222222223e-06, 'epoch': 0.9}


 90%|████████▉ | 26950/30000 [4:19:45<21:32,  2.36it/s]

{'loss': 16.4218, 'grad_norm': 22.37679672241211, 'learning_rate': 5.6759259259259265e-06, 'epoch': 0.9}


 90%|████████▉ | 26975/30000 [4:19:57<24:19,  2.07it/s]

{'loss': 17.3731, 'grad_norm': 59.25218200683594, 'learning_rate': 5.62962962962963e-06, 'epoch': 0.9}


 90%|█████████ | 27000/30000 [4:20:09<24:13,  2.06it/s]

{'loss': 21.9119, 'grad_norm': 33.81238555908203, 'learning_rate': 5.583333333333334e-06, 'epoch': 0.9}


 90%|█████████ | 27025/30000 [4:20:21<24:26,  2.03it/s]

{'loss': 18.1375, 'grad_norm': 30.020973205566406, 'learning_rate': 5.5370370370370374e-06, 'epoch': 0.9}


 90%|█████████ | 27050/30000 [4:20:32<24:06,  2.04it/s]

{'loss': 17.5324, 'grad_norm': 52.42443084716797, 'learning_rate': 5.490740740740741e-06, 'epoch': 0.9}


 90%|█████████ | 27075/30000 [4:20:44<21:37,  2.25it/s]

{'loss': 16.9332, 'grad_norm': 31.308547973632812, 'learning_rate': 5.444444444444445e-06, 'epoch': 0.9}


 90%|█████████ | 27100/30000 [4:20:56<23:39,  2.04it/s]

{'loss': 16.6994, 'grad_norm': 55.75825881958008, 'learning_rate': 5.398148148148148e-06, 'epoch': 0.9}


 90%|█████████ | 27125/30000 [4:21:08<23:33,  2.03it/s]

{'loss': 16.0281, 'grad_norm': 64.34502410888672, 'learning_rate': 5.351851851851852e-06, 'epoch': 0.9}


 90%|█████████ | 27150/30000 [4:21:20<24:42,  1.92it/s]

{'loss': 19.3751, 'grad_norm': 65.07720947265625, 'learning_rate': 5.305555555555556e-06, 'epoch': 0.91}


 91%|█████████ | 27175/30000 [4:21:33<23:26,  2.01it/s]

{'loss': 15.5985, 'grad_norm': 49.29689407348633, 'learning_rate': 5.259259259259259e-06, 'epoch': 0.91}


 91%|█████████ | 27200/30000 [4:21:44<23:46,  1.96it/s]

{'loss': 14.5769, 'grad_norm': 14.705760955810547, 'learning_rate': 5.212962962962963e-06, 'epoch': 0.91}


 91%|█████████ | 27225/30000 [4:21:57<23:59,  1.93it/s]

{'loss': 22.3421, 'grad_norm': 35.960899353027344, 'learning_rate': 5.166666666666667e-06, 'epoch': 0.91}


 91%|█████████ | 27250/30000 [4:22:09<24:05,  1.90it/s]

{'loss': 19.6848, 'grad_norm': 60.45317840576172, 'learning_rate': 5.12037037037037e-06, 'epoch': 0.91}


 91%|█████████ | 27275/30000 [4:22:22<23:31,  1.93it/s]

{'loss': 19.7196, 'grad_norm': 50.7868537902832, 'learning_rate': 5.074074074074074e-06, 'epoch': 0.91}


 91%|█████████ | 27300/30000 [4:22:35<24:09,  1.86it/s]

{'loss': 18.0808, 'grad_norm': 23.02650260925293, 'learning_rate': 5.0277777777777775e-06, 'epoch': 0.91}


 91%|█████████ | 27325/30000 [4:22:47<19:52,  2.24it/s]

{'loss': 16.5978, 'grad_norm': 51.86538314819336, 'learning_rate': 4.981481481481481e-06, 'epoch': 0.91}


 91%|█████████ | 27350/30000 [4:23:00<23:04,  1.91it/s]

{'loss': 20.9962, 'grad_norm': 61.465576171875, 'learning_rate': 4.935185185185186e-06, 'epoch': 0.91}


 91%|█████████▏| 27375/30000 [4:23:12<18:59,  2.30it/s]

{'loss': 20.5658, 'grad_norm': 49.98973083496094, 'learning_rate': 4.888888888888889e-06, 'epoch': 0.91}


 91%|█████████▏| 27400/30000 [4:23:24<22:08,  1.96it/s]

{'loss': 19.2688, 'grad_norm': 66.4561996459961, 'learning_rate': 4.842592592592593e-06, 'epoch': 0.91}


 91%|█████████▏| 27425/30000 [4:23:36<22:01,  1.95it/s]

{'loss': 24.1852, 'grad_norm': 50.46548080444336, 'learning_rate': 4.796296296296297e-06, 'epoch': 0.91}


 92%|█████████▏| 27450/30000 [4:23:49<21:25,  1.98it/s]

{'loss': 14.7346, 'grad_norm': 25.879682540893555, 'learning_rate': 4.75e-06, 'epoch': 0.92}


 92%|█████████▏| 27475/30000 [4:24:00<19:23,  2.17it/s]

{'loss': 16.5266, 'grad_norm': 35.39683151245117, 'learning_rate': 4.703703703703704e-06, 'epoch': 0.92}


 92%|█████████▏| 27500/30000 [4:24:12<20:49,  2.00it/s]

{'loss': 19.4367, 'grad_norm': 46.240264892578125, 'learning_rate': 4.6574074074074076e-06, 'epoch': 0.92}


 92%|█████████▏| 27525/30000 [4:24:25<21:39,  1.90it/s]

{'loss': 18.158, 'grad_norm': 34.826683044433594, 'learning_rate': 4.611111111111111e-06, 'epoch': 0.92}


 92%|█████████▏| 27550/30000 [4:24:37<19:09,  2.13it/s]

{'loss': 15.2556, 'grad_norm': 18.224746704101562, 'learning_rate': 4.564814814814815e-06, 'epoch': 0.92}


 92%|█████████▏| 27575/30000 [4:24:49<21:02,  1.92it/s]

{'loss': 19.4434, 'grad_norm': 53.4388313293457, 'learning_rate': 4.5185185185185185e-06, 'epoch': 0.92}


 92%|█████████▏| 27600/30000 [4:25:01<20:48,  1.92it/s]

{'loss': 19.2444, 'grad_norm': 31.147520065307617, 'learning_rate': 4.472222222222222e-06, 'epoch': 0.92}


 92%|█████████▏| 27625/30000 [4:25:14<20:43,  1.91it/s]

{'loss': 18.7488, 'grad_norm': 18.987642288208008, 'learning_rate': 4.425925925925927e-06, 'epoch': 0.92}


 92%|█████████▏| 27650/30000 [4:25:27<19:24,  2.02it/s]

{'loss': 19.214, 'grad_norm': 30.730478286743164, 'learning_rate': 4.37962962962963e-06, 'epoch': 0.92}


 92%|█████████▏| 27675/30000 [4:25:38<16:22,  2.37it/s]

{'loss': 15.4104, 'grad_norm': 15.95295524597168, 'learning_rate': 4.333333333333334e-06, 'epoch': 0.92}


 92%|█████████▏| 27700/30000 [4:25:50<19:34,  1.96it/s]

{'loss': 21.6967, 'grad_norm': 62.40781021118164, 'learning_rate': 4.2870370370370376e-06, 'epoch': 0.92}


 92%|█████████▏| 27725/30000 [4:26:03<18:58,  2.00it/s]

{'loss': 22.3265, 'grad_norm': 23.500709533691406, 'learning_rate': 4.240740740740741e-06, 'epoch': 0.92}


 92%|█████████▎| 27750/30000 [4:26:16<19:49,  1.89it/s]

{'loss': 18.5501, 'grad_norm': 30.75749397277832, 'learning_rate': 4.194444444444445e-06, 'epoch': 0.93}


 93%|█████████▎| 27775/30000 [4:26:28<18:48,  1.97it/s]

{'loss': 24.4217, 'grad_norm': 49.98203659057617, 'learning_rate': 4.1481481481481485e-06, 'epoch': 0.93}


 93%|█████████▎| 27800/30000 [4:26:40<18:26,  1.99it/s]

{'loss': 19.5401, 'grad_norm': 42.436100006103516, 'learning_rate': 4.101851851851852e-06, 'epoch': 0.93}


 93%|█████████▎| 27825/30000 [4:26:52<16:17,  2.23it/s]

{'loss': 15.6331, 'grad_norm': 49.881141662597656, 'learning_rate': 4.055555555555556e-06, 'epoch': 0.93}


 93%|█████████▎| 27850/30000 [4:27:05<16:22,  2.19it/s]

{'loss': 18.6566, 'grad_norm': 24.56839370727539, 'learning_rate': 4.0092592592592594e-06, 'epoch': 0.93}


 93%|█████████▎| 27875/30000 [4:27:15<14:35,  2.43it/s]

{'loss': 13.4813, 'grad_norm': 49.53357696533203, 'learning_rate': 3.962962962962963e-06, 'epoch': 0.93}


 93%|█████████▎| 27900/30000 [4:27:27<17:53,  1.96it/s]

{'loss': 17.3197, 'grad_norm': 36.43801498413086, 'learning_rate': 3.916666666666667e-06, 'epoch': 0.93}


 93%|█████████▎| 27925/30000 [4:27:40<17:47,  1.94it/s]

{'loss': 19.9199, 'grad_norm': 45.25344467163086, 'learning_rate': 3.87037037037037e-06, 'epoch': 0.93}


 93%|█████████▎| 27950/30000 [4:27:51<18:12,  1.88it/s]

{'loss': 14.5439, 'grad_norm': 38.95186996459961, 'learning_rate': 3.824074074074074e-06, 'epoch': 0.93}


 93%|█████████▎| 27975/30000 [4:28:04<17:19,  1.95it/s]

{'loss': 24.094, 'grad_norm': 72.46165466308594, 'learning_rate': 3.777777777777778e-06, 'epoch': 0.93}


 93%|█████████▎| 28000/30000 [4:28:16<14:05,  2.36it/s]

{'loss': 15.6524, 'grad_norm': 39.430076599121094, 'learning_rate': 3.7314814814814817e-06, 'epoch': 0.93}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
 93%|█████████▎| 28000/30000 [4:31:16<14:05,  2.36it/s]

{'eval_loss': 27.75004005432129, 'eval_runtime': 180.1605, 'eval_samples_per_second': 21.603, 'eval_steps_per_second': 5.401, 'epoch': 0.93}


 93%|█████████▎| 28025/30000 [4:31:38<18:33,  1.77it/s]   

{'loss': 15.456, 'grad_norm': 52.88685989379883, 'learning_rate': 3.6851851851851854e-06, 'epoch': 0.93}


 94%|█████████▎| 28050/30000 [4:31:50<16:41,  1.95it/s]

{'loss': 21.873, 'grad_norm': 57.07524871826172, 'learning_rate': 3.638888888888889e-06, 'epoch': 0.94}


 94%|█████████▎| 28075/30000 [4:32:02<14:11,  2.26it/s]

{'loss': 14.9934, 'grad_norm': 28.62950897216797, 'learning_rate': 3.5925925925925927e-06, 'epoch': 0.94}


 94%|█████████▎| 28100/30000 [4:32:14<15:17,  2.07it/s]

{'loss': 15.803, 'grad_norm': 31.34559440612793, 'learning_rate': 3.5462962962962963e-06, 'epoch': 0.94}


 94%|█████████▍| 28125/30000 [4:32:27<14:41,  2.13it/s]

{'loss': 19.7348, 'grad_norm': 69.83973693847656, 'learning_rate': 3.5000000000000004e-06, 'epoch': 0.94}


 94%|█████████▍| 28150/30000 [4:32:39<14:00,  2.20it/s]

{'loss': 16.4288, 'grad_norm': 27.528636932373047, 'learning_rate': 3.453703703703704e-06, 'epoch': 0.94}


 94%|█████████▍| 28175/30000 [4:32:52<15:35,  1.95it/s]

{'loss': 22.7377, 'grad_norm': 150.14553833007812, 'learning_rate': 3.4092592592592592e-06, 'epoch': 0.94}


 94%|█████████▍| 28200/30000 [4:33:04<15:04,  1.99it/s]

{'loss': 21.0733, 'grad_norm': 59.89854431152344, 'learning_rate': 3.362962962962963e-06, 'epoch': 0.94}


 94%|█████████▍| 28225/30000 [4:33:17<14:18,  2.07it/s]

{'loss': 21.392, 'grad_norm': 50.738555908203125, 'learning_rate': 3.3166666666666665e-06, 'epoch': 0.94}


 94%|█████████▍| 28250/30000 [4:33:29<15:17,  1.91it/s]

{'loss': 15.1276, 'grad_norm': 49.10731506347656, 'learning_rate': 3.270370370370371e-06, 'epoch': 0.94}


 94%|█████████▍| 28275/30000 [4:33:42<13:10,  2.18it/s]

{'loss': 17.9709, 'grad_norm': 25.73773956298828, 'learning_rate': 3.2240740740740747e-06, 'epoch': 0.94}


 94%|█████████▍| 28300/30000 [4:33:54<14:08,  2.00it/s]

{'loss': 17.4557, 'grad_norm': 33.74159240722656, 'learning_rate': 3.1777777777777783e-06, 'epoch': 0.94}


 94%|█████████▍| 28325/30000 [4:34:06<14:35,  1.91it/s]

{'loss': 19.0094, 'grad_norm': 58.6474609375, 'learning_rate': 3.131481481481482e-06, 'epoch': 0.94}


 94%|█████████▍| 28350/30000 [4:34:19<13:50,  1.99it/s]

{'loss': 20.6716, 'grad_norm': 48.25708770751953, 'learning_rate': 3.0851851851851856e-06, 'epoch': 0.94}


 95%|█████████▍| 28375/30000 [4:34:32<11:56,  2.27it/s]

{'loss': 17.8792, 'grad_norm': 34.918575286865234, 'learning_rate': 3.038888888888889e-06, 'epoch': 0.95}


 95%|█████████▍| 28400/30000 [4:34:43<11:17,  2.36it/s]

{'loss': 16.5813, 'grad_norm': 32.56999206542969, 'learning_rate': 2.9925925925925925e-06, 'epoch': 0.95}


 95%|█████████▍| 28425/30000 [4:34:56<11:22,  2.31it/s]

{'loss': 17.9289, 'grad_norm': 40.73421096801758, 'learning_rate': 2.946296296296296e-06, 'epoch': 0.95}


 95%|█████████▍| 28450/30000 [4:35:08<12:57,  1.99it/s]

{'loss': 19.604, 'grad_norm': 36.0300407409668, 'learning_rate': 2.9e-06, 'epoch': 0.95}


 95%|█████████▍| 28475/30000 [4:35:20<12:35,  2.02it/s]

{'loss': 20.8105, 'grad_norm': 60.077091217041016, 'learning_rate': 2.853703703703704e-06, 'epoch': 0.95}


 95%|█████████▌| 28500/30000 [4:35:32<10:49,  2.31it/s]

{'loss': 18.9435, 'grad_norm': 24.545120239257812, 'learning_rate': 2.8074074074074075e-06, 'epoch': 0.95}


 95%|█████████▌| 28525/30000 [4:35:44<12:18,  2.00it/s]

{'loss': 17.8083, 'grad_norm': 41.56490707397461, 'learning_rate': 2.761111111111111e-06, 'epoch': 0.95}


 95%|█████████▌| 28550/30000 [4:35:56<10:50,  2.23it/s]

{'loss': 15.9834, 'grad_norm': 39.15068054199219, 'learning_rate': 2.7148148148148148e-06, 'epoch': 0.95}


 95%|█████████▌| 28575/30000 [4:36:08<11:59,  1.98it/s]

{'loss': 17.0116, 'grad_norm': 25.49711036682129, 'learning_rate': 2.668518518518519e-06, 'epoch': 0.95}


 95%|█████████▌| 28600/30000 [4:36:21<11:34,  2.01it/s]

{'loss': 21.2883, 'grad_norm': 64.05098724365234, 'learning_rate': 2.6240740740740745e-06, 'epoch': 0.95}


 95%|█████████▌| 28612/30000 [4:36:27<10:30,  2.20it/s]
'(ProtocolError('Connection aborted.', BrokenPipeError(32, 'Broken pipe')), '(Request ID: 04add555-6630-4e69-bd39-904d4f97eb5d)')' thrown while requesting GET https://huggingface.co/datasets/nguyenvulebinh/AVYT/resolve/e6c6bf6f40e698b82215d269cfc0a0d65a7a2372/vox2/vox2-dev-000006.tar
Retrying in 1s [Retry 1/5].
 95%|█████████▌| 28625/30000 [4:36:33<12:01,  1.91it/s]

{'loss': 16.6968, 'grad_norm': 42.177284240722656, 'learning_rate': 2.5777777777777777e-06, 'epoch': 0.95}


 96%|█████████▌| 28650/30000 [4:36:46<09:51,  2.28it/s]

{'loss': 18.2826, 'grad_norm': 51.1025505065918, 'learning_rate': 2.5314814814814814e-06, 'epoch': 0.95}


 96%|█████████▌| 28675/30000 [4:36:58<11:12,  1.97it/s]

{'loss': 13.2276, 'grad_norm': 23.910921096801758, 'learning_rate': 2.485185185185185e-06, 'epoch': 0.96}


 96%|█████████▌| 28700/30000 [4:37:09<10:43,  2.02it/s]

{'loss': 14.0299, 'grad_norm': 35.1053581237793, 'learning_rate': 2.438888888888889e-06, 'epoch': 0.96}


 96%|█████████▌| 28725/30000 [4:37:22<11:18,  1.88it/s]

{'loss': 21.6292, 'grad_norm': 42.184234619140625, 'learning_rate': 2.3925925925925927e-06, 'epoch': 0.96}


 96%|█████████▌| 28750/30000 [4:37:34<09:11,  2.27it/s]

{'loss': 20.4304, 'grad_norm': 34.8933219909668, 'learning_rate': 2.3462962962962964e-06, 'epoch': 0.96}


 96%|█████████▌| 28752/30000 [4:37:35<09:49,  2.12it/s]
'(ProtocolError('Connection aborted.', BrokenPipeError(32, 'Broken pipe')), '(Request ID: 53339afa-a226-4a9b-af73-f124fe8f3b13)')' thrown while requesting GET https://huggingface.co/datasets/nguyenvulebinh/AVYT/resolve/e6c6bf6f40e698b82215d269cfc0a0d65a7a2372/vox2/vox2-dev-000005.tar
Retrying in 1s [Retry 1/5].
 96%|█████████▌| 28775/30000 [4:37:46<09:10,  2.23it/s]

{'loss': 22.4924, 'grad_norm': 30.467620849609375, 'learning_rate': 2.3e-06, 'epoch': 0.96}


 96%|█████████▌| 28800/30000 [4:37:58<08:49,  2.26it/s]

{'loss': 24.2596, 'grad_norm': 32.9532356262207, 'learning_rate': 2.2537037037037036e-06, 'epoch': 0.96}


 96%|█████████▌| 28825/30000 [4:38:10<09:35,  2.04it/s]

{'loss': 21.8301, 'grad_norm': 36.95344924926758, 'learning_rate': 2.2074074074074073e-06, 'epoch': 0.96}


 96%|█████████▌| 28850/30000 [4:38:22<09:46,  1.96it/s]

{'loss': 20.9172, 'grad_norm': 35.52144241333008, 'learning_rate': 2.1611111111111114e-06, 'epoch': 0.96}


 96%|█████████▋| 28875/30000 [4:38:35<09:24,  1.99it/s]

{'loss': 22.7316, 'grad_norm': 24.254199981689453, 'learning_rate': 2.114814814814815e-06, 'epoch': 0.96}


 96%|█████████▋| 28900/30000 [4:38:47<09:43,  1.89it/s]

{'loss': 21.2101, 'grad_norm': 43.43735122680664, 'learning_rate': 2.0685185185185187e-06, 'epoch': 0.96}


 96%|█████████▋| 28925/30000 [4:38:59<08:37,  2.08it/s]

{'loss': 21.7539, 'grad_norm': 51.93722152709961, 'learning_rate': 2.0222222222222223e-06, 'epoch': 0.96}


 96%|█████████▋| 28950/30000 [4:39:10<07:47,  2.25it/s]

{'loss': 13.9933, 'grad_norm': 34.59150695800781, 'learning_rate': 1.975925925925926e-06, 'epoch': 0.96}


 97%|█████████▋| 28975/30000 [4:39:23<08:54,  1.92it/s]

{'loss': 20.8057, 'grad_norm': 46.20816421508789, 'learning_rate': 1.92962962962963e-06, 'epoch': 0.97}


 97%|█████████▋| 29000/30000 [4:39:35<08:11,  2.04it/s]

{'loss': 16.7021, 'grad_norm': 42.583980560302734, 'learning_rate': 1.8833333333333334e-06, 'epoch': 0.97}


 97%|█████████▋| 29025/30000 [4:39:48<08:23,  1.94it/s]

{'loss': 23.9584, 'grad_norm': 47.828125, 'learning_rate': 1.837037037037037e-06, 'epoch': 0.97}


 97%|█████████▋| 29050/30000 [4:40:01<08:18,  1.90it/s]

{'loss': 19.2597, 'grad_norm': 53.22992706298828, 'learning_rate': 1.7907407407407407e-06, 'epoch': 0.97}


 97%|█████████▋| 29075/30000 [4:40:14<07:28,  2.06it/s]

{'loss': 22.0427, 'grad_norm': 22.44475555419922, 'learning_rate': 1.7444444444444444e-06, 'epoch': 0.97}


 97%|█████████▋| 29100/30000 [4:40:27<06:48,  2.20it/s]

{'loss': 22.7226, 'grad_norm': 22.357852935791016, 'learning_rate': 1.6981481481481484e-06, 'epoch': 0.97}


 97%|█████████▋| 29125/30000 [4:40:40<08:59,  1.62it/s]

{'loss': 23.2294, 'grad_norm': 53.86505126953125, 'learning_rate': 1.651851851851852e-06, 'epoch': 0.97}


 97%|█████████▋| 29150/30000 [4:40:52<06:20,  2.23it/s]

{'loss': 20.1378, 'grad_norm': 91.98261260986328, 'learning_rate': 1.6055555555555557e-06, 'epoch': 0.97}


 97%|█████████▋| 29175/30000 [4:41:04<05:49,  2.36it/s]

{'loss': 15.0334, 'grad_norm': 26.62936019897461, 'learning_rate': 1.5592592592592592e-06, 'epoch': 0.97}


 97%|█████████▋| 29200/30000 [4:41:17<06:49,  1.95it/s]

{'loss': 22.3648, 'grad_norm': 31.239818572998047, 'learning_rate': 1.512962962962963e-06, 'epoch': 0.97}


 97%|█████████▋| 29225/30000 [4:41:28<05:38,  2.29it/s]

{'loss': 19.1091, 'grad_norm': 42.17718505859375, 'learning_rate': 1.4666666666666667e-06, 'epoch': 0.97}


 98%|█████████▊| 29250/30000 [4:41:40<05:39,  2.21it/s]

{'loss': 19.0516, 'grad_norm': 24.672372817993164, 'learning_rate': 1.4203703703703705e-06, 'epoch': 0.97}


 98%|█████████▊| 29275/30000 [4:41:52<05:09,  2.34it/s]

{'loss': 16.627, 'grad_norm': 34.102447509765625, 'learning_rate': 1.3740740740740742e-06, 'epoch': 0.98}


 98%|█████████▊| 29300/30000 [4:42:04<05:48,  2.01it/s]

{'loss': 22.4665, 'grad_norm': 44.04630661010742, 'learning_rate': 1.3277777777777778e-06, 'epoch': 0.98}


 98%|█████████▊| 29325/30000 [4:42:16<04:46,  2.35it/s]

{'loss': 16.8635, 'grad_norm': 47.85992431640625, 'learning_rate': 1.2814814814814817e-06, 'epoch': 0.98}


 98%|█████████▊| 29350/30000 [4:42:29<04:53,  2.22it/s]

{'loss': 21.7066, 'grad_norm': 93.58183288574219, 'learning_rate': 1.2351851851851853e-06, 'epoch': 0.98}


 98%|█████████▊| 29375/30000 [4:42:41<04:12,  2.47it/s]

{'loss': 19.3995, 'grad_norm': 56.25837326049805, 'learning_rate': 1.188888888888889e-06, 'epoch': 0.98}


 98%|█████████▊| 29400/30000 [4:42:52<04:15,  2.35it/s]

{'loss': 19.1771, 'grad_norm': 27.8082275390625, 'learning_rate': 1.1425925925925926e-06, 'epoch': 0.98}


 98%|█████████▊| 29425/30000 [4:43:05<04:49,  1.99it/s]

{'loss': 16.8538, 'grad_norm': 33.702457427978516, 'learning_rate': 1.0962962962962963e-06, 'epoch': 0.98}


 98%|█████████▊| 29450/30000 [4:43:17<04:08,  2.22it/s]

{'loss': 24.2938, 'grad_norm': 60.97801971435547, 'learning_rate': 1.0500000000000001e-06, 'epoch': 0.98}


 98%|█████████▊| 29475/30000 [4:43:29<04:20,  2.01it/s]

{'loss': 18.1812, 'grad_norm': 49.18142318725586, 'learning_rate': 1.0037037037037038e-06, 'epoch': 0.98}


 98%|█████████▊| 29500/30000 [4:43:41<03:47,  2.19it/s]

{'loss': 24.137, 'grad_norm': 50.088932037353516, 'learning_rate': 9.574074074074074e-07, 'epoch': 0.98}


 98%|█████████▊| 29525/30000 [4:43:53<03:24,  2.33it/s]

{'loss': 23.5973, 'grad_norm': 27.50471305847168, 'learning_rate': 9.111111111111112e-07, 'epoch': 0.98}


 98%|█████████▊| 29550/30000 [4:44:06<03:46,  1.99it/s]

{'loss': 17.5811, 'grad_norm': 50.030555725097656, 'learning_rate': 8.648148148148148e-07, 'epoch': 0.98}


 99%|█████████▊| 29575/30000 [4:44:17<02:53,  2.44it/s]

{'loss': 16.2084, 'grad_norm': 35.434783935546875, 'learning_rate': 8.185185185185187e-07, 'epoch': 0.99}


 99%|█████████▊| 29600/30000 [4:44:30<03:25,  1.95it/s]

{'loss': 21.5474, 'grad_norm': 65.88105010986328, 'learning_rate': 7.722222222222223e-07, 'epoch': 0.99}


 99%|█████████▉| 29625/30000 [4:44:41<03:08,  1.99it/s]

{'loss': 20.371, 'grad_norm': 46.471656799316406, 'learning_rate': 7.259259259259259e-07, 'epoch': 0.99}


 99%|█████████▉| 29650/30000 [4:44:54<02:54,  2.01it/s]

{'loss': 25.9244, 'grad_norm': 58.42923355102539, 'learning_rate': 6.796296296296296e-07, 'epoch': 0.99}


 99%|█████████▉| 29675/30000 [4:45:06<02:25,  2.23it/s]

{'loss': 18.1311, 'grad_norm': 38.1458854675293, 'learning_rate': 6.333333333333333e-07, 'epoch': 0.99}


 99%|█████████▉| 29700/30000 [4:45:18<02:24,  2.08it/s]

{'loss': 20.0344, 'grad_norm': 32.802669525146484, 'learning_rate': 5.870370370370371e-07, 'epoch': 0.99}


 99%|█████████▉| 29725/30000 [4:45:30<01:53,  2.41it/s]

{'loss': 20.155, 'grad_norm': 30.632627487182617, 'learning_rate': 5.407407407407408e-07, 'epoch': 0.99}


 99%|█████████▉| 29750/30000 [4:45:43<02:11,  1.91it/s]

{'loss': 26.6391, 'grad_norm': 134.69117736816406, 'learning_rate': 4.944444444444444e-07, 'epoch': 0.99}


 99%|█████████▉| 29775/30000 [4:45:55<01:55,  1.96it/s]

{'loss': 21.9612, 'grad_norm': 48.57712936401367, 'learning_rate': 4.4814814814814813e-07, 'epoch': 0.99}


 99%|█████████▉| 29800/30000 [4:46:07<01:37,  2.04it/s]

{'loss': 25.1518, 'grad_norm': 57.82143020629883, 'learning_rate': 4.018518518518519e-07, 'epoch': 0.99}


 99%|█████████▉| 29825/30000 [4:46:19<01:17,  2.25it/s]

{'loss': 22.9466, 'grad_norm': 67.0403060913086, 'learning_rate': 3.555555555555556e-07, 'epoch': 0.99}


100%|█████████▉| 29850/30000 [4:46:31<01:18,  1.91it/s]

{'loss': 17.0783, 'grad_norm': 30.277482986450195, 'learning_rate': 3.092592592592593e-07, 'epoch': 0.99}


100%|█████████▉| 29875/30000 [4:46:45<01:07,  1.86it/s]

{'loss': 26.2936, 'grad_norm': 29.67350196838379, 'learning_rate': 2.62962962962963e-07, 'epoch': 1.0}


100%|█████████▉| 29900/30000 [4:46:57<00:48,  2.06it/s]

{'loss': 16.4536, 'grad_norm': 21.441110610961914, 'learning_rate': 2.1666666666666667e-07, 'epoch': 1.0}


100%|█████████▉| 29925/30000 [4:47:10<00:38,  1.97it/s]

{'loss': 20.4643, 'grad_norm': 37.836631774902344, 'learning_rate': 1.703703703703704e-07, 'epoch': 1.0}


100%|█████████▉| 29950/30000 [4:47:21<00:24,  2.01it/s]

{'loss': 23.7461, 'grad_norm': 66.31090545654297, 'learning_rate': 1.240740740740741e-07, 'epoch': 1.0}


100%|█████████▉| 29975/30000 [4:47:33<00:12,  1.96it/s]

{'loss': 17.0543, 'grad_norm': 35.56364059448242, 'learning_rate': 7.777777777777778e-08, 'epoch': 1.0}


100%|██████████| 30000/30000 [4:47:45<00:00,  2.31it/s]

{'loss': 22.8197, 'grad_norm': 49.814903259277344, 'learning_rate': 3.148148148148148e-08, 'epoch': 1.0}


Too many dataloader workers: 10 (max is dataset.num_shards=3). Stopping 7 dataloader workers.
100%|██████████| 30000/30000 [4:50:36<00:00,  2.31it/s]

{'eval_loss': 27.49915313720703, 'eval_runtime': 170.7197, 'eval_samples_per_second': 22.798, 'eval_steps_per_second': 5.699, 'epoch': 1.0}
{'train_runtime': 17445.3793, 'train_samples_per_second': 13.757, 'train_steps_per_second': 1.72, 'train_loss': 17.871614496358237, 'epoch': 1.0}


100%|██████████| 30000/30000 [4:50:45<00:00,  1.72it/s]


CompletedProcess(args=['/home/josch080/Projektgruppe/mcorec_train/bin/python', 'script/train.py', '--streaming_dataset', '--include_mcorec', '--batch_size', '4', '--max_steps', '30000', '--gradient_accumulation_steps', '2', '--save_steps', '2000', '--eval_steps', '2000', '--log_interval', '25', '--learning_rate', '5e-5', '--warmup_steps', '3000', '--checkpoint_name', 'avsr_cocktail_mcorec_stage2_lr5e-5_30k', '--model_name_or_path', './model-bin/avsr_cocktail_mcorec_finetune', '--output_dir', './model-bin', '--report_to', 'none'], returncode=0)
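
The learning rates logged during this run are consistent with a linear schedule: 3,000 warmup steps up to the peak of 5e-5, followed by linear decay to zero at step 30,000. The sketch below reproduces that shape. The scheduler type is an assumption (it is not passed explicitly in the command above), and `linear_lr` is a hypothetical helper, not part of `script/train.py`.

```python
def linear_lr(step: int,
              peak_lr: float = 5e-5,
              warmup_steps: int = 3000,
              max_steps: int = 30000) -> float:
    """Linear warmup to peak_lr, then linear decay to zero (assumed schedule)."""
    if step < warmup_steps:
        # Warmup phase: ramp up proportionally to the step count
        return peak_lr * step / warmup_steps
    # Decay phase: proportional to the number of remaining steps
    remaining = max(0, max_steps - step)
    return peak_lr * remaining / (max_steps - warmup_steps)


for step in (0, 1500, 3000, 16500, 30000):
    print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under this formula each 25-step logging interval lowers the rate by 5e-5 · 25 / 27,000 ≈ 4.63e-8, which matches the spacing between most consecutive `learning_rate` values in the log. The final logged value of about 3.148e-08 corresponds to 17 remaining steps in the same formula.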

## 5 – Inference Setup

Reload the experiment utilities, since the working directory may have been changed above.

In [1]:
import os, sys
import pandas as pd

# Set the working directory to the repo root (required for all relative paths)
project_baseline_path = "/home/josch080/Projektgruppe/mcorec_baseline"
os.chdir(project_baseline_path)

# Add the repo root to sys.path so that project-internal modules can be imported
if project_baseline_path not in sys.path:
    sys.path.append(project_baseline_path)

from script.pg_utils_experiments import run_inference_for_experiment, run_eval_and_log, append_eval_results_for_experiments

## 6 – Model Definitions

Two models are compared: BL4 as the reference and the new Stage-2 model.
The checkpoint path points to the last saved step (30,000).

In [2]:
MODELS = {
    # BL4: reference model from the previous experiments
    "cocktail_finetuned": {
        "model_type": "avsr_cocktail",
        "chkpt": "model-bin/avsr_cocktail_mcorec_finetune",
        "out": "output_avsr_cocktail_finetuned",
    },
    # Stage-2 model: BL4 + 30k additional training steps
    "cocktail_stage2": {
        "model_type": "avsr_cocktail",
        "chkpt": "model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000",
        "out": "output_avsr_cocktail_stage2",
    },
}
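
The Stage-2 entry hard-codes `checkpoint-30000`. With `--save_steps 2000`, the run directory may hold several `checkpoint-<step>` folders, so the last one can also be looked up programmatically. This is a minimal sketch that assumes the `checkpoint-<step>` naming seen in the path above. `latest_checkpoint` is a hypothetical helper, not part of the project's utilities.

```python
import re
from pathlib import Path
from typing import Optional

# Matches folder names such as "checkpoint-30000"
_CKPT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_dir) -> Optional[Path]:
    """Return the checkpoint folder with the highest step, or None if there is none."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        return None
    candidates = []
    for child in run_dir.iterdir():
        match = _CKPT_RE.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by integer step, not by name
    return max(candidates, key=lambda item: item[0])[1]


print(latest_checkpoint("model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k"))
```

The steps are compared as integers because a plain string sort would rank `checkpoint-4000` after `checkpoint-30000`.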

## 7 – Sessions & Experiments

Same 5-session subset as in `02_`. Three experiments test the Stage-2 model
with the baseline parameters (E11) and the best parameters found so far (E12, E13).

In [3]:
# Same dev subset as in 02_ for direct comparability
SESSION_IDS = ["session_40", "session_43", "session_49", "session_50", "session_54"]

In [4]:
EXPERIMENTS = {
    "E11_stage2_bs3_len15": {
        "base_model": "cocktail_stage2",
        "beam_size": 3,
        "max_length": 15,
        "comment": "Stage-2-Modell, beam=3, len=15",
    },
    "E12_stage2_bs12_len20": {
        "base_model": "cocktail_stage2",
        "beam_size": 12,
        "max_length": 20,
        "comment": "Stage-2-Modell, beam=12, len=20 (beste Konfiguration)",
    },
    "E13_stage2_bs8_len20": {
        "base_model": "cocktail_stage2",
        "beam_size": 8,
        "max_length": 20,
        "comment": "Stage-2-Modell, beam=8, len=20",
    },
}
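
Each experiment refers to a model through its `base_model` key, so a typo there would only surface once inference starts. The sketch below checks such a configuration up front. `validate_experiments` is a hypothetical helper, and the required keys are taken from the dictionaries above. How `run_inference_for_experiment` validates its input is not shown in this notebook.

```python
REQUIRED_KEYS = ("base_model", "beam_size", "max_length")


def validate_experiments(models: dict, experiments: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config is consistent."""
    problems = []
    for name, cfg in experiments.items():
        # Every experiment needs the decoding parameters and a model reference
        for key in REQUIRED_KEYS:
            if key not in cfg:
                problems.append(f"{name}: missing key '{key}'")
        # The referenced model must be defined in the model table
        base = cfg.get("base_model")
        if base is not None and base not in models:
            problems.append(f"{name}: unknown base_model '{base}'")
    return problems


# Small stand-in configuration that mirrors the structure used above
demo_models = {"cocktail_stage2": {"model_type": "avsr_cocktail"}}
demo_experiments = {
    "ok": {"base_model": "cocktail_stage2", "beam_size": 3, "max_length": 15},
    "typo": {"base_model": "cocktail_stage_2", "beam_size": 8, "max_length": 20},
}
print(validate_experiments(demo_models, demo_experiments))
```

In the notebook itself, the same call would take `MODELS` and `EXPERIMENTS` as arguments.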


## 8 – Inference

3 experiments × 5 sessions = 15 runs. The results are already available.
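
The run count follows from the Cartesian product of sessions and experiments. A short sketch that enumerates the pairs, using the session IDs and experiment names defined in this notebook (the variable names here are illustrative):

```python
from itertools import product

session_ids = ["session_40", "session_43", "session_49", "session_50", "session_54"]
experiment_names = [
    "E11_stage2_bs3_len15",
    "E12_stage2_bs12_len20",
    "E13_stage2_bs8_len20",
]

# Sessions vary slowest, matching a loop with sessions outside and experiments inside
runs = list(product(session_ids, experiment_names))
print(f"{len(runs)} runs in total")
for sid, exp_name in runs[:3]:
    print(sid, exp_name)
```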

In [26]:
for sid in SESSION_IDS:
    session_dir = f"data-bin/dev/{sid}"
    print(f"\n########## Starte Experimente für {sid} ##########")

    for exp_name in EXPERIMENTS:
        run_inference_for_experiment(
            exp_name=exp_name,
            base_models=MODELS,
            experiments=EXPERIMENTS,
            session_dir=session_dir,
        )


########## Starte Experimente für session_40 ##########

Starte Inference für Experiment: E11_stage2_bs3_len15
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 3
  max_length      = 15
  output_dir_name = output_E11_stage2_bs3_len15
  session_dir     = data-bin/dev_without_central_videos/dev/session_40
  comment         = Stage-2-Modell, beam=3, len=15
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_40


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]





Processing speaker spk_0 track 1 of 1:   0%|          | 0/35 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 1:   3%|▎         | 1/35 [00:02<01:11,  2.11s/it]
Processing speaker spk_0 track 1 of 1:   6%|▌         | 2/35 [00:02<00:35,  1.09s/it]
Processing speaker spk_0 track 1 of 1:   9%|▊         | 3/35 [00:02<00:22,  1.41it/s]
Processing speaker spk_0 track 1 of 1:  11%|█▏        | 4/35 [00:03<00:17,  1.77it/s]
Processing speaker spk_0 track 1 of 1:  14%|█▍        | 5/35 [00:03<00:13,  2.15it/s]
Processing speaker spk_0 track 1 of 1:  17%|█▋        | 6/35 [00:04<00:19,  1.47it/s]
Processing speaker spk_0 track 1 of 1:  20%|██        | 7/35 [00:07<00:42,  1.53s/it]
Processing speaker spk_0 track 1 of 1:  23%|██▎       | 8/35 [00:08<00:32,  1.21s/it]
Processing speaker spk_0 track 1 of 1:  26%|██▌       | 9/35 [00:09<00:32,  1.24s/it]
Processing speaker spk_0 track 1 of 1:  29%|██▊       | 10/35 [00:10<00:26,  1.07s/it]
Processing speaker spk_0 track 1 of 1:  31%|███▏      | 11/3





Processing speaker spk_1 track 1 of 1:   0%|          | 0/42 [00:00<?, ?it/s]
Processing speaker spk_1 track 1 of 1:   2%|▏         | 1/42 [00:01<01:04,  1.58s/it]
Processing speaker spk_1 track 1 of 1:   5%|▍         | 2/42 [00:02<00:44,  1.11s/it]
Processing speaker spk_1 track 1 of 1:   7%|▋         | 3/42 [00:03<00:35,  1.10it/s]
Processing speaker spk_1 track 1 of 1:  10%|▉         | 4/42 [00:03<00:29,  1.29it/s]
Processing speaker spk_1 track 1 of 1:  12%|█▏        | 5/42 [00:04<00:30,  1.21it/s]
Processing speaker spk_1 track 1 of 1:  14%|█▍        | 6/42 [00:05<00:28,  1.28it/s]
Processing speaker spk_1 track 1 of 1:  17%|█▋        | 7/42 [00:05<00:24,  1.45it/s]
Processing speaker spk_1 track 1 of 1:  19%|█▉        | 8/42 [00:06<00:21,  1.59it/s]
Processing speaker spk_1 track 1 of 1:  21%|██▏       | 9/42 [00:06<00:18,  1.76it/s]
Processing speaker spk_1 track 1 of 1:  24%|██▍       | 10/42 [00:07<00:19,  1.63it/s]
Processing speaker spk_1 track 1 of 1:  26%|██▌       | 11/4





Processing speaker spk_2 track 1 of 3: 0it [00:00, ?it/s]

Processing speaker spk_2 track 2 of 3:   0%|          | 0/14 [00:00<?, ?it/s]
Processing speaker spk_2 track 2 of 3:   7%|▋         | 1/14 [00:00<00:06,  2.01it/s]
Processing speaker spk_2 track 2 of 3:  14%|█▍        | 2/14 [00:01<00:09,  1.30it/s]
Processing speaker spk_2 track 2 of 3:  21%|██▏       | 3/14 [00:02<00:11,  1.01s/it]
Processing speaker spk_2 track 2 of 3:  29%|██▊       | 4/14 [00:06<00:22,  2.26s/it]
Processing speaker spk_2 track 2 of 3:  36%|███▌      | 5/14 [00:07<00:15,  1.76s/it]
Processing speaker spk_2 track 2 of 3:  43%|████▎     | 6/14 [00:10<00:16,  2.05s/it]
Processing speaker spk_2 track 2 of 3:  50%|█████     | 7/14 [00:14<00:18,  2.66s/it]
Processing speaker spk_2 track 2 of 3:  57%|█████▋    | 8/14 [00:22<00:26,  4.34s/it]
Processing speaker spk_2 track 2 of 3:  64%|██████▍   | 9/14 [00:23<00:16,  3.23s/it]
Processing speaker spk_2 track 2 of 3:  71%|███████▏  | 10/14 [00:27<00:14,  3.74s/it]






Processing speaker spk_3 track 1 of 2:   0%|          | 0/18 [00:00<?, ?it/s]
Processing speaker spk_3 track 1 of 2:   6%|▌         | 1/18 [00:00<00:11,  1.45it/s]
Processing speaker spk_3 track 1 of 2:  11%|█         | 2/18 [00:01<00:08,  1.78it/s]
Processing speaker spk_3 track 1 of 2:  17%|█▋        | 3/18 [00:05<00:35,  2.38s/it]
Processing speaker spk_3 track 1 of 2:  22%|██▏       | 4/18 [00:07<00:30,  2.17s/it]
Processing speaker spk_3 track 1 of 2:  28%|██▊       | 5/18 [00:11<00:36,  2.82s/it]
Processing speaker spk_3 track 1 of 2:  33%|███▎      | 6/18 [00:12<00:25,  2.16s/it]
Processing speaker spk_3 track 1 of 2:  39%|███▉      | 7/18 [00:16<00:30,  2.74s/it]
Processing speaker spk_3 track 1 of 2:  44%|████▍     | 8/18 [00:18<00:26,  2.69s/it]
Processing speaker spk_3 track 1 of 2:  50%|█████     | 9/18 [00:22<00:26,  2.90s/it]
Processing speaker spk_3 track 1 of 2:  56%|█████▌    | 10/18 [00:22<00:17,  2.21s/it]
Processing speaker spk_3 track 1 of 2:  61%|██████    | 11/1





Processing speaker spk_4 track 1 of 1:   0%|          | 0/29 [00:00<?, ?it/s]
Processing speaker spk_4 track 1 of 1:   3%|▎         | 1/29 [00:02<01:00,  2.18s/it]
Processing speaker spk_4 track 1 of 1:   7%|▋         | 2/29 [00:03<00:38,  1.44s/it]
Processing speaker spk_4 track 1 of 1:  10%|█         | 3/29 [00:06<00:56,  2.16s/it]
Processing speaker spk_4 track 1 of 1:  14%|█▍        | 4/29 [00:08<00:55,  2.23s/it]
Processing speaker spk_4 track 1 of 1:  17%|█▋        | 5/29 [00:12<01:08,  2.84s/it]
Processing speaker spk_4 track 1 of 1:  21%|██        | 6/29 [00:15<01:08,  2.98s/it]
Processing speaker spk_4 track 1 of 1:  24%|██▍       | 7/29 [00:17<00:59,  2.70s/it]
Processing speaker spk_4 track 1 of 1:  28%|██▊       | 8/29 [00:19<00:51,  2.47s/it]
Processing speaker spk_4 track 1 of 1:  31%|███       | 9/29 [00:21<00:44,  2.23s/it]
Processing speaker spk_4 track 1 of 1:  34%|███▍      | 10/29 [00:23<00:41,  2.18s/it]
Processing speaker spk_4 track 1 of 1:  38%|███▊      | 11/2





Processing speaker spk_5 track 1 of 1:   0%|          | 0/33 [00:00<?, ?it/s]
Processing speaker spk_5 track 1 of 1:   3%|▎         | 1/33 [00:00<00:27,  1.15it/s]
Processing speaker spk_5 track 1 of 1:   6%|▌         | 2/33 [00:04<01:09,  2.25s/it]
Processing speaker spk_5 track 1 of 1:   9%|▉         | 3/33 [00:05<00:55,  1.87s/it]
Processing speaker spk_5 track 1 of 1:  12%|█▏        | 4/33 [00:08<01:02,  2.17s/it]
Processing speaker spk_5 track 1 of 1:  15%|█▌        | 5/33 [00:08<00:45,  1.64s/it]
Processing speaker spk_5 track 1 of 1:  18%|█▊        | 6/33 [00:10<00:48,  1.79s/it]
Processing speaker spk_5 track 1 of 1:  21%|██        | 7/33 [00:11<00:39,  1.50s/it]
Processing speaker spk_5 track 1 of 1:  24%|██▍       | 8/33 [00:13<00:36,  1.47s/it]
Processing speaker spk_5 track 1 of 1:  27%|██▋       | 9/33 [00:14<00:36,  1.53s/it]
Processing speaker spk_5 track 1 of 1:  30%|███       | 10/33 [00:16<00:38,  1.68s/it]
Processing speaker spk_5 track 1 of 1:  33%|███▎      | 11/3


Starte Inference für Experiment: E12_stage2_bs12_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 12
  max_length      = 20
  output_dir_name = output_E12_stage2_bs12_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_40
  comment         = Stage-2-Modell, beam=12, len=20 (beste Konfiguration)
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_40


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]





Processing speaker spk_0 track 1 of 1:   0%|          | 0/35 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 1:   3%|▎         | 1/35 [00:00<00:24,  1.42it/s]
Processing speaker spk_0 track 1 of 1:   6%|▌         | 2/35 [00:01<00:19,  1.71it/s]
Processing speaker spk_0 track 1 of 1:   9%|▊         | 3/35 [00:01<00:20,  1.54it/s]
Processing speaker spk_0 track 1 of 1:  11%|█▏        | 4/35 [00:02<00:18,  1.72it/s]
Processing speaker spk_0 track 1 of 1:  14%|█▍        | 5/35 [00:02<00:16,  1.84it/s]
Processing speaker spk_0 track 1 of 1:  17%|█▋        | 6/35 [00:04<00:23,  1.24it/s]
Processing speaker spk_0 track 1 of 1:  20%|██        | 7/35 [00:08<00:55,  2.00s/it]
Processing speaker spk_0 track 1 of 1:  23%|██▎       | 8/35 [00:09<00:44,  1.66s/it]
Processing speaker spk_0 track 1 of 1:  26%|██▌       | 9/35 [00:11<00:42,  1.64s/it]
Processing speaker spk_0 track 1 of 1:  29%|██▊       | 10/35 [00:12<00:35,  1.41s/it]
Processing speaker spk_0 track 1 of 1:  31%|███▏      | 11/3





Processing speaker spk_1 track 1 of 1:   0%|          | 0/40 [00:00<?, ?it/s]
Processing speaker spk_1 track 1 of 1:   2%|▎         | 1/40 [00:00<00:25,  1.52it/s]
Processing speaker spk_1 track 1 of 1:   5%|▌         | 2/40 [00:01<00:35,  1.07it/s]
Processing speaker spk_1 track 1 of 1:   8%|▊         | 3/40 [00:02<00:33,  1.09it/s]
Processing speaker spk_1 track 1 of 1:  10%|█         | 4/40 [00:03<00:27,  1.31it/s]
Processing speaker spk_1 track 1 of 1:  12%|█▎        | 5/40 [00:04<00:32,  1.08it/s]
Processing speaker spk_1 track 1 of 1:  15%|█▌        | 6/40 [00:05<00:30,  1.10it/s]
Processing speaker spk_1 track 1 of 1:  18%|█▊        | 7/40 [00:05<00:26,  1.24it/s]
Processing speaker spk_1 track 1 of 1:  20%|██        | 8/40 [00:06<00:25,  1.26it/s]
Processing speaker spk_1 track 1 of 1:  22%|██▎       | 9/40 [00:07<00:23,  1.35it/s]
Processing speaker spk_1 track 1 of 1:  25%|██▌       | 10/40 [00:08<00:24,  1.22it/s]
Processing speaker spk_1 track 1 of 1:  28%|██▊       | 11/4





Processing speaker spk_2 track 1 of 3: 0it [00:00, ?it/s]

Processing speaker spk_2 track 2 of 3:   0%|          | 0/13 [00:00<?, ?it/s]
Processing speaker spk_2 track 2 of 3:   8%|▊         | 1/13 [00:00<00:07,  1.58it/s]
Processing speaker spk_2 track 2 of 3:  15%|█▌        | 2/13 [00:01<00:11,  1.04s/it]
Processing speaker spk_2 track 2 of 3:  23%|██▎       | 3/13 [00:03<00:12,  1.30s/it]
Processing speaker spk_2 track 2 of 3:  31%|███       | 4/13 [00:08<00:25,  2.88s/it]
Processing speaker spk_2 track 2 of 3:  38%|███▊      | 5/13 [00:09<00:17,  2.24s/it]
Processing speaker spk_2 track 2 of 3:  46%|████▌     | 6/13 [00:13<00:19,  2.77s/it]
Processing speaker spk_2 track 2 of 3:  54%|█████▍    | 7/13 [00:18<00:20,  3.45s/it]
Processing speaker spk_2 track 2 of 3:  62%|██████▏   | 8/13 [00:24<00:21,  4.38s/it]
Processing speaker spk_2 track 2 of 3:  69%|██████▉   | 9/13 [00:26<00:13,  3.36s/it]
Processing speaker spk_2 track 2 of 3:  77%|███████▋  | 10/13 [00:31<00:12,  4.11s/it]






Processing speaker spk_3 track 1 of 2:   0%|          | 0/18 [00:00<?, ?it/s]
Processing speaker spk_3 track 1 of 2:   6%|▌         | 1/18 [00:00<00:11,  1.45it/s]
Processing speaker spk_3 track 1 of 2:  11%|█         | 2/18 [00:01<00:12,  1.33it/s]
Processing speaker spk_3 track 1 of 2:  17%|█▋        | 3/18 [00:07<00:46,  3.12s/it]
Processing speaker spk_3 track 1 of 2:  22%|██▏       | 4/18 [00:09<00:38,  2.77s/it]
Processing speaker spk_3 track 1 of 2:  28%|██▊       | 5/18 [00:14<00:46,  3.55s/it]
Processing speaker spk_3 track 1 of 2:  33%|███▎      | 6/18 [00:15<00:33,  2.80s/it]
Processing speaker spk_3 track 1 of 2:  39%|███▉      | 7/18 [00:20<00:38,  3.47s/it]
Processing speaker spk_3 track 1 of 2:  44%|████▍     | 8/18 [00:23<00:33,  3.35s/it]
Processing speaker spk_3 track 1 of 2:  50%|█████     | 9/18 [00:27<00:31,  3.50s/it]
Processing speaker spk_3 track 1 of 2:  56%|█████▌    | 10/18 [00:28<00:21,  2.73s/it]
Processing speaker spk_3 track 1 of 2:  61%|██████    | 11/1





Processing speaker spk_4 track 1 of 1:   0%|          | 0/26 [00:00<?, ?it/s]
Processing speaker spk_4 track 1 of 1:   4%|▍         | 1/26 [00:02<01:04,  2.58s/it]
Processing speaker spk_4 track 1 of 1:   8%|▊         | 2/26 [00:03<00:40,  1.69s/it]
Processing speaker spk_4 track 1 of 1:  12%|█▏        | 3/26 [00:12<01:58,  5.17s/it]
Processing speaker spk_4 track 1 of 1:  15%|█▌        | 4/26 [00:26<03:06,  8.49s/it]
Processing speaker spk_4 track 1 of 1:  19%|█▉        | 5/26 [00:29<02:18,  6.62s/it]
Processing speaker spk_4 track 1 of 1:  23%|██▎       | 6/26 [00:31<01:41,  5.06s/it]
Processing speaker spk_4 track 1 of 1:  27%|██▋       | 7/26 [00:33<01:17,  4.07s/it]
Processing speaker spk_4 track 1 of 1:  31%|███       | 8/26 [00:36<01:05,  3.65s/it]
Processing speaker spk_4 track 1 of 1:  35%|███▍      | 9/26 [00:38<00:53,  3.16s/it]
Processing speaker spk_4 track 1 of 1:  38%|███▊      | 10/26 [00:46<01:13,  4.62s/it]
Processing speaker spk_4 track 1 of 1:  42%|████▏     | 11/2





[Acessing speaker spk_5 track 1 of 1:   0%|          | 0/32 [00:00<?, ?it/s]
[Acessing speaker spk_5 track 1 of 1:   3%|▎         | 1/32 [00:01<00:34,  1.11s/it]
[Acessing speaker spk_5 track 1 of 1:   6%|▋         | 2/32 [00:05<01:26,  2.88s/it]
[Acessing speaker spk_5 track 1 of 1:   9%|▉         | 3/32 [00:07<01:11,  2.45s/it]
[Acessing speaker spk_5 track 1 of 1:  12%|█▎        | 4/32 [00:10<01:17,  2.78s/it]
[Acessing speaker spk_5 track 1 of 1:  16%|█▌        | 5/32 [00:11<00:56,  2.10s/it]
[Acessing speaker spk_5 track 1 of 1:  19%|█▉        | 6/32 [00:14<00:59,  2.29s/it]
[Acessing speaker spk_5 track 1 of 1:  22%|██▏       | 7/32 [00:15<00:47,  1.90s/it]
[Acessing speaker spk_5 track 1 of 1:  25%|██▌       | 8/32 [00:18<01:00,  2.51s/it]
[Acessing speaker spk_5 track 1 of 1:  28%|██▊       | 9/32 [00:20<00:53,  2.34s/it]
[Acessing speaker spk_5 track 1 of 1:  31%|███▏      | 10/32 [00:23<00:52,  2.39s/it]
[Acessing speaker spk_5 track 1 of 1:  34%|███▍      | 11/3


Starting inference for experiment: E13_stage2_bs8_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 8
  max_length      = 20
  output_dir_name = output_E13_stage2_bs8_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_40
  comment         = Stage-2 model, beam=8, len=20
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_40


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 1: 35 iterations [output truncated]
Processing speaker spk_1 track 1 of 1: 40 iterations [output truncated]
Processing speaker spk_2 track 1 of 3: 0it [00:00, ?it/s]
Processing speaker spk_2 track 2 of 3: 13 iterations [output truncated]
Processing speaker spk_3 track 1 of 2: 18 iterations [output truncated]
Processing speaker spk_4 track 1 of 1: 26 iterations [output truncated]
Processing speaker spk_5 track 1 of 1: 32 iterations [output truncated]


########## Starting experiments for session_43 ##########

Starting inference for experiment: E11_stage2_bs3_len15
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 3
  max_length      = 15
  output_dir_name = output_E11_stage2_bs3_len15
  session_dir     = data-bin/dev_without_central_videos/dev/session_43
  comment         = Stage-2 model, beam=3, len=15
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_43


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 2: 29 iterations [output truncated]
Processing speaker spk_1 track 1 of 1: 32 iterations [output truncated]
Processing speaker spk_2 track 1 of 1: 31 iterations [output truncated]
Processing speaker spk_3 track 1 of 1: 33 iterations [output truncated]
Processing speaker spk_4 track 1 of 2: 16 iterations [output truncated]
Processing speaker spk_5 track 1 of 1: 37 iterations [output truncated]


Starting inference for experiment: E12_stage2_bs12_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 12
  max_length      = 20
  output_dir_name = output_E12_stage2_bs12_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_43
  comment         = Stage-2 model, beam=12, len=20 (best configuration)
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_43


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 2: 27 iterations [output truncated]
Processing speaker spk_1 track 1 of 1: 32 iterations [output truncated]
Processing speaker spk_2 track 1 of 1: 29 iterations [output truncated]
Processing speaker spk_3 track 1 of 1: 31 iterations [output truncated]
Processing speaker spk_4 track 1 of 2: 16 iterations [output truncated]
Processing speaker spk_5 track 1 of 1: 37 iterations [output truncated]


Starting inference for experiment: E13_stage2_bs8_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 8
  max_length      = 20
  output_dir_name = output_E13_stage2_bs8_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_43
  comment         = Stage-2 model, beam=8, len=20
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_43


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 2: 27 iterations [output truncated]
Processing speaker spk_1 track 1 of 1: 32 iterations [output truncated]
Processing speaker spk_2 track 1 of 1: 29 iterations [output truncated]
Processing speaker spk_3 track 1 of 1: 31 iterations [output truncated]
Processing speaker spk_4 track 1 of 2: 16 iterations [output truncated]
Processing speaker spk_5 track 1 of 1: 37 iterations [output truncated]


########## Starting experiments for session_49 ##########

Starting inference for experiment: E11_stage2_bs3_len15
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 3
  max_length      = 15
  output_dir_name = output_E11_stage2_bs3_len15
  session_dir     = data-bin/dev_without_central_videos/dev/session_49
  comment         = Stage-2 model, beam=3, len=15
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_49


Processing speakers:   0%|          | 0/6 [00:00<?, ?it/s]
Processing speaker spk_0 track 1 of 1: 12 iterations [output truncated]
Processing speaker spk_1 track 1 of 1: 15 iterations [output truncated]
Processing speaker spk_2 track 1 of 8: 100%|██████████| 1/1 [00:00<00:00,  2.26it/s]
Processing speaker spk_2 track 2 of 8: 100%|██████████| 1/1 [00:00<00:00,  1.96it/s]
Processing speaker spk_2 track 3 of 8: 100%|██████████| 1/1 [00:00<00:00,  1.92it/s]
Processing speaker spk_2 track 4 of 8: 100%|██████████| 4/4 [00:12<00:00,  3.04s/it]
Processing speaker spk_2 track 5 of 8: 2 iterations [output truncated]
Processing speaker spk_3 track 1 of 1: 21 iterations [output truncated]





[Acessing speaker spk_4 track 1 of 1:   0%|          | 0/24 [00:00<?, ?it/s]
[Acessing speaker spk_4 track 1 of 1:   4%|▍         | 1/24 [00:04<01:53,  4.93s/it]
[Acessing speaker spk_4 track 1 of 1:   8%|▊         | 2/24 [00:08<01:26,  3.95s/it]
[Acessing speaker spk_4 track 1 of 1:  12%|█▎        | 3/24 [00:13<01:32,  4.40s/it]
[Acessing speaker spk_4 track 1 of 1:  17%|█▋        | 4/24 [00:17<01:28,  4.44s/it]
[Acessing speaker spk_4 track 1 of 1:  21%|██        | 5/24 [00:21<01:17,  4.10s/it]
[Acessing speaker spk_4 track 1 of 1:  25%|██▌       | 6/24 [00:24<01:10,  3.92s/it]
[Acessing speaker spk_4 track 1 of 1:  29%|██▉       | 7/24 [00:27<01:01,  3.62s/it]
[Acessing speaker spk_4 track 1 of 1:  33%|███▎      | 8/24 [00:28<00:42,  2.68s/it]
[Acessing speaker spk_4 track 1 of 1:  38%|███▊      | 9/24 [00:28<00:29,  1.97s/it]
[Acessing speaker spk_4 track 1 of 1:  42%|████▏     | 10/24 [00:29<00:22,  1.62s/it]
[Acessing speaker spk_4 track 1 of 1:  46%|████▌     | 11/2





[Acessing speaker spk_5 track 1 of 2:   0%|          | 0/22 [00:00<?, ?it/s]
[Acessing speaker spk_5 track 1 of 2:   5%|▍         | 1/22 [00:00<00:18,  1.15it/s]
[Acessing speaker spk_5 track 1 of 2:   9%|▉         | 2/22 [00:01<00:12,  1.55it/s]
[Acessing speaker spk_5 track 1 of 2:  14%|█▎        | 3/22 [00:02<00:20,  1.06s/it]
[Acessing speaker spk_5 track 1 of 2:  18%|█▊        | 4/22 [00:03<00:14,  1.26it/s]
[Acessing speaker spk_5 track 1 of 2:  23%|██▎       | 5/22 [00:03<00:12,  1.34it/s]
[Acessing speaker spk_5 track 1 of 2:  27%|██▋       | 6/22 [00:04<00:12,  1.33it/s]
[Acessing speaker spk_5 track 1 of 2:  32%|███▏      | 7/22 [00:05<00:10,  1.48it/s]
[Acessing speaker spk_5 track 1 of 2:  36%|███▋      | 8/22 [00:06<00:10,  1.37it/s]
[Acessing speaker spk_5 track 1 of 2:  41%|████      | 9/22 [00:06<00:09,  1.38it/s]
[Acessing speaker spk_5 track 1 of 2:  45%|████▌     | 10/22 [00:09<00:14,  1.21s/it]
[Acessing speaker spk_5 track 1 of 2:  50%|█████     | 11/2


Starting inference for experiment: E12_stage2_bs12_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 12
  max_length      = 20
  output_dir_name = output_E12_stage2_bs12_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_49
  comment         = Stage-2 model, beam=12, len=20 (best configuration)
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_49


Starting inference for experiment: E13_stage2_bs8_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 8
  max_length      = 20
  output_dir_name = output_E13_stage2_bs8_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_49
  comment         = Stage-2 model, beam=8, len=20
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_49


########## Starting experiments for session_50 ##########

Starting inference for experiment: E11_stage2_bs3_len15
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 3
  max_length      = 15
  output_dir_name = output_E11_stage2_bs3_len15
  session_dir     = data-bin/dev_without_central_videos/dev/session_50
  comment         = Stage-2 model, beam=3, len=15
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_50


Starting inference for experiment: E12_stage2_bs12_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 12
  max_length      = 20
  output_dir_name = output_E12_stage2_bs12_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_50
  comment         = Stage-2 model, beam=12, len=20 (best configuration)
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_50


Starting inference for experiment: E13_stage2_bs8_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 8
  max_length      = 20
  output_dir_name = output_E13_stage2_bs8_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_50
  comment         = Stage-2 model, beam=8, len=20
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_50




########## Starte Experimente für session_54 ##########

Starte Inference für Experiment: E11_stage2_bs3_len15
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 3
  max_length      = 15
  output_dir_name = output_E11_stage2_bs3_len15
  session_dir     = data-bin/dev_without_central_videos/dev/session_54
  comment         = Stage-2-Modell, beam=3, len=15
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_54




Starte Inference für Experiment: E12_stage2_bs12_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 12
  max_length      = 20
  output_dir_name = output_E12_stage2_bs12_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_54
  comment         = Stage-2-Modell, beam=12, len=20 (beste Konfiguration)
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_54




Starte Inference für Experiment: E13_stage2_bs8_len20
  base_model      = cocktail_stage2
  model_type      = avsr_cocktail
  checkpoint_path = model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
  beam_size       = 8
  max_length      = 20
  output_dir_name = output_E13_stage2_bs8_len20
  session_dir     = data-bin/dev_without_central_videos/dev/session_54
  comment         = Stage-2-Modell, beam=8, len=20
Loading avsr_cocktail model...
Loading model from model-bin/avsr_cocktail_mcorec_stage2_lr5e-5_30k/checkpoint-30000
avsr_cocktail model loaded successfully!
Inferring 1 sessions using avsr_cocktail model
Processing session session_54



## 9 – Evaluation & Aggregation

In [5]:
# Evaluate the results of all experiments and append them to the shared CSV
df_dev = append_eval_results_for_experiments(
    experiments=EXPERIMENTS,
    session_ids=SESSION_IDS,
    target_csv="results_dev_subset_by_session.csv",
)



########## Evaluate für session_40 ##########
Starte Evaluate: /home/josch080/Projektgruppe/mcorec_train/bin/python script/evaluate.py --session_dir data-bin/dev_without_central_videos/dev/session_40 --output_dir_name output_ --label_dir_name labels
Evaluating 1 sessions

=== Evaluating session session_40 ===

--- Evaluating output dir: output_E01_bs4_len15 ---
Conversation clustering F1 score: 1.0
Speaker to WER: {'spk_0': 0.564, 'spk_1': 0.4281, 'spk_2': 0.5576, 'spk_3': 0.4283, 'spk_4': 0.4793, 'spk_5': 0.4189}
Speaker clustering F1 score: {'spk_0': 1.0, 'spk_1': 1.0, 'spk_2': 1.0, 'spk_3': 1.0, 'spk_4': 1.0, 'spk_5': 1.0}
Joint ASR-Clustering Error Rate: {'spk_0': 0.282, 'spk_1': 0.21405, 'spk_2': 0.2788, 'spk_3': 0.21415, 'spk_4': 0.23965, 'spk_5': 0.20945}

--- Evaluating output dir: output_E02_bs8_len15 ---
Conversation clustering F1 score: 1.0
Speaker to WER: {'spk_0': 0.561, 'spk_1': 0.4312, 'spk_2': 0.5506, 'spk_3': 0.4283, 'spk_4': 0.5041, 'spk_5': 0.4189}
Speaker clusterin
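
The per-speaker values logged above are consistent with the joint error being the mean of the WER and the clustering error, i.e. `JER = (WER + (1 - F1)) / 2`. The sketch below checks this against the three log lines for `output_E01_bs4_len15`. The formula is inferred from the logged numbers, not read from `script/evaluate.py`, and the helper names `parse_metric_line` and `joint_error` are hypothetical.

```python
import ast

# Log lines copied from the evaluate output above (session_40, output_E01_bs4_len15).
wer_line = ("Speaker to WER: {'spk_0': 0.564, 'spk_1': 0.4281, 'spk_2': 0.5576, "
            "'spk_3': 0.4283, 'spk_4': 0.4793, 'spk_5': 0.4189}")
f1_line = ("Speaker clustering F1 score: {'spk_0': 1.0, 'spk_1': 1.0, 'spk_2': 1.0, "
           "'spk_3': 1.0, 'spk_4': 1.0, 'spk_5': 1.0}")
jer_line = ("Joint ASR-Clustering Error Rate: {'spk_0': 0.282, 'spk_1': 0.21405, "
            "'spk_2': 0.2788, 'spk_3': 0.21415, 'spk_4': 0.23965, 'spk_5': 0.20945}")


def parse_metric_line(line: str) -> dict:
    """Return the dict literal that follows the first ': ' in a log line."""
    _, _, payload = line.partition(": ")
    return ast.literal_eval(payload)


def joint_error(wer: float, f1: float) -> float:
    """Assumed definition: mean of WER and clustering error (1 - F1)."""
    return 0.5 * (wer + (1.0 - f1))


wer = parse_metric_line(wer_line)
f1 = parse_metric_line(f1_line)
logged_jer = parse_metric_line(jer_line)

# Recompute the joint error per speaker and compare with the logged value.
jer = {spk: joint_error(wer[spk], f1[spk]) for spk in wer}
max_deviation = max(abs(jer[spk] - logged_jer[spk]) for spk in jer)
print(f"max deviation from logged JER: {max_deviation:.2e}")
```

With a clustering F1 of 1.0 for every speaker, the joint error reduces to half the WER, which is why a WER change shows up at half its size in the JER of such sessions.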

## 10 – Raw Results (per Session)

Full table of all Stage-2 experiments, broken down by session.

In [7]:
import pandas as pd

# Load the shared results CSV
df = pd.read_csv("results_dev_subset_by_session.csv")

stage2_models = [
    "E11_stage2_bs3_len15",
    "E12_stage2_bs12_len20",
    "E13_stage2_bs8_len20",
]

# Keep only the Stage-2 rows and sort by model and session
stage2_df = (
    df[df["model"].isin(stage2_models)]
    .sort_values(["model", "session"])
)

display(stage2_df)


| | session | exp | avg_conv_f1 | avg_speaker_wer | avg_joint_error | model | timestamp |
|---|---------|-----|-------------|-----------------|-----------------|-------|-----------|
| 10 | session_40 | output_E11_stage2_bs3_len15 | 1.0 | 0.55485 | 0.277425 | E11_stage2_bs3_len15 | 2025-11-30T14:02:31 |
| 23 | session_43 | output_E11_stage2_bs3_len15 | 1.0 | 0.5591 | 0.27955 | E11_stage2_bs3_len15 | 2025-11-30T14:02:45 |
| 36 | session_49 | output_E11_stage2_bs3_len15 | 1.0 | 0.47395 | 0.236975 | E11_stage2_bs3_len15 | 2025-11-30T14:02:58 |
| 49 | session_50 | output_E11_stage2_bs3_len15 | 0.571429 | 0.576617 | 0.502608 | E11_stage2_bs3_len15 | 2025-11-30T14:03:11 |
| 62 | session_54 | output_E11_stage2_bs3_len15 | 0.666667 | 0.57232 | 0.45282 | E11_stage2_bs3_len15 | 2025-11-30T14:03:24 |
| 11 | session_40 | output_E12_stage2_bs12_len20 | 1.0 | 0.529883 | 0.264942 | E12_stage2_bs12_len20 | 2025-11-30T14:02:31 |
| 24 | session_43 | output_E12_stage2_bs12_len20 | 1.0 | 0.54305 | 0.271525 | E12_stage2_bs12_len20 | 2025-11-30T14:02:45 |
| 37 | session_49 | output_E12_stage2_bs12_len20 | 1.0 | 0.4647 | 0.23235 | E12_stage2_bs12_len20 | 2025-11-30T14:02:58 |
| 50 | session_50 | output_E12_stage2_bs12_len20 | 0.571429 | 0.5696 | 0.4991 | E12_stage2_bs12_len20 | 2025-11-30T14:03:11 |
| 63 | session_54 | output_E12_stage2_bs12_len20 | 0.666667 | 0.548 | 0.44066 | E12_stage2_bs12_len20 | 2025-11-30T14:03:24 |
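
The per-model figures used in the comparison in section 11 are unweighted means over the five dev sessions. As a cross-check, the following sketch averages the `avg_speaker_wer` values from the rows above using only the standard library. The numbers are copied from the table; the variable names are illustrative and not part of the notebook.

```python
from statistics import fmean

# avg_speaker_wer per session (session_40, 43, 49, 50, 54), copied from the table above.
per_session_wer = {
    "E11_stage2_bs3_len15": [0.55485, 0.5591, 0.47395, 0.576617, 0.57232],
    "E12_stage2_bs12_len20": [0.529883, 0.54305, 0.4647, 0.5696, 0.548],
}

# Unweighted mean over sessions: every session counts equally,
# regardless of how many speakers or words it contains.
mean_wer = {model: fmean(values) for model, values in per_session_wer.items()}

for model, value in mean_wer.items():
    print(f"{model}: {value:.6f}")
```

This gives about 0.547367 for E11 and 0.531047 for E12, matching the `avg_speaker_wer_stage2` column in section 11. Because the mean is unweighted, the two sessions with imperfect conversation clustering (session_50 and session_54) influence the joint error as much as the three sessions with an F1 of 1.0.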


## 11 – Comparison: Stage-2 vs. BL4

The Stage-2 results are compared with the BL4 results at identical hyperparameters (`beam_size`, `max_length`).
Positive Δ values mean that Stage-2 is *worse* than BL4.
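
Matching runs by hyperparameters depends on recovering `beam_size` and `max_length` for every model. The `parse_beam_len` helper in the cell below splits on underscores and therefore only handles names of the form `E08_bs8_len20`; it would fail on Stage-2 names such as `E11_stage2_bs3_len15`, which is why those are looked up in a dict instead. A regex-based variant could cover both schemes. This is a minimal sketch under the assumption that every model name ends in `_bs<N>_len<M>`; the name `parse_beam_len_any` is hypothetical.

```python
import re

# Matches the trailing "_bs<beam>_len<length>" part of a model name.
_BEAM_LEN = re.compile(r"_bs(\d+)_len(\d+)$")


def parse_beam_len_any(model_name: str) -> tuple[int, int]:
    """Extract (beam_size, max_length) from names like 'E08_bs8_len20'
    or 'E11_stage2_bs3_len15'."""
    match = _BEAM_LEN.search(model_name)
    if match is None:
        raise ValueError(f"no beam/length suffix in model name: {model_name!r}")
    return int(match.group(1)), int(match.group(2))


for name in ("E08_bs8_len20", "E11_stage2_bs3_len15", "E12_stage2_bs12_len20"):
    print(name, "->", parse_beam_len_any(name))
```

Raising on an unparseable name, rather than returning a placeholder, keeps a misnamed run from silently dropping out of the comparison.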

In [14]:
import pandas as pd

# Load the CSV
dev_df = pd.read_csv("results_dev_subset_by_session.csv")

# Average the Stage-2 results over all sessions
stage2_models = list(EXPERIMENTS.keys())
stage2_df = dev_df[dev_df["model"].isin(stage2_models)].copy()

stage2_by_model = (
    stage2_df
    .groupby("model")[["avg_conv_f1", "avg_speaker_wer", "avg_joint_error"]]
    .mean()
    .reset_index()
)

# Map the hyperparameters from the experiments dict into new columns
stage2_by_model["beam_size"] = stage2_by_model["model"].map(
    lambda m: EXPERIMENTS[m]["beam_size"]
)
stage2_by_model["max_length"] = stage2_by_model["model"].map(
    lambda m: EXPERIMENTS[m]["max_length"]
)

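# Aggregate the fine-tuned (BL4) experiments.
# Beam size and max length are parsed directly from the model name.
# Note: the "E0" prefix below matches E01..E09 only; an E10 model is not selected.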

ft_df = dev_df[dev_df["model"].str.startswith("E0")].copy()

def parse_beam_len(model_name: str):
    # Example: "E08_bs8_len20" -> beam=8, length=20
    parts = model_name.split("_")
    beam = int(parts[1].replace("bs", ""))
    length = int(parts[2].replace("len", ""))
    return beam, length

# Returning a pd.Series lets apply() expand the tuple into two columns
ft_df[["beam_size", "max_length"]] = ft_df["model"].apply(
    lambda m: pd.Series(parse_beam_len(m))
)

# Average the BL4 results over sessions
ft_agg = (
    ft_df
    .groupby(["model", "beam_size", "max_length"])[["avg_speaker_wer", "avg_joint_error"]]
    .mean()
    .reset_index()
)

# Join Stage-2 and BL4 on (beam_size, max_length); the left join keeps
# Stage-2 rows without a BL4 counterpart (their BL4 columns become NaN)
comparison = pd.merge(
    stage2_by_model,
    ft_agg,
    on=["beam_size", "max_length"],
    how="left",
    suffixes=("_stage2", "_ft"),
)

# Δ columns: positive = Stage-2 worse than BL4
comparison["Δ_WER_stage2_minus_ft"] = (
    comparison["avg_speaker_wer_stage2"] - comparison["avg_speaker_wer_ft"]
)
comparison["Δ_JER_stage2_minus_ft"] = (
    comparison["avg_joint_error_stage2"] - comparison["avg_joint_error_ft"]
)

# Arrange the columns for display
comparison_table = comparison[
    [
        "model_stage2", "beam_size", "max_length",
        "model_ft", "avg_speaker_wer_ft", "avg_joint_error_ft",
        "avg_speaker_wer_stage2", "avg_joint_error_stage2",
        "Δ_WER_stage2_minus_ft", "Δ_JER_stage2_minus_ft",
    ]
].sort_values(["max_length", "beam_size"])

print("Vergleich Stage-2 vs. finetuned (gleiche beam/max_length, aus results_dev_subset_by_session):")
display(comparison_table)


Vergleich Stage-2 vs. finetuned (gleiche beam/max_length, aus results_dev_subset_by_session):


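| | model_stage2 | beam_size | max_length | model_ft | avg_speaker_wer_ft | avg_joint_error_ft | avg_speaker_wer_stage2 | avg_joint_error_stage2 | Δ_WER_stage2_minus_ft | Δ_JER_stage2_minus_ft |
|---|--------------|-----------|------------|----------|--------------------|--------------------|------------------------|------------------------|-----------------------|-----------------------|
| 0 | E11_stage2_bs3_len15 | 3 | 15 | NaN | NaN | NaN | 0.547367 | 0.349876 | NaN | NaN |
| 2 | E13_stage2_bs8_len20 | 8 | 20 | E08_bs8_len20 | 0.495798 | 0.324091 | 0.532463 | 0.342423 | 0.036665 | 0.018332 |
| 1 | E12_stage2_bs12_len20 | 12 | 20 | E09_bs12_len20 | 0.495416 | 0.3239 | 0.531047 | 0.341715 | 0.035631 | 0.017815 |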
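
The Δ columns can be reproduced by hand from the averaged WER values. The sketch below mirrors the left join on `(beam_size, max_length)` with plain dictionaries, reports a missing BL4 counterpart as `None`, and additionally derives the relative WER change, which the notebook itself does not compute. The values are copied from the table above; the variable names are illustrative.

```python
# Mean avg_speaker_wer keyed by (beam_size, max_length), copied from the table above.
stage2_wer = {(3, 15): 0.547367, (8, 20): 0.532463, (12, 20): 0.531047}
bl4_wer = {(8, 20): 0.495798, (12, 20): 0.495416}

deltas = {}
for key, wer_stage2 in stage2_wer.items():
    wer_bl4 = bl4_wer.get(key)  # None if no BL4 run shares these hyperparameters
    if wer_bl4 is None:
        deltas[key] = None
        continue
    absolute = wer_stage2 - wer_bl4   # positive = Stage-2 worse than BL4
    relative = absolute / wer_bl4     # change as a fraction of the BL4 WER
    deltas[key] = (absolute, relative)

for (beam, length), delta in sorted(deltas.items()):
    if delta is None:
        print(f"beam={beam:>2}, len={length}: no BL4 counterpart")
    else:
        print(f"beam={beam:>2}, len={length}: ΔWER={delta[0]:+.6f} ({delta[1]:+.1%})")
```

For the two matched configurations the absolute increase of about 0.036 corresponds to a relative WER increase of roughly 7 % over BL4.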


## 12 – Interpretation

| Configuration | WER BL4 | WER Stage-2 | Δ WER | JER BL4 | JER Stage-2 | Δ JER |
|---------------|---------|-------------|-------|---------|-------------|-------|
| beam=8,  len=20 | 0.4958 | 0.5325 | +0.037 | 0.3241 | 0.3424 | +0.018 |
| beam=12, len=20 | 0.4954 | 0.5310 | +0.036 | 0.3239 | 0.3417 | +0.018 |

The Stage-2 model is consistently **worse** than BL4 in both configurations that could be compared.
E11 (beam=3, len=15) was not matched to a BL4 run with the same hyperparameters in the comparison above, so no Δ is available for it.

**Explanation:** BL4 had already been optimized extensively on MCoRec. The additional fine-tuning
with 30,000 steps and a learning rate of 5·10⁻⁵ overfitted the model: it moved away from the
better-generalizing solution without learning any new information.

**Conclusion:** Stage-2 fine-tuning is not pursued further. BL4 remains the
reference model for all subsequent experiments.
