## 🧠 Brain-To-Text: Extensions

This notebook builds on the **Brain-To-Text Baseline** by **Piotr Jurczyk** 🔗 https://www.kaggle.com/code/piotrjurczyk/brain-to-text-baseline

- The dataset comes from a **saved execution of the original notebook** (best model + train/val/test predictions), enabling **fast EDA** without rerunning long training.
- **Data loading, preprocessing, and splits** are kept identical to ensure full reproducibility.


### ✨ Added Contributions
- **Beam search decoding**
- **Data augmentation**
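
To make the beam search contribution concrete, here is a minimal sketch of CTC prefix beam search on a toy two-frame example. It is only an illustration of the general technique: the notebook itself delegates decoding to `pyctcdecode`, and the function names `ctc_greedy` and `ctc_prefix_beam_search` are hypothetical helpers, not part of that library.

```python
import math
from collections import defaultdict

import numpy as np

NEG_INF = -math.inf


def logsumexp(*xs):
    """Stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_greedy(log_probs, blank=0):
    """Best label per frame, then collapse repeats and drop blanks."""
    out, prev = [], None
    for s in log_probs.argmax(axis=1):
        if s != blank and s != prev:
            out.append(int(s))
        prev = s
    return tuple(out)


def ctc_prefix_beam_search(log_probs, beam_width=3, blank=0):
    """Return the most probable label sequence for (T, V) log-probabilities.

    Each prefix keeps two scores: paths ending in blank (p_b) and
    paths ending in a non-blank label (p_nb).
    """
    beams = {(): (0.0, NEG_INF)}
    for t in range(log_probs.shape[0]):
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s in range(log_probs.shape[1]):
                p = log_probs[t, s]
                if s == blank:
                    # A blank keeps the prefix unchanged
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + p, p_nb + p), nb)
                    continue
                new_prefix = prefix + (s,)
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == s:
                    # Extending with a repeat requires a blank in between
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + p))
                    # Without a blank, the repeat collapses into the same prefix
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + p))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + p, p_nb + p))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    return max(beams.items(), key=lambda kv: logsumexp(*kv[1]))[0]


# Toy case: labels are {0: blank, 1: "a"}, two frames with P(blank)=0.6, P(a)=0.4
toy = np.log(np.array([[0.6, 0.4], [0.6, 0.4]]))
print(ctc_greedy(toy))              # () since blank wins every frame
print(ctc_prefix_beam_search(toy))  # (1,) since the paths "a-", "-a", "aa" sum to 0.64
```

The toy case shows why beam search can beat greedy decoding: the empty output has probability 0.36, while the three alignments that collapse to "a" total 0.64.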

### 🚀 Suggested Next Improvements
- Fine-tune **beam search parameters**
- Integrate a **Transformer / Language Model** during decoding to:
  - Select the best sentence among beam candidates
  - Perform **word-by-word orthographic correction**  
    *(tested locally with **LLaMA 3 2B**, carefully constrained to avoid reformulation)*
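
As a rough sketch of the first idea (choosing among beam candidates with a language model), the snippet below rescores an n-best list by combining the decoder score with a weighted LM score. Everything here is an assumption for illustration: `make_unigram_lm` is a toy stand-in for a real language model, `rescore`, `alpha`, and `beta` are hypothetical names, and the candidate strings and scores are invented rather than taken from the notebook's output.

```python
import math
from collections import Counter


def make_unigram_lm(corpus):
    """Toy add-one-smoothed unigram LM; a real setup would use a stronger model."""
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words

    def score(sentence):
        # Sum of per-word log-probabilities
        return sum(math.log((counts[w] + 1) / (total + vocab)) for w in sentence.split())

    return score


def rescore(candidates, lm_score, alpha=0.5, beta=0.0):
    """Pick the candidate maximizing decoder_logp + alpha * lm_logp + beta * n_words.

    candidates: list of (text, decoder_logp) pairs, e.g. reduced from a beam list.
    """
    def combined(item):
        text, decoder_logp = item
        return decoder_logp + alpha * lm_score(text) + beta * len(text.split())

    return max(candidates, key=combined)[0]


# Invented example: the decoder slightly prefers the misspelled candidate
lm = make_unigram_lm(["i get tired with the song", "you get a surprise"])
nbest = [("i get tcirend", -4.8), ("i get tired", -5.0)]
print(rescore(nbest, lm, alpha=0.0))  # decoder score only
print(rescore(nbest, lm, alpha=0.5))  # LM-weighted choice
```

In practice `alpha` and `beta` would be tuned on the validation set alongside the beam search parameters.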



---


## Note to the Kaggle community 🎄


I am taking a short **Kaggle break for Christmas**, so I will not actively iterate on this notebook for a while.


Anyone interested is **very welcome to reuse, adapt, or extend this code** for their own experiments or submissions.


Happy holidays and good luck! 🎅✨

In [1]:
"""
================================================================================
BRAIN-TO-TEXT '25 : TRAINING WITH DATA AUGMENTATION + TEST INFERENCE
================================================================================
Pipeline:
1. Model training with strong data augmentation (60 epochs, ~2h GPU)
2. Beam search inference on TEST with fixed params (bw=50, prune=-12, min=-8)
3. Generation of submission.csv for Kaggle

Estimated time: 2h15 (2h training + 15min test)
Expected gain: WER 29.5% → 25–27% (data augmentation)
================================================================================
"""

# ============================================================================
# IMPORTS
# ============================================================================

import os
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
import numpy as np
import pandas as pd
from tqdm import tqdm
import h5py
from glob import glob
import json
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Dependency installation (Kaggle-compatible)
print("Installing dependencies...")
os.system('pip install -q jiwer')
os.system('pip install -q pyctcdecode')

print(f"\n{'#'*80}")
print(f"##  BRAIN-TO-TEXT '25 - TRAIN + DATA AUGMENTATION")
print(f"{'#'*80}\n")


# ============================================================================
# DATA AUGMENTATION MODULES
# ============================================================================

class SpecAugmentStrong(nn.Module):
    """Strong SpecAugment implementation (time + frequency masking)."""
    def __init__(self, prob=0.8, time_mask_param=60, feature_mask_param=50,
                 num_time_masks=2, num_feat_masks=2):
        super().__init__()
        self.prob = prob
        self.time_mask_param = time_mask_param
        self.feature_mask_param = feature_mask_param
        self.num_time_masks = num_time_masks
        self.num_feat_masks = num_feat_masks

    def forward(self, x):
        # Apply augmentation only during training and with given probability
        if not self.training or torch.rand(1) > self.prob:
            return x
        
        b, c, t = x.size()
        x_aug = x.clone()
        
        # Time masking
        for _ in range(self.num_time_masks):
            mask_len = torch.randint(0, min(self.time_mask_param, t // 2), (1,)).item()
            if mask_len > 0 and t > mask_len:
                t0 = torch.randint(0, t - mask_len, (1,)).item()
                x_aug[:, :, t0:t0 + mask_len] = 0
        
        # Feature masking
        for _ in range(self.num_feat_masks):
            mask_feat = torch.randint(0, min(self.feature_mask_param, c // 2), (1,)).item()
            if mask_feat > 0 and c > mask_feat:
                f0 = torch.randint(0, c - mask_feat, (1,)).item()
                x_aug[:, f0:f0 + mask_feat, :] = 0
        
        return x_aug


class NoiseInjection(nn.Module):
    """Gaussian noise injection."""
    def __init__(self, prob=0.5, noise_std=0.05):
        super().__init__()
        self.prob = prob
        self.noise_std = noise_std
    
    def forward(self, x):
        if not self.training or torch.rand(1) > self.prob:
            return x
        noise = torch.randn_like(x) * self.noise_std
        return x + noise


class TimeWarping(nn.Module):
    """Temporal warping augmentation."""
    def __init__(self, prob=0.4, warp_factor_range=(0.9, 1.1)):
        super().__init__()
        self.prob = prob
        self.warp_min, self.warp_max = warp_factor_range
    
    def forward(self, x):
        if not self.training or torch.rand(1) > self.prob:
            return x
        
        b, c, t = x.size()
        warp_factor = self.warp_min + torch.rand(1).item() * (self.warp_max - self.warp_min)
        new_t = int(t * warp_factor)
        new_t = max(10, min(new_t, t * 2))
        
        # Interpolate along time axis
        x_warped = torch.nn.functional.interpolate(
            x, size=new_t, mode='linear', align_corners=False
        )
        
        # Crop or pad back to original length
        if new_t < t:
            x_warped = torch.nn.functional.pad(x_warped, (0, t - new_t))
        elif new_t > t:
            x_warped = x_warped[:, :, :t]
        
        return x_warped


# ============================================================================
# MODEL WITH DATA AUGMENTATION
# ============================================================================

class BrainCTCModel(nn.Module):
    """CTC-based Brain-to-Text model with strong data augmentation."""
    def __init__(self, input_dim=512, hidden_dim=512, num_layers=3, 
                 vocab_size=50, dropout=0.5):
        super().__init__()
        
        # Augmentation pipeline (applied only during training)
        self.augmentations = nn.Sequential(
            NoiseInjection(prob=0.5, noise_std=0.05),
            TimeWarping(prob=0.4, warp_factor_range=(0.9, 1.1)),
            SpecAugmentStrong(
                prob=0.8,
                time_mask_param=60,
                feature_mask_param=50,
                num_time_masks=2,
                num_feat_masks=2
            )
        )
        
        # Convolutional feature extractor
        self.cnn = nn.Sequential(
            nn.Conv1d(input_dim, 256, kernel_size=11, stride=2, padding=5),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(256, 256, kernel_size=5, stride=2, padding=2),
            nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout)
        )
        
        # Bidirectional LSTM encoder
        self.lstm = nn.LSTM(
            256, hidden_dim, num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Output projection to vocabulary size
        self.fc = nn.Linear(hidden_dim * 2, vocab_size)
    
    def forward(self, x, lengths):
        # Input: (B, T, C) → (B, C, T)
        x = x.transpose(1, 2)
        
        # Apply augmentations during training only
        if self.training:
            x = self.augmentations(x)
        
        # CNN feature extraction
        x = self.cnn(x)
        x = x.transpose(1, 2)
        
        # Approximate sequence lengths after the two stride-2 convolutions (T // 4)
        cnn_lengths = (lengths.cpu() // 4).clamp(min=1)
        
        # Pack sequences for LSTM
        x_packed = pack_padded_sequence(
            x, cnn_lengths, batch_first=True, enforce_sorted=False
        )
        lstm_out, _ = self.lstm(x_packed)
        lstm_out, _ = pad_packed_sequence(lstm_out, batch_first=True)
        
        # Linear projection + log-softmax for CTC
        logits = self.fc(lstm_out)
        log_probs = torch.log_softmax(logits, dim=-1)
        
        # Output shape: (T, B, V)
        return log_probs.transpose(0, 1), cnn_lengths


# ============================================================================
# DATASET
# ============================================================================

class BrainDataset(Dataset):
    """Brain-to-Text dataset loader."""
    def __init__(self, split='train'):
        self.samples = []
        
        data_dir = '/kaggle/input/brain-to-text-25/t15_copyTask_neuralData/hdf5_data_final/'
        pattern = f'{data_dir}/**/data_{split}.hdf5'
        files = sorted(glob(pattern, recursive=True))
        
        print(f"Loading {split} split: {len(files)} files")
        
        for filepath in tqdm(files, desc=split):
            try:
                with h5py.File(filepath, 'r') as f:
                    for trial_key in f.keys():
                        trial = f[trial_key]
                        
                        # Neural features
                        neural = trial['input_features'][:]
                        n_steps = trial.attrs['n_time_steps']
                        neural = neural[:n_steps]
                        neural = (neural - neural.mean()) / (neural.std() + 1e-8)
                        
                        # Sentence label
                        sentence = trial.attrs.get('sentence_label', '')
                        if isinstance(sentence, bytes):
                            sentence = sentence.decode('utf-8')
                        
                        self.samples.append({
                            'neural': torch.FloatTensor(neural),
                            'sentence': sentence.lower(),
                            'length': len(neural)
                        })
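            except Exception as e:
                # Report unreadable files instead of skipping them silently
                print(f"Warning: skipped {filepath}: {e}")
                continue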
        
        print(f"✓ {len(self.samples)} samples loaded")
    
    def __len__(self):
        return len(self.samples)
    
    def __getitem__(self, idx):
        return self.samples[idx]


# ============================================================================
# COLLATE FUNCTION
# ============================================================================

def collate_fn(batch, char2idx):
    """Pads features and encodes text targets for CTC training."""
    features = [s['neural'] for s in batch]
    sentences = [s['sentence'] for s in batch]
    lengths = torch.LongTensor([s['length'] for s in batch])
    
    # Pad neural features
    features_padded = pad_sequence(features, batch_first=True)
    
    # Encode sentences into character indices
    targets = []
    target_lengths = []
    for sentence in sentences:
        # Skip out-of-vocabulary characters: CTC targets must not contain the blank index
        encoded = [char2idx[c] for c in sentence if c in char2idx]
        targets.extend(encoded)
        target_lengths.append(len(encoded))
    
    targets = torch.LongTensor(targets)
    target_lengths = torch.LongTensor(target_lengths)
    
    return features_padded, targets, lengths, target_lengths


# ============================================================================
# VOCABULARY
# ============================================================================

def build_vocabulary():
    """Builds character-level vocabulary."""
    # Special characters
    chars = [' ', '!', "'", ',', '-', '.', ';', '?', '[', ']']
    
    # Letters a-z
    chars += [chr(i) for i in range(ord('a'), ord('z') + 1)]
    
    # Curly apostrophe (Unicode U+2019)
    chars += ['\u2019']
    
    char2idx = {'<BLANK>': 0}
    for i, c in enumerate(chars, 1):
        char2idx[c] = i
    
    idx2char = {v: k for k, v in char2idx.items()}
    
    return char2idx, idx2char


# ============================================================================
# TRAINING
# ============================================================================

def train_model():
    """Full training loop with validation and checkpointing."""
    print(f"\n{'='*80}")
    print("TRAINING WITH DATA AUGMENTATION")
    print(f"{'='*80}\n")
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f"Device: {device}\n")
    
    # Vocabulary
    char2idx, idx2char = build_vocabulary()
    print(f"Vocabulary size: {len(char2idx)} characters\n")
    
    # Datasets and loaders
    train_dataset = BrainDataset('train')
    val_dataset = BrainDataset('val')
    
    train_loader = DataLoader(
        train_dataset,
        batch_size=64,
        shuffle=True,
        collate_fn=lambda b: collate_fn(b, char2idx),
        num_workers=2,
        pin_memory=True
    )
    
    val_loader = DataLoader(
        val_dataset,
        batch_size=64,
        shuffle=False,
        collate_fn=lambda b: collate_fn(b, char2idx),
        num_workers=2,
        pin_memory=True
    )
    
    # Model
    model = BrainCTCModel(
        input_dim=512,
        hidden_dim=512,
        num_layers=3,
        vocab_size=len(char2idx),
        dropout=0.5
    ).to(device)
    
    # Optimizer and scheduler
    optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    scheduler = OneCycleLR(
        optimizer,
        max_lr=1e-3,
        steps_per_epoch=len(train_loader),
        epochs=60
    )
    
    # CTC loss
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
    
    print("Configuration:")
    print("  - Epochs: 60")
    print("  - Batch size: 64")
    print("  - Learning rate: 1e-3")
    print("  - Augmentations: SpecAugment + Noise + TimeWarp")
    print(f"{'='*80}\n")
    
    best_wer = float('inf')
    
    # Training loop
    for epoch in range(60):
        model.train()
        train_loss = 0.0
        
        pbar = tqdm(train_loader, desc=f'Epoch {epoch+1}/60')
        for features, targets, lengths, target_lengths in pbar:
            features = features.to(device)
            targets = targets.to(device)
            
            optimizer.zero_grad()
            
            log_probs, cnn_lengths = model(features, lengths)
            loss = ctc_loss(log_probs, targets, cnn_lengths, target_lengths)
            
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
            optimizer.step()
            scheduler.step()
            
            train_loss += loss.item()
            pbar.set_postfix({'loss': f'{loss.item():.4f}'})
        
        # Validation every 10 epochs
        if (epoch + 1) % 10 == 0:
            model.eval()
            from jiwer import wer
            
            all_preds = []
            all_targets = []
            
            with torch.no_grad():
                for features, _, lengths, _ in val_loader:
                    features = features.to(device)
                    log_probs, cnn_lengths = model(features, lengths)
                    
                    for b in range(log_probs.size(1)):
                        logits = log_probs[:cnn_lengths[b], b].cpu().numpy()
                        pred_indices = np.argmax(logits, axis=1)
                        
                        # Greedy CTC decoding
                        decoded = []
                        prev = None
                        for idx in pred_indices:
                            if idx != 0 and idx != prev:
                                decoded.append(idx2char.get(idx, ''))
                            prev = idx
                        
                        all_preds.append(''.join(decoded))
                        all_targets.append(
                            val_dataset.samples[len(all_preds)-1]['sentence']
                        )
            
            val_wer = wer(
                [t.lower() for t in all_targets],
                [p.lower() for p in all_preds]
            ) * 100
            
            print(f"\nEpoch {epoch+1} - WER: {val_wer:.2f}%")
            
            # Save best model
            if val_wer < best_wer:
                best_wer = val_wer
                torch.save({
                    'epoch': epoch + 1,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict(),
                    'wer': val_wer,
                    'char2idx': char2idx,
                    'idx2char': idx2char,
                    'config': {
                        'hidden_dim': 512,
                        'num_layers': 3,
                        'dropout': 0.5
                    }
                }, 'best_model_augmented.pt')
                print(f"✓ Model saved (WER: {val_wer:.2f}%)")
    
    print(f"\n{'='*80}")
    print("TRAINING COMPLETED")
    print(f"Best WER: {best_wer:.2f}%")
    print(f"{'='*80}\n")
    
    return char2idx, idx2char


# ============================================================================
# BEAM SEARCH TEST INFERENCE
# ============================================================================

def inference_test(char2idx):
    """Runs beam search inference on the TEST set and creates submission file."""
    print(f"\n{'='*80}")
    print("TEST INFERENCE WITH BEAM SEARCH")
    print(f"{'='*80}\n")
    
    from pyctcdecode import build_ctcdecoder
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # Load trained model
    checkpoint = torch.load('best_model_augmented.pt', map_location=device)
    
    model = BrainCTCModel(
        input_dim=512,
        hidden_dim=512,
        num_layers=3,
        vocab_size=len(char2idx),
        dropout=0.5
    ).to(device)
    
    model.load_state_dict(checkpoint['model_state_dict'])
    model.eval()
    
    print(f"✓ Model loaded (val WER: {checkpoint['wer']:.2f}%)")
    
    # Build CTC decoder vocabulary
    vocab_list = [''] * len(char2idx)
    for char, idx in char2idx.items():
        vocab_list[idx] = '' if char == '<BLANK>' else char
    
    decoder = build_ctcdecoder(labels=vocab_list)
    print("✓ Decoder created (beam_width=50)\n")
    
    # Load TEST set
    test_dataset = BrainDataset('test')
    
    all_preds = []
    print("Running inference on TEST set...")
    
    with torch.no_grad():
        for sample in tqdm(test_dataset.samples):
            features = sample['neural'].unsqueeze(0).to(device)
            length = torch.LongTensor([sample['length']])
            
            log_probs, cnn_lengths = model(features, length)
            logits = log_probs[:cnn_lengths[0], 0].cpu().numpy()
            probs = np.exp(logits)
            
            try:
                beam_results = decoder.decode_beams(
                    probs,
                    beam_width=50,
                    beam_prune_logp=-12.0,
                    token_min_logp=-8.0
                )
                pred_text = beam_results[0][0] if beam_results else ""
            except Exception:
                pred_text = ""
            
            all_preds.append(pred_text)
    
    # Save Kaggle submission
    with open('submission.csv', 'w', encoding='utf-8') as f:
        f.write('id,text\n')
        for i, pred in enumerate(all_preds):
            pred_clean = pred.replace('"', '""')
            f.write(f'{i},"{pred_clean}"\n')
    
    print(f"\n✓ Submission saved: submission.csv ({len(all_preds)} predictions)")
    
    print("\nExamples:")
    for i in range(min(5, len(all_preds))):
        print(f"  {i}: {all_preds[i][:60]}")
    
    print(f"\n{'='*80}")
    print("🚀 Ready for Kaggle submission!")
    print(f"{'='*80}\n")


# ============================================================================
# MAIN ENTRY POINT
# ============================================================================

def main():
    start = datetime.now()
    
    # Training phase
    char2idx, idx2char = train_model()
    
    # Test inference phase
    inference_test(char2idx)
    
    duration = (datetime.now() - start).total_seconds() / 60
    print(f"\n⏱️  Total runtime: {duration:.1f} min")


if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        print(f"\n❌ ERROR: {e}")
        import traceback
        traceback.print_exc()


Installing dependencies...
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.2/3.2 MB 93.5 MB/s eta 0:00:00
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.0/61.0 kB 4.2 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 538.1/538.1 kB 35.0 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.0/18.0 MB 113.4 MB/s eta 0:00:00


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 2.26.0 requires google-cloud-bigquery-storage<3.0.0,>=2.30.0, which is not installed.
cesium 0.12.4 requires numpy<3.0,>=2.0, but you have numpy 1.26.4 which is incompatible.
google-colab 1.0.0 requires jupyter-server==2.14.0, but you have jupyter-server 2.12.5 which is incompatible.
google-colab 1.0.0 requires requests==2.32.4, but you have requests 2.32.5 which is incompatible.
dopamine-rl 4.1.2 requires gymnasium>=1.0.0, but you have gymnasium 0.29.0 which is incompatible.
jaxlib 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.
thinc 8.3.6 requires numpy<3.0.0,>=2.0.0, but you have numpy 1.26.4 which is incompatible.
opencv-contrib-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
opencv-python 4.12.0.88 requ


################################################################################
##  BRAIN-TO-TEXT '25 - TRAIN + DATA AUGMENTATION
################################################################################


TRAINING WITH DATA AUGMENTATION

Device: cuda

Vocabulary size: 38 characters

Loading train split: 45 files


train: 100%|██████████| 45/45 [03:25<00:00,  4.56s/it]


✓ 8072 samples loaded
Loading val split: 41 files


val: 100%|██████████| 41/41 [00:34<00:00,  1.18it/s]


✓ 1426 samples loaded
Configuration:
  - Epochs: 60
  - Batch size: 64
  - Learning rate: 1e-3
  - Augmentations: SpecAugment + Noise + TimeWarp



Epoch 1/60: 100%|██████████| 127/127 [01:12<00:00,  1.74it/s, loss=3.0352]
Epoch 2/60: 100%|██████████| 127/127 [01:09<00:00,  1.83it/s, loss=2.8708]
Epoch 3/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=2.8723]
Epoch 4/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=2.8721]
Epoch 5/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=2.7023]
Epoch 6/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=1.8585]
Epoch 7/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=1.5385]
Epoch 8/60: 100%|██████████| 127/127 [01:12<00:00,  1.76it/s, loss=1.1238]
Epoch 9/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=1.4150]
Epoch 10/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=1.0730]



Epoch 10 - WER: 62.81%
✓ Model saved (WER: 62.81%)


Epoch 11/60: 100%|██████████| 127/127 [01:11<00:00,  1.77it/s, loss=0.8003]
Epoch 12/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=0.9184]
Epoch 13/60: 100%|██████████| 127/127 [01:10<00:00,  1.79it/s, loss=0.9348]
Epoch 14/60: 100%|██████████| 127/127 [01:11<00:00,  1.79it/s, loss=1.1043]
Epoch 15/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=0.7561]
Epoch 16/60: 100%|██████████| 127/127 [01:11<00:00,  1.77it/s, loss=1.2642]
Epoch 17/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=1.2041]
Epoch 18/60: 100%|██████████| 127/127 [01:11<00:00,  1.78it/s, loss=0.5986]
Epoch 19/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=1.4548]
Epoch 20/60: 100%|██████████| 127/127 [01:10<00:00,  1.79it/s, loss=0.4773]



Epoch 20 - WER: 43.20%
✓ Model saved (WER: 43.20%)


Epoch 21/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.3245]
Epoch 22/60: 100%|██████████| 127/127 [01:11<00:00,  1.79it/s, loss=0.4204]
Epoch 23/60: 100%|██████████| 127/127 [01:10<00:00,  1.79it/s, loss=0.5367]
Epoch 24/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.3531]
Epoch 25/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.3457]
Epoch 26/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.2010]
Epoch 27/60: 100%|██████████| 127/127 [01:11<00:00,  1.79it/s, loss=0.3182]
Epoch 28/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.4301]
Epoch 29/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.5637]
Epoch 30/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.3867]



Epoch 30 - WER: 35.35%
✓ Model saved (WER: 35.35%)


Epoch 31/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.2138]
Epoch 32/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.8477]
Epoch 33/60: 100%|██████████| 127/127 [01:09<00:00,  1.81it/s, loss=0.3832]
Epoch 34/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.3069]
Epoch 35/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.2004]
Epoch 36/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.2604]
Epoch 37/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.4292]
Epoch 38/60: 100%|██████████| 127/127 [01:10<00:00,  1.79it/s, loss=0.1426]
Epoch 39/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.2845]
Epoch 40/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.0443]



Epoch 40 - WER: 31.97%
✓ Model saved (WER: 31.97%)


Epoch 41/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0765]
Epoch 42/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.1280]
Epoch 43/60: 100%|██████████| 127/127 [01:09<00:00,  1.81it/s, loss=0.0787]
Epoch 44/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.0874]
Epoch 45/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.1937]
Epoch 46/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.1536]
Epoch 47/60: 100%|██████████| 127/127 [01:09<00:00,  1.83it/s, loss=0.0765]
Epoch 48/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0580]
Epoch 49/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.2025]
Epoch 50/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0396]



Epoch 50 - WER: 29.79%
✓ Model saved (WER: 29.79%)


Epoch 51/60: 100%|██████████| 127/127 [01:10<00:00,  1.80it/s, loss=0.2763]
Epoch 52/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0128]
Epoch 53/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.1220]
Epoch 54/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.1981]
Epoch 55/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.0207]
Epoch 56/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.0523]
Epoch 57/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.0205]
Epoch 58/60: 100%|██████████| 127/127 [01:10<00:00,  1.81it/s, loss=0.1722]
Epoch 59/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0469]
Epoch 60/60: 100%|██████████| 127/127 [01:09<00:00,  1.82it/s, loss=0.0795]



Epoch 60 - WER: 29.35%


kenlm python bindings are not installed. Most likely you want to install it using: pip install https://github.com/kpu/kenlm/archive/master.zip
kenlm python bindings are not installed. Most likely you want to install it using: pip install https://github.com/kpu/kenlm/archive/master.zip


✓ Model saved (WER: 29.35%)

TRAINING COMPLETED
Best WER: 29.35%


TEST INFERENCE WITH BEAM SEARCH

✓ Model loaded (val WER: 29.35%)
✓ Decoder created (beam_width=50)

Loading test split: 41 files


test: 100%|██████████| 41/41 [00:41<00:00,  1.01s/it]


✓ 1450 samples loaded
Running inference on TEST set...


100%|██████████| 1450/1450 [01:46<00:00, 13.55it/s]


✓ Submission saved: submission.csv (1450 predictions)

Examples:
  0: i get tcirend with the song and dats beteen.
  1: emorinci care.
  2: you ceat a migeal surprise.
  3: i think maybe you like at it.
  4: show that they do have problems.

🚀 Ready for Kaggle submission!


⏱️  Total runtime: 77.8 min



