<a href="https://colab.research.google.com/github/FurqanBhat/Transformers-from-SCRATCH/blob/main/transformers_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install torch torchmetrics transformers datasets tokenizers tqdm


In [None]:
# ==============================================================================
# All Imports
# ==============================================================================
import torch
import torch.nn as nn
from torch.utils.data import Dataset, random_split, DataLoader
from torch.utils.tensorboard import SummaryWriter
import torchmetrics

from datasets import load_dataset
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Whitespace

from tqdm import tqdm
import warnings
from pathlib import Path
import math

# ==============================================================================
# config.py
# ==============================================================================
def get_config():
    return {
        "batch_size": 8,
        "num_epochs": 20,
        "lr": 10**-4,
        "seq_len": 350,
        "d_model": 512,
        "lang_src": "en",
        "lang_tgt": "it",
        "model_folder": "weights",
        "model_basename": "tmodel_",
        "preload": None,
        "tokenizer_file": "tokenizer_{0}.json",
        "experiment_name": "runs/tmodel"
    }

def get_weights_file_path(config, epoch: str):
    model_folder = config["model_folder"]
    model_basename = config["model_basename"]
    model_filename = f"{model_basename}{epoch}.pt"
    return str(Path('.') / model_folder / model_filename)

# ==============================================================================
# Causal Mask (used by both dataset.py and train.py)
# ==============================================================================
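def causal_mask(size):
    # torch.triu(..., diagonal=1) puts ones strictly above the diagonal (future positions).
    # Comparing with 0 flips it: True means query position i may attend to key position j <= i.
    mask = torch.triu(torch.ones(1, size, size), diagonal=1).type(torch.int)
    return mask == 0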

# ==============================================================================
# model.py
# ==============================================================================
class InputEmbeddings(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.d_model = d_model
        self.vocab_size = vocab_size
        self.embedding = nn.Embedding(vocab_size, d_model)

    def forward(self, x):
        return self.embedding(x) * math.sqrt(self.d_model)

class PositionalEncoding(nn.Module):
    def __init__(self, d_model: int, seq_len: int, dropout: float):
        super().__init__()
        self.d_model = d_model
        self.seq_len = seq_len
        self.dropout = nn.Dropout(dropout)
        pe = torch.zeros(seq_len, d_model)
        position = torch.arange(0, seq_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer('pe', pe)

    def forward(self, x):
        x = x + (self.pe[:, :x.shape[1], :]).requires_grad_(False)
        return self.dropout(x)

class LayerNormalization(nn.Module):
    def __init__(self, eps: float = 10**-6):
        super().__init__()
        self.eps = eps
        self.alpha = nn.Parameter(torch.ones(1))
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True)
        return self.alpha * (x - mean) / (std + self.eps) + self.bias

class FeedForwardBlock(nn.Module):
    def __init__(self, d_model: int, d_ff: int, dropout: float):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)
        self.dropout = nn.Dropout(dropout)
        self.linear2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.linear2(self.dropout(torch.relu(self.linear1(x))))

class MultiHeadAttentionBlock(nn.Module):
    def __init__(self, d_model: int, h: int, dropout: float):
        super().__init__()
        self.d_model = d_model
        self.h = h
        assert d_model % h == 0, "d_model is not divisible by h"
        self.d_k = d_model // h
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)

    @staticmethod
    def attention(query, key, value, mask, dropout: nn.Dropout):
        d_k = query.shape[-1]
        attention_scores = (query @ key.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            attention_scores.masked_fill_(mask == 0, -1e9)
        attention_scores = attention_scores.softmax(dim=-1)
        if dropout is not None:
            attention_scores = dropout(attention_scores)
        return (attention_scores @ value), attention_scores

    def forward(self, q, k, v, mask):
        query = self.w_q(q)
        key = self.w_k(k)
        value = self.w_v(v)
        query = query.view(query.shape[0], query.shape[1], self.h, self.d_k).transpose(1, 2)
        key = key.view(key.shape[0], key.shape[1], self.h, self.d_k).transpose(1, 2)
        value = value.view(value.shape[0], value.shape[1], self.h, self.d_k).transpose(1, 2)
        x, self.attention_scores = MultiHeadAttentionBlock.attention(query, key, value, mask, self.dropout)
        x = x.transpose(1, 2).contiguous().view(x.shape[0], -1, self.h * self.d_k)
        return self.w_o(x)

class ResidualConnection(nn.Module):
    def __init__(self, dropout: float):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.norm = LayerNormalization()

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))

class EncoderBlock(nn.Module):
    def __init__(self, self_attention_block: MultiHeadAttentionBlock, feed_forward_block: FeedForwardBlock, dropout: float):
        super().__init__()
        self.self_attention_block = self_attention_block
        self.feed_forward_block = feed_forward_block
        self.residual_connections = nn.ModuleList([ResidualConnection(dropout) for _ in range(2)])

    def forward(self, x, src_mask):
        x = self.residual_connections[0](x, lambda x: self.self_attention_block(x, x, x, src_mask))
        x = self.residual_connections[1](x, self.feed_forward_block)
        return x

class Encoder(nn.Module):
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers
        self.norm = LayerNormalization()

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x, mask)
        return self.norm(x)

class DecoderBlock(nn.Module):
    def __init__(self, self_attention_block: MultiHeadAttentionBlock, cross_attention_block: MultiHeadAttentionBlock, feed_forward_block: FeedForwardBlock, dropout: float):
        super().__init__()
        self.self_attention_block = self_attention_block
        self.cross_attention_block = cross_attention_block
        self.feed_forward_block = feed_forward_block
        self.residual_connections = nn.ModuleList([ResidualConnection(dropout) for _ in range(3)])

    def forward(self, x, encoder_output, src_mask, tgt_mask):
        x = self.residual_connections[0](x, lambda x: self.self_attention_block(x, x, x, tgt_mask))
        x = self.residual_connections[1](x, lambda x: self.cross_attention_block(x, encoder_output, encoder_output, src_mask))
        x = self.residual_connections[2](x, self.feed_forward_block)
        return x

class Decoder(nn.Module):
    def __init__(self, layers: nn.ModuleList):
        super().__init__()
        self.layers = layers
        self.norm = LayerNormalization()

    def forward(self, x, encoder_output, src_mask, tgt_mask):
        for layer in self.layers:
            x = layer(x, encoder_output, src_mask, tgt_mask)
        return self.norm(x)

class ProjectionLayer(nn.Module):
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, x):
        return torch.log_softmax(self.proj(x), dim=-1)

class Transformer(nn.Module):
    def __init__(self, encoder: Encoder, decoder: Decoder, src_embed: InputEmbeddings, tgt_embed: InputEmbeddings, src_pos: PositionalEncoding, tgt_pos: PositionalEncoding, projection_layer: ProjectionLayer):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.src_pos = src_pos
        self.tgt_pos = tgt_pos
        self.projection_layer = projection_layer

    def encode(self, src, src_mask):
        src = self.src_embed(src)
        src = self.src_pos(src)
        return self.encoder(src, src_mask)

    def decode(self, encoder_output, src_mask, tgt, tgt_mask):
        tgt = self.tgt_embed(tgt)
        tgt = self.tgt_pos(tgt)
        return self.decoder(tgt, encoder_output, src_mask, tgt_mask)

    def project(self, x):
        return self.projection_layer(x)

def build_transformer(src_vocab_size: int, tgt_vocab_size: int, src_seq_len: int, tgt_seq_len: int, d_model: int = 512, N: int = 6, h: int = 8, dropout: float = 0.1, d_ff: int = 2048) -> Transformer:
    src_embed = InputEmbeddings(d_model, src_vocab_size)
    tgt_embed = InputEmbeddings(d_model, tgt_vocab_size)
    src_pos = PositionalEncoding(d_model, src_seq_len, dropout)
    tgt_pos = PositionalEncoding(d_model, tgt_seq_len, dropout)
    encoder_blocks = []
    for _ in range(N):
        encoder_self_attention_block = MultiHeadAttentionBlock(d_model, h, dropout)
        feed_forward_block = FeedForwardBlock(d_model, d_ff, dropout)
        encoder_block = EncoderBlock(encoder_self_attention_block, feed_forward_block, dropout)
        encoder_blocks.append(encoder_block)
    decoder_blocks = []
    for _ in range(N):
        decoder_self_attention_block = MultiHeadAttentionBlock(d_model, h, dropout)
        decoder_cross_attention_block = MultiHeadAttentionBlock(d_model, h, dropout)
        feed_forward_block = FeedForwardBlock(d_model, d_ff, dropout)
        decoder_block = DecoderBlock(decoder_self_attention_block, decoder_cross_attention_block, feed_forward_block, dropout)
        decoder_blocks.append(decoder_block)
    encoder = Encoder(nn.ModuleList(encoder_blocks))
    decoder = Decoder(nn.ModuleList(decoder_blocks))
    projection_layer = ProjectionLayer(d_model, tgt_vocab_size)
    transformer = Transformer(encoder, decoder, src_embed, tgt_embed, src_pos, tgt_pos, projection_layer)
    for p in transformer.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
    return transformer

# ==============================================================================
# dataset.py
# ==============================================================================
class BilingualDataset(Dataset):
    def __init__(self, ds, tokenizer_src, tokenizer_tgt, src_lang, tgt_lang, seq_len):
        super().__init__()
        self.ds = ds
        self.tokenizer_src = tokenizer_src
        self.tokenizer_tgt = tokenizer_tgt
        self.src_lang = src_lang
        self.tgt_lang = tgt_lang
        self.seq_len = seq_len
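        # Special-token IDs are read from the target tokenizer and reused for the encoder input.
        # This relies on both tokenizers being trained with the same special-token list.
        self.sos_token = torch.tensor([tokenizer_tgt.token_to_id("[SOS]")], dtype=torch.int64)
        self.eos_token = torch.tensor([tokenizer_tgt.token_to_id("[EOS]")], dtype=torch.int64)
        self.pad_token = torch.tensor([tokenizer_tgt.token_to_id("[PAD]")], dtype=torch.int64)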

    def __len__(self):
        return len(self.ds)

    def __getitem__(self, idx):
        src_target_pair = self.ds[idx]
        src_text = src_target_pair['translation'][self.src_lang]
        tgt_text = src_target_pair['translation'][self.tgt_lang]

        # Tokenize each side with its own tokenizer: source text with tokenizer_src, target text with tokenizer_tgt.
        enc_input_tokens = self.tokenizer_src.encode(src_text).ids
        dec_input_tokens = self.tokenizer_tgt.encode(tgt_text).ids

        enc_num_padding_tokens = self.seq_len - len(enc_input_tokens) - 2
        dec_num_padding_tokens = self.seq_len - len(dec_input_tokens) - 1

        if enc_num_padding_tokens < 0 or dec_num_padding_tokens < 0:
            raise ValueError('Sentence is too long')

        encoder_input = torch.cat([
            self.sos_token,
            torch.tensor(enc_input_tokens, dtype=torch.int64),
            self.eos_token,
            torch.tensor([self.pad_token] * enc_num_padding_tokens, dtype=torch.int64)
        ], dim=0)

        decoder_input = torch.cat([
            self.sos_token,
            torch.tensor(dec_input_tokens, dtype=torch.int64),
            torch.tensor([self.pad_token] * dec_num_padding_tokens, dtype=torch.int64)
        ], dim=0)

        # The label is the target sequence without [SOS] and with [EOS] appended.
        # It reuses dec_num_padding_tokens so that its total length is also seq_len.
        label = torch.cat([
            torch.tensor(dec_input_tokens, dtype=torch.int64),
            self.eos_token,
            torch.tensor([self.pad_token] * dec_num_padding_tokens, dtype=torch.int64)
        ], dim=0)

        assert encoder_input.size(0) == self.seq_len
        assert decoder_input.size(0) == self.seq_len
        assert label.size(0) == self.seq_len

        return {
            "encoder_input": encoder_input,
            "decoder_input": decoder_input,
            "encoder_mask": (encoder_input != self.pad_token).unsqueeze(0).unsqueeze(0).int(),
            "decoder_mask": (decoder_input != self.pad_token).unsqueeze(0).int() & causal_mask(decoder_input.size(0)),
            "label": label,
            "src_text": src_text,
            "tgt_text": tgt_text,
        }

# ==============================================================================
# train.py
# ==============================================================================
def greedy_decode(model, source, source_mask, tokenizer_src, tokenizer_tgt, max_len, device):
    sos_idx = tokenizer_tgt.token_to_id('[SOS]')
    eos_idx = tokenizer_tgt.token_to_id('[EOS]')
    encoder_output = model.encode(source, source_mask)
    decoder_input = torch.empty(1, 1).fill_(sos_idx).type_as(source).to(device)
    while True:
        if decoder_input.size(1) == max_len:
            break
        decoder_mask = causal_mask(decoder_input.size(1)).type_as(source_mask).to(device)
        out = model.decode(encoder_output, source_mask, decoder_input, decoder_mask)
        prob = model.project(out[:, -1])
        _, next_word = torch.max(prob, dim=1)
        decoder_input = torch.cat(
            [decoder_input, torch.empty(1, 1).type_as(source).fill_(next_word.item()).to(device)], dim=1
        )
        if next_word == eos_idx:
            break
    return decoder_input.squeeze(0)

def run_validation(model, validation_ds, tokenizer_src, tokenizer_tgt, max_len, device, print_msg, global_step, writer, num_examples=2):
    model.eval()
    count = 0
    source_texts = []
    expected = []
    predicted = []
    console_width = 80
    with torch.no_grad():
        for batch in validation_ds:
            count += 1
            encoder_input = batch["encoder_input"].to(device)
            encoder_mask = batch["encoder_mask"].to(device)
            assert encoder_input.size(0) == 1, "Batch size must be 1 for validation"
            model_out = greedy_decode(model, encoder_input, encoder_mask, tokenizer_src, tokenizer_tgt, max_len, device)
            source_text = batch["src_text"][0]
            target_text = batch["tgt_text"][0]
            model_out_text = tokenizer_tgt.decode(model_out.detach().cpu().numpy())
            source_texts.append(source_text)
            expected.append(target_text)
            predicted.append(model_out_text)
            print_msg('-' * console_width)
            print_msg(f"SOURCE: {source_text}")
            print_msg(f"TARGET: {target_text}")
            print_msg(f"PREDICTED: {model_out_text}")
            if count == num_examples:
                break
    if writer:
        metric = torchmetrics.CharErrorRate()
        cer = metric(predicted, expected)
        writer.add_scalar('validation cer', cer, global_step)
        writer.flush()
        metric = torchmetrics.WordErrorRate()
        wer = metric(predicted, expected)
        writer.add_scalar('validation wer', wer, global_step)
        writer.flush()
        metric = torchmetrics.BLEUScore()
        bleu = metric(predicted, expected)
        writer.add_scalar('validation BLEU', bleu, global_step)
        writer.flush()

def get_all_sentences(ds, lang):
    for item in ds:
        yield item['translation'][lang]

def get_or_build_tokenizer(config, ds, lang):
    tokenizer_path = Path(config['tokenizer_file'].format(lang))
    if not tokenizer_path.exists():
        tokenizer = Tokenizer(WordLevel(unk_token='[UNK]'))
        tokenizer.pre_tokenizer = Whitespace()
        trainer = WordLevelTrainer(special_tokens=['[UNK]', '[PAD]', '[SOS]', '[EOS]'], min_frequency=2)
        tokenizer.train_from_iterator(get_all_sentences(ds, lang), trainer=trainer)
        tokenizer.save(str(tokenizer_path))
    else:
        tokenizer = Tokenizer.from_file(str(tokenizer_path))
    return tokenizer

def get_ds(config):
    ds_raw = load_dataset('opus_books', f'{config["lang_src"]}-{config["lang_tgt"]}', split='train')
    tokenizer_src = get_or_build_tokenizer(config, ds_raw, config['lang_src'])
    tokenizer_tgt = get_or_build_tokenizer(config, ds_raw, config['lang_tgt'])
    train_ds_size = int(0.9 * len(ds_raw))
    val_ds_size = len(ds_raw) - train_ds_size
    train_ds_raw, val_ds_raw = random_split(ds_raw, [train_ds_size, val_ds_size])
    train_ds = BilingualDataset(train_ds_raw, tokenizer_src, tokenizer_tgt, config['lang_src'], config['lang_tgt'], config['seq_len'])
    val_ds = BilingualDataset(val_ds_raw, tokenizer_src, tokenizer_tgt, config['lang_src'], config['lang_tgt'], config['seq_len'])

    max_len_src = 0
    max_len_tgt = 0
    for item in ds_raw:
        src_ids = tokenizer_src.encode(item['translation'][config['lang_src']]).ids
        tgt_ids = tokenizer_tgt.encode(item['translation'][config['lang_tgt']]).ids
        max_len_src = max(max_len_src, len(src_ids))
        max_len_tgt = max(max_len_tgt, len(tgt_ids))

    print(f'Max length of source sentence: {max_len_src}')
    print(f'Max length of target sentence: {max_len_tgt}')

    train_dataloader = DataLoader(train_ds, batch_size=config['batch_size'], shuffle=True, num_workers=2)
    val_dataloader = DataLoader(val_ds, batch_size=1, shuffle=True, num_workers=1)
    return train_dataloader, val_dataloader, tokenizer_src, tokenizer_tgt

def get_model(config, vocab_src_len, vocab_tgt_len):
    model = build_transformer(vocab_src_len, vocab_tgt_len, config['seq_len'], config['seq_len'], config['d_model'])
    return model

def train_model(config):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Using device: {device}")
    Path(config['model_folder']).mkdir(parents=True, exist_ok=True)
    train_dataloader, val_dataloader, tokenizer_src, tokenizer_tgt = get_ds(config)
    model = get_model(config, tokenizer_src.get_vocab_size(), tokenizer_tgt.get_vocab_size()).to(device)
    writer = SummaryWriter(config['experiment_name'])
    optimizer = torch.optim.Adam(model.parameters(), lr=config['lr'], eps=1e-9)

    initial_epoch = 0
    global_step = 0
    if config['preload']:
        model_filename = get_weights_file_path(config, config['preload'])
        print(f'Preloading model {model_filename}')
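        # map_location lets a checkpoint saved on GPU be loaded on whichever device is in use.
        state = torch.load(model_filename, map_location=device)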
        model.load_state_dict(state['model_state_dict'])
        initial_epoch = state['epoch'] + 1
        optimizer.load_state_dict(state['optimizer_state_dict'])
        global_step = state['global_step']

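    # Labels are target-language token IDs, so the ignored padding index comes from the target tokenizer.
    loss_fn = nn.CrossEntropyLoss(ignore_index=tokenizer_tgt.token_to_id('[PAD]'), label_smoothing=0.1).to(device)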

    for epoch in range(initial_epoch, config['num_epochs']):
        torch.cuda.empty_cache()
        model.train()
        batch_iterator = tqdm(train_dataloader, desc=f"Processing Epoch {epoch:02d}")
        for batch in batch_iterator:
            encoder_input = batch['encoder_input'].to(device)
            decoder_input = batch['decoder_input'].to(device)
            encoder_mask = batch['encoder_mask'].to(device)
            decoder_mask = batch['decoder_mask'].to(device)

            encoder_output = model.encode(encoder_input, encoder_mask)
            decoder_output = model.decode(encoder_output, encoder_mask, decoder_input, decoder_mask)
            proj_output = model.project(decoder_output)

            label = batch['label'].to(device)
            loss = loss_fn(proj_output.view(-1, tokenizer_tgt.get_vocab_size()), label.view(-1))
            batch_iterator.set_postfix({"loss": f"{loss.item():6.3f}"})

            writer.add_scalar('train loss', loss.item(), global_step)
            writer.flush()

            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)

            global_step += 1

        run_validation(model, val_dataloader, tokenizer_src, tokenizer_tgt, config['seq_len'], device, lambda msg: batch_iterator.write(msg), global_step, writer)

        model_filename = get_weights_file_path(config, f"{epoch:02d}")
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'global_step': global_step
        }, model_filename)





In [None]:
warnings.filterwarnings("ignore")
config = get_config()
train_model(config)
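
The output of this cell includes a `huggingface/tokenizers` warning about the process being forked after parallelism was used, and the warning itself suggests setting `TOKENIZERS_PARALLELISM`. A minimal sketch of that suggestion follows. It assumes the variable is set before the tokenizers are first used and before the `DataLoader` workers start, and choosing `"false"` rather than `"true"` is an assumption, not something the notebook specifies.

```python
import os

# Assumption: run this near the top of the notebook, before any tokenizer is
# trained or used, so the setting is already in place when workers are forked.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])  # false
```

This only silences the warning by making the choice explicit. It does not change the training code.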

Using device: cuda



Max length of source sentence: 309
Max length of target sentence: 274


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing Epoch 00: 100%|██████████| 3638/3638 [29:11<00:00,  2.08it/s, loss=3.921]


--------------------------------------------------------------------------------
SOURCE: 'Perhaps.
TARGET: — Sì, può darsi....
PREDICTED: — E il signor Rochester .
--------------------------------------------------------------------------------
SOURCE: "The child ought to have change of air and scene," he added, speaking to himself; "nerves not in a good state."
TARGET: — Per questa bimba ci vorrebbe un cambiamento d'aria e di luogo — aggiunse come se parlasse a sè stesso. I suoi nervi non sono in buono stato.
PREDICTED: — , e la mia voce , — disse , — disse la mia vita , — non è più .


Processing Epoch 01: 100%|██████████| 3638/3638 [29:26<00:00,  2.06it/s, loss=5.319]


--------------------------------------------------------------------------------
SOURCE: The large, beautiful, perfectly regular shape of the horse, with his wonderful hindquarters and his exceptionally short pasterns just above his hoofs, involuntarily arrested Vronsky's attention.
TARGET: Le forme grandi, stupende, del tutto regolari dello stallone dal dorso magnifico e le giunture straordinariamente corte proprio al di sopra degli zoccoli, fermarono involontariamente l’attenzione di Vronskij.
PREDICTED: Il fiume , , che il suo viso , , e , , , a Levin , che aveva fatto il suo marito .
--------------------------------------------------------------------------------
SOURCE: He wished to glance round but dared not do so, and he tried to keep calm and not to urge his mare, but to let her retain a reserve of strength such as he felt that Gladiator still had.
TARGET: Voleva voltarsi indietro a guardare, ma non osava, e cercava di calmarsi e di non lanciare la cavalla per non sciupare in e

Processing Epoch 02: 100%|██████████| 3638/3638 [29:21<00:00,  2.07it/s, loss=3.605]


--------------------------------------------------------------------------------
SOURCE: As a child that has been hurt skips about, making its muscles move in order to dull its pain, so Karenin needed mental activity to smother those thoughts about his wife which in her presence and in the presence of Vronsky, and amid the continual mention of his name, forced themselves upon him.
TARGET: Come un bambino che, dopo aver urtato in qualche cosa, mette in moto, saltando, i propri muscoli per soffocare il dolore, così Aleksej Aleksandrovic aveva bisogno di un moto intellettuale per soffocare quei suoi pensieri sulla moglie che ora, alla presenza di lei e alla presenza di Vronskij, e alla continua ripetizione del nome di lui, urgevano perché si prestasse loro attenzione.
PREDICTED: E , come è un uomo che si , come un uomo che , come si , come se la sua vita , Aleksej Aleksandrovic , che la sua vita , la sua vita , la sua vita , e la sua vita , che la sua vita , la sua vita , la sua vita , la

Processing Epoch 03: 100%|██████████| 3638/3638 [29:20<00:00,  2.07it/s, loss=4.892]


--------------------------------------------------------------------------------
SOURCE: 'No, not one, I swear.
TARGET: — Eh, via, neppure una.
PREDICTED: — No , non ho bisogno di Dio !
--------------------------------------------------------------------------------
SOURCE: This place I was obliged to leave four days before I came here.
TARGET: Fui obbligata di lasciare quel posto quattro giorni prima di venire qui.
PREDICTED: Questo giorno mi fu molto difficile che io avessi potuto .


Processing Epoch 04: 100%|██████████| 3638/3638 [29:21<00:00,  2.06it/s, loss=4.965]


--------------------------------------------------------------------------------
SOURCE: "Well, but what?"
TARGET: — Altro che?
PREDICTED: — Ebbene , ma che cosa ?
--------------------------------------------------------------------------------
SOURCE: “No shoot,” says Friday, “no yet; me shoot now, me no kill; me stay, give you one more laugh:” and, indeed, so he did; for when the bear saw his enemy gone, he came back from the bough, where he stood, but did it very cautiously, looking behind him every step, and coming backward till he got into the body of the tree, then, with the same hinder end foremost, he came down the tree, grasping it with his claws, and moving one foot at a time, very leisurely.
TARGET: Venerdì saltò tanto, e i corrispondenti atti dell’orso furono tanto grotteschi, che avemmo campo a ridere per un bel pezzo Dopo di che andato all’estrema punta del ramo, laddove poteva farlo piegare col proprio peso vi si attaccò, e lasciandosi bellamente calar giù finchè fosse v

Processing Epoch 05: 100%|██████████| 3638/3638 [29:23<00:00,  2.06it/s, loss=4.972]


--------------------------------------------------------------------------------
SOURCE: Another thing was that, having read a great many books he became convinced that those who shared his outlook understood only what he had understood, explaining nothing and merely ignoring those problems – without a solution to which he felt he could not live, – but trying to solve quite other problems which could not interest him, such as, for instance, the development of organisms, a mechanical explanation of the soul, and so on.
TARGET: Un’altra cosa era che, avendo letto molti libri, s’era convinto che le persone le quali condividevano le sue opinioni non intendevano null’altro e, senza spiegar nulla, negavano non soltanto le questioni, senza la soluzione delle quali egli sentiva di non poter vivere, ma cercavano di risolvere questioni del tutto diverse, che non potevano interessarlo, come, per esempio, quella dell’evoluzione degli organismi, quella della spiegazione meccanica dell’anima e simil

Processing Epoch 06: 100%|██████████| 3638/3638 [29:28<00:00,  2.06it/s, loss=3.645]


--------------------------------------------------------------------------------
SOURCE: 'You are married, I hear?' said the landowner.
TARGET: — Vi siete ammogliato, ho sentito — disse il proprietario.
PREDICTED: — Voi , che mi hai sentito ? — disse il proprietario .
--------------------------------------------------------------------------------
SOURCE: 'Ah, so you are here!' she said on seeing him. 'Well, how is your poor sister?
TARGET: — Ah, anche voi siete qui — ella disse nel vederlo. — Be’, come va la vostra povera sorella?
PREDICTED: — Ah , siete qui ! — ella disse , guardando il tuo nome . — Su , come va ’ la tua sorella ?


Processing Epoch 07: 100%|██████████| 3638/3638 [29:26<00:00,  2.06it/s, loss=3.714]


--------------------------------------------------------------------------------
SOURCE: So it happens with fortune, who shows her power where valour has not prepared to resist her, and thither she turns her forces where she knows that barriers and defences have not been raised to constrain her.
TARGET: Similmente interviene della fortuna: la quale dimonstra la sua potenzia dove non è ordinata virtù a resisterle, e quivi volta li sua impeti, dove la sa che non sono fatti li argini e li ripari a tenerla.
PREDICTED: Così , con la fortuna , che la sua fortuna non ha più di non , e ha cercato di di , e che non si è più di , e che non si .
--------------------------------------------------------------------------------
SOURCE: 'N'est-ce pas immoral?' [Isn't it immoral?] was all she said after a pause.
TARGET: — N’est-ce-pas immoral? — ella disse soltanto dopo un po’ di silenzio.
PREDICTED: — C ’ est pas à ridicule ? — disse lei . — Non c ’ è un letto di silenzio .


Processing Epoch 08: 100%|██████████| 3638/3638 [29:26<00:00,  2.06it/s, loss=3.895]


--------------------------------------------------------------------------------
SOURCE: "You would often see him?
TARGET: — Lo vedevate spesso?
PREDICTED: — Avete spesso spesso a vedere ?
--------------------------------------------------------------------------------
SOURCE: Yashvin's face wore the expression it had when he was losing at cards.
TARGET: Sul viso di Jašvin c’era l’espressione che soleva avere quando perdeva al giuoco.
PREDICTED: Jašvin aveva l ’ espressione del viso di Jašvin quando era stato da mangiare .


Processing Epoch 09: 100%|██████████| 3638/3638 [29:23<00:00,  2.06it/s, loss=3.018]


--------------------------------------------------------------------------------
SOURCE: Already in February he had received a letter from Mary Nikolavna to say that his brother Nicholas's health was getting worse, but that he would not submit to any treatment. In consequence of this news Levin went to Moscow, saw his brother, and managed to persuade him to consult a doctor and go to a watering-place abroad.
TARGET: Inoltre, in febbraio, aveva ricevuta da Mar’ja Nikolaevna una lettera in cui si diceva che le condizioni di salute del fratello erano peggiorate, e che egli non voleva curarsi; in seguito a questa lettera, Levin era andato a Mosca e aveva fatto in tempo a persuadere il fratello a consigliarsi con un medico e ad andare all’estero per la cura delle acque.
PREDICTED: Durante il tè , era rimasto da una lettera da Mar ’ ja Nikolaevna che aveva parlato del fratello Nikolaj , ma che non temeva che fosse la notizia di Levin non potesse più la notizia di Mosca , e Levin si era allon

Processing Epoch 10: 100%|██████████| 3638/3638 [29:21<00:00,  2.07it/s, loss=3.538]


--------------------------------------------------------------------------------
SOURCE: CHAPTER I
TARGET: PARTE PRIMA
PREDICTED: I
--------------------------------------------------------------------------------
SOURCE: 'Throw up everything and let us two conceal ourselves somewhere alone with our love,' said he to himself.
TARGET: “Lei ed io dobbiamo abbandonare tutto e andarci a nascondere in qualche luogo, noi due, con il nostro amore” disse a se stesso.
PREDICTED: — tutto e parleremo , perché ci soltanto due — disse , cercando di indovinare in precedenza che era lui .


Processing Epoch 11: 100%|██████████| 3638/3638 [29:23<00:00,  2.06it/s, loss=3.101]


--------------------------------------------------------------------------------
SOURCE: I want to stop here, leaning up against this gritty old wall.
TARGET: Voglio star qui appoggiato a questo muro.
PREDICTED: Avrei voglia di tornare indietro , con questa , al muro .
--------------------------------------------------------------------------------
SOURCE: More carriages kept driving up, and now ladies with flowers in their hair got out, holding up their trains; or men appeared who doffed their military caps or black hats as they entered the church.
TARGET: Le carrozze si avvicinavano ininterrottamente, e ora signore con fiori e con gli strascichi sollevati, ora uomini che si toglievano il chepì o il cappello nero, entravano in chiesa.
PREDICTED: Dietro alle signore si avvicinarono a loro , e ora , con la loro camicia , si afferrò con un movimento delle loro signore , si separarono e si come il Maškin Verch .


Processing Epoch 12: 100%|██████████| 3638/3638 [29:25<00:00,  2.06it/s, loss=2.658]


--------------------------------------------------------------------------------
SOURCE: In uttering these words I looked up: he seemed to me a tall gentleman; but then I was very little; his features were large, and they and all the lines of his frame were equally harsh and prim.
TARGET: Mi parve alto, ma mi ricordo che io allora ero molto piccola. I tratti di lui mi parvero molto marcati, e vi scorsi, come nelle linee di tutta la persona, una espressione di durezza e d'ipocrisia.
PREDICTED: Quelle parole mi parvero parole e mi pareva che un signore era un poco debole , ma era una piccola faccia , e poi i tratti erano e i suoi tratti , che erano e il suo carattere .
--------------------------------------------------------------------------------
SOURCE: 'Ah, so you are here!' she said on seeing him. 'Well, how is your poor sister?
TARGET: — Ah, anche voi siete qui — ella disse nel vederlo. — Be’, come va la vostra povera sorella?
PREDICTED: — Ah , siete così ! — ella disse , vedendo .

Processing Epoch 13: 100%|██████████| 3638/3638 [29:23<00:00,  2.06it/s, loss=2.917]


--------------------------------------------------------------------------------
SOURCE: He did not undress, but paced up and down with his even step on the resounding parquet floor of the dining-room, which was lit by one lamp, over the carpet of the dark drawing-room, where a light was reflected only from a recently painted portrait of himself which hung above the sofa, and on through her sitting-room, where two candles were burning, lighting up the portraits of her relatives and friends and the elegant knick-knacks, long familiar to him, on her writing-table.
TARGET: Senza essersi spogliato, andava avanti e indietro, con passo eguale, sul pavimento di legno scricchiolante della sala da pranzo illuminata da un’unica lampada, sul tappeto del salotto oscuro in cui la luce si rifletteva solo sul suo grande ritratto fatto da poco, appeso sopra il divano, e attraverso lo studiolo di lei, dove ardevano due candele che davan luce ai ritratti dei parenti e delle amiche e agli oggetti belli d

Processing Epoch 14: 100%|██████████| 3638/3638 [29:20<00:00,  2.07it/s, loss=3.056]


--------------------------------------------------------------------------------
SOURCE: This beautiful baby only inspired him with a sense of repulsion and pity.
TARGET: Quel bellissimo bambino gli ispirava soltanto disgusto e pena.
PREDICTED: Questo bambino solo con lui la propria impressione e con la pena .
--------------------------------------------------------------------------------
SOURCE: 'Certainly count them!
TARGET: — Contare, sì, assolutamente.
PREDICTED: — .


Processing Epoch 15: 100%|██████████| 3638/3638 [29:24<00:00,  2.06it/s, loss=2.407]


--------------------------------------------------------------------------------
SOURCE: 'I'm ready to agree with Arseny beforehand.
TARGET: — Ma con Arsenij sono fin d’ora d’accordo su tutto.
PREDICTED: — Sono già pronta a carte con Arsenij .
--------------------------------------------------------------------------------
SOURCE: "Courage," urged the lawyer,--"speak out."
TARGET: — Coraggio! — continuò il legale, — parlate forte.
PREDICTED: — — mi disse l ’ avvocato .


Processing Epoch 16: 100%|██████████| 3638/3638 [29:26<00:00,  2.06it/s, loss=2.426]


--------------------------------------------------------------------------------
SOURCE: I found in the low grounds hares (as I thought them to be) and foxes; but they differed greatly from all the other kinds I had met with, nor could I satisfy myself to eat them, though I killed several.
TARGET: Trovai nelle terre basse e volpi e lepri, almeno così le giudicai; tanto diverse per altro da tutte le solite in cui m’era altrove abbattuto, che, se bene ne uccidessi molte, non seppi risolvermi ad assaggiarne.
PREDICTED: Capii nel far delle eriche ( come potei ”, ma mi trovai molto ; su l ’ un l ’ altro , che nè potei nè mi di non aver mai avuta la speranza d ’ averne tuttavia altri , benchè dopo molto tempo di , eccetto pallini .
--------------------------------------------------------------------------------
SOURCE: A woman who could betray me for such a rival was not worth contending for; she deserved only scorn; less, however, than I, who had been her dupe.
TARGET: "Una donna che poteva

Processing Epoch 17: 100%|██████████| 3638/3638 [29:24<00:00,  2.06it/s, loss=2.543]


--------------------------------------------------------------------------------
SOURCE: 'After all, he is a good man: truthful, kind, and remarkable in his own sphere,' said Anna to herself when she had returned to her room, as if defending him from some one who accused him and declared it was impossible to love him. 'But why do his ears stick out so?
TARGET: «Però è un brav’uomo, leale, di buon cuore e notevole nel suo campo — si andava dicendo Anna, tornata in camera sua; quasi a difenderlo di fronte a qualcuno che lo accusasse e che dicesse a lei che non lo si poteva amare. — Ma come mai ha le orecchie che gli sporgono così stranamente in fuori?
PREDICTED: — Ma è buono , anima mia , gentile e , e in genere , del suo ambiente — disse ella , quando si fu presa da un ’ irritazione che , come se un pazzo di sollievo gli era entrato — e che il bambino gli dice di no , senza volere proprio perché il bastone ?
--------------------------------------------------------------------------------

Processing Epoch 18: 100%|██████████| 3638/3638 [29:19<00:00,  2.07it/s, loss=2.122]


--------------------------------------------------------------------------------
SOURCE: It is very old, and it was very strong and great once.
TARGET: È città antichissima, e una volta era assai forte e grande.
PREDICTED: È molto vecchio e molta di grande fatica .
--------------------------------------------------------------------------------
SOURCE: He vanished, but reappeared instantly--
TARGET: Egli uscì e poco dopo ricomparve.
PREDICTED: Egli si fece pensieroso , ma subito .


Processing Epoch 19: 100%|██████████| 3638/3638 [29:25<00:00,  2.06it/s, loss=2.301]


--------------------------------------------------------------------------------
SOURCE: George got hold of the paper, and read us out the boating fatalities, and the weather forecast, which latter prophesied "rain, cold, wet to fine" (whatever more than usually ghastly thing in weather that may be), "occasional local thunder-storms, east wind, with general depression over the Midland Counties (London and Channel).
TARGET: Giorgio s’impadronì del giornale, e ci lesse le disgrazie fluviali e marittime, e la previsione del tempo, che vaticinava «pioggia, freddo, vento» (tutto ciò che ci può esser di peggiore nel tempo), e qualche temporale locale, con depressione generale sulle contee centrali (Londra e Canale).
PREDICTED: Giorgio della carta , e ci dirigemmo a sentire la barca , e i racconti delle barche che nelle pozzanghere , nelle notti più lunghe ( più si può dire che si può fare una qualche tempo ), che si può scegliere una e un po ’ di sole , più durante la stagione delle torte , 