# **Miniproject 2**
## **~Large~ Small Language Model**

### **Objective**
Implement a transformer-based, character-level language model (GPT-like) and train it on the Shakespeare dataset. By the end of this project, you should be able to generate Shakespearean-like text given a seed string.

You will probably want to train the model on a GPU. You can use free GPUs on [Google Colab](https://colab.research.google.com/?utm_source=scs-index).

### **Dataset**:

The Tiny Shakespeare dataset is a single plain-text file of roughly 1.1 million characters of Shakespeare's writing, consisting largely of dialogue from his plays.

[**Download link**](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)

In a character-level language model, each character of the input text is mapped to an integer index through a lookup dictionary. The input to the model has shape (B, N), where B is the batch size and N is the number of tokens in each sequence. The model was tested with B=N=128, but feel free to explore different values.
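As a minimal sketch of this mapping, the dictionary can be built from the sorted set of characters in the text. The helper names `build_vocab`, `encode`, and `decode` are illustrative and not part of the provided interface.

```python
def build_vocab(text):
    """Return (stoi, itos) lookup tables for the characters in `text`."""
    chars = sorted(set(text))  # sorting gives a deterministic vocabulary order
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Map a string to a list of integer token indices."""
    return [stoi[ch] for ch in text]


def decode(indices, itos):
    """Map a list of integer token indices back to a string."""
    return "".join(itos[i] for i in indices)


stoi, itos = build_vocab("hello world")
tokens = encode("hello", stoi)
print(tokens)                # [3, 2, 4, 4, 5]
print(decode(tokens, itos))  # hello
```

Because the vocabulary is derived from the text itself, any character that never appears in that text has no index, so the same dictionary should be reused wherever text is encoded or decoded.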

An interface for the dataset class that takes care of tokenization is provided below.



```python
from torch.utils.data import Dataset

class CharDataset(Dataset):
    """
    Emits batches of characters.

    Adapted from "https://github.com/karpathy/minGPT".
    """

    def __init__(self, config, data):

        chars = ... # get characters from the input data
        self.stoi = { ch:i for i,ch in enumerate(chars) } # map characters to integer indices

        ...

    def get_vocab_size(self):
        raise NotImplementedError()

    def __len__(self):
        raise NotImplementedError()

    def __getitem__(self, idx):
        # grab a chunk of (block_size + 1) characters from the data
        # encode every character to an integer
        # return the chunk and the shifted version as tensors
        pass
```
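The comments in `__getitem__` describe a sliding window: each example is a chunk of `block_size + 1` tokens, split into an input and a target that is the input shifted by one position. A small sketch of that slicing on plain Python lists follows; `make_example` is a hypothetical helper, and `ord` stands in for the real tokenizer.

```python
def make_example(token_ids, start, block_size):
    """Return (inputs, targets) for the window that begins at `start`."""
    chunk = token_ids[start:start + block_size + 1]  # block_size + 1 tokens
    inputs = chunk[:-1]   # first block_size tokens
    targets = chunk[1:]   # same tokens shifted left by one position
    return inputs, targets


text = "To be, or not"
token_ids = [ord(ch) for ch in text]  # stand-in tokenizer: Unicode code points
x, y = make_example(token_ids, start=0, block_size=5)
print("".join(chr(i) for i in x))  # To be
print("".join(chr(i) for i in y))  # o be,
```

With this layout, `targets[i]` is the character that follows `inputs[i]`, so a single window supplies `block_size` next-character predictions.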




### **Requirements**

#### **Architecture**

Implement the Transformer's decoder-only structure.
This includes

* input token embeddings
* the causal multi-head self-attention mechanism
* feed-forward neural networks
* positional encodings, residual connections, layer normalizations.

The project was tested with $12$ layers, $8$ attention heads, and an embedding dimension of $768$ on a single GPU.

The `forward` method for the entire model has the following form:

```
tok_emb = WTE(idx) # token embeddings
pos_emb = WPE(pos) # position embeddings
x = Dropout(tok_emb + pos_emb)
for Block in Blocks:
    x = Block(x)
x = Final_LayerNorm(x)
logits = LM_Head(x)
```

The `forward` method for the transformer block has the following form:



```
x = x + self.CausalSelfAttn(self.LayerNorm_1(x))
out = x + self.MLP(self.LayerNorm_2(x))
```
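One possible PyTorch rendering of this block is sketched below. It leans on `nn.MultiheadAttention` rather than a hand-written attention layer, and the 4x hidden width and GELU activation in the MLP are assumptions that the text above does not specify.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Pre-LayerNorm block: x + Attn(LN(x)), followed by x + MLP(LN(x))."""

    def __init__(self, d_model, n_heads, dropout=0.0):
        super().__init__()
        self.ln_1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln_2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),  # assumed expansion factor
            nn.GELU(),                        # assumed activation
            nn.Linear(4 * d_model, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # x: (B, N, d_model)
        n = x.size(1)
        # Boolean mask: True above the diagonal marks future positions to block.
        causal_mask = torch.triu(
            torch.ones(n, n, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.ln_1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.ln_2(x))


torch.manual_seed(0)
block = Block(d_model=32, n_heads=4).eval()
x = torch.randn(2, 10, 32)
with torch.no_grad():
    out = block(x)
print(out.shape)  # torch.Size([2, 10, 32])
```

A quick sanity check for any causal block is to perturb the later positions of the input and confirm that the outputs at earlier positions do not change.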

---

#### **Training**

In a character-level transformer language model, the goal is to predict the next character in a sequence given the previous characters. To train such a model effectively, we use two versions of our data: the input sequence and a shifted version of this sequence, which serves as the target for our predictions.

* Preprocess the dataset to a character-level representation.
* Use a sliding window approach for sequence chunks (e.g., a window size of $128$ characters).
* Implement causal masking for the self-attention mechanism.
* Use the [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) optimizer and the cross-entropy loss.
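The causal masking step can be sketched for a single attention head as follows. The function name `causal_attention` is illustrative; a full implementation would add the learned projections and multiple heads.

```python
import math
import torch


def causal_attention(q, k, v):
    """Scaled dot-product attention where position i sees only positions <= i.

    q, k, v: tensors of shape (B, N, d).
    """
    n, d = q.size(-2), q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (B, N, N)
    # True above the diagonal marks "future" positions.
    future = torch.triu(
        torch.ones(n, n, dtype=torch.bool, device=q.device), diagonal=1
    )
    # -inf scores become exactly zero weight after the softmax.
    scores = scores.masked_fill(future, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


torch.manual_seed(0)
q = k = v = torch.randn(1, 4, 8)
_, weights = causal_attention(q, k, v)
print(weights[0])  # lower-triangular matrix whose rows each sum to 1
```

The first row of the weight matrix has a single non-zero entry, since the first token can attend only to itself.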

**Optional**:

* Implement a learning rate decay strategy
* Implement gradient clipping
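For the learning rate decay, one common choice is a linear warmup followed by cosine decay. The sketch below is only an illustration: the function name `lr_at_step` and every numeric default are assumptions, not values prescribed by this project.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=6000):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)  # in [0, 1)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))              # decays 1 -> 0
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 99, 100, 3050, 6000):
    print(step, f"{lr_at_step(step):.2e}")
```

In a training loop, the returned value could be written into each entry of `optimizer.param_groups` under the `"lr"` key before calling `optimizer.step()`. Gradient clipping is typically applied between `loss.backward()` and `optimizer.step()` with `torch.nn.utils.clip_grad_norm_`.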

---


#### **Evaluation and Inference**

* Monitor the cross-entropy loss. Use a seed string to initialize the model and generate Shakespearean-like text.

* In order to generate the characters, at each generation step you can either select the character with the highest probability, or you can sample according to the output distribution.

The high-level pseudocode for generation is:

```python
model.eval()
with torch.no_grad():
    context = "O God, O God!"
    tokenized_context = tokenize(context)
    # the model should implement a method to generate tokens given a prompt
    y = model.generate(tokenized_context, ...)
    completion = tokens_to_string(y)
```
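A generation loop along these lines might look like the sketch below, which covers both greedy selection and sampling. It assumes a hypothetical callable `logits_fn` that maps a `(B, T)` tensor of token indices to `(B, T, vocab_size)` logits; a real model may return its outputs in a different shape and would need a thin wrapper.

```python
import torch


@torch.no_grad()
def generate(logits_fn, idx, max_new_tokens, block_size, do_sample=True, temperature=1.0):
    """Append `max_new_tokens` tokens to `idx`, a (B, T) tensor of token indices."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]                        # crop to the context window
        logits = logits_fn(idx_cond)[:, -1, :] / temperature   # next-token logits
        if do_sample:
            probs = torch.softmax(logits, dim=-1)
            next_idx = torch.multinomial(probs, num_samples=1)  # sample from the distribution
        else:
            next_idx = logits.argmax(dim=-1, keepdim=True)      # greedy: most likely token
        idx = torch.cat([idx, next_idx], dim=1)
    return idx


# Toy stand-in for a trained model: it always favours token (last + 1) mod vocab.
vocab = 5


def toy_logits_fn(idx):
    nxt = (idx + 1) % vocab
    return torch.nn.functional.one_hot(nxt, vocab).float() * 10.0


out = generate(toy_logits_fn, torch.tensor([[0]]), max_new_tokens=6, block_size=4, do_sample=False)
print(out.tolist())  # [[0, 1, 2, 3, 4, 0, 1]]
```

Greedy decoding is deterministic and tends to repeat itself, while sampling gives more varied text. A temperature below 1 sharpens the distribution and a temperature above 1 flattens it.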

**Optional**:
* Compute the [perplexity](https://medium.com/@priyankads/perplexity-of-language-models-41160427ed72#:~:text=Intuitively%2C%20perplexity%20means%20to%20be,loss%20obtained%20from%20the%20model.) metric for quantitative evaluation.
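Perplexity is the exponential of the mean per-token cross-entropy, so it can be derived directly from the loss that is already being monitored. The sketch below assumes the loss is measured in nats, as is the case for `nn.CrossEntropyLoss`; the helper name `perplexity` is illustrative.

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


vocab_size = 64
uniform_loss = math.log(vocab_size)  # loss of a model that guesses uniformly
print(perplexity(uniform_loss))      # approximately 64: no better than chance
print(perplexity(1.5))               # a lower loss gives a lower perplexity
```

A perplexity close to the vocabulary size therefore means the model has learned almost nothing, while a value of 1 would correspond to perfect prediction.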

### **Example Outputs**

The following are my outputs after $6000$ steps of training, with the seed string "O God, O God!"



```
O God, O God! neither? unto the base very ears,
As damned with it.

DUKE OF YORK:
Away! Once more, one word.

RICHARD:
Clove, dear so; and therein my son will be
false of woe: if ye seems to be the mother
Of gracious order this time when R going kinsperse eyes,
What dost bewreck her fairer drying tears.

NORTHUMBERLAND:
Have you forgot the Duke of Norfolk, get him to
again; and and agilic: there is my spirit
So maly did must such a marble perfection.

ELBOW:
Come, bring them with oaths, and so deliver
```


### Resources:

* Vaswani et al., "Attention is All You Need": [link](https://arxiv.org/abs/1706.03762)

* Illustrated Transformer by Jay Alammar: [link](https://jalammar.github.io/illustrated-transformer/)

* OpenAI GPT-2 Paper: [link](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)

* Deep Learning Course slides on transformers: [link](https://fleuret.org/dlc/materials/dlc-handout-13-3-transformers.pdf)

In [None]:
import torch
import time
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import numpy as np
import math
# Fall back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
print(torch.__version__)
if torch.cuda.is_available():
  print("GPU is available")
  print("CUDA version:", torch.version.cuda)
else:
  print("GPU is not available")

2.5.1+cu121
GPU is available
CUDA version: 12.1


In [None]:
class CharDataset(Dataset):
  def __init__(self, config, data):
    # create dictionary of chars
    chars = sorted(list(set(data)))
    self.stoi = {ch: i for i, ch in enumerate(chars)}
    self.itos = {i: ch for i, ch in enumerate(chars)}
    self.vocab_size = len(chars)
    # get data from the text
    self.data = [self.stoi[ch] for ch in data]
    self.block_size = config['block_size']

  def get_vocab_size(self):
    return self.vocab_size

  def __len__(self):
    return len(self.data) - self.block_size

  def __getitem__(self, idx):
    # grab a chunk of (block_size + 1) characters from the data
    chunk = self.data[idx:idx + self.block_size + 1]
    # encode every character to an integer(long)
    # return the chunk and the shifted version as tensors
    x = torch.tensor(chunk[:-1], dtype=torch.long)  # current sequence
    y = torch.tensor(chunk[1:], dtype=torch.long)  # target sequence
    return x, y


In [None]:
class MultiHeadAttention(nn.Module):
  def __init__(self):
    super(MultiHeadAttention, self).__init__()
    self.W_Q = nn.Linear(d_model, d_k * n_heads, bias = False)
    self.W_K = nn.Linear(d_model, d_k * n_heads, bias = False)
    self.W_V = nn.Linear(d_model, d_v * n_heads, bias = False)
    self.fc = nn.Linear(n_heads * d_v, d_model, bias = False)
    self.layer_norm = nn.LayerNorm(d_model)

  def forward(self, input_Q, input_K, input_V, attn_mask):
    residual, batch_size = input_Q, input_Q.size(0)
    Q = self.W_Q(input_Q).view(batch_size, -1, n_heads, d_k).transpose(1, 2)
    K = self.W_K(input_K).view(batch_size, -1, n_heads, d_k).transpose(1, 2)
    V = self.W_V(input_V).view(batch_size, -1, n_heads, d_v).transpose(1, 2)
    attn_mask = attn_mask.unsqueeze(1).repeat(1, n_heads, 1, 1)
    context, attn = ScaledDotProductAttention()(Q, K, V, attn_mask)
    context = context.transpose(1, 2).reshape(batch_size, -1, n_heads * d_v)
    output = self.fc(context)
    return self.layer_norm(output + residual), attn

class PoswiseFeedForwardNet(nn.Module):
  def __init__(self):
    super(PoswiseFeedForwardNet, self).__init__()
    self.fc = nn.Sequential(
      nn.Linear(d_model, d_ff, bias=False),
      nn.ReLU(),
      nn.Linear(d_ff, d_model, bias=False)
    )
    self.layer_norm = nn.LayerNorm(d_model)

  def forward(self, inputs):
    residual = inputs # inputs : [batch_size, len_q, d_model]
    output = self.fc(inputs)
    return self.layer_norm(output + residual)

def get_attn_pad_mask(seq_q, seq_k):
  batch_size, len_q = seq_q.size()
  batch_size, len_k = seq_k.size()
  pad_attn_mask = seq_k.data.eq(0).unsqueeze(1)  # batch_size x 1 x len_k, one is masking
  return pad_attn_mask.expand(batch_size, len_q, len_k)  # batch_size x len_q x len_k

def get_attn_subsequent_mask(seq):
  """
  seq: [batch_size, tgt_len]
  """
  attn_shape = [seq.size(0), seq.size(1), seq.size(1)]
  # attn_shape: [batch_size, tgt_len, tgt_len]
  subsequence_mask = np.triu(np.ones(attn_shape), k=1)
  subsequence_mask = torch.from_numpy(subsequence_mask).byte()
  subsequence_mask = subsequence_mask.to(device)
  return subsequence_mask  # [batch_size, tgt_len, tgt_len]


class ScaledDotProductAttention(nn.Module):
  def __init__(self):
    super(ScaledDotProductAttention, self).__init__()

  def forward(self, Q, K, V, attn_mask):
    scores = torch.matmul(Q, K.transpose(-1, -2)) / math.sqrt(Q.size(-1))
    scores.masked_fill_(attn_mask, -1e9)
    attn = nn.Softmax(dim = -1)(scores)
    context = torch.matmul(attn, V)
    return context, attn


class PositionalEncoding(nn.Module):
  def __init__(self, d_model, dropout=0.1, max_len=5000):
    super(PositionalEncoding, self).__init__()
    self.dropout = nn.Dropout(p=dropout)

    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    # pe: [max_len, d_model]

    pe = pe.unsqueeze(0).transpose(0, 1)
    # pe: [max_len, 1, d_model]
    self.register_buffer('pe', pe)

  def forward(self, x):
    x = x + self.pe[:x.size(0), :] # x: [seq_len, batch_size, d_model]
    return self.dropout(x)

class DecoderLayer(nn.Module):
  def __init__(self):
    super(DecoderLayer, self).__init__()
    self.dec_self_attn = MultiHeadAttention()
    self.pos_ffn = PoswiseFeedForwardNet()

  def forward(self, dec_inputs, dec_self_attn_mask):
    dec_outputs, dec_self_attn = self.dec_self_attn(dec_inputs, dec_inputs, dec_inputs, attn_mask=dec_self_attn_mask)
    dec_outputs = self.pos_ffn(dec_outputs)
    return dec_outputs, dec_self_attn

class Decoder(nn.Module):
  def __init__(self):
    super(Decoder, self).__init__()
    self.tgt_emb = nn.Embedding(vocab_size, d_model)
    self.pos_emb = PositionalEncoding(d_model)
    self.layers = nn.ModuleList([DecoderLayer() for _ in range(n_layers)])
    self.final_layer_norm = nn.LayerNorm(d_model)

  def forward(self, dec_inputs): # dec_inputs : [batch_size x target_len]
    dec_outputs = self.tgt_emb(dec_inputs)  #dec_outputs  [batch_size, tgt_len, d_model]
    dec_outputs = self.pos_emb(dec_outputs.transpose(0, 1)).transpose(0, 1) # [batch_size, tgt_len, d_model]

    # Every index in the character vocabulary is a real token (index 0 included),
    # so there is no padding to mask out; only the causal mask is needed.
    # True marks positions that must not be attended to.
    dec_self_attn_mask = get_attn_subsequent_mask(dec_inputs).bool()

    dec_self_attns = []
    for layer in self.layers:
      dec_outputs, dec_self_attn = layer(dec_outputs, dec_self_attn_mask)
      dec_self_attns.append(dec_self_attn)
    dec_outputs = self.final_layer_norm(dec_outputs)
    return dec_outputs, dec_self_attns

class GPT(nn.Module):
  def __init__(self):
    super(GPT, self).__init__()
    self.decoder = Decoder()
    self.projection = nn.Linear(d_model, vocab_size, bias=False)
    # Initialize weights
    self.apply(self._init_weights)
  def _init_weights(self, module):
    if isinstance(module, nn.Linear):
      nn.init.normal_(module.weight, mean=0, std=0.02)
      if module.bias is not None:
        nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
      nn.init.normal_(module.weight, mean=0, std=0.02)
  def forward(self, dec_inputs):
    dec_outputs, dec_self_attns = self.decoder(dec_inputs) # [batch_size, tgt_len, d_model]
    dec_logits = self.projection(dec_outputs)  # [batch_size, tgt_len, vocab_size]
    return dec_logits.view(-1, dec_logits.size(-1)), dec_self_attns

In [None]:
def check_labels(dec_outputs):
  if (dec_outputs < 0).any() or (dec_outputs >= vocab_size).any():
    print(f"Invalid target label values detected! Min: {dec_outputs.min()}, Max: {dec_outputs.max()}")
    # Print part of the label data for easy debugging
    print(f"Some label data: {dec_outputs[:10]}")
    raise ValueError(f"Label out of bounds! The label maximum is {dec_outputs.max()}, and the vocab_size is {vocab_size}.")

def train_step(model,data_loader,optimizer,criterion,clip=1,print_every=None):
  model.train()
  if print_every == 0:
    print_every = 1
  print_loss_total = 0
  epoch_loss = 0
  for i, (dec_inputs, dec_outputs) in enumerate(tqdm(data_loader)):
    optimizer.zero_grad()
    dec_inputs, dec_outputs = dec_inputs.to(device), dec_outputs.to(device)
    print(f"dec_inputs shape: {dec_inputs.shape}")
    print(f"dec_outputs shape: {dec_outputs.shape}")
    # Check whether the target label is out of bounds
    check_labels(dec_outputs)
    # outputs: [batch_size * tgt_len, tgt_vocab_size]
    outputs, dec_self_attns = model(dec_inputs)
    print(f"model output shape: {outputs.shape}")
    loss = criterion(outputs, dec_outputs.view(-1))
    print_loss_total += loss.item()
    epoch_loss += loss.item()
    loss.backward()
    # gradient clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()

    if print_every and (i + 1) % print_every == 0:
      print_loss_avg = print_loss_total / print_every
      print_loss_total = 0
      print('\tCurrent Loss: %.4f' % print_loss_avg)
  return epoch_loss / len(data_loader)

def compute_validation_loss(model, data_loader, criterion):
  model.eval()
  val_loss = 0
  total_samples = 0
  with torch.no_grad():
    for dec_inputs, dec_outputs in tqdm(data_loader, desc="Validating"):
      dec_inputs, dec_outputs = dec_inputs.to(device), dec_outputs.to(device)
      check_labels(dec_outputs)
      # Forward pass
      outputs, _ = model(dec_inputs)
      # Compute loss
      loss = criterion(outputs.view(-1, outputs.size(-1)), dec_outputs.view(-1))  #Flattened into one dimension
      val_loss += loss.item() * dec_inputs.size(0)  # Cumulative total loss, multiplied by batch size
      total_samples += dec_inputs.size(0)  # Total number of samples
  return val_loss / total_samples

def train(model, train_loader, val_loader, epochs):
  # No ignore_index: index 0 is an ordinary character here, not a padding token
  criterion = nn.CrossEntropyLoss().to(device)
  optimizer = optim.Adam(model.parameters(), lr=1e-5)
  for epoch in range(epochs):
    start_time = time.time()
    train_loss = train_step(model, train_loader, optimizer, criterion, CLIP, print_every=10)
    val_loss = compute_validation_loss(model, val_loader, criterion)
    end_time = time.time()
    checkpoint_path = f'checkpoints_epoch_{epoch}.pt'
    torch.save(model.state_dict(), checkpoint_path)
    print(f"Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Time: {end_time - start_time:.2f}s")

In [None]:
def load_data(file_path, config, batch_size=256):
  with open(file_path, 'r', encoding='utf-8') as f:
    text_data = f.read()
  # Split data into train (60%), validation (20%), and test (20%)
  train_split = int(len(text_data) * 0.6)
  val_split = int(len(text_data) * 0.8)
  train_data = text_data[:train_split]
  val_data = text_data[train_split:val_split]
  test_data = text_data[val_split:]
  train_dataset = CharDataset(config, train_data)
  val_dataset = CharDataset(config, val_data)
  test_dataset = CharDataset(config, test_data)
  # CharDataset builds its vocabulary from the text it receives, so the three splits
  # could map the same character to different indices. Re-encode every split with one
  # vocabulary built from the full text so that the indices agree.
  chars = sorted(set(text_data))
  stoi = {ch: i for i, ch in enumerate(chars)}
  itos = {i: ch for i, ch in enumerate(chars)}
  for dataset, split in ((train_dataset, train_data), (val_dataset, val_data), (test_dataset, test_data)):
    dataset.stoi, dataset.itos, dataset.vocab_size = stoi, itos, len(chars)
    dataset.data = [stoi[ch] for ch in split]
  # Create DataLoaders
  train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
  val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
  test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
  print(f"Vocabulary size: {train_dataset.get_vocab_size()}")
  print(f"Train data length: {len(train_data)}")
  print(f"Validation data length: {len(val_data)}")
  return train_dataloader, val_dataloader, test_dataloader

In [None]:
if __name__ == '__main__':
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  # Configurations
  config = {'block_size': 128}
  # Load data
  train_dataloader, val_dataloader, test_dataloader = load_data('input.txt', config)
  # Model parameters
  d_model = 768  # Embedding Size
  d_ff = 2048  # FeedForward dimension
  n_layers = 12  # Number of Decoder Layers
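  n_heads = 8  # Number of heads in Multi-Head Attention
  d_k = d_v = d_model // n_heads  # Per-head dimensions of Q/K and V (assumed to be d_model / n_heads)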
  CLIP = 1
  epochs = 5
  vocab_size = train_dataloader.dataset.vocab_size
  # Initialize model
  model = GPT().to(device)
  # Train model (includes validation)
  train(model, train_dataloader, val_dataloader, epochs)

Vocabulary size: 64
Train data length: 669236
Validation data length: 223079


  0%|          | 0/2614 [00:00<?, ?it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 1/2614 [00:00<16:26,  2.65it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 2/2614 [00:01<25:57,  1.68it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 3/2614 [00:01<29:09,  1.49it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 4/2614 [00:02<30:27,  1.43it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 5/2614 [00:03<31:14,  1.39it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 6/2614 [00:04<31:43,  1.37it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 7/2614 [00:04<32:01,  1.36it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 8/2614 [00:05<32:13,  1.35it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 9/2614 [00:06<32:19,  1.34it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 10/2614 [00:07<32:25,  1.34it/s]

	Current Loss: 3.5747
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 11/2614 [00:07<32:27,  1.34it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 12/2614 [00:08<32:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  0%|          | 13/2614 [00:09<32:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 14/2614 [00:10<32:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 15/2614 [00:10<32:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 16/2614 [00:11<32:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 17/2614 [00:12<32:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 18/2614 [00:13<32:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 19/2614 [00:13<32:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 20/2614 [00:14<32:29,  1.33it/s]

	Current Loss: 3.3106
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 21/2614 [00:15<32:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 22/2614 [00:16<32:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 23/2614 [00:16<32:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 24/2614 [00:17<32:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 25/2614 [00:18<32:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 26/2614 [00:19<32:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 27/2614 [00:19<32:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 28/2614 [00:20<32:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 29/2614 [00:21<32:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 30/2614 [00:22<32:22,  1.33it/s]

	Current Loss: 3.2906
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 31/2614 [00:22<32:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 32/2614 [00:23<32:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 33/2614 [00:24<32:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 34/2614 [00:25<32:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 35/2614 [00:25<32:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 36/2614 [00:26<32:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 37/2614 [00:27<32:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 38/2614 [00:28<32:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 39/2614 [00:28<32:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 40/2614 [00:29<32:15,  1.33it/s]

	Current Loss: 3.2813
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 41/2614 [00:30<32:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 42/2614 [00:31<32:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 43/2614 [00:31<32:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 44/2614 [00:32<32:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 45/2614 [00:33<32:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 46/2614 [00:34<32:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 47/2614 [00:34<32:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 48/2614 [00:35<32:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 49/2614 [00:36<32:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 50/2614 [00:37<32:07,  1.33it/s]

	Current Loss: 3.2744
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 51/2614 [00:37<32:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 52/2614 [00:38<32:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 53/2614 [00:39<32:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 54/2614 [00:40<32:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 55/2614 [00:40<32:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 56/2614 [00:41<32:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 57/2614 [00:42<32:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 58/2614 [00:43<32:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 59/2614 [00:43<32:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 60/2614 [00:44<31:59,  1.33it/s]

	Current Loss: 3.2773
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 61/2614 [00:45<32:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 62/2614 [00:46<31:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 63/2614 [00:46<31:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 64/2614 [00:47<31:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 65/2614 [00:48<31:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 66/2614 [00:49<31:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 67/2614 [00:49<31:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 68/2614 [00:50<31:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 69/2614 [00:51<31:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 70/2614 [00:52<31:52,  1.33it/s]

	Current Loss: 3.2757

  3%|▎         | 80/2614 [00:59<31:45,  1.33it/s]

	Current Loss: 3.2715

  3%|▎         | 90/2614 [01:07<31:41,  1.33it/s]

	Current Loss: 3.2747

  4%|▍         | 100/2614 [01:14<31:29,  1.33it/s]

	Current Loss: 3.2758

  4%|▍         | 110/2614 [01:22<31:23,  1.33it/s]

	Current Loss: 3.2700

  5%|▍         | 120/2614 [01:29<31:14,  1.33it/s]

	Current Loss: 3.2730

  5%|▍         | 130/2614 [01:37<31:07,  1.33it/s]

	Current Loss: 3.2636

  5%|▌         | 140/2614 [01:44<31:01,  1.33it/s]

	Current Loss: 3.2648

  6%|▌         | 150/2614 [01:52<30:52,  1.33it/s]

	Current Loss: 3.2560

  6%|▌         | 160/2614 [01:59<30:44,  1.33it/s]

	Current Loss: 3.2559

  7%|▋         | 170/2614 [02:07<30:37,  1.33it/s]

	Current Loss: 3.2419

  7%|▋         | 180/2614 [02:14<30:30,  1.33it/s]

	Current Loss: 3.2295

  7%|▋         | 190/2614 [02:22<30:34,  1.32it/s]

	Current Loss: 3.2252

  8%|▊         | 200/2614 [02:30<30:14,  1.33it/s]

	Current Loss: 3.2125

  8%|▊         | 210/2614 [02:37<30:07,  1.33it/s]

	Current Loss: 3.2228

  8%|▊         | 220/2614 [02:45<29:59,  1.33it/s]

	Current Loss: 3.2000

  9%|▉         | 230/2614 [02:52<29:51,  1.33it/s]

	Current Loss: 3.1913

  9%|▉         | 240/2614 [03:00<29:45,  1.33it/s]

	Current Loss: 3.1790

 10%|▉         | 250/2614 [03:07<29:37,  1.33it/s]

	Current Loss: 3.1805

 10%|▉         | 260/2614 [03:15<29:32,  1.33it/s]

	Current Loss: 3.1734

 10%|█         | 270/2614 [03:22<29:47,  1.31it/s]

	Current Loss: 3.1630

 11%|█         | 280/2614 [03:30<29:15,  1.33it/s]

	Current Loss: 3.1597

 11%|█         | 290/2614 [03:37<29:07,  1.33it/s]

	Current Loss: 3.1508

 11%|█▏        | 300/2614 [03:45<28:59,  1.33it/s]

	Current Loss: 3.1438

 12%|█▏        | 310/2614 [03:52<28:52,  1.33it/s]

	Current Loss: 3.1287

 12%|█▏        | 320/2614 [04:00<28:44,  1.33it/s]

	Current Loss: 3.1277

 13%|█▎        | 330/2614 [04:07<28:36,  1.33it/s]

	Current Loss: 3.1178

 13%|█▎        | 340/2614 [04:15<28:29,  1.33it/s]

	Current Loss: 3.1077

 13%|█▎        | 350/2614 [04:22<28:21,  1.33it/s]

	Current Loss: 3.1068

 14%|█▍        | 360/2614 [04:30<28:16,  1.33it/s]

	Current Loss: 3.0953

 14%|█▍        | 370/2614 [04:37<28:07,  1.33it/s]

	Current Loss: 3.0953

 15%|█▍        | 380/2614 [04:45<28:02,  1.33it/s]

	Current Loss: 3.0893

 15%|█▍        | 390/2614 [04:52<27:51,  1.33it/s]

	Current Loss: 3.0872

 15%|█▌        | 400/2614 [05:00<27:44,  1.33it/s]

	Current Loss: 3.0735

 16%|█▌        | 410/2614 [05:07<27:37,  1.33it/s]

	Current Loss: 3.0676

 16%|█▌        | 420/2614 [05:15<27:30,  1.33it/s]

	Current Loss: 3.0633

 16%|█▋        | 430/2614 [05:22<27:21,  1.33it/s]

	Current Loss: 3.0526

 17%|█▋        | 440/2614 [05:30<27:14,  1.33it/s]

	Current Loss: 3.0415

 17%|█▋        | 450/2614 [05:38<27:06,  1.33it/s]

	Current Loss: 3.0358

 18%|█▊        | 460/2614 [05:45<27:00,  1.33it/s]

	Current Loss: 3.0354

 18%|█▊        | 470/2614 [05:53<26:52,  1.33it/s]

	Current Loss: 3.0209

 18%|█▊        | 480/2614 [06:00<26:44,  1.33it/s]

	Current Loss: 3.0111

 19%|█▊        | 490/2614 [06:08<26:36,  1.33it/s]

	Current Loss: 3.0081

 19%|█▉        | 500/2614 [06:15<26:29,  1.33it/s]

	Current Loss: 2.9957

 20%|█▉        | 510/2614 [06:23<26:22,  1.33it/s]

	Current Loss: 2.9952

 20%|█▉        | 520/2614 [06:30<26:16,  1.33it/s]

	Current Loss: 2.9831

 20%|██        | 530/2614 [06:38<26:06,  1.33it/s]

	Current Loss: 2.9719

 21%|██        | 540/2614 [06:45<26:00,  1.33it/s]

	Current Loss: 2.9662

 21%|██        | 550/2614 [06:53<25:52,  1.33it/s]

	Current Loss: 2.9616

 21%|██▏       | 560/2614 [07:00<25:44,  1.33it/s]

	Current Loss: 2.9545

 22%|██▏       | 570/2614 [07:08<25:37,  1.33it/s]

	Current Loss: 2.9405

 22%|██▏       | 580/2614 [07:15<25:30,  1.33it/s]

	Current Loss: 2.9355

 23%|██▎       | 590/2614 [07:23<25:22,  1.33it/s]

	Current Loss: 2.9298

 23%|██▎       | 600/2614 [07:30<25:14,  1.33it/s]

	Current Loss: 2.9254

 23%|██▎       | 610/2614 [07:38<25:07,  1.33it/s]

	Current Loss: 2.9261

 24%|██▎       | 620/2614 [07:45<24:59,  1.33it/s]

	Current Loss: 2.9174

 24%|██▍       | 630/2614 [07:53<24:51,  1.33it/s]

	Current Loss: 2.9140

 24%|██▍       | 640/2614 [08:00<24:44,  1.33it/s]

	Current Loss: 2.9059

 25%|██▍       | 650/2614 [08:08<24:36,  1.33it/s]

	Current Loss: 2.9007

 25%|██▌       | 660/2614 [08:15<24:29,  1.33it/s]

	Current Loss: 2.9006

 26%|██▌       | 670/2614 [08:23<24:21,  1.33it/s]

	Current Loss: 2.8873

 26%|██▌       | 680/2614 [08:30<24:14,  1.33it/s]

	Current Loss: 2.8796

 26%|██▋       | 690/2614 [08:38<24:06,  1.33it/s]

	Current Loss: 2.8726

 27%|██▋       | 700/2614 [08:46<23:58,  1.33it/s]

	Current Loss: 2.8728

 27%|██▋       | 710/2614 [08:53<23:51,  1.33it/s]

	Current Loss: 2.8636

 28%|██▊       | 720/2614 [09:01<23:44,  1.33it/s]

	Current Loss: 2.8489

 28%|██▊       | 730/2614 [09:08<23:36,  1.33it/s]

	Current Loss: 2.8526

 28%|██▊       | 740/2614 [09:16<23:28,  1.33it/s]

	Current Loss: 2.8529

 29%|██▊       | 750/2614 [09:23<23:22,  1.33it/s]

	Current Loss: 2.8427

 29%|██▉       | 760/2614 [09:31<23:14,  1.33it/s]

	Current Loss: 2.8384

 29%|██▉       | 770/2614 [09:38<23:06,  1.33it/s]

	Current Loss: 2.8269

 30%|██▉       | 780/2614 [09:46<22:58,  1.33it/s]

	Current Loss: 2.8217
 30%|███       | 790/2614 [09:53<22:51,  1.33it/s]

	Current Loss: 2.8249
 31%|███       | 800/2614 [10:01<22:44,  1.33it/s]

	Current Loss: 2.8302
 31%|███       | 810/2614 [10:08<22:37,  1.33it/s]

	Current Loss: 2.8203
 31%|███▏      | 820/2614 [10:16<22:28,  1.33it/s]

	Current Loss: 2.8100
 32%|███▏      | 830/2614 [10:23<22:21,  1.33it/s]

	Current Loss: 2.8141
 32%|███▏      | 840/2614 [10:31<22:14,  1.33it/s]

	Current Loss: 2.8072
 33%|███▎      | 850/2614 [10:38<22:06,  1.33it/s]

	Current Loss: 2.8299
 33%|███▎      | 860/2614 [10:46<21:59,  1.33it/s]

	Current Loss: 2.8178
 33%|███▎      | 870/2614 [10:53<21:51,  1.33it/s]

	Current Loss: 2.7998
 34%|███▎      | 880/2614 [11:01<21:43,  1.33it/s]

	Current Loss: 2.7950
 34%|███▍      | 890/2614 [11:08<21:36,  1.33it/s]

	Current Loss: 2.7910
 34%|███▍      | 900/2614 [11:16<21:28,  1.33it/s]

	Current Loss: 2.7872
 35%|███▍      | 910/2614 [11:23<21:21,  1.33it/s]

	Current Loss: 2.7825
 35%|███▌      | 920/2614 [11:31<21:14,  1.33it/s]

	Current Loss: 2.7830
 36%|███▌      | 930/2614 [11:38<21:06,  1.33it/s]

	Current Loss: 2.7758
 36%|███▌      | 940/2614 [11:46<20:59,  1.33it/s]

	Current Loss: 2.7817
 36%|███▋      | 950/2614 [11:54<20:52,  1.33it/s]

	Current Loss: 2.7708
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 951/2614 [11:54<20:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 952/2614 [11:55<20:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 953/2614 [11:56<20:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 954/2614 [11:57<20:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 955/2614 [11:57<20:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 956/2614 [11:58<20:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 957/2614 [11:59<20:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 958/2614 [12:00<20:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 959/2614 [12:00<20:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 960/2614 [12:01<20:43,  1.33it/s]

	Current Loss: 2.7719
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 961/2614 [12:02<20:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 962/2614 [12:03<20:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 963/2614 [12:03<20:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 964/2614 [12:04<20:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 965/2614 [12:05<20:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 966/2614 [12:06<20:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 967/2614 [12:06<20:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 968/2614 [12:07<20:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 969/2614 [12:08<20:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 970/2614 [12:09<20:35,  1.33it/s]

	Current Loss: 2.7677
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 971/2614 [12:09<20:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 972/2614 [12:10<20:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 973/2614 [12:11<20:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 974/2614 [12:12<20:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 975/2614 [12:12<20:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 976/2614 [12:13<20:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 977/2614 [12:14<20:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 978/2614 [12:15<20:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 979/2614 [12:15<20:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 980/2614 [12:16<20:27,  1.33it/s]

	Current Loss: 2.7660
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 981/2614 [12:17<20:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 982/2614 [12:18<20:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 983/2614 [12:18<20:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 984/2614 [12:19<20:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 985/2614 [12:20<20:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 986/2614 [12:21<20:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 987/2614 [12:21<20:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 988/2614 [12:22<20:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 989/2614 [12:23<20:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 990/2614 [12:24<20:20,  1.33it/s]

	Current Loss: 2.7580
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 991/2614 [12:24<20:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 992/2614 [12:25<20:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 993/2614 [12:26<20:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 994/2614 [12:27<20:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 995/2614 [12:27<20:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 996/2614 [12:28<20:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 997/2614 [12:29<20:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 998/2614 [12:30<20:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 999/2614 [12:30<20:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1000/2614 [12:31<20:13,  1.33it/s]

	Current Loss: 2.7562
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1001/2614 [12:32<20:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1002/2614 [12:33<20:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1003/2614 [12:33<20:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1004/2614 [12:34<20:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1005/2614 [12:35<20:25,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1006/2614 [12:36<20:19,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1007/2614 [12:36<20:15,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1008/2614 [12:37<20:12,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1009/2614 [12:38<20:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1010/2614 [12:39<20:08,  1.33it/s]

	Current Loss: 2.7494
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1011/2614 [12:39<20:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1012/2614 [12:40<20:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1013/2614 [12:41<20:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1014/2614 [12:42<20:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1015/2614 [12:42<20:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1016/2614 [12:43<20:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1017/2614 [12:44<20:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1018/2614 [12:45<19:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1019/2614 [12:45<19:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1020/2614 [12:46<19:58,  1.33it/s]

	Current Loss: 2.7466
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1021/2614 [12:47<19:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1022/2614 [12:48<19:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1023/2614 [12:48<19:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1024/2614 [12:49<19:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1025/2614 [12:50<19:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1026/2614 [12:51<19:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1027/2614 [12:51<19:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1028/2614 [12:52<19:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1029/2614 [12:53<19:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1030/2614 [12:54<19:50,  1.33it/s]

	Current Loss: 2.7474
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1031/2614 [12:54<19:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1032/2614 [12:55<19:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1033/2614 [12:56<19:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1034/2614 [12:57<19:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1035/2614 [12:57<19:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1036/2614 [12:58<19:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1037/2614 [12:59<19:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1038/2614 [13:00<19:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1039/2614 [13:00<19:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1040/2614 [13:01<19:45,  1.33it/s]

	Current Loss: 2.7485
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1041/2614 [13:02<19:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1042/2614 [13:03<19:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1043/2614 [13:03<19:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1044/2614 [13:04<19:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1045/2614 [13:05<19:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1046/2614 [13:06<19:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1047/2614 [13:06<19:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1048/2614 [13:07<19:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1049/2614 [13:08<19:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1050/2614 [13:09<19:35,  1.33it/s]

	Current Loss: 2.7462
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1051/2614 [13:09<19:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1052/2614 [13:10<19:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1053/2614 [13:11<19:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1054/2614 [13:12<19:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1055/2614 [13:12<19:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1056/2614 [13:13<19:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1057/2614 [13:14<19:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1058/2614 [13:15<19:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1059/2614 [13:16<19:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1060/2614 [13:16<19:28,  1.33it/s]

	Current Loss: 2.7431
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1061/2614 [13:17<19:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1062/2614 [13:18<19:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1063/2614 [13:19<19:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1064/2614 [13:19<19:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1065/2614 [13:20<19:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1066/2614 [13:21<19:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1067/2614 [13:22<19:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1068/2614 [13:22<19:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1069/2614 [13:23<19:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1070/2614 [13:24<19:21,  1.33it/s]

	Current Loss: 2.7415
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1071/2614 [13:25<19:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1072/2614 [13:25<19:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1073/2614 [13:26<19:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1074/2614 [13:27<19:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1075/2614 [13:28<19:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1076/2614 [13:28<19:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1077/2614 [13:29<19:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1078/2614 [13:30<19:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1079/2614 [13:31<19:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1080/2614 [13:31<19:13,  1.33it/s]

	Current Loss: 2.7329
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1081/2614 [13:32<19:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1082/2614 [13:33<19:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1083/2614 [13:34<19:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1084/2614 [13:34<19:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1085/2614 [13:35<19:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1086/2614 [13:36<19:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1087/2614 [13:37<19:29,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1088/2614 [13:37<19:22,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1089/2614 [13:38<19:17,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1090/2614 [13:39<19:13,  1.32it/s]

	Current Loss: 2.7318
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1091/2614 [13:40<19:10,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1092/2614 [13:40<19:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1093/2614 [13:41<19:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1094/2614 [13:42<19:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1095/2614 [13:43<19:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1096/2614 [13:43<19:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1097/2614 [13:44<19:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1098/2614 [13:45<19:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1099/2614 [13:46<19:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1100/2614 [13:46<18:59,  1.33it/s]

	Current Loss: 2.7280
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1101/2614 [13:47<18:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1102/2614 [13:48<18:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1103/2614 [13:49<18:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1104/2614 [13:49<18:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1105/2614 [13:50<18:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1106/2614 [13:51<18:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1107/2614 [13:52<18:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1108/2614 [13:52<18:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1109/2614 [13:53<18:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1110/2614 [13:54<18:52,  1.33it/s]

	Current Loss: 2.7203
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1111/2614 [13:55<18:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1112/2614 [13:55<18:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1113/2614 [13:56<18:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1114/2614 [13:57<18:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1115/2614 [13:58<18:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1116/2614 [13:58<18:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1117/2614 [13:59<18:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1118/2614 [14:00<18:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1119/2614 [14:01<18:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1120/2614 [14:01<18:43,  1.33it/s]

	Current Loss: 2.7215

 43%|████▎     | 1130/2614 [14:09<18:35,  1.33it/s]

	Current Loss: 2.7422

 44%|████▎     | 1140/2614 [14:16<18:28,  1.33it/s]

	Current Loss: 2.7349

 44%|████▍     | 1150/2614 [14:24<18:20,  1.33it/s]

	Current Loss: 2.7227

 44%|████▍     | 1160/2614 [14:32<18:13,  1.33it/s]

	Current Loss: 2.7144

 45%|████▍     | 1170/2614 [14:39<18:06,  1.33it/s]

	Current Loss: 2.7165

 45%|████▌     | 1180/2614 [14:47<17:57,  1.33it/s]

	Current Loss: 2.7150

 46%|████▌     | 1190/2614 [14:54<17:50,  1.33it/s]

	Current Loss: 2.7038

 46%|████▌     | 1200/2614 [15:02<17:43,  1.33it/s]

	Current Loss: 2.7008

 46%|████▋     | 1210/2614 [15:09<17:35,  1.33it/s]

	Current Loss: 2.6988

 47%|████▋     | 1220/2614 [15:17<17:28,  1.33it/s]

	Current Loss: 2.6960

 47%|████▋     | 1230/2614 [15:24<17:20,  1.33it/s]

	Current Loss: 2.6942

 47%|████▋     | 1240/2614 [15:32<17:13,  1.33it/s]

	Current Loss: 2.6852

 48%|████▊     | 1250/2614 [15:39<17:19,  1.31it/s]

	Current Loss: 2.6908

 48%|████▊     | 1260/2614 [15:47<16:58,  1.33it/s]

	Current Loss: 2.6893

 49%|████▊     | 1270/2614 [15:54<16:52,  1.33it/s]

	Current Loss: 2.6852

 49%|████▉     | 1280/2614 [16:02<16:42,  1.33it/s]

	Current Loss: 2.6785

 49%|████▉     | 1290/2614 [16:09<16:35,  1.33it/s]

	Current Loss: 2.6728

 50%|████▉     | 1300/2614 [16:17<16:28,  1.33it/s]

	Current Loss: 2.6722

 50%|█████     | 1310/2614 [16:24<16:20,  1.33it/s]

	Current Loss: 2.6672

 50%|█████     | 1320/2614 [16:32<16:12,  1.33it/s]

	Current Loss: 2.6661

 51%|█████     | 1330/2614 [16:39<16:05,  1.33it/s]

	Current Loss: 2.6590

 51%|█████▏    | 1340/2614 [16:47<15:58,  1.33it/s]

	Current Loss: 2.6561

 52%|█████▏    | 1350/2614 [16:54<15:50,  1.33it/s]

	Current Loss: 2.6545

 52%|█████▏    | 1360/2614 [17:02<15:43,  1.33it/s]

	Current Loss: 2.6564

 52%|█████▏    | 1370/2614 [17:09<15:35,  1.33it/s]

	Current Loss: 2.6495

 53%|█████▎    | 1380/2614 [17:17<15:27,  1.33it/s]

	Current Loss: 2.6458

 53%|█████▎    | 1390/2614 [17:25<15:20,  1.33it/s]

	Current Loss: 2.6598

 54%|█████▎    | 1400/2614 [17:32<15:13,  1.33it/s]

	Current Loss: 2.6490

 54%|█████▍    | 1410/2614 [17:40<15:05,  1.33it/s]

	Current Loss: 2.6366

 54%|█████▍    | 1420/2614 [17:47<14:57,  1.33it/s]

	Current Loss: 2.6333

 55%|█████▍    | 1430/2614 [17:55<14:50,  1.33it/s]

	Current Loss: 2.6318

 55%|█████▌    | 1440/2614 [18:02<14:42,  1.33it/s]

	Current Loss: 2.6286

 55%|█████▌    | 1450/2614 [18:10<14:34,  1.33it/s]

	Current Loss: 2.6243

 56%|█████▌    | 1460/2614 [18:17<14:27,  1.33it/s]

	Current Loss: 2.6203

 56%|█████▌    | 1470/2614 [18:25<14:20,  1.33it/s]

	Current Loss: 2.6230
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 57%|█████▋    | 1480/2614 [18:32<14:12,  1.33it/s]

	Current Loss: 2.6161
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 57%|█████▋    | 1490/2614 [18:40<14:05,  1.33it/s]

	Current Loss: 2.6168
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 57%|█████▋    | 1500/2614 [18:47<13:56,  1.33it/s]

	Current Loss: 2.6087
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 58%|█████▊    | 1510/2614 [18:55<13:50,  1.33it/s]

	Current Loss: 2.6059
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 58%|█████▊    | 1520/2614 [19:02<13:42,  1.33it/s]

	Current Loss: 2.6118
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 59%|█████▊    | 1530/2614 [19:10<13:34,  1.33it/s]

	Current Loss: 2.6044
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 59%|█████▉    | 1540/2614 [19:17<13:27,  1.33it/s]

	Current Loss: 2.5961
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 59%|█████▉    | 1550/2614 [19:25<13:20,  1.33it/s]

	Current Loss: 2.6063
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 60%|█████▉    | 1560/2614 [19:32<13:12,  1.33it/s]

	Current Loss: 2.5957
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 60%|██████    | 1570/2614 [19:40<13:04,  1.33it/s]

	Current Loss: 2.5956
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 60%|██████    | 1580/2614 [19:47<12:57,  1.33it/s]

	Current Loss: 2.5877
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1590/2614 [19:55<12:50,  1.33it/s]

	Current Loss: 2.5812
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1600/2614 [20:02<12:42,  1.33it/s]

	Current Loss: 2.5794
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1610/2614 [20:10<12:35,  1.33it/s]

	Current Loss: 2.5791
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1620/2614 [20:17<12:27,  1.33it/s]

	Current Loss: 2.5750
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1630/2614 [20:25<12:20,  1.33it/s]

	Current Loss: 2.5709
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1640/2614 [20:33<12:12,  1.33it/s]

	Current Loss: 2.5739

 63%|██████▎   | 1650/2614 [20:40<12:04,  1.33it/s]

	Current Loss: 2.5732

 64%|██████▎   | 1660/2614 [20:48<11:58,  1.33it/s]

	Current Loss: 2.5670

 64%|██████▍   | 1670/2614 [20:55<11:49,  1.33it/s]

	Current Loss: 2.5637

 64%|██████▍   | 1680/2614 [21:03<11:41,  1.33it/s]

	Current Loss: 2.5594

 65%|██████▍   | 1690/2614 [21:10<11:34,  1.33it/s]

	Current Loss: 2.5627

 65%|██████▌   | 1700/2614 [21:18<11:27,  1.33it/s]

	Current Loss: 2.5636

 65%|██████▌   | 1710/2614 [21:25<11:19,  1.33it/s]

	Current Loss: 2.5618

 66%|██████▌   | 1720/2614 [21:33<11:11,  1.33it/s]

	Current Loss: 2.5571

 66%|██████▌   | 1730/2614 [21:40<11:04,  1.33it/s]

	Current Loss: 2.5557

 67%|██████▋   | 1740/2614 [21:48<10:56,  1.33it/s]

	Current Loss: 2.5502

 67%|██████▋   | 1750/2614 [21:55<10:49,  1.33it/s]

	Current Loss: 2.5501

 67%|██████▋   | 1760/2614 [22:03<10:42,  1.33it/s]

	Current Loss: 2.5570

 68%|██████▊   | 1770/2614 [22:10<10:34,  1.33it/s]

	Current Loss: 2.5481

 68%|██████▊   | 1780/2614 [22:18<10:26,  1.33it/s]

	Current Loss: 2.5474

 68%|██████▊   | 1790/2614 [22:25<10:19,  1.33it/s]

	Current Loss: 2.5374

 69%|██████▉   | 1800/2614 [22:33<10:12,  1.33it/s]

	Current Loss: 2.5423

 69%|██████▉   | 1810/2614 [22:40<10:04,  1.33it/s]

	Current Loss: 2.5398

 70%|██████▉   | 1820/2614 [22:48<09:57,  1.33it/s]

	Current Loss: 2.5376

 70%|███████   | 1830/2614 [22:55<09:49,  1.33it/s]

	Current Loss: 2.5298

 70%|███████   | 1840/2614 [23:03<09:41,  1.33it/s]

	Current Loss: 2.5280

 71%|███████   | 1850/2614 [23:10<09:34,  1.33it/s]

	Current Loss: 2.5293

 71%|███████   | 1860/2614 [23:18<09:27,  1.33it/s]

	Current Loss: 2.5236

 72%|███████▏  | 1870/2614 [23:25<09:19,  1.33it/s]

	Current Loss: 2.5275

 72%|███████▏  | 1880/2614 [23:33<09:11,  1.33it/s]

	Current Loss: 2.5253

 72%|███████▏  | 1890/2614 [23:40<09:04,  1.33it/s]

	Current Loss: 2.5177

 73%|███████▎  | 1900/2614 [23:48<08:56,  1.33it/s]

	Current Loss: 2.5181

 73%|███████▎  | 1910/2614 [23:56<08:49,  1.33it/s]

	Current Loss: 2.5184

 73%|███████▎  | 1920/2614 [24:03<08:41,  1.33it/s]

	Current Loss: 2.5628

 74%|███████▍  | 1930/2614 [24:11<08:34,  1.33it/s]

	Current Loss: 2.5482

 74%|███████▍  | 1940/2614 [24:18<08:26,  1.33it/s]

	Current Loss: 2.5238

 75%|███████▍  | 1950/2614 [24:26<08:19,  1.33it/s]

	Current Loss: 2.5133

 75%|███████▍  | 1960/2614 [24:33<08:11,  1.33it/s]

	Current Loss: 2.5145

 75%|███████▌  | 1970/2614 [24:41<08:05,  1.33it/s]

	Current Loss: 2.5061

 76%|███████▌  | 1980/2614 [24:48<07:56,  1.33it/s]

	Current Loss: 2.5027

 76%|███████▌  | 1990/2614 [24:56<07:49,  1.33it/s]

	Current Loss: 2.5137
 77%|███████▋  | 2000/2614 [25:03<07:41,  1.33it/s]

	Current Loss: 2.5095
 77%|███████▋  | 2010/2614 [25:11<07:34,  1.33it/s]

	Current Loss: 2.5012
 77%|███████▋  | 2020/2614 [25:18<07:26,  1.33it/s]

	Current Loss: 2.4954
 78%|███████▊  | 2030/2614 [25:26<07:19,  1.33it/s]

	Current Loss: 2.4936
 78%|███████▊  | 2040/2614 [25:33<07:11,  1.33it/s]

	Current Loss: 2.4877
 78%|███████▊  | 2050/2614 [25:41<07:04,  1.33it/s]

	Current Loss: 2.4901
 79%|███████▉  | 2060/2614 [25:48<06:56,  1.33it/s]

	Current Loss: 2.4887
 79%|███████▉  | 2070/2614 [25:56<06:51,  1.32it/s]

	Current Loss: 2.4860
 80%|███████▉  | 2080/2614 [26:03<06:41,  1.33it/s]

	Current Loss: 2.4879
 80%|███████▉  | 2090/2614 [26:11<06:33,  1.33it/s]

	Current Loss: 2.4791
 80%|████████  | 2100/2614 [26:18<06:26,  1.33it/s]

	Current Loss: 2.4847
 81%|████████  | 2110/2614 [26:26<06:19,  1.33it/s]

	Current Loss: 2.4798
 81%|████████  | 2120/2614 [26:33<06:11,  1.33it/s]

	Current Loss: 2.4759
 81%|████████▏ | 2130/2614 [26:41<06:04,  1.33it/s]

	Current Loss: 2.4738
 82%|████████▏ | 2140/2614 [26:49<05:56,  1.33it/s]

	Current Loss: 2.4694
 82%|████████▏ | 2150/2614 [26:56<05:52,  1.32it/s]

	Current Loss: 2.4682
 83%|████████▎ | 2160/2614 [27:04<05:42,  1.33it/s]

	Current Loss: 2.4656
 83%|████████▎ | 2170/2614 [27:11<05:33,  1.33it/s]

	Current Loss: 2.4627
 83%|████████▎ | 2180/2614 [27:19<05:26,  1.33it/s]

	Current Loss: 2.4644
 84%|████████▍ | 2190/2614 [27:26<05:18,  1.33it/s]

	Current Loss: 2.4635
 84%|████████▍ | 2200/2614 [27:34<05:11,  1.33it/s]

	Current Loss: 2.4605
 85%|████████▍ | 2210/2614 [27:41<05:03,  1.33it/s]

	Current Loss: 2.4591
 85%|████████▍ | 2220/2614 [27:49<04:56,  1.33it/s]

	Current Loss: 2.4587
 85%|████████▌ | 2230/2614 [27:56<04:48,  1.33it/s]

	Current Loss: 2.4520
 86%|████████▌ | 2240/2614 [28:04<04:41,  1.33it/s]

	Current Loss: 2.4443
 86%|████████▌ | 2250/2614 [28:11<04:33,  1.33it/s]

	Current Loss: 2.4435
 86%|████████▋ | 2260/2614 [28:19<04:26,  1.33it/s]

	Current Loss: 2.4459
 87%|████████▋ | 2270/2614 [28:26<04:18,  1.33it/s]

	Current Loss: 2.4406
 87%|████████▋ | 2280/2614 [28:34<04:11,  1.33it/s]

	Current Loss: 2.4460
 88%|████████▊ | 2290/2614 [28:41<04:03,  1.33it/s]

	Current Loss: 2.4414
 88%|████████▊ | 2300/2614 [28:49<03:56,  1.33it/s]

	Current Loss: 2.4432
 88%|████████▊ | 2310/2614 [28:56<03:48,  1.33it/s]

	Current Loss: 2.4366
 89%|████████▉ | 2320/2614 [29:04<03:41,  1.33it/s]

	Current Loss: 2.4289
 89%|████████▉ | 2330/2614 [29:11<03:33,  1.33it/s]

	Current Loss: 2.4316
 90%|████████▉ | 2340/2614 [29:19<03:26,  1.33it/s]

	Current Loss: 2.4283


 90%|████████▉ | 2350/2614 [29:27<03:18,  1.33it/s]

	Current Loss: 2.4241


 90%|█████████ | 2360/2614 [29:34<03:11,  1.33it/s]

	Current Loss: 2.4239


 91%|█████████ | 2370/2614 [29:42<03:03,  1.33it/s]

	Current Loss: 2.4195


 91%|█████████ | 2380/2614 [29:49<02:55,  1.33it/s]

	Current Loss: 2.4138


 91%|█████████▏| 2390/2614 [29:57<02:48,  1.33it/s]

	Current Loss: 2.4173


 92%|█████████▏| 2400/2614 [30:04<02:41,  1.33it/s]

	Current Loss: 2.4157


 92%|█████████▏| 2410/2614 [30:12<02:33,  1.33it/s]

	Current Loss: 2.4095


 93%|█████████▎| 2420/2614 [30:19<02:25,  1.33it/s]

	Current Loss: 2.4098


 93%|█████████▎| 2430/2614 [30:27<02:18,  1.33it/s]

	Current Loss: 2.4061


 93%|█████████▎| 2440/2614 [30:34<02:10,  1.33it/s]

	Current Loss: 2.4048


 94%|█████████▎| 2450/2614 [30:42<02:03,  1.33it/s]

	Current Loss: 2.4003


 94%|█████████▍| 2460/2614 [30:49<01:55,  1.33it/s]

	Current Loss: 2.3988


 94%|█████████▍| 2470/2614 [30:57<01:48,  1.33it/s]

	Current Loss: 2.3964


 95%|█████████▍| 2480/2614 [31:04<01:40,  1.33it/s]

	Current Loss: 2.3960


 95%|█████████▌| 2490/2614 [31:12<01:33,  1.33it/s]

	Current Loss: 2.3957


 96%|█████████▌| 2500/2614 [31:19<01:25,  1.33it/s]

	Current Loss: 2.3865


 96%|█████████▌| 2510/2614 [31:27<01:18,  1.33it/s]

	Current Loss: 2.3859
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2511/2614 [31:28<01:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2512/2614 [31:28<01:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2513/2614 [31:29<01:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2514/2614 [31:30<01:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2515/2614 [31:31<01:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2516/2614 [31:31<01:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2517/2614 [31:32<01:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2518/2614 [31:33<01:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2519/2614 [31:34<01:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2520/2614 [31:34<01:10,  1.33it/s]

	Current Loss: 2.3871
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2521/2614 [31:35<01:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2522/2614 [31:36<01:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2523/2614 [31:37<01:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2524/2614 [31:37<01:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2525/2614 [31:38<01:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2526/2614 [31:39<01:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2527/2614 [31:40<01:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2528/2614 [31:40<01:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2529/2614 [31:41<01:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2530/2614 [31:42<01:03,  1.33it/s]

	Current Loss: 2.3868
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2531/2614 [31:43<01:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2532/2614 [31:43<01:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2533/2614 [31:44<01:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2534/2614 [31:45<01:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2535/2614 [31:46<00:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2536/2614 [31:46<00:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2537/2614 [31:47<00:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2538/2614 [31:48<00:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2539/2614 [31:49<00:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2540/2614 [31:49<00:55,  1.33it/s]

	Current Loss: 2.3876
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2541/2614 [31:50<00:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2542/2614 [31:51<00:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2543/2614 [31:52<00:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2544/2614 [31:52<00:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2545/2614 [31:53<00:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2546/2614 [31:54<00:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2547/2614 [31:55<00:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2548/2614 [31:55<00:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2549/2614 [31:56<00:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2550/2614 [31:57<00:48,  1.33it/s]

	Current Loss: 2.3808
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2551/2614 [31:58<00:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2552/2614 [31:58<00:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2553/2614 [31:59<00:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2554/2614 [32:00<00:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2555/2614 [32:01<00:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2556/2614 [32:01<00:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2557/2614 [32:02<00:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2558/2614 [32:03<00:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2559/2614 [32:04<00:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2560/2614 [32:04<00:40,  1.33it/s]

	Current Loss: 2.3803
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2561/2614 [32:05<00:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2562/2614 [32:06<00:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2563/2614 [32:07<00:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2564/2614 [32:07<00:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2565/2614 [32:08<00:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2566/2614 [32:09<00:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2567/2614 [32:10<00:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2568/2614 [32:10<00:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2569/2614 [32:11<00:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2570/2614 [32:12<00:33,  1.33it/s]

	Current Loss: 2.3782
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2571/2614 [32:13<00:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2572/2614 [32:13<00:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2573/2614 [32:14<00:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2574/2614 [32:15<00:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2575/2614 [32:16<00:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2576/2614 [32:16<00:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2577/2614 [32:17<00:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2578/2614 [32:18<00:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2579/2614 [32:19<00:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2580/2614 [32:19<00:25,  1.33it/s]

	Current Loss: 2.3739
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2581/2614 [32:20<00:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2582/2614 [32:21<00:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2583/2614 [32:22<00:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2584/2614 [32:23<00:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2585/2614 [32:23<00:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2586/2614 [32:24<00:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2587/2614 [32:25<00:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2588/2614 [32:26<00:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2589/2614 [32:26<00:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2590/2614 [32:27<00:18,  1.33it/s]

	Current Loss: 2.3742
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2591/2614 [32:28<00:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2592/2614 [32:29<00:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2593/2614 [32:29<00:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2594/2614 [32:30<00:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2595/2614 [32:31<00:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2596/2614 [32:32<00:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2597/2614 [32:32<00:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2598/2614 [32:33<00:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2599/2614 [32:34<00:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2600/2614 [32:35<00:10,  1.33it/s]

	Current Loss: 2.3674
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2601/2614 [32:35<00:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2602/2614 [32:36<00:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2603/2614 [32:37<00:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2604/2614 [32:38<00:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2605/2614 [32:38<00:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2606/2614 [32:39<00:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2607/2614 [32:40<00:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2608/2614 [32:41<00:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2609/2614 [32:41<00:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2610/2614 [32:42<00:03,  1.33it/s]

	Current Loss: 2.3676
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2611/2614 [32:43<00:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2612/2614 [32:44<00:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2613/2614 [32:44<00:00,  1.33it/s]

dec_inputs shape: torch.Size([180, 128])
dec_outputs shape: torch.Size([180, 128])
model output shape: torch.Size([23040, 64])


100%|██████████| 2614/2614 [32:45<00:00,  1.33it/s]
Validating:   6%|▋         | 55/871 [00:15<03:49,  3.56it/s]

Invalid target label values detected! Min: 0, Max: 64
Some labeled data: tensor([[52, 42,  6,  ..., 42,  1, 58],
        [42,  6,  1,  ...,  1, 58, 46],
        [ 6,  1,  4,  ..., 58, 46, 43],
        ...,
        [ 0,  0, 25,  ..., 53, 59, 56],
        [ 0, 25, 27,  ..., 59, 56, 58],
        [25, 27, 26,  ..., 56, 58, 46]], device='cuda:0')





ValueError: Tag out of bounds!The label maximum is 64, and the vocab_size is 64.
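
The error above reports a target index of 64 against a `vocab_size` of 64, so valid indices would only run from 0 to 63. A minimal sketch of a check that could catch this before training, assuming the vocabulary is built once from the full text and shared by every split; `build_vocab` and `check_targets` are hypothetical helpers, not part of the notebook:

```python
def build_vocab(text):
    """Map every distinct character in `text` to an index (hypothetical helper)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def check_targets(indices, vocab_size):
    """Raise if any index falls outside [0, vocab_size)."""
    lo, hi = min(indices), max(indices)
    if lo < 0 or hi >= vocab_size:
        raise ValueError(
            f"Target out of bounds: min {lo}, max {hi}, vocab_size {vocab_size}"
        )


# Toy stand-in for the corpus; the real text would be the full dataset.
full_text = "O God, O God!\n"
stoi, itos = build_vocab(full_text)
vocab_size = len(stoi)

encoded = [stoi[ch] for ch in full_text]
check_targets(encoded, vocab_size)  # passes: every index is below vocab_size
```

Sizing the embedding and output layers from `len(stoi)` of the shared vocabulary, rather than from a hard-coded constant, is one way to keep the two numbers in agreement.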

In [None]:
import torch

def generate_text(model, start_text, length, dataset):
  """Greedily extend start_text by `length` characters."""
  model.eval()
  generated = start_text
  # Encode the seed string as a (1, T) batch of token indices
  input_ids = torch.tensor([dataset.stoi[ch] for ch in start_text]).unsqueeze(0).to(device)
  for _ in range(length):
    with torch.no_grad():
      logits, _ = model(input_ids)
      # The model returns logits flattened to (B*T, vocab_size), so the last row holds the final position
      last_logits = logits.reshape(-1, logits.size(-1))[-1]
      predicted_id = torch.argmax(last_logits).item()  # Greedy choice of the next token

    generated += dataset.itos[predicted_id]  # Convert the index to a character
    input_ids = torch.cat([input_ids, torch.tensor([[predicted_id]]).to(device)], dim=1)  # Append the predicted token to input_ids
  return generated

start_text = "O God, O God!"
dataset = test_dataloader.dataset
generated_text = generate_text(model, start_text, length=100, dataset=dataset)
print(generated_text)

O God, O God!






































































































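The training log shows the model returning logits flattened to `(B*N, vocab_size)`, e.g. `torch.Size([32768, 64])` for a batch of 256 sequences of 128 tokens. Assuming the same layout at inference time, the row to read for next-token prediction is the last one; indexing `[0][-1]` on a two-dimensional array instead yields a single number, whose argmax is always 0. A toy illustration in plain Python, with made-up values:

```python
def argmax(values):
    """Index of the largest entry in a list of numbers."""
    return max(range(len(values)), key=values.__getitem__)


# Toy flattened logits for one sequence: T = 3 positions, V = 4 characters.
flat_logits = [
    [0.1, 2.0, 0.3, 0.0],  # position 0
    [1.5, 0.2, 0.1, 0.0],  # position 1
    [0.0, 0.1, 0.2, 3.0],  # position 2 (the last token)
]

last_row = flat_logits[-1]       # scores over the vocabulary for the next token
next_id = argmax(last_row)

single_score = flat_logits[0][-1]  # one number, not a distribution over characters

print("next token id:", next_id)
print("flat_logits[0][-1] is a scalar:", isinstance(single_score, float))
```
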
In [None]:
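import torch
import torch.nn as nn

def evaluate(model, data_loader, criterion):
  """Return the mean per-batch loss of `model` over `data_loader`."""
  model.eval()
  total_loss = 0.0
  with torch.no_grad():
    for dec_inputs, dec_outputs in data_loader:
      dec_inputs, dec_outputs = dec_inputs.to(device), dec_outputs.to(device)
      outputs, _ = model(dec_inputs)  # logits flattened to (B*N, vocab_size)
      loss = criterion(outputs, dec_outputs.view(-1))  # targets flattened to (B*N,)
      total_loss += loss.item()
  return total_loss / len(data_loader)

# ignore_index=0 excludes every target with index 0 from the loss; in the labels
# printed above, index 0 appears to be a regular character rather than padding.
test_loss = evaluate(model, test_dataloader, nn.CrossEntropyLoss(ignore_index=0).to(device))
print("The loss of test data set is:", test_loss)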

The loss of test data set is: 5.535172477792242
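
For context, a model that assigns equal probability to every character has a cross-entropy of ln(V). A small sketch comparing the reported test loss with that baseline, assuming V = 64 (the width of the model output in the training log) and natural-log cross-entropy as computed by `nn.CrossEntropyLoss`:

```python
import math

vocab_size = 64                # assumed from the logged model output width
test_loss = 5.535172477792242  # value printed above

uniform_loss = math.log(vocab_size)  # loss of a uniform guess over the vocabulary
perplexity = math.exp(test_loss)     # effective number of equally likely choices

print(f"uniform baseline: {uniform_loss:.4f}")
print(f"test perplexity:  {perplexity:.1f}")
print("worse than uniform" if test_loss > uniform_loss else "better than uniform")
```

A test loss above the uniform baseline, while the training loss ends near 2.37, may point to a mismatch between training and evaluation (for example in the vocabulary mapping) rather than to a model that merely underfits.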
