# **Miniproject 2**
## **~Large~ Small Language Model**

### **Objective**
Implement a transformer-based, character-level language model (GPT-like) and train it on the Shakespeare dataset. By the end of this project, you should be able to generate Shakespearean-like text given a seed string.

You will probably want to train the model on a GPU. You can use free GPUs on [Google Colab](https://colab.research.google.com/?utm_source=scs-index).

### **Dataset**:

The Shakespeare dataset used here is the `tinyshakespeare` file linked below: a single plain-text file of roughly one million characters drawn from William Shakespeare's works.

[**Download link**](https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt)

In a character-level language model, each character in the input data is mapped to an integer index through a lookup dictionary. The model input has shape (B, N), where B is the batch size and N is the number of tokens in each sequence. The model was tested with B=N=128, but feel free to explore different values.
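
As a minimal sketch of the character-to-index mapping described above, the snippet below builds the lookup tables and round-trips a string. The helper names `build_vocab`, `encode`, and `decode` are illustrative assumptions and are not part of the provided interface.

```python
def build_vocab(text):
    """Return (stoi, itos) lookup tables over the sorted unique characters."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Map every character to its integer index."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Map integer indices back to a string."""
    return "".join(itos[i] for i in ids)


sample = "O God, O God!"
stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
restored = decode(ids, itos)  # identical to `sample`
```

Sorting the character set keeps the mapping deterministic across runs, which matters when a saved checkpoint is later reused for generation.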

An interface for the dataset class that takes care of tokenization is provided below.



```python
from torch.utils.data import Dataset

class CharDataset(Dataset):
    """
    Emits batches of characters.

    Adapted from "https://github.com/karpathy/minGPT".
    """

    def __init__(self, config, data):

        chars = ... # get characters from the input data
        self.stoi = { ch:i for i,ch in enumerate(chars) } # map characters to integer indices

        ...

    def get_vocab_size(self):
        raise NotImplementedError()

    def __len__(self):
        raise NotImplementedError()

    def __getitem__(self, idx):
        # grab a chunk of (block_size + 1) characters from the data
        # encode every character to an integer
        # return the chunk and the shifted version as tensors
        pass
```




### **Requirements**

#### **Architecture**

Implement the Transformer's decoder-only structure.
This includes

* input token embeddings
* the causal multi-head self-attention mechanism
* feed-forward neural networks
* positional encodings, residual connections, layer normalizations.
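
To make the word "causal" in the list above concrete, here is a small pure-Python sketch of how masking changes the attention weights for a single head. It assumes the raw attention scores are already computed; the function name `causal_attention_weights` is hypothetical, and a real implementation would do the same thing with batched tensor operations.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix with future positions masked."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may only attend to positions 0..i; later ones get -inf.
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(masked)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# With all-zero scores, each row spreads its weight uniformly over the past.
weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
```

Each row still sums to one, but every entry above the diagonal is exactly zero, so a position never receives information from characters that come after it.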

The project was tested with $12$ layers, $8$ attention heads, and $768$ embedding dimensions, on a single GPU.

The `forward` method for the entire model has the following form:

```
tok_emb = WTE(idx) # token embeddings
pos_emb = WPE(pos) # position embeddings
x = Dropout(tok_emb + pos_emb)
for Block in Blocks:
    x = Block(x)
x = Final_LayerNorm(x)
logits = LM_Head(x)
```

The `forward` method for the transformer block has the following form:



```
x = x + self.CausalSelfAttn(self.LayerNorm_1(x))
out = x + self.MLP(self.LayerNorm_2(x))
```

---

#### **Training**

In a character-level transformer language model, the goal is to predict the next character in a sequence given the previous characters. To train such a model effectively, we use two versions of our data: the input sequence and a shifted version of this sequence, which serves as the target for our predictions.

* Preprocess the dataset to a character-level representation.
* Use a sliding window approach for sequence chunks (e.g., window size of $128$ characters).
* Implement causal masking for the self-attention mechanism.
* Use the [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) optimizer and the cross-entropy loss.
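
The input/target relationship and the sliding window can be sketched on a toy sequence as follows. The generator name `sliding_windows` is an illustrative assumption; in the project itself this logic would live in the dataset's `__getitem__`.

```python
def sliding_windows(ids, block_size):
    """Yield (input, target) pairs; each target is its input shifted by one."""
    for start in range(len(ids) - block_size):
        chunk = ids[start:start + block_size + 1]
        yield chunk[:-1], chunk[1:]


ids = [0, 1, 2, 3, 4, 5]
pairs = list(sliding_windows(ids, block_size=3))
# First pair: input [0, 1, 2] with target [1, 2, 3].
```

At every position the target holds the character that follows the corresponding input character, so a single window of length `block_size` provides `block_size` next-character predictions.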

**Optional**:

* Implement a learning rate decay strategy
* Implement gradient clipping
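
One possible shape for the two optional items is sketched below in plain Python: a linear warmup followed by cosine decay, and clipping by global L2 norm. All names and default values (`lr_at_step`, `clip_by_global_norm`, `max_lr`, `warmup_steps`, and so on) are assumptions chosen for illustration, not recommended settings. In PyTorch the clipping step is typically done with `torch.nn.utils.clip_grad_norm_`.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=6000):
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        return min_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grads]
```

Clipping rescales all gradients by the same factor, so the update direction is preserved and only its length is limited.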

---


#### **Evaluation and Inference**

* Monitor the cross-entropy loss. Use a seed string to initialize the model and generate Shakespearean-like text.

* In order to generate the characters, at each generation step you can either select the character with the highest probability, or you can sample according to the output distribution.

The high-level pseudocode for generation is:

```python
model.eval()
with torch.no_grad():
    context = "O God, O God!"
    tokenized_context = tokenize(context)
    # the model should implement a method to generate tokens given a prompt
    y = model.generate(tokenized_context, ...)
    completion = tokens_to_string(y)
```
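
The two decoding choices mentioned above, picking the most likely character or sampling from the output distribution, can be sketched for a single generation step as follows. The helper names `softmax` and `pick_next` and the `temperature` parameter are assumptions for illustration; a `generate` method would apply such a step repeatedly to the logits of the last position.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_next(logits, do_sample=False, temperature=1.0, rng=random):
    """Return the next token index, either greedily or by sampling."""
    probs = softmax(logits, temperature)
    if not do_sample:
        # Greedy decoding: index of the highest probability.
        return max(range(len(probs)), key=probs.__getitem__)
    # Stochastic decoding: draw one index according to the distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


greedy_choice = pick_next([0.1, 2.0, -1.0])
sampled_choice = pick_next([0.1, 2.0, -1.0], do_sample=True, rng=random.Random(0))
```

Greedy decoding is deterministic and tends to produce repetitive text, while sampling trades some coherence for variety.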

**Optional**:
* Compute the [perplexity](https://medium.com/@priyankads/perplexity-of-language-models-41160427ed72#:~:text=Intuitively%2C%20perplexity%20means%20to%20be,loss%20obtained%20from%20the%20model.) metric for quantitative evaluation.
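
As a minimal sketch, perplexity can be obtained by exponentiating the mean per-character cross-entropy, assuming the loss is measured in nats (natural logarithm), which is the default for PyTorch's `nn.CrossEntropyLoss`. The function names and the vocabulary size of 64 in the example are illustrative assumptions.

```python
import math


def mean_nll(target_probs):
    """Mean negative log-likelihood of the probabilities assigned to the true characters."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def perplexity(mean_cross_entropy):
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy)


# A model that guesses uniformly over 64 characters has perplexity 64.
uniform_ppl = perplexity(mean_nll([1 / 64] * 10))
```

A perplexity of k can be read as the model being, on average, as uncertain as a uniform choice among k characters.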

### **Example Outputs**

The following are my outputs after $6000$ steps of training, with the seed string "O God, O God!"



```
O God, O God! neither? unto the base very ears,
As damned with it.

DUKE OF YORK:
Away! Once more, one word.

RICHARD:
Clove, dear so; and therein my son will be
false of woe: if ye seems to be the mother
Of gracious order this time when R going kinsperse eyes,
What dost bewreck her fairer drying tears.

NORTHUMBERLAND:
Have you forgot the Duke of Norfolk, get him to
again; and and agilic: there is my spirit
So maly did must such a marble perfection.

ELBOW:
Come, bring them with oaths, and so deliver
```


### Resources:

* Vaswani et al., "Attention is All You Need": [link](https://arxiv.org/abs/1706.03762)

* Illustrated Transformer by Jay Alammar: [link](https://jalammar.github.io/illustrated-transformer/)

* OpenAI GPT-2 Paper: [link](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)

* Deep Learning Course slides on transformers: [link](https://fleuret.org/dlc/materials/dlc-handout-13-3-transformers.pdf)

In [None]:
import torch
import time
from tqdm import tqdm
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.optim as optim
import numpy as np
import math
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
print(torch.__version__)
if torch.cuda.is_available():
  print("GPU is available")
  print("CUDA version:", torch.version.cuda)
else:
  print("GPU is not available")

2.5.1+cu121
GPU is available
CUDA version: 12.1


In [None]:
class CharDataset(Dataset):
  def __init__(self, config, data):
    # create dictionary of chars
    chars = sorted(list(set(data)))
    self.stoi = {ch: i for i, ch in enumerate(chars)}
    self.itos = {i: ch for i, ch in enumerate(chars)}
    self.vocab_size = len(chars)
    # get data from the text
    self.data = [self.stoi[ch] for ch in data]
    self.block_size = config['block_size']

  def get_vocab_size(self):
    return self.vocab_size

  def __len__(self):
    return len(self.data) - self.block_size

  def __getitem__(self, idx):
    # grab a chunk of (block_size + 1) characters from the data
    chunk = self.data[idx:idx + self.block_size + 1]
    # encode every character to an integer(long)
    # return the chunk and the shifted version as tensors
    x = torch.tensor(chunk[:-1], dtype=torch.long)  # current sequence
    y = torch.tensor(chunk[1:], dtype=torch.long)  # target sequence
    return x, y


In [None]:
class MultiHeadAttention(nn.Module):
  def __init__(self):
    super(MultiHeadAttention, self).__init__()
    self.W_Q = nn.Linear(d_model, d_k * n_heads, bias = False)
    self.W_K = nn.Linear(d_model, d_k * n_heads, bias = False)
    self.W_V = nn.Linear(d_model, d_v * n_heads, bias = False)
    self.fc = nn.Linear(n_heads * d_v, d_model, bias = False)
    self.layer_norm = nn.LayerNorm(d_model)

  def forward(self, input_Q, input_K, input_V, attn_mask):
    residual, batch_size = input_Q, input_Q.size(0)
    Q = self.W_Q(input_Q).view(batch_size, -1, n_heads, d_k).transpose(1, 2)
    K = self.W_K(input_K).view(batch_size, -1, n_heads, d_k).transpose(1, 2)
    V = self.W_V(input_V).view(batch_size, -1, n_heads, d_v).transpose(1, 2)
    attn_mask = attn_mask.unsqueeze(1).repeat(1, n_heads, 1, 1)
    context, attn = ScaledDotProductAttention()(Q, K, V, attn_mask)
    context = context.transpose(1, 2).reshape(batch_size, -1, n_heads * d_v)
    output = self.fc(context)
    return self.layer_norm(output + residual), attn

class PoswiseFeedForwardNet(nn.Module):
  def __init__(self):
    super(PoswiseFeedForwardNet, self).__init__()
    self.fc = nn.Sequential(
      nn.Linear(d_model, d_ff, bias=False),
      nn.ReLU(),
      nn.Linear(d_ff, d_model, bias=False)
    )
    self.layer_norm = nn.LayerNorm(d_model)

  def forward(self, inputs):
    residual = inputs # inputs : [batch_size, len_q, d_model]
    output = self.fc(inputs)
    return self.layer_norm(output + residual)

def get_attn_pad_mask(seq_q, seq_k):
  batch_size, len_q = seq_q.size()
  batch_size, len_k = seq_k.size()
  pad_attn_mask = seq_k.data.eq(0).unsqueeze(1)  # batch_size x 1 x len_k, one is masking
  return pad_attn_mask.expand(batch_size, len_q, len_k)  # batch_size x len_q x len_k

def get_attn_subsequent_mask(seq):
  """
  seq: [batch_size, tgt_len]
  """
  attn_shape = [seq.size(0), seq.size(1), seq.size(1)]
  # attn_shape: [batch_size, tgt_len, tgt_len]
  subsequence_mask = np.triu(np.ones(attn_shape), k=1)
  subsequence_mask = torch.from_numpy(subsequence_mask).byte()
  subsequence_mask = subsequence_mask.to(device)
  return subsequence_mask  # [batch_size, tgt_len, tgt_len]


class ScaledDotProductAttention(nn.Module):
  def __init__(self):
    super(ScaledDotProductAttention, self).__init__()

  def forward(self, Q, K, V, attn_mask):
    scores = torch.matmul(Q, K.transpose(-1, -2)) / math.sqrt(Q.size(-1))
    scores.masked_fill_(attn_mask, -1e9)
    attn = nn.Softmax(dim = -1)(scores)
    context = torch.matmul(attn, V)
    return context, attn


class PositionalEncoding(nn.Module):
  def __init__(self, d_model, dropout=0.1, max_len=5000):
    super(PositionalEncoding, self).__init__()
    self.dropout = nn.Dropout(p=dropout)

    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    # pe:[max_len*d_model]

    pe = pe.unsqueeze(0).transpose(0, 1)
    # pe: [max_len, 1, d_model]
    self.register_buffer('pe', pe)

  def forward(self, x):
    x = x + self.pe[:x.size(0), :] # x: [seq_len, batch_size, d_model]
    return self.dropout(x)

class DecoderLayer(nn.Module):
  def __init__(self):
    super(DecoderLayer, self).__init__()
    self.dec_self_attn = MultiHeadAttention()
    self.pos_ffn = PoswiseFeedForwardNet()

  def forward(self, dec_inputs, dec_self_attn_mask):
    dec_outputs, dec_self_attn = self.dec_self_attn(dec_inputs, dec_inputs, dec_inputs, attn_mask=dec_self_attn_mask)
    dec_outputs = self.pos_ffn(dec_outputs)
    return dec_outputs, dec_self_attn

class Decoder(nn.Module):
  def __init__(self):
    super(Decoder, self).__init__()
    self.tgt_emb = nn.Embedding(vocab_size, d_model)
    self.pos_emb = PositionalEncoding(d_model)
    self.layers = nn.ModuleList([DecoderLayer() for _ in range(n_layers)])
    self.final_layer_norm = nn.LayerNorm(d_model)

  def forward(self, dec_inputs): # dec_inputs : [batch_size x target_len]
    dec_outputs = self.tgt_emb(dec_inputs)  #dec_outputs  [batch_size, tgt_len, d_model]
    dec_outputs = self.pos_emb(dec_outputs.transpose(0, 1)).transpose(0, 1) # [batch_size, tgt_len, d_model]

    # Character-level data has no padding token (index 0 is a real character),
    # so only the causal (subsequent-position) mask is applied here.
    dec_self_attn_mask = get_attn_subsequent_mask(dec_inputs).bool()

    dec_self_attns = []
    for layer in self.layers:
      dec_outputs, dec_self_attn = layer(dec_outputs, dec_self_attn_mask)
      dec_self_attns.append(dec_self_attn)
    dec_outputs = self.final_layer_norm(dec_outputs)
    return dec_outputs, dec_self_attns

class GPT(nn.Module):
  def __init__(self):
    super(GPT, self).__init__()
    self.decoder = Decoder()
    self.projection = nn.Linear(d_model, vocab_size, bias=False)
    # Initialize weights
    self.apply(self._init_weights)
  def _init_weights(self, module):
    if isinstance(module, nn.Linear):
      nn.init.normal_(module.weight, mean=0, std=0.02)
      if module.bias is not None:
        nn.init.zeros_(module.bias)
    elif isinstance(module, nn.Embedding):
      nn.init.normal_(module.weight, mean=0, std=0.02)
  def forward(self, dec_inputs):
    dec_outputs, dec_self_attns = self.decoder(dec_inputs) # [batch_size, tgt_len, d_model]
    dec_logits = self.projection(dec_outputs)  # [batch_size, tgt_len, vocab_size]
    return dec_logits.view(-1, dec_logits.size(-1)), dec_self_attns

In [None]:
def check_labels(dec_outputs):
  if (dec_outputs < 0).any() or (dec_outputs >= vocab_size).any():
    print(f"Invalid target label values detected! Min: {dec_outputs.min()}, Max: {dec_outputs.max()}")
    # Print part of the label data for easy debugging
    print(f"Some labeled data: {dec_outputs[:10]}")
    raise ValueError(f"Label out of bounds! The label maximum is {dec_outputs.max()}, and the vocab_size is {vocab_size}.")

def train_step(model,data_loader,optimizer,criterion,clip=1,print_every=None):
  model.train()
  if print_every == 0:
    print_every = 1
  print_loss_total = 0
  epoch_loss = 0
  for i, (dec_inputs, dec_outputs) in enumerate(tqdm(data_loader)):
    optimizer.zero_grad()
    dec_inputs, dec_outputs =dec_inputs.to(device), dec_outputs.to(device)
    print(f"dec_inputs shape: {dec_inputs.shape}")
    print(f"dec_outputs shape: {dec_outputs.shape}")
    # Check whether the target label is out of bounds
    check_labels(dec_outputs)
    # outputs: [batch_size * tgt_len, tgt_vocab_size]
    outputs, dec_self_attns = model(dec_inputs)
    print(f"model output shape: {outputs.shape}")
    loss = criterion(outputs, dec_outputs.view(-1))
    print_loss_total += loss.item()
    epoch_loss += loss.item()
    loss.backward()
    # gradient clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    optimizer.step()

    if print_every and (i + 1) % print_every == 0:
      print_loss_avg = print_loss_total / print_every
      print_loss_total = 0
      print('\tCurrent Loss: %.4f' % print_loss_avg)
  return epoch_loss / len(data_loader)

def train(model, train_loader, epochs):
  # No padding token exists in this dataset, so no class index is ignored.
  criterion = nn.CrossEntropyLoss().to(device)
  optimizer = optim.Adam(model.parameters(), lr=1e-5)
  for epoch in range(epochs):
    start_time = time.time()
    train_loss = train_step(model, train_loader, optimizer, criterion, CLIP, print_every=10)
    end_time = time.time()
    checkpoint_path = f'checkpoints_epoch_{epoch}.pt'
    torch.save(model.state_dict(), checkpoint_path)
    print(f"Epoch {epoch}, Train Loss: {train_loss:.4f}, Time: {end_time - start_time:.2f}s")

In [None]:
def load_data(file_path, config, batch_size=256):
  with open(file_path, 'r', encoding='utf-8') as f:
    text_data = f.read()
  # Split data into train (60%), validation (20%), and test (20%)
  train_split = int(len(text_data) * 0.6)
  val_split = int(len(text_data) * 0.8)
  train_data = text_data[:train_split]
  val_data = text_data[train_split:val_split]
  test_data = text_data[val_split:]
  train_dataset = CharDataset(config, train_data)
  stoi = train_dataset.stoi
  vocab_size = train_dataset.get_vocab_size()
  val_dataset = CharDataset(config, val_data)
  test_dataset = CharDataset(config, test_data)
  # Create DataLoaders
  train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
  val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, drop_last=True)
  test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, drop_last=True)
  print(f"Vocabulary size: {train_dataset.get_vocab_size()}")
  print(f"Train data length: {len(train_data)}")
  print(f"Validation data length: {len(val_data)}")
  return train_dataloader, val_dataloader, test_dataloader
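
Because `CharDataset` derives its character table from whatever text it receives, the three splits above can end up with different index mappings when a character is missing from one of them. One possible alternative, sketched here with a hypothetical helper `split_and_encode`, builds a single table from the full text and only then splits the encoded ids. The split fractions mirror the 60/20/20 split used above.

```python
def split_and_encode(text, train_frac=0.6, val_frac=0.2):
    """Encode the full text with one shared vocabulary, then split the ids."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    ids = [stoi[ch] for ch in text]
    train_end = int(len(ids) * train_frac)
    val_end = train_end + int(len(ids) * val_frac)
    # The remainder after the validation slice becomes the test split.
    return ids[:train_end], ids[train_end:val_end], ids[val_end:], stoi


# 'z' appears only at the very end, so it falls in the test split,
# yet it still receives a valid index from the shared table.
train_ids, val_ids, test_ids, stoi = split_and_encode("abcabcabcz")
```

With a shared table, index `i` means the same character in every split, so validation and test losses are comparable with the training loss.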

In [None]:
if __name__ == '__main__':
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
  # Configurations
  config = {'block_size': 128}
  # Load data
  train_dataloader, val_dataloader, test_dataloader = load_data('input.txt', config)
  # Model parameters
  d_model = 768  # Embedding Size
  d_ff = 2048  # FeedForward dimension
  n_layers = 12  # Number of Decoder Layers
  n_heads = 8  # Number of heads in Multi-Head Attention
  CLIP = 1
  d_k = d_v = 64
  epochs = 5
  vocab_size = train_dataloader.dataset.vocab_size
  # Initialize model
  model = GPT().to(device)
  # Train model (train() does not run validation; val_dataloader is unused here)
  train(model, train_dataloader, epochs)

Vocabulary size: 64
Train data length: 669236
Validation data length: 223079


  0%|          | 0/2613 [00:00<?, ?it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])

  0%|          | 10/2613 [00:08<33:04,  1.31it/s]
	Current Loss: 3.7241
  1%|          | 20/2613 [00:16<32:28,  1.33it/s]
	Current Loss: 3.3132
  1%|          | 30/2613 [00:23<32:21,  1.33it/s]
	Current Loss: 3.2863
  2%|▏         | 40/2613 [00:31<32:13,  1.33it/s]
	Current Loss: 3.2864
  2%|▏         | 50/2613 [00:38<32:05,  1.33it/s]
	Current Loss: 3.2726
  2%|▏         | 60/2613 [00:46<31:58,  1.33it/s]
	Current Loss: 3.2826
  3%|▎         | 70/2613 [00:53<31:51,  1.33it/s]
	Current Loss: 3.2733
  3%|▎         | 80/2613 [01:01<31:43,  1.33it/s]
	Current Loss: 3.2844
  3%|▎         | 90/2613 [01:08<31:36,  1.33it/s]
	Current Loss: 3.2782
  4%|▎         | 94/2613 [01:11<31:33,  1.33it/s]


  4%|▎         | 95/2613 [01:12<31:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 96/2613 [01:13<31:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 97/2613 [01:13<31:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 98/2613 [01:14<31:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 99/2613 [01:15<31:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 100/2613 [01:16<31:28,  1.33it/s]

	Current Loss: 3.2707
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 101/2613 [01:16<31:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 102/2613 [01:17<31:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 103/2613 [01:18<31:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 104/2613 [01:19<31:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 105/2613 [01:19<31:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 106/2613 [01:20<31:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 107/2613 [01:21<31:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 108/2613 [01:22<31:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 109/2613 [01:23<31:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 110/2613 [01:23<31:20,  1.33it/s]

	Current Loss: 3.2678
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 111/2613 [01:24<31:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 112/2613 [01:25<31:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 113/2613 [01:26<31:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 114/2613 [01:26<31:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 115/2613 [01:27<31:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 116/2613 [01:28<31:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 117/2613 [01:29<31:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 118/2613 [01:29<31:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 119/2613 [01:30<31:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 120/2613 [01:31<31:13,  1.33it/s]

	Current Loss: 3.2683
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 121/2613 [01:32<31:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 122/2613 [01:32<31:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 123/2613 [01:33<31:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 124/2613 [01:34<31:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 125/2613 [01:35<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 126/2613 [01:35<31:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 127/2613 [01:36<31:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 128/2613 [01:37<31:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 129/2613 [01:38<31:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 130/2613 [01:38<31:06,  1.33it/s]

	Current Loss: 3.2685
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 131/2613 [01:39<31:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 132/2613 [01:40<31:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 133/2613 [01:41<31:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 134/2613 [01:41<31:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 135/2613 [01:42<31:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 136/2613 [01:43<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 137/2613 [01:44<30:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 138/2613 [01:44<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 139/2613 [01:45<30:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 140/2613 [01:46<30:58,  1.33it/s]

	Current Loss: 3.2705
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 141/2613 [01:47<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 142/2613 [01:47<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 143/2613 [01:48<30:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 144/2613 [01:49<30:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 145/2613 [01:50<30:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 146/2613 [01:50<30:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 147/2613 [01:51<30:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 148/2613 [01:52<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 149/2613 [01:53<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 150/2613 [01:53<30:50,  1.33it/s]

	Current Loss: 3.2564
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 151/2613 [01:54<30:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 152/2613 [01:55<30:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 153/2613 [01:56<30:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 154/2613 [01:56<30:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 155/2613 [01:57<30:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 156/2613 [01:58<30:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 157/2613 [01:59<30:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 158/2613 [01:59<30:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 159/2613 [02:00<30:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 160/2613 [02:01<30:42,  1.33it/s]

	Current Loss: 3.2483
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 161/2613 [02:02<30:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 162/2613 [02:02<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 163/2613 [02:03<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 164/2613 [02:04<30:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 165/2613 [02:05<30:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 166/2613 [02:05<30:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 167/2613 [02:06<30:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 168/2613 [02:07<30:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 169/2613 [02:08<30:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 170/2613 [02:08<30:35,  1.33it/s]

	Current Loss: 3.2500
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 171/2613 [02:09<30:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 172/2613 [02:10<30:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 173/2613 [02:11<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 174/2613 [02:11<30:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 175/2613 [02:12<30:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 176/2613 [02:13<30:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 177/2613 [02:14<30:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 178/2613 [02:14<30:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 179/2613 [02:15<30:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 180/2613 [02:16<30:27,  1.33it/s]

	Current Loss: 3.2455
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 181/2613 [02:17<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 182/2613 [02:17<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 183/2613 [02:18<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 184/2613 [02:19<30:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 185/2613 [02:20<30:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 186/2613 [02:20<30:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 187/2613 [02:21<30:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 188/2613 [02:22<30:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 189/2613 [02:23<30:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 190/2613 [02:23<30:21,  1.33it/s]

	Current Loss: 3.2280
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 191/2613 [02:24<30:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 192/2613 [02:25<30:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 193/2613 [02:26<30:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 194/2613 [02:26<30:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 195/2613 [02:27<30:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 196/2613 [02:28<30:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 197/2613 [02:29<30:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 198/2613 [02:29<30:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 199/2613 [02:30<30:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 200/2613 [02:31<30:13,  1.33it/s]

	Current Loss: 3.2230
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 201/2613 [02:32<30:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 202/2613 [02:32<30:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 203/2613 [02:33<30:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 204/2613 [02:34<30:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 205/2613 [02:35<30:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 206/2613 [02:35<30:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 207/2613 [02:36<30:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 208/2613 [02:37<30:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 209/2613 [02:38<30:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 210/2613 [02:38<30:04,  1.33it/s]

	Current Loss: 3.2099
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 211/2613 [02:39<30:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 212/2613 [02:40<30:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 213/2613 [02:41<30:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 214/2613 [02:41<30:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 215/2613 [02:42<29:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 216/2613 [02:43<29:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 217/2613 [02:44<29:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 218/2613 [02:44<29:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 219/2613 [02:45<29:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 220/2613 [02:46<29:58,  1.33it/s]

	Current Loss: 3.2110
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 221/2613 [02:47<29:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 222/2613 [02:47<29:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 223/2613 [02:48<29:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 224/2613 [02:49<29:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 225/2613 [02:50<29:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 226/2613 [02:50<29:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 227/2613 [02:51<29:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 228/2613 [02:52<29:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 229/2613 [02:53<29:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 230/2613 [02:53<29:49,  1.33it/s]

	Current Loss: 3.1996
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 231/2613 [02:54<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 232/2613 [02:55<29:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 233/2613 [02:56<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 234/2613 [02:56<29:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 235/2613 [02:57<29:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 236/2613 [02:58<29:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 237/2613 [02:59<29:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 238/2613 [02:59<29:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 239/2613 [03:00<29:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 240/2613 [03:01<29:42,  1.33it/s]

	Current Loss: 3.1955
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 241/2613 [03:02<29:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 242/2613 [03:02<29:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 243/2613 [03:03<29:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 244/2613 [03:04<29:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 245/2613 [03:05<29:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 246/2613 [03:05<29:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 247/2613 [03:06<29:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 248/2613 [03:07<29:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 249/2613 [03:08<29:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 250/2613 [03:08<29:34,  1.33it/s]

	Current Loss: 3.1850

 10%|▉         | 260/2613 [03:16<29:28,  1.33it/s]

	Current Loss: 3.1821

 10%|█         | 270/2613 [03:23<29:19,  1.33it/s]

	Current Loss: 3.1683

 11%|█         | 280/2613 [03:31<29:12,  1.33it/s]

	Current Loss: 3.1659

 11%|█         | 290/2613 [03:38<29:04,  1.33it/s]

	Current Loss: 3.1602

 11%|█▏        | 300/2613 [03:46<28:57,  1.33it/s]

	Current Loss: 3.1531

 12%|█▏        | 310/2613 [03:53<28:49,  1.33it/s]

	Current Loss: 3.1450

 12%|█▏        | 320/2613 [04:01<28:42,  1.33it/s]

	Current Loss: 3.1319

 13%|█▎        | 330/2613 [04:09<28:34,  1.33it/s]

	Current Loss: 3.1271

 13%|█▎        | 340/2613 [04:16<28:26,  1.33it/s]

	Current Loss: 3.1206

 13%|█▎        | 350/2613 [04:24<28:18,  1.33it/s]

	Current Loss: 3.1238

 14%|█▍        | 360/2613 [04:31<28:12,  1.33it/s]

	Current Loss: 3.1046

 14%|█▍        | 370/2613 [04:39<28:04,  1.33it/s]

	Current Loss: 3.1021

 15%|█▍        | 380/2613 [04:46<27:56,  1.33it/s]

	Current Loss: 3.0852

 15%|█▍        | 390/2613 [04:54<27:50,  1.33it/s]

	Current Loss: 3.0915

 15%|█▌        | 400/2613 [05:01<27:41,  1.33it/s]

	Current Loss: 3.0744

 16%|█▌        | 410/2613 [05:09<27:34,  1.33it/s]

	Current Loss: 3.0695

 16%|█▌        | 420/2613 [05:16<27:27,  1.33it/s]

	Current Loss: 3.0652

 16%|█▋        | 430/2613 [05:24<27:20,  1.33it/s]

	Current Loss: 3.0553

 17%|█▋        | 440/2613 [05:31<27:15,  1.33it/s]

	Current Loss: 3.0592

 17%|█▋        | 450/2613 [05:39<27:05,  1.33it/s]

	Current Loss: 3.0477

 18%|█▊        | 460/2613 [05:46<26:57,  1.33it/s]

	Current Loss: 3.0490

 18%|█▊        | 470/2613 [05:54<26:50,  1.33it/s]

	Current Loss: 3.0340

 18%|█▊        | 480/2613 [06:01<26:43,  1.33it/s]

	Current Loss: 3.0296

 19%|█▉        | 490/2613 [06:09<26:34,  1.33it/s]

	Current Loss: 3.0272

 19%|█▉        | 500/2613 [06:16<26:26,  1.33it/s]

	Current Loss: 3.0223

 20%|█▉        | 510/2613 [06:24<26:19,  1.33it/s]

	Current Loss: 3.0086

 20%|█▉        | 520/2613 [06:31<26:13,  1.33it/s]

	Current Loss: 3.0084

 20%|██        | 530/2613 [06:39<26:05,  1.33it/s]

	Current Loss: 3.0024

 21%|██        | 540/2613 [06:46<25:57,  1.33it/s]

	Current Loss: 2.9986

 21%|██        | 550/2613 [06:54<25:49,  1.33it/s]

	Current Loss: 2.9863

 21%|██▏       | 560/2613 [07:01<25:42,  1.33it/s]

	Current Loss: 2.9818

 22%|██▏       | 570/2613 [07:09<25:35,  1.33it/s]

	Current Loss: 2.9850

 22%|██▏       | 580/2613 [07:16<25:29,  1.33it/s]

	Current Loss: 2.9771

 23%|██▎       | 590/2613 [07:24<25:20,  1.33it/s]

	Current Loss: 2.9628

 23%|██▎       | 600/2613 [07:31<25:11,  1.33it/s]

	Current Loss: 2.9609

 23%|██▎       | 610/2613 [07:39<25:04,  1.33it/s]

	Current Loss: 2.9497

 24%|██▎       | 620/2613 [07:46<24:57,  1.33it/s]

	Current Loss: 2.9496

 24%|██▍       | 630/2613 [07:54<24:49,  1.33it/s]

	Current Loss: 2.9353

 24%|██▍       | 640/2613 [08:01<24:41,  1.33it/s]

	Current Loss: 2.9305

 25%|██▍       | 650/2613 [08:09<24:34,  1.33it/s]

	Current Loss: 2.9331

 25%|██▌       | 660/2613 [08:16<24:28,  1.33it/s]

	Current Loss: 2.9243

 26%|██▌       | 670/2613 [08:24<24:20,  1.33it/s]

	Current Loss: 2.9191

 26%|██▌       | 680/2613 [08:32<24:12,  1.33it/s]

	Current Loss: 2.9078

 26%|██▋       | 690/2613 [08:39<24:04,  1.33it/s]

	Current Loss: 2.9017

 27%|██▋       | 700/2613 [08:47<23:56,  1.33it/s]

	Current Loss: 2.8966

 27%|██▋       | 710/2613 [08:54<23:50,  1.33it/s]

	Current Loss: 2.8912

 28%|██▊       | 720/2613 [09:02<23:42,  1.33it/s]

	Current Loss: 2.8865

 28%|██▊       | 730/2613 [09:09<23:34,  1.33it/s]

	Current Loss: 2.8704

 28%|██▊       | 740/2613 [09:17<23:27,  1.33it/s]

	Current Loss: 2.8703

 29%|██▊       | 750/2613 [09:24<23:19,  1.33it/s]

	Current Loss: 2.8693

 29%|██▉       | 760/2613 [09:32<23:22,  1.32it/s]

	Current Loss: 2.8613

 29%|██▉       | 770/2613 [09:39<23:04,  1.33it/s]

	Current Loss: 2.8554

 30%|██▉       | 780/2613 [09:47<22:57,  1.33it/s]

	Current Loss: 2.8499
 30%|███       | 790/2613 [09:54<22:50,  1.33it/s]

	Current Loss: 2.8429
 31%|███       | 800/2613 [10:02<22:42,  1.33it/s]

	Current Loss: 2.8350
 31%|███       | 810/2613 [10:09<22:35,  1.33it/s]

	Current Loss: 2.8326
 31%|███▏      | 820/2613 [10:17<22:27,  1.33it/s]

	Current Loss: 2.8273
 32%|███▏      | 830/2613 [10:24<22:19,  1.33it/s]

	Current Loss: 2.8293
 32%|███▏      | 840/2613 [10:32<22:14,  1.33it/s]

	Current Loss: 2.8274
 33%|███▎      | 850/2613 [10:39<22:04,  1.33it/s]

	Current Loss: 2.8216
 33%|███▎      | 860/2613 [10:47<21:57,  1.33it/s]

	Current Loss: 2.8085
 33%|███▎      | 870/2613 [10:54<21:50,  1.33it/s]

	Current Loss: 2.8057
 34%|███▎      | 880/2613 [11:02<21:42,  1.33it/s]

	Current Loss: 2.8031
 34%|███▍      | 890/2613 [11:09<21:34,  1.33it/s]

	Current Loss: 2.8000
 34%|███▍      | 900/2613 [11:17<21:26,  1.33it/s]

	Current Loss: 2.7959
 35%|███▍      | 910/2613 [11:24<21:19,  1.33it/s]

	Current Loss: 2.7931
 35%|███▌      | 920/2613 [11:32<21:11,  1.33it/s]

	Current Loss: 2.7878
 36%|███▌      | 930/2613 [11:39<21:03,  1.33it/s]

	Current Loss: 2.7876
 36%|███▌      | 940/2613 [11:47<20:56,  1.33it/s]

	Current Loss: 2.7793
 36%|███▋      | 950/2613 [11:54<20:48,  1.33it/s]

	Current Loss: 2.7749
 37%|███▋      | 960/2613 [12:02<20:41,  1.33it/s]

	Current Loss: 2.7707


 37%|███▋      | 970/2613 [12:09<20:34,  1.33it/s]

	Current Loss: 2.7655


 38%|███▊      | 980/2613 [12:17<20:26,  1.33it/s]

	Current Loss: 2.7666


 38%|███▊      | 990/2613 [12:24<20:19,  1.33it/s]

	Current Loss: 2.7550


 38%|███▊      | 1000/2613 [12:32<20:12,  1.33it/s]

	Current Loss: 2.7585


 39%|███▊      | 1010/2613 [12:39<20:07,  1.33it/s]

	Current Loss: 2.7507


 39%|███▉      | 1020/2613 [12:47<19:56,  1.33it/s]

	Current Loss: 2.7447


 39%|███▉      | 1030/2613 [12:55<19:50,  1.33it/s]

	Current Loss: 2.7458


 40%|███▉      | 1040/2613 [13:02<19:42,  1.33it/s]

	Current Loss: 2.7369


 40%|████      | 1050/2613 [13:10<19:34,  1.33it/s]

	Current Loss: 2.7356


 41%|████      | 1060/2613 [13:17<19:26,  1.33it/s]

	Current Loss: 2.7331


 41%|████      | 1070/2613 [13:25<19:19,  1.33it/s]

	Current Loss: 2.7318


 41%|████▏     | 1080/2613 [13:32<19:11,  1.33it/s]

	Current Loss: 2.7241


 42%|████▏     | 1090/2613 [13:40<19:04,  1.33it/s]

	Current Loss: 2.7231


 42%|████▏     | 1100/2613 [13:47<18:56,  1.33it/s]

	Current Loss: 2.7163


 42%|████▏     | 1110/2613 [13:55<18:49,  1.33it/s]

	Current Loss: 2.7132


 43%|████▎     | 1120/2613 [14:02<18:41,  1.33it/s]

	Current Loss: 2.7110


 43%|████▎     | 1130/2613 [14:10<18:34,  1.33it/s]

	Current Loss: 2.7110
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1140/2613 [14:17<18:26,  1.33it/s]

	Current Loss: 2.6994
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1150/2613 [14:25<18:19,  1.33it/s]

	Current Loss: 2.6986
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1160/2613 [14:32<18:11,  1.33it/s]

	Current Loss: 2.6914
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1170/2613 [14:40<18:04,  1.33it/s]

	Current Loss: 2.6912
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1180/2613 [14:47<17:56,  1.33it/s]

	Current Loss: 2.6901
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1190/2613 [14:55<17:49,  1.33it/s]

	Current Loss: 2.6943
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1200/2613 [15:02<17:40,  1.33it/s]

	Current Loss: 2.6814
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1210/2613 [15:10<17:33,  1.33it/s]

	Current Loss: 2.6831
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1220/2613 [15:17<17:26,  1.33it/s]

	Current Loss: 2.6728
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1230/2613 [15:25<17:21,  1.33it/s]

	Current Loss: 2.6661
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1240/2613 [15:32<17:12,  1.33it/s]

	Current Loss: 2.6718
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1250/2613 [15:40<17:03,  1.33it/s]

	Current Loss: 2.6699
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1260/2613 [15:47<16:56,  1.33it/s]

	Current Loss: 2.6892
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1270/2613 [15:55<16:50,  1.33it/s]

	Current Loss: 2.6851
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1280/2613 [16:02<16:41,  1.33it/s]

	Current Loss: 2.6680
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1290/2613 [16:10<16:34,  1.33it/s]

	Current Loss: 2.6732
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 50%|████▉     | 1300/2613 [16:17<16:27,  1.33it/s]

	Current Loss: 2.6689

 50%|█████     | 1310/2613 [16:25<16:18,  1.33it/s]

	Current Loss: 2.6587

 51%|█████     | 1320/2613 [16:32<16:11,  1.33it/s]

	Current Loss: 2.6570

 51%|█████     | 1330/2613 [16:40<16:04,  1.33it/s]

	Current Loss: 2.6688

 51%|█████▏    | 1340/2613 [16:48<15:57,  1.33it/s]

	Current Loss: 2.6529

 52%|█████▏    | 1350/2613 [16:55<15:49,  1.33it/s]

	Current Loss: 2.6492

 52%|█████▏    | 1360/2613 [17:03<15:41,  1.33it/s]

	Current Loss: 2.6464

 52%|█████▏    | 1370/2613 [17:10<15:34,  1.33it/s]

	Current Loss: 2.6635

 53%|█████▎    | 1380/2613 [17:18<15:26,  1.33it/s]

	Current Loss: 2.6493

 53%|█████▎    | 1390/2613 [17:25<15:19,  1.33it/s]

	Current Loss: 2.6393

 54%|█████▎    | 1400/2613 [17:33<15:12,  1.33it/s]

	Current Loss: 2.6298

 54%|█████▍    | 1410/2613 [17:40<15:04,  1.33it/s]

	Current Loss: 2.6342

 54%|█████▍    | 1420/2613 [17:48<14:56,  1.33it/s]

	Current Loss: 2.6339

 55%|█████▍    | 1430/2613 [17:55<14:49,  1.33it/s]

	Current Loss: 2.6233

 55%|█████▌    | 1440/2613 [18:03<14:41,  1.33it/s]

	Current Loss: 2.6180

 55%|█████▌    | 1450/2613 [18:10<14:34,  1.33it/s]

	Current Loss: 2.6246

 56%|█████▌    | 1460/2613 [18:18<14:26,  1.33it/s]

	Current Loss: 2.6164

 56%|█████▋    | 1470/2613 [18:25<14:18,  1.33it/s]

	Current Loss: 2.6099

 57%|█████▋    | 1480/2613 [18:33<14:12,  1.33it/s]

	Current Loss: 2.6026

 57%|█████▋    | 1490/2613 [18:40<14:03,  1.33it/s]

	Current Loss: 2.6009

 57%|█████▋    | 1500/2613 [18:48<13:56,  1.33it/s]

	Current Loss: 2.5962

 58%|█████▊    | 1510/2613 [18:55<13:48,  1.33it/s]

	Current Loss: 2.5995

 58%|█████▊    | 1520/2613 [19:03<13:41,  1.33it/s]

	Current Loss: 2.5911

 59%|█████▊    | 1530/2613 [19:10<13:34,  1.33it/s]

	Current Loss: 2.5927

 59%|█████▉    | 1540/2613 [19:18<13:26,  1.33it/s]

	Current Loss: 2.5883

 59%|█████▉    | 1550/2613 [19:25<13:18,  1.33it/s]

	Current Loss: 2.5878

 60%|█████▉    | 1560/2613 [19:33<13:11,  1.33it/s]

	Current Loss: 2.5816

 60%|██████    | 1570/2613 [19:40<13:03,  1.33it/s]

	Current Loss: 2.5720

 60%|██████    | 1580/2613 [19:48<12:56,  1.33it/s]

	Current Loss: 2.5715

 61%|██████    | 1590/2613 [19:55<12:48,  1.33it/s]

	Current Loss: 2.5804

 61%|██████    | 1600/2613 [20:03<12:41,  1.33it/s]

	Current Loss: 2.5702

 62%|██████▏   | 1610/2613 [20:10<12:33,  1.33it/s]

	Current Loss: 2.5689

 62%|██████▏   | 1620/2613 [20:18<12:26,  1.33it/s]

	Current Loss: 2.5743

 62%|██████▏   | 1630/2613 [20:25<12:18,  1.33it/s]

	Current Loss: 2.5723

 63%|██████▎   | 1640/2613 [20:33<12:11,  1.33it/s]

	Current Loss: 2.5727

 63%|██████▎   | 1650/2613 [20:40<12:03,  1.33it/s]

	Current Loss: 2.5632

 64%|██████▎   | 1660/2613 [20:48<12:00,  1.32it/s]

	Current Loss: 2.5621

 64%|██████▍   | 1670/2613 [20:56<11:49,  1.33it/s]

	Current Loss: 2.5544

 64%|██████▍   | 1680/2613 [21:03<11:40,  1.33it/s]

	Current Loss: 2.5582

 65%|██████▍   | 1690/2613 [21:11<11:33,  1.33it/s]

	Current Loss: 2.5555

 65%|██████▌   | 1700/2613 [21:18<11:25,  1.33it/s]

	Current Loss: 2.5484

 65%|██████▌   | 1710/2613 [21:26<11:18,  1.33it/s]

	Current Loss: 2.5492

 66%|██████▌   | 1720/2613 [21:33<11:11,  1.33it/s]

	Current Loss: 2.5381

 66%|██████▌   | 1730/2613 [21:41<11:03,  1.33it/s]

	Current Loss: 2.5448

 67%|██████▋   | 1740/2613 [21:48<10:55,  1.33it/s]

	Current Loss: 2.5357

 67%|██████▋   | 1750/2613 [21:56<10:48,  1.33it/s]

	Current Loss: 2.5358

 67%|██████▋   | 1760/2613 [22:03<10:40,  1.33it/s]

	Current Loss: 2.5340

 68%|██████▊   | 1770/2613 [22:11<10:33,  1.33it/s]

	Current Loss: 2.5297

 68%|██████▊   | 1780/2613 [22:18<10:25,  1.33it/s]

	Current Loss: 2.5314

 69%|██████▊   | 1790/2613 [22:26<10:18,  1.33it/s]

	Current Loss: 2.5175

 69%|██████▉   | 1800/2613 [22:33<10:10,  1.33it/s]

	Current Loss: 2.5286

 69%|██████▉   | 1810/2613 [22:41<10:03,  1.33it/s]

	Current Loss: 2.5307

 70%|██████▉   | 1820/2613 [22:48<09:56,  1.33it/s]

	Current Loss: 2.5212
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1821/2613 [22:49<09:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1822/2613 [22:50<09:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1823/2613 [22:50<09:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1824/2613 [22:51<09:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1825/2613 [22:52<09:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1826/2613 [22:53<09:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1827/2613 [22:54<09:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1828/2613 [22:54<09:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1829/2613 [22:55<09:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1830/2613 [22:56<09:48,  1.33it/s]

	Current Loss: 2.5189
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1831/2613 [22:57<09:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1832/2613 [22:57<09:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1833/2613 [22:58<09:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1834/2613 [22:59<09:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1835/2613 [23:00<09:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1836/2613 [23:00<09:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1837/2613 [23:01<09:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1838/2613 [23:02<09:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1839/2613 [23:03<09:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1840/2613 [23:03<09:40,  1.33it/s]

	Current Loss: 2.5099
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1841/2613 [23:04<09:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1842/2613 [23:05<09:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1843/2613 [23:06<09:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1844/2613 [23:06<09:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1845/2613 [23:07<09:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1846/2613 [23:08<09:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1847/2613 [23:09<09:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1848/2613 [23:09<09:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1849/2613 [23:10<09:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1850/2613 [23:11<09:33,  1.33it/s]

	Current Loss: 2.5097
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1851/2613 [23:12<09:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1852/2613 [23:12<09:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1853/2613 [23:13<09:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1854/2613 [23:14<09:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1855/2613 [23:15<09:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1856/2613 [23:15<09:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1857/2613 [23:16<09:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1858/2613 [23:17<09:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1859/2613 [23:18<09:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1860/2613 [23:18<09:25,  1.33it/s]

	Current Loss: 2.5119
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1861/2613 [23:19<09:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1862/2613 [23:20<09:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1863/2613 [23:21<09:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1864/2613 [23:21<09:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1865/2613 [23:22<09:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1866/2613 [23:23<09:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1867/2613 [23:24<09:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1868/2613 [23:24<09:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1869/2613 [23:25<09:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1870/2613 [23:26<09:18,  1.33it/s]

	Current Loss: 2.5122
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1871/2613 [23:27<09:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1872/2613 [23:27<09:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1873/2613 [23:28<09:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1874/2613 [23:29<09:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1875/2613 [23:30<09:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1876/2613 [23:30<09:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1877/2613 [23:31<09:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1878/2613 [23:32<09:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1879/2613 [23:33<09:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1880/2613 [23:33<09:10,  1.33it/s]

	Current Loss: 2.5081
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1881/2613 [23:34<09:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1882/2613 [23:35<09:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1883/2613 [23:36<09:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1884/2613 [23:36<09:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1885/2613 [23:37<09:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1886/2613 [23:38<09:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1887/2613 [23:39<09:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1888/2613 [23:39<09:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1889/2613 [23:40<09:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1890/2613 [23:41<09:03,  1.33it/s]

	Current Loss: 2.4956
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1891/2613 [23:42<09:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1892/2613 [23:42<09:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1893/2613 [23:43<09:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1894/2613 [23:44<09:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1895/2613 [23:45<08:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1896/2613 [23:45<08:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1897/2613 [23:46<08:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1898/2613 [23:47<08:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1899/2613 [23:48<08:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1900/2613 [23:48<08:55,  1.33it/s]

	Current Loss: 2.5066
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1901/2613 [23:49<08:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1902/2613 [23:50<08:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1903/2613 [23:51<08:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1904/2613 [23:51<08:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1905/2613 [23:52<08:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1906/2613 [23:53<08:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1907/2613 [23:54<08:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1908/2613 [23:54<08:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1909/2613 [23:55<08:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1910/2613 [23:56<08:48,  1.33it/s]

	Current Loss: 2.5053
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1911/2613 [23:57<08:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1912/2613 [23:57<08:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1913/2613 [23:58<08:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1914/2613 [23:59<08:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1915/2613 [24:00<08:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1916/2613 [24:00<08:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1917/2613 [24:01<08:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1918/2613 [24:02<08:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1919/2613 [24:03<08:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1920/2613 [24:03<08:40,  1.33it/s]

	Current Loss: 2.4965
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1921/2613 [24:04<08:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1922/2613 [24:05<08:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1923/2613 [24:06<08:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1924/2613 [24:06<08:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1925/2613 [24:07<08:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1926/2613 [24:08<08:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1927/2613 [24:09<08:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1928/2613 [24:09<08:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1929/2613 [24:10<08:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1930/2613 [24:11<08:33,  1.33it/s]

	Current Loss: 2.4891
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1931/2613 [24:12<08:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1932/2613 [24:12<08:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1933/2613 [24:13<08:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1934/2613 [24:14<08:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1935/2613 [24:15<08:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1936/2613 [24:15<08:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1937/2613 [24:16<08:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1938/2613 [24:17<08:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1939/2613 [24:18<08:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1940/2613 [24:18<08:25,  1.33it/s]

	Current Loss: 2.4884
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1941/2613 [24:19<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1942/2613 [24:20<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1943/2613 [24:21<08:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1944/2613 [24:21<08:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1945/2613 [24:22<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1946/2613 [24:23<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1947/2613 [24:24<08:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1948/2613 [24:24<08:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1949/2613 [24:25<08:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1950/2613 [24:26<08:17,  1.33it/s]

	Current Loss: 2.4886
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1951/2613 [24:27<08:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1952/2613 [24:27<08:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1953/2613 [24:28<08:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1954/2613 [24:29<08:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1955/2613 [24:30<08:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1956/2613 [24:30<08:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1957/2613 [24:31<08:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1958/2613 [24:32<08:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1959/2613 [24:33<08:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1960/2613 [24:33<08:10,  1.33it/s]

	Current Loss: 2.4839
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1961/2613 [24:34<08:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1962/2613 [24:35<08:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1963/2613 [24:36<08:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1964/2613 [24:36<08:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1965/2613 [24:37<08:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1966/2613 [24:38<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1967/2613 [24:39<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1968/2613 [24:39<08:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1969/2613 [24:40<08:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1970/2613 [24:41<08:02,  1.33it/s]

	Current Loss: 2.4803
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1971/2613 [24:42<08:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1972/2613 [24:42<08:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1973/2613 [24:43<08:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1974/2613 [24:44<08:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1975/2613 [24:45<07:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1976/2613 [24:45<07:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1977/2613 [24:46<07:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1978/2613 [24:47<07:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1979/2613 [24:48<07:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1980/2613 [24:48<07:55,  1.33it/s]

	Current Loss: 2.4806
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1981/2613 [24:49<07:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1982/2613 [24:50<07:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1983/2613 [24:51<07:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1984/2613 [24:51<07:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1985/2613 [24:52<07:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1986/2613 [24:53<07:59,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1987/2613 [24:54<07:48,  1.34it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1988/2613 [24:54<07:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1989/2613 [24:55<07:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1990/2613 [24:56<07:47,  1.33it/s]

	Current Loss: 2.4791
 77%|███████▋  | 2000/2613 [25:03<07:40,  1.33it/s]

	Current Loss: 2.4765
 77%|███████▋  | 2010/2613 [25:11<07:33,  1.33it/s]

	Current Loss: 2.4795
 77%|███████▋  | 2020/2613 [25:19<07:25,  1.33it/s]

	Current Loss: 2.4747
 78%|███████▊  | 2030/2613 [25:26<07:17,  1.33it/s]

	Current Loss: 2.4717
 78%|███████▊  | 2040/2613 [25:34<07:10,  1.33it/s]

	Current Loss: 2.4641
 78%|███████▊  | 2050/2613 [25:41<07:03,  1.33it/s]

	Current Loss: 2.4586
 79%|███████▉  | 2060/2613 [25:49<06:55,  1.33it/s]

	Current Loss: 2.4564
 79%|███████▉  | 2070/2613 [25:56<06:47,  1.33it/s]

	Current Loss: 2.4543
 80%|███████▉  | 2080/2613 [26:04<06:40,  1.33it/s]

	Current Loss: 2.4526
 80%|███████▉  | 2090/2613 [26:11<06:33,  1.33it/s]

	Current Loss: 2.4584
 80%|████████  | 2100/2613 [26:19<06:25,  1.33it/s]

	Current Loss: 2.4522
 81%|████████  | 2110/2613 [26:26<06:18,  1.33it/s]

	Current Loss: 2.4438
 81%|████████  | 2120/2613 [26:34<06:10,  1.33it/s]

	Current Loss: 2.4427
 82%|████████▏ | 2130/2613 [26:41<06:02,  1.33it/s]

	Current Loss: 2.4386
 82%|████████▏ | 2140/2613 [26:49<05:55,  1.33it/s]

	Current Loss: 2.4362
 82%|████████▏ | 2150/2613 [26:56<05:48,  1.33it/s]

	Current Loss: 2.4374
 83%|████████▎ | 2160/2613 [27:04<05:40,  1.33it/s]

	Current Loss: 2.4355
 83%|████████▎ | 2170/2613 [27:11<05:32,  1.33it/s]

	Current Loss: 2.4332
 83%|████████▎ | 2180/2613 [27:19<05:25,  1.33it/s]

	Current Loss: 2.4371
 84%|████████▍ | 2190/2613 [27:26<05:17,  1.33it/s]

	Current Loss: 2.4306
 84%|████████▍ | 2200/2613 [27:34<05:10,  1.33it/s]

	Current Loss: 2.4307
 85%|████████▍ | 2210/2613 [27:41<05:02,  1.33it/s]

	Current Loss: 2.4253
 85%|████████▍ | 2220/2613 [27:49<04:55,  1.33it/s]

	Current Loss: 2.4184
 85%|████████▌ | 2230/2613 [27:56<04:47,  1.33it/s]

	Current Loss: 2.4256
 86%|████████▌ | 2240/2613 [28:04<04:40,  1.33it/s]

	Current Loss: 2.4144
 86%|████████▌ | 2250/2613 [28:11<04:32,  1.33it/s]

	Current Loss: 2.4133
 86%|████████▋ | 2260/2613 [28:19<04:25,  1.33it/s]

	Current Loss: 2.4146
 87%|████████▋ | 2270/2613 [28:26<04:17,  1.33it/s]

	Current Loss: 2.4079
 87%|████████▋ | 2280/2613 [28:34<04:10,  1.33it/s]

	Current Loss: 2.4146
 88%|████████▊ | 2290/2613 [28:41<04:02,  1.33it/s]

	Current Loss: 2.4063
 88%|████████▊ | 2300/2613 [28:49<03:55,  1.33it/s]

	Current Loss: 2.4033
 88%|████████▊ | 2310/2613 [28:56<03:47,  1.33it/s]

	Current Loss: 2.4114
 89%|████████▉ | 2320/2613 [29:04<03:40,  1.33it/s]

	Current Loss: 2.4028
 89%|████████▉ | 2330/2613 [29:11<03:32,  1.33it/s]

	Current Loss: 2.3975
 90%|████████▉ | 2340/2613 [29:19<03:25,  1.33it/s]

	Current Loss: 2.4006

 90%|████████▉ | 2350/2613 [29:27<03:17,  1.33it/s]

	Current Loss: 2.3931

 90%|█████████ | 2360/2613 [29:34<03:10,  1.33it/s]

	Current Loss: 2.3917

 91%|█████████ | 2370/2613 [29:42<03:02,  1.33it/s]

	Current Loss: 2.3908

 91%|█████████ | 2380/2613 [29:49<02:55,  1.33it/s]

	Current Loss: 2.3943

 91%|█████████▏| 2390/2613 [29:57<02:47,  1.33it/s]

	Current Loss: 2.3936

 92%|█████████▏| 2400/2613 [30:04<02:40,  1.33it/s]

	Current Loss: 2.3863

 92%|█████████▏| 2410/2613 [30:12<02:32,  1.33it/s]

	Current Loss: 2.3911

 93%|█████████▎| 2420/2613 [30:19<02:25,  1.33it/s]

	Current Loss: 2.3835

 93%|█████████▎| 2430/2613 [30:27<02:17,  1.33it/s]

	Current Loss: 2.3810

 93%|█████████▎| 2440/2613 [30:34<02:10,  1.33it/s]

	Current Loss: 2.3790

 94%|█████████▍| 2450/2613 [30:42<02:02,  1.33it/s]

	Current Loss: 2.3724

 94%|█████████▍| 2460/2613 [30:49<01:54,  1.33it/s]

	Current Loss: 2.3711

 95%|█████████▍| 2470/2613 [30:57<01:47,  1.33it/s]

	Current Loss: 2.3737

 95%|█████████▍| 2480/2613 [31:04<01:39,  1.33it/s]

	Current Loss: 2.3723

 95%|█████████▌| 2490/2613 [31:12<01:32,  1.33it/s]

	Current Loss: 2.3688

 96%|█████████▌| 2500/2613 [31:19<01:24,  1.33it/s]

	Current Loss: 2.3604

 96%|█████████▌| 2510/2613 [31:27<01:17,  1.33it/s]

	Current Loss: 2.3657
 96%|█████████▋| 2520/2613 [31:34<01:09,  1.33it/s]

	Current Loss: 2.3562
 97%|█████████▋| 2530/2613 [31:42<01:02,  1.33it/s]

	Current Loss: 2.3601
 97%|█████████▋| 2540/2613 [31:49<00:54,  1.33it/s]

	Current Loss: 2.3550
 98%|█████████▊| 2550/2613 [31:57<00:47,  1.33it/s]

	Current Loss: 2.3501
 98%|█████████▊| 2560/2613 [32:04<00:39,  1.33it/s]

	Current Loss: 2.3545
 98%|█████████▊| 2570/2613 [32:12<00:32,  1.33it/s]

	Current Loss: 2.3496
 99%|█████████▊| 2580/2613 [32:19<00:24,  1.33it/s]

	Current Loss: 2.3452
 99%|█████████▉| 2590/2613 [32:27<00:17,  1.33it/s]

	Current Loss: 2.3424
100%|█████████▉| 2600/2613 [32:34<00:09,  1.33it/s]

	Current Loss: 2.3445
100%|█████████▉| 2610/2613 [32:42<00:02,  1.33it/s]

	Current Loss: 2.3377
100%|██████████| 2613/2613 [32:44<00:00,  1.33it/s]


Epoch 0, Train Loss: 2.7202, Time: 1964.71s


  0%|          | 0/2613 [00:00<?, ?it/s]

  0%|          | 10/2613 [00:07<32:22,  1.34it/s]

	Current Loss: 2.3383
  1%|          | 20/2613 [00:14<32:28,  1.33it/s]

	Current Loss: 2.3381
  1%|          | 30/2613 [00:22<32:21,  1.33it/s]

	Current Loss: 2.3296
  2%|▏         | 40/2613 [00:29<32:14,  1.33it/s]

	Current Loss: 2.3349
  2%|▏         | 50/2613 [00:37<32:06,  1.33it/s]

	Current Loss: 2.3366
  2%|▏         | 60/2613 [00:44<31:58,  1.33it/s]

	Current Loss: 2.3280
  3%|▎         | 70/2613 [00:52<31:51,  1.33it/s]

	Current Loss: 2.3360

  3%|▎         | 80/2613 [00:59<31:43,  1.33it/s]

	Current Loss: 2.3242

  3%|▎         | 90/2613 [01:07<31:35,  1.33it/s]

	Current Loss: 2.3239

  4%|▍         | 100/2613 [01:14<31:29,  1.33it/s]

	Current Loss: 2.3228

  4%|▍         | 110/2613 [01:22<31:22,  1.33it/s]

	Current Loss: 2.3155

  5%|▍         | 120/2613 [01:29<31:15,  1.33it/s]

	Current Loss: 2.3246

  5%|▍         | 130/2613 [01:37<31:06,  1.33it/s]

	Current Loss: 2.3170

  5%|▌         | 140/2613 [01:44<30:58,  1.33it/s]

	Current Loss: 2.3180

  6%|▌         | 150/2613 [01:52<30:53,  1.33it/s]

	Current Loss: 2.3199

  6%|▌         | 160/2613 [01:59<30:44,  1.33it/s]

	Current Loss: 2.3181

  7%|▋         | 170/2613 [02:07<30:37,  1.33it/s]

	Current Loss: 2.3063

  7%|▋         | 180/2613 [02:14<30:29,  1.33it/s]

	Current Loss: 2.3060

  7%|▋         | 190/2613 [02:22<30:58,  1.30it/s]

	Current Loss: 2.3015

  8%|▊         | 200/2613 [02:29<30:12,  1.33it/s]

	Current Loss: 2.2969

  8%|▊         | 210/2613 [02:37<30:07,  1.33it/s]

	Current Loss: 2.2975

  8%|▊         | 220/2613 [02:44<30:00,  1.33it/s]

	Current Loss: 2.2985

  9%|▉         | 230/2613 [02:52<29:51,  1.33it/s]

	Current Loss: 2.2931

  9%|▉         | 240/2613 [03:00<29:43,  1.33it/s]

	Current Loss: 2.3010

 10%|▉         | 250/2613 [03:07<29:34,  1.33it/s]

	Current Loss: 2.2931

 10%|▉         | 260/2613 [03:15<29:28,  1.33it/s]

	Current Loss: 2.2857

 10%|█         | 270/2613 [03:22<29:22,  1.33it/s]

	Current Loss: 2.2842

 11%|█         | 280/2613 [03:30<29:11,  1.33it/s]

	Current Loss: 2.2849

 11%|█         | 290/2613 [03:37<29:05,  1.33it/s]

	Current Loss: 2.2847

 11%|█▏        | 300/2613 [03:45<28:57,  1.33it/s]

	Current Loss: 2.2808

 12%|█▏        | 310/2613 [03:52<28:51,  1.33it/s]

	Current Loss: 2.2816

 12%|█▏        | 320/2613 [04:00<28:43,  1.33it/s]

	Current Loss: 2.2753

 13%|█▎        | 330/2613 [04:07<28:36,  1.33it/s]

	Current Loss: 2.2803

 13%|█▎        | 340/2613 [04:15<28:30,  1.33it/s]

	Current Loss: 2.2718

 13%|█▎        | 350/2613 [04:22<28:21,  1.33it/s]

	Current Loss: 2.2782

 14%|█▍        | 360/2613 [04:30<28:12,  1.33it/s]

	Current Loss: 2.2690

 14%|█▍        | 370/2613 [04:37<28:06,  1.33it/s]

	Current Loss: 2.2705

 15%|█▍        | 380/2613 [04:45<27:59,  1.33it/s]

	Current Loss: 2.2668

 15%|█▍        | 390/2613 [04:52<27:52,  1.33it/s]

	Current Loss: 2.2655

 15%|█▌        | 400/2613 [05:00<27:44,  1.33it/s]

	Current Loss: 2.2623

 16%|█▌        | 410/2613 [05:07<27:35,  1.33it/s]

	Current Loss: 2.2595

 16%|█▌        | 420/2613 [05:15<27:29,  1.33it/s]

	Current Loss: 2.2604
 16%|█▋        | 430/2613 [05:22<27:21,  1.33it/s]

	Current Loss: 2.2592
 17%|█▋        | 440/2613 [05:30<27:14,  1.33it/s]

	Current Loss: 2.2518
 17%|█▋        | 450/2613 [05:37<27:05,  1.33it/s]

	Current Loss: 2.2481
 18%|█▊        | 460/2613 [05:45<26:58,  1.33it/s]

	Current Loss: 2.2534
 18%|█▊        | 470/2613 [05:52<26:52,  1.33it/s]

	Current Loss: 2.2501
 18%|█▊        | 480/2613 [06:00<26:42,  1.33it/s]

	Current Loss: 2.2540
 19%|█▉        | 490/2613 [06:07<26:35,  1.33it/s]

	Current Loss: 2.2475
 19%|█▉        | 500/2613 [06:15<26:28,  1.33it/s]

	Current Loss: 2.2434
 20%|█▉        | 510/2613 [06:22<26:20,  1.33it/s]

	Current Loss: 2.2448
 20%|█▉        | 520/2613 [06:30<26:12,  1.33it/s]

	Current Loss: 2.2426
 20%|██        | 530/2613 [06:38<26:06,  1.33it/s]

	Current Loss: 2.2387
 21%|██        | 540/2613 [06:45<25:58,  1.33it/s]

	Current Loss: 2.2342
 21%|██        | 550/2613 [06:53<25:49,  1.33it/s]

	Current Loss: 2.2396
 21%|██▏       | 560/2613 [07:00<25:43,  1.33it/s]

	Current Loss: 2.2367
 22%|██▏       | 570/2613 [07:08<25:35,  1.33it/s]

	Current Loss: 2.2322
 22%|██▏       | 580/2613 [07:15<25:28,  1.33it/s]

	Current Loss: 2.2310
 23%|██▎       | 590/2613 [07:23<25:21,  1.33it/s]

	Current Loss: 2.2307
 23%|██▎       | 600/2613 [07:30<25:32,  1.31it/s]

	Current Loss: 2.2290

 23%|██▎       | 610/2613 [07:38<25:06,  1.33it/s]

	Current Loss: 2.2241

 24%|██▎       | 620/2613 [07:45<24:59,  1.33it/s]

	Current Loss: 2.2220

 24%|██▍       | 630/2613 [07:53<24:53,  1.33it/s]

	Current Loss: 2.2267

 24%|██▍       | 640/2613 [08:00<24:42,  1.33it/s]

	Current Loss: 2.2183

 25%|██▍       | 650/2613 [08:08<24:35,  1.33it/s]

	Current Loss: 2.2174

 25%|██▌       | 660/2613 [08:15<24:28,  1.33it/s]

	Current Loss: 2.2175

 26%|██▌       | 670/2613 [08:23<24:21,  1.33it/s]

	Current Loss: 2.2125

 26%|██▌       | 680/2613 [08:30<24:12,  1.33it/s]

	Current Loss: 2.2095

 26%|██▋       | 690/2613 [08:38<24:05,  1.33it/s]

	Current Loss: 2.2123

 27%|██▋       | 700/2613 [08:45<23:57,  1.33it/s]

	Current Loss: 2.2127

 27%|██▋       | 710/2613 [08:53<23:50,  1.33it/s]

	Current Loss: 2.2118

 28%|██▊       | 720/2613 [09:00<23:43,  1.33it/s]

	Current Loss: 2.2068

 28%|██▊       | 730/2613 [09:08<23:35,  1.33it/s]

	Current Loss: 2.2030

 28%|██▊       | 740/2613 [09:15<23:28,  1.33it/s]

	Current Loss: 2.1990

 29%|██▊       | 750/2613 [09:23<23:21,  1.33it/s]

	Current Loss: 2.1967

 29%|██▉       | 760/2613 [09:30<23:13,  1.33it/s]

	Current Loss: 2.1955

 29%|██▉       | 770/2613 [09:38<23:05,  1.33it/s]

	Current Loss: 2.1956
 30%|██▉       | 780/2613 [09:46<22:58,  1.33it/s]

	Current Loss: 2.1967
 30%|███       | 790/2613 [09:53<22:50,  1.33it/s]

	Current Loss: 2.1916
 31%|███       | 800/2613 [10:01<22:42,  1.33it/s]

	Current Loss: 2.1873
 31%|███       | 810/2613 [10:08<22:36,  1.33it/s]

	Current Loss: 2.1887
 31%|███▏      | 820/2613 [10:16<22:27,  1.33it/s]

	Current Loss: 2.1911
 32%|███▏      | 830/2613 [10:23<22:21,  1.33it/s]

	Current Loss: 2.1869
 32%|███▏      | 840/2613 [10:31<22:13,  1.33it/s]

	Current Loss: 2.1838
 33%|███▎      | 850/2613 [10:38<22:05,  1.33it/s]

	Current Loss: 2.1829
 33%|███▎      | 860/2613 [10:46<21:58,  1.33it/s]

	Current Loss: 2.1819
 33%|███▎      | 870/2613 [10:53<21:49,  1.33it/s]

	Current Loss: 2.1829
 34%|███▎      | 880/2613 [11:01<21:42,  1.33it/s]

	Current Loss: 2.1794
 34%|███▍      | 890/2613 [11:08<21:38,  1.33it/s]

	Current Loss: 2.1768
 34%|███▍      | 900/2613 [11:16<21:28,  1.33it/s]

	Current Loss: 2.1677
 35%|███▍      | 910/2613 [11:23<21:21,  1.33it/s]

	Current Loss: 2.1701
 35%|███▌      | 920/2613 [11:31<21:12,  1.33it/s]

	Current Loss: 2.1697
 36%|███▌      | 930/2613 [11:38<21:04,  1.33it/s]

	Current Loss: 2.1699
 36%|███▌      | 940/2613 [11:46<20:58,  1.33it/s]

	Current Loss: 2.1675
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 941/2613 [11:47<20:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 942/2613 [11:47<20:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 943/2613 [11:48<20:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 944/2613 [11:49<20:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 945/2613 [11:50<20:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 946/2613 [11:50<20:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▌      | 947/2613 [11:51<20:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 948/2613 [11:52<20:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 949/2613 [11:53<20:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 950/2613 [11:53<20:50,  1.33it/s]

	Current Loss: 2.1658
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 951/2613 [11:54<20:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 952/2613 [11:55<20:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 36%|███▋      | 953/2613 [11:56<20:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 954/2613 [11:56<20:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 955/2613 [11:57<20:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 956/2613 [11:58<20:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 957/2613 [11:59<20:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 958/2613 [11:59<20:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 959/2613 [12:00<20:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 960/2613 [12:01<20:42,  1.33it/s]

	Current Loss: 2.1690
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 961/2613 [12:02<20:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 962/2613 [12:02<20:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 963/2613 [12:03<20:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 964/2613 [12:04<20:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 965/2613 [12:05<20:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 966/2613 [12:05<20:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 967/2613 [12:06<20:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 968/2613 [12:07<20:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 969/2613 [12:08<20:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 970/2613 [12:08<20:35,  1.33it/s]

	Current Loss: 2.1659
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 971/2613 [12:09<20:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 972/2613 [12:10<20:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 973/2613 [12:11<20:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 974/2613 [12:11<20:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 975/2613 [12:12<20:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 976/2613 [12:13<20:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 977/2613 [12:14<20:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 978/2613 [12:14<20:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 37%|███▋      | 979/2613 [12:15<20:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 980/2613 [12:16<20:27,  1.33it/s]

	Current Loss: 2.1629
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 981/2613 [12:17<20:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 982/2613 [12:17<20:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 983/2613 [12:18<20:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 984/2613 [12:19<20:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 985/2613 [12:20<20:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 986/2613 [12:20<20:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 987/2613 [12:21<20:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 988/2613 [12:22<20:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 989/2613 [12:23<20:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 990/2613 [12:23<20:19,  1.33it/s]

	Current Loss: 2.1674
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 991/2613 [12:24<20:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 992/2613 [12:25<20:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 993/2613 [12:26<20:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 994/2613 [12:26<20:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 995/2613 [12:27<20:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 996/2613 [12:28<20:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 997/2613 [12:29<20:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 998/2613 [12:29<20:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 999/2613 [12:30<20:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1000/2613 [12:31<20:12,  1.33it/s]

	Current Loss: 2.1572
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1001/2613 [12:32<20:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1002/2613 [12:32<20:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1003/2613 [12:33<20:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1004/2613 [12:34<20:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1005/2613 [12:35<20:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 38%|███▊      | 1006/2613 [12:35<20:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1007/2613 [12:36<20:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1008/2613 [12:37<20:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1009/2613 [12:38<20:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1010/2613 [12:38<20:05,  1.33it/s]

	Current Loss: 2.1551
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1011/2613 [12:39<20:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▊      | 1012/2613 [12:40<20:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1013/2613 [12:41<20:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1014/2613 [12:41<20:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1015/2613 [12:42<20:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1016/2613 [12:43<20:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1017/2613 [12:44<19:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1018/2613 [12:44<19:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1019/2613 [12:45<19:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1020/2613 [12:46<19:57,  1.33it/s]

	Current Loss: 2.1539
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1021/2613 [12:47<19:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1022/2613 [12:47<19:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1023/2613 [12:48<19:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1024/2613 [12:49<19:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1025/2613 [12:50<19:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1026/2613 [12:50<19:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1027/2613 [12:51<19:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1028/2613 [12:52<19:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1029/2613 [12:53<19:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1030/2613 [12:53<19:50,  1.33it/s]

	Current Loss: 2.1584
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1031/2613 [12:54<19:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 39%|███▉      | 1032/2613 [12:55<19:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1033/2613 [12:56<19:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1034/2613 [12:56<19:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1035/2613 [12:57<19:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1036/2613 [12:58<19:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1037/2613 [12:59<19:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1038/2613 [12:59<19:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1039/2613 [13:00<19:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1040/2613 [13:01<19:42,  1.33it/s]

	Current Loss: 2.1548
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1041/2613 [13:02<19:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1042/2613 [13:02<19:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1043/2613 [13:03<19:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1044/2613 [13:04<19:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|███▉      | 1045/2613 [13:05<19:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1046/2613 [13:06<19:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1047/2613 [13:06<19:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1048/2613 [13:07<19:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1049/2613 [13:08<19:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1050/2613 [13:09<19:34,  1.33it/s]

	Current Loss: 2.1553
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1051/2613 [13:09<19:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1052/2613 [13:10<19:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1053/2613 [13:11<19:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1054/2613 [13:12<19:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1055/2613 [13:12<19:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1056/2613 [13:13<19:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1057/2613 [13:14<19:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 40%|████      | 1058/2613 [13:15<19:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1059/2613 [13:15<19:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1060/2613 [13:16<19:29,  1.33it/s]

	Current Loss: 2.1440
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1061/2613 [13:17<19:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1062/2613 [13:18<19:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1063/2613 [13:18<19:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1064/2613 [13:19<19:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1065/2613 [13:20<19:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1066/2613 [13:21<19:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1067/2613 [13:21<19:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1068/2613 [13:22<19:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1069/2613 [13:23<19:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1070/2613 [13:24<19:19,  1.33it/s]

	Current Loss: 2.1462
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1071/2613 [13:24<19:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1072/2613 [13:25<19:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1073/2613 [13:26<19:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1074/2613 [13:27<19:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1075/2613 [13:27<19:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1076/2613 [13:28<19:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████      | 1077/2613 [13:29<19:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1078/2613 [13:30<19:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1079/2613 [13:30<19:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1080/2613 [13:31<19:11,  1.33it/s]

	Current Loss: 2.1474
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1081/2613 [13:32<19:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1082/2613 [13:33<19:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1083/2613 [13:33<19:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 41%|████▏     | 1084/2613 [13:34<19:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1085/2613 [13:35<19:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1086/2613 [13:36<19:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1087/2613 [13:36<19:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1088/2613 [13:37<19:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1089/2613 [13:38<19:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1090/2613 [13:39<19:05,  1.33it/s]

	Current Loss: 2.1397
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1091/2613 [13:39<19:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1092/2613 [13:40<19:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1093/2613 [13:41<19:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1094/2613 [13:42<19:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1095/2613 [13:42<19:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1096/2613 [13:43<18:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1097/2613 [13:44<18:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1098/2613 [13:45<18:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1099/2613 [13:45<18:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1100/2613 [13:46<18:56,  1.33it/s]

	Current Loss: 2.1417
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1101/2613 [13:47<18:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1102/2613 [13:48<18:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1103/2613 [13:48<18:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1104/2613 [13:49<18:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1105/2613 [13:50<18:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1106/2613 [13:51<18:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1107/2613 [13:51<18:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1108/2613 [13:52<18:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1109/2613 [13:53<18:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1110/2613 [13:54<18:49,  1.33it/s]

	Current Loss: 2.1412
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1111/2613 [13:54<18:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1112/2613 [13:55<18:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1113/2613 [13:56<18:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1114/2613 [13:57<18:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1115/2613 [13:57<18:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1116/2613 [13:58<18:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1117/2613 [13:59<18:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1118/2613 [14:00<18:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1119/2613 [14:00<18:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1120/2613 [14:01<18:42,  1.33it/s]

	Current Loss: 2.1362
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1121/2613 [14:02<18:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1122/2613 [14:03<18:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1123/2613 [14:03<18:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1124/2613 [14:04<18:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1125/2613 [14:05<18:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1126/2613 [14:06<18:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1127/2613 [14:06<18:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1128/2613 [14:07<18:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1129/2613 [14:08<18:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1130/2613 [14:09<18:35,  1.33it/s]

	Current Loss: 2.1416
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1131/2613 [14:09<18:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1132/2613 [14:10<18:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1133/2613 [14:11<18:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1134/2613 [14:12<18:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1135/2613 [14:12<18:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 43%|████▎     | 1136/2613 [14:13<18:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1137/2613 [14:14<18:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1138/2613 [14:15<18:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1139/2613 [14:15<18:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1140/2613 [14:16<18:27,  1.33it/s]

	Current Loss: 2.1373
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1141/2613 [14:17<18:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1142/2613 [14:18<18:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▎     | 1143/2613 [14:18<18:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1144/2613 [14:19<18:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1145/2613 [14:20<18:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1146/2613 [14:21<18:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1147/2613 [14:21<18:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1148/2613 [14:22<18:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1149/2613 [14:23<18:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1150/2613 [14:24<18:19,  1.33it/s]

	Current Loss: 2.1331
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1151/2613 [14:24<18:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1152/2613 [14:25<18:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1153/2613 [14:26<18:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1154/2613 [14:27<18:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1155/2613 [14:27<18:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1156/2613 [14:28<18:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1157/2613 [14:29<18:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1158/2613 [14:30<18:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1159/2613 [14:30<18:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1160/2613 [14:31<18:12,  1.33it/s]

	Current Loss: 2.1371
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1161/2613 [14:32<18:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 44%|████▍     | 1162/2613 [14:33<18:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1163/2613 [14:33<18:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1164/2613 [14:34<18:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1165/2613 [14:35<18:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1166/2613 [14:36<18:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1167/2613 [14:36<18:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1168/2613 [14:37<18:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1169/2613 [14:38<18:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1170/2613 [14:39<18:04,  1.33it/s]

	Current Loss: 2.1291
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1171/2613 [14:40<18:23,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1172/2613 [14:40<18:16,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1173/2613 [14:41<18:12,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1174/2613 [14:42<18:08,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▍     | 1175/2613 [14:43<18:05,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1176/2613 [14:43<18:04,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1177/2613 [14:44<18:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1178/2613 [14:45<18:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1179/2613 [14:46<17:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1180/2613 [14:46<17:59,  1.33it/s]

	Current Loss: 2.1349
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1181/2613 [14:47<17:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1182/2613 [14:48<17:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1183/2613 [14:49<17:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1184/2613 [14:49<17:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1185/2613 [14:50<17:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1186/2613 [14:51<17:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1187/2613 [14:52<17:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 45%|████▌     | 1188/2613 [14:52<17:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1189/2613 [14:53<17:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1190/2613 [14:54<17:50,  1.33it/s]

	Current Loss: 2.1311
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1191/2613 [14:55<17:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1192/2613 [14:55<17:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1193/2613 [14:56<17:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1194/2613 [14:57<17:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1195/2613 [14:58<17:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1196/2613 [14:58<17:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1197/2613 [14:59<17:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1198/2613 [15:00<17:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1199/2613 [15:01<17:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1200/2613 [15:01<17:43,  1.33it/s]

	Current Loss: 2.1253
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1201/2613 [15:02<17:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1202/2613 [15:03<17:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1203/2613 [15:04<17:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1204/2613 [15:04<17:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1205/2613 [15:05<17:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1206/2613 [15:06<17:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1207/2613 [15:07<17:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▌     | 1208/2613 [15:07<17:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1209/2613 [15:08<17:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1210/2613 [15:09<17:35,  1.33it/s]

	Current Loss: 2.1275
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1211/2613 [15:10<17:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1212/2613 [15:10<17:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1213/2613 [15:11<17:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1214/2613 [15:12<17:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 46%|████▋     | 1215/2613 [15:13<17:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1216/2613 [15:13<17:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1217/2613 [15:14<17:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1218/2613 [15:15<17:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1219/2613 [15:16<17:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1220/2613 [15:16<17:26,  1.33it/s]

	Current Loss: 2.1265
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1221/2613 [15:17<17:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1222/2613 [15:18<17:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1223/2613 [15:19<17:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1224/2613 [15:19<17:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1225/2613 [15:20<17:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1226/2613 [15:21<17:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1227/2613 [15:22<17:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1228/2613 [15:22<17:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1229/2613 [15:23<17:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1230/2613 [15:24<17:21,  1.33it/s]

	Current Loss: 2.1297
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1231/2613 [15:25<17:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1232/2613 [15:25<17:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1233/2613 [15:26<17:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1234/2613 [15:27<17:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1235/2613 [15:28<17:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1236/2613 [15:28<17:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1237/2613 [15:29<17:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1238/2613 [15:30<17:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1239/2613 [15:31<17:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1240/2613 [15:31<17:12,  1.33it/s]

	Current Loss: 2.1206
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 47%|████▋     | 1241/2613 [15:32<17:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1242/2613 [15:33<17:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1243/2613 [15:34<17:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1244/2613 [15:34<17:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1245/2613 [15:35<17:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1246/2613 [15:36<17:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1247/2613 [15:37<17:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1248/2613 [15:37<17:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1249/2613 [15:38<17:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1250/2613 [15:39<17:04,  1.33it/s]

	Current Loss: 2.1185
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1251/2613 [15:40<17:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1252/2613 [15:40<17:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1253/2613 [15:41<17:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1254/2613 [15:42<17:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1255/2613 [15:43<17:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1256/2613 [15:43<17:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1257/2613 [15:44<16:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1258/2613 [15:45<16:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1259/2613 [15:46<16:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1260/2613 [15:46<16:56,  1.33it/s]

	Current Loss: 2.1183
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1261/2613 [15:47<16:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1262/2613 [15:48<16:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1263/2613 [15:49<16:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1264/2613 [15:49<16:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1265/2613 [15:50<16:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1266/2613 [15:51<16:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 48%|████▊     | 1267/2613 [15:52<16:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1268/2613 [15:52<16:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1269/2613 [15:53<16:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1270/2613 [15:54<16:49,  1.33it/s]

	Current Loss: 2.1178
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1271/2613 [15:55<16:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1272/2613 [15:55<16:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▊     | 1273/2613 [15:56<16:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1274/2613 [15:57<16:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1275/2613 [15:58<16:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1276/2613 [15:58<16:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1277/2613 [15:59<16:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1278/2613 [16:00<16:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1279/2613 [16:01<16:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 49%|████▉     | 1280/2613 [16:01<16:42,  1.33it/s]

	Current Loss: 2.1162
 49%|████▉     | 1290/2613 [16:09<16:34,  1.33it/s]

	Current Loss: 2.1133
 50%|████▉     | 1300/2613 [16:17<16:27,  1.33it/s]

	Current Loss: 2.1104
 50%|█████     | 1310/2613 [16:24<16:19,  1.33it/s]

	Current Loss: 2.1178
 51%|█████     | 1320/2613 [16:32<16:12,  1.33it/s]

	Current Loss: 2.1121
 51%|█████     | 1330/2613 [16:39<16:04,  1.33it/s]

	Current Loss: 2.1088
 51%|█████▏    | 1340/2613 [16:47<15:56,  1.33it/s]

	Current Loss: 2.1071
 52%|█████▏    | 1350/2613 [16:54<15:49,  1.33it/s]

	Current Loss: 2.0986
 52%|█████▏    | 1360/2613 [17:02<15:41,  1.33it/s]

	Current Loss: 2.1062
 52%|█████▏    | 1370/2613 [17:09<15:34,  1.33it/s]

	Current Loss: 2.1018
 53%|█████▎    | 1380/2613 [17:17<15:27,  1.33it/s]

	Current Loss: 2.1040
 53%|█████▎    | 1390/2613 [17:24<15:18,  1.33it/s]

	Current Loss: 2.0966
 54%|█████▎    | 1400/2613 [17:32<15:11,  1.33it/s]

	Current Loss: 2.0974
 54%|█████▍    | 1410/2613 [17:39<15:03,  1.33it/s]

	Current Loss: 2.0958
 54%|█████▍    | 1420/2613 [17:47<14:56,  1.33it/s]

	Current Loss: 2.0943
 55%|█████▍    | 1430/2613 [17:54<14:48,  1.33it/s]

	Current Loss: 2.0927
 55%|█████▌    | 1440/2613 [18:02<14:41,  1.33it/s]

	Current Loss: 2.0926
 55%|█████▌    | 1450/2613 [18:09<14:34,  1.33it/s]

	Current Loss: 2.0898
 56%|█████▌    | 1460/2613 [18:17<14:26,  1.33it/s]

	Current Loss: 2.0855
 56%|█████▋    | 1470/2613 [18:24<14:19,  1.33it/s]

	Current Loss: 2.0850
 57%|█████▋    | 1480/2613 [18:32<14:11,  1.33it/s]

	Current Loss: 2.0855
 57%|█████▋    | 1490/2613 [18:39<14:04,  1.33it/s]

	Current Loss: 2.0869
 57%|█████▋    | 1500/2613 [18:47<13:56,  1.33it/s]

	Current Loss: 2.0821
 58%|█████▊    | 1510/2613 [18:54<13:48,  1.33it/s]

	Current Loss: 2.0798
 58%|█████▊    | 1520/2613 [19:02<13:41,  1.33it/s]

	Current Loss: 2.0783
 59%|█████▊    | 1530/2613 [19:09<13:33,  1.33it/s]

	Current Loss: 2.0801
 59%|█████▉    | 1540/2613 [19:17<13:25,  1.33it/s]

	Current Loss: 2.0784
 59%|█████▉    | 1550/2613 [19:24<13:19,  1.33it/s]

	Current Loss: 2.0790
 60%|█████▉    | 1560/2613 [19:32<13:11,  1.33it/s]

	Current Loss: 2.0792
 60%|██████    | 1570/2613 [19:39<13:03,  1.33it/s]

	Current Loss: 2.0730
 60%|██████    | 1580/2613 [19:47<12:56,  1.33it/s]

	Current Loss: 2.0766
 61%|██████    | 1590/2613 [19:54<12:49,  1.33it/s]

	Current Loss: 2.0705
 61%|██████    | 1600/2613 [20:02<12:41,  1.33it/s]

	Current Loss: 2.0694
 62%|██████▏   | 1610/2613 [20:09<12:35,  1.33it/s]

	Current Loss: 2.0740
 62%|██████▏   | 1620/2613 [20:17<12:26,  1.33it/s]

	Current Loss: 2.0668
 62%|██████▏   | 1630/2613 [20:25<12:18,  1.33it/s]

	Current Loss: 2.0663
 63%|██████▎   | 1640/2613 [20:32<12:10,  1.33it/s]

	Current Loss: 2.0718

 63%|██████▎   | 1650/2613 [20:40<12:03,  1.33it/s]

	Current Loss: 2.0611

 64%|██████▎   | 1660/2613 [20:47<11:56,  1.33it/s]

	Current Loss: 2.0583

 64%|██████▍   | 1670/2613 [20:55<11:49,  1.33it/s]

	Current Loss: 2.0590

 64%|██████▍   | 1680/2613 [21:02<11:41,  1.33it/s]

	Current Loss: 2.0647

 65%|██████▍   | 1690/2613 [21:10<11:33,  1.33it/s]

	Current Loss: 2.0607

 65%|██████▌   | 1700/2613 [21:17<11:26,  1.33it/s]

	Current Loss: 2.0553

 65%|██████▌   | 1710/2613 [21:25<11:18,  1.33it/s]

	Current Loss: 2.0565

 66%|██████▌   | 1720/2613 [21:32<11:11,  1.33it/s]

	Current Loss: 2.0545

 66%|██████▌   | 1730/2613 [21:40<11:03,  1.33it/s]

	Current Loss: 2.0559

 67%|██████▋   | 1740/2613 [21:47<10:56,  1.33it/s]

	Current Loss: 2.0627

 67%|██████▋   | 1750/2613 [21:55<10:48,  1.33it/s]

	Current Loss: 2.0500

 67%|██████▋   | 1760/2613 [22:02<10:40,  1.33it/s]

	Current Loss: 2.0492

 68%|██████▊   | 1770/2613 [22:10<10:33,  1.33it/s]

	Current Loss: 2.0474

 68%|██████▊   | 1780/2613 [22:17<10:25,  1.33it/s]

	Current Loss: 2.0513

 69%|██████▊   | 1790/2613 [22:25<10:18,  1.33it/s]

	Current Loss: 2.0414

 69%|██████▉   | 1800/2613 [22:32<10:11,  1.33it/s]

	Current Loss: 2.0469

 69%|██████▉   | 1810/2613 [22:40<10:03,  1.33it/s]

	Current Loss: 2.0450
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1811/2613 [22:41<10:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1812/2613 [22:41<10:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1813/2613 [22:42<10:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1814/2613 [22:43<10:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1815/2613 [22:44<09:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 69%|██████▉   | 1816/2613 [22:44<09:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1817/2613 [22:45<09:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1818/2613 [22:46<09:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1819/2613 [22:47<09:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1820/2613 [22:47<09:56,  1.33it/s]

	Current Loss: 2.0422
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1821/2613 [22:48<09:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1822/2613 [22:49<09:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1823/2613 [22:50<09:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1824/2613 [22:50<09:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1825/2613 [22:51<09:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1826/2613 [22:52<09:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1827/2613 [22:53<09:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1828/2613 [22:53<09:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|██████▉   | 1829/2613 [22:54<09:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1830/2613 [22:55<09:48,  1.33it/s]

	Current Loss: 2.0451
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1831/2613 [22:56<09:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1832/2613 [22:56<09:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1833/2613 [22:57<09:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1834/2613 [22:58<09:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1835/2613 [22:59<09:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1836/2613 [22:59<09:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1837/2613 [23:00<09:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1838/2613 [23:01<09:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1839/2613 [23:02<09:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1840/2613 [23:02<09:41,  1.33it/s]

	Current Loss: 2.0382
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1841/2613 [23:03<09:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 70%|███████   | 1842/2613 [23:04<09:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1843/2613 [23:05<09:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1844/2613 [23:05<09:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1845/2613 [23:06<09:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1846/2613 [23:07<09:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1847/2613 [23:08<09:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1848/2613 [23:08<09:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1849/2613 [23:09<09:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1850/2613 [23:10<09:33,  1.33it/s]

	Current Loss: 2.0393
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1851/2613 [23:11<09:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1852/2613 [23:11<09:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1853/2613 [23:12<09:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1854/2613 [23:13<09:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1855/2613 [23:14<09:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1856/2613 [23:14<09:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1857/2613 [23:15<09:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1858/2613 [23:16<09:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1859/2613 [23:17<09:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1860/2613 [23:17<09:25,  1.33it/s]

	Current Loss: 2.0417
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████   | 1861/2613 [23:18<09:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1862/2613 [23:19<09:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1863/2613 [23:20<09:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1864/2613 [23:20<09:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1865/2613 [23:21<09:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1866/2613 [23:22<09:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1867/2613 [23:23<09:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 71%|███████▏  | 1868/2613 [23:23<09:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1869/2613 [23:24<09:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1870/2613 [23:25<09:18,  1.33it/s]

	Current Loss: 2.0405
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1871/2613 [23:26<09:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1872/2613 [23:26<09:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1873/2613 [23:27<09:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1874/2613 [23:28<09:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1875/2613 [23:29<09:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1876/2613 [23:29<09:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1877/2613 [23:30<09:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1878/2613 [23:31<09:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1879/2613 [23:32<09:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1880/2613 [23:32<09:10,  1.33it/s]

	Current Loss: 2.0370
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1881/2613 [23:33<09:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1882/2613 [23:34<09:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1883/2613 [23:35<09:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1884/2613 [23:35<09:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1885/2613 [23:36<09:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1886/2613 [23:37<09:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1887/2613 [23:38<09:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1888/2613 [23:38<09:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1889/2613 [23:39<09:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1890/2613 [23:40<09:03,  1.33it/s]

	Current Loss: 2.0357
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1891/2613 [23:41<09:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1892/2613 [23:41<09:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1893/2613 [23:42<09:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 72%|███████▏  | 1894/2613 [23:43<09:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1895/2613 [23:44<08:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1896/2613 [23:44<08:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1897/2613 [23:45<08:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1898/2613 [23:46<08:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1899/2613 [23:47<08:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1900/2613 [23:47<08:55,  1.33it/s]

	Current Loss: 2.0260
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1901/2613 [23:48<08:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1902/2613 [23:49<08:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1903/2613 [23:50<08:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1904/2613 [23:50<08:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1905/2613 [23:51<08:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1906/2613 [23:52<08:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1907/2613 [23:53<08:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1908/2613 [23:53<08:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1909/2613 [23:54<08:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1910/2613 [23:55<08:48,  1.33it/s]

	Current Loss: 2.0291
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1911/2613 [23:56<08:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1912/2613 [23:56<08:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1913/2613 [23:57<08:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1914/2613 [23:58<08:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1915/2613 [23:59<08:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1916/2613 [23:59<08:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1917/2613 [24:00<08:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1918/2613 [24:01<08:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1919/2613 [24:02<08:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 73%|███████▎  | 1920/2613 [24:02<08:40,  1.33it/s]

	Current Loss: 2.0296
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1921/2613 [24:03<08:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1922/2613 [24:04<08:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1923/2613 [24:05<08:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1924/2613 [24:05<08:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1925/2613 [24:06<08:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1926/2613 [24:07<08:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▎  | 1927/2613 [24:08<08:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1928/2613 [24:08<08:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1929/2613 [24:09<08:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1930/2613 [24:10<08:33,  1.33it/s]

	Current Loss: 2.0260
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1931/2613 [24:11<08:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1932/2613 [24:11<08:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1933/2613 [24:12<08:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1934/2613 [24:13<08:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1935/2613 [24:14<08:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1936/2613 [24:14<08:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1937/2613 [24:15<08:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1938/2613 [24:16<08:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1939/2613 [24:17<08:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1940/2613 [24:17<08:25,  1.33it/s]

	Current Loss: 2.0284
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1941/2613 [24:18<08:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1942/2613 [24:19<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1943/2613 [24:20<08:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1944/2613 [24:20<08:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1945/2613 [24:21<08:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1946/2613 [24:22<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1947/2613 [24:23<08:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1948/2613 [24:23<08:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1949/2613 [24:24<08:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1950/2613 [24:25<08:18,  1.33it/s]

	Current Loss: 2.0245
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1951/2613 [24:26<08:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1952/2613 [24:26<08:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1953/2613 [24:27<08:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1954/2613 [24:28<08:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1955/2613 [24:29<08:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1956/2613 [24:30<08:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1957/2613 [24:30<08:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1958/2613 [24:31<08:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1959/2613 [24:32<08:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1960/2613 [24:33<08:10,  1.33it/s]

	Current Loss: 2.0274
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1961/2613 [24:33<08:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1962/2613 [24:34<08:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1963/2613 [24:35<08:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1964/2613 [24:36<08:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1965/2613 [24:36<08:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1966/2613 [24:37<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1967/2613 [24:38<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1968/2613 [24:39<08:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1969/2613 [24:39<08:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1970/2613 [24:40<08:03,  1.33it/s]

	Current Loss: 2.0233
 76%|███████▌  | 1980/2613 [24:48<07:55,  1.33it/s]

	Current Loss: 2.0228
 76%|███████▌  | 1990/2613 [24:55<07:48,  1.33it/s]

	Current Loss: 2.0188
 77%|███████▋  | 2000/2613 [25:03<07:40,  1.33it/s]

	Current Loss: 2.0213
 77%|███████▋  | 2010/2613 [25:10<07:33,  1.33it/s]

	Current Loss: 2.0170
 77%|███████▋  | 2020/2613 [25:18<07:25,  1.33it/s]

	Current Loss: 2.0160
 78%|███████▊  | 2030/2613 [25:25<07:18,  1.33it/s]

	Current Loss: 2.0116
 78%|███████▊  | 2040/2613 [25:33<07:10,  1.33it/s]

	Current Loss: 2.0199
 78%|███████▊  | 2050/2613 [25:40<07:03,  1.33it/s]

	Current Loss: 2.0163
 79%|███████▉  | 2060/2613 [25:48<06:55,  1.33it/s]

	Current Loss: 2.0095
 79%|███████▉  | 2070/2613 [25:55<06:49,  1.33it/s]

	Current Loss: 2.0137
 80%|███████▉  | 2080/2613 [26:03<06:40,  1.33it/s]

	Current Loss: 2.0128
 80%|███████▉  | 2090/2613 [26:10<06:32,  1.33it/s]

	Current Loss: 2.0100
 80%|████████  | 2100/2613 [26:18<06:25,  1.33it/s]

	Current Loss: 2.0079
 81%|████████  | 2110/2613 [26:25<06:17,  1.33it/s]

	Current Loss: 2.0049
 81%|████████  | 2120/2613 [26:33<06:10,  1.33it/s]

	Current Loss: 1.9999
 82%|████████▏ | 2130/2613 [26:40<06:03,  1.33it/s]

	Current Loss: 2.0032
 82%|████████▏ | 2140/2613 [26:48<05:55,  1.33it/s]

	Current Loss: 1.9974
 82%|████████▏ | 2150/2613 [26:55<05:47,  1.33it/s]

	Current Loss: 1.9978
 83%|████████▎ | 2160/2613 [27:03<05:40,  1.33it/s]

	Current Loss: 2.0010
 83%|████████▎ | 2170/2613 [27:10<05:33,  1.33it/s]

	Current Loss: 2.0064
 83%|████████▎ | 2180/2613 [27:18<05:25,  1.33it/s]

	Current Loss: 2.0005
 84%|████████▍ | 2190/2613 [27:25<05:17,  1.33it/s]

	Current Loss: 1.9975
 84%|████████▍ | 2200/2613 [27:33<05:10,  1.33it/s]

	Current Loss: 1.9959
 85%|████████▍ | 2210/2613 [27:40<05:02,  1.33it/s]

	Current Loss: 1.9900
 85%|████████▍ | 2220/2613 [27:48<04:55,  1.33it/s]

	Current Loss: 1.9949
 85%|████████▌ | 2230/2613 [27:55<04:47,  1.33it/s]

	Current Loss: 1.9996
 86%|████████▌ | 2240/2613 [28:03<04:40,  1.33it/s]

	Current Loss: 1.9895
 86%|████████▌ | 2250/2613 [28:10<04:32,  1.33it/s]

	Current Loss: 1.9918
 86%|████████▋ | 2260/2613 [28:18<04:25,  1.33it/s]

	Current Loss: 1.9921
 87%|████████▋ | 2270/2613 [28:25<04:18,  1.33it/s]

	Current Loss: 1.9883
 87%|████████▋ | 2280/2613 [28:33<04:10,  1.33it/s]

	Current Loss: 1.9880
 88%|████████▊ | 2290/2613 [28:41<04:02,  1.33it/s]

	Current Loss: 1.9822
 88%|████████▊ | 2300/2613 [28:48<03:55,  1.33it/s]

	Current Loss: 1.9845
 88%|████████▊ | 2310/2613 [28:56<03:47,  1.33it/s]

	Current Loss: 1.9851
 89%|████████▉ | 2320/2613 [29:03<03:40,  1.33it/s]

	Current Loss: 1.9856
 89%|████████▉ | 2330/2613 [29:11<03:32,  1.33it/s]

	Current Loss: 1.9762


 90%|████████▉ | 2340/2613 [29:18<03:25,  1.33it/s]

	Current Loss: 1.9807


 90%|████████▉ | 2350/2613 [29:26<03:17,  1.33it/s]

	Current Loss: 1.9776


 90%|█████████ | 2360/2613 [29:33<03:10,  1.33it/s]

	Current Loss: 1.9775


 91%|█████████ | 2370/2613 [29:41<03:02,  1.33it/s]

	Current Loss: 1.9811


 91%|█████████ | 2380/2613 [29:48<02:55,  1.33it/s]

	Current Loss: 1.9746


 91%|█████████▏| 2390/2613 [29:56<02:47,  1.33it/s]

	Current Loss: 1.9705


 92%|█████████▏| 2400/2613 [30:03<02:40,  1.33it/s]

	Current Loss: 1.9751


 92%|█████████▏| 2410/2613 [30:11<02:32,  1.33it/s]

	Current Loss: 1.9707


 93%|█████████▎| 2420/2613 [30:18<02:25,  1.33it/s]

	Current Loss: 1.9729


 93%|█████████▎| 2430/2613 [30:26<02:17,  1.33it/s]

	Current Loss: 1.9702


 93%|█████████▎| 2440/2613 [30:33<02:10,  1.33it/s]

	Current Loss: 1.9665


 94%|█████████▍| 2450/2613 [30:41<02:02,  1.33it/s]

	Current Loss: 1.9688


 94%|█████████▍| 2460/2613 [30:48<01:55,  1.33it/s]

	Current Loss: 1.9703


 95%|█████████▍| 2470/2613 [30:56<01:47,  1.33it/s]

	Current Loss: 1.9638


 95%|█████████▍| 2480/2613 [31:03<01:40,  1.33it/s]

	Current Loss: 1.9588


 95%|█████████▌| 2490/2613 [31:11<01:32,  1.33it/s]

	Current Loss: 1.9746


 96%|█████████▌| 2500/2613 [31:18<01:24,  1.33it/s]

	Current Loss: 1.9644
 96%|█████████▌| 2510/2613 [31:26<01:17,  1.33it/s]

	Current Loss: 1.9654
 96%|█████████▋| 2520/2613 [31:33<01:09,  1.33it/s]

	Current Loss: 1.9588
 97%|█████████▋| 2530/2613 [31:41<01:02,  1.33it/s]

	Current Loss: 1.9656
 97%|█████████▋| 2540/2613 [31:48<00:54,  1.33it/s]

	Current Loss: 1.9584
 98%|█████████▊| 2550/2613 [31:56<00:47,  1.33it/s]

	Current Loss: 1.9655
 98%|█████████▊| 2560/2613 [32:03<00:39,  1.33it/s]

	Current Loss: 1.9566
 98%|█████████▊| 2570/2613 [32:11<00:32,  1.33it/s]

	Current Loss: 1.9539
 99%|█████████▊| 2580/2613 [32:19<00:24,  1.33it/s]

	Current Loss: 1.9520
 99%|█████████▉| 2590/2613 [32:26<00:17,  1.33it/s]

	Current Loss: 1.9556
100%|█████████▉| 2600/2613 [32:34<00:09,  1.33it/s]

	Current Loss: 1.9519
100%|█████████▉| 2610/2613 [32:41<00:02,  1.33it/s]

	Current Loss: 1.9586
100%|██████████| 2613/2613 [32:43<00:00,  1.33it/s]


Epoch 1, Train Loss: 2.1231, Time: 1963.91s


  0%|          | 0/2613 [00:00<?, ?it/s]

  0%|          | 10/2613 [00:07<32:25,  1.34it/s]

	Current Loss: 1.9520
  1%|          | 20/2613 [00:14<32:28,  1.33it/s]

	Current Loss: 1.9476
  1%|          | 30/2613 [00:22<32:22,  1.33it/s]

	Current Loss: 1.9435
  2%|▏         | 40/2613 [00:29<32:13,  1.33it/s]

	Current Loss: 1.9507
  2%|▏         | 50/2613 [00:37<32:06,  1.33it/s]

	Current Loss: 1.9484
  2%|▏         | 60/2613 [00:44<31:57,  1.33it/s]

	Current Loss: 1.9454
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 61/2613 [00:45<31:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 62/2613 [00:46<31:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 63/2613 [00:46<31:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 64/2613 [00:47<31:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 65/2613 [00:48<31:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 66/2613 [00:49<31:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 67/2613 [00:49<31:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 68/2613 [00:50<31:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 69/2613 [00:51<31:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 70/2613 [00:52<31:50,  1.33it/s]

	Current Loss: 1.9427
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 71/2613 [00:52<31:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 72/2613 [00:53<31:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 73/2613 [00:54<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 74/2613 [00:55<31:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 75/2613 [00:55<31:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 76/2613 [00:56<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 77/2613 [00:57<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 78/2613 [00:58<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 79/2613 [00:58<31:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 80/2613 [00:59<31:45,  1.33it/s]

	Current Loss: 1.9461
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 81/2613 [01:00<31:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 82/2613 [01:01<31:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 83/2613 [01:01<31:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 84/2613 [01:02<31:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 85/2613 [01:03<31:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 86/2613 [01:04<31:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 87/2613 [01:04<31:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 88/2613 [01:05<31:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 89/2613 [01:06<31:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 90/2613 [01:07<31:36,  1.33it/s]

	Current Loss: 1.9495
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 91/2613 [01:08<31:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 92/2613 [01:08<31:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 93/2613 [01:09<31:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 94/2613 [01:10<31:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 95/2613 [01:11<31:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 96/2613 [01:11<31:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 97/2613 [01:12<31:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 98/2613 [01:13<31:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 99/2613 [01:14<31:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 100/2613 [01:14<31:29,  1.33it/s]

	Current Loss: 1.9451
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 101/2613 [01:15<31:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 102/2613 [01:16<31:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 103/2613 [01:17<31:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 104/2613 [01:17<31:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 105/2613 [01:18<31:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 106/2613 [01:19<31:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 107/2613 [01:20<31:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 108/2613 [01:20<31:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 109/2613 [01:21<31:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 110/2613 [01:22<31:21,  1.33it/s]

	Current Loss: 1.9378
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 111/2613 [01:23<31:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 112/2613 [01:23<31:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 113/2613 [01:24<31:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 114/2613 [01:25<31:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 115/2613 [01:26<31:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 116/2613 [01:26<31:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 117/2613 [01:27<31:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 118/2613 [01:28<31:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 119/2613 [01:29<31:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 120/2613 [01:29<31:14,  1.33it/s]

	Current Loss: 1.9400
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 121/2613 [01:30<31:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 122/2613 [01:31<31:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 123/2613 [01:32<31:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 124/2613 [01:32<31:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 125/2613 [01:33<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 126/2613 [01:34<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 127/2613 [01:35<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 128/2613 [01:35<31:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 129/2613 [01:36<31:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 130/2613 [01:37<31:07,  1.33it/s]

	Current Loss: 1.9333
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 131/2613 [01:38<31:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 132/2613 [01:38<31:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 133/2613 [01:39<31:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 134/2613 [01:40<31:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 135/2613 [01:41<31:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 136/2613 [01:41<31:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 137/2613 [01:42<31:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 138/2613 [01:43<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 139/2613 [01:44<30:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 140/2613 [01:44<30:58,  1.33it/s]

	Current Loss: 1.9374
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 141/2613 [01:45<30:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 142/2613 [01:46<30:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 143/2613 [01:47<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 144/2613 [01:47<30:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 145/2613 [01:48<30:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 146/2613 [01:49<30:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 147/2613 [01:50<30:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 148/2613 [01:50<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 149/2613 [01:51<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 150/2613 [01:52<30:51,  1.33it/s]

	Current Loss: 1.9380
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 151/2613 [01:53<30:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 152/2613 [01:53<30:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 153/2613 [01:54<30:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 154/2613 [01:55<30:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 155/2613 [01:56<30:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 156/2613 [01:56<30:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 157/2613 [01:57<30:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 158/2613 [01:58<30:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 159/2613 [01:59<30:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 160/2613 [01:59<30:44,  1.33it/s]

	Current Loss: 1.9332
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 161/2613 [02:00<30:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 162/2613 [02:01<30:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 163/2613 [02:02<30:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 164/2613 [02:02<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 165/2613 [02:03<30:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 166/2613 [02:04<30:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 167/2613 [02:05<30:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 168/2613 [02:05<30:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 169/2613 [02:06<30:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 170/2613 [02:07<30:36,  1.33it/s]

	Current Loss: 1.9322
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 171/2613 [02:08<30:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 172/2613 [02:08<30:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 173/2613 [02:09<30:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 174/2613 [02:10<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 175/2613 [02:11<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 176/2613 [02:11<30:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 177/2613 [02:12<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 178/2613 [02:13<30:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 179/2613 [02:14<30:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 180/2613 [02:14<30:28,  1.33it/s]

	Current Loss: 1.9309
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 181/2613 [02:15<30:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 182/2613 [02:16<30:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 183/2613 [02:17<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 184/2613 [02:17<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 185/2613 [02:18<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 186/2613 [02:19<30:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 187/2613 [02:20<30:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 188/2613 [02:20<30:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 189/2613 [02:21<30:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 190/2613 [02:22<30:21,  1.33it/s]

	Current Loss: 1.9368
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 191/2613 [02:23<30:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 192/2613 [02:23<30:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 193/2613 [02:24<30:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 194/2613 [02:25<30:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 195/2613 [02:26<30:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 196/2613 [02:26<30:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 197/2613 [02:27<30:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 198/2613 [02:28<30:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 199/2613 [02:29<30:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 200/2613 [02:29<30:14,  1.33it/s]

	Current Loss: 1.9355
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 201/2613 [02:30<30:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 202/2613 [02:31<30:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 203/2613 [02:32<30:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 204/2613 [02:32<30:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 205/2613 [02:33<30:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 206/2613 [02:34<30:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 207/2613 [02:35<30:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 208/2613 [02:35<30:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 209/2613 [02:36<30:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 210/2613 [02:37<30:06,  1.33it/s]

	Current Loss: 1.9270
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 211/2613 [02:38<30:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 212/2613 [02:38<30:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 213/2613 [02:39<30:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 214/2613 [02:40<30:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 215/2613 [02:41<30:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 216/2613 [02:41<30:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 217/2613 [02:42<30:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 218/2613 [02:43<30:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 219/2613 [02:44<29:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 220/2613 [02:44<29:58,  1.33it/s]

	Current Loss: 1.9343
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 221/2613 [02:45<30:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 222/2613 [02:46<30:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 223/2613 [02:47<29:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 224/2613 [02:47<29:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 225/2613 [02:48<29:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 226/2613 [02:49<29:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 227/2613 [02:50<29:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▊         | 228/2613 [02:51<29:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 229/2613 [02:51<29:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 230/2613 [02:52<29:50,  1.33it/s]

	Current Loss: 1.9291
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 231/2613 [02:53<29:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 232/2613 [02:54<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 233/2613 [02:54<29:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 234/2613 [02:55<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 235/2613 [02:56<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 236/2613 [02:57<29:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 237/2613 [02:57<29:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 238/2613 [02:58<29:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 239/2613 [02:59<29:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 240/2613 [03:00<29:43,  1.33it/s]

	Current Loss: 1.9299
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 241/2613 [03:00<29:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 242/2613 [03:01<29:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 243/2613 [03:02<29:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 244/2613 [03:03<29:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 245/2613 [03:03<29:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 246/2613 [03:04<29:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 247/2613 [03:05<29:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  9%|▉         | 248/2613 [03:06<29:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 249/2613 [03:06<29:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 250/2613 [03:07<29:36,  1.33it/s]

	Current Loss: 1.9187
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 251/2613 [03:08<29:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 252/2613 [03:09<29:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 253/2613 [03:09<29:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 254/2613 [03:10<29:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 255/2613 [03:11<29:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 256/2613 [03:12<29:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 257/2613 [03:12<29:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 258/2613 [03:13<29:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 259/2613 [03:14<29:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 260/2613 [03:15<29:28,  1.33it/s]

	Current Loss: 1.9305
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|▉         | 261/2613 [03:15<29:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 262/2613 [03:16<29:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 263/2613 [03:17<29:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 264/2613 [03:18<29:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 265/2613 [03:18<29:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 266/2613 [03:19<29:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 267/2613 [03:20<29:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 268/2613 [03:21<29:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 269/2613 [03:21<29:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 270/2613 [03:22<29:21,  1.33it/s]

	Current Loss: 1.9269
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 271/2613 [03:23<29:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 272/2613 [03:24<29:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 273/2613 [03:24<29:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 10%|█         | 274/2613 [03:25<29:49,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 275/2613 [03:26<29:39,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 276/2613 [03:27<29:32,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 277/2613 [03:27<29:26,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 278/2613 [03:28<29:22,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 279/2613 [03:29<29:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 280/2613 [03:30<29:18,  1.33it/s]

	Current Loss: 1.9229
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 281/2613 [03:30<29:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 282/2613 [03:31<29:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 283/2613 [03:32<29:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 284/2613 [03:33<29:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 285/2613 [03:33<29:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 286/2613 [03:34<29:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 287/2613 [03:35<29:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 288/2613 [03:36<29:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 289/2613 [03:36<29:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 290/2613 [03:37<29:06,  1.33it/s]

	Current Loss: 1.9277
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 291/2613 [03:38<29:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 292/2613 [03:39<29:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█         | 293/2613 [03:39<29:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 294/2613 [03:40<29:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 295/2613 [03:41<29:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 296/2613 [03:42<29:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 297/2613 [03:42<29:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 298/2613 [03:43<29:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 299/2613 [03:44<28:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 11%|█▏        | 300/2613 [03:45<28:57,  1.33it/s]

	Current Loss: 1.9231
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 301/2613 [03:45<28:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 302/2613 [03:46<28:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 303/2613 [03:47<28:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 304/2613 [03:48<28:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 305/2613 [03:48<28:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 306/2613 [03:49<28:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 307/2613 [03:50<28:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 308/2613 [03:51<28:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 309/2613 [03:51<28:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 310/2613 [03:52<28:51,  1.33it/s]

	Current Loss: 1.9165
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 311/2613 [03:53<28:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 312/2613 [03:54<28:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 313/2613 [03:54<28:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 314/2613 [03:55<28:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 315/2613 [03:56<28:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 316/2613 [03:57<28:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 317/2613 [03:57<28:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 318/2613 [03:58<28:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 319/2613 [03:59<28:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 320/2613 [04:00<28:44,  1.33it/s]

	Current Loss: 1.9165
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 321/2613 [04:00<28:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 322/2613 [04:01<28:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 323/2613 [04:02<28:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 324/2613 [04:03<28:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 325/2613 [04:03<28:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 12%|█▏        | 326/2613 [04:04<28:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 327/2613 [04:05<28:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 328/2613 [04:06<28:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 329/2613 [04:06<28:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 330/2613 [04:07<28:36,  1.33it/s]

	Current Loss: 1.9207
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 331/2613 [04:08<28:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 332/2613 [04:09<28:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 333/2613 [04:09<28:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 334/2613 [04:10<28:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 335/2613 [04:11<28:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 336/2613 [04:12<28:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 337/2613 [04:12<28:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 338/2613 [04:13<28:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 339/2613 [04:14<28:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 340/2613 [04:15<28:29,  1.33it/s]

	Current Loss: 1.9188
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 341/2613 [04:15<28:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 342/2613 [04:16<28:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 343/2613 [04:17<28:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 344/2613 [04:18<28:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 345/2613 [04:19<28:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 346/2613 [04:19<28:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 347/2613 [04:20<28:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 348/2613 [04:21<28:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 349/2613 [04:22<28:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 350/2613 [04:22<28:22,  1.33it/s]

	Current Loss: 1.9169
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 351/2613 [04:23<28:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 13%|█▎        | 352/2613 [04:24<28:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 353/2613 [04:25<28:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 354/2613 [04:25<28:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 355/2613 [04:26<28:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 356/2613 [04:27<28:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 357/2613 [04:28<28:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 358/2613 [04:28<28:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▎        | 359/2613 [04:29<28:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 360/2613 [04:30<28:14,  1.33it/s]

	Current Loss: 1.9175
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 361/2613 [04:31<28:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 362/2613 [04:31<28:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 363/2613 [04:32<28:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 364/2613 [04:33<28:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 365/2613 [04:34<28:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 366/2613 [04:34<28:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 367/2613 [04:35<28:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 368/2613 [04:36<28:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 369/2613 [04:37<28:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 370/2613 [04:37<28:05,  1.33it/s]

	Current Loss: 1.9235
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 371/2613 [04:38<28:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 372/2613 [04:39<28:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 373/2613 [04:40<28:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 374/2613 [04:40<28:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 375/2613 [04:41<28:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 376/2613 [04:42<28:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 377/2613 [04:43<28:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 14%|█▍        | 378/2613 [04:43<27:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 379/2613 [04:44<27:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 380/2613 [04:45<27:58,  1.33it/s]

	Current Loss: 1.9130
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 381/2613 [04:46<27:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 382/2613 [04:46<27:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 383/2613 [04:47<27:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 384/2613 [04:48<27:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 385/2613 [04:49<27:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 386/2613 [04:49<27:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 387/2613 [04:50<27:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 388/2613 [04:51<27:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 389/2613 [04:52<27:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 390/2613 [04:52<27:52,  1.33it/s]

	Current Loss: 1.9199
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 391/2613 [04:53<27:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 392/2613 [04:54<27:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 393/2613 [04:55<27:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 394/2613 [04:55<27:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 395/2613 [04:56<27:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 396/2613 [04:57<27:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 397/2613 [04:58<27:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 398/2613 [04:58<27:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 399/2613 [04:59<27:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 400/2613 [05:00<27:43,  1.33it/s]

	Current Loss: 1.9149
 16%|█▌        | 410/2613 [05:07<27:35,  1.33it/s]

	Current Loss: 1.9152
 16%|█▌        | 420/2613 [05:15<27:29,  1.33it/s]

	Current Loss: 1.9113
 16%|█▋        | 430/2613 [05:22<27:21,  1.33it/s]

	Current Loss: 1.9118
 17%|█▋        | 440/2613 [05:30<27:14,  1.33it/s]

	Current Loss: 1.9131
 17%|█▋        | 450/2613 [05:37<27:06,  1.33it/s]

	Current Loss: 1.9128
 18%|█▊        | 460/2613 [05:45<26:57,  1.33it/s]

	Current Loss: 1.9052
 18%|█▊        | 470/2613 [05:52<26:51,  1.33it/s]

	Current Loss: 1.9043
 18%|█▊        | 480/2613 [06:00<26:43,  1.33it/s]

	Current Loss: 1.9135
 19%|█▉        | 490/2613 [06:08<26:35,  1.33it/s]

	Current Loss: 1.9092
 19%|█▉        | 500/2613 [06:15<26:28,  1.33it/s]

	Current Loss: 1.9027
 20%|█▉        | 510/2613 [06:23<26:21,  1.33it/s]

	Current Loss: 1.9058
 20%|█▉        | 520/2613 [06:30<26:14,  1.33it/s]

	Current Loss: 1.9106
 20%|██        | 530/2613 [06:38<26:07,  1.33it/s]

	Current Loss: 1.9067
 21%|██        | 540/2613 [06:45<25:58,  1.33it/s]

	Current Loss: 1.9002
 21%|██        | 550/2613 [06:53<25:51,  1.33it/s]

	Current Loss: 1.9011
 21%|██▏       | 560/2613 [07:00<25:46,  1.33it/s]

	Current Loss: 1.8943
 22%|██▏       | 570/2613 [07:08<25:36,  1.33it/s]

	Current Loss: 1.9030
 22%|██▏       | 580/2613 [07:15<25:28,  1.33it/s]

	Current Loss: 1.8976

 23%|██▎       | 590/2613 [07:23<25:22,  1.33it/s]

	Current Loss: 1.8951

 23%|██▎       | 600/2613 [07:30<25:15,  1.33it/s]

	Current Loss: 1.9042

 23%|██▎       | 610/2613 [07:38<25:06,  1.33it/s]

	Current Loss: 1.8970

 24%|██▎       | 620/2613 [07:45<24:59,  1.33it/s]

	Current Loss: 1.8962

 24%|██▍       | 630/2613 [07:53<24:50,  1.33it/s]

	Current Loss: 1.8942

 24%|██▍       | 640/2613 [08:00<24:43,  1.33it/s]

	Current Loss: 1.8943

 25%|██▍       | 650/2613 [08:08<24:35,  1.33it/s]

	Current Loss: 1.8861

 25%|██▌       | 660/2613 [08:15<24:28,  1.33it/s]

	Current Loss: 1.8910

 26%|██▌       | 670/2613 [08:23<24:19,  1.33it/s]

	Current Loss: 1.8903

 26%|██▌       | 680/2613 [08:30<24:13,  1.33it/s]

	Current Loss: 1.8859

 26%|██▋       | 690/2613 [08:38<24:06,  1.33it/s]

	Current Loss: 1.8919

 27%|██▋       | 700/2613 [08:45<23:58,  1.33it/s]

	Current Loss: 1.8908

 27%|██▋       | 710/2613 [08:53<23:50,  1.33it/s]

	Current Loss: 1.8877

 28%|██▊       | 720/2613 [09:00<23:43,  1.33it/s]

	Current Loss: 1.8853

 28%|██▊       | 730/2613 [09:08<23:35,  1.33it/s]

	Current Loss: 1.8864

 28%|██▊       | 740/2613 [09:16<23:27,  1.33it/s]

	Current Loss: 1.8831

 29%|██▊       | 750/2613 [09:23<23:21,  1.33it/s]

	Current Loss: 1.8922

 29%|██▉       | 760/2613 [09:31<23:13,  1.33it/s]

	Current Loss: 1.8793
 29%|██▉       | 770/2613 [09:38<23:05,  1.33it/s]

	Current Loss: 1.8839
 30%|██▉       | 780/2613 [09:46<22:58,  1.33it/s]

	Current Loss: 1.8814
 30%|███       | 790/2613 [09:53<22:50,  1.33it/s]

	Current Loss: 1.8827
 31%|███       | 800/2613 [10:01<22:43,  1.33it/s]

	Current Loss: 1.8786
 31%|███       | 810/2613 [10:08<22:35,  1.33it/s]

	Current Loss: 1.8783
 31%|███▏      | 820/2613 [10:16<22:28,  1.33it/s]

	Current Loss: 1.8772
 32%|███▏      | 830/2613 [10:23<22:21,  1.33it/s]

	Current Loss: 1.8774
 32%|███▏      | 840/2613 [10:31<22:13,  1.33it/s]

	Current Loss: 1.8810
 33%|███▎      | 850/2613 [10:38<22:06,  1.33it/s]

	Current Loss: 1.8780
 33%|███▎      | 860/2613 [10:46<21:58,  1.33it/s]

	Current Loss: 1.8735
 33%|███▎      | 870/2613 [10:53<21:50,  1.33it/s]

	Current Loss: 1.8715
 34%|███▎      | 880/2613 [11:01<21:43,  1.33it/s]

	Current Loss: 1.8692
 34%|███▍      | 890/2613 [11:08<21:35,  1.33it/s]

	Current Loss: 1.8715
 34%|███▍      | 900/2613 [11:16<21:28,  1.33it/s]

	Current Loss: 1.8748
 35%|███▍      | 910/2613 [11:23<21:20,  1.33it/s]

	Current Loss: 1.8710
 35%|███▌      | 920/2613 [11:31<21:13,  1.33it/s]

	Current Loss: 1.8668
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 35%|███▌      | 921/2613 [11:32<21:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 35%|███▌      | 922/2613 [11:32<21:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 35%|███▌      | 923/2613 [11:33<21:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 35%|███▌      | 924/2613 [11:34<21:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 35%|███▌      | 925/2613 [11:35<21:10,  1.33it/s]

 35%|███▌      | 926/2613 [11:35<21:08,  1.33it/s]

 35%|███▌      | 927/2613 [11:36<21:07,  1.33it/s]

 36%|███▌      | 928/2613 [11:37<21:08,  1.33it/s]

 36%|███▌      | 929/2613 [11:38<21:07,  1.33it/s]

 36%|███▌      | 930/2613 [11:38<21:05,  1.33it/s]

	Current Loss: 1.8664
 36%|███▌      | 931/2613 [11:39<21:04,  1.33it/s]

 36%|███▌      | 932/2613 [11:40<21:03,  1.33it/s]

 36%|███▌      | 933/2613 [11:41<21:03,  1.33it/s]

 36%|███▌      | 934/2613 [11:41<21:02,  1.33it/s]

 36%|███▌      | 935/2613 [11:42<21:01,  1.33it/s]

 36%|███▌      | 936/2613 [11:43<21:00,  1.33it/s]

 36%|███▌      | 937/2613 [11:44<20:59,  1.33it/s]

 36%|███▌      | 938/2613 [11:44<20:59,  1.33it/s]

 36%|███▌      | 939/2613 [11:45<20:58,  1.33it/s]

 36%|███▌      | 940/2613 [11:46<20:56,  1.33it/s]

	Current Loss: 1.8652
 36%|███▌      | 941/2613 [11:47<20:58,  1.33it/s]

 36%|███▌      | 942/2613 [11:47<20:56,  1.33it/s]

 36%|███▌      | 943/2613 [11:48<20:55,  1.33it/s]

 36%|███▌      | 944/2613 [11:49<20:54,  1.33it/s]

 36%|███▌      | 945/2613 [11:50<20:53,  1.33it/s]

 36%|███▌      | 946/2613 [11:50<20:52,  1.33it/s]

 36%|███▌      | 947/2613 [11:51<20:52,  1.33it/s]

 36%|███▋      | 948/2613 [11:52<20:51,  1.33it/s]

 36%|███▋      | 949/2613 [11:53<20:50,  1.33it/s]

 36%|███▋      | 950/2613 [11:53<20:50,  1.33it/s]

	Current Loss: 1.8659
 36%|███▋      | 951/2613 [11:54<20:49,  1.33it/s]

 36%|███▋      | 952/2613 [11:55<20:48,  1.33it/s]

 36%|███▋      | 953/2613 [11:56<20:48,  1.33it/s]

 37%|███▋      | 954/2613 [11:56<20:47,  1.33it/s]

 37%|███▋      | 955/2613 [11:57<20:46,  1.33it/s]

 37%|███▋      | 956/2613 [11:58<20:45,  1.33it/s]

 37%|███▋      | 957/2613 [11:59<20:45,  1.33it/s]

 37%|███▋      | 958/2613 [11:59<20:44,  1.33it/s]

 37%|███▋      | 959/2613 [12:00<20:43,  1.33it/s]

 37%|███▋      | 960/2613 [12:01<20:43,  1.33it/s]

	Current Loss: 1.8636
 37%|███▋      | 961/2613 [12:02<20:42,  1.33it/s]

 37%|███▋      | 962/2613 [12:02<20:40,  1.33it/s]

 37%|███▋      | 963/2613 [12:03<20:40,  1.33it/s]

 37%|███▋      | 964/2613 [12:04<20:39,  1.33it/s]

 37%|███▋      | 965/2613 [12:05<20:38,  1.33it/s]

 37%|███▋      | 966/2613 [12:05<20:39,  1.33it/s]

 37%|███▋      | 967/2613 [12:06<20:37,  1.33it/s]

 37%|███▋      | 968/2613 [12:07<20:36,  1.33it/s]

 37%|███▋      | 969/2613 [12:08<20:36,  1.33it/s]

 37%|███▋      | 970/2613 [12:08<20:35,  1.33it/s]

	Current Loss: 1.8683
 37%|███▋      | 971/2613 [12:09<20:34,  1.33it/s]

 37%|███▋      | 972/2613 [12:10<20:33,  1.33it/s]

 37%|███▋      | 973/2613 [12:11<20:32,  1.33it/s]

 37%|███▋      | 974/2613 [12:11<20:31,  1.33it/s]

 37%|███▋      | 975/2613 [12:12<20:31,  1.33it/s]

 37%|███▋      | 976/2613 [12:13<20:30,  1.33it/s]

 37%|███▋      | 977/2613 [12:14<20:29,  1.33it/s]

 37%|███▋      | 978/2613 [12:14<20:29,  1.33it/s]

 37%|███▋      | 979/2613 [12:15<20:29,  1.33it/s]

 38%|███▊      | 980/2613 [12:16<20:27,  1.33it/s]

	Current Loss: 1.8689
 38%|███▊      | 981/2613 [12:17<20:26,  1.33it/s]

 38%|███▊      | 982/2613 [12:17<20:25,  1.33it/s]

 38%|███▊      | 983/2613 [12:18<20:24,  1.33it/s]

 38%|███▊      | 984/2613 [12:19<20:24,  1.33it/s]

 38%|███▊      | 985/2613 [12:20<20:24,  1.33it/s]

 38%|███▊      | 986/2613 [12:20<20:22,  1.33it/s]

 38%|███▊      | 987/2613 [12:21<20:23,  1.33it/s]

 38%|███▊      | 988/2613 [12:22<20:21,  1.33it/s]

 38%|███▊      | 989/2613 [12:23<20:20,  1.33it/s]

 38%|███▊      | 990/2613 [12:23<20:20,  1.33it/s]

	Current Loss: 1.8677
 38%|███▊      | 991/2613 [12:24<20:19,  1.33it/s]

 38%|███▊      | 992/2613 [12:25<20:18,  1.33it/s]

 38%|███▊      | 993/2613 [12:26<20:18,  1.33it/s]

 38%|███▊      | 994/2613 [12:27<20:16,  1.33it/s]

 38%|███▊      | 995/2613 [12:27<20:16,  1.33it/s]

 38%|███▊      | 996/2613 [12:28<20:15,  1.33it/s]

 38%|███▊      | 997/2613 [12:29<20:14,  1.33it/s]

 38%|███▊      | 998/2613 [12:30<20:13,  1.33it/s]

 38%|███▊      | 999/2613 [12:30<20:13,  1.33it/s]

 38%|███▊      | 1000/2613 [12:31<20:12,  1.33it/s]

	Current Loss: 1.8651
 38%|███▊      | 1001/2613 [12:32<20:11,  1.33it/s]

 38%|███▊      | 1002/2613 [12:33<20:11,  1.33it/s]

 38%|███▊      | 1003/2613 [12:33<20:11,  1.33it/s]

 38%|███▊      | 1004/2613 [12:34<20:10,  1.33it/s]

 38%|███▊      | 1005/2613 [12:35<20:09,  1.33it/s]

 38%|███▊      | 1006/2613 [12:36<20:08,  1.33it/s]

 39%|███▊      | 1007/2613 [12:36<20:07,  1.33it/s]

 39%|███▊      | 1008/2613 [12:37<20:06,  1.33it/s]

 39%|███▊      | 1009/2613 [12:38<20:31,  1.30it/s]

 39%|███▊      | 1010/2613 [12:39<20:08,  1.33it/s]

	Current Loss: 1.8628
 39%|███▊      | 1011/2613 [12:39<20:06,  1.33it/s]

 39%|███▊      | 1012/2613 [12:40<20:05,  1.33it/s]

 39%|███▉      | 1013/2613 [12:41<20:03,  1.33it/s]

 39%|███▉      | 1014/2613 [12:42<20:03,  1.33it/s]

 39%|███▉      | 1015/2613 [12:42<20:02,  1.33it/s]

 39%|███▉      | 1016/2613 [12:43<20:01,  1.33it/s]

 39%|███▉      | 1017/2613 [12:44<20:00,  1.33it/s]

 39%|███▉      | 1018/2613 [12:45<19:59,  1.33it/s]

 39%|███▉      | 1019/2613 [12:45<19:58,  1.33it/s]

 39%|███▉      | 1020/2613 [12:46<19:56,  1.33it/s]

	Current Loss: 1.8608
 39%|███▉      | 1021/2613 [12:47<19:56,  1.33it/s]

 39%|███▉      | 1022/2613 [12:48<19:55,  1.33it/s]

 39%|███▉      | 1023/2613 [12:48<19:55,  1.33it/s]

 39%|███▉      | 1024/2613 [12:49<19:54,  1.33it/s]

 39%|███▉      | 1025/2613 [12:50<19:54,  1.33it/s]

 39%|███▉      | 1026/2613 [12:51<19:52,  1.33it/s]

 39%|███▉      | 1027/2613 [12:51<19:52,  1.33it/s]

 39%|███▉      | 1028/2613 [12:52<19:50,  1.33it/s]

 39%|███▉      | 1029/2613 [12:53<19:49,  1.33it/s]

 39%|███▉      | 1030/2613 [12:54<19:49,  1.33it/s]

	Current Loss: 1.8581
 39%|███▉      | 1031/2613 [12:54<19:49,  1.33it/s]

 39%|███▉      | 1032/2613 [12:55<19:48,  1.33it/s]

 40%|███▉      | 1033/2613 [12:56<19:47,  1.33it/s]

 40%|███▉      | 1034/2613 [12:57<19:46,  1.33it/s]

 40%|███▉      | 1035/2613 [12:57<19:45,  1.33it/s]

 40%|███▉      | 1036/2613 [12:58<19:45,  1.33it/s]

 40%|███▉      | 1037/2613 [12:59<19:46,  1.33it/s]

 40%|███▉      | 1038/2613 [13:00<19:45,  1.33it/s]

 40%|███▉      | 1039/2613 [13:00<19:44,  1.33it/s]

 40%|███▉      | 1040/2613 [13:01<19:43,  1.33it/s]

	Current Loss: 1.8652
 40%|███▉      | 1041/2613 [13:02<19:42,  1.33it/s]

 40%|███▉      | 1042/2613 [13:03<19:41,  1.33it/s]

 40%|███▉      | 1043/2613 [13:03<19:41,  1.33it/s]

 40%|███▉      | 1044/2613 [13:04<19:39,  1.33it/s]

 40%|███▉      | 1045/2613 [13:05<19:38,  1.33it/s]

 40%|████      | 1046/2613 [13:06<19:39,  1.33it/s]

 40%|████      | 1047/2613 [13:06<19:37,  1.33it/s]

 40%|████      | 1048/2613 [13:07<19:36,  1.33it/s]

 40%|████      | 1049/2613 [13:08<19:36,  1.33it/s]

 40%|████      | 1050/2613 [13:09<19:36,  1.33it/s]

	Current Loss: 1.8621
 40%|████      | 1051/2613 [13:09<19:34,  1.33it/s]

 40%|████      | 1052/2613 [13:10<19:33,  1.33it/s]

 40%|████      | 1053/2613 [13:11<19:32,  1.33it/s]

 40%|████      | 1054/2613 [13:12<19:32,  1.33it/s]

 40%|████      | 1055/2613 [13:12<19:31,  1.33it/s]

 40%|████      | 1056/2613 [13:13<19:29,  1.33it/s]

 40%|████      | 1057/2613 [13:14<19:29,  1.33it/s]

 40%|████      | 1058/2613 [13:15<19:28,  1.33it/s]

 41%|████      | 1059/2613 [13:15<19:27,  1.33it/s]

 41%|████      | 1060/2613 [13:16<19:27,  1.33it/s]

	Current Loss: 1.8568
 41%|████      | 1061/2613 [13:17<19:26,  1.33it/s]

 41%|████      | 1062/2613 [13:18<19:25,  1.33it/s]

 41%|████      | 1063/2613 [13:18<19:24,  1.33it/s]

 41%|████      | 1064/2613 [13:19<19:24,  1.33it/s]

 41%|████      | 1065/2613 [13:20<19:23,  1.33it/s]

 41%|████      | 1066/2613 [13:21<19:23,  1.33it/s]

 41%|████      | 1067/2613 [13:21<19:22,  1.33it/s]

 41%|████      | 1068/2613 [13:22<19:22,  1.33it/s]

 41%|████      | 1069/2613 [13:23<19:21,  1.33it/s]

 41%|████      | 1070/2613 [13:24<19:20,  1.33it/s]

	Current Loss: 1.8625
 41%|████      | 1071/2613 [13:24<19:19,  1.33it/s]

 41%|████      | 1072/2613 [13:25<19:18,  1.33it/s]

 41%|████      | 1073/2613 [13:26<19:18,  1.33it/s]

 41%|████      | 1074/2613 [13:27<19:17,  1.33it/s]

 41%|████      | 1075/2613 [13:27<19:16,  1.33it/s]

 41%|████      | 1076/2613 [13:28<19:15,  1.33it/s]

 41%|████      | 1077/2613 [13:29<19:14,  1.33it/s]

 41%|████▏     | 1078/2613 [13:30<19:13,  1.33it/s]

 41%|████▏     | 1079/2613 [13:30<19:14,  1.33it/s]

 41%|████▏     | 1080/2613 [13:31<19:12,  1.33it/s]

	Current Loss: 1.8581
 41%|████▏     | 1081/2613 [13:32<19:11,  1.33it/s]

 41%|████▏     | 1082/2613 [13:33<19:11,  1.33it/s]

 41%|████▏     | 1083/2613 [13:33<19:10,  1.33it/s]

 41%|████▏     | 1084/2613 [13:34<19:09,  1.33it/s]

 42%|████▏     | 1085/2613 [13:35<19:08,  1.33it/s]

 42%|████▏     | 1086/2613 [13:36<19:08,  1.33it/s]

 42%|████▏     | 1087/2613 [13:36<19:06,  1.33it/s]

 42%|████▏     | 1088/2613 [13:37<19:05,  1.33it/s]

 42%|████▏     | 1089/2613 [13:38<19:05,  1.33it/s]

 42%|████▏     | 1090/2613 [13:39<19:04,  1.33it/s]

	Current Loss: 1.8559
 42%|████▏     | 1091/2613 [13:39<19:13,  1.32it/s]

 42%|████▏     | 1092/2613 [13:40<19:09,  1.32it/s]

 42%|████▏     | 1093/2613 [13:41<19:07,  1.32it/s]

 42%|████▏     | 1094/2613 [13:42<19:04,  1.33it/s]

 42%|████▏     | 1095/2613 [13:42<19:03,  1.33it/s]

 42%|████▏     | 1096/2613 [13:43<19:01,  1.33it/s]

 42%|████▏     | 1097/2613 [13:44<19:01,  1.33it/s]

 42%|████▏     | 1098/2613 [13:45<18:59,  1.33it/s]

 42%|████▏     | 1099/2613 [13:45<18:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 42%|████▏     | 1100/2613 [13:46<18:58,  1.33it/s]

	Current Loss: 1.8547
 42%|████▏     | 1110/2613 [13:54<18:49,  1.33it/s]

	Current Loss: 1.8556
 43%|████▎     | 1120/2613 [14:01<18:42,  1.33it/s]

	Current Loss: 1.8554
 43%|████▎     | 1130/2613 [14:09<18:34,  1.33it/s]

	Current Loss: 1.8490
 44%|████▎     | 1140/2613 [14:16<18:27,  1.33it/s]

	Current Loss: 1.8549
 44%|████▍     | 1150/2613 [14:24<18:20,  1.33it/s]

	Current Loss: 1.8481
 44%|████▍     | 1160/2613 [14:31<18:13,  1.33it/s]

	Current Loss: 1.8522
 45%|████▍     | 1170/2613 [14:39<18:04,  1.33it/s]

	Current Loss: 1.8506
 45%|████▌     | 1180/2613 [14:46<17:57,  1.33it/s]

	Current Loss: 1.8505
 46%|████▌     | 1190/2613 [14:54<17:49,  1.33it/s]

	Current Loss: 1.8510
 46%|████▌     | 1200/2613 [15:01<17:42,  1.33it/s]

	Current Loss: 1.8446
 46%|████▋     | 1210/2613 [15:09<17:34,  1.33it/s]

	Current Loss: 1.8465
 47%|████▋     | 1220/2613 [15:16<17:27,  1.33it/s]

	Current Loss: 1.8489
 47%|████▋     | 1230/2613 [15:24<17:19,  1.33it/s]

	Current Loss: 1.8505
 47%|████▋     | 1240/2613 [15:31<17:12,  1.33it/s]

	Current Loss: 1.8457
 48%|████▊     | 1250/2613 [15:39<17:04,  1.33it/s]

	Current Loss: 1.8441
 48%|████▊     | 1260/2613 [15:47<16:56,  1.33it/s]

	Current Loss: 1.8420
 49%|████▊     | 1270/2613 [15:54<16:50,  1.33it/s]

	Current Loss: 1.8392
 49%|████▉     | 1280/2613 [16:02<16:42,  1.33it/s]

	Current Loss: 1.8478


 49%|████▉     | 1290/2613 [16:09<16:34,  1.33it/s]

	Current Loss: 1.8490


 50%|████▉     | 1300/2613 [16:17<16:27,  1.33it/s]

	Current Loss: 1.8381


 50%|█████     | 1310/2613 [16:24<16:19,  1.33it/s]

	Current Loss: 1.8424


 51%|█████     | 1320/2613 [16:32<16:12,  1.33it/s]

	Current Loss: 1.8387


 51%|█████     | 1330/2613 [16:39<16:04,  1.33it/s]

	Current Loss: 1.8445


 51%|█████▏    | 1340/2613 [16:47<15:56,  1.33it/s]

	Current Loss: 1.8415


 52%|█████▏    | 1350/2613 [16:54<15:49,  1.33it/s]

	Current Loss: 1.8312


 52%|█████▏    | 1360/2613 [17:02<15:41,  1.33it/s]

	Current Loss: 1.8434


 52%|█████▏    | 1370/2613 [17:09<15:35,  1.33it/s]

	Current Loss: 1.8407


 53%|█████▎    | 1380/2613 [17:17<15:26,  1.33it/s]

	Current Loss: 1.8341


 53%|█████▎    | 1390/2613 [17:24<15:18,  1.33it/s]

	Current Loss: 1.8319


 54%|█████▎    | 1400/2613 [17:32<15:12,  1.33it/s]

	Current Loss: 1.8305


 54%|█████▍    | 1410/2613 [17:39<15:04,  1.33it/s]

	Current Loss: 1.8275


 54%|█████▍    | 1420/2613 [17:47<14:57,  1.33it/s]

	Current Loss: 1.8341


 55%|█████▍    | 1430/2613 [17:54<14:48,  1.33it/s]

	Current Loss: 1.8270


 55%|█████▌    | 1440/2613 [18:02<14:41,  1.33it/s]

	Current Loss: 1.8332


 55%|█████▌    | 1450/2613 [18:09<14:34,  1.33it/s]

	Current Loss: 1.8305

 56%|█████▌    | 1460/2613 [18:17<14:27,  1.33it/s]

	Current Loss: 1.8359

 56%|█████▋    | 1470/2613 [18:24<14:19,  1.33it/s]

	Current Loss: 1.8276

 57%|█████▋    | 1480/2613 [18:32<14:11,  1.33it/s]

	Current Loss: 1.8293

 57%|█████▋    | 1490/2613 [18:39<14:04,  1.33it/s]

	Current Loss: 1.8231

 57%|█████▋    | 1500/2613 [18:47<13:57,  1.33it/s]

	Current Loss: 1.8183

 58%|█████▊    | 1510/2613 [18:55<13:49,  1.33it/s]

	Current Loss: 1.8351

 58%|█████▊    | 1520/2613 [19:02<13:41,  1.33it/s]

	Current Loss: 1.8264

 59%|█████▊    | 1530/2613 [19:10<13:34,  1.33it/s]

	Current Loss: 1.8230

 59%|█████▉    | 1540/2613 [19:17<13:27,  1.33it/s]

	Current Loss: 1.8217

 59%|█████▉    | 1550/2613 [19:25<13:19,  1.33it/s]

	Current Loss: 1.8213

 60%|█████▉    | 1560/2613 [19:32<13:11,  1.33it/s]

	Current Loss: 1.8285

 60%|██████    | 1570/2613 [19:40<13:04,  1.33it/s]

	Current Loss: 1.8202

 60%|██████    | 1580/2613 [19:47<12:57,  1.33it/s]

	Current Loss: 1.8223

 61%|██████    | 1590/2613 [19:55<12:49,  1.33it/s]

	Current Loss: 1.8244

 61%|██████    | 1600/2613 [20:02<12:41,  1.33it/s]

	Current Loss: 1.8193

 62%|██████▏   | 1610/2613 [20:10<12:34,  1.33it/s]

	Current Loss: 1.8189

 62%|██████▏   | 1620/2613 [20:17<12:26,  1.33it/s]

	Current Loss: 1.8183
 62%|██████▏   | 1630/2613 [20:25<12:18,  1.33it/s]

	Current Loss: 1.8102
 63%|██████▎   | 1640/2613 [20:32<12:11,  1.33it/s]

	Current Loss: 1.8193
 63%|██████▎   | 1650/2613 [20:40<12:04,  1.33it/s]

	Current Loss: 1.8196
 64%|██████▎   | 1660/2613 [20:47<11:56,  1.33it/s]

	Current Loss: 1.8178
 64%|██████▍   | 1670/2613 [20:55<11:49,  1.33it/s]

	Current Loss: 1.8233
 64%|██████▍   | 1680/2613 [21:02<11:41,  1.33it/s]

	Current Loss: 1.8156
 65%|██████▍   | 1690/2613 [21:10<11:33,  1.33it/s]

	Current Loss: 1.8127
 65%|██████▌   | 1700/2613 [21:17<11:25,  1.33it/s]

	Current Loss: 1.8134
 65%|██████▌   | 1710/2613 [21:25<11:18,  1.33it/s]

	Current Loss: 1.8161
 66%|██████▌   | 1720/2613 [21:32<11:11,  1.33it/s]

	Current Loss: 1.8130
 66%|██████▌   | 1730/2613 [21:40<11:03,  1.33it/s]

	Current Loss: 1.8122
 67%|██████▋   | 1740/2613 [21:47<10:56,  1.33it/s]

	Current Loss: 1.8185
 67%|██████▋   | 1750/2613 [21:55<10:49,  1.33it/s]

	Current Loss: 1.8101
 67%|██████▋   | 1760/2613 [22:03<10:41,  1.33it/s]

	Current Loss: 1.8129
 68%|██████▊   | 1770/2613 [22:10<10:33,  1.33it/s]

	Current Loss: 1.8125
 68%|██████▊   | 1780/2613 [22:18<10:25,  1.33it/s]

	Current Loss: 1.8098
 69%|██████▊   | 1790/2613 [22:25<10:18,  1.33it/s]

	Current Loss: 1.8079
 69%|██████▉   | 1800/2613 [22:33<10:10,  1.33it/s]

	Current Loss: 1.8016
 69%|██████▉   | 1810/2613 [22:40<10:03,  1.33it/s]

	Current Loss: 1.8058
 70%|██████▉   | 1820/2613 [22:48<09:55,  1.33it/s]

	Current Loss: 1.8040
 70%|███████   | 1830/2613 [22:55<09:48,  1.33it/s]

	Current Loss: 1.8036
 70%|███████   | 1840/2613 [23:03<09:41,  1.33it/s]

	Current Loss: 1.8091
 71%|███████   | 1850/2613 [23:10<09:33,  1.33it/s]

	Current Loss: 1.8062
 71%|███████   | 1860/2613 [23:18<09:25,  1.33it/s]

	Current Loss: 1.8081
 72%|███████▏  | 1870/2613 [23:25<09:18,  1.33it/s]

	Current Loss: 1.8050
 72%|███████▏  | 1880/2613 [23:33<09:10,  1.33it/s]

	Current Loss: 1.8023
 72%|███████▏  | 1890/2613 [23:40<09:02,  1.33it/s]

	Current Loss: 1.8037
 73%|███████▎  | 1900/2613 [23:48<08:55,  1.33it/s]

	Current Loss: 1.7991
 73%|███████▎  | 1910/2613 [23:55<08:48,  1.33it/s]

	Current Loss: 1.7956
 73%|███████▎  | 1920/2613 [24:03<08:40,  1.33it/s]

	Current Loss: 1.8031
 74%|███████▍  | 1930/2613 [24:10<08:33,  1.33it/s]

	Current Loss: 1.8031
 74%|███████▍  | 1940/2613 [24:18<08:25,  1.33it/s]

	Current Loss: 1.7967
 75%|███████▍  | 1950/2613 [24:25<08:18,  1.33it/s]

	Current Loss: 1.8009
 75%|███████▌  | 1960/2613 [24:33<08:10,  1.33it/s]

	Current Loss: 1.7975
 75%|███████▌  | 1970/2613 [24:40<08:03,  1.33it/s]

	Current Loss: 1.7969


 76%|███████▌  | 1980/2613 [24:48<07:55,  1.33it/s]

	Current Loss: 1.8017


 76%|███████▌  | 1990/2613 [24:55<07:56,  1.31it/s]

	Current Loss: 1.7931


 77%|███████▋  | 2000/2613 [25:03<07:41,  1.33it/s]

	Current Loss: 1.7930


 77%|███████▋  | 2010/2613 [25:10<07:33,  1.33it/s]

	Current Loss: 1.8025


 77%|███████▋  | 2020/2613 [25:18<07:25,  1.33it/s]

	Current Loss: 1.7917


 78%|███████▊  | 2030/2613 [25:25<07:18,  1.33it/s]

	Current Loss: 1.7883


 78%|███████▊  | 2040/2613 [25:33<07:10,  1.33it/s]

	Current Loss: 1.7884


 78%|███████▊  | 2050/2613 [25:40<07:02,  1.33it/s]

	Current Loss: 1.7904


 79%|███████▉  | 2060/2613 [25:48<06:55,  1.33it/s]

	Current Loss: 1.7941


 79%|███████▉  | 2070/2613 [25:55<06:48,  1.33it/s]

	Current Loss: 1.7902


 80%|███████▉  | 2080/2613 [26:03<06:40,  1.33it/s]

	Current Loss: 1.7877


 80%|███████▉  | 2090/2613 [26:11<06:33,  1.33it/s]

	Current Loss: 1.7899


 80%|████████  | 2100/2613 [26:18<06:25,  1.33it/s]

	Current Loss: 1.7888


 81%|████████  | 2110/2613 [26:26<06:17,  1.33it/s]

	Current Loss: 1.7851


 81%|████████  | 2120/2613 [26:33<06:10,  1.33it/s]

	Current Loss: 1.7833


 82%|████████▏ | 2130/2613 [26:41<06:02,  1.33it/s]

	Current Loss: 1.7920


 82%|████████▏ | 2140/2613 [26:48<05:55,  1.33it/s]

	Current Loss: 1.7836

 82%|████████▏ | 2150/2613 [26:56<05:48,  1.33it/s]

	Current Loss: 1.7830

 83%|████████▎ | 2160/2613 [27:03<05:40,  1.33it/s]

	Current Loss: 1.7876

 83%|████████▎ | 2170/2613 [27:11<05:32,  1.33it/s]

	Current Loss: 1.7834

 83%|████████▎ | 2180/2613 [27:18<05:25,  1.33it/s]

	Current Loss: 1.7856

 84%|████████▍ | 2190/2613 [27:26<05:17,  1.33it/s]

	Current Loss: 1.7870

 84%|████████▍ | 2200/2613 [27:33<05:10,  1.33it/s]

	Current Loss: 1.7865

 85%|████████▍ | 2210/2613 [27:41<05:02,  1.33it/s]

	Current Loss: 1.7875

 85%|████████▍ | 2220/2613 [27:48<04:55,  1.33it/s]

	Current Loss: 1.7964

 85%|████████▌ | 2230/2613 [27:56<04:47,  1.33it/s]

	Current Loss: 1.7847

 86%|████████▌ | 2240/2613 [28:03<04:40,  1.33it/s]

	Current Loss: 1.7806

 86%|████████▌ | 2250/2613 [28:11<04:32,  1.33it/s]

	Current Loss: 1.7773

 86%|████████▋ | 2260/2613 [28:18<04:25,  1.33it/s]

	Current Loss: 1.7822

 87%|████████▋ | 2270/2613 [28:26<04:17,  1.33it/s]

	Current Loss: 1.7811

 87%|████████▋ | 2280/2613 [28:33<04:10,  1.33it/s]

	Current Loss: 1.7816

 88%|████████▊ | 2290/2613 [28:41<04:02,  1.33it/s]

	Current Loss: 1.7740

 88%|████████▊ | 2300/2613 [28:48<03:55,  1.33it/s]

	Current Loss: 1.7755

 88%|████████▊ | 2310/2613 [28:56<03:47,  1.33it/s]

	Current Loss: 1.7765
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 88%|████████▊ | 2311/2613 [28:57<03:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 88%|████████▊ | 2312/2613 [28:57<03:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2313/2613 [28:58<03:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2314/2613 [28:59<03:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2315/2613 [29:00<03:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2316/2613 [29:00<03:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2317/2613 [29:01<03:46,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2318/2613 [29:02<03:44,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▊ | 2319/2613 [29:03<03:42,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2320/2613 [29:03<03:41,  1.32it/s]

	Current Loss: 1.7701
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2321/2613 [29:04<03:40,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2322/2613 [29:05<03:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2323/2613 [29:06<03:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2324/2613 [29:06<03:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2325/2613 [29:07<03:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2326/2613 [29:08<03:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2327/2613 [29:09<03:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2328/2613 [29:09<03:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2329/2613 [29:10<03:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2330/2613 [29:11<03:32,  1.33it/s]

	Current Loss: 1.7778
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2331/2613 [29:12<03:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2332/2613 [29:12<03:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2333/2613 [29:13<03:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2334/2613 [29:14<03:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2335/2613 [29:15<03:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2336/2613 [29:15<03:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2337/2613 [29:16<03:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 89%|████████▉ | 2338/2613 [29:17<03:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2339/2613 [29:18<03:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2340/2613 [29:18<03:25,  1.33it/s]

	Current Loss: 1.7787
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2341/2613 [29:19<03:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2342/2613 [29:20<03:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2343/2613 [29:21<03:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2344/2613 [29:21<03:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2345/2613 [29:22<03:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2346/2613 [29:23<03:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2347/2613 [29:24<03:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2348/2613 [29:24<03:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2349/2613 [29:25<03:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2350/2613 [29:26<03:17,  1.33it/s]

	Current Loss: 1.7814
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|████████▉ | 2351/2613 [29:27<03:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2352/2613 [29:27<03:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2353/2613 [29:28<03:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2354/2613 [29:29<03:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2355/2613 [29:30<03:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2356/2613 [29:30<03:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2357/2613 [29:31<03:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2358/2613 [29:32<03:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2359/2613 [29:33<03:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2360/2613 [29:33<03:10,  1.33it/s]

	Current Loss: 1.7756
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2361/2613 [29:34<03:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2362/2613 [29:35<03:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2363/2613 [29:36<03:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 90%|█████████ | 2364/2613 [29:36<03:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2365/2613 [29:37<03:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2366/2613 [29:38<03:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2367/2613 [29:39<03:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2368/2613 [29:39<03:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2369/2613 [29:40<03:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2370/2613 [29:41<03:02,  1.33it/s]

	Current Loss: 1.7754
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2371/2613 [29:42<03:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2372/2613 [29:42<03:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2373/2613 [29:43<03:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2374/2613 [29:44<02:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2375/2613 [29:45<02:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2376/2613 [29:45<02:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2377/2613 [29:46<02:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2378/2613 [29:47<02:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2379/2613 [29:48<02:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2380/2613 [29:48<02:55,  1.33it/s]

	Current Loss: 1.7734
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2381/2613 [29:49<02:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2382/2613 [29:50<02:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2383/2613 [29:51<02:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████ | 2384/2613 [29:51<02:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2385/2613 [29:52<02:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2386/2613 [29:53<02:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2387/2613 [29:54<02:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2388/2613 [29:54<02:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2389/2613 [29:55<02:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 91%|█████████▏| 2390/2613 [29:56<02:47,  1.33it/s]

	Current Loss: 1.7671
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2391/2613 [29:57<02:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2392/2613 [29:57<02:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2393/2613 [29:58<02:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2394/2613 [29:59<02:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2395/2613 [30:00<02:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2396/2613 [30:00<02:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2397/2613 [30:01<02:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2398/2613 [30:02<02:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2399/2613 [30:03<02:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2400/2613 [30:03<02:40,  1.33it/s]

	Current Loss: 1.7725
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2401/2613 [30:04<02:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2402/2613 [30:05<02:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2403/2613 [30:06<02:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2404/2613 [30:06<02:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2405/2613 [30:07<02:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2406/2613 [30:08<02:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2407/2613 [30:09<02:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2408/2613 [30:09<02:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2409/2613 [30:10<02:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2410/2613 [30:11<02:32,  1.33it/s]

	Current Loss: 1.7751
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2411/2613 [30:12<02:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2412/2613 [30:13<02:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2413/2613 [30:13<02:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2414/2613 [30:14<02:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2415/2613 [30:15<02:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2416/2613 [30:16<02:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 92%|█████████▏| 2417/2613 [30:16<02:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2418/2613 [30:17<02:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2419/2613 [30:18<02:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2420/2613 [30:19<02:25,  1.33it/s]

	Current Loss: 1.7724
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2421/2613 [30:19<02:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2422/2613 [30:20<02:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2423/2613 [30:21<02:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2424/2613 [30:22<02:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2425/2613 [30:22<02:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2426/2613 [30:23<02:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2427/2613 [30:24<02:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2428/2613 [30:25<02:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2429/2613 [30:25<02:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2430/2613 [30:26<02:17,  1.33it/s]

	Current Loss: 1.7712
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2431/2613 [30:27<02:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2432/2613 [30:28<02:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2433/2613 [30:28<02:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2434/2613 [30:29<02:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2435/2613 [30:30<02:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2436/2613 [30:31<02:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2437/2613 [30:31<02:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2438/2613 [30:32<02:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2439/2613 [30:33<02:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2440/2613 [30:34<02:09,  1.33it/s]

	Current Loss: 1.7668
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2441/2613 [30:34<02:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2442/2613 [30:35<02:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 93%|█████████▎| 2443/2613 [30:36<02:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2444/2613 [30:37<02:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2445/2613 [30:37<02:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2446/2613 [30:38<02:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2447/2613 [30:39<02:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2448/2613 [30:40<02:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▎| 2449/2613 [30:40<02:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2450/2613 [30:41<02:02,  1.33it/s]

	Current Loss: 1.7595
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2451/2613 [30:42<02:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2452/2613 [30:43<02:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2453/2613 [30:43<02:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2454/2613 [30:44<01:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2455/2613 [30:45<01:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2456/2613 [30:46<01:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2457/2613 [30:46<01:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2458/2613 [30:47<01:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2459/2613 [30:48<01:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2460/2613 [30:49<01:54,  1.33it/s]

	Current Loss: 1.7606
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2461/2613 [30:49<01:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2462/2613 [30:50<01:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2463/2613 [30:51<01:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2464/2613 [30:52<01:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2465/2613 [30:52<01:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2466/2613 [30:53<01:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2467/2613 [30:54<01:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2468/2613 [30:55<01:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2469/2613 [30:55<01:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2470/2613 [30:56<01:47,  1.33it/s]

	Current Loss: 1.7698
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2471/2613 [30:57<01:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2472/2613 [30:58<01:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2473/2613 [30:58<01:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2474/2613 [30:59<01:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2475/2613 [31:00<01:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2476/2613 [31:01<01:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2477/2613 [31:01<01:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2478/2613 [31:02<01:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2479/2613 [31:03<01:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2480/2613 [31:04<01:39,  1.33it/s]

	Current Loss: 1.7665

 95%|█████████▌| 2490/2613 [31:11<01:32,  1.33it/s]

	Current Loss: 1.7705

 96%|█████████▌| 2500/2613 [31:19<01:24,  1.33it/s]

	Current Loss: 1.7623

 96%|█████████▌| 2510/2613 [31:26<01:17,  1.33it/s]

	Current Loss: 1.7669

 96%|█████████▋| 2520/2613 [31:34<01:09,  1.33it/s]

	Current Loss: 1.7690

 97%|█████████▋| 2530/2613 [31:41<01:02,  1.33it/s]

	Current Loss: 1.7639

 97%|█████████▋| 2540/2613 [31:49<00:54,  1.33it/s]

	Current Loss: 1.7665

 98%|█████████▊| 2550/2613 [31:56<00:47,  1.33it/s]

	Current Loss: 1.7636

 98%|█████████▊| 2560/2613 [32:04<00:39,  1.33it/s]

	Current Loss: 1.7638

 98%|█████████▊| 2570/2613 [32:11<00:32,  1.33it/s]

	Current Loss: 1.7628

 99%|█████████▊| 2580/2613 [32:19<00:24,  1.33it/s]

	Current Loss: 1.7616

 99%|█████████▉| 2590/2613 [32:26<00:17,  1.33it/s]

	Current Loss: 1.7578

100%|█████████▉| 2600/2613 [32:34<00:09,  1.33it/s]

	Current Loss: 1.7578

100%|█████████▉| 2610/2613 [32:41<00:02,  1.33it/s]

	Current Loss: 1.7657

100%|██████████| 2613/2613 [32:44<00:00,  1.33it/s]


Epoch 2, Train Loss: 1.8457, Time: 1964.09s


  0%|          | 0/2613 [00:00<?, ?it/s]


  0%|          | 10/2613 [00:07<32:21,  1.34it/s]

	Current Loss: 1.7559

  1%|          | 20/2613 [00:14<32:27,  1.33it/s]

	Current Loss: 1.7616

  1%|          | 30/2613 [00:22<32:20,  1.33it/s]

	Current Loss: 1.7657

  2%|▏         | 40/2613 [00:29<32:13,  1.33it/s]

	Current Loss: 1.7626

  2%|▏         | 50/2613 [00:37<32:06,  1.33it/s]

	Current Loss: 1.7563

  2%|▏         | 60/2613 [00:44<31:58,  1.33it/s]

	Current Loss: 1.7546

  3%|▎         | 70/2613 [00:52<31:50,  1.33it/s]

	Current Loss: 1.7593

  3%|▎         | 80/2613 [00:59<31:42,  1.33it/s]

	Current Loss: 1.7569

  3%|▎         | 90/2613 [01:07<31:36,  1.33it/s]

	Current Loss: 1.7558

  4%|▍         | 100/2613 [01:14<31:28,  1.33it/s]

	Current Loss: 1.7511

  4%|▍         | 110/2613 [01:22<31:20,  1.33it/s]

	Current Loss: 1.7587

  5%|▍         | 120/2613 [01:29<31:14,  1.33it/s]

	Current Loss: 1.7465

  5%|▍         | 130/2613 [01:37<31:05,  1.33it/s]

	Current Loss: 1.7559

  5%|▌         | 140/2613 [01:44<30:58,  1.33it/s]

	Current Loss: 1.7588

  6%|▌         | 150/2613 [01:52<30:50,  1.33it/s]

	Current Loss: 1.7514

  6%|▌         | 160/2613 [01:59<30:43,  1.33it/s]

	Current Loss: 1.7502

  7%|▋         | 170/2613 [02:07<30:34,  1.33it/s]

	Current Loss: 1.7493

  7%|▋         | 180/2613 [02:14<30:28,  1.33it/s]

	Current Loss: 1.7620

  7%|▋         | 190/2613 [02:22<30:23,  1.33it/s]

	Current Loss: 1.7545

  8%|▊         | 200/2613 [02:29<30:13,  1.33it/s]

	Current Loss: 1.7555

  8%|▊         | 210/2613 [02:37<30:06,  1.33it/s]

	Current Loss: 1.7469

  8%|▊         | 220/2613 [02:44<29:57,  1.33it/s]

	Current Loss: 1.7475
  9%|▉         | 230/2613 [02:52<29:49,  1.33it/s]

	Current Loss: 1.7450
  9%|▉         | 240/2613 [02:59<29:43,  1.33it/s]

	Current Loss: 1.7462
 10%|▉         | 250/2613 [03:07<29:37,  1.33it/s]

	Current Loss: 1.7483
 10%|▉         | 260/2613 [03:14<29:28,  1.33it/s]

	Current Loss: 1.7486
 10%|█         | 270/2613 [03:22<29:21,  1.33it/s]

	Current Loss: 1.7533
 11%|█         | 280/2613 [03:30<29:13,  1.33it/s]

	Current Loss: 1.7511
 11%|█         | 290/2613 [03:37<29:06,  1.33it/s]

	Current Loss: 1.7454
 11%|█▏        | 300/2613 [03:45<28:57,  1.33it/s]

	Current Loss: 1.7445
 12%|█▏        | 310/2613 [03:52<28:50,  1.33it/s]

	Current Loss: 1.7466
 12%|█▏        | 320/2613 [04:00<28:42,  1.33it/s]

	Current Loss: 1.7477
 13%|█▎        | 330/2613 [04:07<28:34,  1.33it/s]

	Current Loss: 1.7448
 13%|█▎        | 340/2613 [04:15<28:27,  1.33it/s]

	Current Loss: 1.7456
 13%|█▎        | 350/2613 [04:22<28:19,  1.33it/s]

	Current Loss: 1.7436
 14%|█▍        | 360/2613 [04:30<28:13,  1.33it/s]

	Current Loss: 1.7380
 14%|█▍        | 370/2613 [04:37<28:04,  1.33it/s]

	Current Loss: 1.7433
 15%|█▍        | 380/2613 [04:45<27:58,  1.33it/s]

	Current Loss: 1.7434
 15%|█▍        | 390/2613 [04:52<27:50,  1.33it/s]

	Current Loss: 1.7410
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▍        | 391/2613 [04:53<27:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 392/2613 [04:54<27:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 393/2613 [04:54<27:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 394/2613 [04:55<27:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 395/2613 [04:56<27:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 396/2613 [04:57<27:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 397/2613 [04:57<27:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 398/2613 [04:58<27:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 399/2613 [04:59<27:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 400/2613 [05:00<27:42,  1.33it/s]

	Current Loss: 1.7393
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 401/2613 [05:00<27:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 402/2613 [05:01<27:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 403/2613 [05:02<27:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 404/2613 [05:03<27:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 15%|█▌        | 405/2613 [05:03<27:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 406/2613 [05:04<27:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 407/2613 [05:05<27:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 408/2613 [05:06<27:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 409/2613 [05:06<27:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 410/2613 [05:07<27:34,  1.33it/s]

	Current Loss: 1.7431
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 411/2613 [05:08<27:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 412/2613 [05:09<27:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 413/2613 [05:09<27:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 414/2613 [05:10<27:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 415/2613 [05:11<27:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 416/2613 [05:12<27:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 417/2613 [05:12<27:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 418/2613 [05:13<27:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 419/2613 [05:14<27:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 420/2613 [05:15<27:27,  1.33it/s]

	Current Loss: 1.7376
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 421/2613 [05:15<27:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 422/2613 [05:16<27:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 423/2613 [05:17<27:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▌        | 424/2613 [05:18<27:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 425/2613 [05:18<27:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 426/2613 [05:19<27:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 427/2613 [05:20<27:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 428/2613 [05:21<27:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 429/2613 [05:21<27:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 430/2613 [05:22<27:20,  1.33it/s]

	Current Loss: 1.7418
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 16%|█▋        | 431/2613 [05:23<27:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 432/2613 [05:24<27:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 433/2613 [05:24<27:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 434/2613 [05:25<27:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 435/2613 [05:26<27:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 436/2613 [05:27<27:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 437/2613 [05:27<27:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 438/2613 [05:28<27:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 439/2613 [05:29<27:38,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 440/2613 [05:30<27:25,  1.32it/s]

	Current Loss: 1.7407
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 441/2613 [05:31<27:21,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 442/2613 [05:31<27:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 443/2613 [05:32<27:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 444/2613 [05:33<27:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 445/2613 [05:34<27:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 446/2613 [05:34<27:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 447/2613 [05:35<27:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 448/2613 [05:36<27:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 449/2613 [05:37<27:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 450/2613 [05:37<27:05,  1.33it/s]

	Current Loss: 1.7386
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 451/2613 [05:38<27:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 452/2613 [05:39<27:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 453/2613 [05:40<27:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 454/2613 [05:40<27:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 455/2613 [05:41<27:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 456/2613 [05:42<27:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 17%|█▋        | 457/2613 [05:43<26:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 458/2613 [05:43<27:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 459/2613 [05:44<26:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 460/2613 [05:45<26:57,  1.33it/s]

	Current Loss: 1.7385
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 461/2613 [05:46<26:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 462/2613 [05:46<26:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 463/2613 [05:47<26:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 464/2613 [05:48<26:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 465/2613 [05:49<26:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 466/2613 [05:49<26:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 467/2613 [05:50<26:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 468/2613 [05:51<26:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 469/2613 [05:52<26:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 470/2613 [05:52<26:50,  1.33it/s]

	Current Loss: 1.7369
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 471/2613 [05:53<26:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 472/2613 [05:54<26:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 473/2613 [05:55<26:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 474/2613 [05:55<26:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 475/2613 [05:56<26:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 476/2613 [05:57<26:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 477/2613 [05:58<26:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 478/2613 [05:58<26:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 479/2613 [05:59<26:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 18%|█▊        | 480/2613 [06:00<26:43,  1.33it/s]

	Current Loss: 1.7370

 19%|█▉        | 490/2613 [06:07<26:36,  1.33it/s]

	Current Loss: 1.7357

 19%|█▉        | 500/2613 [06:15<26:27,  1.33it/s]

	Current Loss: 1.7385

 20%|█▉        | 510/2613 [06:22<26:20,  1.33it/s]

	Current Loss: 1.7344

 20%|█▉        | 520/2613 [06:30<26:14,  1.33it/s]

	Current Loss: 1.7358

 20%|██        | 530/2613 [06:37<26:05,  1.33it/s]

	Current Loss: 1.7332

 21%|██        | 540/2613 [06:45<25:57,  1.33it/s]

	Current Loss: 1.7411

 21%|██        | 550/2613 [06:52<25:51,  1.33it/s]

	Current Loss: 1.7289

 21%|██▏       | 560/2613 [07:00<25:42,  1.33it/s]

	Current Loss: 1.7318

 22%|██▏       | 570/2613 [07:07<25:34,  1.33it/s]

	Current Loss: 1.7398

 22%|██▏       | 580/2613 [07:15<25:27,  1.33it/s]

	Current Loss: 1.7355

 23%|██▎       | 590/2613 [07:22<25:19,  1.33it/s]

	Current Loss: 1.7349

 23%|██▎       | 600/2613 [07:30<25:12,  1.33it/s]

	Current Loss: 1.7326

 23%|██▎       | 610/2613 [07:38<25:05,  1.33it/s]

	Current Loss: 1.7357

 24%|██▎       | 620/2613 [07:45<24:57,  1.33it/s]

	Current Loss: 1.7307

 24%|██▍       | 630/2613 [07:53<24:50,  1.33it/s]

	Current Loss: 1.7326

 24%|██▍       | 640/2613 [08:00<24:43,  1.33it/s]

	Current Loss: 1.7232

 25%|██▍       | 650/2613 [08:08<24:34,  1.33it/s]

	Current Loss: 1.7276

 25%|██▌       | 660/2613 [08:15<24:28,  1.33it/s]

	Current Loss: 1.7328

 26%|██▌       | 670/2613 [08:23<24:19,  1.33it/s]

	Current Loss: 1.7258

 26%|██▌       | 680/2613 [08:30<24:12,  1.33it/s]

	Current Loss: 1.7285

 26%|██▋       | 690/2613 [08:38<24:05,  1.33it/s]

	Current Loss: 1.7305

 27%|██▋       | 700/2613 [08:45<23:57,  1.33it/s]

	Current Loss: 1.7214

 27%|██▋       | 710/2613 [08:53<23:50,  1.33it/s]

	Current Loss: 1.7219

 28%|██▊       | 720/2613 [09:00<23:42,  1.33it/s]

	Current Loss: 1.7294

 28%|██▊       | 730/2613 [09:08<23:35,  1.33it/s]

	Current Loss: 1.7194

 28%|██▊       | 740/2613 [09:15<23:26,  1.33it/s]

	Current Loss: 1.7207
 29%|██▊       | 750/2613 [09:23<23:20,  1.33it/s]

	Current Loss: 1.7228
 29%|██▉       | 760/2613 [09:30<23:12,  1.33it/s]

	Current Loss: 1.7195
 29%|██▉       | 770/2613 [09:38<23:08,  1.33it/s]

	Current Loss: 1.7154
 30%|██▉       | 780/2613 [09:45<22:58,  1.33it/s]

	Current Loss: 1.7220
 30%|███       | 790/2613 [09:53<22:51,  1.33it/s]

	Current Loss: 1.7269
 31%|███       | 800/2613 [10:00<22:42,  1.33it/s]

	Current Loss: 1.7214
 31%|███       | 810/2613 [10:08<22:35,  1.33it/s]

	Current Loss: 1.7260
 31%|███▏      | 820/2613 [10:15<22:27,  1.33it/s]

	Current Loss: 1.7198
 32%|███▏      | 830/2613 [10:23<22:20,  1.33it/s]

	Current Loss: 1.7222
 32%|███▏      | 840/2613 [10:30<22:14,  1.33it/s]

	Current Loss: 1.7229
 33%|███▎      | 850/2613 [10:38<22:16,  1.32it/s]

	Current Loss: 1.7194
 33%|███▎      | 860/2613 [10:45<21:58,  1.33it/s]

	Current Loss: 1.7183
 33%|███▎      | 870/2613 [10:53<21:49,  1.33it/s]

	Current Loss: 1.7209
 34%|███▎      | 880/2613 [11:01<21:42,  1.33it/s]

	Current Loss: 1.7150
 34%|███▍      | 890/2613 [11:08<21:35,  1.33it/s]

	Current Loss: 1.7200
 34%|███▍      | 900/2613 [11:16<21:27,  1.33it/s]

	Current Loss: 1.7144
 35%|███▍      | 910/2613 [11:23<21:19,  1.33it/s]

	Current Loss: 1.7144
 35%|███▌      | 920/2613 [11:31<21:14,  1.33it/s]

	Current Loss: 1.7147
 36%|███▌      | 930/2613 [11:38<21:06,  1.33it/s]

	Current Loss: 1.7198
 36%|███▌      | 940/2613 [11:46<20:58,  1.33it/s]

	Current Loss: 1.7193
 36%|███▋      | 950/2613 [11:53<20:51,  1.33it/s]

	Current Loss: 1.7139
 37%|███▋      | 960/2613 [12:01<20:43,  1.33it/s]

	Current Loss: 1.7181
 37%|███▋      | 970/2613 [12:08<20:34,  1.33it/s]

	Current Loss: 1.7104
 38%|███▊      | 980/2613 [12:16<20:27,  1.33it/s]

	Current Loss: 1.7221
 38%|███▊      | 990/2613 [12:23<20:20,  1.33it/s]

	Current Loss: 1.7089
 38%|███▊      | 1000/2613 [12:31<20:12,  1.33it/s]

	Current Loss: 1.7161
 39%|███▊      | 1010/2613 [12:38<20:03,  1.33it/s]

	Current Loss: 1.7192
 39%|███▉      | 1020/2613 [12:46<19:58,  1.33it/s]

	Current Loss: 1.7169
 39%|███▉      | 1030/2613 [12:53<19:49,  1.33it/s]

	Current Loss: 1.7078
 40%|███▉      | 1040/2613 [13:01<19:43,  1.33it/s]

	Current Loss: 1.7091
 40%|████      | 1050/2613 [13:08<19:36,  1.33it/s]

	Current Loss: 1.7130
 41%|████      | 1060/2613 [13:16<19:27,  1.33it/s]

	Current Loss: 1.7108
 41%|████      | 1070/2613 [13:23<19:19,  1.33it/s]

	Current Loss: 1.7070
 41%|████▏     | 1080/2613 [13:31<19:12,  1.33it/s]

	Current Loss: 1.7137
 42%|████▏     | 1090/2613 [13:38<19:06,  1.33it/s]

	Current Loss: 1.7018

 42%|████▏     | 1100/2613 [13:46<18:57,  1.33it/s]

	Current Loss: 1.7140

 42%|████▏     | 1110/2613 [13:53<18:49,  1.33it/s]

	Current Loss: 1.7040

 43%|████▎     | 1120/2613 [14:01<18:42,  1.33it/s]

	Current Loss: 1.7029

 43%|████▎     | 1130/2613 [14:08<18:35,  1.33it/s]

	Current Loss: 1.7008

 44%|████▎     | 1140/2613 [14:16<18:26,  1.33it/s]

	Current Loss: 1.7086

 44%|████▍     | 1150/2613 [14:24<18:19,  1.33it/s]

	Current Loss: 1.7080

 44%|████▍     | 1160/2613 [14:31<18:12,  1.33it/s]

	Current Loss: 1.7013

 45%|████▍     | 1170/2613 [14:39<18:05,  1.33it/s]

	Current Loss: 1.7037

 45%|████▌     | 1180/2613 [14:46<17:57,  1.33it/s]

	Current Loss: 1.7095

 46%|████▌     | 1190/2613 [14:54<17:49,  1.33it/s]

	Current Loss: 1.7098

 46%|████▌     | 1200/2613 [15:01<17:41,  1.33it/s]

	Current Loss: 1.7061

 46%|████▋     | 1210/2613 [15:09<17:35,  1.33it/s]

	Current Loss: 1.7041

 47%|████▋     | 1220/2613 [15:16<17:27,  1.33it/s]

	Current Loss: 1.7057

 47%|████▋     | 1230/2613 [15:24<17:20,  1.33it/s]

	Current Loss: 1.7051

 47%|████▋     | 1240/2613 [15:31<17:12,  1.33it/s]

	Current Loss: 1.7013

 48%|████▊     | 1250/2613 [15:39<17:04,  1.33it/s]

	Current Loss: 1.7063

 48%|████▊     | 1260/2613 [15:46<16:57,  1.33it/s]

	Current Loss: 1.7063

 49%|████▊     | 1270/2613 [15:54<16:49,  1.33it/s]

	Current Loss: 1.6990

 49%|████▉     | 1280/2613 [16:01<16:42,  1.33it/s]

	Current Loss: 1.7020

 49%|████▉     | 1290/2613 [16:09<16:34,  1.33it/s]

	Current Loss: 1.7011

 50%|████▉     | 1300/2613 [16:16<16:29,  1.33it/s]

	Current Loss: 1.6996

 50%|█████     | 1310/2613 [16:24<16:20,  1.33it/s]

	Current Loss: 1.6980

 51%|█████     | 1320/2613 [16:31<16:11,  1.33it/s]

	Current Loss: 1.7035

 51%|█████     | 1330/2613 [16:39<16:04,  1.33it/s]

	Current Loss: 1.7002

 51%|█████▏    | 1340/2613 [16:46<15:57,  1.33it/s]

	Current Loss: 1.6965

 52%|█████▏    | 1350/2613 [16:54<15:49,  1.33it/s]

	Current Loss: 1.7048

 52%|█████▏    | 1360/2613 [17:01<15:42,  1.33it/s]

	Current Loss: 1.6942

 52%|█████▏    | 1370/2613 [17:09<15:34,  1.33it/s]

	Current Loss: 1.7039

 53%|█████▎    | 1380/2613 [17:16<15:26,  1.33it/s]

	Current Loss: 1.6997

 53%|█████▎    | 1390/2613 [17:24<15:20,  1.33it/s]

	Current Loss: 1.6951

 54%|█████▎    | 1400/2613 [17:31<15:11,  1.33it/s]

	Current Loss: 1.7018

 54%|█████▍    | 1410/2613 [17:39<15:04,  1.33it/s]

	Current Loss: 1.6978

 54%|█████▍    | 1420/2613 [17:47<15:10,  1.31it/s]

	Current Loss: 1.6931

 55%|█████▍    | 1430/2613 [17:54<14:49,  1.33it/s]

	Current Loss: 1.7000

 55%|█████▌    | 1440/2613 [18:02<14:43,  1.33it/s]

	Current Loss: 1.7031

 55%|█████▌    | 1450/2613 [18:09<14:34,  1.33it/s]

	Current Loss: 1.6894

 56%|█████▌    | 1460/2613 [18:17<14:27,  1.33it/s]

	Current Loss: 1.6947

 56%|█████▋    | 1470/2613 [18:24<14:19,  1.33it/s]

	Current Loss: 1.6985

 57%|█████▋    | 1480/2613 [18:32<14:12,  1.33it/s]

	Current Loss: 1.6938

 57%|█████▋    | 1490/2613 [18:39<14:03,  1.33it/s]

	Current Loss: 1.7008

 57%|█████▋    | 1500/2613 [18:47<13:57,  1.33it/s]

	Current Loss: 1.6936

 58%|█████▊    | 1510/2613 [18:54<13:49,  1.33it/s]

	Current Loss: 1.6908

 58%|█████▊    | 1520/2613 [19:02<13:41,  1.33it/s]

	Current Loss: 1.6932

 59%|█████▊    | 1530/2613 [19:09<13:34,  1.33it/s]

	Current Loss: 1.6899

 59%|█████▉    | 1540/2613 [19:17<13:27,  1.33it/s]

	Current Loss: 1.6933

 59%|█████▉    | 1550/2613 [19:24<13:19,  1.33it/s]

	Current Loss: 1.6937

 60%|█████▉    | 1560/2613 [19:32<13:12,  1.33it/s]

	Current Loss: 1.6964

 60%|██████    | 1570/2613 [19:39<13:04,  1.33it/s]

	Current Loss: 1.6935

 60%|██████    | 1580/2613 [19:47<12:56,  1.33it/s]

	Current Loss: 1.6832

 61%|██████    | 1590/2613 [19:54<12:49,  1.33it/s]

	Current Loss: 1.6898

 61%|██████    | 1600/2613 [20:02<12:41,  1.33it/s]

	Current Loss: 1.6866

 62%|██████▏   | 1610/2613 [20:09<12:35,  1.33it/s]

	Current Loss: 1.6860
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1611/2613 [20:10<12:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1612/2613 [20:11<12:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1613/2613 [20:12<12:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1614/2613 [20:12<12:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1615/2613 [20:13<12:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1616/2613 [20:14<12:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1617/2613 [20:15<12:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1618/2613 [20:15<12:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1619/2613 [20:16<12:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1620/2613 [20:17<12:26,  1.33it/s]

	Current Loss: 1.6878
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1621/2613 [20:18<12:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1622/2613 [20:18<12:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1623/2613 [20:19<12:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1624/2613 [20:20<12:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1625/2613 [20:21<12:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1626/2613 [20:21<12:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1627/2613 [20:22<12:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1628/2613 [20:23<12:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1629/2613 [20:24<12:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1630/2613 [20:24<12:20,  1.33it/s]

	Current Loss: 1.6863
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1631/2613 [20:25<12:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1632/2613 [20:26<12:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1633/2613 [20:27<12:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1634/2613 [20:27<12:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1635/2613 [20:28<12:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1636/2613 [20:29<12:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1637/2613 [20:30<12:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1638/2613 [20:30<12:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1639/2613 [20:31<12:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1640/2613 [20:32<12:12,  1.33it/s]

	Current Loss: 1.6893
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1641/2613 [20:33<12:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1642/2613 [20:34<12:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1643/2613 [20:34<12:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1644/2613 [20:35<12:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1645/2613 [20:36<12:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1646/2613 [20:37<12:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1647/2613 [20:37<12:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1648/2613 [20:38<12:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1649/2613 [20:39<12:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1650/2613 [20:40<12:04,  1.33it/s]

	Current Loss: 1.6894
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1651/2613 [20:40<12:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1652/2613 [20:41<12:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1653/2613 [20:42<12:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1654/2613 [20:43<12:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1655/2613 [20:43<12:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1656/2613 [20:44<11:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1657/2613 [20:45<11:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1658/2613 [20:46<11:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1659/2613 [20:46<11:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1660/2613 [20:47<11:56,  1.33it/s]

	Current Loss: 1.6925
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1661/2613 [20:48<11:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1662/2613 [20:49<11:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1663/2613 [20:49<11:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1664/2613 [20:50<11:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1665/2613 [20:51<12:04,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1666/2613 [20:52<11:59,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1667/2613 [20:52<11:57,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1668/2613 [20:53<11:54,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1669/2613 [20:54<11:53,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1670/2613 [20:55<11:51,  1.33it/s]

	Current Loss: 1.6834
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1671/2613 [20:55<11:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1672/2613 [20:56<11:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1673/2613 [20:57<11:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1674/2613 [20:58<11:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1675/2613 [20:58<11:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1676/2613 [20:59<11:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1677/2613 [21:00<11:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1678/2613 [21:01<11:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1679/2613 [21:01<11:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1680/2613 [21:02<11:41,  1.33it/s]

	Current Loss: 1.6838
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1681/2613 [21:03<11:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1682/2613 [21:04<11:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1683/2613 [21:04<11:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1684/2613 [21:05<11:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1685/2613 [21:06<11:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1686/2613 [21:07<11:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1687/2613 [21:07<11:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1688/2613 [21:08<11:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1689/2613 [21:09<11:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1690/2613 [21:10<11:34,  1.33it/s]

	Current Loss: 1.6837
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1691/2613 [21:10<11:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1692/2613 [21:11<11:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1693/2613 [21:12<11:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1694/2613 [21:13<11:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1695/2613 [21:13<11:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1696/2613 [21:14<11:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1697/2613 [21:15<11:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1698/2613 [21:16<11:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1699/2613 [21:16<11:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1700/2613 [21:17<11:26,  1.33it/s]

	Current Loss: 1.6867
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1701/2613 [21:18<11:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1702/2613 [21:19<11:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1703/2613 [21:19<11:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1704/2613 [21:20<11:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1705/2613 [21:21<11:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1706/2613 [21:22<11:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1707/2613 [21:22<11:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1708/2613 [21:23<11:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1709/2613 [21:24<11:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1710/2613 [21:25<11:18,  1.33it/s]

	Current Loss: 1.6853
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1711/2613 [21:25<11:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1712/2613 [21:26<11:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1713/2613 [21:27<11:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1714/2613 [21:28<11:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1715/2613 [21:28<11:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1716/2613 [21:29<11:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1717/2613 [21:30<11:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1718/2613 [21:31<11:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1719/2613 [21:31<11:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1720/2613 [21:32<11:11,  1.33it/s]

	Current Loss: 1.6843
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1721/2613 [21:33<11:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1722/2613 [21:34<11:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1723/2613 [21:34<11:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1724/2613 [21:35<11:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1725/2613 [21:36<11:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1726/2613 [21:37<11:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1727/2613 [21:37<11:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1728/2613 [21:38<11:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1729/2613 [21:39<11:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1730/2613 [21:40<11:04,  1.33it/s]

	Current Loss: 1.6814
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1731/2613 [21:40<11:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1732/2613 [21:41<11:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1733/2613 [21:42<11:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1734/2613 [21:43<11:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1735/2613 [21:43<11:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1736/2613 [21:44<10:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1737/2613 [21:45<10:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1738/2613 [21:46<10:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1739/2613 [21:46<10:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1740/2613 [21:47<10:56,  1.33it/s]

	Current Loss: 1.6855
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1741/2613 [21:48<10:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1742/2613 [21:49<10:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1743/2613 [21:50<10:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1744/2613 [21:50<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1745/2613 [21:51<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1746/2613 [21:52<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1747/2613 [21:53<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1748/2613 [21:53<10:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1749/2613 [21:54<10:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1750/2613 [21:55<10:49,  1.33it/s]

	Current Loss: 1.6791
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1751/2613 [21:56<10:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1752/2613 [21:56<10:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1753/2613 [21:57<10:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1754/2613 [21:58<10:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1755/2613 [21:59<10:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1756/2613 [21:59<10:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1757/2613 [22:00<10:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1758/2613 [22:01<10:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1759/2613 [22:02<10:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1760/2613 [22:02<10:42,  1.33it/s]

	Current Loss: 1.6795
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1761/2613 [22:03<10:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1762/2613 [22:04<10:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1763/2613 [22:05<10:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1764/2613 [22:05<10:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1765/2613 [22:06<10:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1766/2613 [22:07<10:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1767/2613 [22:08<10:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1768/2613 [22:08<10:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1769/2613 [22:09<10:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 68%|██████▊   | 1770/2613 [22:10<10:34,  1.33it/s]

	Current Loss: 1.6817

 68%|██████▊   | 1780/2613 [22:17<10:26,  1.33it/s]

	Current Loss: 1.6745

 69%|██████▊   | 1790/2613 [22:25<10:18,  1.33it/s]

	Current Loss: 1.6869

 69%|██████▉   | 1800/2613 [22:32<10:11,  1.33it/s]

	Current Loss: 1.6826

 69%|██████▉   | 1810/2613 [22:40<10:03,  1.33it/s]

	Current Loss: 1.6826

 70%|██████▉   | 1820/2613 [22:47<09:55,  1.33it/s]

	Current Loss: 1.6826

 70%|███████   | 1830/2613 [22:55<09:53,  1.32it/s]

	Current Loss: 1.6785

 70%|███████   | 1840/2613 [23:03<09:41,  1.33it/s]

	Current Loss: 1.6795

 71%|███████   | 1850/2613 [23:10<09:34,  1.33it/s]

	Current Loss: 1.6793

 71%|███████   | 1860/2613 [23:18<09:26,  1.33it/s]

	Current Loss: 1.6772

 72%|███████▏  | 1870/2613 [23:25<09:17,  1.33it/s]

	Current Loss: 1.6761

 72%|███████▏  | 1880/2613 [23:33<09:11,  1.33it/s]

	Current Loss: 1.6819

 72%|███████▏  | 1890/2613 [23:40<09:02,  1.33it/s]

	Current Loss: 1.6759

 73%|███████▎  | 1900/2613 [23:48<08:55,  1.33it/s]

	Current Loss: 1.6834

 73%|███████▎  | 1910/2613 [23:55<08:57,  1.31it/s]

	Current Loss: 1.6765

 73%|███████▎  | 1920/2613 [24:03<08:42,  1.33it/s]

	Current Loss: 1.6703

 74%|███████▍  | 1930/2613 [24:10<08:33,  1.33it/s]

	Current Loss: 1.6771

 74%|███████▍  | 1940/2613 [24:18<08:26,  1.33it/s]

	Current Loss: 1.6709

 75%|███████▍  | 1950/2613 [24:25<08:18,  1.33it/s]

	Current Loss: 1.6802

 75%|███████▌  | 1960/2613 [24:33<08:11,  1.33it/s]

	Current Loss: 1.6762

 75%|███████▌  | 1970/2613 [24:40<08:03,  1.33it/s]

	Current Loss: 1.6759

 76%|███████▌  | 1980/2613 [24:48<07:56,  1.33it/s]

	Current Loss: 1.6752

 76%|███████▌  | 1990/2613 [24:55<07:48,  1.33it/s]

	Current Loss: 1.6680

 77%|███████▋  | 2000/2613 [25:03<07:41,  1.33it/s]

	Current Loss: 1.6805

 77%|███████▋  | 2010/2613 [25:10<07:33,  1.33it/s]

	Current Loss: 1.6770

 77%|███████▋  | 2020/2613 [25:18<07:26,  1.33it/s]

	Current Loss: 1.6764

 78%|███████▊  | 2030/2613 [25:25<07:18,  1.33it/s]

	Current Loss: 1.6720

 78%|███████▊  | 2040/2613 [25:33<07:11,  1.33it/s]

	Current Loss: 1.6717

 78%|███████▊  | 2050/2613 [25:41<07:03,  1.33it/s]

	Current Loss: 1.6684

 79%|███████▉  | 2060/2613 [25:48<06:56,  1.33it/s]

	Current Loss: 1.6696

 79%|███████▉  | 2070/2613 [25:56<06:48,  1.33it/s]

	Current Loss: 1.6693

 80%|███████▉  | 2080/2613 [26:03<06:41,  1.33it/s]

	Current Loss: 1.6721

 80%|███████▉  | 2090/2613 [26:11<06:33,  1.33it/s]

	Current Loss: 1.6722

 80%|████████  | 2100/2613 [26:18<06:25,  1.33it/s]

	Current Loss: 1.6715

 81%|████████  | 2110/2613 [26:26<06:18,  1.33it/s]

	Current Loss: 1.6753

 81%|████████  | 2120/2613 [26:33<06:10,  1.33it/s]

	Current Loss: 1.6673

 82%|████████▏ | 2130/2613 [26:41<06:04,  1.33it/s]

	Current Loss: 1.6728
 82%|████████▏ | 2140/2613 [26:48<05:56,  1.33it/s]

	Current Loss: 1.6688
 82%|████████▏ | 2150/2613 [26:56<05:48,  1.33it/s]

	Current Loss: 1.6645
 83%|████████▎ | 2160/2613 [27:03<05:41,  1.33it/s]

	Current Loss: 1.6663
 83%|████████▎ | 2170/2613 [27:11<05:33,  1.33it/s]

	Current Loss: 1.6638
 83%|████████▎ | 2180/2613 [27:18<05:26,  1.33it/s]

	Current Loss: 1.6711
 84%|████████▍ | 2190/2613 [27:26<05:18,  1.33it/s]

	Current Loss: 1.6672
 84%|████████▍ | 2200/2613 [27:33<05:10,  1.33it/s]

	Current Loss: 1.6664
 85%|████████▍ | 2210/2613 [27:41<05:03,  1.33it/s]

	Current Loss: 1.6689
 85%|████████▍ | 2220/2613 [27:48<04:55,  1.33it/s]

	Current Loss: 1.6661
 85%|████████▌ | 2230/2613 [27:56<04:48,  1.33it/s]

	Current Loss: 1.6594
 86%|████████▌ | 2240/2613 [28:04<04:40,  1.33it/s]

	Current Loss: 1.6656
 86%|████████▌ | 2250/2613 [28:11<04:33,  1.33it/s]

	Current Loss: 1.6704
 86%|████████▋ | 2260/2613 [28:19<04:25,  1.33it/s]

	Current Loss: 1.6659
 87%|████████▋ | 2270/2613 [28:26<04:18,  1.33it/s]

	Current Loss: 1.6650
 87%|████████▋ | 2280/2613 [28:34<04:10,  1.33it/s]

	Current Loss: 1.6708
 88%|████████▊ | 2290/2613 [28:41<04:02,  1.33it/s]

	Current Loss: 1.6662
 88%|████████▊ | 2300/2613 [28:49<03:55,  1.33it/s]

	Current Loss: 1.6581
 88%|████████▊ | 2310/2613 [28:56<03:47,  1.33it/s]

	Current Loss: 1.6623
 89%|████████▉ | 2320/2613 [29:04<03:40,  1.33it/s]

	Current Loss: 1.6633
 89%|████████▉ | 2330/2613 [29:11<03:32,  1.33it/s]

	Current Loss: 1.6680
 90%|████████▉ | 2340/2613 [29:19<03:25,  1.33it/s]

	Current Loss: 1.6617
 90%|████████▉ | 2350/2613 [29:26<03:17,  1.33it/s]

	Current Loss: 1.6627
 90%|█████████ | 2360/2613 [29:34<03:10,  1.33it/s]

	Current Loss: 1.6629
 91%|█████████ | 2370/2613 [29:41<03:02,  1.33it/s]

	Current Loss: 1.6609
 91%|█████████ | 2380/2613 [29:49<02:55,  1.33it/s]

	Current Loss: 1.6584
 91%|█████████▏| 2390/2613 [29:56<02:48,  1.33it/s]

	Current Loss: 1.6585
 92%|█████████▏| 2400/2613 [30:04<02:40,  1.33it/s]

	Current Loss: 1.6647
 92%|█████████▏| 2410/2613 [30:11<02:32,  1.33it/s]

	Current Loss: 1.6618
 93%|█████████▎| 2420/2613 [30:19<02:25,  1.33it/s]

	Current Loss: 1.6628
 93%|█████████▎| 2430/2613 [30:26<02:17,  1.33it/s]

	Current Loss: 1.6633
 93%|█████████▎| 2440/2613 [30:34<02:10,  1.33it/s]

	Current Loss: 1.6523
 94%|█████████▍| 2450/2613 [30:42<02:02,  1.33it/s]

	Current Loss: 1.6575
 94%|█████████▍| 2460/2613 [30:49<01:55,  1.33it/s]

	Current Loss: 1.6605
 95%|█████████▍| 2470/2613 [30:57<01:47,  1.33it/s]

	Current Loss: 1.6526
 95%|█████████▍| 2480/2613 [31:04<01:40,  1.33it/s]

	Current Loss: 1.6592
 95%|█████████▌| 2490/2613 [31:12<01:32,  1.33it/s]

	Current Loss: 1.6558
 96%|█████████▌| 2500/2613 [31:19<01:24,  1.33it/s]

	Current Loss: 1.6586
 96%|█████████▌| 2510/2613 [31:27<01:17,  1.33it/s]

	Current Loss: 1.6613
 96%|█████████▋| 2520/2613 [31:34<01:09,  1.33it/s]

	Current Loss: 1.6621
 97%|█████████▋| 2530/2613 [31:42<01:02,  1.33it/s]

	Current Loss: 1.6601
 97%|█████████▋| 2540/2613 [31:49<00:55,  1.33it/s]

	Current Loss: 1.6542
 98%|█████████▊| 2550/2613 [31:57<00:47,  1.33it/s]

	Current Loss: 1.6591
 98%|█████████▊| 2560/2613 [32:04<00:39,  1.33it/s]

	Current Loss: 1.6465
 98%|█████████▊| 2570/2613 [32:12<00:32,  1.33it/s]

	Current Loss: 1.6625
 99%|█████████▊| 2580/2613 [32:19<00:24,  1.33it/s]

	Current Loss: 1.6522
 99%|█████████▉| 2590/2613 [32:27<00:17,  1.33it/s]

	Current Loss: 1.6574
100%|█████████▉| 2600/2613 [32:34<00:09,  1.33it/s]

	Current Loss: 1.6535
100%|█████████▉| 2610/2613 [32:42<00:02,  1.33it/s]

	Current Loss: 1.6565
100%|██████████| 2613/2613 [32:44<00:00,  1.33it/s]


Epoch 3, Train Loss: 1.7028, Time: 1964.71s


  0%|          | 0/2613 [00:00<?, ?it/s]

  0%|          | 10/2613 [00:07<32:23,  1.34it/s]

	Current Loss: 1.6569
  1%|          | 20/2613 [00:14<32:28,  1.33it/s]

	Current Loss: 1.6549
  1%|          | 30/2613 [00:22<32:23,  1.33it/s]

	Current Loss: 1.6458
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 31/2613 [00:22<32:50,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|          | 32/2613 [00:23<32:40,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 33/2613 [00:24<32:34,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 34/2613 [00:25<32:30,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 35/2613 [00:25<32:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 36/2613 [00:26<32:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 37/2613 [00:27<32:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 38/2613 [00:28<32:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  1%|▏         | 39/2613 [00:28<32:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 40/2613 [00:29<32:17,  1.33it/s]

	Current Loss: 1.6504
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 41/2613 [00:30<32:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 42/2613 [00:31<32:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 43/2613 [00:31<32:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 44/2613 [00:32<32:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 45/2613 [00:33<32:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 46/2613 [00:34<32:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 47/2613 [00:34<32:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 48/2613 [00:35<32:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 49/2613 [00:36<32:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 50/2613 [00:37<32:10,  1.33it/s]

	Current Loss: 1.6503
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 51/2613 [00:37<32:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 52/2613 [00:38<32:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 53/2613 [00:39<32:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 54/2613 [00:40<32:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 55/2613 [00:41<32:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 56/2613 [00:41<32:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 57/2613 [00:42<32:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 58/2613 [00:43<32:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 59/2613 [00:44<32:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 60/2613 [00:44<32:01,  1.33it/s]

	Current Loss: 1.6539
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 61/2613 [00:45<32:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 62/2613 [00:46<32:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 63/2613 [00:47<32:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 64/2613 [00:47<31:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  2%|▏         | 65/2613 [00:48<31:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 66/2613 [00:49<32:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 67/2613 [00:50<31:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 68/2613 [00:50<31:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 69/2613 [00:51<31:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 70/2613 [00:52<31:51,  1.33it/s]

	Current Loss: 1.6553
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 71/2613 [00:53<31:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 72/2613 [00:53<31:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 73/2613 [00:54<31:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 74/2613 [00:55<31:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 75/2613 [00:56<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 76/2613 [00:56<31:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 77/2613 [00:57<31:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 78/2613 [00:58<31:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 79/2613 [00:59<31:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 80/2613 [00:59<31:47,  1.33it/s]

	Current Loss: 1.6579
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 81/2613 [01:00<31:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 82/2613 [01:01<31:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 83/2613 [01:02<31:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 84/2613 [01:02<31:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 85/2613 [01:03<31:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 86/2613 [01:04<31:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 87/2613 [01:05<31:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 88/2613 [01:05<31:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 89/2613 [01:06<31:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 90/2613 [01:07<31:39,  1.33it/s]

	Current Loss: 1.6524
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  3%|▎         | 91/2613 [01:08<31:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 92/2613 [01:08<31:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 93/2613 [01:09<31:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 94/2613 [01:10<31:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 95/2613 [01:11<31:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 96/2613 [01:11<31:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▎         | 97/2613 [01:12<31:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 98/2613 [01:13<31:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 99/2613 [01:14<31:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 100/2613 [01:14<31:32,  1.33it/s]

	Current Loss: 1.6595
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 101/2613 [01:15<31:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 102/2613 [01:16<31:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 103/2613 [01:17<31:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 104/2613 [01:17<31:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 105/2613 [01:18<31:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 106/2613 [01:19<31:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 107/2613 [01:20<31:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 108/2613 [01:20<31:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 109/2613 [01:21<31:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 110/2613 [01:22<31:23,  1.33it/s]

	Current Loss: 1.6466
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 111/2613 [01:23<31:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 112/2613 [01:23<31:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 113/2613 [01:24<31:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 114/2613 [01:25<31:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 115/2613 [01:26<31:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 116/2613 [01:26<31:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  4%|▍         | 117/2613 [01:27<31:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 118/2613 [01:28<31:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 119/2613 [01:29<31:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 120/2613 [01:29<31:13,  1.33it/s]

	Current Loss: 1.6489
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 121/2613 [01:30<31:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 122/2613 [01:31<31:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 123/2613 [01:32<31:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 124/2613 [01:32<31:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 125/2613 [01:33<31:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 126/2613 [01:34<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 127/2613 [01:35<31:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 128/2613 [01:35<31:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 129/2613 [01:36<31:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▍         | 130/2613 [01:37<31:07,  1.33it/s]

	Current Loss: 1.6530
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 131/2613 [01:38<31:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 132/2613 [01:38<31:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 133/2613 [01:39<31:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 134/2613 [01:40<31:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 135/2613 [01:41<31:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 136/2613 [01:41<31:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 137/2613 [01:42<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 138/2613 [01:43<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 139/2613 [01:44<31:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 140/2613 [01:44<31:00,  1.33it/s]

	Current Loss: 1.6522
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 141/2613 [01:45<31:05,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 142/2613 [01:46<30:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  5%|▌         | 143/2613 [01:47<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 144/2613 [01:47<30:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 145/2613 [01:48<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 146/2613 [01:49<30:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 147/2613 [01:50<30:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 148/2613 [01:50<30:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 149/2613 [01:51<30:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 150/2613 [01:52<30:53,  1.33it/s]

	Current Loss: 1.6473
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 151/2613 [01:53<30:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 152/2613 [01:54<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 153/2613 [01:54<30:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 154/2613 [01:55<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 155/2613 [01:56<30:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 156/2613 [01:57<30:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 157/2613 [01:57<30:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 158/2613 [01:58<30:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 159/2613 [01:59<30:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 160/2613 [02:00<30:43,  1.33it/s]

	Current Loss: 1.6481
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 161/2613 [02:00<30:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 162/2613 [02:01<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▌         | 163/2613 [02:02<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 164/2613 [02:03<30:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 165/2613 [02:03<30:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 166/2613 [02:04<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 167/2613 [02:05<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 168/2613 [02:06<30:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  6%|▋         | 169/2613 [02:06<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 170/2613 [02:07<30:38,  1.33it/s]

	Current Loss: 1.6504
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 171/2613 [02:08<30:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 172/2613 [02:09<30:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 173/2613 [02:09<30:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 174/2613 [02:10<30:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 175/2613 [02:11<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 176/2613 [02:12<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 177/2613 [02:12<30:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 178/2613 [02:13<30:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 179/2613 [02:14<30:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 180/2613 [02:15<30:30,  1.33it/s]

	Current Loss: 1.6467
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 181/2613 [02:15<30:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 182/2613 [02:16<30:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 183/2613 [02:17<30:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 184/2613 [02:18<30:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 185/2613 [02:18<30:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 186/2613 [02:19<30:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 187/2613 [02:20<30:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 188/2613 [02:21<30:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 189/2613 [02:21<30:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 190/2613 [02:22<30:22,  1.33it/s]

	Current Loss: 1.6485
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 191/2613 [02:23<30:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 192/2613 [02:24<30:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 193/2613 [02:24<30:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 194/2613 [02:25<30:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  7%|▋         | 195/2613 [02:26<30:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 196/2613 [02:27<30:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 197/2613 [02:27<30:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 198/2613 [02:28<30:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 199/2613 [02:29<30:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


  8%|▊         | 200/2613 [02:30<30:15,  1.33it/s]

	Current Loss: 1.6508


  8%|▊         | 210/2613 [02:37<30:07,  1.33it/s]

	Current Loss: 1.6439


  8%|▊         | 220/2613 [02:45<29:59,  1.33it/s]

	Current Loss: 1.6449


  9%|▉         | 230/2613 [02:52<29:51,  1.33it/s]

	Current Loss: 1.6446


  9%|▉         | 240/2613 [03:00<29:43,  1.33it/s]

	Current Loss: 1.6528


 10%|▉         | 250/2613 [03:07<29:34,  1.33it/s]

	Current Loss: 1.6464


 10%|▉         | 260/2613 [03:15<29:30,  1.33it/s]

	Current Loss: 1.6398


 10%|█         | 270/2613 [03:22<29:19,  1.33it/s]

	Current Loss: 1.6436


 11%|█         | 280/2613 [03:30<29:15,  1.33it/s]

	Current Loss: 1.6417


 11%|█         | 290/2613 [03:37<29:05,  1.33it/s]

	Current Loss: 1.6453


 11%|█▏        | 300/2613 [03:45<28:59,  1.33it/s]

	Current Loss: 1.6429


 12%|█▏        | 310/2613 [03:52<28:50,  1.33it/s]

	Current Loss: 1.6467


 12%|█▏        | 320/2613 [04:00<28:45,  1.33it/s]

	Current Loss: 1.6458


 13%|█▎        | 330/2613 [04:07<28:35,  1.33it/s]

	Current Loss: 1.6471


 13%|█▎        | 340/2613 [04:15<28:28,  1.33it/s]

	Current Loss: 1.6432


 13%|█▎        | 350/2613 [04:22<28:22,  1.33it/s]

	Current Loss: 1.6450


 14%|█▍        | 360/2613 [04:30<28:29,  1.32it/s]

	Current Loss: 1.6457


 14%|█▍        | 370/2613 [04:37<28:06,  1.33it/s]

	Current Loss: 1.6414


 15%|█▍        | 380/2613 [04:45<28:03,  1.33it/s]

	Current Loss: 1.6457
 15%|█▍        | 390/2613 [04:53<27:54,  1.33it/s]

	Current Loss: 1.6516
 15%|█▌        | 400/2613 [05:00<27:44,  1.33it/s]

	Current Loss: 1.6413
 16%|█▌        | 410/2613 [05:08<27:36,  1.33it/s]

	Current Loss: 1.6435
 16%|█▌        | 420/2613 [05:15<27:28,  1.33it/s]

	Current Loss: 1.6449
 16%|█▋        | 430/2613 [05:23<27:20,  1.33it/s]

	Current Loss: 1.6424
 17%|█▋        | 440/2613 [05:30<27:13,  1.33it/s]

	Current Loss: 1.6368
 17%|█▋        | 450/2613 [05:38<27:05,  1.33it/s]

	Current Loss: 1.6393
 18%|█▊        | 460/2613 [05:45<26:56,  1.33it/s]

	Current Loss: 1.6374
 18%|█▊        | 470/2613 [05:53<26:53,  1.33it/s]

	Current Loss: 1.6367
 18%|█▊        | 480/2613 [06:00<26:44,  1.33it/s]

	Current Loss: 1.6393
 19%|█▉        | 490/2613 [06:08<26:35,  1.33it/s]

	Current Loss: 1.6411
 19%|█▉        | 500/2613 [06:15<26:28,  1.33it/s]

	Current Loss: 1.6401
 20%|█▉        | 510/2613 [06:23<26:22,  1.33it/s]

	Current Loss: 1.6417
 20%|█▉        | 520/2613 [06:30<26:13,  1.33it/s]

	Current Loss: 1.6415
 20%|██        | 530/2613 [06:38<26:08,  1.33it/s]

	Current Loss: 1.6405
 21%|██        | 540/2613 [06:45<25:56,  1.33it/s]

	Current Loss: 1.6402
 21%|██        | 550/2613 [06:53<25:50,  1.33it/s]

	Current Loss: 1.6336

 21%|██▏       | 560/2613 [07:00<25:41,  1.33it/s]

	Current Loss: 1.6360

 22%|██▏       | 570/2613 [07:08<25:35,  1.33it/s]

	Current Loss: 1.6335

 22%|██▏       | 580/2613 [07:15<25:26,  1.33it/s]

	Current Loss: 1.6421

 23%|██▎       | 590/2613 [07:23<25:21,  1.33it/s]

	Current Loss: 1.6402

 23%|██▎       | 600/2613 [07:30<25:15,  1.33it/s]

	Current Loss: 1.6333

 23%|██▎       | 610/2613 [07:38<25:03,  1.33it/s]

	Current Loss: 1.6309

 24%|██▎       | 620/2613 [07:45<24:59,  1.33it/s]

	Current Loss: 1.6323

 24%|██▍       | 630/2613 [07:53<24:50,  1.33it/s]

	Current Loss: 1.6338

 24%|██▍       | 640/2613 [08:00<24:43,  1.33it/s]

	Current Loss: 1.6301

 25%|██▍       | 650/2613 [08:08<24:36,  1.33it/s]

	Current Loss: 1.6358

 25%|██▌       | 660/2613 [08:16<24:27,  1.33it/s]

	Current Loss: 1.6370

 26%|██▌       | 670/2613 [08:23<24:20,  1.33it/s]

	Current Loss: 1.6303

 26%|██▌       | 680/2613 [08:31<24:14,  1.33it/s]

	Current Loss: 1.6331

 26%|██▋       | 690/2613 [08:38<24:06,  1.33it/s]

	Current Loss: 1.6272

 27%|██▋       | 700/2613 [08:46<23:58,  1.33it/s]

	Current Loss: 1.6300

 27%|██▋       | 710/2613 [08:53<23:49,  1.33it/s]

	Current Loss: 1.6331

 28%|██▊       | 720/2613 [09:01<23:44,  1.33it/s]

	Current Loss: 1.6295

 28%|██▊       | 730/2613 [09:08<23:36,  1.33it/s]

	Current Loss: 1.6322


 28%|██▊       | 740/2613 [09:16<23:27,  1.33it/s]

	Current Loss: 1.6300


 29%|██▊       | 750/2613 [09:23<23:19,  1.33it/s]

	Current Loss: 1.6345


 29%|██▉       | 760/2613 [09:31<23:12,  1.33it/s]

	Current Loss: 1.6272


 29%|██▉       | 770/2613 [09:38<23:07,  1.33it/s]

	Current Loss: 1.6305


 30%|██▉       | 780/2613 [09:46<22:58,  1.33it/s]

	Current Loss: 1.6264


 30%|███       | 790/2613 [09:53<22:52,  1.33it/s]

	Current Loss: 1.6300


 31%|███       | 800/2613 [10:01<22:45,  1.33it/s]

	Current Loss: 1.6286


 31%|███       | 810/2613 [10:08<22:34,  1.33it/s]

	Current Loss: 1.6355


 31%|███▏      | 820/2613 [10:16<22:27,  1.33it/s]

	Current Loss: 1.6326


 32%|███▏      | 830/2613 [10:23<22:21,  1.33it/s]

	Current Loss: 1.6363


 32%|███▏      | 840/2613 [10:31<22:13,  1.33it/s]

	Current Loss: 1.6311


 33%|███▎      | 850/2613 [10:38<22:04,  1.33it/s]

	Current Loss: 1.6271


 33%|███▎      | 860/2613 [10:46<21:56,  1.33it/s]

	Current Loss: 1.6233


 33%|███▎      | 870/2613 [10:53<21:50,  1.33it/s]

	Current Loss: 1.6297


 34%|███▎      | 880/2613 [11:01<21:42,  1.33it/s]

	Current Loss: 1.6244


 34%|███▍      | 890/2613 [11:08<21:36,  1.33it/s]

	Current Loss: 1.6238


 34%|███▍      | 900/2613 [11:16<21:30,  1.33it/s]

	Current Loss: 1.6241
 35%|███▍      | 910/2613 [11:23<21:19,  1.33it/s]

	Current Loss: 1.6318
 35%|███▌      | 920/2613 [11:31<21:13,  1.33it/s]

	Current Loss: 1.6282
 36%|███▌      | 930/2613 [11:39<21:06,  1.33it/s]

	Current Loss: 1.6247
 36%|███▌      | 940/2613 [11:46<21:00,  1.33it/s]

	Current Loss: 1.6287
 36%|███▋      | 950/2613 [11:54<20:51,  1.33it/s]

	Current Loss: 1.6250
 37%|███▋      | 960/2613 [12:01<20:42,  1.33it/s]

	Current Loss: 1.6245
 37%|███▋      | 970/2613 [12:09<20:36,  1.33it/s]

	Current Loss: 1.6233
 38%|███▊      | 980/2613 [12:16<20:27,  1.33it/s]

	Current Loss: 1.6316
 38%|███▊      | 990/2613 [12:24<20:20,  1.33it/s]

	Current Loss: 1.6301
 38%|███▊      | 1000/2613 [12:31<20:12,  1.33it/s]

	Current Loss: 1.6249
 39%|███▊      | 1010/2613 [12:39<20:04,  1.33it/s]

	Current Loss: 1.6304
 39%|███▉      | 1020/2613 [12:46<19:59,  1.33it/s]

	Current Loss: 1.6226
 39%|███▉      | 1030/2613 [12:54<19:50,  1.33it/s]

	Current Loss: 1.6245
 40%|███▉      | 1040/2613 [13:01<19:44,  1.33it/s]

	Current Loss: 1.6227
 40%|████      | 1050/2613 [13:09<19:35,  1.33it/s]

	Current Loss: 1.6222
 41%|████      | 1060/2613 [13:16<19:27,  1.33it/s]

	Current Loss: 1.6231
 41%|████      | 1070/2613 [13:24<19:19,  1.33it/s]

	Current Loss: 1.6233
 41%|████▏     | 1080/2613 [13:31<19:13,  1.33it/s]

	Current Loss: 1.6140
 42%|████▏     | 1090/2613 [13:39<19:05,  1.33it/s]

	Current Loss: 1.6107
 42%|████▏     | 1100/2613 [13:46<18:59,  1.33it/s]

	Current Loss: 1.6186
 42%|████▏     | 1110/2613 [13:54<18:53,  1.33it/s]

	Current Loss: 1.6184
 43%|████▎     | 1120/2613 [14:01<18:42,  1.33it/s]

	Current Loss: 1.6180
 43%|████▎     | 1130/2613 [14:09<18:35,  1.33it/s]

	Current Loss: 1.6191
 44%|████▎     | 1140/2613 [14:17<18:30,  1.33it/s]

	Current Loss: 1.6219
 44%|████▍     | 1150/2613 [14:24<18:20,  1.33it/s]

	Current Loss: 1.6245
 44%|████▍     | 1160/2613 [14:32<18:13,  1.33it/s]

	Current Loss: 1.6183
 45%|████▍     | 1170/2613 [14:39<18:04,  1.33it/s]

	Current Loss: 1.6186
 45%|████▌     | 1180/2613 [14:47<18:02,  1.32it/s]

	Current Loss: 1.6214
 46%|████▌     | 1190/2613 [14:54<17:50,  1.33it/s]

	Current Loss: 1.6190
 46%|████▌     | 1200/2613 [15:02<17:44,  1.33it/s]

	Current Loss: 1.6227
 46%|████▋     | 1210/2613 [15:09<17:36,  1.33it/s]

	Current Loss: 1.6178
 47%|████▋     | 1220/2613 [15:17<17:27,  1.33it/s]

	Current Loss: 1.6184
 47%|████▋     | 1230/2613 [15:24<17:19,  1.33it/s]

	Current Loss: 1.6195
 47%|████▋     | 1240/2613 [15:32<17:11,  1.33it/s]

	Current Loss: 1.6259
 48%|████▊     | 1250/2613 [15:39<17:04,  1.33it/s]

	Current Loss: 1.6213
 48%|████▊     | 1260/2613 [15:47<16:57,  1.33it/s]

	Current Loss: 1.6241
 49%|████▊     | 1270/2613 [15:54<16:49,  1.33it/s]

	Current Loss: 1.6144
 49%|████▉     | 1280/2613 [16:02<16:40,  1.33it/s]

	Current Loss: 1.6214
 49%|████▉     | 1290/2613 [16:09<16:35,  1.33it/s]

	Current Loss: 1.6130
 50%|████▉     | 1300/2613 [16:17<16:26,  1.33it/s]

	Current Loss: 1.6144
 50%|█████     | 1310/2613 [16:24<16:19,  1.33it/s]

	Current Loss: 1.6151
 51%|█████     | 1320/2613 [16:32<16:12,  1.33it/s]

	Current Loss: 1.6172
 51%|█████     | 1330/2613 [16:39<16:04,  1.33it/s]

	Current Loss: 1.6112
 51%|█████▏    | 1340/2613 [16:47<16:07,  1.32it/s]

	Current Loss: 1.6109
 52%|█████▏    | 1350/2613 [16:55<15:49,  1.33it/s]

	Current Loss: 1.6194
 52%|█████▏    | 1360/2613 [17:02<15:42,  1.33it/s]

	Current Loss: 1.6147
 52%|█████▏    | 1370/2613 [17:10<15:33,  1.33it/s]

	Current Loss: 1.6161
 53%|█████▎    | 1380/2613 [17:17<15:25,  1.33it/s]

	Current Loss: 1.6077
 53%|█████▎    | 1390/2613 [17:25<15:19,  1.33it/s]

	Current Loss: 1.6085
 54%|█████▎    | 1400/2613 [17:32<15:09,  1.33it/s]

	Current Loss: 1.6101
 54%|█████▍    | 1410/2613 [17:40<15:04,  1.33it/s]

	Current Loss: 1.6144
 54%|█████▍    | 1420/2613 [17:47<14:55,  1.33it/s]

	Current Loss: 1.6197

 55%|█████▍    | 1430/2613 [17:55<14:48,  1.33it/s]

	Current Loss: 1.6114

 55%|█████▌    | 1440/2613 [18:02<14:40,  1.33it/s]

	Current Loss: 1.6107

 55%|█████▌    | 1450/2613 [18:10<14:33,  1.33it/s]

	Current Loss: 1.6168

 56%|█████▌    | 1460/2613 [18:17<14:25,  1.33it/s]

	Current Loss: 1.6086

 56%|█████▋    | 1470/2613 [18:25<14:20,  1.33it/s]

	Current Loss: 1.6152

 57%|█████▋    | 1480/2613 [18:32<14:11,  1.33it/s]

	Current Loss: 1.6137

 57%|█████▋    | 1490/2613 [18:40<14:03,  1.33it/s]

	Current Loss: 1.6123

 57%|█████▋    | 1500/2613 [18:47<13:57,  1.33it/s]

	Current Loss: 1.6129

 58%|█████▊    | 1510/2613 [18:55<13:48,  1.33it/s]

	Current Loss: 1.6100

 58%|█████▊    | 1520/2613 [19:02<13:41,  1.33it/s]

	Current Loss: 1.6056

 59%|█████▊    | 1530/2613 [19:10<13:33,  1.33it/s]

	Current Loss: 1.6045

 59%|█████▉    | 1540/2613 [19:17<13:26,  1.33it/s]

	Current Loss: 1.6097

 59%|█████▉    | 1550/2613 [19:25<13:18,  1.33it/s]

	Current Loss: 1.6133

 60%|█████▉    | 1560/2613 [19:32<13:12,  1.33it/s]

	Current Loss: 1.6169

 60%|██████    | 1570/2613 [19:40<13:04,  1.33it/s]

	Current Loss: 1.6076

 60%|██████    | 1580/2613 [19:47<12:56,  1.33it/s]

	Current Loss: 1.6076

 61%|██████    | 1590/2613 [19:55<12:48,  1.33it/s]

	Current Loss: 1.6095
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1591/2613 [19:56<12:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1592/2613 [19:56<12:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1593/2613 [19:57<12:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1594/2613 [19:58<12:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1595/2613 [19:59<12:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1596/2613 [19:59<12:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1597/2613 [20:00<12:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1598/2613 [20:01<12:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1599/2613 [20:02<12:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████    | 1600/2613 [20:02<12:40,  1.33it/s]

	Current Loss: 1.6128
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1601/2613 [20:03<12:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1602/2613 [20:04<12:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1603/2613 [20:05<12:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1604/2613 [20:05<12:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1605/2613 [20:06<12:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 61%|██████▏   | 1606/2613 [20:07<12:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1607/2613 [20:08<12:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1608/2613 [20:08<12:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1609/2613 [20:09<12:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1610/2613 [20:10<12:33,  1.33it/s]

	Current Loss: 1.6059
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1611/2613 [20:11<12:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1612/2613 [20:11<12:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1613/2613 [20:12<12:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1614/2613 [20:13<12:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1615/2613 [20:14<12:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1616/2613 [20:14<12:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1617/2613 [20:15<12:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1618/2613 [20:16<12:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1619/2613 [20:17<12:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1620/2613 [20:17<12:26,  1.33it/s]

	Current Loss: 1.6125
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1621/2613 [20:18<12:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1622/2613 [20:19<12:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1623/2613 [20:20<12:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1624/2613 [20:20<12:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1625/2613 [20:21<12:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1626/2613 [20:22<12:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1627/2613 [20:23<12:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1628/2613 [20:23<12:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1629/2613 [20:24<12:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1630/2613 [20:25<12:18,  1.33it/s]

	Current Loss: 1.6059
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1631/2613 [20:26<12:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1632/2613 [20:26<12:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 62%|██████▏   | 1633/2613 [20:27<12:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1634/2613 [20:28<12:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1635/2613 [20:29<12:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1636/2613 [20:29<12:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1637/2613 [20:30<12:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1638/2613 [20:31<12:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1639/2613 [20:32<12:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1640/2613 [20:32<12:11,  1.33it/s]

	Current Loss: 1.6137
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1641/2613 [20:33<12:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1642/2613 [20:34<12:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1643/2613 [20:35<12:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1644/2613 [20:35<12:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1645/2613 [20:36<12:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1646/2613 [20:37<12:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1647/2613 [20:38<12:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1648/2613 [20:38<12:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1649/2613 [20:39<12:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1650/2613 [20:40<12:03,  1.33it/s]

	Current Loss: 1.6166
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1651/2613 [20:41<12:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1652/2613 [20:41<12:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1653/2613 [20:42<12:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1654/2613 [20:43<12:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1655/2613 [20:44<11:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1656/2613 [20:44<11:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1657/2613 [20:45<11:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1658/2613 [20:46<11:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 63%|██████▎   | 1659/2613 [20:47<11:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1660/2613 [20:47<11:56,  1.33it/s]

	Current Loss: 1.6122
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1661/2613 [20:48<11:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1662/2613 [20:49<11:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1663/2613 [20:50<11:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1664/2613 [20:50<11:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▎   | 1665/2613 [20:51<11:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1666/2613 [20:52<11:55,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1667/2613 [20:53<11:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1668/2613 [20:53<11:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1669/2613 [20:54<11:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1670/2613 [20:55<11:50,  1.33it/s]

	Current Loss: 1.6067
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1671/2613 [20:56<11:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1672/2613 [20:57<11:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1673/2613 [20:57<11:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1674/2613 [20:58<11:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1675/2613 [20:59<11:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1676/2613 [21:00<11:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1677/2613 [21:00<11:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1678/2613 [21:01<11:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1679/2613 [21:02<11:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1680/2613 [21:03<11:41,  1.33it/s]

	Current Loss: 1.6126
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1681/2613 [21:03<11:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1682/2613 [21:04<11:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1683/2613 [21:05<11:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1684/2613 [21:06<11:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 64%|██████▍   | 1685/2613 [21:06<11:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1686/2613 [21:07<11:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1687/2613 [21:08<11:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1688/2613 [21:09<11:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1689/2613 [21:09<11:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1690/2613 [21:10<11:34,  1.33it/s]

	Current Loss: 1.6037
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1691/2613 [21:11<11:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1692/2613 [21:12<11:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1693/2613 [21:12<11:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1694/2613 [21:13<11:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1695/2613 [21:14<11:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1696/2613 [21:15<11:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1697/2613 [21:15<11:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▍   | 1698/2613 [21:16<11:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1699/2613 [21:17<11:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1700/2613 [21:18<11:26,  1.33it/s]

	Current Loss: 1.6037
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1701/2613 [21:18<11:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1702/2613 [21:19<11:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1703/2613 [21:20<11:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1704/2613 [21:21<11:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1705/2613 [21:21<11:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1706/2613 [21:22<11:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1707/2613 [21:23<11:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1708/2613 [21:24<11:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1709/2613 [21:24<11:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1710/2613 [21:25<11:19,  1.33it/s]

	Current Loss: 1.6045
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 65%|██████▌   | 1711/2613 [21:26<11:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1712/2613 [21:27<11:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1713/2613 [21:27<11:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1714/2613 [21:28<11:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1715/2613 [21:29<11:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1716/2613 [21:30<11:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1717/2613 [21:30<11:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1718/2613 [21:31<11:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1719/2613 [21:32<11:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1720/2613 [21:33<11:11,  1.33it/s]

	Current Loss: 1.6097
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1721/2613 [21:33<11:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1722/2613 [21:34<11:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1723/2613 [21:35<11:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1724/2613 [21:36<11:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1725/2613 [21:36<11:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1726/2613 [21:37<11:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1727/2613 [21:38<11:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1728/2613 [21:39<11:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1729/2613 [21:39<11:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1730/2613 [21:40<11:03,  1.33it/s]

	Current Loss: 1.6022
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▌   | 1731/2613 [21:41<11:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1732/2613 [21:42<11:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1733/2613 [21:42<11:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1734/2613 [21:43<11:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1735/2613 [21:44<11:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1736/2613 [21:45<10:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 66%|██████▋   | 1737/2613 [21:45<10:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1738/2613 [21:46<10:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1739/2613 [21:47<10:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1740/2613 [21:48<10:56,  1.33it/s]

	Current Loss: 1.6077
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1741/2613 [21:48<10:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1742/2613 [21:49<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1743/2613 [21:50<10:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1744/2613 [21:51<10:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1745/2613 [21:51<10:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1746/2613 [21:52<10:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1747/2613 [21:53<11:02,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1748/2613 [21:54<10:58,  1.31it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1749/2613 [21:54<10:55,  1.32it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1750/2613 [21:55<10:54,  1.32it/s]

	Current Loss: 1.6106
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1751/2613 [21:56<10:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1752/2613 [21:57<10:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1753/2613 [21:57<10:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1754/2613 [21:58<10:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1755/2613 [21:59<10:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1756/2613 [22:00<10:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1757/2613 [22:00<10:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1758/2613 [22:01<10:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1759/2613 [22:02<10:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 67%|██████▋   | 1760/2613 [22:03<10:41,  1.33it/s]

	Current Loss: 1.6078

 68%|██████▊   | 1770/2613 [22:10<10:32,  1.33it/s]

	Current Loss: 1.5995

 68%|██████▊   | 1780/2613 [22:18<10:25,  1.33it/s]

	Current Loss: 1.6100

 69%|██████▊   | 1790/2613 [22:25<10:17,  1.33it/s]

	Current Loss: 1.6074

 69%|██████▉   | 1800/2613 [22:33<10:10,  1.33it/s]

	Current Loss: 1.6066

 69%|██████▉   | 1810/2613 [22:40<10:03,  1.33it/s]

	Current Loss: 1.6001

 70%|██████▉   | 1820/2613 [22:48<09:55,  1.33it/s]

	Current Loss: 1.6055

 70%|███████   | 1830/2613 [22:55<09:48,  1.33it/s]

	Current Loss: 1.6033

 70%|███████   | 1840/2613 [23:03<09:41,  1.33it/s]

	Current Loss: 1.5999

 71%|███████   | 1850/2613 [23:10<09:33,  1.33it/s]

	Current Loss: 1.6025

 71%|███████   | 1860/2613 [23:18<09:25,  1.33it/s]

	Current Loss: 1.6009

 72%|███████▏  | 1870/2613 [23:25<09:19,  1.33it/s]

	Current Loss: 1.6064

 72%|███████▏  | 1880/2613 [23:33<09:10,  1.33it/s]

	Current Loss: 1.6000

 72%|███████▏  | 1890/2613 [23:40<09:03,  1.33it/s]

	Current Loss: 1.6016

 73%|███████▎  | 1900/2613 [23:48<08:56,  1.33it/s]

	Current Loss: 1.6037

 73%|███████▎  | 1910/2613 [23:55<08:48,  1.33it/s]

	Current Loss: 1.6023

 73%|███████▎  | 1920/2613 [24:03<08:40,  1.33it/s]

	Current Loss: 1.6025

 74%|███████▍  | 1930/2613 [24:10<08:33,  1.33it/s]

	Current Loss: 1.6021

 74%|███████▍  | 1940/2613 [24:18<08:25,  1.33it/s]

	Current Loss: 1.6021
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1941/2613 [24:19<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1942/2613 [24:20<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1943/2613 [24:20<08:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1944/2613 [24:21<08:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1945/2613 [24:22<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 74%|███████▍  | 1946/2613 [24:23<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1947/2613 [24:23<08:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1948/2613 [24:24<08:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1949/2613 [24:25<08:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1950/2613 [24:26<08:17,  1.33it/s]

	Current Loss: 1.6005
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1951/2613 [24:26<08:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1952/2613 [24:27<08:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1953/2613 [24:28<08:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1954/2613 [24:29<08:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1955/2613 [24:29<08:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1956/2613 [24:30<08:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1957/2613 [24:31<08:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1958/2613 [24:32<08:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▍  | 1959/2613 [24:32<08:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1960/2613 [24:33<08:10,  1.33it/s]

	Current Loss: 1.5988
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1961/2613 [24:34<08:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1962/2613 [24:35<08:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1963/2613 [24:35<08:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1964/2613 [24:36<08:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1965/2613 [24:37<08:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1966/2613 [24:38<08:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1967/2613 [24:38<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1968/2613 [24:39<08:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1969/2613 [24:40<08:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1970/2613 [24:41<08:03,  1.33it/s]

	Current Loss: 1.5887
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1971/2613 [24:41<08:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 75%|███████▌  | 1972/2613 [24:42<08:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1973/2613 [24:43<08:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1974/2613 [24:44<08:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1975/2613 [24:44<07:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1976/2613 [24:45<07:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1977/2613 [24:46<07:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1978/2613 [24:47<07:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1979/2613 [24:47<07:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1980/2613 [24:48<07:55,  1.33it/s]

	Current Loss: 1.6039
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1981/2613 [24:49<07:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1982/2613 [24:50<07:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1983/2613 [24:50<07:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1984/2613 [24:51<07:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1985/2613 [24:52<07:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1986/2613 [24:53<07:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1987/2613 [24:53<07:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1988/2613 [24:54<07:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1989/2613 [24:55<07:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1990/2613 [24:56<07:47,  1.33it/s]

	Current Loss: 1.5992
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1991/2613 [24:56<07:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▌  | 1992/2613 [24:57<07:47,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1993/2613 [24:58<07:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1994/2613 [24:59<07:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1995/2613 [24:59<07:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1996/2613 [25:00<07:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1997/2613 [25:01<07:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 76%|███████▋  | 1998/2613 [25:02<07:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 1999/2613 [25:02<07:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2000/2613 [25:03<07:40,  1.33it/s]

	Current Loss: 1.5982
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2001/2613 [25:04<07:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2002/2613 [25:05<07:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2003/2613 [25:05<07:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2004/2613 [25:06<07:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2005/2613 [25:07<07:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2006/2613 [25:08<07:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2007/2613 [25:08<07:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2008/2613 [25:09<07:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2009/2613 [25:10<07:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2010/2613 [25:11<07:32,  1.33it/s]

	Current Loss: 1.6058
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2011/2613 [25:11<07:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2012/2613 [25:12<07:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2013/2613 [25:13<07:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2014/2613 [25:14<07:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2015/2613 [25:14<07:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2016/2613 [25:15<07:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2017/2613 [25:16<07:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2018/2613 [25:17<07:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2019/2613 [25:17<07:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2020/2613 [25:18<07:25,  1.33it/s]

	Current Loss: 1.5990
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2021/2613 [25:19<07:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2022/2613 [25:20<07:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2023/2613 [25:20<07:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2024/2613 [25:21<07:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 77%|███████▋  | 2025/2613 [25:22<07:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2026/2613 [25:23<07:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2027/2613 [25:23<07:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2028/2613 [25:24<07:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2029/2613 [25:25<07:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2030/2613 [25:26<07:18,  1.33it/s]

	Current Loss: 1.6038
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2031/2613 [25:26<07:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2032/2613 [25:27<07:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2033/2613 [25:28<07:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2034/2613 [25:29<07:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2035/2613 [25:29<07:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2036/2613 [25:30<07:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2037/2613 [25:31<07:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2038/2613 [25:32<07:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2039/2613 [25:32<07:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2040/2613 [25:33<07:11,  1.33it/s]

	Current Loss: 1.5991
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2041/2613 [25:34<07:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2042/2613 [25:35<07:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2043/2613 [25:35<07:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2044/2613 [25:36<07:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2045/2613 [25:37<07:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2046/2613 [25:38<07:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2047/2613 [25:38<07:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2048/2613 [25:39<07:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2049/2613 [25:40<07:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2050/2613 [25:41<07:04,  1.33it/s]

	Current Loss: 1.5988
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 78%|███████▊  | 2051/2613 [25:41<07:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2052/2613 [25:42<07:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2053/2613 [25:43<07:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2054/2613 [25:44<07:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2055/2613 [25:44<06:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2056/2613 [25:45<06:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▊  | 2057/2613 [25:46<06:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2058/2613 [25:47<06:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2059/2613 [25:47<06:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2060/2613 [25:48<06:55,  1.33it/s]

	Current Loss: 1.6013
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2061/2613 [25:49<06:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2062/2613 [25:50<06:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2063/2613 [25:50<06:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2064/2613 [25:51<06:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2065/2613 [25:52<06:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2066/2613 [25:53<06:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2067/2613 [25:53<06:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2068/2613 [25:54<06:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2069/2613 [25:55<06:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2070/2613 [25:56<06:47,  1.33it/s]

	Current Loss: 1.5953
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2071/2613 [25:56<06:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2072/2613 [25:57<06:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2073/2613 [25:58<06:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2074/2613 [25:59<06:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2075/2613 [25:59<06:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2076/2613 [26:00<06:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 79%|███████▉  | 2077/2613 [26:01<06:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2078/2613 [26:02<06:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2079/2613 [26:02<06:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2080/2613 [26:03<06:40,  1.33it/s]

	Current Loss: 1.6023
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2081/2613 [26:04<06:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2082/2613 [26:05<06:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2083/2613 [26:05<06:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2084/2613 [26:06<06:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2085/2613 [26:07<06:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2086/2613 [26:08<06:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2087/2613 [26:08<06:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2088/2613 [26:09<06:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2089/2613 [26:10<06:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|███████▉  | 2090/2613 [26:11<06:32,  1.33it/s]

	Current Loss: 1.6018
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2091/2613 [26:11<06:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2092/2613 [26:12<06:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2093/2613 [26:13<06:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2094/2613 [26:14<06:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2095/2613 [26:14<06:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2096/2613 [26:15<06:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2097/2613 [26:16<06:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2098/2613 [26:17<06:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2099/2613 [26:18<06:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2100/2613 [26:18<06:25,  1.33it/s]

	Current Loss: 1.5946
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2101/2613 [26:19<06:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2102/2613 [26:20<06:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 80%|████████  | 2103/2613 [26:21<06:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2104/2613 [26:21<06:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2105/2613 [26:22<06:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2106/2613 [26:23<06:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2107/2613 [26:24<06:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2108/2613 [26:24<06:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2109/2613 [26:25<06:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2110/2613 [26:26<06:17,  1.33it/s]

	Current Loss: 1.5996
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2111/2613 [26:27<06:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2112/2613 [26:27<06:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2113/2613 [26:28<06:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2114/2613 [26:29<06:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2115/2613 [26:30<06:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2116/2613 [26:30<06:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2117/2613 [26:31<06:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2118/2613 [26:32<06:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2119/2613 [26:33<06:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2120/2613 [26:33<06:10,  1.33it/s]

	Current Loss: 1.5987
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2121/2613 [26:34<06:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2122/2613 [26:35<06:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████  | 2123/2613 [26:36<06:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2124/2613 [26:36<06:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2125/2613 [26:37<06:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2126/2613 [26:38<06:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2127/2613 [26:39<06:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2128/2613 [26:39<06:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 81%|████████▏ | 2129/2613 [26:40<06:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2130/2613 [26:41<06:02,  1.33it/s]

	Current Loss: 1.5963
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2131/2613 [26:42<06:02,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2132/2613 [26:42<06:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2133/2613 [26:43<06:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2134/2613 [26:44<05:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2135/2613 [26:45<05:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2136/2613 [26:45<05:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2137/2613 [26:46<05:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2138/2613 [26:47<05:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2139/2613 [26:48<05:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2140/2613 [26:48<05:56,  1.33it/s]

	Current Loss: 1.5999
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2141/2613 [26:49<05:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2142/2613 [26:50<05:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2143/2613 [26:51<05:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2144/2613 [26:51<05:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2145/2613 [26:52<05:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2146/2613 [26:53<05:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2147/2613 [26:54<05:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2148/2613 [26:54<05:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2149/2613 [26:55<05:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2150/2613 [26:56<05:47,  1.33it/s]

	Current Loss: 1.5977
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2151/2613 [26:57<05:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2152/2613 [26:57<05:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2153/2613 [26:58<05:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2154/2613 [26:59<05:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 82%|████████▏ | 2155/2613 [27:00<05:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2156/2613 [27:00<05:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2157/2613 [27:01<05:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2158/2613 [27:02<05:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2159/2613 [27:03<05:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2160/2613 [27:03<05:40,  1.33it/s]

	Current Loss: 1.6021
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2161/2613 [27:04<05:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2162/2613 [27:05<05:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2163/2613 [27:06<05:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2164/2613 [27:06<05:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2165/2613 [27:07<05:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2166/2613 [27:08<05:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2167/2613 [27:09<05:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2168/2613 [27:09<05:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2169/2613 [27:10<05:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2170/2613 [27:11<05:32,  1.33it/s]

	Current Loss: 1.5976
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2171/2613 [27:12<05:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2172/2613 [27:12<05:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2173/2613 [27:13<05:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2174/2613 [27:14<05:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2175/2613 [27:15<05:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2176/2613 [27:15<05:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2177/2613 [27:16<05:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2178/2613 [27:17<05:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2179/2613 [27:18<05:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2180/2613 [27:18<05:25,  1.33it/s]

	Current Loss: 1.5980
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 83%|████████▎ | 2181/2613 [27:19<05:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2182/2613 [27:20<05:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2183/2613 [27:21<05:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2184/2613 [27:21<05:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2185/2613 [27:22<05:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2186/2613 [27:23<05:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2187/2613 [27:24<05:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▎ | 2188/2613 [27:24<05:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2189/2613 [27:25<05:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2190/2613 [27:26<05:17,  1.33it/s]

	Current Loss: 1.5979
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2191/2613 [27:27<05:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2192/2613 [27:27<05:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2193/2613 [27:28<05:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2194/2613 [27:29<05:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2195/2613 [27:30<05:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2196/2613 [27:30<05:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2197/2613 [27:31<05:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2198/2613 [27:32<05:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2199/2613 [27:33<05:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2200/2613 [27:33<05:10,  1.33it/s]

	Current Loss: 1.5923
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2201/2613 [27:34<05:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2202/2613 [27:35<05:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2203/2613 [27:36<05:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2204/2613 [27:36<05:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2205/2613 [27:37<05:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2206/2613 [27:38<05:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 84%|████████▍ | 2207/2613 [27:39<05:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2208/2613 [27:39<05:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2209/2613 [27:40<05:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2210/2613 [27:41<05:02,  1.33it/s]

	Current Loss: 1.5949
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2211/2613 [27:42<05:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2212/2613 [27:42<05:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2213/2613 [27:43<05:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2214/2613 [27:44<04:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2215/2613 [27:45<04:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2216/2613 [27:45<04:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2217/2613 [27:46<04:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2218/2613 [27:47<04:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2219/2613 [27:48<04:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2220/2613 [27:48<04:55,  1.33it/s]

	Current Loss: 1.5858
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▍ | 2221/2613 [27:49<04:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2222/2613 [27:50<04:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2223/2613 [27:51<04:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2224/2613 [27:51<04:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2225/2613 [27:52<04:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2226/2613 [27:53<04:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2227/2613 [27:54<04:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2228/2613 [27:54<04:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2229/2613 [27:55<04:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2230/2613 [27:56<04:48,  1.33it/s]

	Current Loss: 1.5960
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2231/2613 [27:57<04:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2232/2613 [27:57<04:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2233/2613 [27:58<04:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 85%|████████▌ | 2234/2613 [27:59<04:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2235/2613 [28:00<04:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2236/2613 [28:00<04:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2237/2613 [28:01<04:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2238/2613 [28:02<04:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2239/2613 [28:03<04:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2240/2613 [28:03<04:40,  1.33it/s]

	Current Loss: 1.5897
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2241/2613 [28:04<04:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2242/2613 [28:05<04:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2243/2613 [28:06<04:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2244/2613 [28:06<04:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2245/2613 [28:07<04:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2246/2613 [28:08<04:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2247/2613 [28:09<04:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2248/2613 [28:09<04:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2249/2613 [28:10<04:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2250/2613 [28:11<04:32,  1.33it/s]

	Current Loss: 1.5897
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2251/2613 [28:12<04:32,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2252/2613 [28:12<04:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▌ | 2253/2613 [28:13<04:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2254/2613 [28:14<04:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2255/2613 [28:15<04:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2256/2613 [28:15<04:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2257/2613 [28:16<04:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2258/2613 [28:17<04:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2259/2613 [28:18<04:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 86%|████████▋ | 2260/2613 [28:18<04:25,  1.33it/s]

	Current Loss: 1.5843
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2261/2613 [28:19<04:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2262/2613 [28:20<04:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2263/2613 [28:21<04:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2264/2613 [28:21<04:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2265/2613 [28:22<04:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2266/2613 [28:23<04:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2267/2613 [28:24<04:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2268/2613 [28:24<04:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2269/2613 [28:25<04:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2270/2613 [28:26<04:17,  1.33it/s]

	Current Loss: 1.5959
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2271/2613 [28:27<04:17,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2272/2613 [28:28<04:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2273/2613 [28:28<04:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2274/2613 [28:29<04:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2275/2613 [28:30<04:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2276/2613 [28:31<04:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2277/2613 [28:31<04:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2278/2613 [28:32<04:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2279/2613 [28:33<04:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 87%|████████▋ | 2280/2613 [28:34<04:10,  1.33it/s]

	Current Loss: 1.5956
 88%|████████▊ | 2290/2613 [28:41<04:02,  1.33it/s]

	Current Loss: 1.5930
 88%|████████▊ | 2300/2613 [28:49<03:55,  1.33it/s]

	Current Loss: 1.5893
 88%|████████▊ | 2310/2613 [28:56<03:48,  1.33it/s]

	Current Loss: 1.5943
 89%|████████▉ | 2320/2613 [29:04<03:40,  1.33it/s]

	Current Loss: 1.5948
 89%|████████▉ | 2330/2613 [29:11<03:32,  1.33it/s]

	Current Loss: 1.5929
 90%|████████▉ | 2340/2613 [29:19<03:24,  1.33it/s]

	Current Loss: 1.5975
 90%|████████▉ | 2350/2613 [29:26<03:17,  1.33it/s]

	Current Loss: 1.5924
 90%|█████████ | 2360/2613 [29:34<03:10,  1.33it/s]

	Current Loss: 1.5890
 91%|█████████ | 2370/2613 [29:41<03:02,  1.33it/s]

	Current Loss: 1.5865
 91%|█████████ | 2380/2613 [29:49<02:55,  1.33it/s]

	Current Loss: 1.5958
 91%|█████████▏| 2390/2613 [29:56<02:47,  1.33it/s]

	Current Loss: 1.5928
 92%|█████████▏| 2400/2613 [30:04<02:40,  1.33it/s]

	Current Loss: 1.5924
 92%|█████████▏| 2410/2613 [30:11<02:32,  1.33it/s]

	Current Loss: 1.5896
 93%|█████████▎| 2420/2613 [30:19<02:24,  1.33it/s]

	Current Loss: 1.5875
 93%|█████████▎| 2430/2613 [30:26<02:17,  1.33it/s]

	Current Loss: 1.5913
 93%|█████████▎| 2440/2613 [30:34<02:10,  1.33it/s]

	Current Loss: 1.5846
 94%|█████████▍| 2450/2613 [30:41<02:02,  1.33it/s]

	Current Loss: 1.5876
 94%|█████████▍| 2460/2613 [30:49<01:55,  1.33it/s]

	Current Loss: 1.5864
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2461/2613 [30:50<01:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2462/2613 [30:50<01:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2463/2613 [30:51<01:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2464/2613 [30:52<01:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2465/2613 [30:53<01:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2466/2613 [30:53<01:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2467/2613 [30:54<01:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2468/2613 [30:55<01:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 94%|█████████▍| 2469/2613 [30:56<01:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2470/2613 [30:56<01:47,  1.33it/s]

	Current Loss: 1.5916
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2471/2613 [30:57<01:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2472/2613 [30:58<01:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2473/2613 [30:59<01:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2474/2613 [30:59<01:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2475/2613 [31:00<01:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2476/2613 [31:01<01:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2477/2613 [31:02<01:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2478/2613 [31:02<01:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2479/2613 [31:03<01:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2480/2613 [31:04<01:40,  1.33it/s]

	Current Loss: 1.5956
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2481/2613 [31:05<01:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▍| 2482/2613 [31:05<01:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2483/2613 [31:06<01:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2484/2613 [31:07<01:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2485/2613 [31:08<01:36,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2486/2613 [31:08<01:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2487/2613 [31:09<01:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2488/2613 [31:10<01:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2489/2613 [31:11<01:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2490/2613 [31:11<01:32,  1.33it/s]

	Current Loss: 1.5917
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2491/2613 [31:12<01:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2492/2613 [31:13<01:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2493/2613 [31:14<01:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2494/2613 [31:14<01:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 95%|█████████▌| 2495/2613 [31:15<01:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2496/2613 [31:16<01:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2497/2613 [31:17<01:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2498/2613 [31:17<01:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2499/2613 [31:18<01:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2500/2613 [31:19<01:24,  1.33it/s]

	Current Loss: 1.5872
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2501/2613 [31:20<01:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2502/2613 [31:20<01:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2503/2613 [31:21<01:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2504/2613 [31:22<01:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2505/2613 [31:23<01:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2506/2613 [31:23<01:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2507/2613 [31:24<01:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2508/2613 [31:25<01:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2509/2613 [31:26<01:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2510/2613 [31:26<01:17,  1.33it/s]

	Current Loss: 1.5852
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2511/2613 [31:27<01:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2512/2613 [31:28<01:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2513/2613 [31:29<01:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2514/2613 [31:29<01:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▌| 2515/2613 [31:30<01:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2516/2613 [31:31<01:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2517/2613 [31:32<01:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2518/2613 [31:32<01:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2519/2613 [31:33<01:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2520/2613 [31:34<01:09,  1.33it/s]

	Current Loss: 1.5888
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 96%|█████████▋| 2521/2613 [31:35<01:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2522/2613 [31:35<01:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2523/2613 [31:36<01:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2524/2613 [31:37<01:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2525/2613 [31:38<01:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2526/2613 [31:38<01:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2527/2613 [31:39<01:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2528/2613 [31:40<01:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2529/2613 [31:41<01:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2530/2613 [31:41<01:02,  1.33it/s]

	Current Loss: 1.5913
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2531/2613 [31:42<01:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2532/2613 [31:43<01:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2533/2613 [31:44<01:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2534/2613 [31:44<00:59,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2535/2613 [31:45<00:58,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2536/2613 [31:46<00:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2537/2613 [31:47<00:57,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2538/2613 [31:47<00:56,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2539/2613 [31:48<00:55,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2540/2613 [31:49<00:54,  1.33it/s]

	Current Loss: 1.5883
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2541/2613 [31:50<00:54,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2542/2613 [31:50<00:53,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2543/2613 [31:51<00:52,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2544/2613 [31:52<00:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2545/2613 [31:53<00:51,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2546/2613 [31:53<00:50,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 97%|█████████▋| 2547/2613 [31:54<00:49,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2548/2613 [31:55<00:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2549/2613 [31:56<00:48,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2550/2613 [31:56<00:47,  1.33it/s]

	Current Loss: 1.5794
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2551/2613 [31:57<00:46,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2552/2613 [31:58<00:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2553/2613 [31:59<00:45,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2554/2613 [32:00<00:44,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2555/2613 [32:00<00:43,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2556/2613 [32:01<00:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2557/2613 [32:02<00:42,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2558/2613 [32:03<00:41,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2559/2613 [32:03<00:40,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2560/2613 [32:04<00:39,  1.33it/s]

	Current Loss: 1.5902
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2561/2613 [32:05<00:39,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2562/2613 [32:06<00:38,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2563/2613 [32:06<00:37,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2564/2613 [32:07<00:37,  1.30it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2565/2613 [32:08<00:35,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2566/2613 [32:09<00:35,  1.34it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2567/2613 [32:09<00:34,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2568/2613 [32:10<00:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2569/2613 [32:11<00:33,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2570/2613 [32:12<00:32,  1.33it/s]

	Current Loss: 1.5848
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2571/2613 [32:12<00:31,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2572/2613 [32:13<00:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 98%|█████████▊| 2573/2613 [32:14<00:30,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2574/2613 [32:15<00:29,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2575/2613 [32:15<00:28,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2576/2613 [32:16<00:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2577/2613 [32:17<00:27,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2578/2613 [32:18<00:26,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2579/2613 [32:18<00:25,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▊| 2580/2613 [32:19<00:24,  1.33it/s]

	Current Loss: 1.5886
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2581/2613 [32:20<00:24,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2582/2613 [32:21<00:23,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2583/2613 [32:21<00:22,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2584/2613 [32:22<00:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2585/2613 [32:23<00:21,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2586/2613 [32:24<00:20,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2587/2613 [32:24<00:19,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2588/2613 [32:25<00:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2589/2613 [32:26<00:18,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2590/2613 [32:27<00:17,  1.33it/s]

	Current Loss: 1.5924
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2591/2613 [32:27<00:16,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2592/2613 [32:28<00:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2593/2613 [32:29<00:15,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2594/2613 [32:30<00:14,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2595/2613 [32:30<00:13,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2596/2613 [32:31<00:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2597/2613 [32:32<00:12,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2598/2613 [32:33<00:11,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


 99%|█████████▉| 2599/2613 [32:33<00:10,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2600/2613 [32:34<00:09,  1.33it/s]

	Current Loss: 1.5839
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2601/2613 [32:35<00:09,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2602/2613 [32:36<00:08,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2603/2613 [32:36<00:07,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2604/2613 [32:37<00:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2605/2613 [32:38<00:06,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2606/2613 [32:39<00:05,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2607/2613 [32:39<00:04,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2608/2613 [32:40<00:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2609/2613 [32:41<00:03,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2610/2613 [32:42<00:02,  1.33it/s]

	Current Loss: 1.5843
dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2611/2613 [32:42<00:01,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|█████████▉| 2612/2613 [32:43<00:00,  1.33it/s]

dec_inputs shape: torch.Size([256, 128])
dec_outputs shape: torch.Size([256, 128])
model output shape: torch.Size([32768, 64])


100%|██████████| 2613/2613 [32:44<00:00,  1.33it/s]


Epoch 4, Train Loss: 1.6177, Time: 1964.43s


In [None]:
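import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0):
    """Sample one token id from a 1-D logits vector after temperature scaling."""
    logits = logits / temperature
    probs = F.softmax(logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)
    return next_token.item()

def generate_text(model, start_text, length, dataset, temperature=1.0):
    """Autoregressively sample up to `length` characters after `start_text`."""
    model.eval()
    generated = start_text
    # Encode the seed with the dataset vocabulary; shape (1, len(start_text)).
    input_ids = torch.tensor([dataset.stoi[ch] for ch in start_text], device=device).unsqueeze(0)
    with torch.no_grad():
        for _ in range(length):
            logits, _ = model(input_ids)
            # The model returns flattened (B*N, vocab) logits; with B=1 the
            # last row holds the prediction for the final position.
            logits = logits[-1, :]
            predicted_id = sample_next_token(logits, temperature)
            # Skip ids outside the vocabulary and id 0, which the evaluation
            # loss treats as ignore_index. A skipped step emits nothing, so
            # the result can be shorter than `length` characters.
            if predicted_id <= 0 or predicted_id >= len(dataset.itos):
                continue
            generated += dataset.itos[predicted_id]
            next_id = torch.tensor([[predicted_id]], device=device)
            input_ids = torch.cat([input_ids, next_id], dim=1)
    return generated


start_text = "O God, O God!"
# The dataset used here must expose the same stoi/itos mapping that the model was trained with.
dataset = test_dataloader.dataset
generated_text = generate_text(model, start_text, length=100, dataset=dataset)
print(generated_text)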

O God, O God! K co jcto,f jgtghtqog jko- jg pkijv jku dg hcvj: Octecwu hckuqp- hqqv vjg itgcv yjcv uvqng vjku ykv
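
The sampled continuation looks like noise, yet its word lengths and punctuation look like English. One quick check, sketched below, is to shift every letter back by two places in the alphabet. `shift_letters` is a hypothetical helper written only for this check, and its wrap-around modulo 26 is a simplification that does not affect this particular sample. If the shifted text reads as words, that would be consistent with (though it does not prove) `test_dataloader.dataset` using a different character-to-index mapping from the one the model was trained on.

```python
def shift_letters(text, offset):
    """Shift ASCII letters by `offset` places; leave every other character unchanged."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + offset) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

# Continuation copied from the output above, without the seed "O God, O God!".
sample = (
    "K co jcto,f jgtghtqog jko- jg pkijv jku dg hcvj: Octecwu hckuqp- "
    "hqqv vjg itgcv yjcv uvqng vjku ykv"
)

decoded = shift_letters(sample, -2)
print(decoded)
```

On this sample the shifted text begins "I am harm,d herefrome him- he night his be fath:", which reads much more like the training data than the raw output does.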


In [None]:
def evaluate(model, data_loader, criterion):
    """Return the mean per-batch loss of `model` over `data_loader`."""
    model.eval()
    total_loss = 0.0
    with torch.no_grad():
        for dec_inputs, dec_outputs in data_loader:
            dec_inputs, dec_outputs = dec_inputs.to(device), dec_outputs.to(device)
            # The model returns flattened (B*N, vocab) logits, so the targets
            # are flattened to (B*N,) to match.
            outputs, _ = model(dec_inputs)
            loss = criterion(outputs, dec_outputs.view(-1))
            total_loss += loss.item()
    return total_loss / len(data_loader)

# Targets equal to 0 are excluded from the loss via ignore_index.
test_loss = evaluate(model, test_dataloader, nn.CrossEntropyLoss(ignore_index=0).to(device))
print("The loss of test data set is:", test_loss)

The loss of test data set is: 7.202136781297881
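
To put the test loss in context, the sketch below compares it with the cross-entropy of a uniform guess over the vocabulary. The vocabulary size of 64 is taken from the last dimension of the logged model output shape, and the training loss is the "Epoch 4" value from the log; the variable names are chosen here for illustration. A test loss above the uniform baseline means the model scores worse than random guessing on this data, which may point to a problem in how the test data is encoded rather than to ordinary overfitting.

```python
import math

vocab_size = 64                 # last dimension of the logged model output shape
train_loss = 1.6177             # "Epoch 4, Train Loss" from the training log
test_loss = 7.202136781297881   # value printed by the evaluation cell

# Cross-entropy (in nats) of guessing uniformly over the vocabulary: ln(64), about 4.16.
uniform_loss = math.log(vocab_size)

print(f"uniform baseline loss: {uniform_loss:.4f}")
print(f"train perplexity:      {math.exp(train_loss):.2f}")
print(f"test perplexity:       {math.exp(test_loss):.2f}")
print("test loss above uniform baseline:", test_loss > uniform_loss)
```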
