<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych. Program Operacyjny Polska Cyfrowa na lata 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Projekt współfinansowany ze środków Unii Europejskiej w ramach Europejskiego Funduszu Rozwoju Regionalnego
Program Operacyjny Polska Cyfrowa na lata 2014-2020,
Oś Priorytetowa nr 3 "Cyfrowe kompetencje społeczeństwa" Działanie  nr 3.2 "Innowacyjne rozwiązania na rzecz aktywizacji cyfrowej"
Tytuł projektu:  „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

# TL;DR

1. In this lab scenario you will have a chance to compare performance of the classic RNN and LSTM on a toy example.
2. This toy example will show that maintaining memory over even 20 steps is non-trivial.
3. Finally, you will see how curriculum learning may allow to train a model on larger sequences.

# Problem definition

Here we consider a toy example, where the goal is to discriminate between two types of binary sequences:
* [Type 0] a sequence with exactly one zero (remaining entries are equal to one).
* [Type 1] a sequence full of ones,

We are especially interested in the performance of the trained models on discriminating between a sequence full of ones versus a sequence with leading zero followed by ones. Note that in this case the goal of the model is to output the first element of the sequence, as the label (sequence type) is fully determined by the first element of the sequence.

#Implementation

## Importing torch

Install `torch` and `torchvision`

In [1]:
!pip3 install torch torchvision



In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import random
from typing import List
from tqdm import tqdm

torch.manual_seed(1)

<torch._C.Generator at 0x7848d7359ad0>

## Understand dimensionality

Check the input and output specification [LSTM](https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html) and [RNN](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html). The following snippet shows how we can process
a sequence by LSTM and output a vector of size `hidden_dim` after reading
each token of the sequence.

In [None]:
hidden_dim = 5
lstm = nn.LSTM(1, hidden_dim)  # Input sequence contains elements - vectors of size 1

# create a random sequence
sequence = [torch.randn(1) for _ in range(10)]

# initialize the hidden state (including cell state)
hidden = (torch.zeros(1, 1, 5),
          torch.zeros(1, 1, 5))

for i, elem in enumerate(sequence):
  # we are processing only a single element of the sequence, and there
  # is only one sample (sequence) in the batch, the third one
  # corresponds to the fact that our sequence contains elemenents,
  # which can be treated as vectors of size 1
  out, hidden = lstm(elem.view(1, 1, 1), hidden)
  print(f'i={i} out={out.detach()}')
print(f'Final hidden state={hidden[0].detach()} cell state={hidden[1].detach()}')

i=0 out=tensor([[[ 0.0396,  0.0791,  0.0660,  0.1606, -0.0123]]])
i=1 out=tensor([[[ 0.0548,  0.1250,  0.1118,  0.2201, -0.0197]]])
i=2 out=tensor([[[ 0.0631,  0.1464,  0.1355,  0.2458, -0.0244]]])
i=3 out=tensor([[[ 0.0883,  0.1290,  0.1077,  0.2996, -0.0124]]])
i=4 out=tensor([[[ 0.0515,  0.1840,  0.1855,  0.2289, -0.0334]]])
i=5 out=tensor([[[ 0.0656,  0.1654,  0.1625,  0.2604, -0.0362]]])
i=6 out=tensor([[[ 0.0399,  0.2011,  0.2175,  0.2154, -0.0571]]])
i=7 out=tensor([[[ 0.0299,  0.2115,  0.2359,  0.2058, -0.0774]]])
i=8 out=tensor([[[ 0.0605,  0.1727,  0.1826,  0.2573, -0.0583]]])
i=9 out=tensor([[[ 0.0483,  0.1949,  0.2132,  0.2282, -0.0604]]])
Final hidden state=tensor([[[ 0.0483,  0.1949,  0.2132,  0.2282, -0.0604]]]) cell state=tensor([[[ 0.1221,  0.5815,  0.5050,  0.5271, -0.0939]]])


## To implement

Process the whole sequence all at once by calling `lstm` only once and check that the output is exactly the same as above (remember to initialize the hidden state the same way).

In [None]:
# #########################################################
#                    To implement
# #########################################################
sequence_torch = torch.stack(sequence)

hidden_lstm = (torch.zeros(1, 1, 5),
               torch.zeros(1, 1, 5))

# input_tensor shape: (L, N, H_in), L=sequence length, N=batch size, H_in=input_size
out, hidden_lstm = lstm(sequence_torch.view(10,1,1), hidden_lstm)
print(torch.allclose(hidden_lstm[0], hidden[0])  and torch.allclose(hidden_lstm[1], hidden[1]))

True


## Training a model

Below we define a very simple model, which is a single layer of LSTM, where the output in each time step is processed by relu followed by a single fully connected layer, the output of which is a single number. We are going
to use the number generated after reading the last element of the sequence,
which will serve as the logit for our classification problem.

In [3]:
class Model(nn.Module):

    def __init__(self, hidden_dim: int):
        super(Model, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(1, self.hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)
        sequence_len = x.shape[0]
        logits = self.hidden2label(F.relu(out[-1].view(-1)))
        return logits

Below is a training loop, where we only train on the two hardest examples.

In [4]:
SEQUENCE_LEN = 10

# Pairs of (sequence, label)
HARD_EXAMPLES = [([0.]+(SEQUENCE_LEN-1)*[1.], 0),
                 (SEQUENCE_LEN*[1.], 1)]

# 2
def generate_examples(sequence_len: int):
    sequences = []
    for _ in range(100):
        sequences.append(([0.] + [random.choice([0., 1.]) for _ in range(sequence_len - 1)], 0))
        sequences.append(([1.] * sequence_len, 1))
    return sequences


def eval_on_hard_examples(model: nn.Module) -> List[float]:
    with torch.no_grad():
        logits = []
        for sequence in HARD_EXAMPLES:
            input = torch.tensor(sequence[0]).view(-1, 1, 1)
            logit = model(input)
            logits.append(logit.detach())
        print(f'Logits for hard examples={logits}')
        return logits


def train_model(hidden_dim: int, lr: float, num_steps:int = 10000, examples = HARD_EXAMPLES) -> None:
    model = Model(hidden_dim=hidden_dim)
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.99)

    pbar = tqdm(range(num_steps))
    for step in pbar:
        if step % 100 == 0:
            logits = eval_on_hard_examples(model)
            pbar.set_postfix(logits=logits)

        for sequence, label in examples:
            model.zero_grad()
            logit = model(torch.tensor(sequence).view(-1, 1, 1))

            loss = loss_function(logit.view(-1), torch.tensor([label], dtype=torch.float32))
            loss.backward()

            optimizer.step()

In [None]:
train_model(hidden_dim=20, lr=0.01, num_steps=10000)

  0%|          | 37/10000 [00:00<01:35, 104.49it/s, logits=[tensor([-0.0075]), tensor([-0.0076])]]

Logits for hard examples=[tensor([-0.0075]), tensor([-0.0076])]


  1%|▏         | 137/10000 [00:00<00:39, 252.28it/s, logits=[tensor([0.0218]), tensor([0.0218])]]

Logits for hard examples=[tensor([0.0218]), tensor([0.0218])]


  2%|▏         | 241/10000 [00:01<00:31, 311.48it/s, logits=[tensor([0.0026]), tensor([0.0026])]]

Logits for hard examples=[tensor([0.0026]), tensor([0.0026])]


  3%|▎         | 343/10000 [00:01<00:30, 318.10it/s, logits=[tensor([-0.0013]), tensor([-0.0012])]]

Logits for hard examples=[tensor([-0.0013]), tensor([-0.0012])]


  4%|▍         | 434/10000 [00:01<00:37, 255.82it/s, logits=[tensor([0.0017]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0019])]


  5%|▌         | 542/10000 [00:02<00:36, 260.53it/s, logits=[tensor([0.0017]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0019])]


  6%|▋         | 650/10000 [00:02<00:36, 258.08it/s, logits=[tensor([0.0013]), tensor([0.0016])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0016])]


  7%|▋         | 727/10000 [00:02<00:37, 247.13it/s, logits=[tensor([0.0012]), tensor([0.0018])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0018])]


  8%|▊         | 825/10000 [00:03<00:39, 231.95it/s, logits=[tensor([0.0002]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0002]), tensor([0.0022])]


  9%|▉         | 946/10000 [00:03<00:38, 234.97it/s, logits=[tensor([-0.2752]), tensor([-0.2735])]]

Logits for hard examples=[tensor([-0.2752]), tensor([-0.2735])]


 11%|█         | 1053/10000 [00:04<00:34, 258.79it/s, logits=[tensor([-0.0965]), tensor([-0.0964])]]

Logits for hard examples=[tensor([-0.0965]), tensor([-0.0964])]


 12%|█▏        | 1158/10000 [00:04<00:27, 319.48it/s, logits=[tensor([0.0161]), tensor([0.0161])]]

Logits for hard examples=[tensor([0.0161]), tensor([0.0161])]


 13%|█▎        | 1264/10000 [00:04<00:25, 336.59it/s, logits=[tensor([0.0146]), tensor([0.0146])]]

Logits for hard examples=[tensor([0.0146]), tensor([0.0146])]


 14%|█▎        | 1371/10000 [00:05<00:24, 347.24it/s, logits=[tensor([0.0075]), tensor([0.0076])]]

Logits for hard examples=[tensor([0.0075]), tensor([0.0076])]


 14%|█▍        | 1441/10000 [00:05<00:24, 344.36it/s, logits=[tensor([0.0040]), tensor([0.0041])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0041])]


 16%|█▌        | 1550/10000 [00:05<00:25, 335.98it/s, logits=[tensor([0.0026]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0026]), tensor([0.0028])]


 17%|█▋        | 1658/10000 [00:06<00:23, 349.08it/s, logits=[tensor([0.0021]), tensor([0.0023])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0023])]


 18%|█▊        | 1768/10000 [00:06<00:23, 353.62it/s, logits=[tensor([0.0019]), tensor([0.0024])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0024])]


 18%|█▊        | 1841/10000 [00:06<00:23, 346.29it/s, logits=[tensor([0.0027]), tensor([0.0047])]]

Logits for hard examples=[tensor([0.0027]), tensor([0.0047])]


 19%|█▉        | 1946/10000 [00:06<00:24, 333.79it/s, logits=[tensor([-0.2310]), tensor([-0.2310])]]

Logits for hard examples=[tensor([-0.2310]), tensor([-0.2310])]


 21%|██        | 2056/10000 [00:07<00:22, 353.71it/s, logits=[tensor([0.1210]), tensor([0.1211])]]

Logits for hard examples=[tensor([0.1210]), tensor([0.1211])]


 22%|██▏       | 2164/10000 [00:07<00:23, 340.59it/s, logits=[tensor([-0.0484]), tensor([-0.0483])]]

Logits for hard examples=[tensor([-0.0484]), tensor([-0.0483])]


 22%|██▏       | 2234/10000 [00:07<00:23, 326.79it/s, logits=[tensor([0.0226]), tensor([0.0227])]]

Logits for hard examples=[tensor([0.0226]), tensor([0.0227])]


 23%|██▎       | 2341/10000 [00:08<00:22, 341.13it/s, logits=[tensor([-0.0033]), tensor([-0.0031])]]

Logits for hard examples=[tensor([-0.0033]), tensor([-0.0031])]


 24%|██▍       | 2446/10000 [00:08<00:22, 336.23it/s, logits=[tensor([0.0036]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0036]), tensor([0.0039])]


 26%|██▌       | 2550/10000 [00:08<00:23, 323.32it/s, logits=[tensor([0.0027]), tensor([0.0031])]]

Logits for hard examples=[tensor([0.0027]), tensor([0.0031])]


 27%|██▋       | 2656/10000 [00:09<00:21, 338.61it/s, logits=[tensor([0.0017]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0030])]


 28%|██▊       | 2765/10000 [00:09<00:20, 351.20it/s, logits=[tensor([1.8206]), tensor([1.9288])]]

Logits for hard examples=[tensor([1.8206]), tensor([1.9288])]


 28%|██▊       | 2836/10000 [00:09<00:20, 345.92it/s, logits=[tensor([0.9556]), tensor([0.9556])]]

Logits for hard examples=[tensor([0.9556]), tensor([0.9556])]


 29%|██▉       | 2941/10000 [00:09<00:21, 324.87it/s, logits=[tensor([-0.0719]), tensor([-0.0719])]]

Logits for hard examples=[tensor([-0.0719]), tensor([-0.0719])]


 30%|███       | 3048/10000 [00:10<00:20, 342.32it/s, logits=[tensor([-0.1311]), tensor([-0.1311])]]

Logits for hard examples=[tensor([-0.1311]), tensor([-0.1311])]


 32%|███▏      | 3153/10000 [00:10<00:20, 341.75it/s, logits=[tensor([0.0118]), tensor([0.0118])]]

Logits for hard examples=[tensor([0.0118]), tensor([0.0118])]


 33%|███▎      | 3256/10000 [00:10<00:20, 321.67it/s, logits=[tensor([0.0204]), tensor([0.0204])]]

Logits for hard examples=[tensor([0.0204]), tensor([0.0204])]


 34%|███▎      | 3358/10000 [00:11<00:20, 330.27it/s, logits=[tensor([0.0012]), tensor([0.0012])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0012])]


 35%|███▍      | 3461/10000 [00:11<00:19, 330.07it/s, logits=[tensor([0.0001]), tensor([0.0001])]]

Logits for hard examples=[tensor([0.0001]), tensor([0.0001])]


 35%|███▌      | 3529/10000 [00:11<00:20, 310.63it/s, logits=[tensor([0.0027]), tensor([0.0027])]]

Logits for hard examples=[tensor([0.0027]), tensor([0.0027])]


 37%|███▋      | 3664/10000 [00:12<00:19, 326.49it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 38%|███▊      | 3767/10000 [00:12<00:18, 335.66it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 38%|███▊      | 3834/10000 [00:12<00:18, 326.32it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 40%|███▉      | 3967/10000 [00:13<00:18, 319.97it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 40%|████      | 4037/10000 [00:13<00:18, 330.21it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 41%|████▏     | 4138/10000 [00:13<00:18, 325.32it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 42%|████▏     | 4240/10000 [00:13<00:18, 316.54it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 43%|████▎     | 4342/10000 [00:14<00:17, 329.32it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 44%|████▍     | 4437/10000 [00:14<00:20, 270.93it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 45%|████▌     | 4543/10000 [00:15<00:22, 244.48it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 46%|████▋     | 4646/10000 [00:15<00:21, 244.52it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 47%|████▋     | 4745/10000 [00:15<00:22, 229.34it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 48%|████▊     | 4840/10000 [00:16<00:22, 228.00it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 49%|████▉     | 4934/10000 [00:16<00:23, 219.14it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 50%|█████     | 5025/10000 [00:17<00:22, 220.50it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 52%|█████▏    | 5161/10000 [00:17<00:15, 307.93it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 53%|█████▎    | 5258/10000 [00:17<00:15, 312.74it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 54%|█████▎    | 5358/10000 [00:18<00:14, 324.20it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 55%|█████▍    | 5459/10000 [00:18<00:13, 326.90it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 55%|█████▌    | 5528/10000 [00:18<00:14, 312.42it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 56%|█████▋    | 5631/10000 [00:19<00:13, 313.67it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 58%|█████▊    | 5766/10000 [00:19<00:12, 325.85it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 58%|█████▊    | 5832/10000 [00:19<00:13, 314.13it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 60%|█████▉    | 5960/10000 [00:20<00:13, 305.37it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 61%|██████    | 6060/10000 [00:20<00:12, 319.32it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 62%|██████▏   | 6158/10000 [00:20<00:12, 308.09it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 63%|██████▎   | 6258/10000 [00:21<00:11, 320.62it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 64%|██████▎   | 6357/10000 [00:21<00:11, 322.42it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 65%|██████▍   | 6458/10000 [00:21<00:10, 328.75it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 66%|██████▌   | 6559/10000 [00:21<00:10, 324.57it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 67%|██████▋   | 6659/10000 [00:22<00:10, 317.64it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 68%|██████▊   | 6762/10000 [00:22<00:09, 332.46it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 69%|██████▊   | 6865/10000 [00:22<00:09, 330.51it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 69%|██████▉   | 6933/10000 [00:23<00:09, 315.63it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 70%|███████   | 7036/10000 [00:23<00:08, 330.39it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 71%|███████▏  | 7137/10000 [00:23<00:08, 322.38it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 72%|███████▏  | 7241/10000 [00:24<00:08, 333.23it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 73%|███████▎  | 7343/10000 [00:24<00:07, 332.59it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 74%|███████▍  | 7445/10000 [00:24<00:07, 324.90it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 75%|███████▌  | 7547/10000 [00:24<00:07, 330.32it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 77%|███████▋  | 7651/10000 [00:25<00:07, 330.25it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 78%|███████▊  | 7753/10000 [00:25<00:06, 327.07it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 79%|███████▊  | 7853/10000 [00:25<00:06, 318.50it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 80%|███████▉  | 7952/10000 [00:26<00:06, 320.53it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 81%|████████  | 8055/10000 [00:26<00:05, 332.23it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 82%|████████▏ | 8157/10000 [00:26<00:05, 324.97it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 83%|████████▎ | 8259/10000 [00:27<00:05, 309.55it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 83%|████████▎ | 8347/10000 [00:27<00:06, 258.22it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 84%|████████▍ | 8423/10000 [00:27<00:06, 233.52it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 85%|████████▌ | 8524/10000 [00:28<00:06, 232.09it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 86%|████████▋ | 8627/10000 [00:28<00:05, 243.14it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 87%|████████▋ | 8729/10000 [00:29<00:05, 243.92it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 88%|████████▊ | 8826/10000 [00:29<00:05, 222.63it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 90%|████████▉ | 8956/10000 [00:30<00:03, 265.57it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 91%|█████████ | 9054/10000 [00:30<00:03, 300.48it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 92%|█████████▏| 9153/10000 [00:30<00:02, 306.24it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 93%|█████████▎| 9252/10000 [00:31<00:02, 321.41it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 94%|█████████▎| 9354/10000 [00:31<00:01, 328.08it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 95%|█████████▍| 9451/10000 [00:31<00:01, 306.50it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 96%|█████████▌| 9552/10000 [00:32<00:01, 322.40it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 97%|█████████▋| 9652/10000 [00:32<00:01, 327.36it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 98%|█████████▊| 9753/10000 [00:32<00:00, 314.17it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


 99%|█████████▊| 9852/10000 [00:32<00:00, 319.75it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


100%|█████████▉| 9952/10000 [00:33<00:00, 321.53it/s, logits=[tensor([0.0025]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0025]), tensor([0.0025])]


100%|██████████| 10000/10000 [00:33<00:00, 299.29it/s, logits=[tensor([0.0025]), tensor([0.0025])]]


## To implement

1. Check for what values of `SEQUENCE_LEN` the model is able to discriminate betweeh the two hard examples (after training).
2. Instead of training on `HARD_EXAMPLES` only, modify the training loop to train on sequences where zero may be in any position of the sequence (so any valid sequence of `Type 0`, not just the hardest one). After modifying the training loop check for what values of `SEQUENCE_LEN` you can train the model successfully.
3. Replace LSTM by a classic RNN and check for what values of `SEQUENCE_LEN` you can train the model successfully.
4. Write a proper curricullum learning loop, where in a loop you consider longer and longer sequences, where expansion of the sequence length happens only after the model is trained successfully on the current length.

Note that for steps 2-4 you may need to change the value of `num_steps`.

In [5]:
#1
for SEQUENCE_LEN in range(2, 12, 1):
    print(f"SEQUENCE_LEN={SEQUENCE_LEN}")
    HARD_EXAMPLES = [([0.]+(SEQUENCE_LEN-1)*[1.], 0), (SEQUENCE_LEN*[1.], 1)]
    train_model(hidden_dim=20, lr=0.01, num_steps=10000)

SEQUENCE_LEN=2


  0%|          | 1/10000 [00:00<1:08:16,  2.44it/s, logits=[tensor([-0.0188]), tensor([-0.0101])]]

Logits for hard examples=[tensor([-0.0188]), tensor([-0.0101])]


  2%|▏         | 153/10000 [00:00<00:40, 243.59it/s, logits=[tensor([0.0099]), tensor([0.0172])]]

Logits for hard examples=[tensor([0.0099]), tensor([0.0172])]


  2%|▏         | 216/10000 [00:01<00:41, 236.05it/s, logits=[tensor([-0.0105]), tensor([-0.0046])]]

Logits for hard examples=[tensor([-0.0105]), tensor([-0.0046])]


  3%|▎         | 333/10000 [00:01<00:48, 200.76it/s, logits=[tensor([-0.0161]), tensor([-0.0106])]]

Logits for hard examples=[tensor([-0.0161]), tensor([-0.0106])]


  4%|▍         | 422/10000 [00:02<00:46, 203.91it/s, logits=[tensor([-0.0155]), tensor([-0.0100])]]

Logits for hard examples=[tensor([-0.0155]), tensor([-0.0100])]


  5%|▌         | 531/10000 [00:02<00:45, 206.49it/s, logits=[tensor([-0.0185]), tensor([-0.0127])]]

Logits for hard examples=[tensor([-0.0185]), tensor([-0.0127])]


  6%|▋         | 641/10000 [00:03<00:45, 203.53it/s, logits=[tensor([-0.0231]), tensor([-0.0167])]]

Logits for hard examples=[tensor([-0.0231]), tensor([-0.0167])]


  7%|▋         | 727/10000 [00:03<00:46, 200.82it/s, logits=[tensor([-0.0299]), tensor([-0.0219])]]

Logits for hard examples=[tensor([-0.0299]), tensor([-0.0219])]


  8%|▊         | 848/10000 [00:04<00:37, 242.07it/s, logits=[tensor([-0.0431]), tensor([-0.0309])]]

Logits for hard examples=[tensor([-0.0431]), tensor([-0.0309])]


  9%|▉         | 931/10000 [00:04<00:34, 259.41it/s, logits=[tensor([-0.2297]), tensor([-0.2099])]]

Logits for hard examples=[tensor([-0.2297]), tensor([-0.2099])]


 11%|█         | 1054/10000 [00:04<00:30, 294.94it/s, logits=[tensor([-0.0153]), tensor([-0.0010])]]

Logits for hard examples=[tensor([-0.0153]), tensor([-0.0010])]


 11%|█▏        | 1146/10000 [00:05<00:29, 298.02it/s, logits=[tensor([0.0674]), tensor([0.0818])]]

Logits for hard examples=[tensor([0.0674]), tensor([0.0818])]


 12%|█▏        | 1238/10000 [00:05<00:31, 280.42it/s, logits=[tensor([0.0586]), tensor([0.0737])]]

Logits for hard examples=[tensor([0.0586]), tensor([0.0737])]


 13%|█▎        | 1336/10000 [00:05<00:28, 307.10it/s, logits=[tensor([0.0477]), tensor([0.0638])]]

Logits for hard examples=[tensor([0.0477]), tensor([0.0638])]


 14%|█▍        | 1432/10000 [00:06<00:28, 302.17it/s, logits=[tensor([0.0409]), tensor([0.0581])]]

Logits for hard examples=[tensor([0.0409]), tensor([0.0581])]


 16%|█▌        | 1552/10000 [00:06<00:30, 280.49it/s, logits=[tensor([0.0361]), tensor([0.0549])]]

Logits for hard examples=[tensor([0.0361]), tensor([0.0549])]


 16%|█▋        | 1646/10000 [00:07<00:28, 292.42it/s, logits=[tensor([0.0318]), tensor([0.0532])]]

Logits for hard examples=[tensor([0.0318]), tensor([0.0532])]


 17%|█▋        | 1743/10000 [00:07<00:26, 307.14it/s, logits=[tensor([0.0262]), tensor([0.0522])]]

Logits for hard examples=[tensor([0.0262]), tensor([0.0522])]


 18%|█▊        | 1839/10000 [00:07<00:27, 301.69it/s, logits=[tensor([0.0082]), tensor([0.0487])]]

Logits for hard examples=[tensor([0.0082]), tensor([0.0487])]


 19%|█▉        | 1936/10000 [00:07<00:25, 310.22it/s, logits=[tensor([-0.0692]), tensor([-0.0461])]]

Logits for hard examples=[tensor([-0.0692]), tensor([-0.0461])]


 20%|██        | 2035/10000 [00:08<00:24, 319.81it/s, logits=[tensor([0.2392]), tensor([0.2335])]]

Logits for hard examples=[tensor([0.2392]), tensor([0.2335])]


 21%|██▏       | 2133/10000 [00:08<00:25, 305.78it/s, logits=[tensor([0.0915]), tensor([0.0823])]]

Logits for hard examples=[tensor([0.0915]), tensor([0.0823])]


 23%|██▎       | 2257/10000 [00:08<00:25, 305.19it/s, logits=[tensor([0.1634]), tensor([0.1542])]]

Logits for hard examples=[tensor([0.1634]), tensor([0.1542])]


 24%|██▎       | 2355/10000 [00:09<00:24, 315.19it/s, logits=[tensor([0.1402]), tensor([0.1335])]]

Logits for hard examples=[tensor([0.1402]), tensor([0.1335])]


 25%|██▍       | 2451/10000 [00:09<00:24, 304.05it/s, logits=[tensor([0.1476]), tensor([0.1431])]]

Logits for hard examples=[tensor([0.1476]), tensor([0.1431])]


 25%|██▌       | 2549/10000 [00:09<00:23, 310.86it/s, logits=[tensor([0.1467]), tensor([0.1457])]]

Logits for hard examples=[tensor([0.1467]), tensor([0.1457])]


 26%|██▋       | 2647/10000 [00:10<00:23, 315.13it/s, logits=[tensor([0.1446]), tensor([0.1510])]]

Logits for hard examples=[tensor([0.1446]), tensor([0.1510])]


 27%|██▋       | 2740/10000 [00:10<00:25, 286.70it/s, logits=[tensor([2.4788]), tensor([2.7306])]]

Logits for hard examples=[tensor([2.4788]), tensor([2.7306])]


 28%|██▊       | 2832/10000 [00:10<00:24, 289.33it/s, logits=[tensor([1.0064]), tensor([1.0061])]]

Logits for hard examples=[tensor([1.0064]), tensor([1.0061])]


 30%|██▉       | 2952/10000 [00:11<00:25, 274.27it/s, logits=[tensor([0.0001]), tensor([-0.0001])]]

Logits for hard examples=[tensor([0.0001]), tensor([-0.0001])]


 30%|███       | 3045/10000 [00:11<00:24, 282.09it/s, logits=[tensor([-0.0577]), tensor([-0.0579])]]

Logits for hard examples=[tensor([-0.0577]), tensor([-0.0579])]


 31%|███▏      | 3136/10000 [00:11<00:26, 263.99it/s, logits=[tensor([0.0826]), tensor([0.0824])]]

Logits for hard examples=[tensor([0.0826]), tensor([0.0824])]


 32%|███▏      | 3230/10000 [00:12<00:24, 277.76it/s, logits=[tensor([0.0910]), tensor([0.0908])]]

Logits for hard examples=[tensor([0.0910]), tensor([0.0908])]


 34%|███▎      | 3350/10000 [00:12<00:23, 285.43it/s, logits=[tensor([0.0722]), tensor([0.0720])]]

Logits for hard examples=[tensor([0.0722]), tensor([0.0720])]


 34%|███▍      | 3440/10000 [00:13<00:22, 289.39it/s, logits=[tensor([0.0712]), tensor([0.0710])]]

Logits for hard examples=[tensor([0.0712]), tensor([0.0710])]


 35%|███▌      | 3530/10000 [00:13<00:22, 292.01it/s, logits=[tensor([0.0737]), tensor([0.0735])]]

Logits for hard examples=[tensor([0.0737]), tensor([0.0735])]


 36%|███▋      | 3648/10000 [00:13<00:22, 283.82it/s, logits=[tensor([0.0738]), tensor([0.0736])]]

Logits for hard examples=[tensor([0.0738]), tensor([0.0736])]


 37%|███▋      | 3736/10000 [00:14<00:22, 282.91it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 38%|███▊      | 3817/10000 [00:14<00:29, 207.51it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 39%|███▉      | 3926/10000 [00:15<00:30, 197.70it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 40%|████      | 4012/10000 [00:15<00:35, 170.81it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 41%|████▏     | 4126/10000 [00:16<00:53, 110.62it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 42%|████▏     | 4225/10000 [00:17<00:32, 176.03it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 43%|████▎     | 4336/10000 [00:17<00:22, 249.40it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 45%|████▍     | 4452/10000 [00:18<00:20, 270.92it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 45%|████▌     | 4537/10000 [00:18<00:20, 267.95it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 47%|████▋     | 4662/10000 [00:18<00:17, 297.20it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 48%|████▊     | 4751/10000 [00:19<00:18, 279.55it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 48%|████▊     | 4837/10000 [00:19<00:18, 273.03it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 50%|████▉     | 4960/10000 [00:19<00:17, 292.83it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 50%|█████     | 5049/10000 [00:20<00:17, 281.89it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 51%|█████▏    | 5136/10000 [00:20<00:18, 265.41it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 53%|█████▎    | 5260/10000 [00:21<00:16, 293.76it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 54%|█████▎    | 5350/10000 [00:21<00:16, 286.21it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 54%|█████▍    | 5438/10000 [00:21<00:16, 276.26it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 55%|█████▌    | 5530/10000 [00:22<00:15, 288.34it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 56%|█████▋    | 5649/10000 [00:22<00:16, 271.93it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 57%|█████▋    | 5734/10000 [00:22<00:15, 273.52it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 59%|█████▊    | 5859/10000 [00:23<00:14, 293.81it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 59%|█████▉    | 5947/10000 [00:23<00:15, 258.89it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 60%|██████    | 6028/10000 [00:23<00:15, 248.67it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 61%|██████▏   | 6141/10000 [00:24<00:14, 272.01it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 63%|██████▎   | 6252/10000 [00:24<00:14, 267.67it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 63%|██████▎   | 6337/10000 [00:25<00:13, 279.67it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 65%|██████▍   | 6454/10000 [00:25<00:12, 277.89it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 65%|██████▌   | 6541/10000 [00:25<00:12, 270.28it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 66%|██████▋   | 6633/10000 [00:26<00:11, 290.87it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 67%|██████▋   | 6724/10000 [00:26<00:11, 289.87it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 68%|██████▊   | 6843/10000 [00:26<00:11, 281.20it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 69%|██████▉   | 6932/10000 [00:27<00:10, 286.73it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 70%|███████   | 7022/10000 [00:27<00:12, 238.56it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 71%|███████▏  | 7137/10000 [00:28<00:14, 200.18it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 72%|███████▏  | 7219/10000 [00:28<00:16, 172.36it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 73%|███████▎  | 7325/10000 [00:29<00:13, 199.82it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 74%|███████▍  | 7423/10000 [00:29<00:14, 182.79it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 75%|███████▌  | 7525/10000 [00:30<00:12, 195.37it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 76%|███████▋  | 7637/10000 [00:30<00:10, 219.77it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 78%|███████▊  | 7754/10000 [00:31<00:08, 269.94it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 78%|███████▊  | 7836/10000 [00:31<00:08, 260.32it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 79%|███████▉  | 7943/10000 [00:31<00:07, 257.61it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 81%|████████  | 8054/10000 [00:32<00:07, 270.84it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 81%|████████▏ | 8138/10000 [00:32<00:07, 255.50it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 83%|████████▎ | 8251/10000 [00:33<00:06, 275.49it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 83%|████████▎ | 8336/10000 [00:33<00:06, 271.70it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 84%|████████▍ | 8444/10000 [00:33<00:06, 251.37it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 85%|████████▌ | 8531/10000 [00:34<00:05, 272.81it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 86%|████████▋ | 8647/10000 [00:34<00:05, 258.61it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 87%|████████▋ | 8734/10000 [00:34<00:04, 271.12it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 88%|████████▊ | 8827/10000 [00:35<00:04, 280.83it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 89%|████████▉ | 8944/10000 [00:35<00:04, 263.89it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 90%|█████████ | 9031/10000 [00:35<00:03, 273.07it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 92%|█████████▏| 9150/10000 [00:36<00:02, 287.11it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 92%|█████████▏| 9239/10000 [00:36<00:02, 279.79it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 94%|█████████▎| 9358/10000 [00:37<00:02, 279.38it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 94%|█████████▍| 9446/10000 [00:37<00:01, 284.11it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 95%|█████████▌| 9535/10000 [00:37<00:01, 282.61it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 97%|█████████▋| 9652/10000 [00:38<00:01, 281.66it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 97%|█████████▋| 9739/10000 [00:38<00:00, 278.02it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 99%|█████████▊| 9860/10000 [00:38<00:00, 291.29it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


 99%|█████████▉| 9948/10000 [00:39<00:00, 283.87it/s, logits=[tensor([0.0735]), tensor([0.0733])]]

Logits for hard examples=[tensor([0.0735]), tensor([0.0733])]


100%|██████████| 10000/10000 [00:39<00:00, 253.67it/s, logits=[tensor([0.0735]), tensor([0.0733])]]


SEQUENCE_LEN=3


  0%|          | 23/10000 [00:00<00:43, 227.34it/s, logits=[tensor([-0.1682]), tensor([-0.1709])]]

Logits for hard examples=[tensor([-0.1682]), tensor([-0.1709])]


  1%|▏         | 148/10000 [00:00<00:31, 308.01it/s, logits=[tensor([0.0420]), tensor([0.0421])]]

Logits for hard examples=[tensor([0.0420]), tensor([0.0421])]


  2%|▏         | 241/10000 [00:00<00:32, 300.51it/s, logits=[tensor([0.0087]), tensor([0.0091])]]

Logits for hard examples=[tensor([0.0087]), tensor([0.0091])]


  3%|▎         | 333/10000 [00:01<00:37, 256.96it/s, logits=[tensor([-0.0106]), tensor([-0.0100])]]

Logits for hard examples=[tensor([-0.0106]), tensor([-0.0100])]


  4%|▍         | 434/10000 [00:01<00:41, 230.69it/s, logits=[tensor([-7.6549e-05]), tensor([0.0008])]]

Logits for hard examples=[tensor([-7.6549e-05]), tensor([0.0008])]


  5%|▌         | 527/10000 [00:02<00:42, 220.52it/s, logits=[tensor([-0.0030]), tensor([-0.0019])]]

Logits for hard examples=[tensor([-0.0030]), tensor([-0.0019])]


  6%|▋         | 627/10000 [00:02<00:39, 240.03it/s, logits=[tensor([-0.0043]), tensor([-0.0030])]]

Logits for hard examples=[tensor([-0.0043]), tensor([-0.0030])]


  7%|▋         | 728/10000 [00:02<00:39, 233.97it/s, logits=[tensor([-0.0043]), tensor([-0.0029])]]

Logits for hard examples=[tensor([-0.0043]), tensor([-0.0029])]


  8%|▊         | 843/10000 [00:03<00:43, 212.21it/s, logits=[tensor([-0.0055]), tensor([-0.0038])]]

Logits for hard examples=[tensor([-0.0055]), tensor([-0.0038])]


  9%|▉         | 931/10000 [00:03<00:46, 193.59it/s, logits=[tensor([-0.0070]), tensor([-0.0048])]]

Logits for hard examples=[tensor([-0.0070]), tensor([-0.0048])]


 11%|█         | 1063/10000 [00:04<00:33, 266.92it/s, logits=[tensor([-0.0090]), tensor([-0.0063])]]

Logits for hard examples=[tensor([-0.0090]), tensor([-0.0063])]


 12%|█▏        | 1160/10000 [00:04<00:29, 298.84it/s, logits=[tensor([-0.0131]), tensor([-0.0097])]]

Logits for hard examples=[tensor([-0.0131]), tensor([-0.0097])]


 13%|█▎        | 1251/10000 [00:05<00:30, 289.43it/s, logits=[tensor([-0.0124]), tensor([-0.0090])]]

Logits for hard examples=[tensor([-0.0124]), tensor([-0.0090])]


 13%|█▎        | 1346/10000 [00:05<00:28, 306.51it/s, logits=[tensor([-0.0237]), tensor([-0.0179])]]

Logits for hard examples=[tensor([-0.0237]), tensor([-0.0179])]


 14%|█▍        | 1442/10000 [00:05<00:27, 309.32it/s, logits=[tensor([-0.0452]), tensor([-0.0323])]]

Logits for hard examples=[tensor([-0.0452]), tensor([-0.0323])]


 15%|█▌        | 1537/10000 [00:06<00:28, 299.08it/s, logits=[tensor([-0.0794]), tensor([-0.0148])]]

Logits for hard examples=[tensor([-0.0794]), tensor([-0.0148])]


 16%|█▋        | 1630/10000 [00:06<00:29, 283.91it/s, logits=[tensor([-0.3150]), tensor([-0.2584])]]

Logits for hard examples=[tensor([-0.3150]), tensor([-0.2584])]


 18%|█▊        | 1762/10000 [00:06<00:26, 307.43it/s, logits=[tensor([1.6987]), tensor([1.6989])]]

Logits for hard examples=[tensor([1.6987]), tensor([1.6989])]


 19%|█▊        | 1854/10000 [00:07<00:28, 282.73it/s, logits=[tensor([0.6015]), tensor([0.6015])]]

Logits for hard examples=[tensor([0.6015]), tensor([0.6015])]


 19%|█▉        | 1946/10000 [00:07<00:27, 294.46it/s, logits=[tensor([-0.1682]), tensor([-0.1682])]]

Logits for hard examples=[tensor([-0.1682]), tensor([-0.1682])]


 20%|██        | 2041/10000 [00:07<00:25, 307.19it/s, logits=[tensor([0.0226]), tensor([0.0225])]]

Logits for hard examples=[tensor([0.0226]), tensor([0.0225])]


 21%|██▏       | 2133/10000 [00:08<00:26, 292.97it/s, logits=[tensor([0.0162]), tensor([0.0162])]]

Logits for hard examples=[tensor([0.0162]), tensor([0.0162])]


 23%|██▎       | 2258/10000 [00:08<00:25, 298.71it/s, logits=[tensor([-0.0063]), tensor([-0.0063])]]

Logits for hard examples=[tensor([-0.0063]), tensor([-0.0063])]


 24%|██▎       | 2353/10000 [00:08<00:25, 301.14it/s, logits=[tensor([0.0097]), tensor([0.0097])]]

Logits for hard examples=[tensor([0.0097]), tensor([0.0097])]


 24%|██▍       | 2444/10000 [00:09<00:27, 273.74it/s, logits=[tensor([0.0020]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0020])]


 25%|██▌       | 2542/10000 [00:09<00:24, 304.58it/s, logits=[tensor([0.0050]), tensor([0.0050])]]

Logits for hard examples=[tensor([0.0050]), tensor([0.0050])]


 26%|██▋       | 2639/10000 [00:09<00:23, 313.87it/s, logits=[tensor([0.0040]), tensor([0.0040])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0040])]


 27%|██▋       | 2732/10000 [00:10<00:24, 290.96it/s, logits=[tensor([0.0043]), tensor([0.0043])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0043])]


 29%|██▊       | 2859/10000 [00:10<00:23, 304.39it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 30%|██▉       | 2952/10000 [00:10<00:23, 300.82it/s, logits=[tensor([0.0042]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0042]), tensor([0.0042])]


 30%|███       | 3044/10000 [00:11<00:25, 276.54it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 31%|███▏      | 3141/10000 [00:11<00:22, 305.67it/s, logits=[tensor([0.0042]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0042]), tensor([0.0042])]


 32%|███▏      | 3242/10000 [00:11<00:20, 323.57it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 33%|███▎      | 3340/10000 [00:12<00:22, 293.94it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 34%|███▍      | 3433/10000 [00:12<00:21, 298.55it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 35%|███▌      | 3528/10000 [00:12<00:21, 302.88it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 36%|███▋      | 3646/10000 [00:13<00:25, 245.15it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 37%|███▋      | 3707/10000 [00:13<00:26, 241.39it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 38%|███▊      | 3826/10000 [00:14<00:25, 241.35it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 39%|███▉      | 3925/10000 [00:14<00:39, 155.24it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 40%|████      | 4039/10000 [00:15<00:29, 202.40it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 41%|████▏     | 4132/10000 [00:15<00:27, 217.04it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 42%|████▏     | 4221/10000 [00:16<00:28, 200.86it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 43%|████▎     | 4334/10000 [00:16<00:27, 208.37it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 44%|████▍     | 4422/10000 [00:17<00:27, 202.77it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 46%|████▌     | 4553/10000 [00:17<00:21, 256.91it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 46%|████▋     | 4637/10000 [00:18<00:20, 267.67it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 47%|████▋     | 4726/10000 [00:18<00:20, 259.36it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 48%|████▊     | 4848/10000 [00:18<00:17, 291.29it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 49%|████▉     | 4935/10000 [00:19<00:19, 264.55it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 51%|█████     | 5055/10000 [00:19<00:17, 285.88it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 51%|█████▏    | 5146/10000 [00:19<00:17, 284.09it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 52%|█████▏    | 5236/10000 [00:20<00:16, 281.75it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 53%|█████▎    | 5329/10000 [00:20<00:16, 283.58it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 55%|█████▍    | 5453/10000 [00:20<00:15, 296.45it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 55%|█████▌    | 5543/10000 [00:21<00:15, 282.86it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 56%|█████▋    | 5635/10000 [00:21<00:14, 291.74it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 57%|█████▋    | 5732/10000 [00:21<00:14, 302.08it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 59%|█████▊    | 5859/10000 [00:22<00:13, 297.63it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 60%|█████▉    | 5954/10000 [00:22<00:13, 292.71it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 61%|██████    | 6051/10000 [00:22<00:13, 300.94it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 61%|██████▏   | 6139/10000 [00:23<00:14, 273.18it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 62%|██████▏   | 6230/10000 [00:23<00:13, 281.57it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 64%|██████▎   | 6353/10000 [00:24<00:12, 297.15it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 64%|██████▍   | 6447/10000 [00:24<00:11, 302.68it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 65%|██████▌   | 6544/10000 [00:24<00:11, 302.06it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 66%|██████▋   | 6640/10000 [00:24<00:11, 300.05it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 67%|██████▋   | 6732/10000 [00:25<00:11, 292.25it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 68%|██████▊   | 6831/10000 [00:25<00:10, 308.44it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 70%|██████▉   | 6953/10000 [00:26<00:10, 297.39it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 70%|███████   | 7048/10000 [00:26<00:09, 300.10it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 71%|███████▏  | 7143/10000 [00:26<00:09, 300.62it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 72%|███████▏  | 7233/10000 [00:27<00:09, 282.05it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 74%|███████▎  | 7357/10000 [00:27<00:09, 291.09it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 74%|███████▍  | 7414/10000 [00:27<00:10, 237.45it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 75%|███████▌  | 7532/10000 [00:28<00:11, 217.20it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 76%|███████▋  | 7628/10000 [00:28<00:10, 224.58it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 77%|███████▋  | 7743/10000 [00:29<00:10, 217.66it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 78%|███████▊  | 7831/10000 [00:29<00:10, 208.45it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 79%|███████▉  | 7940/10000 [00:30<00:09, 206.53it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 80%|████████  | 8033/10000 [00:30<00:09, 215.05it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 81%|████████▏ | 8134/10000 [00:31<00:08, 227.62it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 83%|████████▎ | 8262/10000 [00:31<00:05, 296.74it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 84%|████████▎ | 8358/10000 [00:31<00:05, 306.54it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 84%|████████▍ | 8450/10000 [00:32<00:05, 277.82it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 85%|████████▌ | 8549/10000 [00:32<00:04, 309.36it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 86%|████████▋ | 8644/10000 [00:32<00:04, 308.86it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 87%|████████▋ | 8736/10000 [00:33<00:04, 287.59it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 89%|████████▊ | 8861/10000 [00:33<00:03, 303.10it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 90%|████████▉ | 8956/10000 [00:33<00:03, 307.35it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 90%|█████████ | 9047/10000 [00:34<00:03, 282.19it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 91%|█████████▏| 9140/10000 [00:34<00:02, 298.49it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 92%|█████████▏| 9236/10000 [00:34<00:02, 308.47it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 93%|█████████▎| 9332/10000 [00:35<00:02, 264.40it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 95%|█████████▍| 9456/10000 [00:35<00:01, 290.39it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 95%|█████████▌| 9548/10000 [00:35<00:01, 290.59it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 96%|█████████▋| 9637/10000 [00:36<00:01, 275.22it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 97%|█████████▋| 9732/10000 [00:36<00:00, 294.91it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 98%|█████████▊| 9827/10000 [00:36<00:00, 299.16it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


 99%|█████████▉| 9946/10000 [00:37<00:00, 278.96it/s, logits=[tensor([0.0043]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0042])]


100%|██████████| 10000/10000 [00:37<00:00, 267.09it/s, logits=[tensor([0.0043]), tensor([0.0042])]]


SEQUENCE_LEN=4


  0%|          | 30/10000 [00:00<00:33, 298.26it/s, logits=[tensor([-0.1372]), tensor([-0.1397])]]

Logits for hard examples=[tensor([-0.1372]), tensor([-0.1397])]


  2%|▏         | 158/10000 [00:00<00:32, 299.20it/s, logits=[tensor([0.0022]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0028])]


  2%|▎         | 250/10000 [00:00<00:35, 271.03it/s, logits=[tensor([0.0089]), tensor([0.0117])]]

Logits for hard examples=[tensor([0.0089]), tensor([0.0117])]


  3%|▎         | 340/10000 [00:01<00:33, 287.59it/s, logits=[tensor([-0.0200]), tensor([-0.0139])]]

Logits for hard examples=[tensor([-0.0200]), tensor([-0.0139])]


  4%|▍         | 435/10000 [00:01<00:32, 295.89it/s, logits=[tensor([-0.0595]), tensor([-0.0341])]]

Logits for hard examples=[tensor([-0.0595]), tensor([-0.0341])]


  6%|▌         | 557/10000 [00:01<00:31, 296.63it/s, logits=[tensor([-0.1955]), tensor([-0.1968])]]

Logits for hard examples=[tensor([-0.1955]), tensor([-0.1968])]


  7%|▋         | 654/10000 [00:02<00:31, 298.41it/s, logits=[tensor([0.0780]), tensor([0.0777])]]

Logits for hard examples=[tensor([0.0780]), tensor([0.0777])]


  7%|▋         | 745/10000 [00:02<00:31, 289.50it/s, logits=[tensor([0.0138]), tensor([0.0135])]]

Logits for hard examples=[tensor([0.0138]), tensor([0.0135])]


  8%|▊         | 836/10000 [00:02<00:32, 286.26it/s, logits=[tensor([-0.0038]), tensor([-0.0041])]]

Logits for hard examples=[tensor([-0.0038]), tensor([-0.0041])]


  9%|▉         | 933/10000 [00:03<00:30, 301.02it/s, logits=[tensor([0.0007]), tensor([0.0005])]]

Logits for hard examples=[tensor([0.0007]), tensor([0.0005])]


 10%|█         | 1023/10000 [00:03<00:39, 227.75it/s, logits=[tensor([0.0034]), tensor([0.0033])]]

Logits for hard examples=[tensor([0.0034]), tensor([0.0033])]


 11%|█▏        | 1138/10000 [00:04<00:41, 211.81it/s, logits=[tensor([0.0040]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0039])]


 12%|█▏        | 1229/10000 [00:04<00:40, 215.83it/s, logits=[tensor([0.0039]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0039]), tensor([0.0039])]


 13%|█▎        | 1319/10000 [00:05<00:43, 198.63it/s, logits=[tensor([0.0036]), tensor([0.0037])]]

Logits for hard examples=[tensor([0.0036]), tensor([0.0037])]


 14%|█▍        | 1432/10000 [00:05<00:44, 191.20it/s, logits=[tensor([0.0033]), tensor([0.0036])]]

Logits for hard examples=[tensor([0.0033]), tensor([0.0036])]


 15%|█▌        | 1537/10000 [00:06<00:42, 198.19it/s, logits=[tensor([0.0031]), tensor([0.0034])]]

Logits for hard examples=[tensor([0.0031]), tensor([0.0034])]


 16%|█▋        | 1632/10000 [00:06<00:35, 232.53it/s, logits=[tensor([0.0028]), tensor([0.0033])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0033])]


 18%|█▊        | 1754/10000 [00:07<00:28, 285.41it/s, logits=[tensor([0.0026]), tensor([0.0032])]]

Logits for hard examples=[tensor([0.0026]), tensor([0.0032])]


 18%|█▊        | 1849/10000 [00:07<00:26, 302.64it/s, logits=[tensor([0.0021]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0030])]


 19%|█▉        | 1942/10000 [00:07<00:28, 284.79it/s, logits=[tensor([0.0017]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0029])]


 20%|██        | 2037/10000 [00:07<00:25, 306.40it/s, logits=[tensor([0.0014]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0029])]


 21%|██▏       | 2134/10000 [00:08<00:25, 302.80it/s, logits=[tensor([0.0012]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0029])]


 22%|██▏       | 2228/10000 [00:08<00:27, 283.32it/s, logits=[tensor([0.0011]), tensor([0.0031])]]

Logits for hard examples=[tensor([0.0011]), tensor([0.0031])]


 24%|██▎       | 2355/10000 [00:09<00:25, 303.69it/s, logits=[tensor([0.0010]), tensor([0.0035])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0035])]


 25%|██▍       | 2451/10000 [00:09<00:24, 306.54it/s, logits=[tensor([0.0009]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0042])]


 25%|██▌       | 2543/10000 [00:09<00:26, 284.87it/s, logits=[tensor([0.0008]), tensor([0.0058])]]

Logits for hard examples=[tensor([0.0008]), tensor([0.0058])]


 26%|██▋       | 2638/10000 [00:09<00:24, 301.54it/s, logits=[tensor([-9.2819e-05]), tensor([0.0109])]]

Logits for hard examples=[tensor([-9.2819e-05]), tensor([0.0109])]


 27%|██▋       | 2729/10000 [00:10<00:25, 286.34it/s, logits=[tensor([0.3045]), tensor([0.3093])]]

Logits for hard examples=[tensor([0.3045]), tensor([0.3093])]


 28%|██▊       | 2850/10000 [00:10<00:25, 275.68it/s, logits=[tensor([0.0769]), tensor([0.0793])]]

Logits for hard examples=[tensor([0.0769]), tensor([0.0793])]


 29%|██▉       | 2939/10000 [00:11<00:25, 273.61it/s, logits=[tensor([-0.0176]), tensor([-0.0153])]]

Logits for hard examples=[tensor([-0.0176]), tensor([-0.0153])]


 31%|███       | 3053/10000 [00:11<00:25, 274.86it/s, logits=[tensor([-0.0185]), tensor([-0.0159])]]

Logits for hard examples=[tensor([-0.0185]), tensor([-0.0159])]


 31%|███▏      | 3140/10000 [00:11<00:25, 268.30it/s, logits=[tensor([-0.0032]), tensor([-0.0002])]]

Logits for hard examples=[tensor([-0.0032]), tensor([-0.0002])]


 32%|███▏      | 3232/10000 [00:12<00:23, 290.86it/s, logits=[tensor([-0.0013]), tensor([0.0026])]]

Logits for hard examples=[tensor([-0.0013]), tensor([0.0026])]


 34%|███▎      | 3353/10000 [00:12<00:23, 288.02it/s, logits=[tensor([-0.0040]), tensor([0.0017])]]

Logits for hard examples=[tensor([-0.0040]), tensor([0.0017])]


 34%|███▍      | 3444/10000 [00:12<00:22, 293.63it/s, logits=[tensor([-0.0093]), tensor([0.0046])]]

Logits for hard examples=[tensor([-0.0093]), tensor([0.0046])]


 35%|███▌      | 3541/10000 [00:13<00:21, 306.06it/s, logits=[tensor([0.1602]), tensor([0.1671])]]

Logits for hard examples=[tensor([0.1602]), tensor([0.1671])]


 36%|███▋      | 3634/10000 [00:13<00:21, 290.65it/s, logits=[tensor([-0.1129]), tensor([-0.1105])]]

Logits for hard examples=[tensor([-0.1129]), tensor([-0.1105])]


 38%|███▊      | 3755/10000 [00:13<00:21, 294.92it/s, logits=[tensor([0.0115]), tensor([0.0158])]]

Logits for hard examples=[tensor([0.0115]), tensor([0.0158])]


 38%|███▊      | 3848/10000 [00:14<00:21, 283.77it/s, logits=[tensor([-4.7151]), tensor([3.9266])]]

Logits for hard examples=[tensor([-4.7151]), tensor([3.9266])]


 39%|███▉      | 3936/10000 [00:14<00:21, 284.08it/s, logits=[tensor([-8.7293]), tensor([8.1741])]]

Logits for hard examples=[tensor([-8.7293]), tensor([8.1741])]


 41%|████      | 4058/10000 [00:15<00:20, 295.48it/s, logits=[tensor([-9.2818]), tensor([8.8246])]]

Logits for hard examples=[tensor([-9.2818]), tensor([8.8246])]


 41%|████▏     | 4147/10000 [00:15<00:20, 284.39it/s, logits=[tensor([-9.3663]), tensor([8.9368])]]

Logits for hard examples=[tensor([-9.3663]), tensor([8.9368])]


 42%|████▏     | 4232/10000 [00:15<00:21, 266.64it/s, logits=[tensor([-9.3882]), tensor([8.9734])]]

Logits for hard examples=[tensor([-9.3882]), tensor([8.9734])]


 44%|████▎     | 4361/10000 [00:16<00:18, 304.03it/s, logits=[tensor([-9.4018]), tensor([8.9988])]]

Logits for hard examples=[tensor([-9.4018]), tensor([8.9988])]


 45%|████▍     | 4453/10000 [00:16<00:18, 293.01it/s, logits=[tensor([-9.4142]), tensor([9.0222])]]

Logits for hard examples=[tensor([-9.4142]), tensor([9.0222])]


 45%|████▌     | 4538/10000 [00:16<00:23, 232.09it/s, logits=[tensor([-9.4266]), tensor([9.0446])]]

Logits for hard examples=[tensor([-9.4266]), tensor([9.0446])]


 46%|████▋     | 4635/10000 [00:17<00:23, 226.25it/s, logits=[tensor([-9.4389]), tensor([9.0664])]]

Logits for hard examples=[tensor([-9.4389]), tensor([9.0664])]


 47%|████▋     | 4728/10000 [00:17<00:25, 210.42it/s, logits=[tensor([-9.4512]), tensor([9.0876])]]

Logits for hard examples=[tensor([-9.4512]), tensor([9.0876])]


 48%|████▊     | 4840/10000 [00:18<00:24, 206.81it/s, logits=[tensor([-9.4634]), tensor([9.1083])]]

Logits for hard examples=[tensor([-9.4634]), tensor([9.1083])]


 49%|████▉     | 4924/10000 [00:18<00:27, 183.71it/s, logits=[tensor([-9.4757]), tensor([9.1284])]]

Logits for hard examples=[tensor([-9.4757]), tensor([9.1284])]


 50%|█████     | 5029/10000 [00:19<00:24, 201.09it/s, logits=[tensor([-9.4879]), tensor([9.1481])]]

Logits for hard examples=[tensor([-9.4879]), tensor([9.1481])]


 51%|█████▏    | 5140/10000 [00:19<00:23, 205.57it/s, logits=[tensor([-9.5000]), tensor([9.1672])]]

Logits for hard examples=[tensor([-9.5000]), tensor([9.1672])]


 52%|█████▏    | 5234/10000 [00:20<00:17, 271.19it/s, logits=[tensor([-9.5121]), tensor([9.1860])]]

Logits for hard examples=[tensor([-9.5121]), tensor([9.1860])]


 54%|█████▎    | 5360/10000 [00:20<00:15, 299.86it/s, logits=[tensor([-9.5242]), tensor([9.2044])]]

Logits for hard examples=[tensor([-9.5242]), tensor([9.2044])]


 55%|█████▍    | 5454/10000 [00:20<00:15, 286.71it/s, logits=[tensor([-9.5361]), tensor([9.2223])]]

Logits for hard examples=[tensor([-9.5361]), tensor([9.2223])]


 55%|█████▌    | 5546/10000 [00:21<00:15, 289.57it/s, logits=[tensor([-9.5480]), tensor([9.2399])]]

Logits for hard examples=[tensor([-9.5480]), tensor([9.2399])]


 56%|█████▋    | 5640/10000 [00:21<00:14, 299.60it/s, logits=[tensor([-9.5599]), tensor([9.2572])]]

Logits for hard examples=[tensor([-9.5599]), tensor([9.2572])]


 57%|█████▋    | 5728/10000 [00:21<00:16, 257.98it/s, logits=[tensor([-9.5716]), tensor([9.2741])]]

Logits for hard examples=[tensor([-9.5716]), tensor([9.2741])]


 59%|█████▊    | 5856/10000 [00:22<00:13, 301.42it/s, logits=[tensor([-9.5832]), tensor([9.2907])]]

Logits for hard examples=[tensor([-9.5832]), tensor([9.2907])]


 60%|█████▉    | 5950/10000 [00:22<00:13, 301.03it/s, logits=[tensor([-9.5948]), tensor([9.3070])]]

Logits for hard examples=[tensor([-9.5948]), tensor([9.3070])]


 60%|██████    | 6044/10000 [00:22<00:13, 286.99it/s, logits=[tensor([-9.6063]), tensor([9.3230])]]

Logits for hard examples=[tensor([-9.6063]), tensor([9.3230])]


 61%|██████▏   | 6140/10000 [00:23<00:12, 307.78it/s, logits=[tensor([-9.6176]), tensor([9.3388])]]

Logits for hard examples=[tensor([-9.6176]), tensor([9.3388])]


 62%|██████▏   | 6235/10000 [00:23<00:12, 307.16it/s, logits=[tensor([-9.6290]), tensor([9.3542])]]

Logits for hard examples=[tensor([-9.6290]), tensor([9.3542])]


 64%|██████▎   | 6360/10000 [00:23<00:12, 301.17it/s, logits=[tensor([-9.6401]), tensor([9.3694])]]

Logits for hard examples=[tensor([-9.6401]), tensor([9.3694])]


 65%|██████▍   | 6457/10000 [00:24<00:11, 311.76it/s, logits=[tensor([-9.6513]), tensor([9.3844])]]

Logits for hard examples=[tensor([-9.6513]), tensor([9.3844])]


 66%|██████▌   | 6553/10000 [00:24<00:11, 312.19it/s, logits=[tensor([-9.6623]), tensor([9.3991])]]

Logits for hard examples=[tensor([-9.6623]), tensor([9.3991])]


 66%|██████▋   | 6649/10000 [00:24<00:11, 290.55it/s, logits=[tensor([-9.6732]), tensor([9.4136])]]

Logits for hard examples=[tensor([-9.6732]), tensor([9.4136])]


 67%|██████▋   | 6747/10000 [00:25<00:10, 311.37it/s, logits=[tensor([-9.6840]), tensor([9.4278])]]

Logits for hard examples=[tensor([-9.6840]), tensor([9.4278])]


 68%|██████▊   | 6843/10000 [00:25<00:10, 301.68it/s, logits=[tensor([-9.6947]), tensor([9.4419])]]

Logits for hard examples=[tensor([-9.6947]), tensor([9.4419])]


 69%|██████▉   | 6933/10000 [00:25<00:11, 270.97it/s, logits=[tensor([-9.7054]), tensor([9.4557])]]

Logits for hard examples=[tensor([-9.7054]), tensor([9.4557])]


 70%|███████   | 7029/10000 [00:26<00:10, 285.08it/s, logits=[tensor([-9.7159]), tensor([9.4694])]]

Logits for hard examples=[tensor([-9.7159]), tensor([9.4694])]


 71%|███████▏  | 7134/10000 [00:26<00:13, 220.17it/s, logits=[tensor([-9.7263]), tensor([9.4828])]]

Logits for hard examples=[tensor([-9.7263]), tensor([9.4828])]


 72%|███████▏  | 7228/10000 [00:27<00:12, 223.56it/s, logits=[tensor([-9.7367]), tensor([9.4961])]]

Logits for hard examples=[tensor([-9.7367]), tensor([9.4961])]


 73%|███████▎  | 7324/10000 [00:27<00:11, 231.34it/s, logits=[tensor([-9.7469]), tensor([9.5091])]]

Logits for hard examples=[tensor([-9.7469]), tensor([9.5091])]


 74%|███████▍  | 7438/10000 [00:28<00:11, 214.24it/s, logits=[tensor([-9.7571]), tensor([9.5220])]]

Logits for hard examples=[tensor([-9.7571]), tensor([9.5220])]


 75%|███████▌  | 7522/10000 [00:28<00:12, 197.16it/s, logits=[tensor([-9.7672]), tensor([9.5348])]]

Logits for hard examples=[tensor([-9.7672]), tensor([9.5348])]


 76%|███████▋  | 7626/10000 [00:29<00:12, 197.05it/s, logits=[tensor([-9.7772]), tensor([9.5473])]]

Logits for hard examples=[tensor([-9.7772]), tensor([9.5473])]


 77%|███████▋  | 7739/10000 [00:29<00:10, 219.32it/s, logits=[tensor([-9.7871]), tensor([9.5597])]]

Logits for hard examples=[tensor([-9.7871]), tensor([9.5597])]


 78%|███████▊  | 7834/10000 [00:30<00:10, 213.50it/s, logits=[tensor([-9.7969]), tensor([9.5719])]]

Logits for hard examples=[tensor([-9.7969]), tensor([9.5719])]


 79%|███████▉  | 7925/10000 [00:30<00:09, 219.76it/s, logits=[tensor([-9.8066]), tensor([9.5840])]]

Logits for hard examples=[tensor([-9.8066]), tensor([9.5840])]


 80%|████████  | 8041/10000 [00:30<00:08, 219.07it/s, logits=[tensor([-9.8162]), tensor([9.5959])]]

Logits for hard examples=[tensor([-9.8162]), tensor([9.5959])]


 81%|████████▏ | 8129/10000 [00:31<00:09, 207.84it/s, logits=[tensor([-9.8257]), tensor([9.6077])]]

Logits for hard examples=[tensor([-9.8257]), tensor([9.6077])]


 82%|████████▏ | 8235/10000 [00:31<00:08, 201.13it/s, logits=[tensor([-9.8352]), tensor([9.6194])]]

Logits for hard examples=[tensor([-9.8352]), tensor([9.6194])]


 83%|████████▎ | 8338/10000 [00:32<00:08, 194.34it/s, logits=[tensor([-9.8446]), tensor([9.6309])]]

Logits for hard examples=[tensor([-9.8446]), tensor([9.6309])]


 84%|████████▍ | 8421/10000 [00:32<00:08, 194.73it/s, logits=[tensor([-9.8539]), tensor([9.6422])]]

Logits for hard examples=[tensor([-9.8539]), tensor([9.6422])]


 85%|████████▌ | 8535/10000 [00:33<00:05, 268.54it/s, logits=[tensor([-9.8631]), tensor([9.6535])]]

Logits for hard examples=[tensor([-9.8631]), tensor([9.6535])]


 86%|████████▋ | 8626/10000 [00:33<00:05, 267.98it/s, logits=[tensor([-9.8722]), tensor([9.6645])]]

Logits for hard examples=[tensor([-9.8722]), tensor([9.6645])]


 88%|████████▊ | 8755/10000 [00:34<00:04, 305.83it/s, logits=[tensor([-9.8812]), tensor([9.6756])]]

Logits for hard examples=[tensor([-9.8812]), tensor([9.6756])]


 88%|████████▊ | 8850/10000 [00:34<00:03, 298.94it/s, logits=[tensor([-9.8901]), tensor([9.6864])]]

Logits for hard examples=[tensor([-9.8901]), tensor([9.6864])]


 89%|████████▉ | 8941/10000 [00:34<00:03, 287.74it/s, logits=[tensor([-9.8990]), tensor([9.6971])]]

Logits for hard examples=[tensor([-9.8990]), tensor([9.6971])]


 90%|█████████ | 9034/10000 [00:35<00:03, 300.64it/s, logits=[tensor([-9.9078]), tensor([9.7077])]]

Logits for hard examples=[tensor([-9.9078]), tensor([9.7077])]


 92%|█████████▏| 9158/10000 [00:35<00:02, 296.06it/s, logits=[tensor([-9.9165]), tensor([9.7182])]]

Logits for hard examples=[tensor([-9.9165]), tensor([9.7182])]


 93%|█████████▎| 9254/10000 [00:35<00:02, 286.11it/s, logits=[tensor([-9.9252]), tensor([9.7286])]]

Logits for hard examples=[tensor([-9.9252]), tensor([9.7286])]


 93%|█████████▎| 9346/10000 [00:36<00:02, 296.03it/s, logits=[tensor([-9.9338]), tensor([9.7389])]]

Logits for hard examples=[tensor([-9.9338]), tensor([9.7389])]


 94%|█████████▍| 9435/10000 [00:36<00:01, 285.72it/s, logits=[tensor([-9.9422]), tensor([9.7491])]]

Logits for hard examples=[tensor([-9.9422]), tensor([9.7491])]


 96%|█████████▌| 9554/10000 [00:36<00:01, 288.49it/s, logits=[tensor([-9.9507]), tensor([9.7591])]]

Logits for hard examples=[tensor([-9.9507]), tensor([9.7591])]


 96%|█████████▋| 9646/10000 [00:37<00:01, 301.03it/s, logits=[tensor([-9.9590]), tensor([9.7690])]]

Logits for hard examples=[tensor([-9.9590]), tensor([9.7690])]


 97%|█████████▋| 9739/10000 [00:37<00:00, 295.38it/s, logits=[tensor([-9.9673]), tensor([9.7789])]]

Logits for hard examples=[tensor([-9.9673]), tensor([9.7789])]


 99%|█████████▊| 9860/10000 [00:37<00:00, 288.10it/s, logits=[tensor([-9.9755]), tensor([9.7887])]]

Logits for hard examples=[tensor([-9.9755]), tensor([9.7887])]


100%|█████████▉| 9953/10000 [00:38<00:00, 299.64it/s, logits=[tensor([-9.9837]), tensor([9.7983])]]

Logits for hard examples=[tensor([-9.9837]), tensor([9.7983])]


100%|██████████| 10000/10000 [00:38<00:00, 260.51it/s, logits=[tensor([-9.9837]), tensor([9.7983])]]


SEQUENCE_LEN=5


  0%|          | 27/10000 [00:00<00:37, 263.00it/s, logits=[tensor([-0.2277]), tensor([-0.2291])]]

Logits for hard examples=[tensor([-0.2277]), tensor([-0.2291])]


  1%|▏         | 142/10000 [00:00<00:35, 281.14it/s, logits=[tensor([0.0413]), tensor([0.0418])]]

Logits for hard examples=[tensor([0.0413]), tensor([0.0418])]


  2%|▏         | 236/10000 [00:00<00:32, 296.54it/s, logits=[tensor([0.0102]), tensor([0.0111])]]

Logits for hard examples=[tensor([0.0102]), tensor([0.0111])]


  3%|▎         | 326/10000 [00:01<00:34, 280.39it/s, logits=[tensor([-0.0149]), tensor([-0.0136])]]

Logits for hard examples=[tensor([-0.0149]), tensor([-0.0136])]


  4%|▍         | 449/10000 [00:01<00:32, 296.81it/s, logits=[tensor([-0.0043]), tensor([-0.0020])]]

Logits for hard examples=[tensor([-0.0043]), tensor([-0.0020])]


  5%|▌         | 543/10000 [00:01<00:31, 298.39it/s, logits=[tensor([-0.0151]), tensor([-0.0102])]]

Logits for hard examples=[tensor([-0.0151]), tensor([-0.0102])]


  6%|▋         | 633/10000 [00:02<00:33, 278.85it/s, logits=[tensor([-0.0683]), tensor([-0.0462])]]

Logits for hard examples=[tensor([-0.0683]), tensor([-0.0462])]


  8%|▊         | 754/10000 [00:02<00:32, 288.13it/s, logits=[tensor([-10.6462]), tensor([7.1076])]]

Logits for hard examples=[tensor([-10.6462]), tensor([7.1076])]


  8%|▊         | 844/10000 [00:02<00:31, 289.08it/s, logits=[tensor([-15.0384]), tensor([7.0539])]]

Logits for hard examples=[tensor([-15.0384]), tensor([7.0539])]


  9%|▉         | 930/10000 [00:03<00:34, 263.37it/s, logits=[tensor([-15.5570]), tensor([6.7944])]]

Logits for hard examples=[tensor([-15.5570]), tensor([6.7944])]


 11%|█         | 1052/10000 [00:03<00:29, 298.30it/s, logits=[tensor([-15.6124]), tensor([6.7809])]]

Logits for hard examples=[tensor([-15.6124]), tensor([6.7809])]


 11%|█▏        | 1144/10000 [00:04<00:30, 292.70it/s, logits=[tensor([-15.6175]), tensor([6.8110])]]

Logits for hard examples=[tensor([-15.6175]), tensor([6.8110])]


 12%|█▏        | 1235/10000 [00:04<00:31, 281.47it/s, logits=[tensor([-15.6161]), tensor([6.8464])]]

Logits for hard examples=[tensor([-15.6161]), tensor([6.8464])]


 13%|█▎        | 1330/10000 [00:04<00:32, 265.66it/s, logits=[tensor([-15.6137]), tensor([6.8822])]]

Logits for hard examples=[tensor([-15.6137]), tensor([6.8822])]


 14%|█▍        | 1432/10000 [00:05<00:41, 208.89it/s, logits=[tensor([-15.6113]), tensor([6.9180])]]

Logits for hard examples=[tensor([-15.6113]), tensor([6.9180])]


 15%|█▌        | 1522/10000 [00:05<00:40, 208.99it/s, logits=[tensor([-15.6089]), tensor([6.9532])]]

Logits for hard examples=[tensor([-15.6089]), tensor([6.9532])]


 16%|█▋        | 1637/10000 [00:06<00:39, 213.29it/s, logits=[tensor([-15.6065]), tensor([6.9881])]]

Logits for hard examples=[tensor([-15.6065]), tensor([6.9881])]


 17%|█▋        | 1725/10000 [00:06<00:41, 200.64it/s, logits=[tensor([-15.6041]), tensor([7.0228])]]

Logits for hard examples=[tensor([-15.6041]), tensor([7.0228])]


 18%|█▊        | 1830/10000 [00:07<00:41, 195.80it/s, logits=[tensor([-15.6018]), tensor([7.0567])]]

Logits for hard examples=[tensor([-15.6018]), tensor([7.0567])]


 19%|█▉        | 1934/10000 [00:07<00:40, 201.60it/s, logits=[tensor([-15.5996]), tensor([7.0902])]]

Logits for hard examples=[tensor([-15.5996]), tensor([7.0902])]


 20%|██        | 2034/10000 [00:08<00:32, 247.78it/s, logits=[tensor([-15.5973]), tensor([7.1235])]]

Logits for hard examples=[tensor([-15.5973]), tensor([7.1235])]


 22%|██▏       | 2150/10000 [00:08<00:28, 277.26it/s, logits=[tensor([-15.5951]), tensor([7.1539])]]

Logits for hard examples=[tensor([-15.5951]), tensor([7.1539])]


 22%|██▏       | 2246/10000 [00:08<00:25, 299.10it/s, logits=[tensor([-15.5929]), tensor([7.1771])]]

Logits for hard examples=[tensor([-15.5929]), tensor([7.1771])]


 23%|██▎       | 2338/10000 [00:09<00:25, 301.65it/s, logits=[tensor([-15.5908]), tensor([7.2003])]]

Logits for hard examples=[tensor([-15.5908]), tensor([7.2003])]


 24%|██▍       | 2431/10000 [00:09<00:26, 290.44it/s, logits=[tensor([-15.5886]), tensor([7.2235])]]

Logits for hard examples=[tensor([-15.5886]), tensor([7.2235])]


 26%|██▌       | 2558/10000 [00:09<00:24, 305.75it/s, logits=[tensor([-15.5865]), tensor([7.2466])]]

Logits for hard examples=[tensor([-15.5865]), tensor([7.2466])]


 26%|██▋       | 2649/10000 [00:10<00:25, 291.96it/s, logits=[tensor([-15.5844]), tensor([7.2692])]]

Logits for hard examples=[tensor([-15.5844]), tensor([7.2692])]


 27%|██▋       | 2740/10000 [00:10<00:24, 290.66it/s, logits=[tensor([-15.5823]), tensor([7.2918])]]

Logits for hard examples=[tensor([-15.5823]), tensor([7.2918])]


 28%|██▊       | 2836/10000 [00:10<00:23, 304.25it/s, logits=[tensor([-15.5802]), tensor([7.3143])]]

Logits for hard examples=[tensor([-15.5802]), tensor([7.3143])]


 29%|██▉       | 2932/10000 [00:11<00:23, 306.14it/s, logits=[tensor([-15.5782]), tensor([7.3367])]]

Logits for hard examples=[tensor([-15.5782]), tensor([7.3367])]


 30%|███       | 3027/10000 [00:11<00:24, 283.48it/s, logits=[tensor([-15.5762]), tensor([7.3586])]]

Logits for hard examples=[tensor([-15.5762]), tensor([7.3586])]


 32%|███▏      | 3157/10000 [00:11<00:22, 303.12it/s, logits=[tensor([-15.5742]), tensor([7.3803])]]

Logits for hard examples=[tensor([-15.5742]), tensor([7.3803])]


 32%|███▏      | 3248/10000 [00:12<00:23, 292.24it/s, logits=[tensor([-15.5722]), tensor([7.4019])]]

Logits for hard examples=[tensor([-15.5722]), tensor([7.4019])]


 33%|███▎      | 3339/10000 [00:12<00:22, 293.28it/s, logits=[tensor([-15.5704]), tensor([7.4221])]]

Logits for hard examples=[tensor([-15.5704]), tensor([7.4221])]


 34%|███▍      | 3437/10000 [00:12<00:21, 310.15it/s, logits=[tensor([-15.5695]), tensor([7.4289])]]

Logits for hard examples=[tensor([-15.5695]), tensor([7.4289])]


 35%|███▌      | 3532/10000 [00:13<00:21, 300.88it/s, logits=[tensor([-15.5690]), tensor([7.4316])]]

Logits for hard examples=[tensor([-15.5690]), tensor([7.4316])]


 37%|███▋      | 3652/10000 [00:13<00:22, 284.68it/s, logits=[tensor([-15.5685]), tensor([7.4339])]]

Logits for hard examples=[tensor([-15.5685]), tensor([7.4339])]


 37%|███▋      | 3748/10000 [00:13<00:20, 302.25it/s, logits=[tensor([-15.5679]), tensor([7.4362])]]

Logits for hard examples=[tensor([-15.5679]), tensor([7.4362])]


 38%|███▊      | 3837/10000 [00:14<00:22, 277.32it/s, logits=[tensor([-15.5674]), tensor([7.4384])]]

Logits for hard examples=[tensor([-15.5674]), tensor([7.4384])]


 40%|███▉      | 3962/10000 [00:14<00:20, 299.33it/s, logits=[tensor([-15.5669]), tensor([7.4407])]]

Logits for hard examples=[tensor([-15.5669]), tensor([7.4407])]


 41%|████      | 4058/10000 [00:14<00:19, 308.57it/s, logits=[tensor([-15.5664]), tensor([7.4429])]]

Logits for hard examples=[tensor([-15.5664]), tensor([7.4429])]


 42%|████▏     | 4150/10000 [00:15<00:19, 296.21it/s, logits=[tensor([-15.5658]), tensor([7.4451])]]

Logits for hard examples=[tensor([-15.5658]), tensor([7.4451])]


 42%|████▏     | 4246/10000 [00:15<00:18, 308.43it/s, logits=[tensor([-15.5653]), tensor([7.4474])]]

Logits for hard examples=[tensor([-15.5653]), tensor([7.4474])]


 43%|████▎     | 4340/10000 [00:15<00:18, 303.38it/s, logits=[tensor([-15.5648]), tensor([7.4496])]]

Logits for hard examples=[tensor([-15.5648]), tensor([7.4496])]


 44%|████▍     | 4434/10000 [00:16<00:18, 304.73it/s, logits=[tensor([-15.5643]), tensor([7.4518])]]

Logits for hard examples=[tensor([-15.5643]), tensor([7.4518])]


 46%|████▌     | 4565/10000 [00:16<00:17, 316.40it/s, logits=[tensor([-15.5638]), tensor([7.4539])]]

Logits for hard examples=[tensor([-15.5638]), tensor([7.4539])]


 47%|████▋     | 4661/10000 [00:16<00:17, 304.28it/s, logits=[tensor([-15.5632]), tensor([7.4561])]]

Logits for hard examples=[tensor([-15.5632]), tensor([7.4561])]


 47%|████▋     | 4724/10000 [00:17<00:18, 286.90it/s, logits=[tensor([-15.5627]), tensor([7.4583])]]

Logits for hard examples=[tensor([-15.5627]), tensor([7.4583])]


 48%|████▊     | 4847/10000 [00:17<00:17, 295.28it/s, logits=[tensor([-15.5622]), tensor([7.4605])]]

Logits for hard examples=[tensor([-15.5622]), tensor([7.4605])]


 49%|████▉     | 4940/10000 [00:17<00:16, 299.76it/s, logits=[tensor([-15.5617]), tensor([7.4627])]]

Logits for hard examples=[tensor([-15.5617]), tensor([7.4627])]


 50%|█████     | 5025/10000 [00:18<00:21, 227.47it/s, logits=[tensor([-15.5612]), tensor([7.4649])]]

Logits for hard examples=[tensor([-15.5612]), tensor([7.4649])]


 51%|█████▏    | 5140/10000 [00:18<00:22, 214.13it/s, logits=[tensor([-15.5607]), tensor([7.4671])]]

Logits for hard examples=[tensor([-15.5607]), tensor([7.4671])]


 52%|█████▏    | 5236/10000 [00:19<00:21, 217.51it/s, logits=[tensor([-15.5602]), tensor([7.4693])]]

Logits for hard examples=[tensor([-15.5602]), tensor([7.4693])]


 53%|█████▎    | 5336/10000 [00:19<00:20, 227.30it/s, logits=[tensor([-15.5597]), tensor([7.4715])]]

Logits for hard examples=[tensor([-15.5597]), tensor([7.4715])]


 54%|█████▍    | 5427/10000 [00:20<00:23, 196.55it/s, logits=[tensor([-15.5591]), tensor([7.4736])]]

Logits for hard examples=[tensor([-15.5591]), tensor([7.4736])]


 55%|█████▌    | 5533/10000 [00:20<00:22, 196.21it/s, logits=[tensor([-15.5586]), tensor([7.4758])]]

Logits for hard examples=[tensor([-15.5586]), tensor([7.4758])]


 56%|█████▋    | 5634/10000 [00:21<00:22, 190.71it/s, logits=[tensor([-15.5581]), tensor([7.4780])]]

Logits for hard examples=[tensor([-15.5581]), tensor([7.4780])]


 58%|█████▊    | 5761/10000 [00:21<00:14, 285.66it/s, logits=[tensor([-15.5576]), tensor([7.4801])]]

Logits for hard examples=[tensor([-15.5576]), tensor([7.4801])]


 59%|█████▊    | 5851/10000 [00:22<00:14, 284.92it/s, logits=[tensor([-15.5571]), tensor([7.4823])]]

Logits for hard examples=[tensor([-15.5571]), tensor([7.4823])]


 59%|█████▉    | 5945/10000 [00:22<00:14, 288.17it/s, logits=[tensor([-15.5566]), tensor([7.4845])]]

Logits for hard examples=[tensor([-15.5566]), tensor([7.4845])]


 60%|██████    | 6038/10000 [00:22<00:13, 293.73it/s, logits=[tensor([-15.5561]), tensor([7.4866])]]

Logits for hard examples=[tensor([-15.5561]), tensor([7.4866])]


 61%|██████▏   | 6134/10000 [00:22<00:12, 301.42it/s, logits=[tensor([-15.5556]), tensor([7.4888])]]

Logits for hard examples=[tensor([-15.5556]), tensor([7.4888])]


 63%|██████▎   | 6252/10000 [00:23<00:13, 280.44it/s, logits=[tensor([-15.5551]), tensor([7.4910])]]

Logits for hard examples=[tensor([-15.5551]), tensor([7.4910])]


 63%|██████▎   | 6345/10000 [00:23<00:12, 300.69it/s, logits=[tensor([-15.5546]), tensor([7.4931])]]

Logits for hard examples=[tensor([-15.5546]), tensor([7.4931])]


 64%|██████▍   | 6438/10000 [00:24<00:11, 298.71it/s, logits=[tensor([-15.5541]), tensor([7.4952])]]

Logits for hard examples=[tensor([-15.5541]), tensor([7.4952])]


 66%|██████▌   | 6557/10000 [00:24<00:11, 287.56it/s, logits=[tensor([-15.5537]), tensor([7.4973])]]

Logits for hard examples=[tensor([-15.5537]), tensor([7.4973])]


 66%|██████▋   | 6647/10000 [00:24<00:11, 292.99it/s, logits=[tensor([-15.5532]), tensor([7.4994])]]

Logits for hard examples=[tensor([-15.5532]), tensor([7.4994])]


 67%|██████▋   | 6734/10000 [00:25<00:11, 273.01it/s, logits=[tensor([-15.5527]), tensor([7.5015])]]

Logits for hard examples=[tensor([-15.5527]), tensor([7.5015])]


 68%|██████▊   | 6850/10000 [00:25<00:11, 282.37it/s, logits=[tensor([-15.5522]), tensor([7.5036])]]

Logits for hard examples=[tensor([-15.5522]), tensor([7.5036])]


 69%|██████▉   | 6944/10000 [00:25<00:10, 301.09it/s, logits=[tensor([-15.5517]), tensor([7.5058])]]

Logits for hard examples=[tensor([-15.5517]), tensor([7.5058])]


 70%|███████   | 7036/10000 [00:26<00:10, 286.89it/s, logits=[tensor([-15.5512]), tensor([7.5079])]]

Logits for hard examples=[tensor([-15.5512]), tensor([7.5079])]


 72%|███████▏  | 7154/10000 [00:26<00:09, 288.91it/s, logits=[tensor([-15.5507]), tensor([7.5100])]]

Logits for hard examples=[tensor([-15.5507]), tensor([7.5100])]


 72%|███████▏  | 7246/10000 [00:26<00:09, 293.09it/s, logits=[tensor([-15.5502]), tensor([7.5121])]]

Logits for hard examples=[tensor([-15.5502]), tensor([7.5121])]


 73%|███████▎  | 7334/10000 [00:27<00:10, 265.97it/s, logits=[tensor([-15.5498]), tensor([7.5141])]]

Logits for hard examples=[tensor([-15.5498]), tensor([7.5141])]


 75%|███████▍  | 7456/10000 [00:27<00:08, 294.68it/s, logits=[tensor([-15.5493]), tensor([7.5161])]]

Logits for hard examples=[tensor([-15.5493]), tensor([7.5161])]


 75%|███████▌  | 7546/10000 [00:27<00:08, 288.73it/s, logits=[tensor([-15.5488]), tensor([7.5181])]]

Logits for hard examples=[tensor([-15.5488]), tensor([7.5181])]


 76%|███████▋  | 7632/10000 [00:28<00:08, 263.98it/s, logits=[tensor([-15.5483]), tensor([7.5201])]]

Logits for hard examples=[tensor([-15.5483]), tensor([7.5201])]


 77%|███████▋  | 7748/10000 [00:28<00:08, 280.10it/s, logits=[tensor([-15.5478]), tensor([7.5221])]]

Logits for hard examples=[tensor([-15.5478]), tensor([7.5221])]


 78%|███████▊  | 7837/10000 [00:29<00:07, 280.21it/s, logits=[tensor([-15.5474]), tensor([7.5241])]]

Logits for hard examples=[tensor([-15.5474]), tensor([7.5241])]


 80%|███████▉  | 7954/10000 [00:29<00:07, 273.22it/s, logits=[tensor([-15.5469]), tensor([7.5261])]]

Logits for hard examples=[tensor([-15.5469]), tensor([7.5261])]


 80%|████████  | 8043/10000 [00:29<00:07, 278.59it/s, logits=[tensor([-15.5464]), tensor([7.5281])]]

Logits for hard examples=[tensor([-15.5464]), tensor([7.5281])]


 81%|████████▏ | 8135/10000 [00:30<00:06, 289.20it/s, logits=[tensor([-15.5459]), tensor([7.5301])]]

Logits for hard examples=[tensor([-15.5459]), tensor([7.5301])]


 83%|████████▎ | 8255/10000 [00:30<00:06, 284.50it/s, logits=[tensor([-15.5455]), tensor([7.5321])]]

Logits for hard examples=[tensor([-15.5455]), tensor([7.5321])]


 83%|████████▎ | 8349/10000 [00:30<00:05, 298.66it/s, logits=[tensor([-15.5450]), tensor([7.5341])]]

Logits for hard examples=[tensor([-15.5450]), tensor([7.5341])]


 84%|████████▍ | 8439/10000 [00:31<00:05, 293.17it/s, logits=[tensor([-15.5445]), tensor([7.5361])]]

Logits for hard examples=[tensor([-15.5445]), tensor([7.5361])]


 85%|████████▌ | 8524/10000 [00:31<00:06, 243.89it/s, logits=[tensor([-15.5440]), tensor([7.5380])]]

Logits for hard examples=[tensor([-15.5440]), tensor([7.5380])]


 86%|████████▌ | 8619/10000 [00:31<00:06, 219.72it/s, logits=[tensor([-15.5436]), tensor([7.5400])]]

Logits for hard examples=[tensor([-15.5436]), tensor([7.5400])]


 87%|████████▋ | 8734/10000 [00:32<00:05, 213.56it/s, logits=[tensor([-15.5431]), tensor([7.5419])]]

Logits for hard examples=[tensor([-15.5431]), tensor([7.5419])]


 88%|████████▊ | 8831/10000 [00:32<00:05, 232.07it/s, logits=[tensor([-15.5427]), tensor([7.5439])]]

Logits for hard examples=[tensor([-15.5427]), tensor([7.5439])]


 89%|████████▉ | 8925/10000 [00:33<00:05, 213.66it/s, logits=[tensor([-15.5422]), tensor([7.5458])]]

Logits for hard examples=[tensor([-15.5422]), tensor([7.5458])]


 90%|█████████ | 9035/10000 [00:33<00:04, 205.17it/s, logits=[tensor([-15.5417]), tensor([7.5477])]]

Logits for hard examples=[tensor([-15.5417]), tensor([7.5477])]


 91%|█████████▏| 9139/10000 [00:34<00:04, 192.23it/s, logits=[tensor([-15.5413]), tensor([7.5497])]]

Logits for hard examples=[tensor([-15.5413]), tensor([7.5497])]


 92%|█████████▏| 9226/10000 [00:34<00:03, 211.28it/s, logits=[tensor([-15.5408]), tensor([7.5516])]]

Logits for hard examples=[tensor([-15.5408]), tensor([7.5516])]


 93%|█████████▎| 9338/10000 [00:35<00:02, 248.96it/s, logits=[tensor([-15.5404]), tensor([7.5535])]]

Logits for hard examples=[tensor([-15.5404]), tensor([7.5535])]


 94%|█████████▍| 9428/10000 [00:35<00:02, 271.47it/s, logits=[tensor([-15.5399]), tensor([7.5555])]]

Logits for hard examples=[tensor([-15.5399]), tensor([7.5555])]


 96%|█████████▌| 9555/10000 [00:36<00:01, 298.05it/s, logits=[tensor([-15.5395]), tensor([7.5574])]]

Logits for hard examples=[tensor([-15.5395]), tensor([7.5574])]


 96%|█████████▋| 9645/10000 [00:36<00:01, 276.80it/s, logits=[tensor([-15.5390]), tensor([7.5594])]]

Logits for hard examples=[tensor([-15.5390]), tensor([7.5594])]


 97%|█████████▋| 9735/10000 [00:36<00:00, 279.15it/s, logits=[tensor([-15.5385]), tensor([7.5613])]]

Logits for hard examples=[tensor([-15.5385]), tensor([7.5613])]


 98%|█████████▊| 9826/10000 [00:37<00:00, 274.04it/s, logits=[tensor([-15.5381]), tensor([7.5632])]]

Logits for hard examples=[tensor([-15.5381]), tensor([7.5632])]


 99%|█████████▉| 9945/10000 [00:37<00:00, 280.95it/s, logits=[tensor([-15.5377]), tensor([7.5651])]]

Logits for hard examples=[tensor([-15.5377]), tensor([7.5651])]


100%|██████████| 10000/10000 [00:37<00:00, 265.70it/s, logits=[tensor([-15.5377]), tensor([7.5651])]]


SEQUENCE_LEN=6


  0%|          | 27/10000 [00:00<00:37, 266.03it/s, logits=[tensor([0.1169]), tensor([0.1164])]]

Logits for hard examples=[tensor([0.1169]), tensor([0.1164])]


  2%|▏         | 153/10000 [00:00<00:33, 294.86it/s, logits=[tensor([0.0278]), tensor([0.0269])]]

Logits for hard examples=[tensor([0.0278]), tensor([0.0269])]


  2%|▏         | 243/10000 [00:00<00:33, 289.60it/s, logits=[tensor([-0.0139]), tensor([-0.0138])]]

Logits for hard examples=[tensor([-0.0139]), tensor([-0.0138])]


  3%|▎         | 338/10000 [00:01<00:32, 296.89it/s, logits=[tensor([-0.0027]), tensor([-0.0020])]]

Logits for hard examples=[tensor([-0.0027]), tensor([-0.0020])]


  4%|▍         | 431/10000 [00:01<00:32, 292.69it/s, logits=[tensor([-0.0007]), tensor([0.0007])]]

Logits for hard examples=[tensor([-0.0007]), tensor([0.0007])]


  6%|▌         | 556/10000 [00:01<00:31, 303.12it/s, logits=[tensor([-0.0064]), tensor([-0.0034])]]

Logits for hard examples=[tensor([-0.0064]), tensor([-0.0034])]


  7%|▋         | 653/10000 [00:02<00:31, 297.33it/s, logits=[tensor([-0.0197]), tensor([-0.0104])]]

Logits for hard examples=[tensor([-0.0197]), tensor([-0.0104])]


  7%|▋         | 743/10000 [00:02<00:33, 279.49it/s, logits=[tensor([-0.4938]), tensor([-0.4921])]]

Logits for hard examples=[tensor([-0.4938]), tensor([-0.4921])]


  8%|▊         | 832/10000 [00:02<00:32, 280.00it/s, logits=[tensor([0.1176]), tensor([0.1190])]]

Logits for hard examples=[tensor([0.1176]), tensor([0.1190])]


 10%|▉         | 953/10000 [00:03<00:30, 294.63it/s, logits=[tensor([0.0356]), tensor([0.0366])]]

Logits for hard examples=[tensor([0.0356]), tensor([0.0366])]


 10%|█         | 1041/10000 [00:03<00:32, 272.58it/s, logits=[tensor([-0.0313]), tensor([-0.0301])]]

Logits for hard examples=[tensor([-0.0313]), tensor([-0.0301])]


 11%|█▏        | 1133/10000 [00:03<00:30, 292.02it/s, logits=[tensor([-0.0010]), tensor([0.0006])]]

Logits for hard examples=[tensor([-0.0010]), tensor([0.0006])]


 13%|█▎        | 1256/10000 [00:04<00:29, 295.67it/s, logits=[tensor([-0.0013]), tensor([0.0007])]]

Logits for hard examples=[tensor([-0.0013]), tensor([0.0007])]


 13%|█▎        | 1343/10000 [00:04<00:32, 269.06it/s, logits=[tensor([-0.0083]), tensor([-0.0056])]]

Logits for hard examples=[tensor([-0.0083]), tensor([-0.0056])]


 14%|█▍        | 1432/10000 [00:05<00:30, 284.95it/s, logits=[tensor([-0.0113]), tensor([-0.0074])]]

Logits for hard examples=[tensor([-0.0113]), tensor([-0.0074])]


 15%|█▌        | 1546/10000 [00:05<00:30, 277.51it/s, logits=[tensor([-0.0186]), tensor([-0.0119])]]

Logits for hard examples=[tensor([-0.0186]), tensor([-0.0119])]


 16%|█▋        | 1628/10000 [00:05<00:32, 257.81it/s, logits=[tensor([-0.0217]), tensor([0.0026])]]

Logits for hard examples=[tensor([-0.0217]), tensor([0.0026])]


 17%|█▋        | 1745/10000 [00:06<00:29, 279.98it/s, logits=[tensor([-5.9969]), tensor([7.5269])]]

Logits for hard examples=[tensor([-5.9969]), tensor([7.5269])]


 18%|█▊        | 1832/10000 [00:06<00:29, 277.60it/s, logits=[tensor([-8.4948]), tensor([10.2561])]]

Logits for hard examples=[tensor([-8.4948]), tensor([10.2561])]


 20%|█▉        | 1951/10000 [00:06<00:27, 288.07it/s, logits=[tensor([-8.8902]), tensor([10.6095])]]

Logits for hard examples=[tensor([-8.8902]), tensor([10.6095])]


 20%|██        | 2041/10000 [00:07<00:29, 272.80it/s, logits=[tensor([-8.9785]), tensor([10.6519])]]

Logits for hard examples=[tensor([-8.9785]), tensor([10.6519])]


 21%|██        | 2119/10000 [00:07<00:37, 212.07it/s, logits=[tensor([-9.0224]), tensor([10.6535])]]

Logits for hard examples=[tensor([-9.0224]), tensor([10.6535])]


 22%|██▏       | 2231/10000 [00:08<00:36, 215.12it/s, logits=[tensor([-9.0589]), tensor([10.6502])]]

Logits for hard examples=[tensor([-9.0589]), tensor([10.6502])]


 23%|██▎       | 2320/10000 [00:08<00:38, 201.42it/s, logits=[tensor([-9.0932]), tensor([10.6467])]]

Logits for hard examples=[tensor([-9.0932]), tensor([10.6467])]


 24%|██▍       | 2432/10000 [00:09<00:37, 203.30it/s, logits=[tensor([-9.1261]), tensor([10.6435])]]

Logits for hard examples=[tensor([-9.1261]), tensor([10.6435])]


 25%|██▌       | 2520/10000 [00:09<00:37, 197.97it/s, logits=[tensor([-9.1577]), tensor([10.6408])]]

Logits for hard examples=[tensor([-9.1577]), tensor([10.6408])]


 26%|██▋       | 2629/10000 [00:10<00:34, 213.45it/s, logits=[tensor([-9.1883]), tensor([10.6385])]]

Logits for hard examples=[tensor([-9.1883]), tensor([10.6385])]


 27%|██▋       | 2739/10000 [00:10<00:35, 207.37it/s, logits=[tensor([-9.2178]), tensor([10.6366])]]

Logits for hard examples=[tensor([-9.2178]), tensor([10.6366])]


 28%|██▊       | 2834/10000 [00:10<00:25, 275.81it/s, logits=[tensor([-9.2463]), tensor([10.6350])]]

Logits for hard examples=[tensor([-9.2463]), tensor([10.6350])]


 30%|██▉       | 2960/10000 [00:11<00:23, 298.20it/s, logits=[tensor([-9.2740]), tensor([10.6337])]]

Logits for hard examples=[tensor([-9.2740]), tensor([10.6337])]


 31%|███       | 3051/10000 [00:11<00:24, 285.84it/s, logits=[tensor([-9.3007]), tensor([10.6328])]]

Logits for hard examples=[tensor([-9.3007]), tensor([10.6328])]


 31%|███▏      | 3144/10000 [00:12<00:22, 298.15it/s, logits=[tensor([-9.3267]), tensor([10.6321])]]

Logits for hard examples=[tensor([-9.3267]), tensor([10.6321])]


 32%|███▏      | 3235/10000 [00:12<00:22, 296.63it/s, logits=[tensor([-9.3520]), tensor([10.6318])]]

Logits for hard examples=[tensor([-9.3520]), tensor([10.6318])]


 34%|███▎      | 3355/10000 [00:12<00:24, 274.64it/s, logits=[tensor([-9.3765]), tensor([10.6316])]]

Logits for hard examples=[tensor([-9.3765]), tensor([10.6316])]


 34%|███▍      | 3446/10000 [00:13<00:22, 290.63it/s, logits=[tensor([-9.4003]), tensor([10.6317])]]

Logits for hard examples=[tensor([-9.4003]), tensor([10.6317])]


 35%|███▌      | 3538/10000 [00:13<00:21, 295.96it/s, logits=[tensor([-9.4231]), tensor([10.6321])]]

Logits for hard examples=[tensor([-9.4231]), tensor([10.6321])]


 37%|███▋      | 3656/10000 [00:13<00:22, 285.76it/s, logits=[tensor([-9.4448]), tensor([10.6326])]]

Logits for hard examples=[tensor([-9.4448]), tensor([10.6326])]


 38%|███▊      | 3753/10000 [00:14<00:20, 303.15it/s, logits=[tensor([-9.4660]), tensor([10.6333])]]

Logits for hard examples=[tensor([-9.4660]), tensor([10.6333])]


 38%|███▊      | 3848/10000 [00:14<00:20, 302.04it/s, logits=[tensor([-9.4868]), tensor([10.6342])]]

Logits for hard examples=[tensor([-9.4868]), tensor([10.6342])]


 39%|███▉      | 3941/10000 [00:14<00:20, 293.25it/s, logits=[tensor([-9.5071]), tensor([10.6353])]]

Logits for hard examples=[tensor([-9.5071]), tensor([10.6353])]


 40%|████      | 4038/10000 [00:15<00:19, 310.23it/s, logits=[tensor([-9.5269]), tensor([10.6366])]]

Logits for hard examples=[tensor([-9.5269]), tensor([10.6366])]


 41%|████▏     | 4132/10000 [00:15<00:19, 298.70it/s, logits=[tensor([-9.5463]), tensor([10.6380])]]

Logits for hard examples=[tensor([-9.5463]), tensor([10.6380])]


 43%|████▎     | 4254/10000 [00:15<00:19, 289.18it/s, logits=[tensor([-9.5652]), tensor([10.6396])]]

Logits for hard examples=[tensor([-9.5652]), tensor([10.6396])]


 43%|████▎     | 4348/10000 [00:16<00:18, 300.37it/s, logits=[tensor([-9.5838]), tensor([10.6413])]]

Logits for hard examples=[tensor([-9.5838]), tensor([10.6413])]


 44%|████▍     | 4441/10000 [00:16<00:18, 300.48it/s, logits=[tensor([-9.6019]), tensor([10.6432])]]

Logits for hard examples=[tensor([-9.6019]), tensor([10.6432])]


 45%|████▌     | 4532/10000 [00:16<00:19, 283.38it/s, logits=[tensor([-9.6197]), tensor([10.6452])]]

Logits for hard examples=[tensor([-9.6197]), tensor([10.6452])]


 47%|████▋     | 4662/10000 [00:17<00:17, 305.64it/s, logits=[tensor([-9.6371]), tensor([10.6473])]]

Logits for hard examples=[tensor([-9.6371]), tensor([10.6473])]


 47%|████▋     | 4723/10000 [00:17<00:19, 276.46it/s, logits=[tensor([-9.6542]), tensor([10.6496])]]

Logits for hard examples=[tensor([-9.6542]), tensor([10.6496])]


 48%|████▊     | 4841/10000 [00:17<00:18, 284.68it/s, logits=[tensor([-9.6709]), tensor([10.6520])]]

Logits for hard examples=[tensor([-9.6709]), tensor([10.6520])]


 49%|████▉     | 4931/10000 [00:18<00:17, 288.77it/s, logits=[tensor([-9.6873]), tensor([10.6544])]]

Logits for hard examples=[tensor([-9.6873]), tensor([10.6544])]


 51%|█████     | 5051/10000 [00:18<00:17, 282.35it/s, logits=[tensor([-9.7034]), tensor([10.6570])]]

Logits for hard examples=[tensor([-9.7034]), tensor([10.6570])]


 51%|█████▏    | 5141/10000 [00:19<00:17, 284.47it/s, logits=[tensor([-9.7192]), tensor([10.6597])]]

Logits for hard examples=[tensor([-9.7192]), tensor([10.6597])]


 52%|█████▏    | 5234/10000 [00:19<00:15, 300.12it/s, logits=[tensor([-9.7347]), tensor([10.6624])]]

Logits for hard examples=[tensor([-9.7347]), tensor([10.6624])]


 54%|█████▎    | 5355/10000 [00:19<00:16, 278.71it/s, logits=[tensor([-9.7499]), tensor([10.6653])]]

Logits for hard examples=[tensor([-9.7499]), tensor([10.6653])]


 54%|█████▍    | 5447/10000 [00:20<00:15, 291.85it/s, logits=[tensor([-9.7649]), tensor([10.6681])]]

Logits for hard examples=[tensor([-9.7649]), tensor([10.6681])]


 55%|█████▌    | 5537/10000 [00:20<00:15, 291.04it/s, logits=[tensor([-9.7796]), tensor([10.6711])]]

Logits for hard examples=[tensor([-9.7796]), tensor([10.6711])]


 56%|█████▌    | 5624/10000 [00:20<00:18, 238.30it/s, logits=[tensor([-9.7940]), tensor([10.6740])]]

Logits for hard examples=[tensor([-9.7940]), tensor([10.6740])]


 57%|█████▋    | 5734/10000 [00:21<00:21, 202.81it/s, logits=[tensor([-9.8082]), tensor([10.6771])]]

Logits for hard examples=[tensor([-9.8082]), tensor([10.6771])]


 58%|█████▊    | 5822/10000 [00:21<00:21, 192.27it/s, logits=[tensor([-9.8222]), tensor([10.6802])]]

Logits for hard examples=[tensor([-9.8222]), tensor([10.6802])]


 59%|█████▉    | 5938/10000 [00:22<00:18, 221.13it/s, logits=[tensor([-9.8360]), tensor([10.6833])]]

Logits for hard examples=[tensor([-9.8360]), tensor([10.6833])]


 60%|██████    | 6025/10000 [00:22<00:21, 188.75it/s, logits=[tensor([-9.8495]), tensor([10.6865])]]

Logits for hard examples=[tensor([-9.8495]), tensor([10.6865])]


 61%|██████▏   | 6125/10000 [00:23<00:20, 188.09it/s, logits=[tensor([-9.8628]), tensor([10.6898])]]

Logits for hard examples=[tensor([-9.8628]), tensor([10.6898])]


 62%|██████▏   | 6231/10000 [00:23<00:19, 196.98it/s, logits=[tensor([-9.8759]), tensor([10.6932])]]

Logits for hard examples=[tensor([-9.8759]), tensor([10.6932])]


 63%|██████▎   | 6331/10000 [00:24<00:15, 241.84it/s, logits=[tensor([-9.8889]), tensor([10.6966])]]

Logits for hard examples=[tensor([-9.8889]), tensor([10.6966])]


 64%|██████▍   | 6444/10000 [00:24<00:13, 262.22it/s, logits=[tensor([-9.9016]), tensor([10.7001])]]

Logits for hard examples=[tensor([-9.9016]), tensor([10.7001])]


 65%|██████▌   | 6536/10000 [00:24<00:12, 284.72it/s, logits=[tensor([-9.9141]), tensor([10.7036])]]

Logits for hard examples=[tensor([-9.9141]), tensor([10.7036])]


 67%|██████▋   | 6654/10000 [00:25<00:12, 273.95it/s, logits=[tensor([-9.9265]), tensor([10.7071])]]

Logits for hard examples=[tensor([-9.9265]), tensor([10.7071])]


 67%|██████▋   | 6743/10000 [00:25<00:11, 272.47it/s, logits=[tensor([-9.9387]), tensor([10.7107])]]

Logits for hard examples=[tensor([-9.9387]), tensor([10.7107])]


 68%|██████▊   | 6829/10000 [00:26<00:11, 273.32it/s, logits=[tensor([-9.9507]), tensor([10.7143])]]

Logits for hard examples=[tensor([-9.9507]), tensor([10.7143])]


 70%|██████▉   | 6951/10000 [00:26<00:10, 289.23it/s, logits=[tensor([-9.9625]), tensor([10.7178])]]

Logits for hard examples=[tensor([-9.9625]), tensor([10.7178])]


 70%|███████   | 7039/10000 [00:26<00:10, 279.02it/s, logits=[tensor([-9.9742]), tensor([10.7214])]]

Logits for hard examples=[tensor([-9.9742]), tensor([10.7214])]


 71%|███████▏  | 7130/10000 [00:27<00:10, 274.02it/s, logits=[tensor([-9.9857]), tensor([10.7251])]]

Logits for hard examples=[tensor([-9.9857]), tensor([10.7251])]


 72%|███████▏  | 7243/10000 [00:27<00:10, 269.07it/s, logits=[tensor([-9.9971]), tensor([10.7288])]]

Logits for hard examples=[tensor([-9.9971]), tensor([10.7288])]


 73%|███████▎  | 7333/10000 [00:27<00:09, 281.02it/s, logits=[tensor([-10.0083]), tensor([10.7325])]]

Logits for hard examples=[tensor([-10.0083]), tensor([10.7325])]


 74%|███████▍  | 7427/10000 [00:28<00:09, 280.78it/s, logits=[tensor([-10.0194]), tensor([10.7363])]]

Logits for hard examples=[tensor([-10.0194]), tensor([10.7363])]


 75%|███████▌  | 7544/10000 [00:28<00:09, 267.74it/s, logits=[tensor([-10.0303]), tensor([10.7401])]]

Logits for hard examples=[tensor([-10.0303]), tensor([10.7401])]


 76%|███████▋  | 7639/10000 [00:28<00:08, 293.68it/s, logits=[tensor([-10.0411]), tensor([10.7439])]]

Logits for hard examples=[tensor([-10.0411]), tensor([10.7439])]


 77%|███████▋  | 7731/10000 [00:29<00:07, 287.97it/s, logits=[tensor([-10.0518]), tensor([10.7478])]]

Logits for hard examples=[tensor([-10.0518]), tensor([10.7478])]


 79%|███████▊  | 7853/10000 [00:29<00:07, 283.36it/s, logits=[tensor([-10.0623]), tensor([10.7517])]]

Logits for hard examples=[tensor([-10.0623]), tensor([10.7517])]


 79%|███████▉  | 7945/10000 [00:30<00:06, 297.05it/s, logits=[tensor([-10.0727]), tensor([10.7555])]]

Logits for hard examples=[tensor([-10.0727]), tensor([10.7555])]


 80%|████████  | 8037/10000 [00:30<00:06, 295.41it/s, logits=[tensor([-10.0830]), tensor([10.7594])]]

Logits for hard examples=[tensor([-10.0830]), tensor([10.7594])]


 82%|████████▏ | 8157/10000 [00:30<00:06, 285.61it/s, logits=[tensor([-10.0931]), tensor([10.7632])]]

Logits for hard examples=[tensor([-10.0931]), tensor([10.7632])]


 82%|████████▏ | 8247/10000 [00:31<00:06, 290.82it/s, logits=[tensor([-10.1032]), tensor([10.7671])]]

Logits for hard examples=[tensor([-10.1032]), tensor([10.7671])]


 83%|████████▎ | 8337/10000 [00:31<00:05, 283.69it/s, logits=[tensor([-10.1131]), tensor([10.7710])]]

Logits for hard examples=[tensor([-10.1131]), tensor([10.7710])]


 85%|████████▍ | 8453/10000 [00:31<00:05, 278.05it/s, logits=[tensor([-10.1229]), tensor([10.7749])]]

Logits for hard examples=[tensor([-10.1229]), tensor([10.7749])]


 85%|████████▌ | 8542/10000 [00:32<00:05, 280.87it/s, logits=[tensor([-10.1326]), tensor([10.7788])]]

Logits for hard examples=[tensor([-10.1326]), tensor([10.7788])]


 86%|████████▋ | 8634/10000 [00:32<00:04, 281.55it/s, logits=[tensor([-10.1422]), tensor([10.7828])]]

Logits for hard examples=[tensor([-10.1422]), tensor([10.7828])]


 88%|████████▊ | 8755/10000 [00:32<00:04, 289.46it/s, logits=[tensor([-10.1517]), tensor([10.7868])]]

Logits for hard examples=[tensor([-10.1517]), tensor([10.7868])]


 88%|████████▊ | 8847/10000 [00:33<00:03, 294.04it/s, logits=[tensor([-10.1610]), tensor([10.7908])]]

Logits for hard examples=[tensor([-10.1610]), tensor([10.7908])]


 89%|████████▉ | 8937/10000 [00:33<00:03, 276.98it/s, logits=[tensor([-10.1703]), tensor([10.7948])]]

Logits for hard examples=[tensor([-10.1703]), tensor([10.7948])]


 91%|█████████ | 9061/10000 [00:33<00:03, 294.84it/s, logits=[tensor([-10.1795]), tensor([10.7988])]]

Logits for hard examples=[tensor([-10.1795]), tensor([10.7988])]


 91%|█████████ | 9119/10000 [00:34<00:03, 236.82it/s, logits=[tensor([-10.1886]), tensor([10.8028])]]

Logits for hard examples=[tensor([-10.1886]), tensor([10.8028])]


 92%|█████████▏| 9239/10000 [00:34<00:03, 219.16it/s, logits=[tensor([-10.1976]), tensor([10.8068])]]

Logits for hard examples=[tensor([-10.1976]), tensor([10.8068])]


 93%|█████████▎| 9335/10000 [00:35<00:02, 228.92it/s, logits=[tensor([-10.2064]), tensor([10.8108])]]

Logits for hard examples=[tensor([-10.2064]), tensor([10.8108])]


 94%|█████████▍| 9430/10000 [00:35<00:02, 202.79it/s, logits=[tensor([-10.2153]), tensor([10.8147])]]

Logits for hard examples=[tensor([-10.2153]), tensor([10.8147])]


 95%|█████████▌| 9521/10000 [00:36<00:02, 203.33it/s, logits=[tensor([-10.2240]), tensor([10.8187])]]

Logits for hard examples=[tensor([-10.2240]), tensor([10.8187])]


 96%|█████████▋| 9632/10000 [00:36<00:01, 193.15it/s, logits=[tensor([-10.2326]), tensor([10.8227])]]

Logits for hard examples=[tensor([-10.2326]), tensor([10.8227])]


 97%|█████████▋| 9736/10000 [00:37<00:01, 193.48it/s, logits=[tensor([-10.2411]), tensor([10.8267])]]

Logits for hard examples=[tensor([-10.2411]), tensor([10.8267])]


 99%|█████████▊| 9857/10000 [00:37<00:00, 247.10it/s, logits=[tensor([-10.2496]), tensor([10.8307])]]

Logits for hard examples=[tensor([-10.2496]), tensor([10.8307])]


 99%|█████████▉| 9944/10000 [00:38<00:00, 276.46it/s, logits=[tensor([-10.2580]), tensor([10.8348])]]

Logits for hard examples=[tensor([-10.2580]), tensor([10.8348])]


100%|██████████| 10000/10000 [00:38<00:00, 261.17it/s, logits=[tensor([-10.2580]), tensor([10.8348])]]


SEQUENCE_LEN=7


  0%|          | 24/10000 [00:00<00:42, 232.96it/s, logits=[tensor([-0.1114]), tensor([-0.1116])]]

Logits for hard examples=[tensor([-0.1114]), tensor([-0.1116])]


  1%|▏         | 136/10000 [00:00<00:36, 267.40it/s, logits=[tensor([0.0331]), tensor([0.0334])]]

Logits for hard examples=[tensor([0.0331]), tensor([0.0334])]


  2%|▏         | 229/10000 [00:00<00:35, 279.16it/s, logits=[tensor([0.0054]), tensor([0.0064])]]

Logits for hard examples=[tensor([0.0054]), tensor([0.0064])]


  3%|▎         | 347/10000 [00:01<00:36, 267.42it/s, logits=[tensor([-0.0092]), tensor([-0.0069])]]

Logits for hard examples=[tensor([-0.0092]), tensor([-0.0069])]


  4%|▍         | 430/10000 [00:01<00:35, 267.17it/s, logits=[tensor([-0.0156]), tensor([-0.0051])]]

Logits for hard examples=[tensor([-0.0156]), tensor([-0.0051])]


  5%|▌         | 549/10000 [00:02<00:33, 282.38it/s, logits=[tensor([-0.2755]), tensor([-0.2753])]]

Logits for hard examples=[tensor([-0.2755]), tensor([-0.2753])]


  6%|▋         | 633/10000 [00:02<00:35, 263.96it/s, logits=[tensor([0.0503]), tensor([0.0506])]]

Logits for hard examples=[tensor([0.0503]), tensor([0.0506])]


  8%|▊         | 750/10000 [00:02<00:32, 288.95it/s, logits=[tensor([0.0437]), tensor([0.0440])]]

Logits for hard examples=[tensor([0.0437]), tensor([0.0440])]


  8%|▊         | 840/10000 [00:03<00:31, 291.07it/s, logits=[tensor([0.0071]), tensor([0.0074])]]

Logits for hard examples=[tensor([0.0071]), tensor([0.0074])]


  9%|▉         | 930/10000 [00:03<00:32, 279.28it/s, logits=[tensor([8.7231e-05]), tensor([0.0004])]]

Logits for hard examples=[tensor([8.7231e-05]), tensor([0.0004])]


 11%|█         | 1053/10000 [00:03<00:30, 288.84it/s, logits=[tensor([0.0006]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0006]), tensor([0.0010])]


 11%|█▏        | 1145/10000 [00:04<00:30, 293.38it/s, logits=[tensor([0.0015]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0015]), tensor([0.0019])]


 12%|█▏        | 1237/10000 [00:04<00:31, 281.00it/s, logits=[tensor([0.0019]), tensor([0.0023])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0023])]


 14%|█▎        | 1356/10000 [00:04<00:30, 285.34it/s, logits=[tensor([0.0020]), tensor([0.0024])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0024])]


 14%|█▍        | 1446/10000 [00:05<00:29, 288.53it/s, logits=[tensor([0.0020]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0025])]


 15%|█▌        | 1537/10000 [00:05<00:29, 285.25it/s, logits=[tensor([0.0020]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0025])]


 17%|█▋        | 1662/10000 [00:05<00:27, 300.29it/s, logits=[tensor([0.0020]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0025])]


 18%|█▊        | 1753/10000 [00:06<00:28, 284.78it/s, logits=[tensor([0.0019]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0025])]


 18%|█▊        | 1845/10000 [00:06<00:27, 296.63it/s, logits=[tensor([0.0018]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0018]), tensor([0.0025])]


 19%|█▉        | 1937/10000 [00:06<00:27, 296.06it/s, logits=[tensor([0.0017]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0025])]


 20%|██        | 2025/10000 [00:07<00:29, 273.00it/s, logits=[tensor([0.0016]), tensor([0.0025])]]

Logits for hard examples=[tensor([0.0016]), tensor([0.0025])]


 21%|██▏       | 2143/10000 [00:07<00:27, 282.33it/s, logits=[tensor([0.0014]), tensor([0.0026])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0026])]


 22%|██▏       | 2229/10000 [00:07<00:28, 268.63it/s, logits=[tensor([0.0009]), tensor([0.0027])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0027])]


 23%|██▎       | 2343/10000 [00:08<00:28, 267.53it/s, logits=[tensor([-0.0004]), tensor([0.0032])]]

Logits for hard examples=[tensor([-0.0004]), tensor([0.0032])]


 24%|██▍       | 2431/10000 [00:08<00:27, 279.77it/s, logits=[tensor([-0.0028]), tensor([0.0272])]]

Logits for hard examples=[tensor([-0.0028]), tensor([0.0272])]


 26%|██▌       | 2551/10000 [00:09<00:26, 285.51it/s, logits=[tensor([-0.2780]), tensor([-0.2780])]]

Logits for hard examples=[tensor([-0.2780]), tensor([-0.2780])]


 26%|██▋       | 2633/10000 [00:09<00:31, 231.29it/s, logits=[tensor([-0.0807]), tensor([-0.0806])]]

Logits for hard examples=[tensor([-0.0807]), tensor([-0.0806])]


 27%|██▋       | 2729/10000 [00:10<00:34, 213.56it/s, logits=[tensor([0.0532]), tensor([0.0533])]]

Logits for hard examples=[tensor([0.0532]), tensor([0.0533])]


 28%|██▊       | 2822/10000 [00:10<00:35, 201.48it/s, logits=[tensor([0.0075]), tensor([0.0076])]]

Logits for hard examples=[tensor([0.0075]), tensor([0.0076])]


 29%|██▉       | 2922/10000 [00:10<00:31, 225.07it/s, logits=[tensor([-0.0059]), tensor([-0.0058])]]

Logits for hard examples=[tensor([-0.0059]), tensor([-0.0058])]


 30%|███       | 3035/10000 [00:11<00:33, 205.71it/s, logits=[tensor([0.0013]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0015])]


 31%|███       | 3124/10000 [00:11<00:35, 195.83it/s, logits=[tensor([0.0024]), tensor([0.0026])]]

Logits for hard examples=[tensor([0.0024]), tensor([0.0026])]


 32%|███▏      | 3228/10000 [00:12<00:36, 186.56it/s, logits=[tensor([0.0013]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0015])]


 33%|███▎      | 3348/10000 [00:12<00:27, 240.15it/s, logits=[tensor([0.0012]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0014])]


 34%|███▍      | 3430/10000 [00:13<00:25, 259.89it/s, logits=[tensor([0.0013]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0015])]


 35%|███▌      | 3549/10000 [00:13<00:22, 288.38it/s, logits=[tensor([0.0012]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0015])]


 36%|███▋      | 3641/10000 [00:13<00:21, 298.81it/s, logits=[tensor([0.0010]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0014])]


 37%|███▋      | 3730/10000 [00:14<00:22, 279.42it/s, logits=[tensor([0.0009]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0013])]


 39%|███▊      | 3855/10000 [00:14<00:20, 294.89it/s, logits=[tensor([0.0006]), tensor([0.0011])]]

Logits for hard examples=[tensor([0.0006]), tensor([0.0011])]


 39%|███▉      | 3947/10000 [00:15<00:20, 295.37it/s, logits=[tensor([3.3736e-05]), tensor([0.0008])]]

Logits for hard examples=[tensor([3.3736e-05]), tensor([0.0008])]


 40%|████      | 4034/10000 [00:15<00:21, 277.61it/s, logits=[tensor([-0.0016]), tensor([-5.4121e-05])]]

Logits for hard examples=[tensor([-0.0016]), tensor([-5.4121e-05])]


 41%|████▏     | 4149/10000 [00:15<00:21, 271.54it/s, logits=[tensor([-0.0132]), tensor([-0.0055])]]

Logits for hard examples=[tensor([-0.0132]), tensor([-0.0055])]


 42%|████▏     | 4242/10000 [00:16<00:19, 295.83it/s, logits=[tensor([-0.1835]), tensor([-0.1830])]]

Logits for hard examples=[tensor([-0.1835]), tensor([-0.1830])]


 43%|████▎     | 4333/10000 [00:16<00:20, 281.04it/s, logits=[tensor([-0.0060]), tensor([-0.0057])]]

Logits for hard examples=[tensor([-0.0060]), tensor([-0.0057])]


 45%|████▍     | 4459/10000 [00:16<00:18, 302.60it/s, logits=[tensor([0.0269]), tensor([0.0272])]]

Logits for hard examples=[tensor([0.0269]), tensor([0.0272])]


 45%|████▌     | 4521/10000 [00:17<00:20, 264.74it/s, logits=[tensor([-0.0015]), tensor([-0.0011])]]

Logits for hard examples=[tensor([-0.0015]), tensor([-0.0011])]


 46%|████▋     | 4634/10000 [00:17<00:19, 269.92it/s, logits=[tensor([-0.0021]), tensor([-0.0017])]]

Logits for hard examples=[tensor([-0.0021]), tensor([-0.0017])]


 48%|████▊     | 4758/10000 [00:17<00:18, 291.13it/s, logits=[tensor([0.0016]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0016]), tensor([0.0021])]


 48%|████▊     | 4844/10000 [00:18<00:20, 255.94it/s, logits=[tensor([0.0011]), tensor([0.0016])]]

Logits for hard examples=[tensor([0.0011]), tensor([0.0016])]


 49%|████▉     | 4932/10000 [00:18<00:18, 275.10it/s, logits=[tensor([0.0005]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0005]), tensor([0.0010])]


 50%|█████     | 5050/10000 [00:19<00:17, 275.30it/s, logits=[tensor([0.0004]), tensor([0.0011])]]

Logits for hard examples=[tensor([0.0004]), tensor([0.0011])]


 51%|█████▏    | 5132/10000 [00:19<00:19, 248.25it/s, logits=[tensor([0.0001]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0001]), tensor([0.0010])]


 53%|█████▎    | 5252/10000 [00:19<00:16, 284.33it/s, logits=[tensor([-0.0003]), tensor([0.0008])]]

Logits for hard examples=[tensor([-0.0003]), tensor([0.0008])]


 53%|█████▎    | 5338/10000 [00:20<00:17, 264.56it/s, logits=[tensor([-0.0010]), tensor([0.0007])]]

Logits for hard examples=[tensor([-0.0010]), tensor([0.0007])]


 54%|█████▍    | 5444/10000 [00:20<00:18, 247.60it/s, logits=[tensor([-0.0026]), tensor([0.0006])]]

Logits for hard examples=[tensor([-0.0026]), tensor([0.0006])]


 55%|█████▌    | 5531/10000 [00:20<00:16, 272.70it/s, logits=[tensor([-0.0089]), tensor([0.0050])]]

Logits for hard examples=[tensor([-0.0089]), tensor([0.0050])]


 56%|█████▋    | 5644/10000 [00:21<00:16, 265.08it/s, logits=[tensor([-4.9079]), tensor([2.5029])]]

Logits for hard examples=[tensor([-4.9079]), tensor([2.5029])]


 57%|█████▋    | 5732/10000 [00:21<00:15, 280.00it/s, logits=[tensor([-11.4773]), tensor([8.5663])]]

Logits for hard examples=[tensor([-11.4773]), tensor([8.5663])]


 59%|█████▊    | 5853/10000 [00:22<00:14, 286.11it/s, logits=[tensor([-12.1425]), tensor([9.4618])]]

Logits for hard examples=[tensor([-12.1425]), tensor([9.4618])]


 59%|█████▉    | 5939/10000 [00:22<00:15, 256.87it/s, logits=[tensor([-12.2259]), tensor([9.5957])]]

Logits for hard examples=[tensor([-12.2259]), tensor([9.5957])]


 60%|██████    | 6046/10000 [00:22<00:15, 258.94it/s, logits=[tensor([-12.2333]), tensor([9.6257])]]

Logits for hard examples=[tensor([-12.2333]), tensor([9.6257])]


 61%|██████    | 6121/10000 [00:23<00:17, 220.14it/s, logits=[tensor([-12.2308]), tensor([9.6414])]]

Logits for hard examples=[tensor([-12.2308]), tensor([9.6414])]


 62%|██████▏   | 6236/10000 [00:23<00:17, 221.11it/s, logits=[tensor([-12.2270]), tensor([9.6550])]]

Logits for hard examples=[tensor([-12.2270]), tensor([9.6550])]


 63%|██████▎   | 6330/10000 [00:24<00:16, 219.31it/s, logits=[tensor([-12.2230]), tensor([9.6681])]]

Logits for hard examples=[tensor([-12.2230]), tensor([9.6681])]


 64%|██████▍   | 6418/10000 [00:24<00:18, 192.33it/s, logits=[tensor([-12.2192]), tensor([9.6810])]]

Logits for hard examples=[tensor([-12.2192]), tensor([9.6810])]


 65%|██████▌   | 6527/10000 [00:25<00:17, 200.58it/s, logits=[tensor([-12.2155]), tensor([9.6938])]]

Logits for hard examples=[tensor([-12.2155]), tensor([9.6938])]


 66%|██████▋   | 6629/10000 [00:25<00:17, 188.46it/s, logits=[tensor([-12.2119]), tensor([9.7063])]]

Logits for hard examples=[tensor([-12.2119]), tensor([9.7063])]


 67%|██████▋   | 6728/10000 [00:26<00:17, 191.55it/s, logits=[tensor([-12.2083]), tensor([9.7187])]]

Logits for hard examples=[tensor([-12.2083]), tensor([9.7187])]


 68%|██████▊   | 6843/10000 [00:26<00:11, 266.32it/s, logits=[tensor([-12.2048]), tensor([9.7309])]]

Logits for hard examples=[tensor([-12.2048]), tensor([9.7309])]


 69%|██████▉   | 6931/10000 [00:26<00:11, 277.14it/s, logits=[tensor([-12.2015]), tensor([9.7429])]]

Logits for hard examples=[tensor([-12.2015]), tensor([9.7429])]


 71%|███████   | 7051/10000 [00:27<00:10, 274.28it/s, logits=[tensor([-12.1982]), tensor([9.7548])]]

Logits for hard examples=[tensor([-12.1982]), tensor([9.7548])]


 71%|███████▏  | 7137/10000 [00:27<00:10, 274.93it/s, logits=[tensor([-12.1950]), tensor([9.7665])]]

Logits for hard examples=[tensor([-12.1950]), tensor([9.7665])]


 72%|███████▏  | 7227/10000 [00:28<00:10, 269.69it/s, logits=[tensor([-12.1919]), tensor([9.7781])]]

Logits for hard examples=[tensor([-12.1919]), tensor([9.7781])]


 73%|███████▎  | 7346/10000 [00:28<00:09, 279.56it/s, logits=[tensor([-12.1889]), tensor([9.7895])]]

Logits for hard examples=[tensor([-12.1889]), tensor([9.7895])]


 74%|███████▍  | 7434/10000 [00:28<00:09, 285.10it/s, logits=[tensor([-12.1860]), tensor([9.8008])]]

Logits for hard examples=[tensor([-12.1860]), tensor([9.8008])]


 76%|███████▌  | 7553/10000 [00:29<00:08, 293.65it/s, logits=[tensor([-12.1832]), tensor([9.8119])]]

Logits for hard examples=[tensor([-12.1832]), tensor([9.8119])]


 76%|███████▋  | 7646/10000 [00:29<00:08, 290.84it/s, logits=[tensor([-12.1805]), tensor([9.8229])]]

Logits for hard examples=[tensor([-12.1805]), tensor([9.8229])]


 77%|███████▋  | 7737/10000 [00:29<00:07, 292.40it/s, logits=[tensor([-12.1779]), tensor([9.8338])]]

Logits for hard examples=[tensor([-12.1779]), tensor([9.8338])]


 78%|███████▊  | 7831/10000 [00:30<00:07, 282.55it/s, logits=[tensor([-12.1753]), tensor([9.8446])]]

Logits for hard examples=[tensor([-12.1753]), tensor([9.8446])]


 80%|███████▉  | 7950/10000 [00:30<00:07, 287.11it/s, logits=[tensor([-12.1728]), tensor([9.8552])]]

Logits for hard examples=[tensor([-12.1728]), tensor([9.8552])]


 80%|████████  | 8040/10000 [00:30<00:07, 269.80it/s, logits=[tensor([-12.1704]), tensor([9.8657])]]

Logits for hard examples=[tensor([-12.1704]), tensor([9.8657])]


 81%|████████▏ | 8132/10000 [00:31<00:06, 290.59it/s, logits=[tensor([-12.1681]), tensor([9.8760])]]

Logits for hard examples=[tensor([-12.1681]), tensor([9.8760])]


 83%|████████▎ | 8252/10000 [00:31<00:06, 290.47it/s, logits=[tensor([-12.1658]), tensor([9.8863])]]

Logits for hard examples=[tensor([-12.1658]), tensor([9.8863])]


 83%|████████▎ | 8343/10000 [00:31<00:05, 285.27it/s, logits=[tensor([-12.1635]), tensor([9.8964])]]

Logits for hard examples=[tensor([-12.1635]), tensor([9.8964])]


 84%|████████▍ | 8431/10000 [00:32<00:05, 282.68it/s, logits=[tensor([-12.1614]), tensor([9.9064])]]

Logits for hard examples=[tensor([-12.1614]), tensor([9.9064])]


 86%|████████▌ | 8554/10000 [00:32<00:04, 293.86it/s, logits=[tensor([-12.1592]), tensor([9.9163])]]

Logits for hard examples=[tensor([-12.1592]), tensor([9.9163])]


 86%|████████▋ | 8646/10000 [00:32<00:04, 285.77it/s, logits=[tensor([-12.1572]), tensor([9.9261])]]

Logits for hard examples=[tensor([-12.1572]), tensor([9.9261])]


 87%|████████▋ | 8733/10000 [00:33<00:04, 276.52it/s, logits=[tensor([-12.1552]), tensor([9.9358])]]

Logits for hard examples=[tensor([-12.1552]), tensor([9.9358])]


 89%|████████▊ | 8855/10000 [00:33<00:03, 295.41it/s, logits=[tensor([-12.1532]), tensor([9.9454])]]

Logits for hard examples=[tensor([-12.1532]), tensor([9.9454])]


 89%|████████▉ | 8947/10000 [00:34<00:03, 284.96it/s, logits=[tensor([-12.1513]), tensor([9.9549])]]

Logits for hard examples=[tensor([-12.1513]), tensor([9.9549])]


 90%|█████████ | 9033/10000 [00:34<00:03, 269.25it/s, logits=[tensor([-12.1495]), tensor([9.9643])]]

Logits for hard examples=[tensor([-12.1495]), tensor([9.9643])]


 92%|█████████▏| 9158/10000 [00:34<00:02, 293.38it/s, logits=[tensor([-12.1477]), tensor([9.9736])]]

Logits for hard examples=[tensor([-12.1477]), tensor([9.9736])]


 92%|█████████▏| 9248/10000 [00:35<00:02, 281.16it/s, logits=[tensor([-12.1461]), tensor([9.9828])]]

Logits for hard examples=[tensor([-12.1461]), tensor([9.9828])]


 93%|█████████▎| 9336/10000 [00:35<00:02, 278.22it/s, logits=[tensor([-12.1444]), tensor([9.9919])]]

Logits for hard examples=[tensor([-12.1444]), tensor([9.9919])]


 95%|█████████▍| 9460/10000 [00:35<00:01, 300.12it/s, logits=[tensor([-12.1428]), tensor([10.0009])]]

Logits for hard examples=[tensor([-12.1428]), tensor([10.0009])]


 96%|█████████▌| 9552/10000 [00:36<00:01, 284.82it/s, logits=[tensor([-12.1413]), tensor([10.0098])]]

Logits for hard examples=[tensor([-12.1413]), tensor([10.0098])]


 96%|█████████▋| 9631/10000 [00:36<00:01, 224.36it/s, logits=[tensor([-12.1398]), tensor([10.0186])]]

Logits for hard examples=[tensor([-12.1398]), tensor([10.0186])]


 97%|█████████▋| 9726/10000 [00:37<00:01, 204.92it/s, logits=[tensor([-12.1384]), tensor([10.0274])]]

Logits for hard examples=[tensor([-12.1384]), tensor([10.0274])]


 98%|█████████▊| 9828/10000 [00:37<00:00, 184.25it/s, logits=[tensor([-12.1370]), tensor([10.0360])]]

Logits for hard examples=[tensor([-12.1370]), tensor([10.0360])]


 99%|█████████▉| 9922/10000 [00:38<00:00, 210.53it/s, logits=[tensor([-12.1357]), tensor([10.0446])]]

Logits for hard examples=[tensor([-12.1357]), tensor([10.0446])]


100%|██████████| 10000/10000 [00:38<00:00, 259.95it/s, logits=[tensor([-12.1357]), tensor([10.0446])]]


SEQUENCE_LEN=8


  0%|          | 22/10000 [00:00<00:46, 213.32it/s, logits=[tensor([0.0094]), tensor([0.0094])]]

Logits for hard examples=[tensor([0.0094]), tensor([0.0094])]


  1%|▏         | 128/10000 [00:00<00:50, 196.28it/s, logits=[tensor([0.0198]), tensor([0.0202])]]

Logits for hard examples=[tensor([0.0198]), tensor([0.0202])]


  2%|▏         | 229/10000 [00:01<00:49, 196.52it/s, logits=[tensor([0.0055]), tensor([0.0066])]]

Logits for hard examples=[tensor([0.0055]), tensor([0.0066])]


  3%|▎         | 336/10000 [00:01<00:38, 253.55it/s, logits=[tensor([-0.0030]), tensor([0.0004])]]

Logits for hard examples=[tensor([-0.0030]), tensor([0.0004])]


  4%|▍         | 449/10000 [00:02<00:35, 267.12it/s, logits=[tensor([-0.2216]), tensor([-0.2208])]]

Logits for hard examples=[tensor([-0.2216]), tensor([-0.2208])]


  5%|▌         | 542/10000 [00:02<00:32, 294.27it/s, logits=[tensor([-0.0632]), tensor([-0.0631])]]

Logits for hard examples=[tensor([-0.0632]), tensor([-0.0631])]


  6%|▋         | 632/10000 [00:02<00:32, 290.26it/s, logits=[tensor([0.0179]), tensor([0.0180])]]

Logits for hard examples=[tensor([0.0179]), tensor([0.0180])]


  8%|▊         | 751/10000 [00:03<00:32, 280.57it/s, logits=[tensor([0.0148]), tensor([0.0150])]]

Logits for hard examples=[tensor([0.0148]), tensor([0.0150])]


  8%|▊         | 839/10000 [00:03<00:37, 242.62it/s, logits=[tensor([0.0073]), tensor([0.0075])]]

Logits for hard examples=[tensor([0.0073]), tensor([0.0075])]


  9%|▉         | 938/10000 [00:03<00:41, 217.21it/s, logits=[tensor([0.0043]), tensor([0.0046])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0046])]


 10%|█         | 1042/10000 [00:04<00:36, 244.94it/s, logits=[tensor([0.0033]), tensor([0.0037])]]

Logits for hard examples=[tensor([0.0033]), tensor([0.0037])]


 11%|█▏        | 1147/10000 [00:04<00:35, 247.46it/s, logits=[tensor([0.0030]), tensor([0.0037])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0037])]


 12%|█▏        | 1242/10000 [00:05<00:39, 223.27it/s, logits=[tensor([0.0030]), tensor([0.0045])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0045])]


 13%|█▎        | 1343/10000 [00:05<00:35, 245.46it/s, logits=[tensor([0.0030]), tensor([0.0103])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0103])]


 14%|█▍        | 1441/10000 [00:06<00:37, 225.57it/s, logits=[tensor([0.0338]), tensor([0.0339])]]

Logits for hard examples=[tensor([0.0338]), tensor([0.0339])]


 15%|█▌        | 1547/10000 [00:06<00:33, 250.70it/s, logits=[tensor([0.1571]), tensor([0.1571])]]

Logits for hard examples=[tensor([0.1571]), tensor([0.1571])]


 16%|█▋        | 1626/10000 [00:06<00:34, 242.60it/s, logits=[tensor([-0.0015]), tensor([-0.0015])]]

Logits for hard examples=[tensor([-0.0015]), tensor([-0.0015])]


 17%|█▋        | 1725/10000 [00:07<00:35, 231.36it/s, logits=[tensor([-0.0166]), tensor([-0.0166])]]

Logits for hard examples=[tensor([-0.0166]), tensor([-0.0166])]


 18%|█▊        | 1829/10000 [00:07<00:33, 241.50it/s, logits=[tensor([0.0090]), tensor([0.0090])]]

Logits for hard examples=[tensor([0.0090]), tensor([0.0090])]


 19%|█▉        | 1927/10000 [00:08<00:35, 228.11it/s, logits=[tensor([0.0005]), tensor([0.0005])]]

Logits for hard examples=[tensor([0.0005]), tensor([0.0005])]


 20%|██        | 2033/10000 [00:08<00:31, 252.20it/s, logits=[tensor([0.0007]), tensor([0.0007])]]

Logits for hard examples=[tensor([0.0007]), tensor([0.0007])]


 21%|██▏       | 2136/10000 [00:08<00:34, 225.79it/s, logits=[tensor([0.0017]), tensor([0.0017])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0017])]


 22%|██▏       | 2226/10000 [00:09<00:39, 196.06it/s, logits=[tensor([0.0012]), tensor([0.0012])]]

Logits for hard examples=[tensor([0.0012]), tensor([0.0012])]


 23%|██▎       | 2327/10000 [00:10<00:39, 195.78it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 24%|██▍       | 2436/10000 [00:10<00:37, 203.08it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 25%|██▌       | 2524/10000 [00:10<00:37, 197.77it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 26%|██▌       | 2607/10000 [00:11<00:40, 180.87it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 27%|██▋       | 2722/10000 [00:12<00:50, 143.44it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 28%|██▊       | 2823/10000 [00:13<00:55, 129.93it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 29%|██▉       | 2929/10000 [00:13<00:36, 195.82it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 30%|███       | 3038/10000 [00:14<00:35, 198.18it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 31%|███▏      | 3129/10000 [00:14<00:34, 202.00it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 32%|███▏      | 3233/10000 [00:15<00:34, 193.47it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 33%|███▎      | 3318/10000 [00:15<00:35, 189.47it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 34%|███▍      | 3421/10000 [00:16<00:33, 193.52it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 35%|███▌      | 3543/10000 [00:16<00:26, 242.71it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 37%|███▋      | 3661/10000 [00:17<00:22, 282.88it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 37%|███▋      | 3748/10000 [00:17<00:22, 280.09it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 38%|███▊      | 3839/10000 [00:17<00:21, 290.19it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 40%|███▉      | 3959/10000 [00:18<00:20, 291.86it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 40%|████      | 4049/10000 [00:18<00:20, 288.26it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 41%|████▏     | 4137/10000 [00:18<00:21, 275.24it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 43%|████▎     | 4256/10000 [00:19<00:19, 290.04it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 43%|████▎     | 4346/10000 [00:19<00:19, 290.79it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 44%|████▍     | 4438/10000 [00:19<00:18, 295.30it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 45%|████▌     | 4530/10000 [00:20<00:18, 288.51it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 47%|████▋     | 4653/10000 [00:20<00:18, 288.85it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 47%|████▋     | 4743/10000 [00:20<00:18, 282.60it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 48%|████▊     | 4835/10000 [00:21<00:17, 293.92it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 49%|████▉     | 4929/10000 [00:21<00:17, 296.91it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 50%|█████     | 5048/10000 [00:21<00:17, 278.33it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 51%|█████▏    | 5137/10000 [00:22<00:16, 289.06it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 53%|█████▎    | 5256/10000 [00:22<00:16, 285.90it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 53%|█████▎    | 5342/10000 [00:22<00:17, 270.28it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 54%|█████▍    | 5433/10000 [00:23<00:15, 291.01it/s, logits=[tensor([0.0013]), tensor([0.0013])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0013])]


 55%|█████▌    | 5528/10000 [00:23<00:15, 286.12it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


 56%|█████▋    | 5647/10000 [00:23<00:15, 280.47it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


 57%|█████▋    | 5738/10000 [00:24<00:14, 289.78it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


 58%|█████▊    | 5831/10000 [00:24<00:14, 286.98it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


 59%|█████▉    | 5948/10000 [00:25<00:14, 278.75it/s, logits=[tensor([0.0014]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0015])]


 60%|██████    | 6039/10000 [00:25<00:13, 286.99it/s, logits=[tensor([0.0014]), tensor([0.0016])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0016])]


 61%|██████▏   | 6131/10000 [00:25<00:13, 283.41it/s, logits=[tensor([0.0014]), tensor([0.0017])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0017])]


 62%|██████▎   | 6250/10000 [00:26<00:12, 288.82it/s, logits=[tensor([0.0015]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0015]), tensor([0.0020])]


 63%|██████▎   | 6337/10000 [00:26<00:12, 285.89it/s, logits=[tensor([0.0017]), tensor([0.0027])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0027])]


 64%|██████▍   | 6418/10000 [00:26<00:16, 214.62it/s, logits=[tensor([-0.0005]), tensor([0.0081])]]

Logits for hard examples=[tensor([-0.0005]), tensor([0.0081])]


 65%|██████▌   | 6532/10000 [00:27<00:17, 203.52it/s, logits=[tensor([0.5835]), tensor([0.5833])]]

Logits for hard examples=[tensor([0.5835]), tensor([0.5833])]


 66%|██████▋   | 6625/10000 [00:27<00:16, 205.84it/s, logits=[tensor([-0.1389]), tensor([-0.1389])]]

Logits for hard examples=[tensor([-0.1389]), tensor([-0.1389])]


 67%|██████▋   | 6742/10000 [00:28<00:14, 226.08it/s, logits=[tensor([0.0061]), tensor([0.0061])]]

Logits for hard examples=[tensor([0.0061]), tensor([0.0061])]


 68%|██████▊   | 6831/10000 [00:28<00:16, 197.59it/s, logits=[tensor([0.0165]), tensor([0.0165])]]

Logits for hard examples=[tensor([0.0165]), tensor([0.0165])]


 69%|██████▉   | 6926/10000 [00:29<00:18, 169.08it/s, logits=[tensor([-0.0084]), tensor([-0.0084])]]

Logits for hard examples=[tensor([-0.0084]), tensor([-0.0084])]


 70%|███████   | 7027/10000 [00:29<00:16, 177.01it/s, logits=[tensor([0.0043]), tensor([0.0043])]]

Logits for hard examples=[tensor([0.0043]), tensor([0.0043])]


 71%|███████▏  | 7131/10000 [00:30<00:11, 241.68it/s, logits=[tensor([0.0004]), tensor([0.0004])]]

Logits for hard examples=[tensor([0.0004]), tensor([0.0004])]


 72%|███████▏  | 7241/10000 [00:30<00:10, 257.51it/s, logits=[tensor([0.0009]), tensor([0.0009])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0009])]


 74%|███████▎  | 7354/10000 [00:31<00:09, 273.25it/s, logits=[tensor([0.0011]), tensor([0.0012])]]

Logits for hard examples=[tensor([0.0011]), tensor([0.0012])]


 74%|███████▍  | 7442/10000 [00:31<00:09, 279.04it/s, logits=[tensor([0.0009]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0010])]


 75%|███████▌  | 7530/10000 [00:31<00:09, 270.70it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 76%|███████▋  | 7644/10000 [00:32<00:08, 278.16it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 77%|███████▋  | 7729/10000 [00:32<00:08, 274.35it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 78%|███████▊  | 7850/10000 [00:33<00:07, 274.36it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 79%|███████▉  | 7937/10000 [00:33<00:07, 275.53it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 80%|████████  | 8048/10000 [00:33<00:07, 259.17it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 81%|████████▏ | 8129/10000 [00:34<00:07, 255.88it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 82%|████████▏ | 8248/10000 [00:34<00:06, 281.43it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 83%|████████▎ | 8332/10000 [00:34<00:06, 262.86it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 85%|████████▍ | 8454/10000 [00:35<00:05, 280.86it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 85%|████████▌ | 8541/10000 [00:35<00:05, 280.63it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 86%|████████▋ | 8628/10000 [00:35<00:05, 261.69it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 87%|████████▋ | 8742/10000 [00:36<00:04, 263.53it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 88%|████████▊ | 8831/10000 [00:36<00:04, 279.55it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 89%|████████▉ | 8949/10000 [00:37<00:03, 286.11it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 90%|█████████ | 9040/10000 [00:37<00:03, 279.94it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 91%|█████████▏| 9128/10000 [00:37<00:03, 279.74it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 92%|█████████▏| 9246/10000 [00:38<00:02, 280.63it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 93%|█████████▎| 9333/10000 [00:38<00:02, 281.17it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 94%|█████████▍| 9421/10000 [00:38<00:02, 258.32it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 95%|█████████▌| 9539/10000 [00:39<00:01, 280.08it/s, logits=[tensor([0.0010]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0010])]


 96%|█████████▋| 9629/10000 [00:39<00:01, 276.08it/s, logits=[tensor([0.0009]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0010])]


 97%|█████████▋| 9748/10000 [00:39<00:00, 273.72it/s, logits=[tensor([0.0009]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0010])]


 98%|█████████▊| 9829/10000 [00:40<00:00, 224.34it/s, logits=[tensor([0.0009]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0010])]


 99%|█████████▉| 9918/10000 [00:40<00:00, 190.14it/s, logits=[tensor([0.0009]), tensor([0.0010])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0010])]


100%|██████████| 10000/10000 [00:41<00:00, 242.57it/s, logits=[tensor([0.0009]), tensor([0.0010])]]


SEQUENCE_LEN=9


  0%|          | 21/10000 [00:00<00:48, 203.85it/s, logits=[tensor([-0.0317]), tensor([-0.0316])]]

Logits for hard examples=[tensor([-0.0317]), tensor([-0.0316])]


  1%|▏         | 126/10000 [00:00<00:52, 188.83it/s, logits=[tensor([0.0194]), tensor([0.0196])]]

Logits for hard examples=[tensor([0.0194]), tensor([0.0196])]


  2%|▏         | 237/10000 [00:01<00:49, 198.41it/s, logits=[tensor([0.0076]), tensor([0.0079])]]

Logits for hard examples=[tensor([0.0076]), tensor([0.0079])]


  3%|▎         | 321/10000 [00:01<00:50, 193.32it/s, logits=[tensor([-0.0012]), tensor([-0.0009])]]

Logits for hard examples=[tensor([-0.0012]), tensor([-0.0009])]


  4%|▍         | 430/10000 [00:02<00:45, 209.45it/s, logits=[tensor([-1.4085e-05]), tensor([0.0005])]]

Logits for hard examples=[tensor([-1.4085e-05]), tensor([0.0005])]


  5%|▌         | 538/10000 [00:02<00:48, 193.20it/s, logits=[tensor([0.0008]), tensor([0.0016])]]

Logits for hard examples=[tensor([0.0008]), tensor([0.0016])]


  6%|▋         | 631/10000 [00:03<00:36, 259.92it/s, logits=[tensor([-0.0001]), tensor([0.0019])]]

Logits for hard examples=[tensor([-0.0001]), tensor([0.0019])]


  8%|▊         | 752/10000 [00:03<00:33, 275.06it/s, logits=[tensor([-0.0473]), tensor([0.0110])]]

Logits for hard examples=[tensor([-0.0473]), tensor([0.0110])]


  8%|▊         | 839/10000 [00:03<00:32, 277.79it/s, logits=[tensor([0.0969]), tensor([0.0969])]]

Logits for hard examples=[tensor([0.0969]), tensor([0.0969])]


  9%|▉         | 931/10000 [00:04<00:31, 292.04it/s, logits=[tensor([-0.0657]), tensor([-0.0657])]]

Logits for hard examples=[tensor([-0.0657]), tensor([-0.0657])]


 10%|█         | 1049/10000 [00:04<00:32, 273.99it/s, logits=[tensor([0.0494]), tensor([0.0493])]]

Logits for hard examples=[tensor([0.0494]), tensor([0.0493])]


 11%|█▏        | 1130/10000 [00:04<00:35, 252.03it/s, logits=[tensor([-0.0034]), tensor([-0.0034])]]

Logits for hard examples=[tensor([-0.0034]), tensor([-0.0034])]


 13%|█▎        | 1251/10000 [00:05<00:30, 288.64it/s, logits=[tensor([-0.0027]), tensor([-0.0028])]]

Logits for hard examples=[tensor([-0.0027]), tensor([-0.0028])]


 13%|█▎        | 1339/10000 [00:05<00:32, 264.47it/s, logits=[tensor([0.0040]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0039])]


 15%|█▍        | 1460/10000 [00:06<00:29, 294.17it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 16%|█▌        | 1551/10000 [00:06<00:29, 288.64it/s, logits=[tensor([0.0021]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0021])]


 16%|█▋        | 1639/10000 [00:06<00:31, 263.40it/s, logits=[tensor([0.0023]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0023]), tensor([0.0022])]


 17%|█▋        | 1732/10000 [00:06<00:28, 288.52it/s, logits=[tensor([0.0023]), tensor([0.0023])]]

Logits for hard examples=[tensor([0.0023]), tensor([0.0023])]


 19%|█▊        | 1855/10000 [00:07<00:27, 294.54it/s, logits=[tensor([0.0023]), tensor([0.0023])]]

Logits for hard examples=[tensor([0.0023]), tensor([0.0023])]


 19%|█▉        | 1945/10000 [00:07<00:29, 275.93it/s, logits=[tensor([0.0023]), tensor([0.0023])]]

Logits for hard examples=[tensor([0.0023]), tensor([0.0023])]


 20%|██        | 2035/10000 [00:08<00:27, 290.40it/s, logits=[tensor([0.0022]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0022])]


 22%|██▏       | 2157/10000 [00:08<00:26, 291.49it/s, logits=[tensor([0.0022]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0022])]


 22%|██▏       | 2245/10000 [00:08<00:28, 275.10it/s, logits=[tensor([0.0022]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0022])]


 23%|██▎       | 2334/10000 [00:09<00:27, 280.60it/s, logits=[tensor([0.0022]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0021])]


 25%|██▍       | 2453/10000 [00:09<00:27, 273.54it/s, logits=[tensor([0.0021]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0021])]


 25%|██▌       | 2542/10000 [00:09<00:26, 282.42it/s, logits=[tensor([0.0021]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0021])]


 26%|██▋       | 2631/10000 [00:10<00:25, 285.87it/s, logits=[tensor([0.0021]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0021])]


 28%|██▊       | 2750/10000 [00:10<00:26, 271.84it/s, logits=[tensor([0.0020]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0021])]


 28%|██▊       | 2842/10000 [00:10<00:24, 290.33it/s, logits=[tensor([0.0020]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0020])]


 29%|██▉       | 2933/10000 [00:11<00:24, 292.29it/s, logits=[tensor([0.0020]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0020])]


 30%|███       | 3024/10000 [00:11<00:25, 269.73it/s, logits=[tensor([0.0021]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0021]), tensor([0.0021])]


 31%|███▏      | 3142/10000 [00:12<00:24, 285.33it/s, logits=[tensor([0.0020]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0020]), tensor([0.0020])]


 32%|███▏      | 3230/10000 [00:12<00:23, 285.89it/s, logits=[tensor([0.0019]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0019])]


 33%|███▎      | 3341/10000 [00:12<00:27, 240.03it/s, logits=[tensor([0.0019]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0019])]


 34%|███▍      | 3437/10000 [00:13<00:30, 217.70it/s, logits=[tensor([0.0019]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0019]), tensor([0.0019])]


 35%|███▌      | 3526/10000 [00:13<00:30, 209.54it/s, logits=[tensor([0.0018]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0018]), tensor([0.0019])]


 36%|███▋      | 3643/10000 [00:14<00:28, 219.45it/s, logits=[tensor([0.0018]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0018]), tensor([0.0019])]


 37%|███▋      | 3731/10000 [00:14<00:32, 194.91it/s, logits=[tensor([0.0018]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0018]), tensor([0.0019])]


 38%|███▊      | 3837/10000 [00:15<00:30, 202.34it/s, logits=[tensor([0.0017]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0019])]


 39%|███▉      | 3919/10000 [00:15<00:32, 187.85it/s, logits=[tensor([0.0017]), tensor([0.0018])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0018])]


 40%|████      | 4024/10000 [00:16<00:30, 194.17it/s, logits=[tensor([0.0016]), tensor([0.0018])]]

Logits for hard examples=[tensor([0.0016]), tensor([0.0018])]


 42%|████▏     | 4154/10000 [00:16<00:22, 256.88it/s, logits=[tensor([0.0018]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0018]), tensor([0.0021])]


 42%|████▏     | 4240/10000 [00:17<00:20, 274.86it/s, logits=[tensor([0.0016]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0016]), tensor([0.0020])]


 44%|████▎     | 4358/10000 [00:17<00:19, 284.84it/s, logits=[tensor([0.0014]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0014]), tensor([0.0020])]


 44%|████▍     | 4449/10000 [00:17<00:19, 285.71it/s, logits=[tensor([0.0010]), tensor([0.0020])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0020])]


 45%|████▌     | 4536/10000 [00:18<00:19, 275.00it/s, logits=[tensor([0.0002]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0002]), tensor([0.0021])]


 47%|████▋     | 4656/10000 [00:18<00:18, 284.36it/s, logits=[tensor([0.0009]), tensor([0.0083])]]

Logits for hard examples=[tensor([0.0009]), tensor([0.0083])]


 47%|████▋     | 4746/10000 [00:18<00:18, 283.58it/s, logits=[tensor([-0.3191]), tensor([-0.3193])]]

Logits for hard examples=[tensor([-0.3191]), tensor([-0.3193])]


 48%|████▊     | 4834/10000 [00:19<00:18, 278.54it/s, logits=[tensor([-0.1148]), tensor([-0.1148])]]

Logits for hard examples=[tensor([-0.1148]), tensor([-0.1148])]


 49%|████▉     | 4949/10000 [00:19<00:19, 261.69it/s, logits=[tensor([-0.0195]), tensor([-0.0196])]]

Logits for hard examples=[tensor([-0.0195]), tensor([-0.0196])]


 50%|█████     | 5039/10000 [00:19<00:17, 279.48it/s, logits=[tensor([0.0022]), tensor([0.0022])]]

Logits for hard examples=[tensor([0.0022]), tensor([0.0022])]


 51%|█████▏    | 5130/10000 [00:20<00:17, 275.47it/s, logits=[tensor([0.0049]), tensor([0.0048])]]

Logits for hard examples=[tensor([0.0049]), tensor([0.0048])]


 52%|█████▏    | 5248/10000 [00:20<00:17, 272.15it/s, logits=[tensor([0.0046]), tensor([0.0045])]]

Logits for hard examples=[tensor([0.0046]), tensor([0.0045])]


 53%|█████▎    | 5337/10000 [00:20<00:16, 288.01it/s, logits=[tensor([0.0042]), tensor([0.0042])]]

Logits for hard examples=[tensor([0.0042]), tensor([0.0042])]


 54%|█████▍    | 5429/10000 [00:21<00:16, 281.69it/s, logits=[tensor([0.0041]), tensor([0.0041])]]

Logits for hard examples=[tensor([0.0041]), tensor([0.0041])]


 55%|█████▌    | 5546/10000 [00:21<00:16, 271.19it/s, logits=[tensor([0.0041]), tensor([0.0040])]]

Logits for hard examples=[tensor([0.0041]), tensor([0.0040])]


 56%|█████▋    | 5641/10000 [00:22<00:14, 299.89it/s, logits=[tensor([0.0041]), tensor([0.0040])]]

Logits for hard examples=[tensor([0.0041]), tensor([0.0040])]


 57%|█████▋    | 5733/10000 [00:22<00:14, 286.41it/s, logits=[tensor([0.0040]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0039])]


 59%|█████▊    | 5851/10000 [00:22<00:14, 278.47it/s, logits=[tensor([0.0039]), tensor([0.0039])]]

Logits for hard examples=[tensor([0.0039]), tensor([0.0039])]


 59%|█████▉    | 5940/10000 [00:23<00:14, 275.86it/s, logits=[tensor([0.0038]), tensor([0.0038])]]

Logits for hard examples=[tensor([0.0038]), tensor([0.0038])]


 61%|██████    | 6052/10000 [00:23<00:14, 266.80it/s, logits=[tensor([0.0037]), tensor([0.0037])]]

Logits for hard examples=[tensor([0.0037]), tensor([0.0037])]


 61%|██████▏   | 6139/10000 [00:23<00:13, 283.15it/s, logits=[tensor([0.0036]), tensor([0.0036])]]

Logits for hard examples=[tensor([0.0036]), tensor([0.0036])]


 62%|██████▏   | 6231/10000 [00:24<00:13, 286.49it/s, logits=[tensor([0.0035]), tensor([0.0035])]]

Logits for hard examples=[tensor([0.0035]), tensor([0.0035])]


 63%|██████▎   | 6345/10000 [00:24<00:14, 251.67it/s, logits=[tensor([0.0035]), tensor([0.0034])]]

Logits for hard examples=[tensor([0.0035]), tensor([0.0034])]


 64%|██████▍   | 6435/10000 [00:24<00:12, 281.74it/s, logits=[tensor([0.0034]), tensor([0.0034])]]

Logits for hard examples=[tensor([0.0034]), tensor([0.0034])]


 66%|██████▌   | 6556/10000 [00:25<00:11, 288.58it/s, logits=[tensor([0.0033]), tensor([0.0033])]]

Logits for hard examples=[tensor([0.0033]), tensor([0.0033])]


 66%|██████▋   | 6646/10000 [00:25<00:11, 281.21it/s, logits=[tensor([0.0033]), tensor([0.0033])]]

Logits for hard examples=[tensor([0.0033]), tensor([0.0033])]


 67%|██████▋   | 6736/10000 [00:26<00:11, 290.70it/s, logits=[tensor([0.0033]), tensor([0.0032])]]

Logits for hard examples=[tensor([0.0033]), tensor([0.0032])]


 68%|██████▊   | 6827/10000 [00:26<00:11, 286.60it/s, logits=[tensor([0.0032]), tensor([0.0032])]]

Logits for hard examples=[tensor([0.0032]), tensor([0.0032])]


 69%|██████▉   | 6933/10000 [00:26<00:13, 230.76it/s, logits=[tensor([0.0032]), tensor([0.0032])]]

Logits for hard examples=[tensor([0.0032]), tensor([0.0032])]


 70%|███████   | 7027/10000 [00:27<00:13, 221.33it/s, logits=[tensor([0.0031]), tensor([0.0031])]]

Logits for hard examples=[tensor([0.0031]), tensor([0.0031])]


 71%|███████▏  | 7137/10000 [00:27<00:14, 196.76it/s, logits=[tensor([0.0031]), tensor([0.0031])]]

Logits for hard examples=[tensor([0.0031]), tensor([0.0031])]


 72%|███████▏  | 7229/10000 [00:28<00:12, 215.16it/s, logits=[tensor([0.0031]), tensor([0.0031])]]

Logits for hard examples=[tensor([0.0031]), tensor([0.0031])]


 73%|███████▎  | 7321/10000 [00:28<00:13, 196.45it/s, logits=[tensor([0.0031]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0031]), tensor([0.0030])]


 74%|███████▍  | 7425/10000 [00:29<00:13, 190.94it/s, logits=[tensor([0.0030]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0030])]


 75%|███████▌  | 7527/10000 [00:29<00:12, 190.81it/s, logits=[tensor([0.0030]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0030])]


 76%|███████▋  | 7641/10000 [00:30<00:10, 226.72it/s, logits=[tensor([0.0030]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0030])]


 77%|███████▋  | 7744/10000 [00:30<00:09, 241.09it/s, logits=[tensor([0.0030]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0030])]


 78%|███████▊  | 7837/10000 [00:31<00:07, 284.04it/s, logits=[tensor([0.0030]), tensor([0.0030])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0030])]


 80%|███████▉  | 7959/10000 [00:31<00:07, 289.86it/s, logits=[tensor([0.0030]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0030]), tensor([0.0029])]


 80%|████████  | 8045/10000 [00:31<00:07, 271.40it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 81%|████████▏ | 8133/10000 [00:32<00:06, 284.29it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 83%|████████▎ | 8252/10000 [00:32<00:06, 286.80it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 83%|████████▎ | 8340/10000 [00:32<00:06, 274.88it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 84%|████████▍ | 8430/10000 [00:33<00:05, 281.15it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 85%|████████▌ | 8547/10000 [00:33<00:05, 264.01it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 86%|████████▋ | 8631/10000 [00:33<00:04, 274.20it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 88%|████████▊ | 8750/10000 [00:34<00:04, 289.07it/s, logits=[tensor([0.0029]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0029]), tensor([0.0029])]


 88%|████████▊ | 8836/10000 [00:34<00:04, 270.48it/s, logits=[tensor([0.0028]), tensor([0.0029])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0029])]


 90%|████████▉ | 8952/10000 [00:35<00:03, 281.84it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 90%|█████████ | 9040/10000 [00:35<00:03, 282.27it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 92%|█████████▏| 9156/10000 [00:35<00:03, 277.45it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 92%|█████████▏| 9244/10000 [00:36<00:02, 282.04it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 93%|█████████▎| 9331/10000 [00:36<00:02, 280.40it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 94%|█████████▍| 9448/10000 [00:36<00:01, 284.58it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 95%|█████████▌| 9538/10000 [00:37<00:01, 286.60it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 96%|█████████▋| 9625/10000 [00:37<00:01, 272.16it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 97%|█████████▋| 9740/10000 [00:37<00:00, 274.72it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 99%|█████████▊| 9856/10000 [00:38<00:00, 279.77it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


 99%|█████████▉| 9943/10000 [00:38<00:00, 270.03it/s, logits=[tensor([0.0028]), tensor([0.0028])]]

Logits for hard examples=[tensor([0.0028]), tensor([0.0028])]


100%|██████████| 10000/10000 [00:38<00:00, 257.25it/s, logits=[tensor([0.0028]), tensor([0.0028])]]


SEQUENCE_LEN=10


  0%|          | 19/10000 [00:00<00:53, 185.37it/s, logits=[tensor([0.1900]), tensor([0.1901])]]

Logits for hard examples=[tensor([0.1900]), tensor([0.1901])]


  1%|▏         | 139/10000 [00:00<00:34, 287.37it/s, logits=[tensor([0.0430]), tensor([0.0434])]]

Logits for hard examples=[tensor([0.0430]), tensor([0.0434])]


  3%|▎         | 257/10000 [00:00<00:34, 280.41it/s, logits=[tensor([-0.0185]), tensor([-0.0175])]]

Logits for hard examples=[tensor([-0.0185]), tensor([-0.0175])]


  3%|▎         | 348/10000 [00:01<00:33, 284.58it/s, logits=[tensor([-0.0106]), tensor([-0.0035])]]

Logits for hard examples=[tensor([-0.0106]), tensor([-0.0035])]


  4%|▍         | 429/10000 [00:01<00:40, 234.49it/s, logits=[tensor([-7.4499]), tensor([6.2641])]]

Logits for hard examples=[tensor([-7.4499]), tensor([6.2641])]


  5%|▌         | 545/10000 [00:02<00:43, 219.83it/s, logits=[tensor([-12.8321]), tensor([11.4779])]]

Logits for hard examples=[tensor([-12.8321]), tensor([11.4779])]


  6%|▋         | 637/10000 [00:02<00:43, 213.27it/s, logits=[tensor([-13.5144]), tensor([12.1806])]]

Logits for hard examples=[tensor([-13.5144]), tensor([12.1806])]


  7%|▋         | 730/10000 [00:03<00:42, 220.66it/s, logits=[tensor([-13.6101]), tensor([12.2772])]]

Logits for hard examples=[tensor([-13.6101]), tensor([12.2772])]


  8%|▊         | 824/10000 [00:03<00:43, 209.87it/s, logits=[tensor([-13.6276]), tensor([12.2924])]]

Logits for hard examples=[tensor([-13.6276]), tensor([12.2924])]


  9%|▉         | 927/10000 [00:04<00:48, 187.34it/s, logits=[tensor([-13.6345]), tensor([12.2967])]]

Logits for hard examples=[tensor([-13.6345]), tensor([12.2967])]


 10%|█         | 1023/10000 [00:04<00:51, 174.01it/s, logits=[tensor([-13.6401]), tensor([12.2994])]]

Logits for hard examples=[tensor([-13.6401]), tensor([12.2994])]


 11%|█▏        | 1130/10000 [00:05<00:41, 214.72it/s, logits=[tensor([-13.6455]), tensor([12.3021])]]

Logits for hard examples=[tensor([-13.6455]), tensor([12.3021])]


 12%|█▏        | 1238/10000 [00:05<00:34, 253.94it/s, logits=[tensor([-13.6508]), tensor([12.3047])]]

Logits for hard examples=[tensor([-13.6508]), tensor([12.3047])]


 13%|█▎        | 1343/10000 [00:06<00:34, 251.87it/s, logits=[tensor([-13.6561]), tensor([12.3073])]]

Logits for hard examples=[tensor([-13.6561]), tensor([12.3073])]


 14%|█▍        | 1449/10000 [00:06<00:34, 250.22it/s, logits=[tensor([-13.6613]), tensor([12.3099])]]

Logits for hard examples=[tensor([-13.6613]), tensor([12.3099])]


 15%|█▌        | 1529/10000 [00:06<00:33, 250.38it/s, logits=[tensor([-13.6667]), tensor([12.3125])]]

Logits for hard examples=[tensor([-13.6667]), tensor([12.3125])]


 16%|█▋        | 1645/10000 [00:07<00:30, 272.53it/s, logits=[tensor([-13.6720]), tensor([12.3151])]]

Logits for hard examples=[tensor([-13.6720]), tensor([12.3151])]


 17%|█▋        | 1730/10000 [00:07<00:31, 262.56it/s, logits=[tensor([-13.6773]), tensor([12.3177])]]

Logits for hard examples=[tensor([-13.6773]), tensor([12.3177])]


 18%|█▊        | 1842/10000 [00:07<00:30, 269.55it/s, logits=[tensor([-13.6827]), tensor([12.3202])]]

Logits for hard examples=[tensor([-13.6827]), tensor([12.3202])]


 19%|█▉        | 1932/10000 [00:08<00:28, 287.23it/s, logits=[tensor([-13.6880]), tensor([12.3228])]]

Logits for hard examples=[tensor([-13.6880]), tensor([12.3228])]


 20%|██        | 2047/10000 [00:08<00:30, 258.88it/s, logits=[tensor([-13.6934]), tensor([12.3254])]]

Logits for hard examples=[tensor([-13.6934]), tensor([12.3254])]


 21%|██▏       | 2131/10000 [00:09<00:29, 270.08it/s, logits=[tensor([-13.6988]), tensor([12.3279])]]

Logits for hard examples=[tensor([-13.6988]), tensor([12.3279])]


 23%|██▎       | 2253/10000 [00:09<00:27, 283.48it/s, logits=[tensor([-13.7042]), tensor([12.3305])]]

Logits for hard examples=[tensor([-13.7042]), tensor([12.3305])]


 23%|██▎       | 2339/10000 [00:09<00:28, 270.67it/s, logits=[tensor([-13.7097]), tensor([12.3330])]]

Logits for hard examples=[tensor([-13.7097]), tensor([12.3330])]


 25%|██▍       | 2458/10000 [00:10<00:26, 282.99it/s, logits=[tensor([-13.7151]), tensor([12.3356])]]

Logits for hard examples=[tensor([-13.7151]), tensor([12.3356])]


 25%|██▌       | 2546/10000 [00:10<00:26, 284.71it/s, logits=[tensor([-13.7205]), tensor([12.3381])]]

Logits for hard examples=[tensor([-13.7205]), tensor([12.3381])]


 26%|██▋       | 2632/10000 [00:10<00:28, 260.45it/s, logits=[tensor([-13.7260]), tensor([12.3407])]]

Logits for hard examples=[tensor([-13.7260]), tensor([12.3407])]


 28%|██▊       | 2754/10000 [00:11<00:24, 289.90it/s, logits=[tensor([-13.7315]), tensor([12.3433])]]

Logits for hard examples=[tensor([-13.7315]), tensor([12.3433])]


 28%|██▊       | 2846/10000 [00:11<00:24, 294.50it/s, logits=[tensor([-13.7369]), tensor([12.3458])]]

Logits for hard examples=[tensor([-13.7369]), tensor([12.3458])]


 29%|██▉       | 2934/10000 [00:11<00:26, 268.73it/s, logits=[tensor([-13.7424]), tensor([12.3483])]]

Logits for hard examples=[tensor([-13.7424]), tensor([12.3483])]


 31%|███       | 3055/10000 [00:12<00:24, 286.53it/s, logits=[tensor([-13.7479]), tensor([12.3507])]]

Logits for hard examples=[tensor([-13.7479]), tensor([12.3507])]


 31%|███▏      | 3141/10000 [00:12<00:27, 252.49it/s, logits=[tensor([-13.7535]), tensor([12.3531])]]

Logits for hard examples=[tensor([-13.7535]), tensor([12.3531])]


 33%|███▎      | 3257/10000 [00:13<00:23, 281.91it/s, logits=[tensor([-13.7590]), tensor([12.3556])]]

Logits for hard examples=[tensor([-13.7590]), tensor([12.3556])]


 33%|███▎      | 3347/10000 [00:13<00:23, 284.25it/s, logits=[tensor([-13.7645]), tensor([12.3580])]]

Logits for hard examples=[tensor([-13.7645]), tensor([12.3580])]


 34%|███▍      | 3431/10000 [00:13<00:26, 248.93it/s, logits=[tensor([-13.7700]), tensor([12.3604])]]

Logits for hard examples=[tensor([-13.7700]), tensor([12.3604])]


 35%|███▌      | 3547/10000 [00:14<00:23, 277.16it/s, logits=[tensor([-13.7755]), tensor([12.3629])]]

Logits for hard examples=[tensor([-13.7755]), tensor([12.3629])]


 36%|███▋      | 3635/10000 [00:14<00:22, 283.39it/s, logits=[tensor([-13.7811]), tensor([12.3653])]]

Logits for hard examples=[tensor([-13.7811]), tensor([12.3653])]


 38%|███▊      | 3753/10000 [00:14<00:22, 277.76it/s, logits=[tensor([-13.7866]), tensor([12.3678])]]

Logits for hard examples=[tensor([-13.7866]), tensor([12.3678])]


 38%|███▊      | 3835/10000 [00:15<00:26, 233.56it/s, logits=[tensor([-13.7922]), tensor([12.3702])]]

Logits for hard examples=[tensor([-13.7922]), tensor([12.3702])]


 39%|███▉      | 3924/10000 [00:15<00:30, 197.07it/s, logits=[tensor([-13.7977]), tensor([12.3726])]]

Logits for hard examples=[tensor([-13.7977]), tensor([12.3726])]


 40%|████      | 4038/10000 [00:16<00:27, 217.08it/s, logits=[tensor([-13.8033]), tensor([12.3750])]]

Logits for hard examples=[tensor([-13.8033]), tensor([12.3750])]


 41%|████▏     | 4126/10000 [00:16<00:30, 194.90it/s, logits=[tensor([-13.8089]), tensor([12.3773])]]

Logits for hard examples=[tensor([-13.8089]), tensor([12.3773])]


 42%|████▏     | 4230/10000 [00:17<00:29, 197.95it/s, logits=[tensor([-13.8145]), tensor([12.3797])]]

Logits for hard examples=[tensor([-13.8145]), tensor([12.3797])]


 43%|████▎     | 4334/10000 [00:17<00:31, 181.62it/s, logits=[tensor([-13.8201]), tensor([12.3820])]]

Logits for hard examples=[tensor([-13.8201]), tensor([12.3820])]


 44%|████▍     | 4436/10000 [00:18<00:28, 197.15it/s, logits=[tensor([-13.8257]), tensor([12.3843])]]

Logits for hard examples=[tensor([-13.8257]), tensor([12.3843])]


 45%|████▌     | 4539/10000 [00:18<00:28, 191.62it/s, logits=[tensor([-13.8313]), tensor([12.3866])]]

Logits for hard examples=[tensor([-13.8313]), tensor([12.3866])]


 47%|████▋     | 4652/10000 [00:19<00:20, 257.03it/s, logits=[tensor([-13.8369]), tensor([12.3890])]]

Logits for hard examples=[tensor([-13.8369]), tensor([12.3890])]


 47%|████▋     | 4733/10000 [00:19<00:20, 260.81it/s, logits=[tensor([-13.8425]), tensor([12.3913])]]

Logits for hard examples=[tensor([-13.8425]), tensor([12.3913])]


 48%|████▊     | 4849/10000 [00:20<00:18, 278.43it/s, logits=[tensor([-13.8481]), tensor([12.3937])]]

Logits for hard examples=[tensor([-13.8481]), tensor([12.3937])]


 49%|████▉     | 4932/10000 [00:20<00:18, 269.80it/s, logits=[tensor([-13.8537]), tensor([12.3960])]]

Logits for hard examples=[tensor([-13.8537]), tensor([12.3960])]


 51%|█████     | 5051/10000 [00:20<00:17, 278.08it/s, logits=[tensor([-13.8593]), tensor([12.3984])]]

Logits for hard examples=[tensor([-13.8593]), tensor([12.3984])]


 51%|█████▏    | 5142/10000 [00:21<00:16, 287.59it/s, logits=[tensor([-13.8650]), tensor([12.4007])]]

Logits for hard examples=[tensor([-13.8650]), tensor([12.4007])]


 52%|█████▏    | 5232/10000 [00:21<00:16, 283.15it/s, logits=[tensor([-13.8706]), tensor([12.4031])]]

Logits for hard examples=[tensor([-13.8706]), tensor([12.4031])]


 53%|█████▎    | 5348/10000 [00:21<00:16, 277.82it/s, logits=[tensor([-13.8762]), tensor([12.4054])]]

Logits for hard examples=[tensor([-13.8762]), tensor([12.4054])]


 54%|█████▍    | 5431/10000 [00:22<00:17, 262.21it/s, logits=[tensor([-13.8818]), tensor([12.4077])]]

Logits for hard examples=[tensor([-13.8818]), tensor([12.4077])]


 55%|█████▌    | 5545/10000 [00:22<00:16, 272.82it/s, logits=[tensor([-13.8875]), tensor([12.4100])]]

Logits for hard examples=[tensor([-13.8875]), tensor([12.4100])]


 56%|█████▋    | 5634/10000 [00:22<00:15, 282.94it/s, logits=[tensor([-13.8931]), tensor([12.4123])]]

Logits for hard examples=[tensor([-13.8931]), tensor([12.4123])]


 57%|█████▋    | 5728/10000 [00:23<00:15, 275.67it/s, logits=[tensor([-13.8987]), tensor([12.4146])]]

Logits for hard examples=[tensor([-13.8987]), tensor([12.4146])]


 58%|█████▊    | 5844/10000 [00:23<00:15, 268.35it/s, logits=[tensor([-13.9043]), tensor([12.4169])]]

Logits for hard examples=[tensor([-13.9043]), tensor([12.4169])]


 59%|█████▉    | 5931/10000 [00:24<00:14, 278.45it/s, logits=[tensor([-13.9099]), tensor([12.4192])]]

Logits for hard examples=[tensor([-13.9099]), tensor([12.4192])]


 61%|██████    | 6052/10000 [00:24<00:14, 279.81it/s, logits=[tensor([-13.9154]), tensor([12.4215])]]

Logits for hard examples=[tensor([-13.9154]), tensor([12.4215])]


 61%|██████▏   | 6140/10000 [00:24<00:14, 272.79it/s, logits=[tensor([-13.9210]), tensor([12.4238])]]

Logits for hard examples=[tensor([-13.9210]), tensor([12.4238])]


 62%|██████▏   | 6232/10000 [00:25<00:13, 288.06it/s, logits=[tensor([-13.9266]), tensor([12.4261])]]

Logits for hard examples=[tensor([-13.9266]), tensor([12.4261])]


 63%|██████▎   | 6349/10000 [00:25<00:13, 272.56it/s, logits=[tensor([-13.9321]), tensor([12.4285])]]

Logits for hard examples=[tensor([-13.9321]), tensor([12.4285])]


 64%|██████▍   | 6431/10000 [00:25<00:14, 254.59it/s, logits=[tensor([-13.9376]), tensor([12.4308])]]

Logits for hard examples=[tensor([-13.9376]), tensor([12.4308])]


 65%|██████▌   | 6548/10000 [00:26<00:12, 278.21it/s, logits=[tensor([-13.9431]), tensor([12.4330])]]

Logits for hard examples=[tensor([-13.9431]), tensor([12.4330])]


 66%|██████▋   | 6630/10000 [00:26<00:12, 265.10it/s, logits=[tensor([-13.9486]), tensor([12.4353])]]

Logits for hard examples=[tensor([-13.9486]), tensor([12.4353])]


 67%|██████▋   | 6747/10000 [00:27<00:11, 280.60it/s, logits=[tensor([-13.9541]), tensor([12.4375])]]

Logits for hard examples=[tensor([-13.9541]), tensor([12.4375])]


 68%|██████▊   | 6832/10000 [00:27<00:11, 275.28it/s, logits=[tensor([-13.9596]), tensor([12.4397])]]

Logits for hard examples=[tensor([-13.9596]), tensor([12.4397])]


 70%|██████▉   | 6952/10000 [00:27<00:11, 275.67it/s, logits=[tensor([-13.9651]), tensor([12.4420])]]

Logits for hard examples=[tensor([-13.9651]), tensor([12.4420])]


 70%|███████   | 7044/10000 [00:28<00:10, 288.19it/s, logits=[tensor([-13.9706]), tensor([12.4442])]]

Logits for hard examples=[tensor([-13.9706]), tensor([12.4442])]


 71%|███████▏  | 7131/10000 [00:28<00:10, 281.95it/s, logits=[tensor([-13.9761]), tensor([12.4465])]]

Logits for hard examples=[tensor([-13.9761]), tensor([12.4465])]


 72%|███████▏  | 7242/10000 [00:28<00:11, 248.13it/s, logits=[tensor([-13.9815]), tensor([12.4487])]]

Logits for hard examples=[tensor([-13.9815]), tensor([12.4487])]


 73%|███████▎  | 7337/10000 [00:29<00:12, 218.86it/s, logits=[tensor([-13.9870]), tensor([12.4509])]]

Logits for hard examples=[tensor([-13.9870]), tensor([12.4509])]


 74%|███████▍  | 7426/10000 [00:29<00:13, 194.41it/s, logits=[tensor([-13.9925]), tensor([12.4532])]]

Logits for hard examples=[tensor([-13.9925]), tensor([12.4532])]


 75%|███████▌  | 7540/10000 [00:30<00:11, 211.34it/s, logits=[tensor([-13.9979]), tensor([12.4554])]]

Logits for hard examples=[tensor([-13.9979]), tensor([12.4554])]


 76%|███████▋  | 7627/10000 [00:30<00:11, 205.67it/s, logits=[tensor([-14.0034]), tensor([12.4577])]]

Logits for hard examples=[tensor([-14.0034]), tensor([12.4577])]


 77%|███████▋  | 7734/10000 [00:31<00:11, 193.48it/s, logits=[tensor([-14.0088]), tensor([12.4599])]]

Logits for hard examples=[tensor([-14.0088]), tensor([12.4599])]


 78%|███████▊  | 7840/10000 [00:31<00:11, 193.99it/s, logits=[tensor([-14.0142]), tensor([12.4622])]]

Logits for hard examples=[tensor([-14.0142]), tensor([12.4622])]


 79%|███████▉  | 7926/10000 [00:32<00:10, 203.10it/s, logits=[tensor([-14.0197]), tensor([12.4643])]]

Logits for hard examples=[tensor([-14.0197]), tensor([12.4643])]


 80%|████████  | 8040/10000 [00:32<00:08, 221.91it/s, logits=[tensor([-14.0251]), tensor([12.4665])]]

Logits for hard examples=[tensor([-14.0251]), tensor([12.4665])]


 81%|████████▏ | 8130/10000 [00:33<00:07, 265.91it/s, logits=[tensor([-14.0305]), tensor([12.4687])]]

Logits for hard examples=[tensor([-14.0305]), tensor([12.4687])]


 82%|████████▏ | 8247/10000 [00:33<00:06, 276.17it/s, logits=[tensor([-14.0359]), tensor([12.4708])]]

Logits for hard examples=[tensor([-14.0359]), tensor([12.4708])]


 83%|████████▎ | 8329/10000 [00:33<00:06, 258.67it/s, logits=[tensor([-14.0413]), tensor([12.4730])]]

Logits for hard examples=[tensor([-14.0413]), tensor([12.4730])]


 85%|████████▍ | 8452/10000 [00:34<00:05, 283.19it/s, logits=[tensor([-14.0466]), tensor([12.4752])]]

Logits for hard examples=[tensor([-14.0466]), tensor([12.4752])]


 85%|████████▌ | 8537/10000 [00:34<00:05, 262.57it/s, logits=[tensor([-14.0520]), tensor([12.4774])]]

Logits for hard examples=[tensor([-14.0520]), tensor([12.4774])]


 87%|████████▋ | 8651/10000 [00:35<00:04, 270.90it/s, logits=[tensor([-14.0573]), tensor([12.4795])]]

Logits for hard examples=[tensor([-14.0573]), tensor([12.4795])]


 87%|████████▋ | 8737/10000 [00:35<00:04, 278.30it/s, logits=[tensor([-14.0626]), tensor([12.4817])]]

Logits for hard examples=[tensor([-14.0626]), tensor([12.4817])]


 88%|████████▊ | 8847/10000 [00:35<00:04, 254.07it/s, logits=[tensor([-14.0678]), tensor([12.4839])]]

Logits for hard examples=[tensor([-14.0678]), tensor([12.4839])]


 89%|████████▉ | 8933/10000 [00:36<00:03, 271.69it/s, logits=[tensor([-14.0731]), tensor([12.4861])]]

Logits for hard examples=[tensor([-14.0731]), tensor([12.4861])]


 91%|█████████ | 9052/10000 [00:36<00:03, 278.72it/s, logits=[tensor([-14.0783]), tensor([12.4883])]]

Logits for hard examples=[tensor([-14.0783]), tensor([12.4883])]


 91%|█████████▏| 9135/10000 [00:36<00:03, 257.80it/s, logits=[tensor([-14.0836]), tensor([12.4904])]]

Logits for hard examples=[tensor([-14.0836]), tensor([12.4904])]


 93%|█████████▎| 9260/10000 [00:37<00:02, 293.69it/s, logits=[tensor([-14.0888]), tensor([12.4926])]]

Logits for hard examples=[tensor([-14.0888]), tensor([12.4926])]


 93%|█████████▎| 9349/10000 [00:37<00:02, 279.45it/s, logits=[tensor([-14.0939]), tensor([12.4948])]]

Logits for hard examples=[tensor([-14.0939]), tensor([12.4948])]


 94%|█████████▍| 9434/10000 [00:38<00:02, 270.94it/s, logits=[tensor([-14.0991]), tensor([12.4969])]]

Logits for hard examples=[tensor([-14.0991]), tensor([12.4969])]


 96%|█████████▌| 9555/10000 [00:38<00:01, 291.09it/s, logits=[tensor([-14.1043]), tensor([12.4991])]]

Logits for hard examples=[tensor([-14.1043]), tensor([12.4991])]


 96%|█████████▋| 9642/10000 [00:38<00:01, 267.04it/s, logits=[tensor([-14.1094]), tensor([12.5012])]]

Logits for hard examples=[tensor([-14.1094]), tensor([12.5012])]


 97%|█████████▋| 9728/10000 [00:39<00:00, 272.55it/s, logits=[tensor([-14.1145]), tensor([12.5034])]]

Logits for hard examples=[tensor([-14.1145]), tensor([12.5034])]


 98%|█████████▊| 9849/10000 [00:39<00:00, 275.92it/s, logits=[tensor([-14.1197]), tensor([12.5055])]]

Logits for hard examples=[tensor([-14.1197]), tensor([12.5055])]


 99%|█████████▉| 9936/10000 [00:39<00:00, 272.76it/s, logits=[tensor([-14.1248]), tensor([12.5077])]]

Logits for hard examples=[tensor([-14.1248]), tensor([12.5077])]


100%|██████████| 10000/10000 [00:40<00:00, 249.58it/s, logits=[tensor([-14.1248]), tensor([12.5077])]]


SEQUENCE_LEN=11


  0%|          | 20/10000 [00:00<00:50, 198.92it/s, logits=[tensor([0.1327]), tensor([0.1327])]]

Logits for hard examples=[tensor([0.1327]), tensor([0.1327])]


  1%|▏         | 134/10000 [00:00<00:36, 270.81it/s, logits=[tensor([0.0268]), tensor([0.0268])]]

Logits for hard examples=[tensor([0.0268]), tensor([0.0268])]


  2%|▏         | 246/10000 [00:00<00:36, 265.31it/s, logits=[tensor([-0.0141]), tensor([-0.0141])]]

Logits for hard examples=[tensor([-0.0141]), tensor([-0.0141])]


  3%|▎         | 329/10000 [00:01<00:36, 268.03it/s, logits=[tensor([-0.0002]), tensor([-0.0001])]]

Logits for hard examples=[tensor([-0.0002]), tensor([-0.0001])]


  4%|▍         | 441/10000 [00:01<00:36, 262.13it/s, logits=[tensor([0.0039]), tensor([0.0040])]]

Logits for hard examples=[tensor([0.0039]), tensor([0.0040])]


  5%|▌         | 529/10000 [00:01<00:34, 271.67it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


  6%|▋         | 649/10000 [00:02<00:32, 284.96it/s, logits=[tensor([0.0013]), tensor([0.0014])]]

Logits for hard examples=[tensor([0.0013]), tensor([0.0014])]


  7%|▋         | 734/10000 [00:02<00:37, 247.21it/s, logits=[tensor([0.0017]), tensor([0.0018])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0018])]


  8%|▊         | 831/10000 [00:03<00:41, 220.00it/s, logits=[tensor([0.0016]), tensor([0.0018])]]

Logits for hard examples=[tensor([0.0016]), tensor([0.0018])]


  9%|▉         | 923/10000 [00:03<00:45, 197.90it/s, logits=[tensor([0.0017]), tensor([0.0021])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0021])]


 10%|█         | 1039/10000 [00:04<00:42, 212.47it/s, logits=[tensor([0.0023]), tensor([0.0036])]]

Logits for hard examples=[tensor([0.0023]), tensor([0.0036])]


 11%|█         | 1124/10000 [00:04<00:46, 190.03it/s, logits=[tensor([-0.1810]), tensor([0.3612])]]

Logits for hard examples=[tensor([-0.1810]), tensor([0.3612])]


 12%|█▏        | 1226/10000 [00:05<00:46, 188.59it/s, logits=[tensor([-0.1093]), tensor([-0.1093])]]

Logits for hard examples=[tensor([-0.1093]), tensor([-0.1093])]


 13%|█▎        | 1324/10000 [00:05<00:48, 178.39it/s, logits=[tensor([0.0132]), tensor([0.0133])]]

Logits for hard examples=[tensor([0.0132]), tensor([0.0133])]


 14%|█▍        | 1427/10000 [00:06<00:45, 189.87it/s, logits=[tensor([0.0158]), tensor([0.0158])]]

Logits for hard examples=[tensor([0.0158]), tensor([0.0158])]


 16%|█▌        | 1550/10000 [00:06<00:34, 241.72it/s, logits=[tensor([0.0078]), tensor([0.0079])]]

Logits for hard examples=[tensor([0.0078]), tensor([0.0079])]


 16%|█▋        | 1637/10000 [00:07<00:30, 272.58it/s, logits=[tensor([0.0040]), tensor([0.0041])]]

Logits for hard examples=[tensor([0.0040]), tensor([0.0041])]


 17%|█▋        | 1749/10000 [00:07<00:30, 269.98it/s, logits=[tensor([0.0024]), tensor([0.0026])]]

Logits for hard examples=[tensor([0.0024]), tensor([0.0026])]


 18%|█▊        | 1834/10000 [00:07<00:30, 266.77it/s, logits=[tensor([0.0017]), tensor([0.0019])]]

Logits for hard examples=[tensor([0.0017]), tensor([0.0019])]


 19%|█▉        | 1913/10000 [00:08<00:35, 228.99it/s, logits=[tensor([0.0010]), tensor([0.0015])]]

Logits for hard examples=[tensor([0.0010]), tensor([0.0015])]


 20%|██        | 2022/10000 [00:08<00:40, 199.17it/s, logits=[tensor([0.0002]), tensor([0.0026])]]

Logits for hard examples=[tensor([0.0002]), tensor([0.0026])]


 21%|██▏       | 2135/10000 [00:09<00:36, 213.19it/s, logits=[tensor([-0.2315]), tensor([-0.2313])]]

Logits for hard examples=[tensor([-0.2315]), tensor([-0.2313])]


 22%|██▏       | 2224/10000 [00:09<00:38, 203.53it/s, logits=[tensor([0.0215]), tensor([0.0217])]]

Logits for hard examples=[tensor([0.0215]), tensor([0.0217])]


 23%|██▎       | 2340/10000 [00:10<00:35, 214.56it/s, logits=[tensor([0.0396]), tensor([0.0401])]]

Logits for hard examples=[tensor([0.0396]), tensor([0.0401])]


 24%|██▍       | 2427/10000 [00:10<00:40, 185.29it/s, logits=[tensor([-0.0012]), tensor([0.0033])]]

Logits for hard examples=[tensor([-0.0012]), tensor([0.0033])]


 25%|██▌       | 2528/10000 [00:11<00:40, 186.01it/s, logits=[tensor([-10.0638]), tensor([6.2710])]]

Logits for hard examples=[tensor([-10.0638]), tensor([6.2710])]


 26%|██▋       | 2629/10000 [00:11<00:39, 185.77it/s, logits=[tensor([-13.6785]), tensor([9.3779])]]

Logits for hard examples=[tensor([-13.6785]), tensor([9.3779])]


 27%|██▋       | 2731/10000 [00:12<00:30, 241.37it/s, logits=[tensor([-14.1478]), tensor([9.8709])]]

Logits for hard examples=[tensor([-14.1478]), tensor([9.8709])]


 28%|██▊       | 2839/10000 [00:12<00:29, 241.69it/s, logits=[tensor([-14.2051]), tensor([9.9751])]]

Logits for hard examples=[tensor([-14.2051]), tensor([9.9751])]


 30%|██▉       | 2955/10000 [00:13<00:25, 275.77it/s, logits=[tensor([-14.2077]), tensor([10.0222])]]

Logits for hard examples=[tensor([-14.2077]), tensor([10.0222])]


 30%|███       | 3045/10000 [00:13<00:24, 287.07it/s, logits=[tensor([-14.2034]), tensor([10.0593])]]

Logits for hard examples=[tensor([-14.2034]), tensor([10.0593])]


 31%|███▏      | 3129/10000 [00:13<00:26, 260.22it/s, logits=[tensor([-14.1983]), tensor([10.0931])]]

Logits for hard examples=[tensor([-14.1983]), tensor([10.0931])]


 32%|███▏      | 3245/10000 [00:14<00:24, 278.90it/s, logits=[tensor([-14.1932]), tensor([10.1249])]]

Logits for hard examples=[tensor([-14.1932]), tensor([10.1249])]


 33%|███▎      | 3330/10000 [00:14<00:24, 275.34it/s, logits=[tensor([-14.1884]), tensor([10.1550])]]

Logits for hard examples=[tensor([-14.1884]), tensor([10.1550])]


 34%|███▍      | 3446/10000 [00:14<00:23, 276.19it/s, logits=[tensor([-14.1837]), tensor([10.1836])]]

Logits for hard examples=[tensor([-14.1837]), tensor([10.1836])]


 35%|███▌      | 3535/10000 [00:15<00:22, 282.72it/s, logits=[tensor([-14.1793]), tensor([10.2109])]]

Logits for hard examples=[tensor([-14.1793]), tensor([10.2109])]


 37%|███▋      | 3652/10000 [00:15<00:23, 273.00it/s, logits=[tensor([-14.1749]), tensor([10.2369])]]

Logits for hard examples=[tensor([-14.1749]), tensor([10.2369])]


 37%|███▋      | 3740/10000 [00:16<00:22, 277.18it/s, logits=[tensor([-14.1707]), tensor([10.2619])]]

Logits for hard examples=[tensor([-14.1707]), tensor([10.2619])]


 39%|███▊      | 3859/10000 [00:16<00:21, 288.42it/s, logits=[tensor([-14.1666]), tensor([10.2858])]]

Logits for hard examples=[tensor([-14.1666]), tensor([10.2858])]


 39%|███▉      | 3916/10000 [00:16<00:26, 229.94it/s, logits=[tensor([-14.1626]), tensor([10.3089])]]

Logits for hard examples=[tensor([-14.1626]), tensor([10.3089])]


 40%|████      | 4032/10000 [00:17<00:29, 204.89it/s, logits=[tensor([-14.1588]), tensor([10.3311])]]

Logits for hard examples=[tensor([-14.1588]), tensor([10.3311])]


 41%|████▏     | 4147/10000 [00:17<00:26, 218.15it/s, logits=[tensor([-14.1550]), tensor([10.3525])]]

Logits for hard examples=[tensor([-14.1550]), tensor([10.3525])]


 42%|████▏     | 4241/10000 [00:18<00:25, 226.61it/s, logits=[tensor([-14.1514]), tensor([10.3732])]]

Logits for hard examples=[tensor([-14.1514]), tensor([10.3732])]


 43%|████▎     | 4328/10000 [00:18<00:29, 195.25it/s, logits=[tensor([-14.1478]), tensor([10.3932])]]

Logits for hard examples=[tensor([-14.1478]), tensor([10.3932])]


 44%|████▍     | 4428/10000 [00:19<00:30, 184.95it/s, logits=[tensor([-14.1444]), tensor([10.4126])]]

Logits for hard examples=[tensor([-14.1444]), tensor([10.4126])]


 45%|████▌     | 4531/10000 [00:19<00:28, 190.71it/s, logits=[tensor([-14.1410]), tensor([10.4315])]]

Logits for hard examples=[tensor([-14.1410]), tensor([10.4315])]


 46%|████▌     | 4619/10000 [00:20<00:27, 199.20it/s, logits=[tensor([-14.1378]), tensor([10.4498])]]

Logits for hard examples=[tensor([-14.1378]), tensor([10.4498])]


 47%|████▋     | 4744/10000 [00:20<00:21, 244.23it/s, logits=[tensor([-14.1346]), tensor([10.4675])]]

Logits for hard examples=[tensor([-14.1346]), tensor([10.4675])]


 48%|████▊     | 4829/10000 [00:21<00:19, 266.73it/s, logits=[tensor([-14.1314]), tensor([10.4849])]]

Logits for hard examples=[tensor([-14.1314]), tensor([10.4849])]


 49%|████▉     | 4945/10000 [00:21<00:17, 281.12it/s, logits=[tensor([-14.1284]), tensor([10.5017])]]

Logits for hard examples=[tensor([-14.1284]), tensor([10.5017])]


 50%|█████     | 5032/10000 [00:21<00:18, 269.10it/s, logits=[tensor([-14.1254]), tensor([10.5182])]]

Logits for hard examples=[tensor([-14.1254]), tensor([10.5182])]


 52%|█████▏    | 5152/10000 [00:22<00:17, 278.74it/s, logits=[tensor([-14.1225]), tensor([10.5342])]]

Logits for hard examples=[tensor([-14.1225]), tensor([10.5342])]


 52%|█████▏    | 5235/10000 [00:22<00:18, 258.39it/s, logits=[tensor([-14.1196]), tensor([10.5499])]]

Logits for hard examples=[tensor([-14.1196]), tensor([10.5499])]


 53%|█████▎    | 5349/10000 [00:23<00:17, 272.72it/s, logits=[tensor([-14.1168]), tensor([10.5652])]]

Logits for hard examples=[tensor([-14.1168]), tensor([10.5652])]


 54%|█████▍    | 5434/10000 [00:23<00:16, 271.80it/s, logits=[tensor([-14.1140]), tensor([10.5802])]]

Logits for hard examples=[tensor([-14.1140]), tensor([10.5802])]


 55%|█████▌    | 5545/10000 [00:23<00:17, 253.59it/s, logits=[tensor([-14.1113]), tensor([10.5949])]]

Logits for hard examples=[tensor([-14.1113]), tensor([10.5949])]


 56%|█████▋    | 5632/10000 [00:24<00:16, 271.14it/s, logits=[tensor([-14.1086]), tensor([10.6092])]]

Logits for hard examples=[tensor([-14.1086]), tensor([10.6092])]


 57%|█████▋    | 5744/10000 [00:24<00:16, 258.47it/s, logits=[tensor([-14.1060]), tensor([10.6232])]]

Logits for hard examples=[tensor([-14.1060]), tensor([10.6232])]


 58%|█████▊    | 5828/10000 [00:24<00:16, 259.34it/s, logits=[tensor([-14.1034]), tensor([10.6370])]]

Logits for hard examples=[tensor([-14.1034]), tensor([10.6370])]


 59%|█████▉    | 5938/10000 [00:25<00:15, 260.65it/s, logits=[tensor([-14.1009]), tensor([10.6505])]]

Logits for hard examples=[tensor([-14.1009]), tensor([10.6505])]


 60%|██████    | 6024/10000 [00:25<00:15, 249.52it/s, logits=[tensor([-14.0984]), tensor([10.6638])]]

Logits for hard examples=[tensor([-14.0984]), tensor([10.6638])]


 61%|██████▏   | 6139/10000 [00:26<00:14, 274.66it/s, logits=[tensor([-14.0960]), tensor([10.6768])]]

Logits for hard examples=[tensor([-14.0960]), tensor([10.6768])]


 63%|██████▎   | 6254/10000 [00:26<00:13, 269.02it/s, logits=[tensor([-14.0936]), tensor([10.6895])]]

Logits for hard examples=[tensor([-14.0936]), tensor([10.6895])]


 63%|██████▎   | 6341/10000 [00:26<00:13, 274.27it/s, logits=[tensor([-14.0913]), tensor([10.7021])]]

Logits for hard examples=[tensor([-14.0913]), tensor([10.7021])]


 65%|██████▍   | 6456/10000 [00:27<00:12, 278.41it/s, logits=[tensor([-14.0889]), tensor([10.7144])]]

Logits for hard examples=[tensor([-14.0889]), tensor([10.7144])]


 65%|██████▌   | 6539/10000 [00:27<00:13, 258.16it/s, logits=[tensor([-14.0867]), tensor([10.7265])]]

Logits for hard examples=[tensor([-14.0867]), tensor([10.7265])]


 67%|██████▋   | 6655/10000 [00:27<00:12, 276.51it/s, logits=[tensor([-14.0844]), tensor([10.7384])]]

Logits for hard examples=[tensor([-14.0844]), tensor([10.7384])]


 67%|██████▋   | 6744/10000 [00:28<00:11, 284.40it/s, logits=[tensor([-14.0822]), tensor([10.7501])]]

Logits for hard examples=[tensor([-14.0822]), tensor([10.7501])]


 68%|██████▊   | 6829/10000 [00:28<00:12, 258.21it/s, logits=[tensor([-14.0800]), tensor([10.7617])]]

Logits for hard examples=[tensor([-14.0800]), tensor([10.7617])]


 69%|██████▉   | 6948/10000 [00:29<00:10, 285.47it/s, logits=[tensor([-14.0778]), tensor([10.7730])]]

Logits for hard examples=[tensor([-14.0778]), tensor([10.7730])]


 70%|███████   | 7036/10000 [00:29<00:10, 282.40it/s, logits=[tensor([-14.0757]), tensor([10.7842])]]

Logits for hard examples=[tensor([-14.0757]), tensor([10.7842])]


 71%|███████▏  | 7149/10000 [00:29<00:10, 269.39it/s, logits=[tensor([-14.0736]), tensor([10.7952])]]

Logits for hard examples=[tensor([-14.0736]), tensor([10.7952])]


 72%|███████▏  | 7235/10000 [00:30<00:10, 275.79it/s, logits=[tensor([-14.0716]), tensor([10.8061])]]

Logits for hard examples=[tensor([-14.0716]), tensor([10.8061])]


 73%|███████▎  | 7323/10000 [00:30<00:09, 270.83it/s, logits=[tensor([-14.0696]), tensor([10.8167])]]

Logits for hard examples=[tensor([-14.0696]), tensor([10.8167])]


 74%|███████▍  | 7420/10000 [00:31<00:13, 188.17it/s, logits=[tensor([-14.0676]), tensor([10.8272])]]

Logits for hard examples=[tensor([-14.0676]), tensor([10.8272])]


 75%|███████▌  | 7528/10000 [00:31<00:13, 189.24it/s, logits=[tensor([-14.0656]), tensor([10.8377])]]

Logits for hard examples=[tensor([-14.0656]), tensor([10.8377])]


 76%|███████▋  | 7635/10000 [00:32<00:11, 205.51it/s, logits=[tensor([-14.0637]), tensor([10.8479])]]

Logits for hard examples=[tensor([-14.0637]), tensor([10.8479])]


 77%|███████▋  | 7729/10000 [00:32<00:11, 198.03it/s, logits=[tensor([-14.0618]), tensor([10.8580])]]

Logits for hard examples=[tensor([-14.0618]), tensor([10.8580])]


 78%|███████▊  | 7840/10000 [00:33<00:10, 209.64it/s, logits=[tensor([-14.0599]), tensor([10.8680])]]

Logits for hard examples=[tensor([-14.0599]), tensor([10.8680])]


 79%|███████▉  | 7926/10000 [00:33<00:10, 199.38it/s, logits=[tensor([-14.0580]), tensor([10.8778])]]

Logits for hard examples=[tensor([-14.0580]), tensor([10.8778])]


 80%|████████  | 8030/10000 [00:34<00:10, 180.79it/s, logits=[tensor([-14.0562]), tensor([10.8876])]]

Logits for hard examples=[tensor([-14.0562]), tensor([10.8876])]


 81%|████████▏ | 8128/10000 [00:34<00:07, 234.79it/s, logits=[tensor([-14.0543]), tensor([10.8972])]]

Logits for hard examples=[tensor([-14.0543]), tensor([10.8972])]


 82%|████████▏ | 8240/10000 [00:34<00:06, 271.38it/s, logits=[tensor([-14.0525]), tensor([10.9067])]]

Logits for hard examples=[tensor([-14.0525]), tensor([10.9067])]


 83%|████████▎ | 8329/10000 [00:35<00:05, 280.03it/s, logits=[tensor([-14.0507]), tensor([10.9160])]]

Logits for hard examples=[tensor([-14.0507]), tensor([10.9160])]


 84%|████████▍ | 8444/10000 [00:35<00:05, 267.15it/s, logits=[tensor([-14.0490]), tensor([10.9252])]]

Logits for hard examples=[tensor([-14.0490]), tensor([10.9252])]


 86%|████████▌ | 8557/10000 [00:36<00:05, 279.52it/s, logits=[tensor([-14.0472]), tensor([10.9344])]]

Logits for hard examples=[tensor([-14.0472]), tensor([10.9344])]


 86%|████████▋ | 8646/10000 [00:36<00:04, 284.66it/s, logits=[tensor([-14.0455]), tensor([10.9435])]]

Logits for hard examples=[tensor([-14.0455]), tensor([10.9435])]


 87%|████████▋ | 8731/10000 [00:36<00:04, 261.38it/s, logits=[tensor([-14.0438]), tensor([10.9524])]]

Logits for hard examples=[tensor([-14.0438]), tensor([10.9524])]


 88%|████████▊ | 8845/10000 [00:37<00:04, 273.53it/s, logits=[tensor([-14.0422]), tensor([10.9612])]]

Logits for hard examples=[tensor([-14.0422]), tensor([10.9612])]


 89%|████████▉ | 8930/10000 [00:37<00:03, 277.37it/s, logits=[tensor([-14.0406]), tensor([10.9699])]]

Logits for hard examples=[tensor([-14.0406]), tensor([10.9699])]


 90%|█████████ | 9046/10000 [00:37<00:03, 283.64it/s, logits=[tensor([-14.0390]), tensor([10.9785])]]

Logits for hard examples=[tensor([-14.0390]), tensor([10.9785])]


 91%|█████████▏| 9136/10000 [00:38<00:02, 289.10it/s, logits=[tensor([-14.0374]), tensor([10.9871])]]

Logits for hard examples=[tensor([-14.0374]), tensor([10.9871])]


 92%|█████████▏| 9228/10000 [00:38<00:02, 275.47it/s, logits=[tensor([-14.0358]), tensor([10.9956])]]

Logits for hard examples=[tensor([-14.0358]), tensor([10.9956])]


 93%|█████████▎| 9344/10000 [00:38<00:02, 272.64it/s, logits=[tensor([-14.0342]), tensor([11.0040])]]

Logits for hard examples=[tensor([-14.0342]), tensor([11.0040])]


 94%|█████████▍| 9427/10000 [00:39<00:02, 261.28it/s, logits=[tensor([-14.0327]), tensor([11.0122])]]

Logits for hard examples=[tensor([-14.0327]), tensor([11.0122])]


 95%|█████████▌| 9540/10000 [00:39<00:01, 245.46it/s, logits=[tensor([-14.0311]), tensor([11.0203])]]

Logits for hard examples=[tensor([-14.0311]), tensor([11.0203])]


 97%|█████████▋| 9651/10000 [00:40<00:01, 259.42it/s, logits=[tensor([-14.0296]), tensor([11.0284])]]

Logits for hard examples=[tensor([-14.0296]), tensor([11.0284])]


 97%|█████████▋| 9735/10000 [00:40<00:00, 271.22it/s, logits=[tensor([-14.0281]), tensor([11.0364])]]

Logits for hard examples=[tensor([-14.0281]), tensor([11.0364])]


 99%|█████████▊| 9853/10000 [00:40<00:00, 279.37it/s, logits=[tensor([-14.0266]), tensor([11.0444])]]

Logits for hard examples=[tensor([-14.0266]), tensor([11.0444])]


 99%|█████████▉| 9939/10000 [00:41<00:00, 273.12it/s, logits=[tensor([-14.0252]), tensor([11.0523])]]

Logits for hard examples=[tensor([-14.0252]), tensor([11.0523])]


100%|██████████| 10000/10000 [00:41<00:00, 240.90it/s, logits=[tensor([-14.0252]), tensor([11.0523])]]


In [None]:
print(f"Value for [Type 0] : {torch.sigmoid(torch.tensor(0.0013))} , Value for [Type 1] : {torch.sigmoid(torch.tensor(0.0013))}")

Value for [Type 0] : 0.5003250241279602 , Value for [Type 1] : 0.5003250241279602


In [None]:
# 2
SEQUENCE_LEN = 10
train_model(hidden_dim=20, lr=0.01, num_steps=10000, examples=generate_examples(SEQUENCE_LEN))

  0%|          | 0/10000 [00:00<?, ?it/s, logits=[tensor([-0.1412]), tensor([-0.1415])]]

Logits for hard examples=[tensor([-0.1412]), tensor([-0.1415])]


  1%|          | 100/10000 [00:30<46:14,  3.57it/s, logits=[tensor([-10.7415]), tensor([11.2493])]]

Logits for hard examples=[tensor([-10.7415]), tensor([11.2493])]


  2%|▏         | 200/10000 [01:01<1:00:13,  2.71it/s, logits=[tensor([-11.2921]), tensor([11.7204])]]

Logits for hard examples=[tensor([-11.2921]), tensor([11.7204])]


  3%|▎         | 300/10000 [01:32<47:19,  3.42it/s, logits=[tensor([-11.6458]), tensor([12.0492])]]

Logits for hard examples=[tensor([-11.6458]), tensor([12.0492])]


  4%|▍         | 400/10000 [02:03<53:36,  2.98it/s, logits=[tensor([-11.9103]), tensor([12.3013])]]

Logits for hard examples=[tensor([-11.9103]), tensor([12.3013])]


  5%|▌         | 500/10000 [02:34<45:58,  3.44it/s, logits=[tensor([-12.1213]), tensor([12.5044])]]

Logits for hard examples=[tensor([-12.1213]), tensor([12.5044])]


  6%|▌         | 600/10000 [03:05<45:10,  3.47it/s, logits=[tensor([-12.2945]), tensor([12.6761])]]

Logits for hard examples=[tensor([-12.2945]), tensor([12.6761])]


  7%|▋         | 700/10000 [03:37<46:22,  3.34it/s, logits=[tensor([-12.4439]), tensor([12.8220])]]

Logits for hard examples=[tensor([-12.4439]), tensor([12.8220])]


  8%|▊         | 800/10000 [04:08<45:14,  3.39it/s, logits=[tensor([-12.5746]), tensor([12.9560])]]

Logits for hard examples=[tensor([-12.5746]), tensor([12.9560])]


  9%|▉         | 900/10000 [04:40<1:01:11,  2.48it/s, logits=[tensor([-12.6877]), tensor([13.0660])]]

Logits for hard examples=[tensor([-12.6877]), tensor([13.0660])]


 10%|█         | 1000/10000 [05:10<42:31,  3.53it/s, logits=[tensor([-12.7930]), tensor([13.1672])]]

Logits for hard examples=[tensor([-12.7930]), tensor([13.1672])]


 11%|█         | 1100/10000 [05:40<42:00,  3.53it/s, logits=[tensor([-12.8904]), tensor([13.2626])]]

Logits for hard examples=[tensor([-12.8904]), tensor([13.2626])]


 12%|█▏        | 1200/10000 [06:14<43:11,  3.40it/s, logits=[tensor([-12.9783]), tensor([13.3536])]]

Logits for hard examples=[tensor([-12.9783]), tensor([13.3536])]


 13%|█▎        | 1300/10000 [06:50<56:05,  2.59it/s, logits=[tensor([-13.0531]), tensor([13.4387])]]

Logits for hard examples=[tensor([-13.0531]), tensor([13.4387])]


 14%|█▍        | 1400/10000 [07:23<43:41,  3.28it/s, logits=[tensor([-13.1246]), tensor([13.5131])]]

Logits for hard examples=[tensor([-13.1246]), tensor([13.5131])]


 15%|█▌        | 1500/10000 [07:57<45:11,  3.13it/s, logits=[tensor([-13.1943]), tensor([13.5786])]]

Logits for hard examples=[tensor([-13.1943]), tensor([13.5786])]


 16%|█▌        | 1600/10000 [08:29<42:02,  3.33it/s, logits=[tensor([-13.2611]), tensor([13.6383])]]

Logits for hard examples=[tensor([-13.2611]), tensor([13.6383])]


 17%|█▋        | 1700/10000 [09:02<45:42,  3.03it/s, logits=[tensor([-13.3228]), tensor([13.6963])]]

Logits for hard examples=[tensor([-13.3228]), tensor([13.6963])]


 18%|█▊        | 1800/10000 [09:37<59:17,  2.30it/s, logits=[tensor([-13.3839]), tensor([13.7500])]]

Logits for hard examples=[tensor([-13.3839]), tensor([13.7500])]


 19%|█▉        | 1900/10000 [10:14<49:36,  2.72it/s, logits=[tensor([-13.4432]), tensor([13.8038])]]

Logits for hard examples=[tensor([-13.4432]), tensor([13.8038])]


 20%|██        | 2000/10000 [10:50<41:01,  3.25it/s, logits=[tensor([-13.4923]), tensor([13.8552])]]

Logits for hard examples=[tensor([-13.4923]), tensor([13.8552])]


 21%|██        | 2100/10000 [11:25<50:56,  2.58it/s, logits=[tensor([-13.5404]), tensor([13.9067])]]

Logits for hard examples=[tensor([-13.5404]), tensor([13.9067])]


 22%|██▏       | 2200/10000 [12:02<58:20,  2.23it/s, logits=[tensor([-13.5875]), tensor([13.9561])]]

Logits for hard examples=[tensor([-13.5875]), tensor([13.9561])]


 23%|██▎       | 2300/10000 [12:37<43:01,  2.98it/s, logits=[tensor([-13.6303]), tensor([14.0039])]]

Logits for hard examples=[tensor([-13.6303]), tensor([14.0039])]


 24%|██▍       | 2400/10000 [13:11<39:00,  3.25it/s, logits=[tensor([-13.6709]), tensor([14.0517])]]

Logits for hard examples=[tensor([-13.6709]), tensor([14.0517])]


 25%|██▌       | 2500/10000 [13:46<49:31,  2.52it/s, logits=[tensor([-13.7113]), tensor([14.0950])]]

Logits for hard examples=[tensor([-13.7113]), tensor([14.0950])]


 26%|██▌       | 2600/10000 [14:17<36:33,  3.37it/s, logits=[tensor([-13.7515]), tensor([14.1356])]]

Logits for hard examples=[tensor([-13.7515]), tensor([14.1356])]


 27%|██▋       | 2700/10000 [14:49<47:12,  2.58it/s, logits=[tensor([-13.7913]), tensor([14.1762])]]

Logits for hard examples=[tensor([-13.7913]), tensor([14.1762])]


 28%|██▊       | 2800/10000 [15:21<37:05,  3.24it/s, logits=[tensor([-13.8309]), tensor([14.2168])]]

Logits for hard examples=[tensor([-13.8309]), tensor([14.2168])]


 29%|██▉       | 2900/10000 [15:54<45:25,  2.61it/s, logits=[tensor([-13.8699]), tensor([14.2552])]]

Logits for hard examples=[tensor([-13.8699]), tensor([14.2552])]


 30%|███       | 3000/10000 [16:27<34:39,  3.37it/s, logits=[tensor([-13.9080]), tensor([14.2905])]]

Logits for hard examples=[tensor([-13.9080]), tensor([14.2905])]


 31%|███       | 3100/10000 [16:59<43:49,  2.62it/s, logits=[tensor([-13.9450]), tensor([14.3258])]]

Logits for hard examples=[tensor([-13.9450]), tensor([14.3258])]


 32%|███▏      | 3200/10000 [17:31<33:42,  3.36it/s, logits=[tensor([-13.9814]), tensor([14.3614])]]

Logits for hard examples=[tensor([-13.9814]), tensor([14.3614])]


 33%|███▎      | 3300/10000 [18:03<41:32,  2.69it/s, logits=[tensor([-14.0141]), tensor([14.4006])]]

Logits for hard examples=[tensor([-14.0141]), tensor([14.4006])]


 34%|███▍      | 3400/10000 [18:35<32:29,  3.39it/s, logits=[tensor([-14.0459]), tensor([14.4400])]]

Logits for hard examples=[tensor([-14.0459]), tensor([14.4400])]


 35%|███▌      | 3500/10000 [19:11<42:17,  2.56it/s, logits=[tensor([-14.0821]), tensor([14.4639])]]

Logits for hard examples=[tensor([-14.0821]), tensor([14.4639])]


 36%|███▌      | 3600/10000 [19:46<32:07,  3.32it/s, logits=[tensor([-14.1157]), tensor([14.4878])]]

Logits for hard examples=[tensor([-14.1157]), tensor([14.4878])]


 37%|███▋      | 3700/10000 [20:20<32:25,  3.24it/s, logits=[tensor([-14.1459]), tensor([14.5128])]]

Logits for hard examples=[tensor([-14.1459]), tensor([14.5128])]


 38%|███▊      | 3800/10000 [20:52<32:51,  3.14it/s, logits=[tensor([-14.1726]), tensor([14.5412])]]

Logits for hard examples=[tensor([-14.1726]), tensor([14.5412])]


 39%|███▉      | 3900/10000 [21:25<29:53,  3.40it/s, logits=[tensor([-14.1989]), tensor([14.5705])]]

Logits for hard examples=[tensor([-14.1989]), tensor([14.5705])]


 40%|████      | 4000/10000 [21:57<34:26,  2.90it/s, logits=[tensor([-14.2251]), tensor([14.6003])]]

Logits for hard examples=[tensor([-14.2251]), tensor([14.6003])]


 41%|████      | 4100/10000 [22:30<30:08,  3.26it/s, logits=[tensor([-14.2500]), tensor([14.6301])]]

Logits for hard examples=[tensor([-14.2500]), tensor([14.6301])]


 42%|████▏     | 4200/10000 [23:05<36:25,  2.65it/s, logits=[tensor([-14.2738]), tensor([14.6601])]]

Logits for hard examples=[tensor([-14.2738]), tensor([14.6601])]


 43%|████▎     | 4300/10000 [23:40<31:35,  3.01it/s, logits=[tensor([-14.2977]), tensor([14.6899])]]

Logits for hard examples=[tensor([-14.2977]), tensor([14.6899])]


 44%|████▍     | 4400/10000 [24:16<27:34,  3.38it/s, logits=[tensor([-14.3253]), tensor([14.7003])]]

Logits for hard examples=[tensor([-14.3253]), tensor([14.7003])]


 45%|████▌     | 4500/10000 [24:50<36:51,  2.49it/s, logits=[tensor([-14.3495]), tensor([14.7140])]]

Logits for hard examples=[tensor([-14.3495]), tensor([14.7140])]


 46%|████▌     | 4600/10000 [25:24<31:00,  2.90it/s, logits=[tensor([-14.3729]), tensor([14.7284])]]

Logits for hard examples=[tensor([-14.3729]), tensor([14.7284])]


 47%|████▋     | 4700/10000 [26:04<30:51,  2.86it/s, logits=[tensor([-14.3962]), tensor([14.7428])]]

Logits for hard examples=[tensor([-14.3962]), tensor([14.7428])]


 48%|████▊     | 4800/10000 [26:42<30:49,  2.81it/s, logits=[tensor([-14.4195]), tensor([14.7572])]]

Logits for hard examples=[tensor([-14.4195]), tensor([14.7572])]


 49%|████▉     | 4900/10000 [27:30<37:18,  2.28it/s, logits=[tensor([-14.4427]), tensor([14.7716])]]

Logits for hard examples=[tensor([-14.4427]), tensor([14.7716])]


 50%|█████     | 5000/10000 [28:10<26:42,  3.12it/s, logits=[tensor([-14.4658]), tensor([14.7860])]]

Logits for hard examples=[tensor([-14.4658]), tensor([14.7860])]


 51%|█████     | 5100/10000 [28:44<30:58,  2.64it/s, logits=[tensor([-14.4889]), tensor([14.8003])]]

Logits for hard examples=[tensor([-14.4889]), tensor([14.8003])]


 52%|█████▏    | 5200/10000 [29:20<27:20,  2.93it/s, logits=[tensor([-14.5119]), tensor([14.8147])]]

Logits for hard examples=[tensor([-14.5119]), tensor([14.8147])]


 53%|█████▎    | 5300/10000 [29:55<27:50,  2.81it/s, logits=[tensor([-14.5347]), tensor([14.8290])]]

Logits for hard examples=[tensor([-14.5347]), tensor([14.8290])]


 54%|█████▍    | 5400/10000 [30:29<32:58,  2.33it/s, logits=[tensor([-14.5541]), tensor([14.8434])]]

Logits for hard examples=[tensor([-14.5541]), tensor([14.8434])]


 55%|█████▌    | 5500/10000 [31:03<25:05,  2.99it/s, logits=[tensor([-14.5671]), tensor([14.8577])]]

Logits for hard examples=[tensor([-14.5671]), tensor([14.8577])]


 56%|█████▌    | 5600/10000 [31:39<23:09,  3.17it/s, logits=[tensor([-14.5800]), tensor([14.8721])]]

Logits for hard examples=[tensor([-14.5800]), tensor([14.8721])]


 57%|█████▋    | 5700/10000 [32:12<28:03,  2.55it/s, logits=[tensor([-14.5929]), tensor([14.8864])]]

Logits for hard examples=[tensor([-14.5929]), tensor([14.8864])]


 58%|█████▊    | 5800/10000 [32:45<22:34,  3.10it/s, logits=[tensor([-14.6057]), tensor([14.9008])]]

Logits for hard examples=[tensor([-14.6057]), tensor([14.9008])]


 59%|█████▉    | 5900/10000 [33:22<20:47,  3.29it/s, logits=[tensor([-14.6183]), tensor([14.9152])]]

Logits for hard examples=[tensor([-14.6183]), tensor([14.9152])]


 60%|██████    | 6000/10000 [33:58<29:47,  2.24it/s, logits=[tensor([-14.6310]), tensor([14.9295])]]

Logits for hard examples=[tensor([-14.6310]), tensor([14.9295])]


 61%|██████    | 6100/10000 [34:31<21:07,  3.08it/s, logits=[tensor([-14.6436]), tensor([14.9439])]]

Logits for hard examples=[tensor([-14.6436]), tensor([14.9439])]


 62%|██████▏   | 6200/10000 [35:09<20:11,  3.14it/s, logits=[tensor([-14.6563]), tensor([14.9582])]]

Logits for hard examples=[tensor([-14.6563]), tensor([14.9582])]


 63%|██████▎   | 6300/10000 [35:43<26:45,  2.30it/s, logits=[tensor([-14.6689]), tensor([14.9726])]]

Logits for hard examples=[tensor([-14.6689]), tensor([14.9726])]


 64%|██████▍   | 6400/10000 [36:17<18:25,  3.26it/s, logits=[tensor([-14.6816]), tensor([14.9870])]]

Logits for hard examples=[tensor([-14.6816]), tensor([14.9870])]


 65%|██████▌   | 6500/10000 [36:53<20:20,  2.87it/s, logits=[tensor([-14.6942]), tensor([15.0015])]]

Logits for hard examples=[tensor([-14.6942]), tensor([15.0015])]


 66%|██████▌   | 6600/10000 [37:30<24:25,  2.32it/s, logits=[tensor([-14.7068]), tensor([15.0159])]]

Logits for hard examples=[tensor([-14.7068]), tensor([15.0159])]


 67%|██████▋   | 6700/10000 [38:06<22:16,  2.47it/s, logits=[tensor([-14.7194]), tensor([15.0283])]]

Logits for hard examples=[tensor([-14.7194]), tensor([15.0283])]


 68%|██████▊   | 6800/10000 [38:45<21:26,  2.49it/s, logits=[tensor([-14.7320]), tensor([15.0356])]]

Logits for hard examples=[tensor([-14.7320]), tensor([15.0356])]


 69%|██████▉   | 6900/10000 [39:23<28:23,  1.82it/s, logits=[tensor([-14.7445]), tensor([15.0429])]]

Logits for hard examples=[tensor([-14.7445]), tensor([15.0429])]


 70%|███████   | 7000/10000 [40:07<23:49,  2.10it/s, logits=[tensor([-14.7570]), tensor([15.0503])]]

Logits for hard examples=[tensor([-14.7570]), tensor([15.0503])]


 71%|███████   | 7100/10000 [40:43<19:33,  2.47it/s, logits=[tensor([-14.7685]), tensor([15.0576])]]

Logits for hard examples=[tensor([-14.7685]), tensor([15.0576])]


 72%|███████▏  | 7200/10000 [41:21<15:28,  3.02it/s, logits=[tensor([-14.7787]), tensor([15.0649])]]

Logits for hard examples=[tensor([-14.7787]), tensor([15.0649])]


 73%|███████▎  | 7300/10000 [41:59<15:15,  2.95it/s, logits=[tensor([-14.7890]), tensor([15.0722])]]

Logits for hard examples=[tensor([-14.7890]), tensor([15.0722])]


 74%|███████▍  | 7400/10000 [42:36<16:19,  2.65it/s, logits=[tensor([-14.7992]), tensor([15.0795])]]

Logits for hard examples=[tensor([-14.7992]), tensor([15.0795])]


 75%|███████▌  | 7500/10000 [43:12<14:17,  2.91it/s, logits=[tensor([-14.8094]), tensor([15.0868])]]

Logits for hard examples=[tensor([-14.8094]), tensor([15.0868])]


 76%|███████▌  | 7600/10000 [43:50<14:53,  2.69it/s, logits=[tensor([-14.8196]), tensor([15.0941])]]

Logits for hard examples=[tensor([-14.8196]), tensor([15.0941])]


 77%|███████▋  | 7700/10000 [44:29<14:25,  2.66it/s, logits=[tensor([-14.8297]), tensor([15.1015])]]

Logits for hard examples=[tensor([-14.8297]), tensor([15.1015])]


 78%|███████▊  | 7800/10000 [45:09<14:05,  2.60it/s, logits=[tensor([-14.8399]), tensor([15.1088])]]

Logits for hard examples=[tensor([-14.8399]), tensor([15.1088])]


 79%|███████▉  | 7900/10000 [45:47<13:10,  2.66it/s, logits=[tensor([-14.8500]), tensor([15.1161])]]

Logits for hard examples=[tensor([-14.8500]), tensor([15.1161])]


 80%|████████  | 8000/10000 [46:24<14:16,  2.34it/s, logits=[tensor([-14.8600]), tensor([15.1234])]]

Logits for hard examples=[tensor([-14.8600]), tensor([15.1234])]


 81%|████████  | 8100/10000 [46:58<11:25,  2.77it/s, logits=[tensor([-14.8701]), tensor([15.1308])]]

Logits for hard examples=[tensor([-14.8701]), tensor([15.1308])]


 82%|████████▏ | 8200/10000 [47:36<10:36,  2.83it/s, logits=[tensor([-14.8801]), tensor([15.1381])]]

Logits for hard examples=[tensor([-14.8801]), tensor([15.1381])]


 83%|████████▎ | 8300/10000 [48:13<09:07,  3.10it/s, logits=[tensor([-14.8901]), tensor([15.1454])]]

Logits for hard examples=[tensor([-14.8901]), tensor([15.1454])]


 84%|████████▍ | 8400/10000 [48:50<09:08,  2.92it/s, logits=[tensor([-14.9001]), tensor([15.1527])]]

Logits for hard examples=[tensor([-14.9001]), tensor([15.1527])]


 85%|████████▌ | 8500/10000 [49:27<09:20,  2.67it/s, logits=[tensor([-14.9100]), tensor([15.1601])]]

Logits for hard examples=[tensor([-14.9100]), tensor([15.1601])]


 86%|████████▌ | 8600/10000 [50:04<10:53,  2.14it/s, logits=[tensor([-14.9199]), tensor([15.1674])]]

Logits for hard examples=[tensor([-14.9199]), tensor([15.1674])]


 87%|████████▋ | 8700/10000 [50:40<06:58,  3.10it/s, logits=[tensor([-14.9298]), tensor([15.1748])]]

Logits for hard examples=[tensor([-14.9298]), tensor([15.1748])]


 88%|████████▊ | 8800/10000 [51:15<06:20,  3.15it/s, logits=[tensor([-14.9397]), tensor([15.1821])]]

Logits for hard examples=[tensor([-14.9397]), tensor([15.1821])]


 89%|████████▉ | 8900/10000 [51:51<06:15,  2.93it/s, logits=[tensor([-14.9495]), tensor([15.1894])]]

Logits for hard examples=[tensor([-14.9495]), tensor([15.1894])]


 90%|█████████ | 9000/10000 [52:25<06:27,  2.58it/s, logits=[tensor([-14.9594]), tensor([15.1967])]]

Logits for hard examples=[tensor([-14.9594]), tensor([15.1967])]


 91%|█████████ | 9100/10000 [52:58<04:33,  3.29it/s, logits=[tensor([-14.9692]), tensor([15.2040])]]

Logits for hard examples=[tensor([-14.9692]), tensor([15.2040])]


 92%|█████████▏| 9200/10000 [53:33<04:26,  3.00it/s, logits=[tensor([-14.9790]), tensor([15.2113])]]

Logits for hard examples=[tensor([-14.9790]), tensor([15.2113])]


 93%|█████████▎| 9300/10000 [54:09<04:30,  2.59it/s, logits=[tensor([-14.9887]), tensor([15.2186])]]

Logits for hard examples=[tensor([-14.9887]), tensor([15.2186])]


 94%|█████████▍| 9400/10000 [54:43<03:07,  3.20it/s, logits=[tensor([-14.9985]), tensor([15.2260])]]

Logits for hard examples=[tensor([-14.9985]), tensor([15.2260])]


 95%|█████████▌| 9500/10000 [55:18<02:43,  3.06it/s, logits=[tensor([-15.0083]), tensor([15.2333])]]

Logits for hard examples=[tensor([-15.0083]), tensor([15.2333])]


 96%|█████████▌| 9600/10000 [55:50<02:02,  3.26it/s, logits=[tensor([-15.0180]), tensor([15.2406])]]

Logits for hard examples=[tensor([-15.0180]), tensor([15.2406])]


 97%|█████████▋| 9700/10000 [56:27<01:41,  2.95it/s, logits=[tensor([-15.0277]), tensor([15.2479])]]

Logits for hard examples=[tensor([-15.0277]), tensor([15.2479])]


 98%|█████████▊| 9800/10000 [57:04<01:10,  2.84it/s, logits=[tensor([-15.0374]), tensor([15.2553])]]

Logits for hard examples=[tensor([-15.0374]), tensor([15.2553])]


 99%|█████████▉| 9900/10000 [57:42<00:33,  2.99it/s, logits=[tensor([-15.0471]), tensor([15.2626])]]

Logits for hard examples=[tensor([-15.0471]), tensor([15.2626])]


100%|██████████| 10000/10000 [58:26<00:00,  2.85it/s, logits=[tensor([-15.0471]), tensor([15.2626])]]


In [None]:
print(f"Value for [Type 0] : {torch.sigmoid(torch.tensor(-14.5929))} , Value for [Type 1] : {torch.sigmoid(torch.tensor(14.8864))}")

Value for [Type 0] : 4.596039104853844e-07 , Value for [Type 1] : 0.9999996423721313


In [None]:
# 3
class RNNModel(nn.Module):
    def __init__(self, hidden_dim: int):
        super(RNNModel, self).__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(1, self.hidden_dim)
        self.hidden2label = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)
        sequence_len = x.shape[0]
        logits = self.hidden2label(F.relu(out[-1].view(-1)))
        return logits

def train_rnn_model(hidden_dim: int, lr: float, num_steps: int = 10000) -> None:
    model = RNNModel(hidden_dim=hidden_dim)
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.99)

    pbar = tqdm(range(num_steps))
    for step in pbar:
        if step % 100 == 0:
            logits = eval_on_hard_examples(model)
            pbar.set_postfix(logits=logits)

        sequences = generate_examples(SEQUENCE_LEN)
        for sequence, label in sequences:
            model.zero_grad()
            logit = model(torch.tensor(sequence).view(-1, 1, 1))

            loss = loss_function(logit.view(-1), torch.tensor([label], dtype=torch.float32))
            loss.backward()

            optimizer.step()

SEQUENCE_LEN = 10
train_rnn_model(hidden_dim=20, lr=0.01, num_steps=1000)

  0%|          | 0/1000 [00:00<?, ?it/s, logits=[tensor([-0.0216]), tensor([-0.0216])]]

Logits for hard examples=[tensor([-0.0216]), tensor([-0.0216])]


 10%|█         | 100/1000 [00:55<08:13,  1.82it/s, logits=[tensor([0.0101]), tensor([0.0101])]]  

Logits for hard examples=[tensor([0.0101]), tensor([0.0101])]


 20%|██        | 200/1000 [01:54<05:53,  2.26it/s, logits=[tensor([0.0101]), tensor([0.0101])]]

Logits for hard examples=[tensor([0.0101]), tensor([0.0101])]


 30%|███       | 300/1000 [02:44<05:23,  2.17it/s, logits=[tensor([0.0101]), tensor([0.0101])]]

Logits for hard examples=[tensor([0.0101]), tensor([0.0101])]


 40%|████      | 400/1000 [03:34<04:07,  2.43it/s, logits=[tensor([0.0101]), tensor([0.0101])]]

Logits for hard examples=[tensor([0.0101]), tensor([0.0101])]


 50%|█████     | 500/1000 [04:16<03:05,  2.70it/s, logits=[tensor([0.0101]), tensor([0.0101])]]

Logits for hard examples=[tensor([0.0101]), tensor([0.0101])]


 60%|██████    | 600/1000 [04:58<02:30,  2.65it/s, logits=[tensor([0.0088]), tensor([0.0088])]]

Logits for hard examples=[tensor([0.0088]), tensor([0.0088])]


 70%|███████   | 700/1000 [05:43<03:00,  1.66it/s, logits=[tensor([0.0088]), tensor([0.0088])]]

Logits for hard examples=[tensor([0.0088]), tensor([0.0088])]


 80%|████████  | 800/1000 [06:26<01:22,  2.42it/s, logits=[tensor([0.0088]), tensor([0.0088])]]

Logits for hard examples=[tensor([0.0088]), tensor([0.0088])]


 90%|█████████ | 900/1000 [07:08<00:38,  2.61it/s, logits=[tensor([0.0088]), tensor([0.0088])]]

Logits for hard examples=[tensor([0.0088]), tensor([0.0088])]


100%|██████████| 1000/1000 [07:49<00:00,  2.13it/s, logits=[tensor([0.0088]), tensor([0.0088])]]


In [None]:
print(f"Value for [Type 0] : {torch.sigmoid(torch.tensor(0.0088))} , Value for [Type 1] : {torch.sigmoid(torch.tensor(0.0088))}")

Value for [Type 0] : 0.5022000074386597 , Value for [Type 1] : 0.5022000074386597


In [None]:
# 4
def curriculum_learning(hidden_dim: int, lr: float, max_seq_len: int, num_steps: int = 10000) -> None:
    model = Model(hidden_dim=hidden_dim)
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.99)

    for seq_len in range(1, max_seq_len + 1):
        print(f"Train for SEQUENCE_LEN={seq_len}")
        SEQUENCE_LEN = seq_len

        pbar = tqdm(range(num_steps))
        for step in pbar:
            if step % 100 == 0:
              logits = eval_on_hard_examples(model)
              pbar.set_postfix(logits=logits)

        sequences = generate_examples(SEQUENCE_LEN)
        for sequence, label in sequences:
            model.zero_grad()
            logit = model(torch.tensor(sequence).view(-1,1,1))

            loss = loss_function(logit.view(-1), torch.tensor([label], dtype=torch.float32))
            loss.backward()

            optimizer.step()

curriculum_learning(hidden_dim=20, lr=0.01, max_seq_len=20, num_steps=10000)

Train for SEQUENCE_LEN=1


 13%|█▎        | 1301/10000 [00:00<00:00, 12599.99it/s, logits=[tensor([0.1224]), tensor([0.1223])]]

Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits f

 13%|█▎        | 1301/10000 [00:00<00:00, 12599.99it/s, logits=[tensor([0.1224]), tensor([0.1223])]]

Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]


 41%|████▏     | 4137/10000 [00:00<00:00, 13094.23it/s, logits=[tensor([0.1224]), tensor([0.1223])]]

Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits f

 41%|████▏     | 4137/10000 [00:00<00:00, 13094.23it/s, logits=[tensor([0.1224]), tensor([0.1223])]]

Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]

 70%|███████   | 7001/10000 [00:00<00:00, 13491.19it/s, logits=[tensor([0.1224]), tensor([0.1223])]]


Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits 

 70%|███████   | 7001/10000 [00:00<00:00, 13491.19it/s, logits=[tensor([0.1224]), tensor([0.1223])]]

Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]


100%|██████████| 10000/10000 [00:00<00:00, 13475.42it/s, logits=[tensor([0.1224]), tensor([0.1223])]]


Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits for hard examples=[tensor([0.1224]), tensor([0.1223])]
Logits f

 11%|█         | 1101/10000 [00:00<00:00, 10304.61it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits f

 11%|█         | 1101/10000 [00:00<00:00, 10304.61it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]


 32%|███▏      | 3201/10000 [00:00<00:00, 9786.43it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits f

 32%|███▏      | 3201/10000 [00:00<00:00, 9786.43it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]


 52%|█████▏    | 5154/10000 [00:00<00:00, 9670.72it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits f

 52%|█████▏    | 5154/10000 [00:00<00:00, 9670.72it/s, logits=[tensor([5.6819]), tensor([5.6799])]]

Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]

 75%|███████▌  | 7501/10000 [00:00<00:00, 10309.70it/s, logits=[tensor([5.6819]), tensor([5.6799])]]


Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits 

100%|██████████| 10000/10000 [00:00<00:00, 10356.08it/s, logits=[tensor([5.6819]), tensor([5.6799])]]


Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Logits for hard examples=[tensor([5.6819]), tensor([5.6799])]
Train fo

 12%|█▏        | 1201/10000 [00:00<00:00, 11251.72it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.225

 12%|█▏        | 1201/10000 [00:00<00:00, 11251.72it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]


 37%|███▋      | 3701/10000 [00:00<00:00, 11185.56it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.225

 37%|███▋      | 3701/10000 [00:00<00:00, 11185.56it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]


 59%|█████▊    | 5861/10000 [00:00<00:00, 10168.79it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.225

 59%|█████▊    | 5861/10000 [00:00<00:00, 10168.79it/s, logits=[tensor([18.2254]), tensor([18.4701])]]


Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]


 84%|████████▍ | 8401/10000 [00:00<00:00, 11310.49it/s, logits=[tensor([18.2254]), tensor([18.4701])]]

Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.225

 84%|████████▍ | 8401/10000 [00:00<00:00, 11310.49it/s, logits=[tensor([18.2254]), tensor([18.4701])]]


Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]


100%|██████████| 10000/10000 [00:00<00:00, 11110.34it/s, logits=[tensor([18.2254]), tensor([18.4701])]]


Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Logits for hard examples=[tensor([18.2254]), tensor([18.4701])]
Train for SEQUENCE_LEN=4


 12%|█▏        | 1201/10000 [00:00<00:00, 11185.43it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]

Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.536

 12%|█▏        | 1201/10000 [00:00<00:00, 11185.43it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]

Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]


 38%|███▊      | 3801/10000 [00:00<00:00, 12400.19it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]

Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.536

 38%|███▊      | 3801/10000 [00:00<00:00, 12400.19it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]




 66%|██████▌   | 6601/10000 [00:00<00:00, 13010.17it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]

Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.536

 66%|██████▌   | 6601/10000 [00:00<00:00, 13010.17it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]

Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]


100%|██████████| 10000/10000 [00:00<00:00, 12765.96it/s, logits=[tensor([-7.5364]), tensor([20.1428])]]


Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.5364]), tensor([20.1428])]
Logits for hard examples=[tensor([-7.536

 16%|█▌        | 1601/10000 [00:00<00:00, 15380.29it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]

Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.856

 16%|█▌        | 1601/10000 [00:00<00:00, 15380.29it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]

Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]


 50%|█████     | 5001/10000 [00:00<00:00, 16117.66it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]

Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.856

 50%|█████     | 5001/10000 [00:00<00:00, 16117.66it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]

Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]


 83%|████████▎ | 8301/10000 [00:00<00:00, 16090.73it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]

Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.856

100%|██████████| 10000/10000 [00:00<00:00, 15376.82it/s, logits=[tensor([-7.8565]), tensor([20.3190])]]


Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Logits for hard examples=[tensor([-7.8565]), tensor([20.3190])]
Train for SEQUENCE_LEN=6


 13%|█▎        | 1301/10000 [00:00<00:00, 12524.79it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.028

 13%|█▎        | 1301/10000 [00:00<00:00, 12524.79it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]


 39%|███▉      | 3901/10000 [00:00<00:00, 12785.84it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.028

 52%|█████▏    | 5201/10000 [00:00<00:00, 12658.40it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]


 65%|██████▌   | 6501/10000 [00:00<00:00, 12646.75it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.028

 78%|███████▊  | 7801/10000 [00:00<00:00, 12583.78it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]

Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]


100%|██████████| 10000/10000 [00:00<00:00, 12808.48it/s, logits=[tensor([-8.0286]), tensor([20.3185])]]


Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.0286]), tensor([20.3185])]
Logits for hard examples=[tensor([-8.028

 15%|█▍        | 1482/10000 [00:00<00:00, 14517.98it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]

Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.158

 15%|█▍        | 1482/10000 [00:00<00:00, 14517.98it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]

Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]


 44%|████▍     | 4401/10000 [00:00<00:00, 13981.33it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]

Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.158

 72%|███████▏  | 7201/10000 [00:00<00:00, 13447.34it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]

Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.158

 72%|███████▏  | 7201/10000 [00:00<00:00, 13447.34it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]

Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]


100%|██████████| 10000/10000 [00:00<00:00, 13241.44it/s, logits=[tensor([-8.1586]), tensor([20.2995])]]


Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.1586]), tensor([20.2995])]
Logits for hard examples=[tensor([-8.158

 13%|█▎        | 1311/10000 [00:00<00:00, 13085.09it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]

Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.265

 29%|██▉       | 2901/10000 [00:00<00:00, 14366.74it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]

Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]


 43%|████▎     | 4336/10000 [00:00<00:00, 13862.97it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]

Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.265

 43%|████▎     | 4336/10000 [00:00<00:00, 13862.97it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]


Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]


 70%|███████   | 7023/10000 [00:00<00:00, 12552.59it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]

Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.265

 70%|███████   | 7023/10000 [00:00<00:00, 12552.59it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]

Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]


100%|██████████| 10000/10000 [00:00<00:00, 12705.30it/s, logits=[tensor([-8.2656]), tensor([20.2814])]]


Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.2656]), tensor([20.2814])]
Logits for hard examples=[tensor([-8.265

 14%|█▍        | 1401/10000 [00:00<00:00, 12343.11it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]

Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.361

 14%|█▍        | 1401/10000 [00:00<00:00, 12343.11it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]

Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]


 40%|████      | 4001/10000 [00:00<00:00, 12659.49it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]

Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.361

 40%|████      | 4001/10000 [00:00<00:00, 12659.49it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]

Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]


 67%|██████▋   | 6701/10000 [00:00<00:00, 12988.51it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]

Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.361

100%|██████████| 10000/10000 [00:00<00:00, 12582.45it/s, logits=[tensor([-8.3619]), tensor([20.2650])]]


Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.3619]), tensor([20.2650])]
Logits for hard examples=[tensor([-8.361

 12%|█▏        | 1201/10000 [00:00<00:00, 11441.99it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]

Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.448

 12%|█▏        | 1201/10000 [00:00<00:00, 11441.99it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]

Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]


 38%|███▊      | 3801/10000 [00:00<00:00, 12259.44it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]

Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.448

 38%|███▊      | 3801/10000 [00:00<00:00, 12259.44it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]

Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]


 66%|██████▌   | 6601/10000 [00:00<00:00, 13088.05it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]

Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.448

100%|██████████| 10000/10000 [00:00<00:00, 13047.63it/s, logits=[tensor([-8.4484]), tensor([20.2504])]]


Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.4484]), tensor([20.2504])]
Logits for hard examples=[tensor([-8.448

 13%|█▎        | 1301/10000 [00:00<00:00, 12031.95it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.530

 13%|█▎        | 1301/10000 [00:00<00:00, 12031.95it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]


 40%|████      | 4001/10000 [00:00<00:00, 12437.49it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.530

 67%|██████▋   | 6701/10000 [00:00<00:00, 12472.26it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.530

 67%|██████▋   | 6701/10000 [00:00<00:00, 12472.26it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]


 93%|█████████▎| 9301/10000 [00:00<00:00, 11706.08it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]
Logits for hard examples=[tensor([-8.530

100%|██████████| 10000/10000 [00:00<00:00, 12101.18it/s, logits=[tensor([-8.5309]), tensor([20.2367])]]

Logits for hard examples=[tensor([-8.5309]), tensor([20.2367])]





Train for SEQUENCE_LEN=12


 16%|█▌        | 1601/10000 [00:00<00:00, 15439.31it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]

Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.604

 46%|████▌     | 4604/10000 [00:00<00:00, 14523.45it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]

Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.604

 46%|████▌     | 4604/10000 [00:00<00:00, 14523.45it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]

Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]


 77%|███████▋  | 7689/10000 [00:00<00:00, 14239.96it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]

Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.604

 77%|███████▋  | 7689/10000 [00:00<00:00, 14239.96it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]

Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]


100%|██████████| 10000/10000 [00:00<00:00, 14340.09it/s, logits=[tensor([-8.6048]), tensor([20.2245])]]


Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Logits for hard examples=[tensor([-8.6048]), tensor([20.2245])]
Train for SEQUENCE_LEN=13


 12%|█▏        | 1201/10000 [00:00<00:00, 11033.97it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.673

 12%|█▏        | 1201/10000 [00:00<00:00, 11033.97it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]


 33%|███▎      | 3301/10000 [00:00<00:00, 9417.73it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.673

 33%|███▎      | 3301/10000 [00:00<00:00, 9417.73it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]


 55%|█████▌    | 5501/10000 [00:00<00:00, 10302.98it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.673

 55%|█████▌    | 5501/10000 [00:00<00:00, 10302.98it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]


 83%|████████▎ | 8301/10000 [00:00<00:00, 11970.88it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]

Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.673

 83%|████████▎ | 8301/10000 [00:00<00:00, 11970.88it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]




100%|██████████| 10000/10000 [00:00<00:00, 10440.61it/s, logits=[tensor([-8.6739]), tensor([20.2131])]]


Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Logits for hard examples=[tensor([-8.6739]), tensor([20.2131])]
Train for SEQUENCE_LEN=14


 14%|█▍        | 1401/10000 [00:00<00:00, 13521.30it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.739

 14%|█▍        | 1401/10000 [00:00<00:00, 13521.30it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]


 41%|████      | 4085/10000 [00:00<00:00, 12545.79it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.739

 41%|████      | 4085/10000 [00:00<00:00, 12545.79it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]

 65%|██████▌   | 6501/10000 [00:00<00:00, 10965.05it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]


Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.73

 65%|██████▌   | 6501/10000 [00:00<00:00, 10965.05it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]


 94%|█████████▍| 9401/10000 [00:00<00:00, 12442.82it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]

Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.7393]), tensor([20.2025])]
Logits for hard examples=[tensor([-8.739

100%|██████████| 10000/10000 [00:00<00:00, 12392.37it/s, logits=[tensor([-8.7393]), tensor([20.2025])]]


Train for SEQUENCE_LEN=15


 11%|█         | 1101/10000 [00:00<00:00, 9957.97it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.803

 11%|█         | 1101/10000 [00:00<00:00, 9957.97it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]


 32%|███▏      | 3213/10000 [00:00<00:00, 10049.17it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.803

 32%|███▏      | 3213/10000 [00:00<00:00, 10049.17it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]

 56%|█████▌    | 5601/10000 [00:00<00:00, 11219.38it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]


Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.80

 56%|█████▌    | 5601/10000 [00:00<00:00, 11219.38it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]


 80%|████████  | 8001/10000 [00:00<00:00, 11387.27it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.803

 80%|████████  | 8001/10000 [00:00<00:00, 11387.27it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]

Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]


100%|██████████| 10000/10000 [00:00<00:00, 10647.37it/s, logits=[tensor([-8.8038]), tensor([20.1920])]]


Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Logits for hard examples=[tensor([-8.8038]), tensor([20.1920])]
Train for SEQUENCE_LEN=16


 15%|█▌        | 1501/10000 [00:00<00:00, 14165.77it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.862

 15%|█▌        | 1501/10000 [00:00<00:00, 14165.77it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]


 44%|████▍     | 4404/10000 [00:00<00:00, 13884.17it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.862

 44%|████▍     | 4404/10000 [00:00<00:00, 13884.17it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]


 72%|███████▏  | 7174/10000 [00:00<00:00, 11432.74it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.862

 72%|███████▏  | 7174/10000 [00:00<00:00, 11432.74it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]




 84%|████████▎ | 8370/10000 [00:00<00:00, 9963.57it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.862

 84%|████████▎ | 8370/10000 [00:00<00:00, 9963.57it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]

Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]


100%|██████████| 10000/10000 [00:00<00:00, 11006.90it/s, logits=[tensor([-8.8621]), tensor([20.1826])]]


Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Logits for hard examples=[tensor([-8.8621]), tensor([20.1826])]
Train for SEQUENCE_LEN=17


 15%|█▌        | 1501/10000 [00:00<00:00, 14634.51it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.919

 15%|█▌        | 1501/10000 [00:00<00:00, 14634.51it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]


 46%|████▌     | 4592/10000 [00:00<00:00, 14626.91it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.919

 46%|████▌     | 4592/10000 [00:00<00:00, 14626.91it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]


 74%|███████▍  | 7401/10000 [00:00<00:00, 13041.17it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.919

 74%|███████▍  | 7401/10000 [00:00<00:00, 13041.17it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]

Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]


100%|██████████| 10000/10000 [00:00<00:00, 13405.44it/s, logits=[tensor([-8.9190]), tensor([20.1735])]]


Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.9190]), tensor([20.1735])]
Logits for hard examples=[tensor([-8.919

  9%|▉         | 901/10000 [00:00<00:01, 8921.86it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.972

  9%|▉         | 901/10000 [00:00<00:01, 8921.86it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]


 34%|███▍      | 3401/10000 [00:00<00:00, 11517.50it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.972

 49%|████▉     | 4901/10000 [00:00<00:00, 12805.58it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]


 63%|██████▎   | 6301/10000 [00:00<00:00, 13094.68it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.972

 76%|███████▌  | 7613/10000 [00:00<00:00, 13078.98it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]

Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]


100%|██████████| 10000/10000 [00:00<00:00, 12792.86it/s, logits=[tensor([-8.9728]), tensor([20.1649])]]


Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.9728]), tensor([20.1649])]
Logits for hard examples=[tensor([-8.972

 13%|█▎        | 1301/10000 [00:00<00:00, 12599.78it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.019

 39%|███▉      | 3901/10000 [00:00<00:00, 11594.46it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.019

 39%|███▉      | 3901/10000 [00:00<00:00, 11594.46it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]


 66%|██████▌   | 6569/10000 [00:00<00:00, 11689.09it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.019

 66%|██████▌   | 6569/10000 [00:00<00:00, 11689.09it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]


 89%|████████▉ | 8901/10000 [00:00<00:00, 11000.99it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.019

 89%|████████▉ | 8901/10000 [00:00<00:00, 11000.99it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]

Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]


100%|██████████| 10000/10000 [00:00<00:00, 11177.91it/s, logits=[tensor([-9.0195]), tensor([20.1574])]]


Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Logits for hard examples=[tensor([-9.0195]), tensor([20.1574])]
Train for SEQUENCE_LEN=20


 14%|█▍        | 1401/10000 [00:00<00:00, 13659.81it/s, logits=[tensor([-9.0663]), tensor([20.1499])]]

Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.066

 44%|████▍     | 4401/10000 [00:00<00:00, 14245.47it/s, logits=[tensor([-9.0663]), tensor([20.1499])]]


Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.06

 75%|███████▍  | 7467/10000 [00:00<00:00, 14339.83it/s, logits=[tensor([-9.0663]), tensor([20.1499])]]

Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.066

100%|██████████| 10000/10000 [00:00<00:00, 13301.87it/s, logits=[tensor([-9.0663]), tensor([20.1499])]]


Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]
Logits for hard examples=[tensor([-9.0663]), tensor([20.1499])]


In [None]:
print(f"Value for [Type 0] : {torch.sigmoid(torch.tensor(-9.0663))} , Value for [Type 1] : {torch.sigmoid(torch.tensor(20.1499))}")

Value for [Type 0] : 0.00011547969188541174 , Value for [Type 1] : 1.0
