### Fine-tuning the 6-billion-parameter GPT-J in Colab with LoRA and 8-bit compression

This notebook is a proof of concept for fine-tuning [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) with limited memory. A detailed explanation of how it works can be found in [this model card](https://huggingface.co/hivemind/gpt-j-6B-8bit).

In [1]:
import transformers

import torch
import torch.nn.functional as F
from torch import nn
from torch.cuda.amp import custom_fwd, custom_bwd

from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise

from tqdm.auto import tqdm


### Converting the model to 8 bits

We convert EleutherAI's GPT-J-6B model to 8 bits using Facebook's [bitsandbytes](https://github.com/facebookresearch/bitsandbytes) library. This reduces the model's size from 20 GB down to just 6 GB.

Note that we don't convert linear layer biases to 8 bits, as they take up less than 1% of the model's weights anyway.
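
To make the idea concrete, here is a minimal pure-Python sketch of blockwise absmax quantization: every block of 4096 values keeps one floating-point scale (`absmax`) plus one byte per value. It is an illustration under stated assumptions, not the bitsandbytes implementation. In particular, it assumes a uniform 256-level grid, whereas bitsandbytes uses its own `code` lookup table, and the helper names `quantize_blockwise_sketch` and `dequantize_blockwise_sketch` are hypothetical.

```python
import random

BLOCK_SIZE = 4096  # block size implied by the absmax shapes used in this notebook


def quantize_blockwise_sketch(values, block_size=BLOCK_SIZE):
    """Return (codes, absmax): one 8-bit code per value, one scale per block."""
    codes, absmax = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        scale = max(abs(v) for v in block) or 1.0  # guard against an all-zero block
        absmax.append(scale)
        # Map [-scale, scale] onto the integers 0..255 (uniform grid: an assumption).
        codes.extend(round((v / scale + 1.0) * 127.5) for v in block)
    return codes, absmax


def dequantize_blockwise_sketch(codes, absmax, block_size=BLOCK_SIZE):
    """Invert the mapping above using the scale of the block each code belongs to."""
    return [(c / 127.5 - 1.0) * absmax[i // block_size] for i, c in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
codes, absmax = quantize_blockwise_sketch(weights)
restored = dequantize_blockwise_sketch(codes, absmax)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(len(absmax), "blocks, max round-trip error:", max_error)
```

With this uniform grid the round-trip error of any value is bounded by its block's `absmax / 255`, which is why per-block scales help: one outlier only degrades the precision of its own block.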

In [2]:
class FrozenBNBLinear(nn.Module):
    def __init__(self, weight, absmax, code, bias=None):
        assert isinstance(bias, nn.Parameter) or bias is None
        super().__init__()
        self.out_features, self.in_features = weight.shape
        self.register_buffer("weight", weight.requires_grad_(False))
        self.register_buffer("absmax", absmax.requires_grad_(False))
        self.register_buffer("code", code.requires_grad_(False))
        self.adapter = None
        self.bias = bias

    def forward(self, input):
        output = DequantizeAndLinear.apply(input, self.weight, self.absmax, self.code, self.bias)
        if self.adapter is not None:
            output += self.adapter(input)
        return output

    @classmethod
    def from_linear(cls, linear: nn.Linear) -> "FrozenBNBLinear":
        weights_int8, state = quantize_blockise_lowmemory(linear.weight)
        return cls(weights_int8, *state, linear.bias)

    def __repr__(self):
        return f"{self.__class__.__name__}({self.in_features}, {self.out_features})"


class DequantizeAndLinear(torch.autograd.Function):
    @staticmethod
    @custom_fwd
    def forward(ctx, input: torch.Tensor, weights_quantized: torch.ByteTensor,
                absmax: torch.FloatTensor, code: torch.FloatTensor, bias: torch.FloatTensor):
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        ctx.save_for_backward(input, weights_quantized, absmax, code)
        ctx._has_bias = bias is not None
        return F.linear(input, weights_deq, bias).clone()

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_output: torch.Tensor):
        assert not ctx.needs_input_grad[1] and not ctx.needs_input_grad[2] and not ctx.needs_input_grad[3]
        input, weights_quantized, absmax, code = ctx.saved_tensors
        # grad_output: [*batch, out_features]
        weights_deq = dequantize_blockwise(weights_quantized, absmax=absmax, code=code)
        grad_input = grad_output @ weights_deq
        grad_bias = grad_output.flatten(0, -2).sum(dim=0) if ctx._has_bias else None
        return grad_input, None, None, None, grad_bias


class FrozenBNBEmbedding(nn.Module):
    def __init__(self, weight, absmax, code):
        super().__init__()
        self.num_embeddings, self.embedding_dim = weight.shape
        self.register_buffer("weight", weight.requires_grad_(False))
        self.register_buffer("absmax", absmax.requires_grad_(False))
        self.register_buffer("code", code.requires_grad_(False))
        self.adapter = None

    def forward(self, input, **kwargs):
        with torch.no_grad():
            # note: both quantized weights and input indices are *not* differentiable
            weight_deq = dequantize_blockwise(self.weight, absmax=self.absmax, code=self.code)
            output = F.embedding(input, weight_deq, **kwargs)
        if self.adapter is not None:
            output += self.adapter(input)
        return output

    @classmethod
    def from_embedding(cls, embedding: nn.Embedding) -> "FrozenBNBEmbedding":
        weights_int8, state = quantize_blockise_lowmemory(embedding.weight)
        return cls(weights_int8, *state)

    def __repr__(self):
        return f"{self.__class__.__name__}({self.num_embeddings}, {self.embedding_dim})"


def quantize_blockise_lowmemory(matrix: torch.Tensor, chunk_size: int = 2 ** 20):
    assert chunk_size % 4096 == 0
    code = None
    chunks = []
    absmaxes = []
    flat_tensor = matrix.view(-1)
    for i in range((matrix.numel() - 1) // chunk_size + 1):
        input_chunk = flat_tensor[i * chunk_size: (i + 1) * chunk_size].clone()
        quantized_chunk, (absmax_chunk, code) = quantize_blockwise(input_chunk, code=code)
        chunks.append(quantized_chunk)
        absmaxes.append(absmax_chunk)

    matrix_i8 = torch.cat(chunks).reshape_as(matrix)
    absmax = torch.cat(absmaxes)
    return matrix_i8, (absmax, code)


def convert_to_int8(model):
    """Convert linear and embedding modules to 8-bit with optional adapters"""
    for module in list(model.modules()):
        for name, child in module.named_children():
            if isinstance(child, nn.Linear):
                print(name, child)
                setattr(
                    module,
                    name,
                    FrozenBNBLinear(
                        weight=torch.zeros(child.out_features, child.in_features, dtype=torch.uint8),
                        absmax=torch.zeros((child.weight.numel() - 1) // 4096 + 1),
                        code=torch.zeros(256),
                        bias=child.bias,
                    ),
                )
            elif isinstance(child, nn.Embedding):
                setattr(
                    module,
                    name,
                    FrozenBNBEmbedding(
                        weight=torch.zeros(child.num_embeddings, child.embedding_dim, dtype=torch.uint8),
                        absmax=torch.zeros((child.weight.numel() - 1) // 4096 + 1),
                        code=torch.zeros(256),
                    )
                )

In [3]:
# class GPTJBlock(transformers.models.gptj.modeling_gptj.GPTJBlock):
#     def __init__(self, config):
#         super().__init__(config)

#         convert_to_int8(self.attn)
#         convert_to_int8(self.mlp)


# class GPTJModel(transformers.models.gptj.modeling_gptj.GPTJModel):
#     def __init__(self, config):
#         super().__init__(config)
#         convert_to_int8(self)


class GPTJForCausalLM(transformers.models.gptj.modeling_gptj.GPTJForCausalLM):
    def __init__(self, config):
        super().__init__(config)
        convert_to_int8(self)


# transformers.models.gptj.modeling_gptj.GPTJBlock = GPTJBlock  # monkey-patch GPT-J

In [5]:
config = transformers.GPTJConfig.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

In [6]:
gpt = GPTJForCausalLM.from_pretrained("hivemind/gpt-j-6B-8bit", low_cpu_mem_usage=True)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
gpt.to(device)

lm_head Linear(in_features=4096, out_features=50400, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, bias=False)
out_proj Linear(in_features=4096, out_features=4096, bias=False)
fc_in Linear(in_features=4096, out_features=16384, bias=True)
fc_out Linear(in_features=16384, out_features=4096, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, bias=False)
out_proj Linear(in_features=4096, out_features=4096, bias=False)
fc_in Linear(in_features=4096, out_features=16384, bias=True)
fc_out Linear(in_features=16384, out_features=4096, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, b

GPTJForCausalLM(
  (transformer): GPTJModel(
    (wte): FrozenBNBEmbedding(50400, 4096)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-27): 28 x GPTJBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): GPTJAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (k_proj): FrozenBNBLinear(4096, 4096)
          (v_proj): FrozenBNBLinear(4096, 4096)
          (q_proj): FrozenBNBLinear(4096, 4096)
          (out_proj): FrozenBNBLinear(4096, 4096)
        )
        (mlp): GPTJMLP(
          (fc_in): FrozenBNBLinear(4096, 16384)
          (fc_out): FrozenBNBLinear(16384, 4096)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): FrozenBNBLinear(4096, 50400)
)
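
As a rough sanity check on the size claim, the storage of the quantized modules can be estimated from the shapes in the printout above. This is a back-of-the-envelope sketch: it counts only the `FrozenBNBLinear` and `FrozenBNBEmbedding` weights, assumes one float32 `absmax` entry per 4096 values (as in `convert_to_int8`), and ignores LayerNorm parameters and the 256-entry `code` tables.

```python
BLOCK_SIZE = 4096  # values per absmax entry, as in convert_to_int8
N_LAYERS = 28

# (out_features, in_features, has_bias) for the Linear modules in each GPTJBlock
per_layer_linears = [
    (4096, 4096, False),   # k_proj
    (4096, 4096, False),   # v_proj
    (4096, 4096, False),   # q_proj
    (4096, 4096, False),   # out_proj
    (16384, 4096, True),   # fc_in
    (4096, 16384, True),   # fc_out
]
lm_head = (50400, 4096, True)
wte_shape = (50400, 4096)  # embedding table, no bias


def absmax_entries(numel):
    """Number of per-block scales needed for a tensor with `numel` values."""
    return (numel - 1) // BLOCK_SIZE + 1


matrices = [(o, i) for o, i, _ in per_layer_linears] * N_LAYERS
matrices += [lm_head[:2], wte_shape]

quantized_values = sum(o * i for o, i in matrices)   # one byte each at 8 bits
absmax_floats = sum(absmax_entries(o * i) for o, i in matrices)
bias_values = N_LAYERS * sum(o for o, _, b in per_layer_linears if b) + lm_head[0]

print(f"8-bit weights : {quantized_values:,} bytes ({quantized_values / 2**30:.2f} GiB)")
print(f"absmax scales : {4 * absmax_floats:,} bytes")
print(f"bias share    : {bias_values / quantized_values:.4%} of the weight count")
```

The per-block scales add only a few megabytes on top of the 8-bit weights, and the biases are a tiny fraction of the weight count, which is consistent with leaving them unquantized.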

### Text generation example

In [50]:
device = 'cuda'
gpt.to(device)
prompt = tokenizer("The results of a systematic study of the otological aspects in 13 cases of earpit-deafness syndrome are reported. The audiometric, radiological and vestibular findings as well as the results of exploratory tympanotomies with and without stapedectomies are discussed together with the results reported in the literature. A convincing explanation of the poor results of exploratory tympanotomies in cases with mixed hearing loss is not furnished. If the hearing loss is confined to conduction and ankylosis of the stapes or a disconnection of the ossicular chain is suspected, exploratory tympanotomy can be expected to be successful. \n\n##\n\n", return_tensors='pt')
prompt = {key: value.to(device) for key, value in prompt.items()}
out = gpt.generate(**prompt, min_length=128, max_length=512, do_sample=True)
tokenizer.decode(out[0])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"The results of a systematic study of the otological aspects in 13 cases of earpit-deafness syndrome are reported. The audiometric, radiological and vestibular findings as well as the results of exploratory tympanotomies with and without stapedectomies are discussed together with the results reported in the literature. A convincing explanation of the poor results of exploratory tympanotomies in cases with mixed hearing loss is not furnished. If the hearing loss is confined to conduction and ankylosis of the stapes or a disconnection of the ossicular chain is suspected, exploratory tympanotomy can be expected to be successful. \n\n##\n\n  TABLES\n\nPURPOSE OF THE STUDY: To demonstrate the usefulness of exploratory tympanotomy in the assessment of cases of ear pathology with hearing loss. METHODS/PROCEDURE: This retrospective study involves the examination of patients treated for ear pathology during the period from August 1994 to September 2000 at the Otolaryngology Department, ENT Clin

earpit | HP_0004467\n deafness | HP_0000404\n mixed hearing loss | HP_0000410\n hearing loss | HP_0000365\n conduction | HP_0000405\n ankylosis of the stapes | HP_0000381\n disconnection of the ossicular chain | HP_0004452\n END'
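
The completion above follows a simple line format: one `span | HPO id` pair per line after the `##` separator, terminated by `END`. A minimal sketch of turning such a decoded string back into pairs is shown below. The helper `parse_annotations` is hypothetical, the example string is made up, and the code assumes that the literal `END` does not occur inside an annotated span.

```python
SEPARATOR = "\n\n##\n\n"


def parse_annotations(decoded: str):
    """Extract (text span, HPO id) pairs from 'prompt + separator + completion'."""
    completion = decoded.split(SEPARATOR, 1)[-1]
    completion = completion.split("END", 1)[0]  # drop the end marker and anything after it
    pairs = []
    for line in completion.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        span, hpo = line.rsplit("|", 1)
        pairs.append((span.strip(), hpo.strip()))
    return pairs


example = (
    "Some abstract text." + SEPARATOR
    + " deafness | HP_0000404\n mixed hearing loss | HP_0000410\n END"
)
print(parse_annotations(example))
```

Because sampling is enabled in `generate`, real completions may contain malformed lines; the parser skips anything without a `|` rather than failing.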

### LoRA fine-tuning example
Here we demonstrate how to fine-tune the proposed model using low-rank adapters [(Hu et al., 2021)](https://arxiv.org/abs/2106.09685) and [8-bit Adam](https://arxiv.org/abs/2110.02861). We also use the [dataset streaming API](https://huggingface.co/docs/datasets/dataset_streaming.html) to avoid downloading the large dataset.

In [4]:
def add_adapters(model, adapter_dim=16):
    assert adapter_dim > 0

    for module in model.modules():
        if isinstance(module, FrozenBNBLinear):
            module.adapter = nn.Sequential(
                nn.Linear(module.in_features, adapter_dim, bias=False),
                nn.Linear(adapter_dim, module.out_features, bias=False),
            )
            nn.init.zeros_(module.adapter[1].weight)
        elif isinstance(module, FrozenBNBEmbedding):
            module.adapter = nn.Sequential(
                nn.Embedding(module.num_embeddings, adapter_dim),
                nn.Linear(adapter_dim, module.embedding_dim, bias=False),
            )
            nn.init.zeros_(module.adapter[1].weight)

# add_adapters(gpt)
# gpt.to(device)
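
Two properties of `add_adapters` are worth spelling out: the adapter adds only `adapter_dim * (in_features + out_features)` trainable weights per layer, and because the second projection is zero-initialised, the adapted layer initially produces exactly the frozen layer's output. The pure-Python sketch below illustrates both with a toy 3-to-2 layer; the helper names and all numeric values are made up for illustration.

```python
def adapter_parameters(in_features, out_features, adapter_dim=16):
    """Trainable weights in a bias-free down-projection plus up-projection."""
    return adapter_dim * in_features + adapter_dim * out_features


def matvec(matrix, vector):
    """Multiply a row-major matrix (rows = output features) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Toy frozen layer (3 inputs -> 2 outputs) with a rank-1 adapter.
frozen_w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
down = [[0.1, 0.2, 0.3]]   # plays the role of adapter[0]: in_features -> adapter_dim
up = [[0.0], [0.0]]        # plays the role of adapter[1]: zero-initialised

x = [1.0, 2.0, 3.0]
base = matvec(frozen_w, x)
with_adapter = [b + a for b, a in zip(base, matvec(up, matvec(down, x)))]

full = 4096 * 4096                      # weights in one 4096x4096 projection
lora = adapter_parameters(4096, 4096)   # weights in its rank-16 adapter
print(with_adapter == base, lora, full, lora / full)
```

For a 4096x4096 projection with `adapter_dim=16`, the adapter holds under 1% as many weights as the frozen matrix it sits next to.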

In [8]:
from datasets import load_dataset
from bitsandbytes.optim import Adam8bit

gpt.gradient_checkpointing_enable()

# codeparrot = load_dataset("transformersbook/codeparrot-train", streaming=True)
optimizer = Adam8bit(gpt.parameters(), lr=1e-5)

In [9]:
# from google.colab import drive
# drive.mount('/content/drive')

In [7]:
import pandas as pd
# path = '/content/drive/MyDrive/biolarkgsc_locs.csv'
path = './biolarkgsc_locs.csv'
biolark = pd.read_csv(path, sep='\t')

In [8]:
from sklearn.model_selection import train_test_split
train_df, test_df = train_test_split(biolark, test_size=0.1, random_state=42)
len(train_df), len(test_df)

(205, 23)

In [9]:
def preprocess(train_df):
    """Build one training string per row: text, separator, then 'span | HPO' lines."""
    rows = []
    for i, row in train_df.iterrows():
        # bad examples in train_df
        # if i in {160, 7, 23, 76}:
        #     continue
        # each label has the form "<hpo>|<start>:<end>"; labels are separated by ';'
        list_hpo = row.labels.split(';')
        list_hpo_pair = [[label.split('|')[0], label.split('|')[1].split(':')] for label in list_hpo]
        ans = ""
        hpos = set()
        for hpo, [s, e] in list_hpo_pair:
            if hpo in hpos:
                continue  # keep only the first mention of each HPO term
            ans += ' ' + row.text[int(s):int(e)] + ' | ' + hpo + '\n'
            hpos.add(hpo)
        ans += ' END'
        text = row.text + ' \n\n##\n\n ' + ans
        rows.append(text)
    return rows
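
To see what `preprocess` produces, here is a standalone sketch of its loop body applied to a single made-up row. The label format (`<hpo>|<start>:<end>` joined by `;`, with character offsets into the text) is inferred from the parsing code above; the sentence, the offsets, and the helper name `format_example` are hypothetical, and the HPO ids are used only as placeholders.

```python
def format_example(text: str, labels: str) -> str:
    """Mirror of the loop body in preprocess() for a single (text, labels) pair."""
    answer = ""
    seen = set()
    for label in labels.split(";"):
        hpo, span = label.split("|")
        start, end = span.split(":")
        if hpo in seen:
            continue  # only the first mention of each HPO term is kept
        answer += " " + text[int(start):int(end)] + " | " + hpo + "\n"
        seen.add(hpo)
    answer += " END"
    return text + " \n\n##\n\n " + answer


# Hypothetical row: offsets are character positions into `text`.
text = "Patients showed deafness and later deafness with ataxia."
labels = "HP_0000404|16:24;HP_0000404|35:43;HP_0001251|49:55"
result = format_example(text, labels)
print(repr(result))
```

The repeated `HP_0000404` label is emitted once, so each training target lists every HPO term at most one time, followed by the `END` marker that tells the model where to stop.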

In [10]:
biolark_ft = preprocess(train_df)
biolark_ft

['Familial Angelman syndrome (AS) can result from mutations in chromosome 15q11q13 that, when transmitted from father to child, result in no phenotypic abnormality but, when transmitted from mother to child, cause AS. These mutations therefore behave neither as dominant nor as recessive mutations but, rather, show an imprinted mode of inheritance. We have analyzed two sibling pairs with AS and a larger family with four AS offspring of three sisters with several recently described microsatellite polymorphisms in the AS region. AS siblings inherited the same maternal alleles at the GABRB3 and GABRA5 loci, and the unaffected siblings of AS individuals inherited the other maternal alleles at these loci. In one of the AS sibling pairs, analysis of a recombination event indicates that the mutation responsible for AS is distal to locus D15S63. This result is consistent with a previously described imprinted submicroscopic deletion causing AS, a deletion that includes loci D15S10, D15S113, and 

In [11]:
biolark_tt = preprocess(test_df)
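
The fine-tuning loop in the next cell uses the usual next-token objective: the logits at position t are scored against the token at position t + 1, which is why it pairs `logits[:, :-1]` with `input_ids[:, 1:]`. The following pure-Python sketch shows that shift on a single sequence with a 3-token vocabulary. The helper names and the logits are made up for illustration.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax probability of `target` under `logits`."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


def next_token_loss(logits_per_position, input_ids):
    """Mean loss of predicting input_ids[t + 1] from the logits at position t."""
    predictions = logits_per_position[:-1]   # the last position has no next token
    targets = input_ids[1:]                  # the first token is never a target
    losses = [cross_entropy(z, t) for z, t in zip(predictions, targets)]
    return sum(losses) / len(losses)


input_ids = [0, 2, 1]
logits = [
    [0.0, 0.0, 5.0],   # position 0 favours token 2, the actual next token
    [0.0, 5.0, 0.0],   # position 1 favours token 1, the actual next token
    [9.0, 0.0, 0.0],   # position 2 is dropped by the shift
]
loss = next_token_loss(logits, input_ids)
print(round(loss, 4))
```

Changing the logits at the final position leaves the loss untouched, since there is no following token to predict.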

In [16]:
# fine-tune on the biolark training examples, one example per optimizer step
for epoch in range(5):
    print(epoch)
    for i in tqdm(range(len(biolark_ft))):
        if len(biolark_ft[i]) <= 1:
            continue

        batch = tokenizer(biolark_ft[i], truncation=True, max_length=2048, return_tensors='pt')
        batch = {k: v.cuda() for k, v in batch.items()}
        with torch.cuda.amp.autocast():
            out = gpt(**batch)

            # next-token objective: logits at position t are scored against token t + 1
            loss = F.cross_entropy(out.logits[:, :-1, :].flatten(0, -2), batch['input_ids'][:, 1:].flatten(),
                                   reduction='mean')
        print(loss)
        loss.backward()

        optimizer.step()
        optimizer.zero_grad()

0

  0%|          | 0/205 [00:00<?, ?it/s]

tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0440, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0340, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0335, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0471, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0331, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0053, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0130, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0330, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0339, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0448, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0388, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0483, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0269, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0516, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0305, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0147, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0059, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0367, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0379, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0364, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0024, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward0>)

 80%|███████▉  | 163/205 [00:46<00:11,  3.58it/s]

tensor(0.0352, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 164/205 [00:46<00:11,  3.66it/s]

tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 165/205 [00:46<00:11,  3.49it/s]

tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████  | 166/205 [00:47<00:11,  3.52it/s]

tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████▏ | 167/205 [00:47<00:10,  3.53it/s]

tensor(0.0090, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 168/205 [00:47<00:10,  3.61it/s]

tensor(0.0086, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 169/205 [00:47<00:10,  3.46it/s]

tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 170/205 [00:48<00:10,  3.46it/s]

tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 171/205 [00:48<00:09,  3.48it/s]

tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 172/205 [00:48<00:09,  3.58it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 173/205 [00:49<00:08,  3.70it/s]

tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▍ | 174/205 [00:49<00:08,  3.65it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▌ | 175/205 [00:49<00:08,  3.36it/s]

tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▌ | 176/205 [00:49<00:08,  3.35it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▋ | 177/205 [00:50<00:08,  3.48it/s]

tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 178/205 [00:50<00:07,  3.51it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 179/205 [00:50<00:07,  3.58it/s]

tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 180/205 [00:51<00:07,  3.50it/s]

tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 181/205 [00:51<00:06,  3.53it/s]

tensor(0.0276, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 182/205 [00:51<00:06,  3.61it/s]

tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 183/205 [00:51<00:05,  3.68it/s]

tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|████████▉ | 184/205 [00:52<00:05,  3.82it/s]

tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|█████████ | 185/205 [00:52<00:05,  3.88it/s]

tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 186/205 [00:52<00:04,  3.86it/s]

tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 187/205 [00:52<00:04,  3.96it/s]

tensor(0.0416, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 188/205 [00:53<00:04,  3.83it/s]

tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 189/205 [00:53<00:04,  3.83it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 190/205 [00:53<00:03,  3.76it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 191/205 [00:53<00:03,  3.60it/s]

tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▎| 192/205 [00:54<00:03,  3.70it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▍| 193/205 [00:54<00:03,  3.55it/s]

tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▍| 194/205 [00:54<00:03,  3.55it/s]

tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▌| 195/205 [00:55<00:02,  3.56it/s]

tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 196/205 [00:55<00:02,  3.64it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 197/205 [00:55<00:02,  3.68it/s]

tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 198/205 [00:55<00:01,  3.81it/s]

tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 199/205 [00:56<00:01,  3.72it/s]

tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 200/205 [00:56<00:01,  3.79it/s]

tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 201/205 [00:56<00:01,  3.82it/s]

tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▊| 202/205 [00:56<00:00,  3.89it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▉| 203/205 [00:57<00:00,  3.69it/s]

tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|█████████▉| 204/205 [00:57<00:00,  3.66it/s]

tensor(0.0317, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|██████████| 205/205 [00:57<00:00,  3.55it/s]


tensor(0.0430, device='cuda:0', grad_fn=<NllLossBackward0>)
1


  0%|          | 1/205 [00:00<00:56,  3.59it/s]

tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|          | 2/205 [00:00<01:08,  2.96it/s]

tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|▏         | 3/205 [00:00<01:04,  3.16it/s]

tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 4/205 [00:01<00:59,  3.38it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 5/205 [00:01<00:57,  3.47it/s]

tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 6/205 [00:01<00:59,  3.37it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 7/205 [00:02<00:58,  3.39it/s]

tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 8/205 [00:02<01:01,  3.23it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 9/205 [00:02<01:01,  3.21it/s]

tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▍         | 10/205 [00:03<01:01,  3.17it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▌         | 11/205 [00:03<00:59,  3.29it/s]

tensor(0.0163, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▌         | 12/205 [00:03<00:55,  3.47it/s]

tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▋         | 13/205 [00:03<00:54,  3.53it/s]

tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 14/205 [00:04<00:52,  3.61it/s]

tensor(0.0061, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 15/205 [00:04<00:53,  3.58it/s]

tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 16/205 [00:04<00:56,  3.36it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 17/205 [00:05<00:54,  3.47it/s]

tensor(0.0323, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 18/205 [00:05<00:51,  3.66it/s]

tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 19/205 [00:05<00:50,  3.67it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|▉         | 20/205 [00:05<00:49,  3.72it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|█         | 21/205 [00:06<00:49,  3.74it/s]

tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 22/205 [00:06<00:51,  3.54it/s]

tensor(0.0314, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 23/205 [00:06<00:49,  3.66it/s]

tensor(0.0371, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 24/205 [00:06<00:48,  3.72it/s]

tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 25/205 [00:07<00:49,  3.65it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 26/205 [00:07<00:49,  3.60it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 27/205 [00:07<00:47,  3.72it/s]

tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▎        | 28/205 [00:07<00:48,  3.68it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▍        | 29/205 [00:08<00:47,  3.72it/s]

tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▍        | 30/205 [00:08<00:47,  3.68it/s]

tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▌        | 31/205 [00:08<00:45,  3.80it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 32/205 [00:09<00:47,  3.62it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 33/205 [00:09<00:48,  3.55it/s]

tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 34/205 [00:09<00:47,  3.57it/s]

tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 35/205 [00:09<00:49,  3.46it/s]

tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 36/205 [00:10<00:49,  3.44it/s]

tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 37/205 [00:10<00:46,  3.59it/s]

tensor(0.0420, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▊        | 38/205 [00:10<00:44,  3.74it/s]

tensor(0.0368, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▉        | 39/205 [00:11<00:45,  3.67it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|█▉        | 40/205 [00:11<00:45,  3.63it/s]

tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 41/205 [00:11<00:45,  3.62it/s]

tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 42/205 [00:11<00:43,  3.78it/s]

tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██        | 43/205 [00:12<00:43,  3.73it/s]

tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██▏       | 44/205 [00:12<00:43,  3.74it/s]

tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 45/205 [00:12<00:43,  3.69it/s]

tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 46/205 [00:12<00:43,  3.66it/s]

tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 47/205 [00:13<00:42,  3.70it/s]

tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 48/205 [00:13<00:41,  3.78it/s]

tensor(0.0445, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 49/205 [00:13<00:40,  3.85it/s]

tensor(0.0081, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 50/205 [00:13<00:40,  3.85it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▍       | 51/205 [00:14<00:41,  3.73it/s]

tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▌       | 52/205 [00:14<00:43,  3.54it/s]

tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▌       | 53/205 [00:14<00:41,  3.65it/s]

tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▋       | 54/205 [00:15<00:41,  3.64it/s]

tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 55/205 [00:15<00:41,  3.62it/s]

tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 56/205 [00:15<00:40,  3.69it/s]

tensor(0.0139, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 57/205 [00:15<00:41,  3.53it/s]

tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 58/205 [00:16<00:41,  3.55it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 59/205 [00:16<00:39,  3.68it/s]

tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 60/205 [00:16<00:40,  3.62it/s]

tensor(0.0289, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|██▉       | 61/205 [00:17<00:39,  3.63it/s]

tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|███       | 62/205 [00:17<00:37,  3.78it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 63/205 [00:17<00:36,  3.87it/s]

tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 64/205 [00:17<00:37,  3.81it/s]

tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 65/205 [00:18<00:35,  3.91it/s]

tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 66/205 [00:18<00:36,  3.84it/s]

tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 67/205 [00:18<00:36,  3.73it/s]

tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 68/205 [00:18<00:36,  3.80it/s]

tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▎      | 69/205 [00:19<00:36,  3.78it/s]

tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▍      | 70/205 [00:19<00:35,  3.85it/s]

tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▍      | 71/205 [00:19<00:35,  3.77it/s]

tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▌      | 72/205 [00:19<00:34,  3.83it/s]

tensor(0.0320, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 73/205 [00:20<00:35,  3.75it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 74/205 [00:20<00:39,  3.33it/s]

tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 75/205 [00:20<00:37,  3.48it/s]

tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 76/205 [00:21<00:36,  3.57it/s]

tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 77/205 [00:21<00:35,  3.64it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 78/205 [00:21<00:35,  3.63it/s]

tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▊      | 79/205 [00:21<00:35,  3.59it/s]

tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▉      | 80/205 [00:22<00:34,  3.61it/s]

tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|███▉      | 81/205 [00:22<00:33,  3.67it/s]

tensor(0.0056, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 82/205 [00:22<00:33,  3.66it/s]

tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 83/205 [00:22<00:33,  3.69it/s]

tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████      | 84/205 [00:23<00:34,  3.51it/s]

tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████▏     | 85/205 [00:23<00:34,  3.48it/s]

tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 86/205 [00:23<00:33,  3.50it/s]

tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 87/205 [00:24<00:33,  3.47it/s]

tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 88/205 [00:24<00:32,  3.60it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 89/205 [00:24<00:32,  3.57it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 90/205 [00:24<00:31,  3.64it/s]

tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 91/205 [00:25<00:31,  3.62it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▍     | 92/205 [00:25<00:32,  3.46it/s]

tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▌     | 93/205 [00:25<00:32,  3.49it/s]

tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0376, device='cuda:0', grad_fn=<NllLossBackward0>)


 46%|████▋     | 95/205 [00:26<00:35,  3.13it/s]

tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 96/205 [00:26<00:33,  3.23it/s]

tensor(0.0303, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 97/205 [00:27<00:33,  3.18it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 98/205 [00:27<00:34,  3.13it/s]

tensor(0.0548, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 99/205 [00:27<00:32,  3.24it/s]

tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 100/205 [00:28<00:31,  3.34it/s]

tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 101/205 [00:28<00:32,  3.22it/s]

tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|████▉     | 102/205 [00:28<00:30,  3.40it/s]

tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|█████     | 103/205 [00:28<00:29,  3.43it/s]

tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 104/205 [00:29<00:28,  3.58it/s]

tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 105/205 [00:29<00:26,  3.74it/s]

tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 106/205 [00:29<00:26,  3.70it/s]

tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 107/205 [00:29<00:25,  3.84it/s]

tensor(0.0486, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 108/205 [00:30<00:26,  3.64it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 109/205 [00:30<00:28,  3.42it/s]

tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▎    | 110/205 [00:30<00:27,  3.46it/s]

tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▍    | 111/205 [00:31<00:26,  3.50it/s]

tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▍    | 112/205 [00:31<00:27,  3.40it/s]

tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▌    | 113/205 [00:31<00:26,  3.43it/s]

tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 114/205 [00:32<00:28,  3.14it/s]

tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 115/205 [00:32<00:26,  3.36it/s]

tensor(0.0623, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 116/205 [00:32<00:25,  3.44it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 117/205 [00:32<00:24,  3.54it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 118/205 [00:33<00:27,  3.20it/s]

tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 119/205 [00:33<00:26,  3.23it/s]

tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▊    | 120/205 [00:33<00:26,  3.23it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▉    | 121/205 [00:34<00:27,  3.05it/s]

tensor(0.0291, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|█████▉    | 122/205 [00:34<00:26,  3.18it/s]

tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 123/205 [00:34<00:24,  3.35it/s]

tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 124/205 [00:35<00:23,  3.47it/s]

tensor(0.0146, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████    | 125/205 [00:35<00:24,  3.24it/s]

tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████▏   | 126/205 [00:35<00:23,  3.32it/s]

tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 127/205 [00:35<00:22,  3.49it/s]

tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 128/205 [00:36<00:21,  3.58it/s]

tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 129/205 [00:36<00:20,  3.65it/s]

tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 130/205 [00:36<00:20,  3.69it/s]

tensor(0.0311, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 131/205 [00:37<00:19,  3.79it/s]

tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 132/205 [00:37<00:19,  3.73it/s]

tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▍   | 133/205 [00:37<00:19,  3.75it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▌   | 134/205 [00:37<00:18,  3.86it/s]

tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▌   | 135/205 [00:38<00:19,  3.54it/s]

tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▋   | 136/205 [00:38<00:19,  3.59it/s]

tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 137/205 [00:38<00:18,  3.66it/s]

tensor(0.0051, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 138/205 [00:38<00:18,  3.65it/s]

tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 139/205 [00:39<00:18,  3.52it/s]

tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 140/205 [00:39<00:19,  3.42it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 141/205 [00:39<00:17,  3.56it/s]

tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 142/205 [00:40<00:17,  3.60it/s]

tensor(0.0186, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|██████▉   | 143/205 [00:40<00:17,  3.47it/s]

tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|███████   | 144/205 [00:40<00:17,  3.51it/s]

tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 145/205 [00:40<00:16,  3.54it/s]

tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 146/205 [00:41<00:16,  3.50it/s]

tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 147/205 [00:41<00:18,  3.18it/s]

tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 148/205 [00:41<00:18,  3.15it/s]

tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 149/205 [00:42<00:17,  3.17it/s]

tensor(0.0316, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 150/205 [00:42<00:16,  3.25it/s]

tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▎  | 151/205 [00:42<00:16,  3.32it/s]

tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▍  | 152/205 [00:43<00:16,  3.25it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▍  | 153/205 [00:43<00:16,  3.24it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▌  | 154/205 [00:43<00:14,  3.47it/s]

tensor(0.0022, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 155/205 [00:43<00:13,  3.63it/s]

tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 156/205 [00:44<00:13,  3.50it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 157/205 [00:44<00:13,  3.48it/s]

tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 158/205 [00:44<00:13,  3.40it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 159/205 [00:45<00:13,  3.47it/s]

tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 160/205 [00:45<00:12,  3.62it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▊  | 161/205 [00:45<00:12,  3.50it/s]

tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▉  | 162/205 [00:46<00:12,  3.39it/s]

tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|███████▉  | 163/205 [00:46<00:11,  3.58it/s]

tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 164/205 [00:46<00:11,  3.66it/s]

tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 165/205 [00:46<00:11,  3.49it/s]

tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████  | 166/205 [00:47<00:11,  3.52it/s]

tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████▏ | 167/205 [00:47<00:10,  3.53it/s]

tensor(0.0076, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 168/205 [00:47<00:10,  3.62it/s]

tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 169/205 [00:47<00:10,  3.46it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 170/205 [00:48<00:10,  3.46it/s]

tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 171/205 [00:48<00:09,  3.48it/s]

tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 172/205 [00:48<00:09,  3.58it/s]

tensor(0.0157, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 173/205 [00:49<00:08,  3.70it/s]

tensor(0.0240, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▍ | 174/205 [00:49<00:08,  3.65it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▌ | 175/205 [00:49<00:08,  3.36it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▌ | 176/205 [00:49<00:08,  3.35it/s]

tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▋ | 177/205 [00:50<00:08,  3.48it/s]

tensor(0.0753, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 178/205 [00:50<00:07,  3.51it/s]

tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 179/205 [00:50<00:07,  3.58it/s]

tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 180/205 [00:51<00:07,  3.50it/s]

tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 181/205 [00:51<00:06,  3.53it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 182/205 [00:51<00:06,  3.61it/s]

tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 183/205 [00:51<00:05,  3.68it/s]

tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|████████▉ | 184/205 [00:52<00:05,  3.82it/s]

tensor(0.0347, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|█████████ | 185/205 [00:52<00:05,  3.88it/s]

tensor(0.0050, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 186/205 [00:52<00:04,  3.86it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 187/205 [00:52<00:04,  3.96it/s]

tensor(0.0418, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 188/205 [00:53<00:04,  3.83it/s]

tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 189/205 [00:53<00:04,  3.83it/s]

tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 190/205 [00:53<00:03,  3.76it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 191/205 [00:53<00:03,  3.60it/s]

tensor(0.0121, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▎| 192/205 [00:54<00:03,  3.70it/s]

tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▍| 193/205 [00:54<00:03,  3.55it/s]

tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▍| 194/205 [00:54<00:03,  3.55it/s]

tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▌| 195/205 [00:55<00:02,  3.56it/s]

tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 196/205 [00:55<00:02,  3.64it/s]

tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 197/205 [00:55<00:02,  3.68it/s]

tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 198/205 [00:55<00:01,  3.81it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 199/205 [00:56<00:01,  3.73it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 200/205 [00:56<00:01,  3.79it/s]

tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 201/205 [00:56<00:01,  3.82it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▊| 202/205 [00:56<00:00,  3.89it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▉| 203/205 [00:57<00:00,  3.69it/s]

tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|█████████▉| 204/205 [00:57<00:00,  3.66it/s]

tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|██████████| 205/205 [00:57<00:00,  3.55it/s]


tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward0>)
2


  0%|          | 1/205 [00:00<00:56,  3.60it/s]

tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|          | 2/205 [00:00<01:08,  2.96it/s]

tensor(0.0333, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|▏         | 3/205 [00:00<01:03,  3.16it/s]

tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 4/205 [00:01<00:59,  3.38it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 5/205 [00:01<00:57,  3.48it/s]

tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 6/205 [00:01<00:59,  3.37it/s]

tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 7/205 [00:02<00:58,  3.39it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 8/205 [00:02<01:01,  3.23it/s]

tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 9/205 [00:02<01:01,  3.21it/s]

tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▍         | 10/205 [00:03<01:01,  3.17it/s]

tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▌         | 11/205 [00:03<00:59,  3.29it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▌         | 12/205 [00:03<00:55,  3.48it/s]

tensor(0.0062, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▋         | 13/205 [00:03<00:54,  3.53it/s]

tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 14/205 [00:04<00:52,  3.61it/s]

tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 15/205 [00:04<00:53,  3.58it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 16/205 [00:04<00:56,  3.36it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 17/205 [00:05<00:54,  3.47it/s]

tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 18/205 [00:05<00:51,  3.67it/s]

tensor(0.0433, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 19/205 [00:05<00:50,  3.67it/s]

tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|▉         | 20/205 [00:05<00:49,  3.72it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|█         | 21/205 [00:06<00:49,  3.74it/s]

tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 22/205 [00:06<00:51,  3.54it/s]

tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 23/205 [00:06<00:49,  3.66it/s]

tensor(0.0361, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 24/205 [00:06<00:48,  3.73it/s]

tensor(0.0275, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 25/205 [00:07<00:49,  3.65it/s]

tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 26/205 [00:07<00:49,  3.60it/s]

tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 27/205 [00:07<00:47,  3.72it/s]

tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▎        | 28/205 [00:07<00:48,  3.68it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▍        | 29/205 [00:08<00:47,  3.72it/s]

tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▍        | 30/205 [00:08<00:47,  3.68it/s]

tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▌        | 31/205 [00:08<00:45,  3.80it/s]

tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 32/205 [00:09<00:47,  3.62it/s]

tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 33/205 [00:09<00:48,  3.55it/s]

tensor(0.0110, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 34/205 [00:09<00:47,  3.57it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 35/205 [00:09<00:49,  3.46it/s]

tensor(0.0188, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 36/205 [00:10<00:49,  3.44it/s]

tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 37/205 [00:10<00:46,  3.59it/s]

tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▊        | 38/205 [00:10<00:44,  3.74it/s]

tensor(0.0365, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▉        | 39/205 [00:11<00:45,  3.67it/s]

tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|█▉        | 40/205 [00:11<00:45,  3.63it/s]

tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 41/205 [00:11<00:45,  3.62it/s]

tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 42/205 [00:11<00:43,  3.78it/s]

tensor(0.0382, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██        | 43/205 [00:12<00:43,  3.73it/s]

tensor(0.0315, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██▏       | 44/205 [00:12<00:43,  3.74it/s]

tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 45/205 [00:12<00:43,  3.69it/s]

tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 46/205 [00:12<00:43,  3.66it/s]

tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 47/205 [00:13<00:42,  3.70it/s]

tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 48/205 [00:13<00:41,  3.78it/s]

tensor(0.0409, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 49/205 [00:13<00:40,  3.85it/s]

tensor(0.0072, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 50/205 [00:13<00:40,  3.85it/s]

tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▍       | 51/205 [00:14<00:41,  3.73it/s]

tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▌       | 52/205 [00:14<00:43,  3.54it/s]

tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▌       | 53/205 [00:14<00:41,  3.65it/s]

tensor(0.0295, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▋       | 54/205 [00:15<00:41,  3.64it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 55/205 [00:15<00:41,  3.62it/s]

tensor(0.0164, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 56/205 [00:15<00:40,  3.69it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 57/205 [00:15<00:41,  3.53it/s]

tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 58/205 [00:16<00:41,  3.55it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 59/205 [00:16<00:39,  3.68it/s]

tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 60/205 [00:16<00:40,  3.62it/s]

tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|██▉       | 61/205 [00:17<00:39,  3.63it/s]

tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|███       | 62/205 [00:17<00:37,  3.78it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 63/205 [00:17<00:36,  3.87it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 64/205 [00:17<00:37,  3.81it/s]

tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 65/205 [00:18<00:35,  3.91it/s]

tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 66/205 [00:18<00:36,  3.84it/s]

tensor(0.0101, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 67/205 [00:18<00:36,  3.73it/s]

tensor(0.0182, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 68/205 [00:18<00:36,  3.80it/s]

tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▎      | 69/205 [00:19<00:36,  3.77it/s]

tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▍      | 70/205 [00:19<00:35,  3.85it/s]

tensor(0.0093, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▍      | 71/205 [00:19<00:35,  3.77it/s]

tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▌      | 72/205 [00:19<00:34,  3.83it/s]

tensor(0.0313, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 73/205 [00:20<00:35,  3.75it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 74/205 [00:20<00:39,  3.33it/s]

tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 75/205 [00:20<00:37,  3.47it/s]

tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 76/205 [00:21<00:36,  3.57it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 77/205 [00:21<00:35,  3.64it/s]

tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 78/205 [00:21<00:35,  3.63it/s]

tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▊      | 79/205 [00:21<00:35,  3.59it/s]

tensor(0.0257, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▉      | 80/205 [00:22<00:34,  3.61it/s]

tensor(0.0144, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|███▉      | 81/205 [00:22<00:33,  3.67it/s]

tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 82/205 [00:22<00:33,  3.66it/s]

tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 83/205 [00:22<00:33,  3.69it/s]

tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████      | 84/205 [00:23<00:34,  3.51it/s]

tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████▏     | 85/205 [00:23<00:34,  3.48it/s]

tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 86/205 [00:23<00:34,  3.50it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 87/205 [00:24<00:33,  3.47it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 88/205 [00:24<00:32,  3.60it/s]

tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 89/205 [00:24<00:32,  3.56it/s]

tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 90/205 [00:24<00:31,  3.64it/s]

tensor(0.0286, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 91/205 [00:25<00:31,  3.62it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▍     | 92/205 [00:25<00:32,  3.46it/s]

tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▌     | 93/205 [00:25<00:32,  3.48it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0324, device='cuda:0', grad_fn=<NllLossBackward0>)


 46%|████▋     | 95/205 [00:26<00:35,  3.13it/s]

tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 96/205 [00:26<00:33,  3.23it/s]

tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 97/205 [00:27<00:33,  3.18it/s]

tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 98/205 [00:27<00:34,  3.13it/s]

tensor(0.0329, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 99/205 [00:27<00:32,  3.25it/s]

tensor(0.0104, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 100/205 [00:28<00:31,  3.35it/s]

tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 101/205 [00:28<00:32,  3.22it/s]

tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|████▉     | 102/205 [00:28<00:30,  3.40it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|█████     | 103/205 [00:28<00:29,  3.43it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 104/205 [00:29<00:28,  3.59it/s]

tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 105/205 [00:29<00:26,  3.74it/s]

tensor(0.0098, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 106/205 [00:29<00:26,  3.70it/s]

tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 107/205 [00:29<00:25,  3.84it/s]

tensor(0.0480, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 108/205 [00:30<00:26,  3.63it/s]

tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 109/205 [00:30<00:28,  3.42it/s]

tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▎    | 110/205 [00:30<00:27,  3.46it/s]

tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▍    | 111/205 [00:31<00:26,  3.50it/s]

tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▍    | 112/205 [00:31<00:27,  3.40it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▌    | 113/205 [00:31<00:26,  3.43it/s]

tensor(0.0337, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 114/205 [00:32<00:28,  3.14it/s]

tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 115/205 [00:32<00:26,  3.36it/s]

tensor(0.0304, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 116/205 [00:32<00:25,  3.44it/s]

tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 117/205 [00:32<00:24,  3.54it/s]

tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 118/205 [00:33<00:27,  3.20it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 119/205 [00:33<00:26,  3.24it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▊    | 120/205 [00:33<00:26,  3.25it/s]

tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▉    | 121/205 [00:34<00:27,  3.06it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|█████▉    | 122/205 [00:34<00:26,  3.19it/s]

tensor(0.0120, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 123/205 [00:34<00:24,  3.36it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 124/205 [00:35<00:23,  3.48it/s]

tensor(0.0137, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████    | 125/205 [00:35<00:24,  3.24it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████▏   | 126/205 [00:35<00:23,  3.32it/s]

tensor(0.0190, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 127/205 [00:35<00:22,  3.49it/s]

tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 128/205 [00:36<00:21,  3.58it/s]

tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 129/205 [00:36<00:20,  3.65it/s]

tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 130/205 [00:36<00:20,  3.70it/s]

tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 131/205 [00:36<00:19,  3.79it/s]

tensor(0.0357, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 132/205 [00:37<00:19,  3.74it/s]

tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▍   | 133/205 [00:37<00:19,  3.75it/s]

tensor(0.0184, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▌   | 134/205 [00:37<00:18,  3.87it/s]

tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▌   | 135/205 [00:38<00:19,  3.55it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▋   | 136/205 [00:38<00:19,  3.60it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 137/205 [00:38<00:18,  3.66it/s]

tensor(0.0047, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 138/205 [00:38<00:18,  3.65it/s]

tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 139/205 [00:39<00:18,  3.52it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 140/205 [00:39<00:18,  3.42it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 141/205 [00:39<00:17,  3.57it/s]

tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 142/205 [00:40<00:17,  3.60it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|██████▉   | 143/205 [00:40<00:17,  3.47it/s]

tensor(0.0167, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|███████   | 144/205 [00:40<00:17,  3.51it/s]

tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 145/205 [00:40<00:16,  3.54it/s]

tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 146/205 [00:41<00:16,  3.50it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 147/205 [00:41<00:18,  3.19it/s]

tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 148/205 [00:41<00:18,  3.15it/s]

tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 149/205 [00:42<00:17,  3.17it/s]

tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 150/205 [00:42<00:16,  3.25it/s]

tensor(0.0111, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▎  | 151/205 [00:42<00:16,  3.32it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▍  | 152/205 [00:43<00:16,  3.25it/s]

tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▍  | 153/205 [00:43<00:16,  3.24it/s]

tensor(0.0245, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▌  | 154/205 [00:43<00:14,  3.47it/s]

tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 155/205 [00:43<00:13,  3.63it/s]

tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 156/205 [00:44<00:13,  3.51it/s]

tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 157/205 [00:44<00:13,  3.49it/s]

tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 158/205 [00:44<00:13,  3.41it/s]

tensor(0.0198, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 159/205 [00:45<00:13,  3.47it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 160/205 [00:45<00:12,  3.62it/s]

tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▊  | 161/205 [00:45<00:12,  3.50it/s]

tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▉  | 162/205 [00:45<00:12,  3.40it/s]

tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|███████▉  | 163/205 [00:46<00:11,  3.58it/s]

tensor(0.0349, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 164/205 [00:46<00:11,  3.66it/s]

tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 165/205 [00:46<00:11,  3.49it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████  | 166/205 [00:47<00:11,  3.52it/s]

tensor(0.0180, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████▏ | 167/205 [00:47<00:10,  3.52it/s]

tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 168/205 [00:47<00:10,  3.61it/s]

tensor(0.0071, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 169/205 [00:47<00:10,  3.45it/s]

tensor(0.0212, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 170/205 [00:48<00:10,  3.45it/s]

tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 171/205 [00:48<00:09,  3.48it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 172/205 [00:48<00:09,  3.58it/s]

tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 173/205 [00:49<00:08,  3.70it/s]

tensor(0.0235, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▍ | 174/205 [00:49<00:08,  3.65it/s]

tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▌ | 175/205 [00:49<00:08,  3.35it/s]

tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▌ | 176/205 [00:49<00:08,  3.35it/s]

tensor(0.0136, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▋ | 177/205 [00:50<00:08,  3.48it/s]

tensor(0.0355, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 178/205 [00:50<00:07,  3.51it/s]

tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 179/205 [00:50<00:07,  3.58it/s]

tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 180/205 [00:51<00:07,  3.50it/s]

tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 181/205 [00:51<00:06,  3.53it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 182/205 [00:51<00:06,  3.60it/s]

tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 183/205 [00:51<00:05,  3.68it/s]

tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|████████▉ | 184/205 [00:52<00:05,  3.82it/s]

tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|█████████ | 185/205 [00:52<00:05,  3.88it/s]

tensor(0.0044, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 186/205 [00:52<00:04,  3.86it/s]

tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 187/205 [00:52<00:04,  3.96it/s]

tensor(0.0407, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 188/205 [00:53<00:04,  3.83it/s]

tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 189/205 [00:53<00:04,  3.83it/s]

tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 190/205 [00:53<00:03,  3.76it/s]

tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 191/205 [00:53<00:03,  3.60it/s]

tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▎| 192/205 [00:54<00:03,  3.70it/s]

tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▍| 193/205 [00:54<00:03,  3.55it/s]

tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▍| 194/205 [00:54<00:03,  3.55it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▌| 195/205 [00:55<00:02,  3.56it/s]

tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 196/205 [00:55<00:02,  3.64it/s]

tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 197/205 [00:55<00:02,  3.68it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 198/205 [00:55<00:01,  3.81it/s]

tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 199/205 [00:56<00:01,  3.73it/s]

tensor(0.0244, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 200/205 [00:56<00:01,  3.80it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 201/205 [00:56<00:01,  3.82it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▊| 202/205 [00:56<00:00,  3.89it/s]

tensor(0.0312, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▉| 203/205 [00:57<00:00,  3.64it/s]

tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|█████████▉| 204/205 [00:57<00:00,  3.67it/s]

tensor(0.0293, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|██████████| 205/205 [00:57<00:00,  3.55it/s]


tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward0>)
3


  0%|          | 1/205 [00:00<00:56,  3.59it/s]

tensor(0.0124, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|          | 2/205 [00:00<01:08,  2.96it/s]

tensor(0.0298, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|▏         | 3/205 [00:00<01:04,  3.16it/s]

tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 4/205 [00:01<00:59,  3.38it/s]

tensor(0.0242, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 5/205 [00:01<00:57,  3.47it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 6/205 [00:01<00:59,  3.37it/s]

tensor(0.0238, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 7/205 [00:02<00:58,  3.38it/s]

tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 8/205 [00:02<01:01,  3.22it/s]

tensor(0.0231, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 9/205 [00:02<01:01,  3.20it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▍         | 10/205 [00:03<01:01,  3.17it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▌         | 11/205 [00:03<00:59,  3.29it/s]

tensor(0.0140, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▌         | 12/205 [00:03<00:55,  3.47it/s]

tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▋         | 13/205 [00:03<00:54,  3.53it/s]

tensor(0.0065, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 14/205 [00:04<00:52,  3.61it/s]

tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 15/205 [00:04<00:53,  3.58it/s]

tensor(0.0247, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 16/205 [00:04<00:56,  3.36it/s]

tensor(0.0215, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 17/205 [00:05<00:54,  3.47it/s]

tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 18/205 [00:05<00:51,  3.66it/s]

tensor(0.0435, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 19/205 [00:05<00:50,  3.67it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|▉         | 20/205 [00:05<00:49,  3.72it/s]

tensor(0.0250, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|█         | 21/205 [00:06<00:49,  3.74it/s]

tensor(0.0174, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 22/205 [00:06<00:51,  3.54it/s]

tensor(0.0280, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 23/205 [00:06<00:49,  3.66it/s]

tensor(0.0356, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 24/205 [00:06<00:48,  3.73it/s]

tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 25/205 [00:07<00:49,  3.65it/s]

tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 26/205 [00:07<00:49,  3.61it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 27/205 [00:07<00:47,  3.72it/s]

tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▎        | 28/205 [00:07<00:48,  3.68it/s]

tensor(0.0255, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▍        | 29/205 [00:08<00:47,  3.72it/s]

tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▍        | 30/205 [00:08<00:47,  3.68it/s]

tensor(0.0145, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▌        | 31/205 [00:08<00:45,  3.81it/s]

tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 32/205 [00:09<00:47,  3.62it/s]

tensor(0.0234, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 33/205 [00:09<00:48,  3.55it/s]

tensor(0.0097, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 34/205 [00:09<00:47,  3.57it/s]

tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 35/205 [00:09<00:49,  3.47it/s]

tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 36/205 [00:10<00:49,  3.45it/s]

tensor(0.0168, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 37/205 [00:10<00:46,  3.59it/s]

tensor(0.0400, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▊        | 38/205 [00:10<00:44,  3.74it/s]

tensor(0.0362, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▉        | 39/205 [00:11<00:45,  3.67it/s]

tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|█▉        | 40/205 [00:11<00:45,  3.63it/s]

tensor(0.0258, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 41/205 [00:11<00:45,  3.62it/s]

tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 42/205 [00:11<00:43,  3.78it/s]

tensor(0.0372, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██        | 43/205 [00:12<00:43,  3.73it/s]

tensor(0.0302, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██▏       | 44/205 [00:12<00:43,  3.74it/s]

tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 45/205 [00:12<00:43,  3.69it/s]

tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 46/205 [00:12<00:43,  3.66it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 47/205 [00:13<00:42,  3.70it/s]

tensor(0.0346, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 48/205 [00:13<00:41,  3.78it/s]

tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 49/205 [00:13<00:40,  3.85it/s]

tensor(0.0067, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 50/205 [00:13<00:40,  3.85it/s]

tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▍       | 51/205 [00:14<00:41,  3.73it/s]

tensor(0.0132, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▌       | 52/205 [00:14<00:43,  3.54it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▌       | 53/205 [00:14<00:41,  3.65it/s]

tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▋       | 54/205 [00:15<00:41,  3.64it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 55/205 [00:15<00:41,  3.62it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 56/205 [00:15<00:40,  3.68it/s]

tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 57/205 [00:15<00:41,  3.53it/s]

tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 58/205 [00:16<00:41,  3.55it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 59/205 [00:16<00:39,  3.68it/s]

tensor(0.0040, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 60/205 [00:16<00:40,  3.62it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|██▉       | 61/205 [00:17<00:39,  3.63it/s]

tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|███       | 62/205 [00:17<00:37,  3.78it/s]

tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 63/205 [00:17<00:36,  3.87it/s]

tensor(0.0166, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 64/205 [00:17<00:37,  3.81it/s]

tensor(0.0283, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 65/205 [00:18<00:35,  3.91it/s]

tensor(0.0202, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 66/205 [00:18<00:36,  3.84it/s]

tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 67/205 [00:18<00:36,  3.73it/s]

tensor(0.0170, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 68/205 [00:18<00:36,  3.80it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▎      | 69/205 [00:19<00:36,  3.78it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▍      | 70/205 [00:19<00:35,  3.85it/s]

tensor(0.0089, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▍      | 71/205 [00:19<00:35,  3.77it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▌      | 72/205 [00:19<00:34,  3.83it/s]

tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 73/205 [00:20<00:35,  3.76it/s]

tensor(0.0181, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 74/205 [00:20<00:39,  3.33it/s]

tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 75/205 [00:20<00:37,  3.48it/s]

tensor(0.0112, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 76/205 [00:21<00:36,  3.57it/s]

tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 77/205 [00:21<00:35,  3.64it/s]

tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 78/205 [00:21<00:35,  3.63it/s]

tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▊      | 79/205 [00:21<00:35,  3.59it/s]

tensor(0.0243, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▉      | 80/205 [00:22<00:34,  3.61it/s]

tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|███▉      | 81/205 [00:22<00:33,  3.67it/s]

tensor(0.0046, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 82/205 [00:22<00:33,  3.66it/s]

tensor(0.0160, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 83/205 [00:22<00:33,  3.69it/s]

tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████      | 84/205 [00:23<00:34,  3.51it/s]

tensor(0.0211, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████▏     | 85/205 [00:23<00:34,  3.48it/s]

tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 86/205 [00:23<00:33,  3.50it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 87/205 [00:24<00:33,  3.47it/s]

tensor(0.0249, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 88/205 [00:24<00:32,  3.60it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 89/205 [00:24<00:32,  3.56it/s]

tensor(0.0192, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 90/205 [00:24<00:31,  3.64it/s]

tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 91/205 [00:25<00:31,  3.62it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▍     | 92/205 [00:25<00:32,  3.46it/s]

tensor(0.0233, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▌     | 93/205 [00:25<00:32,  3.49it/s]

tensor(0.0227, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward0>)


 46%|████▋     | 95/205 [00:26<00:35,  3.13it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 96/205 [00:26<00:33,  3.23it/s]

tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 97/205 [00:27<00:33,  3.18it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 98/205 [00:27<00:34,  3.13it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 99/205 [00:27<00:32,  3.25it/s]

tensor(0.0096, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 100/205 [00:28<00:31,  3.35it/s]

tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 101/205 [00:28<00:32,  3.23it/s]

tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|████▉     | 102/205 [00:28<00:30,  3.40it/s]

tensor(0.0259, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|█████     | 103/205 [00:28<00:29,  3.44it/s]

tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 104/205 [00:29<00:28,  3.58it/s]

tensor(0.0200, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 105/205 [00:29<00:26,  3.74it/s]

tensor(0.0095, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 106/205 [00:29<00:26,  3.70it/s]

tensor(0.0189, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 107/205 [00:29<00:25,  3.84it/s]

tensor(0.0481, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 108/205 [00:30<00:26,  3.64it/s]

tensor(0.0129, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 109/205 [00:30<00:28,  3.42it/s]

tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▎    | 110/205 [00:30<00:27,  3.46it/s]

tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▍    | 111/205 [00:31<00:26,  3.50it/s]

tensor(0.0210, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▍    | 112/205 [00:31<00:27,  3.40it/s]

tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▌    | 113/205 [00:31<00:26,  3.43it/s]

tensor(0.0325, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 114/205 [00:32<00:28,  3.14it/s]

tensor(0.0395, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 115/205 [00:32<00:26,  3.36it/s]

tensor(0.0607, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 116/205 [00:32<00:25,  3.44it/s]

tensor(0.0402, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 117/205 [00:32<00:24,  3.54it/s]

tensor(0.0218, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 118/205 [00:33<00:27,  3.20it/s]

tensor(0.0537, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 119/205 [00:33<00:26,  3.24it/s]

tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▊    | 120/205 [00:33<00:26,  3.25it/s]

tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▉    | 121/205 [00:34<00:27,  3.06it/s]

tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|█████▉    | 122/205 [00:34<00:26,  3.19it/s]

tensor(0.0117, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 123/205 [00:34<00:24,  3.36it/s]

tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 124/205 [00:35<00:23,  3.48it/s]

tensor(0.0134, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████    | 125/205 [00:35<00:24,  3.24it/s]

tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████▏   | 126/205 [00:35<00:23,  3.32it/s]

tensor(0.0183, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 127/205 [00:35<00:22,  3.49it/s]

tensor(0.0049, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 128/205 [00:36<00:21,  3.58it/s]

tensor(0.0155, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 129/205 [00:36<00:20,  3.65it/s]

tensor(0.0126, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 130/205 [00:36<00:20,  3.69it/s]

tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 131/205 [00:36<00:19,  3.79it/s]

tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 132/205 [00:37<00:19,  3.74it/s]

tensor(0.0224, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▍   | 133/205 [00:37<00:19,  3.76it/s]

tensor(0.0149, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▌   | 134/205 [00:37<00:18,  3.87it/s]

tensor(0.0373, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▌   | 135/205 [00:38<00:19,  3.53it/s]

tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▋   | 136/205 [00:38<00:19,  3.58it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 137/205 [00:38<00:18,  3.64it/s]

tensor(0.0055, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 138/205 [00:38<00:18,  3.64it/s]

tensor(0.0114, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 139/205 [00:39<00:18,  3.52it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 140/205 [00:39<00:19,  3.42it/s]

tensor(0.0253, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 141/205 [00:39<00:17,  3.56it/s]

tensor(0.0341, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 142/205 [00:40<00:17,  3.59it/s]

tensor(0.0177, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|██████▉   | 143/205 [00:40<00:17,  3.47it/s]

tensor(0.0169, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|███████   | 144/205 [00:40<00:17,  3.51it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 145/205 [00:40<00:16,  3.54it/s]

tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 146/205 [00:41<00:16,  3.50it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 147/205 [00:41<00:18,  3.18it/s]

tensor(0.0322, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 148/205 [00:41<00:18,  3.14it/s]

tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 149/205 [00:42<00:17,  3.17it/s]

tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 150/205 [00:42<00:16,  3.25it/s]

tensor(0.0113, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▎  | 151/205 [00:42<00:16,  3.32it/s]

tensor(0.0226, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▍  | 152/205 [00:43<00:16,  3.25it/s]

tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▍  | 153/205 [00:43<00:16,  3.24it/s]

tensor(0.0237, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▌  | 154/205 [00:43<00:14,  3.47it/s]

tensor(0.0021, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 155/205 [00:43<00:13,  3.63it/s]

tensor(0.0306, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 156/205 [00:44<00:13,  3.50it/s]

tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 157/205 [00:44<00:13,  3.49it/s]

tensor(0.0115, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 158/205 [00:44<00:13,  3.41it/s]

tensor(0.0195, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 159/205 [00:45<00:13,  3.47it/s]

tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 160/205 [00:45<00:12,  3.62it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▊  | 161/205 [00:45<00:12,  3.50it/s]

tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▉  | 162/205 [00:45<00:12,  3.40it/s]

tensor(0.0248, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|███████▉  | 163/205 [00:46<00:11,  3.58it/s]

tensor(0.0345, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 164/205 [00:46<00:11,  3.66it/s]

tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 165/205 [00:46<00:11,  3.49it/s]

tensor(0.0205, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████  | 166/205 [00:47<00:11,  3.52it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████▏ | 167/205 [00:47<00:10,  3.53it/s]

tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 168/205 [00:47<00:10,  3.62it/s]

tensor(0.0069, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 169/205 [00:47<00:10,  3.46it/s]

tensor(0.0204, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 170/205 [00:48<00:10,  3.46it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 171/205 [00:48<00:09,  3.48it/s]

tensor(0.0297, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 172/205 [00:48<00:09,  3.58it/s]

tensor(0.0153, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 173/205 [00:49<00:08,  3.70it/s]

tensor(0.0230, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▍ | 174/205 [00:49<00:08,  3.65it/s]

tensor(0.0206, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▌ | 175/205 [00:49<00:08,  3.35it/s]

tensor(0.0208, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▌ | 176/205 [00:49<00:08,  3.35it/s]

tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▋ | 177/205 [00:50<00:08,  3.47it/s]

tensor(0.0732, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 178/205 [00:50<00:07,  3.51it/s]

tensor(0.0219, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 179/205 [00:50<00:07,  3.58it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 180/205 [00:51<00:07,  3.50it/s]

tensor(0.0197, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 181/205 [00:51<00:06,  3.53it/s]

tensor(0.0265, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 182/205 [00:51<00:06,  3.60it/s]

tensor(0.0123, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 183/205 [00:51<00:05,  3.68it/s]

tensor(0.0091, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|████████▉ | 184/205 [00:52<00:05,  3.81it/s]

tensor(0.0343, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|█████████ | 185/205 [00:52<00:05,  3.88it/s]

tensor(0.0043, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 186/205 [00:52<00:04,  3.86it/s]

tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 187/205 [00:52<00:04,  3.96it/s]

tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 188/205 [00:53<00:04,  3.83it/s]

tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 189/205 [00:53<00:04,  3.83it/s]

tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 190/205 [00:53<00:03,  3.76it/s]

tensor(0.0209, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 191/205 [00:54<00:04,  3.26it/s]

tensor(0.0108, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▎| 192/205 [00:54<00:03,  3.44it/s]

tensor(0.0159, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▍| 193/205 [00:54<00:03,  3.37it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▍| 194/205 [00:54<00:03,  3.43it/s]

tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▌| 195/205 [00:55<00:02,  3.47it/s]

tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 196/205 [00:55<00:02,  3.58it/s]

tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 197/205 [00:55<00:02,  3.64it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 198/205 [00:55<00:01,  3.78it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 199/205 [00:56<00:01,  3.70it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 200/205 [00:56<00:01,  3.78it/s]

tensor(0.0217, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 201/205 [00:56<00:01,  3.80it/s]

tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▊| 202/205 [00:57<00:00,  3.88it/s]

tensor(0.0194, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▉| 203/205 [00:57<00:00,  3.68it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|█████████▉| 204/205 [00:57<00:00,  3.65it/s]

tensor(0.0292, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|██████████| 205/205 [00:57<00:00,  3.54it/s]


tensor(0.0447, device='cuda:0', grad_fn=<NllLossBackward0>)
4


  0%|          | 1/205 [00:00<00:56,  3.59it/s]

tensor(0.0133, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|          | 2/205 [00:00<01:08,  2.96it/s]

tensor(0.0301, device='cuda:0', grad_fn=<NllLossBackward0>)


  1%|▏         | 3/205 [00:00<01:03,  3.16it/s]

tensor(0.0119, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 4/205 [00:01<00:59,  3.38it/s]

tensor(0.0239, device='cuda:0', grad_fn=<NllLossBackward0>)


  2%|▏         | 5/205 [00:01<00:57,  3.48it/s]

tensor(0.0229, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 6/205 [00:01<00:59,  3.37it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


  3%|▎         | 7/205 [00:02<00:58,  3.39it/s]

tensor(0.0187, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 8/205 [00:02<01:01,  3.23it/s]

tensor(0.0162, device='cuda:0', grad_fn=<NllLossBackward0>)


  4%|▍         | 9/205 [00:02<01:01,  3.21it/s]

tensor(0.0213, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▍         | 10/205 [00:03<01:01,  3.17it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


  5%|▌         | 11/205 [00:03<00:58,  3.29it/s]

tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▌         | 12/205 [00:03<00:55,  3.48it/s]

tensor(0.0054, device='cuda:0', grad_fn=<NllLossBackward0>)


  6%|▋         | 13/205 [00:03<00:54,  3.53it/s]

tensor(0.0064, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 14/205 [00:04<00:52,  3.61it/s]

tensor(0.0048, device='cuda:0', grad_fn=<NllLossBackward0>)


  7%|▋         | 15/205 [00:04<00:53,  3.58it/s]

tensor(0.0251, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 16/205 [00:04<00:56,  3.36it/s]

tensor(0.0207, device='cuda:0', grad_fn=<NllLossBackward0>)


  8%|▊         | 17/205 [00:05<00:54,  3.47it/s]

tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 18/205 [00:05<00:51,  3.66it/s]

tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward0>)


  9%|▉         | 19/205 [00:05<00:50,  3.67it/s]

tensor(0.0201, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|▉         | 20/205 [00:05<00:49,  3.72it/s]

tensor(0.0829, device='cuda:0', grad_fn=<NllLossBackward0>)


 10%|█         | 21/205 [00:06<00:49,  3.74it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 22/205 [00:06<00:51,  3.54it/s]

tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward0>)


 11%|█         | 23/205 [00:06<00:49,  3.66it/s]

tensor(0.0348, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 24/205 [00:06<00:48,  3.73it/s]

tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward0>)


 12%|█▏        | 25/205 [00:07<00:49,  3.65it/s]

tensor(0.0171, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 26/205 [00:07<00:49,  3.61it/s]

tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward0>)


 13%|█▎        | 27/205 [00:07<00:47,  3.72it/s]

tensor(0.0386, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▎        | 28/205 [00:07<00:48,  3.68it/s]

tensor(0.0262, device='cuda:0', grad_fn=<NllLossBackward0>)


 14%|█▍        | 29/205 [00:08<00:47,  3.72it/s]

tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▍        | 30/205 [00:08<00:47,  3.68it/s]

tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward0>)


 15%|█▌        | 31/205 [00:08<00:45,  3.80it/s]

tensor(0.0271, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 32/205 [00:09<00:47,  3.62it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 16%|█▌        | 33/205 [00:09<00:48,  3.55it/s]

tensor(0.0106, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 34/205 [00:09<00:47,  3.57it/s]

tensor(0.0203, device='cuda:0', grad_fn=<NllLossBackward0>)


 17%|█▋        | 35/205 [00:09<00:49,  3.46it/s]

tensor(0.0185, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 36/205 [00:10<00:49,  3.44it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 18%|█▊        | 37/205 [00:10<00:46,  3.59it/s]

tensor(0.0396, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▊        | 38/205 [00:10<00:44,  3.74it/s]

tensor(0.0375, device='cuda:0', grad_fn=<NllLossBackward0>)


 19%|█▉        | 39/205 [00:11<00:45,  3.67it/s]

tensor(0.0338, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|█▉        | 40/205 [00:11<00:45,  3.63it/s]

tensor(0.0285, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 41/205 [00:11<00:45,  3.62it/s]

tensor(0.0868, device='cuda:0', grad_fn=<NllLossBackward0>)


 20%|██        | 42/205 [00:11<00:43,  3.78it/s]

tensor(0.0378, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██        | 43/205 [00:12<00:43,  3.73it/s]

tensor(0.0308, device='cuda:0', grad_fn=<NllLossBackward0>)


 21%|██▏       | 44/205 [00:12<00:43,  3.74it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 45/205 [00:12<00:43,  3.69it/s]

tensor(0.0191, device='cuda:0', grad_fn=<NllLossBackward0>)


 22%|██▏       | 46/205 [00:12<00:43,  3.66it/s]

tensor(0.0199, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 47/205 [00:13<00:42,  3.70it/s]

tensor(0.0353, device='cuda:0', grad_fn=<NllLossBackward0>)


 23%|██▎       | 48/205 [00:13<00:41,  3.78it/s]

tensor(0.0358, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 49/205 [00:13<00:40,  3.85it/s]

tensor(0.0079, device='cuda:0', grad_fn=<NllLossBackward0>)


 24%|██▍       | 50/205 [00:13<00:40,  3.85it/s]

tensor(0.0193, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▍       | 51/205 [00:14<00:41,  3.73it/s]

tensor(0.0142, device='cuda:0', grad_fn=<NllLossBackward0>)


 25%|██▌       | 52/205 [00:14<00:43,  3.54it/s]

tensor(0.0256, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▌       | 53/205 [00:14<00:41,  3.65it/s]

tensor(0.0288, device='cuda:0', grad_fn=<NllLossBackward0>)


 26%|██▋       | 54/205 [00:15<00:41,  3.64it/s]

tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 55/205 [00:15<00:41,  3.62it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 27%|██▋       | 56/205 [00:15<00:40,  3.69it/s]

tensor(0.0252, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 57/205 [00:15<00:41,  3.53it/s]

tensor(0.0152, device='cuda:0', grad_fn=<NllLossBackward0>)


 28%|██▊       | 58/205 [00:16<00:41,  3.55it/s]

tensor(0.0263, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 59/205 [00:16<00:39,  3.68it/s]

tensor(0.0045, device='cuda:0', grad_fn=<NllLossBackward0>)


 29%|██▉       | 60/205 [00:16<00:40,  3.62it/s]

tensor(0.0266, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|██▉       | 61/205 [00:17<00:39,  3.63it/s]

tensor(0.0223, device='cuda:0', grad_fn=<NllLossBackward0>)


 30%|███       | 62/205 [00:17<00:37,  3.78it/s]

tensor(0.0273, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 63/205 [00:17<00:36,  3.87it/s]

tensor(0.0158, device='cuda:0', grad_fn=<NllLossBackward0>)


 31%|███       | 64/205 [00:17<00:37,  3.81it/s]

tensor(0.0284, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 65/205 [00:18<00:35,  3.91it/s]

tensor(0.0196, device='cuda:0', grad_fn=<NllLossBackward0>)


 32%|███▏      | 66/205 [00:18<00:36,  3.84it/s]

tensor(0.0102, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 67/205 [00:18<00:36,  3.73it/s]

tensor(0.0172, device='cuda:0', grad_fn=<NllLossBackward0>)


 33%|███▎      | 68/205 [00:18<00:36,  3.81it/s]

tensor(0.0254, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▎      | 69/205 [00:19<00:36,  3.77it/s]

tensor(0.0277, device='cuda:0', grad_fn=<NllLossBackward0>)


 34%|███▍      | 70/205 [00:19<00:35,  3.85it/s]

tensor(0.0087, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▍      | 71/205 [00:19<00:35,  3.77it/s]

tensor(0.0354, device='cuda:0', grad_fn=<NllLossBackward0>)


 35%|███▌      | 72/205 [00:19<00:34,  3.83it/s]

tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 73/205 [00:20<00:35,  3.75it/s]

tensor(0.0179, device='cuda:0', grad_fn=<NllLossBackward0>)


 36%|███▌      | 74/205 [00:20<00:39,  3.33it/s]

tensor(0.0261, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 75/205 [00:20<00:37,  3.48it/s]

tensor(0.0363, device='cuda:0', grad_fn=<NllLossBackward0>)


 37%|███▋      | 76/205 [00:21<00:36,  3.57it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 77/205 [00:21<00:35,  3.63it/s]

tensor(0.0268, device='cuda:0', grad_fn=<NllLossBackward0>)


 38%|███▊      | 78/205 [00:21<00:35,  3.63it/s]

tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▊      | 79/205 [00:21<00:35,  3.59it/s]

tensor(0.0264, device='cuda:0', grad_fn=<NllLossBackward0>)


 39%|███▉      | 80/205 [00:22<00:34,  3.61it/s]

tensor(0.0178, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|███▉      | 81/205 [00:22<00:33,  3.67it/s]

tensor(0.0066, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 82/205 [00:22<00:33,  3.66it/s]

tensor(0.0176, device='cuda:0', grad_fn=<NllLossBackward0>)


 40%|████      | 83/205 [00:22<00:33,  3.69it/s]

tensor(0.0175, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████      | 84/205 [00:23<00:34,  3.51it/s]

tensor(0.0260, device='cuda:0', grad_fn=<NllLossBackward0>)


 41%|████▏     | 85/205 [00:23<00:34,  3.48it/s]

tensor(0.0336, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 86/205 [00:23<00:33,  3.50it/s]

tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)


 42%|████▏     | 87/205 [00:24<00:33,  3.47it/s]

tensor(0.0281, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 88/205 [00:24<00:32,  3.60it/s]

tensor(0.0282, device='cuda:0', grad_fn=<NllLossBackward0>)


 43%|████▎     | 89/205 [00:24<00:32,  3.56it/s]

tensor(0.0246, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 90/205 [00:24<00:31,  3.64it/s]

tensor(0.0294, device='cuda:0', grad_fn=<NllLossBackward0>)


 44%|████▍     | 91/205 [00:25<00:31,  3.62it/s]

tensor(0.0270, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▍     | 92/205 [00:25<00:32,  3.46it/s]

tensor(0.0307, device='cuda:0', grad_fn=<NllLossBackward0>)


 45%|████▌     | 93/205 [00:25<00:32,  3.49it/s]

tensor(0.0267, device='cuda:0', grad_fn=<NllLossBackward0>)
tensor(0.0432, device='cuda:0', grad_fn=<NllLossBackward0>)


 46%|████▋     | 95/205 [00:26<00:35,  3.13it/s]

tensor(0.0228, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 96/205 [00:26<00:33,  3.23it/s]

tensor(0.0342, device='cuda:0', grad_fn=<NllLossBackward0>)


 47%|████▋     | 97/205 [00:27<00:33,  3.18it/s]

tensor(0.0220, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 98/205 [00:27<00:34,  3.13it/s]

tensor(0.0296, device='cuda:0', grad_fn=<NllLossBackward0>)


 48%|████▊     | 99/205 [00:27<00:32,  3.25it/s]

tensor(0.0122, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 100/205 [00:28<00:31,  3.35it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


 49%|████▉     | 101/205 [00:28<00:32,  3.23it/s]

tensor(0.0393, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|████▉     | 102/205 [00:28<00:30,  3.40it/s]

tensor(0.0278, device='cuda:0', grad_fn=<NllLossBackward0>)


 50%|█████     | 103/205 [00:28<00:29,  3.44it/s]

tensor(0.0300, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 104/205 [00:29<00:28,  3.58it/s]

tensor(0.0214, device='cuda:0', grad_fn=<NllLossBackward0>)


 51%|█████     | 105/205 [00:29<00:26,  3.74it/s]

tensor(0.0161, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 106/205 [00:29<00:26,  3.70it/s]

tensor(0.0232, device='cuda:0', grad_fn=<NllLossBackward0>)


 52%|█████▏    | 107/205 [00:29<00:25,  3.84it/s]

tensor(0.0476, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 108/205 [00:30<00:26,  3.64it/s]

tensor(0.0173, device='cuda:0', grad_fn=<NllLossBackward0>)


 53%|█████▎    | 109/205 [00:30<00:28,  3.42it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▎    | 110/205 [00:30<00:27,  3.46it/s]

tensor(0.0236, device='cuda:0', grad_fn=<NllLossBackward0>)


 54%|█████▍    | 111/205 [00:31<00:26,  3.50it/s]

tensor(0.0272, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▍    | 112/205 [00:31<00:27,  3.40it/s]

tensor(0.0287, device='cuda:0', grad_fn=<NllLossBackward0>)


 55%|█████▌    | 113/205 [00:31<00:26,  3.43it/s]

tensor(0.0408, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 114/205 [00:32<00:28,  3.14it/s]

tensor(0.0613, device='cuda:0', grad_fn=<NllLossBackward0>)


 56%|█████▌    | 115/205 [00:32<00:26,  3.36it/s]

tensor(0.0332, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 116/205 [00:32<00:25,  3.45it/s]

tensor(0.0321, device='cuda:0', grad_fn=<NllLossBackward0>)


 57%|█████▋    | 117/205 [00:32<00:24,  3.54it/s]

tensor(0.0299, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 118/205 [00:33<00:27,  3.20it/s]

tensor(0.1060, device='cuda:0', grad_fn=<NllLossBackward0>)


 58%|█████▊    | 119/205 [00:33<00:26,  3.24it/s]

tensor(0.0279, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▊    | 120/205 [00:33<00:26,  3.25it/s]

tensor(0.0351, device='cuda:0', grad_fn=<NllLossBackward0>)


 59%|█████▉    | 121/205 [00:34<00:27,  3.06it/s]

tensor(0.0419, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|█████▉    | 122/205 [00:34<00:26,  3.19it/s]

tensor(0.0222, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 123/205 [00:34<00:24,  3.35it/s]

tensor(0.0309, device='cuda:0', grad_fn=<NllLossBackward0>)


 60%|██████    | 124/205 [00:35<00:23,  3.48it/s]

tensor(0.0225, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████    | 125/205 [00:35<00:24,  3.24it/s]

tensor(0.0578, device='cuda:0', grad_fn=<NllLossBackward0>)


 61%|██████▏   | 126/205 [00:35<00:23,  3.32it/s]

tensor(0.0274, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 127/205 [00:35<00:22,  3.49it/s]

tensor(0.0150, device='cuda:0', grad_fn=<NllLossBackward0>)


 62%|██████▏   | 128/205 [00:36<00:21,  3.58it/s]

tensor(0.0221, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 129/205 [00:36<00:20,  3.66it/s]

tensor(0.0241, device='cuda:0', grad_fn=<NllLossBackward0>)


 63%|██████▎   | 130/205 [00:36<00:20,  3.70it/s]

tensor(0.0446, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 131/205 [00:36<00:19,  3.79it/s]

tensor(0.0369, device='cuda:0', grad_fn=<NllLossBackward0>)


 64%|██████▍   | 132/205 [00:37<00:19,  3.74it/s]

tensor(0.0614, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▍   | 133/205 [00:37<00:19,  3.75it/s]

tensor(0.0359, device='cuda:0', grad_fn=<NllLossBackward0>)


 65%|██████▌   | 134/205 [00:37<00:18,  3.87it/s]

tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▌   | 135/205 [00:38<00:19,  3.54it/s]

tensor(0.0550, device='cuda:0', grad_fn=<NllLossBackward0>)


 66%|██████▋   | 136/205 [00:38<00:19,  3.60it/s]

tensor(0.0691, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 137/205 [00:38<00:18,  3.66it/s]

tensor(0.0290, device='cuda:0', grad_fn=<NllLossBackward0>)


 67%|██████▋   | 138/205 [00:38<00:18,  3.65it/s]

tensor(0.0685, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 139/205 [00:39<00:18,  3.52it/s]

tensor(0.1033, device='cuda:0', grad_fn=<NllLossBackward0>)


 68%|██████▊   | 140/205 [00:39<00:19,  3.42it/s]

tensor(0.1057, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 141/205 [00:39<00:17,  3.57it/s]

tensor(0.0523, device='cuda:0', grad_fn=<NllLossBackward0>)


 69%|██████▉   | 142/205 [00:40<00:17,  3.60it/s]

tensor(0.0424, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|██████▉   | 143/205 [00:40<00:17,  3.47it/s]

tensor(0.0952, device='cuda:0', grad_fn=<NllLossBackward0>)


 70%|███████   | 144/205 [00:40<00:17,  3.51it/s]

tensor(0.0996, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 145/205 [00:40<00:16,  3.54it/s]

tensor(0.0703, device='cuda:0', grad_fn=<NllLossBackward0>)


 71%|███████   | 146/205 [00:41<00:16,  3.50it/s]

tensor(0.0700, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 147/205 [00:41<00:18,  3.18it/s]

tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward0>)


 72%|███████▏  | 148/205 [00:41<00:18,  3.15it/s]

tensor(0.0993, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 149/205 [00:42<00:17,  3.17it/s]

tensor(0.1011, device='cuda:0', grad_fn=<NllLossBackward0>)


 73%|███████▎  | 150/205 [00:42<00:16,  3.25it/s]

tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▎  | 151/205 [00:42<00:16,  3.32it/s]

tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward0>)


 74%|███████▍  | 152/205 [00:43<00:16,  3.25it/s]

tensor(0.0806, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▍  | 153/205 [00:43<00:16,  3.24it/s]

tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward0>)


 75%|███████▌  | 154/205 [00:43<00:14,  3.47it/s]

tensor(0.0074, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 155/205 [00:43<00:13,  3.63it/s]

tensor(0.0467, device='cuda:0', grad_fn=<NllLossBackward0>)


 76%|███████▌  | 156/205 [00:44<00:13,  3.51it/s]

tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 157/205 [00:44<00:13,  3.49it/s]

tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward0>)


 77%|███████▋  | 158/205 [00:44<00:13,  3.41it/s]

tensor(0.0717, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 159/205 [00:45<00:13,  3.47it/s]

tensor(0.0897, device='cuda:0', grad_fn=<NllLossBackward0>)


 78%|███████▊  | 160/205 [00:45<00:12,  3.61it/s]

tensor(0.0573, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▊  | 161/205 [00:45<00:12,  3.50it/s]

tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward0>)


 79%|███████▉  | 162/205 [00:45<00:12,  3.39it/s]

tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|███████▉  | 163/205 [00:46<00:11,  3.58it/s]

tensor(0.0488, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 164/205 [00:46<00:11,  3.66it/s]

tensor(0.0776, device='cuda:0', grad_fn=<NllLossBackward0>)


 80%|████████  | 165/205 [00:46<00:11,  3.49it/s]

tensor(0.1039, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████  | 166/205 [00:47<00:11,  3.52it/s]

tensor(0.0860, device='cuda:0', grad_fn=<NllLossBackward0>)


 81%|████████▏ | 167/205 [00:47<00:10,  3.53it/s]

tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 168/205 [00:47<00:10,  3.62it/s]

tensor(0.0704, device='cuda:0', grad_fn=<NllLossBackward0>)


 82%|████████▏ | 169/205 [00:47<00:10,  3.46it/s]

tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 170/205 [00:48<00:10,  3.46it/s]

tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward0>)


 83%|████████▎ | 171/205 [00:48<00:09,  3.48it/s]

tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 172/205 [00:48<00:09,  3.58it/s]

tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward0>)


 84%|████████▍ | 173/205 [00:49<00:08,  3.70it/s]

tensor(0.0758, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▍ | 174/205 [00:49<00:08,  3.65it/s]

tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward0>)


 85%|████████▌ | 175/205 [00:49<00:08,  3.36it/s]

tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▌ | 176/205 [00:49<00:08,  3.35it/s]

tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward0>)


 86%|████████▋ | 177/205 [00:50<00:08,  3.48it/s]

tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 178/205 [00:50<00:07,  3.51it/s]

tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward0>)


 87%|████████▋ | 179/205 [00:50<00:07,  3.58it/s]

tensor(0.0985, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 180/205 [00:51<00:07,  3.50it/s]

tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward0>)


 88%|████████▊ | 181/205 [00:51<00:06,  3.53it/s]

tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 182/205 [00:51<00:06,  3.60it/s]

tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward0>)


 89%|████████▉ | 183/205 [00:51<00:05,  3.67it/s]

tensor(0.0619, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|████████▉ | 184/205 [00:52<00:05,  3.81it/s]

tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward0>)


 90%|█████████ | 185/205 [00:52<00:05,  3.88it/s]

tensor(0.0775, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 186/205 [00:52<00:04,  3.86it/s]

tensor(0.1064, device='cuda:0', grad_fn=<NllLossBackward0>)


 91%|█████████ | 187/205 [00:52<00:04,  3.96it/s]

tensor(0.0845, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 188/205 [00:53<00:04,  3.83it/s]

tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward0>)


 92%|█████████▏| 189/205 [00:53<00:04,  3.83it/s]

tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 190/205 [00:53<00:03,  3.76it/s]

tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward0>)


 93%|█████████▎| 191/205 [00:53<00:03,  3.60it/s]

tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▎| 192/205 [00:54<00:03,  3.70it/s]

tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward0>)


 94%|█████████▍| 193/205 [00:54<00:03,  3.55it/s]

tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▍| 194/205 [00:54<00:03,  3.55it/s]

tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward0>)


 95%|█████████▌| 195/205 [00:55<00:02,  3.56it/s]

tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 196/205 [00:55<00:02,  3.64it/s]

tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward0>)


 96%|█████████▌| 197/205 [00:55<00:02,  3.68it/s]

tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 198/205 [00:55<00:01,  3.81it/s]

tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward0>)


 97%|█████████▋| 199/205 [00:56<00:01,  3.72it/s]

tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 200/205 [00:56<00:01,  3.79it/s]

tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward0>)


 98%|█████████▊| 201/205 [00:56<00:01,  3.81it/s]

tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▊| 202/205 [00:56<00:00,  3.89it/s]

tensor(0.0769, device='cuda:0', grad_fn=<NllLossBackward0>)


 99%|█████████▉| 203/205 [00:57<00:00,  3.69it/s]

tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|█████████▉| 204/205 [00:57<00:00,  3.66it/s]

tensor(0.2714, device='cuda:0', grad_fn=<NllLossBackward0>)


100%|██████████| 205/205 [00:57<00:00,  3.55it/s]

tensor(0.0633, device='cuda:0', grad_fn=<NllLossBackward0>)





In [43]:
torch.save(gpt.state_dict(), './torch_model/model.pth')
gpt.save_pretrained('./model/')

# Prediction

In [7]:
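def add_adapters(model, adapter_dim=16):
    """Attach a trainable low-rank adapter to every frozen 8-bit layer in `model`."""
    assert adapter_dim > 0

    for module in model.modules():
        if isinstance(module, FrozenBNBLinear):
            # Down-project to adapter_dim, then up-project back to out_features.
            module.adapter = nn.Sequential(
                nn.Linear(module.in_features, adapter_dim, bias=False),
                nn.Linear(adapter_dim, module.out_features, bias=False),
            )
            # Zero the up-projection so the adapter initially adds nothing to the output.
            nn.init.zeros_(module.adapter[1].weight)
        elif isinstance(module, FrozenBNBEmbedding):
            # Same idea for embeddings: a small embedding table followed by an up-projection.
            module.adapter = nn.Sequential(
                nn.Embedding(module.num_embeddings, adapter_dim),
                nn.Linear(adapter_dim, module.embedding_dim, bias=False),
            )
            nn.init.zeros_(module.adapter[1].weight)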

# add_adapters(gpt)
# gpt.to(device)
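
To get a feel for how small the adapters are, the sketch below counts the parameters that `add_adapters` attaches to a single linear layer and compares them with the frozen weight matrix. It uses the layer shapes printed when the model is loaded (4096 x 4096 for the attention projections, 4096 x 16384 and 16384 x 4096 for the MLP). The helper name `adapter_param_count` is made up for this illustration and is not part of the notebook.

```python
def adapter_param_count(in_features, out_features, adapter_dim=16):
    """Parameters in the two bias-free linear layers of one adapter (hypothetical helper)."""
    return in_features * adapter_dim + adapter_dim * out_features


# Shapes taken from the layer listing printed by from_pretrained.
layers = {"q_proj": (4096, 4096), "fc_in": (4096, 16384), "fc_out": (16384, 4096)}

for name, (n_in, n_out) in layers.items():
    added = adapter_param_count(n_in, n_out)
    frozen = n_in * n_out  # size of the quantized, non-trainable weight matrix
    print(f"{name}: {added:,} trainable vs {frozen:,} frozen ({added / frozen:.2%})")
```

With `adapter_dim=16`, each adapter is well under 1% of the size of the layer it is attached to, which is why only the adapters need gradients and optimizer state.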

In [8]:
gptmodel = GPTJForCausalLM.from_pretrained("hivemind/gpt-j-6B-8bit", low_cpu_mem_usage=True)


lm_head Linear(in_features=4096, out_features=50400, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, bias=False)
out_proj Linear(in_features=4096, out_features=4096, bias=False)
fc_in Linear(in_features=4096, out_features=16384, bias=True)
fc_out Linear(in_features=16384, out_features=4096, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, bias=False)
out_proj Linear(in_features=4096, out_features=4096, bias=False)
fc_in Linear(in_features=4096, out_features=16384, bias=True)
fc_out Linear(in_features=16384, out_features=4096, bias=True)
k_proj Linear(in_features=4096, out_features=4096, bias=False)
v_proj Linear(in_features=4096, out_features=4096, bias=False)
q_proj Linear(in_features=4096, out_features=4096, b

In [9]:
add_adapters(gptmodel)

In [10]:
gptmodel.load_state_dict(torch.load("./torch_model/model.pth"))

<All keys matched successfully>

In [9]:
prompt = "We describe the clinical findings of 15 individuals in a large kindred affected with distal arthrogryposis type 1A (DA1A). The most consistent findings among individuals were overlapping fingers at birth, abnormal digital flexion creases, and foot deformities, including talipes equinovarus and vertical talus. There was marked intrafamilial variation in the expression of DA1A. Linkage mapping of the locus for DA1A suggests that the use of strict diagnostic criteria excludes unaffected individuals rigorously, but can produce incomplete ascertainment of affected individuals. In the context of an affected family, the range of phenotypes consistent with a diagnosis of DA1A needs to be expanded.\n\n##\n\n"

In [34]:
prompt = "A 5 years old girl with swallowing difficulty, recurrent chest infections and developmental delay. She can only say 4 words. Her parents are first-degree cousins, and there is no family history of a similar condition. Her growth parameters at 5 years of age were weight 9.4 kg, height 82.5 cm and head circumference 46 cm, all below the 3rd percentile. Her physical examination was significant for a blue sclera, hypotonia and hyporeflexia. Vision and hearing were normal, and other examinations were within normal limits. The brain MRI was grossly normal with diffuse T-2 hyperintensity in subcortical white matter. \n\n##\n\n"

In [35]:
prompt = tokenizer(prompt, return_tensors='pt')

In [37]:
device_1 = 'cuda'
gptmodel.to(device_1)
# prompt = tokenizer("The results of a systematic study of the otological aspects in 13 cases of earpit-deafness syndrome are reported. The audiometric, radiological and vestibular findings as well as the results of exploratory tympanotomies with and without stapedectomies are discussed together with the results reported in the literature. A convincing explanation of the poor results of exploratory tympanotomies in cases with mixed hearing loss is not furnished. If the hearing loss is confined to conduction and ankylosis of the stapes or a disconnection of the ossicular chain is suspected, exploratory tympanotomy can be expected to be successful. \n\n##\n\n", return_tensors='pt')
prompt = {key: value.to(device_1) for key, value in prompt.items()}
out = gptmodel.generate(**prompt, min_length=128, max_length=600, 
                        temperature=0.1,
                        top_p=0.5,
                        do_sample=True)
tokenizer.decode(out[0])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'A 5 years old girl with swallowing difficulty, recurrent chest infections and developmental delay. She can only say 4 words. Her parents are first-degree cousins, and there is no family history of a similar condition. Her growth parameters at 5 years of age were weight 9.4 kg, height 82.5 cm and head circumference 46 cm, all below the 3rd percentile. Her physical examination was significant for a blue sclera, hypotonia and hyporeflexia. Vision and hearing were normal, and other examinations were within normal limits. The brain MRI was grossly normal with diffuse T-2 hyperintensity in subcortical white matter. \n\n##\n\n  swallowing difficulty | HP_0001627\n developmental delay | HP_0001263\n hypotonia | HP_0001159\n hyporeflexia | HP_0008586\n 3 rd percentile | HP_0004051\n low weight, 9.4 k | HP_0006996\n low height, 82.5 cm | HP_0009373\n low head circumference, 46 cm | HP_0004458\n hypoplastic scul | HP_0001140\n hypoplastic scul | HP_0004452\n brachycephaly | HP_0001155\n brachyce
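
The `generate` call above samples with `temperature=0.1` and `top_p=0.5`. The sketch below is a simplified, pure-Python illustration of what those two settings do to a next-token distribution; it is not the library's implementation, and the function name `nucleus` and the example logits are invented for the demonstration.

```python
import math


def nucleus(logits, temperature, top_p):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most probable tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


toy_logits = [2.0, 1.0, 0.5]  # made-up values for three candidate tokens
print(nucleus(toy_logits, temperature=1.0, top_p=0.9))
print(nucleus(toy_logits, temperature=0.1, top_p=0.5))
```

At a temperature of 0.1 the distribution becomes so peaked that only the top token survives the filter in this toy case, so sampling behaves almost like greedy decoding. That may be one reason the completions tend to fall into repeated lines.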

In [18]:
device_1 = 'cuda'
gptmodel.to(device_1)

GPTJForCausalLM(
  (transformer): GPTJModel(
    (wte): FrozenBNBEmbedding(50400, 4096)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-27): 28 x GPTJBlock(
        (ln_1): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attn): GPTJAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (k_proj): FrozenBNBLinear(4096, 4096)
          (v_proj): FrozenBNBLinear(4096, 4096)
          (q_proj): FrozenBNBLinear(4096, 4096)
          (out_proj): FrozenBNBLinear(4096, 4096)
        )
        (mlp): GPTJMLP(
          (fc_in): FrozenBNBLinear(4096, 16384)
          (fc_out): FrozenBNBLinear(16384, 4096)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): FrozenBNBLinear(4096, 50400)
)

In [16]:
def gpt_val(prompt):
    """Tokenize `prompt`, sample a completion from `gptmodel`, and print the decoded text."""
    prompt = tokenizer(prompt, return_tensors='pt')
    prompt = {key: value.to(device_1) for key, value in prompt.items()}
    out = gptmodel.generate(**prompt, min_length=128, max_length=600,
                            temperature=0.1,
                            top_p=0.5,
                            do_sample=True)
    print(tokenizer.decode(out[0]))
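
The decoded text mixes the prompt with `phrase | HP_...` lines, and completions are often cut off mid-line or repeat themselves. A minimal post-processing sketch, assuming the `\n\n##\n\n` separator used in the prompts and assuming a well-formed identifier is `HP_` followed by exactly seven digits, could look like this. The helper `parse_annotations` is hypothetical and not part of the notebook.

```python
import re

# Assumption: a well-formed identifier is "HP_" followed by exactly seven digits.
HPO_ID = re.compile(r"HP_\d{7}")


def parse_annotations(decoded, separator="\n\n##\n\n"):
    """Return unique (phrase, hpo_id) pairs found after the prompt separator."""
    _, _, completion = decoded.partition(separator)
    pairs, seen = [], set()
    for line in completion.splitlines():
        phrase, bar, code = line.partition("|")
        phrase, code = phrase.strip(), code.strip()
        if not bar or not phrase or not HPO_ID.fullmatch(code):
            continue  # skip truncated or malformed lines
        if (phrase, code) not in seen:
            seen.add((phrase, code))
            pairs.append((phrase, code))
    return pairs


sample = (
    "Some abstract text. \n\n##\n\n"
    "  exotropia | HP_0001144\n"
    " molar | HP_0009794\n"
    " exotropia | HP_0001144\n"
    " cystic mass | HP_000"
)
print(parse_annotations(sample))
```

This only checks the format of each line. It does not verify that an identifier exists in the ontology or that it matches the phrase, so spurious lines with a well-formed code would still pass through.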

In [30]:
for idx, prompt in enumerate(biolark_tt):
    # Keep only the abstract, then re-append the separator so the model writes the annotations.
    prompt = prompt.split('\n\n##\n\n')[0] + '\n\n##\n\n'
    print(f'------{idx}------')
    gpt_val(prompt)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


------0------


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Nevoid basal cell carcinoma syndrome (NBCCS) is rare in black persons. We describe an 11-year-old black boy with NBCCS who presented with exotropia and a painful, expanding, cystic mass in the left posterior alveolar ridge. Further examination revealed odontogenic keratocysts with palmar and plantar pitting. Less than 5% of reported patients with NBCCS are black. To our knowledge, this is the first report of a black patient with NBCCS presenting with exotropia and an impacted molar displaced into the orbit by an odontogenic keratocyst. 

##

  exotropia | HP_0001144
 rare disorders | HP_0003745
 odontogenic keratocysts | HP_0010603
 palmar and plantar pitting | HP_0006860
 molar | HP_0009794
 cystic mass | HP_0007678
 expansion | HP_0003813
 END_REPORT | HP_0004467
 END_RECOMMEND | HP_0004411
 exotropia | HP_0001193
 odontogenic keratocysts | HP_0010603
 palmar and plantar pitting | HP_0007678
 plantar pitting | HP_0006812
 pitting | HP_0006860
 molar | HP_0009794
 cystic mass | HP_000

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


To report new ocular manifestations of branchio-oculo-facial (BOF) syndrome.

Case report.

A 10-year-old girl with known BOF syndrome was referred because of a fundus lesion in her left eye.

She had undergone excision of a left orbital dermoid cyst at age 18 months and a branchial cleft fistula from the right side of neck at age 4 years. Examination disclosed openings of sinus tracts on each side of the nose connecting the lacrimal sac to skin. In the right eye, an iris pigment epithelial cyst was confirmed with ultrasound biomicroscopy. In the left eye, there was a combined hamartoma of the retina and retina pigment epithelium.

BOF syndrome can display mild to severe craniofacial, auricular, oral, and ophthalmic anomalies. In this case, the ophthalmic manifestations included lacrimal sac fistula, orbital dermoid cyst, iris pigment epithelial cyst, and combined hamartoma of the retina and retinal pigment epithelium. 

##

  orbital dermoid cyst | HP_0006546
 branchial cleft 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Townes-Brocks syndrome (TBS) has been recognized as a dominant inherited syndrome. We report 2 cases of TBS. Case 1 was operated on for imperforate anus. Triphalangeal thumb and ear anomalies were remarkable. Deafness was diagnosed when the patient was 6 months old. Anomalies of the semicircular canals and the incus with inculomalleolar fusion were shown when the patient was 3.5 years old. During childhood, recurrent episodes of abdominal pain appeared. The diagnosis of hereditary angioneurotic edema (HANE) was made. HANE was familial as the father, the father's brother and the paternal grand mother were also affected. The parents of case 2, a female, are both mildly mentally retarded. This was the first pregnancy of the mother who had short stature. The child had an antepositioned anus, bifid right thumb, large toes, low set ears, microretrognathia and deafness. A (5, 16) translocation was observed in a child with TBS. At the breakpoint in 16q21.1, a gene coding for a transcription fa

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


We describe the clinical findings of 15 individuals in a large kindred affected with distal arthrogryposis type 1A (DA1A). The most consistent findings among individuals were overlapping fingers at birth, abnormal digital flexion creases, and foot deformities, including talipes equinovarus and vertical talus. There was marked intrafamilial variation in the expression of DA1A. Linkage mapping of the locus for DA1A suggests that the use of strict diagnostic criteria excludes unaffected individuals rigorously, but can produce incomplete ascertainment of affected individuals. In the context of an affected family, the range of phenotypes consistent with a diagnosis of DA1A needs to be expanded. 

##

  distal arthrogryposis | HP_0005684
 arthrogryposis | HP_0001390
 overlapping fingers at birth | HP_0005188
 abnormal digital flexion creases | HP_0004452
 foot deformities | HP_0001946
 talipes equinovarus | HP_0008090
 vertical talus | HP_0001193
 marked intrafamilial variation | HP_0003822


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


We present a child with mild to moderate global developmental delay including severe speech impairment, inappropriate happy demeanor, wide-based gait, frequent ear infections with mild hearing loss, deep-set eyes, a wide mouth, widely-spaced teeth, normal head circumference, and no seizures. Results of peripheral blood lymphocyte chromosomal analysis with GTG banding were normal. However, fluorescence in situ hybridization (FISH) studies showed mosaicism for a deletion of probes (D15S10 and SNRPN) from the Angelman syndrome (AS) critical region with approximately 40% of peripheral lymphocytes having the deletion. The deleted chromosome 15 also showed centromeric duplication, which was detected with a D15Z1 probe [46,XX, dic(15)(pter-->q11.1::p11.2-->q11. 1::q13-->qter)]. The same duplication pattern was observed in 30% of the nuclei obtained from a buccal smear. Methylation studies using polymerase chain reaction with sodium bisulfite-treated DNA demonstrated a normal biparental methyl

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Branchio-oto-renal (BOR) syndrome is an autosomal dominant disorder characterized by branchial abnormality, hearing loss, and renal anomalies. Recently, the disease gene has been localized to chromosome 8q. Here, we report genetic studies that further refine the disease gene region to a smaller interval and identify several YACs from the critical region. We studied two large, clinically well-characterized BOR families with a set of 13 polymorphic markers spanning the D8S165-D8S275 interval from the chromosome 8q region. Based on multipoint analysis, the highest likelihood for the location of the BOR gene is between markers D8S543 and D8S530, a distance of about 2 cM. YACs that map in the BOR critical region have been identified and characterized by fluorescence in situ hybridization and pulsed-field gel electrophoresis. A YAC contig, based on the STS content map, that covers a minimum of 4 Mb of human DNA in the critical region of BOR is assembled. This lays the groundwork for the cons

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Brachydactyly type A-1 (BDA-1; MIM 112500) is characterized by shortening or missing of the middle phalanges (Fig. 1a). It was first identified by Farabee in 1903 (ref. 2), is the first recorded example of a human anomaly with Mendelian autosomal-dominant inheritance and, as such, is cited in most genetic and biological textbooks. Here we show that mutations in IHH, which encodes Indian hedgehog, cause BDA-1. We have identified three heterozygous missense mutations in the region encoding the amino-terminal signaling domain in all affected members of three large, unrelated families. The three mutant amino acids, which are conserved across all vertebrates and invertebrates studied so far, are predicted to be adjacent on the surface of IHH. 

##

  Brachydactyly type A-1 | HP_0009371
 Shortening or missing of the middle phalanges | HP_0004100
 autosomal-dominant inheritance | HP_0000006
 ENDOCYTOSCHITOSIS | HP_0006746
 IHH | HP_0009588
 ENDOCYTOSCHITOSIS | HP_0009579
 Vertebrates | HP_000

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Neurofibromatosis 2 (NF2) features bilateral vestibular schwannomas, other benign neural tumors, and cataracts. Patients in some families develop many tumors at an early age and have rapid clinical progression, whereas in other families, patients may not have symptoms until much later and vestibular schwannomas may be the only tumors. The NF2 gene has been cloned from chromosome 22q; most identified germ-line mutations result in a truncated protein and severe NF2. To look for additional mutations and clinical correlations, we used SSCP analysis to screen DNA from 32 unrelated patients. We identified 20 different mutations in 21 patients (66%): 10 nonsense mutations, 2 frameshifts, 7 splice-site mutations, and 1 large in-frame deletion. Clinical information on 47 patients from the 21 families included ages at onset and at diagnosis, numbers of meningiomas, spinal and skin tumors, and presence of cataracts and retinal abnormalities. We compared clinical findings in patients with nonsense

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The basal cell nevus syndrome is characterized by multiple basal cell nevi and basal cell carcinoma, cysts of the jaw, anomalies of ribs and spine, abnormal calcifications, and additional anomalies of the facial skull. A German family is described with manifestations of the syndrome in the mother and her three daughters. Expressivity was variable, in part due to age effects. The observation conforms to the assumed autosomal dominant mode of inheritance with high penetrance. 

##

  basal cell nevus | HP_0002671
 basal cell carcinoma | HP_0002664
 cysts of the jaw | HP_0004390
 anomalies of ribs and spine | HP_0000951
 abnormal calcifications | HP_0004121
 additional anomalies of the facial skull | HP_0000951
 autosomal dominant | HP_0000006
 variable | HP_0003828
 variable age | HP_0003813
 END_CLASS | HP_0001263
 END_CLASS | HP_0001263009
 MIM_0002673 | HP_0004390
 END_CLASS | HP_0003829
 variable | HP_0003828
 variable age | HP_0003813
 variable mode of inheritance | HP_0003828
 vari

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


We report a girl aged 11 and her brother aged five, both with the typical features of Angelman syndrome, and three isolated cases. This report, together with a review of published reports and contact with previous authors, has revealed a total of 41 sibs of probands, although only nine of these are known to have been later born. The possible effect of voluntary restriction of family size after the birth of an affected child is discussed in relation to the possibility of autosomal recessive inheritance, but a recurrence risk of 5% is appropriate for use in the genetic clinic. 

##

  isolated cases | HP_0003813
 sibs of probands | HP_0000005
 autosomal recessive inheritance | HP_0000007
 END_CLASS
 END_CLASS

#### Autosomal-dominant disorders | HP_0000006
 family size | HP_0006445
 END_CLASS

## What is this condition's usual presentation?

The main features are mental retardation, ataxic movements, microcephaly, and a happy disposition. The latter is a mental state resulting from a lac

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


To evaluate patients with multiple endocrine neoplasia type 1 (MEN 1) for cutaneous manifestations.

Survey during a 3-year period.

The National Institutes of Health, a tertiary referral research hospital in Bethesda Md.

A consecutive sample of 32 individuals with previously diagnosed MEN1 who were not preselected for the presence of skin lesions were examined for cutaneous abnormalities. None of the patients or family members were diagnosed as having tuberous sclerosis.

Lesions were identified by clinical appearance, photographed, and confirmed histologically.

To determine the frequency of skin lesions in patients with MEN1.

Multiple facial angiofibromas were observed in 28 (88%) of the patients with MEN1, with 16 patients (50%) having 5 or more. Angiofibromas were clinically and histologically identical to those in individuals with tuberous sclerosis. Collagenomas were observed in 23 patients (72%). Also observed were cafe au lait macules in 12 patients (38%), lipoma

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Nevoid basal cell carcinoma syndrome (NBCCS) is a genodermatosis with autosomal dominant inheritance. In identified kindreds the diagnosis is relatively easy, but for the patients without family history of this syndrome a high clinical suspicion is necessary for diagnosis.

Acrochordons are distinctly uncommon in childhood. Our purpose was to evaluate skin tags that develop at an early age.

This is a retrospective series evaluation of 7 children who presented with pedunculated papules (acrochordon-like growths). A full history was then correlated with biopsy results in each patient.

Clinically, lesions consisted of flesh-colored and pigmented pedunculated papules. Histopathologic examination of these papules showed basal cell carcinomas in each biopsy specimen.

We consider that "skin tag"-like basal cell carcinomas in childhood may represent a marker for NBCCS. Early diagnosis of this syndrome and early sun protection of the affected children could help decrease the number o

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The diagnosis of Angelman syndrome (AS) has seldom been made in infants because the previously described characteristic manifestations usually are not apparent until after age 2 years. We describe 4 AS patients, one of whom has oculocutaneous albinism, who were less than 2 years old when first evaluated. All 4 have deletions of the region q11.2-q13 of chromosome 15. In the 3 cases in which parents were available for study the deleted chromosome 15 was maternally derived, as determined by cytological markers. All of the patients presented with severe to profound global developmental delay and postnatal-onset microcephaly; they had seizures, hypotonia, hyperreflexia, and hyperkinesis. All were hypopigmented as compared to their relatives. Each had eye abnormalities; all had choroidal pigment hypoplasia. None were initially described as having an abnormal appearance. We believe that AS is far more common than previously thought and present these 4 children to emphasize the manifestations 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Robertsonian translocations, occurring with a frequency of about 1 in 10,000 livebirths, may be an important cause of uniparental disomy as demonstrated for 13/15, 13/14, 14/14, and 22/22 translocations. Dysmorphogenesis and/or mental retardation provide clinical clues for uniparental disomy in apparently balanced offspring of translocation carriers. Research strategies for assessing frequency and implications of uniparental disomy in translocation carriers include a genetic register approach, study of abortuses from balanced translocation carriers, and parent-of-origin studies on de novo homologous Robertsonian translocations. 

##

  uniparental disomy | HP_0004467
 mental retardation | HP_0001249
 dysmorphogenesis | HP_0000924
 mental retardation, mental, retardation, uniparental disomy | HP_0006291
 dysmorphology | HP_0000913
 mental retardation, mental retardation, mental retardation, mental retardation, mental retardation, mental retardation, mental retardation, mental retardatio

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


The E6-AP ubiquitin ligase (human/mouse gene UBE3A/Ube3a) promotes the degradation of p53 in association with papilloma E6 protein, and maternal deficiency causes human Angelman syndrome (AS). Ube3a is imprinted with silencing of the paternal allele in hippocampus and cerebellum in mice. We found that the phenotype of mice with maternal deficiency (m-/p+) for Ube3a resembles human AS with motor dysfunction, inducible seizures, and a context-dependent learning deficit. Long-term potentiation (LTP) was severely impaired in m-/p+ mice despite normal baseline synaptic transmission and neuroanatomy, indicating that ubiquitination may play a role in mammalian LTP and that LTP may be abnormal in AS. The cytoplasmic abundance of p53 was increased in postmitotic neurons in m-/p+ mice and in AS, providing a potential biochemical basis for the phenotype through failure to ubiquitinate and degrade various effectors. 

##

  autosomal dominant | HP_0000006
 p53 | HP_0002671
 ENDORFL | HP_0008589
 s

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A young girl 12 old, sent to us for obesity, and coxa-epiphysiolysis showed signs of mental retardation and bilateral thumb ankylosis. The fact that the mother was also affected by both of these signs, led to a more detailed genetic research. The latter revealed that not only the daughter, the mother, but also their own mother and may be, the sister, the grand-mother and the great-aunt of the patient had a retardation, a slight dysmorphia, a type A brachydactylia, signs of obesity and an identical ankylosis of both thumbs. This vertical inheritance, affecting apparently females only, but not associated with a high rate of miscarriage, has, it seems, never been reported. The characteristics of this family are being considered and discussed. 

##

  young girl | HP_0004045
 mental retardation | HP_0001249
 brachydactylia | HP_0001181
 retardation | HP_0001249
 dysmorphia | HP_0001252
 type A brachydactylia | HP_0008069
 brachydactylism | HP_0001159
 ankylosis of both thumbs | HP_0002216


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Several pedigrees with 19 new cases of the earpits-deafness syndrome (McK +12510) [28] are presented. Mention is made of clinical findings obtained in audiometric and vestibular studies, studies of renal function and configuration and polytomographic studies of the labyrinth, and results of exploratory tympanotomies are discussed. The literature is reviewed and the features found in 138 cases and in our 19 cases are presented. The earpits-deafness syndrome is an autosomal dominant disorder in which affected individuals may have sensorineural, conductive or mixed hearing loss, preauricular pits, structural defects of the outer, middle and inner ear, lacrimal duct stenosis, branchial fistulas or cysts of the second branchial arch, and renal anomalies ranging from mild hypoplasia to complete absence. Not all the features of the syndrome are expressed in all carriers of the gene. Pits, branchial clefts and hearing loss are frequently expressed. The incidence of renal malformation is higher

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Neurofibromatosis type 2 (NF2) must be suspected in patients presenting with a unilateral vestibular schwannoma at a young age who are therefore at theoretical risk of developing bilateral disease. We identified 45 patients aged 30 years or less at the onset of symptoms of a unilateral vestibular schwannoma. Molecular genetic analysis of the NF2 gene was completed on peripheral blood samples in all 45 and on 28 tumour samples. No pathogenic NF2 mutations were identified in any of the blood samples. NF2 point mutations were identified in 21/28 (75%) tumour samples and loss of heterozygosity (LOH) in 21/28 (75%) tumour samples. Both mutational hits were identified in 18/28 (65%) tumour samples. In one multilobular tumour, one (presumably first hit) mutation was confirmed which was common to different foci of the tumour, while the second mutational event differed between foci. The molecular findings in this patient were consistent with somatic mosaicism for NF2 and the clinical diagnosis 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


We had previously described a patient with an overgrowth syndrome and the chromosome constitution 45,XY,t(15q15q) (Wajntal et al., DNA Cell Biol 1993: 12: 227-231). Clinical reassessment and the use of molecular studies, including methylation analysis with an SNRPN probe, microsatellite analyses of D15S11, GABRB3 and D15S113 loci, and fluorescence in situ hybridization (FISH) using the SNRPN and GABRB3 probes, are consistent with a diagnosis of Angelman syndrome (AS) due to paternal isodisomy. This is the fourth report case of a translocation 15q15q with paternal uniparental disomy (UPD). Our findings suggest that some patients with clinical features of AS have hyperphagia and obesity with overgrowth, and that these features should not rule out a diagnosis of AS. 

##

  hyperphagia | HP_0001058
 obesity | HP_0001511
 overgrowth | HP_0007087
 ENDORFL | HP_0008586
 ENDORFL for AS | HP_0009373
 ENDORFL for PAD | HP_0012371
 ENDORFL for PADL | HP_0012372
 translocation | HP_0001442
 UPD |

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Neurofibromatosis type 2 (NF2) is a monogenic dominantly inherited disease predisposing carriers to develop nervous system tumours. To identify the genetic defect, the region between two flanking polymorphic markers on chromosome 22 was cloned and several genes identified. One is the site of germ-line mutations in NF2 patients and of somatic mutations in NF2-related tumours. Its deduced product has homology with proteins at the plasma membrane and cytoskeleton interface, a previously unknown site of action of tumour suppressor genes in humans. 

##

  Neurofibromatosis | HP_0006746
 dominantly inherited disease | HP_0000006
 tumour | HP_0002664
 somatic mutations | HP_0001425
 END_CLASS959000 | HP_0007121
 END_CLASS65000 | HP_0009794
 END_CLASS68000 | HP_0009795
 END_CLASS16000 | HP_0004467
 plasma membrane | HP_0001059
 cytoskeleton | HP_0003121
 somatic mutations in NF2-related tumours | HP_0009792
 NF2 patients | HP_0009588
 relevant loci | HP_0009911
 relevant loci, between two fla

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Bilateral acoustic neurofibromatosis (BANF) is a severe autosomal dominant disorder involving development of multiple tumours of the nervous system including meningiomas, gliomas, neurofibromas and particularly bilateral acoustic neuromas. We have used genetic linkage analysis with DNA markers to establish that the defective gene causing BANF is on chromosome 22, and is therefore distinct from the gene for the von Recklinghausen form of neurofibromatosis, which maps to chromosome 17. Linked DNA markers will be particularly valuable in BANF, facilitating early detection of tumours and thereby permitting more effective surgical intervention. In view of the reported loss of genes on chromosome 22 in meningiomas and acoustic neuromas, the genetic localization of the primary BANF defect strongly supports the concept that the disease locus encodes a 'tumour suppressor' gene. Isolation of this gene should provide insights into the pathogenesis of acoustic neuromas and other nervous system tum

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Three successive generations in two families affected with the popliteal pterygium syndrome are reported. While expression of the syndrome was relatively mild in the first and second generation, the patients in the third generation showed the full-blown syndrome. Differential diagnosis between mildly affected patients with the popliteal pterygium syndrome and those with Van der Woude syndrome is difficult and may even be impossible. The present observations further support the hypothesis that both syndromes may in fact represent variants of the same condition. 

##

  popliteal pterygium | HP_0009756
 pterygium | HP_0001059
 popliteal pterygium syndrome | HP_0009757
 mildly affected | HP_0004467
 Van der Woude syndrome | HP_0100257
 differential diagnosis | HP_0004713
 END_CLASSROOM(HP_0009756) | CPG_241500
 END_CLASSROOM(HP_0001059) | CPG_3US | CPG_000300
 END_CLASSROOM(HP_0001508) | HMW_HORVATIC_CURVATURAL | HP_0004707
 END_CLASSROOM(HP_0004467) | HMW_HORVATIC_CURVATURAL | HP_0004707

This training loop is only a proof of concept: it shows that even the heaviest setup still fits on a single GPU.
Depending on your fine-tuning task, you may need to remove some parts.
Below we explain how to modify the code to match the setup from the [LoRA paper](https://arxiv.org/pdf/2106.09685.pdf).

If you want to fine-tune à la LoRA, use the hyperparameters from Tables 11, 12 and 15 of the paper as a starting point:

(1) Train only the adapter matrices from attention layers

In the example above, we train every adapter as well as the layernorm scales and biases. This is only useful when fine-tuning on reasonably large datasets for a long time.
For quick setups, mark everything except **the attention adapters** as `requires_grad=False`, or simply don't pass the other parameters to Adam:

```
params_for_optimizer = [
    param for name, param in model.named_parameters()
    if "attn" in name and "adapter" in name
]
print("Trainable params:", len(params_for_optimizer))

# and after you verified it:
# compare by identity: `param in list_of_tensors` would compare tensors elementwise
trainable_ids = {id(param) for param in params_for_optimizer}
for name, param in model.named_parameters():
    if id(param) not in trainable_ids:
        print(f"Setting {name} requires_grad=False")
        param.requires_grad = False
```

An even better way is to only create adapters that you need by modifying the `add_adapters` function above:
```
for name, module in model.named_modules():
    if isinstance(module, (FrozenBNBLinear, FrozenBNBEmbedding)):
        if "attn" in name:
            print("Adding adapter to", name)

            todo_initialize_adapters_like_in_notebook()
        else:
            print("Not adding adapter to", name)
```
As a side effect, this somewhat reduces memory usage and may let you fit a longer sequence (e.g. 256 tokens).


(2) Initialize the second adapter matrix with zeros
```
for name, module in model.named_modules():
    # modules without an adapter keep `adapter = None`, so check the value, not just the attribute
    if getattr(module, "adapter", None) is not None:
        print("Initializing", name)
        nn.init.zeros_(module.adapter[1].weight)
        # optional: scale adapter[0].weight by (LoRA_alpha / r)
```

(3) Use warmup and weight decay in Adam:
```
optimizer = Adam8bit(..., weight_decay=0.01)
scheduler = transformers.get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps_from_paper(), expected_total_number_of_steps
)

actually_use_scheduler_in_training_loop()
```
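
To make the effect of step (3) concrete, here is a minimal pure-Python sketch of a linear warmup followed by linear decay. It is meant to mirror the shape of the schedule produced by `transformers.get_linear_schedule_with_warmup`; the helper name `linear_warmup_decay` and the step counts are assumptions chosen for illustration, not values from the paper.

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Learning-rate multiplier: ramps 0 -> 1 during warmup, then decays linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Hypothetical numbers, for illustration only.
base_lr = 1e-5
schedule = [base_lr * linear_warmup_decay(s, 100, 1000) for s in range(1000)]
```

In the training loop, the scheduler only takes effect if `scheduler.step()` is called once per iteration, after `optimizer.step()`.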

Finally, we recommend modifying the training loop to track training metrics, save the best checkpoint, and so on.
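
One possible way to track a metric and keep the best checkpoint is sketched below. `BestCheckpointTracker` and its `save_fn` callback are hypothetical names, not part of the notebook; in practice `save_fn` might write the adapter weights to disk, and the losses here are made up.

```python
from typing import Callable, List, Optional


class BestCheckpointTracker:
    """Records a metric per evaluation and calls `save_fn` whenever it improves (lower is better)."""

    def __init__(self, save_fn: Callable[[int], None]):
        self.save_fn = save_fn
        self.history: List[float] = []
        self.best_value: Optional[float] = None
        self.best_step: Optional[int] = None

    def update(self, step: int, value: float) -> bool:
        """Store `value`; save a checkpoint and return True if it is the best so far."""
        self.history.append(value)
        improved = self.best_value is None or value < self.best_value
        if improved:
            self.best_value, self.best_step = value, step
            self.save_fn(step)
        return improved


saved_steps = []
tracker = BestCheckpointTracker(save_fn=saved_steps.append)
for step, val_loss in enumerate([2.0, 1.5, 1.7, 1.2]):  # made-up validation losses
    tracker.update(step, val_loss)
```

Passing the save logic as a callback keeps the tracker independent of how the model is actually serialized.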

In [11]:
#run ID-68
import os

notes = os.listdir('/mnt/isilon/wang_lab/jingye/projects/data/ID-68/corpus/')
notes

['17DG0781',
 '09DG00835',
 '16DG0402',
 '13DG0911',
 '14DG1188',
 '17DG0766',
 '17DG0769',
 '17DG0773',
 '12DG1370',
 '14DG0647',
 '15DG0298',
 '15DG0299',
 '17DG0783',
 '17DG0768',
 '15DG0315',
 '16DG1608',
 '16DG0485',
 '17DG0777',
 '15DG0307',
 '17DG0764',
 '17DG0776',
 '17DG0774',
 '17DG0775',
 '16DG1123',
 '16DG0971',
 '15DG2492',
 '15DG2032',
 'DD_91704',
 '15DG0837',
 '15DG0633',
 '12DG2311',
 '15DG2336',
 '17DG0771',
 '15DG0305',
 '13DG1545',
 '13DG1665',
 '16DG1625',
 '11DG1842',
 '16DG0201',
 '10DG0720',
 '14DG2158',
 '16DG0105',
 '17DG0780',
 '15DG0115',
 '15DG2064',
 '15DG0749',
 '14DG0959',
 '17DG0778',
 '13DG1202',
 '17DG0767',
 '16DG0325',
 '17DG0779',
 '17DG0763',
 '14DG1743',
 '10DG0840',
 '17DG0772',
 '17DG0782',
 '14DG0467',
 '15DG2661',
 '17DG0762',
 '17DG0770',
 '16DG0697',
 '16DG0051',
 '15DG2307',
 '15DG2206',
 '14DG1320',
 '14DG0056',
 '17DG0765']

In [12]:
id_68 = []
id_68_lb = [] 
for note in notes:
    with open(f'/mnt/isilon/wang_lab/jingye/projects/data/ID-68/corpus/{note}','r') as file:
        id_68.append(file.read())
    with open(f'/mnt/isilon/wang_lab/jingye/projects/data/ID-68/ann/{note}', 'r') as file:
        id_68_lb.append(file.read())

In [13]:
import random

random.seed(1)
a = list(zip(id_68, id_68_lb))

In [27]:
my_list = []
for prompt, lb in zip(id_68, id_68_lb):
    if len(lb.rstrip('\n').split('\n')) <= 10:
        my_list.append((prompt,lb))
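
The filter above counts raw lines in each annotation file. A slightly more structured alternative is to parse each line into its fields. The sketch below assumes the files are tab-separated `start`, `end`, `text`, `HPO ID` records with 0-based character offsets, which is what the printed annotations suggest; `Annotation` and `parse_annotations` are hypothetical names.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    start: int
    end: int
    text: str
    hpo_id: str


def parse_annotations(raw: str) -> List[Annotation]:
    """Parse tab-separated `start end text HPO-ID` lines, skipping blank lines."""
    annotations = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        start, end, text, hpo_id = line.split("\t")
        annotations.append(Annotation(int(start), int(end), text, hpo_id))
    return annotations


note = "This is a 13 months old boy with severe microcephaly, Dandy-Walker malformation, spasticity"
example = "40\t52\tmicrocephaly\tHP:0000252\n81\t91\tspasticity\tHP:0001257\n\n"
parsed = parse_annotations(example)
spans = [note[a.start:a.end] for a in parsed]  # text recovered from the offsets
```

Unlike `len(lb.rstrip('\n').split('\n'))`, which returns 1 for an empty file, `len(parse_annotations(lb))` returns 0 when there are no annotations.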

In [28]:
len(my_list)

31

In [29]:
random.seed(1)
new_list = random.sample(my_list, 20)

In [32]:
for i, (prompt, lb) in enumerate(new_list, start=1):
    print(f'---{i}---')
    print(prompt)
    print(lb)
    print(len(lb.rstrip('\n').split('\n')))

---1---
This is a 13 months old boy with severe microcephaly, Dandy-Walker malformation, spasticity, GDD, developmental regression and failure to thrive. Parents are consanguineous, and there are no affected siblings. Head circumference at birth was 27 cm and at 1 year was 34 cm.
40	52	microcephaly	HP:0000252
54	79	dandy walker malformation	HP:0001305
81	91	spasticity	HP:0001257
98	122	developmental regression	HP:0002376
127	144	failure to thrive	HP:0001508

5
---2---
A 9 months old deceased boy who was born at full-term via C-section due to meconium-stained liquor and fetal distress. He was admitted to NICU immediately after birth where he developed convulsions and jerky movements. His seizures have been difficult to control, and he is on multiple medications. The parents are first-degree cousins once removed, and they had a daughter with a similar course that passed away at the age of 16 months. His examination was significant for dysmorphic features and sluggish pupils. Brain MRI sh
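
The annotation files use IDs of the form `HP:0000252`, while the generations printed further down use `term | HP_0000252` lines and often end in a truncated or malformed ID. A rough sketch of how such output could be scored against the annotations follows; the function names are hypothetical, only the IDs are compared (spans are ignored), and the example strings are shortened for illustration.

```python
import re
from typing import Set, Tuple

HPO_PATTERN = re.compile(r"HP[_:](\d{7})")


def predicted_ids(generated: str) -> Set[str]:
    """Collect well-formed HPO IDs from `term | HP_0000000` lines, normalised to `HP:0000000`."""
    ids = set()
    for line in generated.splitlines():
        if "|" not in line:
            continue
        candidate = line.rsplit("|", 1)[1].strip()
        match = HPO_PATTERN.fullmatch(candidate)
        if match:  # drops truncated or malformed IDs such as `HP_` or `HP_000897`
            ids.add("HP:" + match.group(1))
    return ids


def precision_recall(predicted: Set[str], gold: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall over HPO IDs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


generated = " microcephaly | HP_0000252\n spasticity | HP_0001250\n weight | HP_\n"
gold = {"HP:0000252", "HP:0001257"}
precision, recall = precision_recall(predicted_ids(generated), gold)
```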

In [33]:
for i, (prompt, lb) in enumerate(new_list, start=1):
    prompt = prompt + '\n\n##\n\n'
    print(f'------{i}------')
    gpt_val(prompt)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


------1------


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


This is a 13 months old boy with severe microcephaly, Dandy-Walker malformation, spasticity, GDD, developmental regression and failure to thrive. Parents are consanguineous, and there are no affected siblings. Head circumference at birth was 27 cm and at 1 year was 34 cm.

##

  severe microcephaly | HP_0008635
 microcephaly | HP_0001156
 Dandy-Walker malformation | HP_0009589
 spasticity | HP_0001250
 GDD | HP_0001263
 developmental regression | HP_0007087
 failure to thrive | HP_0002089
 Consangineous | HP_0003745
 END_CLASS | HP_0004322
 END_CLASS | HP_0001425
 13 months old boy | HP_0003593
 growth retardation | HP_0001263
 END_CLASS | HP_0004467
 ILL | HP_0003812
  spasticity, GDD | HP_0002564
  spasticity | HP_0001250
  microcephaly | HP_0001999
  Consangineous | HP_0003745
  END_CLASS | HP_0004412
  END_CLASS | HP_0003812
  HP_0003812
  GDD | HP_0001263
  END_CLASS | HP_0004712
  HP_0004745
  developmental regression | HP_0007087
  failure to thrive | HP_0002089
  growth retarda

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 9 months old deceased boy who was born at full-term via C-section due to meconium-stained liquor and fetal distress. He was admitted to NICU immediately after birth where he developed convulsions and jerky movements. His seizures have been difficult to control, and he is on multiple medications. The parents are first-degree cousins once removed, and they had a daughter with a similar course that passed away at the age of 16 months. His examination was significant for dysmorphic features and sluggish pupils. Brain MRI showed diffuse brain atrophy and delayed myelination of white matter. 

##

  hyperactive child | HP_0001263
 ENDODROPHIN | HP_0004363
 meconium-stained liquor | HP_0002167
 fetal distress | HP_0008897
 convulsions | HP_0001250
 jerky movements | HP_0001251
 dysmorphic features | HP_0004467
 delayed myelination | HP_0001317
 ENDODROPHIN | HP_0004359
 ENDODROPIN | HP_000897
 ENDOCYTOSC | HP_0004656
 ENDOCYTOSC PTP | HP_0007118
 delayed myelination | HP_0001317
 ENDOCYTOSC

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 12 years old boy who was born at full term via NSVD to a G2P1 mother following an uneventful pregnancy. He was noted to have an abnormal roving eye movement in infancy, and it was found that his retina is severely damaged, bilaterally. His motor and cognitive functions are appropriate for age. His parents are first-degree cousins, and they have a daughter with a similar condition and occipital encephalocele and two sons with intellectual disability. His growth parameters at 12 years of age were weight 39 kg (50th -75th percentile), height 153 cm (50th -75th percentile) and head circumference 53 cm (25th – 50th percentile). He does not have dysmorphic features. Brain MRI showed nonspecific white matter changes.

##

  abnormal roving eye movement | HP_0006818
 roving eye movement | HP_0001263
 full term | HP_000377
 END_OF_1ST_TRIM | HP_0006996
 END_OF_1ST_TRIM | HP_0004467
 encephalocele | HP_0000238
 condition | HP_0001425
 nonspecific white matter changes | HP_0007407
 weight | HP_

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 17 months old girl with global developmental delay and neonatal-onset hypotonia. She was born via NSVD with no perinatal or maternal risk factors. She has poor feeding and hypotonia since birth. Developmentally, she has a global developmental delay affecting all domains. She can coo, raise her head, grasp objects and roll over from side to side. The parents are first-degree cousins with two healthy daughters. Her examination was significant for severe hypotonia and weakness, but other examinations were within normal limits. Her last growth parameters at 17 months of age were weight 9.7 kg (25th -50th percentile), height 80 cm (on the 50th percentile) and head circumference 47.1 cm (50th -75th percentile). The brain MRI showed white matter abnormality. 

##

  global developmental delay | HP_0001263
 neonatal-onset hypotonia | HP_0006821
 hypotonia | HP_0001252
 poor feeding | HP_0006828
 poor feeding and hypotonia | HP_0007078
 poor feeding since birth | HP_0004452
 developmental del

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Autism spectrum disorder and speech delay

##

  autism spectrum disorder | HP_0003593
 speech delay | HP_0001344
 END_OF_SHORTHOUSE_DELAY | HP_0003571
 END_OF_SHORTHOUSE_DELay | HP_0003570
 Short Hais | HP_0000336
 Hais | HP_0001059
 END_OF_REMS | HP_0002508
 REMS | HP_0001263
 END_OF_VAGINAL_RHYTHM | HP_0006573
 END_OF_VAGINAL_RHYTHM | HP_0002651
 VAGINAL_RHYTHM | HP_0002612
 END_OF_SLEEP | HP_0008684
 SLEEP | HP_0001263
 END_OF_LIFESPAN | HP_0008422
 END_OF_LIFESPAN | HP_0003589
 LIFESPAN | HP_0001264
 END_OF_FACET | HP_0004452
 Facet | HP_0000951
 END_OF_TOOTH | HP_0004452
 TOOTH | HP_0000196
 END_OF_RIB | HP_0004452
 RIB | HP_0000177
 END_OF_SKIN | HP_0000076
 Skin | HP_0001999
 END_OF_JAW | HP_0004452
 Jaw | HP_0000206
 END_OF_TRA | HP_0004452
 TRA | HP_0001291
 END_OF_URIN | HP_0011463
 URD | HP_0008794
 END_OF_URIN | HP_0008897
 END_OF_URIN | HP_0008898
 URD | HP_0008794
 END_OF_URIN | HP_0008897
 END_OF_URIN | HP_0008898
 END_OF_URIN | HP_0008899
 END_OF_URIN | HP_0008900
 END

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 6 years old girl who was born via NSVD to a healthy 20 years old primigravida following a pregnancy that was complicated by threatened abortion. The antenatal and perinatal histories were unremarkable, and she was discharged on the second day. The first abnormality was noted at 14 months of age when she had tip-toeing and extensive investigations at that time were normal. She had two seizures at 26 and 28 months of age, and she has been seizure-free since then. She is now off medications. However, the primary concern was noted at the age of three years when she had an expressive language delay. Her cognitive abilities, in general, were reduced. She was evaluated by a psychiatrist and was diagnosed with ADHD and was started on medications. She also had an estimated IQ of 76. Her mother also noticed that she has frequent sleep panic attacks. Her parents are nonconsanguineous with another mildly affected daughter. Her father also has mild cognitive impairment, and there are multiple fam

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 7 years old girl with a developmental delay, nystagmus and ADHD. She was born at full term via NSVD to a G9P6 SB2 40 years old mother following an uneventful pregnancy. The perinatal and postnatal histories were unremarkable, and she was discharged on the second day. The first concern was a developmental delay that was recognized in late infancy. She walked at 18 months, and she started saying mama and dada at 12 months, but she currently has less than 100 words vocabulary. She is not toilet trained. Her IQ was estimated to be 77. Her parents are consanguineous with a family history of a similar condition in two siblings and four cousins. Her father seems to be mildly affected. Her physical examination showed a thin body built, and other examinations were within normal limits. Her growth parameters at 7 years and 4 months were weight 16.1 kg (<3rd percentile), height 113.4 cm (3rd-10th percentile) and head circumference 53 cm (75th-90th percentile). Brain MRI showed thickened corpus 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


11 years old boy who was born at full term via NSVD following an uneventful pregnancy. His birth growth parameters were weight 2.75 kg and head circumference 31.5 cm. He has a global developmental delay, microcephaly and seizures. Parents are first-degree cousins, and they have another affected son. There is a family history of hearing loss on the maternal side. (14DG0648) A 9 years old boy who was born at full term via NSVD following an uneventful pregnancy. Birth weight was 2.5 kg, and he was noted to have microcephaly. He also has a global developmental delay and seizures. 

##

  global developmental delay | HP_0001249
 microcephaly | HP_0000252
 seizures | HP_0001250
 hearing loss | HP_0000365
 hearing loss, on the maternal side | HP_0004363
 hearing loss, on the maternal side | HP_0000110
 global developmental delay | HP_0002276
 ENDORECOMPATIBILITY | HP_0003813
 ENDORECOMPATIBILITIES | HP_0001291
 ENDORECOMPATIBILITIES | HP_0004363
 ENDOCYSTAL CYSTS | HP_0005986
 CYSTS | HP_0002

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


This is a baby girl who was born at 36 weeks gestational age via C-section to primigravid 24 years old healthy mother. She was admitted to NICU for symmetric IUGR and mild respiratory distress. Apgar scores were 8 and 9 at 1 and 5 minutes, respectively, and growth parameters at birth were weight 1.92 kg (<1st percentile), length 48 cm (25th-50th percentile) and OFC 29.5 cm (<1st percentile). On examination, her dysmorphic features include oligodactyly and dysplasia of the digits in the right hand (4 digits), Cutis aplasia in the scalp, posteriorly rotated ears, prominent nose and micrognathia. Her last growth parameters were all below 3rd percentile, and other examinations were within normal limits. Brain MRI was reported to be within normal limits. Skeletal survey showed 5 metacarpals with 3 hypoplastic fingers in the right hand as well as a claw appearance of a part of the little finger. The left hand is unremarkable. 

##

  asymmetric IUGR | HP_0008935
 mild respiratory distress | 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 5 years old girl with swallowing difficulty, recurrent chest infections and developmental delay. She can only say 4 words. Her parents are first-degree cousins, and there is no family history of a similar condition. Her growth parameters at 5 years of age were weight 9.4 kg, height 82.5 cm and head circumference 46 cm, all below the 3rd percentile. Her physical examination was significant for a blue sclera, hypotonia and hyporeflexia. Vision and hearing were normal, and other examinations were within normal limits. The brain MRI was grossly normal with diffuse T-2 hyperintensity in subcortical white matter. 

##

  swallowing difficulty | HP_0001627
 developmental delay | HP_0001263
 hypotonia | HP_0001159
 hyporeflexia | HP_0008586
 3 rd percentile | HP_0004051
 low weight, 9.4 k | HP_0006996
 low height, 82.5 cm | HP_0009373
 low head circumference, 46 cm | HP_0004458
 hypoplastic scul | HP_0001140
 hypoplastic scul | HP_0004452
 brachycephaly | HP_0001155
 brachycephaly | HP_00049

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 6.5 years old boy who was born at full term via NSVD with normal growth parameters at birth. Antenatal and perinatal history was unremarkable, and he was discharged home on the following day. His gross motor development has been appropriate for age. However, he seems to have a delay in fine motor and language domains. He still has immature finger grasp and cannot feed himself or dress himself. Parents are first-degree cousins, and they have two other children with intellectual disability. His last growth parameters at 6 years of age were weight 16.6 kg (10th-25th percentile), height 113 cm (on the 25th percentile) and head circumference 48.4 cm (5th-10th percentile). He has no dysmorphic features apart from bilateral clinodactyly. 

##

  intellectual disability | HP_0001249
 bilateral clinodactyly | HP_0008021
 ENDODERM | HP_0001949
 ENDODERM, Normal Growth | HP_0008522
 ENDOGENOUS FERTILITY | HP_0001249
 ENDOGENOUS FERTILITY (MIDDLE OF GESTATION) | HP_0006821
 ENDOGENOUS FERTILITY 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 23 months old girl who was born via NSVD to G4P3 36 years old mother following an uneventful pregnancy. Apgar scores were 8 and 9 at 1 and 5 minutes, respectively. She seized at the 10th hour of life, and she was admitted to NICU. Comprehensive investigations were carried out, and they were negative. She continues to have break-through seizures despite a trial of multiple antiepileptic drugs. On examination, her growth parameters at 23 months were weight 9.6 kg (10th -25th percentile), height 79.5 cm (10th -25th percentile) and head circumference 51 cm (>98th percentile). Her dysmorphic features are macrocephaly, prominent forehead, downslanting epicanthal folds, anteverted nares, one café au lait spot and significant axial and appendicular hypotonia. Other examinations were within normal limits. 

##

  hyperinsulinaemia | HP_0009589
 ENDORETIN | HP_0004606
 ENDORETIN REACH | HP_0005486
 ENDOCYTOSCOPIC SYNDROME | HP_0002664
 ENDOCYTOSCOPIC SYNDROME | HP_0002665
 SYNDROMES | HP_00013

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 13 years old boy who was born at full term via NSVD following an uneventful pregnancy. He was developmentally normal until two years of age when he started to regress by losing his ability to speak. He sat with support at 4 months, talked by 9 months, and walked at 13 months. He is currently nonexpressive verbally, although he does seem to retain some verbal comprehension. He started to have seizures at 3 years of age. His seizures seem to be a combination of absence seizures and generalized tonic-clonic seizures which can occur even while he is walking and causing him to fall on the floor. He has an older sister who has been through the same course. His past medical history is noteworthy for a herniorrhaphy at age 4 months of age. His physical examination revealed appendicular hypertonia and hyporeflexia. He does not have dysmorphic features, and other examinations were within normal limits. His and his sister’s brain MRI were reported as normal. An experimental PET scan was done fo

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 10 years old with ADHD, autistic features, cold intolerance, sloping forehead, microcephaly, brittle hair, esotropia, failure to thrive, muscle weakness and frequent falls.

##

  hyperactive disorder | HP_0004378
 ENDORECOMPENSABILITY | HP_0004467
 AUTISTIC FACTOR | HP_0001249
 AUTISTIC FACTOR SYNDROME | HP_0001250
 AUTISITIC SYNDROME | HP_0001251
 AUTISITIC ESSURGECT | HP_0004390
 AUTISITIC ESSURGECT SYNDROME | HP_0004087
 ENDORECOMPENSABILITY | HP_0001143
 BRACIOSCOPIC | HP_0005363
 BRACIOSCOPIC SYMMETRICAL | HP_0005361
 Microcephaly | HP_0000252
 BRACIO-CORVOSCOPIC | HP_0001272
 BRACIOSCOPIC SYMMETRICAL | HP_0001270
 SYMMETRICAL | HP_0006411
 ENDORECOMPENSABILITY | HP_0004467
 ENDORECOMPENSABILITY SYMMETRICAL | HP_0004468
 ENDORECOMPENSABILITY | HP_0004378
 ENDORECOMPENSABILITY SYMMETRICAL | HP_0004468
 AUTISITIC | HP_0001263
 ENDORECOMPENSABILITY | HP_0004088
 AUTISITIC ESSURGECT | HP_0004390
 ENDORECOMPENSABILITY | HP_0004381
 Autistic features | HP_0001263
 ENDORECOMPENSABILIT

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 14 months old boy with severe microcephaly, global developmental delay, severe photosensitivity and congenital cataract. Family history is significant of two deceased sisters with a similar phenotype.

##

  severe microcephaly | HP_0000252
 global developmental delay | HP_0001263
 severe photosensitivity | HP_0008675
 cataract | HP_0000518
 congenital | HP_0001267
 END_OF_GENE | HP_0001420
 END_OF_GENE | HP_0004322
 died | HP_0001317
 END_OF_GENE | HP_0004319
 END_OF_GENE | HP_0004321
 END_OF_GENE | HP_0004320
 END_OF_GENE | HP_0002212
 END_OF_GENE | HP_0004318
 END_OF_GENE | HP_0002564
 END_OF_GENE | HP_0004317
 END_OF_GENE | HP_0002565
 END_OF_GENE | HP_0004314
 END_OF_GENE | HP_0002566
 END_OF_GENE | HP_0002567
 END_OF_GENE | HP_0002568
 END_OF_GENE | HP_0002569
 END_OF_GENE | HP_0002570
 END_OF_GENE | HP_0002571
 END_OF_GENE | HP_0002572
 END_OF_GENE | HP_0002573
 END_OF_GENE | HP_0002574
 END_OF_GENE | HP_0002575
 END_OF_GENE | HP_0002576
 END_OF_GENE | HP_0002577
 END_OF_GENE 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


An 11 months old girl who was born at full-term via C-section due to failure to progress. Birth weight was 2.6 kg, and she was discharged from the hospital on the following day. She was noted at birth to have a cleft palate. She has had recurrent aspiration pneumonia that began within the second week of life led to several hospital admissions. Developmentally, she has a global developmental delay affecting all domains. She can roll over and hold her milk bottle, but she cannot sit, stand or say any word. Her parents are nonconsanguineous, and other siblings are healthy. On examination, her dysmorphic features include hypertelorism, excessive forehead hair, broad thumbs, three phalanges thumbs and bilateral prominent fingertip pads. Her growth parameters at 11 months of age were weight 5.1 KG and height 64 cm; both were below the 1st percentile. Her neurological assessment showed hyperreflexia and hypotonia, but other examinations were within normal limits. Brain MRI and EEG were report

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 3 years old girl with failure to thrive, short stature, microcephaly, speech delay, anteriorly placed anus and dysmorphic features. She was born preterm with IUGR to consanguineous parents. 

##

  failure to thrive | HP_0001061
 short stature | HP_0004375
 microcephaly | HP_0000252
 speech delay | HP_0001177
 anteriorly placed anus | HP_0002089
 dysmorphic features | HP_0001249
 preterm | HP_0002814
 IUGR | HP_0001425
 consanguineous parents | HP_0006746
 END_CLASSIFICATION
 END_CLASSIFICATION3RICH_BASIC_SYNDROMS | HP_0006322
 preterm | HP_0001425
 IUGR | HP_0001425
 END_CLASSIFICATION2CUTANEOUS_SYMPTOMS | HP_0004711
 dysmorphic features | HP_0001250
 END_CLASSIFICATION1CUTANEOUS_SYMPTOMS | HP_0001263
 END_CLASSIFICATION1CUTANEOUS | HP_0004711
 END_CLASSIFICATION2VAGINAL_CRY | HP_0005134
 END_CLASSIFICATION3RICH_BASIC | HP_0009795
 END_CLASSIFICATION1IUGR | HP_0001425
 END_CLASSIFICATION2CUTANEOUS_SYMPTOMS_VAGINAL_CRY | HP_0005136
 IUGR | HP_0001425
 END_CLASSIFICATION3RICH_BASIC_SY

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 5 years old boy with dysmorphic facial features, hypertelorism, downslanting eyes, small ears, epicanthal folds, short neck, global developmental delay and mild autistic features. The parents are consanguineous, and they have another affected daughter.

##

  dysmorphic facial features | HP_0004467
 hypertelorism | HP_0000209
 downslanting eyes | HP_0000356
 small ears | HP_0000394
 epicanthal folds | HP_0000381
 short neck | HP_0004412
 global developmental delay | HP_0001263
 mild autistic features | HP_0001263
 END_OF_PROJECTION | HP_0004422
 Consanguineous | HP_0001425
 global developmental delay | HP_0001263
 mild autistic features | HP_0001263
 END_OF_PROJECT | HP_0003813
 END_OF_HEAD | HP_0004450
 developmental delay | HP_0001263
 Autistic features | HP_0001263
 END_OF_HEAD | HP_0004449
 Autistic features | HP_0001263
 END_OF_BEHAVIOR | HP_0004429
 Consanguineous | HP_0001425
 END_OF_BEHAVIOR | HP_0004428
 Autistic features | HP_0001263
 END_OF_BEHAVIOR | HP_0004427
 Autistic 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A 17 years old girl with neuroregression that started at 4 years of age. She had been developmentally normal until the age of 4 years when she started to gradually lose her ability to walk and talk, and she is currently wheelchair-bound. She communicates through nodding her head. She also has developed seizures, but they have been under control. The parents are first-degree cousins, and they have three other affected children and three deceased children who died for an unknown reason. On examination, her growth parameters were weight 32 kg (<3rd percentile) and head circumference 49 cm(<3rd percentile). She has an ulnar deviation of the hands. She also has hyperreflexia of the extremities and strabismus. MRI showed mild leukodystrophy. Skin biopsy showed inclusion bodies. 

##

  neuroregression | HP_0001311
 developmentally normal | HP_0004322
 gradually lost her ability to walk and talk | HP_000ID35
 loss of her ability to walk and talk | HP_0004711
 gradually lost her ability to wal