
# GPTRec Implementation

Following up on my previous post on the [GPTRec paper](https://andrewboney.github.io/andrew_boney_blog/posts/GPTRec%20Paper%20Read/), I want to implement a GPTRec-style system.

The original author has existing implementations across a couple of repos: [gptrec_rl](https://github.com/asash/gptrec_rl) and [bert4rec_repro](https://github.com/asash/bert4rec_repro). However, these are broader in scope than I need, and are implemented in TensorFlow (boo!!!).

Instead, I'll be implementing this using a framework I'm building: [rec](https://github.com/AndrewBoney/rec). This is a work in progress aiming to provide an end-to-end implementation of the kinds of recommendation systems used in industry—encompassing the whole lifecycle from data prep to deployment. While there's still plenty to add, the baseline should now be robust enough to work at scale. I'll likely use this framework in future posts too.

In this implementation, I'm going to focus on the GPTRec architecture - ignoring the sub-item tokenisation and Next-K prediction aspects of the paper.

## Get repo

In [None]:
# Clone only if the repo isn't already present, then pin to the commit used in this post
![ -d rec ] || git clone https://github.com/AndrewBoney/rec.git
!cd rec && git checkout 7a6475f


In [None]:
import sys
sys.path.insert(0, 'rec')

# Data

First, we'll use the [rec](https://github.com/AndrewBoney/rec) framework to prepare the data. 

## Generate data

I'll use the [MovieLens 1M dataset](https://grouplens.org/datasets/movielens/1m/)—a classic benchmark in recommendation systems research containing 1 million ratings from 6,000 users on 4,000 movies. The `rec` framework includes a [data preparation module](https://github.com/AndrewBoney/rec/blob/main/rec/data_prep/movielens.py) that downloads and processes this dataset into a consistent format for training.

In [None]:
import pandas as pd

import os

from rec.data_prep.movielens import generate_movielens_1m
from rec.common.data import DataPaths

root_folder = "../../assets/movielens_rec_data"

generate_movielens_1m(output_dir = root_folder)

In [None]:
paths = DataPaths(
    users_path = os.path.join(root_folder, "prepared", "users.parquet"),
    items_path = os.path.join(root_folder, "prepared", "items.parquet"),
    interactions_train_path = os.path.join(root_folder, "prepared", "interactions_train.parquet"),
    interactions_val_path = os.path.join(root_folder, "prepared", "interactions_val.parquet")
)

In [None]:
users = pd.read_parquet(paths.users_path)
print(users.head())

  user_id   age gender age_group occupation    zip zip_prefix
0       1   1.0      F   age_1.0     occ_10  48067        480
1       2  56.0      M  age_56.0     occ_16  70072        700
2       3  25.0      M  age_25.0     occ_15  55117        551
3       4  45.0      M  age_45.0      occ_7  02460        024
4       5  25.0      M  age_25.0     occ_20  55455        554


In [None]:
items = pd.read_parquet(paths.items_path)
print(items.head())

  item_id                        genres                 genre_grouped  \
0       1   Animation|Children's|Comedy                         other   
1       2  Adventure|Children's|Fantasy  Adventure|Children's|Fantasy   
2       3                Comedy|Romance                Comedy|Romance   
3       4                  Comedy|Drama                  Comedy|Drama   
4       5                        Comedy                        Comedy   

                                title                    title_raw  year  \
0                    Toy Story (1995)                    Toy Story  1995   
1                      Jumanji (1995)                      Jumanji  1995   
2             Grumpier Old Men (1995)             Grumpier Old Men  1995   
3            Waiting to Exhale (1995)            Waiting to Exhale  1995   
4  Father of the Bride Part II (1995)  Father of the Bride Part II  1995   

  year_bucket  
0  year_1990s  
1  year_1990s  
2  year_1990s  
3  year_1990s  
4  year_1990s  


In [None]:
interactions_train = pd.read_parquet(paths.interactions_train_path)
print(interactions_train.head())

  user_id item_id  rating   timestamp
0       1    1193       5  2000-12-31
1       1     661       3  2000-12-31
2       1     914       3  2000-12-31
3       1    3408       4  2000-12-31
4       1    2355       5  2001-01-06


In [None]:
interactions_val = pd.read_parquet(paths.interactions_val_path)
print(interactions_val.head())

  user_id item_id  rating   timestamp
0      36    1266       5  2002-12-22
1      36    2713       1  2002-12-22
2      36     595       4  2002-12-22
3      36     247       4  2002-12-22
4      36    1295       4  2002-12-22


## Define Feature Config

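`FeatureConfig` tells the framework which columns hold the user IDs, item IDs and interaction timestamps, and which columns should be treated as categorical features. This GPTRec implementation only uses the item ID sequence, so the user and item categorical feature lists are left empty.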

In [None]:
from rec.common.data import FeatureConfig

feature_config = FeatureConfig(
    user_id_col = "user_id", 
    item_id_col = "item_id",
    user_cat_cols = [],
    item_cat_cols = [], 
    interaction_user_col = "user_id",
    interaction_item_col = "item_id", 
    interaction_time_col = "timestamp"
)

## Build Encoders

Before training, we need to convert raw IDs (like user `1` or item `1193` in the tables above) into integer indices that can be used for embedding lookups, much like tokenization in NLP. The `build_encoders` function creates a `CategoryEncoder` for each categorical column, mapping each unique value to an integer index while reserving 0 for unknown values.
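To make the idea concrete, here's a minimal sketch of what an encoder like this does. `ToyCategoryEncoder` is a hypothetical stand-in written for illustration, not the `rec` framework's `CategoryEncoder`; it only assumes the behaviour described above (0 reserved for unknowns, known values mapped to 1..N). Assigning indices in first-seen order is my own choice; the real encoder may order values differently.

```python
from typing import Dict, Iterable, List


class ToyCategoryEncoder:
    """Hypothetical stand-in for an ID encoder: maps each known value to 1..N,
    and anything unseen to 0 (the reserved 'unknown' index)."""

    UNKNOWN = 0

    def __init__(self) -> None:
        self.mapping: Dict[str, int] = {}

    def fit(self, values: Iterable[str]) -> "ToyCategoryEncoder":
        for value in values:
            if value not in self.mapping:
                # Next free index; starts at 1 because 0 is reserved for unknowns
                self.mapping[value] = len(self.mapping) + 1
        return self

    def transform(self, values: Iterable[str]) -> List[int]:
        # Unseen values fall back to the unknown index
        return [self.mapping.get(value, self.UNKNOWN) for value in values]

    @property
    def cardinality(self) -> int:
        # Number of known values plus one row for the unknown index
        return len(self.mapping) + 1


encoder = ToyCategoryEncoder().fit(["1193", "661", "914", "661"])
print(encoder.transform(["661", "914", "9999"]))  # [2, 3, 0]
print(encoder.cardinality)  # 4
```

The `cardinality` property follows the same "unique values plus one for unknown" rule used when sizing the embedding tables.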

In [None]:
from rec.common.data import build_encoders

user_encoders, item_encoders = build_encoders(
    users_path = paths.users_path,
    items_path = paths.items_path,
    interactions_path = paths.interactions_train_path,
    feature_cfg = feature_config
)

In [None]:
user_encoders, item_encoders

({'user_id': <rec.common.data.CategoryEncoder at 0x76e62cbc0f80>},
 {'item_id': <rec.common.data.CategoryEncoder at 0x76e62cbc11c0>})

## Build Cardinalities

We also need the feature cardinalities, i.e. the number of unique values for each categorical feature. These determine the number of rows in each embedding table.

In [None]:
from rec.common.train import build_cardinalities

user_cardinalities = build_cardinalities(user_encoders, [feature_config.user_id_col])
item_cardinalities = build_cardinalities(item_encoders, [feature_config.item_id_col])

In [None]:
user_cardinalities, item_cardinalities

({'user_id': 6041}, {'item_id': 3884})

In this case we only need `user_id` and `item_id`. Note that the cardinality is the number of unique values, plus one for the unknown value.

## Build User / Item Map

Sequential recommendation models like GPTRec learn from the *order* in which users interact with items—predicting the next item based on the sequence of previous ones. To train such models, we need to transform our flat interaction table into an ordered mapping: for each user, a chronologically sorted list of item IDs.

While the `rec` framework includes a [`build_user_item_map`](https://github.com/AndrewBoney/rec/blob/7a6475f946ebb6f9285572ef6b2d53dc5a05c38f/rec/common/train.py#L118) function, this was designed for non-sequential models where interaction order doesn't matter—it simply collects the set of items each user has interacted with. For GPTRec, we need a modified version that preserves temporal ordering by sorting on the timestamp column. I'll likely integrate this into the framework in a future update.

In [None]:
from typing import Dict, List

from rec.common.io import read_parquet_batches
from rec.common.data import FeatureConfig, CategoryEncoder

def build_user_item_map_ordered(
    interactions_path: str,
    feature_cfg: FeatureConfig,
    user_encoders: Dict[str, CategoryEncoder],
    item_encoders: Dict[str, CategoryEncoder],
    chunksize: int = 200_000,
) -> Dict[int, List[int]]:
    """Build user->items map ordered by timestamp (ascending)."""
    user_to_items: Dict[int, List[tuple]] = {}  # uid -> [(timestamp, item_id), ...]
    
    for chunk in read_parquet_batches(interactions_path, chunksize):
        user_ids = user_encoders[feature_cfg.user_id_col].transform(
            chunk[feature_cfg.interaction_user_col].astype(str).tolist()
        )
        item_ids = item_encoders[feature_cfg.item_id_col].transform(
            chunk[feature_cfg.interaction_item_col].astype(str).tolist()
        )
        timestamps = chunk[feature_cfg.interaction_time_col].tolist()
        
        for uid, iid, ts in zip(user_ids, item_ids, timestamps):
            uid, iid = int(uid), int(iid)
            if uid not in user_to_items:
                user_to_items[uid] = []
            user_to_items[uid].append((ts, iid))
    
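    # Sort by timestamp and keep just the item ids. Tuples compare element-wise, so
    # interactions with identical timestamps (these look day-level) are ordered by item id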
    return {
        uid: [iid for _, iid in sorted(items)]
        for uid, items in user_to_items.items()
    }

train_user_item_map = build_user_item_map_ordered(
    paths.interactions_train_path,
    feature_config,
    user_encoders,
    item_encoders,
)

val_user_item_map = build_user_item_map_ordered(
    paths.interactions_val_path,
    feature_config,
    user_encoders,
    item_encoders,
)

In [None]:
print(train_user_item_map[36])

[1, 11, 21, 30, 32, 34, 47, 143, 171, 177, 195, 221, 230, 231, 244, 254, 314, 326, 353, 377, 438, 446, 477, 497, 521, 548, 584, 585, 586, 594, 643, 700, 771, 776, 842, 1064, 1066, 1082, 1112, 1120, 1121, 1157, 1173, 1225, 1230, 1231, 1240, 1251, 1255, 1259, 1266, 1278, 1301, 1336, 1352, 1373, 1375, 1376, 1383, 1425, 1449, 1456, 1483, 1492, 1506, 1530, 1534, 1540, 1544, 1596, 1608, 1618, 1631, 1696, 1761, 1764, 1807, 1808, 1814, 1839, 1841, 1855, 1895, 1900, 1932, 1944, 1955, 1997, 2010, 2013, 2026, 2032, 2040, 2069, 2077, 2087, 2175, 2177, 2180, 2182, 2226, 2244, 2253, 2284, 2287, 2303, 2327, 2338, 2339, 2350, 2356, 2364, 2365, 2401, 2422, 2434, 2473, 2503, 2512, 2531, 2537, 2560, 2603, 2615, 2619, 2625, 2631, 2632, 2634, 2638, 2648, 2654, 2678, 2693, 2694, 2695, 2723, 2729, 2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004, 3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233, 3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555

In [None]:
print(val_user_item_map[36])

[245, 294, 592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571, 3718]


In a future iteration, I may extend this to include timestamps in the output mapping. This would enable time-based positional embeddings—encoding *when* interactions occurred rather than just their relative order. For now, I'll keep things simple with index-based positional embeddings.

## Build Feature Store

The [`FeatureStore`](https://github.com/AndrewBoney/rec/blob/main/rec/common/data.py#L174) class provides efficient feature lookup for users and items during training and inference. Rather than repeatedly encoding features on-the-fly, it pre-encodes all user and item features into tensors at initialization—storing them in memory for fast indexed access.

Key functionality:
- **Pre-encoded tensors**: All categorical and dense features are encoded once and stored as PyTorch tensors with zero-padding at index 0 (for unknown/missing values)
- **Index mappings**: Maintains `user_index` and `item_index` dictionaries that map encoded IDs to their row positions in the feature tensors
- **Batch lookups**: `get_user_features()` and `get_item_features()` retrieve all features for a batch of IDs in a single operation
- **Item catalog access**: `get_all_item_features()` and `get_all_item_ids()` provide full item catalog access—useful for scoring all items during inference
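As a rough illustration of the pre-encoded, index-based lookup described above, here's a toy version in plain Python. `ToyFeatureStore` and its method are hypothetical and much simpler than the real `FeatureStore` (plain lists instead of tensors, items only). It just shows the pattern: row 0 is an all-zero padding row, a dict maps encoded IDs to row positions, and a batch lookup is a single pass over the requested IDs.

```python
from typing import Dict, List, Sequence


class ToyFeatureStore:
    """Hypothetical sketch: pre-encoded feature rows with a zero padding row at index 0."""

    def __init__(self, item_ids: Sequence[int], item_features: Sequence[List[int]]) -> None:
        n_features = len(item_features[0]) if item_features else 0
        # Row 0 is reserved for unknown/missing items
        self.rows: List[List[int]] = [[0] * n_features]
        self.item_index: Dict[int, int] = {}
        for item_id, feats in zip(item_ids, item_features):
            # Map the encoded item id to its row position, then store the row
            self.item_index[item_id] = len(self.rows)
            self.rows.append(list(feats))

    def get_item_features(self, item_ids: Sequence[int]) -> List[List[int]]:
        # Unknown ids fall back to row 0 (the zero padding row)
        return [self.rows[self.item_index.get(i, 0)] for i in item_ids]


store = ToyFeatureStore(item_ids=[10, 20], item_features=[[3, 1], [5, 2]])
print(store.get_item_features([20, 99, 10]))  # [[5, 2], [0, 0], [3, 1]]
```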

In [None]:
from rec.common.data import FeatureStore

fs = FeatureStore(
    user_df=users,
    item_df=items,
    user_encoders=user_encoders,
    item_encoders=item_encoders,
    feature_cfg = feature_config
)

print(dir(fs))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'feature_cfg', 'get_all_item_features', 'get_all_item_ids', 'get_item_features', 'get_user_features', 'item_encoders', 'item_features', 'item_id_tensor', 'item_index', 'map_item_ids_to_indices', 'user_encoders', 'user_features', 'user_index']


## Build Dataset

For this I want a dataset that generates a padded sequence of items for each user. 

First, let's work out a sensible maximum sequence length from the distribution of history lengths.

In [None]:
import numpy as np

lens = {k : len(v) for k, v in train_user_item_map.items()}

print("Max len:", max(lens.values()))
print("Avg len:", np.mean(list(lens.values())))
print("Std len:", np.std(list(lens.values())))

Max len: 2314
Avg len: 163.83573439311144
Std len: 190.44394716254214


In [None]:
max_len = 200

print(f"Pct > {max_len}:", round(len([None for l in lens.values() if l > max_len]) / len(lens) * 100, 4) , "%")

Pct > 200: 25.7824 %


With a max sequence length of 200, we keep the full history for roughly 74% of users. Truncating the longest histories loses some information, but it keeps training feasible in a compute-limited environment.
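To put rough numbers on the trade-off: `coverage` (a hypothetical helper) computes the share of users whose full history fits under a cap, and `attention_cost_ratio` compares the size of the seq_len × seq_len attention score matrix at two lengths, since self-attention cost grows quadratically with sequence length. The history lengths below are made up for illustration; 2,314 and 200 are the longest history and the chosen cap from above.

```python
from typing import Sequence


def coverage(lengths: Sequence[int], max_len: int) -> float:
    """Fraction of users whose full history fits within max_len items."""
    return sum(length <= max_len for length in lengths) / len(lengths)


def attention_cost_ratio(long_len: int, short_len: int) -> float:
    # Self-attention builds a (seq_len x seq_len) score matrix per head,
    # so its size grows with the square of the sequence length
    return (long_len ** 2) / (short_len ** 2)


toy_lengths = [20, 35, 60, 150, 180, 210, 400, 2314]  # hypothetical history lengths
print(coverage(toy_lengths, 200))  # 0.625
print(round(attention_cost_ratio(2314, 200), 1))  # 133.9
```

On these numbers, capping at 200 rather than 2,314 shrinks each attention score matrix by roughly 134×.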

Now let's build the datasets. For sequential recommendation, we need two different dataset types:

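1. **Training dataset**: Uses shifted input and label sequences, the same teacher-forcing setup used to train language models. Given a sequence of items `[A, B, C, D]`, the input is `[A, B, C]` and the labels are `[B, C, D]`, so with causal attention each position learns to predict the next item from everything before it: `A→B`, `[A,B]→C`, `[A,B,C]→D`. All of these predictions come from a single forward pass. Only each user's most recent `max_len + 1` interactions are used.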

2. **Evaluation dataset**: Uses the full training history as context and held-out validation items as targets. This mirrors the real inference scenario: given everything we know about a user's past behavior, can we predict what they'll interact with next?

Both datasets use left-padding (padding at the start of sequences), so the most recent item always sits at the final position. This works naturally with causal attention: the output at the last position has seen the whole history, so next-item scores can be read directly from `logits[:, -1, :]` at evaluation time.
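Here's a plain-Python sketch of the shift-and-left-pad step for the `[A, B, C, D]` example, using hypothetical item IDs 11 to 14 and a max length of 5. `shift_and_left_pad` is a hypothetical helper that follows the same logic as `SequentialTrainDataset.__getitem__`, minus the NumPy and PyTorch conversions.

```python
from typing import List, Tuple

PAD = 0


def shift_and_left_pad(items: List[int], max_length: int) -> Tuple[List[int], List[int], List[int]]:
    """Return (input_ids, labels, attention_mask) for next-item prediction."""
    # Keep the most recent max_length + 1 items to get max_length (input, label) pairs
    items = items[-(max_length + 1):]
    inputs, labels = items[:-1], items[1:]
    pad_len = max_length - len(inputs)
    return (
        [PAD] * pad_len + inputs,           # left-padded inputs
        [PAD] * pad_len + labels,           # labels shifted one step ahead
        [0] * pad_len + [1] * len(inputs),  # 1 = real item, 0 = padding
    )


A, B, C, D = 11, 12, 13, 14
toy_inputs, toy_labels, toy_mask = shift_and_left_pad([A, B, C, D], max_length=5)
print(toy_inputs)  # [0, 0, 11, 12, 13]
print(toy_labels)  # [0, 0, 12, 13, 14]
print(toy_mask)    # [0, 0, 1, 1, 1]
```

Reading position by position, the model sees `[A]` and should predict `B`, then `[A, B]` → `C`, then `[A, B, C]` → `D`; the padded positions carry label 0 and are ignored by the loss.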

In [None]:
import torch

from torch.utils.data import Dataset, DataLoader

class SequentialTrainDataset(Dataset):
    """Training dataset: generates sequences for next-item prediction."""
    PAD_TOKEN = 0
    
    def __init__(
        self,
        user_item_map: Dict[int, List[int]],
        max_length: int = 50,
        min_length: int = 2,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.user_item_map = user_item_map
        self.user_ids = [
            uid for uid, items in user_item_map.items() 
            if len(items) >= min_length
        ]
    
    def __len__(self) -> int:
        return len(self.user_ids)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        items = self.user_item_map[user_id]
        
        # Need max_length + 1 items to get max_length input/target pairs
        if len(items) > self.max_length + 1:
            items = items[-(self.max_length + 1):]
        
        # input/target shifted by 1
        input_items = items[:-1]
        labels = items[1:]
        actual_len = len(input_items)
        
        # Left-pad to max_length
        pad_len = self.max_length - actual_len
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        label_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = input_items
        label_seq[pad_len:] = labels
        
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "labels": torch.from_numpy(label_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
        }


class SequentialEvalDataset(Dataset):
    """
    Eval dataset for retrieval metrics.
    
    Returns user's training history as context, and val items as targets.
    Compatible with evaluate_retrieval pattern - model produces scores,
    we compare top-k against val items.
    """
    PAD_TOKEN = 0
    
    def __init__(
        self,
        train_user_item_map: Dict[int, List[int]],
        val_user_item_map: Dict[int, List[int]],
        max_length: int = 50,
    ) -> None:
        super().__init__()
        self.max_length = max_length
        self.train_map = train_user_item_map
        self.val_map = val_user_item_map
        
        # Users with val items AND some training history
        self.user_ids = [
            uid for uid in val_user_item_map
            if len(val_user_item_map[uid]) >= 1 and uid in train_user_item_map
        ]
    
    def __len__(self) -> int:
        return len(self.user_ids)
    
    def __getitem__(self, idx: int) -> Dict[str, torch.Tensor]:
        user_id = self.user_ids[idx]
        context_items = self.train_map.get(user_id, [])
        target_items = self.val_map[user_id]
        
        # Truncate context to max_length (keep most recent)
        if len(context_items) > self.max_length:
            context_items = context_items[-self.max_length:]

        actual_len = len(context_items)
        pad_len = self.max_length - actual_len
        
        input_seq = np.full(self.max_length, self.PAD_TOKEN, dtype=np.int64)
        input_seq[pad_len:] = context_items
        
        attention_mask = np.zeros(self.max_length, dtype=np.float32)
        attention_mask[pad_len:] = 1.0
        
        return {
            "user_id": torch.tensor(user_id, dtype=torch.long),
            "input_ids": torch.from_numpy(input_seq),
            "attention_mask": torch.from_numpy(attention_mask),
            "seq_length": torch.tensor(actual_len, dtype=torch.long),
            # Targets for metric computation (variable length)
            "target_items": torch.tensor(target_items, dtype=torch.long),
        }

def collate_eval_batches(batch):
    return {
        "user_id" : torch.stack([ex["user_id"] for ex in batch]),
        "input_ids": torch.stack([ex["input_ids"] for ex in batch]),
        "attention_mask": torch.stack([ex["attention_mask"] for ex in batch]),
        "seq_length": torch.stack([ex["seq_length"] for ex in batch]),
        "target_items": [ex["target_items"] for ex in batch],  # keep as list of tensors for ragged
    }

A few implementation details worth noting:

- **PAD_TOKEN = 0**: Index 0 is free because our encoders shift every real ID up by one (reserving 0 for unknown values), so no real item ever maps to 0 and we can reuse it as the padding token
- **Minimum length filtering**: Training requires at least 2 items (one for input, one for target), so we filter out users with singleton interactions
- **Variable-length targets**: The eval dataset keeps targets as a list of tensors rather than padding them, since different users have different numbers of validation interactions. The custom `collate_eval_batches` function handles this ragged structure.

In [None]:
# Training
train_dataset = SequentialTrainDataset(train_user_item_map, max_length=max_len, min_length=2)

# Evaluation - pass train history as context
val_dataset = SequentialEvalDataset(
    train_user_item_map,    
    val_user_item_map, 
    max_length=max_len
)

In [None]:
# Check a train sample
train_dataset[0]

{'user_id': tensor(1),
 'input_ids': tensor([   0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
            0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0

In [None]:
# Check a val sample
val_dataset[0]

{'user_id': tensor(36),
 'input_ids': tensor([2790, 2847, 2850, 2891, 2905, 2919, 2967, 2969, 2971, 2984, 2992, 3004,
         3013, 3019, 3046, 3088, 3107, 3108, 3145, 3185, 3187, 3195, 3230, 3233,
         3240, 3290, 3293, 3356, 3436, 3437, 3457, 3458, 3484, 3510, 3523, 3555,
         3603, 3629, 3644, 3677, 3685, 3717, 3725, 3752, 3767, 3794, 3883,    6,
           10,   16,  109,  110,  160,  163,  164,  227,  233,  346,  374,  451,
          454,  463,  471,  524,  590,  605,  642,  725,  779,  856,  901,  913,
          958, 1063, 1065, 1075, 1077, 1079, 1179, 1180, 1183, 1191, 1193, 1197,
         1210, 1215, 1221, 1246, 1253, 1257, 1280, 1288, 1338, 1350, 1386, 1452,
         1546, 1563, 1569, 1575, 1576, 1600, 1629, 1674, 1743, 1852, 1892, 1960,
         1990, 1996, 2050, 2210, 2220, 2266, 2285, 2323, 2325, 2472, 2696, 2747,
         2803, 2815, 2822, 2848, 2899, 2904, 2998, 3033, 3079, 3128, 3188, 3373,
         3380, 3444, 3459, 3483, 3638, 3659, 3695, 1843, 3340,  258,  29

In [None]:
batch_size = 16

train_dl = DataLoader(train_dataset, batch_size = batch_size, shuffle = True)
val_dl = DataLoader(val_dataset, batch_size = batch_size, collate_fn = collate_eval_batches)

In [None]:
train_batch = next(iter(train_dl))
train_batch

{'user_id': tensor([2993, 2311, 5298, 2819, 2634,  144,  230, 4548, 4676,  754, 5792, 3857,
         2707,    3, 2189,   87]),
 'input_ids': tensor([[   0,    0,    0,  ..., 3687, 3741, 3772],
         [   0,    0,    0,  ..., 3257, 3340, 3382],
         [   0,    0,    0,  ..., 3570, 3633, 3634],
         ...,
         [   0,    0,    0,  ..., 3484, 3551, 3603],
         [   0,    0,    0,  ..., 3547, 3686, 3695],
         [   0,    0,    0,  ..., 3604, 3725, 1198]]),
 'labels': tensor([[   0,    0,    0,  ..., 3741, 3772, 3800],
         [   0,    0,    0,  ..., 3340, 3382, 3442],
         [   0,    0,    0,  ..., 3633, 3634, 3635],
         ...,
         [   0,    0,    0,  ..., 3551, 3603, 3799],
         [   0,    0,    0,  ..., 3686, 3695, 3725],
         [   0,    0,    0,  ..., 3725, 1198, 1569]]),
 'attention_mask': tensor([[0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         ...,
         [0., 0., 0.,  ...,

In [None]:
val_batch = next(iter(val_dl))
val_batch

{'user_id': tensor([ 36,  59,  65, 102, 131, 146, 157, 164, 169, 184, 192, 193, 195, 229,
         231, 237]),
 'input_ids': tensor([[2790, 2847, 2850,  ..., 3291, 3758, 2706],
         [   0,    0,    0,  ..., 3743, 3747, 3841],
         [   0,    0,    0,  ..., 1193, 2560, 3752],
         ...,
         [   0,    0,    0,  ..., 3687, 3725, 3847],
         [   0,    0,    0,  ..., 2848, 2917, 3459],
         [   0,    0,    0,  ..., 1637,  354, 2041]]),
 'attention_mask': tensor([[1., 1., 1.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         ...,
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.],
         [0., 0., 0.,  ..., 1., 1., 1.]]),
 'seq_length': tensor([200,  93, 119,  32, 200, 200, 200,  25, 200,  28, 200, 177, 200,  83,
          48, 161]),
 'target_items': [tensor([ 245,  294,  592, 1247, 1276, 1654, 2201, 2301, 2626, 2645, 3106, 3571,
          3718]),
  tensor([  17,   25,   32,   58

In [None]:
val_batch["input_ids"].shape

torch.Size([16, 200])

In [None]:
train_batch["input_ids"].shape

torch.Size([16, 200])

# Model

## Model Architecture

At its core, GPTRec applies the same autoregressive language modeling approach that powers GPT to sequential recommendation. Just as GPT predicts the next word given previous words, GPTRec predicts the next item given a user's interaction history.

The architecture follows a familiar transformer pattern:

1. **Item Embeddings**: Each item gets a learned embedding vector. We also add positional embeddings so the model knows where each item appears in the sequence.

2. **Causal Transformer**: The key ingredient. Unlike BERT-style models that can look at the full sequence bidirectionally, we use causal (autoregressive) masking—each position can only attend to *earlier* positions. This matches our inference scenario: predict what comes next based only on what we've seen so far.

3. **Weight Tying**: The output projection layer shares weights with the item embedding layer. This is a common trick in language models that reduces parameters and often improves performance—the intuition being that the "meaning" of an item should be consistent whether we're encoding it as input or predicting it as output.
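To see what the causal mask and the padding mask do together, here's a plain-Python sketch. `causal_mask` reproduces the pattern built by `_generate_causal_mask` below (`torch.triu(..., diagonal=1)` with the ones replaced by -inf, so 0.0 = may attend and -inf = blocked), and `visible_keys` is a hypothetical helper that combines it with a left-padded attention mask to list the positions each query can attend to.

```python
from typing import List

NEG_INF = float("-inf")


def causal_mask(seq_len: int) -> List[List[float]]:
    """Additive mask: entry (i, j) is -inf when key j comes after query i, else 0.0."""
    return [[NEG_INF if j > i else 0.0 for j in range(seq_len)] for i in range(seq_len)]


def visible_keys(attention_mask: List[int]) -> List[List[int]]:
    """Hypothetical helper: for each query position, list the key positions it may
    attend to under the causal mask plus key padding (1 = real item, 0 = padding)."""
    n = len(attention_mask)
    mask = causal_mask(n)
    return [
        [j for j in range(n) if mask[i][j] == 0.0 and attention_mask[j] == 1]
        for i in range(n)
    ]


# A left-padded sequence: two padding slots, then three real items
print(visible_keys([0, 0, 1, 1, 1]))  # [[], [], [2], [2, 3], [2, 3, 4]]
```

Each real position only sees itself and earlier real items, and the last position sees the whole unpadded history. The padded query positions can't see any real items, but their labels are padding too, so the loss never uses their outputs.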

The `compute_loss` method implements standard cross-entropy loss with `ignore_index=0` to skip padding tokens—we only want to learn from real predictions, not from predicting padding.
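The effect of `ignore_index=0` is that padded positions drop out of both the sum and the average. Here's a plain-Python sketch of mean cross-entropy with an ignore index, following the definition used by `nn.functional.cross_entropy` with its default mean reduction (the total is divided by the number of non-ignored targets). `cross_entropy_ignore` is a hypothetical helper for illustration.

```python
import math
from typing import Sequence


def cross_entropy_ignore(
    logits: Sequence[Sequence[float]], labels: Sequence[int], ignore_index: int = 0
) -> float:
    """Mean negative log-likelihood over positions whose label != ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padded position: excluded from both the sum and the count
        # Numerically stable log-sum-exp
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # With every position ignored the mean is 0/0, i.e. undefined
    return total / count if count else float("nan")


# Three positions over a 3-item vocabulary; the first is padding (label 0)
toy_logits = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_labels = [0, 1, 2]
print(round(cross_entropy_ignore(toy_logits, toy_labels), 4))  # 1.0986, i.e. ln(3)
```

The padded first row would have contributed a very different value, but it is skipped entirely, so the result is just the average over the two real positions.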

One notable simplification: since I'm skipping the paper's SVD-based sub-item tokenisation, every item simply gets its own randomly initialised embedding. For a catalogue of this size (fewer than 4,000 items), a full per-item embedding table is cheap, so this should work fine.

In [None]:
import torch
import torch.nn as nn
import math

class GPTRecModel(nn.Module):
    """
    GPTRec: GPT-style autoregressive transformer for sequential recommendation.
    
    Uses causal masking so each position only attends to previous positions,
    enabling next-item prediction.
    """
    
    def __init__(
        self,
        n_items: int,
        d_model: int = 64,
        n_heads: int = 2,
        n_layers: int = 2,
        d_ff: int = 256,
        max_seq_len: int = 50,
        dropout: float = 0.1,
        pad_token: int = 0,
    ):
        super().__init__()
        self.pad_token = pad_token
        self.d_model = d_model
        
        # Item embedding (+1 for padding token at index 0)
        self.item_embedding = nn.Embedding(n_items + 1, d_model, padding_idx=pad_token)
        self.pos_embedding = nn.Embedding(max_seq_len, d_model)
        
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)
        
        # Transformer encoder with causal masking
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model,
            nhead=n_heads,
            dim_feedforward=d_ff,
            dropout=dropout,
            activation='gelu',
            batch_first=True,
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        
        # Output projection to item scores
        self.output_layer = nn.Linear(d_model, n_items + 1)
        self.output_layer.weight = self.item_embedding.weight  # tie weights

    def _generate_causal_mask(self, seq_len: int, device: torch.device) -> torch.Tensor:
        """Generate causal mask: positions can only attend to earlier positions."""
        mask = torch.triu(torch.ones(seq_len, seq_len, device=device), diagonal=1)
        mask = mask.masked_fill(mask == 1, float('-inf'))
        return mask
    
    def forward(
        self,
        input_ids: torch.Tensor,           # (batch, seq_len)
        attention_mask: torch.Tensor = None,  # (batch, seq_len) - 1 for real, 0 for pad
    ) -> torch.Tensor:
        batch_size, seq_len = input_ids.shape
        device = input_ids.device
        
        # Embeddings
        positions = torch.arange(seq_len, device=device).unsqueeze(0).expand(batch_size, -1)
        x = self.item_embedding(input_ids) + self.pos_embedding(positions)
        x = self.dropout(self.layer_norm(x))
        
        # Causal mask
        causal_mask = self._generate_causal_mask(seq_len, device)
        
        # Padding mask (convert to float: 0.0 = attend, -inf = ignore)
        src_key_padding_mask = torch.where(attention_mask == 1, 0.0, float('-inf'))

        # Transformer
        x = self.transformer(
            x,
            mask=causal_mask,
            src_key_padding_mask=src_key_padding_mask,
        )
        
        # Project to item logits
        logits = self.output_layer(x)  # (batch, seq_len, n_items+1)
        return logits
    
    def compute_loss(self, batch, ignore_index=0):
        """Cross-entropy loss for next-item prediction, ignoring padding."""
        logits = self(batch['input_ids'], batch['attention_mask'])  # (B, seq_len, n_items+1)
        
        # Reshape for cross-entropy: (B*seq_len, n_items+1) vs (B*seq_len,)
        logits_flat = logits.view(-1, logits.size(-1))
        labels_flat = batch['labels'].view(-1)
        
        loss = nn.functional.cross_entropy(
            logits_flat, 
            labels_flat, 
            ignore_index=ignore_index  # Ignore padding positions
        )
        return loss

In [None]:
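# 3884 = 3883 items + index 0 (unknown/padding). GPTRecModel adds its own +1 for padding,
# so the output vocabulary is 3885 and index 3884 is never a real item.
n_items = item_cardinalities['item_id']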

model = GPTRecModel(
    n_items=n_items,
    d_model=64,
    n_heads=2,
    n_layers=2,
    d_ff=256,
    max_seq_len=max_len,
    dropout=0.2,
)

# Test forward pass
logits = model(val_batch['input_ids'], val_batch['attention_mask'])
print("Logits shape:", logits.shape)  # (batch_size, seq_len, n_items + 1)

Logits shape: torch.Size([16, 200, 3885])


In [None]:
model.compute_loss(train_batch)

tensor(36.3798, grad_fn=<NllLossBackward0>)
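As a sanity check on that starting loss: a model spreading its probability uniformly over all 3,885 outputs would score ln(3885) ≈ 8.26, so the untrained model is doing much worse than uniform. I believe this comes from the initialisation: `nn.Embedding` draws its weights from a standard normal, so with tied weights and LayerNorm-ed (roughly unit-variance) hidden states, each initial logit has a standard deviation of about √d_model = 8 and the model starts out confidently wrong. The arithmetic, as a quick sketch:

```python
import math

vocab_size = 3885  # output size, from the logits shape above
d_model = 64

# Cross-entropy of a prediction that is uniform over the whole vocabulary
uniform_loss = math.log(vocab_size)
# A sum of d_model products of independent zero-mean, unit-variance terms has variance d_model
approx_logit_std = math.sqrt(d_model)

print(round(uniform_loss, 2))  # 8.26
print(approx_logit_std)  # 8.0
```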

In [None]:
sum([p.numel() for p in model.parameters()])

365421
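That total can be reproduced by hand, which also shows how much weight tying saves. The sketch below counts parameters layer by layer from the hyperparameters used above, assuming the standard `nn.TransformerEncoderLayer` structure (packed Q/K/V input projection and an output projection in the attention block, two feed-forward linears, two LayerNorms, all with biases). `gptrec_param_count` is a hypothetical helper, not part of `rec`.

```python
def gptrec_param_count(
    n_rows: int, d_model: int, d_ff: int, n_layers: int, max_seq_len: int, tied: bool = True
) -> int:
    """Hypothetical helper: count GPTRecModel parameters from its hyperparameters."""
    item_emb = n_rows * d_model          # item embedding table
    pos_emb = max_seq_len * d_model      # learned positional embeddings
    input_norm = 2 * d_model             # LayerNorm weight + bias
    # One nn.TransformerEncoderLayer
    attn_in = 3 * d_model * d_model + 3 * d_model  # packed Q/K/V projection + bias
    attn_out = d_model * d_model + d_model         # attention output projection
    ff = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)  # two feed-forward linears
    norms = 2 * (2 * d_model)                      # two LayerNorms
    encoder = n_layers * (attn_in + attn_out + ff + norms)
    out_bias = n_rows                    # the output layer keeps its own bias
    out_weight = 0 if tied else n_rows * d_model  # tied: reuses the embedding matrix
    return item_emb + pos_emb + input_norm + encoder + out_bias + out_weight


# Hyperparameters from the cells above (output vocabulary of 3,885 rows)
tied_total = gptrec_param_count(3885, d_model=64, d_ff=256, n_layers=2, max_seq_len=200)
untied_total = gptrec_param_count(3885, d_model=64, d_ff=256, n_layers=2, max_seq_len=200, tied=False)
print(tied_total)    # 365421
print(untied_total)  # 614061
```

With tying, the 248,640-parameter item embedding doubles as the output projection; untying it would add that many parameters again, taking the model from 365,421 to 614,061.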

## Train / Evaluate

In [None]:
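from typing import Dict, Iterable, List, Sequence

from tqdm.auto import tqdm

# Explicit imports (rather than `import *`) make it clear which helpers come from the library
from rec.retrieval.metrics import (
    _as_list,
    dcg_at_k,
    idcg_at_k,
    mrr,
    precision_at_k,
    recall_at_k,
)

# Rewritten from the library version to also report DCG, for comparison with the GPTRec paper.
# I'll integrate this into the library in a future iteration.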
def aggregate_retrieval_metrics(
    topk_indices: torch.Tensor,
    relevant_indices: Sequence[torch.Tensor],
    ks: Iterable[int],
) -> Dict[str, float]:
    ks_list = _as_list(ks)
    if not ks_list or topk_indices.numel() == 0:
        return {}

    max_k = max(ks_list)
    if topk_indices.size(1) < max_k:
        raise ValueError("topk_indices must have at least max(k) columns")

    totals = {f"recall@{k}": 0.0 for k in ks_list}
    totals.update({f"precision@{k}": 0.0 for k in ks_list})
    totals.update({f"dcg@{k}": 0.0 for k in ks_list})
    totals.update({f"ndcg@{k}": 0.0 for k in ks_list})
    totals["mrr"] = 0.0

    num_users = topk_indices.size(0)
    for idx in range(num_users):
        topk = topk_indices[idx]
        rel = relevant_indices[idx]
        if rel.numel() == 0:
            continue
        hits = torch.isin(topk, rel)

        totals["mrr"] += mrr(hits)
        num_rel = int(rel.numel())
        for k in ks_list:
            totals[f"recall@{k}"] += recall_at_k(hits, num_rel, k)
            totals[f"precision@{k}"] += precision_at_k(hits, k)
            dcg = dcg_at_k(hits, k)
            ideal_dcg = idcg_at_k(num_rel, k)
            ndcg = dcg / ideal_dcg if ideal_dcg > 0 else 0.0
            totals[f"dcg@{k}"] += dcg
            totals[f"ndcg@{k}"] += ndcg

    if num_users == 0:
        return {}
    return {k: v / float(num_users) for k, v in totals.items()}

def evaluate_gptrec(
    model: GPTRecModel,
    val_dataloader: DataLoader,
    train_user_item_map: Dict[int, List[int]],
    n_items: int,
    ks: List[int] = [5, 10, 20],
    device: torch.device = None,
) -> Dict[str, float]:
    """
    Evaluate a GPTRec model on top-k retrieval metrics.

    For each user we:
    1. Take the logits at the final sequence position (next-item prediction).
    2. Mask out items the user already interacted with in training, plus the padding token.
    3. Take the top max(ks) items.
    4. Score that ranked list against the user's held-out validation items.
    """
    if device is None:
        device = next(model.parameters()).device
    
    model.eval()
    
    topk_indices_list = []
    relevant_indices_list = []
    max_k = max(ks)
    
    with torch.no_grad():
        for batch in tqdm(val_dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            user_ids = batch['user_id']
            target_items = batch['target_items']  # list of tensors
            
            # Forward pass
            logits = model(input_ids, attention_mask)  # (B, seq_len, n_items+1)
            
            # Next-item logits come from the final position; this assumes sequences are
            # left-padded so that position -1 always holds the most recent item
            last_logits = logits[:, -1, :]  # (B, n_items+1)
            
            # For each user in batch
            for i in range(len(user_ids)):
                uid = user_ids[i].item()
                scores = last_logits[i].clone()  # (n_items+1,)
                
                # Mask out seen items (set to -inf)
                seen_items = train_user_item_map.get(uid, [])
                if seen_items:
                    seen_tensor = torch.tensor(seen_items, device=device)
                    scores[seen_tensor] = float('-inf')
                
                # Also mask out padding token (index 0)
                scores[0] = float('-inf')
                
                # Get top-k predictions
                topk = torch.topk(scores, min(max_k, n_items)).indices
                topk_indices_list.append(topk.cpu())
                
                # Target items for this user
                relevant_indices_list.append(target_items[i])
    
    if not topk_indices_list:
        return {}
    
    topk_tensor = torch.stack(topk_indices_list, dim=0)
    metrics = aggregate_retrieval_metrics(topk_tensor, relevant_indices_list, ks)
    return metrics
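
The aggregation above leans on per-user helpers from `rec.retrieval.metrics` that aren't shown here. As a reference point, here is a minimal sketch of what I'd expect them to compute, assuming the standard binary-relevance definitions: $\mathrm{DCG}@k = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i+1)}$ and $\mathrm{NDCG}@k = \mathrm{DCG}@k / \mathrm{IDCG}@k$. The `*_sketch` names are hypothetical and this is not the library's code, so its exact conventions (for example the discount) may differ.

```python
import math

import torch


def recall_at_k_sketch(hits: torch.Tensor, num_rel: int, k: int) -> float:
    """Fraction of the user's relevant items that appear in the top k."""
    return float(hits[:k].sum()) / num_rel if num_rel > 0 else 0.0


def precision_at_k_sketch(hits: torch.Tensor, k: int) -> float:
    """Fraction of the top k that is relevant."""
    return float(hits[:k].sum()) / k


def dcg_at_k_sketch(hits: torch.Tensor, k: int) -> float:
    """Binary-relevance DCG@k with a log2(rank + 1) discount (ranks start at 1)."""
    hits_k = hits[:k].float()
    ranks = torch.arange(1, hits_k.numel() + 1, dtype=torch.float32)
    return float((hits_k / torch.log2(ranks + 1)).sum())


def idcg_at_k_sketch(num_rel: int, k: int) -> float:
    """Best achievable DCG@k: all min(num_rel, k) relevant items ranked first."""
    return float(sum(1.0 / math.log2(i + 1) for i in range(1, min(num_rel, k) + 1)))


def mrr_sketch(hits: torch.Tensor) -> float:
    """Reciprocal rank of the first hit, or 0 if nothing relevant was retrieved."""
    idx = torch.nonzero(hits)
    return 1.0 / (int(idx[0, 0]) + 1) if idx.numel() > 0 else 0.0


# Toy example: relevant items at ranks 2 and 4, two relevant items in total
hits = torch.tensor([False, True, False, True, False])
ndcg = dcg_at_k_sketch(hits, 5) / idcg_at_k_sketch(num_rel=2, k=5)
print(f"recall@5={recall_at_k_sketch(hits, 2, 5):.2f} ndcg@5={ndcg:.4f} mrr={mrr_sketch(hits):.2f}")
```

Under these definitions a user with at least 10 held-out items has IDCG@10 of roughly 4.54, so raw DCG@10 can sit well above NDCG@10 for the same ranking, which may explain the gap between those two columns in the epoch summaries.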

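The per-user loop in `evaluate_gptrec` is easy to follow but masks one row at a time. One possible batched alternative is sketched below with a hypothetical helper, `mask_and_topk` (not part of `rec`): it flattens every (row, seen item) pair in the batch and masks them all with a single advanced-indexing assignment. Like `evaluate_gptrec`, it assumes index 0 is the padding token and that item ids index directly into the logits.

```python
from typing import List

import torch


def mask_and_topk(last_logits: torch.Tensor, seen: List[List[int]], k: int) -> torch.Tensor:
    """Hypothetical batched version of the masking + top-k step in evaluate_gptrec.

    last_logits: (B, n_items + 1) scores, where column 0 is the padding token.
    seen: for each row, the item ids that user interacted with in training.
    """
    scores = last_logits.clone()
    # Flatten every (row, item) pair so one indexed assignment masks the whole batch
    rows = [r for r, items in enumerate(seen) for _ in items]
    cols = [c for items in seen for c in items]
    if cols:
        row_idx = torch.tensor(rows, device=scores.device)
        col_idx = torch.tensor(cols, device=scores.device)
        scores[row_idx, col_idx] = float("-inf")
    scores[:, 0] = float("-inf")  # never recommend the padding token
    return torch.topk(scores, k, dim=1).indices


logits = torch.tensor([[9.0, 1.0, 5.0, 3.0, 4.0],
                       [9.0, 2.0, 1.0, 8.0, 0.0]])
top2 = mask_and_topk(logits, seen=[[2], []], k=2)
print(top2.tolist())  # [[4, 3], [3, 1]]
```

Python still builds the index lists, but the masking and top-k run as one tensor operation per batch; the result can be moved to CPU and passed to `aggregate_retrieval_metrics` as before.
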
In [None]:
from tqdm import tqdm

num_epochs = 5
batch_size = 32
lr = 5e-4

optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    model.train()
    train_losses = []
    
    # Training loop
    pbar = tqdm(train_loader, desc=f"Epoch {epoch+1}/{num_epochs}")
    for batch in pbar:
        optimizer.zero_grad()
        loss = model.compute_loss(batch)
        loss.backward()
        optimizer.step()
        
        train_losses.append(loss.item())
        pbar.set_postfix(loss=f"{loss.item():.2f}")
    
    avg_train_loss = sum(train_losses) / len(train_losses)
    
    # Retrieval metrics (with progress bar)
    metrics = evaluate_gptrec(
        model, val_dl, train_user_item_map,
        n_items=n_items, ks=[5, 10, 20],
    )
    
    # Summary line
    print(
        f"\nEpoch {epoch+1}: train_loss={avg_train_loss:.4f}"
        f" | R@10={metrics.get('recall@10', 0):.4f}"
        f" | DCG@10={metrics.get('dcg@10', 0):.4f}"
        f" | NDCG@10={metrics.get('ndcg@10', 0):.4f}"
    )

Epoch 1/5: 100%|██████████| 189/189 [05:07<00:00,  1.63s/it, loss=13.67]

100%|██████████| 23/23 [00:08<00:00,  2.71it/s]

Epoch 1: train_loss=19.5774 | R@10=0.0075 | DCG@10=0.0666 | NDCG@10=0.0156


Epoch 2/5: 100%|██████████| 189/189 [05:03<00:00,  1.60s/it, loss=9.46]
100%|██████████| 23/23 [00:07<00:00,  2.91it/s]

Epoch 2: train_loss=11.1752 | R@10=0.0129 | DCG@10=0.1115 | NDCG@10=0.0265


Epoch 3/5: 100%|██████████| 189/189 [05:05<00:00,  1.61s/it, loss=8.23]

100%|██████████| 23/23 [00:07<00:00,  2.88it/s]

Epoch 3: train_loss=8.8681 | R@10=0.0193 | DCG@10=0.2030 | NDCG@10=0.0485



Epoch 4/5:  25%|██▍       | 47/189 [01:15<03:48,  1.61s/it, loss=8.33]

Epoch 4/5:  25%|██▍       | 47/189 [01:16<03:48,  1.61s/it, loss=8.22]

Epoch 4/5:  25%|██▌       | 48/189 [01:16<03:48,  1.62s/it, loss=8.22]

Epoch 4/5:  25%|██▌       | 48/189 [01:18<03:48,  1.62s/it, loss=8.22]

Epoch 4/5:  26%|██▌       | 49/189 [01:18<03:46,  1.61s/it, loss=8.22]

Epoch 4/5:  26%|██▌       | 49/189 [01:20<03:46,  1.61s/it, loss=8.21]

Epoch 4/5:  26%|██▋       | 50/189 [01:20<03:46,  1.63s/it, loss=8.21]

Epoch 4/5:  26%|██▋       | 50/189 [01:21<03:46,  1.63s/it, loss=8.11]

Epoch 4/5:  27%|██▋       | 51/189 [01:21<03:39,  1.59s/it, loss=8.11]

Epoch 4/5:  27%|██▋       | 51/189 [01:23<03:39,  1.59s/it, loss=8.11]

Epoch 4/5:  28%|██▊       | 52/189 [01:23<03:35,  1.57s/it, loss=8.11]

Epoch 4/5:  28%|██▊       | 52/189 [01:24<03:35,  1.57s/it, loss=8.19]

Epoch 4/5:  28%|██▊       | 53/189 [01:24<03:37,  1.60s/it, loss=8.19]

Epoch 4/5:  28%|██▊       | 53/189 [01:26<03:37,  1.60s/it, loss=8.15]

Epoch 4/5:  29%|██▊       | 54/189 [01:26<03:38,  1.62s/it, loss=8.15]

Epoch 4/5:  29%|██▊       | 54/189 [01:27<03:38,  1.62s/it, loss=8.08]

Epoch 4/5:  29%|██▉       | 55/189 [01:27<03:33,  1.59s/it, loss=8.08]

Epoch 4/5:  29%|██▉       | 55/189 [01:29<03:33,  1.59s/it, loss=8.10]

Epoch 4/5:  30%|██▉       | 56/189 [01:29<03:27,  1.56s/it, loss=8.10]

Epoch 4/5:  30%|██▉       | 56/189 [01:30<03:27,  1.56s/it, loss=8.18]

Epoch 4/5:  30%|███       | 57/189 [01:30<03:25,  1.55s/it, loss=8.18]

Epoch 4/5:  30%|███       | 57/189 [01:32<03:25,  1.55s/it, loss=8.15]

Epoch 4/5:  31%|███       | 58/189 [01:32<03:29,  1.60s/it, loss=8.15]

Epoch 4/5:  31%|███       | 58/189 [01:34<03:29,  1.60s/it, loss=8.15]

Epoch 4/5:  31%|███       | 59/189 [01:34<03:27,  1.60s/it, loss=8.15]

Epoch 4/5:  31%|███       | 59/189 [01:35<03:27,  1.60s/it, loss=8.23]

Epoch 4/5:  32%|███▏      | 60/189 [01:35<03:28,  1.62s/it, loss=8.23]

Epoch 4/5:  32%|███▏      | 60/189 [01:37<03:28,  1.62s/it, loss=8.15]

Epoch 4/5:  32%|███▏      | 61/189 [01:37<03:29,  1.63s/it, loss=8.15]

Epoch 4/5:  32%|███▏      | 61/189 [01:39<03:29,  1.63s/it, loss=8.03]

Epoch 4/5:  33%|███▎      | 62/189 [01:39<03:27,  1.63s/it, loss=8.03]

Epoch 4/5:  33%|███▎      | 62/189 [01:41<03:27,  1.63s/it, loss=7.94]

Epoch 4/5:  33%|███▎      | 63/189 [01:41<03:31,  1.68s/it, loss=7.94]

Epoch 4/5:  33%|███▎      | 63/189 [01:42<03:31,  1.68s/it, loss=8.06]

Epoch 4/5:  34%|███▍      | 64/189 [01:42<03:29,  1.68s/it, loss=8.06]

Epoch 4/5:  34%|███▍      | 64/189 [01:44<03:29,  1.68s/it, loss=8.11]

Epoch 4/5:  34%|███▍      | 65/189 [01:44<03:25,  1.65s/it, loss=8.11]

Epoch 4/5:  34%|███▍      | 65/189 [01:45<03:25,  1.65s/it, loss=7.98]

Epoch 4/5:  35%|███▍      | 66/189 [01:45<03:23,  1.65s/it, loss=7.98]

Epoch 4/5:  35%|███▍      | 66/189 [01:47<03:23,  1.65s/it, loss=8.13]

Epoch 4/5:  35%|███▌      | 67/189 [01:47<03:16,  1.61s/it, loss=8.13]

Epoch 4/5:  35%|███▌      | 67/189 [01:48<03:16,  1.61s/it, loss=8.01]

Epoch 4/5:  36%|███▌      | 68/189 [01:48<03:09,  1.56s/it, loss=8.01]

Epoch 4/5:  36%|███▌      | 68/189 [01:50<03:09,  1.56s/it, loss=8.11]

Epoch 4/5:  37%|███▋      | 69/189 [01:50<03:08,  1.57s/it, loss=8.11]

Epoch 4/5:  37%|███▋      | 69/189 [01:52<03:08,  1.57s/it, loss=8.02]

Epoch 4/5:  37%|███▋      | 70/189 [01:52<03:08,  1.58s/it, loss=8.02]

Epoch 4/5:  37%|███▋      | 70/189 [01:53<03:08,  1.58s/it, loss=8.03]

Epoch 4/5:  38%|███▊      | 71/189 [01:53<03:09,  1.60s/it, loss=8.03]

Epoch 4/5:  38%|███▊      | 71/189 [01:55<03:09,  1.60s/it, loss=8.05]

Epoch 4/5:  38%|███▊      | 72/189 [01:55<03:10,  1.63s/it, loss=8.05]

Epoch 4/5:  38%|███▊      | 72/189 [01:57<03:10,  1.63s/it, loss=8.16]

Epoch 4/5:  39%|███▊      | 73/189 [01:57<03:06,  1.61s/it, loss=8.16]

Epoch 4/5:  39%|███▊      | 73/189 [01:58<03:06,  1.61s/it, loss=8.16]

Epoch 4/5:  39%|███▉      | 74/189 [01:58<03:07,  1.63s/it, loss=8.16]

Epoch 4/5:  39%|███▉      | 74/189 [02:00<03:07,  1.63s/it, loss=8.11]

Epoch 4/5:  40%|███▉      | 75/189 [02:00<03:03,  1.61s/it, loss=8.11]

Epoch 4/5:  40%|███▉      | 75/189 [02:01<03:03,  1.61s/it, loss=8.11]

Epoch 4/5:  40%|████      | 76/189 [02:01<03:04,  1.63s/it, loss=8.11]

Epoch 4/5:  40%|████      | 76/189 [02:03<03:04,  1.63s/it, loss=8.08]

Epoch 4/5:  41%|████      | 77/189 [02:03<03:00,  1.61s/it, loss=8.08]

Epoch 4/5:  41%|████      | 77/189 [02:04<03:00,  1.61s/it, loss=8.08]

Epoch 4/5:  41%|████▏     | 78/189 [02:04<02:53,  1.56s/it, loss=8.08]

Epoch 4/5:  41%|████▏     | 78/189 [02:06<02:53,  1.56s/it, loss=8.08]

Epoch 4/5:  42%|████▏     | 79/189 [02:06<02:52,  1.57s/it, loss=8.08]

Epoch 4/5:  42%|████▏     | 79/189 [02:08<02:52,  1.57s/it, loss=8.12]

Epoch 4/5:  42%|████▏     | 80/189 [02:08<02:50,  1.57s/it, loss=8.12]

Epoch 4/5:  42%|████▏     | 80/189 [02:09<02:50,  1.57s/it, loss=8.11]

Epoch 4/5:  43%|████▎     | 81/189 [02:09<02:49,  1.57s/it, loss=8.11]

Epoch 4/5:  43%|████▎     | 81/189 [02:11<02:49,  1.57s/it, loss=7.96]

Epoch 4/5:  43%|████▎     | 82/189 [02:11<02:51,  1.60s/it, loss=7.96]

Epoch 4/5:  43%|████▎     | 82/189 [02:12<02:51,  1.60s/it, loss=8.03]

Epoch 4/5:  44%|████▍     | 83/189 [02:12<02:50,  1.61s/it, loss=8.03]

Epoch 4/5:  44%|████▍     | 83/189 [02:14<02:50,  1.61s/it, loss=7.98]

Epoch 4/5:  44%|████▍     | 84/189 [02:14<02:47,  1.59s/it, loss=7.98]

Epoch 4/5:  44%|████▍     | 84/189 [02:16<02:47,  1.59s/it, loss=7.97]

Epoch 4/5:  45%|████▍     | 85/189 [02:16<02:51,  1.65s/it, loss=7.97]

Epoch 4/5:  45%|████▍     | 85/189 [02:18<02:51,  1.65s/it, loss=8.13]

Epoch 4/5:  46%|████▌     | 86/189 [02:18<02:50,  1.66s/it, loss=8.13]

Epoch 4/5:  46%|████▌     | 86/189 [02:19<02:50,  1.66s/it, loss=8.12]

Epoch 4/5:  46%|████▌     | 87/189 [02:19<02:47,  1.64s/it, loss=8.12]

Epoch 4/5:  46%|████▌     | 87/189 [02:21<02:47,  1.64s/it, loss=8.12]

Epoch 4/5:  47%|████▋     | 88/189 [02:21<02:45,  1.64s/it, loss=8.12]

Epoch 4/5:  47%|████▋     | 88/189 [02:23<02:45,  1.64s/it, loss=8.02]

Epoch 4/5:  47%|████▋     | 89/189 [02:23<02:50,  1.70s/it, loss=8.02]

Epoch 4/5:  47%|████▋     | 89/189 [02:24<02:50,  1.70s/it, loss=8.00]

Epoch 4/5:  48%|████▊     | 90/189 [02:24<02:46,  1.68s/it, loss=8.00]

Epoch 4/5:  48%|████▊     | 90/189 [02:26<02:46,  1.68s/it, loss=8.12]

Epoch 4/5:  48%|████▊     | 91/189 [02:26<02:44,  1.68s/it, loss=8.12]

Epoch 4/5:  48%|████▊     | 91/189 [02:27<02:44,  1.68s/it, loss=8.13]

Epoch 4/5:  49%|████▊     | 92/189 [02:27<02:40,  1.65s/it, loss=8.13]

Epoch 4/5:  49%|████▊     | 92/189 [02:29<02:40,  1.65s/it, loss=8.04]

Epoch 4/5:  49%|████▉     | 93/189 [02:29<02:37,  1.64s/it, loss=8.04]

Epoch 4/5:  49%|████▉     | 93/189 [02:31<02:37,  1.64s/it, loss=8.08]

Epoch 4/5:  50%|████▉     | 94/189 [02:31<02:31,  1.59s/it, loss=8.08]

Epoch 4/5:  50%|████▉     | 94/189 [02:32<02:31,  1.59s/it, loss=8.00]

Epoch 4/5:  50%|█████     | 95/189 [02:32<02:32,  1.62s/it, loss=8.00]

Epoch 4/5:  50%|█████     | 95/189 [02:34<02:32,  1.62s/it, loss=8.02]

Epoch 4/5:  51%|█████     | 96/189 [02:34<02:30,  1.62s/it, loss=8.02]

Epoch 4/5:  51%|█████     | 96/189 [02:36<02:30,  1.62s/it, loss=8.11]

Epoch 4/5:  51%|█████▏    | 97/189 [02:36<02:31,  1.65s/it, loss=8.11]

Epoch 4/5:  51%|█████▏    | 97/189 [02:37<02:31,  1.65s/it, loss=7.96]

Epoch 4/5:  52%|█████▏    | 98/189 [02:37<02:31,  1.66s/it, loss=7.96]

Epoch 4/5:  52%|█████▏    | 98/189 [02:39<02:31,  1.66s/it, loss=8.11]

Epoch 4/5:  52%|█████▏    | 99/189 [02:39<02:27,  1.64s/it, loss=8.11]

Epoch 4/5:  52%|█████▏    | 99/189 [02:40<02:27,  1.64s/it, loss=7.94]

Epoch 4/5:  53%|█████▎    | 100/189 [02:40<02:23,  1.61s/it, loss=7.94]

Epoch 4/5:  53%|█████▎    | 100/189 [02:42<02:23,  1.61s/it, loss=8.13]

Epoch 4/5:  53%|█████▎    | 101/189 [02:42<02:22,  1.62s/it, loss=8.13]

Epoch 4/5:  53%|█████▎    | 101/189 [02:44<02:22,  1.62s/it, loss=8.12]

Epoch 4/5:  54%|█████▍    | 102/189 [02:44<02:17,  1.58s/it, loss=8.12]

Epoch 4/5:  54%|█████▍    | 102/189 [02:45<02:17,  1.58s/it, loss=8.07]

Epoch 4/5:  54%|█████▍    | 103/189 [02:45<02:11,  1.53s/it, loss=8.07]

Epoch 4/5:  54%|█████▍    | 103/189 [02:47<02:11,  1.53s/it, loss=7.99]

Epoch 4/5:  55%|█████▌    | 104/189 [02:47<02:14,  1.58s/it, loss=7.99]

Epoch 4/5:  55%|█████▌    | 104/189 [02:48<02:14,  1.58s/it, loss=7.98]

Epoch 4/5:  56%|█████▌    | 105/189 [02:48<02:14,  1.61s/it, loss=7.98]

Epoch 4/5:  56%|█████▌    | 105/189 [02:50<02:14,  1.61s/it, loss=7.96]

Epoch 4/5:  56%|█████▌    | 106/189 [02:50<02:14,  1.62s/it, loss=7.96]

Epoch 4/5:  56%|█████▌    | 106/189 [02:52<02:14,  1.62s/it, loss=8.02]

Epoch 4/5:  57%|█████▋    | 107/189 [02:52<02:15,  1.65s/it, loss=8.02]

Epoch 4/5:  57%|█████▋    | 107/189 [02:53<02:15,  1.65s/it, loss=8.03]

Epoch 4/5:  57%|█████▋    | 108/189 [02:53<02:09,  1.60s/it, loss=8.03]

Epoch 4/5:  57%|█████▋    | 108/189 [02:55<02:09,  1.60s/it, loss=8.18]

Epoch 4/5:  58%|█████▊    | 109/189 [02:55<02:07,  1.59s/it, loss=8.18]

Epoch 4/5:  58%|█████▊    | 109/189 [02:56<02:07,  1.59s/it, loss=7.83]

Epoch 4/5:  58%|█████▊    | 110/189 [02:56<02:03,  1.56s/it, loss=7.83]

Epoch 4/5:  58%|█████▊    | 110/189 [02:58<02:03,  1.56s/it, loss=8.14]

Epoch 4/5:  59%|█████▊    | 111/189 [02:58<02:02,  1.57s/it, loss=8.14]

Epoch 4/5:  59%|█████▊    | 111/189 [03:00<02:02,  1.57s/it, loss=8.05]

Epoch 4/5:  59%|█████▉    | 112/189 [03:00<02:03,  1.60s/it, loss=8.05]

Epoch 4/5:  59%|█████▉    | 112/189 [03:01<02:03,  1.60s/it, loss=8.09]

Epoch 4/5:  60%|█████▉    | 113/189 [03:01<02:03,  1.62s/it, loss=8.09]

Epoch 4/5:  60%|█████▉    | 113/189 [03:03<02:03,  1.62s/it, loss=7.88]

Epoch 4/5:  60%|██████    | 114/189 [03:03<02:01,  1.61s/it, loss=7.88]

Epoch 4/5:  60%|██████    | 114/189 [03:04<02:01,  1.61s/it, loss=7.95]

Epoch 4/5:  61%|██████    | 115/189 [03:04<01:57,  1.59s/it, loss=7.95]

Epoch 4/5:  61%|██████    | 115/189 [03:06<01:57,  1.59s/it, loss=8.16]

Epoch 4/5:  61%|██████▏   | 116/189 [03:06<01:57,  1.61s/it, loss=8.16]

Epoch 4/5:  61%|██████▏   | 116/189 [03:08<01:57,  1.61s/it, loss=8.02]

Epoch 4/5:  62%|██████▏   | 117/189 [03:08<01:55,  1.60s/it, loss=8.02]

Epoch 4/5:  62%|██████▏   | 117/189 [03:09<01:55,  1.60s/it, loss=8.01]

Epoch 4/5:  62%|██████▏   | 118/189 [03:09<01:55,  1.62s/it, loss=8.01]

Epoch 4/5:  62%|██████▏   | 118/189 [03:11<01:55,  1.62s/it, loss=7.77]

Epoch 4/5:  63%|██████▎   | 119/189 [03:11<01:53,  1.62s/it, loss=7.77]

Epoch 4/5:  63%|██████▎   | 119/189 [03:12<01:53,  1.62s/it, loss=8.01]

Epoch 4/5:  63%|██████▎   | 120/189 [03:12<01:52,  1.63s/it, loss=8.01]

Epoch 4/5:  63%|██████▎   | 120/189 [03:14<01:52,  1.63s/it, loss=7.82]

Epoch 4/5:  64%|██████▍   | 121/189 [03:14<01:51,  1.64s/it, loss=7.82]

Epoch 4/5:  64%|██████▍   | 121/189 [03:16<01:51,  1.64s/it, loss=8.04]

Epoch 4/5:  65%|██████▍   | 122/189 [03:16<01:51,  1.67s/it, loss=8.04]

Epoch 4/5:  65%|██████▍   | 122/189 [03:17<01:51,  1.67s/it, loss=8.08]

Epoch 4/5:  65%|██████▌   | 123/189 [03:17<01:47,  1.63s/it, loss=8.08]

Epoch 4/5:  65%|██████▌   | 123/189 [03:19<01:47,  1.63s/it, loss=7.92]

Epoch 4/5:  66%|██████▌   | 124/189 [03:19<01:45,  1.63s/it, loss=7.92]

Epoch 4/5:  66%|██████▌   | 124/189 [03:21<01:45,  1.63s/it, loss=8.08]

Epoch 4/5:  66%|██████▌   | 125/189 [03:21<01:41,  1.59s/it, loss=8.08]

Epoch 4/5:  66%|██████▌   | 125/189 [03:22<01:41,  1.59s/it, loss=7.87]

Epoch 4/5:  67%|██████▋   | 126/189 [03:22<01:38,  1.57s/it, loss=7.87]

Epoch 4/5:  67%|██████▋   | 126/189 [03:24<01:38,  1.57s/it, loss=8.12]

Epoch 4/5:  67%|██████▋   | 127/189 [03:24<01:38,  1.58s/it, loss=8.12]

Epoch 4/5:  67%|██████▋   | 127/189 [03:25<01:38,  1.58s/it, loss=8.02]

Epoch 4/5:  68%|██████▊   | 128/189 [03:25<01:36,  1.58s/it, loss=8.02]

Epoch 4/5:  68%|██████▊   | 128/189 [03:27<01:36,  1.58s/it, loss=8.03]

Epoch 4/5:  68%|██████▊   | 129/189 [03:27<01:38,  1.63s/it, loss=8.03]

Epoch 4/5:  68%|██████▊   | 129/189 [03:29<01:38,  1.63s/it, loss=8.11]

Epoch 4/5:  69%|██████▉   | 130/189 [03:29<01:36,  1.63s/it, loss=8.11]

Epoch 4/5:  69%|██████▉   | 130/189 [03:30<01:36,  1.63s/it, loss=7.91]

Epoch 4/5:  69%|██████▉   | 131/189 [03:30<01:34,  1.64s/it, loss=7.91]

Epoch 4/5:  69%|██████▉   | 131/189 [03:32<01:34,  1.64s/it, loss=7.94]

Epoch 4/5:  70%|██████▉   | 132/189 [03:32<01:34,  1.67s/it, loss=7.94]

Epoch 4/5:  70%|██████▉   | 132/189 [03:34<01:34,  1.67s/it, loss=7.95]

Epoch 4/5:  70%|███████   | 133/189 [03:34<01:33,  1.66s/it, loss=7.95]

Epoch 4/5:  70%|███████   | 133/189 [03:35<01:33,  1.66s/it, loss=7.78]

Epoch 4/5:  71%|███████   | 134/189 [03:35<01:29,  1.63s/it, loss=7.78]

Epoch 4/5:  71%|███████   | 134/189 [03:37<01:29,  1.63s/it, loss=8.03]

Epoch 4/5:  71%|███████▏  | 135/189 [03:37<01:28,  1.64s/it, loss=8.03]

Epoch 4/5:  71%|███████▏  | 135/189 [03:39<01:28,  1.64s/it, loss=7.90]

Epoch 4/5:  72%|███████▏  | 136/189 [03:39<01:26,  1.64s/it, loss=7.90]

Epoch 4/5:  72%|███████▏  | 136/189 [03:40<01:26,  1.64s/it, loss=8.02]

Epoch 4/5:  72%|███████▏  | 137/189 [03:40<01:26,  1.67s/it, loss=8.02]

Epoch 4/5:  72%|███████▏  | 137/189 [03:42<01:26,  1.67s/it, loss=7.86]

Epoch 4/5:  73%|███████▎  | 138/189 [03:42<01:25,  1.68s/it, loss=7.86]

Epoch 4/5:  73%|███████▎  | 138/189 [03:44<01:25,  1.68s/it, loss=7.98]

Epoch 4/5:  74%|███████▎  | 139/189 [03:44<01:23,  1.66s/it, loss=7.98]

Epoch 4/5:  74%|███████▎  | 139/189 [03:45<01:23,  1.66s/it, loss=7.85]

Epoch 4/5:  74%|███████▍  | 140/189 [03:45<01:22,  1.68s/it, loss=7.85]

Epoch 4/5:  74%|███████▍  | 140/189 [03:47<01:22,  1.68s/it, loss=8.20]

Epoch 4/5:  75%|███████▍  | 141/189 [03:47<01:16,  1.60s/it, loss=8.20]

Epoch 4/5:  75%|███████▍  | 141/189 [03:48<01:16,  1.60s/it, loss=8.05]

Epoch 4/5:  75%|███████▌  | 142/189 [03:48<01:16,  1.63s/it, loss=8.05]

Epoch 4/5:  75%|███████▌  | 142/189 [03:50<01:16,  1.63s/it, loss=8.03]

Epoch 4/5:  76%|███████▌  | 143/189 [03:50<01:16,  1.66s/it, loss=8.03]

Epoch 4/5:  76%|███████▌  | 143/189 [03:52<01:16,  1.66s/it, loss=7.94]

Epoch 4/5:  76%|███████▌  | 144/189 [03:52<01:14,  1.65s/it, loss=7.94]

Epoch 4/5:  76%|███████▌  | 144/189 [03:53<01:14,  1.65s/it, loss=8.02]

Epoch 4/5:  77%|███████▋  | 145/189 [03:53<01:11,  1.63s/it, loss=8.02]

Epoch 4/5:  77%|███████▋  | 145/189 [03:55<01:11,  1.63s/it, loss=7.91]

Epoch 4/5:  77%|███████▋  | 146/189 [03:55<01:09,  1.62s/it, loss=7.91]

Epoch 4/5:  77%|███████▋  | 146/189 [03:57<01:09,  1.62s/it, loss=7.79]

Epoch 4/5:  78%|███████▊  | 147/189 [03:57<01:09,  1.64s/it, loss=7.79]

Epoch 4/5:  78%|███████▊  | 147/189 [03:58<01:09,  1.64s/it, loss=7.92]

Epoch 4/5:  78%|███████▊  | 148/189 [03:58<01:07,  1.66s/it, loss=7.92]

Epoch 4/5:  78%|███████▊  | 148/189 [04:00<01:07,  1.66s/it, loss=7.88]

Epoch 4/5:  79%|███████▉  | 149/189 [04:00<01:06,  1.67s/it, loss=7.88]

Epoch 4/5:  79%|███████▉  | 149/189 [04:02<01:06,  1.67s/it, loss=7.77]

Epoch 4/5:  79%|███████▉  | 150/189 [04:02<01:05,  1.68s/it, loss=7.77]

Epoch 4/5:  79%|███████▉  | 150/189 [04:04<01:05,  1.68s/it, loss=7.93]

Epoch 4/5:  80%|███████▉  | 151/189 [04:04<01:04,  1.70s/it, loss=7.93]

Epoch 4/5:  80%|███████▉  | 151/189 [04:05<01:04,  1.70s/it, loss=7.93]

Epoch 4/5:  80%|████████  | 152/189 [04:05<01:02,  1.69s/it, loss=7.93]

Epoch 4/5:  80%|████████  | 152/189 [04:07<01:02,  1.69s/it, loss=7.96]

Epoch 4/5:  81%|████████  | 153/189 [04:07<01:01,  1.71s/it, loss=7.96]

Epoch 4/5:  81%|████████  | 153/189 [04:09<01:01,  1.71s/it, loss=7.98]

Epoch 4/5:  81%|████████▏ | 154/189 [04:09<00:59,  1.71s/it, loss=7.98]

Epoch 4/5:  81%|████████▏ | 154/189 [04:10<00:59,  1.71s/it, loss=7.90]

Epoch 4/5:  82%|████████▏ | 155/189 [04:10<00:58,  1.71s/it, loss=7.90]

Epoch 4/5:  82%|████████▏ | 155/189 [04:12<00:58,  1.71s/it, loss=7.77]

Epoch 4/5:  83%|████████▎ | 156/189 [04:12<00:54,  1.65s/it, loss=7.77]

Epoch 4/5:  83%|████████▎ | 156/189 [04:14<00:54,  1.65s/it, loss=7.96]

Epoch 4/5:  83%|████████▎ | 157/189 [04:14<00:53,  1.66s/it, loss=7.96]

Epoch 4/5:  83%|████████▎ | 157/189 [04:15<00:53,  1.66s/it, loss=7.86]

Epoch 4/5:  84%|████████▎ | 158/189 [04:15<00:50,  1.63s/it, loss=7.86]

Epoch 4/5:  84%|████████▎ | 158/189 [04:17<00:50,  1.63s/it, loss=7.83]

Epoch 4/5:  84%|████████▍ | 159/189 [04:17<00:49,  1.64s/it, loss=7.83]

Epoch 4/5:  84%|████████▍ | 159/189 [04:18<00:49,  1.64s/it, loss=7.81]

Epoch 4/5:  85%|████████▍ | 160/189 [04:18<00:47,  1.65s/it, loss=7.81]

Epoch 4/5:  85%|████████▍ | 160/189 [04:20<00:47,  1.65s/it, loss=7.94]

Epoch 4/5:  85%|████████▌ | 161/189 [04:20<00:46,  1.66s/it, loss=7.94]

Epoch 4/5:  85%|████████▌ | 161/189 [04:22<00:46,  1.66s/it, loss=7.79]

Epoch 4/5:  86%|████████▌ | 162/189 [04:22<00:44,  1.66s/it, loss=7.79]

Epoch 4/5:  86%|████████▌ | 162/189 [04:23<00:44,  1.66s/it, loss=7.77]

Epoch 4/5:  86%|████████▌ | 163/189 [04:23<00:41,  1.61s/it, loss=7.77]

Epoch 4/5:  86%|████████▌ | 163/189 [04:25<00:41,  1.61s/it, loss=7.72]

Epoch 4/5:  87%|████████▋ | 164/189 [04:25<00:40,  1.63s/it, loss=7.72]

Epoch 4/5:  87%|████████▋ | 164/189 [04:26<00:40,  1.63s/it, loss=8.04]

Epoch 4/5:  87%|████████▋ | 165/189 [04:26<00:37,  1.58s/it, loss=8.04]

Epoch 4/5:  87%|████████▋ | 165/189 [04:28<00:37,  1.58s/it, loss=7.95]

Epoch 4/5:  88%|████████▊ | 166/189 [04:28<00:35,  1.53s/it, loss=7.95]

Epoch 4/5:  88%|████████▊ | 166/189 [04:29<00:35,  1.53s/it, loss=7.90]

Epoch 4/5:  88%|████████▊ | 167/189 [04:29<00:34,  1.55s/it, loss=7.90]

Epoch 4/5:  88%|████████▊ | 167/189 [04:31<00:34,  1.55s/it, loss=7.85]

Epoch 4/5:  89%|████████▉ | 168/189 [04:31<00:33,  1.58s/it, loss=7.85]

Epoch 4/5:  89%|████████▉ | 168/189 [04:33<00:33,  1.58s/it, loss=7.91]

Epoch 4/5:  89%|████████▉ | 169/189 [04:33<00:32,  1.60s/it, loss=7.91]

Epoch 4/5:  89%|████████▉ | 169/189 [04:34<00:32,  1.60s/it, loss=7.77]

Epoch 4/5:  90%|████████▉ | 170/189 [04:34<00:29,  1.58s/it, loss=7.77]

Epoch 4/5:  90%|████████▉ | 170/189 [04:36<00:29,  1.58s/it, loss=7.96]

Epoch 4/5:  90%|█████████ | 171/189 [04:36<00:29,  1.62s/it, loss=7.96]

Epoch 4/5:  90%|█████████ | 171/189 [04:38<00:29,  1.62s/it, loss=8.05]

Epoch 4/5:  91%|█████████ | 172/189 [04:38<00:27,  1.60s/it, loss=8.05]

Epoch 4/5:  91%|█████████ | 172/189 [04:39<00:27,  1.60s/it, loss=8.11]

Epoch 4/5:  92%|█████████▏| 173/189 [04:39<00:25,  1.60s/it, loss=8.11]

Epoch 4/5:  92%|█████████▏| 173/189 [04:41<00:25,  1.60s/it, loss=7.86]

Epoch 4/5:  92%|█████████▏| 174/189 [04:41<00:23,  1.59s/it, loss=7.86]

Epoch 4/5:  92%|█████████▏| 174/189 [04:42<00:23,  1.59s/it, loss=7.77]

Epoch 4/5:  93%|█████████▎| 175/189 [04:42<00:22,  1.60s/it, loss=7.77]

Epoch 4/5:  93%|█████████▎| 175/189 [04:44<00:22,  1.60s/it, loss=7.96]

Epoch 4/5:  93%|█████████▎| 176/189 [04:44<00:21,  1.62s/it, loss=7.96]

Epoch 4/5:  93%|█████████▎| 176/189 [04:46<00:21,  1.62s/it, loss=7.82]

Epoch 4/5:  94%|█████████▎| 177/189 [04:46<00:19,  1.63s/it, loss=7.82]

Epoch 4/5:  94%|█████████▎| 177/189 [04:47<00:19,  1.63s/it, loss=7.88]

Epoch 4/5:  94%|█████████▍| 178/189 [04:47<00:17,  1.62s/it, loss=7.88]

Epoch 4/5:  94%|█████████▍| 178/189 [04:49<00:17,  1.62s/it, loss=7.86]

Epoch 4/5:  95%|█████████▍| 179/189 [04:49<00:16,  1.60s/it, loss=7.86]

Epoch 4/5:  95%|█████████▍| 179/189 [04:50<00:16,  1.60s/it, loss=7.97]

Epoch 4/5:  95%|█████████▌| 180/189 [04:50<00:14,  1.60s/it, loss=7.97]

Epoch 4/5:  95%|█████████▌| 180/189 [04:52<00:14,  1.60s/it, loss=7.95]

Epoch 4/5:  96%|█████████▌| 181/189 [04:52<00:12,  1.61s/it, loss=7.95]

Epoch 4/5:  96%|█████████▌| 181/189 [04:53<00:12,  1.61s/it, loss=7.97]

Epoch 4/5:  96%|█████████▋| 182/189 [04:53<00:10,  1.56s/it, loss=7.97]

Epoch 4/5:  96%|█████████▋| 182/189 [04:55<00:10,  1.56s/it, loss=7.95]

Epoch 4/5:  97%|█████████▋| 183/189 [04:55<00:09,  1.62s/it, loss=7.95]

Epoch 4/5:  97%|█████████▋| 183/189 [04:57<00:09,  1.62s/it, loss=7.92]

Epoch 4/5:  97%|█████████▋| 184/189 [04:57<00:08,  1.64s/it, loss=7.92]

Epoch 4/5:  97%|█████████▋| 184/189 [04:59<00:08,  1.64s/it, loss=7.97]

Epoch 4/5:  98%|█████████▊| 185/189 [04:59<00:06,  1.66s/it, loss=7.97]

Epoch 4/5:  98%|█████████▊| 185/189 [05:00<00:06,  1.66s/it, loss=7.78]

Epoch 4/5:  98%|█████████▊| 186/189 [05:00<00:05,  1.67s/it, loss=7.78]

Epoch 4/5:  98%|█████████▊| 186/189 [05:02<00:05,  1.67s/it, loss=7.72]

Epoch 4/5:  99%|█████████▉| 187/189 [05:02<00:03,  1.57s/it, loss=7.72]

Epoch 4/5:  99%|█████████▉| 187/189 [05:03<00:03,  1.57s/it, loss=7.84]

Epoch 4/5:  99%|█████████▉| 188/189 [05:03<00:01,  1.55s/it, loss=7.84]

Epoch 4/5:  99%|█████████▉| 188/189 [05:05<00:01,  1.55s/it, loss=7.76]

Epoch 4/5: 100%|██████████| 189/189 [05:05<00:00,  1.54s/it, loss=7.76]

Epoch 4/5: 100%|██████████| 189/189 [05:05<00:00,  1.61s/it, loss=7.76]




100%|██████████| 23/23 [00:07<00:00,  2.95it/s]





Epoch 4: train_loss=8.0537 | R@10=0.0265 | DCG@10=0.2720 | NDCG@10=0.0660



Epoch 5/5:  35%|███▍      | 66/189 [01:46<03:22,  1.64s/it, loss=7.59]

Epoch 5/5:  35%|███▌      | 67/189 [01:46<03:18,  1.63s/it, loss=7.59]

Epoch 5/5:  35%|███▌      | 67/189 [01:48<03:18,  1.63s/it, loss=7.66]

Epoch 5/5:  36%|███▌      | 68/189 [01:48<03:17,  1.63s/it, loss=7.66]

Epoch 5/5:  36%|███▌      | 68/189 [01:49<03:17,  1.63s/it, loss=7.72]

Epoch 5/5:  37%|███▋      | 69/189 [01:49<03:13,  1.61s/it, loss=7.72]

Epoch 5/5:  37%|███▋      | 69/189 [01:51<03:13,  1.61s/it, loss=7.85]

Epoch 5/5:  37%|███▋      | 70/189 [01:51<03:14,  1.64s/it, loss=7.85]

Epoch 5/5:  37%|███▋      | 70/189 [01:53<03:14,  1.64s/it, loss=7.71]

Epoch 5/5:  38%|███▊      | 71/189 [01:53<03:13,  1.64s/it, loss=7.71]

Epoch 5/5:  38%|███▊      | 71/189 [01:54<03:13,  1.64s/it, loss=7.71]

Epoch 5/5:  38%|███▊      | 72/189 [01:54<03:07,  1.61s/it, loss=7.71]

Epoch 5/5:  38%|███▊      | 72/189 [01:56<03:07,  1.61s/it, loss=7.80]

Epoch 5/5:  39%|███▊      | 73/189 [01:56<03:09,  1.64s/it, loss=7.80]

Epoch 5/5:  39%|███▊      | 73/189 [01:58<03:09,  1.64s/it, loss=7.79]

Epoch 5/5:  39%|███▉      | 74/189 [01:58<03:05,  1.62s/it, loss=7.79]

Epoch 5/5:  39%|███▉      | 74/189 [01:59<03:05,  1.62s/it, loss=7.75]

Epoch 5/5:  40%|███▉      | 75/189 [01:59<03:03,  1.61s/it, loss=7.75]

Epoch 5/5:  40%|███▉      | 75/189 [02:01<03:03,  1.61s/it, loss=7.79]

Epoch 5/5:  40%|████      | 76/189 [02:01<03:04,  1.63s/it, loss=7.79]

Epoch 5/5:  40%|████      | 76/189 [02:02<03:04,  1.63s/it, loss=7.69]

Epoch 5/5:  41%|████      | 77/189 [02:02<02:57,  1.59s/it, loss=7.69]

Epoch 5/5:  41%|████      | 77/189 [02:04<02:57,  1.59s/it, loss=7.83]

Epoch 5/5:  41%|████▏     | 78/189 [02:04<02:58,  1.60s/it, loss=7.83]

Epoch 5/5:  41%|████▏     | 78/189 [02:06<02:58,  1.60s/it, loss=7.76]

Epoch 5/5:  42%|████▏     | 79/189 [02:06<03:02,  1.66s/it, loss=7.76]

Epoch 5/5:  42%|████▏     | 79/189 [02:08<03:02,  1.66s/it, loss=7.67]

Epoch 5/5:  42%|████▏     | 80/189 [02:08<03:03,  1.68s/it, loss=7.67]

Epoch 5/5:  42%|████▏     | 80/189 [02:09<03:03,  1.68s/it, loss=7.84]

Epoch 5/5:  43%|████▎     | 81/189 [02:09<03:01,  1.68s/it, loss=7.84]

Epoch 5/5:  43%|████▎     | 81/189 [02:11<03:01,  1.68s/it, loss=8.03]

Epoch 5/5:  43%|████▎     | 82/189 [02:11<03:02,  1.70s/it, loss=8.03]

Epoch 5/5:  43%|████▎     | 82/189 [02:13<03:02,  1.70s/it, loss=7.73]

Epoch 5/5:  44%|████▍     | 83/189 [02:13<02:54,  1.64s/it, loss=7.73]

Epoch 5/5:  44%|████▍     | 83/189 [02:14<02:54,  1.64s/it, loss=7.82]

Epoch 5/5:  44%|████▍     | 84/189 [02:14<02:53,  1.65s/it, loss=7.82]

Epoch 5/5:  44%|████▍     | 84/189 [02:16<02:53,  1.65s/it, loss=7.69]

Epoch 5/5:  45%|████▍     | 85/189 [02:16<02:49,  1.63s/it, loss=7.69]

Epoch 5/5:  45%|████▍     | 85/189 [02:17<02:49,  1.63s/it, loss=7.72]

Epoch 5/5:  46%|████▌     | 86/189 [02:17<02:44,  1.60s/it, loss=7.72]

Epoch 5/5:  46%|████▌     | 86/189 [02:19<02:44,  1.60s/it, loss=7.68]

Epoch 5/5:  46%|████▌     | 87/189 [02:19<02:41,  1.58s/it, loss=7.68]

Epoch 5/5:  46%|████▌     | 87/189 [02:20<02:41,  1.58s/it, loss=7.72]

Epoch 5/5:  47%|████▋     | 88/189 [02:20<02:41,  1.60s/it, loss=7.72]

Epoch 5/5:  47%|████▋     | 88/189 [02:22<02:41,  1.60s/it, loss=7.60]

Epoch 5/5:  47%|████▋     | 89/189 [02:22<02:40,  1.60s/it, loss=7.60]

Epoch 5/5:  47%|████▋     | 89/189 [02:24<02:40,  1.60s/it, loss=7.75]

Epoch 5/5:  48%|████▊     | 90/189 [02:24<02:42,  1.64s/it, loss=7.75]

Epoch 5/5:  48%|████▊     | 90/189 [02:25<02:42,  1.64s/it, loss=7.84]

Epoch 5/5:  48%|████▊     | 91/189 [02:25<02:39,  1.63s/it, loss=7.84]

Epoch 5/5:  48%|████▊     | 91/189 [02:27<02:39,  1.63s/it, loss=7.83]

Epoch 5/5:  49%|████▊     | 92/189 [02:27<02:28,  1.53s/it, loss=7.83]

Epoch 5/5:  49%|████▊     | 92/189 [02:28<02:28,  1.53s/it, loss=7.78]

Epoch 5/5:  49%|████▉     | 93/189 [02:28<02:30,  1.57s/it, loss=7.78]

Epoch 5/5:  49%|████▉     | 93/189 [02:30<02:30,  1.57s/it, loss=7.77]

Epoch 5/5:  50%|████▉     | 94/189 [02:30<02:33,  1.61s/it, loss=7.77]

Epoch 5/5:  50%|████▉     | 94/189 [02:32<02:33,  1.61s/it, loss=7.85]

Epoch 5/5:  50%|█████     | 95/189 [02:32<02:31,  1.62s/it, loss=7.85]

Epoch 5/5:  50%|█████     | 95/189 [02:33<02:31,  1.62s/it, loss=7.67]

Epoch 5/5:  51%|█████     | 96/189 [02:33<02:26,  1.58s/it, loss=7.67]

Epoch 5/5:  51%|█████     | 96/189 [02:35<02:26,  1.58s/it, loss=7.77]

Epoch 5/5:  51%|█████▏    | 97/189 [02:35<02:21,  1.54s/it, loss=7.77]

Epoch 5/5:  51%|█████▏    | 97/189 [02:36<02:21,  1.54s/it, loss=7.68]

Epoch 5/5:  52%|█████▏    | 98/189 [02:36<02:23,  1.58s/it, loss=7.68]

Epoch 5/5:  52%|█████▏    | 98/189 [02:38<02:23,  1.58s/it, loss=7.81]

Epoch 5/5:  52%|█████▏    | 99/189 [02:38<02:25,  1.61s/it, loss=7.81]

Epoch 5/5:  52%|█████▏    | 99/189 [02:40<02:25,  1.61s/it, loss=7.78]

Epoch 5/5:  53%|█████▎    | 100/189 [02:40<02:25,  1.63s/it, loss=7.78]

Epoch 5/5:  53%|█████▎    | 100/189 [02:41<02:25,  1.63s/it, loss=7.66]

Epoch 5/5:  53%|█████▎    | 101/189 [02:41<02:24,  1.64s/it, loss=7.66]

Epoch 5/5:  53%|█████▎    | 101/189 [02:43<02:24,  1.64s/it, loss=7.66]

Epoch 5/5:  54%|█████▍    | 102/189 [02:43<02:21,  1.62s/it, loss=7.66]

Epoch 5/5:  54%|█████▍    | 102/189 [02:45<02:21,  1.62s/it, loss=7.67]

Epoch 5/5:  54%|█████▍    | 103/189 [02:45<02:22,  1.66s/it, loss=7.67]

Epoch 5/5:  54%|█████▍    | 103/189 [02:46<02:22,  1.66s/it, loss=7.71]

Epoch 5/5:  55%|█████▌    | 104/189 [02:46<02:18,  1.62s/it, loss=7.71]

Epoch 5/5:  55%|█████▌    | 104/189 [02:48<02:18,  1.62s/it, loss=7.64]

Epoch 5/5:  56%|█████▌    | 105/189 [02:48<02:18,  1.65s/it, loss=7.64]

Epoch 5/5:  56%|█████▌    | 105/189 [02:50<02:18,  1.65s/it, loss=7.69]

Epoch 5/5:  56%|█████▌    | 106/189 [02:50<02:16,  1.64s/it, loss=7.69]

Epoch 5/5:  56%|█████▌    | 106/189 [02:51<02:16,  1.64s/it, loss=7.73]

Epoch 5/5:  57%|█████▋    | 107/189 [02:51<02:13,  1.63s/it, loss=7.73]

Epoch 5/5:  57%|█████▋    | 107/189 [02:53<02:13,  1.63s/it, loss=7.70]

Epoch 5/5:  57%|█████▋    | 108/189 [02:53<02:08,  1.58s/it, loss=7.70]

Epoch 5/5:  57%|█████▋    | 108/189 [02:54<02:08,  1.58s/it, loss=7.76]

Epoch 5/5:  58%|█████▊    | 109/189 [02:54<02:04,  1.55s/it, loss=7.76]

Epoch 5/5:  58%|█████▊    | 109/189 [02:56<02:04,  1.55s/it, loss=7.62]

Epoch 5/5:  58%|█████▊    | 110/189 [02:56<02:02,  1.56s/it, loss=7.62]

Epoch 5/5:  58%|█████▊    | 110/189 [02:57<02:02,  1.56s/it, loss=7.79]

Epoch 5/5:  59%|█████▊    | 111/189 [02:57<02:00,  1.54s/it, loss=7.79]

Epoch 5/5:  59%|█████▊    | 111/189 [02:59<02:00,  1.54s/it, loss=7.76]

Epoch 5/5:  59%|█████▉    | 112/189 [02:59<01:58,  1.54s/it, loss=7.76]

Epoch 5/5:  59%|█████▉    | 112/189 [03:00<01:58,  1.54s/it, loss=7.63]

Epoch 5/5:  60%|█████▉    | 113/189 [03:00<01:58,  1.56s/it, loss=7.63]

Epoch 5/5:  60%|█████▉    | 113/189 [03:02<01:58,  1.56s/it, loss=7.92]

Epoch 5/5:  60%|██████    | 114/189 [03:02<01:54,  1.52s/it, loss=7.92]

Epoch 5/5:  60%|██████    | 114/189 [03:03<01:54,  1.52s/it, loss=7.59]

Epoch 5/5:  61%|██████    | 115/189 [03:03<01:52,  1.52s/it, loss=7.59]

Epoch 5/5:  61%|██████    | 115/189 [03:05<01:52,  1.52s/it, loss=7.76]

Epoch 5/5:  61%|██████▏   | 116/189 [03:05<01:52,  1.54s/it, loss=7.76]

Epoch 5/5:  61%|██████▏   | 116/189 [03:06<01:52,  1.54s/it, loss=7.61]

Epoch 5/5:  62%|██████▏   | 117/189 [03:06<01:50,  1.53s/it, loss=7.61]

Epoch 5/5:  62%|██████▏   | 117/189 [03:08<01:50,  1.53s/it, loss=7.73]

Epoch 5/5:  62%|██████▏   | 118/189 [03:08<01:46,  1.50s/it, loss=7.73]

Epoch 5/5:  62%|██████▏   | 118/189 [03:09<01:46,  1.50s/it, loss=7.88]

Epoch 5/5:  63%|██████▎   | 119/189 [03:09<01:47,  1.54s/it, loss=7.88]

Epoch 5/5:  63%|██████▎   | 119/189 [03:11<01:47,  1.54s/it, loss=7.71]

Epoch 5/5:  63%|██████▎   | 120/189 [03:11<01:43,  1.50s/it, loss=7.71]

Epoch 5/5:  63%|██████▎   | 120/189 [03:12<01:43,  1.50s/it, loss=7.70]

Epoch 5/5:  64%|██████▍   | 121/189 [03:12<01:44,  1.54s/it, loss=7.70]

Epoch 5/5:  64%|██████▍   | 121/189 [03:14<01:44,  1.54s/it, loss=7.70]

Epoch 5/5:  65%|██████▍   | 122/189 [03:14<01:43,  1.55s/it, loss=7.70]

Epoch 5/5:  65%|██████▍   | 122/189 [03:16<01:43,  1.55s/it, loss=7.76]

Epoch 5/5:  65%|██████▌   | 123/189 [03:16<01:45,  1.60s/it, loss=7.76]

Epoch 5/5:  65%|██████▌   | 123/189 [03:17<01:45,  1.60s/it, loss=7.71]

Epoch 5/5:  66%|██████▌   | 124/189 [03:17<01:45,  1.62s/it, loss=7.71]

Epoch 5/5:  66%|██████▌   | 124/189 [03:19<01:45,  1.62s/it, loss=7.87]

Epoch 5/5:  66%|██████▌   | 125/189 [03:19<01:41,  1.59s/it, loss=7.87]

Epoch 5/5:  66%|██████▌   | 125/189 [03:21<01:41,  1.59s/it, loss=7.70]

Epoch 5/5:  67%|██████▋   | 126/189 [03:21<01:42,  1.62s/it, loss=7.70]

Epoch 5/5:  67%|██████▋   | 126/189 [03:22<01:42,  1.62s/it, loss=7.93]

Epoch 5/5:  67%|██████▋   | 127/189 [03:22<01:41,  1.63s/it, loss=7.93]

Epoch 5/5:  67%|██████▋   | 127/189 [03:24<01:41,  1.63s/it, loss=7.88]

Epoch 5/5:  68%|██████▊   | 128/189 [03:24<01:40,  1.65s/it, loss=7.88]

Epoch 5/5:  68%|██████▊   | 128/189 [03:26<01:40,  1.65s/it, loss=7.75]

Epoch 5/5:  68%|██████▊   | 129/189 [03:26<01:39,  1.65s/it, loss=7.75]

Epoch 5/5:  68%|██████▊   | 129/189 [03:27<01:39,  1.65s/it, loss=7.55]

Epoch 5/5:  69%|██████▉   | 130/189 [03:27<01:36,  1.63s/it, loss=7.55]

Epoch 5/5:  69%|██████▉   | 130/189 [03:29<01:36,  1.63s/it, loss=7.70]

Epoch 5/5:  69%|██████▉   | 131/189 [03:29<01:34,  1.62s/it, loss=7.70]

Epoch 5/5:  69%|██████▉   | 131/189 [03:30<01:34,  1.62s/it, loss=7.82]

Epoch 5/5:  70%|██████▉   | 132/189 [03:30<01:31,  1.60s/it, loss=7.82]

Epoch 5/5:  70%|██████▉   | 132/189 [03:32<01:31,  1.60s/it, loss=7.72]

Epoch 5/5:  70%|███████   | 133/189 [03:32<01:27,  1.56s/it, loss=7.72]

Epoch 5/5:  70%|███████   | 133/189 [03:34<01:27,  1.56s/it, loss=7.75]

Epoch 5/5:  71%|███████   | 134/189 [03:34<01:29,  1.62s/it, loss=7.75]

Epoch 5/5:  71%|███████   | 134/189 [03:35<01:29,  1.62s/it, loss=7.70]

Epoch 5/5:  71%|███████▏  | 135/189 [03:35<01:29,  1.65s/it, loss=7.70]

Epoch 5/5:  71%|███████▏  | 135/189 [03:37<01:29,  1.65s/it, loss=7.73]

Epoch 5/5:  72%|███████▏  | 136/189 [03:37<01:26,  1.63s/it, loss=7.73]

Epoch 5/5:  72%|███████▏  | 136/189 [03:39<01:26,  1.63s/it, loss=7.61]

Epoch 5/5:  72%|███████▏  | 137/189 [03:39<01:25,  1.64s/it, loss=7.61]

Epoch 5/5:  72%|███████▏  | 137/189 [03:40<01:25,  1.64s/it, loss=7.61]

Epoch 5/5:  73%|███████▎  | 138/189 [03:40<01:23,  1.64s/it, loss=7.61]

Epoch 5/5:  73%|███████▎  | 138/189 [03:42<01:23,  1.64s/it, loss=7.74]

Epoch 5/5:  74%|███████▎  | 139/189 [03:42<01:20,  1.61s/it, loss=7.74]

Epoch 5/5:  74%|███████▎  | 139/189 [03:43<01:20,  1.61s/it, loss=7.77]

Epoch 5/5:  74%|███████▍  | 140/189 [03:43<01:18,  1.60s/it, loss=7.77]

Epoch 5/5:  74%|███████▍  | 140/189 [03:45<01:18,  1.60s/it, loss=7.84]

Epoch 5/5:  75%|███████▍  | 141/189 [03:45<01:18,  1.64s/it, loss=7.84]

Epoch 5/5:  75%|███████▍  | 141/189 [03:47<01:18,  1.64s/it, loss=7.75]

Epoch 5/5:  75%|███████▌  | 142/189 [03:47<01:16,  1.63s/it, loss=7.75]

Epoch 5/5:  75%|███████▌  | 142/189 [03:48<01:16,  1.63s/it, loss=7.62]

Epoch 5/5:  76%|███████▌  | 143/189 [03:48<01:14,  1.61s/it, loss=7.62]

Epoch 5/5:  76%|███████▌  | 143/189 [03:50<01:14,  1.61s/it, loss=7.75]

Epoch 5/5:  76%|███████▌  | 144/189 [03:50<01:11,  1.59s/it, loss=7.75]

Epoch 5/5:  76%|███████▌  | 144/189 [03:51<01:11,  1.59s/it, loss=7.64]

Epoch 5/5:  77%|███████▋  | 145/189 [03:51<01:10,  1.60s/it, loss=7.64]

Epoch 5/5:  77%|███████▋  | 145/189 [03:53<01:10,  1.60s/it, loss=7.44]

Epoch 5/5:  77%|███████▋  | 146/189 [03:53<01:08,  1.59s/it, loss=7.44]

Epoch 5/5:  77%|███████▋  | 146/189 [03:55<01:08,  1.59s/it, loss=7.66]

Epoch 5/5:  78%|███████▊  | 147/189 [03:55<01:06,  1.59s/it, loss=7.66]

Epoch 5/5:  78%|███████▊  | 147/189 [03:56<01:06,  1.59s/it, loss=7.79]

Epoch 5/5:  78%|███████▊  | 148/189 [03:56<01:06,  1.63s/it, loss=7.79]

Epoch 5/5:  78%|███████▊  | 148/189 [03:58<01:06,  1.63s/it, loss=7.79]

Epoch 5/5:  79%|███████▉  | 149/189 [03:58<01:05,  1.65s/it, loss=7.79]

Epoch 5/5:  79%|███████▉  | 149/189 [04:00<01:05,  1.65s/it, loss=7.73]

Epoch 5/5:  79%|███████▉  | 150/189 [04:00<01:04,  1.66s/it, loss=7.73]

Epoch 5/5:  79%|███████▉  | 150/189 [04:01<01:04,  1.66s/it, loss=7.68]

Epoch 5/5:  80%|███████▉  | 151/189 [04:01<01:03,  1.66s/it, loss=7.68]

Epoch 5/5:  80%|███████▉  | 151/189 [04:03<01:03,  1.66s/it, loss=7.79]

Epoch 5/5:  80%|████████  | 152/189 [04:03<01:01,  1.67s/it, loss=7.79]

Epoch 5/5:  80%|████████  | 152/189 [04:05<01:01,  1.67s/it, loss=7.63]

Epoch 5/5:  81%|████████  | 153/189 [04:05<01:00,  1.68s/it, loss=7.63]

Epoch 5/5:  81%|████████  | 153/189 [04:06<01:00,  1.68s/it, loss=7.74]

Epoch 5/5:  81%|████████▏ | 154/189 [04:06<00:58,  1.67s/it, loss=7.74]

Epoch 5/5:  81%|████████▏ | 154/189 [04:08<00:58,  1.67s/it, loss=7.77]

Epoch 5/5:  82%|████████▏ | 155/189 [04:08<00:57,  1.70s/it, loss=7.77]

Epoch 5/5:  82%|████████▏ | 155/189 [04:10<00:57,  1.70s/it, loss=7.56]

Epoch 5/5:  83%|████████▎ | 156/189 [04:10<00:55,  1.68s/it, loss=7.56]

Epoch 5/5:  83%|████████▎ | 156/189 [04:11<00:55,  1.68s/it, loss=7.86]

Epoch 5/5:  83%|████████▎ | 157/189 [04:11<00:51,  1.62s/it, loss=7.86]

Epoch 5/5:  83%|████████▎ | 157/189 [04:13<00:51,  1.62s/it, loss=7.72]

Epoch 5/5:  84%|████████▎ | 158/189 [04:13<00:50,  1.63s/it, loss=7.72]

Epoch 5/5:  84%|████████▎ | 158/189 [04:14<00:50,  1.63s/it, loss=7.66]

Epoch 5/5:  84%|████████▍ | 159/189 [04:14<00:48,  1.62s/it, loss=7.66]

Epoch 5/5:  84%|████████▍ | 159/189 [04:16<00:48,  1.62s/it, loss=7.76]

Epoch 5/5:  85%|████████▍ | 160/189 [04:16<00:45,  1.58s/it, loss=7.76]

Epoch 5/5:  85%|████████▍ | 160/189 [04:18<00:45,  1.58s/it, loss=7.86]

Epoch 5/5:  85%|████████▌ | 161/189 [04:18<00:44,  1.61s/it, loss=7.86]

Epoch 5/5:  85%|████████▌ | 161/189 [04:19<00:44,  1.61s/it, loss=7.72]

Epoch 5/5:  86%|████████▌ | 162/189 [04:19<00:41,  1.55s/it, loss=7.72]

Epoch 5/5:  86%|████████▌ | 162/189 [04:21<00:41,  1.55s/it, loss=7.62]

Epoch 5/5:  86%|████████▌ | 163/189 [04:21<00:40,  1.55s/it, loss=7.62]

Epoch 5/5:  86%|████████▌ | 163/189 [04:22<00:40,  1.55s/it, loss=7.76]

Epoch 5/5:  87%|████████▋ | 164/189 [04:22<00:39,  1.57s/it, loss=7.76]

Epoch 5/5:  87%|████████▋ | 164/189 [04:24<00:39,  1.57s/it, loss=7.89]

Epoch 5/5:  87%|████████▋ | 165/189 [04:24<00:37,  1.55s/it, loss=7.89]

Epoch 5/5:  87%|████████▋ | 165/189 [04:25<00:37,  1.55s/it, loss=7.70]

Epoch 5/5:  88%|████████▊ | 166/189 [04:25<00:36,  1.57s/it, loss=7.70]

Epoch 5/5:  88%|████████▊ | 166/189 [04:27<00:36,  1.57s/it, loss=7.61]

Epoch 5/5:  88%|████████▊ | 167/189 [04:27<00:34,  1.56s/it, loss=7.61]

Epoch 5/5:  88%|████████▊ | 167/189 [04:29<00:34,  1.56s/it, loss=7.61]

Epoch 5/5:  89%|████████▉ | 168/189 [04:29<00:33,  1.60s/it, loss=7.61]

Epoch 5/5:  89%|████████▉ | 168/189 [04:30<00:33,  1.60s/it, loss=7.61]

Epoch 5/5:  89%|████████▉ | 169/189 [04:30<00:32,  1.63s/it, loss=7.61]

Epoch 5/5:  89%|████████▉ | 169/189 [04:32<00:32,  1.63s/it, loss=7.71]

Epoch 5/5:  90%|████████▉ | 170/189 [04:32<00:30,  1.63s/it, loss=7.71]

Epoch 5/5:  90%|████████▉ | 170/189 [04:34<00:30,  1.63s/it, loss=7.57]

Epoch 5/5:  90%|█████████ | 171/189 [04:34<00:29,  1.66s/it, loss=7.57]

Epoch 5/5:  90%|█████████ | 171/189 [04:35<00:29,  1.66s/it, loss=7.71]

Epoch 5/5:  91%|█████████ | 172/189 [04:35<00:27,  1.64s/it, loss=7.71]

Epoch 5/5:  91%|█████████ | 172/189 [04:37<00:27,  1.64s/it, loss=7.65]

Epoch 5/5:  92%|█████████▏| 173/189 [04:37<00:26,  1.65s/it, loss=7.65]

Epoch 5/5:  92%|█████████▏| 173/189 [04:39<00:26,  1.65s/it, loss=7.61]

Epoch 5/5:  92%|█████████▏| 174/189 [04:39<00:24,  1.64s/it, loss=7.61]

Epoch 5/5:  92%|█████████▏| 174/189 [04:40<00:24,  1.64s/it, loss=7.82]

Epoch 5/5:  93%|█████████▎| 175/189 [04:40<00:22,  1.57s/it, loss=7.82]

Epoch 5/5:  93%|█████████▎| 175/189 [04:41<00:22,  1.57s/it, loss=7.57]

Epoch 5/5:  93%|█████████▎| 176/189 [04:41<00:20,  1.56s/it, loss=7.57]

Epoch 5/5:  93%|█████████▎| 176/189 [04:43<00:20,  1.56s/it, loss=7.75]

Epoch 5/5:  94%|█████████▎| 177/189 [04:43<00:19,  1.59s/it, loss=7.75]

Epoch 5/5:  94%|█████████▎| 177/189 [04:45<00:19,  1.59s/it, loss=7.71]

Epoch 5/5:  94%|█████████▍| 178/189 [04:45<00:17,  1.60s/it, loss=7.71]

Epoch 5/5:  94%|█████████▍| 178/189 [04:47<00:17,  1.60s/it, loss=7.75]

Epoch 5/5:  95%|█████████▍| 179/189 [04:47<00:16,  1.65s/it, loss=7.75]

Epoch 5/5:  95%|█████████▍| 179/189 [04:48<00:16,  1.65s/it, loss=7.66]

Epoch 5/5:  95%|█████████▌| 180/189 [04:48<00:14,  1.66s/it, loss=7.66]

Epoch 5/5:  95%|█████████▌| 180/189 [04:50<00:14,  1.66s/it, loss=7.64]

Epoch 5/5:  96%|█████████▌| 181/189 [04:50<00:12,  1.59s/it, loss=7.64]

Epoch 5/5:  96%|█████████▌| 181/189 [04:51<00:12,  1.59s/it, loss=7.60]

Epoch 5/5:  96%|█████████▋| 182/189 [04:51<00:11,  1.60s/it, loss=7.60]

Epoch 5/5:  96%|█████████▋| 182/189 [04:53<00:11,  1.60s/it, loss=7.72]

Epoch 5/5:  97%|█████████▋| 183/189 [04:53<00:09,  1.58s/it, loss=7.72]

Epoch 5/5:  97%|█████████▋| 183/189 [04:54<00:09,  1.58s/it, loss=7.76]

Epoch 5/5:  97%|█████████▋| 184/189 [04:54<00:07,  1.56s/it, loss=7.76]

Epoch 5/5:  97%|█████████▋| 184/189 [04:56<00:07,  1.56s/it, loss=7.67]

Epoch 5/5:  98%|█████████▊| 185/189 [04:56<00:06,  1.60s/it, loss=7.67]

Epoch 5/5:  98%|█████████▊| 185/189 [04:58<00:06,  1.60s/it, loss=7.57]

Epoch 5/5:  98%|█████████▊| 186/189 [04:58<00:04,  1.61s/it, loss=7.57]

Epoch 5/5:  98%|█████████▊| 186/189 [04:59<00:04,  1.61s/it, loss=7.77]

Epoch 5/5:  99%|█████████▉| 187/189 [04:59<00:03,  1.62s/it, loss=7.77]

Epoch 5/5:  99%|█████████▉| 187/189 [05:01<00:03,  1.62s/it, loss=7.66]

Epoch 5/5:  99%|█████████▉| 188/189 [05:01<00:01,  1.58s/it, loss=7.66]

Epoch 5/5:  99%|█████████▉| 188/189 [05:02<00:01,  1.58s/it, loss=7.65]

Epoch 5/5: 100%|██████████| 189/189 [05:02<00:00,  1.53s/it, loss=7.65]

Epoch 5/5: 100%|██████████| 189/189 [05:02<00:00,  1.60s/it, loss=7.65]

100%|██████████| 23/23 [00:07<00:00,  3.04it/s]


Epoch 5: train_loss=7.7534 | R@10=0.0242 | DCG@10=0.2592 | NDCG@10=0.0612





In [None]:
metrics

{'recall@5': 0.01575394982720289,
 'recall@10': 0.0241689173872425,
 'recall@20': 0.04311617966492117,
 'precision@5': 0.05770308123249291,
 'precision@10': 0.05546218487394955,
 'precision@20': 0.05042016806722694,
 'dcg@5': 0.17425028015585506,
 'dcg@10': 0.2591816378073866,
 'dcg@20': 0.37284562258827253,
 'ndcg@5': 0.06126104414152299,
 'ndcg@10': 0.061165418642456525,
 'ndcg@20': 0.06258607128889172,
 'mrr': 0.13528584705055305}

The model shows promising results: metrics improve across epochs and appear to be starting to converge. I'd like to scale up with larger embedding dimensions and more training epochs, but I'm limited by the compute available in this environment.

One challenge I've encountered is how hard it is to evaluate recommendation models fairly. Metrics are calculated in noticeably different ways across papers and implementations, which makes it hard to compare results directly. I plan to dig deeper into this topic in a future post.
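To illustrate how much the conventions matter, here is a minimal per-user sketch under one common set of choices: binary relevance, a `log2(rank + 1)` discount, recall divided by the total number of held-out items, NDCG normalised by the ideal ordering of those items, and MRR taken from the first hit. These choices are my assumptions for illustration and may not match how `rec` computes the numbers above; the reported figures are presumably averages of per-user values like these.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the user's held-out items that appear in the top-k.
    # Some implementations divide by min(len(relevant), k) instead.
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def precision_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    # Fraction of the top-k slots filled by relevant items.
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def dcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    # Binary gains, discounted by log2(position + 1) with positions starting at 1.
    return sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    # DCG divided by the best achievable DCG (all relevant items ranked first).
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg_at_k(ranked, relevant, k) / idcg if idcg > 0 else 0.0


def mrr(ranked: Sequence[int], relevant: Set[int]) -> float:
    # Reciprocal rank of the first relevant item (0 if none is retrieved).
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


ranked = [5, 3, 9, 1, 7]  # model's top-5 for one user
relevant = {3, 7, 42}     # that user's held-out items
print(round(recall_at_k(ranked, relevant, 5), 4))  # 0.6667
print(precision_at_k(ranked, relevant, 5))         # 0.4
print(round(ndcg_at_k(ranked, relevant, 5), 4))    # 0.4776
print(mrr(ranked, relevant))                       # 0.5
```

Under this convention, unnormalised DCG can exceed 1 when a user has several relevant items in the top-k, while NDCG stays in [0, 1]. If `rec` normalises in a similar way, that would be one possible reading of why DCG@10 is roughly four times NDCG@10 in the output above.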

## Cleanup

Remove the cloned `rec` repo so that the notebook runs end to end on restart, and delete the generated data so that unnecessary files aren't published with the blog.

In [None]:
!rm -rf rec

In [None]:
!rm -rf ../../assets/movielens_rec_data

# Conclusion

This post presents a GPTRec implementation using my `rec` framework. The key contributions are:

- A minimal, reproducible PyTorch implementation of GPTRec, in contrast to the [original version](https://github.com/asash/gptrec_rl), which is more general but implemented in TensorFlow
- A successful demonstration of the `rec` framework's capabilities

The model shows reasonable performance, which suggests the architecture is implemented correctly.

Looking ahead, I'd like to integrate sequential models into the `rec` framework. The framework currently supports a Retrieval → Ranking pipeline via the [train_all script](https://github.com/AndrewBoney/rec/blob/main/rec/train_all.py). I'm considering two approaches:

1. **Three-stage pipeline** (Retrieval → Sequential → Ranking): The ranking model would either become a hybrid combining traditional ranking with sequential signals, or incorporate the sequential model's logits into the ranking embeddings.

2. **Sequential ranking**: Replace the ranking stage with a sequential model that also leverages user/item features. This aligns with industry trends; see Meta's recent work on [sequence learning for personalized recommendations](https://engineering.fb.com/2024/11/19/data-infrastructure/sequence-learning-personalized-ads-recommendations/).
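To make option 1 more concrete, here is a toy sketch of a three-stage pipeline in plain Python. Everything in it is hypothetical: the function names, the dot-product retrieval, the `transitions` lookup standing in for a GPTRec forward pass, and the linear ranker weights. It is not how `rec` or `train_all.py` is structured; it only shows the sequential logit being passed into the ranking stage as an extra feature.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Vector, item_vecs: Dict[int, Vector], n: int) -> List[int]:
    """Stage 1 (retrieval): score every item by dot product and keep the top n."""
    return sorted(item_vecs, key=lambda i: dot(user_vec, item_vecs[i]), reverse=True)[:n]


def sequential_logits(history: List[int], candidates: List[int],
                      transitions: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Stage 2 (sequential): hypothetical stand-in for a GPTRec forward pass.

    A real model would attend over the whole history; this toy version only
    looks up a score for (last item -> candidate), defaulting to 0.
    """
    last = history[-1]
    return {c: transitions.get((last, c), 0.0) for c in candidates}


def rank(user_vec: Vector, item_vecs: Dict[int, Vector], candidates: List[int],
         seq_scores: Dict[int, float],
         w_retrieval: float = 1.0, w_seq: float = 1.0) -> List[int]:
    """Stage 3 (ranking): a linear ranker whose features include the sequential logit."""
    weights = [w_retrieval, w_seq]

    def score(i: int) -> float:
        features = [dot(user_vec, item_vecs[i]), seq_scores[i]]
        return dot(weights, features)

    return sorted(candidates, key=score, reverse=True)


user_vec = [1.0, 0.0]
item_vecs = {1: [0.9, 0.1], 2: [0.8, 0.0], 3: [0.1, 0.9], 4: [0.5, 0.5]}
history = [3]                             # the user's most recent item
transitions = {(3, 4): 2.0, (3, 2): 0.5}  # toy "next item" scores

candidates = retrieve(user_vec, item_vecs, n=3)
seq_scores = sequential_logits(history, candidates, transitions)
final = rank(user_vec, item_vecs, candidates, seq_scores)
print(candidates)  # [1, 2, 4]
print(final)       # [4, 2, 1]
```

Here the sequential signal reorders the retrieved candidates: item 4 is only third by retrieval score but first after ranking. In option 2, the last two stages would instead collapse into a single model that consumes the history alongside the user/item features, rather than passing one logit between stages.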

Finally, a note on tooling: I found solveit's compute limitations frustrating for this implementation-heavy post, as they forced many restarts. For future implementation work, I'll likely develop locally and reserve solveit for paper reviews and lighter research tasks.