## Training Notebook for BirdCLEF2022 ##

[Original baseline by Kaerururu](https://www.kaggle.com/code/kaerunantoka/birdclef2022-use-2nd-label-f0/notebook)  
[That was forked from Hidehisa Arai (2021 comp)](https://www.kaggle.com/code/hidehisaarai1213/pytorch-inference-birdclef2021-starter/notebook)  
[My inference notebook](https://www.kaggle.com/code/ollypowell/birdclef2022-ex005-f0-infer/)  
[Original inference fork](https://www.kaggle.com/code/kaerunantoka/birdclef2022-ex005-f0-infer/notebook)

Data:  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-1-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-2-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-3-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-4-4  


**My Strategy:**  
1. Set up the training notebook on my own GPU, so there are no time limits.
2. Run with all folds while I work on the next step. Run inference on Kaggle once the models are ready.
3. Improve my CV score as much as I can with data augmentation, starting with the basic strategies below from Shinmaru. Use an image size of 128 to speed things up if need be.
4. Train on additional folds.
5. Fine-tune the inference threshold (it should be in the vicinity of 0.005 to 0.1).
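
The threshold fine-tuning step can be sketched as a simple sweep over candidate thresholds on held-out predictions. This is a minimal illustration, not code from the baseline: `micro_f1` and `best_threshold` are hypothetical helpers, the toy arrays stand in for real validation labels and clipwise probabilities, and micro-averaged F1 is assumed as the selection metric because that is what this notebook tracks during training.

```python
import numpy as np


def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1 for multi-label predictions at a single threshold."""
    y_pred = y_prob > threshold
    y_true = y_true.astype(bool)
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def best_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the highest micro F1, and that score."""
    scores = [micro_f1(y_true, y_prob, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]


# Toy stand-in data (assumption): positives score near 0.08-0.10, negatives below 0.02.
rng = np.random.default_rng(0)
y_true = (rng.random((200, 5)) < 0.1).astype(int)
y_prob = 0.08 * y_true + 0.02 * rng.random((200, 5))

candidates = np.linspace(0.005, 0.1, 20)
thr, score = best_threshold(y_true, y_prob, candidates)
print(f"best threshold={thr:.3f}, micro F1={score:.3f}")
```

On real data the sweep would be run on out-of-fold predictions, so the chosen threshold is not tuned on clips the model was trained on.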

[**Basic Augmentation strategies, suggested to be useful by Shinmaru:**](https://www.kaggle.com/competitions/birdclef-2022/discussion/324318)

* Time shift
* Add pink noise and brown noise
* Mix other audio dataset

[Good notebook on this topic by Hidehisa Arai](https://www.kaggle.com/code/hidehisaarai1213/rfcx-audio-data-augmentation-japanese-english)  
[Shinmaru's notebook, covering time shift and noise only](https://www.kaggle.com/code/shinmurashinmura/birdclef2022-basic-augmentation/notebook)
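
As a rough illustration of the first two ideas, the sketch below applies a random circular time shift and mixes in brown noise using only NumPy. It is not taken from either linked notebook: `time_shift`, `brown_noise`, and `add_noise` are hypothetical names, brown noise is approximated as the running sum of white noise, and the noise level is set from peak amplitudes, mirroring the convention of the `PinkNoise` transform defined later in this notebook.

```python
import numpy as np


def time_shift(y, max_shift_fraction=0.25, rng=None):
    """Circularly shift a waveform by a random number of samples."""
    rng = np.random.default_rng() if rng is None else rng
    max_shift = int(len(y) * max_shift_fraction)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(y, shift)


def brown_noise(n_samples, rng=None):
    """Brown (red) noise as the running sum of white noise, peak-normalised to 1."""
    rng = np.random.default_rng() if rng is None else rng
    noise = np.cumsum(rng.standard_normal(n_samples))
    noise -= noise.mean()
    return noise / (np.abs(noise).max() + 1e-12)


def add_noise(y, noise, snr_db):
    """Mix peak-normalised noise into y at a peak-amplitude ratio given in dB."""
    a_signal = np.abs(y).max()
    a_noise = a_signal / (10 ** (snr_db / 20))
    return (y + noise * a_noise).astype(y.dtype)


# Toy input (assumption): one second of a 440 Hz tone at 32 kHz.
rng = np.random.default_rng(0)
sr = 32000
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr).astype(np.float32)

shifted = time_shift(y, rng=rng)
noisy = add_noise(shifted, brown_noise(len(y), rng=rng), snr_db=10)
print(shifted.shape, noisy.dtype)
```

A circular shift keeps the clip length fixed, which suits the fixed 5-second crops used here; padding with silence instead of wrapping is an equally reasonable choice.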

[**More advanced ideas (also suggested by Shinmaru as effective)**](https://www.kaggle.com/competitions/birdclef-2022/discussion/307880)  

[SpecAugment](https://arxiv.org/abs/1904.08779)  
[SpecAugment++](https://arxiv.org/abs/2103.16858v3)  
[ImportantAug](https://arxiv.org/abs/2112.07156)
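
The masking part of SpecAugment (without time warping) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the `torchlibrosa` implementation used by the model below: `mask_spectrogram` is a hypothetical helper, the mask counts and widths are arbitrary defaults, and the maximum widths are assumed to be no larger than the corresponding spectrogram dimensions.

```python
import numpy as np


def mask_spectrogram(spec, n_freq_masks=2, max_freq_width=8,
                     n_time_masks=2, max_time_width=32, rng=None):
    """Zero out random frequency bands and time spans of a (n_mels, n_frames) array."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's array untouched
    n_mels, n_frames = out.shape

    for _ in range(n_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, n_mels - width + 1))
        out[start:start + width, :] = 0.0  # horizontal stripe: a band of mel bins

    for _ in range(n_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, n_frames - width + 1))
        out[:, start:start + width] = 0.0  # vertical stripe: a span of frames

    return out


# Toy input (assumption): an all-ones "spectrogram" so masked cells are easy to count.
spec = np.ones((128, 313), dtype=np.float32)
masked = mask_spectrogram(spec, rng=np.random.default_rng(0))
print("masked fraction:", 1.0 - float(masked.mean()))
```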

In [3]:
import os
import sys
sys.path.append('input/pytorch-image-models/pytorch-image-models-master')  # removed ../
import random
import time
import librosa
import colorednoise as cn
import numpy as np
import pandas as pd
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import StratifiedKFold
from sklearn import metrics
from torchlibrosa.augmentation import SpecAugmentation
from tqdm import tqdm
import ast
import glob 
import albumentations as A
import transformers
from torch.cuda.amp import autocast, GradScaler

In [4]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

Setup complete. Using torch 1.11.0 _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060 Laptop GPU', major=8, minor=6, total_memory=5946MB, multi_processor_count=30)


In [5]:
# Changed the paths to suit my own filenames
all_path = glob.glob('input/train_np_1/*/*.npy')\
+ glob.glob('input/train_np_2/*/*.npy')\
+ glob.glob('input/train_np_3/*/*.npy')\
+ glob.glob('input/train_np_4/*/*.npy')

len(all_path)

14852

In [6]:
train = pd.read_csv('input/birdclef-2022/train_metadata.csv')  #Changed filepath to suit

train['new_target'] = train['primary_label'] + ' ' + train['secondary_labels'].map(lambda x: ' '.join(ast.literal_eval(x)))
train['len_new_target'] = train['new_target'].map(lambda x: len(x.split()))
# train['len_new_target'].value_counts()
train.head()

Unnamed: 0,primary_label,secondary_labels,type,latitude,longitude,scientific_name,common_name,author,license,rating,time,url,filename,new_target,len_new_target
0,afrsil1,[],"['call', 'flight call']",12.391,-1.493,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,2.5,08:00,https://www.xeno-canto.org/125458,afrsil1/XC125458.ogg,afrsil1,1
1,afrsil1,"['houspa', 'redava', 'zebdov']",['call'],19.8801,-155.7254,Euodice cantans,African Silverbill,Dan Lane,Creative Commons Attribution-NonCommercial-Sha...,3.5,08:30,https://www.xeno-canto.org/175522,afrsil1/XC175522.ogg,afrsil1 houspa redava zebdov,4
2,afrsil1,[],"['call', 'song']",16.2901,-16.0321,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:30,https://www.xeno-canto.org/177993,afrsil1/XC177993.ogg,afrsil1,1
3,afrsil1,[],"['alarm call', 'call']",17.0922,54.2958,Euodice cantans,African Silverbill,Oscar Campbell,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:00,https://www.xeno-canto.org/205893,afrsil1/XC205893.ogg,afrsil1,1
4,afrsil1,[],['flight call'],21.4581,-157.7252,Euodice cantans,African Silverbill,Ross Gallardy,Creative Commons Attribution-NonCommercial-Sha...,3.0,16:30,https://www.xeno-canto.org/207431,afrsil1/XC207431.ogg,afrsil1,1


In [7]:
path_df = pd.DataFrame(all_path, columns=['file_path'])
path_df['filename'] = path_df['file_path'].map(lambda x: x.split('/')[-2]+'/'+x.split('/')[-1][:-4])
path_df.head()

Unnamed: 0,file_path,filename
0,input/train_np_1/bcnher/XC256938.ogg.npy,bcnher/XC256938.ogg
1,input/train_np_1/bcnher/XC648367.ogg.npy,bcnher/XC648367.ogg
2,input/train_np_1/bcnher/XC587839.ogg.npy,bcnher/XC587839.ogg
3,input/train_np_1/bcnher/XC548602.ogg.npy,bcnher/XC548602.ogg
4,input/train_np_1/bcnher/XC500284.ogg.npy,bcnher/XC500284.ogg


In [8]:
train = pd.merge(train, path_df, on='filename')
print(train.shape)
train.head()

(14852, 16)


Unnamed: 0,primary_label,secondary_labels,type,latitude,longitude,scientific_name,common_name,author,license,rating,time,url,filename,new_target,len_new_target,file_path
0,afrsil1,[],"['call', 'flight call']",12.391,-1.493,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,2.5,08:00,https://www.xeno-canto.org/125458,afrsil1/XC125458.ogg,afrsil1,1,input/train_np_1/afrsil1/XC125458.ogg.npy
1,afrsil1,"['houspa', 'redava', 'zebdov']",['call'],19.8801,-155.7254,Euodice cantans,African Silverbill,Dan Lane,Creative Commons Attribution-NonCommercial-Sha...,3.5,08:30,https://www.xeno-canto.org/175522,afrsil1/XC175522.ogg,afrsil1 houspa redava zebdov,4,input/train_np_1/afrsil1/XC175522.ogg.npy
2,afrsil1,[],"['call', 'song']",16.2901,-16.0321,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:30,https://www.xeno-canto.org/177993,afrsil1/XC177993.ogg,afrsil1,1,input/train_np_1/afrsil1/XC177993.ogg.npy
3,afrsil1,[],"['alarm call', 'call']",17.0922,54.2958,Euodice cantans,African Silverbill,Oscar Campbell,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:00,https://www.xeno-canto.org/205893,afrsil1/XC205893.ogg,afrsil1,1,input/train_np_1/afrsil1/XC205893.ogg.npy
4,afrsil1,[],['flight call'],21.4581,-157.7252,Euodice cantans,African Silverbill,Ross Gallardy,Creative Commons Attribution-NonCommercial-Sha...,3.0,16:30,https://www.xeno-canto.org/207431,afrsil1/XC207431.ogg,afrsil1,1,input/train_np_1/afrsil1/XC207431.ogg.npy


In [9]:
Fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for n, (trn_index, val_index) in enumerate(Fold.split(train, train['primary_label'])):
    train.loc[val_index, 'kfold'] = int(n)
train['kfold'] = train['kfold'].astype(int)



In [10]:
train.to_csv('train_folds.csv', index=False)

In [11]:
class CFG:
    ######################
    # Globals #
    ######################
    EXP_ID = 'EX005'
    seed = 71
    epochs = 23
    cutmix_and_mixup_epochs = 18
    folds =  [0, 1, 2, 3, 4]  #[0]
    N_FOLDS = 5
    LR = 1e-3
    ETA_MIN = 1e-6
    WEIGHT_DECAY = 1e-6
    train_bs = 16 # 32
    valid_bs = 32 # 64
    base_model_name = "tf_efficientnet_b0_ns"
    EARLY_STOPPING = True
    DEBUG = False # True
    EVALUATION = 'AUC'
    apex = True

    pooling = "max"
    pretrained = True
    num_classes = 152
    in_channels = 3
    target_columns = 'afrsil1 akekee akepa1 akiapo akikik amewig aniani apapan arcter \
                      barpet bcnher belkin1 bkbplo bknsti bkwpet blkfra blknod bongul \
                      brant brnboo brnnod brnowl brtcur bubsan buffle bulpet burpar buwtea \
                      cacgoo1 calqua cangoo canvas caster1 categr chbsan chemun chukar cintea \
                      comgal1 commyn compea comsan comwax coopet crehon dunlin elepai ercfra eurwig \
                      fragul gadwal gamqua glwgul gnwtea golphe grbher3 grefri gresca gryfra gwfgoo \
                      hawama hawcoo hawcre hawgoo hawhaw hawpet1 hoomer houfin houspa hudgod iiwi incter1 \
                      jabwar japqua kalphe kauama laugul layalb lcspet leasan leater1 lessca lesyel lobdow lotjae \
                      madpet magpet1 mallar3 masboo mauala maupar merlin mitpar moudov norcar norhar2 normoc norpin \
                      norsho nutman oahama omao osprey pagplo palila parjae pecsan peflov perfal pibgre pomjae puaioh \
                      reccar redava redjun redpha1 refboo rempar rettro ribgul rinduc rinphe rocpig rorpar rudtur ruff \
                      saffin sander semplo sheowl shtsan skylar snogoo sooshe sooter1 sopsku1 sora spodov sposan \
                      towsol wantat1 warwhe1 wesmea wessan wetshe whfibi whiter whttro wiltur yebcar yefcan zebdov'.split()

    img_size = 224 #224 # 128
    main_metric = "epoch_f1_at_03"

    period = 5
    n_mels = 224 #224 # 128
    fmin = 20
    fmax = 16000
    n_fft = 2048
    hop_length = 512
    sample_rate = 32000
    melspectrogram_parameters = {
        "n_mels": 224, #224, # 128,
        "fmin": 20,
        "fmax": 16000
    }
    
    
class AudioParams:
    """
    Parameters used for the audio data
    """
    sr = CFG.sample_rate
    duration = CFG.period
    # Melspectrogram
    n_mels = CFG.n_mels
    fmin = CFG.fmin
    fmax = CFG.fmax

In [12]:
class Compose:
    def __init__(self, transforms: list):
        self.transforms = transforms

    def __call__(self, y: np.ndarray, sr):
        for trns in self.transforms:
            y = trns(y, sr)
        return y


class AudioTransform:
    def __init__(self, always_apply=False, p=0.5):
        self.always_apply = always_apply
        self.p = p

    def __call__(self, y: np.ndarray, sr):
        if self.always_apply:
            return self.apply(y, sr=sr)
        else:
            if np.random.rand() < self.p:
                return self.apply(y, sr=sr)
            else:
                return y

    def apply(self, y: np.ndarray, **params):
        raise NotImplementedError


class OneOf(Compose):
    # https://github.com/albumentations-team/albumentations/blob/master/albumentations/core/composition.py
    def __init__(self, transforms, p=0.5):
        super().__init__(transforms)
        self.p = p
        transforms_ps = [t.p for t in transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, y: np.ndarray, sr):
        data = y
        if self.transforms_ps and (random.random() < self.p):
            random_state = np.random.RandomState(random.randint(0, 2 ** 32 - 1))
            t = random_state.choice(self.transforms, p=self.transforms_ps)
            data = t(y, sr)
        return data


class Normalize(AudioTransform):
    def __init__(self, always_apply=False, p=1):
        super().__init__(always_apply, p)

    def apply(self, y: np.ndarray, **params):
        max_vol = np.abs(y).max()
        y_vol = y * 1 / max_vol
        return np.asfortranarray(y_vol)


class NewNormalize(AudioTransform):
    def __init__(self, always_apply=False, p=1):
        super().__init__(always_apply, p)

    def apply(self, y: np.ndarray, **params):
        y_mm = y - y.mean()
        # NumPy arrays have no .abs() method, so use np.abs
        return y_mm / np.abs(y_mm).max()


class NoiseInjection(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_noise_level=0.5):
        super().__init__(always_apply, p)

        self.noise_level = (0.0, max_noise_level)

    def apply(self, y: np.ndarray, **params):
        noise_level = np.random.uniform(*self.noise_level)
        noise = np.random.randn(len(y))
        augmented = (y + noise * noise_level).astype(y.dtype)
        return augmented


class GaussianNoise(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))

        white_noise = np.random.randn(len(y))
        a_white = np.sqrt(white_noise ** 2).max()
        augmented = (y + white_noise * 1 / a_white * a_noise).astype(y.dtype)
        return augmented


class PinkNoise(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))

        pink_noise = cn.powerlaw_psd_gaussian(1, len(y))
        a_pink = np.sqrt(pink_noise ** 2).max()
        augmented = (y + pink_noise * 1 / a_pink * a_noise).astype(y.dtype)
        return augmented


class PitchShift(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_range=5):
        super().__init__(always_apply, p)
        self.max_range = max_range

    def apply(self, y: np.ndarray, sr, **params):
        n_steps = np.random.randint(-self.max_range, self.max_range)
        # Keyword arguments work on both older and newer librosa releases
        augmented = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
        return augmented


class TimeStretch(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_rate=1):
        super().__init__(always_apply, p)
        self.max_rate = max_rate

    def apply(self, y: np.ndarray, **params):
        rate = np.random.uniform(0, self.max_rate)
        # Keyword argument works on both older and newer librosa releases
        augmented = librosa.effects.time_stretch(y, rate=rate)
        return augmented


def _db2float(db: float, amplitude=True):
    if amplitude:
        return 10 ** (db / 20)
    else:
        return 10 ** (db / 10)


def volume_down(y: np.ndarray, db: float):
    """
    Low level API for decreasing the volume
    Parameters
    ----------
    y: numpy.ndarray
        stereo / monaural input audio
    db: float
        how much decibel to decrease
    Returns
    -------
    applied: numpy.ndarray
        audio with decreased volume
    """
    applied = y * _db2float(-db)
    return applied


def volume_up(y: np.ndarray, db: float):
    """
    Low level API for increasing the volume
    Parameters
    ----------
    y: numpy.ndarray
        stereo / monaural input audio
    db: float
        how much decibel to increase
    Returns
    -------
    applied: numpy.ndarray
        audio with increased volume
    """
    applied = y * _db2float(db)
    return applied


class RandomVolume(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, limit=10):
        super().__init__(always_apply, p)
        self.limit = limit

    def apply(self, y: np.ndarray, **params):
        db = np.random.uniform(-self.limit, self.limit)
        if db >= 0:
            return volume_up(y, db)
        else:
            # db is negative here; volume_down expects a positive decibel amount
            return volume_down(y, -db)


class CosineVolume(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, limit=10):
        super().__init__(always_apply, p)
        self.limit = limit

    def apply(self, y: np.ndarray, **params):
        db = np.random.uniform(-self.limit, self.limit)
        cosine = np.cos(np.arange(len(y)) / len(y) * np.pi * 2)
        dbs = _db2float(cosine * db)
        return y * dbs
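
For intuition about the decibel helpers in this cell: a gain of `db` decibels corresponds to a linear amplitude factor of `10 ** (db / 20)`, or `10 ** (db / 10)` for power. The snippet below is a standalone restatement for illustration only; `db_to_gain` is a hypothetical name that mirrors `_db2float`.

```python
def db_to_gain(db, amplitude=True):
    """Convert a decibel value to a linear gain factor."""
    return 10 ** (db / 20) if amplitude else 10 ** (db / 10)


# Positive values amplify and negative values attenuate.
for db in (-20, -6, 0, 6, 20):
    print(f"{db:+d} dB -> amplitude x{db_to_gain(db):.3f}")
```

With `limit=4`, as used for `RandomVolume` in the dataset below, the amplitude factor stays between roughly 0.63 and 1.58.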

In [13]:
OUTPUT_DIR = 'output'    # was ./ for Kaggle
os.makedirs(OUTPUT_DIR, exist_ok=True)
   
    
def set_seed(seed=42):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    
set_seed(CFG.seed)

In [14]:
def calc_loss(y_true, y_pred):
    return metrics.roc_auc_score(np.array(y_true), np.array(y_pred))


# ====================================================
# Training helper functions
# ====================================================
class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
        

class MetricMeter(object):
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.y_true = []
        self.y_pred = []
    
    def update(self, y_true, y_pred):
        self.y_true.extend(y_true.cpu().detach().numpy().tolist())
        self.y_pred.extend(y_pred["clipwise_output"].cpu().detach().numpy().tolist())

    @property
    def avg(self):
        self.f1_03 = metrics.f1_score(np.array(self.y_true), np.array(self.y_pred) > 0.3, average="micro")
        self.f1_05 = metrics.f1_score(np.array(self.y_true), np.array(self.y_pred) > 0.5, average="micro")
        
        return {
            "f1_at_03" : self.f1_03,
            "f1_at_05" : self.f1_05,
        }
    
    
# https://www.kaggle.com/c/rfcx-species-audio-detection/discussion/213075
class BCEFocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, preds, targets):
        bce_loss = nn.BCEWithLogitsLoss(reduction='none')(preds, targets)
        probas = torch.sigmoid(preds)
        loss = targets * self.alpha * \
            (1. - probas)**self.gamma * bce_loss + \
            (1. - targets) * probas**self.gamma * bce_loss
        loss = loss.mean()
        return loss


class BCEFocal2WayLoss(nn.Module):
    def __init__(self, weights=[1, 1], class_weights=None):
        super().__init__()

        self.focal = BCEFocalLoss()

        self.weights = weights

    def forward(self, input, target):
        input_ = input["logit"]
        target = target.float()

        framewise_output = input["framewise_logit"]
        clipwise_output_with_max, _ = framewise_output.max(dim=1)

        loss = self.focal(input_, target)
        aux_loss = self.focal(clipwise_output_with_max, target)

        return self.weights[0] * loss + self.weights[1] * aux_loss
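
To make the weighting in `BCEFocalLoss` concrete, here is a NumPy restatement of the same per-element formula. It is an illustrative sketch rather than a drop-in replacement: `bce_focal_loss_np` is a hypothetical name, and the binary cross-entropy term is written in the standard numerically stable logits form instead of calling PyTorch.

```python
import numpy as np


def bce_focal_loss_np(logits, targets, alpha=0.25, gamma=2.0):
    """Mean focal-weighted BCE over all elements, following BCEFocalLoss above."""
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    probas = 1.0 / (1.0 + np.exp(-logits))
    # Stable BCE with logits: max(z, 0) - z * t + log(1 + exp(-|z|))
    bce = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    loss = (targets * alpha * (1.0 - probas) ** gamma * bce
            + (1.0 - targets) * probas ** gamma * bce)
    return float(loss.mean())


targets = [[1.0, 0.0]]
print("confident and correct:", bce_focal_loss_np([[10.0, -10.0]], targets))
print("undecided:            ", bce_focal_loss_np([[0.0, 0.0]], targets))
print("confident and wrong:  ", bce_focal_loss_np([[-10.0, 10.0]], targets))
```

Well-classified elements are scaled down by `(1 - p) ** gamma` for positives or `p ** gamma` for negatives, so the loss is dominated by hard examples. In this formulation only the positive term is multiplied by `alpha`.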

In [15]:
def compute_melspec(y, params):
    """
    Computes a mel-spectrogram and puts it at decibel scale
    Arguments:
        y {np array} -- signal
        params {AudioParams} -- Parameters to use for the spectrogram. Expected to have the attributes sr, n_mels, fmin, fmax
    Returns:
        np array -- Mel-spectrogram
    """
    melspec = librosa.feature.melspectrogram(
        y=y, sr=params.sr, n_mels=params.n_mels, fmin=params.fmin, fmax=params.fmax,
    )

    melspec = librosa.power_to_db(melspec).astype(np.float32)
    return melspec


def crop_or_pad(y, length, sr, train=True, probs=None):
    """
    Crops an array to a chosen length
    Arguments:
        y {1D np array} -- Array to crop
        length {int} -- Length of the crop
        sr {int} -- Sampling rate
    Keyword Arguments:
        train {bool} -- Whether we are at train time. If so, crop randomly, else return the beginning of y (default: {True})
        probs {None or numpy array} -- Probabilities to use to chose where to crop (default: {None})
    Returns:
        1D np array -- Cropped array
    """
    if len(y) <= length:
        y = np.concatenate([y, np.zeros(length - len(y))])
    else:
        if not train:
            start = 0
        elif probs is None:
            start = np.random.randint(len(y) - length)
        else:
            start = (
                    np.random.choice(np.arange(len(probs)), p=probs) + np.random.random()
            )
            start = int(sr * (start))

        y = y[start: start + length]

    return y.astype(np.float32)


def mono_to_color(X, eps=1e-6, mean=None, std=None):
    """
    Converts a one channel array to a 3 channel one in [0, 255]
    Arguments:
        X {numpy array [H x W]} -- 2D array to convert
    Keyword Arguments:
        eps {float} -- To avoid dividing by 0 (default: {1e-6})
        mean {None or np array} -- Mean for normalization (default: {None})
        std {None or np array} -- Std for normalization (default: {None})
    Returns:
        numpy array [3 x H x W] -- RGB numpy array
    """
    X = np.stack([X, X, X], axis=-1)

    # Standardize
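    # Compare against None explicitly so that 0 or array arguments are handled correctly
    mean = X.mean() if mean is None else mean
    std = X.std() if std is None else std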
    X = (X - mean) / (std + eps)

    # Normalize to [0, 255]
    _min, _max = X.min(), X.max()

    if (_max - _min) > eps:
        V = np.clip(X, _min, _max)
        V = 255 * (V - _min) / (_max - _min)
        V = V.astype(np.uint8)
    else:
        V = np.zeros_like(X, dtype=np.uint8)

    return V


mean = (0.485, 0.456, 0.406) # RGB
std = (0.229, 0.224, 0.225) # RGB

albu_transforms = {
    'train' : A.Compose([
            A.HorizontalFlip(p=0.5),
            A.OneOf([
                A.Cutout(max_h_size=5, max_w_size=16),
                A.CoarseDropout(max_holes=4),
            ], p=0.5),
            A.Normalize(mean, std),
    ]),
    'valid' : A.Compose([
            A.Normalize(mean, std),
    ]),
}


class WaveformDataset(torch.utils.data.Dataset):
    def __init__(self,
                 df: pd.DataFrame,
                 mode='train'):
        self.df = df
        self.mode = mode

        if mode == 'train':
            self.wave_transforms = Compose(
                [
                    OneOf(
                        [
                            NoiseInjection(p=1, max_noise_level=0.04),
                            GaussianNoise(p=1, min_snr=5, max_snr=20),
                            PinkNoise(p=1, min_snr=5, max_snr=20),
                        ],
                        p=0.2,
                    ),
                    RandomVolume(p=0.2, limit=4),
                    Normalize(p=1),
                ]
            )
        else:
            self.wave_transforms = Compose(
                [
                    Normalize(p=1),
                ]
            )

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx: int):
        SR = 32000
        sample = self.df.loc[idx, :]
        
        wav_path = sample["file_path"]
        labels = sample["new_target"]

        y = np.load(wav_path)

        # SEC = int(len(y)/2/SR)
        # if SEC > 0:
        #     start = np.random.randint(SEC)
        #     end = start+AudioParams.duration
        if len(y) > 0:
            y = y[:AudioParams.duration*SR]

            if self.wave_transforms:
                y = self.wave_transforms(y, sr=SR)

        y = np.concatenate([y, y, y])[:AudioParams.duration * AudioParams.sr] 
        y = crop_or_pad(y, AudioParams.duration * AudioParams.sr, sr=AudioParams.sr, train=True, probs=None)
        image = compute_melspec(y, AudioParams)
        image = mono_to_color(image)
        image = image.astype(np.uint8)
        
        # image = np.load(wav_path) # (224, 313, 3)
        image = albu_transforms[self.mode](image=image)['image']
        image = image.T
        
        targets = np.zeros(len(CFG.target_columns), dtype=float)
        for ebird_code in labels.split():
            targets[CFG.target_columns.index(ebird_code)] = 1.0

        return {
            "image": image,
            "targets": targets,
        }



In [16]:
def init_layer(layer):
    nn.init.xavier_uniform_(layer.weight)

    if hasattr(layer, "bias"):
        if layer.bias is not None:
            layer.bias.data.fill_(0.)


def init_bn(bn):
    bn.bias.data.fill_(0.)
    bn.weight.data.fill_(1.0)


def init_weights(model):
    classname = model.__class__.__name__
    if classname.find("Conv2d") != -1:
        nn.init.xavier_uniform_(model.weight, gain=np.sqrt(2))
        model.bias.data.fill_(0)
    elif classname.find("BatchNorm") != -1:
        model.weight.data.normal_(1.0, 0.02)
        model.bias.data.fill_(0)
    elif classname.find("GRU") != -1:
        for weight in model.parameters():
            if len(weight.size()) > 1:
                nn.init.orthogonal_(weight.data)
    elif classname.find("Linear") != -1:
        model.weight.data.normal_(0, 0.01)
        model.bias.data.zero_()


def interpolate(x: torch.Tensor, ratio: int):
    """Interpolate data in time domain. This is used to compensate the
    resolution reduction in downsampling of a CNN.
    Args:
      x: (batch_size, time_steps, classes_num)
      ratio: int, ratio to interpolate
    Returns:
      upsampled: (batch_size, time_steps * ratio, classes_num)
    """
    (batch_size, time_steps, classes_num) = x.shape
    upsampled = x[:, :, None, :].repeat(1, 1, ratio, 1)
    upsampled = upsampled.reshape(batch_size, time_steps * ratio, classes_num)
    return upsampled


def pad_framewise_output(framewise_output: torch.Tensor, frames_num: int):
    """Pad framewise_output to the same length as input frames. The pad value
    is the same as the value of the last frame.
    Args:
      framewise_output: (batch_size, frames_num, classes_num)
      frames_num: int, number of frames to pad
    Outputs:
      output: (batch_size, frames_num, classes_num)
    """
    output = F.interpolate(
        framewise_output.unsqueeze(1),
        size=(frames_num, framewise_output.size(2)),
        align_corners=True,
        mode="bilinear").squeeze(1)

    return output


class AttBlockV2(nn.Module):
    def __init__(self,
                 in_features: int,
                 out_features: int,
                 activation="linear"):
        super().__init__()

        self.activation = activation
        self.att = nn.Conv1d(
            in_channels=in_features,
            out_channels=out_features,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=True)
        self.cla = nn.Conv1d(
            in_channels=in_features,
            out_channels=out_features,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=True)

        self.init_weights()

    def init_weights(self):
        init_layer(self.att)
        init_layer(self.cla)

    def forward(self, x):
        # x: (n_samples, n_in, n_time)
        norm_att = torch.softmax(torch.tanh(self.att(x)), dim=-1)
        cla = self.nonlinear_transform(self.cla(x))
        x = torch.sum(norm_att * cla, dim=2)
        return x, norm_att, cla

    def nonlinear_transform(self, x):
        if self.activation == 'linear':
            return x
        elif self.activation == 'sigmoid':
            return torch.sigmoid(x)


class TimmSED(nn.Module):
    def __init__(self, base_model_name: str, pretrained=False, num_classes=24, in_channels=1):
        super().__init__()

        self.spec_augmenter = SpecAugmentation(time_drop_width=64//2, time_stripes_num=2,
                                               freq_drop_width=8//2, freq_stripes_num=2)

        self.bn0 = nn.BatchNorm2d(CFG.n_mels)

        base_model = timm.create_model(
            base_model_name, pretrained=pretrained, in_chans=in_channels)
        layers = list(base_model.children())[:-2]
        self.encoder = nn.Sequential(*layers)

        if hasattr(base_model, "fc"):
            in_features = base_model.fc.in_features
        else:
            in_features = base_model.classifier.in_features

        self.fc1 = nn.Linear(in_features, in_features, bias=True)
        self.att_block = AttBlockV2(
            in_features, num_classes, activation="sigmoid")

        self.init_weight()

    def init_weight(self):
        init_bn(self.bn0)
        init_layer(self.fc1)
        

    def forward(self, input_data):
        x = input_data # (batch_size, 3, time_steps, mel_bins)

        frames_num = x.shape[2]

        x = x.transpose(1, 3)
        x = self.bn0(x)
        x = x.transpose(1, 3)

        if self.training:
            if random.random() < 0.25:
                x = self.spec_augmenter(x)

        x = x.transpose(2, 3)

        x = self.encoder(x)
        
        # Aggregate in frequency axis
        x = torch.mean(x, dim=3)

        x1 = F.max_pool1d(x, kernel_size=3, stride=1, padding=1)
        x2 = F.avg_pool1d(x, kernel_size=3, stride=1, padding=1)
        x = x1 + x2

        x = F.dropout(x, p=0.5, training=self.training)
        x = x.transpose(1, 2)
        x = F.relu_(self.fc1(x))
        x = x.transpose(1, 2)
        x = F.dropout(x, p=0.5, training=self.training)

        (clipwise_output, norm_att, segmentwise_output) = self.att_block(x)
        logit = torch.sum(norm_att * self.att_block.cla(x), dim=2)
        segmentwise_logit = self.att_block.cla(x).transpose(1, 2)
        segmentwise_output = segmentwise_output.transpose(1, 2)

        interpolate_ratio = frames_num // segmentwise_output.size(1)

        # Get framewise output
        framewise_output = interpolate(segmentwise_output,
                                       interpolate_ratio)
        framewise_output = pad_framewise_output(framewise_output, frames_num)

        framewise_logit = interpolate(segmentwise_logit, interpolate_ratio)
        framewise_logit = pad_framewise_output(framewise_logit, frames_num)

        output_dict = {
            'framewise_output': framewise_output,
            'clipwise_output': clipwise_output,
            'logit': logit,
            'framewise_logit': framewise_logit,
        }

        return output_dict

In [17]:
def rand_bbox(size, lam):
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2


def cutmix(data, targets, alpha):
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    lam = np.random.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(data.size(), lam)
    data[:, :, bbx1:bbx2, bby1:bby2] = data[indices, :, bbx1:bbx2, bby1:bby2]
    # adjust lambda to exactly match pixel ratio
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (data.size()[-1] * data.size()[-2]))

    new_targets = [targets, shuffled_targets, lam]
    return data, new_targets

def mixup(data, targets, alpha):
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    lam = np.random.beta(alpha, alpha)
    new_data = data * lam + shuffled_data * (1 - lam)
    new_targets = [targets, shuffled_targets, lam]
    return new_data, new_targets


def cutmix_criterion(preds, new_targets):
    targets1, targets2, lam = new_targets[0], new_targets[1], new_targets[2]
    criterion = BCEFocal2WayLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)

def mixup_criterion(preds, new_targets):
    targets1, targets2, lam = new_targets[0], new_targets[1], new_targets[2]
    criterion = BCEFocal2WayLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)


def loss_fn(logits, targets):
    loss_fct = BCEFocal2WayLoss()
    loss = loss_fct(logits, targets)
    return loss

In [18]:
def train_fn(model, data_loader, device, optimizer, scheduler):
    model.train()
    scaler = GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))
    
    for data in tk0:
        optimizer.zero_grad()
        inputs = data['image'].to(device)
        targets = data['targets'].to(device)
        with autocast(enabled=CFG.apex):
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
        
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        scheduler.step()
        losses.update(loss.item(), inputs.size(0))
        scores.update(targets, outputs)
        tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg


def train_mixup_cutmix_fn(model, data_loader, device, optimizer, scheduler):
    model.train()
    scaler = GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))

    for data in tk0:
        optimizer.zero_grad()
        inputs = data['image'].to(device)
        targets = data['targets'].to(device)

        if np.random.rand()<0.5:
            inputs, new_targets = mixup(inputs, targets, 0.4)
            with autocast(enabled=CFG.apex):
                outputs = model(inputs)
                loss = mixup_criterion(outputs, new_targets) 
        else:
            inputs, new_targets = cutmix(inputs, targets, 0.4)
            with autocast(enabled=CFG.apex):
                outputs = model(inputs)
                loss = cutmix_criterion(outputs, new_targets)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        scheduler.step()
        losses.update(loss.item(), inputs.size(0))
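        # Train F1 is scored against the unmixed targets only, so it tends to
        # understate performance while mixup/cutmix is active.
        scores.update(new_targets[0], outputs)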
        tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg


def valid_fn(model, data_loader, device):
    model.eval()
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))
    with torch.no_grad():
        for data in tk0:
            inputs = data['image'].to(device)
            targets = data['targets'].to(device)
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            losses.update(loss.item(), inputs.size(0))
            scores.update(targets, outputs)
            tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg
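
The loops above pass the batch size as the second argument to `losses.update(...)`. `AverageMeter` is defined elsewhere in the notebook, so the sketch below only illustrates the behaviour it is assumed to have: a sample-weighted running mean, so that a smaller final batch does not skew the epoch average. `RunningAverage` is a hypothetical stand-in, not the notebook's class.

```python
class RunningAverage:
    """Sample-weighted running mean (assumed behaviour of the notebook's AverageMeter)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        # `value` is the mean loss of a batch, `n` the number of samples in that batch.
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


meter = RunningAverage()
meter.update(0.5, n=64)  # full batch
meter.update(1.0, n=32)  # smaller final batch
print(meter.avg)  # weighted by sample count, not a plain mean of 0.5 and 1.0
```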

In [19]:
def inference_fn(model, data_loader, device):
    model.eval()
    tk0 = tqdm(data_loader, total=len(data_loader))
    final_output = []
    final_target = []
    with torch.no_grad():
        for b_idx, data in enumerate(tk0):
            inputs = data['image'].to(device)
            # Targets are only collected for scoring, so they can stay on the CPU.
            targets = data['targets'].detach().cpu().numpy().tolist()
            output = model(inputs)
            output = output["clipwise_output"].detach().cpu().numpy().tolist()
            final_output.extend(output)
            final_target.extend(targets)
    return final_output, final_target


def calc_cv(model_paths):
    df = pd.read_csv('train_folds.csv')
    y_true = []
    y_pred = []
    for fold, model_path in enumerate(model_paths):
        model = TimmSED(
            base_model_name=CFG.base_model_name,
            pretrained=CFG.pretrained,
            num_classes=CFG.num_classes,
            in_channels=CFG.in_channels)

        model.to(device)
        model.load_state_dict(torch.load(model_path, map_location=device))
        model.eval()

        val_df = df[df.kfold == fold].reset_index(drop=True)
        dataset = WaveformDataset(df=val_df, mode='valid')
        dataloader = torch.utils.data.DataLoader(
            dataset, batch_size=CFG.valid_bs, num_workers=0, pin_memory=True, shuffle=False
        )

        final_output, final_target = inference_fn(model, dataloader, device)
        y_pred.extend(final_output)
        y_true.extend(final_target)
        torch.cuda.empty_cache()

        # Running score over all folds processed so far (not a per-fold score).
        f1_03 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.3, average="micro")
        print(f'micro f1_0.3 {f1_03}')

    f1_03 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.3, average="micro")
    f1_05 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.5, average="micro")

    print(f'overall micro f1_0.3 {f1_03}')
    print(f'overall micro f1_0.5 {f1_05}')
    return
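
`calc_cv` reports micro F1 at two fixed thresholds, while the strategy notes at the top plan to tune the inference threshold. A threshold sweep could look like the sketch below. It is a minimal NumPy illustration on made-up labels and probabilities, `micro_f1` is a hypothetical helper rather than the scikit-learn call used above, and the candidate thresholds are arbitrary.

```python
import numpy as np


def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1: pool true/false positives and false negatives over all classes."""
    y_pred = y_prob > threshold
    y_true = y_true.astype(bool)
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Toy multi-label data: 4 clips, 3 classes (made up for illustration).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
y_prob = np.array([
    [0.60, 0.10, 0.05],
    [0.20, 0.35, 0.10],
    [0.10, 0.05, 0.45],
    [0.70, 0.32, 0.20],
])

scores = {t: micro_f1(y_true, y_prob, t) for t in (0.1, 0.3, 0.5)}
best_threshold = max(scores, key=scores.get)
for t, s in scores.items():
    print(f"threshold {t}: micro F1 {s:.4f}")
print(f"best threshold: {best_threshold}")
```

On real out-of-fold predictions the same loop would run over `y_true` and `y_pred` as collected in `calc_cv`, with a finer grid of thresholds.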

In [20]:
# main loop
for fold in range(5):
    if fold not in CFG.folds:
        continue
    print("=" * 100)
    print(f"Fold {fold} Training")
    print("=" * 100)

    trn_df = train[train.kfold != fold].reset_index(drop=True)
    val_df = train[train.kfold == fold].reset_index(drop=True)

    train_dataset = WaveformDataset(df=trn_df, mode='train')
    train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=CFG.train_bs, num_workers=0, pin_memory=True, shuffle=True
    )
    
    valid_dataset = WaveformDataset(df=val_df, mode='valid')
    valid_dataloader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=CFG.valid_bs, num_workers=0, pin_memory=True, shuffle=False
    )

    model = TimmSED(
        base_model_name=CFG.base_model_name,
        pretrained=CFG.pretrained,
        num_classes=CFG.num_classes,
        in_channels=CFG.in_channels)

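    # Note: transformers.AdamW is deprecated in recent transformers releases;
    # torch.optim.AdamW is the suggested replacement.
    optimizer = transformers.AdamW(model.parameters(), lr=CFG.LR, weight_decay=CFG.WEIGHT_DECAY)
    # scheduler.step() is called once per batch in the training functions,
    # so T_max=500 is measured in batches, not epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, eta_min=CFG.ETA_MIN, T_max=500)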

    model = model.to(device)

    best_score = -np.inf

    for epoch in range(CFG.epochs):
        print("Starting {} epoch...".format(epoch+1))

        start_time = time.time()

        if epoch < CFG.cutmix_and_mixup_epochs:
            train_avg, train_loss = train_mixup_cutmix_fn(model, train_dataloader, device, optimizer, scheduler)
        else: 
            train_avg, train_loss = train_fn(model, train_dataloader, device, optimizer, scheduler)

        valid_avg, valid_loss = valid_fn(model, valid_dataloader, device)

        elapsed = time.time() - start_time

        print(f'Epoch {epoch+1} - avg_train_loss: {train_loss:.5f}  avg_val_loss: {valid_loss:.5f}  time: {elapsed:.0f}s')
        print(f"Epoch {epoch+1} - train_f1_at_03:{train_avg['f1_at_03']:0.5f}  valid_f1_at_03:{valid_avg['f1_at_03']:0.5f}")
        print(f"Epoch {epoch+1} - train_f1_at_05:{train_avg['f1_at_05']:0.5f}  valid_f1_at_05:{valid_avg['f1_at_05']:0.5f}")

        if valid_avg['f1_at_03'] > best_score:
            print(f">>>>>>>> Model Improved From {best_score} ----> {valid_avg['f1_at_03']}")
            print(f"other scores here... {valid_avg['f1_at_03']}, {valid_avg['f1_at_05']}")
            torch.save(model.state_dict(), f'fold-{fold}.bin')
            best_score = valid_avg['f1_at_03']

Fold 0 Training




Starting 1 epoch...


100%|██████████| 743/743 [17:28<00:00,  1.41s/it, loss=0.0144]
100%|██████████| 93/93 [01:14<00:00,  1.25it/s, loss=0.00801]


Epoch 1 - avg_train_loss: 0.01438  avg_val_loss: 0.00801  time: 1123s
Epoch 1 - train_f1_at_03:0.00424  valid_f1_at_03:0.00060
Epoch 1 - train_f1_at_05:0.00148  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0006040471156750226
other scores here... 0.0006040471156750226, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [15:55<00:00,  1.29s/it, loss=0.00845]
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00672]


Epoch 2 - avg_train_loss: 0.00845  avg_val_loss: 0.00672  time: 1021s
Epoch 2 - train_f1_at_03:0.01847  valid_f1_at_03:0.19333
Epoch 2 - train_f1_at_05:0.00045  valid_f1_at_05:0.04082
>>>>>>>> Model Improved From 0.0006040471156750226 ----> 0.19333333333333336
other scores here... 0.19333333333333336, 0.04081632653061225
Starting 3 epoch...


100%|██████████| 743/743 [15:13<00:00,  1.23s/it, loss=0.0077] 
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.0061] 


Epoch 3 - avg_train_loss: 0.00770  avg_val_loss: 0.00610  time: 979s
Epoch 3 - train_f1_at_03:0.05794  valid_f1_at_03:0.31030
Epoch 3 - train_f1_at_05:0.00478  valid_f1_at_05:0.10297
>>>>>>>> Model Improved From 0.19333333333333336 ----> 0.3102968460111318
other scores here... 0.3102968460111318, 0.10297482837528606
Starting 4 epoch...


100%|██████████| 743/743 [15:58<00:00,  1.29s/it, loss=0.00709]
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00597]


Epoch 4 - avg_train_loss: 0.00709  avg_val_loss: 0.00597  time: 1024s
Epoch 4 - train_f1_at_03:0.10816  valid_f1_at_03:0.34799
Epoch 4 - train_f1_at_05:0.01013  valid_f1_at_05:0.14198
>>>>>>>> Model Improved From 0.3102968460111318 ----> 0.3479948253557568
other scores here... 0.3479948253557568, 0.14198218262806236
Starting 5 epoch...


100%|██████████| 743/743 [15:13<00:00,  1.23s/it, loss=0.00678]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00494]


Epoch 5 - avg_train_loss: 0.00678  avg_val_loss: 0.00494  time: 979s
Epoch 5 - train_f1_at_03:0.14913  valid_f1_at_03:0.47040
Epoch 5 - train_f1_at_05:0.02168  valid_f1_at_05:0.25948
>>>>>>>> Model Improved From 0.3479948253557568 ----> 0.470397404703974
other scores here... 0.470397404703974, 0.2594820821344494
Starting 6 epoch...


100%|██████████| 743/743 [15:51<00:00,  1.28s/it, loss=0.00663]
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00456]


Epoch 6 - avg_train_loss: 0.00663  avg_val_loss: 0.00456  time: 1017s
Epoch 6 - train_f1_at_03:0.18572  valid_f1_at_03:0.53185
Epoch 6 - train_f1_at_05:0.02311  valid_f1_at_05:0.33200
>>>>>>>> Model Improved From 0.470397404703974 ----> 0.5318495778971604
other scores here... 0.5318495778971604, 0.33199999999999996
Starting 7 epoch...


100%|██████████| 743/743 [15:15<00:00,  1.23s/it, loss=0.00638]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00479]


Epoch 7 - avg_train_loss: 0.00638  avg_val_loss: 0.00479  time: 980s
Epoch 7 - train_f1_at_03:0.21600  valid_f1_at_03:0.49028
Epoch 7 - train_f1_at_05:0.03251  valid_f1_at_05:0.23151
Starting 8 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.00605]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00477]


Epoch 8 - avg_train_loss: 0.00605  avg_val_loss: 0.00477  time: 978s
Epoch 8 - train_f1_at_03:0.25370  valid_f1_at_03:0.50818
Epoch 8 - train_f1_at_05:0.04438  valid_f1_at_05:0.26587
Starting 9 epoch...


100%|██████████| 743/743 [15:16<00:00,  1.23s/it, loss=0.006]  
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00407]


Epoch 9 - avg_train_loss: 0.00600  avg_val_loss: 0.00407  time: 982s
Epoch 9 - train_f1_at_03:0.26109  valid_f1_at_03:0.58996
Epoch 9 - train_f1_at_05:0.04626  valid_f1_at_05:0.39315
>>>>>>>> Model Improved From 0.5318495778971604 ----> 0.5899573837317028
other scores here... 0.5899573837317028, 0.39315002411963335
Starting 10 epoch...


100%|██████████| 743/743 [15:17<00:00,  1.23s/it, loss=0.00593]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00392]


Epoch 10 - avg_train_loss: 0.00593  avg_val_loss: 0.00392  time: 983s
Epoch 10 - train_f1_at_03:0.27158  valid_f1_at_03:0.61059
Epoch 10 - train_f1_at_05:0.06074  valid_f1_at_05:0.39362
>>>>>>>> Model Improved From 0.5899573837317028 ----> 0.6105895606458881
other scores here... 0.6105895606458881, 0.3936247283264912
Starting 11 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.00568]
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00432]


Epoch 11 - avg_train_loss: 0.00568  avg_val_loss: 0.00432  time: 980s
Epoch 11 - train_f1_at_03:0.30270  valid_f1_at_03:0.56782
Epoch 11 - train_f1_at_05:0.07009  valid_f1_at_05:0.32825
Starting 12 epoch...


100%|██████████| 743/743 [15:16<00:00,  1.23s/it, loss=0.00541]
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00414]


Epoch 12 - avg_train_loss: 0.00541  avg_val_loss: 0.00414  time: 982s
Epoch 12 - train_f1_at_03:0.33108  valid_f1_at_03:0.58135
Epoch 12 - train_f1_at_05:0.07928  valid_f1_at_05:0.33183
Starting 13 epoch...


100%|██████████| 743/743 [15:16<00:00,  1.23s/it, loss=0.00553]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00367]


Epoch 13 - avg_train_loss: 0.00553  avg_val_loss: 0.00367  time: 982s
Epoch 13 - train_f1_at_03:0.32854  valid_f1_at_03:0.64478
Epoch 13 - train_f1_at_05:0.08775  valid_f1_at_05:0.39894
>>>>>>>> Model Improved From 0.6105895606458881 ----> 0.644782688428388
other scores here... 0.644782688428388, 0.39894001445434835
Starting 14 epoch...


100%|██████████| 743/743 [16:03<00:00,  1.30s/it, loss=0.00548]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00357]


Epoch 14 - avg_train_loss: 0.00548  avg_val_loss: 0.00357  time: 1029s
Epoch 14 - train_f1_at_03:0.35897  valid_f1_at_03:0.65995
Epoch 14 - train_f1_at_05:0.09364  valid_f1_at_05:0.41833
>>>>>>>> Model Improved From 0.644782688428388 ----> 0.6599532458190973
other scores here... 0.6599532458190973, 0.41832858499525166
Starting 15 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.00527]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00389]


Epoch 15 - avg_train_loss: 0.00527  avg_val_loss: 0.00389  time: 980s
Epoch 15 - train_f1_at_03:0.34881  valid_f1_at_03:0.62082
Epoch 15 - train_f1_at_05:0.09691  valid_f1_at_05:0.40544
Starting 16 epoch...


100%|██████████| 743/743 [15:13<00:00,  1.23s/it, loss=0.00501]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00391]


Epoch 16 - avg_train_loss: 0.00501  avg_val_loss: 0.00391  time: 979s
Epoch 16 - train_f1_at_03:0.38772  valid_f1_at_03:0.62046
Epoch 16 - train_f1_at_05:0.12623  valid_f1_at_05:0.35000
Starting 17 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.0052] 
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00343]


Epoch 17 - avg_train_loss: 0.00520  avg_val_loss: 0.00343  time: 980s
Epoch 17 - train_f1_at_03:0.37282  valid_f1_at_03:0.67617
Epoch 17 - train_f1_at_05:0.11304  valid_f1_at_05:0.44278
>>>>>>>> Model Improved From 0.6599532458190973 ----> 0.676165347405453
other scores here... 0.676165347405453, 0.4427802480692722
Starting 18 epoch...


100%|██████████| 743/743 [15:15<00:00,  1.23s/it, loss=0.00511]
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00331]


Epoch 18 - avg_train_loss: 0.00511  avg_val_loss: 0.00331  time: 982s
Epoch 18 - train_f1_at_03:0.37638  valid_f1_at_03:0.68847
Epoch 18 - train_f1_at_05:0.11563  valid_f1_at_05:0.49173
>>>>>>>> Model Improved From 0.676165347405453 ----> 0.6884669479606188
other scores here... 0.6884669479606188, 0.4917289825515523
Starting 19 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.00334]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00409]


Epoch 19 - avg_train_loss: 0.00334  avg_val_loss: 0.00409  time: 978s
Epoch 19 - train_f1_at_03:0.68919  valid_f1_at_03:0.62052
Epoch 19 - train_f1_at_05:0.49512  valid_f1_at_05:0.53835
Starting 20 epoch...


100%|██████████| 743/743 [15:09<00:00,  1.22s/it, loss=0.00316]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00401]


Epoch 20 - avg_train_loss: 0.00316  avg_val_loss: 0.00401  time: 975s
Epoch 20 - train_f1_at_03:0.70982  valid_f1_at_03:0.63599
Epoch 20 - train_f1_at_05:0.55678  valid_f1_at_05:0.56616
Starting 21 epoch...


100%|██████████| 743/743 [15:11<00:00,  1.23s/it, loss=0.00331]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00334]


Epoch 21 - avg_train_loss: 0.00331  avg_val_loss: 0.00334  time: 976s
Epoch 21 - train_f1_at_03:0.69375  valid_f1_at_03:0.69921
Epoch 21 - train_f1_at_05:0.53712  valid_f1_at_05:0.65669
>>>>>>>> Model Improved From 0.6884669479606188 ----> 0.699205823957644
other scores here... 0.699205823957644, 0.6566929133858267
Starting 22 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00317]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00345]


Epoch 22 - avg_train_loss: 0.00317  avg_val_loss: 0.00345  time: 973s
Epoch 22 - train_f1_at_03:0.70550  valid_f1_at_03:0.69506
Epoch 22 - train_f1_at_05:0.56390  valid_f1_at_05:0.65069
Starting 23 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00278]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00431]


Epoch 23 - avg_train_loss: 0.00278  avg_val_loss: 0.00431  time: 974s
Epoch 23 - train_f1_at_03:0.74859  valid_f1_at_03:0.63259
Epoch 23 - train_f1_at_05:0.61969  valid_f1_at_05:0.56528
Fold 1 Training
Starting 1 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.0145]
100%|██████████| 93/93 [01:02<00:00,  1.48it/s, loss=0.00781]


Epoch 1 - avg_train_loss: 0.01451  avg_val_loss: 0.00781  time: 970s
Epoch 1 - train_f1_at_03:0.00488  valid_f1_at_03:0.00000
Epoch 1 - train_f1_at_05:0.00215  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0
other scores here... 0.0, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00838]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00664]


Epoch 2 - avg_train_loss: 0.00838  avg_val_loss: 0.00664  time: 972s
Epoch 2 - train_f1_at_03:0.01406  valid_f1_at_03:0.19196
Epoch 2 - train_f1_at_05:0.00000  valid_f1_at_05:0.02554
>>>>>>>> Model Improved From 0.0 ----> 0.19196314307652929
other scores here... 0.19196314307652929, 0.02554202554202554
Starting 3 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00769]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00615]


Epoch 3 - avg_train_loss: 0.00769  avg_val_loss: 0.00615  time: 972s
Epoch 3 - train_f1_at_03:0.04836  valid_f1_at_03:0.32168
Epoch 3 - train_f1_at_05:0.00419  valid_f1_at_05:0.10777
>>>>>>>> Model Improved From 0.19196314307652929 ----> 0.3216783216783217
other scores here... 0.3216783216783217, 0.10777084515031196
Starting 4 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00708]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00569]


Epoch 4 - avg_train_loss: 0.00708  avg_val_loss: 0.00569  time: 971s
Epoch 4 - train_f1_at_03:0.11411  valid_f1_at_03:0.38282
Epoch 4 - train_f1_at_05:0.01190  valid_f1_at_05:0.16474
>>>>>>>> Model Improved From 0.3216783216783217 ----> 0.38281582305401957
other scores here... 0.38281582305401957, 0.16474464579901155
Starting 5 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00674]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.005]  


Epoch 5 - avg_train_loss: 0.00674  avg_val_loss: 0.00500  time: 973s
Epoch 5 - train_f1_at_03:0.15713  valid_f1_at_03:0.47708
Epoch 5 - train_f1_at_05:0.01880  valid_f1_at_05:0.21199
>>>>>>>> Model Improved From 0.38281582305401957 ----> 0.47707533831549176
other scores here... 0.47707533831549176, 0.2119914346895075
Starting 6 epoch...


100%|██████████| 743/743 [15:04<00:00,  1.22s/it, loss=0.00661]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00457]


Epoch 6 - avg_train_loss: 0.00661  avg_val_loss: 0.00457  time: 969s
Epoch 6 - train_f1_at_03:0.18411  valid_f1_at_03:0.52750
Epoch 6 - train_f1_at_05:0.02417  valid_f1_at_05:0.27552
>>>>>>>> Model Improved From 0.47707533831549176 ----> 0.5275024295432459
other scores here... 0.5275024295432459, 0.2755233910571207
Starting 7 epoch...


100%|██████████| 743/743 [15:03<00:00,  1.22s/it, loss=0.00632]
100%|██████████| 93/93 [01:02<00:00,  1.49it/s, loss=0.00484]


Epoch 7 - avg_train_loss: 0.00632  avg_val_loss: 0.00484  time: 967s
Epoch 7 - train_f1_at_03:0.21234  valid_f1_at_03:0.49383
Epoch 7 - train_f1_at_05:0.03448  valid_f1_at_05:0.20920
Starting 8 epoch...


100%|██████████| 743/743 [15:05<00:00,  1.22s/it, loss=0.00601]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00471]


Epoch 8 - avg_train_loss: 0.00601  avg_val_loss: 0.00471  time: 969s
Epoch 8 - train_f1_at_03:0.26310  valid_f1_at_03:0.52409
Epoch 8 - train_f1_at_05:0.05358  valid_f1_at_05:0.26387
Starting 9 epoch...


100%|██████████| 743/743 [15:04<00:00,  1.22s/it, loss=0.00594]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00411]


Epoch 9 - avg_train_loss: 0.00594  avg_val_loss: 0.00411  time: 969s
Epoch 9 - train_f1_at_03:0.25654  valid_f1_at_03:0.58613
Epoch 9 - train_f1_at_05:0.04910  valid_f1_at_05:0.32111
>>>>>>>> Model Improved From 0.5275024295432459 ----> 0.5861348528015196
other scores here... 0.5861348528015196, 0.32110552763819095
Starting 10 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00601]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00392]


Epoch 10 - avg_train_loss: 0.00601  avg_val_loss: 0.00392  time: 970s
Epoch 10 - train_f1_at_03:0.26021  valid_f1_at_03:0.61701
Epoch 10 - train_f1_at_05:0.04882  valid_f1_at_05:0.37122
>>>>>>>> Model Improved From 0.5861348528015196 ----> 0.6170133729569093
other scores here... 0.6170133729569093, 0.37121951219512195
Starting 11 epoch...


100%|██████████| 743/743 [15:04<00:00,  1.22s/it, loss=0.00568]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00423]


Epoch 11 - avg_train_loss: 0.00568  avg_val_loss: 0.00423  time: 968s
Epoch 11 - train_f1_at_03:0.30218  valid_f1_at_03:0.57313
Epoch 11 - train_f1_at_05:0.06803  valid_f1_at_05:0.28966
Starting 12 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00556]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00431]


Epoch 12 - avg_train_loss: 0.00556  avg_val_loss: 0.00431  time: 972s
Epoch 12 - train_f1_at_03:0.31232  valid_f1_at_03:0.59329
Epoch 12 - train_f1_at_05:0.06171  valid_f1_at_05:0.29460
Starting 13 epoch...


100%|██████████| 743/743 [15:05<00:00,  1.22s/it, loss=0.00553]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00366]


Epoch 13 - avg_train_loss: 0.00553  avg_val_loss: 0.00366  time: 969s
Epoch 13 - train_f1_at_03:0.32364  valid_f1_at_03:0.65776
Epoch 13 - train_f1_at_05:0.07653  valid_f1_at_05:0.40933
>>>>>>>> Model Improved From 0.6170133729569093 ----> 0.6577569080729637
other scores here... 0.6577569080729637, 0.40932889100428366
Starting 14 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00551]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00356]


Epoch 14 - avg_train_loss: 0.00551  avg_val_loss: 0.00356  time: 972s
Epoch 14 - train_f1_at_03:0.32382  valid_f1_at_03:0.66248
Epoch 14 - train_f1_at_05:0.08606  valid_f1_at_05:0.43690
>>>>>>>> Model Improved From 0.6577569080729637 ----> 0.6624843161856964
other scores here... 0.6624843161856964, 0.4369000234137204
Starting 15 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.0053] 
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00393]


Epoch 15 - avg_train_loss: 0.00530  avg_val_loss: 0.00393  time: 972s
Epoch 15 - train_f1_at_03:0.34840  valid_f1_at_03:0.61840
Epoch 15 - train_f1_at_05:0.09334  valid_f1_at_05:0.28689
Starting 16 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00503]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00401]


Epoch 16 - avg_train_loss: 0.00503  avg_val_loss: 0.00401  time: 970s
Epoch 16 - train_f1_at_03:0.38367  valid_f1_at_03:0.60509
Epoch 16 - train_f1_at_05:0.12550  valid_f1_at_05:0.30371
Starting 17 epoch...


100%|██████████| 743/743 [15:05<00:00,  1.22s/it, loss=0.00526]
100%|██████████| 93/93 [01:02<00:00,  1.48it/s, loss=0.00337]


Epoch 17 - avg_train_loss: 0.00526  avg_val_loss: 0.00337  time: 969s
Epoch 17 - train_f1_at_03:0.35661  valid_f1_at_03:0.70051
Epoch 17 - train_f1_at_05:0.09773  valid_f1_at_05:0.43962
>>>>>>>> Model Improved From 0.6624843161856964 ----> 0.7005093975057087
other scores here... 0.7005093975057087, 0.43961691193646346
Starting 18 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00514]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00341]


Epoch 18 - avg_train_loss: 0.00514  avg_val_loss: 0.00341  time: 971s
Epoch 18 - train_f1_at_03:0.38113  valid_f1_at_03:0.68991
Epoch 18 - train_f1_at_05:0.11516  valid_f1_at_05:0.43262
Starting 19 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00336]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00409]


Epoch 19 - avg_train_loss: 0.00336  avg_val_loss: 0.00409  time: 971s
Epoch 19 - train_f1_at_03:0.68811  valid_f1_at_03:0.63309
Epoch 19 - train_f1_at_05:0.49070  valid_f1_at_05:0.55178
Starting 20 epoch...


100%|██████████| 743/743 [15:03<00:00,  1.22s/it, loss=0.00314]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.0037] 


Epoch 20 - avg_train_loss: 0.00314  avg_val_loss: 0.00370  time: 969s
Epoch 20 - train_f1_at_03:0.70897  valid_f1_at_03:0.65364
Epoch 20 - train_f1_at_05:0.55244  valid_f1_at_05:0.58430
Starting 21 epoch...


100%|██████████| 743/743 [15:05<00:00,  1.22s/it, loss=0.00329]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00328]


Epoch 21 - avg_train_loss: 0.00329  avg_val_loss: 0.00328  time: 969s
Epoch 21 - train_f1_at_03:0.69238  valid_f1_at_03:0.71036
Epoch 21 - train_f1_at_05:0.54205  valid_f1_at_05:0.65601
>>>>>>>> Model Improved From 0.7005093975057087 ----> 0.7103608076795764
other scores here... 0.7103608076795764, 0.6560094730609829
Starting 22 epoch...


100%|██████████| 743/743 [15:10<00:00,  1.23s/it, loss=0.00319]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.0034] 


Epoch 22 - avg_train_loss: 0.00319  avg_val_loss: 0.00340  time: 974s
Epoch 22 - train_f1_at_03:0.70677  valid_f1_at_03:0.70114
Epoch 22 - train_f1_at_05:0.55507  valid_f1_at_05:0.65365
Starting 23 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00279]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00408]


Epoch 23 - avg_train_loss: 0.00279  avg_val_loss: 0.00408  time: 971s
Epoch 23 - train_f1_at_03:0.74771  valid_f1_at_03:0.63405
Epoch 23 - train_f1_at_05:0.62003  valid_f1_at_05:0.56450
Fold 2 Training
Starting 1 epoch...


100%|██████████| 743/743 [16:48<00:00,  1.36s/it, loss=0.0147]
100%|██████████| 93/93 [01:18<00:00,  1.18it/s, loss=0.00787]


Epoch 1 - avg_train_loss: 0.01471  avg_val_loss: 0.00787  time: 1087s
Epoch 1 - train_f1_at_03:0.00377  valid_f1_at_03:0.00060
Epoch 1 - train_f1_at_05:0.00174  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0005961251862891208
other scores here... 0.0005961251862891208, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [15:02<00:00,  1.22s/it, loss=0.0084] 
100%|██████████| 93/93 [01:02<00:00,  1.50it/s, loss=0.00676]


Epoch 2 - avg_train_loss: 0.00840  avg_val_loss: 0.00676  time: 966s
Epoch 2 - train_f1_at_03:0.01658  valid_f1_at_03:0.19570
Epoch 2 - train_f1_at_05:0.00045  valid_f1_at_05:0.03399
>>>>>>>> Model Improved From 0.0005961251862891208 ----> 0.19570164348925412
other scores here... 0.19570164348925412, 0.033987694110752996
Starting 3 epoch...


100%|██████████| 743/743 [14:52<00:00,  1.20s/it, loss=0.00769]
100%|██████████| 93/93 [01:00<00:00,  1.53it/s, loss=0.00607]


Epoch 3 - avg_train_loss: 0.00769  avg_val_loss: 0.00607  time: 954s
Epoch 3 - train_f1_at_03:0.05483  valid_f1_at_03:0.33489
Epoch 3 - train_f1_at_05:0.00300  valid_f1_at_05:0.11666
>>>>>>>> Model Improved From 0.19570164348925412 ----> 0.33489200623469156
other scores here... 0.33489200623469156, 0.11665731912507012
Starting 4 epoch...


100%|██████████| 743/743 [14:46<00:00,  1.19s/it, loss=0.00707]
100%|██████████| 93/93 [01:01<00:00,  1.51it/s, loss=0.00593]


Epoch 4 - avg_train_loss: 0.00707  avg_val_loss: 0.00593  time: 949s
Epoch 4 - train_f1_at_03:0.10755  valid_f1_at_03:0.35656
Epoch 4 - train_f1_at_05:0.01075  valid_f1_at_05:0.12865
>>>>>>>> Model Improved From 0.33489200623469156 ----> 0.35655828356558283
other scores here... 0.35655828356558283, 0.1286549707602339
Starting 5 epoch...


100%|██████████| 743/743 [14:49<00:00,  1.20s/it, loss=0.00677]
100%|██████████| 93/93 [01:02<00:00,  1.49it/s, loss=0.0052] 


Epoch 5 - avg_train_loss: 0.00677  avg_val_loss: 0.00520  time: 952s
Epoch 5 - train_f1_at_03:0.16245  valid_f1_at_03:0.45856
Epoch 5 - train_f1_at_05:0.01723  valid_f1_at_05:0.19301
>>>>>>>> Model Improved From 0.35655828356558283 ----> 0.4585567010309278
other scores here... 0.4585567010309278, 0.19301075268817203
Starting 6 epoch...


100%|██████████| 743/743 [14:50<00:00,  1.20s/it, loss=0.00661]
100%|██████████| 93/93 [01:01<00:00,  1.51it/s, loss=0.00468]


Epoch 6 - avg_train_loss: 0.00661  avg_val_loss: 0.00468  time: 953s
Epoch 6 - train_f1_at_03:0.18010  valid_f1_at_03:0.53414
Epoch 6 - train_f1_at_05:0.02349  valid_f1_at_05:0.29200
>>>>>>>> Model Improved From 0.4585567010309278 ----> 0.5341373273682163
other scores here... 0.5341373273682163, 0.292004048582996
Starting 7 epoch...


100%|██████████| 743/743 [16:08<00:00,  1.30s/it, loss=0.00632]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00489]


Epoch 7 - avg_train_loss: 0.00632  avg_val_loss: 0.00489  time: 1037s
Epoch 7 - train_f1_at_03:0.21370  valid_f1_at_03:0.50514
Epoch 7 - train_f1_at_05:0.03167  valid_f1_at_05:0.28397
Starting 8 epoch...


100%|██████████| 743/743 [16:22<00:00,  1.32s/it, loss=0.00603]
100%|██████████| 93/93 [01:07<00:00,  1.39it/s, loss=0.00486]


Epoch 8 - avg_train_loss: 0.00603  avg_val_loss: 0.00486  time: 1050s
Epoch 8 - train_f1_at_03:0.25659  valid_f1_at_03:0.51085
Epoch 8 - train_f1_at_05:0.05523  valid_f1_at_05:0.28738
Starting 9 epoch...


100%|██████████| 743/743 [16:18<00:00,  1.32s/it, loss=0.00604]
100%|██████████| 93/93 [01:07<00:00,  1.38it/s, loss=0.00412]


Epoch 9 - avg_train_loss: 0.00604  avg_val_loss: 0.00412  time: 1047s
Epoch 9 - train_f1_at_03:0.24373  valid_f1_at_03:0.60074
Epoch 9 - train_f1_at_05:0.04332  valid_f1_at_05:0.35733
>>>>>>>> Model Improved From 0.5341373273682163 ----> 0.6007448789571695
other scores here... 0.6007448789571695, 0.3573346350988528
Starting 10 epoch...


100%|██████████| 743/743 [16:17<00:00,  1.32s/it, loss=0.00599]
100%|██████████| 93/93 [01:07<00:00,  1.37it/s, loss=0.00397]


Epoch 10 - avg_train_loss: 0.00599  avg_val_loss: 0.00397  time: 1046s
Epoch 10 - train_f1_at_03:0.26731  valid_f1_at_03:0.61780
Epoch 10 - train_f1_at_05:0.05884  valid_f1_at_05:0.40388
>>>>>>>> Model Improved From 0.6007448789571695 ----> 0.6178049231908199
other scores here... 0.6178049231908199, 0.40387798533932373
Starting 11 epoch...


100%|██████████| 743/743 [16:19<00:00,  1.32s/it, loss=0.0057] 
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00428]


Epoch 11 - avg_train_loss: 0.00570  avg_val_loss: 0.00428  time: 1049s
Epoch 11 - train_f1_at_03:0.29903  valid_f1_at_03:0.58784
Epoch 11 - train_f1_at_05:0.07090  valid_f1_at_05:0.31247
Starting 12 epoch...


100%|██████████| 743/743 [15:46<00:00,  1.27s/it, loss=0.00551]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00445]


Epoch 12 - avg_train_loss: 0.00551  avg_val_loss: 0.00445  time: 1016s
Epoch 12 - train_f1_at_03:0.31601  valid_f1_at_03:0.56988
Epoch 12 - train_f1_at_05:0.07527  valid_f1_at_05:0.38249
Starting 13 epoch...


100%|██████████| 743/743 [15:51<00:00,  1.28s/it, loss=0.00557]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00366]


Epoch 13 - avg_train_loss: 0.00557  avg_val_loss: 0.00366  time: 1020s
Epoch 13 - train_f1_at_03:0.31591  valid_f1_at_03:0.65044
Epoch 13 - train_f1_at_05:0.08093  valid_f1_at_05:0.46639
>>>>>>>> Model Improved From 0.6178049231908199 ----> 0.6504440497335702
other scores here... 0.6504440497335702, 0.46639418710263386
Starting 14 epoch...


100%|██████████| 743/743 [15:52<00:00,  1.28s/it, loss=0.00548]
100%|██████████| 93/93 [01:13<00:00,  1.27it/s, loss=0.00365]


Epoch 14 - avg_train_loss: 0.00548  avg_val_loss: 0.00365  time: 1026s
Epoch 14 - train_f1_at_03:0.33403  valid_f1_at_03:0.65490
Epoch 14 - train_f1_at_05:0.09245  valid_f1_at_05:0.42757
>>>>>>>> Model Improved From 0.6504440497335702 ----> 0.6548984995586937
other scores here... 0.6548984995586937, 0.42757335817419656
Starting 15 epoch...


100%|██████████| 743/743 [15:52<00:00,  1.28s/it, loss=0.00522]
100%|██████████| 93/93 [01:15<00:00,  1.24it/s, loss=0.00405]


Epoch 15 - avg_train_loss: 0.00522  avg_val_loss: 0.00405  time: 1028s
Epoch 15 - train_f1_at_03:0.35388  valid_f1_at_03:0.61445
Epoch 15 - train_f1_at_05:0.10110  valid_f1_at_05:0.35139
Starting 16 epoch...


100%|██████████| 743/743 [16:05<00:00,  1.30s/it, loss=0.00508]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00403]


Epoch 16 - avg_train_loss: 0.00508  avg_val_loss: 0.00403  time: 1030s
Epoch 16 - train_f1_at_03:0.36453  valid_f1_at_03:0.61243
Epoch 16 - train_f1_at_05:0.11094  valid_f1_at_05:0.39868
Starting 17 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00522]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00347]


Epoch 17 - avg_train_loss: 0.00522  avg_val_loss: 0.00347  time: 972s
Epoch 17 - train_f1_at_03:0.35688  valid_f1_at_03:0.67865
Epoch 17 - train_f1_at_05:0.10071  valid_f1_at_05:0.47389
>>>>>>>> Model Improved From 0.6548984995586937 ----> 0.6786454733932273
other scores here... 0.6786454733932273, 0.4738865023739543
Starting 18 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00515]
100%|██████████| 93/93 [01:03<00:00,  1.45it/s, loss=0.00348]


Epoch 18 - avg_train_loss: 0.00515  avg_val_loss: 0.00348  time: 973s
Epoch 18 - train_f1_at_03:0.36787  valid_f1_at_03:0.67653
Epoch 18 - train_f1_at_05:0.11350  valid_f1_at_05:0.43843
Starting 19 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00341]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00418]


Epoch 19 - avg_train_loss: 0.00341  avg_val_loss: 0.00418  time: 971s
Epoch 19 - train_f1_at_03:0.68117  valid_f1_at_03:0.60465
Epoch 19 - train_f1_at_05:0.47784  valid_f1_at_05:0.51166
Starting 20 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00318]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00376]


Epoch 20 - avg_train_loss: 0.00318  avg_val_loss: 0.00376  time: 974s
Epoch 20 - train_f1_at_03:0.70237  valid_f1_at_03:0.65757
Epoch 20 - train_f1_at_05:0.54755  valid_f1_at_05:0.56039
Starting 21 epoch...


100%|██████████| 743/743 [17:10<00:00,  1.39s/it, loss=0.00339]
100%|██████████| 93/93 [01:18<00:00,  1.18it/s, loss=0.00341]


Epoch 21 - avg_train_loss: 0.00339  avg_val_loss: 0.00341  time: 1110s
Epoch 21 - train_f1_at_03:0.68392  valid_f1_at_03:0.70714
Epoch 21 - train_f1_at_05:0.51776  valid_f1_at_05:0.65602
>>>>>>>> Model Improved From 0.6786454733932273 ----> 0.7071440356376835
other scores here... 0.7071440356376835, 0.6560249172668873
Starting 22 epoch...


100%|██████████| 743/743 [17:06<00:00,  1.38s/it, loss=0.0032] 
100%|██████████| 93/93 [01:10<00:00,  1.31it/s, loss=0.00354]


Epoch 22 - avg_train_loss: 0.00320  avg_val_loss: 0.00354  time: 1098s
Epoch 22 - train_f1_at_03:0.70435  valid_f1_at_03:0.70435
Epoch 22 - train_f1_at_05:0.55600  valid_f1_at_05:0.66227
Starting 23 epoch...


100%|██████████| 743/743 [15:13<00:00,  1.23s/it, loss=0.00282]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00396]


Epoch 23 - avg_train_loss: 0.00282  avg_val_loss: 0.00396  time: 979s
Epoch 23 - train_f1_at_03:0.74303  valid_f1_at_03:0.66264
Epoch 23 - train_f1_at_05:0.61200  valid_f1_at_05:0.59830
Fold 3 Training
Starting 1 epoch...


100%|██████████| 743/743 [16:04<00:00,  1.30s/it, loss=0.0152]
100%|██████████| 93/93 [01:03<00:00,  1.47it/s, loss=0.00788]


Epoch 1 - avg_train_loss: 0.01515  avg_val_loss: 0.00788  time: 1028s
Epoch 1 - train_f1_at_03:0.00435  valid_f1_at_03:0.00000
Epoch 1 - train_f1_at_05:0.00252  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0
other scores here... 0.0, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [15:11<00:00,  1.23s/it, loss=0.00851]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00688]


Epoch 2 - avg_train_loss: 0.00851  avg_val_loss: 0.00688  time: 977s
Epoch 2 - train_f1_at_03:0.00582  valid_f1_at_03:0.13981
Epoch 2 - train_f1_at_05:0.00000  valid_f1_at_05:0.01374
>>>>>>>> Model Improved From 0.0 ----> 0.1398110661268556
other scores here... 0.1398110661268556, 0.013743651030773825
Starting 3 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.00783]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00629]


Epoch 3 - avg_train_loss: 0.00783  avg_val_loss: 0.00629  time: 977s
Epoch 3 - train_f1_at_03:0.03531  valid_f1_at_03:0.29521
Epoch 3 - train_f1_at_05:0.00045  valid_f1_at_05:0.09226
>>>>>>>> Model Improved From 0.1398110661268556 ----> 0.29520795660036164
other scores here... 0.29520795660036164, 0.09226361031518623
Starting 4 epoch...


100%|██████████| 743/743 [15:09<00:00,  1.22s/it, loss=0.00716]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00591]


Epoch 4 - avg_train_loss: 0.00716  avg_val_loss: 0.00591  time: 975s
Epoch 4 - train_f1_at_03:0.09733  valid_f1_at_03:0.31712
Epoch 4 - train_f1_at_05:0.00731  valid_f1_at_05:0.10427
>>>>>>>> Model Improved From 0.29520795660036164 ----> 0.31711794401101423
other scores here... 0.31711794401101423, 0.10427350427350429
Starting 5 epoch...


100%|██████████| 743/743 [15:10<00:00,  1.23s/it, loss=0.00676]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00505]


Epoch 5 - avg_train_loss: 0.00676  avg_val_loss: 0.00505  time: 975s
Epoch 5 - train_f1_at_03:0.15662  valid_f1_at_03:0.47080
Epoch 5 - train_f1_at_05:0.02301  valid_f1_at_05:0.26372
>>>>>>>> Model Improved From 0.31711794401101423 ----> 0.4707957663661309
other scores here... 0.4707957663661309, 0.2637191157347204
Starting 6 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.00667]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00462]


Epoch 6 - avg_train_loss: 0.00667  avg_val_loss: 0.00462  time: 976s
Epoch 6 - train_f1_at_03:0.15986  valid_f1_at_03:0.52840
Epoch 6 - train_f1_at_05:0.02242  valid_f1_at_05:0.30147
>>>>>>>> Model Improved From 0.4707957663661309 ----> 0.5284046692607004
other scores here... 0.5284046692607004, 0.3014743263853584
Starting 7 epoch...


100%|██████████| 743/743 [15:11<00:00,  1.23s/it, loss=0.00639]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00474]


Epoch 7 - avg_train_loss: 0.00639  avg_val_loss: 0.00474  time: 977s
Epoch 7 - train_f1_at_03:0.20547  valid_f1_at_03:0.51751
Epoch 7 - train_f1_at_05:0.03253  valid_f1_at_05:0.29046
Starting 8 epoch...


100%|██████████| 743/743 [15:09<00:00,  1.22s/it, loss=0.00609]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.005]  


Epoch 8 - avg_train_loss: 0.00609  avg_val_loss: 0.00500  time: 974s
Epoch 8 - train_f1_at_03:0.24434  valid_f1_at_03:0.47154
Epoch 8 - train_f1_at_05:0.04245  valid_f1_at_05:0.19827
Starting 9 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00601]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00412]


Epoch 9 - avg_train_loss: 0.00601  avg_val_loss: 0.00412  time: 973s
Epoch 9 - train_f1_at_03:0.25869  valid_f1_at_03:0.59384
Epoch 9 - train_f1_at_05:0.05198  valid_f1_at_05:0.36644
>>>>>>>> Model Improved From 0.5284046692607004 ----> 0.593841642228739
other scores here... 0.593841642228739, 0.3664383561643835
Starting 10 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00596]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00396]


Epoch 10 - avg_train_loss: 0.00596  avg_val_loss: 0.00396  time: 972s
Epoch 10 - train_f1_at_03:0.26734  valid_f1_at_03:0.61823
Epoch 10 - train_f1_at_05:0.05836  valid_f1_at_05:0.37162
>>>>>>>> Model Improved From 0.593841642228739 ----> 0.6182288777962655
other scores here... 0.6182288777962655, 0.37161667885881494
Starting 11 epoch...


100%|██████████| 743/743 [15:16<00:00,  1.23s/it, loss=0.00572]
100%|██████████| 93/93 [01:11<00:00,  1.31it/s, loss=0.00427]


Epoch 11 - avg_train_loss: 0.00572  avg_val_loss: 0.00427  time: 988s
Epoch 11 - train_f1_at_03:0.29799  valid_f1_at_03:0.58058
Epoch 11 - train_f1_at_05:0.06947  valid_f1_at_05:0.31135
Starting 12 epoch...


100%|██████████| 743/743 [17:12<00:00,  1.39s/it, loss=0.00552]
100%|██████████| 93/93 [01:20<00:00,  1.15it/s, loss=0.00423]


Epoch 12 - avg_train_loss: 0.00552  avg_val_loss: 0.00423  time: 1114s
Epoch 12 - train_f1_at_03:0.31441  valid_f1_at_03:0.58808
Epoch 12 - train_f1_at_05:0.07327  valid_f1_at_05:0.32843
Starting 13 epoch...


100%|██████████| 743/743 [17:32<00:00,  1.42s/it, loss=0.0056] 
100%|██████████| 93/93 [01:13<00:00,  1.26it/s, loss=0.00363]


Epoch 13 - avg_train_loss: 0.00560  avg_val_loss: 0.00363  time: 1127s
Epoch 13 - train_f1_at_03:0.30684  valid_f1_at_03:0.65157
Epoch 13 - train_f1_at_05:0.06429  valid_f1_at_05:0.42934
>>>>>>>> Model Improved From 0.6182288777962655 ----> 0.6515723270440251
other scores here... 0.6515723270440251, 0.42934399247589944
Starting 14 epoch...


100%|██████████| 743/743 [17:12<00:00,  1.39s/it, loss=0.00553]
100%|██████████| 93/93 [01:15<00:00,  1.23it/s, loss=0.0036] 


Epoch 14 - avg_train_loss: 0.00553  avg_val_loss: 0.00360  time: 1109s
Epoch 14 - train_f1_at_03:0.31823  valid_f1_at_03:0.65813
Epoch 14 - train_f1_at_05:0.07746  valid_f1_at_05:0.44604
>>>>>>>> Model Improved From 0.6515723270440251 ----> 0.6581301526048061
other scores here... 0.6581301526048061, 0.44603983325613705
Starting 15 epoch...


100%|██████████| 743/743 [17:38<00:00,  1.42s/it, loss=0.00527]
100%|██████████| 93/93 [01:15<00:00,  1.23it/s, loss=0.00408]


Epoch 15 - avg_train_loss: 0.00527  avg_val_loss: 0.00408  time: 1135s
Epoch 15 - train_f1_at_03:0.33616  valid_f1_at_03:0.59219
Epoch 15 - train_f1_at_05:0.09417  valid_f1_at_05:0.34712
Starting 16 epoch...


100%|██████████| 743/743 [17:27<00:00,  1.41s/it, loss=0.00512]
100%|██████████| 93/93 [01:18<00:00,  1.19it/s, loss=0.00383]


Epoch 16 - avg_train_loss: 0.00512  avg_val_loss: 0.00383  time: 1126s
Epoch 16 - train_f1_at_03:0.37549  valid_f1_at_03:0.63732
Epoch 16 - train_f1_at_05:0.10834  valid_f1_at_05:0.42269
Starting 17 epoch...


100%|██████████| 743/743 [17:34<00:00,  1.42s/it, loss=0.00514]
100%|██████████| 93/93 [01:24<00:00,  1.10it/s, loss=0.00341]


Epoch 17 - avg_train_loss: 0.00514  avg_val_loss: 0.00341  time: 1140s
Epoch 17 - train_f1_at_03:0.35826  valid_f1_at_03:0.68555
Epoch 17 - train_f1_at_05:0.10825  valid_f1_at_05:0.47626
>>>>>>>> Model Improved From 0.6581301526048061 ----> 0.6855465377309619
other scores here... 0.6855465377309619, 0.4762553965007953
Starting 18 epoch...


100%|██████████| 743/743 [17:25<00:00,  1.41s/it, loss=0.00509]
100%|██████████| 93/93 [01:16<00:00,  1.22it/s, loss=0.00337]


Epoch 18 - avg_train_loss: 0.00509  avg_val_loss: 0.00337  time: 1122s
Epoch 18 - train_f1_at_03:0.38348  valid_f1_at_03:0.68869
Epoch 18 - train_f1_at_05:0.11967  valid_f1_at_05:0.47786
>>>>>>>> Model Improved From 0.6855465377309619 ----> 0.6886907883237197
other scores here... 0.6886907883237197, 0.47785600726777194
Starting 19 epoch...


100%|██████████| 743/743 [18:11<00:00,  1.47s/it, loss=0.00337]
100%|██████████| 93/93 [01:18<00:00,  1.18it/s, loss=0.00405]


Epoch 19 - avg_train_loss: 0.00337  avg_val_loss: 0.00405  time: 1171s
Epoch 19 - train_f1_at_03:0.68385  valid_f1_at_03:0.63532
Epoch 19 - train_f1_at_05:0.48635  valid_f1_at_05:0.56436
Starting 20 epoch...


100%|██████████| 743/743 [15:22<00:00,  1.24s/it, loss=0.00313]
100%|██████████| 93/93 [01:06<00:00,  1.39it/s, loss=0.00414]


Epoch 20 - avg_train_loss: 0.00313  avg_val_loss: 0.00414  time: 990s
Epoch 20 - train_f1_at_03:0.71590  valid_f1_at_03:0.66256
Epoch 20 - train_f1_at_05:0.55855  valid_f1_at_05:0.63732
Starting 21 epoch...


100%|██████████| 743/743 [15:19<00:00,  1.24s/it, loss=0.00327]
100%|██████████| 93/93 [01:07<00:00,  1.38it/s, loss=0.00333]


Epoch 21 - avg_train_loss: 0.00327  avg_val_loss: 0.00333  time: 987s
Epoch 21 - train_f1_at_03:0.69984  valid_f1_at_03:0.70702
Epoch 21 - train_f1_at_05:0.55180  valid_f1_at_05:0.66016
>>>>>>>> Model Improved From 0.6886907883237197 ----> 0.7070241816088172
other scores here... 0.7070241816088172, 0.66015625
Starting 22 epoch...


100%|██████████| 743/743 [15:20<00:00,  1.24s/it, loss=0.00319]
100%|██████████| 93/93 [01:05<00:00,  1.41it/s, loss=0.00353]


Epoch 22 - avg_train_loss: 0.00319  avg_val_loss: 0.00353  time: 987s
Epoch 22 - train_f1_at_03:0.70616  valid_f1_at_03:0.69916
Epoch 22 - train_f1_at_05:0.56140  valid_f1_at_05:0.65925
Starting 23 epoch...


100%|██████████| 743/743 [15:20<00:00,  1.24s/it, loss=0.00281]
100%|██████████| 93/93 [01:06<00:00,  1.39it/s, loss=0.00392]


Epoch 23 - avg_train_loss: 0.00281  avg_val_loss: 0.00392  time: 988s
Epoch 23 - train_f1_at_03:0.74489  valid_f1_at_03:0.66532
Epoch 23 - train_f1_at_05:0.61485  valid_f1_at_05:0.60507
Fold 4 Training
Starting 1 epoch...


100%|██████████| 743/743 [15:18<00:00,  1.24s/it, loss=0.0147]
100%|██████████| 93/93 [01:07<00:00,  1.38it/s, loss=0.00784]


Epoch 1 - avg_train_loss: 0.01467  avg_val_loss: 0.00784  time: 987s
Epoch 1 - train_f1_at_03:0.00348  valid_f1_at_03:0.00120
Epoch 1 - train_f1_at_05:0.00108  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.001199760047990402
other scores here... 0.001199760047990402, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [16:12<00:00,  1.31s/it, loss=0.00845]
100%|██████████| 93/93 [01:21<00:00,  1.14it/s, loss=0.0068] 


Epoch 2 - avg_train_loss: 0.00845  avg_val_loss: 0.00680  time: 1055s
Epoch 2 - train_f1_at_03:0.01861  valid_f1_at_03:0.18256
Epoch 2 - train_f1_at_05:0.00090  valid_f1_at_05:0.03016
>>>>>>>> Model Improved From 0.001199760047990402 ----> 0.18256200460240346
other scores here... 0.18256200460240346, 0.03015966883500887
Starting 3 epoch...


100%|██████████| 743/743 [18:39<00:00,  1.51s/it, loss=0.00776]
100%|██████████| 93/93 [01:23<00:00,  1.12it/s, loss=0.00615]


Epoch 3 - avg_train_loss: 0.00776  avg_val_loss: 0.00615  time: 1204s
Epoch 3 - train_f1_at_03:0.05383  valid_f1_at_03:0.31282
Epoch 3 - train_f1_at_05:0.00464  valid_f1_at_05:0.12521
>>>>>>>> Model Improved From 0.18256200460240346 ----> 0.312821660326695
other scores here... 0.312821660326695, 0.12521055586749016
Starting 4 epoch...


100%|██████████| 743/743 [17:50<00:00,  1.44s/it, loss=0.0071] 
100%|██████████| 93/93 [01:15<00:00,  1.23it/s, loss=0.00613]


Epoch 4 - avg_train_loss: 0.00710  avg_val_loss: 0.00613  time: 1147s
Epoch 4 - train_f1_at_03:0.10945  valid_f1_at_03:0.33340
Epoch 4 - train_f1_at_05:0.00777  valid_f1_at_05:0.11820
>>>>>>>> Model Improved From 0.312821660326695 ----> 0.3334043913877638
other scores here... 0.3334043913877638, 0.11820462782269306
Starting 5 epoch...


100%|██████████| 743/743 [15:17<00:00,  1.24s/it, loss=0.00677]
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00518]


Epoch 5 - avg_train_loss: 0.00677  avg_val_loss: 0.00518  time: 983s
Epoch 5 - train_f1_at_03:0.15280  valid_f1_at_03:0.47270
Epoch 5 - train_f1_at_05:0.01647  valid_f1_at_05:0.24298
>>>>>>>> Model Improved From 0.3334043913877638 ----> 0.47269828616978876
other scores here... 0.47269828616978876, 0.24298084492259248
Starting 6 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.0066] 
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00459]


Epoch 6 - avg_train_loss: 0.00660  avg_val_loss: 0.00459  time: 977s
Epoch 6 - train_f1_at_03:0.18122  valid_f1_at_03:0.52980
Epoch 6 - train_f1_at_05:0.02516  valid_f1_at_05:0.31970
>>>>>>>> Model Improved From 0.47269828616978876 ----> 0.5297962322183776
other scores here... 0.5297962322183776, 0.3196988707653701
Starting 7 epoch...


100%|██████████| 743/743 [15:10<00:00,  1.23s/it, loss=0.00635]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00491]


Epoch 7 - avg_train_loss: 0.00635  avg_val_loss: 0.00491  time: 975s
Epoch 7 - train_f1_at_03:0.21488  valid_f1_at_03:0.50607
Epoch 7 - train_f1_at_05:0.03221  valid_f1_at_05:0.28403
Starting 8 epoch...


100%|██████████| 743/743 [16:31<00:00,  1.33s/it, loss=0.00604]
100%|██████████| 93/93 [01:12<00:00,  1.28it/s, loss=0.00479]


Epoch 8 - avg_train_loss: 0.00604  avg_val_loss: 0.00479  time: 1065s
Epoch 8 - train_f1_at_03:0.24420  valid_f1_at_03:0.51208
Epoch 8 - train_f1_at_05:0.04246  valid_f1_at_05:0.29200
Starting 9 epoch...


100%|██████████| 743/743 [17:25<00:00,  1.41s/it, loss=0.00601]
100%|██████████| 93/93 [01:15<00:00,  1.23it/s, loss=0.00406]


Epoch 9 - avg_train_loss: 0.00601  avg_val_loss: 0.00406  time: 1122s
Epoch 9 - train_f1_at_03:0.25487  valid_f1_at_03:0.59599
Epoch 9 - train_f1_at_05:0.04910  valid_f1_at_05:0.39254
>>>>>>>> Model Improved From 0.5297962322183776 ----> 0.5959948557780635
other scores here... 0.5959948557780635, 0.39254123834568494
Starting 10 epoch...


100%|██████████| 743/743 [17:35<00:00,  1.42s/it, loss=0.00598]
100%|██████████| 93/93 [01:14<00:00,  1.26it/s, loss=0.00395]


Epoch 10 - avg_train_loss: 0.00598  avg_val_loss: 0.00395  time: 1130s
Epoch 10 - train_f1_at_03:0.25676  valid_f1_at_03:0.60803
Epoch 10 - train_f1_at_05:0.04731  valid_f1_at_05:0.39579
>>>>>>>> Model Improved From 0.5959948557780635 ----> 0.6080291970802919
other scores here... 0.6080291970802919, 0.3957884661402249
Starting 11 epoch...


100%|██████████| 743/743 [17:15<00:00,  1.39s/it, loss=0.00581]
100%|██████████| 93/93 [01:14<00:00,  1.25it/s, loss=0.00424]


Epoch 11 - avg_train_loss: 0.00581  avg_val_loss: 0.00424  time: 1110s
Epoch 11 - train_f1_at_03:0.28034  valid_f1_at_03:0.58888
Epoch 11 - train_f1_at_05:0.05382  valid_f1_at_05:0.33243
Starting 12 epoch...


100%|██████████| 743/743 [17:19<00:00,  1.40s/it, loss=0.00542]
100%|██████████| 93/93 [01:19<00:00,  1.17it/s, loss=0.00412]


Epoch 12 - avg_train_loss: 0.00542  avg_val_loss: 0.00412  time: 1120s
Epoch 12 - train_f1_at_03:0.33280  valid_f1_at_03:0.60271
Epoch 12 - train_f1_at_05:0.08161  valid_f1_at_05:0.40509
Starting 13 epoch...


100%|██████████| 743/743 [16:01<00:00,  1.29s/it, loss=0.00554]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00361]


Epoch 13 - avg_train_loss: 0.00554  avg_val_loss: 0.00361  time: 1027s
Epoch 13 - train_f1_at_03:0.31431  valid_f1_at_03:0.66442
Epoch 13 - train_f1_at_05:0.07515  valid_f1_at_05:0.42700
>>>>>>>> Model Improved From 0.6080291970802919 ----> 0.6644176136363636
other scores here... 0.6644176136363636, 0.42699789078978206
Starting 14 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.0055] 
100%|██████████| 93/93 [01:06<00:00,  1.41it/s, loss=0.0036] 


Epoch 14 - avg_train_loss: 0.00550  avg_val_loss: 0.00360  time: 974s
Epoch 14 - train_f1_at_03:0.32818  valid_f1_at_03:0.66384
Epoch 14 - train_f1_at_05:0.08083  valid_f1_at_05:0.45072
Starting 15 epoch...


100%|██████████| 743/743 [15:08<00:00,  1.22s/it, loss=0.00524]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00398]


Epoch 15 - avg_train_loss: 0.00524  avg_val_loss: 0.00398  time: 973s
Epoch 15 - train_f1_at_03:0.35806  valid_f1_at_03:0.62286
Epoch 15 - train_f1_at_05:0.09951  valid_f1_at_05:0.34117
Starting 16 epoch...


100%|██████████| 743/743 [15:09<00:00,  1.22s/it, loss=0.00506]
100%|██████████| 93/93 [01:04<00:00,  1.43it/s, loss=0.00382]


Epoch 16 - avg_train_loss: 0.00506  avg_val_loss: 0.00382  time: 975s
Epoch 16 - train_f1_at_03:0.39373  valid_f1_at_03:0.63638
Epoch 16 - train_f1_at_05:0.11813  valid_f1_at_05:0.38754
Starting 17 epoch...


100%|██████████| 743/743 [15:07<00:00,  1.22s/it, loss=0.00525]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00336]


Epoch 17 - avg_train_loss: 0.00525  avg_val_loss: 0.00336  time: 973s
Epoch 17 - train_f1_at_03:0.34759  valid_f1_at_03:0.69029
Epoch 17 - train_f1_at_05:0.10044  valid_f1_at_05:0.43753
>>>>>>>> Model Improved From 0.6644176136363636 ----> 0.6902873259298431
other scores here... 0.6902873259298431, 0.4375292466073935
Starting 18 epoch...


100%|██████████| 743/743 [15:09<00:00,  1.22s/it, loss=0.00513]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00335]


Epoch 18 - avg_train_loss: 0.00513  avg_val_loss: 0.00335  time: 979s
Epoch 18 - train_f1_at_03:0.36473  valid_f1_at_03:0.68803
Epoch 18 - train_f1_at_05:0.10551  valid_f1_at_05:0.49268
Starting 19 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00338]
100%|██████████| 93/93 [01:04<00:00,  1.45it/s, loss=0.00414]


Epoch 19 - avg_train_loss: 0.00338  avg_val_loss: 0.00414  time: 971s
Epoch 19 - train_f1_at_03:0.68649  valid_f1_at_03:0.61525
Epoch 19 - train_f1_at_05:0.47634  valid_f1_at_05:0.49989
Starting 20 epoch...


100%|██████████| 743/743 [15:06<00:00,  1.22s/it, loss=0.00317]
100%|██████████| 93/93 [01:03<00:00,  1.46it/s, loss=0.00391]


Epoch 20 - avg_train_loss: 0.00317  avg_val_loss: 0.00391  time: 971s
Epoch 20 - train_f1_at_03:0.70540  valid_f1_at_03:0.65639
Epoch 20 - train_f1_at_05:0.55463  valid_f1_at_05:0.59867
Starting 21 epoch...


100%|██████████| 743/743 [15:02<00:00,  1.21s/it, loss=0.00337]
100%|██████████| 93/93 [01:03<00:00,  1.45it/s, loss=0.00338]


Epoch 21 - avg_train_loss: 0.00337  avg_val_loss: 0.00338  time: 967s
Epoch 21 - train_f1_at_03:0.68990  valid_f1_at_03:0.70166
Epoch 21 - train_f1_at_05:0.52569  valid_f1_at_05:0.65569
>>>>>>>> Model Improved From 0.6902873259298431 ----> 0.7016611295681063
other scores here... 0.7016611295681063, 0.6556927297668039
Starting 22 epoch...


100%|██████████| 743/743 [15:02<00:00,  1.21s/it, loss=0.00317]
100%|██████████| 93/93 [01:04<00:00,  1.44it/s, loss=0.00336]


Epoch 22 - avg_train_loss: 0.00317  avg_val_loss: 0.00336  time: 968s
Epoch 22 - train_f1_at_03:0.70776  valid_f1_at_03:0.71266
Epoch 22 - train_f1_at_05:0.56314  valid_f1_at_05:0.66251
>>>>>>>> Model Improved From 0.7016611295681063 ----> 0.7126550868486352
other scores here... 0.7126550868486352, 0.6625073113667381
Starting 23 epoch...


100%|██████████| 743/743 [15:04<00:00,  1.22s/it, loss=0.00283]
100%|██████████| 93/93 [01:03<00:00,  1.45it/s, loss=0.00396]

Epoch 23 - avg_train_loss: 0.00283  avg_val_loss: 0.00396  time: 969s
Epoch 23 - train_f1_at_03:0.74553  valid_f1_at_03:0.64613
Epoch 23 - train_f1_at_05:0.61830  valid_f1_at_05:0.57078
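
The "Model Improved" lines above appear whenever `valid_f1_at_03` beats the best value seen so far in the fold. A minimal sketch of that bookkeeping follows. The class name `BestScoreTracker`, the strict `>` comparison, and starting the tracker fresh at epoch 17 are assumptions made for illustration, not the notebook's actual code. The scores are the fold 4 values for epochs 17 to 23 copied from the log.

```python
import math


class BestScoreTracker:
    """Hypothetical helper: remembers the best validation score seen so far."""

    def __init__(self):
        # Start at -inf so the first score always counts as an improvement,
        # matching the "Model Improved From -inf" line at epoch 1 of each fold.
        self.best = -math.inf

    def update(self, score):
        """Return True (and store the score) when it beats the previous best."""
        if score > self.best:  # assumption: strict improvement is required
            print(f">>>>>>>> Model Improved From {self.best} ----> {score}")
            self.best = score
            return True
        return False


# valid_f1_at_03 for fold 4, epochs 17 to 23, copied from the log above
fold4_scores = [0.69029, 0.68803, 0.61525, 0.65639, 0.70166, 0.71266, 0.64613]

tracker = BestScoreTracker()
# Epochs at which a checkpoint would be written under this rule
saved_epochs = [
    epoch
    for epoch, score in zip(range(17, 24), fold4_scores)
    if tracker.update(score)
]
print(saved_epochs)
```

Under this rule the last epoch is not necessarily the one that gets kept: for fold 4 the score drops at epoch 23, so the retained weights would be those from epoch 22.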





In [21]:
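# One checkpoint file per fold listed in CFG.folds
model_paths = [f'fold-{i}.bin' for i in CFG.folds]

# Evaluate the fold checkpoints and print micro F1 at the 0.3 and 0.5 thresholds
calc_cv(model_paths)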

100%|██████████| 93/93 [01:03<00:00,  1.46it/s]


micro f1_0.3 0.699205823957644


100%|██████████| 93/93 [01:03<00:00,  1.45it/s]


micro f1_0.3 0.7047823928512328


100%|██████████| 93/93 [01:02<00:00,  1.48it/s]


micro f1_0.3 0.7055711687882295


100%|██████████| 93/93 [01:02<00:00,  1.49it/s]


micro f1_0.3 0.7059357714851812


100%|██████████| 93/93 [01:02<00:00,  1.49it/s]


micro f1_0.3 0.7072775924151827
overall micro f1_0.3 0.7072775924151827
overall micro f1_0.5 0.6582853562056946
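
The scores above are micro-averaged F1 values at fixed probability thresholds (0.3 and 0.5). As a rough illustration of what that metric measures, here is a minimal pure-Python sketch. The function name `micro_f1`, the use of `>=` for thresholding, and the toy labels and probabilities are all assumptions for illustration; this is not the notebook's `calc_cv` and the data is not from the competition.

```python
def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1 for multi-label predictions.

    y_true: list of rows of 0/1 labels, one column per class.
    y_prob: list of rows of predicted probabilities, same shape.
    """
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for truth, prob in zip(true_row, prob_row):
            pred = prob >= threshold  # assumption: ties count as positive
            if pred and truth:
                tp += 1
            elif pred and not truth:
                fp += 1
            elif truth and not pred:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positives in either labels or predictions, report 0.0
    return 2 * tp / denom if denom else 0.0


# Toy example: 4 clips, 3 classes (made-up numbers)
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
y_prob = [
    [0.60, 0.10, 0.20],
    [0.20, 0.40, 0.10],
    [0.10, 0.35, 0.45],
    [0.70, 0.20, 0.10],
]

for threshold in (0.3, 0.5):
    print(f"micro f1_{threshold} {micro_f1(y_true, y_prob, threshold)}")
```

In the toy data, as in the results above, the lower threshold scores higher because several correct classes receive probabilities between 0.3 and 0.5. Sweeping `threshold` over a finer grid is one simple way to approach the threshold tuning step listed in the strategy.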
