## Training Notebook for BirdCLEF2022 ##

[Original baseline by Kaerururu](https://www.kaggle.com/code/kaerunantoka/birdclef2022-use-2nd-label-f0/notebook)  
[That was forked from Hidehisa Arai (2021 comp)](https://www.kaggle.com/code/hidehisaarai1213/pytorch-inference-birdclef2021-starter/notebook)  
[My inference notebook](https://www.kaggle.com/code/ollypowell/birdclef2022-ex005-f0-infer/)  
[Original inference fork](https://www.kaggle.com/code/kaerunantoka/birdclef2022-ex005-f0-infer/notebook)

Data:  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-1-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-2-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-3-4  
https://www.kaggle.com/kaerunantoka/birdclef2022-audio-to-numpy-4-4  


**My Strategy:**  
1. Set up the training notebook on my own GPU, so there are no time limits.
2. Run with all folds while I work on the next step. Run inference on Kaggle once the models are ready.
3. Improve my CV score as much as I can through data augmentation, starting with the basic strategies below from Shinmaru. Use an image size of 128 to speed things up if need be.
4. Train on additional folds.
5. Fine-tune the inference threshold (should be in the vicinity of 0.005 to 0.1).
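
The threshold fine-tuning step above can be sketched as a simple sweep over candidate thresholds, scoring each one against held-out labels. This is only an illustration: the probabilities, labels and candidate grid are made-up values, and `micro_f1` is a hypothetical helper that is not part of the baseline notebook.

```python
def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1 over all (clip, class) pairs at a given threshold."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = p > threshold
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy validation set: 3 clips x 3 classes (illustrative values only)
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_prob = [[0.08, 0.01, 0.002], [0.003, 0.04, 0.02], [0.001, 0.004, 0.3]]

# Candidate grid spanning the range mentioned in the strategy list
candidates = [0.005, 0.01, 0.03, 0.05, 0.1]
scores = {th: micro_f1(y_true, y_prob, th) for th in candidates}
best_threshold = max(scores, key=scores.get)
```

In practice the sweep would run on out-of-fold predictions rather than toy lists, and the best value may differ between folds.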

[**Basic Augmentation strategies, suggested to be useful by Shinmaru:**](https://www.kaggle.com/competitions/birdclef-2022/discussion/324318)

* Time shift
* Add pink noise and brown noise
* Mix in other audio datasets

[Good notebook on this topic by Hidehisa Arai](https://www.kaggle.com/code/hidehisaarai1213/rfcx-audio-data-augmentation-japanese-english)  
[Time shift and noise only, covered by Shinmaru](https://www.kaggle.com/code/shinmurashinmura/birdclef2022-basic-augmentation/notebook)

[**More advanced ideas (also suggested to work from Shinmaru)**](https://www.kaggle.com/competitions/birdclef-2022/discussion/307880)  

[SpecAugment](https://arxiv.org/abs/1904.08779)  
[SpecAugment++](https://arxiv.org/abs/2103.16858v3)  
[ImportantAug](https://arxiv.org/abs/2112.07156)
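
The masking part of SpecAugment is simple enough to sketch in plain NumPy. The snippet below is a rough illustration only, not the `torchlibrosa` implementation used later in this notebook: `mask_stripes` is a hypothetical helper, the input is a toy all-ones spectrogram, and the time-warping part of the paper is left out. The stripe widths (4 mel bins, 32 frames) simply mirror the values passed to `SpecAugmentation` in the model cell.

```python
import numpy as np


def mask_stripes(spec, max_width, n_stripes, axis, rng):
    """Zero out `n_stripes` random bands of up to `max_width` bins along `axis`."""
    spec = spec.copy()  # leave the caller's array untouched
    size = spec.shape[axis]
    for _ in range(n_stripes):
        width = int(rng.integers(0, max_width + 1))     # 0..max_width inclusive
        start = int(rng.integers(0, size - width + 1))  # keep the band in range
        index = [slice(None)] * spec.ndim
        index[axis] = slice(start, start + width)
        spec[tuple(index)] = 0.0
    return spec


rng = np.random.default_rng(0)
spec = np.ones((224, 313), dtype=np.float32)  # (mel bins, frames), toy input

masked = mask_stripes(spec, max_width=4, n_stripes=2, axis=0, rng=rng)     # frequency masks
masked = mask_stripes(masked, max_width=32, n_stripes=2, axis=1, rng=rng)  # time masks
```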

In [1]:
import os
import sys
sys.path.append('input/pytorch-image-models/pytorch-image-models-master')  # removed ../
import random
import time
import librosa
import colorednoise as cn
import numpy as np
import pandas as pd
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.model_selection import StratifiedKFold
from sklearn import metrics
from torchlibrosa.augmentation import SpecAugmentation
from tqdm import tqdm
import ast
import glob 
import albumentations as A
import audiomentations as AA
import transformers
from torch.cuda.amp import autocast, GradScaler

In [2]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'CPU'))

Setup complete. Using torch 1.11.0 _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060 Laptop GPU', major=8, minor=6, total_memory=5946MB, multi_processor_count=30)


In [3]:
# Changed the paths to suit my own filenames
all_path = glob.glob('input/train_np_1/*/*.npy')\
+ glob.glob('input/train_np_2/*/*.npy')\
+ glob.glob('input/train_np_3/*/*.npy')\
+ glob.glob('input/train_np_4/*/*.npy')

len(all_path)

14852

In [4]:
train = pd.read_csv('input/birdclef-2022/train_metadata.csv')  #Changed filepath to suit

train['new_target'] = train['primary_label'] + ' ' + train['secondary_labels'].map(lambda x: ' '.join(ast.literal_eval(x)))
train['len_new_target'] = train['new_target'].map(lambda x: len(x.split()))
# train['len_new_target'].value_counts()
train.head()

Unnamed: 0,primary_label,secondary_labels,type,latitude,longitude,scientific_name,common_name,author,license,rating,time,url,filename,new_target,len_new_target
0,afrsil1,[],"['call', 'flight call']",12.391,-1.493,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,2.5,08:00,https://www.xeno-canto.org/125458,afrsil1/XC125458.ogg,afrsil1,1
1,afrsil1,"['houspa', 'redava', 'zebdov']",['call'],19.8801,-155.7254,Euodice cantans,African Silverbill,Dan Lane,Creative Commons Attribution-NonCommercial-Sha...,3.5,08:30,https://www.xeno-canto.org/175522,afrsil1/XC175522.ogg,afrsil1 houspa redava zebdov,4
2,afrsil1,[],"['call', 'song']",16.2901,-16.0321,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:30,https://www.xeno-canto.org/177993,afrsil1/XC177993.ogg,afrsil1,1
3,afrsil1,[],"['alarm call', 'call']",17.0922,54.2958,Euodice cantans,African Silverbill,Oscar Campbell,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:00,https://www.xeno-canto.org/205893,afrsil1/XC205893.ogg,afrsil1,1
4,afrsil1,[],['flight call'],21.4581,-157.7252,Euodice cantans,African Silverbill,Ross Gallardy,Creative Commons Attribution-NonCommercial-Sha...,3.0,16:30,https://www.xeno-canto.org/207431,afrsil1/XC207431.ogg,afrsil1,1
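
To make the `new_target` construction concrete, here is a small standalone sketch of the same idea on two rows copied from the output above. `build_new_target` is a hypothetical helper, not part of the notebook. Unlike the cell above, it avoids the trailing space for recordings without secondary labels, although `.split()` ignores that space either way.

```python
import ast


def build_new_target(primary_label, secondary_labels):
    """Join the primary label with the parsed secondary labels, space-separated."""
    # secondary_labels is stored as a string such as "['houspa', 'redava']"
    secondary = ast.literal_eval(secondary_labels)
    return ' '.join([primary_label] + secondary)


with_secondary = build_new_target('afrsil1', "['houspa', 'redava', 'zebdov']")
without_secondary = build_new_target('afrsil1', '[]')
```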


In [5]:
path_df = pd.DataFrame(all_path, columns=['file_path'])
path_df['filename'] = path_df['file_path'].map(lambda x: x.split('/')[-2]+'/'+x.split('/')[-1][:-4])
path_df.head()

Unnamed: 0,file_path,filename
0,input/train_np_1/bcnher/XC256938.ogg.npy,bcnher/XC256938.ogg
1,input/train_np_1/bcnher/XC648367.ogg.npy,bcnher/XC648367.ogg
2,input/train_np_1/bcnher/XC587839.ogg.npy,bcnher/XC587839.ogg
3,input/train_np_1/bcnher/XC548602.ogg.npy,bcnher/XC548602.ogg
4,input/train_np_1/bcnher/XC500284.ogg.npy,bcnher/XC500284.ogg
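
The merge key above is built by splitting on `/` and slicing off the last four characters, which assumes forward-slash paths and a `.npy` suffix. A possible alternative using `pathlib` is sketched below. `merge_key` is a hypothetical helper, and `PurePosixPath` is chosen here only because the paths shown use forward slashes.

```python
from pathlib import PurePosixPath


def merge_key(file_path):
    """Return '<species>/<recording>.ogg' from '.../<species>/<recording>.ogg.npy'."""
    path = PurePosixPath(file_path)
    # .stem drops only the final suffix, so 'XC256938.ogg.npy' -> 'XC256938.ogg'
    return f'{path.parent.name}/{path.stem}'


key = merge_key('input/train_np_1/bcnher/XC256938.ogg.npy')
```

The resulting key has the same form as the `filename` column in `train_metadata.csv`, which is what the merge in the next cell relies on.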


In [6]:
train = pd.merge(train, path_df, on='filename')
print(train.shape)
train.head()

(14852, 16)


Unnamed: 0,primary_label,secondary_labels,type,latitude,longitude,scientific_name,common_name,author,license,rating,time,url,filename,new_target,len_new_target,file_path
0,afrsil1,[],"['call', 'flight call']",12.391,-1.493,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,2.5,08:00,https://www.xeno-canto.org/125458,afrsil1/XC125458.ogg,afrsil1,1,input/train_np_1/afrsil1/XC125458.ogg.npy
1,afrsil1,"['houspa', 'redava', 'zebdov']",['call'],19.8801,-155.7254,Euodice cantans,African Silverbill,Dan Lane,Creative Commons Attribution-NonCommercial-Sha...,3.5,08:30,https://www.xeno-canto.org/175522,afrsil1/XC175522.ogg,afrsil1 houspa redava zebdov,4,input/train_np_1/afrsil1/XC175522.ogg.npy
2,afrsil1,[],"['call', 'song']",16.2901,-16.0321,Euodice cantans,African Silverbill,Bram Piot,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:30,https://www.xeno-canto.org/177993,afrsil1/XC177993.ogg,afrsil1,1,input/train_np_1/afrsil1/XC177993.ogg.npy
3,afrsil1,[],"['alarm call', 'call']",17.0922,54.2958,Euodice cantans,African Silverbill,Oscar Campbell,Creative Commons Attribution-NonCommercial-Sha...,4.0,11:00,https://www.xeno-canto.org/205893,afrsil1/XC205893.ogg,afrsil1,1,input/train_np_1/afrsil1/XC205893.ogg.npy
4,afrsil1,[],['flight call'],21.4581,-157.7252,Euodice cantans,African Silverbill,Ross Gallardy,Creative Commons Attribution-NonCommercial-Sha...,3.0,16:30,https://www.xeno-canto.org/207431,afrsil1/XC207431.ogg,afrsil1,1,input/train_np_1/afrsil1/XC207431.ogg.npy


In [7]:
Fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for n, (trn_index, val_index) in enumerate(Fold.split(train, train['primary_label'])):
    train.loc[val_index, 'kfold'] = int(n)
train['kfold'] = train['kfold'].astype(int)
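
A quick way to sanity-check the stratification is to count labels per fold. The sketch below does this on a toy assignment. `label_counts_per_fold` is a hypothetical helper and the labels and folds are made-up values.

```python
from collections import Counter, defaultdict


def label_counts_per_fold(labels, folds):
    """Map each fold to a Counter of the labels assigned to it."""
    counts = defaultdict(Counter)
    for label, fold in zip(labels, folds):
        counts[fold][label] += 1
    return counts


# Toy data: 4 recordings of one species, 2 of another, split over 2 folds
labels = ['afrsil1'] * 4 + ['akekee'] * 2
folds = [0, 1, 0, 1, 0, 1]
counts = label_counts_per_fold(labels, folds)
```

On the real frame this would be called as `label_counts_per_fold(train['primary_label'], train['kfold'])`. If any species has fewer recordings than `n_splits`, it cannot appear in every validation fold, and scikit-learn warns about this when splitting.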



In [8]:
train.to_csv('train_folds.csv', index=False)

In [9]:
class CFG:
    ######################
    # Globals #
    ######################
    EXP_ID = 'EX005'
    seed = 71
    epochs = 32     # Was 23
    cutmix_and_mixup_epochs = 18
    folds =  [0, 1, 2, 3, 4]  #[0]
    N_FOLDS = 5
    LR = 1e-3
    ETA_MIN = 1e-6
    WEIGHT_DECAY = 1e-6
    train_bs = 16 # 32
    valid_bs = 32 # 64
    base_model_name = "tf_efficientnet_b0_ns"
    EARLY_STOPPING = True
    DEBUG = False # True
    EVALUATION = 'AUC'
    apex = True

    pooling = "max"
    pretrained = True
    num_classes = 152
    in_channels = 3
    target_columns = 'afrsil1 akekee akepa1 akiapo akikik amewig aniani apapan arcter \
                      barpet bcnher belkin1 bkbplo bknsti bkwpet blkfra blknod bongul \
                      brant brnboo brnnod brnowl brtcur bubsan buffle bulpet burpar buwtea \
                      cacgoo1 calqua cangoo canvas caster1 categr chbsan chemun chukar cintea \
                      comgal1 commyn compea comsan comwax coopet crehon dunlin elepai ercfra eurwig \
                      fragul gadwal gamqua glwgul gnwtea golphe grbher3 grefri gresca gryfra gwfgoo \
                      hawama hawcoo hawcre hawgoo hawhaw hawpet1 hoomer houfin houspa hudgod iiwi incter1 \
                      jabwar japqua kalphe kauama laugul layalb lcspet leasan leater1 lessca lesyel lobdow lotjae \
                      madpet magpet1 mallar3 masboo mauala maupar merlin mitpar moudov norcar norhar2 normoc norpin \
                      norsho nutman oahama omao osprey pagplo palila parjae pecsan peflov perfal pibgre pomjae puaioh \
                      reccar redava redjun redpha1 refboo rempar rettro ribgul rinduc rinphe rocpig rorpar rudtur ruff \
                      saffin sander semplo sheowl shtsan skylar snogoo sooshe sooter1 sopsku1 sora spodov sposan \
                      towsol wantat1 warwhe1 wesmea wessan wetshe whfibi whiter whttro wiltur yebcar yefcan zebdov'.split()

    img_size = 224 #224 # 128
    main_metric = "epoch_f1_at_03"

    period = 5
    n_mels = 224 #224 # 128
    fmin = 20
    fmax = 16000
    n_fft = 2048
    hop_length = 512
    sample_rate = 32000
    melspectrogram_parameters = {
        "n_mels": 224, #224, # 128,
        "fmin": 20,
        "fmax": 16000
    }
    
    
class AudioParams:
    """
    Parameters used for the audio data
    """
    sr = CFG.sample_rate
    duration = CFG.period
    # Melspectrogram
    n_mels = CFG.n_mels
    fmin = CFG.fmin
    fmax = CFG.fmax

In [10]:
class Compose:
    def __init__(self, transforms: list):
        self.transforms = transforms

    def __call__(self, y: np.ndarray, sr):
        for trns in self.transforms:
            y = trns(y, sr)
        return y


class AudioTransform:
    def __init__(self, always_apply=False, p=0.5):
        self.always_apply = always_apply
        self.p = p

    def __call__(self, y: np.ndarray, sr):
        if self.always_apply:
            return self.apply(y, sr=sr)
        else:
            if np.random.rand() < self.p:
                return self.apply(y, sr=sr)
            else:
                return y

    def apply(self, y: np.ndarray, **params):
        raise NotImplementedError


class OneOf(Compose):
    # https://github.com/albumentations-team/albumentations/blob/master/albumentations/core/composition.py
    def __init__(self, transforms, p=0.5):
        super().__init__(transforms)
        self.p = p
        transforms_ps = [t.p for t in transforms]
        s = sum(transforms_ps)
        self.transforms_ps = [t / s for t in transforms_ps]

    def __call__(self, y: np.ndarray, sr):
        data = y
        if self.transforms_ps and (random.random() < self.p):
            random_state = np.random.RandomState(random.randint(0, 2 ** 32 - 1))
            t = random_state.choice(self.transforms, p=self.transforms_ps)
            data = t(y, sr)
        return data


class Normalize(AudioTransform):
    def __init__(self, always_apply=False, p=1):
        super().__init__(always_apply, p)

    def apply(self, y: np.ndarray, **params):
        max_vol = np.abs(y).max()
        y_vol = y * 1 / max_vol
        return np.asfortranarray(y_vol)


class NewNormalize(AudioTransform):
    def __init__(self, always_apply=False, p=1):
        super().__init__(always_apply, p)

    def apply(self, y: np.ndarray, **params):
        y_mm = y - y.mean()
        return y_mm / np.abs(y_mm).max()  # ndarray has no .abs() method


class NoiseInjection(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_noise_level=0.5):
        super().__init__(always_apply, p)

        self.noise_level = (0.0, max_noise_level)

    def apply(self, y: np.ndarray, **params):
        noise_level = np.random.uniform(*self.noise_level)
        noise = np.random.randn(len(y))
        augmented = (y + noise * noise_level).astype(y.dtype)
        return augmented


class GaussianNoise(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))

        white_noise = np.random.randn(len(y))
        a_white = np.sqrt(white_noise ** 2).max()
        augmented = (y + white_noise * 1 / a_white * a_noise).astype(y.dtype)
        return augmented

#https://github.com/felixpatzelt/colorednoise
class PinkNoise(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))

        pink_noise = cn.powerlaw_psd_gaussian(1, len(y))
        a_pink = np.sqrt(pink_noise ** 2).max()
        augmented = (y + pink_noise * 1 / a_pink * a_noise).astype(y.dtype)
        return augmented

#https://github.com/felixpatzelt/colorednoise
class BrownNoise(AudioTransform):       #Added in V2
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))

        brown_noise = cn.powerlaw_psd_gaussian(2, len(y))
        a_brown = np.sqrt(brown_noise ** 2).max()
        augmented = (y + brown_noise * 1 / a_brown * a_noise).astype(y.dtype)
        return augmented

#https://www.kaggle.com/code/hidehisaarai1213/rfcx-audio-data-augmentation-japanese-english
#https://medium.com/@makcedward/data-augmentation-for-audio-76912b01fdf6
#Not implemented this yet, try in V3.

class TimeShift(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_shift_second=2, sr=32000, padding_mode="replace"):
        super().__init__(always_apply, p)
    
        assert padding_mode in ["replace", "zero"], "`padding_mode` must be either 'replace' or 'zero'"
        self.max_shift_second = max_shift_second
        self.sr = sr
        self.padding_mode = padding_mode

    def apply(self, y: np.ndarray, **params):
        # Cast to int: max_shift_second may be fractional (e.g. 0.5)
        max_shift = int(self.sr * self.max_shift_second)
        shift = np.random.randint(-max_shift, max_shift)
        augmented = np.roll(y, shift)
        if self.padding_mode == "zero":
            if shift > 0:
                augmented[:shift] = 0
            elif shift < 0:  # a zero shift must not blank the whole clip
                augmented[shift:] = 0
        return augmented



class AddBackround_1(AudioTransform):       #Added in V3, using a single 30 second clip
    def __init__(self, always_apply=False, p=0.5, min_snr=5, max_snr=20):
        super().__init__(always_apply, p)

        self.min_snr = min_snr
        self.max_snr = max_snr

    def apply(self, y: np.ndarray, **params):
        snr = np.random.uniform(self.min_snr, self.max_snr)
        background_dir = 'input/background_np_1/'
        filename = random.choice(os.listdir(background_dir))
        file_path = os.path.join(background_dir, filename)

        a_signal = np.sqrt(y ** 2).max()
        a_noise = a_signal / (10 ** (snr / 20))  
        l_signal = len(y)

        background = np.load(file_path)
        a_background = np.sqrt(background ** 2).max()
        l_background = len(background)

        if l_signal > l_background:
            ratio = l_signal//l_background
            background = np.tile(background, ratio+1 )
            background = background[0:l_signal]

        if l_signal < l_background:
            background = background[0:l_signal]

        
        augmented = (y + background * 1 / a_background * a_noise).astype(y.dtype)
        return augmented

#not used
class PitchShift(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_range=5):
        super().__init__(always_apply, p)
        self.max_range = max_range

    def apply(self, y: np.ndarray, sr, **params):
        n_steps = np.random.randint(-self.max_range, self.max_range)
        augmented = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)  # keyword args work across librosa versions
        return augmented

#not used
class TimeStretch(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, max_rate=1):
        super().__init__(always_apply, p)
        self.max_rate = max_rate

    def apply(self, y: np.ndarray, **params):
        rate = np.random.uniform(0, self.max_rate)
        augmented = librosa.effects.time_stretch(y, rate=rate)  # keyword arg works across librosa versions
        return augmented


def _db2float(db: float, amplitude=True):
    if amplitude:
        return 10 ** (db / 20)
    else:
        return 10 ** (db / 10)


def volume_down(y: np.ndarray, db: float):
    """
    Low level API for decreasing the volume
    Parameters
    ----------
    y: numpy.ndarray
        stereo / monaural input audio
    db: float
        how much decibel to decrease
    Returns
    -------
    applied: numpy.ndarray
        audio with decreased volume
    """
    applied = y * _db2float(-db)
    return applied


def volume_up(y: np.ndarray, db: float):
    """
    Low level API for increasing the volume
    Parameters
    ----------
    y: numpy.ndarray
        stereo / monaural input audio
    db: float
        how much decibel to increase
    Returns
    -------
    applied: numpy.ndarray
        audio with increased volume
    """
    applied = y * _db2float(db)
    return applied


class RandomVolume(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, limit=10):
        super().__init__(always_apply, p)
        self.limit = limit

    def apply(self, y: np.ndarray, **params):
        db = np.random.uniform(-self.limit, self.limit)
        if db >= 0:
            return volume_up(y, db)
        else:
            return volume_down(y, -db)  # db is negative here, so pass its magnitude

# not used
class CosineVolume(AudioTransform):
    def __init__(self, always_apply=False, p=0.5, limit=10):
        super().__init__(always_apply, p)
        self.limit = limit

    def apply(self, y: np.ndarray, **params):
        db = np.random.uniform(-self.limit, self.limit)
        cosine = np.cos(np.arange(len(y)) / len(y) * np.pi * 2)
        dbs = _db2float(cosine * db)
        return y * dbs
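
The noise transforms above all scale the noise the same way: they pick an SNR in dB and divide the signal's peak amplitude by `10 ** (snr / 20)`. A small worked sketch of that arithmetic follows. `noise_amplitude` and `db_to_gain` are hypothetical helpers written for this illustration. Note that the classes use peak values rather than RMS, so the "SNR" is really a peak-to-peak ratio.

```python
def noise_amplitude(signal_peak, snr_db):
    """Peak noise amplitude for a target SNR in dB (amplitude convention, /20)."""
    return signal_peak / (10 ** (snr_db / 20))


def db_to_gain(db):
    """Amplitude gain for a change of `db` decibels, as in _db2float(amplitude=True)."""
    return 10 ** (db / 20)


quiet_noise = noise_amplitude(1.0, 20)  # max_snr=20: noise peak is a tenth of the signal peak
loud_noise = noise_amplitude(1.0, 5)    # min_snr=5: noise peak is roughly 0.56 of the signal peak
volume_gain = db_to_gain(4)             # limit=4 in RandomVolume: gain of roughly 1.58
```

So with `min_snr=5` the added noise can reach more than half the peak of the bird call, which is worth keeping in mind when stacking several noise sources.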

In [11]:
OUTPUT_DIR = 'output'    # was ./ for Kaggle
os.makedirs(OUTPUT_DIR, exist_ok=True)
   
    
def set_seed(seed=42):
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    
set_seed(CFG.seed)

In [12]:
def calc_loss(y_true, y_pred):
    return metrics.roc_auc_score(np.array(y_true), np.array(y_pred))


# ====================================================
# Training helper functions
# ====================================================
class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
        

class MetricMeter(object):
    def __init__(self):
        self.reset()
    
    def reset(self):
        self.y_true = []
        self.y_pred = []
    
    def update(self, y_true, y_pred):
        self.y_true.extend(y_true.cpu().detach().numpy().tolist())
        self.y_pred.extend(y_pred["clipwise_output"].cpu().detach().numpy().tolist())

    @property
    def avg(self):
        self.f1_03 = metrics.f1_score(np.array(self.y_true), np.array(self.y_pred) > 0.3, average="micro")
        self.f1_05 = metrics.f1_score(np.array(self.y_true), np.array(self.y_pred) > 0.5, average="micro")
        
        return {
            "f1_at_03" : self.f1_03,
            "f1_at_05" : self.f1_05,
        }
    
    
# https://www.kaggle.com/c/rfcx-species-audio-detection/discussion/213075
class BCEFocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2.0):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, preds, targets):
        bce_loss = nn.BCEWithLogitsLoss(reduction='none')(preds, targets)
        probas = torch.sigmoid(preds)
        loss = targets * self.alpha * \
            (1. - probas)**self.gamma * bce_loss + \
            (1. - targets) * probas**self.gamma * bce_loss
        loss = loss.mean()
        return loss


class BCEFocal2WayLoss(nn.Module):
    def __init__(self, weights=[1, 1], class_weights=None):
        super().__init__()

        self.focal = BCEFocalLoss()

        self.weights = weights

    def forward(self, input, target):
        input_ = input["logit"]
        target = target.float()

        framewise_output = input["framewise_logit"]
        clipwise_output_with_max, _ = framewise_output.max(dim=1)

        loss = self.focal(input_, target)
        aux_loss = self.focal(clipwise_output_with_max, target)

        return self.weights[0] * loss + self.weights[1] * aux_loss
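
To see what the focal weighting in `BCEFocalLoss` does, it helps to evaluate the same formula on single scalar values. The sketch below re-implements the per-element expression in plain Python. `bce_focal` is a hypothetical helper and the logits are arbitrary illustrative values.

```python
import math


def bce_focal(logit, target, alpha=0.25, gamma=2.0):
    """Per-element focal BCE, mirroring the expression in BCEFocalLoss.forward."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    bce = -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return (target * alpha * (1.0 - p) ** gamma * bce
            + (1.0 - target) * p ** gamma * bce)


easy_positive = bce_focal(logit=4.0, target=1.0)   # confident and correct
hard_positive = bce_focal(logit=-4.0, target=1.0)  # confident and wrong
hard_negative = bce_focal(logit=4.0, target=0.0)   # false alarm
```

The easy positive contributes almost nothing, while the confident mistakes dominate. As written in the class, `alpha` only scales the positive term, so a false alarm is penalised about four times as heavily as an equally confident miss.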

In [13]:
def compute_melspec(y, params):
    """
    Computes a mel-spectrogram and puts it at decibel scale
    Arguments:
        y {np array} -- signal
        params {AudioParams} -- Parameters to use for the spectrogram. Expected to have the attributes sr, n_mels, fmin, fmax
    Returns:
        np array -- Mel-spectrogram
    """
    melspec = librosa.feature.melspectrogram(
        y=y, sr=params.sr, n_mels=params.n_mels, fmin=params.fmin, fmax=params.fmax,
    )

    melspec = librosa.power_to_db(melspec).astype(np.float32)
    return melspec


def crop_or_pad(y, length, sr, train=True, probs=None):
    """
    Crops an array to a chosen length
    Arguments:
        y {1D np array} -- Array to crop
        length {int} -- Length of the crop
        sr {int} -- Sampling rate
    Keyword Arguments:
        train {bool} -- Whether we are at train time. If so, crop randomly, else return the beginning of y (default: {True})
        probs {None or numpy array} -- Probabilities to use to choose where to crop (default: {None})
    Returns:
        1D np array -- Cropped array
    """
    if len(y) <= length:
        y = np.concatenate([y, np.zeros(length - len(y))])
    else:
        if not train:
            start = 0
        elif probs is None:
            start = np.random.randint(len(y) - length)
        else:
            start = (
                    np.random.choice(np.arange(len(probs)), p=probs) + np.random.random()
            )
            start = int(sr * (start))

        y = y[start: start + length]

    return y.astype(np.float32)


def mono_to_color(X, eps=1e-6, mean=None, std=None):
    """
    Converts a one channel array to a 3 channel one in [0, 255]
    Arguments:
        X {numpy array [H x W]} -- 2D array to convert
    Keyword Arguments:
        eps {float} -- To avoid dividing by 0 (default: {1e-6})
        mean {None or np array} -- Mean for normalization (default: {None})
        std {None or np array} -- Std for normalization (default: {None})
    Returns:
        numpy array [3 x H x W] -- RGB numpy array
    """
    X = np.stack([X, X, X], axis=-1)

    # Standardize
    mean = mean or X.mean()
    std = std or X.std()
    X = (X - mean) / (std + eps)

    # Normalize to [0, 255]
    _min, _max = X.min(), X.max()

    if (_max - _min) > eps:
        V = np.clip(X, _min, _max)
        V = 255 * (V - _min) / (_max - _min)
        V = V.astype(np.uint8)
    else:
        V = np.zeros_like(X, dtype=np.uint8)

    return V


mean = (0.485, 0.456, 0.406) # RGB
std = (0.229, 0.224, 0.225) # RGB

albu_transforms = {
    'train' : A.Compose([
            A.HorizontalFlip(p=0.5),
            A.OneOf([
                A.Cutout(max_h_size=5, max_w_size=16),
                A.CoarseDropout(max_holes=4),
            ], p=0.5),
            A.Normalize(mean, std),
    ]),
    'valid' : A.Compose([
            A.Normalize(mean, std),
    ]),
}



# This was the example given
#transform = TimeShift(always_apply=True, max_shift_second=4, sr=sr)
#y_time_shifted = transform(y)
#Audio(y_time_shifted, rate=sr)

class WaveformDataset(torch.utils.data.Dataset):
    def __init__(self,
                 df: pd.DataFrame,
                 mode='train'):
        self.df = df
        self.mode = mode

        if mode == 'train':
            self.wave_transforms = Compose(
                [
                    OneOf(
                        [
                            NoiseInjection(p=1, max_noise_level=0.04),
                            GaussianNoise(p=1, min_snr=5, max_snr=20),
                            PinkNoise(p=1, min_snr=5, max_snr=20),
                            BrownNoise(p=1, min_snr=5, max_snr=20),     #Added in V2
                        ],
                        p=0.2,
                    ),
                    AddBackround_1(p=0.4, min_snr=5, max_snr=20),
                    TimeShift(p=0.2, max_shift_second=0.5),  #I'm worried it's too much given they are only 5 second clips
                    RandomVolume(p=0.2, limit=4),
                    Normalize(p=1),
                ]
            )
        else:
            self.wave_transforms = Compose(
                [
                    Normalize(p=1),
                ]
            )

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx: int):
        SR = 32000
        sample = self.df.loc[idx, :]
        
        wav_path = sample["file_path"]
        labels = sample["new_target"]

        y = np.load(wav_path)

        # SEC = int(len(y)/2/SR)
        # if SEC > 0:
        #     start = np.random.randint(SEC)
        #     end = start+AudioParams.duration
        if len(y) > 0:
            y = y[:AudioParams.duration*SR]

            if self.wave_transforms:
                y = self.wave_transforms(y, sr=SR)

        y = np.concatenate([y, y, y])[:AudioParams.duration * AudioParams.sr] 
        y = crop_or_pad(y, AudioParams.duration * AudioParams.sr, sr=AudioParams.sr, train=True, probs=None)
        image = compute_melspec(y, AudioParams)
        image = mono_to_color(image)
        image = image.astype(np.uint8)
        
        # image = np.load(wav_path) # (224, 313, 3)
        image = albu_transforms[self.mode](image=image)['image']
        image = image.T
        
        targets = np.zeros(len(CFG.target_columns), dtype=float)
        for ebird_code in labels.split():
            targets[CFG.target_columns.index(ebird_code)] = 1.0

        return {
            "image": image,
            "targets": targets,
        }
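
The `(224, 313, 3)` shape mentioned in the dataset comment can be checked by hand. `compute_melspec` does not pass `n_fft` or `hop_length`, so the sketch below assumes librosa's defaults apply (a hop of 512 samples with centred frames). `n_frames` is a hypothetical helper for the arithmetic only.

```python
def n_frames(n_samples, hop_length=512):
    """Frame count for a centred STFT: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


sample_rate = 32000  # CFG.sample_rate
period = 5           # CFG.period, in seconds
n_samples = sample_rate * period
frames = n_frames(n_samples)
```

With `n_mels = 224` this gives a 224 x 313 spectrogram per clip, which `mono_to_color` then stacks into three identical channels.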



In [14]:
def init_layer(layer):
    nn.init.xavier_uniform_(layer.weight)

    if hasattr(layer, "bias"):
        if layer.bias is not None:
            layer.bias.data.fill_(0.)


def init_bn(bn):
    bn.bias.data.fill_(0.)
    bn.weight.data.fill_(1.0)


def init_weights(model):
    classname = model.__class__.__name__
    if classname.find("Conv2d") != -1:
        nn.init.xavier_uniform_(model.weight, gain=np.sqrt(2))
        model.bias.data.fill_(0)
    elif classname.find("BatchNorm") != -1:
        model.weight.data.normal_(1.0, 0.02)
        model.bias.data.fill_(0)
    elif classname.find("GRU") != -1:
        for weight in model.parameters():
            if len(weight.size()) > 1:
                nn.init.orthogonal_(weight.data)
    elif classname.find("Linear") != -1:
        model.weight.data.normal_(0, 0.01)
        model.bias.data.zero_()


def interpolate(x: torch.Tensor, ratio: int):
    """Interpolate data in time domain. This is used to compensate the
    resolution reduction in downsampling of a CNN.
    Args:
      x: (batch_size, time_steps, classes_num)
      ratio: int, ratio to interpolate
    Returns:
      upsampled: (batch_size, time_steps * ratio, classes_num)
    """
    (batch_size, time_steps, classes_num) = x.shape
    upsampled = x[:, :, None, :].repeat(1, 1, ratio, 1)
    upsampled = upsampled.reshape(batch_size, time_steps * ratio, classes_num)
    return upsampled


def pad_framewise_output(framewise_output: torch.Tensor, frames_num: int):
    """Pad framewise_output to the same length as input frames. The pad value
    is the same as the value of the last frame.
    Args:
      framewise_output: (batch_size, frames_num, classes_num)
      frames_num: int, number of frames to pad
    Outputs:
      output: (batch_size, frames_num, classes_num)
    """
    output = F.interpolate(
        framewise_output.unsqueeze(1),
        size=(frames_num, framewise_output.size(2)),
        align_corners=True,
        mode="bilinear").squeeze(1)

    return output


class AttBlockV2(nn.Module):
    def __init__(self,
                 in_features: int,
                 out_features: int,
                 activation="linear"):
        super().__init__()

        self.activation = activation
        self.att = nn.Conv1d(
            in_channels=in_features,
            out_channels=out_features,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=True)
        self.cla = nn.Conv1d(
            in_channels=in_features,
            out_channels=out_features,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=True)

        self.init_weights()

    def init_weights(self):
        init_layer(self.att)
        init_layer(self.cla)

    def forward(self, x):
        # x: (n_samples, n_in, n_time)
        norm_att = torch.softmax(torch.tanh(self.att(x)), dim=-1)
        cla = self.nonlinear_transform(self.cla(x))
        x = torch.sum(norm_att * cla, dim=2)
        return x, norm_att, cla

    def nonlinear_transform(self, x):
        if self.activation == 'linear':
            return x
        elif self.activation == 'sigmoid':
            return torch.sigmoid(x)


class TimmSED(nn.Module):
    def __init__(self, base_model_name: str, pretrained=False, num_classes=24, in_channels=1):
        super().__init__()

        self.spec_augmenter = SpecAugmentation(time_drop_width=64//2, time_stripes_num=2,
                                               freq_drop_width=8//2, freq_stripes_num=2)

        self.bn0 = nn.BatchNorm2d(CFG.n_mels)

        base_model = timm.create_model(
            base_model_name, pretrained=pretrained, in_chans=in_channels)
        layers = list(base_model.children())[:-2]
        self.encoder = nn.Sequential(*layers)

        if hasattr(base_model, "fc"):
            in_features = base_model.fc.in_features
        else:
            in_features = base_model.classifier.in_features

        self.fc1 = nn.Linear(in_features, in_features, bias=True)
        self.att_block = AttBlockV2(
            in_features, num_classes, activation="sigmoid")

        self.init_weight()

    def init_weight(self):
        init_bn(self.bn0)
        init_layer(self.fc1)
        

    def forward(self, input_data):
        x = input_data # (batch_size, 3, time_steps, mel_bins)

        frames_num = x.shape[2]

        x = x.transpose(1, 3)
        x = self.bn0(x)
        x = x.transpose(1, 3)

        if self.training:
            if random.random() < 0.25:
                x = self.spec_augmenter(x)

        x = x.transpose(2, 3)

        x = self.encoder(x)
        
        # Aggregate in frequency axis
        x = torch.mean(x, dim=3)

        x1 = F.max_pool1d(x, kernel_size=3, stride=1, padding=1)
        x2 = F.avg_pool1d(x, kernel_size=3, stride=1, padding=1)
        x = x1 + x2

        x = F.dropout(x, p=0.5, training=self.training)
        x = x.transpose(1, 2)
        x = F.relu_(self.fc1(x))
        x = x.transpose(1, 2)
        x = F.dropout(x, p=0.5, training=self.training)

        (clipwise_output, norm_att, segmentwise_output) = self.att_block(x)
        # Compute the pre-activation class scores once and reuse them
        cla_logit = self.att_block.cla(x)
        logit = torch.sum(norm_att * cla_logit, dim=2)
        segmentwise_logit = cla_logit.transpose(1, 2)
        segmentwise_output = segmentwise_output.transpose(1, 2)

        interpolate_ratio = frames_num // segmentwise_output.size(1)

        # Get framewise output
        framewise_output = interpolate(segmentwise_output,
                                       interpolate_ratio)
        framewise_output = pad_framewise_output(framewise_output, frames_num)

        framewise_logit = interpolate(segmentwise_logit, interpolate_ratio)
        framewise_logit = pad_framewise_output(framewise_logit, frames_num)

        output_dict = {
            'framewise_output': framewise_output,
            'clipwise_output': clipwise_output,
            'logit': logit,
            'framewise_logit': framewise_logit,
        }

        return output_dict
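
The attention pooling in `AttBlockV2.forward` can be hard to picture from the tensor code alone. Below is a minimal pure-Python sketch of the same arithmetic for a single class over a handful of time frames: softmax over time of the tanh-squashed attention scores, then a weighted sum of the sigmoid class scores. The function name `attention_pool` and the input values are made up for illustration; the real block works on batched tensors produced by its `att` and `cla` layers.

```python
import math


def attention_pool(att_scores, cla_scores):
    """Pool per-frame scores for one class into a single clip-level value.

    att_scores, cla_scores: raw (pre-activation) values, one per time frame.
    """
    # tanh bounds the attention scores to (-1, 1) before the softmax over time
    squashed = [math.tanh(a) for a in att_scores]
    exps = [math.exp(s) for s in squashed]
    total = sum(exps)
    norm_att = [e / total for e in exps]  # weights sum to 1 across frames
    # sigmoid turns the class scores into per-frame probabilities
    cla = [1.0 / (1.0 + math.exp(-c)) for c in cla_scores]
    # clip-level output is the attention-weighted average of frame probabilities
    clipwise = sum(w * p for w, p in zip(norm_att, cla))
    return clipwise, norm_att, cla


# Hypothetical scores for three frames
clipwise, norm_att, cla = attention_pool([0.0, 2.0, -1.0], [-3.0, 4.0, 0.0])
print(clipwise, norm_att)
```

Because the weights are non-negative and sum to 1, the clip-level value always lies between the smallest and largest frame probability. The tanh also caps how peaked the attention can get: no frame can receive more than e² (about 7.4) times the weight of another.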

In [15]:
def rand_bbox(size, lam):
    W = size[2]
    H = size[3]
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)
    cut_h = int(H * cut_rat)

    # uniform
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2


def cutmix(data, targets, alpha):
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    lam = np.random.beta(alpha, alpha)
    bbx1, bby1, bbx2, bby2 = rand_bbox(data.size(), lam)
    # Paste the box from the shuffled batch; note this modifies `data` in place
    data[:, :, bbx1:bbx2, bby1:bby2] = shuffled_data[:, :, bbx1:bbx2, bby1:bby2]
    # adjust lambda to exactly match pixel ratio
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (data.size()[-1] * data.size()[-2]))

    new_targets = [targets, shuffled_targets, lam]
    return data, new_targets

def mixup(data, targets, alpha):
    indices = torch.randperm(data.size(0))
    shuffled_data = data[indices]
    shuffled_targets = targets[indices]

    lam = np.random.beta(alpha, alpha)
    new_data = data * lam + shuffled_data * (1 - lam)
    new_targets = [targets, shuffled_targets, lam]
    return new_data, new_targets


def cutmix_criterion(preds, new_targets):
    targets1, targets2, lam = new_targets[0], new_targets[1], new_targets[2]
    criterion = BCEFocal2WayLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)

def mixup_criterion(preds, new_targets):
    targets1, targets2, lam = new_targets[0], new_targets[1], new_targets[2]
    criterion = BCEFocal2WayLoss()
    return lam * criterion(preds, targets1) + (1 - lam) * criterion(preds, targets2)


def loss_fn(logits, targets):
    loss_fct = BCEFocal2WayLoss()
    loss = loss_fct(logits, targets)
    return loss
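
The reason `cutmix` recomputes `lam` after drawing the box is that `rand_bbox` clips the box to the image, so the pasted area can be smaller than the `1 - lam` that was sampled. The sketch below repeats the same box arithmetic in plain Python, with the box centre passed in explicitly so the result is deterministic. The helper names `clipped_box` and `adjusted_lam` and the sizes used are illustrative only and are not part of the notebook.

```python
import math


def clipped_box(width, height, lam, cx, cy):
    """Same box arithmetic as rand_bbox, but with the centre given explicitly."""
    cut_rat = math.sqrt(1.0 - lam)
    cut_w = int(width * cut_rat)
    cut_h = int(height * cut_rat)
    # Clip each edge to the image, as np.clip does in rand_bbox
    x1 = min(max(cx - cut_w // 2, 0), width)
    y1 = min(max(cy - cut_h // 2, 0), height)
    x2 = min(max(cx + cut_w // 2, 0), width)
    y2 = min(max(cy + cut_h // 2, 0), height)
    return x1, y1, x2, y2


def adjusted_lam(width, height, box):
    """Fraction of the image that still comes from the original sample."""
    x1, y1, x2, y2 = box
    return 1.0 - (x2 - x1) * (y2 - y1) / (width * height)


W, H, lam = 100, 64, 0.75
centre_box = clipped_box(W, H, lam, cx=50, cy=32)  # box fits entirely
corner_box = clipped_box(W, H, lam, cx=0, cy=0)    # box is clipped on two sides
print(adjusted_lam(W, H, centre_box))  # 0.75
print(adjusted_lam(W, H, corner_box))  # 0.9375
```

With the box centred, the adjusted value equals the sampled `lam`. With the centre in a corner, three quarters of the box falls outside the image, so the loss weighting shifts towards the original targets.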

In [16]:
def train_fn(model, data_loader, device, optimizer, scheduler):
    model.train()
    scaler = GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))
    
    for data in tk0:
        optimizer.zero_grad()
        inputs = data['image'].to(device)
        targets = data['targets'].to(device)
        with autocast(enabled=CFG.apex):
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
        
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        scheduler.step()
        losses.update(loss.item(), inputs.size(0))
        scores.update(targets, outputs)
        tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg


def train_mixup_cutmix_fn(model, data_loader, device, optimizer, scheduler):
    model.train()
    scaler = GradScaler(enabled=CFG.apex)
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))

    for data in tk0:
        optimizer.zero_grad()
        inputs = data['image'].to(device)
        targets = data['targets'].to(device)

        if np.random.rand()<0.5:
            inputs, new_targets = mixup(inputs, targets, 0.4)
            with autocast(enabled=CFG.apex):
                outputs = model(inputs)
                loss = mixup_criterion(outputs, new_targets) 
        else:
            inputs, new_targets = cutmix(inputs, targets, 0.4)
            with autocast(enabled=CFG.apex):
                outputs = model(inputs)
                loss = cutmix_criterion(outputs, new_targets)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        scheduler.step()
        losses.update(loss.item(), inputs.size(0))
        scores.update(new_targets[0], outputs)
        tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg


def valid_fn(model, data_loader, device):
    model.eval()
    losses = AverageMeter()
    scores = MetricMeter()
    tk0 = tqdm(data_loader, total=len(data_loader))
    valid_preds = []
    with torch.no_grad():
        for data in tk0:
            inputs = data['image'].to(device)
            targets = data['targets'].to(device)
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            losses.update(loss.item(), inputs.size(0))
            scores.update(targets, outputs)
            tk0.set_postfix(loss=losses.avg)
    return scores.avg, losses.avg

In [17]:
def inference_fn(model, data_loader, device):
    model.eval()
    tk0 = tqdm(data_loader, total=len(data_loader))
    final_output = []
    final_target = []
    with torch.no_grad():
        for b_idx, data in enumerate(tk0):
            inputs = data['image'].to(device)
            targets = data['targets'].cpu().numpy().tolist()
            output = model(inputs)
            output = output["clipwise_output"].detach().cpu().numpy().tolist()
            final_output.extend(output)
            final_target.extend(targets)
    return final_output, final_target


def calc_cv(model_paths):
    df = pd.read_csv('train_folds.csv')
    y_true = []
    y_pred = []
    for fold, model_path in enumerate(model_paths):
        model = TimmSED(
            base_model_name=CFG.base_model_name,
            pretrained=CFG.pretrained,
            num_classes=CFG.num_classes,
            in_channels=CFG.in_channels)

        model.to(device)
        model.load_state_dict(torch.load(model_path, map_location=device))
        model.eval()

        val_df = df[df.kfold == fold].reset_index(drop=True)
        dataset = WaveformDataset(df=val_df, mode='valid')
        dataloader = torch.utils.data.DataLoader(
            dataset, batch_size=CFG.valid_bs, num_workers=0, pin_memory=True, shuffle=False
        )

        final_output, final_target = inference_fn(model, dataloader, device)
        y_pred.extend(final_output)
        y_true.extend(final_target)
        torch.cuda.empty_cache()

        # y_true / y_pred accumulate across folds, so this is a running score, not a per-fold one
        f1_03 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.3, average="micro")
        print(f'micro f1_0.3 {f1_03}')

    f1_03 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.3, average="micro")
    f1_05 = metrics.f1_score(np.array(y_true), np.array(y_pred) > 0.5, average="micro")

    print(f'overall micro f1_0.3 {f1_03}')
    print(f'overall micro f1_0.5 {f1_05}')
    return
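
`calc_cv` only reports micro F1 at thresholds 0.3 and 0.5, while the strategy notes at the top suggest the useful inference threshold may be much lower. A small sweep makes the effect of the threshold visible. The sketch below is a pure-Python stand-in for `metrics.f1_score(..., average="micro")` on multi-label data: it pools true positives, false positives and false negatives over every (sample, class) pair. The function name `micro_f1` and the toy labels and probabilities are invented for illustration and are not results from this notebook.

```python
def micro_f1(y_true, y_prob, threshold):
    """Micro-averaged F1: counts are pooled over all samples and classes."""
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = p > threshold  # same strict comparison as in calc_cv
            if pred and t:
                tp += 1
            elif pred and not t:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: 3 clips, 3 classes, one true label per clip
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_prob = [[0.40, 0.05, 0.02], [0.10, 0.20, 0.03], [0.01, 0.02, 0.60]]

for threshold in (0.05, 0.1, 0.3, 0.5):
    print(threshold, round(micro_f1(y_true, y_prob, threshold), 4))
```

In this toy case a confident-but-low prediction (0.20 for the second clip) is missed at 0.3 and 0.5 but recovered at lower thresholds. In practice the same loop could be run over the `y_true` and `y_pred` lists collected in `calc_cv`, assuming they are kept or returned.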

In [18]:
# main loop
for fold in [4,3,2,1,0]:   #range(5):
    if fold not in CFG.folds:
        continue
    print("=" * 100)
    print(f"Fold {fold} Training")
    print("=" * 100)

    trn_df = train[train.kfold != fold].reset_index(drop=True)
    val_df = train[train.kfold == fold].reset_index(drop=True)

    train_dataset = WaveformDataset(df=trn_df, mode='train')
    train_dataloader = torch.utils.data.DataLoader(
        train_dataset, batch_size=CFG.train_bs, num_workers=0, pin_memory=True, shuffle=True
    )
    
    valid_dataset = WaveformDataset(df=val_df, mode='valid')
    valid_dataloader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=CFG.valid_bs, num_workers=0, pin_memory=True, shuffle=False
    )

    model = TimmSED(
        base_model_name=CFG.base_model_name,
        pretrained=CFG.pretrained,
        num_classes=CFG.num_classes,
        in_channels=CFG.in_channels)

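    # Note: transformers.AdamW is deprecated in later transformers releases,
    # which point to torch.optim.AdamW as the replacement.
    optimizer = transformers.AdamW(model.parameters(), lr=CFG.LR, weight_decay=CFG.WEIGHT_DECAY)
    # The scheduler is stepped once per batch (see train_fn), so T_max is counted in batches, not epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, eta_min=CFG.ETA_MIN, T_max=500)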

    model = model.to(device)

    min_loss = 999
    best_score = -np.inf

    for epoch in range(CFG.epochs):
        print("Starting {} epoch...".format(epoch+1))

        start_time = time.time()

        if epoch < CFG.cutmix_and_mixup_epochs:
            train_avg, train_loss = train_mixup_cutmix_fn(model, train_dataloader, device, optimizer, scheduler)
        else: 
            train_avg, train_loss = train_fn(model, train_dataloader, device, optimizer, scheduler)

        valid_avg, valid_loss = valid_fn(model, valid_dataloader, device)

        elapsed = time.time() - start_time

        print(f'Epoch {epoch+1} - avg_train_loss: {train_loss:.5f}  avg_val_loss: {valid_loss:.5f}  time: {elapsed:.0f}s')
        print(f"Epoch {epoch+1} - train_f1_at_03:{train_avg['f1_at_03']:0.5f}  valid_f1_at_03:{valid_avg['f1_at_03']:0.5f}")
        print(f"Epoch {epoch+1} - train_f1_at_05:{train_avg['f1_at_05']:0.5f}  valid_f1_at_05:{valid_avg['f1_at_05']:0.5f}")

        if valid_avg['f1_at_03'] > best_score:
            print(f">>>>>>>> Model Improved From {best_score} ----> {valid_avg['f1_at_03']}")
            print(f"other scores here... {valid_avg['f1_at_03']}, {valid_avg['f1_at_05']}")
            torch.save(model.state_dict(), f'fold-{fold}.bin')
            best_score = valid_avg['f1_at_03']

Fold 4 Training




Starting 1 epoch...


100%|██████████| 743/743 [17:22<00:00,  1.40s/it, loss=0.0143]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00791]


Epoch 1 - avg_train_loss: 0.01428  avg_val_loss: 0.00791  time: 1113s
Epoch 1 - train_f1_at_03:0.00458  valid_f1_at_03:0.00180
Epoch 1 - train_f1_at_05:0.00175  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0017974835230677051
other scores here... 0.0017974835230677051, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00843]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00678]


Epoch 2 - avg_train_loss: 0.00843  avg_val_loss: 0.00678  time: 1082s
Epoch 2 - train_f1_at_03:0.01334  valid_f1_at_03:0.14465
Epoch 2 - train_f1_at_05:0.00060  valid_f1_at_05:0.01902
>>>>>>>> Model Improved From 0.0017974835230677051 ----> 0.14464905257539368
other scores here... 0.14464905257539368, 0.01902497027348395
Starting 3 epoch...


100%|██████████| 743/743 [16:57<00:00,  1.37s/it, loss=0.00778]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00605]


Epoch 3 - avg_train_loss: 0.00778  avg_val_loss: 0.00605  time: 1087s
Epoch 3 - train_f1_at_03:0.04238  valid_f1_at_03:0.30005
Epoch 3 - train_f1_at_05:0.00210  valid_f1_at_05:0.10398
>>>>>>>> Model Improved From 0.14464905257539368 ----> 0.30004688232536336
other scores here... 0.30004688232536336, 0.10397727272727274
Starting 4 epoch...


100%|██████████| 743/743 [16:56<00:00,  1.37s/it, loss=0.00706]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00593]


Epoch 4 - avg_train_loss: 0.00706  avg_val_loss: 0.00593  time: 1087s
Epoch 4 - train_f1_at_03:0.11600  valid_f1_at_03:0.29607
Epoch 4 - train_f1_at_05:0.01235  valid_f1_at_05:0.13333
Starting 5 epoch...


100%|██████████| 743/743 [16:58<00:00,  1.37s/it, loss=0.00677]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00511]


Epoch 5 - avg_train_loss: 0.00677  avg_val_loss: 0.00511  time: 1088s
Epoch 5 - train_f1_at_03:0.15932  valid_f1_at_03:0.46175
Epoch 5 - train_f1_at_05:0.02116  valid_f1_at_05:0.25950
>>>>>>>> Model Improved From 0.30004688232536336 ----> 0.4617481020050613
other scores here... 0.4617481020050613, 0.2594985784440424
Starting 6 epoch...


100%|██████████| 743/743 [16:55<00:00,  1.37s/it, loss=0.00666]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00464]


Epoch 6 - avg_train_loss: 0.00666  avg_val_loss: 0.00464  time: 1085s
Epoch 6 - train_f1_at_03:0.18266  valid_f1_at_03:0.53151
Epoch 6 - train_f1_at_05:0.02510  valid_f1_at_05:0.29301
>>>>>>>> Model Improved From 0.4617481020050613 ----> 0.5315142198308993
other scores here... 0.5315142198308993, 0.2930066360387953
Starting 7 epoch...


100%|██████████| 743/743 [16:57<00:00,  1.37s/it, loss=0.00637]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00488]


Epoch 7 - avg_train_loss: 0.00637  avg_val_loss: 0.00488  time: 1087s
Epoch 7 - train_f1_at_03:0.20472  valid_f1_at_03:0.47912
Epoch 7 - train_f1_at_05:0.03121  valid_f1_at_05:0.25404
Starting 8 epoch...


100%|██████████| 743/743 [16:57<00:00,  1.37s/it, loss=0.00603]
100%|██████████| 93/93 [01:09<00:00,  1.33it/s, loss=0.0049] 


Epoch 8 - avg_train_loss: 0.00603  avg_val_loss: 0.00490  time: 1088s
Epoch 8 - train_f1_at_03:0.24735  valid_f1_at_03:0.48628
Epoch 8 - train_f1_at_05:0.04766  valid_f1_at_05:0.28564
Starting 9 epoch...


100%|██████████| 743/743 [16:57<00:00,  1.37s/it, loss=0.00602]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00413]


Epoch 9 - avg_train_loss: 0.00602  avg_val_loss: 0.00413  time: 1087s
Epoch 9 - train_f1_at_03:0.25886  valid_f1_at_03:0.59315
Epoch 9 - train_f1_at_05:0.04416  valid_f1_at_05:0.42536
>>>>>>>> Model Improved From 0.5315142198308993 ----> 0.5931531531531532
other scores here... 0.5931531531531532, 0.4253626579316799
Starting 10 epoch...


100%|██████████| 743/743 [17:00<00:00,  1.37s/it, loss=0.00595]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00393]


Epoch 10 - avg_train_loss: 0.00595  avg_val_loss: 0.00393  time: 1090s
Epoch 10 - train_f1_at_03:0.28204  valid_f1_at_03:0.61306
Epoch 10 - train_f1_at_05:0.05669  valid_f1_at_05:0.39886
>>>>>>>> Model Improved From 0.5931531531531532 ----> 0.6130634774609015
other scores here... 0.6130634774609015, 0.3988563259471051
Starting 11 epoch...


100%|██████████| 743/743 [16:55<00:00,  1.37s/it, loss=0.00567]
100%|██████████| 93/93 [01:09<00:00,  1.33it/s, loss=0.00428]


Epoch 11 - avg_train_loss: 0.00567  avg_val_loss: 0.00428  time: 1086s
Epoch 11 - train_f1_at_03:0.29072  valid_f1_at_03:0.57105
Epoch 11 - train_f1_at_05:0.06669  valid_f1_at_05:0.33845
Starting 12 epoch...


100%|██████████| 743/743 [16:56<00:00,  1.37s/it, loss=0.00545]
100%|██████████| 93/93 [01:10<00:00,  1.32it/s, loss=0.00459]


Epoch 12 - avg_train_loss: 0.00545  avg_val_loss: 0.00459  time: 1088s
Epoch 12 - train_f1_at_03:0.31986  valid_f1_at_03:0.55572
Epoch 12 - train_f1_at_05:0.07800  valid_f1_at_05:0.22575
Starting 13 epoch...


100%|██████████| 743/743 [16:55<00:00,  1.37s/it, loss=0.00557]
100%|██████████| 93/93 [01:10<00:00,  1.32it/s, loss=0.00365]


Epoch 13 - avg_train_loss: 0.00557  avg_val_loss: 0.00365  time: 1087s
Epoch 13 - train_f1_at_03:0.32351  valid_f1_at_03:0.65150
Epoch 13 - train_f1_at_05:0.07046  valid_f1_at_05:0.47080
>>>>>>>> Model Improved From 0.6130634774609015 ----> 0.6514965867320146
other scores here... 0.6514965867320146, 0.47080209043399224
Starting 14 epoch...


100%|██████████| 743/743 [16:55<00:00,  1.37s/it, loss=0.00549]
100%|██████████| 93/93 [01:10<00:00,  1.32it/s, loss=0.00357]


Epoch 14 - avg_train_loss: 0.00549  avg_val_loss: 0.00357  time: 1086s
Epoch 14 - train_f1_at_03:0.32661  valid_f1_at_03:0.66073
Epoch 14 - train_f1_at_05:0.08434  valid_f1_at_05:0.47725
>>>>>>>> Model Improved From 0.6514965867320146 ----> 0.6607301869991095
other scores here... 0.6607301869991095, 0.47724700022639793
Starting 15 epoch...


100%|██████████| 743/743 [16:56<00:00,  1.37s/it, loss=0.00536]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00396]


Epoch 15 - avg_train_loss: 0.00536  avg_val_loss: 0.00396  time: 1085s
Epoch 15 - train_f1_at_03:0.34408  valid_f1_at_03:0.62045
Epoch 15 - train_f1_at_05:0.09408  valid_f1_at_05:0.34353
Starting 16 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00513]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00384]


Epoch 16 - avg_train_loss: 0.00513  avg_val_loss: 0.00384  time: 1083s
Epoch 16 - train_f1_at_03:0.35139  valid_f1_at_03:0.63636
Epoch 16 - train_f1_at_05:0.09992  valid_f1_at_05:0.46612
Starting 17 epoch...


100%|██████████| 743/743 [16:54<00:00,  1.37s/it, loss=0.00516]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00337]


Epoch 17 - avg_train_loss: 0.00516  avg_val_loss: 0.00337  time: 1085s
Epoch 17 - train_f1_at_03:0.36854  valid_f1_at_03:0.68414
Epoch 17 - train_f1_at_05:0.12217  valid_f1_at_05:0.51979
>>>>>>>> Model Improved From 0.6607301869991095 ----> 0.6841379310344827
other scores here... 0.6841379310344827, 0.5197889182058048
Starting 18 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00514]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00341]


Epoch 18 - avg_train_loss: 0.00514  avg_val_loss: 0.00341  time: 1084s
Epoch 18 - train_f1_at_03:0.36940  valid_f1_at_03:0.68429
Epoch 18 - train_f1_at_05:0.11695  valid_f1_at_05:0.47574
>>>>>>>> Model Improved From 0.6841379310344827 ----> 0.6842934687445281
other scores here... 0.6842934687445281, 0.4757369614512472
Starting 19 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00345]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00432]


Epoch 19 - avg_train_loss: 0.00345  avg_val_loss: 0.00432  time: 1083s
Epoch 19 - train_f1_at_03:0.67248  valid_f1_at_03:0.58637
Epoch 19 - train_f1_at_05:0.46655  valid_f1_at_05:0.46858
Starting 20 epoch...


100%|██████████| 743/743 [16:54<00:00,  1.37s/it, loss=0.00327]
100%|██████████| 93/93 [01:10<00:00,  1.33it/s, loss=0.00364]


Epoch 20 - avg_train_loss: 0.00327  avg_val_loss: 0.00364  time: 1086s
Epoch 20 - train_f1_at_03:0.69298  valid_f1_at_03:0.65507
Epoch 20 - train_f1_at_05:0.53508  valid_f1_at_05:0.55871
Starting 21 epoch...


100%|██████████| 743/743 [16:51<00:00,  1.36s/it, loss=0.00334]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00349]


Epoch 21 - avg_train_loss: 0.00334  avg_val_loss: 0.00349  time: 1081s
Epoch 21 - train_f1_at_03:0.68805  valid_f1_at_03:0.70217
Epoch 21 - train_f1_at_05:0.53341  valid_f1_at_05:0.65182
>>>>>>>> Model Improved From 0.6842934687445281 ----> 0.7021663634860261
other scores here... 0.7021663634860261, 0.6518171160609613
Starting 22 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00328]
100%|██████████| 93/93 [01:10<00:00,  1.33it/s, loss=0.00347]


Epoch 22 - avg_train_loss: 0.00328  avg_val_loss: 0.00347  time: 1084s
Epoch 22 - train_f1_at_03:0.70009  valid_f1_at_03:0.69664
Epoch 22 - train_f1_at_05:0.54925  valid_f1_at_05:0.63672
Starting 23 epoch...


100%|██████████| 743/743 [16:49<00:00,  1.36s/it, loss=0.00287]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00417]


Epoch 23 - avg_train_loss: 0.00287  avg_val_loss: 0.00417  time: 1080s
Epoch 23 - train_f1_at_03:0.73885  valid_f1_at_03:0.64191
Epoch 23 - train_f1_at_05:0.60431  valid_f1_at_05:0.57025
Starting 24 epoch...


100%|██████████| 743/743 [16:50<00:00,  1.36s/it, loss=0.00281]
100%|██████████| 93/93 [01:08<00:00,  1.35it/s, loss=0.00378]


Epoch 24 - avg_train_loss: 0.00281  avg_val_loss: 0.00378  time: 1080s
Epoch 24 - train_f1_at_03:0.74774  valid_f1_at_03:0.68761
Epoch 24 - train_f1_at_05:0.61140  valid_f1_at_05:0.63629
Starting 25 epoch...


100%|██████████| 743/743 [16:53<00:00,  1.36s/it, loss=0.00297]
100%|██████████| 93/93 [01:08<00:00,  1.37it/s, loss=0.0033] 


Epoch 25 - avg_train_loss: 0.00297  avg_val_loss: 0.00330  time: 1082s
Epoch 25 - train_f1_at_03:0.73119  valid_f1_at_03:0.71551
Epoch 25 - train_f1_at_05:0.59156  valid_f1_at_05:0.67462
>>>>>>>> Model Improved From 0.7021663634860261 ----> 0.7155059132720106
other scores here... 0.7155059132720106, 0.6746153846153846
Starting 26 epoch...


100%|██████████| 743/743 [16:52<00:00,  1.36s/it, loss=0.00286]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00366]


Epoch 26 - avg_train_loss: 0.00286  avg_val_loss: 0.00366  time: 1081s
Epoch 26 - train_f1_at_03:0.74224  valid_f1_at_03:0.69562
Epoch 26 - train_f1_at_05:0.61343  valid_f1_at_05:0.66318
Starting 27 epoch...


100%|██████████| 743/743 [16:49<00:00,  1.36s/it, loss=0.0025] 
100%|██████████| 93/93 [01:07<00:00,  1.37it/s, loss=0.00416]


Epoch 27 - avg_train_loss: 0.00250  avg_val_loss: 0.00416  time: 1078s
Epoch 27 - train_f1_at_03:0.77664  valid_f1_at_03:0.65015
Epoch 27 - train_f1_at_05:0.66313  valid_f1_at_05:0.59610
Starting 28 epoch...


100%|██████████| 743/743 [16:49<00:00,  1.36s/it, loss=0.00247]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.00375]


Epoch 28 - avg_train_loss: 0.00247  avg_val_loss: 0.00375  time: 1079s
Epoch 28 - train_f1_at_03:0.78078  valid_f1_at_03:0.72039
Epoch 28 - train_f1_at_05:0.66687  valid_f1_at_05:0.69157
>>>>>>>> Model Improved From 0.7155059132720106 ----> 0.7203915171288743
other scores here... 0.7203915171288743, 0.6915676287492927
Starting 29 epoch...


100%|██████████| 743/743 [16:05<00:00,  1.30s/it, loss=0.00271]
100%|██████████| 93/93 [01:01<00:00,  1.52it/s, loss=0.00356]


Epoch 29 - avg_train_loss: 0.00271  avg_val_loss: 0.00356  time: 1028s
Epoch 29 - train_f1_at_03:0.75444  valid_f1_at_03:0.71580
Epoch 29 - train_f1_at_05:0.63611  valid_f1_at_05:0.68700
Starting 30 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.00249]
100%|██████████| 93/93 [01:01<00:00,  1.52it/s, loss=0.00385]


Epoch 30 - avg_train_loss: 0.00249  avg_val_loss: 0.00385  time: 977s
Epoch 30 - train_f1_at_03:0.78025  valid_f1_at_03:0.69706
Epoch 30 - train_f1_at_05:0.66766  valid_f1_at_05:0.66347
Starting 31 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.00227]
100%|██████████| 93/93 [01:00<00:00,  1.54it/s, loss=0.00403]


Epoch 31 - avg_train_loss: 0.00227  avg_val_loss: 0.00403  time: 975s
Epoch 31 - train_f1_at_03:0.80228  valid_f1_at_03:0.66839
Epoch 31 - train_f1_at_05:0.70459  valid_f1_at_05:0.62189
Starting 32 epoch...


100%|██████████| 743/743 [15:12<00:00,  1.23s/it, loss=0.0023] 
100%|██████████| 93/93 [01:00<00:00,  1.53it/s, loss=0.00355]


Epoch 32 - avg_train_loss: 0.00230  avg_val_loss: 0.00355  time: 974s
Epoch 32 - train_f1_at_03:0.80002  valid_f1_at_03:0.71092
Epoch 32 - train_f1_at_05:0.69893  valid_f1_at_05:0.68100
Fold 3 Training
Starting 1 epoch...


100%|██████████| 743/743 [15:15<00:00,  1.23s/it, loss=0.0145]
100%|██████████| 93/93 [01:01<00:00,  1.51it/s, loss=0.00793]


Epoch 1 - avg_train_loss: 0.01447  avg_val_loss: 0.00793  time: 978s
Epoch 1 - train_f1_at_03:0.00440  valid_f1_at_03:0.00000
Epoch 1 - train_f1_at_05:0.00134  valid_f1_at_05:0.00000
>>>>>>>> Model Improved From -inf ----> 0.0
other scores here... 0.0, 0.0
Starting 2 epoch...


100%|██████████| 743/743 [15:14<00:00,  1.23s/it, loss=0.00843]
100%|██████████| 93/93 [01:01<00:00,  1.51it/s, loss=0.00672]


Epoch 2 - avg_train_loss: 0.00843  avg_val_loss: 0.00672  time: 977s
Epoch 2 - train_f1_at_03:0.01216  valid_f1_at_03:0.17728
Epoch 2 - train_f1_at_05:0.00030  valid_f1_at_05:0.04007
>>>>>>>> Model Improved From 0.0 ----> 0.1772809981804003
other scores here... 0.1772809981804003, 0.04007071302298173
Starting 3 epoch...


100%|██████████| 743/743 [15:15<00:00,  1.23s/it, loss=0.00774]
100%|██████████| 93/93 [01:00<00:00,  1.53it/s, loss=0.0061] 


Epoch 3 - avg_train_loss: 0.00774  avg_val_loss: 0.00610  time: 977s
Epoch 3 - train_f1_at_03:0.04667  valid_f1_at_03:0.29352
Epoch 3 - train_f1_at_05:0.00210  valid_f1_at_05:0.10828
>>>>>>>> Model Improved From 0.1772809981804003 ----> 0.29352319706017455
other scores here... 0.29352319706017455, 0.1082766439909297
Starting 4 epoch...


100%|██████████| 743/743 [15:19<00:00,  1.24s/it, loss=0.00706]
100%|██████████| 93/93 [01:01<00:00,  1.51it/s, loss=0.00591]


Epoch 4 - avg_train_loss: 0.00706  avg_val_loss: 0.00591  time: 982s
Epoch 4 - train_f1_at_03:0.11266  valid_f1_at_03:0.34775
Epoch 4 - train_f1_at_05:0.01057  valid_f1_at_05:0.13959
>>>>>>>> Model Improved From 0.29352319706017455 ----> 0.34774774774774775
other scores here... 0.34774774774774775, 0.13958682300390843
Starting 5 epoch...


100%|██████████| 743/743 [15:18<00:00,  1.24s/it, loss=0.00679]
100%|██████████| 93/93 [01:01<00:00,  1.50it/s, loss=0.00502]


Epoch 5 - avg_train_loss: 0.00679  avg_val_loss: 0.00502  time: 981s
Epoch 5 - train_f1_at_03:0.15501  valid_f1_at_03:0.46235
Epoch 5 - train_f1_at_05:0.01483  valid_f1_at_05:0.25229
>>>>>>>> Model Improved From 0.34774774774774775 ----> 0.46235031459305875
other scores here... 0.46235031459305875, 0.2522899764459566
Starting 6 epoch...


100%|██████████| 743/743 [15:22<00:00,  1.24s/it, loss=0.00665]
100%|██████████| 93/93 [01:02<00:00,  1.49it/s, loss=0.00459]


Epoch 6 - avg_train_loss: 0.00665  avg_val_loss: 0.00459  time: 985s
Epoch 6 - train_f1_at_03:0.17986  valid_f1_at_03:0.53018
Epoch 6 - train_f1_at_05:0.02752  valid_f1_at_05:0.30508
>>>>>>>> Model Improved From 0.46235031459305875 ----> 0.530175706646295
other scores here... 0.530175706646295, 0.3050761421319797
Starting 7 epoch...


100%|██████████| 743/743 [17:14<00:00,  1.39s/it, loss=0.00637]
100%|██████████| 93/93 [01:13<00:00,  1.26it/s, loss=0.00483]


Epoch 7 - avg_train_loss: 0.00637  avg_val_loss: 0.00483  time: 1109s
Epoch 7 - train_f1_at_03:0.21463  valid_f1_at_03:0.50586
Epoch 7 - train_f1_at_05:0.03303  valid_f1_at_05:0.25894
Starting 8 epoch...


100%|██████████| 743/743 [17:46<00:00,  1.43s/it, loss=0.00606]
100%|██████████| 93/93 [01:10<00:00,  1.32it/s, loss=0.00473]


Epoch 8 - avg_train_loss: 0.00606  avg_val_loss: 0.00473  time: 1137s
Epoch 8 - train_f1_at_03:0.23858  valid_f1_at_03:0.51566
Epoch 8 - train_f1_at_05:0.04644  valid_f1_at_05:0.28058
Starting 9 epoch...


100%|██████████| 743/743 [17:44<00:00,  1.43s/it, loss=0.00606]
100%|██████████| 93/93 [01:14<00:00,  1.25it/s, loss=0.00407]


Epoch 9 - avg_train_loss: 0.00606  avg_val_loss: 0.00407  time: 1140s
Epoch 9 - train_f1_at_03:0.24388  valid_f1_at_03:0.58442
Epoch 9 - train_f1_at_05:0.04518  valid_f1_at_05:0.37460
>>>>>>>> Model Improved From 0.530175706646295 ----> 0.5844229675952246
other scores here... 0.5844229675952246, 0.3746047190464607
Starting 10 epoch...


100%|██████████| 743/743 [17:48<00:00,  1.44s/it, loss=0.006]  
100%|██████████| 93/93 [01:11<00:00,  1.31it/s, loss=0.00394]


Epoch 10 - avg_train_loss: 0.00600  avg_val_loss: 0.00394  time: 1141s
Epoch 10 - train_f1_at_03:0.26029  valid_f1_at_03:0.60662
Epoch 10 - train_f1_at_05:0.05535  valid_f1_at_05:0.38540
>>>>>>>> Model Improved From 0.5844229675952246 ----> 0.6066210467911965
other scores here... 0.6066210467911965, 0.3853965183752418
Starting 11 epoch...


100%|██████████| 743/743 [16:23<00:00,  1.32s/it, loss=0.00581]
100%|██████████| 93/93 [01:09<00:00,  1.34it/s, loss=0.00446]


Epoch 11 - avg_train_loss: 0.00581  avg_val_loss: 0.00446  time: 1053s
Epoch 11 - train_f1_at_03:0.28853  valid_f1_at_03:0.55182
Epoch 11 - train_f1_at_05:0.06311  valid_f1_at_05:0.25714
Starting 12 epoch...


100%|██████████| 743/743 [16:05<00:00,  1.30s/it, loss=0.00556]
100%|██████████| 93/93 [01:08<00:00,  1.35it/s, loss=0.00427]


Epoch 12 - avg_train_loss: 0.00556  avg_val_loss: 0.00427  time: 1035s
Epoch 12 - train_f1_at_03:0.32699  valid_f1_at_03:0.57127
Epoch 12 - train_f1_at_05:0.08093  valid_f1_at_05:0.30089
Starting 13 epoch...


100%|██████████| 743/743 [16:02<00:00,  1.30s/it, loss=0.0056] 
100%|██████████| 93/93 [01:08<00:00,  1.35it/s, loss=0.0037] 


Epoch 13 - avg_train_loss: 0.00560  avg_val_loss: 0.00370  time: 1032s
Epoch 13 - train_f1_at_03:0.31917  valid_f1_at_03:0.64332
Epoch 13 - train_f1_at_05:0.08044  valid_f1_at_05:0.44367
>>>>>>>> Model Improved From 0.6066210467911965 ----> 0.6433189655172414
other scores here... 0.6433189655172414, 0.4436685288640596
Starting 14 epoch...


100%|██████████| 743/743 [16:05<00:00,  1.30s/it, loss=0.00557]
100%|██████████| 93/93 [01:08<00:00,  1.36it/s, loss=0.0036] 


Epoch 14 - avg_train_loss: 0.00557  avg_val_loss: 0.00360  time: 1035s
Epoch 14 - train_f1_at_03:0.32439  valid_f1_at_03:0.65698
Epoch 14 - train_f1_at_05:0.07994  valid_f1_at_05:0.44434
>>>>>>>> Model Improved From 0.6433189655172414 ----> 0.6569808646350107
other scores here... 0.6569808646350107, 0.4443409408476945
Starting 15 epoch...


100%|██████████| 743/743 [17:18<00:00,  1.40s/it, loss=0.00539]
100%|██████████| 93/93 [01:17<00:00,  1.20it/s, loss=0.00408]


Epoch 15 - avg_train_loss: 0.00539  avg_val_loss: 0.00408  time: 1117s
Epoch 15 - train_f1_at_03:0.35093  valid_f1_at_03:0.60609
Epoch 15 - train_f1_at_05:0.09004  valid_f1_at_05:0.38556
Starting 16 epoch...


100%|██████████| 743/743 [18:12<00:00,  1.47s/it, loss=0.00515]
100%|██████████| 93/93 [01:17<00:00,  1.19it/s, loss=0.00376]


Epoch 16 - avg_train_loss: 0.00515  avg_val_loss: 0.00376  time: 1171s
Epoch 16 - train_f1_at_03:0.37196  valid_f1_at_03:0.63888
Epoch 16 - train_f1_at_05:0.11505  valid_f1_at_05:0.43825
Starting 17 epoch...


100%|██████████| 743/743 [18:14<00:00,  1.47s/it, loss=0.00524]
100%|██████████| 93/93 [01:20<00:00,  1.15it/s, loss=0.0034] 


Epoch 17 - avg_train_loss: 0.00524  avg_val_loss: 0.00340  time: 1176s
Epoch 17 - train_f1_at_03:0.35697  valid_f1_at_03:0.68480
Epoch 17 - train_f1_at_05:0.10713  valid_f1_at_05:0.48736
>>>>>>>> Model Improved From 0.6569808646350107 ----> 0.6848033269797262
other scores here... 0.6848033269797262, 0.4873646209386282
Starting 18 epoch...


100%|██████████| 743/743 [17:39<00:00,  1.43s/it, loss=0.00521]
100%|██████████| 93/93 [01:07<00:00,  1.37it/s, loss=0.00346]


Epoch 18 - avg_train_loss: 0.00521  avg_val_loss: 0.00346  time: 1128s
Epoch 18 - train_f1_at_03:0.36123  valid_f1_at_03:0.68039
Epoch 18 - train_f1_at_05:0.11120  valid_f1_at_05:0.45368
Starting 19 epoch...


100%|██████████| 743/743 [16:50<00:00,  1.36s/it, loss=0.00348]
100%|██████████| 93/93 [01:14<00:00,  1.25it/s, loss=0.00418]


Epoch 19 - avg_train_loss: 0.00348  avg_val_loss: 0.00418  time: 1086s
Epoch 19 - train_f1_at_03:0.66943  valid_f1_at_03:0.61195
Epoch 19 - train_f1_at_05:0.46373  valid_f1_at_05:0.53254
Starting 20 epoch...


100%|██████████| 743/743 [16:25<00:00,  1.33s/it, loss=0.00328]
100%|██████████| 93/93 [01:06<00:00,  1.40it/s, loss=0.00385]


Epoch 20 - avg_train_loss: 0.00328  avg_val_loss: 0.00385  time: 1052s
Epoch 20 - train_f1_at_03:0.69141  valid_f1_at_03:0.64950
Epoch 20 - train_f1_at_05:0.53349  valid_f1_at_05:0.56916
Starting 21 epoch...


100%|██████████| 743/743 [15:44<00:00,  1.27s/it, loss=0.00343]
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00342]


Epoch 21 - avg_train_loss: 0.00343  avg_val_loss: 0.00342  time: 1011s
Epoch 21 - train_f1_at_03:0.67658  valid_f1_at_03:0.70066
Epoch 21 - train_f1_at_05:0.51790  valid_f1_at_05:0.65307
>>>>>>>> Model Improved From 0.6848033269797262 ----> 0.7006622516556291
other scores here... 0.7006622516556291, 0.6530692292606394
Starting 22 epoch...


100%|██████████| 743/743 [15:47<00:00,  1.27s/it, loss=0.00331]
100%|██████████| 93/93 [01:05<00:00,  1.41it/s, loss=0.00352]


Epoch 22 - avg_train_loss: 0.00331  avg_val_loss: 0.00352  time: 1014s
Epoch 22 - train_f1_at_03:0.69173  valid_f1_at_03:0.69120
Epoch 22 - train_f1_at_05:0.54102  valid_f1_at_05:0.63200
Starting 23 epoch...


100%|██████████| 743/743 [15:45<00:00,  1.27s/it, loss=0.00297]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00429]


Epoch 23 - avg_train_loss: 0.00297  avg_val_loss: 0.00429  time: 1012s
Epoch 23 - train_f1_at_03:0.72761  valid_f1_at_03:0.65045
Epoch 23 - train_f1_at_05:0.58902  valid_f1_at_05:0.60307
Starting 24 epoch...


100%|██████████| 743/743 [15:45<00:00,  1.27s/it, loss=0.00285]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.00372]


Epoch 24 - avg_train_loss: 0.00285  avg_val_loss: 0.00372  time: 1012s
Epoch 24 - train_f1_at_03:0.73849  valid_f1_at_03:0.68645
Epoch 24 - train_f1_at_05:0.61157  valid_f1_at_05:0.65474
Starting 25 epoch...


100%|██████████| 743/743 [17:09<00:00,  1.39s/it, loss=0.00305]
100%|██████████| 93/93 [01:15<00:00,  1.24it/s, loss=0.00342]


Epoch 25 - avg_train_loss: 0.00305  avg_val_loss: 0.00342  time: 1105s
Epoch 25 - train_f1_at_03:0.71744  valid_f1_at_03:0.71291
Epoch 25 - train_f1_at_05:0.58369  valid_f1_at_05:0.67183
>>>>>>>> Model Improved From 0.7006622516556291 ----> 0.7129075182967398
other scores here... 0.7129075182967398, 0.6718296224588576
Starting 26 epoch...


100%|██████████| 743/743 [17:38<00:00,  1.42s/it, loss=0.00292]
100%|██████████| 93/93 [01:10<00:00,  1.31it/s, loss=0.00373]


Epoch 26 - avg_train_loss: 0.00292  avg_val_loss: 0.00373  time: 1130s
Epoch 26 - train_f1_at_03:0.73392  valid_f1_at_03:0.70540
Epoch 26 - train_f1_at_05:0.60385  valid_f1_at_05:0.66357
Starting 27 epoch...


100%|██████████| 743/743 [15:52<00:00,  1.28s/it, loss=0.00261]
100%|██████████| 93/93 [01:05<00:00,  1.41it/s, loss=0.00405]


Epoch 27 - avg_train_loss: 0.00261  avg_val_loss: 0.00405  time: 1019s
Epoch 27 - train_f1_at_03:0.76214  valid_f1_at_03:0.67263
Epoch 27 - train_f1_at_05:0.65092  valid_f1_at_05:0.63114
Starting 28 epoch...


100%|██████████| 743/743 [15:40<00:00,  1.27s/it, loss=0.00261]
100%|██████████| 93/93 [01:05<00:00,  1.43it/s, loss=0.00361]


Epoch 28 - avg_train_loss: 0.00261  avg_val_loss: 0.00361  time: 1006s
Epoch 28 - train_f1_at_03:0.76550  valid_f1_at_03:0.69917
Epoch 28 - train_f1_at_05:0.64478  valid_f1_at_05:0.66241
Starting 29 epoch...


100%|██████████| 743/743 [15:45<00:00,  1.27s/it, loss=0.00275]
100%|██████████| 93/93 [01:05<00:00,  1.42it/s, loss=0.0035] 


Epoch 29 - avg_train_loss: 0.00275  avg_val_loss: 0.00350  time: 1011s
Epoch 29 - train_f1_at_03:0.75082  valid_f1_at_03:0.71344
Epoch 29 - train_f1_at_05:0.62907  valid_f1_at_05:0.68809
>>>>>>>> Model Improved From 0.7129075182967398 ----> 0.7134367778144602
other scores here... 0.7134367778144602, 0.6880943324457969
Starting 30 epoch...


100%|██████████| 743/743 [15:42<00:00,  1.27s/it, loss=0.00255]
100%|██████████| 93/93 [01:06<00:00,  1.40it/s, loss=0.00378]


Epoch 30 - avg_train_loss: 0.00255  avg_val_loss: 0.00378  time: 1010s
Epoch 30 - train_f1_at_03:0.77334  valid_f1_at_03:0.70076
Epoch 30 - train_f1_at_05:0.66477  valid_f1_at_05:0.67342
Starting 31 epoch...


100%|██████████| 743/743 [17:44<00:00,  1.43s/it, loss=0.00226]
100%|██████████| 93/93 [01:16<00:00,  1.22it/s, loss=0.00431]


Epoch 31 - avg_train_loss: 0.00226  avg_val_loss: 0.00431  time: 1142s
Epoch 31 - train_f1_at_03:0.79939  valid_f1_at_03:0.67019
Epoch 31 - train_f1_at_05:0.69803  valid_f1_at_05:0.62830
Starting 32 epoch...


100%|██████████| 743/743 [17:41<00:00,  1.43s/it, loss=0.00234]
100%|██████████| 93/93 [01:11<00:00,  1.30it/s, loss=0.00377]


Epoch 32 - avg_train_loss: 0.00234  avg_val_loss: 0.00377  time: 1133s
Epoch 32 - train_f1_at_03:0.79309  valid_f1_at_03:0.71752
Epoch 32 - train_f1_at_05:0.68913  valid_f1_at_05:0.69224
>>>>>>>> Model Improved From 0.7134367778144602 ----> 0.7175169225689284
other scores here... 0.7175169225689284, 0.6922353825907125
Fold 2 Training




Starting 1 epoch...


  2%|▏         | 12/743 [00:17<17:21,  1.43s/it, loss=0.302]


KeyboardInterrupt: 

In [None]:
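# One checkpoint path per fold listed in CFG.folds
model_paths = [f'fold-{i}.bin' for i in CFG.folds]

# Score the saved fold checkpoints with the notebook's calc_cv helper
calc_cv(model_paths)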
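
The run above was interrupted with a `KeyboardInterrupt` early in fold 2, so a checkpoint for that fold may not exist on disk. The following is a minimal sketch, assuming checkpoints are written to the working directory under the `fold-{i}.bin` names used in this cell, that keeps only the paths that are actually present. `existing_checkpoints` is a hypothetical helper and not part of the original notebook.

```python
from pathlib import Path


def existing_checkpoints(folds, directory="."):
    """Return checkpoint paths (as strings) for folds whose file exists.

    Hypothetical helper: assumes the 'fold-{i}.bin' naming used in this
    notebook and that checkpoints live in `directory`.
    """
    paths = []
    for i in folds:
        path = Path(directory) / f"fold-{i}.bin"
        if path.is_file():
            paths.append(str(path))
        else:
            # Report the gap instead of failing later at load time
            print(f"Skipping fold {i}: {path} not found")
    return paths
```

In the notebook this could be used as `model_paths = existing_checkpoints(CFG.folds)` before calling `calc_cv(model_paths)`. A score computed this way covers only the folds that finished training, so it is not directly comparable with a full cross-validation score.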