Fine-tuning experiments to find the best model for detecting clones among eva.ru forum users, based on the user lizon. Different transformer models, numbers of epochs, and training approaches available in the ktrain library are tested.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import tensorflow as tf

In [3]:
######## GPU CONFIGS FOR RTX 2070 ###############
## Please ignore if not training on GPU       ##
## this is important for running CuDNN on GPU ##

tf.keras.backend.clear_session() #- for easy reset of notebook state

# check if GPU can be seen by TF
tf.config.list_physical_devices('GPU')
#tf.debugging.set_log_device_placement(True)
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only use the first GPU
  try:
    tf.config.experimental.set_memory_growth(gpus[0], True)
    tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)
###############################################

1 Physical GPUs, 1 Logical GPU


In [4]:
from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

if ram_gb < 20:
  print('Not using a high-RAM runtime')
else:
  print('You are using a high-RAM runtime!')

Your runtime has 54.8 gigabytes of available RAM

You are using a high-RAM runtime!


In [5]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Fri Nov  5 05:08:32 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.44       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    40W / 300W |    425MiB / 16160MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [6]:
import os
import pandas as pd
import numpy as np

In [7]:
#experiment duration
import time

In [8]:
#!pip install openpyxl

In [9]:
#Saving into log (Excel file)
import openpyxl 
def SaveToExperimentLog(Experiments_file, LogEntry, data):
    # Append to the existing workbook and replace only the experiment's sheet,
    # leaving the other tabs untouched (requires pandas >= 1.3 with openpyxl).
    # The sheet name is cut to 29 characters; Excel allows at most 31.
    with pd.ExcelWriter(Experiments_file, engine='openpyxl', mode='a',
                        if_sheet_exists='replace') as writer:
        data.to_excel(writer, sheet_name=LogEntry[0:29], index=False)

In [10]:
#!pip install pycm

In [11]:
#to get score metrics from the model and save in the experiment log
import pycm as cm
def model_metrics(np_confusion_matrix,class_names):
  #converting numpy array to a nested dictionary {actual class: {predicted class: count}}
  #counts are cast to built-in int so that pycm receives plain Python numbers
  d_confusion_matrix={}
  for i in range(len(class_names)):
    d_confusion_matrix[class_names[i]]=dict(zip(class_names, (int(v) for v in np_confusion_matrix[i])))
  model_cm=cm.ConfusionMatrix(matrix=d_confusion_matrix)
  return model_cm.weighted_average('F1'), model_cm.Kappa, model_cm.PPV, model_cm.TPR, model_cm.F1
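
The per-class scores and the kappa value that `model_metrics` reads from pycm can also be derived by hand, which helps when checking a logged result. The sketch below is plain Python and assumes the same layout as above (rows are actual classes, columns are predicted classes). The function name `binary_metrics` and the counts are made up for illustration; they are not results from this notebook.

```python
def binary_metrics(matrix, class_names, positive):
    """Precision, recall and F1 for `positive`, plus Cohen's kappa.

    matrix[i][j] counts items of actual class i predicted as class j.
    """
    idx = class_names.index(positive)
    n = len(matrix)
    total = sum(sum(row) for row in matrix)

    tp = matrix[idx][idx]
    actual_pos = sum(matrix[idx])                    # row sum
    predicted_pos = sum(row[idx] for row in matrix)  # column sum
    precision = tp / predicted_pos
    recall = tp / actual_pos
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return precision, recall, f1, kappa


# Hypothetical counts, chosen only for illustration.
example = [[90, 10],  # actual Other: 90 predicted Other, 10 predicted lizon
           [5, 15]]   # actual lizon:  5 predicted Other, 15 predicted lizon
precision, recall, f1, kappa = binary_metrics(example, ['Other', 'lizon'], 'lizon')
print(round(precision, 3), round(recall, 3), round(f1, 3), round(kappa, 3))
```

Because the `Other` class dominates, overall accuracy stays high even when `lizon` recall is poor, which is why the log tracks the `lizon` scores and kappa separately.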

In [12]:
########## Ensure reproducibility ##########


# 1. Set `PYTHONHASHSEED` environment variable at a fixed value
os.environ['PYTHONHASHSEED']=str(42)

#Does not work with ktrain
#os.environ['TF_DETERMINISTIC_OPS'] = '1'

# 2. Set `python` built-in pseudo-random generator at a fixed value
#random.seed(42)

# 3. Set `numpy` pseudo-random generator at a fixed value
np.random.seed(42)

# 4. Set `tensorflow` pseudo-random generator at a fixed value
tf.random.set_seed(42)

In [13]:
#!pip install ktrain

In [14]:
import ktrain
from ktrain import text

In [15]:
Data = '/content/drive/MyDrive/Colab Notebooks/Projects/eva/Data/'

Messages_filename='lizon_data_for_finetuning.csv'
Messages_full_filename=os.path.join(Data, Messages_filename)

train_Messages_filename='lizon_data_for_finetuning_train_t.csv'
train_Messages_full_filename=os.path.join(Data, train_Messages_filename)

valid_Messages_filename='lizon_data_for_finetuning_valid_t.csv'
valid_Messages_full_filename=os.path.join(Data, valid_Messages_filename)

test_Messages_filename='lizon_clon_data_for_testg.csv'
test_Messages_full_filename=os.path.join(Data, test_Messages_filename)

text_column='message'
target_column='target'

Models = '/content/drive/MyDrive/Colab Notebooks/Projects/eva/Models/'

#Experiment
#Experiments log file 
Experiments_file='/content/drive/MyDrive/Colab Notebooks/Projects/eva/ExperimentLogs/lizon.xlsx'
Experiment_name='applied_final'
#NewExecution=False: continue with the rows of the configuration tab (Experiment_name) that have no results yet
#NewExecution=True: start from scratch, ignoring previous results
NewExecution=False

## Experiment
The experiment is configured in an experiment log file (an Excel workbook; here, each experiment has its own tab).

1. Read the experiment configuration (Experiment_name) from the experiment log file (Experiments_file).

In [16]:
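Experiment = pd.read_excel(Experiments_file, sheet_name=Experiment_name)
#keep 'comment' as an object column so that text such as 'Failed' can be stored in it
Experiment['comment'] = Experiment['comment'].astype(object)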
Experiment.tail()

Unnamed: 0,Model,maxlen,batch_size,epochs,lr,method,weighted_avg_F1,kappa,lizon-precision,lizon-recall,lizon-f1-score,duration,comment,test_weighted_avg_F1,test_kappa,test_lizon-precision,test_lizon-recall,test_lizon-f1-score
0,DeepPavlov/rubert-base-cased-conversational,512,8,3,2e-05,fit_onecycle,,,,,,,,,,,,
1,blinoff/roberta-base-russian-v0,256,16,5,1e-05,fit_onecycle,,,,,,,,,,,,
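
The `NewExecution` setting decides which rows of this table are actually run: a row whose `duration` cell is still empty has not been processed yet. A minimal sketch of that selection rule in plain Python follows; the helper `pending_rows` and the two example rows are hypothetical and only mirror the idea.

```python
import math


def pending_rows(rows, new_execution):
    """Return the indices of configuration rows that still need to run.

    A row counts as processed when its 'duration' is a positive number.
    An empty Excel cell is read as NaN, and NaN > 0 is False.
    """
    if new_execution:
        return list(range(len(rows)))
    return [i for i, row in enumerate(rows) if not row['duration'] > 0]


# Hypothetical log: the first row already has a result, the second does not.
rows = [
    {'Model': 'model-a', 'duration': 12.5},
    {'Model': 'model-b', 'duration': math.nan},
]
print(pending_rows(rows, new_execution=False))  # [1]
print(pending_rows(rows, new_execution=True))   # [0, 1]
```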


## Data load and/or split

In [17]:
## split dataset
from sklearn import  model_selection
from pathlib import Path

test_file = Path(test_Messages_full_filename)
train_file = Path(train_Messages_full_filename)
valid_file = Path(valid_Messages_full_filename)

if (test_file.is_file() and train_file.is_file() and valid_file.is_file()):
  print('Train/Valid/Test files exist')
  #on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False
  df_test=pd.read_csv(test_Messages_full_filename, on_bad_lines='skip', index_col=False, usecols=[target_column, text_column])
  df_train=pd.read_csv(train_Messages_full_filename, on_bad_lines='skip', index_col=False, usecols=[target_column, text_column])
  df_valid=pd.read_csv(valid_Messages_full_filename, on_bad_lines='skip', index_col=False, usecols=[target_column, text_column])
else:
  print('Train/Valid/Test files do  NOT  exist. Splitting...')
  #on_bad_lines='skip' (pandas >= 1.3) replaces the deprecated error_bad_lines=False
  df=pd.read_csv(Messages_full_filename, on_bad_lines='skip', index_col=False, usecols=[target_column, text_column])
  #df.groupby(['target']).size().reset_index(name='counts').sort_values('counts', ascending=False)
  df_trainvalid, df_test = model_selection.train_test_split(df, test_size=0.3, random_state=42,shuffle=True)
  df_test.to_csv(test_Messages_full_filename, header=True, index=False)

  df_train, df_valid = model_selection.train_test_split(df_trainvalid, test_size=0.3, random_state=42,shuffle=True)
  df_train.to_csv(train_Messages_full_filename, header=True, index=False)
  df_valid.to_csv(valid_Messages_full_filename, header=True, index=False)

## get text
x_test = df_test[text_column].values.astype(str)
x_train = df_train[text_column].values.astype(str)
x_valid = df_valid[text_column].values.astype(str)

## get target
y_test = df_test[target_column].values.astype(str)
y_train = df_train[target_column].values.astype(str)
y_valid = df_valid[target_column].values.astype(str)

Train/Valid/Test files exist
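
When the split branch runs instead, `train_test_split` is applied twice with `test_size=0.3`, so the second 30% is taken from the remaining 70%, not from the full dataset. The arithmetic can be sketched as follows; `split_fractions` is a hypothetical helper, and actual row counts will differ slightly because rows are whole numbers.

```python
from fractions import Fraction


def split_fractions(test_size, valid_size):
    """Shares of the full dataset after a test split followed by a validation split."""
    test = Fraction(test_size)
    remaining = 1 - test
    valid = remaining * Fraction(valid_size)
    train = remaining - valid
    return train, valid, test


train, valid, test = split_fractions('0.3', '0.3')
print(float(train), float(valid), float(test))  # 0.49 0.21 0.3
```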


## Model training and evaluation

## Custom Loss Functions

In [18]:
#https://lars76.github.io/2018/09/27/loss-functions-for-segmentation.html

def focal_loss(alpha=0.25, gamma=2):
  def focal_loss_with_logits(logits, targets, alpha, gamma, y_pred):
    targets = tf.cast(targets, tf.float32)
    weight_a = alpha * (1 - y_pred) ** gamma * targets
    weight_b = (1 - alpha) * y_pred ** gamma * (1 - targets)
    
    return (tf.math.log1p(tf.exp(-tf.abs(logits))) + tf.nn.relu(-logits)) * (weight_a + weight_b) + logits * weight_b 

  def loss(y_true, logits):
    y_pred = tf.math.sigmoid(logits)
    loss = focal_loss_with_logits(logits=logits, targets=y_true, alpha=alpha, gamma=gamma, y_pred=y_pred)

    return tf.reduce_mean(loss)

  return loss
#-------------------------------------------------------------------------------  
def dice_loss(smooth=1e-7):
  def dice_coef(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.math.sigmoid(y_pred)
    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred + smooth)

    return 1 - numerator / denominator
  return dice_coef
#-------------------------------------------------------------------------------
def tversky_loss(beta=0.5):
  def loss(y_true, y_pred):
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.math.sigmoid(y_pred)
    numerator = y_true * y_pred
    denominator = y_true * y_pred + beta * (1 - y_true) * y_pred + (1 - beta) * y_true * (1 - y_pred)

    return 1 - tf.reduce_sum(numerator) / tf.reduce_sum(denominator)

  return loss
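
To see what `focal_loss_with_logits` computes, it can be reduced to a single logit and a 0/1 target. The sketch below is plain Python rather than TensorFlow and compares the numerically stable expression used above with the textbook form of focal loss, -alpha_t * (1 - p_t)^gamma * log(p_t). The names `focal_term` and `focal_direct` are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def focal_term(logit, target, alpha=0.25, gamma=2):
    """Scalar version of focal_loss_with_logits for one logit and a 0/1 target."""
    p = sigmoid(logit)
    weight_a = alpha * (1 - p) ** gamma * target
    weight_b = (1 - alpha) * p ** gamma * (1 - target)
    # log1p(exp(-|x|)) + relu(-x) is a stable way to write -log(sigmoid(x)).
    neg_log_p = math.log1p(math.exp(-abs(logit))) + max(-logit, 0.0)
    return neg_log_p * (weight_a + weight_b) + logit * weight_b


def focal_direct(logit, target, alpha=0.25, gamma=2):
    """Textbook form, written out separately for each target value."""
    p = sigmoid(logit)
    if target == 1:
        return -alpha * (1 - p) ** gamma * math.log(p)
    return -(1 - alpha) * p ** gamma * math.log(1 - p)


easy = focal_term(4.0, 1)   # confident and correct: strongly down-weighted
hard = focal_term(-4.0, 1)  # confident and wrong: keeps most of its loss
print(easy < hard)          # True
```

The (1 - p)^gamma factor is what shrinks the contribution of examples the model already classifies well, which is the reason for trying this loss on a dataset where `lizon` messages are a small minority.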


In [19]:
def combined_loss_focal(alpha=0.25, gamma=2):
  def focal_loss_with_logits(logits, targets, alpha, gamma, y_pred):
    targets = tf.cast(targets, tf.float32)
    weight_a = alpha * (1 - y_pred) ** gamma * targets
    weight_b = (1 - alpha) * y_pred ** gamma * (1 - targets)
    
    return (tf.math.log1p(tf.exp(-tf.abs(logits))) + tf.nn.relu(-logits)) * (weight_a + weight_b) + logits * weight_b 

  def loss(y_true, logits):
    y_pred = tf.math.sigmoid(logits)
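    #the cross-entropy term expects raw logits (it applies the sigmoid itself), so logits is passed rather than y_pred
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(y_true, tf.float32), logits=logits) + focal_loss_with_logits(logits=logits, targets=y_true, alpha=alpha, gamma=gamma, y_pred=y_pred)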

    return tf.reduce_mean(loss)

  return loss

def combined_loss_focalm(alpha=0.25, gamma=2):
  def focal_loss_with_logits(logits, targets, alpha, gamma, y_pred):
    targets = tf.cast(targets, tf.float32)
    weight_a = alpha * (1 - y_pred) ** gamma * targets
    weight_b = (1 - alpha) * y_pred ** gamma * (1 - targets)
    
    return (tf.math.log1p(tf.exp(-tf.abs(logits))) + tf.nn.relu(-logits)) * (weight_a + weight_b) + logits * weight_b 

  def loss(y_true, logits):
    y_pred = tf.math.sigmoid(logits)
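    #the cross-entropy term expects raw logits (it applies the sigmoid itself), so logits is passed rather than y_pred
    loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=tf.cast(y_true, tf.float32), logits=logits) * focal_loss_with_logits(logits=logits, targets=y_true, alpha=alpha, gamma=gamma, y_pred=y_pred)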

    return tf.reduce_mean(loss)

  return loss
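
`tf.nn.sigmoid_cross_entropy_with_logits` applies the sigmoid internally, so the cross-entropy term of a combined loss should receive raw logits. The scalar sketch below (plain Python, using the formula from the TensorFlow documentation) shows how the value changes when an already squashed probability is passed in place of the logit; the helper name and the numbers are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(label, logit):
    """Scalar binary cross-entropy on a raw logit x with label z:
    max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


logit, label = 3.0, 1.0
correct = bce_with_logits(label, logit)                    # logit passed as-is
double_squashed = bce_with_logits(label, sigmoid(logit))   # probability passed as a logit
print(correct < double_squashed)  # True
```

With a probability as input the effective logit is confined to the range 0 to 1, so even a confident, correct prediction keeps a sizeable loss.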

In [20]:
def fit_onecycle(MODEL_NAME, maxlen=512,batch_size=8,lr=1e-5,epochs=1,ind=0, func=None):
  t = text.Transformer(MODEL_NAME, maxlen=maxlen)
  trn = t.preprocess_train(x_train, y_train)
  val = t.preprocess_test(x_valid, y_valid)
  test = t.preprocess_test(x_test, y_test)
  model = t.get_classifier()
  if func=='focal_loss':
    model.compile(loss=focal_loss(alpha=0.25, gamma=2),
              optimizer='adam',
              metrics=['accuracy']) 
  elif func=='dice_loss':
    model.compile(loss=dice_loss(smooth=1e-7),
              optimizer='adam',
              metrics=['accuracy'])  
  elif func=='tversky_loss':
    model.compile(loss=tversky_loss(beta=0.5),
              optimizer='adam',
              metrics=['accuracy'])     
  elif func=='combined_loss_focal':
    model.compile(loss=combined_loss_focal(alpha=0.25, gamma=2),
              optimizer='adam',
              metrics=['accuracy'])        
  elif func=='combined_loss_focalm':
    model.compile(loss=combined_loss_focalm(alpha=0.25, gamma=2),
              optimizer='adam',
              metrics=['accuracy'])     
  learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)
  learner.fit_onecycle(lr=lr, epochs=epochs)
  #predictor = ktrain.get_predictor(learner.model, preproc=t)
  #Model_full_filename=os.path.join(Models, 'fit_onecycle_'+str(ind))
  #predictor.save(Model_full_filename)
  val_confusion_matrix=learner.validate(val_data=val, class_names=t.get_classes())
  val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1 = model_metrics(np_confusion_matrix=val_confusion_matrix,class_names=t.get_classes())

  test_confusion_matrix=learner.validate(val_data=test, class_names=t.get_classes())
  test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = model_metrics(np_confusion_matrix=test_confusion_matrix,class_names=t.get_classes())

  return val_weighted_avg_F1, val_kappa, val_PPV['lizon'], val_TPR['lizon'], val_F1['lizon'],test_weighted_avg_F1, test_kappa, test_PPV['lizon'], test_TPR['lizon'], test_F1['lizon']
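
The `elif` branches above differ only in which loss factory is called. One possible alternative, sketched here with stand-in factories rather than the real TensorFlow losses, is a lookup table keyed by the `func` value from the log. `resolve_loss`, `FACTORIES`, `fake_focal` and `fake_dice` are hypothetical names. Unlike the chain above, this sketch raises on an unknown name instead of silently keeping the default loss.

```python
def resolve_loss(func, factories):
    """Return a loss built by the matching factory, or None to keep the default loss."""
    # An empty Excel cell arrives as NaN (a float), so only strings are looked up.
    if not isinstance(func, str):
        return None
    if func not in factories:
        raise ValueError('Unknown loss function: %s' % func)
    factory, params = factories[func]
    return factory(**params)


# Stand-ins for illustration; in the notebook these would be focal_loss,
# dice_loss, tversky_loss and the combined variants.
def fake_focal(alpha, gamma):
    return ('focal', alpha, gamma)


def fake_dice(smooth):
    return ('dice', smooth)


FACTORIES = {
    'focal_loss': (fake_focal, {'alpha': 0.25, 'gamma': 2}),
    'dice_loss': (fake_dice, {'smooth': 1e-7}),
}

print(resolve_loss('focal_loss', FACTORIES))   # ('focal', 0.25, 2)
print(resolve_loss(float('nan'), FACTORIES))   # None
```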

In [21]:
def tria(MODEL_NAME, maxlen=512,batch_size=8,lr=1e-5,ind=0):
  t = text.Transformer(MODEL_NAME, maxlen=maxlen)
  trn = t.preprocess_train(x_train, y_train)
  val = t.preprocess_test(x_valid, y_valid)
  test = t.preprocess_test(x_test, y_test)
  model = t.get_classifier()
  learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)
  learner.autofit(lr=lr)
  predictor = ktrain.get_predictor(learner.model, preproc=t)
  #Model_full_filename=os.path.join(Models, 'fit_onecycle_'+str(ind))
  #predictor.save(Model_full_filename)
  val_confusion_matrix=learner.validate(val_data=val, class_names=t.get_classes())
  val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1 = model_metrics(np_confusion_matrix=val_confusion_matrix,class_names=t.get_classes())

  test_confusion_matrix=learner.validate(val_data=test, class_names=t.get_classes())
  test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = model_metrics(np_confusion_matrix=test_confusion_matrix,class_names=t.get_classes())

  return val_weighted_avg_F1, val_kappa, val_PPV['lizon'], val_TPR['lizon'], val_F1['lizon'],test_weighted_avg_F1, test_kappa, test_PPV['lizon'], test_TPR['lizon'], test_F1['lizon']

In [22]:
def SGDR1(MODEL_NAME, maxlen=512,batch_size=8,lr=1e-5,n_cycles=5, cycle_len=1, ind=0):
  t = text.Transformer(MODEL_NAME, maxlen=maxlen)
  trn = t.preprocess_train(x_train, y_train)
  val = t.preprocess_test(x_valid, y_valid)
  test = t.preprocess_test(x_test, y_test)
  model = t.get_classifier()
  learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)
  learner.fit(lr=lr, n_cycles=n_cycles, cycle_len=cycle_len)
  predictor = ktrain.get_predictor(learner.model, preproc=t)
  #Model_full_filename=os.path.join(Models, 'fit_onecycle_'+str(ind))
  #predictor.save(Model_full_filename)
  val_confusion_matrix=learner.validate(val_data=val, class_names=t.get_classes())
  val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1 = model_metrics(np_confusion_matrix=val_confusion_matrix,class_names=t.get_classes())

  test_confusion_matrix=learner.validate(val_data=test, class_names=t.get_classes())
  test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = model_metrics(np_confusion_matrix=test_confusion_matrix,class_names=t.get_classes())

  return val_weighted_avg_F1, val_kappa, val_PPV['lizon'], val_TPR['lizon'], val_F1['lizon'],test_weighted_avg_F1, test_kappa, test_PPV['lizon'], test_TPR['lizon'], test_F1['lizon']

In [23]:
def triareduced(MODEL_NAME, maxlen=512,batch_size=8,lr=1e-5,epochs=20, reduce_on_plateau=1, ind=0):
  t = text.Transformer(MODEL_NAME, maxlen=maxlen)
  trn = t.preprocess_train(x_train, y_train)
  val = t.preprocess_test(x_valid, y_valid)
  test = t.preprocess_test(x_test, y_test)
  model = t.get_classifier()
  learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)
  learner.autofit(  lr=lr, epochs=epochs, reduce_on_plateau=reduce_on_plateau)
  predictor = ktrain.get_predictor(learner.model, preproc=t)
  #Model_full_filename=os.path.join(Models, 'fit_onecycle_'+str(ind))
  #predictor.save(Model_full_filename)
  val_confusion_matrix=learner.validate(val_data=val, class_names=t.get_classes())
  val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1 = model_metrics(np_confusion_matrix=val_confusion_matrix,class_names=t.get_classes())

  test_confusion_matrix=learner.validate(val_data=test, class_names=t.get_classes())
  test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = model_metrics(np_confusion_matrix=test_confusion_matrix,class_names=t.get_classes())

  return val_weighted_avg_F1, val_kappa, val_PPV['lizon'], val_TPR['lizon'], val_F1['lizon'],test_weighted_avg_F1, test_kappa, test_PPV['lizon'], test_TPR['lizon'], test_F1['lizon']

In [24]:
def SGDR2(MODEL_NAME, maxlen=512,batch_size=8,lr=1e-5,n_cycles=5, cycle_len=1, cycle_mult=2, ind=0):
  t = text.Transformer(MODEL_NAME, maxlen=maxlen)
  trn = t.preprocess_train(x_train, y_train)
  val = t.preprocess_test(x_valid, y_valid)
  test = t.preprocess_test(x_test, y_test)
  model = t.get_classifier()
  learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=batch_size)
  learner.fit(lr=lr, n_cycles=n_cycles, cycle_len=cycle_len, cycle_mult=cycle_mult)
  predictor = ktrain.get_predictor(learner.model, preproc=t)
  #Model_full_filename=os.path.join(Models, 'fit_onecycle_'+str(ind))
  #predictor.save(Model_full_filename)
  val_confusion_matrix=learner.validate(val_data=val, class_names=t.get_classes())
  val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1 = model_metrics(np_confusion_matrix=val_confusion_matrix,class_names=t.get_classes())

  test_confusion_matrix=learner.validate(val_data=test, class_names=t.get_classes())
  test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = model_metrics(np_confusion_matrix=test_confusion_matrix,class_names=t.get_classes())

  return val_weighted_avg_F1, val_kappa, val_PPV['lizon'], val_TPR['lizon'], val_F1['lizon'],test_weighted_avg_F1, test_kappa, test_PPV['lizon'], test_TPR['lizon'], test_F1['lizon']
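
The two SGDR variants differ only in `cycle_mult`, which changes how long each warm-restart cycle lasts and therefore the total training time. The sketch below assumes the usual SGDR scheme (cycle length multiplied after every restart, cosine annealing within a cycle); it illustrates that scheme and is not ktrain's internal code. The helper names are hypothetical.

```python
import math


def sgdr_cycle_lengths(n_cycles, cycle_len, cycle_mult=1):
    """Epochs in each cycle when the cycle length is multiplied after every restart."""
    lengths = []
    current = cycle_len
    for _ in range(n_cycles):
        lengths.append(current)
        current *= cycle_mult
    return lengths


def cosine_annealed_lr(max_lr, progress, min_lr=0.0):
    """Learning rate at `progress` (0 to 1) through one cycle under cosine annealing."""
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))


print(sgdr_cycle_lengths(5, 1))      # [1, 1, 1, 1, 1]   5 epochs with the SGDR1 defaults
print(sgdr_cycle_lengths(5, 1, 2))   # [1, 2, 4, 8, 16]  31 epochs with the SGDR2 defaults
```

Under this assumption the SGDR2 defaults train roughly six times longer than the SGDR1 defaults, which is worth keeping in mind when comparing the `duration` column of the log.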

In [25]:
for index, row in Experiment.iterrows():
  print('Processing %s started...'%(row['Model']))
  if not NewExecution and row['duration']>0:
    print('%s is already processed. Continue'%(row['Model']))
    continue  
  
  print(row)
  print('---------------------------------------------')
  try:
    ts_start = time.time()
    if row['method']=='fit_onecycle':
      if 'func' in Experiment.columns:
        val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = fit_onecycle(row['Model'],row['maxlen'],row['batch_size'],row['lr'],row['epochs'],index,row['func'])
      else:
        val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = fit_onecycle(row['Model'],row['maxlen'],row['batch_size'],row['lr'],row['epochs'],index)
    elif row['method']=='tria':
      val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = tria(row['Model'],row['maxlen'],row['batch_size'],row['lr'],index)   
    elif row['method']=='SGDR1':
      val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = SGDR1(row['Model'],row['maxlen'],row['batch_size'],row['lr'],row['n_cycles'],row['cycle_len'],index) 
    elif row['method']=='triareduced':
      val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = triareduced(row['Model'],row['maxlen'],row['batch_size'],row['lr'],row['epochs'],row['reduce_on_plateau'],index)    
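    elif row['method'] in ('SGDR2','SGRD2'): #the 'SGRD2' spelling is still accepted in case existing log files use it
      val_weighted_avg_F1, val_kappa, val_PPV, val_TPR, val_F1, test_weighted_avg_F1, test_kappa, test_PPV, test_TPR, test_F1 = SGDR2(row['Model'],row['maxlen'],row['batch_size'],row['lr'],row['n_cycles'],row['cycle_len'],row['cycle_mult'],index)
    else:
      raise ValueError('Unknown method: %s'%(row['method']))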
    ts_end = time.time()
    Experiment.at[index,'duration']=(ts_end - ts_start)/60  

    Experiment.at[index,'weighted_avg_F1']=val_weighted_avg_F1
    Experiment.at[index,'kappa']=val_kappa
    Experiment.at[index,'lizon-precision']=val_PPV
    Experiment.at[index,'lizon-recall']=val_TPR
    Experiment.at[index,'lizon-f1-score']=val_F1

    Experiment.at[index,'test_weighted_avg_F1']=test_weighted_avg_F1
    Experiment.at[index,'test_kappa']=test_kappa
    Experiment.at[index,'test_lizon-precision']=test_PPV
    Experiment.at[index,'test_lizon-recall']=test_TPR
    Experiment.at[index,'test_lizon-f1-score']=test_F1    
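  except Exception as e:
    #Mark the run as failed so that it is skipped when the experiment is resumed, then move on to the next row
    print('%s failed: %s'%(row['Model'], e))
    Experiment.at[index,'duration']=10000
    Experiment.at[index,'comment']='Failed'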


  #---------------------------Save results to the log------
  try:
    SaveToExperimentLog(Experiments_file, Experiment_name, Experiment)
  except Exception as e:
    #Continue training even if there is an issue
    print('Error saving to file! %s'%(e))

Processing DeepPavlov/rubert-base-cased-conversational started...
Model                   DeepPavlov/rubert-base-cased-conversational
maxlen                                                          512
batch_size                                                        8
epochs                                                            3
lr                                                            2e-05
method                                                 fit_onecycle
weighted_avg_F1                                                 NaN
kappa                                                           NaN
lizon-precision                                                 NaN
lizon-recall                                                    NaN
lizon-f1-score                                                  NaN
duration                                                        NaN
comment                                                         NaN
test_weighted_avg_F1                              

Is Multi-Label? False
preprocessing test...
language: ru
test sequence lengths:
	mean : 56
	95percentile : 124
	99percentile : 210


preprocessing test...
language: ru
test sequence lengths:
	mean : 58
	95percentile : 133
	99percentile : 222


404 Client Error: Not Found for url: https://huggingface.co/DeepPavlov/rubert-base-cased-conversational/resolve/main/tf_model.h5




begin training using onecycle policy with max lr of 2e-05...
Epoch 1/3
Epoch 2/3
Epoch 3/3
              precision    recall  f1-score   support

       Other       0.96      0.99      0.98      1879
       lizon       0.86      0.55      0.67       151

    accuracy                           0.96      2030
   macro avg       0.91      0.77      0.83      2030
weighted avg       0.96      0.96      0.96      2030

              precision    recall  f1-score   support

       Other       0.96      0.99      0.97     11230
       lizon       0.56      0.25      0.34       635

    accuracy                           0.95     11865
   macro avg       0.76      0.62      0.66     11865
weighted avg       0.94      0.95      0.94     11865

Processing blinoff/roberta-base-russian-v0 started...
Model                   blinoff/roberta-base-russian-v0
maxlen                                              256
batch_size                                           16
epochs                         

Is Multi-Label? False
preprocessing test...
language: ru
test sequence lengths:
	mean : 56
	95percentile : 124
	99percentile : 210


preprocessing test...
language: ru
test sequence lengths:
	mean : 58
	95percentile : 133
	99percentile : 222


404 Client Error: Not Found for url: https://huggingface.co/blinoff/roberta-base-russian-v0/resolve/main/tf_model.h5




begin training using onecycle policy with max lr of 1e-05...
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
              precision    recall  f1-score   support

       Other       0.97      0.99      0.98      1879
       lizon       0.84      0.60      0.70       151

    accuracy                           0.96      2030
   macro avg       0.91      0.80      0.84      2030
weighted avg       0.96      0.96      0.96      2030

              precision    recall  f1-score   support

       Other       0.98      0.98      0.98     11230
       lizon       0.62      0.66      0.64       635

    accuracy                           0.96     11865
   macro avg       0.80      0.82      0.81     11865
weighted avg       0.96      0.96      0.96     11865

