# NLP assignment 3 - finetune BERT

We use the PyTorch implementation of BERT from: https://github.com/huggingface/pytorch-pretrained-BERT

We have used this blog post (https://medium.com/huggingface/multi-label-text-classification-using-bert-the-mighty-transformer-69714fa3fb3d) and the supporting code (https://nbviewer.jupyter.org/github/kaushaltrivedi/bert-toxic-comments-multilabel/blob/master/toxic-bert-multilabel-classification.ipynb) as a model for implementing our classifier. We refer to this below as Trivedi 2019. 

## Imports - will need to authorise Google Drive

In [0]:
!pip install pytorch-pretrained-bert

Collecting pytorch-pretrained-bert
  Downloading https://files.pythonhosted.org/packages/5d/3c/d5fa084dd3a82ffc645aba78c417e6072ff48552e3301b1fa3bd711e03d4/pytorch_pretrained_bert-0.6.1-py3-none-any.whl (114kB)
    100% |████████████████████████████████| 122kB 7.7MB/s
Installing collected packages: pytorch-pretrained-bert
Successfully installed pytorch-pretrained-bert-0.6.1


In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import pickle

In [0]:
from pytorch_pretrained_bert.tokenization import BertTokenizer, WordpieceTokenizer
from pytorch_pretrained_bert.modeling import BertForPreTraining, BertModel, BertConfig, BertForMaskedLM, BertForSequenceClassification #PretrainedBertModel
from pathlib import Path
import torch
import re
from torch import Tensor
from torch.nn import BCEWithLogitsLoss
from fastai.text import Tokenizer, Vocab
import collections
import os
import pdb
from tqdm import tqdm, trange
import sys
import random
from sklearn.model_selection import train_test_split
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)


from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from pytorch_pretrained_bert.optimization import BertAdam

Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex.


In [0]:
import logging
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S',
                    level=logging.INFO)
logger = logging.getLogger(__name__)

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive


## Define classes

These are all from the PyTorch BERT GitHub repository, copied in for reference when we were setting up the features.

In [0]:
class InputExample(object):
    """A single training/test example for simple sequence classification."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        """Constructs a InputExample.
        Args:
            guid: Unique id for the example.
            text_a: string. The untokenized text of the first sequence. For single
            sequence tasks, only this sequence must be specified.
            text_b: (Optional) string. The untokenized text of the second sequence.
            Only must be specified for sequence pair tasks.
            label: (Optional) string. The label of the example. This should be
            specified for train and dev examples, but not for test examples.
        """
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label

In [0]:
class InputFeatures(object):
    """A single set of features of data."""

    def __init__(self, input_ids, input_mask, segment_ids, label_ids):
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.label_ids = label_ids
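
As a rough illustration of what these fields hold for a single-sentence example, the sketch below pads a list of token ids to a fixed length and builds the matching mask and segment lists. The helper name `pad_single_sequence` is hypothetical and not part of the library; the ids are the ones logged for the first training example further down in this notebook.

```python
def pad_single_sequence(token_ids, max_seq_length):
    """Hypothetical helper: pad ids (already including [CLS] and [SEP]) to max_seq_length."""
    if len(token_ids) > max_seq_length:
        raise ValueError("sequence is longer than max_seq_length")
    n_pad = max_seq_length - len(token_ids)
    input_ids = list(token_ids) + [0] * n_pad           # zero-padded token ids
    input_mask = [1] * len(token_ids) + [0] * n_pad     # 1 = real token, 0 = padding
    segment_ids = [0] * max_seq_length                  # single sequence: all zeros
    return input_ids, input_mask, segment_ids

# Ids of "[CLS] or anyone that ' s ever had to make an appeal . [SEP]"
example_ids = [101, 2030, 3087, 2008, 1005, 1055, 2412, 2018, 2000, 2191, 2019, 5574, 1012, 102]
input_ids, input_mask, segment_ids = pad_single_sequence(example_ids, max_seq_length=50)
print(len(input_ids), sum(input_mask), set(segment_ids))
```

The mask sums to the number of real tokens, which is what lets the model ignore the padded positions.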

In [0]:
def convert_examples_to_features(examples, label_list, max_seq_length, tokenizer):
    """Converts a list of `InputExample`s into a list of `InputFeatures`."""

    label_map = {label : i for i, label in enumerate(label_list)}

    features = []
    for (ex_index, example) in enumerate(examples):
        tokens_a = tokenizer.tokenize(example.text_a)

        tokens_b = None
        if example.text_b:
            tokens_b = tokenizer.tokenize(example.text_b)
            # Modifies `tokens_a` and `tokens_b` in place so that the total
            # length is less than the specified length.
            # Account for [CLS], [SEP], [SEP] with "- 3"
            _truncate_seq_pair(tokens_a, tokens_b, max_seq_length - 3)
        else:
            # Account for [CLS] and [SEP] with "- 2"
            if len(tokens_a) > max_seq_length - 2:
                tokens_a = tokens_a[:(max_seq_length - 2)]

        # The convention in BERT is:
        # (a) For sequence pairs:
        #  tokens:   [CLS] is this jack ##son ##ville ? [SEP] no it is not . [SEP]
        #  type_ids: 0   0  0    0    0     0       0 0    1  1  1  1   1 1
        # (b) For single sequences:
        #  tokens:   [CLS] the dog is hairy . [SEP]
        #  type_ids: 0   0   0   0  0     0 0
        #
        # Where "type_ids" are used to indicate whether this is the first
        # sequence or the second sequence. The embedding vectors for `type=0` and
        # `type=1` were learned during pre-training and are added to the wordpiece
        # embedding vector (and position vector). This is not *strictly* necessary
        # since the [SEP] token unambiguously separates the sequences, but it makes
        # it easier for the model to learn the concept of sequences.
        #
        # For classification tasks, the first vector (corresponding to [CLS]) is
        # used as the "sentence vector". Note that this only makes sense because
        # the entire model is fine-tuned.
        tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
        segment_ids = [0] * len(tokens)

        if tokens_b:
            tokens += tokens_b + ["[SEP]"]
            segment_ids += [1] * (len(tokens_b) + 1)

        input_ids = tokenizer.convert_tokens_to_ids(tokens)

        # The mask has 1 for real tokens and 0 for padding tokens. Only real
        # tokens are attended to.
        input_mask = [1] * len(input_ids)

        # Zero-pad up to the sequence length.
        padding = [0] * (max_seq_length - len(input_ids))
        input_ids += padding
        input_mask += padding
        segment_ids += padding

        assert len(input_ids) == max_seq_length
        assert len(input_mask) == max_seq_length
        assert len(segment_ids) == max_seq_length

        label_ids = label_map[example.label]
        if ex_index < 5:
            logger.info("*** Example ***")
            logger.info("guid: %s" % (example.guid))
            logger.info("tokens: %s" % " ".join(
                    [str(x) for x in tokens]))
            logger.info("input_ids: %s" % " ".join([str(x) for x in input_ids]))
            logger.info("input_mask: %s" % " ".join([str(x) for x in input_mask]))
            logger.info(
                    "segment_ids: %s" % " ".join([str(x) for x in segment_ids]))
            logger.info("label: %s (id = %d)" % (example.label, label_ids))

        features.append(
                InputFeatures(input_ids=input_ids,
                              input_mask=input_mask,
                              segment_ids=segment_ids,
                              label_ids=label_ids))
    return features
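
`convert_examples_to_features` calls `_truncate_seq_pair` when `text_b` is given, but that helper is not defined in this notebook, so sentence-pair inputs would raise a `NameError`. Our examples only pass `text_a`, so that branch is never reached here. A minimal sketch of such a helper, modelled on the approach in the BERT reference code (trim the longer sequence one token at a time), might look like this:

```python
def _truncate_seq_pair(tokens_a, tokens_b, max_length):
    """Truncate a pair of token lists in place until their combined length fits."""
    while len(tokens_a) + len(tokens_b) > max_length:
        # Drop a token from whichever sequence is currently longer
        if len(tokens_a) > len(tokens_b):
            tokens_a.pop()
        else:
            tokens_b.pop()

tokens_a = ["is", "this", "jack", "##son", "##ville", "?"]
tokens_b = ["no", "it", "is", "not", "."]
_truncate_seq_pair(tokens_a, tokens_b, max_length=8)
print(tokens_a, tokens_b)
```

Trimming the longer list first keeps the two sequences roughly balanced instead of cutting one of them entirely.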

## Training functions

These functions are based on those from Trivedi 2019.

In [0]:
def warmup_linear(x, warmup=0.002):
    if x < warmup:
        return x/warmup
    return 1.0 - x
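
To make the schedule concrete: `warmup_linear` takes the fraction of training completed, `x`, and returns a multiplier for the base learning rate that rises linearly from 0 during warm-up and then follows `1 - x`. The sketch below repeats the function so that it runs on its own and evaluates it with the default `warmup_proportion` (0.1) and `learning_rate` (3e-5) set later in this notebook; the sampled progress values are arbitrary.

```python
def warmup_linear(x, warmup=0.002):
    # Same definition as above, repeated so this block is self-contained
    if x < warmup:
        return x / warmup
    return 1.0 - x

base_lr = 3e-5            # default learning_rate in args
warmup_proportion = 0.1   # default warmup_proportion in args

for progress in [0.0, 0.05, 0.1, 0.5, 1.0]:
    multiplier = warmup_linear(progress, warmup_proportion)
    print(f"progress={progress:.2f}  multiplier={multiplier:.2f}  lr={base_lr * multiplier:.2e}")
```

Note that the multiplier drops from just under 1.0 to `1 - warmup` at the end of warm-up. In `fit()` this schedule is only applied manually on the fp16 path; otherwise `BertAdam` handles warm-up itself.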

In [0]:
def fit(model, train_dataloader, device, optimizer, num_epochs):
    
    resultsdf = pd.DataFrame(columns = ['epoch', 'train loss', 'train accuracy', 'validation loss', 'validation accuracy'])
    batch_losses = []
    
    global_step = 0
    for i_ in (range(int(num_epochs))):
        # Switch back to training mode every epoch, because eval() below puts the model in eval mode
        model.train()

        tr_loss = 0
        nb_tr_examples, nb_tr_steps = 0, 0
        for step, batch in enumerate(train_dataloader):

            batch = tuple(t.to(device) for t in batch)
            input_ids, input_mask, segment_ids, label_ids = batch
            loss = model(input_ids, segment_ids, input_mask, label_ids)
            
            batch_losses.append(loss.item())
            
            if step % 50 ==0:
              logger.info(f"Loss on batch {step}: {loss}")
                      
            if args['fp16']:
              optimizer.backward(loss)
            else:
              loss.backward()

            tr_loss += loss.item()
            nb_tr_examples += input_ids.size(0)
            nb_tr_steps += 1

            if (step + 1) % args['gradient_accumulation_steps'] == 0:
              if args['fp16']:
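                # modify learning rate with special warm up BERT uses
                # if args['fp16'] is False, BertAdam is used that handles this automatically
                # Total optimizer steps, recomputed here because the value computed in train() is not in scope
                num_train_optimization_steps = (len(train_dataloader) // args['gradient_accumulation_steps']) * int(num_epochs)
                lr_this_step = args['learning_rate'] * warmup_linear(global_step/num_train_optimization_steps, args['warmup_proportion'])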
                for param_group in optimizer.param_groups:
                  param_group['lr'] = lr_this_step
              optimizer.step()
              optimizer.zero_grad()
              global_step += 1
            
        logger.info('Training loss after epoch {}: {}'.format(i_+1, tr_loss / nb_tr_steps))
        train_tup = eval(train_examples, train_features, model=model, device=device)
        logger.info('Training accuracy after epoch {}: {}'.format(i_+1, train_tup[0]['accuracy']))
        logger.info("***** Running evaluation *****")
        logger.info('Eval after epoch {}'.format(i_+1))
        eval_tup = eval(eval_examples, eval_features, model = model, device = device)
        logger.info(eval_tup[0])
        
        resultsdf = resultsdf.append({"epoch": i_+1, "train loss": train_tup[0]['loss'], "train accuracy": train_tup[0]['accuracy'],
                         "validation loss": eval_tup[0]['loss'], "validation accuracy": eval_tup[0]['accuracy']}, ignore_index=True)
    
    return resultsdf, batch_losses

## Evaluation functions

The functions accuracy() and eval() are based on those from Trivedi 2019.

In [0]:
def accuracy(out, labels):
    outputs = np.argmax(out, axis=1)
    return np.sum(outputs == labels)
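
Note that `accuracy()` returns the number of correct predictions in a batch rather than a proportion; `eval()` sums these counts and divides by the number of examples at the end. A small sketch with made-up logits illustrates this:

```python
import numpy as np

def accuracy(out, labels):
    # Same definition as above: count of rows whose argmax matches the label
    outputs = np.argmax(out, axis=1)
    return np.sum(outputs == labels)

# Made-up logits for four examples and two classes
logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
labels = np.array([0, 1, 0, 0])

n_correct = accuracy(logits, labels)
print(n_correct, n_correct / len(labels))
```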

Use this function to calculate the accuracy on the balanced task.

In [0]:
def balanced_accuracy(out, labels):
  
  # 'out' should be the softmax of the logits (class probabilities), not the raw logits
  paired_pred = []
  
  for i in np.arange(0, len(out),2):
    if out[i][1] < out[i+1][1]:
      paired_pred.append(0)
      paired_pred.append(1)
    else:
      paired_pred.append(1)
      paired_pred.append(0)
  
  return np.sum(np.array(paired_pred) == labels)/len(out)
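
The function appears to assume that rows come in consecutive pairs with one example of each label, and it assigns label 1 to whichever row of the pair has the higher class-1 score. As a sketch of the same decision rule without the explicit loop, the vectorised variant below should give the same result for an even number of rows. The name `paired_accuracy` and the scores are illustrative and not part of the original code.

```python
import numpy as np

def paired_accuracy(out, labels):
    """Illustrative vectorised version of the paired decision rule."""
    out = np.asarray(out)
    labels = np.asarray(labels)
    # Class-1 scores arranged as (n_pairs, 2): column 0 is the first row of each pair
    scores = out[:, 1].reshape(-1, 2)
    second_is_higher = scores[:, 0] < scores[:, 1]
    # Predict [0, 1] when the second row scores higher, otherwise [1, 0]
    preds = np.stack([~second_is_higher, second_is_higher], axis=1).astype(int).ravel()
    return np.sum(preds == labels) / len(out)

# Made-up probabilities for two pairs
probs = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6], [0.9, 0.1]])
labels = np.array([0, 1, 1, 0])
print(paired_accuracy(probs, labels))
```

Because each pair is always predicted as one 0 and one 1, a pair is either entirely right or entirely wrong under this rule.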

In [0]:
def eval(eval_examples, eval_features, model, device):
        
    args['output_dir'].mkdir(exist_ok=True)

    all_input_ids = torch.tensor([f.input_ids for f in eval_features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in eval_features], dtype=torch.long)
    all_segment_ids = torch.tensor([f.segment_ids for f in eval_features], dtype=torch.long)
    all_label_ids = torch.tensor([f.label_ids for f in eval_features], dtype=torch.long)
    eval_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
    # Run prediction for full data
    eval_sampler = SequentialSampler(eval_data)
    eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args['eval_batch_size'])
    
    all_logits = None
    all_labels = None
    
    model.eval()
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0
    for input_ids, input_mask, segment_ids, label_ids in eval_dataloader:
        input_ids = input_ids.to(device)
        input_mask = input_mask.to(device)
        segment_ids = segment_ids.to(device)
        label_ids = label_ids.to(device)

        with torch.no_grad():
            tmp_eval_loss = model(input_ids, segment_ids, input_mask, label_ids)
            logits = model(input_ids, segment_ids, input_mask)

        logits = logits.detach().cpu().numpy()
        label_ids = label_ids.to('cpu').numpy()
        tmp_eval_accuracy = accuracy(logits, label_ids)
        
        if all_logits is None:
            all_logits = logits
        else:
            all_logits = np.concatenate((all_logits, logits), axis=0)
            
        if all_labels is None:
            all_labels = label_ids
        else:    
            all_labels = np.concatenate((all_labels, label_ids), axis=0)

        eval_loss += tmp_eval_loss.mean().item()
        eval_accuracy += tmp_eval_accuracy

        nb_eval_examples += input_ids.size(0)
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    eval_accuracy = eval_accuracy / nb_eval_examples
    
    result = {'loss': eval_loss,
              'accuracy': eval_accuracy}
    
    return (result, all_logits, all_labels)
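
`eval()` returns raw logits, whereas the comment in `balanced_accuracy()` asks for softmax outputs. This matters because the paired comparison is made across rows, and a softmax can change the ordering of class-1 scores between rows. Below is a minimal NumPy sketch of a row-wise softmax that could be applied to `all_logits` first; the helper name `softmax_rows` is hypothetical and the logits are made up.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up logits: row 0 has the larger raw class-1 logit,
# but row 1 has the larger class-1 probability
logits = np.array([[3.0, 2.0], [-1.0, 1.0]])
probs = softmax_rows(logits)
print(probs[:, 1])
```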

## Set an output path and the default value of the arguments

In [0]:
OUTPUT_PATH = Path('gdrive/My Drive/tmp/output')
OUTPUT_PATH.mkdir(parents=True, exist_ok = True)

The default arguments are based on those from Trivedi (2019).

In [0]:
args = {
    "train_size": -1,
    "val_size": -1,
    "task_name": "sarcpol",
    "no_cuda": False,
    "bert_model": 'bert-base-uncased',
    "output_dir": OUTPUT_PATH,
    "max_seq_length": 50,
    "do_train": True,
    "do_eval": True,
    "do_lower_case": True,
    "train_batch_size": 32 ,
    "eval_batch_size": 32,
    "learning_rate": 3e-5,
    "num_train_epochs": 5,
    "warmup_proportion": 0.1,
    "local_rank": -1,
    "seed": 42,
    "gradient_accumulation_steps": 1,
    "optimize_on_cpu": False,
    "fp16": False,
    "loss_scale": 128
}

## Load in the training and validation sets

In [0]:
#Select the path containing the datasets
SARC_POL = '/content/gdrive/My Drive/SARC pol/'

In [0]:
#Load in the required training set
traindf = pd.read_csv(SARC_POL+'project_data/project_training_12.csv', index_col=0)

In [0]:
#Load in the validation set
validdf = pd.read_csv(SARC_POL+'project_data/project_validation.csv', index_col = 0)

## Process the training and validation sets




In [0]:
#Process the training examples
train_examples = []

for i in range(0,len(traindf.index)):
        train_examples.append(InputExample(str(i), traindf.loc[i,'response'], None, str(traindf.loc[i,'label'])))

In [0]:
#Process the validation examples
eval_examples = []

for i in range(0,len(validdf.index)):
        eval_examples.append(InputExample(str(i), validdf.loc[i,'response'], None, str(validdf.loc[i,'label'])))

In [0]:
#Create a list of labels
label_list = ['0', '1']
num_labels = len(label_list)

In [0]:
#Instantiate the tokenizer
tokenizer = BertTokenizer.from_pretrained(args['bert_model'], do_lower_case=args['do_lower_case'])

03/05/2019 22:20:30 - INFO - pytorch_pretrained_bert.file_utils -   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt not found in cache, downloading to /tmp/tmpwxnp1pce
100%|██████████| 231508/231508 [00:00<00:00, 918957.81B/s]
03/05/2019 22:20:30 - INFO - pytorch_pretrained_bert.file_utils -   copying /tmp/tmpwxnp1pce to cache at /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/05/2019 22:20:30 - INFO - pytorch_pretrained_bert.file_utils -   creating metadata file for /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
03/05/2019 22:20:30 - INFO - pytorch_pretrained_bert.file_utils -   removing temp file /tmp/tmpwxnp1pce
03/05/2019 22:20:30 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.

In [0]:
#Create the features based on the training set
train_features = convert_examples_to_features(train_examples, label_list, args["max_seq_length"], tokenizer)

03/05/2019 22:20:34 - INFO - __main__ -   *** Example ***
03/05/2019 22:20:34 - INFO - __main__ -   guid: 0
03/05/2019 22:20:34 - INFO - __main__ -   tokens: [CLS] or anyone that ' s ever had to make an appeal . [SEP]
03/05/2019 22:20:34 - INFO - __main__ -   input_ids: 101 2030 3087 2008 1005 1055 2412 2018 2000 2191 2019 5574 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:34 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:34 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:34 - INFO - __main__ -   label: 0 (id = 0)
03/05/2019 22:20:34 - INFO - __main__ -   *** Example ***
03/05/2019 22:20:34 - INFO - __main__ -   guid: 1
03/05/2019 22:20:34 - INFO - __main__ -   tokens: [CLS] trump is the health ##iest president to ever take office

In [0]:
#Create the features based on the validation set
eval_features = convert_examples_to_features(eval_examples, label_list, args["max_seq_length"], tokenizer)

03/05/2019 22:20:38 - INFO - __main__ -   *** Example ***
03/05/2019 22:20:38 - INFO - __main__ -   guid: 0
03/05/2019 22:20:38 - INFO - __main__ -   tokens: [CLS] and if trump builds that wall they will be stuck here . [SEP]
03/05/2019 22:20:38 - INFO - __main__ -   input_ids: 101 1998 2065 8398 16473 2008 2813 2027 2097 2022 5881 2182 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:38 - INFO - __main__ -   input_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:38 - INFO - __main__ -   segment_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
03/05/2019 22:20:38 - INFO - __main__ -   label: 1 (id = 1)
03/05/2019 22:20:38 - INFO - __main__ -   *** Example ***
03/05/2019 22:20:38 - INFO - __main__ -   guid: 1
03/05/2019 22:20:38 - INFO - __main__ -   tokens: [CLS] says a voluntary survey . [SEP]
03/05/2019 22:2

## Define function to train the model

This function is based on the training function from Trivedi (2019).

In [0]:
def train():

  #Set up PyTorch options

  num_train_optimization_steps = None
  if args["do_train"]:
      num_train_optimization_steps = int(
          len(train_examples) / args['train_batch_size'] / args['gradient_accumulation_steps']) * args['num_train_epochs']
      if args["local_rank"] != -1:
          num_train_optimization_steps = num_train_optimization_steps // torch.distributed.get_world_size()
  num_train_steps = int(
          len(train_examples) / args['train_batch_size'] / args['gradient_accumulation_steps'] * args['num_train_epochs'])

  if args["local_rank"] == -1 or args["no_cuda"]:
      device = torch.device("cuda" if torch.cuda.is_available() and not args["no_cuda"] else "cpu")
      n_gpu = torch.cuda.device_count()
  #     n_gpu = 1
  else:
      torch.cuda.set_device(args['local_rank'])
      device = torch.device("cuda", args['local_rank'])
      n_gpu = 1
      # Initializes the distributed backend which will take care of synchronizing nodes/GPUs
      torch.distributed.init_process_group(backend='nccl')
  logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format(
          device, n_gpu, bool(args['local_rank'] != -1), args['fp16']))

  #Instantiate the model

  model = BertForSequenceClassification.from_pretrained(args["bert_model"],
            num_labels = num_labels)
  if args["fp16"]:
      model.half()
  model.to(device)
  if args["local_rank"] != -1:
      try:
          from apex.parallel import DistributedDataParallel as DDP
      except ImportError:
          raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.")

      model = DDP(model)
  elif n_gpu > 1:
      model = torch.nn.DataParallel(model)


  #Instantiate the optimizer
  param_optimizer = list(model.named_parameters())
  no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']
  optimizer_grouped_parameters = [
      {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01},
      {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0}
      ]
  if args["fp16"]:
      try:
          from apex.optimizers import FP16_Optimizer
          from apex.optimizers import FusedAdam
      except ImportError:
          raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use distributed and fp16 training.")

      optimizer = FusedAdam(optimizer_grouped_parameters,
                            lr=args["learning_rate"],
                            bias_correction=False,
                            max_grad_norm=1.0)
      if args["loss_scale"]== 0:
          optimizer = FP16_Optimizer(optimizer, dynamic_loss_scale=True)
      else:
          optimizer = FP16_Optimizer(optimizer, static_loss_scale=args["loss_scale"])

  else:
      optimizer = BertAdam(optimizer_grouped_parameters,
                           lr=args["learning_rate"],
                           warmup=args["warmup_proportion"],
                           t_total=num_train_optimization_steps)


  #Instantiate the PyTorch datasets and print key details
  logger.info("  Num examples = %d", len(train_examples))
  logger.info("  Batch size = %d", args['train_batch_size'])
  logger.info("  Num steps = %d", num_train_steps)
  all_input_ids = torch.tensor([f.input_ids for f in train_features], dtype=torch.long)
  all_input_mask = torch.tensor([f.input_mask for f in train_features], dtype=torch.long)
  all_segment_ids = torch.tensor([f.segment_ids for f in train_features], dtype=torch.long)
  all_label_ids = torch.tensor([f.label_ids for f in train_features], dtype=torch.long)
  train_data = TensorDataset(all_input_ids, all_input_mask, all_segment_ids, all_label_ids)
  if args['local_rank'] == -1:
      train_sampler = RandomSampler(train_data)
  else:
      train_sampler = DistributedSampler(train_data)
  train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args['train_batch_size'])

  resultsdf, batch_losses = fit(model = model, train_dataloader = train_dataloader, device = device, optimizer = optimizer, num_epochs = args["num_train_epochs"])
  
  return resultsdf, batch_losses
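
For reference, the step count passed to `BertAdam` as `t_total` is the number of whole batches per epoch (after gradient accumulation) multiplied by the number of epochs. The sketch below reproduces that arithmetic with the default arguments; the training-set size of 10,000 is a made-up figure, and `count_optimization_steps` is a hypothetical helper name.

```python
def count_optimization_steps(n_examples, batch_size, accumulation_steps, n_epochs):
    """Mirror of the num_train_optimization_steps calculation in train()."""
    steps_per_epoch = int(n_examples / batch_size / accumulation_steps)
    return steps_per_epoch * n_epochs

steps = count_optimization_steps(n_examples=10_000, batch_size=32,
                                 accumulation_steps=1, n_epochs=5)
warmup_steps = int(steps * 0.1)  # warmup_proportion = 0.1
print(steps, warmup_steps)
```

Because the `DataLoader` also yields the final partial batch by default, the training loop may perform slightly more steps per epoch than this count assumes.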

## Set hyperparameters for cross-validation and run training

In [0]:
learning_rates = [5e-5, 3e-5, 2e-5]
batch_sizes = [16, 32]
nrepeats = 5
args["num_train_epochs"] = 10
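
The grid above gives 3 learning rates x 2 batch sizes x 5 repeats = 30 training runs of 10 epochs each. As a sketch, the same combinations can be enumerated with `itertools.product`, in the same order as nested loops over learning rate, batch size and repeat. The file-name pattern follows the one used when saving results in this notebook; nothing is trained or written here.

```python
from itertools import product

learning_rates = [5e-5, 3e-5, 2e-5]
batch_sizes = [16, 32]
nrepeats = 5

# One tuple (learning rate, batch size, repeat index) per training run
runs = list(product(learning_rates, batch_sizes, range(nrepeats)))
print(len(runs))

# File name that the first run would be saved under
lr, bs, n = runs[0]
first_name = f"results_train12_{lr}_{bs}_{n}_10epochs.pickle"
print(first_name)
```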

In [0]:
#The seed is not reset for each run, so each loop iteration draws different random numbers (note that args["seed"] is not passed to any random number generator in the code above)
for lr in learning_rates:
  args["learning_rate"] = lr
  
  for bs in batch_sizes:
    args["train_batch_size"] = bs
    
    for n in range(0, nrepeats):
      resultsdf, batch_losses = train()
      
      #May need to update save location here
      with open(SARC_POL+f"CVruns/results_train12_{lr}_{bs}_{n}_10epochs.pickle", 'wb') as handle:
        pickle.dump(resultsdf, handle)
        
      with open(SARC_POL+f"CVruns/batchlosses_train12_{lr}_{bs}_{n}_10epochs.pickle", 'wb') as handle:
        pickle.dump(batch_losses, handle)
      
      
  

03/05/2019 22:21:03 - INFO - __main__ -   device: cuda n_gpu: 1, distributed training: False, 16-bits training: False
03/05/2019 22:21:03 - INFO - pytorch_pretrained_bert.file_utils -   https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz not found in cache, downloading to /tmp/tmp3pzjqb3u
100%|██████████| 407873900/407873900 [00:14<00:00, 28678243.49B/s]
03/05/2019 22:21:18 - INFO - pytorch_pretrained_bert.file_utils -   copying /tmp/tmp3pzjqb3u to cache at /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
03/05/2019 22:21:19 - INFO - pytorch_pretrained_bert.file_utils -   creating metadata file for /root/.pytorch_pretrained_bert/9c41111e2de84547a463fd39217199738d1e3deb72d4fec4399e6e241983c6f0.ae3cef932725ca7a30cdcb93fc6e09150a55e2a130ec7af63975a16c153ae2ba
03/05/2019 22:21:19 - INFO - pytorch_pretrained_bert.file_utils -   removing temp file /tmp/tmp3