This notebook experiments with a zero-shot, sliding-window question-answering (QA) approach to classifying documents according to whether they give evidence that the submitting company has provided modern slavery training to its employees.

The motivation is to use transfer learning from models pre-trained to extract relevant answers (as a span) from a document (context) in order to automate the identification of which small subsets of the documents might be relevant to modern slavery training. These smaller subsets can then make the job of human-labelling additional documents more efficient or be fed into another model (perhaps a transformer trained for sequence classification) which can only handle a limited number of tokens.

The approach uses a pretrained QA model (one trained on SQuAD v2, so that it can return a "no span found" result) to ask questions of the documents. Since most documents in the dataset are longer than the model's maximum input length (512 tokens), a sliding-window approach is used: after the entire document is tokenized, the QA model is run on successive windows, each offset from the previous one by stride=128 tokens (about a quarter of the window size). All spans returned by the QA model are recorded in a new dataframe (df_with_segments.parquet). A notebook for visualizing the results of the sliding-window QA approach is available: 'QA results viewer.ipynb'
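The window arithmetic can be illustrated on its own, without a model or tokenizer. The following is a minimal sketch, assuming a document of 1000 tokens and room for 500 context tokens per window; the helper name `window_bounds` and those numbers are hypothetical and chosen only for illustration.

```python
def window_bounds(num_tokens, max_context_tokens, stride):
    """Return (start, end) token offsets for each window; end is exclusive."""
    # ceil((num_tokens - max_context_tokens) / stride) + 1 windows, and at least one
    extra = max(0, num_tokens - max_context_tokens)
    num_windows = -(-extra // stride) + 1
    bounds = []
    for j in range(num_windows):
        # the last window is clamped to the end of the document and
        # pulled back so that it still covers a full window where possible
        end = min(j * stride + max_context_tokens, num_tokens)
        start = max(0, end - max_context_tokens)
        bounds.append((start, end))
    return bounds


# hypothetical sizes: a 1000-token document, 500 context tokens per window
bounds = window_bounds(num_tokens=1000, max_context_tokens=500, stride=128)
for start, end in bounds:
    print(start, end)
```

With these assumed sizes the document is covered by five overlapping windows, and a document shorter than one window yields a single window spanning the whole text.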

Six questions are trialed to see which one(s) provide the best results:
 - 'Is there training provided?'
 - 'Is there training already in place?'
 - 'Has training been done?'
 - 'Is training planned?'
 - 'Is training in development?'
 - 'What kind of training is provided?'

Note: as this is a zero-shot approach, we can ignore the labels as we will not be doing any training. Therefore, the labeled (train) and unlabeled (test) data will be concatenated into a single input dataframe.

In [1]:
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import re
from datetime import datetime, timedelta

PyTorch version 1.6.0 available.
TensorFlow version 2.3.0 available.


In [2]:
# import the data, strip away the labels and combine into a single df
df_labeled=pd.read_csv('train (3).csv',index_col=0)
df_hidden=pd.read_csv('test (3).csv',index_col=0)
df_labeled['source']='labeled'
df_hidden['source']='hidden'
df = pd.concat([df_labeled[['source','TEXT']], 
                df_hidden[['source','TEXT']]],axis=0).reset_index()
df

Unnamed: 0,ID,source,TEXT
0,0,labeled,Modern Slavery Statement\n\nUa\n\n> Responsibi...
1,1,labeled,Burton's Biscuit Company (a trading name of Bu...
2,2,labeled,MODERN SLAVERY ACT STATEMENT\nOUR BUSINESS Zal...
3,3,labeled,MENU\nHOME\nU.K. MODERN SLAVERY ACT STATEMENT\...
4,4,labeled,Modern Slavery Act Statement\nIntroduction fro...
...,...,...,...
976,326,hidden,CECP Advisors LLP Modern Slavery Act Statement...
977,327,hidden,Modern Slavery Act Transparency Statement\n201...
978,328,hidden,MENU\n\n0333 2203 121\nBOOK A ROOM\n\nAnti Sla...
979,329,hidden,"We have placed cookies on your computer, as th..."


In [3]:
# any characters repeated more than 4 times will be shortened to 4 repetitions: 
# https://stackoverflow.com/questions/10072744/remove-repeating-characters-from-words
df['TEXT']=df['TEXT'].apply(lambda x: re.sub(r'(.)\1{4,}', r'\1\1\1\1', str(x)))
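
The effect of this substitution is easiest to see on a short string. Below is a small sketch using the same regular expression; the wrapper name `squeeze_repeats` and the sample text are made up for illustration.

```python
import re


def squeeze_repeats(text):
    # a character repeated 5 or more times in a row is cut down to 4 copies
    return re.sub(r'(.)\1{4,}', r'\1\1\1\1', text)


# made-up sample resembling a table-of-contents line with filler characters
sample = "Policy.......... page 3 -------- end!!"
cleaned = squeeze_repeats(sample)
print(cleaned)
```

Because `.` does not match a newline by default, runs of blank lines are left untouched by this pattern.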

In [4]:
# Model chosen based on SQuAD v2 leaderboards December 2020
model_name = 'ktrapeznikov/albert-xlarge-v2-squad-v2'
tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForQuestionAnswering.from_pretrained(model_name)
model.eval()

loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/ktrapeznikov/albert-xlarge-v2-squad-v2/config.json from cache at C:\Users\dhilg/.cache\torch\transformers\c63acbd2ffb1762d161c0c366bb4a0dd5312f615847b87d4cf7be001ca562cab.0aa9a4e13357b14219e56e90005cf95adfc4fbb59ad847974267550fef9c2f6f
Model config AlbertConfig {
  "architectures": [
    "AlbertForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 2,
  "classifier_dropout_prob": 0.1,
  "down_scale_factor": 1,
  "embedding_size": 128,
  "eos_token_id": 3,
  "gap_size": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 8192,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "albert",
  "net_structure_type": 0,
  "num_attention_heads": 16,
  "num_hidden_groups": 1,
  "num_hidden_layers": 24,
  "num_memory_blocks": 0,
  "output_past": true,
  "pad_toke

AlbertForQuestionAnswering(
  (albert): AlbertModel(
    (embeddings): AlbertEmbeddings(
      (word_embeddings): Embedding(30000, 128, padding_idx=0)
      (position_embeddings): Embedding(512, 128)
      (token_type_embeddings): Embedding(2, 128)
      (LayerNorm): LayerNorm((128,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): AlbertTransformer(
      (embedding_hidden_mapping_in): Linear(in_features=128, out_features=2048, bias=True)
      (albert_layer_groups): ModuleList(
        (0): AlbertLayerGroup(
          (albert_layers): ModuleList(
            (0): AlbertLayer(
              (full_layer_layer_norm): LayerNorm((2048,), eps=1e-12, elementwise_affine=True)
              (attention): AlbertAttention(
                (query): Linear(in_features=2048, out_features=2048, bias=True)
                (key): Linear(in_features=2048, out_features=2048, bias=True)
                (value): Linear(in_features=2048, out_features=

In [5]:
def classify_tokens(model, tokenizer, questions, tokens, max_batch_size=8, max_model_tokens=512, stride=64):
    num_tokens = tokens.size()[1]
    token_classes = torch.zeros((len(questions), num_tokens), dtype=torch.long)

    batch_details=[]
    for i, question in enumerate(questions):
        question_tokens = tokenizer(question, return_tensors='pt')['input_ids']
        num_question_tokens = question_tokens.size()[1]
        max_context_tokens = max_model_tokens - num_question_tokens - 1 # -1 for the [SEP] token that will be added to the end

        num_windows = max(1, -(-(num_tokens - max_context_tokens) // stride) + 1)
        
        for j in range(num_windows):
            end = min(j*stride + max_context_tokens, num_tokens)
            start = max(0, end - max_context_tokens)
            batch_details.append({'question number':i,
                                  'question tokens':question_tokens,
                                  'token start':start,
                                  'token end':end
                                 })
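            # run the batch when it is full, or when this is the last window of the last question
            # (i and j are zero-based, so the final indices are len(questions)-1 and num_windows-1)
            if len(batch_details) >= max_batch_size or (i==len(questions)-1 and j==num_windows-1):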
                token_classes = run_batch(model, tokens, token_classes, batch_details)
                batch_details = []
    
    return token_classes

def run_batch(model, tokens, token_classes, batch_details):
    inputs = batch_inputs(tokens, batch_details)
    
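    # inference only: skip gradient tracking to save memory and time
    with torch.no_grad():
        outputs = model(**inputs)
    # index rather than unpack, so this works whether the model returns a tuple or a ModelOutput
    answer_start_logits, answer_end_logits = outputs[0], outputs[1]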
    
    combined_logits = torch.cat([answer_start_logits.unsqueeze(0), answer_end_logits.unsqueeze(0)])
    span_input_ids = torch.max(combined_logits,2)[1]
    span_input_ids[1] += 1 # need to add 1 to end token of span
    
    # need to slide the token ids to remove the question tokens from the front of the tensor:
    input_question_lengths = torch.tensor([instance['question tokens'].size()[1] for instance in batch_details], 
                                          dtype=torch.long)
    span_token_ids = torch.max(span_input_ids - input_question_lengths, 
                               torch.zeros_like(span_input_ids)) # if no span is found, the model will point to the first 
                                                                 # token of the question. After subtracting the question
                                                                 # length, this would be negative, so set a floor of zero.
    for i, instance in enumerate(batch_details):
        # if a span was found, span_token_ids of the start token will be > 0
        if span_token_ids[0,i] > 0:
            span_token_start = span_token_ids[0,i] + instance['token start']
            span_token_end = span_token_ids[1,i] + instance['token start']
            
            # span_token_end is already exclusive (1 was added to the end index above)
            token_classes[instance['question number'],span_token_start:span_token_end] = 1
    
    return token_classes
    
def batch_inputs(tokens, batch_details):
    max_tokens = max_batch_tokens(batch_details)
    batch_size = len(batch_details)
    
    inputs={'input_ids':torch.zeros(batch_size, max_tokens, dtype=torch.long),
            'token_type_ids':torch.zeros(batch_size, max_tokens, dtype=torch.long),
            'attention_mask':torch.zeros(batch_size, max_tokens, dtype=torch.long)
           }
    
    for i, instance in enumerate(batch_details):
        question_tokens = instance['question tokens']
        context_tokens = tokens[:, instance['token start']:instance['token end']]
        
        question_length = question_tokens.size()[1]
        context_length = context_tokens.size()[1]
        
        inputs['input_ids'][i, :question_length]=question_tokens.squeeze(0)
        inputs['input_ids'][i, question_length:question_length+context_length]=context_tokens.squeeze(0)
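        # add final [SEP] token right after the context (same as final token of question tokens);
        # indexing with -1 would misplace it for instances shorter than the longest one in the batch
        inputs['input_ids'][i, question_length+context_length]=question_tokens.squeeze(0)[-1]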
        
        inputs['token_type_ids'][i, question_length:] = 1
        inputs['attention_mask'][i, :question_length+context_length+1] = 1
    return inputs

def max_batch_tokens(batch_details):
    max_tokens = 0
    for instance in batch_details:
        num_question_tokens = instance['question tokens'].size()[1] - 2 # don't count the [CLS] and [SEP] tokens
        num_context_tokens = instance['token end'] - instance['token start']
        instance_tokens = 1 + num_question_tokens + 1 + num_context_tokens + 1
        if instance_tokens > max_tokens:
            max_tokens = instance_tokens
    return max_tokens

def process_text(text, model, tokenizer, questions, max_batch_size=8, max_model_tokens=512, stride=64):
    tokens = tokenizer(text, return_tensors='pt')['input_ids'][:,1:-1] # drop first and last tokens ([CLS] and [SEP])
    
    token_classes = classify_tokens(model, tokenizer, questions, tokens, max_batch_size, max_model_tokens, stride)
    
    filtered_text=[]
    for i, question in enumerate(questions):
        filtered_text.append({'question':question,
                              'text segments':token_classes_to_str(tokenizer, tokens, token_classes[i])
                             })
    
    return filtered_text

def token_classes_to_str(tokenizer, tokens, token_classes):
    spans = identify_distinct_spans(token_classes)
    
    text_segments = []
    for span_start, span_end in spans:
        text_segments.append(tokens_to_str(tokenizer, tokens, span_start, span_end))
        
    return text_segments

def tokens_to_str(tokenizer, tokens, span_start, span_end):
    return tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(tokens[0][span_start:span_end+1]))
        
def identify_distinct_spans(token_classes):
    spans=[]
    last_token_class=0
    for i in range(token_classes.size()[0]):
        curr_token_class = token_classes[i]
        if last_token_class != curr_token_class:
            if last_token_class == 0:
                span_start = i
            else:
                span_end = i-1
                spans.append((span_start, span_end))
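        last_token_class = curr_token_class
    
    # close a span that runs through the final token of the document
    if last_token_class != 0:
        spans.append((span_start, token_classes.size()[0]-1))
    
    return spans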
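
Because windows overlap, the same passage can be returned from several windows. Marking hits in a per-token 0/1 mask and then reading off contiguous runs merges those overlaps into distinct spans. The following is a minimal sketch of that idea on plain Python lists rather than tensors; `mask_to_spans` and the example hits are hypothetical.

```python
def mask_to_spans(mask):
    """Convert a 0/1 list into inclusive (start, end) index pairs."""
    spans = []
    start = None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i                      # a new run begins
        elif not flag and start is not None:
            spans.append((start, i - 1))   # the run ended on the previous token
            start = None
    if start is not None:
        spans.append((start, len(mask) - 1))  # run reaches the final token
    return spans


# made-up hits from three windows; the first two overlap
hits = [(3, 6), (5, 9), (14, 15)]
mask = [0] * 16
for s, e in hits:
    for k in range(s, e + 1):
        mask[k] = 1

spans = mask_to_spans(mask)
print(spans)
```

The two overlapping hits collapse into one span, so each passage is reported once per question regardless of how many windows found it.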

In [6]:
def process_row(row_id, model, tokenizer, questions, max_batch_size=8, max_model_tokens=512, stride=64):
    filtered_text = process_text(text=df.loc[row_id,'TEXT'], 
                                 model=model, 
                                 tokenizer=tokenizer, 
                                 questions=questions, 
                                 max_batch_size=max_batch_size, 
                                 max_model_tokens=max_model_tokens, 
                                 stride=stride)
    for question in filtered_text:
        col_header = question['question']
        text_segments = question['text segments']
        for text_segment in text_segments:
            df.loc[row_id,col_header].append(text_segment)

In [7]:
questions=['Is there training provided?', 
           'Is there training already in place?',
           'Has training been done?',
           'Is training planned?',
           'Is training in development?',
           'What kind of training is provided?'
          ]

for question in questions:
    df[question]=[[] for _ in range(len(df))]

In [None]:
start_time=datetime.now()

for i in range(df.shape[0]):
    row_start = datetime.now()
    process_row(row_id=i, 
                model=model, 
                tokenizer=tokenizer, 
                questions=questions, 
                max_batch_size=2, 
                max_model_tokens=512, 
                stride=128)
    df.to_parquet('df_with_segments.parquet')
    print(f'row {i} finished at {datetime.now()}. Row time = {datetime.now() - row_start}. Total time elapsed = {datetime.now() - start_time}')
    
df

Token indices sequence length is longer than the specified maximum sequence length for this model (870 > 512). Running this sequence through the model will result in indexing errors


row 0 finished at 2020-12-11 13:23:36.092920. Row time = 0:04:39.406075. Total time elapsed = 0:04:39.422217


Token indices sequence length is longer than the specified maximum sequence length for this model (1259 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1690 > 512). Running this sequence through the model will result in indexing errors


row 1 finished at 2020-12-11 13:29:26.767254. Row time = 0:05:50.567341. Total time elapsed = 0:10:30.081412


Token indices sequence length is longer than the specified maximum sequence length for this model (4146 > 512). Running this sequence through the model will result in indexing errors


row 2 finished at 2020-12-11 13:39:23.691940. Row time = 0:09:56.924686. Total time elapsed = 0:20:27.006098


Token indices sequence length is longer than the specified maximum sequence length for this model (746 > 512). Running this sequence through the model will result in indexing errors


row 3 finished at 2020-12-11 14:09:51.656265. Row time = 0:30:27.963362. Total time elapsed = 0:50:54.970423


Token indices sequence length is longer than the specified maximum sequence length for this model (723 > 512). Running this sequence through the model will result in indexing errors


row 4 finished at 2020-12-11 14:12:47.639705. Row time = 0:02:55.982467. Total time elapsed = 0:53:50.953863
row 5 finished at 2020-12-11 14:15:48.211806. Row time = 0:03:00.572101. Total time elapsed = 0:56:51.525964


Token indices sequence length is longer than the specified maximum sequence length for this model (10428 > 512). Running this sequence through the model will result in indexing errors


row 6 finished at 2020-12-11 15:43:03.934920. Row time = 1:27:15.723114. Total time elapsed = 2:24:07.249078


Token indices sequence length is longer than the specified maximum sequence length for this model (2390 > 512). Running this sequence through the model will result in indexing errors


row 7 finished at 2020-12-11 15:44:23.530051. Row time = 0:01:19.594151. Total time elapsed = 2:25:26.844209


Token indices sequence length is longer than the specified maximum sequence length for this model (1593 > 512). Running this sequence through the model will result in indexing errors


row 8 finished at 2020-12-11 16:04:42.621350. Row time = 0:20:19.090298. Total time elapsed = 2:45:45.935508


Token indices sequence length is longer than the specified maximum sequence length for this model (1217 > 512). Running this sequence through the model will result in indexing errors


row 9 finished at 2020-12-11 16:15:32.542328. Row time = 0:10:49.920978. Total time elapsed = 2:56:35.856486


Token indices sequence length is longer than the specified maximum sequence length for this model (4410 > 512). Running this sequence through the model will result in indexing errors


row 10 finished at 2020-12-11 16:23:59.585311. Row time = 0:08:27.042983. Total time elapsed = 3:05:02.899469


Token indices sequence length is longer than the specified maximum sequence length for this model (590 > 512). Running this sequence through the model will result in indexing errors


row 11 finished at 2020-12-11 17:02:23.627688. Row time = 0:38:24.041406. Total time elapsed = 3:43:26.941846


Token indices sequence length is longer than the specified maximum sequence length for this model (1169 > 512). Running this sequence through the model will result in indexing errors


row 12 finished at 2020-12-11 17:05:30.966617. Row time = 0:03:07.337932. Total time elapsed = 3:46:34.280775
row 13 finished at 2020-12-11 17:16:44.760919. Row time = 0:11:13.789747. Total time elapsed = 3:57:48.075077


Token indices sequence length is longer than the specified maximum sequence length for this model (2571 > 512). Running this sequence through the model will result in indexing errors


row 14 finished at 2020-12-11 17:17:39.949863. Row time = 0:00:55.188944. Total time elapsed = 3:58:43.264021


Token indices sequence length is longer than the specified maximum sequence length for this model (1528 > 512). Running this sequence through the model will result in indexing errors


row 15 finished at 2020-12-11 17:37:13.558455. Row time = 0:19:33.607619. Total time elapsed = 4:18:16.872613


Token indices sequence length is longer than the specified maximum sequence length for this model (1423 > 512). Running this sequence through the model will result in indexing errors


row 16 finished at 2020-12-11 17:48:20.672066. Row time = 0:11:07.113611. Total time elapsed = 4:29:23.986224


Token indices sequence length is longer than the specified maximum sequence length for this model (617 > 512). Running this sequence through the model will result in indexing errors


row 17 finished at 2020-12-11 18:02:47.742642. Row time = 0:14:27.070576. Total time elapsed = 4:43:51.056800


Token indices sequence length is longer than the specified maximum sequence length for this model (731 > 512). Running this sequence through the model will result in indexing errors


row 18 finished at 2020-12-11 18:05:52.632819. Row time = 0:03:04.890177. Total time elapsed = 4:46:55.946977


Token indices sequence length is longer than the specified maximum sequence length for this model (801 > 512). Running this sequence through the model will result in indexing errors


row 19 finished at 2020-12-11 18:10:15.918001. Row time = 0:04:23.285182. Total time elapsed = 4:51:19.232159


Token indices sequence length is longer than the specified maximum sequence length for this model (2634 > 512). Running this sequence through the model will result in indexing errors


row 20 finished at 2020-12-11 18:15:52.504273. Row time = 0:05:36.585299. Total time elapsed = 4:56:55.818431


Token indices sequence length is longer than the specified maximum sequence length for this model (545 > 512). Running this sequence through the model will result in indexing errors


row 21 finished at 2020-12-11 18:40:23.151351. Row time = 0:24:30.646077. Total time elapsed = 5:21:26.465509


Token indices sequence length is longer than the specified maximum sequence length for this model (709 > 512). Running this sequence through the model will result in indexing errors


row 22 finished at 2020-12-11 18:42:43.356001. Row time = 0:02:20.204650. Total time elapsed = 5:23:46.670159


Token indices sequence length is longer than the specified maximum sequence length for this model (1213 > 512). Running this sequence through the model will result in indexing errors


row 23 finished at 2020-12-11 18:45:43.298946. Row time = 0:02:59.942945. Total time elapsed = 5:26:46.613104


Token indices sequence length is longer than the specified maximum sequence length for this model (868 > 512). Running this sequence through the model will result in indexing errors


row 24 finished at 2020-12-11 18:53:03.926073. Row time = 0:07:20.626124. Total time elapsed = 5:34:07.240231


Token indices sequence length is longer than the specified maximum sequence length for this model (685 > 512). Running this sequence through the model will result in indexing errors


row 25 finished at 2020-12-11 18:57:22.597876. Row time = 0:04:18.671803. Total time elapsed = 5:38:25.912034
row 26 finished at 2020-12-11 19:01:46.145388. Row time = 0:04:23.547512. Total time elapsed = 5:42:49.459546


Token indices sequence length is longer than the specified maximum sequence length for this model (1372 > 512). Running this sequence through the model will result in indexing errors


row 27 finished at 2020-12-11 19:02:55.615131. Row time = 0:01:09.469743. Total time elapsed = 5:43:58.929289


Token indices sequence length is longer than the specified maximum sequence length for this model (1159 > 512). Running this sequence through the model will result in indexing errors


row 28 finished at 2020-12-11 19:15:47.525204. Row time = 0:12:51.909082. Total time elapsed = 5:56:50.839362


Token indices sequence length is longer than the specified maximum sequence length for this model (1260 > 512). Running this sequence through the model will result in indexing errors


row 29 finished at 2020-12-11 19:25:58.397835. Row time = 0:10:10.872631. Total time elapsed = 6:07:01.711993


Token indices sequence length is longer than the specified maximum sequence length for this model (1174 > 512). Running this sequence through the model will result in indexing errors


row 30 finished at 2020-12-11 19:36:03.426142. Row time = 0:10:05.029279. Total time elapsed = 6:17:06.741272


Token indices sequence length is longer than the specified maximum sequence length for this model (1074 > 512). Running this sequence through the model will result in indexing errors


row 31 finished at 2020-12-11 19:46:08.305195. Row time = 0:10:04.878081. Total time elapsed = 6:27:11.619353


Token indices sequence length is longer than the specified maximum sequence length for this model (5282 > 512). Running this sequence through the model will result in indexing errors


row 32 finished at 2020-12-11 19:54:23.631190. Row time = 0:08:15.325033. Total time elapsed = 6:35:26.945348
row 33 finished at 2020-12-11 21:00:22.094133. Row time = 1:05:58.462943. Total time elapsed = 7:41:25.408291


Token indices sequence length is longer than the specified maximum sequence length for this model (1479 > 512). Running this sequence through the model will result in indexing errors


row 34 finished at 2020-12-11 21:01:51.160775. Row time = 0:01:29.065642. Total time elapsed = 7:42:54.474933


Token indices sequence length is longer than the specified maximum sequence length for this model (1133 > 512). Running this sequence through the model will result in indexing errors


row 35 finished at 2020-12-11 21:19:01.343342. Row time = 0:17:10.181573. Total time elapsed = 8:00:04.657500


Token indices sequence length is longer than the specified maximum sequence length for this model (870 > 512). Running this sequence through the model will result in indexing errors


row 36 finished at 2020-12-11 21:31:09.743296. Row time = 0:12:08.399954. Total time elapsed = 8:12:13.057454


Token indices sequence length is longer than the specified maximum sequence length for this model (682 > 512). Running this sequence through the model will result in indexing errors


row 37 finished at 2020-12-11 21:38:45.797230. Row time = 0:07:36.033324. Total time elapsed = 8:19:49.111388


Token indices sequence length is longer than the specified maximum sequence length for this model (698 > 512). Running this sequence through the model will result in indexing errors


row 38 finished at 2020-12-11 21:44:35.607063. Row time = 0:05:49.809833. Total time elapsed = 8:25:38.921221


Token indices sequence length is longer than the specified maximum sequence length for this model (1191 > 512). Running this sequence through the model will result in indexing errors


row 39 finished at 2020-12-11 21:50:16.454091. Row time = 0:05:40.847028. Total time elapsed = 8:31:19.768249


Token indices sequence length is longer than the specified maximum sequence length for this model (1774 > 512). Running this sequence through the model will result in indexing errors


row 40 finished at 2020-12-11 22:04:20.449046. Row time = 0:14:03.993962. Total time elapsed = 8:45:23.763204
row 41 finished at 2020-12-11 22:26:27.052201. Row time = 0:22:06.602154. Total time elapsed = 9:07:30.366359


Token indices sequence length is longer than the specified maximum sequence length for this model (715 > 512). Running this sequence through the model will result in indexing errors


row 42 finished at 2020-12-11 22:27:33.017024. Row time = 0:01:05.930266. Total time elapsed = 9:08:36.331182


Token indices sequence length is longer than the specified maximum sequence length for this model (1539 > 512). Running this sequence through the model will result in indexing errors


row 43 finished at 2020-12-11 22:33:03.411308. Row time = 0:05:30.394284. Total time elapsed = 9:14:06.725466


Token indices sequence length is longer than the specified maximum sequence length for this model (1173 > 512). Running this sequence through the model will result in indexing errors


row 44 finished at 2020-12-11 22:46:06.960249. Row time = 0:13:03.548941. Total time elapsed = 9:27:10.274407
row 45 finished at 2020-12-11 22:53:36.847106. Row time = 0:07:29.885853. Total time elapsed = 9:34:40.161264


Token indices sequence length is longer than the specified maximum sequence length for this model (1995 > 512). Running this sequence through the model will result in indexing errors


row 46 finished at 2020-12-11 22:53:48.444753. Row time = 0:00:11.597647. Total time elapsed = 9:34:51.758911
row 47 finished at 2020-12-11 23:09:18.324254. Row time = 0:15:29.878496. Total time elapsed = 9:50:21.638412


Token indices sequence length is longer than the specified maximum sequence length for this model (745 > 512). Running this sequence through the model will result in indexing errors


row 48 finished at 2020-12-11 23:10:38.606398. Row time = 0:01:20.281143. Total time elapsed = 9:51:41.921554


Token indices sequence length is longer than the specified maximum sequence length for this model (1282 > 512). Running this sequence through the model will result in indexing errors


row 49 finished at 2020-12-11 23:15:28.520024. Row time = 0:04:49.911627. Total time elapsed = 9:56:31.834182


Token indices sequence length is longer than the specified maximum sequence length for this model (1201 > 512). Running this sequence through the model will result in indexing errors


row 50 finished at 2020-12-11 23:27:13.267484. Row time = 0:11:44.746491. Total time elapsed = 10:08:16.581642


Token indices sequence length is longer than the specified maximum sequence length for this model (1246 > 512). Running this sequence through the model will result in indexing errors


row 51 finished at 2020-12-11 23:36:25.151194. Row time = 0:09:11.883710. Total time elapsed = 10:17:28.465352


Token indices sequence length is longer than the specified maximum sequence length for this model (589 > 512). Running this sequence through the model will result in indexing errors


row 52 finished at 2020-12-11 23:45:31.398995. Row time = 0:09:06.246802. Total time elapsed = 10:26:34.713153


Token indices sequence length is longer than the specified maximum sequence length for this model (2122 > 512). Running this sequence through the model will result in indexing errors


row 53 finished at 2020-12-11 23:48:10.841440. Row time = 0:02:39.441449. Total time elapsed = 10:29:14.155598


Token indices sequence length is longer than the specified maximum sequence length for this model (2117 > 512). Running this sequence through the model will result in indexing errors


row 54 finished at 2020-12-12 00:06:24.646199. Row time = 0:18:13.803755. Total time elapsed = 10:47:27.960357
row 55 finished at 2020-12-12 00:25:13.548944. Row time = 0:18:48.901776. Total time elapsed = 11:06:16.863102


Token indices sequence length is longer than the specified maximum sequence length for this model (1751 > 512). Running this sequence through the model will result in indexing errors


row 56 finished at 2020-12-12 00:25:36.484904. Row time = 0:00:22.934961. Total time elapsed = 11:06:39.799062


Token indices sequence length is longer than the specified maximum sequence length for this model (1127 > 512). Running this sequence through the model will result in indexing errors


row 57 finished at 2020-12-12 00:40:31.903388. Row time = 0:14:55.418484. Total time elapsed = 11:21:35.217546


Token indices sequence length is longer than the specified maximum sequence length for this model (702 > 512). Running this sequence through the model will result in indexing errors


row 58 finished at 2020-12-12 00:48:41.177361. Row time = 0:08:09.273973. Total time elapsed = 11:29:44.491519


Token indices sequence length is longer than the specified maximum sequence length for this model (1705 > 512). Running this sequence through the model will result in indexing errors


row 59 finished at 2020-12-12 00:52:35.960213. Row time = 0:03:54.781859. Total time elapsed = 11:33:39.274371


Token indices sequence length is longer than the specified maximum sequence length for this model (1413 > 512). Running this sequence through the model will result in indexing errors


row 60 finished at 2020-12-12 01:06:40.559340. Row time = 0:14:04.598127. Total time elapsed = 11:47:43.873498


Token indices sequence length is longer than the specified maximum sequence length for this model (512 > 512). Running this sequence through the model will result in indexing errors


row 61 finished at 2020-12-12 01:18:42.911806. Row time = 0:12:02.352466. Total time elapsed = 11:59:46.225964


Token indices sequence length is longer than the specified maximum sequence length for this model (944 > 512). Running this sequence through the model will result in indexing errors


row 62 finished at 2020-12-12 01:21:25.226232. Row time = 0:02:42.314426. Total time elapsed = 12:02:28.540390
row 63 finished at 2020-12-12 01:28:17.531437. Row time = 0:06:52.305205. Total time elapsed = 12:09:20.845595


Token indices sequence length is longer than the specified maximum sequence length for this model (1025 > 512). Running this sequence through the model will result in indexing errors


row 64 finished at 2020-12-12 01:29:36.626644. Row time = 0:01:19.094209. Total time elapsed = 12:10:39.940802


Token indices sequence length is longer than the specified maximum sequence length for this model (2491 > 512). Running this sequence through the model will result in indexing errors


row 65 finished at 2020-12-12 01:37:31.917365. Row time = 0:07:55.289722. Total time elapsed = 12:18:35.231523
row 66 finished at 2020-12-12 01:59:12.613177. Row time = 0:21:40.694844. Total time elapsed = 12:40:15.927335


Token indices sequence length is longer than the specified maximum sequence length for this model (531 > 512). Running this sequence through the model will result in indexing errors


row 67 finished at 2020-12-12 01:59:38.801161. Row time = 0:00:26.187984. Total time elapsed = 12:40:42.115319


Token indices sequence length is longer than the specified maximum sequence length for this model (1411 > 512). Running this sequence through the model will result in indexing errors


row 68 finished at 2020-12-12 02:02:20.984079. Row time = 0:02:42.182918. Total time elapsed = 12:43:24.298237


Token indices sequence length is longer than the specified maximum sequence length for this model (866 > 512). Running this sequence through the model will result in indexing errors


row 69 finished at 2020-12-12 02:14:06.445882. Row time = 0:11:45.460801. Total time elapsed = 12:55:09.760040


Token indices sequence length is longer than the specified maximum sequence length for this model (1378 > 512). Running this sequence through the model will result in indexing errors


row 70 finished at 2020-12-12 02:19:26.710891. Row time = 0:05:20.265009. Total time elapsed = 13:00:30.025049


Token indices sequence length is longer than the specified maximum sequence length for this model (684 > 512). Running this sequence through the model will result in indexing errors


row 71 finished at 2020-12-12 02:29:56.519087. Row time = 0:10:29.808196. Total time elapsed = 13:10:59.833245


Token indices sequence length is longer than the specified maximum sequence length for this model (2065 > 512). Running this sequence through the model will result in indexing errors


row 72 finished at 2020-12-12 02:33:55.150333. Row time = 0:03:58.630247. Total time elapsed = 13:14:58.464491


Token indices sequence length is longer than the specified maximum sequence length for this model (1180 > 512). Running this sequence through the model will result in indexing errors


row 73 finished at 2020-12-12 02:52:12.626841. Row time = 0:18:17.475534. Total time elapsed = 13:33:15.940999


Token indices sequence length is longer than the specified maximum sequence length for this model (715 > 512). Running this sequence through the model will result in indexing errors


row 74 finished at 2020-12-12 03:01:23.491527. Row time = 0:09:10.864686. Total time elapsed = 13:42:26.805685


Token indices sequence length is longer than the specified maximum sequence length for this model (2031 > 512). Running this sequence through the model will result in indexing errors


row 75 finished at 2020-12-12 03:05:22.240456. Row time = 0:03:58.747933. Total time elapsed = 13:46:25.554614


Token indices sequence length is longer than the specified maximum sequence length for this model (2552 > 512). Running this sequence through the model will result in indexing errors


row 76 finished at 2020-12-12 03:22:17.510521. Row time = 0:16:55.270065. Total time elapsed = 14:03:20.824679


Token indices sequence length is longer than the specified maximum sequence length for this model (1363 > 512). Running this sequence through the model will result in indexing errors


row 77 finished at 2020-12-12 03:46:05.415723. Row time = 0:23:47.905202. Total time elapsed = 14:27:08.729881


Token indices sequence length is longer than the specified maximum sequence length for this model (1307 > 512). Running this sequence through the model will result in indexing errors


row 78 finished at 2020-12-12 03:56:11.834944. Row time = 0:10:06.418257. Total time elapsed = 14:37:15.149102
row 79 finished at 2020-12-12 04:06:38.333249. Row time = 0:10:26.498305. Total time elapsed = 14:47:41.647407
row 80 finished at 2020-12-12 04:07:46.425856. Row time = 0:01:08.092607. Total time elapsed = 14:48:49.740014


Token indices sequence length is longer than the specified maximum sequence length for this model (1234 > 512). Running this sequence through the model will result in indexing errors


row 81 finished at 2020-12-12 04:08:52.761846. Row time = 0:01:06.335990. Total time elapsed = 14:49:56.076004
row 82 finished at 2020-12-12 04:18:01.530790. Row time = 0:09:08.768944. Total time elapsed = 14:59:04.844948


Token indices sequence length is longer than the specified maximum sequence length for this model (877 > 512). Running this sequence through the model will result in indexing errors


row 83 finished at 2020-12-12 04:19:15.100520. Row time = 0:01:13.569730. Total time elapsed = 15:00:18.414678


Token indices sequence length is longer than the specified maximum sequence length for this model (1352 > 512). Running this sequence through the model will result in indexing errors


row 84 finished at 2020-12-12 04:24:32.734221. Row time = 0:05:17.632731. Total time elapsed = 15:05:36.048379


Token indices sequence length is longer than the specified maximum sequence length for this model (619 > 512). Running this sequence through the model will result in indexing errors


row 85 finished at 2020-12-12 04:35:02.674629. Row time = 0:10:29.939409. Total time elapsed = 15:16:05.988787


Token indices sequence length is longer than the specified maximum sequence length for this model (1049 > 512). Running this sequence through the model will result in indexing errors


row 86 finished at 2020-12-12 04:37:42.312424. Row time = 0:02:39.637795. Total time elapsed = 15:18:45.626582


Token indices sequence length is longer than the specified maximum sequence length for this model (720 > 512). Running this sequence through the model will result in indexing errors


row 87 finished at 2020-12-12 04:45:37.588462. Row time = 0:07:55.276038. Total time elapsed = 15:26:40.902620


Token indices sequence length is longer than the specified maximum sequence length for this model (1311 > 512). Running this sequence through the model will result in indexing errors


row 88 finished at 2020-12-12 04:49:36.412062. Row time = 0:03:58.823600. Total time elapsed = 15:30:39.726220


Token indices sequence length is longer than the specified maximum sequence length for this model (1537 > 512). Running this sequence through the model will result in indexing errors


row 89 finished at 2020-12-12 05:00:02.355086. Row time = 0:10:25.943024. Total time elapsed = 15:41:05.669244


Token indices sequence length is longer than the specified maximum sequence length for this model (3122 > 512). Running this sequence through the model will result in indexing errors


row 90 finished at 2020-12-12 05:13:05.491327. Row time = 0:13:03.136241. Total time elapsed = 15:54:08.805485


Token indices sequence length is longer than the specified maximum sequence length for this model (841 > 512). Running this sequence through the model will result in indexing errors


row 91 finished at 2020-12-12 05:41:42.550561. Row time = 0:28:37.059234. Total time elapsed = 16:22:45.864719
row 92 finished at 2020-12-12 05:46:56.138168. Row time = 0:05:13.586605. Total time elapsed = 16:27:59.452326
row 93 finished at 2020-12-12 05:47:42.083597. Row time = 0:00:45.945429. Total time elapsed = 16:28:45.397755


Token indices sequence length is longer than the specified maximum sequence length for this model (7404 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (865 > 512). Running this sequence through the model will result in indexing errors


row 94 finished at 2020-12-12 06:59:10.853710. Row time = 1:11:28.769142. Total time elapsed = 17:40:14.167868


Token indices sequence length is longer than the specified maximum sequence length for this model (2182 > 512). Running this sequence through the model will result in indexing errors


row 95 finished at 2020-12-12 07:04:26.380881. Row time = 0:05:15.528141. Total time elapsed = 17:45:29.696009


Token indices sequence length is longer than the specified maximum sequence length for this model (652 > 512). Running this sequence through the model will result in indexing errors


row 96 finished at 2020-12-12 07:24:01.781203. Row time = 0:19:35.399352. Total time elapsed = 18:05:05.095361


Token indices sequence length is longer than the specified maximum sequence length for this model (2212 > 512). Running this sequence through the model will result in indexing errors


row 97 finished at 2020-12-12 07:28:01.677397. Row time = 0:03:59.896194. Total time elapsed = 18:09:04.991555
row 98 finished at 2020-12-12 07:47:35.050405. Row time = 0:19:33.373008. Total time elapsed = 18:28:38.364563
row 99 finished at 2020-12-12 07:48:10.175908. Row time = 0:00:35.124501. Total time elapsed = 18:29:13.490066


Token indices sequence length is longer than the specified maximum sequence length for this model (1061 > 512). Running this sequence through the model will result in indexing errors


row 100 finished at 2020-12-12 07:49:17.767300. Row time = 0:01:07.591392. Total time elapsed = 18:30:21.081458


Token indices sequence length is longer than the specified maximum sequence length for this model (1591 > 512). Running this sequence through the model will result in indexing errors


row 101 finished at 2020-12-12 07:56:49.538948. Row time = 0:07:31.771648. Total time elapsed = 18:37:52.853106
row 102 finished at 2020-12-12 08:09:36.275027. Row time = 0:12:46.736079. Total time elapsed = 18:50:39.589185


Token indices sequence length is longer than the specified maximum sequence length for this model (514 > 512). Running this sequence through the model will result in indexing errors


row 103 finished at 2020-12-12 08:09:53.581768. Row time = 0:00:17.306741. Total time elapsed = 18:50:56.895926


Token indices sequence length is longer than the specified maximum sequence length for this model (2101 > 512). Running this sequence through the model will result in indexing errors


row 104 finished at 2020-12-12 08:12:33.312176. Row time = 0:02:39.729419. Total time elapsed = 18:53:36.626334


Token indices sequence length is longer than the specified maximum sequence length for this model (589 > 512). Running this sequence through the model will result in indexing errors


row 105 finished at 2020-12-12 08:30:35.049377. Row time = 0:18:01.737201. Total time elapsed = 19:11:38.363535


Token indices sequence length is longer than the specified maximum sequence length for this model (1262 > 512). Running this sequence through the model will result in indexing errors


row 106 finished at 2020-12-12 08:33:12.495731. Row time = 0:02:37.445353. Total time elapsed = 19:14:15.809889


Token indices sequence length is longer than the specified maximum sequence length for this model (1019 > 512). Running this sequence through the model will result in indexing errors


row 107 finished at 2020-12-12 08:42:15.909757. Row time = 0:09:03.413029. Total time elapsed = 19:23:19.223915


Token indices sequence length is longer than the specified maximum sequence length for this model (943 > 512). Running this sequence through the model will result in indexing errors


row 108 finished at 2020-12-12 08:48:58.453482. Row time = 0:06:42.543725. Total time elapsed = 19:30:01.767640


Token indices sequence length is longer than the specified maximum sequence length for this model (695 > 512). Running this sequence through the model will result in indexing errors


row 109 finished at 2020-12-12 08:53:48.062839. Row time = 0:04:49.609357. Total time elapsed = 19:34:51.376997


Token indices sequence length is longer than the specified maximum sequence length for this model (1188 > 512). Running this sequence through the model will result in indexing errors


row 110 finished at 2020-12-12 08:56:42.194318. Row time = 0:02:54.130474. Total time elapsed = 19:37:45.508476


Token indices sequence length is longer than the specified maximum sequence length for this model (726 > 512). Running this sequence through the model will result in indexing errors


row 111 finished at 2020-12-12 09:03:24.354220. Row time = 0:06:42.159902. Total time elapsed = 19:44:27.668378


Token indices sequence length is longer than the specified maximum sequence length for this model (1148 > 512). Running this sequence through the model will result in indexing errors


row 112 finished at 2020-12-12 09:07:22.082138. Row time = 0:03:57.726955. Total time elapsed = 19:48:25.396296


Token indices sequence length is longer than the specified maximum sequence length for this model (865 > 512). Running this sequence through the model will result in indexing errors


row 113 finished at 2020-12-12 09:17:48.422874. Row time = 0:10:26.340736. Total time elapsed = 19:58:51.737032


Token indices sequence length is longer than the specified maximum sequence length for this model (589 > 512). Running this sequence through the model will result in indexing errors


row 114 finished at 2020-12-12 09:23:39.656805. Row time = 0:05:51.233931. Total time elapsed = 20:04:42.970963
row 115 finished at 2020-12-12 09:26:34.611247. Row time = 0:02:54.954442. Total time elapsed = 20:07:37.925405


Token indices sequence length is longer than the specified maximum sequence length for this model (778 > 512). Running this sequence through the model will result in indexing errors


row 116 finished at 2020-12-12 09:27:14.558481. Row time = 0:00:39.947234. Total time elapsed = 20:08:17.872639


Token indices sequence length is longer than the specified maximum sequence length for this model (771 > 512). Running this sequence through the model will result in indexing errors


row 117 finished at 2020-12-12 09:32:40.473626. Row time = 0:05:25.915145. Total time elapsed = 20:13:43.787784


Token indices sequence length is longer than the specified maximum sequence length for this model (998 > 512). Running this sequence through the model will result in indexing errors


row 118 finished at 2020-12-12 09:38:01.743581. Row time = 0:05:21.269955. Total time elapsed = 20:19:05.057739


Token indices sequence length is longer than the specified maximum sequence length for this model (638 > 512). Running this sequence through the model will result in indexing errors


row 119 finished at 2020-12-12 09:44:41.324736. Row time = 0:06:39.580154. Total time elapsed = 20:25:44.638894


Token indices sequence length is longer than the specified maximum sequence length for this model (667 > 512). Running this sequence through the model will result in indexing errors


row 120 finished at 2020-12-12 09:48:40.708806. Row time = 0:03:59.384070. Total time elapsed = 20:29:44.022964


Token indices sequence length is longer than the specified maximum sequence length for this model (2761 > 512). Running this sequence through the model will result in indexing errors


row 121 finished at 2020-12-12 09:52:37.651760. Row time = 0:03:56.942954. Total time elapsed = 20:33:40.965918


Token indices sequence length is longer than the specified maximum sequence length for this model (856 > 512). Running this sequence through the model will result in indexing errors


row 122 finished at 2020-12-12 10:17:09.825712. Row time = 0:24:32.173952. Total time elapsed = 20:58:13.139870


Token indices sequence length is longer than the specified maximum sequence length for this model (1464 > 512). Running this sequence through the model will result in indexing errors


row 123 finished at 2020-12-12 10:22:18.189028. Row time = 0:05:08.363316. Total time elapsed = 21:03:21.503186


Token indices sequence length is longer than the specified maximum sequence length for this model (676 > 512). Running this sequence through the model will result in indexing errors


row 124 finished at 2020-12-12 10:33:45.411884. Row time = 0:11:27.222856. Total time elapsed = 21:14:48.726042


Token indices sequence length is longer than the specified maximum sequence length for this model (1351 > 512). Running this sequence through the model will result in indexing errors


row 125 finished at 2020-12-12 10:37:37.947073. Row time = 0:03:52.534188. Total time elapsed = 21:18:41.261231


Token indices sequence length is longer than the specified maximum sequence length for this model (1203 > 512). Running this sequence through the model will result in indexing errors


row 126 finished at 2020-12-12 10:47:47.086030. Row time = 0:10:09.138957. Total time elapsed = 21:28:50.400188


Token indices sequence length is longer than the specified maximum sequence length for this model (1406 > 512). Running this sequence through the model will result in indexing errors


row 127 finished at 2020-12-12 10:56:45.360116. Row time = 0:08:58.273117. Total time elapsed = 21:37:48.674274


Token indices sequence length is longer than the specified maximum sequence length for this model (1373 > 512). Running this sequence through the model will result in indexing errors


row 128 finished at 2020-12-12 11:08:07.473850. Row time = 0:11:22.113734. Total time elapsed = 21:49:10.788008


Token indices sequence length is longer than the specified maximum sequence length for this model (1225 > 512). Running this sequence through the model will result in indexing errors


row 129 finished at 2020-12-12 11:18:12.669789. Row time = 0:10:05.194963. Total time elapsed = 21:59:15.983947
row 130 finished at 2020-12-12 11:27:06.879333. Row time = 0:08:54.209544. Total time elapsed = 22:08:10.193491


Token indices sequence length is longer than the specified maximum sequence length for this model (11230 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1217 > 512). Running this sequence through the model will result in indexing errors


row 131 finished at 2020-12-12 13:14:00.772417. Row time = 1:46:53.893084. Total time elapsed = 23:55:04.086575


Token indices sequence length is longer than the specified maximum sequence length for this model (1087 > 512). Running this sequence through the model will result in indexing errors


row 132 finished at 2020-12-12 13:22:54.129126. Row time = 0:08:53.356709. Total time elapsed = 1 day, 0:03:57.443284
row 133 finished at 2020-12-12 13:30:39.824017. Row time = 0:07:45.693928. Total time elapsed = 1 day, 0:11:43.138175


Token indices sequence length is longer than the specified maximum sequence length for this model (8103 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1354 > 512). Running this sequence through the model will result in indexing errors


row 134 finished at 2020-12-12 14:56:52.407805. Row time = 1:26:12.583788. Total time elapsed = 1 day, 1:37:55.721963
row 135 finished at 2020-12-12 15:07:44.141059. Row time = 0:10:51.733254. Total time elapsed = 1 day, 1:48:47.455217


Token indices sequence length is longer than the specified maximum sequence length for this model (1888 > 512). Running this sequence through the model will result in indexing errors


row 136 finished at 2020-12-12 15:07:48.340849. Row time = 0:00:04.199790. Total time elapsed = 1 day, 1:48:51.655007


Token indices sequence length is longer than the specified maximum sequence length for this model (994 > 512). Running this sequence through the model will result in indexing errors


row 137 finished at 2020-12-12 15:24:11.418764. Row time = 0:16:23.076915. Total time elapsed = 1 day, 2:05:14.732922


Token indices sequence length is longer than the specified maximum sequence length for this model (1055 > 512). Running this sequence through the model will result in indexing errors


row 138 finished at 2020-12-12 15:31:04.910326. Row time = 0:06:53.490597. Total time elapsed = 1 day, 2:12:08.224484


Token indices sequence length is longer than the specified maximum sequence length for this model (1591 > 512). Running this sequence through the model will result in indexing errors


row 139 finished at 2020-12-12 15:39:15.530710. Row time = 0:08:10.620384. Total time elapsed = 1 day, 2:20:18.844868
row 140 finished at 2020-12-12 15:53:02.124746. Row time = 0:13:46.594036. Total time elapsed = 1 day, 2:34:05.438904


Token indices sequence length is longer than the specified maximum sequence length for this model (592 > 512). Running this sequence through the model will result in indexing errors


row 141 finished at 2020-12-12 15:54:24.968679. Row time = 0:01:22.842932. Total time elapsed = 1 day, 2:35:28.282837


Token indices sequence length is longer than the specified maximum sequence length for this model (1130 > 512). Running this sequence through the model will result in indexing errors


row 142 finished at 2020-12-12 15:57:22.796485. Row time = 0:02:57.827806. Total time elapsed = 1 day, 2:38:26.110643
row 143 finished at 2020-12-12 16:05:56.615603. Row time = 0:08:33.818117. Total time elapsed = 1 day, 2:46:59.929761


Token indices sequence length is longer than the specified maximum sequence length for this model (913 > 512). Running this sequence through the model will result in indexing errors


row 144 finished at 2020-12-12 16:07:13.255801. Row time = 0:01:16.639227. Total time elapsed = 1 day, 2:48:16.569959


Token indices sequence length is longer than the specified maximum sequence length for this model (654 > 512). Running this sequence through the model will result in indexing errors


row 145 finished at 2020-12-12 16:14:24.080342. Row time = 0:07:10.824541. Total time elapsed = 1 day, 2:55:27.394500


Token indices sequence length is longer than the specified maximum sequence length for this model (1458 > 512). Running this sequence through the model will result in indexing errors


row 146 finished at 2020-12-12 16:18:39.221859. Row time = 0:04:15.141517. Total time elapsed = 1 day, 2:59:42.536017


Token indices sequence length is longer than the specified maximum sequence length for this model (1981 > 512). Running this sequence through the model will result in indexing errors


row 147 finished at 2020-12-12 16:31:04.883140. Row time = 0:12:25.661281. Total time elapsed = 1 day, 3:12:08.197298


Token indices sequence length is longer than the specified maximum sequence length for this model (2116 > 512). Running this sequence through the model will result in indexing errors


row 148 finished at 2020-12-12 16:49:03.738215. Row time = 0:17:58.855075. Total time elapsed = 1 day, 3:30:07.052373


Token indices sequence length is longer than the specified maximum sequence length for this model (1251 > 512). Running this sequence through the model will result in indexing errors


row 149 finished at 2020-12-13 08:19:22.994583. Row time = 15:30:19.256368. Total time elapsed = 1 day, 19:00:26.308741


Token indices sequence length is longer than the specified maximum sequence length for this model (1063 > 512). Running this sequence through the model will result in indexing errors


row 150 finished at 2020-12-13 08:25:19.593741. Row time = 0:05:56.598159. Total time elapsed = 1 day, 19:06:22.907899


Token indices sequence length is longer than the specified maximum sequence length for this model (1281 > 512). Running this sequence through the model will result in indexing errors


row 151 finished at 2020-12-13 08:30:37.590240. Row time = 0:05:17.995495. Total time elapsed = 1 day, 19:11:40.904398


Token indices sequence length is longer than the specified maximum sequence length for this model (572 > 512). Running this sequence through the model will result in indexing errors


row 152 finished at 2020-12-14 09:46:36.497095. Row time = 1 day, 1:15:58.905854. Total time elapsed = 2 days, 20:27:39.811253


Token indices sequence length is longer than the specified maximum sequence length for this model (588 > 512). Running this sequence through the model will result in indexing errors


row 153 finished at 2020-12-14 09:48:34.381503. Row time = 0:01:57.883408. Total time elapsed = 2 days, 20:29:37.695661


Token indices sequence length is longer than the specified maximum sequence length for this model (1744 > 512). Running this sequence through the model will result in indexing errors


row 154 finished at 2020-12-14 09:50:29.436269. Row time = 0:01:55.053767. Total time elapsed = 2 days, 20:31:32.750427


Token indices sequence length is longer than the specified maximum sequence length for this model (585 > 512). Running this sequence through the model will result in indexing errors


row 155 finished at 2020-12-14 10:02:19.244915. Row time = 0:11:49.807676. Total time elapsed = 2 days, 20:43:22.559073


Token indices sequence length is longer than the specified maximum sequence length for this model (791 > 512). Running this sequence through the model will result in indexing errors


row 156 finished at 2020-12-14 10:04:33.009353. Row time = 0:02:13.764438. Total time elapsed = 2 days, 20:45:36.323511


Token indices sequence length is longer than the specified maximum sequence length for this model (559 > 512). Running this sequence through the model will result in indexing errors


row 157 finished at 2020-12-14 10:08:59.161851. Row time = 0:04:26.151496. Total time elapsed = 2 days, 20:50:02.476009


Token indices sequence length is longer than the specified maximum sequence length for this model (4184 > 512). Running this sequence through the model will result in indexing errors


row 158 finished at 2020-12-14 10:11:13.236896. Row time = 0:02:14.075045. Total time elapsed = 2 days, 20:52:16.551054


Token indices sequence length is longer than the specified maximum sequence length for this model (934 > 512). Running this sequence through the model will result in indexing errors


row 159 finished at 2020-12-14 10:44:19.032669. Row time = 0:33:05.794811. Total time elapsed = 2 days, 21:25:22.346827


Token indices sequence length is longer than the specified maximum sequence length for this model (754 > 512). Running this sequence through the model will result in indexing errors


row 160 finished at 2020-12-14 10:49:35.674118. Row time = 0:05:16.641449. Total time elapsed = 2 days, 21:30:38.988276


Token indices sequence length is longer than the specified maximum sequence length for this model (618 > 512). Running this sequence through the model will result in indexing errors


row 161 finished at 2020-12-14 10:52:31.414268. Row time = 0:02:55.740150. Total time elapsed = 2 days, 21:33:34.728426


Token indices sequence length is longer than the specified maximum sequence length for this model (1448 > 512). Running this sequence through the model will result in indexing errors


row 162 finished at 2020-12-14 10:54:28.353769. Row time = 0:01:56.938530. Total time elapsed = 2 days, 21:35:31.667927


Token indices sequence length is longer than the specified maximum sequence length for this model (567 > 512). Running this sequence through the model will result in indexing errors


row 163 finished at 2020-12-14 11:03:14.200033. Row time = 0:08:45.846264. Total time elapsed = 2 days, 21:44:17.514191


Token indices sequence length is longer than the specified maximum sequence length for this model (1105 > 512). Running this sequence through the model will result in indexing errors


row 164 finished at 2020-12-14 11:05:15.376902. Row time = 0:02:01.175901. Total time elapsed = 2 days, 21:46:18.691060


Token indices sequence length is longer than the specified maximum sequence length for this model (1157 > 512). Running this sequence through the model will result in indexing errors


row 165 finished at 2020-12-14 12:04:33.931399. Row time = 0:59:18.554497. Total time elapsed = 2 days, 22:45:37.245557


Token indices sequence length is longer than the specified maximum sequence length for this model (1325 > 512). Running this sequence through the model will result in indexing errors


row 166 finished at 2020-12-14 12:11:02.342455. Row time = 0:06:28.411056. Total time elapsed = 2 days, 22:52:05.656613
row 167 finished at 2020-12-14 12:19:16.189416. Row time = 0:08:13.845959. Total time elapsed = 2 days, 23:00:19.503574


Token indices sequence length is longer than the specified maximum sequence length for this model (825 > 512). Running this sequence through the model will result in indexing errors


row 168 finished at 2020-12-14 12:19:54.621016. Row time = 0:00:38.431600. Total time elapsed = 2 days, 23:00:57.935174


Token indices sequence length is longer than the specified maximum sequence length for this model (954 > 512). Running this sequence through the model will result in indexing errors


row 169 finished at 2020-12-14 12:23:49.915054. Row time = 0:03:55.294038. Total time elapsed = 2 days, 23:04:53.229212


Token indices sequence length is longer than the specified maximum sequence length for this model (857 > 512). Running this sequence through the model will result in indexing errors


row 170 finished at 2020-12-14 12:28:38.046076. Row time = 0:04:48.131022. Total time elapsed = 2 days, 23:09:41.360234


Token indices sequence length is longer than the specified maximum sequence length for this model (3580 > 512). Running this sequence through the model will result in indexing errors


row 171 finished at 2020-12-14 12:32:22.455978. Row time = 0:03:44.408941. Total time elapsed = 2 days, 23:13:25.770136


Token indices sequence length is longer than the specified maximum sequence length for this model (1361 > 512). Running this sequence through the model will result in indexing errors


row 172 finished at 2020-12-14 12:59:15.128069. Row time = 0:26:52.672091. Total time elapsed = 2 days, 23:40:18.442227
row 173 finished at 2020-12-14 13:08:48.450394. Row time = 0:09:33.321323. Total time elapsed = 2 days, 23:49:51.764552


Token indices sequence length is longer than the specified maximum sequence length for this model (1462 > 512). Running this sequence through the model will result in indexing errors


row 174 finished at 2020-12-14 13:09:50.752187. Row time = 0:01:02.301793. Total time elapsed = 2 days, 23:50:54.066345
row 175 finished at 2020-12-14 13:20:47.055726. Row time = 0:10:56.303539. Total time elapsed = 3 days, 0:01:50.369884


Token indices sequence length is longer than the specified maximum sequence length for this model (698 > 512). Running this sequence through the model will result in indexing errors


row 176 finished at 2020-12-14 13:21:23.698596. Row time = 0:00:36.642870. Total time elapsed = 3 days, 0:02:27.012754


Token indices sequence length is longer than the specified maximum sequence length for this model (758 > 512). Running this sequence through the model will result in indexing errors


row 177 finished at 2020-12-14 13:24:33.283034. Row time = 0:03:09.584438. Total time elapsed = 3 days, 0:05:36.597192


Token indices sequence length is longer than the specified maximum sequence length for this model (1560 > 512). Running this sequence through the model will result in indexing errors


row 178 finished at 2020-12-14 13:27:38.918158. Row time = 0:03:05.635124. Total time elapsed = 3 days, 0:08:42.232316


Token indices sequence length is longer than the specified maximum sequence length for this model (718 > 512). Running this sequence through the model will result in indexing errors


row 179 finished at 2020-12-14 13:37:46.071379. Row time = 0:10:07.153221. Total time elapsed = 3 days, 0:18:49.385537


Token indices sequence length is longer than the specified maximum sequence length for this model (1236 > 512). Running this sequence through the model will result in indexing errors


row 180 finished at 2020-12-14 14:34:08.798139. Row time = 0:56:22.726760. Total time elapsed = 3 days, 1:15:12.112297


Token indices sequence length is longer than the specified maximum sequence length for this model (741 > 512). Running this sequence through the model will result in indexing errors


row 181 finished at 2020-12-14 14:41:03.495663. Row time = 0:06:54.696556. Total time elapsed = 3 days, 1:22:06.809821


Token indices sequence length is longer than the specified maximum sequence length for this model (1105 > 512). Running this sequence through the model will result in indexing errors


row 182 finished at 2020-12-14 14:44:18.582742. Row time = 0:03:15.086082. Total time elapsed = 3 days, 1:25:21.896900


Token indices sequence length is longer than the specified maximum sequence length for this model (861 > 512). Running this sequence through the model will result in indexing errors


row 183 finished at 2020-12-14 14:50:37.818028. Row time = 0:06:19.235286. Total time elapsed = 3 days, 1:31:41.132186


Token indices sequence length is longer than the specified maximum sequence length for this model (1270 > 512). Running this sequence through the model will result in indexing errors


row 184 finished at 2020-12-14 14:55:00.419992. Row time = 0:04:22.601964. Total time elapsed = 3 days, 1:36:03.734150


Token indices sequence length is longer than the specified maximum sequence length for this model (2597 > 512). Running this sequence through the model will result in indexing errors


row 185 finished at 2020-12-14 15:02:16.147372. Row time = 0:07:15.726383. Total time elapsed = 3 days, 1:43:19.461530


Token indices sequence length is longer than the specified maximum sequence length for this model (637 > 512). Running this sequence through the model will result in indexing errors


row 186 finished at 2020-12-14 15:23:58.978729. Row time = 0:21:42.831357. Total time elapsed = 3 days, 2:05:02.292887


Token indices sequence length is longer than the specified maximum sequence length for this model (3220 > 512). Running this sequence through the model will result in indexing errors


row 187 finished at 2020-12-14 15:27:31.073650. Row time = 0:03:32.093919. Total time elapsed = 3 days, 2:08:34.387808


Token indices sequence length is longer than the specified maximum sequence length for this model (2440 > 512). Running this sequence through the model will result in indexing errors


row 188 finished at 2020-12-14 15:53:23.676096. Row time = 0:25:52.602446. Total time elapsed = 3 days, 2:34:26.990254


Token indices sequence length is longer than the specified maximum sequence length for this model (1113 > 512). Running this sequence through the model will result in indexing errors


row 189 finished at 2020-12-14 16:14:10.075519. Row time = 0:20:46.399398. Total time elapsed = 3 days, 2:55:13.390650


Token indices sequence length is longer than the specified maximum sequence length for this model (1185 > 512). Running this sequence through the model will result in indexing errors


row 190 finished at 2020-12-14 16:21:22.870445. Row time = 0:07:12.793953. Total time elapsed = 3 days, 3:02:26.184603


Token indices sequence length is longer than the specified maximum sequence length for this model (2603 > 512). Running this sequence through the model will result in indexing errors


row 191 finished at 2020-12-14 16:29:13.124331. Row time = 0:07:50.253886. Total time elapsed = 3 days, 3:10:16.438489


Token indices sequence length is longer than the specified maximum sequence length for this model (1708 > 512). Running this sequence through the model will result in indexing errors


row 192 finished at 2020-12-14 16:49:01.412786. Row time = 0:19:48.288455. Total time elapsed = 3 days, 3:30:04.726944


Token indices sequence length is longer than the specified maximum sequence length for this model (2203 > 512). Running this sequence through the model will result in indexing errors


row 193 finished at 2020-12-14 17:02:23.540524. Row time = 0:13:22.127738. Total time elapsed = 3 days, 3:43:26.854682


Token indices sequence length is longer than the specified maximum sequence length for this model (1649 > 512). Running this sequence through the model will result in indexing errors


row 194 finished at 2020-12-14 17:21:25.539740. Row time = 0:19:01.998216. Total time elapsed = 3 days, 4:02:28.853898


Token indices sequence length is longer than the specified maximum sequence length for this model (2099 > 512). Running this sequence through the model will result in indexing errors


row 195 finished at 2020-12-14 17:32:15.319131. Row time = 0:10:49.779391. Total time elapsed = 3 days, 4:13:18.633289


Token indices sequence length is longer than the specified maximum sequence length for this model (783 > 512). Running this sequence through the model will result in indexing errors


row 196 finished at 2020-12-14 17:48:53.835727. Row time = 0:16:38.515595. Total time elapsed = 3 days, 4:29:57.149885


Token indices sequence length is longer than the specified maximum sequence length for this model (684 > 512). Running this sequence through the model will result in indexing errors


row 197 finished at 2020-12-14 17:53:53.230624. Row time = 0:04:59.393898. Total time elapsed = 3 days, 4:34:56.544782


Token indices sequence length is longer than the specified maximum sequence length for this model (781 > 512). Running this sequence through the model will result in indexing errors


row 198 finished at 2020-12-14 17:56:57.842485. Row time = 0:03:04.610903. Total time elapsed = 3 days, 4:38:01.156643


Token indices sequence length is longer than the specified maximum sequence length for this model (764 > 512). Running this sequence through the model will result in indexing errors


row 199 finished at 2020-12-14 18:01:01.713301. Row time = 0:04:03.869850. Total time elapsed = 3 days, 4:42:05.027459


Token indices sequence length is longer than the specified maximum sequence length for this model (1373 > 512). Running this sequence through the model will result in indexing errors


row 200 finished at 2020-12-14 18:05:05.512137. Row time = 0:04:03.797840. Total time elapsed = 3 days, 4:46:08.826295


Token indices sequence length is longer than the specified maximum sequence length for this model (2156 > 512). Running this sequence through the model will result in indexing errors


row 201 finished at 2020-12-14 18:13:04.272713. Row time = 0:07:58.760576. Total time elapsed = 3 days, 4:54:07.586871


Token indices sequence length is longer than the specified maximum sequence length for this model (1063 > 512). Running this sequence through the model will result in indexing errors


row 202 finished at 2020-12-15 08:07:08.658846. Row time = 13:54:04.386133. Total time elapsed = 3 days, 18:48:11.973004
row 203 finished at 2020-12-15 08:13:44.731943. Row time = 0:06:36.073097. Total time elapsed = 3 days, 18:54:48.046101


Token indices sequence length is longer than the specified maximum sequence length for this model (10625 > 512). Running this sequence through the model will result in indexing errors


row 204 finished at 2020-12-15 10:28:08.589801. Row time = 2:14:23.858849. Total time elapsed = 3 days, 21:09:11.904950


Token indices sequence length is longer than the specified maximum sequence length for this model (3490 > 512). Running this sequence through the model will result in indexing errors
Token indices sequence length is longer than the specified maximum sequence length for this model (1373 > 512). Running this sequence through the model will result in indexing errors


row 205 finished at 2020-12-15 10:56:03.442882. Row time = 0:27:54.851090. Total time elapsed = 3 days, 21:37:06.757040


Token indices sequence length is longer than the specified maximum sequence length for this model (592 > 512). Running this sequence through the model will result in indexing errors


row 206 finished at 2020-12-15 11:04:44.051604. Row time = 0:08:40.607721. Total time elapsed = 3 days, 21:45:47.365762


Token indices sequence length is longer than the specified maximum sequence length for this model (3925 > 512). Running this sequence through the model will result in indexing errors


row 207 finished at 2020-12-15 11:06:58.611048. Row time = 0:02:14.558431. Total time elapsed = 3 days, 21:48:01.925206


Token indices sequence length is longer than the specified maximum sequence length for this model (717 > 512). Running this sequence through the model will result in indexing errors


row 208 finished at 2020-12-15 11:39:10.082456. Row time = 0:32:11.471408. Total time elapsed = 3 days, 22:20:13.396614


Token indices sequence length is longer than the specified maximum sequence length for this model (1491 > 512). Running this sequence through the model will result in indexing errors


row 209 finished at 2020-12-15 11:42:28.118076. Row time = 0:03:18.035623. Total time elapsed = 3 days, 22:23:31.433237


Token indices sequence length is longer than the specified maximum sequence length for this model (769 > 512). Running this sequence through the model will result in indexing errors


row 210 finished at 2020-12-15 11:53:12.485577. Row time = 0:10:44.366498. Total time elapsed = 3 days, 22:34:15.799735


Token indices sequence length is longer than the specified maximum sequence length for this model (928 > 512). Running this sequence through the model will result in indexing errors


row 211 finished at 2020-12-15 11:58:02.551096. Row time = 0:04:50.065519. Total time elapsed = 3 days, 22:39:05.865254


Token indices sequence length is longer than the specified maximum sequence length for this model (6804 > 512). Running this sequence through the model will result in indexing errors


row 212 finished at 2020-12-15 12:04:04.176371. Row time = 0:06:01.624274. Total time elapsed = 3 days, 22:45:07.490529


Token indices sequence length is longer than the specified maximum sequence length for this model (1893 > 512). Running this sequence through the model will result in indexing errors


row 213 finished at 2020-12-15 13:06:44.820389. Row time = 1:02:40.644018. Total time elapsed = 3 days, 23:47:48.134547


Token indices sequence length is longer than the specified maximum sequence length for this model (1983 > 512). Running this sequence through the model will result in indexing errors


row 214 finished at 2020-12-15 13:20:05.032448. Row time = 0:13:20.210058. Total time elapsed = 4 days, 0:01:08.346606


Token indices sequence length is longer than the specified maximum sequence length for this model (686 > 512). Running this sequence through the model will result in indexing errors


row 215 finished at 2020-12-15 13:34:59.817391. Row time = 0:14:54.784943. Total time elapsed = 4 days, 0:16:03.131549


Token indices sequence length is longer than the specified maximum sequence length for this model (1026 > 512). Running this sequence through the model will result in indexing errors


row 216 finished at 2020-12-15 13:38:35.563280. Row time = 0:03:35.744893. Total time elapsed = 4 days, 0:19:38.877438


Token indices sequence length is longer than the specified maximum sequence length for this model (1117 > 512). Running this sequence through the model will result in indexing errors


row 217 finished at 2020-12-15 13:46:27.942153. Row time = 0:07:52.377912. Total time elapsed = 4 days, 0:27:31.256311


Token indices sequence length is longer than the specified maximum sequence length for this model (1275 > 512). Running this sequence through the model will result in indexing errors


row 218 finished at 2020-12-15 13:53:32.618936. Row time = 0:07:04.676783. Total time elapsed = 4 days, 0:34:35.933094


Token indices sequence length is longer than the specified maximum sequence length for this model (1730 > 512). Running this sequence through the model will result in indexing errors


row 219 finished at 2020-12-15 14:02:44.007692. Row time = 0:09:11.388756. Total time elapsed = 4 days, 0:43:47.321850


Token indices sequence length is longer than the specified maximum sequence length for this model (590 > 512). Running this sequence through the model will result in indexing errors


row 220 finished at 2020-12-15 14:16:12.861058. Row time = 0:13:28.853366. Total time elapsed = 4 days, 0:57:16.175216


Token indices sequence length is longer than the specified maximum sequence length for this model (694 > 512). Running this sequence through the model will result in indexing errors


row 221 finished at 2020-12-15 14:18:42.467552. Row time = 0:02:29.606494. Total time elapsed = 4 days, 0:59:45.781710


Token indices sequence length is longer than the specified maximum sequence length for this model (2028 > 512). Running this sequence through the model will result in indexing errors


row 222 finished at 2020-12-15 14:22:06.009103. Row time = 0:03:23.541551. Total time elapsed = 4 days, 1:03:09.323261
