# EXAMPLE - 1

**Tasks :- Intent Detection, NER, Fragment Detection**

**Tasks Description**

``Intent Detection`` :- This is a single sentence classification task where an `intent` specifies which class the data sample belongs to. 

``NER`` :- This is a Named Entity Recognition/ Sequence Labelling/ Slot filling task where individual words of the sentence are tagged with an entity label it belongs to. The words which don't belong to any entity label are simply labeled as "O". 

``Fragment Detection`` :- This is modeled as a single sentence classification task which detects whether a sentence is incomplete (fragment) or not (non-fragment).

**Conversational Utility** :-  Intent detection is one of the fundamental components for conversational system as it gives a broad understand of the category/domain the sentence/query belongs to.

NER helps in extracting values for required entities (eg. location, date-time) from query.

Fragment detection is a very useful piece in conversational system as knowing if a query/sentence is incomplete can aid in discarding bad queries beforehand.


**Data** :- In this example, we are using the <a href="https://snips-nlu.readthedocs.io/en/latest/dataset.html">SNIPS</a> data for intent and entity detection. For the sake of simplicity, we provide 
the data in simpler form under ``snips_data`` directory taken from <a href="https://github.com/LeePleased/StackPropagation-SLU/tree/master/data/snips">here</a>.


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
%cd /content/drive/MyDrive/nlp_proj

/content/drive/.shortcut-targets-by-id/1bwBUst6IlPPC4ILHcbv0GX9d8tIScYkv/nlp_proj


In [9]:
!pip install seqeval
!pip install tqdm
!pip install ipywidgets
!pip install Keras
!pip install torch
!pip install tensorflow
!pip install numpy
!pip install sphinx_rtd_theme
!pip install pandas
!pip install scikit_learn
!pip install PyYAML
!pip install transformers
!pip install joblib
!pip install sentencepiece

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting seqeval
  Downloading seqeval-1.2.2.tar.gz (43 kB)
[K     |████████████████████████████████| 43 kB 2.0 MB/s 
Building wheels for collected packages: seqeval
  Building wheel for seqeval (setup.py) ... [?25l[?25hdone
  Created wheel for seqeval: filename=seqeval-1.2.2-py3-none-any.whl size=16179 sha256=94540bce6aea5aa988babc91facbfe51781e5f3097b011140d8037cfa6bd7c8a
  Stored in directory: /root/.cache/pip/wheels/ad/5c/ba/05fa33fa5855777b7d686e843ec07452f22a66a138e290e732
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting jedi>=0.10
  Downloading jedi-0.18.2-py2.py3-none-any.whl (1.6 MB)
[K     |████████████████████████

# Step -1 Data 
Convert input data file to individual file task with labels and create dictionary of labels for each task


In [4]:
def transform_snip(readFile,createLab=False):
  # createLab = True
  # readFile = 'snips_dev.txt'
  f = open(readFile,"r")
  labelNER = {}
  labelIntent = {}
  sentence = []
  label = []
  uid = 0
  IntentDir = open('intent_{}.tsv'.format(readFile.split('.')[0]),"w")
  NERDir = open('NER_{}.tsv'.format(readFile.split('.')[0]),"w")
  for i,lines in enumerate(f):

    line = lines.strip(' ')
    items = line.strip('\n').split(' ')
    if line == '\n':
      NERDir.write("{}\t{}\t{}\n".format(uid,label,sentence))
      sentence = []
      label = []
      uid +=1
    elif len(items) == 2:
      sentence.append(items[0])
      label.append(items[1])
      if createLab and items[1] not in labelNER:
        labelNER[items[1]] = len(labelNER)
    elif len(items) == 1:
      intent = items[0]
      sentences = ' '.join(sentence)
      IntentDir.write("{}\t{}\t{}\n".format(uid,sentences,intent))
      if createLab and intent not in labelIntent:
        labelIntent[intent] = len(labelIntent)
  return labelNER,labelIntent



    






In [5]:
transform_snip('snips_dev.txt')
transform_snip('snips_test.txt')
labelNER,labelIntent = transform_snip('snips_train.txt',True)

In [6]:
print(labelNER)
print(labelIntent)
if "[CLS]" not in labelNER.keys():
  labelNER['[CLS]'] = len(labelNER)
if "[SEP]" not in labelNER.keys():
  labelNER['[SEP]'] = len(labelNER)
if "X" not in labelNER.keys():
  labelNER['X'] = len(labelNER)
if "O" not in labelNER.keys():
  labelNER["O"] = len(labelNER)

len(labelNER)
print(labelNER['O'])

{'O': 0, 'B-artist': 1, 'B-album': 2, 'B-service': 3, 'I-service': 4, 'B-entity_name': 5, 'I-entity_name': 6, 'B-playlist': 7, 'I-playlist': 8, 'B-object_select': 9, 'B-object_type': 10, 'B-rating_value': 11, 'B-best_rating': 12, 'B-music_item': 13, 'B-track': 14, 'I-track': 15, 'I-artist': 16, 'B-playlist_owner': 17, 'B-year': 18, 'B-sort': 19, 'B-movie_name': 20, 'I-movie_name': 21, 'B-party_size_number': 22, 'B-state': 23, 'B-city': 24, 'B-timeRange': 25, 'I-timeRange': 26, 'B-object_part_of_series_type': 27, 'I-object_type': 28, 'B-movie_type': 29, 'B-spatial_relation': 30, 'I-spatial_relation': 31, 'B-geographic_poi': 32, 'I-geographic_poi': 33, 'B-restaurant_type': 34, 'I-city': 35, 'B-party_size_description': 36, 'I-party_size_description': 37, 'B-object_location_type': 38, 'I-object_location_type': 39, 'B-object_name': 40, 'I-object_name': 41, 'I-movie_type': 42, 'B-rating_unit': 43, 'I-sort': 44, 'B-location_name': 45, 'I-location_name': 46, 'B-current_location': 47, 'I-curren

In [7]:
len(labelNER)
len(labelIntent)


7

In [None]:
def make_ngram(data,seq_len_left,seq_len_right):
        sequence_dict = {}
        for sentence in data:
            word_list = sentence.split()
            len_seq = len(word_list)
            for ngram in range(seq_len_right):
                i = 0
                while i + ngram < len_seq:
                    if len_seq - i - ngram - 1 >= seq_len_right:
                        right_seq = seq_len_right
                    else:
                        right_seq = len_seq - i - ngram - 1

                    if i >= seq_len_left:
                        left_seq = seq_len_left
                    else:
                        left_seq = i

                    key = " ".join(word_list[i:i + ngram + 1])

                    if sequence_dict.get(key, None) != None:
                        sequence_dict[key] = min(sequence_dict[key], left_seq + right_seq)
                    else:
                        sequence_dict[key] = left_seq + right_seq
                    i += 1
        return sequence_dict

In [None]:
from tqdm import tqdm
def validate_sequences(sequence_dict, seq_len_right, seq_len_left):
    micro_sequences = []
    macro_sequences = {}

    for key in sequence_dict.keys():
        score = sequence_dict[key]

        if score < 1 and len(key.split()) <= seq_len_right:
            micro_sequences.append(key)
        else:
            macro_sequences[key] = score

    non_frag_sequences = []
    macro_sequences_copy = macro_sequences.copy()

    for sent in tqdm(micro_sequences, total = len(micro_sequences)):
        for key in macro_sequences.keys():
            if sent in key:
                non_frag_sequences.append(key)
                del macro_sequences_copy[key]

        macro_sequences = macro_sequences_copy.copy()

    for sent in non_frag_sequences:
        macro_sequences[sent] = 0

    for sent in micro_sequences:
        macro_sequences[sent] = 0

    return macro_sequences

Create a fragment file: 1:fragment, 0:not Fragment

In [None]:
import pandas as pd
import random
def create_fragment(readFile,percent = 0.5):
  # readFile = 'intent_snips_dev.tsv'
  writeDir = 'frag_{}.tsv'.format(readFile.split('.')[0])
  # percent = 0.5
  allDataDf = pd.read_csv(readFile, sep="\t", header=None)
  sampledDataDf = allDataDf.sample(frac = 0.2, random_state=42)
  dic = make_ngram(list(sampledDataDf[1]),2,3)
  fragDict = validate_sequences(dic,2,3)
  #decide number of fragments to take 
  fragDict = random.sample(list(fragDict.keys()),k=int(len(fragDict)*percent))
  finalDf = pd.DataFrame({'uid' : [i for i in range(len(fragDict)+len(allDataDf))],
                              'label' : [1]*len(fragDict)+[0]*len(allDataDf),
                              'query' : fragDict+list(allDataDf.iloc[:, 1]) })
  finalDf.to_csv(writeDir, sep='\t',index=False, header=False)




In [None]:
# create_fragment('intent_snips_dev.tsv')
create_fragment('intent_snips_test.tsv')
create_fragment('intent_snips_train.tsv')
create_fragment('intent_snips_dev.tsv')

100%|██████████| 9/9 [00:00<00:00, 400.48it/s]
100%|██████████| 9/9 [00:00<00:00, 389.82it/s]
0it [00:00, ?it/s]


# Step -2 Data Preparation
1.put all the data into a single array


In [None]:
from ast import literal_eval
import transformers
import json
max_len= 128
def make_dataTokens(readFile,tasks,tokenizer):
  allDataArray = []
  file = open(readFile,"r")
  wf = '/content/drive/MyDrive/nlp_proj/prepared_data/{}.json'.format(readFile.split('.')[0])
  with open(wf,"w") as writeF:
    for f in file:
      item = f.rstrip('\n').split("\t")
      row = {}
      if tasks == "NER":
        # row = {"uid":item[0],"label":literal_eval(item[1]),"sentence":literal_eval(item[2])}
        label = literal_eval(item[1])
        sentence = literal_eval(item[2])
        tempLab,tempSent = NER_preprocess(label,sentence,tokenizer)
        # out = tokenizer.encode_plus(tempSent,add_special_tokens = False, truncation='only_first',max_length=max_len,pad_to_max_length=False)
        out = tokenizer.encode_plus(text = tempSent, add_special_tokens=False,
                                        truncation_strategy ='only_first',
                                        max_length = 128, pad_to_max_length=False)
        row['uid'] = item[0]
        row['label'] = tempLab
        row['input_id'] = out['input_ids']
        row['token_id'] = out['token_type_ids']
        row['attention_mask'] = out['attention_mask']
      if tasks == "Intent":
        if (not item[2].isnumeric()):
          label = labelIntent[item[2]]
        row = {"uid":item[0],"label":label}
        out = tokenizer.encode_plus(item[1],add_special_tokens = True, truncation='only_first',max_length=max_len,pad_to_max_length=False)
        row['input_id'] = out['input_ids']
        row['token_id'] = out['token_type_ids']
        row['attention_mask'] = out['attention_mask']
      if tasks == "Fragment":
        assert item[1].isnumeric()
        row = {"uid":item[0], "label":int(item[1])}
        out = tokenizer.encode_plus(item[2],add_special_tokens = True, truncation='only_first',max_length=max_len,pad_to_max_length=False)
        row['input_id'] = out['input_ids']
        row['token_id'] = out['token_type_ids']
        row['attention_mask'] = out['attention_mask']
      writeF.write('{}\n'.format(json.dumps(row)))
  print("finishing writing {}".format(wf))




#tokenize the word into sub tokens and only the first in the subtoken array gets assigned
#with a label
def NER_preprocess(label,sentence,tokenizer):
  tempSent = ['[CLS]']
  tempLab = [labelNER['[CLS]']]
  for word,label in zip(sentence,label):
    tokens = tokenizer.tokenize(word)
    for m,tok in enumerate(tokens):
      tempSent.append(tok)
      if m==0:
        tempLab.append(labelNER[label])
      else:
        tempLab.append(labelNER["X"])
  tempLab.append(labelNER['[SEP]'])
  tempSent.append('[SEP]')
  return tempLab,tempSent
    
# tokenizer = transformers.BertTokenizer.from_pretrained('bert-base-uncased')
# make_dataTokens("1","1",tokenizer)




In [10]:
tasks = {"NER":["NER_snips_dev.tsv","NER_snips_test.tsv","NER_snips_train.tsv"],
         "Intent":["intent_snips_dev.tsv","intent_snips_test.tsv","intent_snips_train.tsv"],
         "Fragment":["frag_intent_snips_dev.tsv","frag_intent_snips_test.tsv","frag_intent_snips_train.tsv"]}
import transformers
model = "bert"
task = "NER"
if model == "bert":
  tokenizer = transformers.BertTokenizer.from_pretrained('bert-base-uncased')
elif model == "distilbert":
  tokenizer = transformers.DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
for task,fileArrays in tasks.items():
    for file in fileArrays:
      make_dataTokens(file,task,tokenizer)







Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

NameError: ignored

dataLoder: 
combines all tasks data together
DataSampler:
for all the batches, random generate the next batch task 

In [11]:
import torch
from torch.utils.data import Dataset, DataLoader,BatchSampler
is_cuda = torch.cuda.is_available()
print(is_cuda)

True


In [57]:
import json
class allTaskDS(Dataset):
  def __init__(self,paths):
    allTaskData, id2Task = self.make_array(paths)
    self.allData = allTaskData
    self.id2Task = id2Task

  def make_array(self,paths):
    count = 0
    allTaskData = {}
    id2Task = {}
    for path in paths:
      data = []
      with open(path,"r") as f:
        for js in f:
          data.append(json.loads(js))
      allTaskData[count] = data
      task = path.split("/")[-1].split("_")[0]
      id2Task[count] = task
      count += 1
    return allTaskData, id2Task

  def __len__(self):
    return sum(len(v) for k,v in self.allData.items())
  
  def __getitem__(self,idx):
    task,sampleId = idx
    data_dic = {}
    task_dic = {}
    task_dic['task_id'] = task
    task_dic['task_type'] = self.id2Task[task]
    data_dic['task'] = task_dic
    data_dic['sample'] = self.allData[task][sampleId]
    return data_dic
      
path_train = ["/content/drive/MyDrive/nlp_proj/prepared_data/NER_snips_train.json","/content/drive/MyDrive/nlp_proj/prepared_data/frag_intent_snips_train.json","/content/drive/MyDrive/nlp_proj/prepared_data/intent_snips_train.json"]
path_dev = ["/content/drive/MyDrive/nlp_proj/prepared_data/NER_snips_dev.json","/content/drive/MyDrive/nlp_proj/prepared_data/frag_intent_snips_dev.json","/content/drive/MyDrive/nlp_proj/prepared_data/frag_intent_snips_train.json"]
path_test = ["/content/drive/MyDrive/nlp_proj/prepared_data/NER_snips_test.json","/content/drive/MyDrive/nlp_proj/prepared_data/frag_intent_snips_test.json","/content/drive/MyDrive/nlp_proj/prepared_data/intent_snips_test.json"]
train_ds = allTaskDS(path_train)
dev_ds = allTaskDS(path_dev)
test_ds = allTaskDS(path_test)









In [58]:
#mix all tasks into batches with their task labels
import random
class Batcher(BatchSampler):
  def __init__(self,datasetObj,batch_size):
    self.allData = datasetObj.allData
    self.id2Task = datasetObj.id2Task
    self.seed = 42
    self.shuffle_batches = True
    self.shuffle_task = True
    self.batch_size = batch_size
    self.sampledID = []   #[[1,2,3],[4,5,6]]
    self.allTaskID = []
    self.allTasks = []
    self.makeBatch()
  def makeBatch(self):
    for id,data in self.allData.items():
      batch_idx = [list(range(i,min(i+self.batch_size,len(data)))) for i in range(0,len(data),self.batch_size)]
      if self.shuffle_batches == True:
        random.seed(self.seed)
        random.shuffle(batch_idx)
      self.sampledID.append(batch_idx)
      self.allTasks.append(id)
    for i in range(len(self.sampledID)):
      self.allTaskID += [i] * len(self.sampledID[i])
    if self.shuffle_task == True:
        random.seed(self.seed)
        random.shuffle(self.allTaskID)
  def __iter__(self):
    dataIter = [iter(samples) for samples in self.sampledID]
    for taskId in self.allTaskID:
      batchTaskId = self.allTasks[taskId]
      batch = next(dataIter[taskId])
      yield [(batchTaskId,sampleId) for sampleId in batch]
      
      
train_batch = Batcher(train_ds,16)
test_batch = Batcher(test_ds,32)
dev_batch = Batcher(dev_ds,32)






In [59]:
def make_tensor(batch):
  input_tensors = []
  token_tensors = []
  attention_mask = []
  label_tensors = []
  for sample in batch:
    input_tensors.append(torch.LongTensor(sample['sample']['input_id']))
    token_tensors.append(torch.LongTensor(sample['sample']['token_id']))
    attention_mask.append(torch.LongTensor(sample['sample']['attention_mask']))
    if isinstance(sample['sample']['label'],int):
      sample['sample']['label'] = [sample['sample']['label']]
    label_tensors.append(torch.LongTensor(sample['sample']['label']))
  input_padded = torch.nn.utils.rnn.pad_sequence(input_tensors,batch_first=True)
  token_padded = torch.nn.utils.rnn.pad_sequence(token_tensors,batch_first=True)
  attention_padded = torch.nn.utils.rnn.pad_sequence(attention_mask,batch_first=True)
  label_padded = torch.nn.utils.rnn.pad_sequence(label_tensors,batch_first=True)
  assert input_padded.shape == token_padded.shape
  assert input_padded.shape == attention_padded.shape
  return [input_padded,token_padded,attention_padded,label_padded]
def collate_fn(batch):
  batch_meta = {}
  batch_meta['task_id'] = batch[0]['task']['task_id']
  batch_meta['task_type'] = batch[0]['task']['task_type']
  batch_meta['uid'] = [sample['sample']['uid'] for sample in batch]
  data = make_tensor(batch)
  return batch_meta, data





In [60]:
train_loader = DataLoader(train_ds, batch_sampler = train_batch,
                                collate_fn=collate_fn,
                                pin_memory=True)
dev_loader = DataLoader(dev_ds, batch_sampler = dev_batch,
                                collate_fn=collate_fn,
                                pin_memory=True)


test_loader = DataLoader(test_ds, batch_sampler = test_batch,
                                collate_fn=collate_fn,
                                pin_memory=True)






In [61]:
import torch.nn as nn
from transformers import *
import torch.nn.functional as F
class multiTaskModel(nn.Module):
  def __init__(self,training=True):
    super(multiTaskModel, self).__init__()
    self.transformer_model = BertModel.from_pretrained("bert-base-uncased")
    self.hidden = self.transformer_model.config.hidden_size
    self.tasks = ['NER', 'frag', 'intent']
    self.dropout = {'NER':0.3,'frag':0.3,'intent':0.3}
    self.num_class = {'NER':len(labelNER),'frag':2,'intent':len(labelIntent)}
    self.training = training
    self.headerDict, self.doDict=self.make_heads()
    self.init_headers()
  def make_heads(self):
    allHeaders = nn.ModuleDict()
    allDropouts = nn.ModuleDict()
    for task in self.tasks:
      dropout_layer = nn.Dropout(p=self.dropout[task])
      linear_layer = nn.Linear(self.hidden, self.num_class[task])
      allHeaders[task] = linear_layer
      allDropouts[task] = dropout_layer
    return allHeaders,allDropouts
  def init_headers(self):
    def init_weight(module):
      if isinstance(module,(nn.Linear,nn.Embedding)):
        module.weight.data.normal_(mean=0.0,std=0.02*1.0)
      if isinstance(module,nn.Linear):
        if module.bias is not None:
          module.bias.data.zero_()
    self.apply(init_weight)
  def forward(self,tokenID,typeID,attentionMask,taskId,taskName):
    out = self.transformer_model(input_ids=tokenID,token_type_ids=typeID,attention_mask = attentionMask)
    last_hidden_state = out[0]
    pooler = out[1]  #last layer hidden state of the first token
    if taskId == 0:     #NER use all hidden state
      output = self.doDict[taskName](last_hidden_state)
      logits = self.headerDict[taskName](output)
      return logits
    else:
      output = self.doDict[taskName](pooler)
      logits = self.headerDict[taskName](output)
      return logits






      
      
      



In [62]:
from torch.nn.modules.loss import _Loss
class NERLoss(_Loss):
  def __init__(self):
    super().__init__()
    self.ignore_index = -1
  def forward(self,inp,target,mask):
    nerLoss = mask.view(-1) == 1
    nerlogits = inp.view(-1, inp.size(-1))
    nerLabels = torch.where(
            nerLoss, target.view(-1), torch.tensor(self.ignore_index).type_as(target)
            )
    finalLoss = F.cross_entropy(nerlogits, nerLabels, ignore_index=self.ignore_index)
    return finalLoss


In [63]:
import torch.nn
cross_entropy_loss = nn.CrossEntropyLoss(ignore_index=-1)
NER_loss = NERLoss()
loss_dic = {'NER':cross_entropy_loss, 
            'frag':cross_entropy_loss, 
            'intent':cross_entropy_loss}





In [64]:
def pin_mem(meta,batch_data,gpu=False):
    if gpu:
        for i, part in enumerate(batch_data):
            if part is not None:
                if isinstance(part, torch.Tensor):
                    batch_data[i] = part.pin_memory().cuda(non_blocking=True)
                elif isinstance(part, tuple):
                    batch_data[i] = tuple(sub_part.pin_memory().cuda(non_blocking=True) for sub_part in part)
                elif isinstance(part, list):
                    batch_data[i] = [sub_part.pin_memory().cuda(non_blocking=True) for sub_part in part]
                else:
                    raise TypeError("unknown batch data type at %s: %s" % (i, part))

    return meta, batch_data
def _to_cuda(tensor):
    if tensor is None: return tensor

    if isinstance(tensor, list) or isinstance(tensor, tuple):
        y = [e.cuda(non_blocking=True) for e in tensor]
        for e in y:
            e.requires_grad = False
    else:
        y = tensor.cuda(non_blocking=True)
        y.requires_grad = False
    return y 

In [113]:
from transformers import AdamW, get_linear_schedule_with_warmup
import math
from tqdm import tqdm
model = multiTaskModel(training=True)
if is_cuda:
  model = model.cuda()
eps = 1e-8
lr = 2e-5
batch_size =16
epoch = 10
grad_accumulation = 4
trainS = math.ceil(len(train_ds)/batch_size) * epoch // grad_accumulation
optimizer = AdamW(model.parameters(),lr=lr,eps=eps)
scheduler = get_linear_schedule_with_warmup(optimizer,
                                            num_warmup_steps=0,
                                            num_training_steps=trainS)















loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--bert-base-uncased/snapshots/0a6aa9128b6194f4f3c4db429b6cb4891cdb421b/config.json
Model config BertConfig {
  "architectures": [
    "BertForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.25.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--bert-base-uncased/snapshots/0a6aa9128b6194f4f3c4db429b6cb4891cdb421b/pytorch_model.bin
Some weights of the model check

In [66]:
def update(meta,data,gb_s,acc_s):
  model.train()
  target = data[3]
  if is_cuda:
    target = _to_cuda(target)
  taskId = meta['task_id']
  taskName = meta['task_type']
  modelInputs = data[:3]
  modelInputs += [taskId]
  modelInputs += [taskName]
  logits = model(*modelInputs)

  task_loss = 0
  if loss_dic[taskName] and (target is not None):
    if taskName == "NER":
      mask = data[2]
      nerLoss = mask.view(-1) == 1
      nerlogits = logits.view(-1, logits.size(-1))
      nerLabels = torch.where(
      nerLoss, target.view(-1), torch.tensor(-1).type_as(target))
      task_loss = loss_dic[taskName](nerlogits,nerLabels)
    else:
      target = target.view(-1)
      task_loss = loss_dic[taskName](logits,target)
  task_loss /= grad_accumulation
  task_loss.backward()
  acc_s += 1
  if acc_s == grad_accumulation:
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
    gb_s += 1
    acc_s = 0
  return task_loss,gb_s,acc_s







  



In [109]:
import numpy as np
def predict_step(meta,data):
  model.eval()
  taskId = meta['task_id']
  taskName = meta['task_type']
  modelInputs = data[:3] + [taskId] + [taskName]
  logits = model(*modelInputs)
  if taskName == "NER":
    #output (batchsize, len,numclass)
    softmax = nn.functional.softmax(logits,dim=2).data.cpu().numpy()
    sigmoid = nn.functional.sigmoid(logits).data.cpu().numpy()
    predicted = np.argmax(softmax,axis=2).tolist()
    score = np.max(sigmoid,axis=2).tolist()
    attention_mask = data[2]
    predictedTags = []
    predictedscore = []
    # if attention_mask is not None:
    sentence_len = attention_mask.cpu().numpy().sum(axis=1).tolist()
    for i,(tag,score) in enumerate(zip(predicted,score)):
      predictedTags.append(tag[:sentence_len[i]])
      predictedscore.append(score[:sentence_len[i]])
    return predictedTags,predictedscore


  else:
    softmax = nn.functional.softmax(logits, dim=1).data.cpu().numpy()
    sigmoid = nn.functional.sigmoid(logits).data.cpu().numpy()
    prediction = np.argmax(softmax,axis=1)
    predicted_l = prediction.tolist()
    return predicted_l, softmax


In [68]:
#citing using conlleval
def __startOfChunk(prevTag, tag, prevTagType, tagType, chunkStart = False):
    if prevTag == 'B' and tag == 'B':
        chunkStart = True
    if prevTag == 'I' and tag == 'B':
        chunkStart = True
    if prevTag == 'O' and tag == 'B':
        chunkStart = True
    if prevTag == 'O' and tag == 'I':
        chunkStart = True

    if prevTag == 'E' and tag == 'E':
        chunkStart = True
    if prevTag == 'E' and tag == 'I':
        chunkStart = True
    if prevTag == 'O' and tag == 'E':
        chunkStart = True
    if prevTag == 'O' and tag == 'I':
        chunkStart = True

    if tag != 'O' and tag != '.' and prevTagType != tagType:
        chunkStart = True
    return chunkStart

def __endOfChunk(prevTag, tag, prevTagType, tagType, chunkEnd = False):
    if prevTag == 'B' and tag == 'B':
        chunkEnd = True
    if prevTag == 'B' and tag == 'O':
        chunkEnd = True
    if prevTag == 'I' and tag == 'B':
        chunkEnd = True
    if prevTag == 'I' and tag == 'O':
        chunkEnd = True

    if prevTag == 'E' and tag == 'E':
        chunkEnd = True
    if prevTag == 'E' and tag == 'I':
        chunkEnd = True
    if prevTag == 'E' and tag == 'O':
        chunkEnd = True
    if prevTag == 'I' and tag == 'O':
        chunkEnd = True

    if prevTag != 'O' and prevTag != '.' and prevTagType != tagType:
        chunkEnd = True
    return chunkEnd

def __splitTagType(tag):
    s = tag.split('-')
    if len(s) > 2 or len(s) == 0:
        raise ValueError('tag format wrong. it must be B-xxx.xxx')
    if len(s) == 1:
        tag = s[0]
        tagType = ""
    else:
        tag = s[0]
        tagType = s[1]
    return tag, tagType

def computeF1Score(correct_slots, pred_slots):

    correctChunk = {}
    correctChunkCnt = 0
    foundCorrect = {}
    foundCorrectCnt = 0
    foundPred = {}
    foundPredCnt = 0
    correctTags = 0
    tokenCount = 0
    for correct_slot, pred_slot in zip(correct_slots, pred_slots):
        inCorrect = False
        lastCorrectTag = 'O'
        lastCorrectType = ''
        lastPredTag = 'O'
        lastPredType = ''
        for c, p in zip(correct_slot, pred_slot):
            correctTag, correctType = __splitTagType(c)
            predTag, predType = __splitTagType(p)

            if inCorrect == True:
                if __endOfChunk(lastCorrectTag, correctTag, lastCorrectType, correctType) == True and \
                   __endOfChunk(lastPredTag, predTag, lastPredType, predType) == True and \
                   (lastCorrectType == lastPredType):
                    inCorrect = False
                    correctChunkCnt += 1
                    if lastCorrectType in correctChunk:
                        correctChunk[lastCorrectType] += 1
                    else:
                        correctChunk[lastCorrectType] = 1
                elif __endOfChunk(lastCorrectTag, correctTag, lastCorrectType, correctType) != \
                     __endOfChunk(lastPredTag, predTag, lastPredType, predType) or \
                     (correctType != predType):
                    inCorrect = False

            if __startOfChunk(lastCorrectTag, correctTag, lastCorrectType, correctType) == True and \
               __startOfChunk(lastPredTag, predTag, lastPredType, predType) == True and \
               (correctType == predType):
                inCorrect = True

            if __startOfChunk(lastCorrectTag, correctTag, lastCorrectType, correctType) == True:
                foundCorrectCnt += 1
                if correctType in foundCorrect:
                    foundCorrect[correctType] += 1
                else:
                    foundCorrect[correctType] = 1

            if __startOfChunk(lastPredTag, predTag, lastPredType, predType) == True:
                foundPredCnt += 1
                if predType in foundPred:
                    foundPred[predType] += 1
                else:
                    foundPred[predType] = 1

            if correctTag == predTag and correctType == predType:
                correctTags += 1

            tokenCount += 1

            lastCorrectTag = correctTag
            lastCorrectType = correctType
            lastPredTag = predTag
            lastPredType = predType

        if inCorrect == True:
            correctChunkCnt += 1
            if lastCorrectType in correctChunk:
                correctChunk[lastCorrectType] += 1
            else:
                correctChunk[lastCorrectType] = 1

    if foundPredCnt > 0:
        precision = 100*correctChunkCnt/foundPredCnt
    else:
        precision = 0

    if foundCorrectCnt > 0:
        recall = 100*correctChunkCnt/foundCorrectCnt
    else:
        recall = 0

    if (precision+recall) > 0:
        f1 = (2*precision*recall)/(precision+recall)
    else:
        f1 = 0

    return f1, precision, recall




In [111]:
from sklearn.metrics import accuracy_score
def evaluate(dataset,batchSampler,dataLoader,batch_size,evaluate_metric=True):
  numStep = math.ceil(len(dataset)/batch_size)
  allLabel = [[],[],[]]
  allPred = [[],[],[]]
  allScore = [[],[],[]]
  allId = [[],[],[]]
  tasks = ['NER', 'frag', 'intent']
  for meta,data in tqdm(dataLoader,total=numStep):
    meta,data = pin_mem(meta,data,is_cuda)
    taskID = int(meta['task_id'])
    tags,score = predict_step(meta,data)
    if (meta['task_type'] == "frag" or meta['task_type'] == 'intent'):
      label = data[3].view(data[3].shape[0]).data.cpu().numpy()
    elif (meta['task_type'] == "NER"):
      label = data[3].data.cpu().numpy()
    allLabel[taskID].extend(label)
    allScore[taskID].extend(score)
    allPred[taskID].extend(tags)
    allId[taskID].extend(meta['uid'])
  for i in range(len(allPred)):
    if allPred[i] == []:
      continue
    taskName = tasks[i]
    if taskName == "NER":
      for j,(p,l) in enumerate(zip(allPred[i],allLabel[i])):
        itoLab = {v:k for k,v in labelNER.items()}
        allLabel[i][j] = l[:len(p)]
        allPred[i][j] = [itoLab[int(id)] for id in p]
        allLabel[i][j] = [itoLab[int(id)] for id in allLabel[i][j]]
      # print("-------")
      # print(allPred)
      newPred = []
      newLab = []
      newScore = []
      for m,sample in enumerate(allLabel[i]):
        onepred = []
        oneLab = []
        oneScore = []
        for n,ele in enumerate(sample):
          if ele != '[CLS]' and ele != '[SEP]' and ele != 'X':
            onepred.append(allPred[i][m][n])
            oneScore.append(allScore[i][m][n])
            oneLab.append(ele)
        newPred.append(onepred)
        newLab.append(oneLab)
        newScore.append(oneScore)
      allLabel[i] = newLab
      allPred[i] = newPred
      allScore[i] = newScore
    if taskName == 'frag':
      Labeldic = {1:"frag",0:"complete"}
      allPred[i] = [Labeldic[pred] for pred in allPred[i]]
      allLabel[i] = [Labeldic[lab] for lab in allLabel[i]]
    if taskName == 'intent':
      itoLab = {v:k for k,v in labelIntent.items()}
      allPred[i] = [itoLab[pred] for pred in allPred[i]]
      allLabel[i] = [itoLab[lab] for lab in allLabel[i]]
    
  if evaluate_metric:
    print('********** Evaluation{}************'.format(tasks[i]))
    for i in range(len(allPred)):
      if allPred[i] == []:
        continue
      taskName = tasks[i]
      if taskName == "NER":
        f1, precision, recall = computeF1Score(allLabel[i],allPred[i])
        print('f1:{}, precision:{}, recall:{} '.format(f1, precision, recall))
      
      elif taskName == "frag" or taskName == "intent":
        metric_val = accuracy_score(allLabel[i],allPred[i])*100
        print('accuracy:{}'.format(metric_val))
   
        
        






In [None]:
gb_s = 0
acc_s = 0
logging_update_s = 50

for e in range(epoch):
  totalEpochLoss = 0
  t = math.ceil(len(train_ds)/batch_size)
  description = "Epoch: {}".format(e)
  with tqdm(total=t,position=e,desc=description) as progress:
    for i,(meta,data)in enumerate(train_loader):
      meta,data = pin_mem(meta,data,is_cuda)
      task_loss,gb_s,acc_s = update(meta,data,gb_s,acc_s)
      totalEpochLoss += task_loss
      if gb_s % logging_update_s == 0 and (acc_s+1 == grad_accumulation):
        avg_loss = task_loss / (i+1)
        taskName = meta['task_type']
        print('Steps: {} Task: {} Avg.Loss: {} Task Loss: {}'.format(gb_s,taskName,avg_loss,task_loss))
      progress.update(1)
    evaluate(dev_ds,dev_batch,dev_loader,32,evaluate_metric=True)


Epoch: 0:   0%|          | 7/3282 [00:00<03:06, 17.53it/s]

Steps: 0 Task: NER Avg.Loss: 0.3435479402542114 Task Loss: 1.0306438207626343


Epoch: 0:   6%|▌         | 205/3282 [00:11<02:47, 18.33it/s]

Steps: 50 Task: frag Avg.Loss: 0.00022028795501682907 Task Loss: 0.04471845552325249


Epoch: 0:  12%|█▏        | 406/3282 [00:22<02:37, 18.31it/s]

Steps: 100 Task: frag Avg.Loss: 2.6726311261882074e-05 Task Loss: 0.010770703665912151


Epoch: 0:  18%|█▊        | 605/3282 [00:34<03:00, 14.83it/s]

Steps: 150 Task: frag Avg.Loss: 1.9536866602720693e-05 Task Loss: 0.011780730448663235


Epoch: 0:  25%|██▍       | 805/3282 [00:45<02:15, 18.30it/s]

Steps: 200 Task: frag Avg.Loss: 2.095280433422886e-05 Task Loss: 0.016825102269649506


Epoch: 0:  31%|███       | 1005/3282 [00:56<02:14, 16.98it/s]

Steps: 250 Task: intent Avg.Loss: 9.921723540173844e-05 Task Loss: 0.09951488673686981


Epoch: 0:  37%|███▋      | 1206/3282 [01:07<01:59, 17.36it/s]

Steps: 300 Task: frag Avg.Loss: 2.672851042007096e-05 Task Loss: 0.032154396176338196


Epoch: 0:  43%|████▎     | 1405/3282 [01:19<01:50, 16.95it/s]

Steps: 350 Task: intent Avg.Loss: 2.395657793385908e-05 Task Loss: 0.033611077815294266


Epoch: 0:  49%|████▉     | 1605/3282 [01:30<01:41, 16.50it/s]

Steps: 400 Task: intent Avg.Loss: 2.6298796001356095e-05 Task Loss: 0.042156971991062164


Epoch: 0:  55%|█████▍    | 1805/3282 [01:42<01:31, 16.15it/s]

Steps: 450 Task: frag Avg.Loss: 3.871704757330008e-05 Task Loss: 0.06980683654546738


Epoch: 0:  61%|██████    | 2006/3282 [01:54<01:18, 16.22it/s]

Steps: 500 Task: intent Avg.Loss: 2.357919584028423e-05 Task Loss: 0.04722912982106209


Epoch: 0:  67%|██████▋   | 2206/3282 [02:05<00:56, 19.13it/s]

Steps: 550 Task: NER Avg.Loss: 0.0001104569819290191 Task Loss: 0.2433367222547531


Epoch: 0:  73%|███████▎  | 2405/3282 [02:17<00:51, 17.11it/s]

Steps: 600 Task: intent Avg.Loss: 1.180172785097966e-05 Task Loss: 0.02835955284535885


Epoch: 0:  79%|███████▉  | 2606/3282 [02:28<00:38, 17.58it/s]

Steps: 650 Task: frag Avg.Loss: 1.1450767942733364e-06 Task Loss: 0.002980634802952409


Epoch: 0:  86%|████████▌ | 2807/3282 [02:39<00:26, 18.18it/s]

Steps: 700 Task: frag Avg.Loss: 4.674655428971164e-06 Task Loss: 0.013103059493005276


Epoch: 0:  92%|█████████▏| 3005/3282 [02:51<00:17, 16.24it/s]

Steps: 750 Task: frag Avg.Loss: 5.008433845432592e-07 Task Loss: 0.001504032756201923


Epoch: 0:  98%|█████████▊| 3205/3282 [03:03<00:04, 17.45it/s]

Steps: 800 Task: frag Avg.Loss: 9.430872864868434e-07 Task Loss: 0.003020708682015538


Epoch: 0: 3283it [03:08, 19.01it/s]                          

  0%|          | 3/901 [00:00<00:40, 22.26it/s][A
  1%|          | 7/901 [00:00<00:31, 28.06it/s][A
  1%|          | 11/901 [00:00<00:30, 28.85it/s][A
  2%|▏         | 15/901 [00:00<00:28, 31.53it/s][A
  2%|▏         | 19/901 [00:00<00:26, 32.72it/s][A
  3%|▎         | 23/901 [00:00<00:28, 31.24it/s][A
  3%|▎         | 27/901 [00:00<00:29, 29.74it/s][A
  3%|▎         | 31/901 [00:01<00:26, 32.39it/s][A
  4%|▍         | 36/901 [00:01<00:24, 34.92it/s][A
  4%|▍         | 40/901 [00:01<00:24, 35.41it/s][A
  5%|▍         | 45/901 [00:01<00:23, 36.01it/s][A
  5%|▌         | 49/901 [00:01<00:23, 36.03it/s][A
  6%|▌         | 53/901 [00:01<00:24, 34.08it/s][A
  6%|▋         | 57/901 [00:01<00:25, 33.50it/s][A
  7%|▋         | 61/901 [00:01<00:25, 32.44it/s][A
  7%|▋         | 65/901 [00:02<00:26, 31.81it/s][A
  8%|▊         | 69/901 [00:02<00:26, 31.25it/s][A
  8%|▊         | 74/901 [00:02<00:25, 32.95it/s][A
  9

********** Evaluationintent************
f1:48.550000000000004, precision:44.01631912964642, recall:54.12486064659978 
accuracy:99.04921700223713
accuracy:99.05076508334282



Epoch: 1:   0%|          | 0/3282 [00:00<?, ?it/s][A
Epoch: 1:   0%|          | 3/3282 [00:00<03:22, 16.22it/s][A
Epoch: 1:   0%|          | 5/3282 [00:00<03:56, 13.85it/s][A
Epoch: 1:   0%|          | 7/3282 [00:00<04:03, 13.43it/s][A
Epoch: 1:   0%|          | 9/3282 [00:00<04:14, 12.86it/s][A
Epoch: 1:   0%|          | 11/3282 [00:00<04:05, 13.34it/s][A
Epoch: 1:   0%|          | 13/3282 [00:01<04:37, 11.80it/s][A
Epoch: 1:   0%|          | 15/3282 [00:01<04:21, 12.48it/s][A
Epoch: 1:   1%|          | 17/3282 [00:01<04:37, 11.74it/s][A
Epoch: 1:   1%|          | 19/3282 [00:01<04:21, 12.49it/s][A
Epoch: 1:   1%|          | 21/3282 [00:01<04:28, 12.13it/s][A
Epoch: 1:   1%|          | 23/3282 [00:01<04:16, 12.71it/s][A
Epoch: 1:   1%|          | 25/3282 [00:01<04:25, 12.27it/s][A
Epoch: 1:   1%|          | 27/3282 [00:02<04:14, 12.78it/s][A
Epoch: 1:   1%|          | 29/3282 [00:02<04:12, 12.88it/s][A
Epoch: 1:   1%|          | 32/3282 [00:02<03:30, 15.45it/s][A
Epoc

Steps: 850 Task: intent Avg.Loss: 0.00014522518904414028 Task Loss: 0.017427021637558937



Epoch: 1:   4%|▍         | 125/3282 [00:08<03:05, 17.04it/s][A
Epoch: 1:   4%|▍         | 128/3282 [00:08<02:57, 17.72it/s][A
Epoch: 1:   4%|▍         | 130/3282 [00:08<03:10, 16.54it/s][A
Epoch: 1:   4%|▍         | 133/3282 [00:08<03:20, 15.72it/s][A
Epoch: 1:   4%|▍         | 136/3282 [00:08<02:58, 17.67it/s][A
Epoch: 1:   4%|▍         | 138/3282 [00:09<03:05, 16.98it/s][A
Epoch: 1:   4%|▍         | 141/3282 [00:09<03:10, 16.47it/s][A
Epoch: 1:   4%|▍         | 143/3282 [00:09<03:02, 17.16it/s][A
Epoch: 1:   4%|▍         | 145/3282 [00:09<03:17, 15.88it/s][A
Epoch: 1:   5%|▍         | 148/3282 [00:09<02:46, 18.80it/s][A
Epoch: 1:   5%|▍         | 150/3282 [00:09<03:06, 16.79it/s][A
Epoch: 1:   5%|▍         | 153/3282 [00:09<03:03, 17.02it/s][A
Epoch: 1:   5%|▍         | 156/3282 [00:10<02:37, 19.79it/s][A
Epoch: 1:   5%|▍         | 159/3282 [00:10<02:48, 18.51it/s][A
Epoch: 1:   5%|▍         | 161/3282 [00:10<03:04, 16.94it/s][A
Epoch: 1:   5%|▍         | 164/3282 [00

Steps: 900 Task: NER Avg.Loss: 0.0006875919061712921 Task Loss: 0.22002941370010376



Epoch: 1:  10%|▉         | 325/3282 [00:20<03:16, 15.02it/s][A
Epoch: 1:  10%|▉         | 328/3282 [00:20<02:57, 16.62it/s][A
Epoch: 1:  10%|█         | 330/3282 [00:20<03:03, 16.10it/s][A
Epoch: 1:  10%|█         | 333/3282 [00:20<02:45, 17.81it/s][A
Epoch: 1:  10%|█         | 336/3282 [00:20<02:24, 20.45it/s][A
Epoch: 1:  10%|█         | 339/3282 [00:20<02:41, 18.22it/s][A
Epoch: 1:  10%|█         | 341/3282 [00:21<02:49, 17.34it/s][A
Epoch: 1:  10%|█         | 344/3282 [00:21<02:35, 18.85it/s][A
Epoch: 1:  11%|█         | 346/3282 [00:21<02:46, 17.65it/s][A
Epoch: 1:  11%|█         | 348/3282 [00:21<02:44, 17.85it/s][A
Epoch: 1:  11%|█         | 350/3282 [00:21<03:02, 16.09it/s][A
Epoch: 1:  11%|█         | 352/3282 [00:21<02:54, 16.78it/s][A
Epoch: 1:  11%|█         | 354/3282 [00:21<02:57, 16.49it/s][A
Epoch: 1:  11%|█         | 356/3282 [00:21<03:04, 15.89it/s][A
Epoch: 1:  11%|█         | 358/3282 [00:22<03:00, 16.19it/s][A
Epoch: 1:  11%|█         | 361/3282 [00

Steps: 950 Task: NER Avg.Loss: 0.0004370818205643445 Task Loss: 0.2272825539112091



Epoch: 1:  16%|█▌        | 524/3282 [00:32<02:44, 16.73it/s][A
Epoch: 1:  16%|█▌        | 526/3282 [00:32<02:55, 15.69it/s][A
Epoch: 1:  16%|█▌        | 529/3282 [00:32<02:44, 16.74it/s][A
Epoch: 1:  16%|█▌        | 533/3282 [00:32<02:27, 18.65it/s][A
Epoch: 1:  16%|█▋        | 536/3282 [00:32<02:25, 18.83it/s][A
Epoch: 1:  16%|█▋        | 538/3282 [00:32<02:37, 17.47it/s][A
Epoch: 1:  16%|█▋        | 541/3282 [00:33<02:46, 16.43it/s][A
Epoch: 1:  17%|█▋        | 543/3282 [00:33<02:39, 17.16it/s][A
Epoch: 1:  17%|█▋        | 545/3282 [00:33<02:41, 16.98it/s][A
Epoch: 1:  17%|█▋        | 548/3282 [00:33<02:40, 17.01it/s][A
Epoch: 1:  17%|█▋        | 550/3282 [00:33<02:46, 16.39it/s][A
Epoch: 1:  17%|█▋        | 552/3282 [00:33<02:43, 16.69it/s][A
Epoch: 1:  17%|█▋        | 554/3282 [00:33<02:44, 16.55it/s][A
Epoch: 1:  17%|█▋        | 556/3282 [00:33<02:37, 17.33it/s][A
Epoch: 1:  17%|█▋        | 558/3282 [00:34<03:10, 14.32it/s][A
Epoch: 1:  17%|█▋        | 560/3282 [00

Steps: 1000 Task: frag Avg.Loss: 7.442190508299973e-06 Task Loss: 0.005358377005904913



Epoch: 1:  22%|██▏       | 724/3282 [00:43<02:29, 17.07it/s][A
Epoch: 1:  22%|██▏       | 726/3282 [00:44<02:34, 16.53it/s][A
Epoch: 1:  22%|██▏       | 729/3282 [00:44<02:43, 15.64it/s][A
Epoch: 1:  22%|██▏       | 732/3282 [00:44<02:23, 17.79it/s][A
Epoch: 1:  22%|██▏       | 734/3282 [00:44<02:33, 16.65it/s][A
Epoch: 1:  22%|██▏       | 737/3282 [00:44<02:42, 15.62it/s][A
Epoch: 1:  23%|██▎       | 741/3282 [00:44<02:30, 16.91it/s][A
Epoch: 1:  23%|██▎       | 744/3282 [00:45<02:21, 17.99it/s][A
Epoch: 1:  23%|██▎       | 746/3282 [00:45<02:28, 17.09it/s][A
Epoch: 1:  23%|██▎       | 749/3282 [00:45<02:39, 15.86it/s][A
Epoch: 1:  23%|██▎       | 752/3282 [00:45<02:26, 17.26it/s][A
Epoch: 1:  23%|██▎       | 754/3282 [00:45<02:25, 17.35it/s][A
Epoch: 1:  23%|██▎       | 757/3282 [00:45<02:24, 17.49it/s][A
Epoch: 1:  23%|██▎       | 760/3282 [00:45<02:07, 19.79it/s][A
Epoch: 1:  23%|██▎       | 763/3282 [00:46<02:09, 19.48it/s][A
Epoch: 1:  23%|██▎       | 766/3282 [00

Steps: 1050 Task: intent Avg.Loss: 5.715386214433238e-05 Task Loss: 0.052581556141376495



Epoch: 1:  28%|██▊       | 925/3282 [00:55<02:24, 16.36it/s][A
Epoch: 1:  28%|██▊       | 928/3282 [00:55<02:03, 19.06it/s][A
Epoch: 1:  28%|██▊       | 931/3282 [00:55<02:13, 17.55it/s][A
Epoch: 1:  28%|██▊       | 933/3282 [00:56<02:22, 16.43it/s][A
Epoch: 1:  29%|██▊       | 936/3282 [00:56<02:02, 19.08it/s][A
Epoch: 1:  29%|██▊       | 939/3282 [00:56<02:11, 17.77it/s][A
Epoch: 1:  29%|██▊       | 941/3282 [00:56<02:28, 15.74it/s][A
Epoch: 1:  29%|██▊       | 943/3282 [00:56<02:20, 16.61it/s][A
Epoch: 1:  29%|██▉       | 945/3282 [00:56<02:24, 16.13it/s][A
Epoch: 1:  29%|██▉       | 947/3282 [00:56<02:20, 16.63it/s][A
Epoch: 1:  29%|██▉       | 949/3282 [00:57<02:29, 15.56it/s][A
Epoch: 1:  29%|██▉       | 951/3282 [00:57<02:21, 16.51it/s][A
Epoch: 1:  29%|██▉       | 953/3282 [00:57<02:25, 16.02it/s][A
Epoch: 1:  29%|██▉       | 956/3282 [00:57<02:02, 18.93it/s][A
Epoch: 1:  29%|██▉       | 958/3282 [00:57<02:11, 17.62it/s][A
Epoch: 1:  29%|██▉       | 961/3282 [00

Steps: 1100 Task: NER Avg.Loss: 0.0002033203636528924 Task Loss: 0.2277188003063202



Epoch: 1:  34%|███▍      | 1124/3282 [01:07<02:07, 16.86it/s][A
Epoch: 1:  34%|███▍      | 1126/3282 [01:07<02:20, 15.38it/s][A
Epoch: 1:  34%|███▍      | 1129/3282 [01:07<02:11, 16.36it/s][A
Epoch: 1:  34%|███▍      | 1132/3282 [01:07<02:03, 17.35it/s][A
Epoch: 1:  35%|███▍      | 1134/3282 [01:07<02:01, 17.64it/s][A
Epoch: 1:  35%|███▍      | 1137/3282 [01:07<02:10, 16.42it/s][A
Epoch: 1:  35%|███▍      | 1140/3282 [01:07<01:59, 17.93it/s][A
Epoch: 1:  35%|███▍      | 1142/3282 [01:08<02:08, 16.63it/s][A
Epoch: 1:  35%|███▍      | 1145/3282 [01:08<02:11, 16.19it/s][A
Epoch: 1:  35%|███▍      | 1147/3282 [01:08<02:07, 16.77it/s][A
Epoch: 1:  35%|███▌      | 1149/3282 [01:08<02:19, 15.25it/s][A
Epoch: 1:  35%|███▌      | 1152/3282 [01:08<02:02, 17.36it/s][A
Epoch: 1:  35%|███▌      | 1154/3282 [01:08<02:11, 16.21it/s][A
Epoch: 1:  35%|███▌      | 1157/3282 [01:09<02:06, 16.84it/s][A
Epoch: 1:  35%|███▌      | 1160/3282 [01:09<01:47, 19.73it/s][A
Epoch: 1:  35%|███▌     

Steps: 1150 Task: intent Avg.Loss: 2.197891444666311e-05 Task Loss: 0.0290121678262949



Epoch: 1:  40%|████      | 1324/3282 [01:19<01:50, 17.71it/s][A
Epoch: 1:  40%|████      | 1326/3282 [01:19<02:06, 15.48it/s][A
Epoch: 1:  40%|████      | 1329/3282 [01:19<01:57, 16.67it/s][A
Epoch: 1:  41%|████      | 1332/3282 [01:19<01:40, 19.33it/s][A
Epoch: 1:  41%|████      | 1335/3282 [01:19<01:42, 18.99it/s][A
Epoch: 1:  41%|████      | 1337/3282 [01:19<01:51, 17.48it/s][A
Epoch: 1:  41%|████      | 1340/3282 [01:19<01:46, 18.25it/s][A
Epoch: 1:  41%|████      | 1342/3282 [01:20<01:56, 16.66it/s][A
Epoch: 1:  41%|████      | 1345/3282 [01:20<01:57, 16.53it/s][A
Epoch: 1:  41%|████      | 1347/3282 [01:20<01:54, 16.93it/s][A
Epoch: 1:  41%|████      | 1349/3282 [01:20<02:03, 15.59it/s][A
Epoch: 1:  41%|████      | 1352/3282 [01:20<01:50, 17.47it/s][A
Epoch: 1:  41%|████▏     | 1354/3282 [01:20<01:56, 16.58it/s][A
Epoch: 1:  41%|████▏     | 1357/3282 [01:21<01:59, 16.13it/s][A
Epoch: 1:  41%|████▏     | 1360/3282 [01:21<01:46, 18.08it/s][A
Epoch: 1:  41%|████▏    

Steps: 1200 Task: NER Avg.Loss: 9.440496796742082e-05 Task Loss: 0.14349554479122162



Epoch: 1:  46%|████▋     | 1524/3282 [01:30<01:50, 15.86it/s][A
Epoch: 1:  46%|████▋     | 1526/3282 [01:31<01:47, 16.29it/s][A
Epoch: 1:  47%|████▋     | 1529/3282 [01:31<01:55, 15.16it/s][A
Epoch: 1:  47%|████▋     | 1531/3282 [01:31<01:50, 15.88it/s][A
Epoch: 1:  47%|████▋     | 1533/3282 [01:31<01:57, 14.91it/s][A
Epoch: 1:  47%|████▋     | 1536/3282 [01:31<01:41, 17.21it/s][A
Epoch: 1:  47%|████▋     | 1538/3282 [01:31<01:42, 17.04it/s][A
Epoch: 1:  47%|████▋     | 1541/3282 [01:31<01:43, 16.77it/s][A
Epoch: 1:  47%|████▋     | 1544/3282 [01:32<01:28, 19.60it/s][A
Epoch: 1:  47%|████▋     | 1547/3282 [01:32<01:38, 17.59it/s][A
Epoch: 1:  47%|████▋     | 1549/3282 [01:32<01:41, 16.99it/s][A
Epoch: 1:  47%|████▋     | 1552/3282 [01:32<01:27, 19.88it/s][A
Epoch: 1:  47%|████▋     | 1555/3282 [01:32<01:32, 18.72it/s][A
Epoch: 1:  47%|████▋     | 1558/3282 [01:32<01:42, 16.90it/s][A
Epoch: 1:  48%|████▊     | 1561/3282 [01:33<01:43, 16.62it/s][A
Epoch: 1:  48%|████▊    

Steps: 1250 Task: intent Avg.Loss: 3.935663244192256e-06 Task Loss: 0.006769340485334396



Epoch: 1:  53%|█████▎    | 1724/3282 [01:42<01:28, 17.53it/s][A
Epoch: 1:  53%|█████▎    | 1726/3282 [01:42<01:33, 16.70it/s][A
Epoch: 1:  53%|█████▎    | 1729/3282 [01:43<01:36, 16.11it/s][A
Epoch: 1:  53%|█████▎    | 1732/3282 [01:43<01:24, 18.29it/s][A
Epoch: 1:  53%|█████▎    | 1734/3282 [01:43<01:33, 16.51it/s][A
Epoch: 1:  53%|█████▎    | 1736/3282 [01:43<01:31, 16.86it/s][A
Epoch: 1:  53%|█████▎    | 1738/3282 [01:43<01:38, 15.64it/s][A
Epoch: 1:  53%|█████▎    | 1740/3282 [01:43<01:39, 15.51it/s][A
Epoch: 1:  53%|█████▎    | 1742/3282 [01:43<01:48, 14.21it/s][A
Epoch: 1:  53%|█████▎    | 1745/3282 [01:44<01:46, 14.47it/s][A
Epoch: 1:  53%|█████▎    | 1749/3282 [01:44<01:37, 15.76it/s][A
Epoch: 1:  53%|█████▎    | 1752/3282 [01:44<01:23, 18.32it/s][A
Epoch: 1:  53%|█████▎    | 1754/3282 [01:44<01:35, 16.00it/s][A
Epoch: 1:  54%|█████▎    | 1757/3282 [01:44<01:25, 17.86it/s][A
Epoch: 1:  54%|█████▎    | 1760/3282 [01:44<01:21, 18.74it/s][A
Epoch: 1:  54%|█████▎   

Steps: 1300 Task: frag Avg.Loss: 1.4401323369384045e-06 Task Loss: 0.0027650538831949234



Epoch: 1:  59%|█████▊    | 1924/3282 [01:54<01:27, 15.59it/s][A
Epoch: 1:  59%|█████▊    | 1926/3282 [01:54<01:29, 15.22it/s][A
Epoch: 1:  59%|█████▊    | 1928/3282 [01:54<01:23, 16.15it/s][A
Epoch: 1:  59%|█████▉    | 1930/3282 [01:54<01:26, 15.54it/s][A
Epoch: 1:  59%|█████▉    | 1932/3282 [01:55<01:22, 16.44it/s][A
Epoch: 1:  59%|█████▉    | 1934/3282 [01:55<01:23, 16.12it/s][A
Epoch: 1:  59%|█████▉    | 1936/3282 [01:55<01:22, 16.26it/s][A
Epoch: 1:  59%|█████▉    | 1938/3282 [01:55<01:32, 14.48it/s][A
Epoch: 1:  59%|█████▉    | 1941/3282 [01:55<01:26, 15.48it/s][A
Epoch: 1:  59%|█████▉    | 1944/3282 [01:55<01:12, 18.55it/s][A
Epoch: 1:  59%|█████▉    | 1946/3282 [01:55<01:22, 16.27it/s][A
Epoch: 1:  59%|█████▉    | 1949/3282 [01:56<01:27, 15.22it/s][A
Epoch: 1:  59%|█████▉    | 1951/3282 [01:56<01:22, 16.10it/s][A
Epoch: 1:  60%|█████▉    | 1953/3282 [01:56<01:27, 15.17it/s][A
Epoch: 1:  60%|█████▉    | 1955/3282 [01:56<01:21, 16.24it/s][A
Epoch: 1:  60%|█████▉   

Steps: 1350 Task: frag Avg.Loss: 5.641915095111472e-07 Task Loss: 0.0011960859410464764



Epoch: 1:  65%|██████▍   | 2124/3282 [02:06<01:05, 17.65it/s][A
Epoch: 1:  65%|██████▍   | 2126/3282 [02:06<01:08, 16.79it/s][A
Epoch: 1:  65%|██████▍   | 2128/3282 [02:06<01:07, 17.12it/s][A
Epoch: 1:  65%|██████▍   | 2130/3282 [02:06<01:11, 16.01it/s][A
Epoch: 1:  65%|██████▍   | 2133/3282 [02:07<01:19, 14.41it/s][A
Epoch: 1:  65%|██████▌   | 2136/3282 [02:07<01:05, 17.47it/s][A
Epoch: 1:  65%|██████▌   | 2138/3282 [02:07<01:05, 17.48it/s][A
Epoch: 1:  65%|██████▌   | 2141/3282 [02:07<01:00, 18.92it/s][A
Epoch: 1:  65%|██████▌   | 2144/3282 [02:07<00:53, 21.23it/s][A
Epoch: 1:  65%|██████▌   | 2147/3282 [02:07<01:00, 18.76it/s][A
Epoch: 1:  66%|██████▌   | 2150/3282 [02:08<01:01, 18.47it/s][A
Epoch: 1:  66%|██████▌   | 2153/3282 [02:08<01:03, 17.69it/s][A
Epoch: 1:  66%|██████▌   | 2156/3282 [02:08<00:55, 20.11it/s][A
Epoch: 1:  66%|██████▌   | 2159/3282 [02:08<00:56, 19.99it/s][A
Epoch: 1:  66%|██████▌   | 2162/3282 [02:08<01:04, 17.33it/s][A
Epoch: 1:  66%|██████▌  

Steps: 1400 Task: intent Avg.Loss: 2.0906668396492023e-06 Task Loss: 0.004850347060710192



Epoch: 1:  71%|███████   | 2324/3282 [02:18<00:54, 17.53it/s][A
Epoch: 1:  71%|███████   | 2326/3282 [02:18<00:55, 17.08it/s][A
Epoch: 1:  71%|███████   | 2328/3282 [02:18<00:58, 16.42it/s][A
Epoch: 1:  71%|███████   | 2330/3282 [02:18<01:02, 15.33it/s][A
Epoch: 1:  71%|███████   | 2333/3282 [02:18<01:04, 14.76it/s][A
Epoch: 1:  71%|███████   | 2335/3282 [02:18<01:00, 15.70it/s][A
Epoch: 1:  71%|███████   | 2337/3282 [02:19<01:03, 14.93it/s][A
Epoch: 1:  71%|███████▏  | 2340/3282 [02:19<00:59, 15.87it/s][A
Epoch: 1:  71%|███████▏  | 2342/3282 [02:19<01:01, 15.23it/s][A
Epoch: 1:  71%|███████▏  | 2344/3282 [02:19<00:57, 16.17it/s][A
Epoch: 1:  71%|███████▏  | 2346/3282 [02:19<01:03, 14.67it/s][A
Epoch: 1:  72%|███████▏  | 2349/3282 [02:19<01:01, 15.05it/s][A
Epoch: 1:  72%|███████▏  | 2351/3282 [02:20<00:58, 15.97it/s][A
Epoch: 1:  72%|███████▏  | 2353/3282 [02:20<00:57, 16.07it/s][A
Epoch: 1:  72%|███████▏  | 2356/3282 [02:20<00:55, 16.81it/s][A
Epoch: 1:  72%|███████▏ 

Steps: 1450 Task: NER Avg.Loss: 5.746678289142437e-05 Task Loss: 0.1448162943124771



Epoch: 1:  77%|███████▋  | 2525/3282 [02:30<00:40, 18.64it/s][A
Epoch: 1:  77%|███████▋  | 2528/3282 [02:30<00:38, 19.74it/s][A
Epoch: 1:  77%|███████▋  | 2530/3282 [02:30<00:45, 16.42it/s][A
Epoch: 1:  77%|███████▋  | 2533/3282 [02:30<00:47, 15.75it/s][A
Epoch: 1:  77%|███████▋  | 2536/3282 [02:30<00:40, 18.49it/s][A
Epoch: 1:  77%|███████▋  | 2539/3282 [02:30<00:41, 17.76it/s][A
Epoch: 1:  77%|███████▋  | 2541/3282 [02:31<00:46, 15.86it/s][A
Epoch: 1:  78%|███████▊  | 2544/3282 [02:31<00:44, 16.70it/s][A
Epoch: 1:  78%|███████▊  | 2546/3282 [02:31<00:45, 16.10it/s][A
Epoch: 1:  78%|███████▊  | 2548/3282 [02:31<00:43, 16.70it/s][A
Epoch: 1:  78%|███████▊  | 2550/3282 [02:31<00:45, 15.98it/s][A
Epoch: 1:  78%|███████▊  | 2553/3282 [02:31<00:45, 16.19it/s][A
Epoch: 1:  78%|███████▊  | 2556/3282 [02:31<00:38, 18.83it/s][A
Epoch: 1:  78%|███████▊  | 2558/3282 [02:32<00:41, 17.60it/s][A
Epoch: 1:  78%|███████▊  | 2561/3282 [02:32<00:41, 17.49it/s][A
Epoch: 1:  78%|███████▊ 

Steps: 1500 Task: intent Avg.Loss: 3.584088744901237e-06 Task Loss: 0.009748721495270729



Epoch: 1:  83%|████████▎ | 2725/3282 [02:41<00:32, 17.38it/s][A
Epoch: 1:  83%|████████▎ | 2728/3282 [02:41<00:27, 20.09it/s][A
Epoch: 1:  83%|████████▎ | 2731/3282 [02:41<00:26, 20.63it/s][A
Epoch: 1:  83%|████████▎ | 2734/3282 [02:42<00:31, 17.67it/s][A
Epoch: 1:  83%|████████▎ | 2737/3282 [02:42<00:32, 16.92it/s][A
Epoch: 1:  83%|████████▎ | 2740/3282 [02:42<00:29, 18.36it/s][A
Epoch: 1:  84%|████████▎ | 2742/3282 [02:42<00:31, 17.03it/s][A
Epoch: 1:  84%|████████▎ | 2745/3282 [02:42<00:32, 16.67it/s][A
Epoch: 1:  84%|████████▎ | 2748/3282 [02:42<00:31, 17.01it/s][A
Epoch: 1:  84%|████████▍ | 2750/3282 [02:42<00:30, 17.53it/s][A
Epoch: 1:  84%|████████▍ | 2753/3282 [02:43<00:31, 16.99it/s][A
Epoch: 1:  84%|████████▍ | 2756/3282 [02:43<00:28, 18.57it/s][A
Epoch: 1:  84%|████████▍ | 2758/3282 [02:43<00:29, 17.77it/s][A
Epoch: 1:  84%|████████▍ | 2760/3282 [02:43<00:30, 17.29it/s][A
Epoch: 1:  84%|████████▍ | 2762/3282 [02:43<00:31, 16.67it/s][A
Epoch: 1:  84%|████████▍

Steps: 1550 Task: frag Avg.Loss: 5.361897024158679e-07 Task Loss: 0.001565673854202032



Epoch: 1:  89%|████████▉ | 2924/3282 [02:53<00:22, 15.72it/s][A
Epoch: 1:  89%|████████▉ | 2926/3282 [02:53<00:23, 15.41it/s][A
Epoch: 1:  89%|████████▉ | 2928/3282 [02:53<00:22, 15.91it/s][A
Epoch: 1:  89%|████████▉ | 2930/3282 [02:53<00:23, 15.01it/s][A
Epoch: 1:  89%|████████▉ | 2932/3282 [02:53<00:21, 16.17it/s][A
Epoch: 1:  89%|████████▉ | 2934/3282 [02:53<00:21, 15.93it/s][A
Epoch: 1:  89%|████████▉ | 2937/3282 [02:54<00:21, 16.34it/s][A
Epoch: 1:  90%|████████▉ | 2940/3282 [02:54<00:19, 17.23it/s][A
Epoch: 1:  90%|████████▉ | 2942/3282 [02:54<00:19, 17.59it/s][A
Epoch: 1:  90%|████████▉ | 2944/3282 [02:54<00:19, 17.05it/s][A
Epoch: 1:  90%|████████▉ | 2946/3282 [02:54<00:21, 15.39it/s][A
Epoch: 1:  90%|████████▉ | 2948/3282 [02:54<00:20, 16.30it/s][A
Epoch: 1:  90%|████████▉ | 2950/3282 [02:54<00:21, 15.39it/s][A
Epoch: 1:  90%|████████▉ | 2952/3282 [02:55<00:20, 16.17it/s][A
Epoch: 1:  90%|█████████ | 2954/3282 [02:55<00:22, 14.33it/s][A
Epoch: 1:  90%|█████████