## Experiments on Cross-Lingual Transfer for Intent Detection

The first step is to prepare the data from Schuster et al. For now we only examine the English and Spanish datasets, since preprocessing Thai requires extra steps (tokenization) and is somewhat more complex. First, we parse the TSV data into dataframes.
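`df_format` is imported from the project's `util` module and is not shown here. As a rough, hypothetical sketch of what such a parser does, assuming each TSV row holds the intent label (e.g. `alarm/set_alarm`) in its first column and the raw utterance in its third, with slot annotations in between (an assumption about the file layout), one could write:

```python
import csv

import pandas as pd


def parse_tsv(path, mapping):
    """Hypothetical parser returning a DataFrame with "text" and "labels" columns.

    ASSUMPTION: column 0 is the intent label and column 2 the utterance;
    all other columns are ignored. `mapping` is the int -> label dict
    loaded from label_map.json, so it is inverted to look up label ids.
    """
    label_to_id = {label: idx for idx, label in mapping.items()}
    rows = []
    with open(path, encoding="utf-8", newline="") as f:
        for fields in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(fields) < 3:
                continue  # skip empty or malformed lines
            label, text = fields[0], fields[2]
            rows.append((text, label_to_id[label]))
    # simpletransformers expects the text first and the label id second
    return pd.DataFrame(rows, columns=["text", "labels"])
```

Unlike this sketch, the real `df_format` also returns a label mapping.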

In [1]:
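from util import *  # project helpers, e.g. df_format
import json  # used below to read label_map.json
import pickle
import sklearn.metrics  # `import sklearn` alone does not guarantee sklearn.metrics is loaded
import torch
import numpy as np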

In [2]:

# label_map.json maps integer ids to intent labels; JSON keys are strings, so convert them back to int
with open('label_map.json', 'r') as f:
    mapping = {int(k): v for k, v in json.load(f).items()}

# preprocess training, eval and test files into pandas dataframes and cache them
DATA_DIR = "/home/santi/BA/multilingual_task_oriented_dialog_slotfilling"

# en train
en_df, en_mapping = df_format(f"{DATA_DIR}/en/train-en.tsv", mapping)
en_df.to_pickle("training_files/en_train.p")

# en eval
en_df_eval, en_mapping = df_format(f"{DATA_DIR}/en/eval-en.tsv", mapping)
en_df_eval.to_pickle("training_files/en_eval.p")

# en test
en_df_test, en_mapping = df_format(f"{DATA_DIR}/en/test-en.tsv", mapping)
en_df_test.to_pickle("training_files/en_test.p")

# es train
es_df, es_mapping = df_format(f"{DATA_DIR}/es/train-es.tsv", mapping)
es_df.to_pickle("training_files/es_train.p")

# es eval
es_df_eval, es_mapping = df_format(f"{DATA_DIR}/es/eval-es.tsv", mapping)
es_df_eval.to_pickle("training_files/es_eval.p")

# es test (stored separately so it does not overwrite es_df_eval)
es_df_test, es_mapping = df_format(f"{DATA_DIR}/es/test-es.tsv", mapping)
es_df_test.to_pickle("training_files/es_test.p")


For now, we treat each subcategory (the intent within a domain, e.g. `domain/intent`) as a separate tag. It may also be possible to use a multi-label or layered setup.
Each label is mapped to a unique integer for ease of training.
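Each label has the form `domain/intent` (the error analysis further down splits on `/`). A layered setup could predict the domain first and the intent second. A minimal sketch of deriving a domain-level label set from the flat mapping; the helper `split_labels` and the example labels are illustrative, not part of the notebook:

```python
def split_labels(mapping):
    """Split an int -> "domain/intent" mapping into a layered label scheme.

    Returns (flat_to_domain, domain_ids): flat_to_domain maps each flat
    label id to the id of its domain; domain_ids maps domain name -> id.
    """
    domains = sorted({label.split("/")[0] for label in mapping.values()})
    domain_ids = {name: i for i, name in enumerate(domains)}
    flat_to_domain = {
        idx: domain_ids[label.split("/")[0]] for idx, label in mapping.items()
    }
    return flat_to_domain, domain_ids


# illustrative labels only
example = {0: "alarm/set_alarm", 1: "alarm/cancel_alarm", 2: "weather/find"}
flat_to_domain, domain_ids = split_labels(example)
print(domain_ids)      # {'alarm': 0, 'weather': 1}
print(flat_to_domain)  # {0: 0, 1: 0, 2: 1}
```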

In [None]:
mapping

In [None]:
en_df.head()

In [None]:
es_df.head()

The dataset contains a significant number of duplicate utterances. We drop them, keeping only the first occurrence of each utterance, and take a look at the data.

In [None]:
en_train = en_df.drop_duplicates("text")
es_train = es_df.drop_duplicates("text")
en_eval = en_df_eval.drop_duplicates("text")
es_eval = es_df_eval.drop_duplicates("text")
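`drop_duplicates("text")` keeps the first occurrence of each utterance. If a repeated utterance carries different labels, dropping silently keeps only one of them, so it can be worth counting both cases first. A sketch, assuming the dataframes have `text` and `labels` columns (the `labels` column name is an assumption about the output of `df_format`):

```python
import pandas as pd


def duplicate_report(df, text_col="text", label_col="labels"):
    """Return (#duplicate rows, #utterances that occur with more than one label)."""
    n_dupes = int(df.duplicated(text_col).sum())
    labels_per_text = df.groupby(text_col)[label_col].nunique()
    n_conflicts = int((labels_per_text > 1).sum())
    return n_dupes, n_conflicts


toy = pd.DataFrame({
    "text": ["set an alarm", "set an alarm", "remind me", "remind me"],
    "labels": [0, 0, 1, 2],
})
print(duplicate_report(toy))  # (2, 1)
```

On the real data this would be called as `duplicate_report(en_df)` before deduplicating.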

In [None]:
es_train

In [None]:
len(en_train)


In [None]:
len(es_train)

The English training set is much larger than the Spanish one.

In [None]:
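def avg_sent_l(df):
    """Average utterance length in whitespace-separated tokens."""
    return sum(len(l.split()) for l in df["text"]) / len(df)

def lexical_diversity(df):
    """Vocabulary size (number of unique whitespace tokens) and the vocabulary itself."""
    lexes = {w for l in df["text"] for w in l.split()}
    return len(lexes), lexes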


In [None]:
print("average sentence length")
print(avg_sent_l(en_train))
print(avg_sent_l(es_train))

In [None]:
print("unique tokens")
print(lexical_diversity(en_train)[0])
print(lexical_diversity(es_train)[0])
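The raw count of unique tokens grows with corpus size, so with the English set being much larger, the two numbers are not directly comparable. A common size-normalised alternative is the type-token ratio (unique tokens divided by total tokens). A sketch using the same whitespace tokenisation as `lexical_diversity`; note that the ratio itself still tends to drop as corpora grow, so it only gives a rough comparison:

```python
def type_token_ratio(texts):
    """Unique whitespace tokens divided by total whitespace tokens (0.0 if empty)."""
    tokens = [w for t in texts for w in t.split()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


print(type_token_ratio(["set an alarm", "set a timer"]))  # 5 unique / 6 total
```

Applied here as `type_token_ratio(en_train["text"])` and `type_token_ratio(es_train["text"])`.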

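Next, we check the intersection of the training and evaluation sets by counting the evaluation utterances that do not occur in the training data.

According to this check, the evaluation utterances are disjoint from the training utterances for English and (mostly) for Spanish.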

In [None]:
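print("unique utterances en")
# `sent in series` tests the index, not the values, so compare against a set of the training texts
en_train_texts = set(en_train["text"])
unique_sents = [sent for sent in en_eval["text"] if sent not in en_train_texts]
print(len(unique_sents))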

In [None]:
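# compare against the set of training texts (membership on a Series would check its index)
es_train_texts = set(es_train["text"])
unique_sents = [sent for sent in es_eval["text"] if sent not in es_train_texts]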

In [None]:
print("unique utterances es")
print(len(unique_sents))
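For a pandas Series, `x in series` checks the index labels rather than the values, which is why a membership test over utterances has to go through the values (a `set`, or `Series.isin`). A small demonstration:

```python
import pandas as pd

texts = pd.Series(["set an alarm", "weather in paris"])

print("set an alarm" in texts)                # False: `in` looks at the index (0, 1)
print(0 in texts)                             # True: 0 is an index label
print("set an alarm" in set(texts))           # True: membership over the values
print(texts.isin(["set an alarm"]).tolist())  # [True, False]
```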

However, there is no variation in vocabulary whatsoever between the training and evaluation utterances. (TODO: show this again.)
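One way to check the vocabulary claim is to measure how many evaluation tokens never occur in the training data (the out-of-vocabulary rate). A sketch with whitespace tokenisation; `oov_rate` is a helper introduced here for illustration:

```python
def oov_rate(train_texts, eval_texts):
    """Return (share of eval tokens, share of eval types) unseen in the training texts."""
    train_vocab = {w for t in train_texts for w in t.split()}
    eval_tokens = [w for t in eval_texts for w in t.split()]
    if not eval_tokens:
        return 0.0, 0.0
    unseen_tokens = sum(w not in train_vocab for w in eval_tokens)
    eval_types = set(eval_tokens)
    unseen_types = len(eval_types - train_vocab)
    return unseen_tokens / len(eval_tokens), unseen_types / len(eval_types)


print(oov_rate(["set an alarm"], ["set a new alarm"]))  # (0.5, 0.5)
```

On the real data: `oov_rate(en_train["text"], en_eval["text"])`, and likewise for Spanish. A rate of (0.0, 0.0) would mean the evaluation vocabulary is fully covered by the training vocabulary.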
We now move on to the actual experiments, beginning with baselines for the following setups:

0. BERT: train En, test En
1. XLM/XLM-R intent detection (sequence classification)
    1. no fine-tuning, test En
    2. no fine-tuning, test Es
    3. train En, test En
    4. train Es, test Es
    5. train En, test Es
    6. train En + Es, test Es
2. XLM-R slot filling (token classification)
    1. no fine-tuning, test En
    2. no fine-tuning, test Es
    3. train En, test En
    4. train Es, test Es
    5. train En, test Es
    6. train En + Es, test Es
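For the "train En + Es" setups, one simple option is to concatenate the two training dataframes and shuffle them (the cells at the end of this notebook instead fine-tune sequentially, one language after the other). A sketch, assuming `en_train` and `es_train` share the same columns; toy frames stand in for them here:

```python
import pandas as pd


def combine_train_sets(*dfs, seed=0):
    """Concatenate training dataframes and shuffle the rows reproducibly."""
    combined = pd.concat(dfs, ignore_index=True)
    return combined.sample(frac=1.0, random_state=seed).reset_index(drop=True)


en_toy = pd.DataFrame({"text": ["set an alarm"], "labels": [0]})
es_toy = pd.DataFrame({"text": ["pon una alarma"], "labels": [0]})  # Spanish: "set an alarm"
en_es = combine_train_sets(en_toy, es_toy)
print(len(en_es))  # 2
```

With the real data this would be `combine_train_sets(en_train, es_train)`, passed to `train_model`.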
  
    

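We load a pretrained XLM model (`xlm-mlm-xnli15-1024`) with a maximum-entropy (softmax) layer on top for classification. Apart from the learning rate (1e-5), the number of training epochs (2) and disabling fp16 (mixed-precision training, which is not relevant for the results), the arguments are left at their defaults.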
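In generic form, a maximum-entropy (multinomial logistic regression) layer maps a sentence representation h produced by the encoder to a distribution over the K = 12 intent labels, and is trained with the cross-entropy loss against the gold label y*. How exactly the XLM classification head pools h from the token states is not covered here:

```latex
p(y = k \mid \mathbf{h}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{h} + b_k)}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{h} + b_j)},
\qquad
\mathcal{L} = -\log p(y = y^{*} \mid \mathbf{h})
```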

In [None]:
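from simpletransformers.classification import ClassificationModel
from sklearn.metrics import f1_score

# extra metrics for eval_model, each called as metric(true_labels, predictions)
macro = lambda y_true, y_pred: f1_score(y_true, y_pred, average='macro')
micro = lambda y_true, y_pred: f1_score(y_true, y_pred, average='micro')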
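Macro-F1 averages the per-class F1 scores, so rare intents count as much as frequent ones; micro-F1 pools all decisions and, for single-label multi-class classification like this, equals accuracy. A dependency-free sketch of both, for illustration only (the notebook itself uses `sklearn.metrics.f1_score`):

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for non-empty single-label multi-class predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    classes = set(y_true) | set(y_pred)
    per_class = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)
    return macro, micro


y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(round(macro_f1, 3), micro_f1)  # 0.429 0.75 (micro-F1 equals accuracy, 3/4)
```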

In [None]:
args = {"fp16": False, "learning_rate": 1e-5, "num_train_epochs": 2,
        "reprocess_input_data": True, "overwrite_output_dir": True}

In [None]:
# equivalent to from_pretrained in the Hugging Face transformers library
model_en = ClassificationModel('xlm', 'xlm-mlm-xnli15-1024', num_labels=12, args=args)

In [None]:
# returns (metrics, raw model outputs, misclassified examples)
results, a, b = model_en.eval_model(en_eval, macro=macro, micro=micro)

In [None]:
print(results)

In [None]:
for t in b:
    print(t.text_a)

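As expected, the results of the model without fine-tuning are very poor. Next, we fine-tune the model on the whole English training set and evaluate it again. The saved checkpoint referenced below (`models/checkpoint-14370-epoch-5`) was trained for 5 epochs; set `num_train_epochs` to 5 in `args` to reproduce that run.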

In [None]:
model_en.train_model(en_train,output_dir = "models/")

In [None]:
results, a, b = model_en.eval_model(en_eval, macro=macro, micro=micro)

In [None]:
print(results)

In [None]:
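# use this for reloading a saved checkpoint instead of retraining
# models/checkpoint-14370-epoch-5 is the English-only model
# model = ClassificationModel('xlm', 'models/checkpoint-14370-epoch-5', num_labels=12, args=args)
model = model_en  # otherwise, continue with the model fine-tuned above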

In [None]:
#results, a, b = model.eval_model(en_eval, macro=macro, micro=micro)

In [None]:
print(results)

In [None]:
es_train

In [None]:
# cross-lingual transfer: the English-trained model evaluated on Spanish
res, a, b = model.eval_model(es_eval, macro=macro, micro=micro)

In [None]:
print(res)

In [None]:
print(mapping)

In [None]:
print(len(es_eval))

In [None]:
print(model.model)

In [None]:
print(len(a))

In [None]:
es_eval.head(50)

In [None]:
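# for each misclassified example, compare predicted and true labels and their domains
print(len(b))
dom_corr = 0  # misclassified examples whose predicted domain is still correct
for t in b:
    lab_pred = mapping[np.argmax(a[t.guid])]
    lab_true = mapping[t.label]
    dom_pred = lab_pred.split("/")[0]
    dom_true = lab_true.split("/")[0]
    if dom_pred == dom_true:
        dom_corr += 1
    print(t.guid)
    print(t.text_a, "\t", lab_pred, "\t", lab_true, "\t", dom_pred, "\t", dom_true)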



In [None]:
wrongs = [(inp.text_a,inp.label) for inp in b]

In [None]:
wrong_preds, vecs = model.predict([t for t, l in wrongs])

In [None]:
dom_corr = 0  # predicted domain matches the true domain
weak_dom = 0  # predicted and true domain are both in {reminder, alarm}
rem_alarms = ["reminder", "alarm"]

for (text, lab_true), lab_pred in zip(wrongs, wrong_preds):
    lab_pred = mapping[lab_pred]
    lab_true = mapping[lab_true]
    dom_pred = lab_pred.split("/")[0]
    dom_true = lab_true.split("/")[0]

    if dom_pred == dom_true:
        dom_corr += 1

    if dom_pred in rem_alarms and dom_true in rem_alarms:
        weak_dom += 1

    print(text, "\t", lab_pred, "\t", lab_true, "\t", dom_pred, "\t", dom_true)



In [None]:
print(dom_corr/len(b))

In [None]:
print(weak_dom/len(b))
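The two ratios above measure, among the misclassified Spanish examples, how often the predicted intent still falls in the correct domain, and how often both the predicted and the true domain lie in the easily confused alarm/reminder group. The same idea as a reusable function over (true, predicted) label strings; `domain_agreement` and its `merged` parameter are illustrative names. Its relaxed score differs slightly from `weak_dom`, because it also counts exact domain matches outside the merged group:

```python
def domain_agreement(pairs, merged=("alarm", "reminder")):
    """For (true_label, pred_label) strings of the form "domain/intent", return
    (exact domain agreement, agreement with the `merged` domains treated as one)."""
    if not pairs:
        return 0.0, 0.0

    def dom(label):
        return label.split("/")[0]

    exact = sum(dom(t) == dom(p) for t, p in pairs)
    relaxed = sum(
        dom(t) == dom(p) or (dom(t) in merged and dom(p) in merged)
        for t, p in pairs
    )
    return exact / len(pairs), relaxed / len(pairs)


# illustrative label pairs only
pairs = [
    ("alarm/set_alarm", "alarm/cancel_alarm"),
    ("reminder/set_reminder", "alarm/set_alarm"),
    ("weather/find", "alarm/set_alarm"),
]
print(domain_agreement(pairs))  # (0.3333333333333333, 0.6666666666666666)
```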


In [None]:
# Thai "ตั้ง นาฬิกา ปลุก" roughly means "set alarm clock"; compare with the English phrase
model.predict(["ตั้ง นาฬิกา ปลุก", "set an alarm"])

In [None]:
weak_dom

In [None]:
# out-of-domain input; predict returns (predicted label ids, raw model outputs)
pred, arr = model.predict(["hello"])

In [None]:
pred

In [None]:
np.argmax(arr)

In [None]:
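# continue fine-tuning the English model on Spanish (sequential En -> Es training),
# saved to its own directory so the Spanish-only run below does not overwrite it
model.train_model(es_train, output_dir="en_es/")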

In [None]:
model_es= ClassificationModel('xlm','xlm-mlm-xnli15-1024', num_labels=12, args=args)

In [None]:
model_es.train_model(es_train,output_dir = "spanish/")

In [None]:
model_es.eval_model(es_eval, macro=macro, micro=micro)

In [None]:
# continue fine-tuning the Spanish model on English (sequential Es -> En training)
model_es.train_model(en_train, output_dir="es_en/")