# Introduction

This notebook provides a baseline for each setting in [Subtask A of SemEval 2022 Task 2](https://sites.google.com/view/semeval2022task2-idiomaticity#h.qq7eefmehqf9). In addition, it provides some helpful pre-processing scripts that you are free to use in your experiments.

Please start by stepping through this notebook so you have a clear idea of what the task expects and what you need to submit.

These baselines are based on the results described in the paper “[AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models](https://arxiv.org/abs/2109.04413)”. 

## Zero-shot setting: Methodology 

Note that in the zero-shot setting you are NOT allowed to train the model using the one-shot data. 

In the zero-shot setting, we choose to include the context (the sentences preceding and succeeding the one containing the idioms). We do not add the idiom as an additional feature (in the “second input sentence”). This is based on the results presented in the dataset paper. 

We use Multilingual BERT for this setting.

## One-shot setting: Methodology

In the one-shot setting, we train the model on both the zero-shot and one-shot data. In this setting, we exclude the context (the sentences preceding and succeeding the one containing the idiom) and add the idiom as an additional feature in the “second sentence”. Again, this is based on the results presented in the dataset paper.

We also use Multilingual BERT for this setting.
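
As a minimal sketch of how the two input formats differ, the snippet below builds one row per setting from a made-up example. The `example` dictionary and the `build_row` helper are hypothetical; they only mirror the column names used in the task data (`Previous`, `Target`, `Next`, `MWE`, `Label`). The pre-processing functions actually used for the baselines are defined later in this notebook.

```python
# Hypothetical example row; the text is made up for illustration.
example = {
    "Previous": "He had a long day.",
    "Target": "In the end he decided to call it a day.",
    "Next": "Then he went home.",
    "MWE": "call it a day",
    "Label": "0",
}


def build_row(row, include_context, include_idiom):
    """Return [label, sentence1] or [label, sentence1, sentence2]."""
    sentence1 = row["Target"]
    if include_context:
        # Context = previous sentence + target sentence + next sentence.
        sentence1 = " ".join([row["Previous"], row["Target"], row["Next"]])
    out = [row["Label"], sentence1]
    if include_idiom:
        # The idiom (MWE) is passed as the "second input sentence".
        out.append(row["MWE"])
    return out


# Zero-shot baseline: context included, idiom not added.
zero_shot_row = build_row(example, include_context=True, include_idiom=False)
# One-shot baseline: context excluded, idiom added as sentence2.
one_shot_row = build_row(example, include_context=False, include_idiom=True)
print(zero_shot_row)
print(one_shot_row)
```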


# Setup 

In [1]:
%load_ext autoreload
%autoreload 2

Download the Task data and evaluation scripts

In [None]:
#!git clone https://github.com/H-TayyarMadabushi/SemEval_2022_Task2-idiomaticity.git

Download the “AStitchInLanguageModels” code which we make use of. 

In [None]:
#!git clone https://github.com/H-TayyarMadabushi/AStitchInLanguageModels.git

Download and install an editable version of Hugging Face Transformers.

In [None]:
#!git clone https://github.com/huggingface/transformers.git
#%cd transformers/
#!pip install --editable .
#%cd /content/ 

Required for run_glue ... 

In [None]:
## run_glue needs this. 
!pip install datasets

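An editable install normally requires a runtime restart. Calling `site.main()` re-scans the site-packages directories so the newly installed package becomes importable without one.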

In [2]:
import site
site.main()


# Imports and Helper functions

In [3]:
import numpy as np

In [4]:
import os
import csv

from pathlib import Path

In [5]:
def load_csv( path, delimiter=',' ) : 
  header = None
  data   = list()
  with open( path, encoding='utf-8', newline='' ) as csvfile: ## newline='' is recommended by the csv module
    reader = csv.reader( csvfile, delimiter=delimiter ) 
    for row in reader : 
      if header is None : 
        header = row
        continue
      data.append( row ) 
  return header, data


In [6]:
def write_csv( data, location ) : 
  with open( location, 'w', encoding='utf-8', newline='' ) as csvfile: ## newline='' is recommended by the csv module
    writer = csv.writer( csvfile ) 
    writer.writerows( data ) 
  print( "Wrote {}".format( location ) ) 
  return


The following function creates a submission file from the predictions output by run_glue (the text classification script from Hugging Face Transformers; see below).

Each call fills in the labels for a single setting.

It requires the submission format file, which is distributed with the data, as input. Call it once after completing each setting to fill in the results for both settings (see below).


In [7]:
def insert_to_submission_file( submission_format_file, input_file, prediction_format_file, setting ) :
    submission_header, submission_content = load_csv( submission_format_file )
    input_header     , input_data         = load_csv( input_file             )
    prediction_header, prediction_data    = load_csv( prediction_format_file, '\t' )

    assert len( input_data ) == len( prediction_data )

    ## submission_header ['ID', 'Language', 'Setting', 'Label']
    ## input_header      ['label', 'sentence1' ]
    ## prediction_header ['index', 'prediction']

    prediction_data = list( reversed( prediction_data ) )

    started_insert  = False
    for elem in submission_content : 
        if elem[ submission_header.index( 'Setting' ) ] != setting :
            if started_insert :
                if len( prediction_data ) == 0 :
                    break
                else : 
                    raise Exception( "Updates should be contiguous ... something is wrong." ) 
            continue
        started_insert = True
        elem[ submission_header.index( 'Label' ) ] = prediction_data.pop()[ prediction_header.index( 'prediction' ) ]

    return [ submission_header ] + submission_content
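
The contiguity check used above can be illustrated on in-memory rows. The sketch below is a simplified, index-based variant and not the function used in this notebook: `fill_labels` is a hypothetical helper, and the rows are made up to match the submission header `['ID', 'Language', 'Setting', 'Label']`.

```python
def fill_labels(header, rows, setting, predictions):
    """Write predictions into the 'Label' column of rows matching `setting`.

    Rows for a setting are assumed to form one contiguous block, and the
    number of predictions must equal the number of rows in that block.
    """
    setting_col = header.index("Setting")
    label_col = header.index("Label")
    targets = [i for i, row in enumerate(rows) if row[setting_col] == setting]
    if len(targets) != len(predictions):
        raise ValueError(
            "Expected {} predictions, got {}".format(len(targets), len(predictions))
        )
    # A contiguous block of n rows spans exactly n consecutive indices.
    if targets and targets[-1] - targets[0] + 1 != len(targets):
        raise ValueError("Rows for setting '{}' are not contiguous".format(setting))
    for i, prediction in zip(targets, predictions):
        rows[i][label_col] = prediction
    return rows


header = ["ID", "Language", "Setting", "Label"]
rows = [
    ["0", "EN", "zero_shot", ""],
    ["1", "PT", "zero_shot", ""],
    ["0", "EN", "one_shot", ""],
    ["1", "PT", "one_shot", ""],
]
filled = fill_labels(header, rows, "zero_shot", ["1", "0"])
print(filled)
```

Only the `zero_shot` rows receive labels here; the `one_shot` rows stay empty until the function is called again for that setting.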

# Pre-process: Create train and dev and evaluation data in required format

In the zero-shot setting, we choose to include the context (the sentences preceding and succeeding the one containing the idioms). We do not add the idiom as an additional feature (in the “second input sentence”). 

In the one-shot setting, we train the model on both the zero-shot and one-shot data. In this setting, we exclude the context (the sentences preceding and succeeding the one containing the idiom) and add the idiom as an additional feature in the “second sentence”.


## Functions for pre-processing

### _get_train_data

This function generates training data in the format required by the Hugging Face example script. It includes or excludes the MWE and the context depending on the parameters.


In [8]:
def _get_train_data( data_location, file_name, include_context, include_idiom ) :
    
    file_name = os.path.join( data_location, file_name ) 

    header, data = load_csv( file_name )

    out_header = [ 'label', 'sentence1' ]
    if include_idiom :
        out_header = [ 'label', 'sentence1', 'sentence2' ]
        
    # ['DataID', 'Language', 'MWE', 'Setting', 'Previous', 'Target', 'Next', 'Label']
    out_data = list()
    for elem in data :
        label     = elem[ header.index( 'Label'  ) ]
        sentence1 = elem[ header.index( 'Target' ) ]
        if include_context :
            sentence1 = ' '.join( [ elem[ header.index( 'Previous' ) ], elem[ header.index( 'Target' ) ], elem[ header.index( 'Next' ) ] ] )
        this_row = None
        if not include_idiom :
            this_row = [ label, sentence1 ] 
        else :
            sentence2 = elem[ header.index( 'MWE' ) ]
            this_row = [ label, sentence1, sentence2 ]
        out_data.append( this_row )
        assert len( out_header ) == len( this_row )
    return [ out_header ] + out_data

### _get_dev_eval_data

This function generates dev and eval data in the format required by the Hugging Face example script. It includes or excludes the MWE and the context depending on the parameters.

Additionally, if no gold labels are provided (as in the case of eval), it generates a file with placeholder labels that can be used to generate predictions.


In [9]:
def _get_dev_eval_data( data_location, input_file_name, gold_file_name, include_context, include_idiom ) :

    input_headers, input_data = load_csv( os.path.join( data_location, input_file_name ) )
    gold_header  = gold_data = None
    if gold_file_name is not None : 
        gold_header  , gold_data  = load_csv( os.path.join( data_location, gold_file_name  ) )
        assert len( input_data ) == len( gold_data )

    # ['ID', 'Language', 'MWE', 'Previous', 'Target', 'Next']
    # ['ID', 'DataID', 'Language', 'Label']
    
    out_header = [ 'label', 'sentence1' ]
    if include_idiom :
        out_header = [ 'label', 'sentence1', 'sentence2' ]

    out_data = list()
    for index in range( len( input_data ) ) :
        label = 1
        if gold_file_name is not None : 
            this_input_id = input_data[ index ][ input_headers.index( 'ID' ) ]
            this_gold_id  = gold_data [ index ][ gold_header  .index( 'ID' ) ]
            assert this_input_id == this_gold_id
            
            label     = gold_data[ index ][ gold_header.index( 'Label'  ) ]
            
        elem      = input_data[ index ]
        sentence1 = elem[ input_headers.index( 'Target' ) ]
        if include_context :
            sentence1 = ' '.join( [ elem[ input_headers.index( 'Previous' ) ], elem[ input_headers.index( 'Target' ) ], elem[ input_headers.index( 'Next' ) ] ] )
        this_row = None
        if not include_idiom :
            this_row = [ label, sentence1 ] 
        else :
            sentence2 = elem[ input_headers.index( 'MWE' ) ]
            this_row = [ label, sentence1, sentence2 ]
        assert len( out_header ) == len( this_row ) 
        out_data.append( this_row )
        

    return [ out_header ] + out_data
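
The label handling in the function above can be isolated into a small sketch. `attach_labels` is a hypothetical helper written for illustration only: it returns the gold labels when they are available (after checking that the IDs line up) and a placeholder label of 1 otherwise, which mirrors the behaviour described for the eval split.

```python
def attach_labels(input_ids, gold_rows=None, placeholder=1):
    """Return one label per input ID.

    gold_rows is a list of (id, label) pairs in the same order as input_ids,
    or None when no gold labels are available (as for the evaluation split).
    """
    if gold_rows is None:
        # No gold file: emit a dummy label so the output keeps a label column.
        return [placeholder] * len(input_ids)
    if len(gold_rows) != len(input_ids):
        raise ValueError("Input and gold files differ in length")
    labels = []
    for input_id, (gold_id, label) in zip(input_ids, gold_rows):
        # Rows are matched by position, so the IDs must agree row by row.
        if input_id != gold_id:
            raise ValueError("ID mismatch: {} vs {}".format(input_id, gold_id))
        labels.append(label)
    return labels


with_gold = attach_labels(["a", "b"], [("a", "0"), ("b", "1")])
without_gold = attach_labels(["a", "b"])
print(with_gold, without_gold)
```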


### create_data

This function generates the training, development and evaluation data. 


In [8]:
def create_data( input_location, output_location ) :
    """
    Based on the results presented in `AStitchInLanguageModels', we exclude the idiom in the zero-shot setting and include it in the one-shot setting.
    """

    
    ## Zero shot data
    train_data = _get_train_data(
        data_location   = input_location,
        file_name       = 'train_zero_shot.csv',
        include_context = True,
        include_idiom   = False
    )
    write_csv( train_data, os.path.join( output_location, 'ZeroShot', 'train.csv' ) )
    
    dev_data = _get_dev_eval_data(
        data_location    = input_location,
        input_file_name  = 'dev.csv',
        gold_file_name   = 'dev_gold.csv', 
        include_context  = True,
        include_idiom    = False
    )        
    write_csv( dev_data, os.path.join( output_location, 'ZeroShot', 'dev.csv' ) )
    
    eval_data = _get_dev_eval_data(
        data_location    = input_location,
        input_file_name  = 'eval.csv',
        gold_file_name   = None , ## Don't have gold evaluation file -- submit to CodaLab
        include_context  = True,
        include_idiom    = False
    )
    write_csv( eval_data, os.path.join( output_location, 'ZeroShot', 'eval.csv' ) )
    

    ## OneShot Data (combine both for training)
    train_zero_data = _get_train_data(
        data_location   = input_location,
        file_name       = 'train_zero_shot.csv',
        include_context = False,
        include_idiom   = True
    )
    train_one_data = _get_train_data(
        data_location   = input_location,
        file_name       = 'train_one_shot.csv',
        include_context = False,
        include_idiom   = True
    )

    assert train_zero_data[0] == train_one_data[0] ## Headers
    train_data = train_one_data + train_zero_data[1:]
    write_csv( train_data, os.path.join( output_location, 'OneShot', 'train.csv' ) )
    
    dev_data = _get_dev_eval_data(
        data_location    = input_location,
        input_file_name  = 'dev.csv',
        gold_file_name   = 'dev_gold.csv', 
        include_context  = False,
        include_idiom    = True
    )        
    write_csv( dev_data, os.path.join( output_location, 'OneShot', 'dev.csv' ) )
    
    eval_data = _get_dev_eval_data(
        data_location    = input_location,
        input_file_name  = 'eval.csv',
        gold_file_name   = None,
        include_context  = False,
        include_idiom    = True
    )
    write_csv( eval_data, os.path.join( output_location, 'OneShot', 'eval.csv' ) )

    return

In [11]:
def create_test_data( input_location, output_location ) :

    test_data = _get_dev_eval_data(
        data_location    = input_location,
        input_file_name  = 'test.csv',
        gold_file_name   = None , ## Don't have gold evaluation file -- submit to CodaLab
        include_context  = True,
        include_idiom    = False
    )
    write_csv( test_data, os.path.join( output_location, 'ZeroShot', 'test.csv' ) )

    return

## Setup and Create data

In [10]:
!ls 

AStitchInLanguageModels  make_submission.sh  SemEval_2022_Task2-idiomaticity
data			 models		     SubTaskA-bert.ipynb
Data			 outputs	     SubTaskA-zeroshot.ipynb
jupyter.10140283.out	 pyproject.toml      SubTaskA-zeroshot-loaded.ipynb
lib			 README.md	     xlm-sentiment
LICENSE			 requirements.txt


In [13]:
outpath = 'Data'

In [15]:
Path( os.path.join( outpath, 'ZeroShot' ) ).mkdir(parents=True, exist_ok=True)
Path( os.path.join( outpath, 'ZeroShotPlus' ) ).mkdir(parents=True, exist_ok=True)
Path( os.path.join( outpath, 'OneShot' ) ).mkdir(parents=True, exist_ok=True)

create_data( 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/', outpath )

Wrote Data/ZeroShot/train.csv
Wrote Data/ZeroShot/dev.csv
Wrote Data/ZeroShot/eval.csv
Wrote Data/OneShot/train.csv
Wrote Data/OneShot/dev.csv
Wrote Data/OneShot/eval.csv


In [14]:
create_test_data( 'SemEval_2022_Task2-idiomaticity/SubTaskA/TestData/', outpath )

Wrote Data/ZeroShot/test.csv
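
One quick sanity check on the generated files is to look at the label distribution. The sketch below is an optional addition and not part of the baseline: `count_labels` is a hypothetical helper that accepts any iterable of CSV lines (an open file works too), and the sample lines are made up.

```python
import csv
from collections import Counter


def count_labels(csv_lines):
    """Count the values of the 'label' column in CSV lines with a header row."""
    reader = csv.DictReader(csv_lines)
    return Counter(row["label"] for row in reader)


# Made-up sample in the same layout as the generated ZeroShot files.
sample = [
    "label,sentence1",
    "0,Some text",
    "1,Other text",
    "1,More text",
]
counts = count_labels(sample)
print(counts)

# With a generated file (assumes the cells above have been run):
# with open("Data/ZeroShot/train.csv", encoding="utf-8", newline="") as f:
#     print(count_labels(f))
```

Note that the eval and test files carry a placeholder label, so their counts say nothing about the true class balance.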


# Zero Shot Setting

## Train Zero shot

In [16]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    --model_name_or_path 'bert-base-multilingual-cased' \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 9 \
    --evaluation_strategy "epoch" \
    --output_dir models/ZeroShot/0/ \
    --seed 0 \
    --train_file      Data/ZeroShot/train.csv \
    --validation_file Data/ZeroShot/dev.csv \
    --save_strategy "epoch" \
    --load_best_model_at_end \
    --metric_for_best_model "f1" \
    --save_total_limit 1

12/16/2021 22:22:40 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/0/runs/Dec16_22-22-4

- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|█████████████████████████████████████████████| 5/5 [00:02<00:00,  1.90ba/s]
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  2.35ba/s]
12/16/2021 22:23:13 - INFO - __main__ -   Sample 3155 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1

 11%|████▍                                   | 141/1269 [00:38<04:02,  4.66it/s][INFO|trainer.py:541] 2021-12-16 22:24:27,265 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence1.
[INFO|trainer.py:2243] 2021-12-16 22:24:27,267 >> ***** Running Evaluation *****
[INFO|trainer.py:2245] 2021-12-16 22:24:27,267 >>   Num examples = 739
[INFO|trainer.py:2248] 2021-12-16 22:24:27,267 >>   Batch size = 8

{'eval_loss': 2.116579055786133, 'eval_accuracy': 0.6874154210090637, 'eval_

In [None]:
#from google.colab import drive
#drive.mount('/content/gdrive')

In [None]:
## Create save path
#!mkdir -p /content/gdrive/MyDrive/ColabData/SemEval2022Task2/TaskA/ZeroShot/0/
## Copy saved model.
#!cp -r /content/models/ZeroShot/0/* /content/gdrive/MyDrive/ColabData/SemEval2022Task2/TaskA/ZeroShot/0/

In [None]:
## Bring back saved model here. 
#!mkdir -p /content/models/ZeroShot/0/
# !cp -r /content/gdrive/MyDrive/ColabData/SemEval2022Task2/TaskA/ZeroShot/0/* /content/models/ZeroShot/0/

## Evaluation On Dev Data

In [15]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    --model_name_or_path 'models/ZeroShot/0' \
    --do_predict \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 9 \
    --evaluation_strategy "epoch" \
    --output_dir models/ZeroShot/0/eval-dev/ \
    --seed 0 \
    --train_file      Data/ZeroShot/train.csv \
    --validation_file Data/ZeroShot/dev.csv \
    --test_file       Data/ZeroShot/dev.csv \
    --save_strategy "epoch" \
    --load_best_model_at_end \
    --metric_for_best_model "f1" \
    --save_total_limit 1

01/12/2022 14:16:24 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/0/eval-dev/runs/Jan1

In [17]:
foo = np.array([[2,1], [3,4]])

In [18]:
np.max(foo, axis=1)/np.sum(foo, axis=1)

array([0.66666667, 0.57142857])

### Use predictions to create the submission file (for dev data)

In [20]:
params = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/0/eval-dev/test_results_None.txt'                        ,
    }
params[ 'setting' ] = 'zero_shot'

In [21]:
updated_data = insert_to_submission_file( **params )

In [22]:
!mkdir -p outputs

In [23]:
write_csv( updated_data, 'outputs/zero_shot_dev_formated.csv' ) 

Wrote outputs/zero_shot_dev_formated.csv


### For the development data, we can run the evaluation script

In [19]:
import sys
sys.path.append( 'SemEval_2022_Task2-idiomaticity/SubTaskA/' ) 
from SubTask1Evaluator import evaluate_submission


submission_file = 'outputs/zero_shot_dev_formated.csv'
gold_file       = 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev_gold.csv'

results = evaluate_submission( submission_file, gold_file )
#%reload_ext google.colab.data_table
import pandas as pd
df = pd.DataFrame(data=results[1:], columns=results[0])
df

|   | Settings  | Languages | F1 Score (Macro)   |
|---|-----------|-----------|--------------------|
| 0 | zero_shot | EN        | 0.70485            |
| 1 | zero_shot | PT        | 0.611721           |
| 2 | zero_shot | EN,PT     | 0.6939             |
| 3 | one_shot  | EN        | (None, None, None) |
| 4 | one_shot  | PT        | (None, None, None) |
| 5 | one_shot  | EN,PT     | (None, None, None) |
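
For reference, macro F1 is the unweighted mean of the per-class F1 scores. The sketch below computes it by hand on made-up labels. It is an illustration of the metric only, not the implementation used in `SubTask1Evaluator`, and the helper names are hypothetical.

```python
def f1_for_class(gold, predicted, cls):
    """F1 score for a single class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, predicted) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, predicted) if g == cls and p != cls)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over all classes that occur."""
    classes = sorted(set(gold) | set(predicted))
    return sum(f1_for_class(gold, predicted, c) for c in classes) / len(classes)


# Made-up labels for illustration.
gold = ["0", "0", "1", "1", "1"]
predicted = ["0", "1", "1", "1", "0"]
score = macro_f1(gold, predicted)
print(round(score, 4))
```

In this example class `0` has an F1 of 1/2 and class `1` an F1 of 2/3, so the macro average is 7/12.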


### Creating separate models for English and multi-lingual

In [21]:
import numpy as np

In [22]:
from lib import util

In [23]:
frames = util.load_csv_dataframes('SemEval_2022_Task2-idiomaticity/SubTaskA/Data')
tframes = util.load_csv_dataframes('SemEval_2022_Task2-idiomaticity/SubTaskA/TestData')

In [24]:
zdf = frames['train_zero_shot.csv']
odf = frames['train_one_shot.csv']
ddf = frames['dev.csv']
ddf_gold = frames['dev_gold.csv']
edf = frames['eval.csv']
tdf = tframes['test.csv']

In [25]:
zdf_enindex = zdf['Language'] == 'EN'
dev_enindex = ddf['Language'] == 'EN'
eval_enindex = edf['Language'] == 'EN'
test_enindex = tdf['Language'] == 'EN'

In [26]:
zdf_en = zdf[zdf_enindex].drop(['Language', 'DataID', 'Setting', 'Previous', 'Next'],
                                axis=1).rename(columns={'Label': 'label',
                                                        'Target': 'sentence1',
                                                        'MWE': 'sentence2'})

In [27]:
zdf_oth = zdf[~zdf_enindex].drop(['Language', 'DataID', 'Setting', 'Previous', 'Next'],
                                 axis=1).rename(columns={'Label': 'label',
                                                         'Target': 'sentence1',
                                                         'MWE': 'sentence2'})

In [28]:
ddf['label'] = ddf_gold['Label']

In [29]:
ddf_en = ddf[dev_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})
ddf_oth = ddf[~dev_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                 axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})

In [30]:
edf['label'] = 1

In [31]:
edf_en = edf[eval_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                 axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})
edf_oth = edf[~eval_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                   axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})

In [32]:
tdf['label'] = 1

In [33]:
tdf_en = tdf[test_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                 axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})
tdf_oth = tdf[~test_enindex].drop(['Language', 'ID', 'Previous', 'Next'],
                                   axis=1).rename(columns={'Target': 'sentence1', 'MWE': 'sentence2'})

In [148]:
zdf_en.to_csv('Data/ZeroShot/train_en_2.csv', index=False)
zdf_oth.to_csv('Data/ZeroShot/train_ot_2.csv', index=False)

In [149]:
ddf_en.to_csv('Data/ZeroShot/dev_en_2.csv', index=False)
ddf_oth.to_csv('Data/ZeroShot/dev_ot_2.csv', index=False)

In [191]:
edf_en.to_csv('Data/ZeroShot/eval_en_2.csv', index=False)
edf_oth.to_csv('Data/ZeroShot/eval_ot_2.csv', index=False)

In [34]:
tdf_en.to_csv('Data/ZeroShot/test_en.csv', index=False)
tdf_oth.to_csv('Data/ZeroShot/test_ot.csv', index=False)
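
When separate models are trained per language, their predictions eventually need to be put back into the original row order. The dependency-free sketch below shows one way to do this, assuming each model returns predictions in the order of its own input file. Both helpers and the toy rows are hypothetical; the cells above use pandas boolean masks for the split itself.

```python
def split_by_language(rows, language="EN"):
    """Return (index, row) pairs for the given language and for all other rows."""
    selected = [(i, r) for i, r in enumerate(rows) if r["Language"] == language]
    others = [(i, r) for i, r in enumerate(rows) if r["Language"] != language]
    return selected, others


def merge_predictions(selected, selected_preds, others, other_preds, total):
    """Place per-language predictions back at their original row positions."""
    merged = [None] * total
    for (i, _), pred in zip(selected, selected_preds):
        merged[i] = pred
    for (i, _), pred in zip(others, other_preds):
        merged[i] = pred
    return merged


# Toy rows; only the columns needed for the split are shown.
rows = [
    {"ID": "0", "Language": "EN"},
    {"ID": "1", "Language": "PT"},
    {"ID": "2", "Language": "EN"},
]
en, oth = split_by_language(rows)
# Made-up predictions: two for the EN rows, one for the remaining row.
merged = merge_predictions(en, ["1", "0"], oth, ["1"], len(rows))
print(merged)
```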

In [22]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    --model_name_or_path 'bert-base-cased' \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 9 \
    --evaluation_strategy "epoch" \
    --output_dir models/ZeroShot/1/ \
    --seed 0 \
    --train_file      Data/ZeroShot/train_en_2.csv \
    --validation_file Data/ZeroShot/dev_en_2.csv \
    --save_strategy "epoch" \
    --load_best_model_at_end \
    --metric_for_best_model "f1" \
    --save_total_limit 1

^C


In [171]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    --model_name_or_path 'bert-base-multilingual-cased' \
    --do_train \
    --do_eval \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 9 \
    --evaluation_strategy "epoch" \
    --output_dir models/ZeroShot/2/ \
    --seed 0 \
    --train_file      Data/ZeroShot/train_ot_2.csv \
    --validation_file Data/ZeroShot/dev_ot_2.csv \
    --save_strategy "epoch" \
    --load_best_model_at_end \
    --metric_for_best_model "f1" \
    --save_total_limit 1

12/17/2021 00:55:54 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=False,
do_train=True,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/2/runs/Dec17_00-55-5

100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 11.05ba/s]
12/17/2021 00:56:04 - INFO - __main__ -   Sample 788 of the training set: {'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'input_ids': [101, 17607, 11070, 117, 10126, 20114, 50510, 13168, 11782, 10293, 10398, 93898, 10107, 10220, 26561, 183, 16008, 60259, 15088, 10212, 169, 77868, 24283, 10229, 117, 21752, 10266, 25598, 254, 24960, 93262, 119, 102, 77868, 118, 24283, 10229, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

{'eval_loss': 1.3511855602264404, 'eval_accuracy': 0.6153846383094788, 'eval_f1': 0.5270192228363997, 'eval_runtime': 0.7423, 'eval_samples_per_second': 367.786, 'eval_steps_per_second': 47.152, 'epoch': 2.0}

[INFO|trainer.py:2073] 2021-12-17 00:57:29,736 >> Deleting older checkpoint [models/ZeroShot/2/checkpoint-148] due to args.save_total_limit
 67%|███████████████████████████▎             | 222/333 [01:29<00:21,  5.23it/s][INFO|trainer.py:541] 2021-12-17 00:57:38,329 >> The following columns in the evaluation set  don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence2, sentence1.
[INFO|trainer.py:2243] 2021-12-17 00:57:38,456 >> ***** Running Evaluation *****
[INFO|trainer.py:2245] 2021-12-17 00:57:38,456 >>   Num examples = 273
[INFO|trainer.py:2248] 2021-12-17 00:57:38,456 >>   Batch size = 8


[INFO|modeling_utils.py:1058] 2021-12-17 00:58:24,443 >> Model weights saved in models/ZeroShot/2/checkpoint-333/pytorch_model.bin
[INFO|tokenization_utils_base.py:2034] 2021-12-17 00:58:24,445 >> tokenizer config file saved in models/ZeroShot/2/checkpoint-333/tokenizer_config.json
[INFO|tokenization_utils_base.py:2040] 2021-12-17 00:58:24,447 >> Special tokens file saved in models/ZeroShot/2/checkpoint-333/special_tokens_map.json
[INFO|trainer.py:2073] 2021-12-17 00:58:27,596 >> Deleting older checkpoint [models/ZeroShot/2/checkpoint-296] due to args.save_total_limit
[INFO|trainer.py:1409] 2021-12-17 00:58:27,918 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1418] 2021-12-17 00:58:27,918 >> Loading best model from models/ZeroShot/2/checkpoint-222 (score: 0.590239046972019).
{'train_runtime': 139.1705, 'train_samples_per_second': 75.275, 'train_steps_per_second': 2.393, 'train_loss': 0.10252386098867422, 'epoch': 9.0}
100%|███

### Evaluation for language-separated data

In [26]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    --model_name_or_path 'models/ZeroShot/1' \
    --do_predict \
    --max_seq_length 128 \
    --per_device_train_batch_size 32 \
    --learning_rate 2e-5 \
    --num_train_epochs 9 \
    --evaluation_strategy "epoch" \
    --output_dir models/ZeroShot/1/eval-dev/ \
    --seed 0 \
    --train_file      Data/ZeroShot/train_en_2.csv \
    --validation_file Data/ZeroShot/dev_en_2.csv \
    --test_file       Data/ZeroShot/dev_en_2.csv \
    --save_strategy "epoch" \
    --load_best_model_at_end \
    --metric_for_best_model "f1" \
    --save_total_limit 1

12/20/2021 21:39:27 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/1/eval-dev/runs/Dec2

12/20/2021 21:40:11 - INFO - __main__ -   ***** Test results None *****
100%|███████████████████████████████████████████| 59/59 [00:01<00:00, 42.20it/s]


In [27]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/2' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/2/eval-dev/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train_ot_2.csv \
    	--validation_file Data/ZeroShot/dev_ot_2.csv \
      --test_file Data/ZeroShot/dev_ot_2.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

12/20/2021 21:40:21 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/2/eval-dev/runs/Dec2

12/20/2021 21:40:33 - INFO - __main__ -   ***** Test results None *****
100%|███████████████████████████████████████████| 35/35 [00:00<00:00, 44.66it/s]


Combine the EN and PT predictions into a single file (Comb).

In [174]:
!awk 'FNR==1 && NR!=1{next;}{print}' models/ZeroShot/1/eval-dev/test_results_None.txt models/ZeroShot/2/eval-dev/test_results_None.txt > models/ZeroShot/1/eval-dev/test_results_Comb.txt
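The `awk` one-liner above concatenates the two prediction files and drops the header line of every file after the first. A rough Python equivalent is sketched below for readers who prefer to stay inside the notebook; `concat_skip_repeated_headers` is a hypothetical helper name, and the sketch assumes each input file is non-empty and starts with a header line.

```python
def concat_skip_repeated_headers(input_paths, output_path):
    """Concatenate text files, keeping only the first file's header line."""
    with open(output_path, "w", encoding="utf-8") as out:
        for file_number, path in enumerate(input_paths):
            with open(path, encoding="utf-8") as handle:
                for line_number, line in enumerate(handle):
                    # Skip the header of the second and later files.
                    if file_number > 0 and line_number == 0:
                        continue
                    # Normalise the line ending so files without a trailing
                    # newline do not run into the next file's first row.
                    out.write(line.rstrip("\n") + "\n")


# Usage with the paths from the cell above (left commented out here):
# concat_skip_repeated_headers(
#     ["models/ZeroShot/1/eval-dev/test_results_None.txt",
#      "models/ZeroShot/2/eval-dev/test_results_None.txt"],
#     "models/ZeroShot/1/eval-dev/test_results_Comb.txt",
# )
```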

In [175]:
params_z = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/1/eval-dev/test_results_Comb.txt'                        ,
    }
params_z[ 'setting' ] = 'zero_shot'

In [176]:
updated_data = insert_to_submission_file( **params_z )

In [177]:
write_csv( updated_data, 'outputs/zero_shot_dev_formated_comb.csv' ) 

Wrote outputs/zero_shot_dev_formated_comb.csv


In [21]:
results_comb = evaluate_submission( 'outputs/zero_shot_dev_formated_comb.csv', gold_file )
#%reload_ext google.colab.data_table
pd.DataFrame(data=results_comb[1:], columns=results_comb[0])

| | Settings | Languages | F1 Score (Macro) |
|---|---|---|---|
| 0 | zero_shot | EN | 0.760457 |
| 1 | zero_shot | PT | 0.590239 |
| 2 | zero_shot | EN,PT | 0.725061 |
| 3 | one_shot | EN | (None, None, None) |
| 4 | one_shot | PT | (None, None, None) |
| 5 | one_shot | EN,PT | (None, None, None) |


Take the English predictions from the English-only model (`models/ZeroShot/1`) and the Portuguese predictions from the full model (`models/ZeroShot/0`).

In [32]:
dres_0 = util.load_df('models/ZeroShot/0/eval-dev/test_results_None.txt', delimiter="\t")
dres_1 = util.load_df('models/ZeroShot/1/eval-dev/test_results_None.txt', delimiter="\t")

In [45]:
dres_0['index'] = dres_0['index'].astype(int)
dres_1['index'] = dres_1['index'].astype(int)

In [54]:
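# Copy the English-model predictions over the matching rows of the
# full-model predictions. This relies on the row labels of both frames lining up.
for i, row in dres_1.iterrows():
    dres_0.loc[i, 'prediction'] = row['prediction']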

In [59]:
dres_0.to_csv('models/ZeroShot/1/eval-dev/test_results_Comb2.txt', index=False, sep="\t")
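The loop above overwrites predictions by row position. The same idea can be keyed explicitly on the `index` column, which makes the alignment assumption visible. The sketch below uses plain lists of dictionaries with made-up values; `overlay_predictions` is a hypothetical helper, and it assumes that the English rows carry the same `index` values in both prediction files.

```python
def overlay_predictions(base_rows, override_rows, key="index", value="prediction"):
    """Return a copy of base_rows in which any row whose key also appears
    in override_rows takes the override's value."""
    override = {row[key]: row[value] for row in override_rows}
    return [
        {**row, value: override.get(row[key], row[value])}
        for row in base_rows
    ]


# Made-up example: rows 0 and 1 stand for English, row 2 for Portuguese.
full_model = [
    {"index": 0, "prediction": 1},
    {"index": 1, "prediction": 0},
    {"index": 2, "prediction": 1},
]
english_model = [
    {"index": 0, "prediction": 0},
    {"index": 1, "prediction": 0},
]

merged = overlay_predictions(full_model, english_model)
print(merged)
```

Rows without a match in the override keep the full-model prediction, which is the behaviour the Comb2 file relies on for the Portuguese rows.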

In [60]:
params_z2 = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/1/eval-dev/test_results_Comb2.txt'                        ,
    }
params_z2[ 'setting' ] = 'zero_shot'

In [61]:
updated_data = insert_to_submission_file( **params_z2 )
write_csv( updated_data, 'outputs/zero_shot_dev_formated_comb2.csv' ) 

Wrote outputs/zero_shot_dev_formated_comb2.csv


In [62]:
results_comb2 = evaluate_submission( 'outputs/zero_shot_dev_formated_comb2.csv', gold_file )
#%reload_ext google.colab.data_table
pd.DataFrame(data=results_comb2[1:], columns=results_comb2[0])

| | Settings | Languages | F1 Score (Macro) |
|---|---|---|---|
| 0 | zero_shot | EN | 0.760457 |
| 1 | zero_shot | PT | 0.611721 |
| 2 | zero_shot | EN,PT | 0.721204 |
| 3 | one_shot | EN | (None, None, None) |
| 4 | one_shot | PT | (None, None, None) |
| 5 | one_shot | EN,PT | (None, None, None) |


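In the end, taking the PT predictions from the full model raised the PT score (0.590 to 0.612) but slightly lowered the combined EN,PT score (0.725 to 0.721), so it did not improve the overall result.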
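One way to see why the combined score need not track the per-language scores: macro F1 is computed per class and then averaged, so a score over pooled EN and PT predictions is not the mean of the two per-language scores. The sketch below is a plain-Python macro F1 on made-up labels. Whether the evaluation script pools predictions in exactly this way is an assumption, and `macro_f1` is a hypothetical helper, not part of the task code.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels, for illustration only.
en_gold, en_pred = [1, 1, 0, 0], [1, 1, 0, 1]
pt_gold, pt_pred = [1, 0, 0, 0], [0, 0, 0, 0]

en_score = macro_f1(en_gold, en_pred)
pt_score = macro_f1(pt_gold, pt_pred)
pooled_score = macro_f1(en_gold + pt_gold, en_pred + pt_pred)
mean_of_scores = (en_score + pt_score) / 2

print(en_score, pt_score, pooled_score, mean_of_scores)
```

On this toy data the pooled score differs from the mean of the two per-language scores, which is consistent with a per-language gain not carrying over to the combined row.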

## Generate Eval Data output

In [28]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/0' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/0/eval-eval/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train.csv \
    	--validation_file Data/ZeroShot/dev.csv \
      --test_file Data/ZeroShot/eval.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

12/20/2021 21:40:36 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/0/eval-eval/runs/Dec

### Use predictions to create the submission file (for eval data)

In [181]:
params = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/0/eval-eval/test_results_None.txt'                        ,
    }
params[ 'setting' ] = 'zero_shot'

In [182]:
updated_data = insert_to_submission_file( **params )

In [183]:
write_csv( updated_data, 'outputs/zero_shot_eval_formated.csv' ) 

Wrote outputs/zero_shot_eval_formated.csv


### Use language-specific models.

In [29]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/1' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/1/eval-eval/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train_en_2.csv \
    	--validation_file Data/ZeroShot/dev_en_2.csv \
      --test_file Data/ZeroShot/eval_en_2.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

12/20/2021 21:40:53 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/1/eval-eval/runs/Dec

In [30]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/2' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/2/eval-eval/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train_ot_2.csv \
    	--validation_file Data/ZeroShot/dev_ot_2.csv \
      --test_file Data/ZeroShot/eval_ot_2.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

12/20/2021 21:41:05 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/2/eval-eval/runs/Dec

100%|███████████████████████████████████████████| 35/35 [00:00<00:00, 44.11it/s]
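The prediction runs in this notebook differ only in the model directory, the output directory and the data files. A small helper can build the argument list once and reuse it. The sketch below only uses flags that already appear in the cells above; `build_predict_command` is a hypothetical helper name. The original cells pass `--evaluation_strategy` twice, and the sketch passes it once on the assumption that the repetition has no effect.

```python
import shlex

SCRIPT = "AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py"


def build_predict_command(model_dir, output_dir, train_file,
                          validation_file, test_file, seed=0):
    """Build the argument list for one prediction run."""
    return [
        "python", SCRIPT,
        "--model_name_or_path", model_dir,
        "--do_predict",
        "--max_seq_length", "128",
        "--per_device_train_batch_size", "32",
        "--learning_rate", "2e-5",
        "--num_train_epochs", "9",
        "--evaluation_strategy", "epoch",
        "--output_dir", output_dir,
        "--seed", str(seed),
        "--train_file", train_file,
        "--validation_file", validation_file,
        "--test_file", test_file,
        "--save_strategy", "epoch",
        "--load_best_model_at_end",
        "--metric_for_best_model", "f1",
        "--save_total_limit", "1",
    ]


# The two language-specific runs on the eval data, as in the cells above.
commands = [
    build_predict_command(
        model_dir=f"models/ZeroShot/{model_id}",
        output_dir=f"models/ZeroShot/{model_id}/eval-eval/",
        train_file=f"Data/ZeroShot/train_{suffix}_2.csv",
        validation_file=f"Data/ZeroShot/dev_{suffix}_2.csv",
        test_file=f"Data/ZeroShot/eval_{suffix}_2.csv",
    )
    for model_id, suffix in [(1, "en"), (2, "ot")]
]

for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` to launch the run from Python instead of a `!python` cell.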


In [194]:
!awk 'FNR==1 && NR!=1{next;}{print}' models/ZeroShot/1/eval-eval/test_results_None.txt models/ZeroShot/2/eval-eval/test_results_None.txt > models/ZeroShot/1/eval-eval/test_results_Comb2.txt


In [198]:
params_ez = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/1/eval-eval/test_results_Comb2.txt'                        ,
    }
params_ez[ 'setting' ] = 'zero_shot'

In [199]:
updated_data = insert_to_submission_file( **params_ez )

In [200]:
write_csv( updated_data, 'outputs/zero_shot_eval_formated_comb.csv' ) 

Wrote outputs/zero_shot_eval_formated_comb.csv


**NOTE**: You can submit this file, but it only has results for the zero-shot setting.

# Test data

In [35]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/0' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/0/eval-test/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train.csv \
    	--validation_file Data/ZeroShot/dev.csv \
      --test_file Data/ZeroShot/test.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

01/12/2022 14:28:57 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/0/eval-test/runs/Jan

In [39]:
tparams = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/TestData/test_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/TestData/test.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/0/eval-test/test_results_None.txt'                        ,
    }
tparams[ 'setting' ] = 'zero_shot'

In [40]:
updated_data = insert_to_submission_file( **tparams )

In [41]:
write_csv( updated_data, 'outputs/zero_shot_test_formated.csv' ) 

Wrote outputs/zero_shot_test_formated.csv


### Use language-specific models.

In [43]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/1' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/1/eval-test/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train_en_2.csv \
    	--validation_file Data/ZeroShot/dev_en_2.csv \
      --test_file Data/ZeroShot/test_en.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

01/12/2022 14:36:35 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/1/eval-test/runs/Jan

In [44]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/ZeroShot/2' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/ZeroShot/2/eval-test/ \
    	--seed 0 \
    	--train_file      Data/ZeroShot/train_ot_2.csv \
    	--validation_file Data/ZeroShot/dev_ot_2.csv \
      --test_file Data/ZeroShot/test_ot.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

01/12/2022 14:41:14 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_steps=None,
evaluation_strategy=IntervalStrategy.EPOCH,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=True,
group_by_length=False,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=<HUB_TOKEN>,
ignore_data_skip=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=2e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=-1,
log_level=-1,
log_level_replica=-1,
log_on_each_node=True,
logging_dir=models/ZeroShot/2/eval-test/runs/Jan

In [45]:
!awk 'FNR==1 && NR!=1{next;}{print}' models/ZeroShot/1/eval-test/test_results_None.txt models/ZeroShot/2/eval-test/test_results_None.txt > models/ZeroShot/1/eval-test/test_results_Comb2.txt


In [46]:
t2params = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/TestData/test_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/TestData/test.csv'                   ,
    'prediction_format_file' : 'models/ZeroShot/1/eval-test/test_results_Comb2.txt'                        ,
    }
t2params[ 'setting' ] = 'zero_shot'

In [47]:
updated_data = insert_to_submission_file( **t2params )

In [48]:
write_csv( updated_data, 'outputs/zero_shot_test_formated_comb.csv' ) 

Wrote outputs/zero_shot_test_formated_comb.csv


# One Shot Setting

## Train One shot

In [None]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'bert-base-multilingual-cased' \
    	--do_train \
    	--do_eval \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/OneShot/1/ \
    	--seed 1 \
    	--train_file      Data/OneShot/train.csv \
    	--validation_file Data/OneShot/dev.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

In [None]:
#from google.colab import drive
#drive.mount('/content/gdrive')

In [None]:
## Create save path
#!mkdir -p /content/gdrive/MyDrive/ColabData/SemEval2022Task2/TaskA/OneShot/1/
## Copy saved model.
#!cp -r /content/models/OneShot/1/* /content/gdrive/MyDrive/ColabData/SemEval2022Task2/TaskA/OneShot/1/

## Evaluation On Dev Data

In [None]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/OneShot/1' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/OneShot/1/eval-dev/ \
    	--seed 1 \
    	--train_file      Data/OneShot/train.csv \
    	--validation_file Data/OneShot/dev.csv \
      --test_file Data/OneShot/dev.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

### Use predictions to create the submission file (for dev data)

In [None]:
params = {
    'submission_format_file' : 'outputs/zero_shot_dev_formated.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev.csv'                   ,
    'prediction_format_file' : 'models/OneShot/1/eval-dev/test_results_None.txt'                        ,
    }
params[ 'setting' ] = 'one_shot'

In [None]:
updated_data = insert_to_submission_file( **params )
write_csv( updated_data, 'outputs/both_dev_formated.csv' )

### For the development data, we can run the evaluation script.

In [None]:
import sys
sys.path.append( 'SemEval_2022_Task2-idiomaticity/SubTaskA/' ) 
from SubTask1Evaluator import evaluate_submission


submission_file = 'outputs/both_dev_formated.csv'
gold_file       = 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/dev_gold.csv'

results = evaluate_submission( submission_file, gold_file )
#%reload_ext google.colab.data_table
import pandas as pd
df = pd.DataFrame(data=results[1:], columns=results[0])
df

## Generate Eval Data output

In [None]:
!python AStitchInLanguageModels/Dataset/Task2/Utils/run_glue_f1_macro.py \
    	--model_name_or_path 'models/OneShot/1' \
    	--do_predict \
    	--max_seq_length 128 \
    	--per_device_train_batch_size 32 \
    	--learning_rate 2e-5 \
    	--num_train_epochs 9 \
    	--evaluation_strategy "epoch" \
    	--output_dir models/OneShot/1/eval-eval/ \
    	--seed 1 \
    	--train_file      Data/OneShot/train.csv \
    	--validation_file Data/OneShot/dev.csv \
      --test_file Data/OneShot/eval.csv \
	    --evaluation_strategy "epoch" \
	    --save_strategy "epoch"  \
	    --load_best_model_at_end \
	    --metric_for_best_model "f1" \
	    --save_total_limit 1

### Use predictions to create the submission file (for eval data)

#### Create One Shot submission

In [None]:
params = {
    'submission_format_file' : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval_submission_format.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval.csv'                   ,
    'prediction_format_file' : 'models/OneShot/1/eval-eval/test_results_None.txt'                         ,
    }
params[ 'setting' ] = 'one_shot'


In [None]:
updated_data = insert_to_submission_file( **params )
write_csv( updated_data, 'outputs/one_shot_eval_formated.csv' )

#### Combine Zero Shot and One Shot submission files.

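Do this by passing the zero-shot output file as the submission format file, so that the one-shot predictions are added to a file that already contains the zero-shot ones.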

In [None]:
params = {
    'submission_format_file' : 'outputs/zero_shot_eval_formated.csv' ,
    'input_file'             : 'SemEval_2022_Task2-idiomaticity/SubTaskA/Data/eval.csv'                   ,
    'prediction_format_file' : 'models/OneShot/1/eval-eval/test_results_None.txt'                        ,
    }
params[ 'setting' ] = 'one_shot'


In [None]:
updated_data = insert_to_submission_file( **params )
write_csv( updated_data, 'outputs/task2_subtaska.csv' )
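Before submitting, it can be worth checking that the combined file really has a label for every row in both settings. The sketch below counts filled and empty labels per setting. The column names `Setting` and `Label`, and the idea that an unfilled label is an empty string, are assumptions about the submission format; `count_filled_labels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_filled_labels(rows, setting_column="Setting", label_column="Label"):
    """Count rows with and without a label, grouped by setting."""
    filled, missing = Counter(), Counter()
    for row in rows:
        has_label = bool((row.get(label_column) or "").strip())
        (filled if has_label else missing)[row[setting_column]] += 1
    return filled, missing


# Made-up file content, for illustration only (column names are assumed).
example_csv = io.StringIO(
    "ID,Language,Setting,Label\n"
    "a,EN,zero_shot,1\n"
    "b,PT,zero_shot,0\n"
    "c,EN,one_shot,\n"
    "d,PT,one_shot,1\n"
)
filled, missing = count_filled_labels(csv.DictReader(example_csv))
print("filled:", dict(filled))
print("missing:", dict(missing))
```

To run it on the real file, replace `example_csv` with `open('outputs/task2_subtaska.csv', newline='')` and adjust the column names if they differ.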

# Download Submission File

In [None]:
from google.colab import files
files.download('/content/outputs/task2_subtaska.csv') 
## Remember to put this in a folder called "submission".
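Outside Colab, the file can be staged into a `submission` folder with the standard library. This is a minimal sketch: `stage_submission` is a hypothetical helper, and it only creates the folder and copies the file into it.

```python
import shutil
from pathlib import Path


def stage_submission(csv_path, folder="submission"):
    """Copy csv_path into folder (created if needed) and return the new path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(csv_path, target_dir / Path(csv_path).name))


# Usage with the file written above (left commented out here):
# stage_submission("outputs/task2_subtaska.csv")
```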

# Discussion

In [None]:
df

Notice the significant jump in F1 scores with the introduction of just one positive and one negative example. 

Note that your position on the leaderboard will be based on the rows with index 2 and 5 (the combined results for both languages). The rest of the results are for information and ablation studies.
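The two leaderboard rows can be pulled out of the evaluator output directly. The sketch below assumes the list-of-lists layout that the cells above pass to `pd.DataFrame` (header row first, then one row per setting and language). The zero-shot values are copied from the dev table earlier in this notebook, the one-shot entries are placeholders, and `leaderboard_rows` is a hypothetical helper.

```python
def leaderboard_rows(results, indices=(2, 5)):
    """Return the rows used for ranking as dictionaries keyed by the header."""
    header, rows = results[0], results[1:]
    return [dict(zip(header, rows[i])) for i in indices]


# Same shape as the evaluator output; one-shot scores are placeholders.
example_results = [
    ["Settings", "Languages", "F1 Score (Macro)"],
    ["zero_shot", "EN", 0.760457],
    ["zero_shot", "PT", 0.590239],
    ["zero_shot", "EN,PT", 0.725061],
    ["one_shot", "EN", None],
    ["one_shot", "PT", None],
    ["one_shot", "EN,PT", None],
]

for row in leaderboard_rows(example_results):
    print(row)
```

With the DataFrame built in the evaluation cell, `df.loc[[2, 5]]` selects the same two rows.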



