<a href="https://colab.research.google.com/github/GabHoo/Challenging-SRL/blob/main/Test_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SETTING UP
Run the cells in this Setting Up section before running any of the tests in this notebook with the two suggested models.  
Change the value of `path_model` for both models to match the location of your own models. If no model is found at that path, it will be downloaded.

In [1]:
# INSTALL AND IMPORTS
"""
pip install allennlp
pip install allennlp-models
pip install -U spaCy
pip install checklist
"""
from allennlp.predictors import Predictor
import allennlp_models.tagging

import json
import os
from utils import *
import utils
import re


### Change the model here:

In [2]:
#LOADING MODEL
model="Bilstm"

if model=="Bert":
    model_name="structured-prediction-srl-bert.2020.12.15.tar.gz"
    path_model="models/"+model_name
elif model=="Bilstm":
    model_name="openie-model.2020.03.26.tar.gz"
    path_model="models/"+model_name
else:
    print("Model not found!")
    exit()

if os.path.exists(path_model):
    print("Model found!")
    predictor = Predictor.from_path(path_model)
else:
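    # Fall back to downloading the archive from the AllenNLP public models bucket
    predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/"+model_name)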

#TESTING IF THE MODEL IS LOADED CORRECTLY

pred=predictor.predict("SRL model was loaded successfully!")
if pred :
    print((" ").join(pred['words'])) 
    

Model found!

SRL model was loaded successfully !


# Useful functions


In [3]:
def append_to_results(results_path,your_result):
    """Add results to the score board.
    your_result needs to be a dictionary already.
    """
    if not isinstance(your_result, dict):
        print("your_result needs to be a dictionary!")
        return
    
    with open(results_path, 'r') as f:
        data = json.load(f)
    
    data.update(your_result)

    with open(results_path, 'w') as f:
        json.dump(data, f,indent=4)
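
As a rough illustration of how the score board file evolves, the sketch below reproduces the read/update/write cycle on a temporary file. `merge_into_json` is a hypothetical stand-in for `append_to_results`, and the file name and values are made up for the example. Note that `dict.update` overwrites an existing key, so each test needs its own key.

```python
import json
import os
import tempfile


def merge_into_json(results_path, new_entries):
    """Hypothetical stand-in for append_to_results: read, update, write back."""
    with open(results_path, "r") as f:
        data = json.load(f)
    data.update(new_entries)  # an existing key is silently overwritten
    with open(results_path, "w") as f:
        json.dump(data, f, indent=4)
    return data


# Illustrative file and values only, not real results.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "results_demo.json")
with open(demo_path, "w") as f:
    json.dump({"test": "failure_rate"}, f, indent=4)

merge_into_json(demo_path, {"contractions": 0.0})
merge_into_json(demo_path, {"contractions": 5.0})  # same key: replaces 0.0
final = merge_into_json(demo_path, {"slang_verbs": 50.0})
print(final)
```
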


# PREDICATE IDENTIFICATION

In [12]:
#This folder holds the data for the predicate identification task. Change the path accordingly if the files were moved.
path="./Data/Predicate_identification/"
results_path=f"./results_{model}_Predicate_Identification.json"
## we also initialize the dictionary that will contain the results
with open(results_path, 'w') as f:
    json.dump({"test":"failure_rate"},f,indent=4)

## VOCABULARY+POS

### Test for contractions

In [13]:
data=json.load(open(path+'contracted_predicates.json'))
failure_rate=evaluate_PI_contractions_INV(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"contractions":failure_rate})

Failure rate: 0.0. Total number of tests: 18
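
For orientation, predicate identification with these models comes down to reading the `verbs` list of the prediction dictionary. The sketch below uses a hand-written dictionary in the shape an AllenNLP SRL predictor returns (`words` plus one `verbs` entry per detected predicate). `detected_predicates` and `is_detected` are hypothetical helpers written for this illustration, not the functions from `utils`.

```python
def detected_predicates(prediction):
    """Return the surface form of every predicate the model found."""
    return [frame["verb"] for frame in prediction.get("verbs", [])]


def is_detected(prediction, target):
    """Case-insensitive check that `target` is among the detected predicates."""
    found = {verb.lower() for verb in detected_predicates(prediction)}
    return target.lower() in found


# Hand-written example in the shape of an SRL prediction (not real model output).
example = {
    "words": ["She", "did", "n't", "eat", "."],
    "verbs": [
        {"verb": "did", "description": "...",
         "tags": ["O", "B-V", "O", "O", "O"]},
        {"verb": "eat", "description": "...",
         "tags": ["B-ARG0", "O", "B-ARGM-NEG", "B-V", "O"]},
    ],
}

print(detected_predicates(example))
print(is_detected(example, "Eat"), is_detected(example, "gonna"))
```
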


### Test for irregular inflections

In [14]:
data=json.load(open(path+"inflected_predicates.json"))
failure_rate=evaluate_PI_inflections_MFT(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"inflected_sentences":failure_rate})

Failed for: Gary vext an innocent soul. did not detect vext

Failure rate: 2.857142857142857. Total number of tests: 35
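
The reported numbers appear to be percentages rather than fractions: one failure in 35 tests gives roughly 2.857, matching the rate printed above. A minimal sketch under that assumption (the helper `failure_rate` is hypothetical; the actual computation lives in `utils`):

```python
import math


def failure_rate(n_failed, n_total):
    """Percentage of failed test cases; 0.0 for an empty test set."""
    if n_total == 0:
        return 0.0
    return n_failed / n_total * 100


rate = failure_rate(1, 35)  # one failure out of 35 tests
print(f"Failure rate: {rate:.2f}%")
```

Rounding only at print time keeps the stored value exact while avoiding long floating-point tails in the log.
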


## ROBUSTNESS

In [15]:
data=json.load(open(path+"verb_typos_sentence.json"))
failure_rate=evaluate_PI_inflections_MFT(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"verb_typos":failure_rate})


Failed for: They aet a cool lot. did not detect aet

Failure rate: 1.8867924528301887. Total number of tests: 53


## AMBIGUITY

### Experiment with polysemic verbs [POLYSEMIC]

In [16]:
data=json.load(open(path+'polysem_verbs_sentences.json'))
failure_rate=evaluate_PI_Polysem_DIR(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"polysemic_verbs":failure_rate})



Failed for: I always turn left at the stop sign. ... [left] found as a verb 
Failure rate: 3.4482758620689653. Total number of tests: 29


### Experiment with -ing verb forms in different roles [GERUNDS]

In [17]:
data=json.load(open(path+"gerunds.json"))
failure_rate=find_roleset_MFT(data,predictor,verboose=True)
print(f"\nFailure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"gerunds":failure_rate})

[paitings] not detected from 'From paintings of his done during that early period', only ['done'] were found
[writing] not detected from 'The writing of the thesis took a whole year.', only ['took'] were found
[singning] not detected from 'The singing of the birds was a welcome sound in the morning.', only ['was'] were found
[reading] not detected from 'What is your reading on the situation?', only ['is'] were found
[cleaning] not detected from 'The cleaning of the house was a tedious chore.', only ['was'] were found
[washing] not detected from 'The washing of the dishes was a never-ending job.', only ['was', 'ending'] were found
[cooking] not detected from 'I have always enjoyed cooking, it is one of my favourite hobbies.', only ['have', 'enjoyed', 'is'] were found
[building] not detected from 'The building of the bridge was a remarkable feat of engineering.', only ['was'] were found
[cutting] not detected from 'The cutting of the cake was the highlight of the party.', only ['was'] we

## RARITY

### SLANG

In [18]:
data=json.load(open(path+"verbs_slang.json"))
failure_rate=evaluate_PI_inflections_MFT(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"slang_verbs":failure_rate})


Failed for: I'm gonna meet my friends at the mall. did not detect gonna

Failed for: I gotta finish my homework before I can go out. did not detect gotta

Failed for: Gimme a slice of pizza, please. did not detect gimme

Failed for: Lemme know if you need any help. did not detect lemme

Failed for: I'm tryna get in shape for summer. did not detect tryna

Failed for: Ima buy a new car next month. did not detect Ima

Failed for: I Needa take a break from work. did not detect Needa

Failed for: I hafta leave early today for a doctor's appointment. did not detect Hafta

Failed for: Whatcha doing this weekend? did not detect Whatcha

Failed for: C'mon, let's go to the park. did not detect C'mon

Failure rate: 83.33333333333334. Total number of tests: 12


### NEW WORDS

In [19]:
data=json.load(open(path+"new_verbs.json"))
failure_rate=evaluate_PI_inflections_MFT(predictor,data)
print(f"Failure rate: {failure_rate}. Total number of tests: {len(data)}")
append_to_results(results_path,{"new_verbs":failure_rate})


Failed for: I'm binge-watching the new TV series this weekend. did not detect binge-watching

Failure rate: 10.0. Total number of tests: 10


# ARGUMENT CLASSIFICATION

In [21]:
#This folder holds the data for the argument classification task. Change the path accordingly if the files were moved.
path="./Data/Argument_classification/"
results_path=f"./results_{model}_Argument_Classification.json"
## we also initialize the dictionary that will contain the results
with open(results_path, 'w') as f:
    json.dump({"test_name":"failure_rate"},f,indent=4)

## VOCABULARY+POS

### Entity

In [22]:
with open(path+"FirstNames_sents.json", 'r') as f:
    sentences = json.load(f)

labels=sentences["labels"]
sentences=sentences['data']
failure=eval_full_sent_BIOtags(sentences,labels,predictor,verbose=True)

print("\nRate of failure: ",failure,"Total number of example: ",len(sentences),"\n")

append_to_results(results_path,{"FirstNames":failure})



 FAILED FOR Sentence:  George went with Jim to the market
Predicted BIO tags:  ['B-ARG1', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

 FAILED FOR Sentence:  Adam went with Donald to the mall
Predicted BIO tags:  ['B-ARG1', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

 FAILED FOR Sentence:  Marie went with Francis to the mosque
Predicted BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-MNR', 'I-ARGM-MNR', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

 FAILED FOR Sentence:  Nicole went with Eleanor to the bakery
Predicted BIO tags:  ['B-ARG1', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']
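
To see at a glance where a prediction departs from the gold annotation, the two tag lists can be compared position by position. The sketch below does this for the first failure printed above; `tag_mismatches` is a hypothetical helper for illustration and is not part of `utils`.

```python
def tag_mismatches(words, predicted, gold):
    """Return (index, word, predicted_tag, gold_tag) for every differing position."""
    if not (len(words) == len(predicted) == len(gold)):
        raise ValueError("words, predicted and gold must have the same length")
    return [
        (i, word, p, g)
        for i, (word, p, g) in enumerate(zip(words, predicted, gold))
        if p != g
    ]


# Sentence and tags copied from the first failure in the output above.
words = "George went with Jim to the market".split()
predicted = ["B-ARG1", "B-V", "B-ARGM-COM", "I-ARGM-COM", "B-ARG4", "I-ARG4", "I-ARG4"]
gold = ["B-ARG0", "B-V", "B-ARGM-COM", "I-ARGM-COM", "B-ARG4", "I-ARG4", "I-ARG4"]

diff = tag_mismatches(words, predicted, gold)
print(diff)  # [(0, 'George', 'B-ARG1', 'B-ARG0')]
```

Here the only disagreement is on the subject, which the model labels ARG1 instead of ARG0.
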

 

### Pronouns

In [24]:
with open(path+f"Pronouns_sents.json", 'r') as f:
        sentences = json.load(f)

labels=sentences["labels"]
sentences=sentences['data']
failure=eval_full_sent_BIOtags(sentences,labels,predictor,verbose=True)

print(f"\nRate of failure: ",failure,"Total number of example: ",len(sentences),"\n")

append_to_results(results_path,{f"Pronouns":failure})



 FAILED FOR Sentence:  We went with them to the Capitol
Predicted BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-MNR', 'I-ARGM-MNR', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

 FAILED FOR Sentence:  We went with them to the war
Predicted BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-MNR', 'I-ARGM-MNR', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

 FAILED FOR Sentence:  She went with them to the movie
Predicted BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-MNR', 'I-ARGM-MNR', 'B-ARG4', 'I-ARG4', 'I-ARG4']
True BIO tags:  ['B-ARG0', 'B-V', 'B-ARGM-COM', 'I-ARGM-COM', 'B-ARG4', 'I-ARG4', 'I-ARG4']

Rate of failure:  3.0 Total number of example:  100 



## AMBIGUITY/TAXONOMY (PP-ATTACHMENT AMBIGUITY)

### PP-ATTACHMENT AMBIGUITY INV

In [27]:
with open(path+f"Inv_PPattachments.json", 'r') as f:
        sentences = json.load(f)

In [28]:
rate=eval_PP_INV(sentences,predictor,verbose=True)
print(f"\nRate of failure: ",rate,"Total number of example: ",len(sentences),"\n")
append_to_results(results_path,{f"PP_INV":rate})


Input sentences: I went to the resturant by the Hutson
Predicted labels for PP: ['I-ARG4', 'B-ARGM-MNR', 'I-ARGM-MNR'] but should have been ['I-ARG4', 'I-ARG4', 'I-ARG4']
Input sentences: I fixed the car with a red logo
Predicted labels for PP: ['B-ARG2', 'I-ARG2', 'I-ARG2', 'I-ARG2'] but should have been ['I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1']
Input sentences:  I bought a computer with bitcoins
Predicted labels for PP: ['I-ARG1', 'I-ARG1'] but should have been ['B-ARGM-MNR', 'I-ARGM-MNR']
Input sentences: Lukas helps a man with a disability
Predicted labels for PP: ['B-ARG1', 'I-ARG1', 'I-ARG1'] but should have been ['I-ARG2', 'I-ARG2', 'I-ARG2']
Input sentences: I drink whiskey with soda
Predicted labels for PP: ['B-ARGM-MNR', 'I-ARGM-MNR'] but should have been ['I-ARG1', 'I-ARG1']
Input sentences: They Buy a house in cash
Predicted labels for PP: ['I-ARG1', 'I-ARG1'] but should have been ['B-ARGM-MNR', 'I-ARGM-MNR']

Rate of failure:  100.0 Total number of example:  6 
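
One way to read the tag sequences above: if the first token of the prepositional phrase carries an `I-` tag, the PP continues the preceding argument (noun attachment); if it carries a `B-` tag, the PP forms its own argument of the verb (verb attachment). The sketch below encodes that reading. It is an interpretation of the printed labels, not the logic of `eval_PP_INV`, and `attachment_from_tags` is a hypothetical helper.

```python
def attachment_from_tags(pp_tags):
    """Guess the attachment site of a PP from the BIO tags of its tokens."""
    if not pp_tags or pp_tags[0] == "O":
        return "none"
    # "I-" on the first PP token means the PP continues the previous span.
    return "noun" if pp_tags[0].startswith("I-") else "verb"


# Tag sequences copied from the "I bought a computer with bitcoins" failure above.
expected = ["B-ARGM-MNR", "I-ARGM-MNR"]  # "with bitcoins" modifies the buying
predicted = ["I-ARG1", "I-ARG1"]          # the model merged it into "a computer"

print(attachment_from_tags(expected), attachment_from_tags(predicted))
```
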



#### Big PP test

In [32]:
with open(path+"PP_proceesed_test.json","r") as f:
    di=json.load(f)


rate,total=eval_PP_MFT(di,predictor,verbose=False)
print(f"\nRate of failure: ",rate,"Total number of example: ",total,"\n")
append_to_results(results_path,{f"PP_MFT":rate})


Rate of failure:  19.256550883607556 Total number of example:  1641 



## SPAN IDENTIFICATION

### LONG NER

In [29]:

with open(path+"NER_sentences.json","r") as f:
    di=json.load(f)

golden=di['labels']
sentences=di['data']

rate=eval_full_sent_BIOtags(sentences,golden,predictor,verbose=False)
print(f"\nFailure rate: {rate}. Total number of tests: {len(sentences)}")
append_to_results(results_path,{f"NER_sents":rate})



Failure rate: 57.99999999999999. Total number of tests: 100


### LONG SPAN ADJECTIVES

In [30]:
with open(path+"longspan_sents.json","r") as f:
    di=json.load(f)
sents=di['data']
sents=sents[:30]
start,end=di["indexes"]
rate=eval_spanDetection(sents,start,end,predictor,verbose=False)
print(f"\nFailure rate: {rate}. Total number of tests: {len(sents)}")
append_to_results(results_path,{f"longspan":rate})

{'verb': 'is', 'description': 'The Taoist Hungarian asexual friend of mine [V: is] dying too', 'tags': ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-V', 'O', 'O']}
['O', 'O', 'O', 'O', 'O', 'O', 'O']
{'verb': 'dying', 'description': '[ARG1: The Taoist Hungarian asexual friend of mine] is [V: dying] [ARGM-ADV: too]', 'tags': ['B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'O', 'B-V', 'B-ARGM-ADV']}
['ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1']
{'verb': 'is', 'description': '[ARG1: The Christian Kenyan non - binary friend of mine] [V: is] [ARG2: dead :(]', 'tags': ['B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'B-V', 'B-ARG2', 'I-ARG2']}
['ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1', 'ARG1']
{'verb': 'is', 'description': '[ARG1: The Hindu Kittitian or Nevisian bisexual friend of mine] [V: is] [ARG2: … more]', 'tags': ['B-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'I-ARG1', 'B-V', 'B
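
The printed lists such as `['ARG1', 'ARG1', ...]` look like the tags of the long subject span with their `B-`/`I-` prefixes removed. A minimal sketch of such a check follows, assuming `start` is inclusive and `end` is exclusive (the notebook does not state this); `span_roles` and `span_is_single_argument` are hypothetical names, not the implementation of `eval_spanDetection`.

```python
def span_roles(tags, start, end):
    """Role labels (BIO prefix removed) for tokens start..end-1."""
    return [tag.split("-", 1)[1] if "-" in tag else tag for tag in tags[start:end]]


def span_is_single_argument(tags, start, end):
    """True if every token in the span carries the same non-O role."""
    roles = span_roles(tags, start, end)
    return bool(roles) and roles[0] != "O" and len(set(roles)) == 1


# Tags copied from the two frames of the first sentence in the output above.
tags_is = ["O"] * 7 + ["B-V", "O", "O"]
tags_dying = ["B-ARG1"] + ["I-ARG1"] * 6 + ["O", "B-V", "B-ARGM-ADV"]

print(span_roles(tags_dying, 0, 7))
print(span_is_single_argument(tags_is, 0, 7), span_is_single_argument(tags_dying, 0, 7))
```
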

## ROBUSTNESS

### Typos

In [25]:
for i in range(1,5):#because we have 4 such files

    with open(path+f"sents_{i}_typos.json", 'r') as f:
        sentences = json.load(f)

    labels=sentences.pop("labels")
    sentences=list(sentences.values())
    rate_typos_n=eval_full_sent_BIOtags(sentences,labels,predictor,verbose=False)

    print(f"Rate of failure with {i} typos per sentence: ",rate_typos_n)

    append_to_results(results_path,{f"sents_{i}_typos":rate_typos_n})


Rate of failure with 1 typos per sentence:  16.0
Rate of failure with 2 typos per sentence:  38.0
Rate of failure with 3 typos per sentence:  47.0
Rate of failure with 4 typos per sentence:  52.0
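
The notebook does not say how the `sents_{i}_typos.json` files were generated. As one plausible way to build similar inputs, the sketch below swaps neighbouring letters a fixed number of times per sentence. `swap_typo` is a hypothetical helper, and swapping adjacent characters is an assumption about the typo model.

```python
import random


def swap_typo(sentence, n_typos=1, seed=0):
    """Introduce up to n_typos adjacent-letter swaps inside the sentence."""
    rng = random.Random(seed)
    chars = list(sentence)
    # Only swap two neighbouring letters so that token boundaries stay intact.
    candidates = [
        i for i in range(len(chars) - 1)
        if chars[i].isalpha() and chars[i + 1].isalpha()
    ]
    for i in rng.sample(candidates, min(n_typos, len(candidates))):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


original = "They ate a cool lot."
noisy = swap_typo(original, n_typos=2, seed=1)
print(noisy)
```

Because only letters inside a word are swapped, the number of tokens is unchanged, so the gold BIO labels of the clean sentence still line up with the perturbed one. A swap of two identical letters leaves the text unchanged, so the effective number of typos can be lower than requested.
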


## PARAPHRASING

### Passive and active transformation

In [31]:
with open(path+"activepassive_sentences.json", 'r') as f:
    sentences = json.load(f)


labelsActive=sentences.pop("labelsActive")
labelsPassive=sentences.pop("labelsPassive")

rate=eval_full_sent_BIOtags_INV(sentences,labelsActive,labelsPassive,predictor,verbose=False)

print(f"\nRate of failure: ",rate,"% Total number of example: ",len(sentences),"\n")

append_to_results(results_path,{f"ActivePassive":rate})


Error
[ARG0: The waiter] [V: served] [ARG1: the meal] . != ['B-ARG0', 'I-ARG0', 'B-V', 'B-ARG1', 'I-ARG1', 'O']
[ARG2: The meal] was [V: served] [ARG0: by the waiter] . != ['B-ARG1', 'I-ARG1', 'O', 'B-V', 'B-ARG0', 'I-ARG0', 'I-ARG0', 'O']




Rate of failure:  5.0 % Total number of example:  20 
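
The invariance being tested is that the same participants keep the same roles in both voices. The sketch below decodes BIO tags into a role-to-phrase mapping and compares the set of roles for the failing pair printed above. `bio_to_spans` is a hypothetical helper, not the code behind `eval_full_sent_BIOtags_INV`, and the predicted passive tags are reconstructed from the bracketed description in the output.

```python
def bio_to_spans(words, tags):
    """Group tokens into {role: phrase} using their BIO tags.

    If a role occurs in two separate spans, the later span replaces the earlier one.
    """
    spans = {}
    current_role = None
    for word, tag in zip(words, tags):
        if tag == "O":
            current_role = None
            continue
        prefix, role = tag.split("-", 1)
        if prefix == "B" or role != current_role:
            spans[role] = word  # start a new span
        else:
            spans[role] += " " + word  # continue the current span
        current_role = role
    return spans


active_words = "The waiter served the meal .".split()
passive_words = "The meal was served by the waiter .".split()

gold_active = ["B-ARG0", "I-ARG0", "B-V", "B-ARG1", "I-ARG1", "O"]
gold_passive = ["B-ARG1", "I-ARG1", "O", "B-V", "B-ARG0", "I-ARG0", "I-ARG0", "O"]
# Reconstructed from "[ARG2: The meal] was [V: served] [ARG0: by the waiter] ."
pred_passive = ["B-ARG2", "I-ARG2", "O", "B-V", "B-ARG0", "I-ARG0", "I-ARG0", "O"]

active = bio_to_spans(active_words, gold_active)
passive_gold = bio_to_spans(passive_words, gold_passive)
passive_pred = bio_to_spans(passive_words, pred_passive)

print(active)
print(passive_pred)
print(set(active) == set(passive_gold), set(active) == set(passive_pred))
```
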

