# Voting Systems

In this notebook we compare different voting systems to get the final prediction. Our models were trained on the blind test data, that is not biased by the training process and use the dev-test data to test our model.

## Load Test Data and Models

In [1]:
from read_write_files import read_json,save_json,get_parser_paths
from helper_functions import get_sense_lists,align_parsers_to_gold,plot_confusion_matrix
import numpy as np
import conll16st.scorer as scorer
import conll16st.partial_scorer as partial_scorer
import random
from collections import Counter

In [2]:
sense_model_path = "data/project_files/test/sense_model.json"
total_alignment_path = "data/project_files/blind/total_alignment.json"
test_data_path = "data/gold_standard/blind/relations.json"
example_parser_path = "data/submissions/sense_only/blind/oslopots/output/output.json"

In [3]:
total_alignments = read_json(total_alignment_path)
sense_model = read_json(sense_model_path)
test_data = read_json(test_data_path)

In [4]:
example_parser = read_json(example_parser_path)

In [5]:
gold_senses,parser_senses,parser_names = get_sense_lists(total_alignments)

## Voting Systems

In [8]:
def voting(parser_preds,parser_names,model,voting_algorithm):
    new_senses = []
    
    parser_pred_zip = zip(*parser_preds)
    for predictions in parser_pred_zip:
        result = voting_algorithm(predictions,model)
        new_senses += [result]
        
    return new_senses

In [9]:
def best_wins_voting(predictions,model):
    probs = []
    sense_predictions = []
    for ind,pred in enumerate(predictions):
        sense_predictions += [pred]
        sense_dic = model[ind]["sense_pred"]
        if (pred == "None") or not (sense_dic.has_key(pred)):
            probs += [0]
        else:
            probs += [sense_dic[pred]["f1"]]
            
    result = np.argmax(probs)
    
    if sum(probs) == 0:
        result = -1
    
    return result

In [10]:
def max_agreement(predictions,model):
    sense_counter = Counter(predictions)
    best_sense = sense_counter.most_common(1)[0][0]
    
    if best_sense == "None":
        return -1
    
    return predictions.index(best_sense) 

In [11]:
best_wins_parsers = voting(parser_senses,parser_names,sense_model,best_wins_voting)

In [12]:
max_agreement_parsers = voting(parser_senses,parser_names,sense_model,max_agreement)

## Exchange new Attributes in Relation File

Only for sense evaluation, because we take the arg span from the gold file to have a clear mapping between the gold and the prediction (only sense is exchanged)

In [13]:
def exchange_sense_values(parser_rel,alignment_list,best_relation_indexes):
    relation_senses = {}
    
    for best_parser,alignments in zip(*[best_relation_indexes,alignment_list]):
        if best_parser != -1:
            best_sense = alignments["parsers"][best_parser]["Sense"]
            
            relation_senses[alignments["gold"]["ID"]] = best_sense
            
            #best_sense = best_parser_result["Sense"][0]
            #new_rel = alignments["gold"].copy()
            #new_rel["Sense"] = best_sense
    
    
    for rel in parser_rel:
        rel_id = rel["ID"]
        if rel_id in relation_senses:
            rel["Sense"] = relation_senses[rel_id] 
        
    return parser_rel

In [14]:
best_wins_relations = exchange_sense_values(example_parser,total_alignments,best_wins_parsers)

In [15]:
max_agreement_relations = exchange_sense_values(example_parser,total_alignments,max_agreement_parsers)

# Evaluation (Senses)

# Comparison to Oslopots Scorer

In [17]:
result = scorer.evaluate(test_data,example_parser)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5485	recall 0.5476	F1 0.5480
Comparison.Concession             precision 1.0000	recall 0.0660	F1 0.1239
Comparison.Contrast               precision 0.2160	recall 0.4909	F1 0.3000
Contingency.Cause.Reason          precision 0.4267	recall 0.4384	F1 0.4324
Contingency.Cause.Result          precision 0.6000	recall 0.3000	F1 0.4000
Contingency.Condition             precision 0.8667	recall 1.0000	F1 0.9286
EntRel                            precision 0.4306	recall 0.7600	F1 0.5497
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6704	recall 0.7368	F1 0.7021
Ex

## "Best Wins"

This algorithm focus on the highest reliability for its prediction (F1 Score for its predicted sense).

In [40]:
result = scorer.evaluate(test_data,best_wins_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.4334	recall 0.4334	F1 0.4334
Comparison.Concession             precision 1.0000	recall 0.0000	F1 0.0000
Comparison.Contrast               precision 0.1360	recall 0.5636	F1 0.2191
Contingency.Cause.Reason          precision 0.4688	recall 0.2027	F1 0.2830
Contingency.Cause.Result          precision 0.5000	recall 0.0204	F1 0.0392
Contingency.Condition             precision 0.7879	recall 1.0000	F1 0.8814
EntRel                            precision 0.4151	recall 0.2200	F1 0.2876
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.4595	recall 0.8369	F1 0.5932
Ex

## "Max Agreement"

This algorithm takes the prediction where most of the parsers agree.

In [44]:
result = scorer.evaluate(test_data,max_agreement_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5485	recall 0.5476	F1 0.5480
Comparison.Concession             precision 1.0000	recall 0.0660	F1 0.1239
Comparison.Contrast               precision 0.2160	recall 0.4909	F1 0.3000
Contingency.Cause.Reason          precision 0.4267	recall 0.4384	F1 0.4324
Contingency.Cause.Result          precision 0.6000	recall 0.3000	F1 0.4000
Contingency.Condition             precision 0.8667	recall 1.0000	F1 0.9286
EntRel                            precision 0.4306	recall 0.7600	F1 0.5497
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6704	recall 0.7368	F1 0.7021
Ex

#