# Voting Systems

In this notebook we compare different voting systems for producing the final sense prediction. Our models were trained on the blind test data, which is not biased by the training process, and we use the dev-test data to test them.

## Load Test Data and Models

In [234]:
from read_write_files import read_json,save_json,get_parser_paths
from helper_functions import get_sense_lists,align_parsers_to_gold,plot_confusion_matrix
import numpy as np
import conll16st.scorer as scorer
import conll16st.partial_scorer as partial_scorer
import random
from collections import Counter
import copy

In [235]:
sense_model_path = "data/project_files/test/sense_model_V2.json"
total_alignment_path = "data/project_files/blind/total_alignment.json"
test_data_path = "data/gold_standard/blind/relations.json"
example_parser_path = "data/submissions/sense_only/blind/oslopots/output/output.json"

In [236]:
total_alignments = read_json(total_alignment_path)
sense_model = read_json(sense_model_path)
test_data = read_json(test_data_path)

In [237]:
example_parser = read_json(example_parser_path)

In [238]:
gold_senses,parser_senses,parser_names = get_sense_lists(total_alignments)

## Voting Systems

In [84]:
def voting(parser_preds,parser_names,model,voting_algorithm):
    # Main function: applies a voting algorithm to the predictions of all parsers, relation by relation
    
    #param parser_preds     list of predicted senses by all parsers (one list per parser)
    #param parser_names     parser names (not used inside this function)
    #param model      model with reliability values for each sense in each parser
    #param voting_algorithm     algorithm that takes all predictions for one relation and returns the index of the selected parser (-1 if no sense is selected)
    
    #return list with the index of the selected parser for each relation
    
    new_senses = []
    
    # transpose the input: one tuple per relation, holding the prediction of every parser
    for predictions in zip(*parser_preds):
        new_senses.append(voting_algorithm(predictions,model))
        
    return new_senses
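
To make the data flow of `voting` concrete, the following sketch shows how one list per parser is transposed into one tuple per relation before a voting algorithm sees it. The toy predictions and the `first_non_none` algorithm are made up for illustration and are not part of the notebook.

```python
# Toy predictions (hypothetical): one list per parser, one entry per relation.
parser_preds = [
    ["EntRel", "None", "Expansion.Conjunction"],   # parser 0
    ["EntRel", "Comparison.Contrast", "None"],     # parser 1
]

def first_non_none(predictions, model):
    # Hypothetical voting algorithm: index of the first parser that predicted a sense.
    for ind, pred in enumerate(predictions):
        if pred != "None":
            return ind
    return -1  # same "no selection" marker as the real algorithms

# zip(*parser_preds) yields one tuple per relation with every parser's prediction.
per_relation = list(zip(*parser_preds))
selected = [first_non_none(preds, None) for preds in per_relation]

print(per_relation[1])
print(selected)
```

Each entry of `selected` is a parser index, which is the same kind of value the voting algorithms in this notebook return.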

In [183]:
def best_wins_voting(predictions,model):
    #Voting Algorithm: Select the parser with the highest reliability (F1) for its predicted sense
    
    probs = []
    for ind,pred in enumerate(predictions):
        sense_dic = model[ind]["sense_pred"]
        # "None" predictions and senses unknown to the model get a reliability of 0
        if (pred == "None") or (pred not in sense_dic):
            probs += [0]
        else:
            probs += [sense_dic[pred]["f1"]]
            
    # -1 signals that no parser made a usable prediction
    if sum(probs) == 0:
        return -1
    
    return np.argmax(probs)

In [208]:
def max_agreement(predictions,model):
    #Voting Algorithm: Select the sense that is predicted by the majority of all parsers
    
    sense_counter = Counter(predictions)
    best_sense = sense_counter.most_common(1)[0][0]

    if best_sense == "None":
        return -1
    
    return predictions.index(best_sense) 

In [222]:
def prob_maximation(predictions,model):
    #Voting Algorithm: Select the sense with the highest reliability sum over all parsers
    # (the reliabilities of all parsers that predict the same sense are summed up)
    # e.g. Sense1 : parser1_reliability = 0.5, parser2_reliability = 0.3 .... sum = 0.8
    #      Sense2 : parser3_reliability = 0.9 .... sum = 0.9
    # ---> select Sense2
    
    sense_probs = {pred:[] for pred in set(predictions)}
    
    for pred,tmp_model in zip(predictions,model):
        if pred != "None":
            # senses unknown to a parser's model contribute 0, as in best_wins_voting
            sense_probs[pred] += [tmp_model["sense_pred"].get(pred,{}).get("f1",0)]
    
    maxi = 0
    best_sense = "None"
    for sense in sense_probs:
        sense_probs[sense] = sum(sense_probs[sense])
        if sense_probs[sense] > maxi:
            best_sense = sense
            maxi = sense_probs[sense]
    
    # -1 signals that no sense received a positive score
    if best_sense == "None":
        return -1
    else:
        return predictions.index(best_sense)

In [218]:
def three_best_agreement(predictions,model):
    #Voting Algorithm: Select the sense, which is predicted by the majority of the three best parsers
    
    weighting = [tmp_model["weight"] for tmp_model in model]
    
    best_model_indexes = []
    for i in range(3):
        best_model_indexes += [np.argmax(weighting)]
        weighting[np.argmax(weighting)] = 0
    
    new_predictions = [pred for ind,pred in enumerate(predictions) if ind in best_model_indexes]
    
    sense_counter = Counter(new_predictions)
    best_sense = sense_counter.most_common(1)[0][0]

    if best_sense == "None":
        return -1
    
    return predictions.index(best_sense)

In [189]:
best_wins_parsers = voting(parser_senses,parser_names,sense_model,best_wins_voting)

In [190]:
max_agreement_parsers = voting(parser_senses,parser_names,sense_model,max_agreement)

In [225]:
prob_maximation_parsers = voting(parser_senses,parser_names,sense_model,prob_maximation)

In [219]:
three_best_agreement_parsers = voting(parser_senses,parser_names,sense_model,three_best_agreement)

## Exchange new Attributes in Relation File

This step is only used for sense evaluation: we take the argument spans from the gold file so that there is a clear mapping between the gold relations and the predictions, and only the sense is exchanged.

In [215]:
def exchange_sense_values(parser_rel,alignment_list,best_relation_indexes):
    # Exchanges the sense values of a given relation list, so that only the senses differ (for evaluation)
    
    #param parser_rel     relations for exchanging
    #param alignment_list    list of all aligned relations with the gold standard
    #param best_relation_indexes    which parser prediction should be selected
    
    #return new parser relations with the exchanged sense
    
    relation_senses = {}
    
    for best_parser,alignments in zip(best_relation_indexes,alignment_list):
        
        if best_parser != -1:
            best_sense = alignments["parsers"][best_parser]["Sense"]
            
            relation_senses[alignments["gold"]["ID"]] = best_sense
    
    new_parser_rel = copy.deepcopy(parser_rel)
    for rel in new_parser_rel:
        rel_id = rel["ID"]
        if rel_id in relation_senses:
            rel["Sense"] = relation_senses[rel_id] 

    return new_parser_rel

In [170]:
best_wins_relations = exchange_sense_values(example_parser,total_alignments,best_wins_parsers)

In [161]:
max_agreement_relations = exchange_sense_values(example_parser,total_alignments,max_agreement_parsers)

In [226]:
prob_maximation_relations = exchange_sense_values(example_parser,total_alignments,prob_maximation_parsers)

In [220]:
three_best_agreement_relations = exchange_sense_values(example_parser,total_alignments,three_best_agreement_parsers)

# Evaluation (Senses)

## Comparison to Oslopots Scorer

In [94]:
result = scorer.evaluate(test_data,example_parser)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5485	recall 0.5476	F1 0.5480
Comparison.Concession             precision 1.0000	recall 0.0660	F1 0.1239
Comparison.Contrast               precision 0.2160	recall 0.4909	F1 0.3000
Contingency.Cause.Reason          precision 0.4267	recall 0.4384	F1 0.4324
Contingency.Cause.Result          precision 0.6000	recall 0.3000	F1 0.4000
Contingency.Condition             precision 0.8667	recall 1.0000	F1 0.9286
EntRel                            precision 0.4306	recall 0.7600	F1 0.5497
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6704	recall 0.7368	F1 0.7021
Ex

## "Best Wins"

This algorithm selects the parser with the highest reliability for its own prediction (the F1 score of that parser for the sense it predicts).
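
As a minimal sketch of the "best wins" idea, the example below picks the parser whose F1 for its own predicted sense is highest. The model layout (`"sense_pred"` mapping a sense to a dict with an `"f1"` entry) follows the notebook; the F1 values and the helper name `best_wins` are invented for illustration.

```python
def best_wins(predictions, model):
    # Reliability of each parser for the sense it predicted (0 if "None" or unknown).
    scores = []
    for pred, parser_model in zip(predictions, model):
        stats = parser_model["sense_pred"].get(pred)
        scores.append(stats["f1"] if pred != "None" and stats else 0)
    if sum(scores) == 0:
        return -1  # no parser made a usable prediction
    # Index of the first parser with the highest score.
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical reliability model for two parsers.
toy_model = [
    {"sense_pred": {"EntRel": {"f1": 0.55}, "Comparison.Contrast": {"f1": 0.30}}},
    {"sense_pred": {"EntRel": {"f1": 0.40}, "Comparison.Contrast": {"f1": 0.70}}},
]

winner = best_wins(("EntRel", "Comparison.Contrast"), toy_model)
no_vote = best_wins(("None", "None"), toy_model)
print(winner, no_vote)
```

Here parser 1 wins because its F1 for `Comparison.Contrast` (0.70) exceeds parser 0's F1 for `EntRel` (0.55), even though the two parsers disagree.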

In [171]:
result = scorer.evaluate(test_data,best_wins_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.4334	recall 0.4334	F1 0.4334
Comparison.Concession             precision 1.0000	recall 0.0000	F1 0.0000
Comparison.Contrast               precision 0.1360	recall 0.5636	F1 0.2191
Contingency.Cause.Reason          precision 0.4688	recall 0.2027	F1 0.2830
Contingency.Cause.Result          precision 0.5000	recall 0.0204	F1 0.0392
Contingency.Condition             precision 0.7879	recall 1.0000	F1 0.8814
EntRel                            precision 0.4151	recall 0.2200	F1 0.2876
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.4595	recall 0.8369	F1 0.5932
Ex

## "Max Agreement"

This algorithm takes the prediction on which most of the parsers agree.
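
The majority vote can be sketched with `collections.Counter`, as in the notebook's `max_agreement` function. The predictions below are toy values chosen so that there is a clear majority.

```python
from collections import Counter

# Hypothetical predictions of four parsers for a single relation.
predictions = ("EntRel", "Expansion.Conjunction", "EntRel", "None")

# most_common(1) returns a list with the single most frequent (sense, count) pair.
sense, votes = Counter(predictions).most_common(1)[0]

# Map the winning sense back to the first parser that predicted it; -1 means "no sense".
chosen_parser = predictions.index(sense) if sense != "None" else -1

print(sense, votes, chosen_parser)
```

When two senses receive the same number of votes, the winner depends on how `Counter` orders equal counts, so ties are not resolved by parser reliability in this scheme.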

In [133]:
result = scorer.evaluate(test_data,max_agreement_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5559	recall 0.5550	F1 0.5555
Comparison.Concession             precision 1.0000	recall 0.0660	F1 0.1239
Comparison.Contrast               precision 0.2500	recall 0.4909	F1 0.3313
Contingency.Cause.Reason          precision 0.4051	recall 0.4384	F1 0.4211
Contingency.Cause.Result          precision 0.5926	recall 0.3200	F1 0.4156
Contingency.Condition             precision 0.9286	recall 1.0000	F1 0.9630
EntRel                            precision 0.4262	recall 0.7800	F1 0.5512
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6839	recall 0.7368	F1 0.7094
Ex

## "Prob Maximation"

This algorithm sums the reliabilities of all parsers that predict the same sense and takes the sense with the highest total.
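
The summation can be traced with the numbers from the comment in `prob_maximation`: two parsers vote for `Sense1` with reliabilities 0.5 and 0.3, and one parser votes for `Sense2` with reliability 0.9. The sketch below is a simplified stand-alone version that works on (sense, reliability) pairs rather than on the notebook's model structure.

```python
from collections import defaultdict

# (predicted sense, reliability of the predicting parser for that sense)
votes = [("Sense1", 0.5), ("Sense1", 0.3), ("Sense2", 0.9)]

# Accumulate the reliabilities per sense.
totals = defaultdict(float)
for sense, reliability in votes:
    totals[sense] += reliability

# The sense with the largest accumulated reliability wins.
best_sense = max(totals, key=totals.get)
print(best_sense)
```

`Sense2` is selected because its single vote (0.9) outweighs the combined 0.8 of the two `Sense1` votes, so this scheme can overrule a plain majority.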

In [227]:
result = scorer.evaluate(test_data,prob_maximation_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5360	recall 0.5352	F1 0.5356
Comparison.Concession             precision 1.0000	recall 0.0660	F1 0.1239
Comparison.Contrast               precision 0.2347	recall 0.4182	F1 0.3007
Contingency.Cause.Reason          precision 0.4706	recall 0.4384	F1 0.4539
Contingency.Cause.Result          precision 0.5455	recall 0.2449	F1 0.3380
Contingency.Condition             precision 0.9286	recall 1.0000	F1 0.9630
EntRel                            precision 0.3824	recall 0.7150	F1 0.4983
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6512	recall 0.7377	F1 0.6918
Ex

## "Three Best Agreement"

This algorithm is similar to the max agreement algorithm, except that it only takes into account the predictions of the three best parsers.
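
A minimal sketch of restricting the vote to the three highest-weighted parsers is shown below. The weights and predictions are hypothetical; in the notebook the weights come from the `"weight"` entry of each parser's model.

```python
from collections import Counter

# Hypothetical per-parser weights and predictions for one relation.
weights = [0.2, 0.9, 0.5, 0.7]
predictions = ("EntRel", "Comparison.Contrast", "EntRel", "Comparison.Contrast")

# Indexes of the three parsers with the largest weights.
top_three = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:3]

# Majority vote among those three parsers only.
top_predictions = [predictions[i] for i in sorted(top_three)]
sense = Counter(top_predictions).most_common(1)[0][0]
chosen_parser = predictions.index(sense) if sense != "None" else -1

print(sorted(top_three), sense, chosen_parser)
```

Across all four parsers the vote would be tied 2:2, but dropping the lowest-weighted parser leaves a 2:1 majority for `Comparison.Contrast`.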

In [221]:
result = scorer.evaluate(test_data,three_best_agreement_relations)

Explicit connectives         : Precision 1.0000 Recall 0.9874 F1 0.9937
Arg 1 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg 2 extractor              : Precision 1.0000 Recall 1.0000 F1 1.0000
Arg1 Arg2 extractor combined : Precision 1.0000 Recall 1.0000 F1 1.0000
Sense classification--------------
*Micro-Average                    precision 0.5423	recall 0.5409	F1 0.5416
Comparison.Concession             precision 0.8571	recall 0.0566	F1 0.1062
Comparison.Contrast               precision 0.2090	recall 0.5091	F1 0.2963
Contingency.Cause.Reason          precision 0.4756	recall 0.5342	F1 0.5032
Contingency.Cause.Result          precision 0.4571	recall 0.3200	F1 0.3765
Contingency.Condition             precision 0.8966	recall 1.0000	F1 0.9455
EntRel                            precision 0.4238	recall 0.6400	F1 0.5100
Expansion.Alternative             precision 1.0000	recall 0.3333	F1 0.5000
Expansion.Conjunction             precision 0.6817	recall 0.7469	F1 0.7128
Ex
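
For a side-by-side view, the micro-averaged F1 scores printed in the scorer outputs above can be collected and ranked. The values are copied from those outputs; the dictionary and the ranking code are only a convenience sketch and not part of the original evaluation.

```python
# Micro-averaged sense F1 as reported in the scorer outputs above.
micro_f1 = {
    "Oslopots": 0.5480,
    "Best Wins": 0.4334,
    "Max Agreement": 0.5555,
    "Prob Maximation": 0.5356,
    "Three Best Agreement": 0.5416,
}

reference = micro_f1["Oslopots"]

# Sort from best to worst and show the difference to the Oslopots output.
ranking = sorted(micro_f1.items(), key=lambda item: item[1], reverse=True)
for name, f1 in ranking:
    print("{:<22} F1 {:.4f}  ({:+.4f} vs. Oslopots)".format(name, f1, f1 - reference))
```

On these numbers only "Max Agreement" scores above the Oslopots output, and "Best Wins" falls clearly below all other systems.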