# Parser Gold Mapping

This notebook maps each gold-standard relation to the corresponding parser predictions, so that the different predictions for the same relation can be compared later.

In [20]:
import json
import numpy as np
import conll16st.aligner as aligner
import conll16st.partial_scorer as ps
import read_write_files as rw

In [21]:
def change_sets_to_list(alignments):
    # Convert every TokenIndexSet from a set to a list (in place),
    # because otherwise some algorithms can't work with them.
    # Either side of an alignment pair may be None, so both are checked.
    for gold, pred in alignments:
        if gold is not None:
            gold["Arg1"]["TokenIndexSet"] = list(gold["Arg1"]["TokenIndexSet"])
            gold["Arg2"]["TokenIndexSet"] = list(gold["Arg2"]["TokenIndexSet"])
        if pred is not None:
            pred["Arg1"]["TokenIndexSet"] = list(pred["Arg1"]["TokenIndexSet"])
            pred["Arg2"]["TokenIndexSet"] = list(pred["Arg2"]["TokenIndexSet"])

    return alignments
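
One likely reason for this conversion (an assumption, since the notebook does not name the affected algorithms) is that Python's standard `json` module cannot serialize `set` objects. The sketch below uses a made-up relation, not real data, to show the failure and the fix.

```python
import json

# Hypothetical relation with set-valued token indices (made-up data).
relation = {
    "ID": 1,
    "Arg1": {"TokenIndexSet": {3, 1, 2}},
    "Arg2": {"TokenIndexSet": {5, 4}},
}

try:
    json.dumps(relation)
    serializable_before = True
except TypeError:
    # The json module has no encoder for set objects.
    serializable_before = False

# Convert the sets to sorted lists; sorting keeps the output deterministic.
for arg in ("Arg1", "Arg2"):
    relation[arg]["TokenIndexSet"] = sorted(relation[arg]["TokenIndexSet"])

encoded = json.dumps(relation, sort_keys=True)
```

The function above uses `list(...)` rather than `sorted(...)`; sorting is only a choice made here to keep the example reproducible.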

In [22]:
def align_parsers_to_gold(gold_rel, parsers):
    # Align the predicted relations of every parser to the gold relations,
    # using the aligner functions from the conll2016 validator.
    #
    # param gold_rel     gold standard relations
    # param parsers      list of (name, predicted relations) pairs, one per parser
    #
    # return the alignment of the gold standard with all predictions, and a list
    # of all predicted relations that could not be mapped to a gold relation
    # (if a relation is not predicted by a parser there will be a placeholder for it)
    total_alignment = {gold["ID"]: {"gold": gold, "parsers": []} for gold in gold_rel}
    parsers_not_mappable = []

    parser_names = [name for name, parser in parsers]

    for name, parser_relations in parsers:
        arg1_alignment, arg2_alignment, relation_alignment = aligner.align_relations(
            gold_rel,
            parser_relations,
            0.7)

        relation_alignment = change_sets_to_list(relation_alignment)
        for gold_align, pred_align in relation_alignment:
            if gold_align is None:
                # Prediction without a gold counterpart
                parsers_not_mappable.append(pred_align)
            else:
                total_alignment[gold_align["ID"]]["parsers"].append(pred_align)
                total_alignment[gold_align["ID"]]["parser_names"] = parser_names

    return total_alignment, parsers_not_mappable
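
To make the shape of the result concrete, here is a minimal sketch of the same accumulation loop on made-up data. `toy_align` is a hypothetical stand-in that pairs relations by equal `ID`; it is not the conll16st aligner and only mimics the pair format assumed above, `(gold, prediction)` with `None` as the placeholder.

```python
def toy_align(gold_rel, predicted):
    """Hypothetical stand-in for the aligner: pair relations with equal 'ID'."""
    by_id = {p["ID"]: p for p in predicted}
    gold_ids = {g["ID"] for g in gold_rel}
    # Gold relations without a prediction get None as placeholder.
    pairs = [(g, by_id.get(g["ID"])) for g in gold_rel]
    # Predictions without a gold relation get None on the gold side.
    pairs += [(None, p) for p in predicted if p["ID"] not in gold_ids]
    return pairs


# Made-up gold relations and predictions for two imaginary parsers.
gold = [{"ID": 1}, {"ID": 2}]
parsers = [
    ("parser_a", [{"ID": 1, "Sense": ["x"]}, {"ID": 9, "Sense": ["y"]}]),
    ("parser_b", [{"ID": 2, "Sense": ["z"]}]),
]

total = {g["ID"]: {"gold": g, "parsers": []} for g in gold}
unmapped = []
for name, predicted in parsers:
    for g, p in toy_align(gold, predicted):
        if g is None:
            unmapped.append(p)
        else:
            total[g["ID"]]["parsers"].append(p)
```

In this sketch each gold entry ends up with one slot per parser, in parser order, and the prediction with `ID` 9 lands in `unmapped`.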
        

# Training Set Mapping

We load the gold relations and the parser predictions from the `test` directories and align the relations. Afterwards, we save the total alignment to a JSON file.

In [27]:
gold_path = "data/gold_standard/test/gold.json"
gold_list = rw.read_json(gold_path)

In [31]:
path = "data/submissions/randomized/test/"
parser_files = rw.get_files_in_directory(path)
parsers = [(filee.split(".")[0],path+filee) for filee in parser_files]
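
A side note on deriving the parser name: `split(".")[0]` cuts a file name at its first dot, which is fine for names such as `oslopots.json` but would shorten a name that itself contains a dot. The sketch below shows an alternative with `os.path.splitext`, which only drops the last extension. The directory and file names are hypothetical.

```python
import os

# Hypothetical directory and file names, used only for illustration.
directory = "data/submissions/example"
file_names = ["oslopots.json", "steven.json", "notes.v2.json"]

parsers = []
for file_name in file_names:
    # splitext removes only the final extension (".json" here).
    name, _extension = os.path.splitext(file_name)
    parsers.append((name, os.path.join(directory, file_name)))

names = [name for name, _ in parsers]
```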

In [29]:
predicted_lists = []

for name, file_path in parsers:
    # A separate loop variable keeps the directory stored in `path` intact.
    predicted_list = rw.read_json(file_path)
    predicted_lists.append((name, predicted_list))

In [7]:
total_alignment,not_mappables = align_parsers_to_gold(
    gold_rel=gold_list,
    parsers=predicted_lists)

In [8]:
rw.save_json(list(total_alignment.values()),"data/project_files/test/total_alignment.json")
rw.save_json(not_mappables,"data/project_files/test/not_mappable.json")
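
A note on saving the alignment: under Python 3, `dict.values()` returns a view object, which the standard `json` module cannot encode. Assuming `rw.save_json` wraps `json.dump` (an assumption, since the helper is not shown here), passing a list is the safe form. The sketch uses a made-up alignment entry.

```python
import json

# Made-up alignment with a single gold relation and one placeholder prediction.
alignment = {7: {"gold": {"ID": 7}, "parsers": [None]}}

try:
    json.dumps(alignment.values())
    view_ok = True
except TypeError:
    # Python 3: dict_values is not JSON serializable.
    view_ok = False

# Materializing the view as a list works on Python 2 and 3 alike.
encoded = json.dumps(list(alignment.values()))
```

`None` placeholders are written as JSON `null`, so missing predictions survive the round trip.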

### Mapping for the three best parsers

We wanted to provide some statistical information about the three best parsers. Therefore, we created an additional alignment restricted to these three (oslopots, steven and ecnucs).

----------------------------------

Stephan Oepen, Jonathon Read, Tatjana Scheffler, Uladzimir Sidarenka, Manfred Stede, Eric Velldal, and Lilja Øvrelid. 2016. OPT: Oslo–Potsdam–Teesside. Pipelining rules, rankers, and classifier ensembles for shallow discourse parsing. In Proceedings of the Twentieth Conference on Computational Natural Language Learning: Shared Task, Berlin, Germany, August. Association for Computational Linguistics.

Evgeny Stepanov and Giuseppe Riccardi. 2016. UniTN end-to-end discourse parser for CoNLL 2016 shared task. In Proceedings of the Twentieth Conference on Computational Natural Language Learning: Shared Task, Berlin, Germany, August. Association for Computational Linguistics.

Jianxiang Wang and Man Lan. 2016. Two end-to-end shallow discourse parsers for English and Chinese in CoNLL-2016 shared task. In Proceedings of the Twentieth Conference on Computational Natural Language Learning: Shared Task, Berlin, Germany, August. Association for Computational Linguistics.

In [32]:
best_parser_files = ["oslopots.json","steven.json","ecnucs.json"]
parser_files = [p_file for p_file in rw.get_files_in_directory(path) if p_file in best_parser_files]
best_parsers = [(filee.split(".")[0],path+filee) for filee in parser_files]
print(best_parsers)
predicted_lists = []

for name, file_path in best_parsers:
    # A separate loop variable keeps the directory stored in `path` intact.
    predicted_list = rw.read_json(file_path)
    predicted_lists.append((name, predicted_list))

total_alignment,not_mappables = align_parsers_to_gold(
    gold_rel=gold_list,
    parsers=predicted_lists)

rw.save_json(list(total_alignment.values()),"data/project_files/test/3best_alignment.json")
rw.save_json(not_mappables,"data/project_files/test/3best_not_mappable.json")

[('steven', 'data/submissions/randomized/test/steven.json'), ('oslopots', 'data/submissions/randomized/test/oslopots.json'), ('ecnucs', 'data/submissions/randomized/test/ecnucs.json')]


# Test Set Mapping

The same procedure is applied to the blind test set, using the `blind` directories.

In [9]:
gold_path = "data/gold_standard/blind/gold.json"
gold_list = rw.read_json(gold_path)

In [10]:
path = "data/submissions/randomized/blind/"
parser_files = rw.get_files_in_directory(path)
parsers = [(filee.split(".")[0],path+filee) for filee in parser_files]

In [12]:
predicted_lists = []

for name, file_path in parsers:
    # A separate loop variable keeps the directory stored in `path` intact.
    predicted_list = rw.read_json(file_path)
    predicted_lists.append((name, predicted_list))

In [13]:
total_alignment,not_mappables = align_parsers_to_gold(
    gold_rel=gold_list,
    parsers=predicted_lists)

In [15]:
rw.save_json(list(total_alignment.values()),"data/project_files/blind/total_alignment.json")
rw.save_json(not_mappables,"data/project_files/blind/not_mappable.json")