Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = ""
COLLABORATORS = ""

---

# Introduction
The goal of this assignment is to create a basic program that reports standard evaluation metrics (in particular precision, recall and f-score) for documents provided in the CoNLL format. Make sure that your code can handle the situation in which there are no true positives for a specific class.
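As an illustration of the metrics and the zero-true-positive case mentioned above, here is a minimal sketch. The helper names (`safe_divide`, `scores_from_counts`) and the toy labels are hypothetical and not part of the suggested structure. Returning 0.0 when a denominator is zero is one common convention, assumed here; other conventions exist.

```python
from collections import defaultdict


def safe_divide(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def scores_from_counts(tp, fp, fn):
    """Compute precision, recall and f-score from raw counts for one class."""
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    # Harmonic mean of precision and recall; 0.0 if both are zero (assumed convention).
    fscore = safe_divide(2 * precision * recall, precision + recall)
    return {'precision': precision, 'recall': recall, 'f-score': fscore}


# Toy labels, invented for illustration only (not taken from the data/ folder).
gold = ['O', 'B-PER', 'O', 'B-LOC', 'O']
predicted = ['O', 'O', 'O', 'B-PER', 'O']

# One counter dictionary per class, created on first access.
counts = defaultdict(lambda: {'tp': 0, 'fp': 0, 'fn': 0})
for gold_label, predicted_label in zip(gold, predicted):
    if gold_label == predicted_label:
        counts[gold_label]['tp'] += 1
    else:
        # A mismatch is a false positive for the predicted class
        # and a false negative for the gold class.
        counts[predicted_label]['fp'] += 1
        counts[gold_label]['fn'] += 1

toy_scores = {label: scores_from_counts(**c) for label, c in counts.items()}
```

In this toy example the classes `B-PER` and `B-LOC` have no true positives, so without the guard in `safe_divide` the computation would raise a `ZeroDivisionError`.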

This notebook provides a suggested structure (through functions) and modules. You may want to make use of additional functions (depending on how you structure your code). 

There is also an advanced version which you can follow if you prefer to structure your code differently.

In [None]:
import pandas as pd
# see tips & tricks on using defaultdict (remove this import if you do not use it)
from collections import defaultdict

def extract_annotations(inputfile, annotationcolumn, delimiter='\t'):
    '''
    This function extracts annotations represented in the CoNLL format from a file
    
    :param inputfile: the path to the CoNLL file
    :param annotationcolumn: the name of the column in which the target annotation is provided
    :param delimiter: optional parameter to overwrite the default delimiter (tab)
    :type inputfile: string
    :type annotationcolumn: string
    :type delimiter: string
    :returns: the annotations in a structured format of your choice
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def obtain_counts(goldannotations, machineannotations):
    '''
    This function compares the gold annotations to machine output
    
    :param goldannotations: the gold annotations
    :param machineannotations: the output annotations of the system in question
    :type goldannotations: the type of the object created in extract_annotations
    :type machineannotations: the type of the object created in extract_annotations
    
    :returns: a container providing the counts of true positives, false positives and false negatives for each class
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def calculate_precision_recall_fscore(evaluation_counts):
    '''
    Calculate precision, recall and f-score for each class and return them in a dictionary
    
    :param evaluation_counts: the true positives, false positives and false negatives for each class
    :type evaluation_counts: type of object returned by obtain_counts
    
    :returns: the precision, recall and f-score of each class in a container
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
            
            
            

In [None]:
def carry_out_evaluation(goldannotations, systemfile, systemcolumn, sysdelimiter='\t'):
    '''
    Carries out the evaluation process (from input file to calculating relevant scores)
    
    :param goldannotations: the gold annotations
    :param systemfile: path to file with system output
    :param systemcolumn: indication of column with relevant information
    :param sysdelimiter: specification of formatting of file (default delimiter set to '\t')
    
    :returns: evaluation information for this specific system
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    

In [None]:
def provide_output_tables(evaluations):
    '''
    Create tables based on the evaluation of various systems
    
    :param evaluations: the outcome of evaluating one or more systems
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def perform_evaluations(goldannotations, systems):
    '''
    Carry out standard evaluation for one or more system outputs
    
    :param goldannotations: the gold annotations
    :param systems: required information to find and process system output
    :type goldannotations: the type of the object created in extract_annotations
    :type systems: list (providing file name, information on column with system output and system name for each element)
    
    :returns: the evaluations for all systems
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def identify_evaluation_value(system, class_label, value_name, evaluations):
    '''
    Return the outcome of a specific value of the evaluation
    
    :param system: the name of the system
    :param class_label: the name of the class for which the value should be returned
    :param value_name: the name of the score that is returned
    :param evaluations: the overview of evaluations
    
    :returns: the requested value
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    
    

# Checking the overall set-up
We will first test the evaluation scripts on a very small data set. You can carry out a small test yourself with the data provided in the data/ folder. We will run a similar test whose data remain hidden from you. 
You are going to write a generic function that creates the gold annotations and calls the functions that create all subsequent evaluations. 
It then becomes straightforward to create the final setup of the system, which does not make use of a notebook.
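For a setup outside the notebook, the arguments would typically arrive as one flat list (for instance from `sys.argv`). The sketch below shows one possible way to group such a list; it assumes that each system is described by exactly three values (file, column, system name) following the gold file and gold column. The helper `chunk_arguments` is hypothetical and is not one of the functions defined in this notebook.

```python
def chunk_arguments(arguments, chunk_size=3):
    """Split a flat list into consecutive chunks of chunk_size elements."""
    if len(arguments) % chunk_size != 0:
        raise ValueError('expected a multiple of %d arguments' % chunk_size)
    return [arguments[i:i + chunk_size]
            for i in range(0, len(arguments), chunk_size)]


# Example argument list; on the command line this could be sys.argv[1:].
example_args = ['data/minigold.csv', 'gold', 'data/miniout1.csv', 'NER', 'system1']

# Assumed layout: the first two values describe the gold data,
# every following triple describes one system.
goldfile, gold_column = example_args[:2]
system_chunks = chunk_arguments(example_args[2:])
```

Raising an error on an incomplete triple is a design choice made here to surface malformed input early; silently ignoring leftover arguments would be an alternative.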

In [None]:
def run_evaluations(goldfile, gold_column, systemfiles):
    '''
    Create an evaluation where the gold is provided by data/minigold.csv and the system output by data/miniout1.csv
    Use the label 'system1' for the system
    Label the evaluation values as 'precision', 'recall', 'f-score' (written full-out, all lowercase)
    
    :param goldfile: path to file with gold standard annotations
    :param gold_column: indicator to identify the gold column
    :param systemfiles: list that contains all information needed to process the systems
    
    :returns: the evaluations for all systems
    '''
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
def create_system_information(system_information):
    '''
    Takes system information in the form that it is passed on through sys.argv or via a settingsfile
    and returns a list of elements specifying all the needed information on each system output file to carry out the evaluation.
    
    :param system_information: the input as received from the command line or an input file
    :returns: a list specifying the needed information for each system output file
    '''
    # YOUR CODE HERE
    raise NotImplementedError()
    

In [None]:
from nose.tools import assert_equal

def main():
    # these can come from the commandline using sys.argv for instance
    my_args = ['data/minigold.csv','gold','data/miniout1.csv','NER','system1']
    system_info = create_system_information(my_args[2:])
    evaluations = run_evaluations(my_args[0], my_args[1], system_info)
    provide_output_tables(evaluations)
    check_eval = identify_evaluation_value('system1', 'O', 'f-score', evaluations)
    assert_equal("%.3f" % check_eval,"0.889")

main()

