# tieval: An Overview

This notebook provides an introductory overview of the tieval framework. tieval is a Python library that was built to mitigate common difficulties in the development of temporal information extraction systems.

<img src="../../imgs/tieval.png" alt="tieval logo" align="center" width="500" />

Let's dive into it.

The first step is to install tieval. It is published on the [Python Package Index](https://pypi.org/), so installing it only requires running the following line:

In [1]:
#! pip install tieval

tieval is divided into three main modules (datasets, models, and evaluation) that are intended to cover the full cycle of a machine learning project. The following sections give a tour of the three modules.

## Datasets

The datasets module is responsible for downloading and reading corpora annotated with temporal information. The corpora supported by `tieval` are listed in the `SUPPORTED_DATASETS` list. The script below prints the names of the currently supported corpora.

In [2]:
from tieval import datasets

n_datasets = len(datasets.SUPPORTED_DATASETS)
print(f"Total number of datasets: {n_datasets}")

print("Dataset names:")
for dataset_name in datasets.SUPPORTED_DATASETS:
    print("\t", dataset_name)

Total number of datasets: 46
Dataset names:
	 ancient_time_arabic
	 ancient_time_dutch
	 ancient_time_english
	 ancient_time_french
	 ancient_time_german
	 ancient_time_italian
	 ancient_time_spanish
	 ancient_time_vietnamese
	 aquaint
	 eventtime
	 fr_timebank
	 grapheve
	 krauts
	 krauts_diezeit
	 krauts_dolomiten_42
	 krauts_dolomiten_100
	 matres
	 meantime_english
	 meantime_spanish
	 meantime_dutch
	 meantime_italian
	 narrative_container
	 ph_english
	 ph_french
	 ph_german
	 ph_italian
	 ph_portuguese
	 ph_spanish
	 platinum
	 spanish_timebank
	 tcr
	 tddiscourse
	 tempeval_2_chinese
	 tempeval_2_english
	 tempeval_2_french
	 tempeval_2_italian
	 tempeval_2_korean
	 tempeval_2_spanish
	 tempeval_3
	 timebank_1.2
	 timebank_dense
	 timebankpt
	 timebank
	 traint3
	 wikiwars
	 wikiwars_de




### Download and read dataset

Before the release of tieval, anyone interested in leveraging corpora annotated with temporal information had to develop software to read them. Furthermore, since most corpora are stored in different formats, a dedicated reader had to be tailored to each one. This adds up to a significant engineering effort that typically limits the number of corpora employed in the development of such systems.

tieval provides dataset readers out of the box for the corpora listed above. The script below shows how to read the TempEval-3 corpus with tieval.

In [3]:
te3 = datasets.read("tempeval_3")

  0%|          | 0/275 [00:00<?, ?it/s]

100%|██████████| 275/275 [00:01<00:00, 232.21it/s]


Running the previous script makes tieval create a `data/` folder at the root of the project directory and download the TempEval-3 annotated corpus into it. After that, the `read()` function parses the corpus with the appropriate reader and outputs an instance of the `Dataset` object.

### `Dataset` object

In [4]:
print(type(te3))
print(te3)

<class 'tieval.base.Dataset'>
Dataset(name=tempeval_3)


In [5]:
# Attributes
print("Name:")
print(te3.name)

print("---")
print("Training Documents:")
print(te3.train)

print("---")
print("Test Documents:")
print(te3.test)

Name:
tempeval_3
---
Training Documents:
[Document(name=APW19991024.0075), Document(name=APW19990206.0090), Document(name=APW19980213.1310), Document(name=NYT19981026.0446), Document(name=wsj_0781), Document(name=XIE19980812.0062), Document(name=wsj_1073), Document(name=wsj_0973), Document(name=APW20000328.0257), Document(name=APW19980807.0261), Document(name=wsj_0685), Document(name=wsj_0811), Document(name=wsj_0713), Document(name=APW20000401.0150), Document(name=APW19991008.0265), Document(name=wsj_0159), Document(name=XIE19990210.0079), Document(name=wsj_1014), Document(name=wsj_0760), Document(name=CNN19980222.1130.0084), Document(name=wsj_0348), Document(name=APW19990507.0207), Document(name=PRI19980213.2000.0313), Document(name=NYT20000105.0325), Document(name=wsj_0928), Document(name=wsj_0151), Document(name=wsj_0667), Document(name=XIE19990227.0171), Document(name=wsj_0106), Document(name=wsj_0585), Document(name=AP900816-0139), Document(name=wsj_0184), Document(name=wsj_0356)

The `Dataset` object is the final representation of each dataset. Within it, one can find three attributes of major interest:

- `.name` a string that contains the name of the dataset
- `.train` a list with the training documents
- `.test` a list with the test documents

Furthermore, one can access all the documents with the `.documents` property.

> All the documents are placed in the `.train` attribute when no train/test split is provided by the original authors of the dataset.
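For corpora that ship without a split, one may want to carve a development set out of `.train`. The sketch below is one way to do that; `split_train_dev` is a hypothetical helper (not part of tieval), and plain strings stand in for `Document` instances so that the example runs on its own.

```python
import random


def split_train_dev(documents, dev_fraction=0.2, seed=73):
    """Shuffle a copy of `documents` and split it into (train, dev) lists.

    Hypothetical helper for illustration; it is not provided by tieval.
    """
    docs = list(documents)  # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(docs)
    dev_size = round(len(docs) * dev_fraction)
    return docs[dev_size:], docs[:dev_size]


# Plain strings stand in for Document instances here.
names = [f"doc_{i}" for i in range(10)]
train_docs, dev_docs = split_train_dev(names)
print(len(train_docs), len(dev_docs))  # 8 2
```

With a real dataset, the same function could presumably be called on `dataset.train`, since that attribute is described above as a list of documents.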

### `Document` object

Each document of the corpus is stored as an instance of `Document`, which compiles the document-level annotation. Most importantly, it contains the annotated temporal entities and temporal links in the `.entities` and `.tlinks` attributes. To demonstrate this we will use document **wsj_0006** of TempEval-3, as it is the shortest document of the corpus.

In [6]:
doc_name = "wsj_0006"
doc = te3[doc_name]
print("Type:", type(doc))
print(doc)

Type: <class 'tieval.base.Document'>
Pacific First Financial Corp. said shareholders approved its acquisition by Royal Trustco Ltd. of Toronto for $27 a share, or $212 million.
The thrift holding company said it expects to obtain regulatory approval and complete the transaction by year-end.


In [7]:
print("Entities:")
print(doc.entities)

print("---")
print("TLinks:")
print(doc.tlinks)

Entities:
{Event("acquisition"), Event("approved"), Event("said"), Event("said"), Event("obtain"), Timex("year-end"), Event("transaction"), Event("complete"), Timex("11/02/89"), Event("approval"), Event("expects")}
---
TLinks:
{TLink(Event("complete") ---BEFORE--> Timex("year-end")), TLink(Event("said") ---SIMULTANEOUS--> Event("said")), TLink(Event("complete") ---ENDS--> Event("transaction")), TLink(Event("said") ---BEFORE--> Timex("11/02/89")), TLink(Event("approved") ---AFTER--> Event("acquisition")), TLink(Event("approved") ---BEFORE--> Event("said"))}


Note that the entities can take two types: `Event` or `Timex`. This is in accordance with the traditional approach to temporal information extraction, in which temporal entities take these two types.

Other attributes that can be found in a `Document` instance are:

- `.name` a string with the name of the document
- `.text` a string with the text of the document
- `.dct` a timex that represents the document creation time
- `.events` the set of annotated events
- `.timexs` the set of annotated timexs

The script below presents those attributes for our reference document.

In [8]:
print("Name:")
print(doc.name)

print("---")
print("Text:")
print(doc.text)

print("---")
print("Document Creation Time:")
print(doc.dct)

print("---")
print("Events:")
print(doc.events)

print("---")
print("Timexs:")
print(doc.timexs)

Name:
wsj_0006
---
Text:
Pacific First Financial Corp. said shareholders approved its acquisition by Royal Trustco Ltd. of Toronto for $27 a share, or $212 million.
The thrift holding company said it expects to obtain regulatory approval and complete the transaction by year-end.
---
Document Creation Time:
Timex("11/02/89")
---
Events:
{Event("acquisition"), Event("approved"), Event("said"), Event("said"), Event("obtain"), Event("transaction"), Event("complete"), Event("approval"), Event("expects")}
---
Timexs:
{Timex("11/02/89"), Timex("year-end")}


#### Temporal Closure
One of the main contributions of `tieval` is providing the research community with an easy way to compute the temporal closure of a document. For more about temporal closure, see [James Allen](https://dl.acm.org/doi/pdf/10.1145/182.358434)'s work. In `tieval` the closure operation can be performed in two ways: by accessing the `.temporal_closure` property of a `Document`, or by applying the `temporal_closure()` function to a set of TLinks, as demonstrated in the script below.

In [9]:
print("Document property:")
tlinks_closure_doc = doc.temporal_closure
print(tlinks_closure_doc)

print("---")
print("Closure function:")
from tieval.closure import temporal_closure
tlinks_closure_func = temporal_closure(doc.tlinks)
print(tlinks_closure_func)

print("---")
print(tlinks_closure_doc == tlinks_closure_func)

Document property:
{TLink(ei76 ---BEFORE--> t0), TLink(ei74 ---BEFORE--> t0), TLink(ei74 ---BEFORE--> ei76), TLink(ei80 ---BEFORE--> t10), TLink(ei75 ---BEFORE--> t0), TLink(ei73 ---SIMULTANEOUS--> ei76), TLink(ei81 ---BEFORE--> t10), TLink(ei80 ---ENDS--> ei81), TLink(ei73 ---AFTER--> ei75), TLink(ei73 ---BEFORE--> t0), TLink(ei75 ---BEFORE--> ei76), TLink(ei74 ---AFTER--> ei75), TLink(ei73 ---AFTER--> ei74)}
---
Closure function:
{TLink(ei76 ---BEFORE--> t0), TLink(ei74 ---BEFORE--> t0), TLink(ei74 ---BEFORE--> ei76), TLink(ei80 ---BEFORE--> t10), TLink(ei75 ---BEFORE--> t0), TLink(ei73 ---SIMULTANEOUS--> ei76), TLink(ei81 ---BEFORE--> t10), TLink(ei80 ---ENDS--> ei81), TLink(ei73 ---AFTER--> ei75), TLink(ei73 ---BEFORE--> t0), TLink(ei75 ---BEFORE--> ei76), TLink(ei74 ---AFTER--> ei75), TLink(ei73 ---AFTER--> ei74)}
---
True
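To give some intuition for what the closure adds, the sketch below infers new links from a toy set of BEFORE relations by transitivity alone. It is a deliberate simplification: the full operation composes every pair of relation types (AFTER, SIMULTANEOUS, ENDS, and so on), and this is not tieval's implementation. The entity ids and the `before_closure` function are made up for the example.

```python
def before_closure(pairs):
    """Return the transitive closure of a set of (source, target) BEFORE pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a BEFORE b and b BEFORE d implies a BEFORE d
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical annotation: e1 BEFORE e2, e2 BEFORE t0.
annotated = {("e1", "e2"), ("e2", "t0")}
inferred = before_closure(annotated) - annotated
print(inferred)  # {('e1', 't0')}
```

The inferred link `e1 BEFORE t0` was never annotated, yet it follows from the two annotated links. This is the kind of relation that appears in the closure output above but not in `doc.tlinks`.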


### Statistics

Now we can produce a script that presents the number of documents, events, timexs, and tlinks for each corpus. 

In [10]:
statistics = []
for dataset_name in datasets.SUPPORTED_DATASETS:
    
    print(f"Reading dataset {dataset_name}...")
    dataset = datasets.read(dataset_name)
    
    n_docs = len(dataset.documents)
    n_events = sum(len(doc.events) for doc in dataset.documents)
    n_timexs = sum(len(doc.timexs) for doc in dataset.documents)
    n_tlinks = sum(len(doc.tlinks) for doc in dataset.documents if doc.tlinks)
    statistics += [(dataset.name, n_docs, n_events, n_timexs, n_tlinks)]

Reading dataset ancient_time_arabic...


100%|██████████| 5/5 [00:00<00:00, 818.46it/s]


Reading dataset ancient_time_dutch...


100%|██████████| 5/5 [00:00<00:00, 1663.22it/s]


Reading dataset ancient_time_english...


100%|██████████| 5/5 [00:00<00:00, 658.74it/s]


Reading dataset ancient_time_french...


100%|██████████| 5/5 [00:00<00:00, 635.29it/s]


Reading dataset ancient_time_german...


100%|██████████| 5/5 [00:00<00:00, 1036.30it/s]

Reading dataset ancient_time_italian...



100%|██████████| 5/5 [00:00<00:00, 649.25it/s]


Reading dataset ancient_time_spanish...


100%|██████████| 5/5 [00:00<00:00, 914.95it/s]


Reading dataset ancient_time_vietnamese...


100%|██████████| 4/4 [00:00<00:00, 644.71it/s]


Reading dataset aquaint...


100%|██████████| 72/72 [00:00<00:00, 126.54it/s]


Reading dataset eventtime...


100%|██████████| 183/183 [00:00<00:00, 318.42it/s]


Reading dataset fr_timebank...


100%|██████████| 108/108 [00:00<00:00, 665.56it/s]


Reading dataset grapheve...


100%|██████████| 103/103 [00:24<00:00,  4.17it/s]


Reading dataset krauts...


100%|██████████| 192/192 [00:00<00:00, 3375.91it/s]


Reading dataset krauts_diezeit...


100%|██████████| 50/50 [00:00<00:00, 2321.89it/s]


Reading dataset krauts_dolomiten_42...


100%|██████████| 42/42 [00:00<00:00, 3651.15it/s]


Reading dataset krauts_dolomiten_100...


100%|██████████| 100/100 [00:00<00:00, 3886.60it/s]


Reading dataset matres...


100%|██████████| 275/275 [00:01<00:00, 238.62it/s]


Reading dataset meantime_english...


100%|██████████| 120/120 [00:00<00:00, 155.48it/s]


Reading dataset meantime_spanish...


100%|██████████| 120/120 [00:00<00:00, 180.23it/s]


Reading dataset meantime_dutch...


100%|██████████| 120/120 [00:00<00:00, 204.32it/s]


Reading dataset meantime_italian...


100%|██████████| 120/120 [00:00<00:00, 181.64it/s]


Reading dataset narrative_container...


100%|██████████| 63/63 [00:19<00:00,  3.16it/s]


Reading dataset ph_english...


100%|██████████| 24642/24642 [00:03<00:00, 7195.62it/s]


Reading dataset ph_french...


100%|██████████| 27154/27154 [00:02<00:00, 12013.05it/s]


Reading dataset ph_german...


100%|██████████| 19095/19095 [00:02<00:00, 7287.55it/s]


Reading dataset ph_italian...


100%|██████████| 9619/9619 [00:01<00:00, 9267.41it/s] 


Reading dataset ph_portuguese...


100%|██████████| 24293/24293 [00:02<00:00, 11155.73it/s]


Reading dataset ph_spanish...


100%|██████████| 33266/33266 [00:04<00:00, 6865.67it/s]


Reading dataset platinum...


100%|██████████| 20/20 [00:00<00:00, 229.31it/s]


Reading dataset spanish_timebank...


100%|██████████| 210/210 [00:00<00:00, 279.53it/s]


Reading dataset tcr...


100%|██████████| 25/25 [00:00<00:00, 182.40it/s]


Reading dataset tddiscourse...


100%|██████████| 183/183 [00:00<00:00, 254.20it/s]


Reading dataset tempeval_2_chinese...


100%|██████████| 52/52 [00:00<00:00, 143.62it/s]


Reading dataset tempeval_2_english...


100%|██████████| 182/182 [00:00<00:00, 271.14it/s]


Reading dataset tempeval_2_french...


100%|██████████| 83/83 [00:00<00:00, 886.46it/s]


Reading dataset tempeval_2_italian...


100%|██████████| 64/64 [00:00<00:00, 306.67it/s]


Reading dataset tempeval_2_korean...


100%|██████████| 18/18 [00:00<00:00, 70.13it/s]


Reading dataset tempeval_2_spanish...


100%|██████████| 210/210 [00:00<00:00, 269.48it/s]


Reading dataset tempeval_3...


100%|██████████| 275/275 [00:01<00:00, 213.99it/s]


Reading dataset timebank_1.2...


100%|██████████| 183/183 [00:00<00:00, 246.42it/s]


Reading dataset timebank_dense...


100%|██████████| 183/183 [00:00<00:00, 320.91it/s]


Reading dataset timebankpt...


100%|██████████| 182/182 [00:00<00:00, 372.29it/s]


Reading dataset timebank...


100%|██████████| 183/183 [00:00<00:00, 289.00it/s]


Reading dataset traint3...


100%|██████████| 175/175 [00:00<00:00, 186.00it/s]


Reading dataset wikiwars...


100%|██████████| 22/22 [00:00<00:00, 195.77it/s]


Reading dataset wikiwars_de...


100%|██████████| 22/22 [00:00<00:00, 247.53it/s]


In [11]:
print("Name\t\t\t#docs\t#events\t#timexs\t#tlinks")
print("-" * 55)
for name, n_docs, n_events, n_timexs, n_tlinks in statistics:
    print(f"{name.capitalize():20}\t{n_docs}\t{n_events}\t{n_timexs}\t{n_tlinks}")

Name			#docs	#events	#timexs	#tlinks
-------------------------------------------------------
Ancient_time_arabic 	5	0	106	0
Ancient_time_dutch  	5	0	130	0
Ancient_time_english	5	0	311	0
Ancient_time_french 	5	0	290	0
Ancient_time_german 	5	0	196	0
Ancient_time_italian	5	0	234	0
Ancient_time_spanish	5	0	217	0
Ancient_time_vietnamese	4	0	120	0
Aquaint             	72	4351	639	5832
Eventtime           	36	1498	0	0
Fr_timebank         	108	2115	533	2303
Grapheve            	103	4298	0	18204
Krauts              	192	0	1282	0
Krauts_diezeit      	50	0	553	0
Krauts_dolomiten_42 	42	0	228	0
Krauts_dolomiten_100	100	0	501	0
Matres              	274	6065	0	13504
Meantime_english    	120	1882	349	1753
Meantime_spanish    	120	2000	344	1975
Meantime_dutch      	120	1346	346	1487
Meantime_italian    	120	1980	338	1675
Narrative_container 	63	3559	439	737
Ph_english          	24642	0	254803	0
Ph_french           	27154	0	83431	0
Ph_german           	19095	0	194043	0
Ph_italian          	9619	0	58823
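The tab-separated layout above drifts out of alignment when a dataset name is longer than the 20-character column (see `Ancient_time_vietnamese`). As a possible alternative, the sketch below sizes the name column from the data. `format_table` is a hypothetical helper, and the three rows are copied from the output above.

```python
# A few rows copied from the output above: (name, #docs, #events, #timexs, #tlinks).
rows = [
    ("aquaint", 72, 4351, 639, 5832),
    ("fr_timebank", 108, 2115, 533, 2303),
    ("ancient_time_vietnamese", 4, 0, 120, 0),
]


def format_table(rows, headers=("name", "#docs", "#events", "#timexs", "#tlinks")):
    """Render rows as fixed-width text, sizing the name column to the longest name."""
    name_width = max([len(headers[0])] + [len(row[0]) for row in rows])
    lines = [f"{headers[0]:<{name_width}}" + "".join(f"{h:>9}" for h in headers[1:])]
    lines.append("-" * len(lines[0]))
    for name, *counts in rows:
        lines.append(f"{name:<{name_width}}" + "".join(f"{c:>9}" for c in counts))
    return "\n".join(lines)


table = format_table(rows)
print(table)
```

The same function could be applied directly to the `statistics` list built in the previous cell, since it holds tuples of the same shape.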

## Models

Now that we have presented how to import the data, we move to the second step of the machine learning pipeline: the development of the model.

In its current version, `tieval` provides four models, namely:

- TimexIdentificationBaseline -> the NER model from spaCy, trained to identify timexs
- HeidelTime -> a well-known model for timex identification
- EventIdentificationBaseline -> the same architecture as TimexIdentificationBaseline, but trained to identify event expressions
- CogCompTime2 -> a baseline for temporal relation classification proposed by 

For the purpose of this notebook we will focus on timex identification, the task in which we can compare the two baseline models TimexIdentificationBaseline and HeidelTime.


> Note: for the implementation of HeidelTime, tieval uses the Python implementation from the [py_heideltime](https://github.com/JMendes1995/py_heideltime) repository. As this is a Python wrapper of the [original implementation](https://github.com/HeidelTime/heideltime) in Java, it requires some specific installation steps. For those, please refer to the [py_heideltime](https://github.com/JMendes1995/py_heideltime) repository.

The script below shows how to load the models.

### Entity Identification

In [None]:
from tieval import models

baseline = models.TimexIdentificationBaseline()
heideltime = models.HeidelTime()

Note that the script above will create a `models` directory in the project root. This is where tieval stores the parameters required by the TimexIdentificationBaseline model.

To get predictions from the models one just needs to pass a set of documents to the `.predict()` method of the model.

In [None]:
print("Performing predictions with baseline model...")
baseline_predictions = baseline.predict(te3.test)
print("Done.")

print("Performing predictions with HeidelTime model...")
heideltime_predictions = heideltime.predict(te3.test)
print("Done.")

Performing predictions with baseline model...
Done.
Performing predictions with HeidelTime model...


100%|████████████████████████████████████████████████████████████████| 20/20 [00:38<00:00,  1.93s/it]

Done.





The output of the `.predict()` method is a dictionary that maps the name of each document to the timexs predicted for it, as illustrated below.

In [None]:
print("Predictions type:")
print(type(baseline_predictions))

print("---")
print("Predictions keys:")
print(baseline_predictions.keys())

Predictions type:
<class 'dict'>
---
Predictions keys:
dict_keys(['CNN_20130322_1003', 'CNN_20130322_248', 'CNN_20130321_821', 'CNN_20130322_314', 'AP_20130322', 'WSJ_20130318_731', 'bbc_20130322_1150', 'WSJ_20130321_1145', 'CNN_20130322_1243', 'WSJ_20130322_159', 'bbc_20130322_332', 'nyt_20130322_strange_computer', 'nyt_20130321_sarkozy', 'WSJ_20130322_804', 'bbc_20130322_1600', 'bbc_20130322_1353', 'nyt_20130321_china_pollution', 'nyt_20130321_cyprus', 'nyt_20130321_women_senate', 'bbc_20130322_721'])
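To make the shape of this interface concrete, the sketch below mimics it with a toy rule-based tagger. Everything in it is hypothetical: `ToyDocument` and `WeekdayTagger` are stand-ins written for this example and are not tieval classes, and the predictions are plain strings rather than `Timex` objects.

```python
import re
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a tieval Document: just a name and its text."""
    name: str
    text: str


class WeekdayTagger:
    """Hypothetical rule-based 'model' that only recognises weekday names."""

    pattern = re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b")

    def predict(self, documents):
        # Same shape as described above: document name -> predicted expressions.
        return {doc.name: self.pattern.findall(doc.text) for doc in documents}


docs = [
    ToyDocument("d1", "The vote is on Friday, not Monday."),
    ToyDocument("d2", "No date is mentioned here."),
]
predictions = WeekdayTagger().predict(docs)
print(predictions)  # {'d1': ['Friday', 'Monday'], 'd2': []}
```

Keying the predictions by document name means that outputs from different systems can be lined up against the same annotation, document by document.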


Let's look at the predictions made by the models for the document CNN_20130322_248.

In [None]:
doc_name = "CNN_20130322_248"
doc = te3[doc_name]

print("Metadata:")
print(f"Document name: {doc_name}")
print(f"Document creation time: {doc.dct}")

print("---")
print("Annotation")
print(doc.timexs)

print("---")
print("Baseline Predictions:")
print(baseline_predictions[doc_name])

print("---")
print("HeidelTime Predictions:")
print(heideltime_predictions[doc_name])


Metadata:
Document name: CNN_20130322_248
Document creation time: Timex("March 22, 2013")
---
Annotation
{Timex("this fiscal year"), Timex("March 22, 2013"), Timex("four-week"), Timex("Friday"), Timex("Friday"), Timex("April 7")}
---
Baseline Predictions:
[Timex("Friday"), Timex("April 7"), Timex("Friday"), Timex("this fiscal year")]
---
HeidelTime Predictions:
[Timex("Friday"), Timex("four-week"), Timex("April 7"), Timex("Friday"), Timex("this fiscal year")]


An important remark is that neither model predicts Timex("March 22, 2013"). This is expected, as that timex is the document creation time, which is not explicit in the text and is given a priori to the system. `tieval` has built-in evaluation functions for a fair comparison between the models, as we will see in the Evaluation section.

#### Trainable models

The parameters provided by default for the event and timex baseline models were trained on the TempEval-3 corpus. However, one can train them from scratch on any other corpus. Below we present a script that trains the TimexIdentificationBaseline on the TempEval-2 Spanish corpus.

In [None]:
import random
import warnings
warnings.filterwarnings("ignore")  # suppress spaCy warnings

import spacy

# set seeds
random.seed(73)
spacy.util.fix_random_seed(73)

n_epochs = 5

print("Reading data.")
data = datasets.read("tempeval_2_spanish")

# train/ dev split
size = len(data.train)
dev_size = round(size * 0.2)
random.shuffle(data.train)
train_docs, dev_docs = data.train[dev_size:], data.train[:dev_size]

# print some statistics
n_train_timexs = sum(len(doc.timexs) for doc in train_docs)
n_dev_timexs = sum(len(doc.timexs) for doc in dev_docs)

print(f"Number timexs (train/dev): {n_train_timexs}/{n_dev_timexs}")
print("-" * 50)

print("Training the model.")
model = models.TimexIdentificationBaseline()
for epoch in range(n_epochs):
    print(f"Epoch {epoch+1}")
    if epoch == 0:  # reset weights
        model.fit(train_docs, dev_docs, from_scratch=True)
    else:
        model.fit(train_docs, dev_docs)
    print("-" * 50)

Reading data.


100%|█████████████████████████████████████████████████████████████| 193/193 [00:00<00:00, 464.35it/s]


Number timexs (train/dev): 1163/281
--------------------------------------------------
Training the model.
Epoch 1
Losses:	Train 10.11214	Dev 1.61868
--------------------------------------------------
Epoch 2
Losses:	Train 1.60296	Dev 1.00113
--------------------------------------------------
Epoch 3
Losses:	Train 0.83664	Dev 0.53589
--------------------------------------------------
Epoch 4
Losses:	Train 0.35485	Dev 0.34843
--------------------------------------------------
Epoch 5
Losses:	Train 0.16431	Dev 0.24151
--------------------------------------------------


In the Evaluation section we will show how one can use `tieval` to further evaluate the performance of the system.

### TLink Classification

The model for tlink classification follows the same principles as timex identification. That is, predictions are made by means of the `.predict()` method, which takes as input a set of documents and outputs a dictionary that maps the names of the documents to the predictions made by the system.

> Beware: CogCompTime2 is a computationally heavy model that will take some time to run.

In the script below we load the CogCompTime2 model, which downloads all the model resources to the `models` folder. We then read the TCR corpus, as it is one of the corpora the authors used in the original evaluation of the model, and predict the temporal relations for all of its documents, since TCR provides no standard train/test split.

In [None]:
cct2 = models.CogCompTime2()

In [None]:
tcr = datasets.read("tcr")
cct2_predictions = cct2.predict(tcr.documents)

100%|███████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 109.25it/s]


We can look at the results for one of the documents.

In [None]:
tcr.documents

[Document(name=2010.01.12.haiti.earthquake),
 Document(name=2010.01.03.japan.jal.airlines.ft),
 Document(name=2010.01.08.facebook.bra.color),
 Document(name=2010.01.12.uk.islamist.group.ban),
 Document(name=2010.01.06.tennis.qatar.federer.nadal),
 Document(name=2010.02.26.census.redistricting),
 Document(name=2010.01.07.water.justice),
 Document(name=2010.03.02.health.care),
 Document(name=2010.01.01.iran.moussavi),
 Document(name=2010.01.18.sherlock.holmes.tourism.london),
 Document(name=2010.01.12.turkey.israel),
 Document(name=2010.02.05.sotu.crowley.column),
 Document(name=2010.02.03.cross.quake.resistant.housing),
 Document(name=2010.01.02.pakistan.attacks),
 Document(name=2010.01.13.haiti.un.mission),
 Document(name=2010.01.13.google.china.exit),
 Document(name=2010.02.06.iran.nuclear),
 Document(name=2010.01.13.mexico.human.traffic.drug),
 Document(name=2010.03.22.africa.elephants.ivory.trade),
 Document(name=2010.03.02.japan.unemployment.ft),
 Document(name=2010.02.07.japan.pri

In [None]:
doc = tcr.documents[9]

print("Annotation")
for tlink in doc.tlinks:
    print(tlink)
    
print("---")
print("CogCompTime2 Predictions:")
for tlink in cct2_predictions[doc.name]:
    print(tlink)

Annotation
Event("made") ---AFTER--> Event("promoting")
Event("hoping") ---AFTER--> Event("promoting")
Event("told") ---AFTER--> Event("visit")
Event("think") ---AFTER--> Event("meaning")
Event("lived") ---BEFORE--> Event("greets")
Event("examine") ---AFTER--> Event("put")
Event("think") ---AFTER--> Event("put")
Event("put") ---BEFORE--> Event("put")
Event("visit") ---AFTER--> Event("think")
Event("examine") ---BEFORE--> Event("said")
Event("think") ---BEFORE--> Event("said")
Event("visit") ---AFTER--> Event("set")
Event("seen") ---BEFORE--> Event("visit")
Event("examine") ---BEFORE--> Event("think")
Event("examine") ---AFTER--> Event("put")
Event("promoting") ---BEFORE--> Event("seen")
Event("put") ---BEFORE--> Event("said")
Event("think") ---AFTER--> Event("put")
Event("visit") ---BEFORE--> Event("said")
Event("said") ---SIMULTANEOUS--> Event("meaning")
Event("seen") ---BEFORE--> Event("told")
Event("promoting") ---BEFORE--> Event("told")
Event("set") ---AFTER--> Event("lived")
Event

## Evaluation

Another interesting contribution of `tieval` is the evaluation infrastructure it provides. In particular, the framework provides evaluation functions for entity identification, tlink identification, and tlink classification. The tlink classification evaluation produces statistics for temporal awareness, a more comprehensive metric for evaluating temporal relation extraction systems that is not reported in many works.

### Timex Evaluation
As an example, the script below evaluates the predictions produced above by the TimexIdentificationBaseline and HeidelTime systems.

In [None]:
from tieval import evaluate

annotation_te3 = {doc.name: doc.timexs for doc in te3.test}

print("Baseline Results:")
baseline_results = evaluate.timex_identification(
    annotation_te3, 
    baseline_predictions, 
    verbose=True
)

print("---")
print("HeidelTime Results:")
heideltime_results = evaluate.timex_identification(
    annotation_te3, 
    heideltime_predictions, 
    verbose=True
)

Baseline Results:
|       |    f1 |   precision |   recall |
|-------+-------+-------------+----------|
| macro | 0.778 |       0.817 |    0.742 |
| micro | 0.795 |       0.851 |    0.746 |
---
HeidelTime Results:
|       |    f1 |   precision |   recall |
|-------+-------+-------------+----------|
| macro | 0.787 |       0.811 |    0.765 |
| micro | 0.818 |       0.84  |    0.797 |


Note that we have to build the annotation in the same format as the predictions, i.e., a dictionary with the name of the documents as key and the annotated timexs as the values.

As a result of the evaluation function `tieval` produces the micro and macro metrics for the precision, recall, and F1-score. 
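To clarify the difference between the two averages, the sketch below computes them by hand over dictionaries shaped like the ones above. It rests on two assumptions that may not match tieval's implementation: entities are compared by exact match, and the macro average is the mean of per-document scores. The `(start, end)` spans and function names are made up for the example.

```python
def prf(n_correct, n_predicted, n_annotated):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_annotated if n_annotated else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_macro(annotations, predictions):
    """Exact-match scores; macro averages per-document scores (an assumption)."""
    per_doc = []
    n_correct = n_predicted = n_annotated = 0
    for name, gold in annotations.items():
        gold, pred = set(gold), set(predictions.get(name, ()))
        correct = len(gold & pred)
        per_doc.append(prf(correct, len(pred), len(gold)))
        n_correct += correct
        n_predicted += len(pred)
        n_annotated += len(gold)
    macro = tuple(sum(scores) / len(per_doc) for scores in zip(*per_doc))
    micro = prf(n_correct, n_predicted, n_annotated)  # pool counts, then score
    return {"micro": micro, "macro": macro}


# Hypothetical (start, end) character spans standing in for timexs.
gold = {"d1": {(0, 6), (10, 17)}, "d2": {(3, 8)}}
pred = {"d1": {(0, 6)}, "d2": {(3, 8), (20, 25)}}
scores = micro_macro(gold, pred)
print(scores["macro"][:2])  # (0.75, 0.75)
```

Micro scores pool the counts of all documents before computing the metric, so long documents weigh more. Macro scores, under the assumption above, give every document the same weight.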

### TLink Classification

For the temporal relation classification task, the evaluation adds to those metrics the accuracy and the temporal metrics (temporal precision, temporal recall, and temporal awareness) of the model. The script below evaluates the CogCompTime2 system on the TCR corpus.

In [None]:
annotation_tcr = {doc.name: doc.tlinks for doc in tcr.documents}
cct2_result = evaluate.tlink_classification(
    annotation_tcr, 
    cct2_predictions,
    verbose=True
)

|       |   accuracy |    f1 |   precision |   recall |   temporal_awareness |   temporal_precision |   temporal_recall |
|-------+------------+-------+-------------+----------+----------------------+----------------------+-------------------|
| macro |      0.748 | 0.748 |       0.748 |    0.748 |                0.687 |                0.74  |             0.641 |
| micro |      0.753 | 0.753 |       0.753 |    0.753 |                0.693 |                0.742 |             0.65  |
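The idea behind the temporal metrics is to credit a system for relations that are implied by the annotation, even when they are not annotated explicitly. The sketch below illustrates this on BEFORE links only, in a simplified form: it skips the graph reduction step of the original metric and uses the raw link sets instead. It is not tieval's implementation, and all names in it are hypothetical.

```python
def closure(pairs):
    """Transitive closure of a set of (source, target) BEFORE pairs."""
    result = set(pairs)
    while True:
        new = {(a, d) for a, b in result for c, d in result if b == c} - result
        if not new:
            return result
        result |= new


def temporal_scores(gold, pred):
    """Simplified temporal precision, recall and awareness (their F1)."""
    # A predicted link is correct if it follows from the closure of the annotation.
    precision = len(pred & closure(gold)) / len(pred) if pred else 0.0
    # An annotated link is recovered if it follows from the closure of the predictions.
    recall = len(gold & closure(pred)) / len(gold) if gold else 0.0
    total = precision + recall
    awareness = 2 * precision * recall / total if total else 0.0
    return precision, recall, awareness


gold = {("e1", "e2"), ("e2", "e3")}
pred = {("e1", "e2"), ("e1", "e3")}
precision, recall, awareness = temporal_scores(gold, pred)
print(precision, recall)  # 1.0 0.5
```

Under exact matching the predicted link `e1 BEFORE e3` would count as an error, because it is not annotated. Here it counts as correct, since it follows from the two annotated links.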


This concludes the overview of `tieval`. After this introduction one should be familiar with the basic functionality of the framework, which can support the development of temporal information extraction systems and their comprehensive evaluation on the several corpora the framework provides.