added evaluation material for KE4IR journal paper
Marco Rospocher committed Dec 22, 2016
1 parent 7db8f2a commit ca351e0
Showing 1 changed file with 48 additions and 6 deletions.
54 changes: 48 additions & 6 deletions src/site/markdown/ke4ir.md
This page provides additional details on the use of Knowledge Extraction techniques for Information Retrieval.

Here, we provide a brief overview of the approach, make all the code and data used in the evaluation available for download (to allow reproducibility of our results), and provide all the reports and additional material we produced as part of the evaluation.


<br/>
## Approach

The goal of Information Retrieval is to determine, for a given text query, the relevant documents in a text collection, ranking them according to their degree of relevance to the query.
In our approach, named __KE4IR__ (read: _kee-fer_), documents and queries are processed with Knowledge Extraction techniques to obtain semantic terms that complement the textual ones.

We adopt a retrieval model inspired by the [Vector Space Model (VSM)](https://en.wikipedia.org/wiki/Vector_space_model). We represent both documents and queries as term vectors whose elements are the weights of textual and semantic terms, computed from [TF / IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) values suitably extended for use with semantic information, plus a weight that assigns a different importance to each layer (in our experiments, we use a setup where textual and semantic information contribute equally). The similarity score between a document and a query is computed as the scalar product of their vectors, and is used to identify and rank relevant documents.
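To make the scoring concrete, here is a minimal, self-contained Java sketch of the layer-weighted scalar product described above. The class name, layer names, and term weights are hypothetical: in KE4IR, term weights come from the extended TF/IDF computation and terms from knowledge extraction.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch (not the actual KE4IR code): documents and queries are
 * term vectors over textual and semantic "layers"; the similarity is the
 * scalar product of the two vectors, with each term contribution multiplied
 * by the weight of its layer.
 */
public class VsmScoringSketch {

    /** A term is identified by its layer (e.g. "textual", "type") and its value. */
    static String term(String layer, String value) {
        return layer + ":" + value;
    }

    /** Scalar product of query and document vectors, weighted per layer. */
    static double score(Map<String, Double> queryVector,
                        Map<String, Double> docVector,
                        Map<String, Double> layerWeights) {
        double score = 0.0;
        for (Map.Entry<String, Double> e : queryVector.entrySet()) {
            Double docWeight = docVector.get(e.getKey());
            if (docWeight != null) {
                String layer = e.getKey().split(":", 2)[0];
                double w = layerWeights.getOrDefault(layer, 0.0); // w(l(t))
                score += w * e.getValue() * docWeight;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical term weights; in KE4IR these are the extended TF/IDF values.
        Map<String, Double> query = new HashMap<>();
        query.put(term("textual", "earthquake"), 0.7);
        query.put(term("type", "dbpedia:Earthquake"), 0.5);

        Map<String, Double> doc = new HashMap<>();
        doc.put(term("textual", "earthquake"), 0.4);
        doc.put(term("type", "dbpedia:Earthquake"), 0.6);

        // Textual and semantic layers weighted equally, mirroring the setup mentioned above.
        Map<String, Double> layerWeights = new HashMap<>();
        layerWeights.put("textual", 0.5);
        layerWeights.put("type", 0.5);

        System.out.println("similarity = " + score(query, doc, layerWeights));
    }
}
```

In the full system this score is computed for every candidate document and used to produce the ranking.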


<br/>
## Implementation

We built an evaluation infrastructure that implements the KE4IR approach and allows applying it to arbitrary documents and queries, measuring retrieval performance against gold relevance judgments.
For each document or query, textual terms are extracted starting from the tokenization of its text, and semantic terms are derived from its knowledge graph.
Document terms are indexed in a Lucene inverted index.
At search time, query terms are OR-ed into a Lucene query that locates the documents containing at least one of them. Matched documents are then scored and ranked externally to Lucene (for ease of testing) according to the KE4IR retrieval model (a minimal sketch of this search step is given at the end of this section).
The resulting document ranking is compared with the gold relevance judgments to compute a comprehensive set of evaluation metrics, which are averaged across queries.
The evaluation infrastructure emits several CSV files:

* aggregates.csv: aggregated results over all queries of the dataset;
* query-NNN.csv: results on query NNN, for each feasible layer combination;
* ranking-NNN.csv: document rankings per layer on query NNN;
* settings-ABC.csv: results for a given layer combination ABC, on each query of the dataset.

We provide the [evaluation infrastructure](http://github.com/dkmfbk/ke4ir-evaluation) as a separate GitHub project. A [precompiled version](https://knowledgestore.fbk.eu/files/ke4ir/ke4ir.tar.gz) is available.
Please refer to the documentation on the GitHub project for specific instructions on how to load a dataset and run the code.
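As an illustration of the search-time step described above, the following sketch builds an OR-ed Lucene query over textual and semantic terms and retrieves the candidate documents. It is a sketch only, assuming the Lucene 5.x API (BooleanQuery.Builder); the index path and the field names ("textual", "type") are hypothetical and do not necessarily match the actual index layout.

```java
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.FSDirectory;

public class CandidateRetrievalSketch {

    public static void main(String[] args) throws Exception {
        // Open the inverted index holding textual and semantic document terms
        // ("lucene_index" is a placeholder path).
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("lucene_index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // OR the query terms together: a document matches if it contains at least one of them.
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        builder.add(new TermQuery(new Term("textual", "earthquake")), BooleanClause.Occur.SHOULD);
        builder.add(new TermQuery(new Term("type", "dbpedia:Earthquake")), BooleanClause.Occur.SHOULD);

        // Lucene only locates candidates here; the final KE4IR scores and ranking
        // are computed outside Lucene from the term vectors.
        for (ScoreDoc hit : searcher.search(builder.build(), 100).scoreDocs) {
            System.out.println("candidate doc id: " + hit.doc);
        }
        reader.close();
    }
}
```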


<br/>
## Evaluation 1: [WES Dataset](http://s16a.org/node/14)

KE4IR has been validated on the ad-hoc IR task, which consists of performing a set of queries over a document collection for which relevance judgments are available, comparing for each query the document ranking produced by KE4IR with the gold ranking.
We adopted the document collection from [http://s16a.org/node/14](http://s16a.org/node/14), described in the paper available on that page.
Below we provide all the input, output and intermediate data involved in the evaluation:
* Term extraction results
* textual and semantic terms extracted with KE4IR (TSV format) - [docs-extracted-terms.tsv.gz](https://knowledgestore.fbk.eu/files/ke4ir/docs-extracted-terms.tsv.gz) (3 MB), [queries-extracted-terms.tsv.gz](https://knowledgestore.fbk.eu/files/ke4ir/queries-extracted-terms.tsv.gz) (12 KB)
* Lucene index built using the textual and semantic terms of documents - [lucene_index.zip](https://knowledgestore.fbk.eu/files/ke4ir/lucene_index.zip) (6 MB)
<br/>
<br/>
* Evaluation results
* zip package containing the CSV files produced by the evaluation infrastructure - [results_raw.zip](https://knowledgestore.fbk.eu/files/ke4ir/results_raw.zip) (180 KB)
<br/>
* spreadsheet with performance aggregated over queries, using different layer combinations (data related to Table 3) - [results_aggregates.ods](https://knowledgestore.fbk.eu/files/ke4ir/results_aggregates.ods) (60 KB)
* spreadsheet with the performance (and top 10 results with their scores) of each query, for each significant layer combination (data related to Table 4) - [results_queries.ods](https://knowledgestore.fbk.eu/files/ke4ir/results_queries.ods) (292 KB)
* spreadsheet with the rankings returned by each query considering one semantic layer at a time, with scores obtained before applying the layer weight w(l(t)); data not used in the paper but possibly useful for further comparing the layers' performance - [results_rankings.ods](https://knowledgestore.fbk.eu/files/ke4ir/results_rankings.ods) (212 KB)
Requirements:
* Java 8

Note: the data folder contains the relevance judgments, the NLP annotations of documents and queries, and the knowledge graphs of documents and queries already enriched with background knowledge, so as to avoid shipping the (very large) background knowledge index. Document and query terms, the Lucene index, and the CSV report files are not included, as they are generated by running the command above (the spreadsheets were built manually from the CSV reports, while the CSV file with the semantic weight analysis was produced by calling the command multiple times through a small script).

<br/>
## Evaluation 2: [Fernandez et al. Dataset](http://technologies.kmi.open.ac.uk/poweraqua/trec-evaluation.html)

For this evaluation, we used the semantic search dataset proposed in:

* **Semantically enhanced Information Retrieval: an ontology-based approach**<br/>
By Miriam Fernandez, Ivan Cantador, Vanesa Lopez, David Vallet, Pablo Castells, Enrico Motta.<br/>
Web Semantics: Science, Services and Agents on the World Wide Web, vol. 9, no. 4, Jan. 2012. ISSN 1570-8268.<br/>
[\[pdf\]](http://www.websemanticsjournal.org/index.php/ps/article/view/242) (more details also available on the dataset [webpage](http://technologies.kmi.open.ac.uk/poweraqua/trec-evaluation.html))

Evaluation Results:<br/>

* zip package containing the CSV files produced by the evaluation infrastructure: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/jws.zip) (56 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/jws_all.zip) (147 KB)<br />

<br/>
## Evaluation 3: [TREC 6-7-8-9-2001 Datasets](http://trec.nist.gov/)

For this evaluation, we used the datasets adopted in the TREC 6, 7, 8, 9, and 2001 evaluation campaigns. More details can be found on the [TREC website](http://trec.nist.gov/).

Evaluation Results:<br/>

* zip package containing the CSV files produced by the evaluation infrastructure (using the TREC topic title as the query):
* TREC 6 Title: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec6t.zip) (131 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec6t_all.zip) (314 KB)
* TREC 7 Title: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec7t.zip) (126 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec7t_all.zip) (303 KB)
* TREC 8 Title: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec8t.zip) (134 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec8t_all.zip) (319 KB)
* TREC 9 Title: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec9t.zip) (136 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec9t_all.zip) (335 KB)
* TREC 2001 Title: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec2001t.zip) (139 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec2001t_all.zip) (331 KB)
<br/><br/>
* zip package containing the CSV files produced by the evaluation infrastructure (using the TREC topic title and description as the query):
* TREC 6 Desc: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec6d.zip) (149 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec6d_all.zip) (351 KB)
* TREC 7 Desc: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec7d.zip) (139 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec7d_all.zip) (328 KB)
* TREC 8 Desc: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec8d.zip) (148 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec8d_all.zip) (355 KB)
* TREC 9 Desc: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec9d.zip) (162 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec9d_all.zip) (392 KB)
* TREC 2001 Desc: [KE4IR vs textual](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec2001d.zip) (149 KB) - [All layer combinations](https://knowledgestore.fbk.eu/files/ke4ir/kbs/trec2001d_all.zip) (358 KB)
