angelobo edited this page Jan 13, 2016 · 2 revisions

Task 2

… of the Semantic Publishing Challenge 2015.

Motivation

Much of the information about papers published on CEUR-WS.org is hidden within PDFs. Our goal is to extract some of this data and make it available as LOD.

The extracted information should provide a deeper understanding of the context in which the papers were written. Scientific papers are not isolated units: several factors contribute, directly or indirectly, to the origin and development of a paper. Examples include the research institutions the authors are affiliated with, the agencies that funded the research, and the venue where the paper was presented. The network of related papers (for instance, those that cite or are cited by a given paper, or those that address similar issues) is a further dimension to take into account when assessing the credibility and relevance of a paper.

The queries that participants are required to answer approximate some of these indicators; they are listed below.

The task requires techniques for extracting data from PDF files, complemented by techniques for Named-Entity Recognition and Natural Language Processing.

Data Source

The input dataset consists of a set of PDF papers, taken from some of the workshops analysed in Task 1. The papers use different formats (ACM, LNCS) and different conventions for bibliographic references, headers, affiliations and acknowledgements.

Datasets can be downloaded here:

Training Dataset TD2

The PDF papers are available on CEUR-WS.org. The individual volumes are listed first, followed by a list of URLs for convenient one-time download.

Vol-1302

Vol-1248

Vol-1215

Vol-1207

Vol-1188

Vol-1155

Vol-1123

Vol-813

Vol-721

Vol-571

Vol-523

Vol-315

List of URLs for one-time download:

http://ceur-ws.org/Vol-1302/invited.pdf
http://ceur-ws.org/Vol-1302/paper1.pdf
http://ceur-ws.org/Vol-1302/paper2.pdf
http://ceur-ws.org/Vol-1302/paper3.pdf
http://ceur-ws.org/Vol-1302/paper4.pdf
http://ceur-ws.org/Vol-1302/paper5.pdf
http://ceur-ws.org/Vol-1302/paper6.pdf
http://ceur-ws.org/Vol-1302/paper7.pdf
http://ceur-ws.org/Vol-1302/paper8.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper1.pdf 
http://ceur-ws.org/Vol-1248/WoMO14-Paper3.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper4.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper5.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper7.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper8.pdf
http://ceur-ws.org/Vol-1248/WoMO14-Paper9.pdf
http://ceur-ws.org/Vol-1215/paper-01.pdf 
http://ceur-ws.org/Vol-1215/paper-02.pdf
http://ceur-ws.org/Vol-1215/paper-03.pdf
http://ceur-ws.org/Vol-1215/paper-04.pdf
http://ceur-ws.org/Vol-1215/paper-05.pdf
http://ceur-ws.org/Vol-1215/paper-06.pdf
http://ceur-ws.org/Vol-1207/paper_1.pdf 
http://ceur-ws.org/Vol-1207/paper_2.pdf
http://ceur-ws.org/Vol-1207/paper_3.pdf
http://ceur-ws.org/Vol-1207/paper_4.pdf
http://ceur-ws.org/Vol-1207/paper_5.pdf
http://ceur-ws.org/Vol-1207/paper_6.pdf
http://ceur-ws.org/Vol-1207/paper_7.pdf
http://ceur-ws.org/Vol-1207/paper_8.pdf
http://ceur-ws.org/Vol-1207/paper_9.pdf
http://ceur-ws.org/Vol-1207/paper_10.pdf
http://ceur-ws.org/Vol-1207/paper_11.pdf
http://ceur-ws.org/Vol-1188/paper_1.pdf 
http://ceur-ws.org/Vol-1188/paper_2.pdf 
http://ceur-ws.org/Vol-1188/paper_3.pdf 
http://ceur-ws.org/Vol-1188/paper_4.pdf 
http://ceur-ws.org/Vol-1188/paper_5.pdf 
http://ceur-ws.org/Vol-1188/paper_6.pdf 
http://ceur-ws.org/Vol-1188/paper_7.pdf 
http://ceur-ws.org/Vol-1188/paper_9.pdf 
http://ceur-ws.org/Vol-1188/paper_11.pdf 
http://ceur-ws.org/Vol-1188/paper_12.pdf 
http://ceur-ws.org/Vol-1188/paper_13.pdf 
http://ceur-ws.org/Vol-1155/paper-01.pdf
http://ceur-ws.org/Vol-1155/paper-02.pdf
http://ceur-ws.org/Vol-1155/paper-03.pdf
http://ceur-ws.org/Vol-1155/paper-04.pdf
http://ceur-ws.org/Vol-1155/paper-05.pdf
http://ceur-ws.org/Vol-1155/paper-06.pdf
http://ceur-ws.org/Vol-1155/paper-07.pdf
http://ceur-ws.org/Vol-1123/paper1.pdf
http://ceur-ws.org/Vol-1123/paper2.pdf
http://ceur-ws.org/Vol-1123/paper3.pdf
http://ceur-ws.org/Vol-1123/paper4.pdf
http://ceur-ws.org/Vol-1123/paper5.pdf
http://ceur-ws.org/Vol-1123/paper6.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper01.pdf 
http://ceur-ws.org/Vol-813/ldow2011-paper02.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper03.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper04.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper05.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper06.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper07.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper08.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper09.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper10.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper11.pdf
http://ceur-ws.org/Vol-813/ldow2011-paper12.pdf
http://ceur-ws.org/Vol-721/paper-01.pdf
http://ceur-ws.org/Vol-721/paper-02.pdf
http://ceur-ws.org/Vol-721/paper-03.pdf
http://ceur-ws.org/Vol-721/paper-04.pdf
http://ceur-ws.org/Vol-721/paper-05.pdf
http://ceur-ws.org/Vol-721/paper-06.pdf
http://ceur-ws.org/Vol-721/paper-07.pdf
http://ceur-ws.org/Vol-571/paper1.pdf
http://ceur-ws.org/Vol-571/paper2.pdf
http://ceur-ws.org/Vol-571/paper3.pdf
http://ceur-ws.org/Vol-571/paper4.pdf
http://ceur-ws.org/Vol-571/paper5.pdf
http://ceur-ws.org/Vol-571/paper6.pdf
http://ceur-ws.org/Vol-571/paper7.pdf
http://ceur-ws.org/Vol-571/paper8.pdf
http://ceur-ws.org/Vol-523/Battistelli.pdf
http://ceur-ws.org/Vol-523/deWaard.pdf
http://ceur-ws.org/Vol-523/Groza.pdf
http://ceur-ws.org/Vol-523/Mons.pdf
http://ceur-ws.org/Vol-523/Newman.pdf
http://ceur-ws.org/Vol-523/Novacek.pdf
http://ceur-ws.org/Vol-523/Noy.pdf
http://ceur-ws.org/Vol-523/Passant.pdf
http://ceur-ws.org/Vol-523/Renear.pdf 
http://ceur-ws.org/Vol-315/paper1.pdf
http://ceur-ws.org/Vol-315/paper2.pdf
http://ceur-ws.org/Vol-315/paper3.pdf
http://ceur-ws.org/Vol-315/paper4.pdf
http://ceur-ws.org/Vol-315/paper5.pdf
http://ceur-ws.org/Vol-315/paper6.pdf
http://ceur-ws.org/Vol-315/paper7.pdf
http://ceur-ws.org/Vol-315/paper8.pdf
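
The list above can be fetched in one pass with a short script. The following is a minimal sketch, not part of the challenge material: the function names `local_path` and `download_all` and the output directory `td2` are assumptions made for illustration. It uses only the Python standard library.

```python
import os
import urllib.request
from urllib.parse import urlparse


def local_path(url, root="td2"):
    """Map a CEUR-WS URL to <root>/<volume>/<filename>."""
    # Expected layout: http://ceur-ws.org/<volume>/<filename>
    parts = urlparse(url.strip()).path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError(f"unexpected URL layout: {url!r}")
    volume, filename = parts
    return os.path.join(root, volume, filename)


def download_all(urls, root="td2"):
    """Download each URL once, skipping files that already exist locally."""
    for url in urls:
        url = url.strip()  # some lines in the list carry trailing spaces
        if not url:
            continue
        target = local_path(url, root)
        if os.path.exists(target):
            continue
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

Assuming the URLs are saved one per line in a file (say `urls.txt`, a hypothetical name), `download_all(open("urls.txt"))` would fetch them. The volume is kept as a sub-directory because file names such as `paper1.pdf` recur across volumes and would otherwise overwrite each other.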

Evaluation Dataset ED2

The dataset ED2 is a superset of TD2. It includes all papers in TD2 plus the papers listed below.

The PDF papers are available on CEUR-WS.org. The individual volumes are listed first, followed by a list of URLs for convenient one-time download.

Vol-1301

Vol-1184

Vol-1116

Vol-1044

Vol-958

Vol-903

Vol-856

Vol-778

Vol-665

One-time download:

http://ceur-ws.org/Vol-1301/ontocomodise2014_1.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_2.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_3.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_4.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_5.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_6.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_7.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_8.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_9.pdf
http://ceur-ws.org/Vol-1301/ontocomodise2014_10.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_02.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_03.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_04.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_05.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_06.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_07.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_08.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_09.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_10.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_11.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_12.pdf
http://ceur-ws.org/Vol-1184/ldow2014_paper_13.pdf
http://ceur-ws.org/Vol-1116/paper1.pdf
http://ceur-ws.org/Vol-1116/paper2.pdf
http://ceur-ws.org/Vol-1116/paper3.pdf
http://ceur-ws.org/Vol-1116/paper4.pdf
http://ceur-ws.org/Vol-1116/paper5.pdf
http://ceur-ws.org/Vol-1116/paper6.pdf
http://ceur-ws.org/Vol-1116/paper7.pdf
http://ceur-ws.org/Vol-1116/paper8.pdf
http://ceur-ws.org/Vol-1044/paper-01.pdf
http://ceur-ws.org/Vol-1044/paper-02.pdf
http://ceur-ws.org/Vol-1044/paper-03.pdf
http://ceur-ws.org/Vol-1044/paper-04.pdf
http://ceur-ws.org/Vol-1044/paper-05.pdf
http://ceur-ws.org/Vol-1044/paper-06.pdf
http://ceur-ws.org/Vol-1044/paper-07.pdf
http://ceur-ws.org/Vol-958/paper1.pdf
http://ceur-ws.org/Vol-958/paper2.pdf
http://ceur-ws.org/Vol-958/paper3.pdf
http://ceur-ws.org/Vol-958/paper4.pdf
http://ceur-ws.org/Vol-958/paper5.pdf
http://ceur-ws.org/Vol-958/paper6.pdf
http://ceur-ws.org/Vol-958/paper7.pdf
http://ceur-ws.org/Vol-958/paper8.pdf
http://ceur-ws.org/Vol-958/paper9.pdf
http://ceur-ws.org/Vol-903/paper-01.pdf
http://ceur-ws.org/Vol-903/paper-02.pdf
http://ceur-ws.org/Vol-903/paper-03.pdf
http://ceur-ws.org/Vol-903/paper-04.pdf
http://ceur-ws.org/Vol-903/paper-05.pdf
http://ceur-ws.org/Vol-903/paper-06.pdf
http://ceur-ws.org/Vol-903/paper-07.pdf
http://ceur-ws.org/Vol-903/paper-08.pdf
http://ceur-ws.org/Vol-856/paper_3.pdf
http://ceur-ws.org/Vol-856/paper_4.pdf
http://ceur-ws.org/Vol-856/paper_5.pdf
http://ceur-ws.org/Vol-856/paper_6.pdf
http://ceur-ws.org/Vol-856/paper_7.pdf
http://ceur-ws.org/Vol-856/paper_8.pdf
http://ceur-ws.org/Vol-778/paper1.pdf
http://ceur-ws.org/Vol-778/paper2.pdf
http://ceur-ws.org/Vol-778/paper3.pdf
http://ceur-ws.org/Vol-778/paper4.pdf
http://ceur-ws.org/Vol-778/paper5.pdf
http://ceur-ws.org/Vol-778/paper6.pdf
http://ceur-ws.org/Vol-778/paper7.pdf
http://ceur-ws.org/Vol-778/paper8.pdf
http://ceur-ws.org/Vol-778/pospaper1.pdf
http://ceur-ws.org/Vol-778/pospaper2.pdf
http://ceur-ws.org/Vol-778/pospaper3.pdf
http://ceur-ws.org/Vol-665/LiuEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/NikolovEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/NortonEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/MillardEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/UmbrichEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/TroncyEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/CorrendoEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/IseleEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/BizerEtAl_COLD2010.pdf
http://ceur-ws.org/Vol-665/MulwadEtAl_COLD2010.pdf

Queries

Participants are required to produce a dataset for answering the following queries.

  • Q2.1 (Affiliations in a paper): Identify the affiliations of the authors of the paper X.
  • Q2.2 (Papers from a country): Identify the papers presented at the workshop X and written by researchers affiliated with an organization located in the country Y.
  • Q2.3 (Cited works): Identify all works cited by the paper X.
  • Q2.4 (Recent cited works): Identify all works cited by the paper X and published after the year Y.
  • Q2.5 (Cited journal papers): Identify all journal papers cited by the paper X.
  • Q2.6 (Research grants): Identify the grant(s) that supported the research presented in the paper X (or part of it).
  • Q2.7 (Funding agencies): Identify the funding agencies that funded the research presented in the paper X (or part of it).
  • Q2.8 (EU projects): Identify the EU project(s) that supported the research presented in the paper X (or part of it).
  • Q2.9 (Related ontologies): Identify the ontologies mentioned in the abstract of the paper X.
  • Q2.10 (New ontologies): Identify the ontologies introduced in the paper X (according to the abstract).

These queries have to be translated into SPARQL according to the challenge's general rules and have to produce output that conforms to the detailed rules.
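
To give a rough idea of what such a translation might look like, the sketch below builds a SPARQL query string for Q2.4. It is only an illustration under stated assumptions: the `ex:` namespace and the predicates `ex:cites` and `ex:publicationYear` are hypothetical placeholders, the publication year is assumed to be stored as an integer literal, and the paper is assumed to be identified by an IRI. The actual vocabulary and output shape are determined by the challenge's rules, which are not reproduced here.

```python
def recent_cited_works_query(paper_iri, year):
    """Build a SPARQL query for Q2.4: works cited by a paper and published after `year`."""
    if not isinstance(year, int):
        raise TypeError("year must be an integer")
    # Reject characters that may not appear inside <...> in SPARQL.
    if any(ch in paper_iri for ch in '<>"{}|^`\\ '):
        raise ValueError("paper_iri contains characters not allowed in an IRI")
    # ex:cites and ex:publicationYear are hypothetical predicates.
    return f"""PREFIX ex: <http://example.org/vocab#>
SELECT DISTINCT ?work ?year
WHERE {{
  <{paper_iri}> ex:cites ?work .
  ?work ex:publicationYear ?year .
  FILTER(?year > {year})
}}
ORDER BY ?work"""
```

Dropping the `FILTER` line and the `ex:publicationYear` pattern would give a query of the same shape for Q2.3. The paper X and the year Y are passed as parameters, so one template can serve every paper in the dataset.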