Commit

Update java to v2.6 (19)
In v2.6, new PageMetadata fields are exposed:
- WikiData Qid: the Wikidata identifier, which is independent of language/site and time (empty in previous CBOR versions; a singleton list since v2.6)
- SiteId: refers to the wiki site, such as "enwiki". Note that Qid and SiteId together refer to one page in a Wikipedia in a time-independent manner, whereas PageNames are subject to change.
- PageTags: tags exposed via page templates, such as "Vital article" or "Good article"
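
To illustrate how the three fields might be used together, here is a minimal Java sketch. The `Meta` class, its field names, and the helpers `stableKey` and `hasTag` are hypothetical stand-ins written for this example; they are not the trec-car-tools API, and "Q42" is only an illustrative Qid. The sketch assumes the Qid list is empty for data older than v2.6 and a singleton list from v2.6 on, as described above.

```java
import java.util.List;
import java.util.Optional;

public class PageMetadataSketch {
    // Hypothetical stand-in for the new metadata fields; NOT the library's class.
    static final class Meta {
        final List<String> wikiDataQid; // empty before v2.6, singleton list since v2.6
        final String siteId;            // e.g. "enwiki"; may be absent in older data
        final List<String> pageTags;    // e.g. "Vital article", "Good article"

        Meta(List<String> wikiDataQid, String siteId, List<String> pageTags) {
            this.wikiDataQid = wikiDataQid;
            this.siteId = siteId;
            this.pageTags = pageTags;
        }
    }

    // Combine SiteId and Qid into a key that does not change when a page is renamed.
    // Returns empty when either part is missing (for example, pre-v2.6 data).
    static Optional<String> stableKey(Meta m) {
        if (m.siteId == null || m.wikiDataQid.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(m.siteId + ":" + m.wikiDataQid.get(0));
    }

    // True when the page carries the given template-derived tag.
    static boolean hasTag(Meta m, String tag) {
        return m.pageTags.contains(tag);
    }

    public static void main(String[] args) {
        Meta v26 = new Meta(List.of("Q42"), "enwiki", List.of("Good article"));
        Meta older = new Meta(List.of(), null, List.of());

        System.out.println(stableKey(v26).orElse("(no stable key)"));
        System.out.println(stableKey(older).orElse("(no stable key)"));
        System.out.println(hasTag(v26, "Good article"));
    }
}
```

Returning an `Optional` rather than a bare string keeps the pre-v2.6 case (no Qid) explicit for callers that need to fall back to the page name.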
laura-dietz committed Feb 1, 2022
1 parent d801a3c commit f47c77e
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions README.mkd
@@ -4,7 +4,7 @@

Development tools for participants of the TREC Complex Answer Retrieval track.

-Data release support for v1.5 and v2.0.
+Data release support for v1.5 and v2.0. and v2.6

Note that in order to allow to compile your project for two trec-car format versions, the maven artifact Id was changed to `treccar-tools-v2` with version 2.0, and the package path changed to `treccar_v2`

@@ -47,7 +47,7 @@ add the trec-car-tools dependency:
<dependency>
<groupId>com.github.TREMA-UNH</groupId>
<artifactId>trec-car-tools-java</artifactId>
-<version>17</version>
+<version>19</version>
</dependency>
~~~~

@@ -84,6 +84,8 @@ Articles, outlines, paragraphs are all described with CBOR following this gramma
CategoryIds -> [$pageId]
InlinkIds -> [$pageId]
InlinkAnchors -> [$anchorText]
+WikiDataQid -> [$qid]
+PageTags -> [$pageTags]
PageSkeleton -> Section | Para | Image | ListItem
Section -> $sectionHeading [PageSkeleton]