Robert Bossy edited this page Jul 27, 2017 · 1 revision

# org.bibliome.alvisnlp.modules.EnrichedDocumentWriter

Synopsis

Writes the corpus in the infamous Alvis Enriched Document Format, suitable for indexing with Zebra-Alvis.

Description

Writes the corpus in the infamous Alvis Enriched Document Format, suitable for indexing with Zebra-Alvis.

Parameters

Optional

Type: String

Metadata key for the document id.

Optional

Type: Mapping

Metadata key translation.

Optional

Type: String

Name of the layer containing named entity annotations.

Optional

Type: OutputDirectory

Path to the directory where files are written.

Optional

Type: String

Prefix of the name of generated files.

Optional

Type: String

Name of the feature containing the term canonical form.

Optional

Type: String

Name of the layer containing the term annotations.

Optional

Type: String

Name of the layer containing token annotations.

Optional

Type: String

Name of the feature in token annotations containing the token type.

Optional

Type: String

Prefix for the document URL.

Optional

Type: String

Name of the feature containing semantic features of named entities and terms.

Default value: 100

Type: Integer

Number of documents in each document block.

Default value: 0

Type: Integer

Start point for document block numbering.

Default value: true

Type: Expression

Only process documents that satisfy this filter.

Default value: lemma

Type: String

Name of the feature in word annotations containing the lemma.

Default value: lemma

Type: String

Name of the feature in named entity annotations containing the canonical form.

Default value: neType

Type: String

Name of the feature in named entity annotations containing the named entity type.

Default value: .sem

Type: String

Suffix of the name of generated files.

Default value: pos

Type: String

Name of the feature in word annotations containing the POS tag.

Default value: true

Type: Expression

Process only sections that satisfy this filter.

Default value: sentences

Type: String

Name of the layer containing sentence annotations.

Default value: id

Type: String

Document feature to use as the URL suffix.

Default value: words

Type: String

Name of the layer containing word annotations.
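
Example

The block-related parameters above (number of documents per block, default 100; start point for block numbering, default 0; file name prefix; file name suffix, default .sem) suggest how documents are spread over output files. The following Python sketch only illustrates that arithmetic. It is not the module's code: the function name `block_files` and its argument names are hypothetical, and the file name pattern (prefix, then block number, then suffix) is an assumption that the real module may not follow.

```python
from typing import Iterator, List, Sequence, Tuple


def block_files(
    doc_ids: Sequence[str],
    block_size: int = 100,   # documented default: 100 documents per block
    block_start: int = 0,    # documented default: block numbering starts at 0
    prefix: str = "",        # file name prefix (no documented default)
    suffix: str = ".sem",    # documented default suffix
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (file_name, document ids) pairs, one pair per document block.

    ASSUMPTION: the file name is prefix + block number + suffix.
    The actual module may format the name differently.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for offset in range(0, len(doc_ids), block_size):
        # Blocks are numbered consecutively from block_start.
        block_number = block_start + offset // block_size
        yield f"{prefix}{block_number}{suffix}", list(doc_ids[offset:offset + block_size])


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(250)]
    for name, block in block_files(docs, prefix="corpus-"):
        print(name, len(block))
```

Under these assumptions, 250 documents with the default block size yield three blocks of 100, 100 and 50 documents, numbered 0 to 2.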
