F1uctus/webanno2spacy

WebAnno ⟶ spaCy

A tool that converts WebAnno TSV 3.2 files to spaCy's Doc format. Relations are stored in the Doc's rel extension attribute, in the same way as in the spaCy tutorial video:

{
    (0, 6): { "label1": 1.0, "label2": 0.0, ... },
    (6, 0): { "label1": 0.0, "label2": 0.0, ... },
    ...
}
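
As a rough illustration of how this mapping might be consumed, the sketch below walks a dictionary of that shape and keeps every label whose score reaches a threshold. It assumes each key is a pair of entity start token indices, as in spaCy's relation extraction tutorial. The helper name extract_relations, the 0.5 threshold, and the sample data are hypothetical and not part of this tool.

```python
from typing import Dict, List, Tuple

# Shape of the relation mapping: (first, second) -> {label: score}
RelMap = Dict[Tuple[int, int], Dict[str, float]]


def extract_relations(
    rel: RelMap, threshold: float = 0.5
) -> List[Tuple[int, int, str, float]]:
    """Return (first, second, label, score) for labels scoring >= threshold.

    Hypothetical helper: the threshold is an illustrative choice.
    """
    found = []
    for (first, second), scores in sorted(rel.items()):
        for label, score in sorted(scores.items()):
            if score >= threshold:
                found.append((first, second, label, score))
    return found


# Sample data shaped like the mapping above. With spaCy, the real mapping
# would be read from the converted Doc via `doc._.rel`.
example: RelMap = {
    (0, 6): {"label1": 1.0, "label2": 0.0},
    (6, 0): {"label1": 0.0, "label2": 0.0},
}

relations = extract_relations(example)
for first, second, label, score in relations:
    print(f"{first} -> {second}: {label} ({score:.1f})")
```

Keeping both (0, 6) and (6, 0) as separate keys means the direction of a relation is preserved, so a label can hold for one ordering of the pair and not the other.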

Usage

$ poetry install
$ webanno2spacy --help
Usage: webanno2spacy [OPTIONS] SPACY_MODEL INPUT_TEXT_FILE INPUT_WEBANNO_FILE

Arguments:
  SPACY_MODEL         [required]
  INPUT_TEXT_FILE     [required]
  INPUT_WEBANNO_FILE  [required]

Options:
  --output-file PATH
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.
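
If the converter needs to be called from a Python script, one possible approach is to assemble the command line shown above and hand it to subprocess. This is only a sketch: the helper build_command, the model name, and all file names (including the .spacy extension for the output) are placeholders chosen for illustration.

```python
import subprocess
from typing import List, Optional


def build_command(
    spacy_model: str,
    input_text_file: str,
    input_webanno_file: str,
    output_file: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list in the order shown by `webanno2spacy --help`."""
    cmd = ["webanno2spacy"]
    if output_file is not None:
        # --output-file is the only conversion option listed in the help text.
        cmd += ["--output-file", output_file]
    cmd += [spacy_model, input_text_file, input_webanno_file]
    return cmd


# Placeholder arguments: substitute your own model and files.
command = build_command(
    "en_core_web_sm", "document.txt", "document.tsv", "document.spacy"
)
print(" ".join(command))

# Uncomment to run the conversion (requires webanno2spacy on PATH):
# subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.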

TODO

  • Implement batch conversion of multiple files to DocBin

See also