A non-interactive command-line interface for the Pepper converter suite. Requires a Unix-style environment; tested with Ubuntu 20.04.
Requirements: Java >=1.8, make, bash, Unix-style OS, write access to /tmp
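The requirements above can be checked before running `make`. The following is a minimal sketch, not part of this repository: the helper names `require_tools` and `require_writable` are hypothetical, and the Java version (>=1.8) is not verified, only the presence of a `java` executable.

```shell
# Hypothetical helpers (not shipped with this wrapper).

# Return non-zero if any of the named executables is not on PATH.
require_tools() {
  rt_rc=0
  for rt_tool in "$@"; do
    if ! command -v "$rt_tool" >/dev/null 2>&1; then
      echo "missing tool: $rt_tool" >&2
      rt_rc=1
    fi
  done
  return "$rt_rc"
}

# Return non-zero unless the argument is an existing, writable directory.
require_writable() {
  [ -d "$1" ] && [ -w "$1" ]
}

# Possible use before building:
#   require_tools java make bash && require_writable /tmp && make
```

Both functions only report problems; they do not install or change anything.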
Run
$> make
This retrieves a stable 2021 Pepper installation and sample data.
Example call:
$> bash -e ./convert.sh PAULAImporter TextExporter pcc2_PAULA.zip samples
This converts the PAULA archive pcc2_PAULA.zip to plain text and writes the result to the target directory samples.
synopsis:
convert.sh IMPORTER EXPORTER SRC[1..n] TGT
IMPORTER importer (source format)
EXPORTER exporter (target format)
SRC source file, directory or zip archive
TGT target directory
with the following converters:
| Format/Tool | IMPORTER | EXPORTER |
|---|---|---|
| Aldt | AldtImporter | |
| ANNIS | | ANNISExporter |
| CoNLL (CoNLL-2012?) | CoNLLCorefImporter | CoNLLCorefExporter |
| CoNLL (CoNLL-X?, CoNLL-U?) | CoNLLImporter | CoNLLExporter |
| Cora | CoraXMLImporter | |
| GraphViz | | DOTExporter |
| EXMARaLDA | EXMARaLDAImporter | EXMARaLDAExporter |
| ELAN | ElanImporter | |
| GATE | GateImporter | |
| GeTa | GeTaImporter | |
| XML | GenericXMLImporter | |
| GrAF | GrAFImporter | |
| GraphAnno | | GraphAnnoExporter |
| MMAX2 | MMAX2Importer | MMAX2Exporter |
| PAULA | PAULAImporter | PAULAExporter |
| Penn Treebank | PTBImporter | PTBExporter |
| RSD | RSDImporter | |
| RSTTool | RSTImporter | |
| HTML (metadata) | | SaltInfoExporter |
| SaltXML | SaltXMLImporter | SaltXMLExporter |
| CSV | SpreadsheetImporter | |
| TCF | TCFImporter | TCFExporter |
| TEI (incomplete) | TEIImporter | |
| txt | TextImporter | TextExporter |
| TIGER-XML | Tiger2Importer | |
| Toolbox | ToolboxImporter | |
| TreeTagger | TreetaggerImporter | TreetaggerExporter |
| UAM | UAMImporter | |
| TSV | WebannoTSVImporter | |
| ad hoc | WolofImporter | |
Note that we extend the original Pepper with support for zip archives
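Since the synopsis expects at least four arguments and every converter listed above is named `...Importer` or `...Exporter`, swapped or missing arguments can be caught before Pepper is started. The following is a minimal sketch under those two assumptions; `check_convert_args` is a hypothetical helper, not something convert.sh provides, and it does not check whether the named converter actually exists.

```shell
# Hypothetical pre-flight check for a convert.sh call (not part of convert.sh).
# Expects the same arguments as convert.sh: IMPORTER EXPORTER SRC[1..n] TGT
check_convert_args() {
  # At least one SRC and one TGT are needed after IMPORTER and EXPORTER.
  if [ "$#" -lt 4 ]; then
    echo "usage: convert.sh IMPORTER EXPORTER SRC[1..n] TGT" >&2
    return 2
  fi
  # First argument must look like an importer name.
  case "$1" in
    *Importer) ;;
    *) echo "not an importer name: $1" >&2; return 2 ;;
  esac
  # Second argument must look like an exporter name.
  case "$2" in
    *Exporter) ;;
    *) echo "not an exporter name: $2" >&2; return 2 ;;
  esac
  return 0
}

# Possible use:
#   check_convert_args PAULAImporter TextExporter pcc2_PAULA.zip samples \
#     && bash -e ./convert.sh PAULAImporter TextExporter pcc2_PAULA.zip samples
```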
For portability beyond Ubuntu 20.04, a Dockerfile is provided. Note, however, that it relies on bind mounts, so paths on the host (external) and paths inside the container (internal) need to be distinguished.
Building:
$> docker build -f Dockerfile -t acoli/pepper-wrapper .
Converting ./pcc2_PAULA.zip to ./tcf/:
$> docker run \
--mount type=bind,source=`realpath .`,target=/source \
--mount type=bind,source=`realpath ./tcf`,target=/target \
acoli/pepper-wrapper \
pepper-wrapper/convert.sh PAULAImporter TCFExporter source/pcc2_PAULA.zip target/
Outside the container, this is equivalent to
$> bash -e ./convert.sh PAULAImporter TCFExporter ./pcc2_PAULA.zip ./tcf
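The mapping between external and internal paths can be made explicit with a small helper that prints the `docker run` command for a given source file and target directory. This is a sketch, not part of the repository: `docker_convert_cmd` is a hypothetical name, `cd ... && pwd -P` stands in for `realpath`, a single source file is assumed, and the printed command is not quoted, so paths containing whitespace are not handled.

```shell
# Hypothetical helper: print (do not run) the docker call for one source file.
# Arguments: IMPORTER EXPORTER SRC TGT, with SRC and TGT given as host paths.
docker_convert_cmd() {
  dcc_importer=$1
  dcc_exporter=$2
  dcc_src=$3
  dcc_tgt=$4
  # Resolve the host directories that are bind-mounted to /source and /target.
  dcc_src_dir=$(cd "$(dirname "$dcc_src")" && pwd -P) || return 1
  dcc_tgt_dir=$(cd "$dcc_tgt" && pwd -P) || return 1
  # Inside the container, the file is addressed below source/, the output below target/.
  printf 'docker run --mount type=bind,source=%s,target=/source --mount type=bind,source=%s,target=/target acoli/pepper-wrapper pepper-wrapper/convert.sh %s %s source/%s target/\n' \
    "$dcc_src_dir" "$dcc_tgt_dir" "$dcc_importer" "$dcc_exporter" "$(basename "$dcc_src")"
}

# Possible use (both the source file's directory and ./tcf must already exist):
#   docker_convert_cmd PAULAImporter TCFExporter ./pcc2_PAULA.zip ./tcf
```

Printing the command instead of executing it makes it easy to inspect which host directory ends up mounted at which container path before anything is run.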