SDK Convert Corpus
André Santos edited this page Nov 20, 2016 · 2 revisions
It is also straightforward to convert a corpus from one format to another programmatically.
The following source code snippet shows how to convert a corpus by creating a processing pipeline and using the data provided in the "example" folder.
// Set files
String documentsDirectory = "example/annotate/a1/in/";
String outputDirectory = "example/annotate/out/";
// Set input and output formats
InputFormat inputFormat = InputFormat.A1;
List<OutputFormat> outputFormats = new ArrayList<>();
outputFormats.add(OutputFormat.CONLL);
// Create context
ContextConfiguration config = new ContextConfiguration.Builder()
        .withInputFormat(inputFormat)
        .withOutputFormats(outputFormats)
        .withParserTool(ParserTool.GDEP)
        .withParserLanguage(ParserLanguage.ENGLISH)
        .withParserLevel(ParserLevel.CHUNKING)
        .build();
Context context = new Context(config, null, null);
// Create batch executor
boolean compressed = false;
int numThreads = 1;
BatchExecutor batch = new FileBatchExecutor(documentsDirectory, outputDirectory, compressed, numThreads, false, true);
// Run batch processing
batch.run(FileProcessor.class, context);
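The snippet above reads from documentsDirectory and writes to outputDirectory, but it does not say what happens when the input folder is missing or whether the output folder is created automatically. As a minimal sketch under those open questions, the helper below checks the input folder, creates the output folder if needed, and lists the files found there after a run. CorpusFolders, prepare and listOutputs are hypothetical names used for illustration; they rely only on the Java standard library and are not part of the SDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of the SDK): prepares and inspects
// the folders used by the batch conversion.
final class CorpusFolders {
    private CorpusFolders() {}

    // Fails early if the input folder is missing, and creates the
    // output folder (including parents) if it does not exist yet.
    static Path prepare(String documentsDirectory, String outputDirectory) throws IOException {
        Path in = Paths.get(documentsDirectory);
        if (!Files.isDirectory(in)) {
            throw new NotDirectoryException(in.toString());
        }
        return Files.createDirectories(Paths.get(outputDirectory));
    }

    // Returns the names of the regular files in the output folder,
    // sorted alphabetically.
    static List<String> listOutputs(Path outputDirectory) throws IOException {
        try (Stream<Path> files = Files.list(outputDirectory)) {
            return files.filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

One possible use is to call CorpusFolders.prepare(documentsDirectory, outputDirectory) before creating the batch executor, and CorpusFolders.listOutputs(...) on the returned path after batch.run(...) to confirm that converted files were written.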