Updated source code markup
`code` is for inline
Severin Simmler authored Jul 2, 2016
1 parent 159a93a commit 40da403
Showing 1 changed file with 44 additions and 33 deletions.
77 changes: 44 additions & 33 deletions doc/user-guide.adoc
@@ -37,15 +37,17 @@ The pipeline requires *Java 1.8* or higher. You can download Java from the http:

After downloading and unzipping the files, run the following command on your command line:


-`java -Xmx4g -jar ddw-{version}.jar -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -input file.txt -output folder
+----

If you do not specify the `-language` parameter, the pipeline assumes English input. The current version of the DARIAH-DKPro-Wrapper includes support for the following languages: German (de), English (en), Spanish (es), and French (fr). If you want to work with Bulgarian (bg), Danish (da), Estonian (et), Finnish (fi), Galician (gl), Latin (la), Mongolian (mn), Polish (pl), Russian (ru), Slovak (sk) or Swahili (sw) input, you have to install link:#TreeTagger[TreeTagger] first. To run the pipeline for German, execute the following command:


-`java -Xmx4g -jar ddw-{version}.jar -language de -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -language de -input file.txt -output folder
+----

== Run the full pipeline
By default, the pipeline runs in a light mode: the memory- and time-intensive components for parsing and semantic role labeling are *disabled*.
@@ -76,35 +78,39 @@ You can process either single files or also all files inside a directory. Patter

The DARIAH-DKPro-Wrapper implements two base readers, one text reader and one XML-file reader. You can specify the reader that should be used with the `-reader` parameter. By default, the text reader is used. To use the XML reader, run the pipeline in the following way:


-`java -Xmx4g -jar ddw-{version}.jar -reader xml -input file.xml -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -reader xml -input file.xml -output folder
+----

The XML reader skips XML tags and processes only the text inside them. The XPath to each tag is preserved and stored in the *SectionId* column of the output format.

=== Reading Directories

You can also pass a directory instead of a file as the *-input* argument. If you run the pipeline in the following way:


-`java -Xmx4g -jar ddw-{version}.jar -input folder/With/Files/ -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -input folder/With/Files/ -output folder
+----

the pipeline will process all files with a _.txt_ extension when the text reader is used. With the XML reader, it will process all files with a _.xml_ extension.

You can also specify patterns to read in only certain files or files with a certain extension. For example, to read in only _.xmi_ files with the XML reader, start the pipeline in the following way:


-`java -Xmx4g -jar ddw-{version}.jar -reader xml -input "folder/With/Files/*.xmi" -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -reader xml -input "folder/With/Files/*.xmi" -output folder
+----

*Note:* If you use a pattern (i.e. a path containing an `*`), you must enclose it in quotes to prevent shell globbing.
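
To see why the quotes matter, the following sketch compares how many arguments a program receives with and without quotes. It is plain shell and does not call the pipeline; the throwaway directory, the sample file names and the helper `count_args` are assumptions made purely for illustration.

```shell
# Create a throwaway directory with two sample files (names are illustrative).
demo_dir=$(mktemp -d)
touch "$demo_dir/a.xmi" "$demo_dir/b.xmi"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell expands the pattern before the program starts,
# so the program sees one argument per matching file.
unquoted=$(count_args $demo_dir/*.xmi)

# Quoted: the pattern is passed through unchanged as a single argument,
# which leaves the matching to the program itself.
quoted=$(count_args "$demo_dir/*.xmi")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
rm -r "$demo_dir"
```

With the quotes in place, the `-input` value arrives as one pattern string, which is the form the commands in this section pass to the pipeline.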

To read all files in all subfolders, you can use a pattern like this:


-`java -Xmx4g -jar ddw-{version}.jar -input "folder/With/Subfolders/\**/*.txt" -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -input "folder/With/Subfolders/**/*.txt" -output folder
+----

This will read in all _.txt_ files in all subfolders. Note that the subfolder path will not be maintained in the output folder.
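
Because the subfolder path is dropped, it can be useful to check beforehand whether two input files in different subfolders share the same name. The guide does not say how such a case is handled, so the following is only a pre-flight sketch in plain shell; the sample directory layout and the helper name `duplicate_basenames` are assumptions made for illustration.

```shell
# Hypothetical helper: list .txt file names that occur more than once below a directory.
duplicate_basenames() {
  find "$1" -type f -name '*.txt' | sed 's|.*/||' | sort | uniq -d
}

# Throwaway sample layout (illustrative names only).
input_dir=$(mktemp -d)
mkdir -p "$input_dir/part1" "$input_dir/part2"
touch "$input_dir/part1/chapter.txt" "$input_dir/part2/chapter.txt" "$input_dir/part2/notes.txt"

# Collect every file name that appears in more than one subfolder.
dupes=$(duplicate_basenames "$input_dir")
echo "file names used in more than one subfolder: $dupes"
rm -r "$input_dir"
```

If the helper prints nothing for your real input folder, every file name is unique and the missing subfolder path cannot cause ambiguity.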

@@ -116,30 +122,34 @@ The pipeline can be configurated via properties-files that are stored in the `co

To write your own config file, create a new `.properties` file. You can then run the pipeline with this file by setting the `-config` argument:


-`java -Xmx4g -jar ddw-{version}.jar -config /path/to/my/config/myconfigfile.properties -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -config /path/to/my/config/myconfigfile.properties -input file.txt -output folder
+----

If you store your `myconfigfile.properties` in the `configs` folder, you can run the pipeline via:


-`java -Xmx4g -jar ddw-{version}.jar -config myconfigfile.properties -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -config myconfigfile.properties -input file.txt -output folder
+----

You can split your configuration into several files and pass them all to the pipeline by separating the paths with commas or semicolons. The pipeline reads all passed config files and derives the final configuration from them. The config file passed last has the highest priority, i.e. it can override the values from all previous config files:


-`java -Xmx4g -jar ddw-{version}.jar -config myfile1.properties,myconfig2.properties,myfile3.properties -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -config myfile1.properties,myconfig2.properties,myfile3.properties -input file.txt -output folder
+----

*Note:* The system always uses the default.properties and default_[langcode].properties as basic configuration files. All further config files are added on top of these files.
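
The "last file wins" rule can be illustrated with a small shell sketch. This is not the wrapper's own parser: it handles only simple `key = value` lines, and the helper `merge_properties`, the two sample files and the key `useParser` are assumptions made for illustration.

```shell
# Hypothetical helper: merge simple key=value files; later files override earlier ones.
merge_properties() {
  awk '
    /^[ \t]*#/ { next }              # skip comment lines
    index($0, "=") == 0 { next }     # skip blank lines and lines without "="
    {
      split_at = index($0, "=")
      key = substr($0, 1, split_at - 1)
      value = substr($0, split_at + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      if (!(key in merged)) order[++count] = key   # remember first-seen order
      merged[key] = value                          # a later file overwrites the value
    }
    END { for (i = 1; i <= count; i++) print order[i] " = " merged[order[i]] }
  ' "$@"
}

# Two throwaway config files (illustrative content only).
cfg_dir=$(mktemp -d)
printf 'useLemmatizer = true\nuseParser = false\n' > "$cfg_dir/myfile1.properties"
printf 'useLemmatizer = false\n' > "$cfg_dir/myfile2.properties"

# The file passed last takes precedence for keys that occur in both files.
merged=$(merge_properties "$cfg_dir/myfile1.properties" "$cfg_dir/myfile2.properties")
echo "$merged"
rm -r "$cfg_dir"
```

Here `useLemmatizer` ends up as `false` because the second file is passed last, while `useParser` keeps the value from the first file.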


If you want to use the _full_ version and also change the POS tagger, you can run the pipeline in the following way:


-`java -Xmx4g -jar ddw-{version}.jar -config myFullVersion.properties,myPOSTagger.properties -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -config myFullVersion.properties,myPOSTagger.properties -input file.txt -output folder
+----

In `myPOSTagger.properties`, you only add the configuration for the alternative POS tagger.

@@ -185,9 +195,10 @@ useLemmatizer = false

Change the paths for the parameters _executablePath_ and _modelLocation_ to the correct paths on your machine. You can then use TreeTagger in your pipeline using the `-config` argument:


-`java -Xmx4g -jar ddw-{version}.jar -config treetagger-example.properties -language la -input file.txt -output folder`

+[subs="attributes"]
+----
+java -Xmx4g -jar ddw-{version}.jar -config treetagger-example.properties -language la -input file.txt -output folder
+----

Check the pipeline output to confirm that TreeTagger is used. The output should look something like this:
----