Compiling
=========
You will need Maven. Then, in the top-level folder of the checkout, run:

    mvn clean install

or, to skip the tests:

    mvn clean install -Pquick

Eclipse
=======
We highly recommend using the m2eclipse plugin and importing all projects
directly into Eclipse as Maven projects (with the "Maven nature").

Compiling dictionaries
======================
A pipeline for compiling plain-text dictionary data into finite-state automata
(FSA):

1) Prepare a tab-delimited input file with three columns: the inflected form,
   the base form, and the annotation. For example:

    niebabińscy	niebabiński	adj:pl:nom.voc:m1.p1:pos
    niebabińska	niebabiński	adj:sg:nom.voc:f:pos
    niebabiński	niebabiński	adj:sg:acc:m3:pos
2) Preprocess the tab-delimited input to conflate shared affixes, which
   improves the compression of the FSA built in the next step:

    java -jar morfologik-tools-*-standalone.jar tab2morph --coder INFIX --input ~/tmp/input.txt > intermediate.txt
3) Compile the FSA from the intermediate file:

    java -jar morfologik-tools-*-standalone.jar fsa_build --input intermediate.txt --progress > output.fsa
4) Add an output.info file that specifies the character encoding and any
   licensing information. See the Polish dictionaries for examples.

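To illustrate why conflating shared affixes (step 2) helps compression, here
is a minimal Python sketch run on the example rows from step 1. It assumes a
simplified, suffix-only scheme: each base form is stored as one letter giving
the number of trailing characters to strip from the inflected form ('A' = 0,
'B' = 1, ...), followed by the characters to append. This is a hypothetical
illustration, not the actual output of tab2morph; the INFIX coder used above is
presumably more general, as its name suggests. The "+" separator and the helper
names (parse_line, encode_suffix, decode_suffix, to_intermediate) are made up
for this example. The point is that many different words share the same short
code (for instance, any -ska/-ski pair encodes as "Bi"), so the automaton can
share those paths.

```python
"""Hypothetical sketch of a suffix-only affix encoding for dictionary lines.

Not morfologik's real intermediate format; it only shows how encoding the
base form relative to the inflected form yields short, repetitive codes."""

from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split one tab-delimited input line into (inflected, base, annotation)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError(f"expected 3 tab-separated columns, got {len(fields)}: {line!r}")
    return fields[0], fields[1], fields[2]


def encode_suffix(inflected: str, base: str) -> str:
    """Encode base relative to inflected: one letter for how many trailing
    characters of the inflected form to drop ('A' = 0), then the suffix to append."""
    common = 0
    limit = min(len(inflected), len(base))
    # Length of the longest common prefix of the two forms.
    while common < limit and inflected[common] == base[common]:
        common += 1
    drop = len(inflected) - common
    if drop > 25:
        raise ValueError("this sketch only supports dropping up to 25 characters")
    return chr(ord("A") + drop) + base[common:]


def decode_suffix(inflected: str, code: str) -> str:
    """Inverse of encode_suffix: rebuild the base form from the inflected form."""
    drop = ord(code[0]) - ord("A")
    stem = inflected[: len(inflected) - drop]
    return stem + code[1:]


def to_intermediate(lines: List[str], sep: str = "+") -> List[str]:
    """Turn input lines into 'inflected<sep>code<sep>annotation' strings
    (a hypothetical layout chosen only for this illustration)."""
    out = []
    for line in lines:
        inflected, base, tag = parse_line(line)
        out.append(sep.join((inflected, encode_suffix(inflected, base), tag)))
    return out


if __name__ == "__main__":
    rows = [
        "niebabińscy\tniebabiński\tadj:pl:nom.voc:m1.p1:pos",
        "niebabińska\tniebabiński\tadj:sg:nom.voc:f:pos",
        "niebabiński\tniebabiński\tadj:sg:acc:m3:pos",
    ]
    for entry in to_intermediate(rows):
        print(entry)
    # niebabińscy+Cki+adj:pl:nom.voc:m1.p1:pos
    # niebabińska+Bi+adj:sg:nom.voc:f:pos
    # niebabiński+A+adj:sg:acc:m3:pos
```

Storing "Bi" instead of the full lemma means thousands of unrelated adjectives
end in the same few bytes, which is exactly what an FSA can merge.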
More info:
http://languagetool.wikidot.com/developing-a-tagger-dictionary

Sonatype release push
======================
    # Snapshot deploy; also builds the single-JAR version, javadocs, etc.
    mvn clean deploy -Prelease

    # ZIP with full release artifacts
    mvn clean deploy -Prelease,distribution

    # ZIP with full release artifacts for sourceforge.net (local install, no deploy)
    mvn clean install -Prelease,distribution

    # For final releases, also GPG-sign the artifacts.
    mvn clean deploy -Prelease,distribution,sign