Commit b596f17

AngledLuffa authored and Stanford NLP committed
Add a ctb9 model with a smaller feature set
1 parent 5cf2230 commit b596f17

1 file changed: +4 -0 lines changed

scripts/chinese-segmenter/Makefile

Lines changed: 4 additions & 0 deletions
@@ -63,6 +63,10 @@ ctb7.train.chris6.ser.gz: dict-chris6.ser.gz
 ctb9.train.chris6.ser.gz: dict-chris6.ser.gz
 	time java -mx20g edu.stanford.nlp.ie.crf.CRFClassifier -prop ctb9-chris6.prop -serDictionary $+ -sighanCorporaDict /u/nlp/data/chinese-segmenter/gale2007/ctb6/ -trainFile $(CTB9_ALL) -serializeTo $@ > $@.log 2> $@.err
 
+# train on train CTB9 + extras, with all external lexicons, without training lexicon, use the threshold to make it smaller
+ctb9.train-small.chris6.ser.gz: dict-chris6.ser.gz
+	time java -mx20g edu.stanford.nlp.ie.crf.CRFClassifier -prop ctb9-chris6.prop -serDictionary $+ -sighanCorporaDict /u/nlp/data/chinese-segmenter/gale2007/ctb6/ -featureDiffThresh=0.015 -trainFile $(CTB9_ALL) -serializeTo $@ > $@.log 2> $@.err
+
 # train on all CTB7, with all external lexicons, without training lexicon
 bolt.chris6.ser.gz: dict-chris6.ser.gz
 	time java -mx15g edu.stanford.nlp.ie.crf.CRFClassifier -prop $(DIR)/ctb6-chris6.prop -serDictionary $+ -sighanCorporaDict /u/nlp/data/chinese-segmenter/gale2007/ctb6/ -trainFile $(BOLT) -serializeTo $@ > $@.log 2> $@.err
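
The added rule hard-codes the threshold at 0.015. For experimenting with other values, a minimal sketch (not part of this commit) could lift it into a variable. `FEATURE_DIFF_THRESH` is a hypothetical name introduced here for illustration; the rest of the recipe is copied from the added target, and GNU make is assumed.

```makefile
# Hypothetical variable, not in the commit; defaults to the value the commit uses.
FEATURE_DIFF_THRESH ?= 0.015

# Same target and recipe as the added rule, with the threshold taken from the variable.
ctb9.train-small.chris6.ser.gz: dict-chris6.ser.gz
	time java -mx20g edu.stanford.nlp.ie.crf.CRFClassifier -prop ctb9-chris6.prop -serDictionary $+ -sighanCorporaDict /u/nlp/data/chinese-segmenter/gale2007/ctb6/ -featureDiffThresh=$(FEATURE_DIFF_THRESH) -trainFile $(CTB9_ALL) -serializeTo $@ > $@.log 2> $@.err
```

Under that assumption, a different threshold could be tried with `make FEATURE_DIFF_THRESH=0.02 ctb9.train-small.chris6.ser.gz` (0.02 is an arbitrary illustrative value). Because make treats an existing `.ser.gz` as up to date, an earlier model would need to be removed or renamed first.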