Configuration for NER and POS #18

Closed
loretoparisi opened this issue Jul 17, 2017 · 41 comments

Comments

@loretoparisi

loretoparisi commented Jul 17, 2017

First, thank you for your great work on Italian support for CoreNLP.
I'm trying the NER and POS taggers. My simplest CoreNLP configuration is the following:

{ 
  'annotators': 'tokenize,ssplit,pos,lemma,ner',
  'ner.model': '/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz',
  'pos.model': '/italian-fast.tagger',
  'depparse.model': '/parser-model-1.txt.gz',
  'customAnnotatorClass.ita_toksent': 'eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator',
  'customAnnotatorClass.ita_lemma': 'eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator',
  'customAnnotatorClass.ita_morpho': 'eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator',
  'ssplit.newlineIsSentenceBreak' : 'always',
  'ner.useSUTime': 0,
}

Entities such as DATE, LOC, and PER are being recognized, and so are part-of-speech tags. I have seen that there are other annotators, such as Geoloc, HeidelTime, a customized Lemma annotator, etc.

For the given configuration, this is my pipeline output:

13:01:21.587 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
13:01:21.590 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
13:01:21.590 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
13:01:21.591 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
13:01:21.605 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
13:01:21.610 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
13:01:21.634 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - warning: no language set, no open-class tags specified, and no closed-class tags specified; assuming ALL tags are open class tags
13:01:21.975 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from /root/italian-fast.tagger ... done [0.3 sec].
13:01:21.976 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
13:01:21.976 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
13:01:25.247 [main] INFO  e.s.n.ie.AbstractSequenceClassifier - Loading classifier from /root/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz ... done [3.2 sec].

So the custom annotators such as ita_lemma and ita_toksent are registered, but I'm not sure they are actually loaded in place of the default ones.

Thank you.
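One way to answer the question from the log alone: "Registering annotator" lines only reflect the customAnnotatorClass.* properties, while "Adding annotator" lines list what the pipeline actually instantiates. The following is a minimal sketch that extracts the added annotators from log lines; the class and method names are hypothetical helpers, not part of CoreNLP or Tint.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the annotators a CoreNLP log reports as added.
final class AnnotatorLogCheck {
    private static final Pattern ADDED = Pattern.compile("Adding annotator (\\S+)");

    /** Returns the names found in "Adding annotator ..." lines, in log order. */
    static List<String> addedAnnotators(List<String> logLines) {
        List<String> names = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = ADDED.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator",
            "INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize",
            "INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit");
        // ita_toksent is registered but never added, so it does not appear here.
        System.out.println(addedAnnotators(log));
    }
}
```

Applied to the log above, only tokenize, ssplit, pos, lemma and ner would be reported, which suggests the default annotators are the ones running.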

@ziorufus
Member

In this configuration, you are using the English annotators. The correct list of annotators is:

annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner
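For readers configuring the pipeline from Java, the annotator list above could be set up roughly as follows. This is a sketch only: the class and method names are hypothetical, and the model paths are simply the ones from the original post.

```java
import java.util.Properties;

// Hypothetical helper that builds the properties for the Italian pipeline.
final class TintItalianProps {
    static Properties italianNerProps() {
        Properties props = new Properties();
        // Italian annotators replace the default tokenize, ssplit and lemma.
        props.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
        props.setProperty("customAnnotatorClass.ita_toksent",
            "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator");
        props.setProperty("customAnnotatorClass.ita_morpho",
            "eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator");
        props.setProperty("customAnnotatorClass.ita_lemma",
            "eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator");
        // Model paths as given in the original post; adjust to your setup.
        props.setProperty("pos.model", "/italian-fast.tagger");
        props.setProperty("ner.model", "/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(italianNerProps().getProperty("annotators"));
    }
}
```

With CoreNLP and the Tint jars (and all their dependencies) on the classpath, the resulting object would be passed to `new StanfordCoreNLP(props)`.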

It is weird that you get the DATEs, as they are not included in the Italian model (unless you are using an English text and the English model). If you want to use HeidelTime, you should add:

customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator
annotators=..., timex
timex.treeTaggerHome=path/to/tagger-scripts
timex.considerDate=true
timex.considerDuration=true
timex.considerSet=true
timex.considerTime=true
timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml
timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml
timex.uimaVarDate=Date
timex.uimaVarDuration=Duration
timex.uimaVarLanguage=Language
timex.uimaVarSet=Set
timex.uimaVarTime=Time
timex.uimaVarTypeToProcess=Type
timex.uimaVarTemponym = Temponym
timex.considerTemponym = false
timex.chineseTokenizerPath=

where path/to/tagger-scripts is the path where you installed TreeTagger. You must leave the last one, timex.chineseTokenizerPath, even if it's blank, otherwise HeidelTime crashes.
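Since a blank timex.chineseTokenizerPath must still be present, it may help to check for the key before building the pipeline. The sketch below uses only java.util.Properties; the class and method names are hypothetical and this is not how HeidelTime itself validates its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical pre-flight check for the HeidelTime settings.
final class TimexConfigCheck {
    static final String KEY = "timex.chineseTokenizerPath";

    /** Parses .properties text; a line "key=" yields the key with an empty value. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True when the key exists, even if its value is blank. */
    static boolean hasTokenizerPathKey(Properties props) {
        return props.getProperty(KEY) != null;
    }

    public static void main(String[] args) {
        Properties withKey = parse("timex.considerDate=true\ntimex.chineseTokenizerPath=\n");
        Properties withoutKey = parse("timex.considerDate=true\n");
        System.out.println(hasTokenizerPathKey(withKey));
        System.out.println(hasTokenizerPathKey(withoutKey));
    }
}
```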

To run the geocoder, you should have a local installation of Nominatim, or you can use the public one. The configuration is:

customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator
annotators=..., geoloc
geoloc.geocoder_url=/path/to/nominatim

where /path/to/nominatim is the URL of Nominatim. By default, the GeolocAnnotator uses the public Nominatim instance, which is slow and limited. If you use a local version, you can add a geoloc.use_local_geocoder boolean setting to skip the timeout. You can also set a geoloc.timeout option (in milliseconds), which works only when geoloc.use_local_geocoder is enabled (otherwise it is 1 second).
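The interaction between the two geoloc settings can be summarized in a few lines. This is only a reading of the description above, not the GeolocAnnotator source: the class and method are hypothetical, and returning "no timeout" when a local geocoder is used without geoloc.timeout is an assumption.

```java
import java.util.OptionalInt;
import java.util.Properties;

// Hypothetical model of the timeout rules described in this comment.
final class GeolocTimeout {
    /** Returns the timeout in milliseconds, or empty when no timeout applies. */
    static OptionalInt effectiveTimeoutMillis(Properties props) {
        boolean local = Boolean.parseBoolean(
            props.getProperty("geoloc.use_local_geocoder", "false"));
        if (!local) {
            // Public Nominatim: fixed 1 second, geoloc.timeout is ignored.
            return OptionalInt.of(1000);
        }
        String timeout = props.getProperty("geoloc.timeout");
        if (timeout == null) {
            // Assumption: a local geocoder without geoloc.timeout has no timeout.
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(timeout.trim()));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("geoloc.use_local_geocoder", "true");
        props.setProperty("geoloc.timeout", "5000");
        System.out.println(effectiveTimeoutMillis(props));
    }
}
```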

If you launch Tint using the included runner, all the customAnnotatorClasses are already set up correctly.

@loretoparisi
Author

loretoparisi commented Jul 17, 2017

@ziorufus Thank you. Is TreeTagger necessary for the Italian tagger? I'm using CoreNLP with the default models plus per-language models such as fr, zh, de, es, and ar (the custom models in the jar file for each language).

If I use "ita_toksent,ita_lemma,ita_morpho,ssplit,pos,ner" as annotators, the JVM complains that the class PropertiesUtils is missing:

15:02:51.158 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator timex with class eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
15:02:51.161 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
15:02:51.162 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ita_toksent
{ Error: Error creating class
edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator(java.lang.String,java.util.Properties) with args [ita_toksent, {tokenize.language=de, ssplit.newlineIsSentenceBreak=always, lang=it, annotators=ita_toksent,ita_lemma,ita_morpho,ssplit,pos,ner, depparse.model=/root/parser-model-1.txt.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, pos.model=/root/italian-fast.tagger, parse.model=edu/stanford/nlp/models/srparser/germanSR.ser.gz, ner.useSUTime=0, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, DATAROOT=/Users/loretoparisi/Dropbox (musixmatch)/Development/data/data/stanford, ner.model=/root/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz}]
	at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
	at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
	at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:143)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$registerCustomAnnotators$66(StanfordCoreNLP.java:556)
	at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:118)
	at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
	at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:146)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:447)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:150)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:146)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:133)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
	... 14 more
Caused by: java.lang.NoClassDefFoundError: eu/fbk/utils/core/PropertiesUtils
	at eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator.<init>(ItalianTokenizerAnnotator.java:29)
	... 19 more
Caused by: java.lang.ClassNotFoundException: eu.fbk.utils.core.PropertiesUtils
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 20 more

thank you

@ziorufus
Member

TreeTagger is only needed for the HeidelTime annotator. Tint uses the CoreNLP original tagger. Regarding your error, are you using Maven to include the tint-tokenizer?

@loretoparisi
Author

@ziorufus Ok got it! ...Nope, I'm including the jar manually in my project. Is that a sub-project?
Thank you.

@ziorufus
Member

Which JAR are you including? You need to include the jar-with-dependencies.

@loretoparisi
Author

loretoparisi commented Jul 17, 2017

So far I have included only:

-rw-r--r--  1 loretoparisi  staff     3534201 17 Lug 12:54 /root/tint-digimorph-0.1.jar
-rw-r--r--  1 loretoparisi  staff       10333 17 Lug 12:47 /root/tint-digimorph-annotator-0.1.jar
-rw-r--r--  1 loretoparisi  staff        8071 17 Lug 15:02 /root/tint-heideltime-annotator-0.1.jar
-rw-r--r--  1 loretoparisi  staff       24003 17 Lug 11:39 /root/tint-tokenizer-0.1.jar

Ah ok, I see that it is part of the DKM package. Funny thing: I cannot find eu.fbk.utils.core in eu.fbk.utils, I mean this one: https://mvnrepository.com/artifact/eu.fbk.dkm.utils/utils/1.2

@ziorufus
Member

You need to include all the dependencies (recursively). You can find them in the pom.xml file, but I suggest you use Maven; otherwise you need to add tens of dependencies by hand.
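When jars are added by hand, a quick way to see which classes are still missing is to probe the classpath before constructing the pipeline. The sketch below is a generic diagnostic with a hypothetical class name; the class names it probes are the ones that appear in the stack trace above.

```java
// Hypothetical diagnostic: reports whether given classes can be found.
final class ClasspathCheck {
    /** True when the class can be located without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator",
            "eu.fbk.utils.core.PropertiesUtils"
        };
        for (String name : needed) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

This only finds the first layer of missing classes; each newly added jar can bring further requirements of its own, which is why resolving the tree with Maven is less error-prone.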

@loretoparisi
Author

loretoparisi commented Jul 17, 2017

Thank you. I'm actually using Maven to build the project:

[INFO] Reactor Summary:
[INFO] 
[INFO] tint ............................................... SUCCESS [  1.581 s]
[INFO] tint-textpro ....................................... SUCCESS [  0.477 s]
[INFO] tint-eval .......................................... SUCCESS [  0.040 s]
[INFO] tint-resources ..................................... SUCCESS [  0.102 s]
[INFO] tint-digimorph ..................................... SUCCESS [  0.123 s]
[INFO] tint-digimorph-annotator ........................... SUCCESS [  0.028 s]
[INFO] tint-tokenizer ..................................... SUCCESS [  0.031 s]
[INFO] tint-tense ......................................... SUCCESS [  0.021 s]
[INFO] tint-readability ................................... SUCCESS [  0.041 s]
[INFO] tint-geoloc-annotator .............................. SUCCESS [  0.018 s]
[INFO] tint-heideltime-annotator .......................... SUCCESS [  0.399 s]
[INFO] tint-models ........................................ SUCCESS [  0.012 s]
[INFO] tint-runner ........................................ SUCCESS [  0.925 s]
[INFO] tint-kd-annotator .................................. SUCCESS [  0.016 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

so I get these target jars

[loretoparisi@:mbploreto tint]$ find ./ -name \*.jar
.//target/tint-0.1-tests.jar
.//tint-digimorph/target/tint-digimorph-0.1.jar
.//tint-digimorph-annotator/target/tint-digimorph-annotator-0.1.jar
.//tint-eval/target/tint-eval-0.1.jar
.//tint-geoloc-annotator/target/tint-geoloc-annotator-0.1.jar
.//tint-heideltime-annotator/target/tint-heideltime-annotator-0.1-tests.jar
.//tint-heideltime-annotator/target/tint-heideltime-annotator-0.1.jar
.//tint-kd-annotator/target/tint-kd-annotator-0.1.jar
.//tint-models/target/tint-models-0.1.jar
.//tint-readability/target/tint-readability-0.1.jar
.//tint-resources/target/tint-resources-0.1.jar
.//tint-runner/target/tint-runner-0.1-tests.jar
.//tint-runner/target/tint-runner-0.1.jar
.//tint-tense/target/tint-tense-0.1.jar
.//tint-textpro/target/tint-textpro-0.1.jar
.//tint-tokenizer/target/tint-tokenizer-0.1-tests.jar
.//tint-tokenizer/target/tint-tokenizer-0.1.jar

I prefer to take the generated jars one by one and put them on my classpath. The issue here is that I cannot find that util among the Maven-generated dependencies (mvn package / install).

@ziorufus
Member

Run mvn dependency:tree to print the list of dependencies recursively. As a suggestion, use the corenlp370 branch of Tint so that you have the latest version. In that case you'll have to fix some dependencies, so you should mvn install utils and fcw before compiling Tint.

Anyway, if you include Tint in an existing Java project, I suggest you use Maven for both and add it to the pom.xml file. If you need to run Tint from the shell, just run mvn package -Prelease and uncompress the ready-to-use tar.gz archive that you can find in the tint-runner/target folder.

@loretoparisi
Author

loretoparisi commented Jul 17, 2017

@ziorufus Yes, that is the best solution. I now realize that there are too many dependencies in the ~/.m2/repository/ folder to copy... Grazie!

@loretoparisi
Author

loretoparisi commented Jul 17, 2017

@ziorufus So I checked out corenlp370 and then ran mvn package -Prelease, but I get an error:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project tint-runner: Compilation failure: Compilation failure: 
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[6,39] package eu.fbk.utils.corenlp.outputters does not exist
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[147,44] package eu.fbk.utils.corenlp.outputters does not exist
[ERROR] /tint/tint-runner/src/main/java/eu/fbk/dh/tint/runner/TintPipeline.java:[150,13] cannot find symbol
[ERROR]   symbol:   variable TextProOutputter
[ERROR]   location: class eu.fbk.dh.tint.runner.TintPipeline
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tint-runner

The utils package compiles and builds, while for the fcw dependency I get

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) on project fcw-wikipedia: Compilation failure
[ERROR] /tint/fcw/fcw-wikipedia/src/main/java/eu/fbk/fcw/wikipedia/WikipediaCorefAnnotator.java:[143,41] cannot find symbol
[ERROR]   symbol:   class SimpleCorefAnnotation
[ERROR]   location: class eu.fbk.utils.corenlp.CustomAnnotations
[ERROR] 

NOTE: I can compile and build the version on the master branch without any issues.

@ziorufus
Member

You are right: before installing utils you need to switch to the develop branch.

@loretoparisi
Author

@ziorufus Ciao! I'm not sure about the steps to build tint, utils and fcw, with the latest support for CoreNLP 3.7.0. Could you please guide me through?

Thank you.

@loretoparisi
Author

loretoparisi commented Jul 19, 2017

@ziorufus Maybe a simpler solution would be to provide the release builds (i.e. packaged with all the dependencies) directly. Thank you.

@loretoparisi
Author

@ziorufus Ciao, a question about the above configuration for NER only.
Assuming that the configuration is annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, and that I'm not going to use HeidelTime, do I need only this jar as a dependency?

-rw-r--r--  1 loretoparisi  staff       24003 17 Lug 11:39 /root/tint-tokenizer-0.1.jar

If not, could you please point me to the related Maven dependencies? At the moment I have these jar files:

├── ahocorasick-0.3.0.jar
├── tint-digimorph-0.1.jar
├── tint-digimorph-annotator-0.1.jar
├── tint-heideltime-annotator-0.1.jar
├── tint-tokenizer-0.1.jar
└── utils-core-3.0.jar

and I would like to keep only the ones needed.

Thank you for your help!!!

@ziorufus
Member

@loretoparisi The problem is that each dependency has its own dependencies. If you use Maven, the dependency tree is built and managed automatically; if you want to include the jars, you need to resolve the tree and add everything.
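To illustrate what "resolving the tree" means, here is a toy sketch that computes the full set of jars from direct dependencies. The class is a hypothetical helper, and the graph is illustrative: the tint-tokenizer to utils-core edge matches the missing class seen earlier in this thread, while "some-transitive-lib" is an invented placeholder. Maven performs this kind of traversal (plus version mediation) for you.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: walks a dependency graph to collect everything needed.
final class DependencyClosure {
    /** Returns the root plus every artifact reachable through direct dependencies. */
    static Set<String> closure(String root, Map<String, List<String>> directDeps) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!seen.add(current)) {
                continue; // already visited, also guards against cycles
            }
            for (String dep : directDeps.getOrDefault(current, Collections.<String>emptyList())) {
                todo.push(dep);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("tint-tokenizer", Arrays.asList("utils-core"));
        deps.put("utils-core", Arrays.asList("some-transitive-lib")); // placeholder name
        System.out.println(closure("tint-tokenizer", deps));
    }
}
```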

@borice

borice commented Mar 13, 2018

Hello @ziorufus

Sorry to write in this thread, but I have a related problem getting Stanford CoreNLP 3.9.1 to work with the Italian models from Tint. I have the following properties configuration file:

annotators = ita_toksent, ner
tokenize.language = it
ssplit.newlineIsSentenceBreak = false
pos.model = models/italian-fast.tagger
depparse.model = models/parser-model-1.txt.gz
ner.model = models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz
ner.applyNumericClassifiers = false
ner.useSUTime = false
ner.applyFineGrained = false

customAnnotatorClass.ita_toksent = eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
customAnnotatorClass.ita_lemma = eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
customAnnotatorClass.ita_morpho = eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator

And the error I get is:

19:47:31.113  INFO Registering annotator ita_toksent with class eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
19:47:31.114  INFO Registering annotator ita_morpho with class eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator
19:47:31.114  INFO Registering annotator ita_lemma with class eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator
19:47:31.118  INFO Adding annotator ita_toksent
19:47:31.221  INFO Loaded 37 normalization rules
19:47:31.224  INFO Loaded 7 sentence splitting rules
19:47:31.225  INFO Loaded 6 token splitting rules
19:47:31.226  INFO Loaded 9 regular expressions
19:47:31.240  INFO Loaded 288 abbreviations
19:47:31.253  INFO Adding annotator ner
19:47:34.229  INFO Loading classifier from models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz ... done [2.9 sec].
Exception in thread "main" java.lang.IllegalArgumentException: annotator "ner" requires annotation "IsNewlineAnnotation". The usual requirements for this annotator are: tokenize,ssplit,pos,lemma
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:504)

Any ideas?

If I add the pos, ita_morpho and ita_lemma annotators, I get a different error:
Caused by: java.lang.ClassNotFoundException: kotlin.TypeCastException

I've added the Tint Maven dependency as instructed in the documentation:

<dependency>
    <groupId>eu.fbk.dh</groupId>
    <artifactId>tint-runner</artifactId>
    <version>0.2</version>
</dependency>

Thank you!

@ziorufus
Member

ziorufus commented Mar 13, 2018

It seems that CoreNLP 3.9.1 added a new mandatory annotation. It is not documented on the Stanford NLP website, so I had to write to the group. For now, I patched it, hoping that it is enough. Just pull the develop branch, recompile using mvn clean install, edit the version in your POM from 0.2 to 1.0-SNAPSHOT and try again.

@loretoparisi
Author

@ziorufus Yes, there is an important migration to do for the annotators and sub-annotators: stanfordnlp/CoreNLP#633 (comment)

@ziorufus
Member

I know that, but the problem is not with sub-annotators (which Tint does not use), but with a new annotation called IsNewlineAnnotation, which is required by the NER annotator and is not documented anywhere.
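The failure mode can be pictured with a simplified model of requirement checking: each annotator declares what it needs and what it produces, and the pipeline refuses to start when a need is not met by an earlier step. The sketch below is not CoreNLP code; the classes, the annotation names other than IsNewlineAnnotation, and the sets assigned to each step are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified, hypothetical model of annotator requirement checking.
final class RequirementCheck {
    static final class Step {
        final String name;
        final Set<String> requires;
        final Set<String> provides;

        Step(String name, Set<String> requires, Set<String> provides) {
            this.name = name;
            this.requires = requires;
            this.provides = provides;
        }
    }

    static Set<String> setOf(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }

    /** Returns a message for the first unmet requirement, or empty if all are met. */
    static Optional<String> firstProblem(List<Step> pipeline) {
        Set<String> available = new HashSet<>();
        for (Step step : pipeline) {
            for (String need : step.requires) {
                if (!available.contains(need)) {
                    return Optional.of("annotator \"" + step.name
                        + "\" requires annotation \"" + need + "\"");
                }
            }
            available.addAll(step.provides);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // A tokenizer that does not yet provide the newly required annotation.
        Step tokenizer = new Step("ita_toksent", setOf(), setOf("Tokens", "Sentences"));
        Step ner = new Step("ner", setOf("Tokens", "Sentences", "IsNewlineAnnotation"), setOf());
        System.out.println(firstProblem(Arrays.asList(tokenizer, ner)));
    }
}
```

In this model, the patch mentioned above corresponds to the custom tokenizer adding the new annotation to what it provides.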

@borice

borice commented Mar 13, 2018

Thank you @ziorufus! Using the change you made in the develop branch worked!

@algoscale1

algoscale1 commented Mar 27, 2018

Hello @ziorufus ,
I am getting an error while using Tint with CoreNLP 3.9.1:
Exception in thread "main" edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
	at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:364)
	at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:381)
	at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:141)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$67(StanfordCoreNLP.java:606)
	at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
	at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
	at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.construct(StanfordCoreNLP.java:495)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:201)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:194)
	at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:181)
	at
Caused by: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
19:09:48.221 [main] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator ita_toksent
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.construct(MetaClass.java:135)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:202)
	at edu.stanford.nlp.util.MetaClass$ClassFactory.<init>(MetaClass.java:69)
	at edu.stanford.nlp.util.MetaClass.createFactory(MetaClass.java:360)
	... 11 more

These are my configs:
properties.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
properties.setProperty("ner.model", "models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz");
properties.setProperty("pos.model", "models/italian-fast.tagger");
properties.setProperty("depparse.model", "models/parser-model-1.txt.gz");
properties.setProperty("customAnnotatorClass.ita_toksent", "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator");
properties.setProperty("customAnnotatorClass.ita_lemma", "eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator");
properties.setProperty("customAnnotatorClass.ita_morpho", "eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator");
properties.setProperty("ssplit.newlineIsSentenceBreak", "always");
properties.setProperty("ner.useSUTime", "false");

I am using the Maven dependency for Tint as mentioned in the docs. Any help would be greatly appreciated. Thanks!

@ziorufus
Member

Try to add this dependency to the pom.xml file.

        <dependency>
            <groupId>eu.fbk.dh</groupId>
            <artifactId>tint-tokenizer</artifactId>
            <version>1.0-SNAPSHOT</version>
            <scope>runtime</scope>
        </dependency>

Use the develop branch.

@dvakhil8

Hello @ziorufus,
I have cloned the source code, switched to the develop branch, and compiled using mvn clean install. I also changed my POM version from 0.2 to 1.0-SNAPSHOT, but I am still facing this issue:
Exception in thread "main" edu.stanford.nlp.util.MetaClass$ClassCreationException: java.lang.ClassNotFoundException: eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator

@andreaferretti

Hi @ziorufus, I am trying to use Tint for Italian NER as well, following the configuration you mention at the beginning of this thread. I have a couple of questions:

  • is there a reason why Tint uses HeidelTime standalone instead of using it as a library?
  • why is TreeTagger required? On the HeidelTime page I found no mention of it

@ziorufus
Member

Tint uses HeidelTime standalone because it's hard to integrate it in a flow that does not use UIMA. TreeTagger is required because it uses the correct POS tags. We are working on a custom version of HeidelTime that can be easily integrated with the Tint POS tags, but it's not ready yet.

@andreaferretti

@ziorufus Thank you! Unfortunately, this makes Tint a little hard to deploy, since it starts a lot of processes in the background for HeidelTime and brings in a dependency on Perl...

@nadezdaalexandrovna

Hi @ziorufus, sorry to disturb you. I am trying to integrate Tint into a Pepper module and am getting the error "resource italian.db not found":
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties) with args [ita_morpho, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, timex.considerDate=true, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, 
customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator}]
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
...
at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:103)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
... 16 more
Caused by: java.lang.IllegalArgumentException: resource italian.db not found.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.common.io.Resources.getResource(Resources.java:197)
at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:58)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)

I added the tint-digimorph jar to my pom.xml and it gets copied to the snapshot. I also copy it to the dependency folder used by Pepper.

Would you have any idea?
Thanks a lot!
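The trace shows the lookup failing inside Guava's Resources.getResource, which means italian.db was not visible on the classpath at that point. A minimal sketch for checking this from inside the host application follows; the class and method names are hypothetical, and which class loader Pepper uses for its modules is not something this thread establishes.

```java
// Hypothetical diagnostic: checks whether a named resource is on the classpath.
final class ResourceCheck {
    /** True when the resource is visible to the context (or system) class loader. */
    static boolean onClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        // Resource name taken from the error message above.
        System.out.println("italian.db visible: " + onClasspath("italian.db"));
    }
}
```

If this prints false in the environment where the annotator is created, the jar that contains the resource is not being seen by that class loader, regardless of whether it was copied to disk.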

@ziorufus
Member

ziorufus commented Feb 19, 2019 via email

@nadezdaalexandrovna

Thank you very much for your quick reply! It worked and now I have a new error: the beginning is the same, but the end is different:
Caused by: java.lang.NoClassDefFoundError: org/mapdb/volume/MappedFileVol
at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:67)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)

Maybe you have an idea about this one, too? Thank you!

@ziorufus
Member

ziorufus commented Feb 19, 2019 via email

@nadezdaalexandrovna

Thank you! I am still getting errors, but will continue trying on my own now. Thanks a lot for being so responsive!

@nadezdaalexandrovna

Good morning Alessio,
Sorry to disturb you. I am trying to use the development version. I have compiled tint-runner-1.0-SNAPSHOT.jar and tint-runner-1.0-SNAPSHOT-jar-with-dependencies.jar following the instructions on GitHub. The build succeeds, and the reactor summary is the following:
Reactor Summary:
[INFO]
[INFO] tint ............................................... SUCCESS [ 1.253 s]
[INFO] tint-eval .......................................... SUCCESS [ 1.975 s]
[INFO] tint-resources ..................................... SUCCESS [ 5.017 s]
[INFO] tint-digimorph ..................................... SUCCESS [ 2.199 s]
[INFO] tint-digimorph-annotator ........................... SUCCESS [ 0.380 s]
[INFO] tint-tokenizer ..................................... SUCCESS [ 0.278 s]
[INFO] tint-verb .......................................... SUCCESS [ 0.583 s]
[INFO] tint-readability ................................... SUCCESS [ 1.047 s]
[INFO] tint-derived ....................................... SUCCESS [ 0.153 s]
[INFO] tint-heideltime-annotator .......................... SUCCESS [ 0.343 s]
[INFO] tint-models ........................................ SUCCESS [ 6.418 s]
[INFO] tint-runner ........................................ SUCCESS [ 45.478 s]
[INFO] tint-inverse-digimorph ............................. SUCCESS [ 1.482 s]
[INFO] tint-simplifier .................................... SUCCESS [ 20.939 s]

Now I need to make this jar accessible to my project, so I need to install it into my ~/.m2 folder.
I tried to do it with the following command:
mvn --also-make-dependents install:install-file -Dfile=tint-runner/target/tint-runner-1.0-SNAPSHOT.jar -DgroupId=eu.fbk.dh -DartifactId=tint-runner -Dversion=1.0-SNAPSHOT -Dpackaging=jar
but the reactor summary was different:
Reactor Summary:
[INFO]
[INFO] tint ............................................... SUCCESS [ 0.290 s]
[INFO] tint-eval .......................................... SKIPPED
[INFO] tint-resources ..................................... SKIPPED
[INFO] tint-digimorph ..................................... SKIPPED
[INFO] tint-digimorph-annotator ........................... SKIPPED
[INFO] tint-tokenizer ..................................... SKIPPED
[INFO] tint-verb .......................................... SKIPPED
[INFO] tint-readability ................................... SKIPPED
[INFO] tint-derived ....................................... SKIPPED
[INFO] tint-heideltime-annotator .......................... SKIPPED
[INFO] tint-models ........................................ SKIPPED
[INFO] tint-runner ........................................ SKIPPED
[INFO] tint-inverse-digimorph ............................. SKIPPED
[INFO] tint-simplifier .................................... SKIPPED

How can I make all the modules get installed and not only the first one?
Thanks a lot!

@ziorufus
Member

ziorufus commented Feb 21, 2019 via email

@nadezdaalexandrovna

Thanks a lot, it worked. Now I have another resource not found problem:
resource feat-mappings.txt not found:
edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator(java.lang.String,java.util.Properties) with args [ita_lemma, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, timex.considerDate=true, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, 
customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, ita_morpho.model=/home/nadiushka/pepper/CoreNLPPepper/italian.db}]
...
Caused by: java.lang.IllegalArgumentException: resource feat-mappings.txt not found.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.common.io.Resources.getResource(Resources.java:197)
at eu.fbk.dh.tint.digimorph.annotator.GuessModel.<init>(GuessModel.java:235)
at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.<init>(GuessModelInstance.java:18)
at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.getInstance(GuessModelInstance.java:23)
at eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator.<init>(DigiLemmaAnnotator.java:87)

I saved it on my computer, but to what variable in the default-config.properties file should I assign its path?
Thank you!
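
The stack trace above passes through Guava's `Resources.getResource`, which looks a name up on the classpath rather than on the file system. A minimal sketch for checking whether a resource is visible on the classpath is given below. The class name `ClasspathResourceCheck` and the method `locate` are hypothetical and not part of Tint or CoreNLP, and the sketch says nothing about which property Tint reads.

```java
import java.net.URL;

/** Hypothetical helper: reports whether a named resource is visible on the classpath. */
final class ClasspathResourceCheck {

    /** Returns the URL of the resource, or null when the class loader cannot see it. */
    static URL locate(String resourceName) {
        // Prefer the context class loader, falling back to this class's own loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathResourceCheck.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // The default name is taken from the error message in this thread.
        String name = args.length > 0 ? args[0] : "feat-mappings.txt";
        URL url = locate(name);
        System.out.println(url == null
                ? name + " is NOT on the classpath"
                : name + " found at " + url);
    }
}
```

Running this with the same classpath as the failing application should show whether a copy of the file saved elsewhere on disk is actually reachable by a classpath lookup.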

@ziorufus
Member

ziorufus commented Feb 22, 2019 via email

@nadezdaalexandrovna

Yes, but what is the name of the variable to assign it to?

@ziorufus
Member

ziorufus commented Feb 22, 2019 via email

@ziorufus
Member

ziorufus commented Feb 22, 2019 via email

@nadezdaalexandrovna

Thank you.

@nadezdaalexandrovna

Good afternoon Alessio,
Sorry to disturb you again, but after pulling the new development version I am now getting the following error:
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't create public eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties) with args [ita_morpho, {readability.glossario.use=no, timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration, timex.chineseTokenizerPath=, ner.useSUTime=0, timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml, dbps.first_confidence=0.5, timex.uimaVarTime=Time, timex.uimaVarTypeToProcess=Type, ita_lemma.guess_model=/home/nadiushka/pepper/CoreNLPPepper/feat-mappings.txt, dbps.min_confidence=0.3, customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator, customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator, timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest, customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator, customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator, timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate, pos.model=models/italian-fast.tagger, dbps.extract_types=0, customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator, timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho, ita_lemma, ner, timex.considerTime=true, timex.treeTaggerHome=/home/nadiushka/treetagger/cmd, customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator, timex.considerDate=true, readability.glossario.parse=yes, readability.language=it, depparse.model=models/parser-model-1.txt.gz, customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator, customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator, timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml, timex.uimaVarTemponym=Temponym, readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho, ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator, 
ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz, customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator, timex.considerDuration=true, timex.considerSet=true, customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator, ita_morpho.model=italian.db}]
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
at edu.stanford.nlp.util.MetaClass.createInstance(MetaClass.java:382)
at edu.stanford.nlp.pipeline.AnnotatorImplementations.custom(AnnotatorImplementations.java:141)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.lambda$null$28(StanfordCoreNLP.java:583)
at edu.stanford.nlp.util.Lazy$3.compute(Lazy.java:126)
at edu.stanford.nlp.util.Lazy.get(Lazy.java:31)
at edu.stanford.nlp.pipeline.AnnotatorPool.get(AnnotatorPool.java:149)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:251)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:192)
at edu.stanford.nlp.pipeline.StanfordCoreNLP.<init>(StanfordCoreNLP.java:188)
at eu.fbk.dh.tint.runner.TintPipeline.load(TintPipeline.java:56)
at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:116)
at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:112)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
... 16 more
Caused by: org.mapdb.DBException$VolumeIOError
at org.mapdb.volume.MappedFileVolSingle.<init>(MappedFileVolSingle.java:108)
at org.mapdb.volume.MappedFileVol$MappedFileFactory.factory(MappedFileVol.java:59)
at org.mapdb.volume.MappedFileVol$MappedFileFactory.makeVolume(MappedFileVol.java:38)
at org.mapdb.volume.VolumeFactory.makeVolume(VolumeFactory.java:20)
at org.mapdb.volume.VolumeFactory.makeVolume(VolumeFactory.java:15)
at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:67)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)
... 21 more
Caused by: java.io.FileNotFoundException: italian.db (No such file or directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at org.mapdb.volume.MappedFileVolSingle.<init>(MappedFileVolSingle.java:85)
... 28 more

It is similar to the one I already had with italian.db, but not exactly the same.
I have tried saving the italian.db file in different places and tried these 3 configurations:
1 ita_morpho.model=/home/nadiushka/pepper/CoreNLPPepper/italian.db
2 ita_morpho.model=models/italian.db
3 ita_morpho.model=italian.db
But none of them has worked.
Would you have any suggestions on how to address this problem?
Thank you in advance!
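
The final `FileNotFoundException: italian.db` in the trace names a relative path, and the JVM resolves relative paths against the directory the process was started from. One way to narrow this down is to print what each configured value resolves to and whether a readable file exists there. This is a minimal sketch using only `java.nio`. The class `ModelPathCheck` and its methods are hypothetical names, not part of Tint, and the sketch does not show how Tint itself reads `ita_morpho.model`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: shows where a configured model path actually points. */
final class ModelPathCheck {

    /** Resolves the configured value against the JVM working directory. */
    static Path resolve(String configured) {
        return Paths.get(configured).toAbsolutePath().normalize();
    }

    /** True when the resolved path is an existing, readable regular file. */
    static boolean isUsable(String configured) {
        Path path = resolve(configured);
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // The three values tried in this thread; pass others as arguments to override.
        String[] candidates = args.length > 0 ? args : new String[] {
            "/home/nadiushka/pepper/CoreNLPPepper/italian.db",
            "models/italian.db",
            "italian.db"
        };
        System.out.println("working directory: " + System.getProperty("user.dir"));
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + resolve(candidate)
                    + " (usable: " + isUsable(candidate) + ")");
        }
    }
}
```

The check is only meaningful when run from the same working directory as the failing application (here, presumably the Pepper process), because a relative value such as `italian.db` resolves differently from a different directory.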
