Configuration for NER and POS #18
Comments
In this configuration, you are using the English annotators. The correct list of annotators is:
It is weird that you get the
where To run the geocoder, you should have a local installation of Nominatim, or you can use the public one. The configuration is:
where If you launch Tint using the included runner, all the |
@ziorufus Thank you. Is the If I use as annotators
thank you |
TreeTagger is only needed for the HeidelTime annotator. Tint uses the CoreNLP original tagger. Regarding your error, are you using Maven to include the |
@ziorufus Ok got it! ...Nope, I'm including the jar manually in my project. Is that a sub-project? |
Which JAR are you including? You need to include the |
So far I have included only:
Ah ok I see that it is part of the DKM package. Funny thing I do not find the |
You need to include all the dependencies (recursively). You can find the dependencies in the |
Thank you. I'm actually using Maven to build the project:
so I get these target jars
I prefer to take the generated jars one by one and put them on my classpath. The issue is that I cannot find that util among the Maven-generated dependencies ( |
Run Anyway, if you include Tint in an existing Java project, I suggest using Maven for both and including it in the |
@ziorufus Yes, that is the best solution. I now realize that there are too many dependencies in the |
@ziorufus So I did a check of
The
NOTE. I can compile and build the version on the |
You are right: before installing utils you need to switch to the |
@ziorufus Ciao! I'm not sure about the steps to build Thank you. |
@ziorufus Maybe a simpler solution would be to provide release builds (i.e. packaged with all the dependencies) directly. Thank you. |
@ziorufus Ciao, a question about the above configuration for
If not, could you please point me to the related maven dependencies? At this time I have in my jar files
and I would like to keep only the ones needed. Thank you for your help!!! |
@loretoparisi The problem is that each dependency has its own dependencies. If you use Maven, the dependency tree is built and managed automatically; if you want to include the jars by hand, you need to resolve the tree yourself and add everything. |
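To illustrate what resolving the tree means when collecting jars by hand, here is a minimal Java sketch of a transitive closure over a dependency graph. The artifact names and edges in `main` are hypothetical placeholders, not Tint's real dependency list; the authoritative list is whatever Maven computes from the pom.xml files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DependencyClosure {

    /** Returns every artifact reachable from root, i.e. all jars needed on the classpath. */
    public static Set<String> resolve(String root, Map<String, List<String>> directDeps) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (!seen.add(artifact)) {
                continue; // already visited; this also guards against cycles
            }
            queue.addAll(directDeps.getOrDefault(artifact, List.of()));
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical graph, for illustration only.
        Map<String, List<String>> graph = Map.of(
                "app", List.of("lib-a", "lib-b"),
                "lib-a", List.of("lib-c"),
                "lib-b", List.of("lib-c", "lib-d"));
        System.out.println(resolve("app", graph));
    }
}
```

Missing a single node of this closure is enough to produce a class-not-found error at runtime, which is why letting Maven manage the tree is less error-prone than copying jars one by one.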
Hello @ziorufus Sorry to write in this thread, but I have a related problem getting Stanford CoreNLP 3.9.1 to work with the Italian models from Tint. I have the following properties configuration file:
And the error I get is:
Any ideas? If I add the I've added the Tint Maven dependency as instructed in the documentation:
Thank you! |
It seems that CoreNLP 3.9.1 added a new mandatory annotation. It is not documented on the Stanford NLP website, so I had to write to the group. For now, I have patched it, hoping that is enough. Just pull the |
@ziorufus Yes, there is an important migration to do for the annotators and sub-annotators: stanfordnlp/CoreNLP#633 (comment) |
I know that, but the problem is not on sub annotators (that Tint is not using), but on a new annotation called |
Thank you @ziorufus Using the change you've done in the |
Hello @ziorufus, these are my configs. I am using the Maven dependency of Tint as mentioned in the docs. Any help would be greatly appreciated. Thanks! |
Try to add this dependency to the
Use the |
Hello @ziorufus, |
Hi @ziorufus, I am trying to use TINT for italian NER as well, following the configuration you mention at the beginning of this thread. I have a couple of questions:
|
Tint uses HeidelTime standalone because it's hard to integrate it in a flow that does not use UIMA. TreeTagger is required because it uses the correct POS tags. We are working on a custom version of HeidelTime that can be easily integrated with the Tint POS tags, but it's not ready yet. |
@ziorufus Thank you! Unfortunately, this makes Tint a little hard to deploy, since it starts a lot of background processes for HeidelTime and brings in a dependency on Perl... |
Hi @ziorufus, sorry for disturbing you, I am trying to integrate Tint into a Pepper module and am getting the error resource italian.db not found : I included the tint-digimorph jar into my pom.xml and it gets copied to the snapshot and I also copy it to the dependency folder used by Pepper. Would you have any idea? |
Try to save the italian.db file somewhere on your computer, and specify it
using the property ita_morpho.model
You can find the file here:
https://github.com/dhfbk/tint/tree/master/tint-digimorph/src/main/resources
I suggest using the develop branch.
Best,
Alessio
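A minimal sketch of the suggestion above, assuming you set the property programmatically rather than in a .properties file. Only the property name `ita_morpho.model` comes from this thread; the class name, helper method, and default path are placeholders, and how the resulting `Properties` object is handed to Tint depends on how you start the pipeline and is not shown here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class MorphoModelConfig {

    /** Builds properties that point the ita_morpho annotator at a local copy of italian.db. */
    public static Properties withMorphoModel(Path italianDb) {
        Properties props = new Properties();
        // Property name taken from the reply above.
        props.setProperty("ita_morpho.model", italianDb.toAbsolutePath().toString());
        return props;
    }

    public static void main(String[] args) {
        // Placeholder path: replace with wherever you saved italian.db.
        Path db = Paths.get(args.length > 0 ? args[0] : "italian.db");
        if (!Files.isReadable(db)) {
            System.err.println("Warning: cannot read " + db.toAbsolutePath());
        }
        Properties props = withMorphoModel(db);
        System.out.println("ita_morpho.model=" + props.getProperty("ita_morpho.model"));
    }
}
```

Using an absolute path avoids surprises when the host application (Pepper, in this case) runs with a different working directory.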
On Tue, 19 Feb 2019 at 17:13, nadezdaalexandrovna <notifications@github.com> wrote:
… Hi @ziorufus <https://github.com/ziorufus>, sorry for disturbing you, I
am trying to integrate Tint into a Pepper module and am getting the error *resource
italian.db not found* :
Caused by: edu.stanford.nlp.util.MetaClass$ClassCreationException:
MetaClass couldn't create public
eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator(java.lang.String,java.util.Properties)
with args [ita_morpho, {readability.glossario.use=no,
timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration,
timex.chineseTokenizerPath=, ner.useSUTime=0,
timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml,
dbps.first_confidence=0.5, timex.uimaVarTime=Time,
timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3,
customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator,
customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator,
timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest,
customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator,
customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator,
timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate,
pos.model=models/italian-fast.tagger, dbps.extract_types=0,
customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator,
timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho,
ita_lemma, ner, timex.considerTime=true,
timex.treeTaggerHome=/home/nadiushka/treetagger/cmd,
timex.considerDate=true,
customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator,
readability.glossario.parse=yes, readability.language=it,
depparse.model=models/parser-model-1.txt.gz,
customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator,
customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator,
timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml,
timex.uimaVarTemponym=Temponym,
readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho,
ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator,
ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz,
customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator,
timex.considerDuration=true, timex.considerSet=true,
customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator}]
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:237)
...
at eu.fbk.dh.tint.runner.TintPipeline.runRaw(TintPipeline.java:103)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.testStanfordItalian(CoreNLPManipulator.java:238)
at CoreNLPPepper.CoreNLPPepper.CoreNLPManipulator$CoreNLPMapper.mapSDocument(CoreNLPManipulator.java:144)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.map(PepperMapperControllerImpl.java:251)
at org.corpus_tools.pepper.impl.PepperMapperControllerImpl.run(PepperMapperControllerImpl.java:188)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at edu.stanford.nlp.util.MetaClass$ClassFactory.createInstance(MetaClass.java:233)
... 16 more
Caused by: java.lang.IllegalArgumentException: resource italian.db not found.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.common.io.Resources.getResource(Resources.java:197)
at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:58)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)
I included the tint-digimorph jar into my pom.xml and it gets copied to
the snapshot and I also copy it to the dependency folder used by Pepper.
Would you have any idea?
Thanks a lot!
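The trace shows the lookup going through Guava's Resources.getResource, which is a classpath lookup, so the jar being present on disk is not enough: it has to be visible to the class loader in use. The following JDK-only sketch (the class and method names are my own, not part of Tint or Pepper) checks whether a resource name is visible. In a plugin container with per-module class loaders the answer may differ from a plain `java -cp` run, so it is most informative when called from inside the module that fails.

```java
import java.net.URL;

public class ResourceCheck {

    /** Returns the URL of a classpath resource, or null if no class loader entry provides it. */
    public static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceCheck.class.getClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Resource name taken from the error message in this thread.
        String name = args.length > 0 ? args[0] : "italian.db";
        URL url = locate(name);
        System.out.println(url == null
                ? name + " is NOT visible on the classpath"
                : name + " found at " + url);
    }
}
```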
|
Thank you very much for your quick reply! It worked and now I have a new error: the beginning is the same, but the end is different: Maybe you have an idea about this one, too? Thank you! |
Try adding this dependency to your pom.xml file
<dependency>
<groupId>org.mapdb</groupId>
<artifactId>mapdb</artifactId>
<version>3.0.1</version>
</dependency>
Best,
Alessio
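Before and after adding the dependency, a quick way to confirm whether the class named in the NoClassDefFoundError from this exchange is reachable is a small check like the one below. This is only a diagnostic sketch; the class and method names are hypothetical, and it tests the class loader that loaded the check itself, which may not be the one your host application uses.

```java
public class ClassCheck {

    /** Returns true if the named class can be found by the class loader of this class. */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the NoClassDefFoundError reported in this thread.
        String name = args.length > 0 ? args[0] : "org.mapdb.volume.MappedFileVol";
        System.out.println(name
                + (isLoadable(name) ? " is loadable" : " is missing from the classpath"));
    }
}
```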
On Tue, 19 Feb 2019 at 17:31, nadezdaalexandrovna <notifications@github.com> wrote:
… Thank you very much for your quick reply! It worked and now I have a new
error: the beginning is the same, but the end is different:
Caused by: java.lang.NoClassDefFoundError: org/mapdb/volume/MappedFileVol
at eu.fbk.dh.tint.digimorph.DigiMorph.<init>(DigiMorph.java:67)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphModel.getInstance(DigiMorphModel.java:14)
at eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator.<init>(DigiMorphAnnotator.java:23)
Maybe you have an idea about this one, too? Thank you!
|
Thank you! I am still getting errors, but will continue trying on my own now. Thanks a lot for being so responsive! |
Good morning Alessio, Now I need to make this jar accessible to my project, so I need to install it into my .m2 folder. How can I make all the modules get installed and not only the first one? |
Did you try a simple "mvn install"? It should work...
A.
On Thu, 21 Feb 2019 at 10:32, nadezdaalexandrovna <notifications@github.com> wrote:
… Good morning Alessio,
Sorry to disturb you, I am trying to use the development version. I have
compiled the tint-runner-1.0-SNAPSHOT.jar and the
tint-runner-1.0-SNAPSHOT-jar-with-dependencies.jar following the
instructions on github. The result is successful and the reactor summary is
the following:
Reactor Summary:
[INFO]
[INFO] tint ............................................... SUCCESS [ 1.253 s]
[INFO] tint-eval .......................................... SUCCESS [ 1.975 s]
[INFO] tint-resources ..................................... SUCCESS [ 5.017 s]
[INFO] tint-digimorph ..................................... SUCCESS [ 2.199 s]
[INFO] tint-digimorph-annotator ........................... SUCCESS [ 0.380 s]
[INFO] tint-tokenizer ..................................... SUCCESS [ 0.278 s]
[INFO] tint-verb .......................................... SUCCESS [ 0.583 s]
[INFO] tint-readability ................................... SUCCESS [ 1.047 s]
[INFO] tint-derived ....................................... SUCCESS [ 0.153 s]
[INFO] tint-heideltime-annotator .......................... SUCCESS [ 0.343 s]
[INFO] tint-models ........................................ SUCCESS [ 6.418 s]
[INFO] tint-runner ........................................ SUCCESS [ 45.478 s]
[INFO] tint-inverse-digimorph ............................. SUCCESS [ 1.482 s]
[INFO] tint-simplifier .................................... SUCCESS [ 20.939 s]
Now I need to make this jar accessible to my project, so I need to install
it into my .m2 folder.
I tried to do it with the following command:
mvn --also-make-dependents install:install-file
-Dfile=tint-runner/target/tint-runner-1.0-SNAPSHOT.jar -DgroupId=eu.fbk.dh
-DartifactId=tint-runner -Dversion=1.0-SNAPSHOT -Dpackaging=jar
but the reactor summary was different:
Reactor Summary:
[INFO]
[INFO] tint ............................................... SUCCESS [ 0.290 s]
[INFO] tint-eval .......................................... SKIPPED
[INFO] tint-resources ..................................... SKIPPED
[INFO] tint-digimorph ..................................... SKIPPED
[INFO] tint-digimorph-annotator ........................... SKIPPED
[INFO] tint-tokenizer ..................................... SKIPPED
[INFO] tint-verb .......................................... SKIPPED
[INFO] tint-readability ................................... SKIPPED
[INFO] tint-derived ....................................... SKIPPED
[INFO] tint-heideltime-annotator .......................... SKIPPED
[INFO] tint-models ........................................ SKIPPED
[INFO] tint-runner ........................................ SKIPPED
[INFO] tint-inverse-digimorph ............................. SKIPPED
[INFO] tint-simplifier .................................... SKIPPED
How can I make all the modules get installed and not only the first one?
Thanks a lot!
|
Thanks a lot, it worked. Now I have another resource not found problem: I saved it on my computer, but to what variable in the default-config.properties file should I assign its path? |
Yes, I guess you can assign the path to the file in the properties file.
Best,
Alessio
On Thu, 21 Feb 2019 at 16:39, nadezdaalexandrovna <notifications@github.com> wrote:
… Thanks a lot, it worked. Now I have another resource not found problem:
resource *feat-mappings.txt* not found:
edu.stanford.nlp.util.MetaClass$ClassCreationException: MetaClass couldn't
create public
eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator(java.lang.String,java.util.Properties)
with args [ita_lemma, {readability.glossario.use=no,
timex.uimaVarLanguage=Language, timex.uimaVarDuration=Duration,
timex.chineseTokenizerPath=, ner.useSUTime=0,
timex.typeSystemHome_DKPro=desc/type/DKPro_TypeSystem.xml,
dbps.first_confidence=0.5, timex.uimaVarTime=Time,
timex.uimaVarTypeToProcess=Type, dbps.min_confidence=0.3,
customAnnotatorClass.keyphrase=eu.fbk.dh.kd.annotator.DigiKdAnnotator,
customAnnotatorClass.fake_dep=eu.fbk.dkm.pikes.depparseannotation.StanfordToConllDepsAnnotator,
timex.uimaVarDate=Date, dbps.address=http://spotlight.sztaki.hu:2230/rest,
customAnnotatorClass.timex=eu.fbk.dh.tint.heideltime.annotator.HeidelTimeAnnotator,
customAnnotatorClass.geoloc=eu.fbk.dh.tint.geoloc.annotator.GeolocAnnotator,
timex.uimaVarSet=Set, dbps.annotator=dbpedia-annotate,
pos.model=models/italian-fast.tagger, dbps.extract_types=0,
customAnnotatorClass.readability=eu.fbk.dh.tint.readability.ReadabilityAnnotator,
timex.considerTemponym=false, annotators=ita_toksent, pos, ita_morpho,
ita_lemma, ner, timex.considerTime=true,
timex.treeTaggerHome=/home/nadiushka/treetagger/cmd,
timex.considerDate=true,
customAnnotatorClass.ita_tense=eu.fbk.dh.tint.tense.TenseAnnotator,
readability.glossario.parse=yes, readability.language=it,
depparse.model=models/parser-model-1.txt.gz,
customAnnotatorClass.dbps=eu.fbk.dkm.pikes.twm.LinkingAnnotator,
customAnnotatorClass.ita_morpho=eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator,
timex.typeSystemHome=desc/type/HeidelTime_TypeSystem.xml,
timex.uimaVarTemponym=Temponym,
readability.glossario.stanford.annotators=ita_toksent, pos, ita_morpho,
ita_lemma, customAnnotatorClass.ml=eu.fbk.dkm.pikes.twm.LinkingAnnotator,
ner.model=models/ner-ita-nogpe-noiob_gaz_wikipedia_sloppy.ser.gz,
customAnnotatorClass.ita_toksent=eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator,
timex.considerDuration=true, timex.considerSet=true,
customAnnotatorClass.ita_lemma=eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator,
ita_morpho.model=/home/nadiushka/pepper/CoreNLPPepper/italian.db}]
...
Caused by: java.lang.IllegalArgumentException: resource feat-mappings.txt not found.
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.common.io.Resources.getResource(Resources.java:197)
at eu.fbk.dh.tint.digimorph.annotator.GuessModel.<init>(GuessModel.java:235)
at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.<init>(GuessModelInstance.java:18)
at eu.fbk.dh.tint.digimorph.annotator.GuessModelInstance.getInstance(GuessModelInstance.java:23)
at eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator.<init>(DigiLemmaAnnotator.java:87)
I saved it on my computer, but to what variable in the
default-config.properties file should I assign its path?
Thank you!
|
Yes, but what is the name of the variable to assign it to? |
Try moving that file into the src/main/resources folder of your project.
A.
|
You were right, there was no property for the guess model.
I've updated the code, you can now use ita_lemma.guess_model and specify
the file in the properties file.
Just pull the repository on the develop branch.
Best,
Alessio
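With two file-based properties now in play, it can help to verify the paths before starting the pipeline. The sketch below is an assumption-laden helper, not part of Tint: the two property names come from this thread, while the class name, the paths, and the guess that `ita_lemma.guess_model` should point at the feat-mappings.txt file are mine.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ModelPathCheck {

    // Property names mentioned in this thread.
    static final String[] MODEL_KEYS = {"ita_morpho.model", "ita_lemma.guess_model"};

    /** Returns the keys that are set but do not point to a readable file. */
    public static List<String> unreadable(Properties props) {
        List<String> bad = new ArrayList<>();
        for (String key : MODEL_KEYS) {
            String value = props.getProperty(key);
            if (value != null && !Files.isReadable(Paths.get(value))) {
                bad.add(key);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder paths: adjust to where the files live on your machine.
        props.setProperty("ita_morpho.model", "/path/to/italian.db");
        // Assumption: the guess model property refers to feat-mappings.txt.
        props.setProperty("ita_lemma.guess_model", "/path/to/feat-mappings.txt");
        System.out.println("Unreadable model paths: " + unreadable(props));
    }
}
```

Keys that are not set are skipped on purpose, since in that case the annotator falls back to its bundled resource lookup.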
|
Thank you. |
Good afternoon Alessio, It is similar to the one I had already had with italian.db, but not exactly the same. |
First, thank you for your great work on the Italian language for CoreNLP. I'm trying the NER and POS tagger. My simplest configuration for CoreNLP is the following:

Entities like DATE, LOC, PER are being recognized. Part of Speech tags as well. I have seen that there are other annotators like Geoloc, HeidelTime, customized Lemma, etc.

For the given configuration, this is my pipeline output:

so the custom annotators like ita_lemma and ita_toksent are registered, but I'm not sure they are actually loaded instead of the default ones.

Thank you.
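On the question of whether the custom annotators are the ones in use: CoreNLP maps a non-built-in annotator name to a class through a `customAnnotatorClass.<name>` property, so a name such as ita_toksent in the `annotators` list should only be able to resolve to its custom class. A rough way to see which names are wired this way is to inspect the properties, as in the sketch below. The helper class is hypothetical; the property values in `main` are copied from the property dumps in the stack traces elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AnnotatorConfigCheck {

    /** Lists annotators from "annotators" that have a customAnnotatorClass.NAME entry. */
    public static List<String> customAnnotators(Properties props) {
        List<String> custom = new ArrayList<>();
        for (String name : props.getProperty("annotators", "").split("[,\\s]+")) {
            String key = "customAnnotatorClass." + name;
            if (!name.isEmpty() && props.containsKey(key)) {
                custom.add(name + " -> " + props.getProperty(key));
            }
        }
        return custom;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Values copied from the property dump in this thread's stack traces.
        props.setProperty("annotators", "ita_toksent, pos, ita_morpho, ita_lemma, ner");
        props.setProperty("customAnnotatorClass.ita_toksent",
                "eu.fbk.dh.tint.tokenizer.annotators.ItalianTokenizerAnnotator");
        props.setProperty("customAnnotatorClass.ita_morpho",
                "eu.fbk.dh.tint.digimorph.annotator.DigiMorphAnnotator");
        props.setProperty("customAnnotatorClass.ita_lemma",
                "eu.fbk.dh.tint.digimorph.annotator.DigiLemmaAnnotator");
        customAnnotators(props).forEach(System.out::println);
    }
}
```

Names without such an entry (pos and ner here) are the standard CoreNLP annotators, configured through their own model properties.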