Time normalization config in eidos.conf and reference.conf #443
Comments
Hi Ben,
Sorry about this frustration. Whether the setting is in reference.conf or
eidos.conf probably depended on how likely people thought it would need to
change, plus whether it depended on language. The true/false was perhaps
not likely to change, but maybe the language was. Anyway, it must have
been subjective, could be improved, and you're probably correct.
The other problem is worse. The model can't be accessed from a jar file.
Recent versions of sbt don't work from some local target directory where
there might still be access to a file with resources, but instead from some
temp directory. One can try to trick sbt by keeping the model somehow
accessible, reverting to an older version of sbt (by editing
project/build.properties), by using IntelliJ or Eclipse, or by changing a
line of code. We hope this is fixed before a larger audience needs to run
it. If need be I'll track down when sbt made that change.
In EidosSystem.scala change
val file = Paths.get(timeNormResource.toURI()).toFile().getAbsolutePath()
to
val file = "/home/you/timenorm_model.hdf5"
Just the messenger,
Keith
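For readers hitting the same FileSystemNotFoundException, here is a minimal sketch of one possible alternative to hardcoding the path. It is not the fix that was applied in Eidos. It assumes the model loader only needs a path on the local file system, so a resource that lives inside a jar can be copied to a temporary file first. The names ResourceFiles and toLocalPath are hypothetical.

```scala
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

// Hypothetical helper, not part of Eidos.
object ResourceFiles {
  // Returns an absolute file-system path for a classpath resource URL.
  // A "file:" URL is used directly; anything else (e.g. "jar:file:...!/...")
  // is streamed into a temporary file, because Paths.get(uri) fails for
  // jar URIs unless a zip file system has been opened for that jar.
  def toLocalPath(resource: URL, suffix: String = ".hdf5"): String =
    if (resource.getProtocol == "file")
      Paths.get(resource.toURI).toFile.getAbsolutePath
    else {
      val tempFile = Files.createTempFile("resource", suffix)
      tempFile.toFile.deleteOnExit() // clean up when the JVM exits
      val stream = resource.openStream()
      try Files.copy(stream, tempFile, StandardCopyOption.REPLACE_EXISTING)
      finally stream.close()
      tempFile.toFile.getAbsolutePath
    }
}
```

With something like this, the line in EidosSystem.scala could in principle become val file = ResourceFiles.toLocalPath(timeNormResource), at the cost of copying the model once per run.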
…On Fri, Sep 21, 2018 at 12:26 PM Benjamin M. Gyori ***@***.***> wrote:
I've been trying to configure Eidos to use the time normalization feature
and I'm running into some issues. These are 3 issues here but they are
related so I'm putting them all here.
First, I am wondering if some of the differences in eidos.conf and
reference.conf are on purpose or not.
1. In reference.conf timeNormModelPath is set to
timeNormModelPath = /org/clulab/wm/eidos/english/models/timenorm_model.hdf5
whereas in eidos.conf it is set to
timeNormModelPath = /org/clulab/wm/eidos/models/timenorm_model.hdf5
I think between the two, the latter is the better default setting since
timenorm_model.hdf5 is part of the repo at
org/clulab/wm/eidos/models/timenorm_model.hdf5. Should I update the default
reference.conf to use this path?
2. Another inconsistency between the two conf files is that in
reference.conf
useTimeNorm = false
is set but there is no useTimeNorm row in eidos.conf. Would it make sense
to include the same row with the same default value in eidos.conf as well?
3. Now, using the settings as follows:
timeNormModelPath = /org/clulab/wm/eidos/models/timenorm_model.hdf5
...
useTimeNorm = true
and running
java -Xmx12G -cp /Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar org.clulab.wm.eidos.apps.ExtractFromDirectory /Users/ben/tmp/eidos/docs /Users/ben/tmp/eidos/docs
I get
15:22:16.328 [scala-execution-context-global-11] INFO e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
15:22:17.500 [scala-execution-context-global-11] INFO e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.2 sec].
jar:file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar!/org/clulab/wm/eidos/models/timenorm_model.hdf5
Exception in thread "main" java.nio.file.FileSystemNotFoundException
at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)
at com.sun.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:157)
at java.nio.file.Paths.get(Paths.java:143)
at org.clulab.wm.eidos.EidosSystem$LoadableAttributes$.apply(EidosSystem.scala:129)
at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:153)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.delayedEndpoint$org$clulab$wm$eidos$apps$ExtractFromDirectory$1(ExtractFromDirectory.scala:14)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$delayedInit$body.apply(ExtractFromDirectory.scala:9)
at scala.Function0.apply$mcV$sp(Function0.scala:34)
at scala.Function0.apply$mcV$sp$(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App.$anonfun$main$1$adapted(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:389)
at scala.App.main(App.scala:76)
at scala.App.main$(App.scala:74)
at org.clulab.wm.eidos.apps.ExtractFromDirectory$.main(ExtractFromDirectory.scala:9)
at org.clulab.wm.eidos.apps.ExtractFromDirectory.main(ExtractFromDirectory.scala)
Note that I added a debug print: println(timeNormResource) on
EidosSystem.scala line 127 to produce this line
jar:file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar!/org/clulab/wm/eidos/models/timenorm_model.hdf5
I also confirmed by browsing the jar file itself that
/org/clulab/wm/eidos/models/timenorm_model.hdf5 is at the specified
location within the JAR.
Thanks for your help!
|
Thanks, so if I understand correctly, if I put the hdf5 file outside the JAR and reference it by its absolute path it should work. Let me try that and I'll report back! |
Yes, that's it. Also please be advised that we are working on related performance issues. Don't plan to run lots of files with timenorm on. |
Alright, that seems to have worked as far as the file path goes. In particular,
works as expected. However, the other reading mode we have been using, which reads snippets of text directly by creating an instance of EidosSystem and calling its extractFromText method (from Python), gives me this error:
Any clues what might be behind this? |
Still working on it... |
I think I have a guess: from Python I'm passing
With some experimentation, I found that if I change the argument to
|
For the moment, the DocTime must be in YYYY-MM-DD format. Try passing something like scala.Some('2018-09-24'). |
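To illustrate the YYYY-MM-DD requirement, here is a rough sketch of how a caller might normalize a creation date before passing it in. It is not Eidos code: the object name DocTime, the method normalize, and the list of accepted input formats are all assumptions made for the example.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical helper, not part of Eidos or timenorm.
object DocTime {
  // Input formats this sketch accepts; an assumed list, extend as needed.
  private val formats = Seq(
    DateTimeFormatter.ISO_LOCAL_DATE,       // 2018-09-24
    DateTimeFormatter.BASIC_ISO_DATE,       // 20180924
    DateTimeFormatter.ofPattern("M/d/yyyy") // 9/24/2018
  )

  // Tries each format in order and returns the date as YYYY-MM-DD,
  // or None if no format matches.
  def normalize(raw: String): Option[String] =
    formats.iterator
      .map(format => Try(LocalDate.parse(raw.trim, format)).toOption)
      .collectFirst { case Some(date) => date }
      .map(_.format(DateTimeFormatter.ISO_LOCAL_DATE))
}
```

Since the result is already an Option[String], DocTime.normalize("9/24/2018") could be passed straight through as the documentCreationTime argument, and an unparseable string simply becomes None.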
Thanks @EgoLaparra, that worked! Let me test it some more and then I'll close this issue. |
By the way, what happens if you don't pass the fourth argument? |
Complicated... The Java-Python bridge called jnius that allows us to use Eidos programmatically at all is not really meant to be used with Scala. Java methods don't have default arguments (instead, you define the method multiple times with different sets of arguments), so jnius thinks this method needs 5 arguments and raises an error if you call it with fewer. This is what prompted e.g. this line: |
I see. In any case, we need the actual creation time of the document to get correct normalizations for expressions like "last week". The parser cannot infer it from the text, so when no DocTime is passed, it uses the current date as the reference. |
The fourth argument as in filename: Option[String]= None to EidosSystem.annotate? It should be OK. It is only used for the document id which is probably only used for the JSON-LD output. |
Well if you count from 1, not 0, then the 4th argument is documentCreationTime which we discussed above: def extractFromText(text: String, keepText: Boolean = true, cagRelevantOnly: Boolean = true,
documentCreationTime: Option[String] = None, filename: Option[String] = None) |
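For anyone puzzled by the jnius behaviour described above, the following sketch shows what a Java-side caller sees. Reader is a hypothetical stand-in that only copies the parameter list quoted here, and DefaultArgsDemo is likewise made up for the example. Scala compiles default arguments into a single method with the full parameter list plus synthetic helper methods for the defaults, so reflection finds exactly one extractFromText, and it takes five parameters.

```scala
// Hypothetical stand-in with the same parameter list as the signature quoted above.
class Reader {
  def extractFromText(text: String, keepText: Boolean = true, cagRelevantOnly: Boolean = true,
      documentCreationTime: Option[String] = None, filename: Option[String] = None): String =
    s"$text|$documentCreationTime"
}

object DefaultArgsDemo {
  // Parameter counts of every JVM method named exactly "extractFromText".
  // The synthetic default-value methods have different names, so they are excluded.
  def arities: Seq[Int] =
    classOf[Reader].getMethods.toSeq
      .filter(_.getName == "extractFromText")
      .map(_.getParameterCount)

  // What a Java or jnius caller has to do: supply all five arguments explicitly.
  def callLikeJava(reader: Reader, text: String, dct: String): String =
    reader.extractFromText(text, true, true, Some(dct), None)
}
```

DefaultArgsDemo.arities evaluates to Seq(5), which is consistent with jnius rejecting calls that pass fewer than five arguments.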
I can still count, but maybe it's time for trifocals :-) I didn't realize you were both talking about the same thing. |
Thanks, looks like this is working! |
@EgoLaparra, I think you'll want to change from
to
Neither Eidos nor EidosDocument are in a good position to decide what kind of string is being passed and should let whatever reads or produces the string take care of that. In reading these 17k documents I find that the "creation date" comes in multiple formats and it's not efficient to parse them and convert them to the kind of string that is needed (e.g., eight digits, with dashes, without time) only to have them parsed again, etc. |
What about letting the parser deal with these strings? Eidos could pass whatever it finds, even if the format is not the correct one, and the temporal parser would decide whether it can create a DCT or set it as undefined. |
That sounds interesting. Perhaps if it is passed a string, it could convert it to an Option[LocalDateTime] and call the other function. Right now the conversion process is on the fragile side. I haven't been watching your timenorm project to know if you have made the update that includes what you want to be used in this large run. Be sure to let me know. Thanks. |
@EgoLaparra, are we any closer on what needs to be delivered on this large run that needs to work overnight and get sent away? For the metadata files should I expect that there are some without matching text files? I need to double check, but it seemed that there were both texts without metadata and metadata without texts. |
Yes, we are closer. I have changed the parser and EidosDocument so that the DCT can be handled in any format, even if it is wrong. I still need to run some tests to make sure that everything is working properly. |
@kwalcock, I have created a pull request to kwalcock-timeTime with these changes. |
Thanks @EgoLaparra and @kwalcock! |