Time normalization config in eidos.conf and reference.conf #443

bgyori · 2018-09-21T19:26:22Z

I've been trying to configure Eidos to use the time normalization feature and I'm running into some issues. There are 3 issues here, but they are related, so I'm putting them all in one place.

First, I am wondering if some of the differences in eidos.conf and reference.conf are on purpose or not.

In reference.conf timeNormModelPath is set to

timeNormModelPath = /org/clulab/wm/eidos/english/models/timenorm_model.hdf5

whereas in eidos.conf it is set to

timeNormModelPath = /org/clulab/wm/eidos/models/timenorm_model.hdf5

I think between the two, the latter is the better default setting since timenorm_model.hdf5 is part of the repo at org/clulab/wm/eidos/models/timenorm_model.hdf5. Should I update the default reference.conf to use this path?

Another inconsistency between the two conf files is that in reference.conf

useTimeNorm = false

is set but there is no useTimeNorm row in eidos.conf. Would it make sense to include the same row with the same default value in eidos.conf as well?

Now, using the settings as follows:

timeNormModelPath = /org/clulab/wm/eidos/models/timenorm_model.hdf5
...
useTimeNorm = true

and running

java -Xmx12G -cp /Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar org.clulab.wm.eidos.apps.ExtractFromDirectory /Users/ben/tmp/eidos/docs /Users/ben/tmp/eidos/docs

I get

15:22:16.328 [scala-execution-context-global-11] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
15:22:17.500 [scala-execution-context-global-11] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [1.2 sec].
jar:file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar!/org/clulab/wm/eidos/models/timenorm_model.hdf5
Exception in thread "main" java.nio.file.FileSystemNotFoundException
	at com.sun.nio.zipfs.ZipFileSystemProvider.getFileSystem(ZipFileSystemProvider.java:171)
	at com.sun.nio.zipfs.ZipFileSystemProvider.getPath(ZipFileSystemProvider.java:157)
	at java.nio.file.Paths.get(Paths.java:143)
	at org.clulab.wm.eidos.EidosSystem$LoadableAttributes$.apply(EidosSystem.scala:129)
	at org.clulab.wm.eidos.EidosSystem.<init>(EidosSystem.scala:153)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$.delayedEndpoint$org$clulab$wm$eidos$apps$ExtractFromDirectory$1(ExtractFromDirectory.scala:14)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$delayedInit$body.apply(ExtractFromDirectory.scala:9)
	at scala.Function0.apply$mcV$sp(Function0.scala:34)
	at scala.Function0.apply$mcV$sp$(Function0.scala:34)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App.$anonfun$main$1$adapted(App.scala:76)
	at scala.collection.immutable.List.foreach(List.scala:389)
	at scala.App.main(App.scala:76)
	at scala.App.main$(App.scala:74)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory$.main(ExtractFromDirectory.scala:9)
	at org.clulab.wm.eidos.apps.ExtractFromDirectory.main(ExtractFromDirectory.scala)

Note that I added a debug print: println(timeNormResource) on EidosSystem.scala line 127 to produce this line

jar:file:/Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar!/org/clulab/wm/eidos/models/timenorm_model.hdf5

I also confirmed by browsing the jar file itself that /org/clulab/wm/eidos/models/timenorm_model.hdf5 is at the specified location within the JAR.

Thanks for your help!

kwalcock · 2018-09-21T20:00:10Z

Hi Ben,

Sorry about this frustration. Whether the setting is in reference.conf or eidos.conf probably depended on how likely people thought it would need to change, plus whether it depended on language. The true/false was perhaps not likely to change, but maybe the language was. Anyway, it must have been subjective, could be improved, and you're probably correct.

The other problem is worse. The model can't be accessed from a jar file. Recent versions of sbt don't work from some local target directory where there might still be access to a file with resources, but instead from some temp directory. One can try to trick sbt by keeping the model somehow accessible, reverting to an older version of sbt (by editing project/build.properties), by using IntelliJ or Eclipse, or by changing a line of code. We hope this is fixed before a larger audience needs to run it. If need be I'll track down when sbt made that change.

In EidosSystem.scala change

val file = Paths.get(timeNormResource.toURI()).toFile().getAbsolutePath()

to

val file = "/home/you/timenorm_model.hdf5"

Just the messenger,
Keith
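A note on the trace above: Paths.get(URI) throws FileSystemNotFoundException for a jar: URI unless a zip file system for that jar has already been opened, which is consistent with the failing call at EidosSystem.scala:129. Besides hard-coding an absolute path, one common workaround is to copy the classpath resource to a temporary file and pass that file's path to the loader. The following is only a minimal sketch in Java (the same JDK calls are available from Scala). ResourceToFile, materialize, and materializeResource are hypothetical names that do not exist in Eidos, and whether the timenorm loader is happy with a temporary copy is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Eidos.
final class ResourceToFile {
    // Copies a stream (for example one obtained from getResourceAsStream)
    // to a temporary file and returns that file's absolute path.
    static String materialize(InputStream in, String suffix) throws IOException {
        if (in == null) {
            throw new IOException("resource not found on the classpath");
        }
        Path tmp = Files.createTempFile("resource-", suffix);
        tmp.toFile().deleteOnExit(); // best-effort cleanup when the JVM exits
        try (InputStream src = in) {
            Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toAbsolutePath().toString();
    }

    // Resolves a classpath resource by name; works whether the resource
    // lives in a directory or inside a jar.
    static String materializeResource(String resourcePath, String suffix) throws IOException {
        return materialize(ResourceToFile.class.getResourceAsStream(resourcePath), suffix);
    }
}
```

With this in place, a call such as materializeResource("/org/clulab/wm/eidos/models/timenorm_model.hdf5", ".hdf5") would return a path on the local disk, provided that resource is on the classpath. The cost is one extra copy of the model per JVM start.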

bgyori · 2018-09-21T20:14:33Z

Thanks, so if I understand correctly, if I put the hdf5 file outside the JAR and reference it by its absolute path it should work. Let me try that and I'll report back!

kwalcock · 2018-09-21T20:19:05Z

Yes, that's it. Also please be advised that we are working on related performance issues. Don't plan to run lots of files with timenorm on.

bgyori · 2018-09-21T20:56:07Z

Alright, that seems to have worked as far as the file path goes. In particular,

java -Xmx12G -cp /Users/ben/tmp/eidos/target/scala-2.12/eidos-assembly-0.2.2-SNAPSHOT.jar org.clulab.wm.eidos.apps.ExtractFromDirectory /Users/ben/tmp/eidos/docs /Users/ben/tmp/eidos/docs

works as expected, and timexes show up in the output JSON-LD.

However, the other reading mode we have been using, which reads snippets of text directly by creating an instance of EidosSystem and calling its extractFromText method (from Python), gives me this error:

JavaException: JVM exception occurred: Text 'nullT00:00:00' could not be parsed at index 0

Any clues what might be behind this?

kwalcock · 2018-09-21T21:13:56Z

Still working on it...
The program ExtractFromDirectory uses extractFromText, so it should be working in general. Can you send a specific sentence that's a problem and/or the important part of code? Thanks. It doesn't seem that @EgoLaparra is online to respond.

bgyori · 2018-09-24T16:17:19Z

I think I have a guess: from Python I'm passing scala.Some(None) as the fourth argument which is the documentCreationTime. I thought passing in None would be adequate because None is defined as the default argument, and ExtractFromDirectory doesn't specify this argument:

val annotatedDocuments = Seq(reader.extractFromText(text))

With some experimentation, I found that if I change the argument to scala.Some('2018'), I get this error:

JavaException: JVM exception occurred: Text '2018T00:00:00' could not be parsed at index 4

EgoLaparra · 2018-09-24T16:22:28Z

For the moment, the DocTime must be in YYYY-MM-DD format. Try passing something like scala.Some('2018-09-24').
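The two error messages reported in this thread have the shape you would get if the DocTime string were concatenated with "T00:00:00" and parsed as an ISO-8601 local date-time. That concatenation is an assumption inferred from the messages, not something confirmed from the Eidos source. A small Java sketch under that assumption (DocTimeCheck and tryParse are hypothetical names):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Eidos.
final class DocTimeCheck {
    // Assumption: the document creation time is turned into a timestamp by
    // appending "T00:00:00" and parsing the result with LocalDateTime.parse.
    // Returns the parsed value as text, or the parser's error message.
    static String tryParse(String docTime) {
        String candidate = docTime + "T00:00:00"; // a null reference becomes "null"
        try {
            return LocalDateTime.parse(candidate).toString();
        } catch (DateTimeParseException e) {
            return e.getMessage();
        }
    }
}
```

Under this assumption, a null reference fails at index 0 because the text starts with "null", "2018" fails at index 4 because a dash is expected after the year, and "2018-09-24" parses cleanly.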

bgyori · 2018-09-24T16:34:50Z

Thanks @EgoLaparra, that worked! Let me test it some more and then I'll close this issue.

EgoLaparra · 2018-09-24T16:35:17Z

By the way, what happens if you don't pass the fourth argument?

bgyori · 2018-09-24T16:44:09Z

Complicated... The Java-Python bridge called jnius that allows us to use Eidos programmatically at all is not really meant to be used with Scala. Java methods don't have default arguments (instead you define the method multiple times with different sets of arguments), so jnius thinks this method needs 5 arguments and errors if you call it with fewer. This is what prompted e.g. this line:
https://github.com/clulab/eidos/blob/master/src/main/scala/org/clulab/wm/eidos/EidosSystem.scala#L32
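To illustrate the point about default arguments: a caller that only sees Java-style signatures needs an explicit overload for every shorter argument list. The sketch below shows that pattern with a placeholder body. ExtractOverloads is a hypothetical class, not the Eidos API, and the five parameters simply mirror the extractFromText parameter list discussed in this thread.

```java
import java.util.Optional;

// Hypothetical illustration of Java-style overloads, not part of Eidos.
final class ExtractOverloads {
    // Full five-parameter form. The body is a placeholder that only
    // reports which values it received.
    static String extractFromText(String text, boolean keepText, boolean cagRelevantOnly,
                                  Optional<String> documentCreationTime,
                                  Optional<String> filename) {
        return "text=" + text
            + ", keepText=" + keepText
            + ", cagRelevantOnly=" + cagRelevantOnly
            + ", dct=" + documentCreationTime.orElse("<none>")
            + ", filename=" + filename.orElse("<none>");
    }

    // One-argument overload: fills in the defaults by hand, which is what
    // a default-argument declaration would otherwise do for the caller.
    static String extractFromText(String text) {
        return extractFromText(text, true, true, Optional.empty(), Optional.empty());
    }
}
```

A bridge that resolves methods by their Java signature could call the one-argument form directly, without having to construct the optional values itself.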

EgoLaparra · 2018-09-24T17:15:17Z

I see. In any case, we need the actual creation time of the document to get correct normalizations for expressions like "last week". The parser cannot infer it from the text, so when no DocTime is passed, it uses the current date as the reference.
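A toy example of why the reference date matters. This is not the timenorm algorithm, only a sketch: it treats "last week" as the Monday-based week before the anchor date, which is an assumption, and RelativeDate and lastWeekStart are hypothetical names.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.Optional;

// Hypothetical illustration, not the timenorm implementation.
final class RelativeDate {
    // Returns the first day of "last week" relative to the document
    // creation date, falling back to today's date when none is given.
    // Assumption: weeks start on Monday.
    static LocalDate lastWeekStart(Optional<LocalDate> docTime, LocalDate today) {
        LocalDate anchor = docTime.orElse(today);
        return anchor.minusWeeks(1)
                     .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }
}
```

The same phrase resolves to a different week depending on whether the anchor is the document's creation date or the day the text happens to be processed, so an old document read without a DocTime would be normalized relative to the wrong date.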

kwalcock · 2018-09-24T18:04:53Z

The fourth argument as in filename: Option[String]= None to EidosSystem.annotate? It should be OK. It is only used for the document id which is probably only used for the JSON-LD output.

bgyori · 2018-09-24T18:14:02Z

Well if you count from 1, not 0, then the 4th argument is documentCreationTime which we discussed above:

def extractFromText(text: String, keepText: Boolean = true, cagRelevantOnly: Boolean = true,
                      documentCreationTime: Option[String] = None, filename: Option[String] = None)

kwalcock · 2018-09-24T18:28:59Z

I can still count, but maybe it's time for trifocals :-) I didn't realize you were both talking about the same thing.

bgyori · 2018-09-24T20:33:43Z

Thanks, looks like this is working!

kwalcock · 2018-09-26T16:09:41Z

@EgoLaparra, I think you'll want to change from

def extractFromText(text: String, keepText: Boolean = true, cagRelevantOnly: Boolean = true,
                      documentCreationTime: Option[String] = None, filename: Option[String] = None)

to

def extractFromText(text: String, keepText: Boolean = true, cagRelevantOnly: Boolean = true,
                      documentCreationTime: Option[LocalDateTime] = None, filename: Option[String] = None)

Neither Eidos nor EidosDocument are in a good position to decide what kind of string is being passed and should let whatever reads or produces the string take care of that. In reading these 17k documents I find that the "creation date" comes in multiple formats and it's not efficient to parse them and convert them to the kind of string that is needed (e.g., eight digits, with dashes, without time) only to have them parsed again, etc.

EgoLaparra · 2018-09-27T00:48:30Z

What about letting the parser deal with these strings? Eidos could pass whatever it finds, even if the format is not the correct one, and the temporal parser would decide if it can create a DCT or set it as undefined.

kwalcock · 2018-09-27T02:23:40Z

That sounds interesting. Perhaps if it is passed a string, it could convert it to an Option[LocalDateTime] and call the other function. Right now the conversion process is on the fragile side. I haven't been watching your timenorm project to know if you have made the update that includes what you want to be used in this large run. Be sure to let me know. Thanks.
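One way the string-to-date conversion discussed here could look is sketched below in Java: try a short list of known formats and fall back to "undefined" (an empty Optional) instead of throwing. The two date formats are illustrative only, since the formats actually found in the metadata are not listed in this thread, and CreationDateParser is a hypothetical name.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Eidos or timenorm.
final class CreationDateParser {
    // Illustrative date-only formats, tried in order.
    private static final List<DateTimeFormatter> DATE_FORMATS = Arrays.asList(
        DateTimeFormatter.ISO_LOCAL_DATE,  // with dashes, e.g. 2018-09-24
        DateTimeFormatter.BASIC_ISO_DATE   // eight digits, e.g. 20180924
    );

    // Returns the parsed creation time, or empty when no format matches.
    static Optional<LocalDateTime> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String s = raw.trim();
        try {
            // Full ISO-8601 date-time, e.g. 2018-09-24T13:05:00
            return Optional.of(LocalDateTime.parse(s));
        } catch (DateTimeParseException ignored) {
            // fall through to the date-only formats
        }
        for (DateTimeFormatter format : DATE_FORMATS) {
            try {
                return Optional.of(LocalDate.parse(s, format).atStartOfDay());
            } catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        return Optional.empty();
    }
}
```

A String-accepting entry point could call this once and delegate to the date-typed one, so each string is parsed a single time and malformed values degrade to an undefined DCT.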

kwalcock · 2018-09-28T19:10:40Z

@EgoLaparra, are we any closer on what needs to be delivered on this large run that needs to work overnight and get sent away? For the metadata files should I expect that there are some without matching text files? I need to double check, but it seemed that there were both texts without metadata and metadata without texts.

EgoLaparra · 2018-09-28T20:23:15Z

Yes, we are closer. I have changed the parser and EidosDocument so that the DCT can be handled in any format, even if it is wrong. I still need to run some tests to make sure that everything is working properly.
And yes, the document collection on the FAO site has changed since I retrieved the PDFs, so this kind of thing can happen.

EgoLaparra · 2018-09-29T21:30:56Z

@kwalcock, I have created a pull request to kwalcock-timeTime with these changes.

MihaiSurdeanu · 2018-09-30T16:20:01Z

Thanks @EgoLaparra and @kwalcock!
This integration is very important. Please prioritize this work.

bgyori closed this as completed Sep 24, 2018
