Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With the Arizona and CMU formats enabled, reach drops the ball #716

Closed
kwalcock opened this issue Dec 30, 2020 · 11 comments
Closed

With the Arizona and CMU formats enabled, reach drops the ball #716

kwalcock opened this issue Dec 30, 2020 · 11 comments
Assignees

Comments

@kwalcock
Copy link
Member

kwalcock commented Dec 30, 2020

Regarding #691, @enoriega reports

I think there is some sneaky bug in there.

After enabling the Arizona and CMU formats, ReachCLI is picking up papers, generating the fries output and apparently dropping the ball without actually producing the other output format files. It doesn't report any exceptions nor stack traces.

The log file reports starting the papers but not finishing them. Can someone double check this?

This is the line I have in application.conf in case having these formats simultaneously is what triggers the problem:

outputTypes = ["fries", "serial-json", "text", "indexcard", "arizona", "cmu"]

@kwalcock kwalcock self-assigned this Dec 30, 2020
@MihaiSurdeanu
Copy link
Contributor

MihaiSurdeanu commented Dec 30, 2020 via email

@enoriega
Copy link
Member

@kwalcock Let me know if I can provide data to reproduce this

@kwalcock
Copy link
Member Author

kwalcock commented Jan 6, 2021

You aren't running afoul of the restart mechanism, are you? It was turned on by default not too long ago (May, this year). It looks for restart.log in the output directory and skips files recorded there. If you ran once with the fries output successfully, added some formats, and ran again, the second run would not reprocess the files already done unless restart.log was first erased.

The json output is not working for me, but the other formats are generated if I'm sure to remove restart.log or turn that feature off in application.conf.

@kwalcock
Copy link
Member Author

kwalcock commented Jan 6, 2021

It's really cool that we can output in Friesian!

@enoriega
Copy link
Member

enoriega commented Jan 6, 2021

@kwalcock I am aware of the restart mechanism. So far I am pointing this issue w.r.t. files that haven't been processed yet.

After the fixes in kwalcock-fixes, I noticed that for some nxlm files are processed correctly and all the files are generated. Others crash at the time of json serialization, (as in json4s, not necessarily REACH's json format) with missing matches, like:

scala.MatchError: OEtrigger(org.clulab.reach.mentions.BioTextBoundMention@3b45b692) (of class org.clulab.reach.mentions.OEtrigger)

This is a very prevalent error in the log file. For the class of files that report this error, only the FRIES output is generated.

My educated guess about this is:

New trigger subclasses were added after the serialization code. A match statement somewhere is not comprehensive because it misses the patterns that correspond to the new sub-trigger types. Ergo, an exception is raised and propagates all the way up to ReachCLI, which recovers from it and moves on to the next paper without finishing writing the other output formats.

I attach the log and a file that triggers the error.

PMC5668567.nxml.txt
error.log

@kwalcock
Copy link
Member Author

kwalcock commented Jan 6, 2021

Thanks for the files. I can confirm a crash.

@MihaiSurdeanu
Copy link
Contributor

@kwalcock: please let me know if you need my help on this. A small reproducible test case would help.

@kwalcock
Copy link
Member Author

kwalcock commented Jan 6, 2021

The diagnosis seems to be correct. There were triggers/Modifications added as far back as July 2019 which do not have proper serialization support. IntelliJ does very poorly with multiple package objects with implicit conversions, so things were a little confusing. There should be a fix shortly.

@kwalcock
Copy link
Member Author

kwalcock commented Jan 6, 2021

The most recent change to #719 may take care of it. I'm still going to try for a less wordy version, though.

@MihaiSurdeanu
Copy link
Contributor

Thanks @kwalcock !

@enoriega
Copy link
Member

enoriega commented Jan 6, 2021

@kwalcock I can confirm I no longer see those error in the log. I think this is safe to be closed. Thanks!

@enoriega enoriega closed this as completed Jan 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants