Problem with resource files in fat jar #759

Closed
bgyori opened this issue Aug 24, 2021 · 6 comments

Comments

@bgyori
Contributor

bgyori commented Aug 24, 2021

I am trying to build a fat JAR of the latest Reach using sbt assembly and then use it to process text with the ApiRuler's annotate_text method (this has been our standard integration approach; nothing has changed here). If I run it from the Reach repo's main folder, it works: it loads all the resource files and returns a result:

[dynet] Loading DyNet from /tmp/libdynet_swig-2518909563452203147.so...
[dynet] random seed: 2843805941
[dynet] allocating memory: 512,512,512,512MB
[dynet] memory allocation done.
22:41:27.961 [main] INFO  o.c.p.m.DeepLearningPolarityClassifier - Loading saved model SavedLSTM_WideBound_u_tag ...
22:41:28.282 [main] INFO  o.c.p.m.DeepLearningPolarityClassifier - Loading model finished!
22:41:44.011 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
22:41:44.236 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.2 sec].
22:41:44.249 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
22:41:44.285 [main] INFO  org.clulab.sequences.LexiconNER - Beginning to load the KBs for the rule-based NER...
22:41:44.303 [main] INFO  org.clulab.sequences.LexiconNER - Loaded OVERRIDE matchers for all labels.  The number of entries added to the first layer was 750.
22:41:44.304 [main] INFO  o.c.p.bionlp.BioNLPProcessor - Loading BioProcess...
22:41:45.838 [main] INFO  o.c.p.bionlp.BioNLPProcessor - Done. Read org.clulab.processors.bionlp.ner.ReachSingleStandardKbSource$$anon$1@7e27b77a.lineCount lines from bio_process.tsv
...

However, if I run it from any other folder, the process fails while loading the first resource file:

[dynet] Loading DyNet from /tmp/libdynet_swig-8493435714969305050.so...
[dynet] random seed: 2078540065
[dynet] allocating memory: 512,512,512,512MB
[dynet] memory allocation done.
22:41:00.169 [main] INFO  o.c.p.m.DeepLearningPolarityClassifier - Loading saved model SavedLSTM_WideBound_u_tag ...
22:41:00.492 [main] INFO  o.c.p.m.DeepLearningPolarityClassifier - Loading model finished!
22:41:16.736 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
22:41:16.973 [main] INFO  e.s.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger ... done [0.2 sec].
22:41:16.988 [main] INFO  e.s.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
22:41:17.024 [main] INFO  org.clulab.sequences.LexiconNER - Beginning to load the KBs for the rule-based NER...
22:41:17.041 [main] INFO  org.clulab.sequences.LexiconNER - Loaded OVERRIDE matchers for all labels.  The number of entries added to the first layer was 750.
22:41:17.042 [main] INFO  o.c.p.bionlp.BioNLPProcessor - Loading BioProcess...

-> process exits here.

So I suspect the issue is with the path by which the bioresources are referred to in the context of a JAR file.
In particular, I am wondering if this line: https://github.com/clulab/reach/blob/master/bioresources/src/main/resources/application.conf#L1 could be responsible for the issue.

Any help would be appreciated!

@bgyori
Contributor Author

bgyori commented Aug 24, 2021

As a minor side issue, note that when I ran this from the Reach repo folder, the log message says

22:41:45.838 [main] INFO  o.c.p.bionlp.BioNLPProcessor - Done. Read org.clulab.processors.bionlp.ner.ReachSingleStandardKbSource$$anon$1@7e27b77a.lineCount lines from bio_process.tsv

so the line count is not shown as intended.
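
The shape of that message (an object's default `toString` followed by a literal `.lineCount`) is what Scala string interpolation produces when a member access is written without braces. The sketch below reproduces that pattern. It is an assumption about the likely cause, not the actual Reach code, and `KbSource` and `InterpolationDemo` are hypothetical names used only for illustration.

```scala
// Hypothetical stand-in for the KB source object named in the log; not Reach's actual class.
class KbSource(val lineCount: Int)

object InterpolationDemo {
  // Without braces, only `source` is interpolated (via toString);
  // ".lineCount" is kept as literal text.
  def buggyMessage(source: KbSource): String =
    s"Done. Read $source.lineCount lines from bio_process.tsv"

  // With braces, the whole expression source.lineCount is evaluated.
  def fixedMessage(source: KbSource): String =
    s"Done. Read ${source.lineCount} lines from bio_process.tsv"
}
```

Calling `buggyMessage` yields text of the form `Read KbSource@<hash>.lineCount lines`, whereas `fixedMessage` prints the number.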

@kwalcock
Member

At runtime (except maybe during testing, so testtime), nothing should refer to the directory structure of the source code. So, I think you've found a problem. I'm looking into what to do about it.
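
To illustrate the distinction, here is a minimal sketch that contrasts reading a file by a path relative to the working directory with reading the same content from the classpath, which is where resources end up inside a fat JAR. It is not the fix applied in Reach; `ResourceLoading` and both method names are hypothetical, and any resource path passed in is the caller's assumption.

```scala
import java.io.{File, FileNotFoundException}
import scala.io.Source

object ResourceLoading {
  // Resolved against the current working directory, so it only succeeds
  // when the process is started from a folder where relativePath exists.
  def linesFromFile(relativePath: String): List[String] = {
    val src = Source.fromFile(new File(relativePath), "UTF-8")
    try src.getLines().toList finally src.close()
  }

  // Resolved through the class loader, so it works the same way from
  // any working directory, including when the resource is packed in a JAR.
  def linesFromClasspath(resourcePath: String): List[String] = {
    val stream = Option(getClass.getClassLoader.getResourceAsStream(resourcePath))
      .getOrElse(throw new FileNotFoundException(
        s"Resource not found on classpath: $resourcePath"))
    val src = Source.fromInputStream(stream, "UTF-8")
    try src.getLines().toList finally src.close()
  }
}
```

Throwing an explicit exception when the resource is missing also avoids a silent exit like the one in the second log above.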

@MihaiSurdeanu
Contributor

MihaiSurdeanu commented Aug 24, 2021 via email

@kwalcock
Member

The logging statement does have a bug as well, but it is readily fixed.

@kwalcock
Member

This is being addressed with #760.

@bgyori
Contributor Author

bgyori commented Aug 29, 2021

Thank you! I have now been using this and everything seems to work.

@bgyori bgyori closed this as completed Aug 29, 2021