NxmLReader problem probably associated to assembly #718

Closed
enoriega opened this issue Jan 2, 2021 · 19 comments
@enoriega
Member

enoriega commented Jan 2, 2021

When I turn on assembly in the config, I sometimes see an NxmlReader error that I don't see otherwise. The stack trace points back to some of the assembly methods. I suspect an unhandled corner case in this class of files crashes their processing.

The error is not catastrophic, as processing of the other files carries on.

I attach the stack trace of the exception and an nxml file that triggers it for replication purposes.

PMC6797981.nxml.txt

Stack trace:

 ¡¡¡ NxmlReader error !!!                                                      

paper: PMC6797981

                                                              

error:                                                                         
java.lang.reflect.InvocationTargetException

stack trace:                                                                   
sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:398)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:354)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:370)
org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:23)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:111)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19) 
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Range.foreach(Range.scala:158)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:99)
org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:45)
scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:773)
scala.collection.Iterator.foreach(Iterator.scala:941)
scala.collection.Iterator.foreach$(Iterator.scala:941)
scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
scala.collection.IterableLike.foreach(IterableLike.scala:74)
scala.collection.IterableLike.foreach$(IterableLike.scala:73)
scala.collection.AbstractIterable.foreach(Iterable.scala:56)
scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:772)
org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:34)
org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:56)
org.clulab.odin.ExtractorEngine.extractByType(ExtractorEngine.scala:63)
org.clulab.reach.ReachSystem.extractEventsFrom(ReachSystem.scala:213)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:89)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:155)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:149)
org.clulab.reach.ReachSystem.extractFrom(ReachSystem.scala:73)
org.clulab.reach.PaperReader$.getMentionsFromEntry(PaperReader.scala:144)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:136)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3(ReachCLI.scala:90)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3$adapted(ReachCLI.scala:84)
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1056)
scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1053)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:170)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
@enoriega enoriega added the bug label Jan 2, 2021
@kwalcock
Member

kwalcock commented Jan 5, 2021

This stack trace is deceptive, probably because of the reflection involved. It is hiding an array index out of bounds exception which is thrown from LinguisticPolarityEngine line 46. At that point there is a sentence with a token interval [3, 29] but incoming edges only from [0, 28].

      val prepc_byed = (evt.tokenInterval filter (tok => deps.getIncomingEdges(tok).map(_._2).contains("advcl_by"))).toSet
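For anyone reading a similar trace: an InvocationTargetException is only a wrapper that java.lang.reflect.Method.invoke throws around whatever the invoked method threw, so the real exception is in its cause chain. The following is a minimal sketch of digging it out; `RootCause`, `rootCause` and `Demo` are hypothetical names for illustration and are not part of reach or odin.

```scala
import java.lang.reflect.InvocationTargetException

// Hypothetical class whose method fails the way the hidden exception did.
class Demo {
  def boom(): Int = Array(1, 2, 3)(5) // throws ArrayIndexOutOfBoundsException
}

object RootCause {
  // Follow getCause links until the innermost throwable is reached.
  @annotation.tailrec
  def rootCause(t: Throwable): Throwable =
    if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)

  def main(args: Array[String]): Unit = {
    val method = classOf[Demo].getMethod("boom")
    try method.invoke(new Demo)
    catch {
      case e: InvocationTargetException =>
        // The wrapper is what shows up in the log; the cause is the real problem.
        println(s"wrapper: ${e.getClass.getName}")
        println(s"root:    ${rootCause(e).getClass.getName}")
    }
  }
}
```

Logging the root cause together with its own stack trace, rather than only the wrapper, would likely have pointed straight at the failing line.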

@kwalcock
Member

kwalcock commented Jan 5, 2021

This may well have been fixed by clulab/processors#428, which should be in processors 8.2.2, but that is exactly the version that is supposed to be in use...

@MihaiSurdeanu
Contributor

Indeed. This should be fixed there...
@enoriega: if you can isolate this to a sentence, I will debug this.

@kwalcock
Member

kwalcock commented Jan 5, 2021 via email

@kwalcock
Member

kwalcock commented Jan 5, 2021

It is this sentence, which is almost the only one in the attached file:

This pleiotropic inflammatory cytokine is produced by T cells, monocytes, macrophages and synovial fibroblasts, and mediates various functions by binding to its receptor IL-6R ( 40 ).

AFAICT, the problem is in the processors project, in CoreNLPProcessor.scala around lines 129-130. The two lines below do not pass in a preferredSize when they call CoreNLPUtils.toDirectedGraph, unlike the code in FastNLPProcessor's parseWithStanford method, which could serve as a template.

doc.sentences(offset).setDependencies(GraphMap.UNIVERSAL_BASIC, CoreNLPUtils.toDirectedGraph(basicDeps, in))
doc.sentences(offset).setDependencies(GraphMap.UNIVERSAL_ENHANCED, CoreNLPUtils.toDirectedGraph(enhancedDeps, in))
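To illustrate why the missing size matters, here is a minimal sketch with a hypothetical stand-in for the graph (it is not the processors DirectedGraph API). It assumes the graph sizes its per-token edge arrays from the highest node index that appears in an edge unless told otherwise, so tokens that take part in no dependency at the end of a sentence fall outside the array. That would match the symptom above, where the token interval reaches past the tokens that have incoming-edge entries.

```scala
object GraphSizeSketch {
  // Hypothetical model: one list of (head, label) pairs per token index.
  // edges are (head, dependent, label) triples.
  def incomingEdges(
      edges: Seq[(Int, Int, String)],
      preferredSize: Option[Int]
  ): Array[List[(Int, String)]] = {
    // Without a hint, the size is inferred from the largest index seen in any edge.
    val inferred =
      if (edges.isEmpty) 0
      else edges.map(e => math.max(e._1, e._2)).max + 1
    // With a hint (e.g. the sentence length), never allocate fewer slots than tokens.
    val size = preferredSize.fold(inferred)(p => math.max(p, inferred))
    val incoming = Array.fill(size)(List.empty[(Int, String)])
    for ((head, dep, label) <- edges)
      incoming(dep) = (head, label) :: incoming(dep)
    incoming
  }
}
```

With a four-token sentence whose only edge is `(0, 1, "nsubj")`, the inferred array has two slots, so looking up token 3 fails, while passing `Some(4)` gives every token a (possibly empty) entry.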

Sentence110.nxml.txt

@MihaiSurdeanu
Contributor

Thanks! I'll take a look soon.

@kwalcock
Member

kwalcock commented Jan 5, 2021

@enoriega's keen observational skills are greatly appreciated. Thanks for taking the time to report the problem.

@MihaiSurdeanu
Contributor

Solved in processors PR #439, which is in the process of being tested and then merged.

@enoriega: this means that you have to publishLocal processors 8.2.4-SNAPSHOT, and use this version in reach/processors/build.sbt.
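As a rough sketch of what that looks like: after running `sbt publishLocal` in the processors checkout, the reach build would point at the snapshot. The `procVer` name and the exact artifact list below are assumptions for illustration; the actual reach/processors/build.sbt may be organized differently.

```scala
// reach/processors/build.sbt (fragment; value name and artifact list are assumptions)
val procVer = "8.2.4-SNAPSHOT"

libraryDependencies ++= Seq(
  "org.clulab" %% "processors-main"    % procVer,
  "org.clulab" %% "processors-corenlp" % procVer
)
```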

@MihaiSurdeanu
Contributor

The processors PR has been merged.

@kwalcock
Member

kwalcock commented Jan 5, 2021

This has likely been handled with clulab/processors#439.

@kwalcock kwalcock closed this as completed Jan 5, 2021
@enoriega
Member Author

enoriega commented Jan 5, 2021

I tested again with a freshly cloned processors version 8.2.4-SNAPSHOT and am still getting the same error trace. Could it be that there's still a corner case not covered in processors?

PMC2669449.nxml.txt

 ¡¡¡ NxmlReader error !!!

paper: PMC2669449

error:
java.lang.reflect.InvocationTargetException

stack trace:
sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror2.jinvokeraw(JavaMirrors.scala:398)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaMethodMirror.jinvoke(JavaMirrors.scala:354)
scala.reflect.runtime.JavaMirrors$JavaMirror$JavaVanillaMethodMirror.apply(JavaMirrors.scala:370)
org.clulab.odin.impl.ActionMirror.$anonfun$reflect$1(ActionMirror.scala:23)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:111)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1(Extractor.scala:20)
org.clulab.odin.impl.Extractor.$anonfun$findAllIn$1$adapted(Extractor.scala:19)
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Range.foreach(Range.scala:158)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.odin.impl.Extractor.findAllIn(Extractor.scala:19)
org.clulab.odin.impl.Extractor.findAllIn$(Extractor.scala:18)
org.clulab.odin.impl.GraphExtractor.findAllIn(Extractor.scala:99)
org.clulab.odin.ExtractorEngine.$anonfun$extractFrom$2(ExtractorEngine.scala:45)
scala.collection.TraversableLike$WithFilter.$anonfun$flatMap$2(TraversableLike.scala:773)
scala.collection.Iterator.foreach(Iterator.scala:941)
scala.collection.Iterator.foreach$(Iterator.scala:941)
scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
scala.collection.IterableLike.foreach(IterableLike.scala:74)
scala.collection.IterableLike.foreach$(IterableLike.scala:73)
scala.collection.AbstractIterable.foreach(Iterable.scala:56)
scala.collection.TraversableLike$WithFilter.flatMap(TraversableLike.scala:772)
org.clulab.odin.ExtractorEngine.extract$1(ExtractorEngine.scala:43)
org.clulab.odin.ExtractorEngine.loop$1(ExtractorEngine.scala:34)
org.clulab.odin.ExtractorEngine.extractFrom(ExtractorEngine.scala:56)
org.clulab.reach.assembly.sieves.SieveUtils$.$anonfun$assemblyViaRules$4(Sieves.scala:540)
scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:244)
scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
scala.collection.TraversableLike.flatMap(TraversableLike.scala:244)
scala.collection.TraversableLike.flatMap$(TraversableLike.scala:241)
scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
org.clulab.reach.assembly.sieves.SieveUtils$.assemblyViaRules(Sieves.scala:534)
org.clulab.reach.assembly.sieves.PrecedenceSieves.applyPrecedenceRules(Sieves.scala:63)
org.clulab.reach.assembly.sieves.PrecedenceSieves.intrasententialRBPrecedence(Sieves.scala:103)
org.clulab.reach.assembly.Assembler$.$anonfun$applySieves$2(Assembler.scala:146)
org.clulab.reach.assembly.sieves.AssemblySieve$$anon$1.apply(AssemblySieve.scala:32)
org.clulab.reach.assembly.sieves.SieveMixture.apply(AssemblySieve.scala:38)
org.clulab.reach.assembly.sieves.SieveMixture.apply(AssemblySieve.scala:43)
org.clulab.reach.assembly.Assembler$.applySieves(Assembler.scala:158)
org.clulab.reach.assembly.Assembler$.apply(Assembler.scala:116)
org.clulab.reach.ReachCLI.doAssembly(ReachCLI.scala:124)
org.clulab.reach.ReachCLI.outputMentions(ReachCLI.scala:184)
org.clulab.reach.ReachCLI.$anonfun$processPaper$1(ReachCLI.scala:143)
org.clulab.reach.ReachCLI.$anonfun$processPaper$1$adapted(ReachCLI.scala:143)
scala.collection.immutable.List.foreach(List.scala:392)
org.clulab.reach.ReachCLI.processPaper(ReachCLI.scala:143)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3(ReachCLI.scala:90)
org.clulab.reach.ReachCLI.$anonfun$processPapers$3$adapted(ReachCLI.scala:84)
scala.collection.parallel.AugmentedIterableIterator.map2combiner(RemainsIterator.scala:116)
scala.collection.parallel.AugmentedIterableIterator.map2combiner$(RemainsIterator.scala:113)
scala.collection.parallel.immutable.ParVector$ParVectorIterator.map2combiner(ParVector.scala:66)
scala.collection.parallel.ParIterableLike$Map.leaf(ParIterableLike.scala:1056)
scala.collection.parallel.Task.$anonfun$tryLeaf$1(Tasks.scala:53)
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
scala.util.control.Breaks$$anon$1.catchBreak(Breaks.scala:67)
scala.collection.parallel.Task.tryLeaf(Tasks.scala:56)
scala.collection.parallel.Task.tryLeaf$(Tasks.scala:50)
scala.collection.parallel.ParIterableLike$Map.tryLeaf(ParIterableLike.scala:1053)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal(Tasks.scala:160)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.internal$(Tasks.scala:157)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:440)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute(Tasks.scala:150)
scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask.compute$(Tasks.scala:149)
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:440)
java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

==========

@kwalcock kwalcock reopened this Jan 5, 2021
@kwalcock
Member

kwalcock commented Jan 5, 2021

I'll check.

@kwalcock
Member

kwalcock commented Jan 6, 2021

It looks like the same error, but it has a completely different cause. This line in reach does not check the bounds correctly:

case outOfBounds if outOfBounds == -1 || outOfBounds > words.size => false
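For context, valid indices into words run from 0 to words.size - 1, so an index equal to words.size slips past the `>` comparison and is treated as in range. The following is a minimal sketch of a stricter guard; `BoundsCheckSketch` and `inBounds` are hypothetical names, and the actual change in #719 may look different.

```scala
object BoundsCheckSketch {
  // Valid token indices are 0 until words.size, so words.size itself is already out of range.
  def inBounds(index: Int, words: Seq[String]): Boolean = index match {
    case outOfBounds if outOfBounds < 0 || outOfBounds >= words.size => false
    case _ => true
  }
}
```

Using `< 0` instead of `== -1` also rejects any other negative index, which seems like the safer reading of the intent.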

@MihaiSurdeanu
Contributor

Can you please try to fix it?
Thank you!

@kwalcock
Member

kwalcock commented Jan 6, 2021

Yes, doing so.

@kwalcock
Member

kwalcock commented Jan 6, 2021

@enoriega, while it is being tested, approved, and merged, you can use the changes from #719 or the kwalcock-fixes branch.

@kwalcock kwalcock closed this as completed Jan 6, 2021
@enoriega
Member Author

enoriega commented Jan 6, 2021

Thanks @kwalcock

@enoriega
Member Author

enoriega commented Jan 6, 2021

I have been running REACH using branch kwalcock-fixes for a while and haven't seen this error. I think it is safe to say it has been fixed now. Thanks @kwalcock

@kwalcock
Member

kwalcock commented Jan 6, 2021

Thanks for the update. We're working on the merge to master and a new release.
