java.lang.ClassCastException: class com.fasterxml.jackson.databind.node.TextNode cannot be cast to class com.fasterxml.jackson.databind.node.ArrayNode (com.fasterxml.jackson.databind.node.TextNode and com.fasterxml.jackson.databind.node.ArrayNode are in unnamed module of loader 'app')
at io.anserini.collection.AclAnthology$Document.<init>(AclAnthology.java:158) ~[anserini-0.20.0-fatjar.jar:?]
at io.anserini.collection.AclAnthology$Segment.readNext(AclAnthology.java:115) ~[anserini-0.20.0-fatjar.jar:?]
at io.anserini.collection.FileSegment$1.hasNext(FileSegment.java:136) ~[anserini-0.20.0-fatjar.jar:?]
at io.anserini.index.IndexCollection$LocalIndexerThread.run(IndexCollection.java:298) [anserini-0.20.0-fatjar.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
My use case is using castorini/covidex with ACL documents. My workaround was to index the bibtex of the aclanthology, but there is a lot of brackets and LaTeX things in the text, so I'd rather go with this solution.
It seems that the venue names are now stored in volume.get("venue"), and that volume.get("venues") is a YAML pointer (?) to volume.get("venue").
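To illustrate the kind of tolerant handling that would avoid the cast failure, here is a minimal Python sketch. It is not the Anserini code, and the record shapes are assumptions based only on the traceback (a text value found where an array was expected). The helper name `normalize_venues` and the sample records are hypothetical.

```python
from typing import Any, List


def normalize_venues(volume: dict) -> List[str]:
    """Return the venue names of a volume record as a list of strings.

    Assumption: older data stores a list under "venues", while newer data
    may store a single string there and/or under "venue".
    """
    value: Any = volume.get("venues")
    if value is None:
        # Fall back to the singular key if the plural one is missing.
        value = volume.get("venue")
    if value is None:
        return []
    if isinstance(value, str):
        # A bare string is wrapped so callers always receive a list.
        return [value]
    return [str(v) for v in value]


if __name__ == "__main__":
    # Hypothetical records illustrating both shapes.
    old_style = {"venues": ["ACL", "EMNLP"]}
    new_style = {"venue": "ACL", "venues": "ACL"}
    print(normalize_venues(old_style))
    print(normalize_venues(new_style))
```

On the Java side, the equivalent idea would presumably be to check the node type (for example with Jackson's `JsonNode.isArray()` or `JsonNode.isTextual()`) before casting to `ArrayNode`, but I have not verified this against the current data.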
When following the "Indexing the ACL Anthology with Anserini" tutorial, the indexing step raises the ClassCastException traceback above (see AclAnthology.java:158).
There seems to have been a change in how venues are processed in acl-org/acl-anthology, which breaks Anserini's collection.AclAnthology.
Steps to reproduce:
git clone https://github.com/acl-org/acl-anthology
conda create -n acl_anth python=3.8
conda activate acl_anth
cd acl-anthology
pip install -r bin/requirements.txt
python bin/create_hugo_yaml.py
pip install pyserini
python -m pyserini.index -collection AclAnthology -generator AclAnthologyGenerator -threads 8 -input build/data/ -index index/lucene-index-acl-paragraph -storePositions -storeDocvectors -storeContents -storeRaw -optimize
Everything works, however, when using an acl-anthology version from around the time the "Indexing the ACL Anthology with Anserini" tutorial was written.