DictionaryAnnotator fails to annotate at the end of a sentence #1240
Comments
The fix might be to change List<Token> tokensToSentenceEnd = tokens.subList(i, tokens.size() - 1); to List<Token> tokensToSentenceEnd = tokens.subList(i, tokens.size()); in DictionaryAnnotator. But I'm not sure whether the "- 1" was put there for a reason.
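To illustrate why the "- 1" drops the final token: the second argument of List.subList is an exclusive end index, so passing tokens.size() - 1 already stops one element short of the end. The sketch below uses plain strings instead of Token objects; the class name, method names, and sample tokens are made up for illustration and are not part of DKPro Core.

```java
import java.util.Arrays;
import java.util.List;

public class SubListEndIndexDemo {
    // Variant quoted in the issue: the exclusive end index size() - 1
    // leaves out the last element of the list.
    static List<String> toSentenceEndCurrent(List<String> tokens, int i) {
        return tokens.subList(i, tokens.size() - 1);
    }

    // Proposed variant: size() as the exclusive end index keeps the last element.
    static List<String> toSentenceEndProposed(List<String> tokens, int i) {
        return tokens.subList(i, tokens.size());
    }

    public static void main(String[] args) {
        // Hypothetical token sequence; note that it does not end in punctuation.
        List<String> tokens = Arrays.asList("I", "live", "in", "New", "York");
        System.out.println(toSentenceEndCurrent(tokens, 3));  // prints [New]
        System.out.println(toSentenceEndProposed(tokens, 3)); // prints [New, York]
    }
}
```

With the current call, a dictionary entry that spans the last token of a sentence can never be seen by the matcher, whether or not that last token is punctuation.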
@jkirsch It could be that the author of the code assumed that a sentence always ends in a punctuation token and for some reason wanted to exclude that - no idea. Would you like to do a PR with your fix and a unit test?
jkirsch added a commit to jkirsch/dkpro-core that referenced this issue May 27, 2018
- removes the code to drop the last token within a sentence when looking for dictionary matches
reckart added a commit that referenced this issue May 29, 2018
#1240 - Ensures DictionaryAnnotator matches all token
reckart pushed a commit that referenced this issue May 29, 2018
- removes the code to drop the last token within a sentence when looking for dictionary matches
reckart added a commit that referenced this issue Jul 6, 2018
* master: (1026 commits), including #1240 - Ensures DictionaryAnnotator matches all token
reckart added a commit that referenced this issue Jul 8, 2018
* 1.9.x: (28 commits), including #1240 - Ensures DictionaryAnnotator matches all token
Given a dictionary of
And a test sentence which shows the match at the end
The dictionary annotator fails to match.
Sample test that shows the error, adapted from de.tudarmstadt.ukp.dkpro.core.dictionaryannotator.DictionaryAnnotatorTest
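The dictionary, sentence, and test from the original report are not reproduced here. As a rough stand-in, the following self-contained sketch simulates a longest-match dictionary lookup over a token list and shows how a phrase at the end of a sentence is missed when the last token is dropped. It is not the DKPro Core implementation: the class, the findMatches method, the dropLastToken flag, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DictionaryMatchSketch {
    // Hypothetical matching loop: at each token position, look for the longest
    // dictionary phrase that starts there. dropLastToken = true mimics the
    // subList(i, tokens.size() - 1) call from the issue.
    static List<String> findMatches(List<String> tokens, Set<String> phrases,
            boolean dropLastToken) {
        List<String> matches = new ArrayList<>();
        int end = dropLastToken ? tokens.size() - 1 : tokens.size();
        for (int i = 0; i < tokens.size(); i++) {
            List<String> tokensToSentenceEnd = tokens.subList(i, end);
            // Try the longest candidate first, then shrink it token by token.
            for (int len = tokensToSentenceEnd.size(); len > 0; len--) {
                String candidate = String.join(" ", tokensToSentenceEnd.subList(0, len));
                if (phrases.contains(candidate)) {
                    matches.add(candidate);
                    i += len - 1; // skip the tokens covered by this match
                    break;
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up dictionary and sentence; the sentence ends with a dictionary phrase.
        Set<String> phrases = new HashSet<>(Arrays.asList("live", "New York"));
        List<String> tokens = Arrays.asList("I", "live", "in", "New", "York");
        System.out.println(findMatches(tokens, phrases, true));  // prints [live]
        System.out.println(findMatches(tokens, phrases, false)); // prints [live, New York]
    }
}
```

A regression test for the real annotator would presumably follow the same shape: a sentence whose final tokens form a dictionary entry, and an assertion that the entry is annotated.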