DictionaryAnnotator fails to annotate at the end of a sentence #1240

Closed
jkirsch opened this issue May 27, 2018 · 2 comments

@jkirsch
Contributor

jkirsch commented May 27, 2018

Given a dictionary of

  • John Silver

And a test sentence in which the match occurs at the very end

I am John Silver

The dictionary annotator fails to match.

Sample test that shows the error, adapted from de.tudarmstadt.ukp.dkpro.core.dictionaryannotator.DictionaryAnnotatorTest

@Test
public void testEndOfSentence() throws Exception
{
    AnalysisEngine ae = createEngine(DictionaryAnnotator.class,
            DictionaryAnnotator.PARAM_ANNOTATION_TYPE, NamedEntity.class,
            DictionaryAnnotator.PARAM_VALUE, "PERSON",
            DictionaryAnnotator.PARAM_MODEL_LOCATION, "src/test/resources/persons.txt");

    JCas jcas = JCasFactory.createJCas();
    TokenBuilder<Token, Sentence> tb = new TokenBuilder<>(Token.class, Sentence.class);
    tb.buildTokens(jcas, "I am John Silver");

    ae.process(jcas);

    NamedEntity ne = selectSingle(jcas, NamedEntity.class);
    assertEquals("PERSON", ne.getValue());
    assertEquals("John Silver", ne.getCoveredText());
}
@jkirsch
Contributor Author

jkirsch commented May 27, 2018

The fix might be to change

List<Token> tokensToSentenceEnd = tokens.subList(i, tokens.size() - 1);

to

List<Token> tokensToSentenceEnd = tokens.subList(i, tokens.size());

in DictionaryAnnotator. But I'm not sure whether the `- 1` was put there for a reason.
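
To illustrate why the upper bound matters: `List.subList(from, to)` treats `to` as exclusive, so `subList(i, tokens.size() - 1)` never includes the last token. The following is only a minimal sketch using plain strings instead of `Token` annotations. The class `SubListBoundDemo` and the method `containsPhrase` are hypothetical stand-ins and are not the actual DictionaryAnnotator code.

```java
import java.util.Arrays;
import java.util.List;

public class SubListBoundDemo {
    // Hypothetical, simplified stand-in for a dictionary matching loop.
    // Only tokens in [0, endExclusive) are visible to the matcher.
    static boolean containsPhrase(List<String> tokens, List<String> phrase, int endExclusive) {
        for (int i = 0; i < endExclusive; i++) {
            // Same shape as the line discussed above; the end index is exclusive.
            List<String> tokensToSentenceEnd = tokens.subList(i, endExclusive);
            if (tokensToSentenceEnd.size() >= phrase.size()
                    && tokensToSentenceEnd.subList(0, phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "am", "John", "Silver");
        List<String> phrase = Arrays.asList("John", "Silver");

        // Bound of size() - 1: "Silver" is cut off, so the phrase is not found.
        System.out.println(containsPhrase(tokens, phrase, tokens.size() - 1)); // prints "false"

        // Bound of size(): the whole sentence is visible and the phrase matches.
        System.out.println(containsPhrase(tokens, phrase, tokens.size())); // prints "true"
    }
}
```

With the `size() - 1` bound, a match is only found when at least one token (for example a punctuation mark) follows the dictionary entry.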

@reckart
Member

reckart commented May 27, 2018

@jkirsch It could be that the author of the code assumed that a sentence always ends in a punctuation token and for some reason wanted to exclude it. No idea. Would you like to open a PR with your fix and unit test?

jkirsch added a commit to jkirsch/dkpro-core that referenced this issue May 27, 2018
- removes the code to drop the last token within a sentence when looking for dictionary matches
@reckart reckart added the 🐛Bug and Module-dictionaryannotator labels May 29, 2018
@reckart reckart added this to the 1.9.3 milestone May 29, 2018
reckart added a commit that referenced this issue May 29, 2018
#1240 - Ensures DictionaryAnnotator matches all token
reckart pushed a commit that referenced this issue May 29, 2018
- removes the code to drop the last token within a sentence when looking for dictionary matches
@reckart reckart closed this as completed May 29, 2018
reckart added a commit that referenced this issue Jul 6, 2018
* master: (1026 commits)
  #1240 - Ensures DictionaryAnnotator matches all token
reckart added a commit that referenced this issue Jul 8, 2018
* 1.9.x: (28 commits)
  #1240 - Ensures DictionaryAnnotator matches all token