Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator #1041

Merged: 10 commits, Jun 4, 2019

ramonziai
Contributor

Many full-form lexicons underlying morphological models do not handle uppercase versions of words. The result is that uppercase forms are not found and cannot be analyzed. This fix adds a parameter which enables the lookup of lowercase versions of sentence-initial words, depending on the locale of the document language.
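
To make the described behavior concrete, here is a minimal sketch of a locale-aware fallback for sentence-initial tokens. It is not the code from this PR: the class `SentenceInitialLookup`, its constructor arguments, and the in-memory `Set` standing in for the full-form lexicon are all hypothetical, and trying the original form before the lower-cased one is an assumption rather than something stated above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration class; not part of DKPro Core or SfstAnnotator.
class SentenceInitialLookup {
    private final Set<String> lexicon;        // stand-in for a full-form lexicon
    private final boolean lowercaseFirstWord; // the kind of switch this PR adds
    private final Locale locale;              // locale of the document language

    SentenceInitialLookup(boolean lowercaseFirstWord, Locale locale, String... entries) {
        this.lowercaseFirstWord = lowercaseFirstWord;
        this.locale = locale;
        this.lexicon = new HashSet<>(Arrays.asList(entries));
    }

    // Returns the form that was found in the lexicon, or empty if none matched.
    Optional<String> lookup(String token, boolean sentenceInitial) {
        // Assumption: the original surface form is tried first.
        if (lexicon.contains(token)) {
            return Optional.of(token);
        }
        // Fall back to the lower-cased form only for the first word of a
        // sentence, and only when the switch is enabled.
        if (lowercaseFirstWord && sentenceInitial) {
            String lowered = token.toLowerCase(locale);
            if (lexicon.contains(lowered)) {
                return Optional.of(lowered);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SentenceInitialLookup german =
                new SentenceInitialLookup(true, Locale.GERMAN, "das", "Haus");
        System.out.println(german.lookup("Das", true));  // sentence-initial token
        System.out.println(german.lookup("Das", false)); // same token mid-sentence
    }
}
```

The locale matters because case mapping is language-dependent: `String.toLowerCase(Locale)` maps an uppercase `I` to a dotless `ı` under a Turkish locale but to `i` elsewhere, so lower-casing with the wrong locale can miss a lexicon entry.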

@reckart
Member

reckart commented Apr 7, 2017

Thanks, that looks very useful. I see the default value of the parameter is "false". Should it maybe be "true"?

Did you notice our contribution guidelines? It would be great if you could provide us with a CLA so we can merge the PR.

@reckart reckart added this to the 1.10.0 milestone Jun 7, 2018
@reckart reckart added Module-sfst ⭐️ Enhancement New feature or request labels Jun 7, 2018
@reckart reckart modified the milestones: 1.10.0, 1.11.0 Jul 28, 2018
* master: (653 commits)
  #1299 - Update to CoreNLP 3.9.2
  #1337 - Connl2012 writer uses WordSense, but does not declare it
  #1299 - Update to CoreNLP 3.9.2
  No issue. Fixed JavaDoc error.
  #1340 - Upgrade dependencies (1.11.0)
  #1358 - Improve error messages in TSV3
  #1357 - Upgrade to ICU4J 64.2
  #1340 - Upgrade dependencies (1.11.0)
  #1343 - Segmenter for Chinese
  No issue. Formatting and remove dewac test since dewac model is no longer available.
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  #186 - Change artifactId to "dkpro-core-XXX"
  ...

% Conflicts:
%	dkpro-core-sfst-gpl/src/main/java/de/tudarmstadt/ukp/dkpro/core/sfst/SfstAnnotator.java
@reckart
Member

reckart commented May 7, 2019

@ramonziai @rziai Since this is a commit to a GPL module, we cannot simply accept the PR under the contribution section of the Apache License - would you be able to provide a CLA?

@ukp-svc-jenkins

Can one of the admins verify this patch?

@ramonziai
Contributor Author

> @ramonziai @rziai Since this is a commit to a GPL module, we cannot simply accept the PR under the contribution section of the Apache License - would you be able to provide a CLA?

Yes, I just sent a signed ICLA to licenses@ukp.informatik.tu-darmstadt.de. Will that be sufficient?

@reckart
Member

reckart commented May 7, 2019

Jenkins, can you test this please?

* master:
  #186 - Change artifactId to "dkpro-core-XXX"
@reckart
Member

reckart commented Jun 1, 2019

@ramonziai yep, thanks :)

@reckart reckart changed the title Added parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator Jun 2, 2019
…entence in SfstAnnotator

- Formatting
- Adjusting to changes in DKPro Core package names
@reckart
Member

reckart commented Jun 2, 2019

Jenkins, can you test this please?

@reckart
Member

reckart commented Jun 2, 2019

Jenkins, can you test this please?

- Commented out logging test because it fails on Jenkins/Windows
@reckart
Member

reckart commented Jun 3, 2019

Jenkins, can you test this please?

@reckart
Member

reckart commented Jun 3, 2019

Jenkins, can you test this please?

…entence in SfstAnnotator

- Move changes to the right source file
@reckart
Member

reckart commented Jun 4, 2019

Jenkins, can you test this please?

@ukp-svc-jenkins

68% (-4.41%) vs master 73%

@reckart reckart merged commit 2776ac5 into dkpro:master Jun 4, 2019
reckart added a commit that referenced this pull request Jun 4, 2019
… of github.com:dkpro/dkpro-core into bugfix/1362-NifWriter-does-not-write-out-NE-identifier

* 'bugfix/1362-NifWriter-does-not-write-out-NE-identifier' of github.com:dkpro/dkpro-core:
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  Added parameter to enable lower-cased lookup of first word in sentence.
reckart added a commit that referenced this pull request Jun 4, 2019
* master:
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1362 - NifWriter does not write out NE identifier
  #1362 - NifWriter does not write out NE identifier
  #1152 - Introduce "order" feature on tokens
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  Added parameter to enable lower-cased lookup of first word in sentence.

% Conflicts:
%	dkpro-core-io-conll-asl/src/test/java/org/dkpro/core/io/conll/ConllUReaderTest.java
%	dkpro-core-io-json-asl/src/test/resources/conll/2000/chunk2000_ref.json
%	dkpro-core-io-xmi-asl/src/test/resources/xmi/english.xmi
reckart added a commit that referenced this pull request Jun 4, 2019
* master:
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1362 - NifWriter does not write out NE identifier
  #1362 - NifWriter does not write out NE identifier
  #1152 - Introduce "order" feature on tokens
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1367 - Support TCF orthography via SofaChangeAnnotations
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1327 - Update LIF support
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1367 - Support TCF orthography via SofaChangeAnnotations
  Forgot to commit the list declaration
  Warn if CONLL-U file contains multiple documents
  Added support in CONLL-U reader for document and paragraph IDs
  #186 - Change artifactId to "dkpro-core-XXX"
  #1299 - Update to CoreNLP 3.9.2
  #1337 - Connl2012 writer uses WordSense, but does not declare it
  #1299 - Update to CoreNLP 3.9.2
  Added parameter to enable lower-cased lookup of first word in sentence.
Labels
⭐️ Enhancement New feature or request Module-sfst
3 participants