Introduce "order" feature on tokens #1152

Closed
1 task done
reckart opened this issue Oct 26, 2017 · 0 comments

reckart commented Oct 26, 2017

In addition to the "form" feature, we should also introduce some kind of "order" or "index" feature to indicate the order in which tokens appear in the text, in particular if tokens have the same offsets (e.g. à -> a + a).

  • introduce feature

A problem that will appear: selectCovered will no longer work very well once several tokens can share the same offsets.

Deferring the following because we currently have no segmenters that analyze multiple tokens for a given position. When these are needed, please open a new issue:

  • (deferred) make sure feature gets filled by readers and segmenters
  • (deferred) make sure feature gets used when iterating over tokens (maybe a custom token index that uses begin/end/index for sorting on the UIMA level?)
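
To make the begin/end/index sorting idea concrete, here is a minimal sketch in plain Java. It does not use the UIMA index API or the actual DKPro Core Token type: `Tok`, `BY_POSITION_THEN_ORDER`, `sampleTokens` and `sortedForms` are hypothetical names made up for illustration. The sample data (French "du" analyzed as "de" + "le" at the same offsets) and the assumption that "order" counts up from 0 are also only illustrative. Sorting "end" in descending order follows the usual UIMA annotation index convention.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TokenOrderSketch {

    // Hypothetical stand-in for a token annotation (NOT the DKPro Core Token type).
    static final class Tok {
        final int begin;
        final int end;
        final int order; // assumed: position among tokens sharing the same offsets
        final String form;

        Tok(int begin, int end, int order, String form) {
            this.begin = begin;
            this.end = end;
            this.order = order;
            this.form = form;
        }
    }

    // Sort by begin (ascending), then end (descending), then order (ascending).
    // The "order" value only matters as a tie-breaker when the offsets are identical.
    static final Comparator<Tok> BY_POSITION_THEN_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        if (a.end != b.end) {
            return Integer.compare(b.end, a.end);
        }
        return Integer.compare(a.order, b.order);
    };

    // Illustrative data for the text "du pain": "du" (0-2) is analyzed as
    // "de" + "le", both carrying the offsets of "du". Deliberately shuffled.
    static List<Tok> sampleTokens() {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(3, 7, 0, "pain"));
        tokens.add(new Tok(0, 2, 1, "le"));
        tokens.add(new Tok(0, 2, 0, "de"));
        return tokens;
    }

    // Returns the token forms in sorted order without modifying the input list.
    static List<String> sortedForms(List<Tok> tokens) {
        List<Tok> copy = new ArrayList<>(tokens);
        copy.sort(BY_POSITION_THEN_ORDER);
        List<String> forms = new ArrayList<>();
        for (Tok t : copy) {
            forms.add(t.form);
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(sortedForms(sampleTokens())); // prints [de, le, pain]
    }
}
```

Without the third comparison, "de" and "le" would compare as equal and their relative order would depend on the order in which they were added.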
@reckart reckart added this to the 1.9.0 milestone Oct 26, 2017
@reckart reckart self-assigned this Oct 26, 2017
@reckart reckart modified the milestones: 1.9.0, 1.10.0 Nov 14, 2017
@reckart reckart modified the milestones: 1.10.0, 1.11.0 Jul 28, 2018
@reckart reckart modified the milestones: 1.11.0, 1.12.0 Jan 12, 2019
reckart added a commit that referenced this issue Jun 3, 2019
- Added order feature and updated unit tests
- Feature is not really used anywhere yet and is always 0
@reckart reckart modified the milestones: 1.12.0, 1.11.0 Jun 3, 2019
reckart added a commit that referenced this issue Jun 3, 2019
…ure-on-tokens

#1152 - Introduce "order" feature on tokens
@reckart reckart closed this as completed Jun 3, 2019
reckart added a commit that referenced this issue Jun 4, 2019
* master:
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1362 - NifWriter does not write out NE identifier
  #1362 - NifWriter does not write out NE identifier
  #1152 - Introduce "order" feature on tokens
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  Added parameter to enable lower-cased lookup of first word in sentence.

% Conflicts:
%	dkpro-core-io-conll-asl/src/test/java/org/dkpro/core/io/conll/ConllUReaderTest.java
%	dkpro-core-io-json-asl/src/test/resources/conll/2000/chunk2000_ref.json
%	dkpro-core-io-xmi-asl/src/test/resources/xmi/english.xmi
reckart added a commit that referenced this issue Jun 4, 2019
* master:
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1362 - NifWriter does not write out NE identifier
  #1362 - NifWriter does not write out NE identifier
  #1152 - Introduce "order" feature on tokens
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1367 - Support TCF orthography via SofaChangeAnnotations
  #1041 - Add parameter to enable lower-cased lookup of first word in sentence in SfstAnnotator
  #1327 - Update LIF support
  #1366 - Added support in CONLL-U reader for document and paragraph IDs
  #1367 - Support TCF orthography via SofaChangeAnnotations
  Forgot to commit the list declaration
  Warn if CONLL-U file contains multiple documents
  Added support in CONLL-U reader for document and paragraph IDs
  #186 - Change artifactId to "dkpro-core-XXX"
  #1299 - Update to CoreNLP 3.9.2
  #1337 - Connl2012 writer uses WordSense, but does not declare it
  #1299 - Update to CoreNLP 3.9.2
  Added parameter to enable lower-cased lookup of first word in sentence.