Update datasets API #1384

Closed
reckart opened this issue Jun 12, 2019 · 0 comments

reckart commented Jun 12, 2019

Many of the URLs listed in the dataset descriptions now redirect elsewhere (usually http -> https) and need to be updated since the DatasetFactory cannot deal with redirects yet.

The text file for CC-BY 4.0 has had a whitespace-only change. It would be good to have the option to ignore whitespace when validating plain text files to be more resilient against such changes.
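The whitespace-tolerant validation suggested above could look roughly like the following. This is only a minimal sketch, not the DatasetFactory implementation: the class name `NormalizedTextHash`, both method names, the `ignoreWhitespace` flag, and the choice of SHA-512 over UTF-8 bytes are assumptions made for illustration. It collapses every run of whitespace to a single space before hashing, so a whitespace-only change to a license text no longer changes the checksum.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of DKPro Core.
public class NormalizedTextHash {

    /** Collapses each run of whitespace into one space and trims both ends. */
    public static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    /** Returns the lowercase hex SHA-512 of the text, optionally after normalizing whitespace. */
    public static String sha512Hex(String text, boolean ignoreWhitespace) {
        String input = ignoreWhitespace ? normalizeWhitespace(text) : text;
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is expected on standard JVMs; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Made-up sample strings that differ only in whitespace.
        String original = "Creative Commons\nAttribution 4.0\n";
        String reflowed = "Creative Commons  Attribution 4.0\r\n\r\n";

        // Strict comparison: the hashes differ.
        System.out.println(sha512Hex(original, false).equals(sha512Hex(reflowed, false)));
        // Whitespace-normalized comparison: the hashes match.
        System.out.println(sha512Hex(original, true).equals(sha512Hex(reflowed, true)));
    }
}
```

Normalization like this only makes sense for plain text files. For binary artifacts the strict hash should stay in place, since any byte change there is significant.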

reckart self-assigned this Jun 12, 2019
reckart added this to the 1.11.0 milestone Jun 12, 2019
reckart added a commit that referenced this issue Jun 12, 2019
- Add SHA512 hash support
- Update dataset descriptions
- Add option to calculate hash sum of plain text files using normalized whitespace
- Added option to set a default verification policy via a system property
- Fixed a few bad dataset descriptions
- Updated documentation regarding the new verificationMode option and SHA512
- Improved log messages
reckart added a commit that referenced this issue Jun 12, 2019
- Add GUM 5.0.0 UD dataset
reckart added a commit that referenced this issue Jun 12, 2019
- Update GUM 5.0.0 UD dataset license info
reckart added a commit that referenced this issue Jun 12, 2019
- Removed comment which shouldn't apply to GUM UD version
reckart closed this as completed Jun 17, 2019
reckart added a commit that referenced this issue Jul 4, 2019
* master:
  #1382 - TEI reader seems not to be trimming whitespace
  #1378 - BratReader crashes when an annotation covers more than two spans of text
  #1379 - Add generic XML types
  #1382 - TEI reader seems not to be trimming whitespace
  #1384 - Update datasets API
  #1384 - Update datasets API
  #1384 - Update datasets API
  #1384 - Update datasets API
reckart added a commit that referenced this issue Jul 19, 2019
* 1.11.x: (371 commits)
  No issue. Set version to 1.11.0-SNAPSHOT.
  No issue. Fix checkstyle issue.
  No issue. Fix more JavaDoc issues.
  No issue. Upgrade to DKPro Meta 0.2.0.
  #1382 - TEI reader seems not to be trimming whitespace
  #1382 - TEI reader seems not to be trimming whitespace
  #1378 - BratReader crashes when an annotation covers more than two spans of text
  #1379 - Add generic XML types
  #1382 - TEI reader seems not to be trimming whitespace
  #1384 - Update datasets API
  #1384 - Update datasets API
  #1384 - Update datasets API
  #1384 - Update datasets API
  #1381 - Annotations starting/ending in inter-token space cause exception
  #1379 - Add generic XML types
  #1379 - Add generic XML types
  #1379 - Add generic XML types
  #1376 - Update TreeTagger build.xml
  #1346 - Reader for Annotated Gigaword
  #186 - Change artifactId to "dkpro-core-XXX"
  ...

% Conflicts:
%	dkpro-core-io-brat-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/brat/BratReader.java
%	dkpro-core-io-brat-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/brat/internal/model/BratAnnotationDocument.java
%	dkpro-core-io-brat-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/brat/internal/model/TypeMapping.java
%	dkpro-core-io-brat-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/brat/BratReaderWriterTest.java