#168 - Experimental JSON CAS support #169

Merged
merged 53 commits into from
Dec 12, 2021

Conversation

reckart
Member

@reckart reckart commented Aug 13, 2021

  • Basic JSON CAS support
  • Randomized tests

- Added very basic JSON CAS support
- No support for type systems yet
- No support for lenient loading
- Remove Cas:NULL via type name instead of simply purging the FS with ID 0 (which may not be a Cas:NULL FS)
- Added various constants for type names and feature names in the Cas class (analogous to the Apache UIMA Java SDK implementation)
- WIP
@reckart reckart self-assigned this Aug 13, 2021
- Fixed bad PyDoc comment
- Fixed linter error because type hint was referring to a dynamically created type
- Roll back change of Sofa.sofaArray range type from uima.cas.ByteArray back to uima.cas.TOP which is indeed the range type also used in the Apache UIMA Java SDK - despite only uima.cas.ByteArray being acceptable...
@codecov

codecov bot commented Aug 13, 2021

Codecov Report

Merging #169 (e89ada4) into main (0b802b3) will decrease coverage by 0.64%.
The diff coverage is 92.02%.


@@            Coverage Diff             @@
##             main     #169      +/-   ##
==========================================
- Coverage   96.15%   95.51%   -0.65%     
==========================================
  Files           4        5       +1     
  Lines        1509     1827     +318     
==========================================
+ Hits         1451     1745     +294     
- Misses         58       82      +24     
| Impacted Files | Coverage Δ |
|---|---|
| cassis/json.py | 91.61% <91.61%> (ø) |
| cassis/cas.py | 95.89% <93.75%> (+0.11%) ⬆️ |
| cassis/typesystem.py | 95.63% <100.00%> (+0.19%) ⬆️ |
| cassis/xmi.py | 96.88% <100.00%> (+0.04%) ⬆️ |


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0b802b3...e89ada4. Read the comment docs.

- Added generator for random CASes
- Added JSON tests using random CAS generator
- Added support for (de)serializing type system information in the JSON format
- Move the type/feature name constants from Cas to typesystem.py
- Added another generator for random CASes
- Added more tests
- Commented out all testing of arrays in the new generator since array handling in cassis seems to have a few conceptual problems that need to be looked at first
- Revert change to stripping the null FS
- Changed reference data so that IDs start at 1 and not at 0 leaving 0 reserved for the null FS
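The randomized tests listed above follow a round-trip pattern: generate a CAS, serialize it, load it back, and compare. The following is only a minimal stdlib sketch of that pattern, not the cassis generator. The dict layout and the names `random_feature_structures` and `round_trip` are hypothetical; the only detail taken from the notes above is that IDs start at 1 while 0 stays reserved for the null FS.

```python
import json
import random

NULL_FS_ID = 0  # per the notes above: ID 0 stays reserved for the null FS


def random_feature_structures(rng, count):
    """Build `count` toy feature structures with IDs 1..count (hypothetical layout)."""
    structures = []
    for fs_id in range(1, count + 1):
        begin = rng.randint(0, 50)
        structures.append({
            "id": fs_id,
            "type": rng.choice(["Token", "Sentence"]),
            "begin": begin,
            "end": begin + rng.randint(0, 10),
            # Point at an earlier FS, or at the null FS (ID 0) for "no reference".
            "ref": rng.randint(NULL_FS_ID, fs_id - 1),
        })
    return structures


def round_trip(structures):
    """Serialize to JSON text and parse it back."""
    return json.loads(json.dumps(structures))
```

Seeding the generator (for example `random.Random(42)`) keeps a failing case reproducible.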
@jcklie jcklie added this to the 0.6.0 milestone Aug 14, 2021
@jcklie jcklie linked an issue Aug 14, 2021 that may be closed by this pull request
* main:
  #172 - Naming: cas.add_annotation(s) (#181)
  #175 - Set a feature if the feature name is in a variable (#180)
  #175 - Set a feature if the feature name is in a variable
  #174 - FSes that are only transitively referenced cannot be serialized (#179)
  #170 - Handling of the "uima.noNamespace" prefix (#178)
  No issue
  #173 - Rename add_feature to create_feature (#177)

# Conflicts:
#	cassis/typesystem.py
…annot-be-serialized' into feature/168-Experimental-JSON-CAS-support

* bugfix/174-FSes-that-are-only-transitively-referenced-cannot-be-serialized:
  #174 - FSes that are only transitively referenced cannot be serialized (#179)
…annot-be-serialized' into feature/168-Experimental-JSON-CAS-support

* bugfix/174-FSes-that-are-only-transitively-referenced-cannot-be-serialized:
  #174 - FSes that are only transitively referenced cannot be serialized (#179)
* main:
  #174 - FSes that are only transitively referenced cannot be serialized (#179)
  #174 - FSes that are only transitively referenced cannot be serialized (#179)
…ed-and-arrays' into feature/168-Experimental-JSON-CAS-support

* feature/185-186-187-Handling-of-multipleReferencesAllowed-and-arrays:
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #187 - The multipleReferencesAllowed flag on array features is not handled
  #186 - Creating subtypes of inheritance-final types (arrays) is not prevented
  #185 - Transitively referenced primitive arrays not returned by _find_all_fs #186 - Creating subtypes of inheritance-final types (arrays) is not prevented #187 - The multipleReferencesAllowed flag on array features is not handled
- Fix array support
- Enable array tests
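Issue #185 in the merged branch concerns feature structures that are reachable only through other feature structures. As a rough illustration of the kind of traversal involved, here is a minimal sketch that collects everything transitively reachable from a set of roots. It is not the cassis `_find_all_fs` implementation; the adjacency dict and the name `find_all_reachable` are hypothetical stand-ins.

```python
from collections import deque


def find_all_reachable(roots, references):
    """Return the IDs reachable from `roots`, following references transitively.

    `references` maps an ID to the IDs it points at (hypothetical layout).
    """
    seen = set()
    queue = deque(roots)
    while queue:
        fs_id = queue.popleft()
        if fs_id in seen:
            continue  # already visited; this also guards against cycles
        seen.add(fs_id)
        queue.extend(references.get(fs_id, ()))
    return seen
```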
…ed-and-arrays' into feature/168-Experimental-JSON-CAS-support

* feature/185-186-187-Handling-of-multipleReferencesAllowed-and-arrays:
  #187 - The multipleReferencesAllowed flag on array features is not handled
* main:
  #183 - Better error message when failing to resolve feature path
  #183 - Better error message when failing to resolve feature path
- Change view members field name
…168-Experimental-JSON-CAS-support

* commit '708b78aa5008ec09497999e5655662e5b572d972':
  #204 - Provide domain on feature
* main:
  No issue: Dont compute coverage for __version__.py
  No issue: Dont compute coverage for tests
  #206 - Type unmarshalling from string to the actual type specified in the type system
  #206 - Type unmarshalling from string to the actual type specified in the type system
  #206 - Type unmarshalling from string to the actual type specified in the type system
  #206 - Type unmarshalling from string to the actual type specified in the type system
  #204 - Provide domain on feature
- Do not execute performance "tests" when running make test
- Update JSON reference data with new data from UIMA Java SDK - including CAS examples using emojis and other Unicode characters
- Enabled character offset conversion on import/export in JSON (de)serializer
- Update JSON reference data with new data from UIMA Java SDK - including CAS examples using emojis and other Unicode characters
- Enabled character offset conversion on import/export in JSON (de)serializer
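Offset conversion matters because Java strings are indexed in UTF-16 code units while Python strings are indexed in code points, so annotation offsets diverge as soon as the text contains characters outside the Basic Multilingual Plane (many emojis, for example). A minimal sketch of such a mapping follows; the function names are hypothetical and this is not the cassis implementation.

```python
def utf16_to_codepoint_offsets(text):
    """Map UTF-16 code unit offsets to the matching Python string indices."""
    mapping = {}
    utf16_pos = 0
    for index, char in enumerate(text):
        mapping[utf16_pos] = index
        # Characters outside the Basic Multilingual Plane take two code units.
        utf16_pos += 2 if ord(char) > 0xFFFF else 1
    mapping[utf16_pos] = len(text)  # offset just past the end of the text
    return mapping


def codepoint_to_utf16_offsets(text):
    """Inverse mapping: Python string indices to UTF-16 code unit offsets."""
    return {index: utf16 for utf16, index in utf16_to_codepoint_offsets(text).items()}
```

Offsets that fall in the middle of a surrogate pair have no entry in the first mapping, since they do not correspond to a character boundary in Python.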
…llowed=true fails

- Fixed problem by checking the multipleReferencesAllowed feature during deserialization
- Added test
- Better check whether adding a TextIOWrapper is necessary during serialization
- Fixed bad access to element type name
- Formatting
- Better test if using a TextIOWrapper is really necessary
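The `TextIOWrapper` notes above are about writing JSON to a target that may be either a text stream or a binary stream. A minimal sketch of one way to make that decision with the standard library is shown below; `write_json` is a hypothetical helper, not the cassis serializer.

```python
import io
import json


def write_json(data, sink):
    """Write `data` as JSON to a text or binary file-like object."""
    if isinstance(sink, io.TextIOBase):
        # Already a text stream: json.dump can write str to it directly.
        json.dump(data, sink)
        return
    # Binary stream: wrap it so that str output is encoded as UTF-8.
    wrapper = io.TextIOWrapper(sink, encoding="utf-8")
    json.dump(data, wrapper)
    wrapper.flush()
    # Detach so that discarding the wrapper does not close the caller's stream.
    wrapper.detach()
```

Checking against `io.TextIOBase` rather than a concrete class lets the same helper accept `io.StringIO` as well as files opened in text mode.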
* main:
  #209 - Parsing an array that was serialized using multipleReferencesAllowed=true fails
@reckart reckart force-pushed the feature/168-Experimental-JSON-CAS-support branch from c3c8e58 to 08efad0 Compare September 20, 2021 15:20
- Work around issues with cas_to_comparable_text and FSArrays
* main:
  No issue. Formatting.
  #215 - Ability to exclude types from cas_to_comparable_text
  #212 - Allow loading/saving XMI/typesystems from/to Path
  #211 - Serializing an FSArray without any elements breaks
  #212 - Allow loading/saving XMI/typesystems from/to Path

# Conflicts:
#	cassis/util.py
…ithub.com/dkpro/dkpro-cassis into feature/168-Experimental-JSON-CAS-support

* 'feature/168-Experimental-JSON-CAS-support' of https://github.com/dkpro/dkpro-cassis:
  No issue. Formatting.
  #215 - Ability to exclude types from cas_to_comparable_text
  #212 - Allow loading/saving XMI/typesystems from/to Path
  #211 - Serializing an FSArray without any elements breaks
  #212 - Allow loading/saving XMI/typesystems from/to Path
  #168 - Experimental JSON CAS support
  #168 - Experimental JSON CAS support
  #168 - Experimental JSON CAS support
  #209 - Parsing an array that was serialized using multipleReferencesAllowed=true fails
  - Do not execute performance "tests" when running make test - Update JSON reference data with new data from UIMA Java SDK - including CAS examples using emojis and other Unicode characters - Enabled character offset conversion on import/export in JSON (de)serializer
  #209 - Parsing an array that was serialized using multipleReferencesAllowed=true fails
* main:
  #221 - Unable to parse empty arrays
  #219 - Floating point special values not serialized as expected
- Support for floating point special values in JSON
- Support for not serializing the full type system in JSON but only the minimal or none at all
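Floating point special values need explicit handling because standard JSON has no literals for NaN or the infinities. A minimal sketch of one common workaround, encoding them as strings, follows. The string spellings and the helper names are assumptions for illustration; the actual representation used by the JSON CAS format is not shown in this conversation.

```python
import json
import math


def encode_float(value):
    """Return a JSON-safe value: special floats become strings (assumed spellings)."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return value


def decode_float(value):
    """Turn a JSON number or one of the special strings back into a float."""
    # float() accepts "NaN", "Infinity" and "-Infinity" as well as numbers.
    return float(value)
```

Passing `allow_nan=False` to `json.dumps` is a useful guard here: it raises `ValueError` if a special value slips through unencoded instead of emitting non-standard JSON.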
* main:
  No issue. Fix issues with arrays in cas_to_comparable_text.
  #219 - Floating point special values not serialized as expected
@reckart reckart modified the milestones: 0.6.0, 0.7.0 Sep 27, 2021
…-CAS-support

* feature/192-cleanup:
  #192 - Cleanup stuff
  #192 - Cleanup stuff
  #192 - Cleanup stuff
  #192 - Cleanup stuff
  #192 - Cleanup stuff
  #192 - Cleanup stuff
  No issue. Fix issues with arrays in cas_to_comparable_text - added missing import.

# Conflicts:
#	cassis/util.py
- Run pyupgrade
* main:
  #225 - Add check for properly formatted README.rst (#226)
  No issue: Bump version after release
  Update release.md
  No issue: Fix README rst
  No issue: DKPro Cassis 0.6.0 release
  Create release.md
  No issue. Add release guide
* main:
  #231 - cas_to_comparable_text breaks saying that FSes do not have an ID
  #229 - Get transitive closure of types
  #227 - If a CAS contains no text the offset mapping initialization fails
* main:
  No issue: Bump version after release
  No issue: DKPro Cassis 0.6.1 release
  No issue. Fix PyDoc.
* main:
  #234 - cas_to_comparable_text fails with null arrays
@reckart reckart force-pushed the feature/168-Experimental-JSON-CAS-support branch from 397d752 to 9cdd945 Compare December 12, 2021 20:00
* main:
  #238 - Error parsing FSList in CTAKES XMi
  #238 - Error parsing FSList in CTAKES XMi
  #238 - Error parsing FSList in CTAKES XMi
  Create CITATION.cff
  #236 - Long output when printing type (#237)

% Conflicts:
%	cassis/cas.py
@reckart reckart force-pushed the feature/168-Experimental-JSON-CAS-support branch from 9cdd945 to 0566e37 Compare December 12, 2021 20:06
@reckart reckart marked this pull request as ready for review December 12, 2021 20:28
- Added mention about non-final status in README file
@reckart reckart merged commit 0753e05 into main Dec 12, 2021
@reckart reckart deleted the feature/168-Experimental-JSON-CAS-support branch December 12, 2021 20:39
Development

Successfully merging this pull request may close these issues.

Experimental JSON CAS support
2 participants