Text encoding revamp: trait changes and more text encodings #53

Merged
merged 7 commits into from
Jun 21, 2020

Conversation

Enet4
Owner

@Enet4 Enet4 commented Jun 13, 2020

This is another step intended to cover some of the current issues in text encoding.

Major changes:

  • TextCodec no longer implies Debug
    • This is not common practice anyway; such a constraint makes more sense where it is actually needed.
  • New method TextCodec::name, for retrieving the character set's defined term.
  • SpecificCharacterSet is marked as non-exhaustive, so that adding encodings in the future does not break dependent code once this change is in place.
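
As a rough illustration of the three changes above, here is a minimal sketch. It is not the crate's actual code: the `decode` signature, the `&'static str` return type of `name`, and the reduced variant list are assumptions made to keep the example self-contained.

```rust
/// Simplified stand-in for the crate's `TextCodec` trait (assumed shape).
/// Note that there is no `Debug` supertrait.
pub trait TextCodec {
    /// The character set's defined term, e.g. "ISO_IR 100".
    fn name(&self) -> &'static str;

    /// Decode raw bytes into a string (hypothetical signature).
    fn decode(&self, bytes: &[u8]) -> Result<String, String>;
}

/// `#[non_exhaustive]` forces `match` expressions in other crates to carry
/// a wildcard arm, so new variants can be added without breaking them.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecificCharacterSet {
    Default,
    IsoIr100,
    IsoIr192,
}

impl TextCodec for SpecificCharacterSet {
    fn name(&self) -> &'static str {
        match self {
            SpecificCharacterSet::Default => "ISO_IR 6",
            SpecificCharacterSet::IsoIr100 => "ISO_IR 100",
            SpecificCharacterSet::IsoIr192 => "ISO_IR 192",
        }
    }

    fn decode(&self, bytes: &[u8]) -> Result<String, String> {
        match self {
            // ISO-IR 6 is plain ASCII: reject anything above 0x7F.
            SpecificCharacterSet::Default => {
                if bytes.is_ascii() {
                    Ok(bytes.iter().map(|&b| b as char).collect())
                } else {
                    Err("non-ASCII byte in ISO_IR 6 text".to_string())
                }
            }
            // ISO-IR 100 (Latin-1): each byte maps to the code point of the same value.
            SpecificCharacterSet::IsoIr100 => Ok(bytes.iter().map(|&b| b as char).collect()),
            // ISO-IR 192 is UTF-8.
            SpecificCharacterSet::IsoIr192 => {
                String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
            }
        }
    }
}
```

Inside the defining crate the `match` blocks may stay exhaustive; only dependents are required to add a `_ => ...` arm.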

New:

  • New character encodings: ISO-IR 100, ISO-IR 101, ISO-IR 109, ISO-IR 110, ISO-IR 144.
  • Report to stderr when an unsupported text encoding is decoded (should become a warning log line with Logging foundation #49).

Misc:

  • Added a test in the parser to check that the specific character set is updated after decoding the respective data element (verifies the fix for "parse Specific Character Set error", #52).
  • Used a macro to implement existing text encodings via the encoding crate.
  • Add baseline tests for some of the current text encodings.
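
The macro approach can be sketched roughly as follows. This is an illustration only: the PR delegates decoding to the `encoding` crate, whereas this sketch swaps in hand-written mapping functions so that it has no dependencies. The macro name, the codec type names, and the trait shape are all hypothetical.

```rust
/// Simplified stand-in for the crate's trait (assumed shape).
pub trait TextCodec {
    fn name(&self) -> &'static str;
    fn decode(&self, bytes: &[u8]) -> String;
}

/// Declares a unit struct for a single-byte character set and implements
/// `TextCodec` for it. `$upper` maps bytes 0x80..=0xFF to characters;
/// bytes below 0x80 are treated as ASCII.
macro_rules! decl_single_byte_codec {
    ($(#[$attr:meta])* $typ:ident, $term:literal, $upper:expr) => {
        $(#[$attr])*
        pub struct $typ;

        impl TextCodec for $typ {
            fn name(&self) -> &'static str {
                $term
            }

            fn decode(&self, bytes: &[u8]) -> String {
                let upper: fn(u8) -> char = $upper;
                bytes
                    .iter()
                    .map(|&b| if b < 0x80 { b as char } else { upper(b) })
                    .collect()
            }
        }
    };
}

/// Upper half of ISO 8859-5 (registered as ISO-IR 144).
fn iso_8859_5_upper(b: u8) -> char {
    match b {
        // C1 controls, no-break space and soft hyphen keep their byte value.
        0x80..=0xA0 | 0xAD => b as char,
        0xF0 => '\u{2116}', // numero sign
        0xFD => '\u{00A7}', // section sign
        // Everything else sits at a fixed offset inside the Cyrillic block.
        _ => char::from_u32(b as u32 + 0x360).unwrap_or('\u{FFFD}'),
    }
}

decl_single_byte_codec!(
    /// ISO-IR 100 (ISO 8859-1, Western Europe).
    IsoIr100Codec, "ISO_IR 100", |b: u8| b as char
);

decl_single_byte_codec!(
    /// ISO-IR 144 (ISO 8859-5, Cyrillic).
    IsoIr144Codec, "ISO_IR 144", iso_8859_5_upper
);
```

Each additional single-byte encoding then costs one macro invocation plus its mapping, which is where the reduction in redundancy comes from.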

- `TextCodec` does not require `Debug`
- Add `name` method

Note: major breaking changes
- print warning when finding an unsupported character set
- add test for character set update to ISO_IR 192
   - ensure that the Specific Character Set detection
   mechanism is working
- update documentation given the
  currently supported character sets
- [breaking change] make `SpecificCharacterSet` deliberately non-exhaustive
   - so that new text codecs do not break dependents in future versions
- SpecificCharacterSet::IsoIr100
- SpecificCharacterSet::IsoIr101
- Make a distinction between ISO-IR 6 and ISO-IR 100
- add tests to current text encodings
- reduce redundancy in implementation of
  text codecs via macro
- ISO-IR 109: South Europe
- ISO-IR 110: North Europe
- ISO-IR 144: Cyrillic
- Admit ISO 2022 prefix
   - multi-value character set not supported yet
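
To make the last two commit notes concrete, here is a minimal sketch of how a defined term with or without the ISO 2022 prefix might be resolved, and how a multi-valued Specific Character Set ends up unsupported. The function names `from_code` and `from_values` and the exact rules are assumptions for illustration, not the crate's confirmed API.

```rust
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecificCharacterSet {
    Default,
    IsoIr100,
    IsoIr101,
    IsoIr109,
    IsoIr110,
    IsoIr144,
    IsoIr192,
}

impl SpecificCharacterSet {
    /// Resolve a single defined term (hypothetical helper).
    /// Accepts both the plain form ("ISO_IR 100") and the
    /// code-extension form ("ISO 2022 IR 100").
    pub fn from_code(code: &str) -> Option<Self> {
        // DICOM values may be padded with trailing spaces.
        let code = code.trim_end();
        if code.is_empty() {
            return Some(Self::Default);
        }
        let (extended, number) = if let Some(n) = code.strip_prefix("ISO 2022 IR ") {
            (true, n)
        } else if let Some(n) = code.strip_prefix("ISO_IR ") {
            (false, n)
        } else {
            return None;
        };
        match (number, extended) {
            ("6", _) => Some(Self::Default),
            ("100", _) => Some(Self::IsoIr100),
            ("101", _) => Some(Self::IsoIr101),
            ("109", _) => Some(Self::IsoIr109),
            ("110", _) => Some(Self::IsoIr110),
            ("144", _) => Some(Self::IsoIr144),
            // UTF-8 has no code-extension form.
            ("192", false) => Some(Self::IsoIr192),
            _ => None,
        }
    }

    /// Resolve the full, possibly multi-valued attribute (hypothetical helper).
    /// Multiple values imply ISO 2022 code switching, which this sketch
    /// reports as unsupported by returning `None`.
    pub fn from_values(values: &[&str]) -> Option<Self> {
        match values {
            [] => Some(Self::Default),
            [single] => Self::from_code(single),
            _ => None,
        }
    }
}
```

Under these assumptions, a value such as `["ISO 2022 IR 13", "ISO 2022 IR 87"]` resolves to `None`, so the caller would have to fall back to some other decoding.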
@Enet4 Enet4 mentioned this pull request Jun 13, 2020
6 tasks
@Enet4 Enet4 merged commit 823ede5 into master Jun 21, 2020
@Enet4 Enet4 deleted the imp/text-codec-name-test branch June 21, 2020 10:36
@9enki

9enki commented Nov 24, 2023

@Enet4 Is it OK to ask questions here related to this PR? If it is against the rules, I will delete the post.

Report to stderr when an unsupported text encoding is decoded (should become a warning log line with #49).

According to the description above, decoding an unsupported text encoding should report an error. Is the case I tried below one where that error should occur?

let specific_character_set = self
    .object
    .element(tags::SPECIFIC_CHARACTER_SET)
    .unwrap()
    .to_multi_str();
let patient_name = self
    .object
    .element(tags::PATIENT_NAME)
    .unwrap()
    .to_multi_str();
info!("SPECIFIC_CHARACTER_SET: {:?}", specific_character_set);
info!("PATIENT_NAME:           {:?}", patient_name);

2023-11-24T08:44:03.903759Z  INFO api::presentation::validates::analysis::dicom: SPECIFIC_CHARACTER_SET: Ok(["ISO 2022 IR 13", "ISO 2022 IR 87"])
2023-11-24T08:44:03.903783Z  INFO api::presentation::validates::analysis::dicom: PATIENT_NAME:           Ok(["ŶÔÏ^º³¼Þ=\u{1b}$BCf;3\u{1b}(J^\u{1b}$B9'<#\u{1b}(J"])
