Text encoding revamp: trait changes and more text encodings #53

Enet4 · 2020-06-13T17:09:50Z

This is another step intended to cover some of the current issues in text encoding.

Major changes:

- `TextCodec` no longer implies `Debug`.
  - This is not a common practice anyway; such a constraint makes more sense where it is actually needed.
- New method `TextCodec::name`, for retrieving the character set's defined term.
- `SpecificCharacterSet` is marked as non-exhaustive, so that adding encodings in the future does not break dependent code.
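
As a rough illustration of the changes listed above, the following sketch shows a codec trait without a `Debug` supertrait, a `name` method returning the defined term, and a non-exhaustive character set enum. The method signatures, the `TextEncodingError` type, the `Utf8Codec` struct, and the `defined_term` helper are assumptions made for this example and may not match the crate's actual definitions.

```rust
/// Hypothetical error type for this sketch; the crate's real error type may differ.
#[derive(Debug, PartialEq)]
pub struct TextEncodingError(pub String);

/// Sketch of a text codec trait. Note the absence of a `Debug` supertrait.
pub trait TextCodec {
    /// The defined term of the character set, such as "ISO_IR 192".
    fn name(&self) -> &'static str;
    /// Decode raw bytes into a string.
    fn decode(&self, text: &[u8]) -> Result<String, TextEncodingError>;
    /// Encode a string into raw bytes.
    fn encode(&self, text: &str) -> Result<Vec<u8>, TextEncodingError>;
}

/// Marked non-exhaustive: code in other crates must include a wildcard arm
/// when matching, so new variants can be added without breaking it.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecificCharacterSet {
    Default,
    IsoIr100,
    IsoIr192,
}

/// Hypothetical helper mapping a character set to its defined term.
pub fn defined_term(cs: SpecificCharacterSet) -> &'static str {
    match cs {
        SpecificCharacterSet::Default => "ISO_IR 6",
        SpecificCharacterSet::IsoIr100 => "ISO_IR 100",
        SpecificCharacterSet::IsoIr192 => "ISO_IR 192",
    }
}

/// A codec type that deliberately does not implement `Debug`,
/// which the trait no longer requires.
pub struct Utf8Codec;

impl TextCodec for Utf8Codec {
    fn name(&self) -> &'static str {
        "ISO_IR 192"
    }

    fn decode(&self, text: &[u8]) -> Result<String, TextEncodingError> {
        // Invalid UTF-8 is reported as an error rather than replaced.
        String::from_utf8(text.to_vec()).map_err(|e| TextEncodingError(e.to_string()))
    }

    fn encode(&self, text: &str) -> Result<Vec<u8>, TextEncodingError> {
        // Rust strings are already UTF-8, so encoding is a plain copy.
        Ok(text.as_bytes().to_vec())
    }
}
```

With a trait shaped like this, a caller holding a `Box<dyn TextCodec>` can ask for `codec.name()` to learn which character set it is working with, without needing `Debug` output.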

New:

- New character encodings: ISO-IR 100, ISO-IR 101, ISO-IR 109, ISO-IR 110, ISO-IR 144.
- Report to stderr when an unsupported text encoding is decoded (should become a warning log line with Logging foundation #49).
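
The behavior described above could look roughly like the sketch below: a defined term is mapped to a character set, the `ISO 2022 IR` prefix is admitted as an alias of `ISO_IR`, and anything unrecognized is reported to stderr. The `CharacterSet` enum, both function names, and the fallback to the default repertoire are assumptions for illustration; the PR does not state what the library does after printing the warning.

```rust
/// Hypothetical stand-in for the crate's character set enum.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CharacterSet {
    Default,
    IsoIr100,
    IsoIr101,
    IsoIr109,
    IsoIr110,
    IsoIr144,
    IsoIr192,
}

/// Map a single Specific Character Set value to a known character set.
/// Returns `None` when the term is not supported.
pub fn from_defined_term(term: &str) -> Option<CharacterSet> {
    let term = term.trim();
    if term.is_empty() {
        // An empty value means the default repertoire.
        return Some(CharacterSet::Default);
    }
    // Admit the ISO 2022 prefix by treating it like the plain one.
    let code = term
        .strip_prefix("ISO_IR ")
        .or_else(|| term.strip_prefix("ISO 2022 IR "))?;
    match code {
        "6" => Some(CharacterSet::Default),
        "100" => Some(CharacterSet::IsoIr100),
        "101" => Some(CharacterSet::IsoIr101),
        "109" => Some(CharacterSet::IsoIr109),
        "110" => Some(CharacterSet::IsoIr110),
        "144" => Some(CharacterSet::IsoIr144),
        "192" => Some(CharacterSet::IsoIr192),
        _ => None,
    }
}

/// Like `from_defined_term`, but warns on stderr and falls back to the
/// default repertoire (an assumption of this sketch) when unsupported.
pub fn from_defined_term_or_default(term: &str) -> CharacterSet {
    from_defined_term(term).unwrap_or_else(|| {
        eprintln!("Unsupported character set {:?}, using default", term);
        CharacterSet::Default
    })
}
```

Under these assumptions, a value such as `ISO 2022 IR 87` is not recognized, so text declared with it would be read with the fallback codec and only a stderr line would signal the problem.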

Misc:

- Added a test at parser to check that the specific character set is updated after decoding the respective data element (verifies fix parse Specific Character Set error #52).
- Used a macro to implement existing text encodings via the encoding crate.
- Added baseline tests for some of the current text encodings.
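
To give an idea of how a macro can cut the repetition between codec implementations, here is a minimal sketch. It does not use the encoding crate as the PR does; instead it relies on two per-character mapping functions from the standard library alone, which is enough for the default repertoire and for ISO-IR 100 (Latin-1, where each byte equals its Unicode code point). The trait shape, the macro name `decl_codec`, and all helper functions are hypothetical.

```rust
/// Simplified, hypothetical codec trait for this sketch.
pub trait TextCodec {
    fn name(&self) -> &'static str;
    fn decode(&self, bytes: &[u8]) -> Option<String>;
    fn encode(&self, text: &str) -> Option<Vec<u8>>;
}

/// Declare a unit struct and implement `TextCodec` for it from a defined
/// term and a pair of single-byte mapping functions.
macro_rules! decl_codec {
    ($codec:ident, $term:literal, $to_char:expr, $from_char:expr) => {
        pub struct $codec;

        impl TextCodec for $codec {
            fn name(&self) -> &'static str {
                $term
            }
            fn decode(&self, bytes: &[u8]) -> Option<String> {
                // Collecting into `Option` stops at the first unmappable byte.
                bytes.iter().map(|&b| $to_char(b)).collect()
            }
            fn encode(&self, text: &str) -> Option<Vec<u8>> {
                // Likewise, one unmappable character makes the result `None`.
                text.chars().map(|c| $from_char(c)).collect()
            }
        }
    };
}

fn ascii_to_char(b: u8) -> Option<char> {
    if b.is_ascii() { Some(b as char) } else { None }
}

fn char_to_ascii(c: char) -> Option<u8> {
    if c.is_ascii() { Some(c as u8) } else { None }
}

fn latin1_to_char(b: u8) -> Option<char> {
    // In Latin-1, every byte maps to the code point of the same value.
    Some(b as char)
}

fn char_to_latin1(c: char) -> Option<u8> {
    if (c as u32) <= 0xFF { Some(c as u8) } else { None }
}

decl_codec!(DefaultCodec, "ISO_IR 6", ascii_to_char, char_to_ascii);
decl_codec!(IsoIr100Codec, "ISO_IR 100", latin1_to_char, char_to_latin1);
```

A baseline test in this style encodes a known string, compares the bytes, and decodes them back, which checks the codec in both directions.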

9enki · 2023-11-24T16:22:03Z

@Enet4 Is it OK to ask questions here related to this PR? If it is against the rules, I will delete the post.

> Report to stderr when an unsupported text encoding is decoded (should become a warning log line with #49).

According to the above description, an unsupported text encoding is reported when it is decoded. Is the case I tried below one where that should happen?

let specific_character_set = self
    .object
    .element(tags::SPECIFIC_CHARACTER_SET)
    .unwrap()
    .to_multi_str();
let patient_name = self
    .object
    .element(tags::PATIENT_NAME)
    .unwrap()
    .to_multi_str();
info!("SPECIFIC_CHARACTER_SET: {:?}", specific_character_set);
info!("PATIENT_NAME:           {:?}", patient_name);

2023-11-24T08:44:03.903759Z  INFO api::presentation::validates::analysis::dicom: SPECIFIC_CHARACTER_SET: Ok(["ISO 2022 IR 13", "ISO 2022 IR 87"])
2023-11-24T08:44:03.903783Z  INFO api::presentation::validates::analysis::dicom: PATIENT_NAME:           Ok(["Å¶ÔÏ^º³¼Þ=\u{1b}$BCf;3\u{1b}(J^\u{1b}$B9'<#\u{1b}(J"])

Enet4 added 5 commits June 13, 2020 15:29

[encoding] Change TextCodec trait

2eed8f0

- `TextCodec` does not require `Debug`
- Add `name` method

Note: major breaking changes

[parser] Test coverage for character set update

2fc7e69

- print warning when finding an unsupported character set
- add test for character set update to ISO_IR 192
- ensure that the Specific Character Set detection mechanism is working

[encoding] Update text module

0d5ed5d

- update documentation given the currently supported character sets
- [breaking change] make textcodec deliberately non-exhaustive
  - so that new text codecs do not break dependents in future versions

[encoding] ISO-IR 100 + ISO-IR 101

f9789f0

- SpecificCharacterSet::IsoIr100
- SpecificCharacterSet::IsoIr101
- Make a distinction between ISO-IR 6 and ISO-IR 100
- add tests to current text encodings

[encoding] more character sets

e53f019

- reduce redundancy in implementation of text codecs via macro
- ISO-IR 109: South Europe
- ISO-IR 110: North Europe
- ISO-IR 144: Cyrillic
- Admit ISO 2022 prefix
  - multi-value character set not supported yet

Enet4 mentioned this pull request Jun 13, 2020

Tracker issue for text encoding #40

Open

6 tasks

Enet4 added 2 commits June 21, 2020 00:21

[encoding] Test text codecs bidirectionally

d7db685

[encoding] minor documentation tweaks on text

fcab2b5

Enet4 merged commit 823ede5 into master Jun 21, 2020

Enet4 deleted the imp/text-codec-name-test branch June 21, 2020 10:36
