JISX0201 Not recognized in multi-character set encoding using ISO_IR 13 #869

Closed
homerocda opened this issue Jan 14, 2021 · 3 comments

@homerocda

Describe the bug
SpecificCharacterSet.valueOf(String... codes) does not select the JISX0201 encoding when given the values "ISO_IR 13" and "ISO 2022 IR 87".

To Reproduce
Add the following test to SpecificCharacterSetTest. The test fails.

@Test
public void testEncodeJapanesePersonNameJISX0201_withAlias() {
    assertArrayEquals(JAPANESE_PERSON_NAME_JISX0201_BYTES,
            SpecificCharacterSet.valueOf("ISO_IR 13", "ISO 2022 IR 87")
                    .encode(JAPANESE_PERSON_NAME_JISX0201, PN_DELIMS));
}

Expected behavior
The test should pass as if "ISO 2022 IR 13" had been given instead of "ISO_IR 13".

@gunterze
Member

According to DICOM PS 3.3, C.12.1.1.2 Specific Character Set:

If the Attribute Specific Character Set (0008,0005) has more than one value, Code Extension techniques are used and Escape Sequences may be encountered in all character sets. Requirements for the use of Code Extension techniques are specified in PS3.5. In order to indicate the presence of Code Extension, the Defined Terms for the repertoires have the prefix "ISO 2022", e.g., ISO 2022 IR 100 for the Latin Alphabet No. 1. See Table C.12-3 and Table C.12-4. Table C.12-3 describes single-byte character sets for value 1 to value n of the Attribute Specific Character Set (0008,0005), and Table C.12-4 describes multi-byte character sets for value 2 to value n of the Attribute Specific Character Set (0008,0005).

All Defined Terms listed in Table C.12-3 and Table C.12-4 have the prefix "ISO 2022".

"ISO_IR 13", by contrast, is listed in Table C.12-2, "Defined Terms for Single-Byte Character Sets Without Code Extensions".

@gunterze
Member

gunterze commented Jan 15, 2021

I'm not arguing against making it more lenient, particularly because I think it's quite easy, but I would not label it "fixing a bug".
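
For illustration only, here is a minimal sketch of what that kind of leniency could look like. It is not the change made in dcm4che; `CharsetCodeNormalizer` and `normalize` are hypothetical names, and the set of registration numbers is an illustrative subset of the single-byte terms that appear in both Table C.12-2 and Table C.12-3. The idea is to rewrite an "ISO_IR n" value to "ISO 2022 IR n" only when Specific Character Set has more than one value.

```java
import java.util.Set;

// Hypothetical helper, not part of dcm4che: accepts "ISO_IR n" aliases
// in a multi-valued Specific Character Set (0008,0005).
final class CharsetCodeNormalizer {

    // Assumption: illustrative subset of single-byte repertoires that have
    // both an "ISO_IR n" and an "ISO 2022 IR n" Defined Term.
    private static final Set<String> WITH_CODE_EXTENSION = Set.of(
            "13", "100", "101", "109", "110", "126", "127", "138", "144", "148", "166");

    private static final String PLAIN_PREFIX = "ISO_IR ";
    private static final String EXTENSION_PREFIX = "ISO 2022 IR ";

    static String[] normalize(String... codes) {
        String[] result = codes.clone();
        if (codes.length < 2) {
            // Single value: terms without code extension are valid as given.
            return result;
        }
        for (int i = 0; i < result.length; i++) {
            String code = result[i];
            if (code != null && code.startsWith(PLAIN_PREFIX)) {
                String number = code.substring(PLAIN_PREFIX.length());
                // Only rewrite terms known to have a code-extension counterpart;
                // anything else (e.g. "ISO_IR 192") is left untouched.
                if (WITH_CODE_EXTENSION.contains(number)) {
                    result[i] = EXTENSION_PREFIX + number;
                }
            }
        }
        return result;
    }
}
```

With this sketch, `normalize("ISO_IR 13", "ISO 2022 IR 87")` yields the two values `"ISO 2022 IR 13"` and `"ISO 2022 IR 87"`, while a single-valued `normalize("ISO_IR 13")` is returned unchanged.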

@gunterze
Member

Fixed by 70d1262 and 5921973 (cherry-picked from #870)

@gunterze gunterze self-assigned this Jan 18, 2021
@gunterze gunterze added this to the 5.23.1 milestone Jan 18, 2021