write Japanese with setDefaultCharacterSet to UTF8 (ISO_IR 192) fails with BufferOverflowException #839

wigun · 2020-11-25T16:24:47Z

Describe the bug
If you create a dataset in which Value 1 of Specific Character Set (0008,0005) is not present, e.g. '(0008,0005) \ISO 2022 IR 87', and you have set the default character set to UTF-8 ("ISO_IR 192") via SpecificCharacterSet.setDefaultCharacterSet, writing the dataset fails with a BufferOverflowException.

To Reproduce
Run this test:

    @Test
    void writeJapanese_setDefaultCharacterSet_Utf8() {
        String utf8 = "ISO_IR 192";
        SpecificCharacterSet.setDefaultCharacterSet(utf8);

        Attributes dataset = new Attributes();
        dataset.setString(Tag.SpecificCharacterSet, VR.CS, "", "ISO 2022 IR 87");
        dataset.setString(Tag.PatientName, VR.PN, "Yamada^Tarou=山田^太郎=やまだ^たろう");

        SpecificCharacterSet specificCharacterSet = dataset.getSpecificCharacterSet();
        // Unexpected: codecs[0] is 'UTF_8' and codecs[1] is 'JIS_X_208'; 'ISO_646' is expected for codecs[0]

        VR.PN.toBytes(dataset.getValue(Tag.PatientName), specificCharacterSet);

        // this fails with a
        // java.nio.BufferOverflowException
        //	at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:273)
        //	at org.dcm4che3.data.SpecificCharacterSet$Encoder.encode(SpecificCharacterSet.java:265)
        //	at org.dcm4che3.data.SpecificCharacterSet$ISO2022.encode(SpecificCharacterSet.java:309)

        // therefore writing any such dataset fails as well
    }
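
The stack trace above ends in CoderResult.throwException. As a JDK-only illustration of how that call produces a BufferOverflowException, here is a minimal sketch. It does not reproduce dcm4che's internal logic or its buffer sizing; the class name EncoderOverflowSketch and the helper overflows are hypothetical, and only java.nio classes are used.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of dcm4che.
class EncoderOverflowSketch {

    // Returns true if encoding 'text' as UTF-8 does not fit into 'capacity' bytes.
    static boolean overflows(String text, int capacity) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer out = ByteBuffer.allocate(capacity);
        CoderResult result = encoder.encode(CharBuffer.wrap(text), out, true);
        try {
            // Underflow means all input was consumed; anything else is an error.
            if (!result.isUnderflow()) {
                // For an overflow result this throws BufferOverflowException.
                result.throwException();
            }
            return false;
        } catch (BufferOverflowException e) {
            return true;
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u5c71\u7530"; // two kanji, three UTF-8 bytes each
        System.out.println(overflows(name, 2));
        System.out.println(overflows(name, 6));
    }
}
```

The point is only that an encoder writing into an output buffer smaller than the encoded text reports an overflow, which throwException turns into the exception seen in the trace.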

Expected behavior
Writing any dataset should succeed regardless of the default character set configured via SpecificCharacterSet.setDefaultCharacterSet(String code).

Desktop:

OS: Windows 10
dcm4che 5.22.5


wigun · 2020-11-27T09:15:04Z

Is this related to, or a duplicate of, #818?

gunterze · 2020-11-27T09:20:12Z

Related, but not a duplicate.

wigun · 2020-11-27T10:54:30Z

This is a simpler test that shows the issue; I would expect it to succeed:

    @Test
    public void testSetDefaultCharacterSetUtf8_valueOfJapanese() {
        SpecificCharacterSet.setDefaultCharacterSet("ISO_IR 192");
        SpecificCharacterSet specificCharacterSet = SpecificCharacterSet.valueOf("", "ISO 2022 IR 87");

        assertEquals("ISO_646", specificCharacterSet.codecs[0].name());
        assertEquals("JIS_X_208", specificCharacterSet.codecs[1].name());
    }

gunterze · 2020-11-27T11:02:45Z

Will fix it by falling back to ASCII as the default character set if Specific Character Set contains multiple code values and the Default Character Set was set to "ISO_IR 192".
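
The fallback rule described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the actual fix: the class DefaultCharsetFallbackSketch, the method effectiveFirstCode, and the "ASCII" label are all hypothetical and do not exist in dcm4che.

```java
// Hypothetical sketch of the proposed rule; none of these names are dcm4che API.
class DefaultCharsetFallbackSketch {

    // Placeholder label for the ASCII default repertoire (assumption, not a DICOM defined term).
    static final String ASCII = "ASCII";
    static final String UTF8_CODE = "ISO_IR 192";

    // Decides which code applies to Value 1 of Specific Character Set (0008,0005).
    static String effectiveFirstCode(String defaultCode, String... codes) {
        boolean value1Present = codes.length > 0 && codes[0] != null && !codes[0].isEmpty();
        if (value1Present) {
            return codes[0]; // an explicit Value 1 always wins
        }
        boolean multiValued = codes.length > 1;
        if (multiValued && UTF8_CODE.equals(defaultCode)) {
            return ASCII; // multiple code values with a UTF-8 default: fall back to ASCII
        }
        return defaultCode; // otherwise keep the configured default
    }

    public static void main(String[] args) {
        // The case from this issue: empty Value 1, Japanese in Value 2, UTF-8 default.
        System.out.println(effectiveFirstCode("ISO_IR 192", "", "ISO 2022 IR 87"));
        // A different configured default is kept as-is.
        System.out.println(effectiveFirstCode("ISO_IR 100", "", "ISO 2022 IR 87"));
    }
}
```

With this rule, the case from the failing test resolves Value 1 to ASCII instead of UTF-8, while a default other than "ISO_IR 192" is left untouched.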

wigun · 2020-11-27T11:15:45Z

Only for UTF-8 ("ISO_IR 192")?
I would expect this to work for any changed Default Character Set.

gunterze · 2020-11-27T11:18:45Z

No, the error should not occur when a single-byte character set supplementing ASCII is set as the Default Character Set.

gunterze self-assigned this Nov 27, 2020

gunterze added the bug label Nov 27, 2020

gunterze added this to the 5.23.0 milestone Nov 27, 2020

gunterze closed this as completed in 160ac20 Nov 27, 2020

homerocda mentioned this issue Dec 4, 2020

Porting over fixes for ISO2022 handling in Specific Character Set from master #846

Merged
