write Japanese with setDefaultCharacterSet to UTF8 (ISO_IR 192) fails with BufferOverflowException #839

Closed
wigun opened this issue Nov 25, 2020 · 6 comments

wigun commented Nov 25, 2020

Describe the bug
If you create a dataset in which Value 1 of the Specific Character Set attribute (0008,0005) is not present, for example '(0008,0005) \ISO 2022 IR 87', and you have set the default character set to UTF-8 (ISO_IR 192), writing the dataset fails with a BufferOverflowException.
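
To make "Value 1 is not present" concrete, here is a minimal sketch in plain Java, independent of dcm4che, that splits the raw attribute value on the DICOM value delimiter (backslash). The class and method names are hypothetical and are used only for illustration.

```java
/** Illustration only: hypothetical helper, not part of dcm4che. */
final class SpecificCharacterSetValuesSketch {

    /** Splits a raw multi-valued string on backslash, keeping empty values. */
    static String[] splitValues(String raw) {
        // The regex "\\\\" matches one literal backslash; limit -1 keeps empty trailing values.
        return raw.split("\\\\", -1);
    }

    public static void main(String[] args) {
        // Raw value as written in the report: \ISO 2022 IR 87
        String[] values = splitValues("\\ISO 2022 IR 87");
        System.out.println(values.length);        // 2
        System.out.println(values[0].isEmpty());  // true: Value 1 is not present
        System.out.println(values[1]);            // ISO 2022 IR 87
    }
}
```

The empty first element is the value that a configured default character set would have to stand in for.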

To Reproduce
Run this test:

    @Test
    void writeJapanese_setDefaultCharacterSet_Utf8() {
        String utf8 = "ISO_IR 192";
        SpecificCharacterSet.setDefaultCharacterSet(utf8);

        Attributes dataset = new Attributes();
        dataset.setString(Tag.SpecificCharacterSet, VR.CS, "", "ISO 2022 IR 87");
        dataset.setString(Tag.PatientName, VR.PN, "Yamada^Tarou=山田^太郎=やまだ^たろう");

        SpecificCharacterSet specificCharacterSet = dataset.getSpecificCharacterSet();
        // Observed: codecs[0] is 'UTF_8' and codecs[1] is 'JIS_X_208', an unexpected combination.

        VR.PN.toBytes(dataset.getValue(Tag.PatientName), specificCharacterSet);

        // The call above fails with:
        // java.nio.BufferOverflowException
        //     at java.base/java.nio.charset.CoderResult.throwException(CoderResult.java:273)
        //     at org.dcm4che3.data.SpecificCharacterSet$Encoder.encode(SpecificCharacterSet.java:265)
        //     at org.dcm4che3.data.SpecificCharacterSet$ISO2022.encode(SpecificCharacterSet.java:309)

        // Therefore, writing any such dataset fails as well.
    }

Expected behavior
Writing any dataset should succeed regardless of the default character set configured via SpecificCharacterSet.setDefaultCharacterSet(String code).

Desktop:

  • OS: Windows 10
  • dcm4che 5.22.5

wigun commented Nov 27, 2020

Is this related to, or a duplicate of, #818?

@gunterze
Member

Related, but not a duplicate.

wigun commented Nov 27, 2020

Here is a simpler test that shows the issue; I would expect it to succeed:

    @Test
    public void testSetDefaultCharacterSetUtf8_valueOfJapanese() {
        SpecificCharacterSet.setDefaultCharacterSet("ISO_IR 192");
        SpecificCharacterSet specificCharacterSet = SpecificCharacterSet.valueOf("", "ISO 2022 IR 87");

        assertEquals("ISO_646", specificCharacterSet.codecs[0].name());
        assertEquals("JIS_X_208", specificCharacterSet.codecs[1].name());
    }

@gunterze
Member

Will fix it by falling back to ASCII as the default character set if Specific Character Set contains multiple code values and the Default Character Set was set to "ISO_IR 192".
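
The fallback rule described here can be sketched in plain Java as follows. This is only an illustration of the stated rule, not the actual dcm4che change: the class name, method name, and constants are hypothetical, and the sketch assumes that "ISO 2022 IR 6" (the DICOM defined term for the default repertoire when code extensions are used) is an appropriate stand-in for "ASCII" in the multi-valued case.

```java
/** Illustration only: hypothetical sketch of the fallback rule, not dcm4che code. */
final class DefaultCharsetFallbackSketch {

    static final String UTF_8 = "ISO_IR 192";
    // Assumption: ASCII is represented by the code-extension term for the default repertoire.
    static final String ASCII_WITH_EXTENSIONS = "ISO 2022 IR 6";

    /**
     * Returns the code that Value 1 resolves to, given the configured
     * default code and the values of Specific Character Set (0008,0005).
     */
    static String resolveValue1(String defaultCode, String... codes) {
        // An explicit, non-empty Value 1 always wins.
        if (codes.length > 0 && !codes[0].isEmpty()) {
            return codes[0];
        }
        // Multiple values plus a UTF-8 default: fall back to ASCII,
        // because UTF-8 cannot be mixed with ISO 2022 code extensions.
        if (codes.length > 1 && UTF_8.equals(defaultCode)) {
            return ASCII_WITH_EXTENSIONS;
        }
        // Otherwise the configured default applies.
        return defaultCode;
    }

    public static void main(String[] args) {
        System.out.println(resolveValue1("ISO_IR 192", "", "ISO 2022 IR 87")); // ISO 2022 IR 6
        System.out.println(resolveValue1("ISO_IR 100", "", "ISO 2022 IR 87")); // ISO_IR 100
        System.out.println(resolveValue1("ISO_IR 192", ""));                   // ISO_IR 192
    }
}
```

In this sketch only a UTF-8 default triggers the fallback, and only when more than one value is present; any other default is left untouched.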

wigun commented Nov 27, 2020

Only for UTF-8 ("ISO_IR 192")?
I would expect this to work for any non-standard Default Character Set.

gunterze commented Nov 27, 2020

No, the error should not occur when a single-byte character set supplementing ASCII is set as the Default Character Set.
