Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Use UTF-8 as string encoding #1319

Closed
wants to merge 1 commit into from

Conversation

@lassoan
Copy link
Contributor

lassoan commented Feb 10, 2020

In general, all std::string or char buffers in Slicer are now expected to contain UTF-8 encoded string. This is what was the default on Mac OSX and Linux and it has been made available as an option on Windows as well. This is where VTK and ITK (and most other software libraries) are moving towards, too.

This will allow loading and saving files that has special characters in their names.

Requires yet unmerged changes in CTK: commontk/CTK#902

@lassoan lassoan requested review from pieper and jcfr Feb 10, 2020
@pieper
pieper approved these changes Feb 10, 2020
Copy link
Member

pieper left a comment

Looks good.

Seems like it would be nice to get this all in before Slicer5.

Libs/MRML/Core/Testing/cube-utf8.mrml Show resolved Hide resolved

<VolumeDisplay
id="vtkMRMLScalarVolumeDisplayNode1"
name="árvíztűrő tükörfúrógép"

This comment has been minimized.

Copy link
@pieper

pieper Feb 10, 2020

Member

Is this what was intended?

This comment has been minimized.

Copy link
@lassoan

lassoan Feb 10, 2020

Author Contributor

Yes, this is a well-known phrase that contain all the accented characters of Hungarian language. (to be precise: well-known to Hungarians)

This comment has been minimized.

Copy link
@pieper
@jcfr

This comment has been minimized.

Copy link
Member

jcfr commented Feb 10, 2020

Thanks for the work on this 🙏

I didn't have a chance to look at this PR or the one in CTK. Could we add this to the agenda for tomorrow dev meeting ?

@lassoan lassoan force-pushed the lassoan:utf8-string-encoding branch 2 times, most recently from 0c75624 to 3e336c2 Feb 15, 2020
@lassoan lassoan marked this pull request as ready for review Feb 16, 2020
@lassoan

This comment has been minimized.

Copy link
Contributor Author

lassoan commented Feb 19, 2020

If there is no further feedback then I'll merge this tomorrow.

@jcfr

This comment has been minimized.

Copy link
Member

jcfr commented Feb 19, 2020

If there is no further feedback then I'll merge this tomorrow.

Shouldn't we finalize this one first Slicer/SlicerExecutionModel#124

I am assuming the warnings et all discussed there will still apply.

@lassoan

This comment has been minimized.

Copy link
Contributor Author

lassoan commented Feb 20, 2020

It would be nice to merge SEM first, but SEM is quite independent: if the UTF-8 branch gets merged then non-ascii strings can be transferred correctly between Slicer and the CLI. Currently, we don't support non-ascii strings anyway, so it would not be a regression.

I've fixed the test failure in SEM. I don't think we should fix the warning, it is just a warning that the mt.exe tool included in that Windows SDK does not know this option yet (it was introduced after that Windows SDK was released). Despite the warning, mt.exe can still embed the manifest and the built exexutable works well on recent Windows 10 systems.

@lassoan

This comment has been minimized.

Copy link
Contributor Author

lassoan commented Feb 20, 2020

@jcfr Should I wait for merge of Slicer/SlicerExecutionModel#124 or can proceed with merging this without SEM update for now (and then update it when you have a chance to review that)?

This merge should happen before the git switch, as many files are modified.

@jcfr

This comment has been minimized.

Copy link
Member

jcfr commented Feb 20, 2020

I don't think we should fix the warning, it is just a warning that the mt.exe tool included in that Windows SDK does not know this option yet (it was introduced after that Windows SDK was released)

Makes sense.

Could you add an exception to https://github.com/Slicer/Slicer/blob/master/CMake/CTestCustom.cmake.in#L45

Should I wait for merge of Slicer/SlicerExecutionModel#124

It has just been merged.

Once the CTest exception has been added. I have no problem with integrating this PR. 🎉 🙏

In general, all std::string or char buffers in Slicer are now expected to contain UTF-8 encoded string. This is what was the default on Mac OSX and Linux and it has been made available as an option on Windows as well. This is where VTK and ITK (and most other software libraries) are moving towards, too.

This will allow loading and saving files that has special characters in their names and display of international text.

On Windows versions before Windows 10 Version 1903 (May 2019 Update): UTF-8 process code page is not yet supported, therefore non-ASCII characters may not be used in filenames correctly.
@lassoan

This comment has been minimized.

Copy link
Contributor Author

lassoan commented Feb 20, 2020

Thanks for these. If you don't mind, I would add the warning suppression in a few days, after we saw which systems report the manifest authoring warning and to confirm that the change is necessary and indeed fixes the warning.

@lassoan lassoan force-pushed the lassoan:utf8-string-encoding branch from 3e336c2 to 78c7553 Feb 20, 2020
@lassoan

This comment has been minimized.

Copy link
Contributor Author

lassoan commented Feb 21, 2020

Thanks for the reviews, this is merged now in r28783. I'll add the CMake warning exception within a few days.

@lassoan lassoan closed this Feb 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.