Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ElementDataFileName always use UTF-8 encoding #113

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

codeling
Copy link
Contributor

@codeling codeling commented Aug 10, 2022

Attempt to fix #68.

Limitations:

  • Only handles the ElementDataFileName field in datasets of type metaImage; I have seen the same field being used in e.g. metaArray and metaFEMObject as well, but I cannot test these at the moment; as for other MET_STRING fields, I have no idea if any might need such handling as well, and if so what consequences this might have, therefore I didn't touch them.
  • Not sure if all possible cases are handled correctly; I only have limited ability to test, and I don't know all possibilites of how the code is used from VTK/ITK. There is for example a logic of determining the ElementDataFileName from the FileName in the ::Write method, which I don't fully get; or rather I don't get why it intermittently sets the ElementDataFileName to the full data file path, and only later cuts away the path again; because this makes it a bit tricky to handle if the path itself also contains special characters - one should not encode the full path immediately in utf-8, since later, the path is extracted again from the ElementDataFileName, and a comparisons to the non-utf-8 encoded version of the file path is done....
  • The file name itself still has to be specified in local encoding (on Windows).
  • I considered adding a test case to testMeta3Image.cxx, but couldn't figure out how to set the file name properly, as source files are by default stored in utf-8 (but we would need to provide an ANSI filename on Windows)

Maybe a solution using vtksys iostreams (https://gitlab.kitware.com/paraview/paraview/-/merge_requests/3850) would be preferrable? Though this would probably break backwards-compatibility, since all file names would then be expected to be in UTF-8 format I guess?

Copy link
Collaborator

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether VTK/ParaView solution would work in ITK.

Squashing these commits is preferred.

Some test is better than no test. Maybe get code page, then if not a Chinese one, use Chinese characters in the test, otherwise use Cyrillic characters in the test?

@todoooo
Copy link

todoooo commented Aug 10, 2022

Sorry but I think introducing the MET_FromUtf8ToLocalEncoding/MET_FromUtf8ToLocalEncoding macros is a terrible idea.

Please don't redefine ToWide()/ToNarrow() in the METAIO source as they are already defined in KWSYS. The best way to handle stream and file opening is by using vtksys input/output streams throughout METAIO as has already been demonstrated in VTK (https://gitlab.kitware.com/vtk/vtk/-/merge_requests/6301) and Paraview.

UTF-8 encoding is identical for plain English (ANSI) text so backwards compatibility is not an issue.

@codeling
Copy link
Contributor Author

codeling commented Aug 10, 2022

I agree that duplication is bad; from comments on #68 I was under the impression though that the idea was to introduce just the required functionality from KWSys, not a dependency on that library (since MetaIO in general has very few dependencies, only zlib I think at the moment?). Regarding vtksys streams - correct me if I'm wrong, but wouldn't making MetaIO depend on vtksys introduce circular depencies (since currently VTK depends on MetaIO? I'd be glad if somebody comes up with a better solution of course; I just wanted to give it a quick go since nothing has happened on the topic in 3 years ;).
Also I'm not sure the vtksys fstream would fix this issue - it's about the encoding of the .mhd file content, not about that of the file name directly.

@todoooo
Copy link

todoooo commented Aug 10, 2022

correct me if I'm wrong, but wouldn't making MetaIO depend on vtksys introduce circular depencies (since currently VTK depends on MetaIO?

Since MetaIO and VTK are separate projects I don't believe that is a problem. The CMake builds in each will determine the location of vtksys.

@todoooo
Copy link

todoooo commented Aug 10, 2022

Also I'm not sure the vtksys fstream would fix this issue - it's about the encoding of the .mhd file content, not about the file name

Please re-read the issue. #68 is about the filename encoding not the file contents.

@dzenanz
Copy link
Collaborator

dzenanz commented Aug 10, 2022

vtksys and itksys are a rename of kwsys. So within VTK, KWSys is compiled under the name vtksys. Library content remains the same.

@codeling
Copy link
Contributor Author

Please re-read the issue. #68 is about the filename encoding not the file contents.

I wrote #68 ;) - maybe it's not fully clear from the description there, but the issue (which this pull request tries to address) is that the content of an .mhd file is not consistently encoded; explicitly the ElementDataFile entry; and this is a problem when accessing the same .mhd/.raw file from both Linux and Windows.

Of course this is closely linked to the encoding in which the file name is passed to MetaIO - which is what the introduction of vtksys would be about, right? That's why I mentiond that one, also maybe a bit confusingly.

In the end we ideally of course would have UTF-8 encoding everywhere; and making all filenames use UTF-8 encoding might implicitly fix the issue with an inconsistent encoding in ElementDataFile as well...

@todoooo
Copy link

todoooo commented Aug 11, 2022

I wrote #68 ;) - maybe it's not fully clear from the description there, but the issue (which this pull request tries to address) is that the content of an .mhd file is not consistently encoded; explicitly the ElementDataFile entry; and this is a problem when accessing the same .mhd/.raw file from both Linux and Windows.

I think you're barking up the wrong tree. The header/data names passed to the Meta read/write methods should be utf-8 encoded already, so your solution does nothing to address the problem. What's the failing test case?

The solution here is to replace all occurrences of std::ifstream/std::ofstream in the MetaIO source with vtksys::ifstream/vtksys::ofstream and include the vtksys/FStream.hxx dependency. The vtksys streams convert utf8 path and filenames to wide strings for the Windows API calls.

You can copy what has already been done in VTK.

@aylward
Copy link
Collaborator

aylward commented Aug 11, 2022

As @dzenanz pointed out, itksys and vtksys are just name mangled versions of kwsys. So, perhaps a better solution is to use a MET_FSTREAM preprocessor variable and if MetaIO is being compiled with ITK, then MET_FSTREAM:: becomes itksys::; and if MetaIO is being compiled with VTK, then MET_FSTREAM:: becomes vtksys::, and if MetaIO is being compiled independently, then MET_FSTREAM becomes either std:: or kwsys:: as determined by a cmake option (with the default being kwsys and with the developer responsible for providing KWSys via a find_package). The good news is that there is already logic for these three use cases in the namespace mangling routines in the CMakeLists.txt and metaIOConfig.h.in files in the src dir of MetaIO.

Of course, there are probably better names than MET_FSTREAM :)

@todoooo
Copy link

todoooo commented Aug 11, 2022

Of course, there are probably better names than MET_FSTREAM :)

METAIO_STREAM #106 (comment)

@aylward
Copy link
Collaborator

aylward commented Aug 11, 2022

Wish METAIO_STREAM hadn't been removed. @hjmjohnson - do you recall the motivation for removing it?

@todoooo
Copy link

todoooo commented Aug 17, 2022

This is a 5 minute fix. Is it going to languish for another 18 months?

@jamesobutler
Copy link

Hi all, I ran into the original issue that this PR is trying to fix which I posted about over on the Slicer forum https://discourse.slicer.org/t/failures-saving-a-volume-to-path-with-sign/25657. Is there a path forward to help move along a solution to solve this issue in MetaIO?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assumed string encoding for .mhd differs on platforms
5 participants