Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF-8 non-ascii string values cause string to be truncated #18

Closed
bjornharrtell opened this issue Jun 3, 2020 · 13 comments · Fixed by #29
Closed

UTF-8 non-ascii string values cause string to be truncated #18

bjornharrtell opened this issue Jun 3, 2020 · 13 comments · Fixed by #29
Assignees
Labels
bug-gdal A bug in GDAL
Milestone

Comments

@bjornharrtell
Copy link
Contributor

Describe the bug

UTF-8 non-ascii string values cause string to be truncated. In reproduction case it's a shapefile that has a value Befæstet which become Befæste. Likely string char length is assumed to be byte length somewhere.

To Reproduce

Clone https://github.com/bjornharrtell/gdal.netcore.utf8issuerepro then do a dotnet build then a dotnet run.

Expected behavior

Should not corrupt strings.

Environment information:

  • OS (version): Ubuntu 20.04
  • Package version (core): [e.g. 3.0.1.25]
  • Package version (runtime): [e.g. 3.0.1.2]

Additional context

I've also reproduced this with other formats (fx. GML) and my own builds of gdal.netcore and when running on Debian 10, so this seems to sit down deep somewhere.

@bjornharrtell bjornharrtell added the bug Something isn't working label Jun 3, 2020
@bjornharrtell
Copy link
Contributor Author

@rouault got any ideas where I should start looking for this bug?

@bjornharrtell
Copy link
Contributor Author

@MaxRev-Dev
Copy link
Owner

That's must be an issue with array bounds during marshaling.
This line respects a zero-terminator?? But the reverse is not?

I can only compile and release a new package when this will be fixed in GDAL.

@bjornharrtell
Copy link
Contributor Author

Preliminary good results by replacing the implementation of Utf8BytesToString in typemaps_csharp.i with:

internal unsafe static string Utf8BytesToString(IntPtr pNativeData)
  {
    if (pNativeData == IntPtr.Zero)
        return null;

    byte* pStringUtf8 = (byte*) pNativeData;
    int len = 0;
    while (pStringUtf8[len] != 0) len++;
    return System.Text.Encoding.UTF8.GetString(pStringUtf8, len);
  }

@bjornharrtell
Copy link
Contributor Author

It's strange that this has apparently worked fine in .NET Framework. I suspect a subtle and perhaps unintended change of behaviour in the marshalling parts in .NET.

@bjornharrtell
Copy link
Contributor Author

Found this old thing perhaps related https://trac.osgeo.org/gdal/ticket/5963 (!)

@bjornharrtell
Copy link
Contributor Author

And found this very comprehensive site with more information than you'd ever want to know about this subject - https://ericsink.com/entries/utf8z.html. May want to consider his utf8z type (https://github.com/ericsink/SQLitePCL.raw/blob/master/src/SQLitePCLRaw.core/utf8z.cs) which might be significantly better performing than the old methods.

@MaxRev-Dev
Copy link
Owner

That's the way to fix the truncation. But to move further, consider creating an issue in GDAL's official repo.
One thing I can do, is only publish packages when everything will be done.

@MaxRev-Dev MaxRev-Dev added bug-gdal A bug in GDAL and removed bug Something isn't working labels Jun 3, 2020
@bjornharrtell
Copy link
Contributor Author

@MaxRev-Dev fully agreed. GDAL PR done with OSGeo/gdal#2649.

@MaxRev-Dev
Copy link
Owner

MaxRev-Dev commented Nov 9, 2020

@bjornharrtell Looks like this bug was fixed in GDAL v3.2.0RC1.
Changes will apply in corresponding versions of packages - v3.2.0.x milestone

@txantxangorriak
Copy link

Hello,
It seems that we are encountering the same issue reading S57 Files when executing our program in Ubuntu 18.04 LTS environment.
Program is targeting .NetCore 3.1 and works perfectly fine on Windows environment.

Ile de paté is corrupted to Ile de Pat�
and
Ile d'Oléron is corrupted to Ile d'Olér

We currently use those package versions :
image

Can you confirm that we can hope for our problem being fixed using the futur package version based on GDAL v3.2.0?
And is there a plan of releasing it?

@bjornharrtell
Copy link
Contributor Author

@txantxangorriak I can confirm it resolved the issue for me (with a custom compilation based on this repo and a non-released GDAL)

@MaxRev-Dev MaxRev-Dev added this to the v3.2.0.100 milestone Dec 5, 2020
@MaxRev-Dev
Copy link
Owner

MaxRev-Dev commented Dec 6, 2020

@bjornharrtell @txantxangorriak
Tried to build GDAL 3.2.0, but it requires libtiff >= 4.2.0+ ???
I think it's a related issue OSGeo/gdal#3135.

Windows build is ready, though.
And GDAL 3.1.2 on linux is also works fine.

Should we wait VCPKG's update for libtiff ?

Configuration logs here

GDAL

configure: Bash completions not requested
checking for libtiff... using libtiff from /gdal-netcore/unix/build-unix/vcpkg/installed/x64-linux/lib.
checking for TIFFScanlineSize64 in -ltiff... no
configure: error: libtiff >= 4.0 is required.
make[1]: *** [gdal-makefile:73: configure_gdal] Error 1

VCPKG

szip:x64-linux-dynamic                             2.1.1-6          Szip compression software, providing lossless co...
tiff:x64-linux                                     4.1.0            A library that supports the manipulation of TIFF...
xerces-c:x64-linux                                 3.2.3-1          Xerces-C++ is a XML parser, for parsing, generat...

@MaxRev-Dev MaxRev-Dev reopened this Dec 6, 2020
MaxRev-Dev added a commit that referenced this issue Dec 12, 2020
Closes #18, closes 23
MaxRev-Dev added a commit that referenced this issue Dec 12, 2020
* Release v3.2.0.100
Closes #18, closes #23, closes #11 

* updated links and build instructions

* geos was cached. rebuilt windows packages

* tests configuration update

* updated test targets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-gdal A bug in GDAL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants