UTF-8 non-ascii string values cause string to be truncated #18
Comments
@rouault got any ideas where I should start looking for this bug? |
Hmm I suspect https://github.com/OSGeo/gdal/blob/88409871d445a1ed7c973533cf2075f904a39f90/gdal/swig/include/csharp/typemaps_csharp.i#L392-L414. Guess I will learn even more SWIG... |
That must be an issue with array bounds during marshaling. I can only compile and release a new package once this is fixed in GDAL. |
Preliminary good results by replacing the implementation of Utf8BytesToString with:
internal unsafe static string Utf8BytesToString(IntPtr pNativeData)
{
    if (pNativeData == IntPtr.Zero)
        return null;
    byte* pStringUtf8 = (byte*) pNativeData;
    // Count bytes up to the terminating NUL rather than relying on a character count.
    int len = 0;
    while (pStringUtf8[len] != 0) len++;
    return System.Text.Encoding.UTF8.GetString(pStringUtf8, len);
} |
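For readers who cannot compile with unsafe blocks enabled, the same idea (scan for the terminating NUL, then decode exactly that many bytes as UTF-8) can be sketched with the Marshal class alone. This is only an illustration of the technique, not the code that went into GDAL; the class name Utf8Interop and both method names are hypothetical, and StringToUtf8Ptr exists only to build a native buffer for a round trip.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Interop
{
    // Hypothetical safe-code counterpart of the unsafe version above.
    public static string Utf8PtrToString(IntPtr pNativeData)
    {
        if (pNativeData == IntPtr.Zero)
            return null;

        // Length in BYTES: walk the buffer until the terminating NUL.
        int len = 0;
        while (Marshal.ReadByte(pNativeData, len) != 0)
            len++;

        // Copy the bytes into managed memory and decode them as UTF-8.
        byte[] buffer = new byte[len];
        Marshal.Copy(pNativeData, buffer, 0, len);
        return Encoding.UTF8.GetString(buffer);
    }

    // Helper for trying it out: a NUL-terminated UTF-8 copy in unmanaged memory.
    // The caller must release it with Marshal.FreeHGlobal.
    public static IntPtr StringToUtf8Ptr(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr p = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, p, bytes.Length);
        Marshal.WriteByte(p, bytes.Length, 0);
        return p;
    }
}
```

Passing the pointer from Utf8Interop.StringToUtf8Ptr("Bef\u00e6stet") to Utf8Interop.Utf8PtrToString should give back the full eight-character string, because the length is measured in bytes on the native side and never derived from a character count. The extra copy makes this slower than the unsafe version, so it is a trade-off rather than a replacement.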
It's strange that this has apparently worked fine in .NET Framework. I suspect a subtle and perhaps unintended change of behaviour in the marshalling parts in .NET. |
Found this old thing perhaps related https://trac.osgeo.org/gdal/ticket/5963 (!) |
And found this very comprehensive site with more information than you'd ever want to know about this subject - https://ericsink.com/entries/utf8z.html. May want to consider his utf8z type (https://github.com/ericsink/SQLitePCL.raw/blob/master/src/SQLitePCLRaw.core/utf8z.cs) which might be significantly better performing than the old methods. |
That's the way to fix the truncation. But to move further, consider creating an issue in GDAL's official repo. |
@MaxRev-Dev fully agreed. GDAL PR done with OSGeo/gdal#2649. |
@bjornharrtell Looks like this bug was fixed in GDAL v3.2.0RC1. |
@txantxangorriak I can confirm it resolved the issue for me (with a custom compilation based on this repo and a non-released GDAL) |
@bjornharrtell @txantxangorriak The Windows build is ready, though. Should we wait for VCPKG's update?
Configuration logs here: GDAL, VCPKG |
Describe the bug
UTF-8 non-ASCII string values cause the string to be truncated. In the reproduction case it's a shapefile that has the value Befæstet, which becomes Befæste. Likely the string's character length is assumed to be its byte length somewhere.
To Reproduce
Clone https://github.com/bjornharrtell/gdal.netcore.utf8issuerepro then do a dotnet build then a dotnet run.
Expected behavior
Should not corrupt strings.
Environment information:
Additional context
I've also reproduced this with other formats (e.g. GML), with my own builds of gdal.netcore, and when running on Debian 10, so this seems to sit deep down somewhere.
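The guess that a character count is being used as a byte count fits the observed output. A minimal sketch, assuming the value contains the precomposed letter æ (U+00E6, two bytes in UTF-8): Befæstet is 8 characters but 9 UTF-8 bytes, so decoding only the first 8 bytes drops the final t. The class and method names below are hypothetical and only mimic the suspected mistake; they are not taken from GDAL or its SWIG bindings.

```csharp
using System;
using System.Text;

static class TruncationDemo
{
    // Mimics the suspected bug: decode only as many BYTES as the
    // string has UTF-16 chars, instead of the full UTF-8 byte length.
    public static string DecodeWithCharCountAsByteCount(string original)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(original);
        int wrongLength = Math.Min(original.Length, utf8.Length);
        return Encoding.UTF8.GetString(utf8, 0, wrongLength);
    }
}
```

Calling TruncationDemo.DecodeWithCharCountAsByteCount("Bef\u00e6stet") returns Befæste, the same truncated value reported above, while a pure ASCII string comes back unchanged. That would also explain why the problem only shows up with non-ASCII values.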