-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSharp bindings: Fixed implementation of Utf8BytesToString #2649
Conversation
@szekerest how does this look to you? |
The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 21 days and is being automatically marked as "stale". If you think this pull request should be merged, please check - that all unit tests are passing - that all comments by reviewers have been addressed - that there is enough information for reviewers, in particular
|
While we hate to see this happen, this PR has been automatically closed because it has not had any activity in the last 28 days. If this pull request should be reconsidered, please follow the guidelines in the previous comment and reopen this pull request. Or, if you have any further questions, just ask! We love to help, and if there's anything the GDAL project can do to help push this PR forward please let us know how we can assist. |
@szekerest and @rouault this is an important bugfix and it would be sad to leave it closed in my opinion. |
Re-opening but this was causing issues on the CI, which would have requested perhaps adapting it |
I think it only failed for macos_build for what appeared to be unrelated reasons. But I'll look into any persistent CI failures. |
There was in the trusty_clang config that was related to csharp. That said, as trusty is an oldish platform, we could just deprecate/remove C# testing on it |
Ah I see it's this part:
I can only guess it's something related to the use of unsafe (which I think is correct here) and the old version of mono in trusty. I think it makes sense to remove the C# testing on trusty. |
agreed |
Rebased on master and removed mono on trusty. But it looks like mono is not tested elsewhere? Will fork this PR and try to add it to Ubuntu 18.04 testing.. |
you also need to modify the install.sh scripts to remove / add the compilation of the bindings |
Ah, easy to miss as I have as of yet failed to run the same ci tests locally. It would be good if the CI setup allows for easy local execution using docker. It looks like this might be easier to support with GitHub actions, is that something you are trying to transition to? |
Yes I've recently added .github/workflows/ubuntu_20.04.yml |
Now failing with:
Can't explain it... The closest I have locally is Debian 10 where make test seems to be working, but it has mono 5.x compared to Ubuntu 18.04 which has mono 4.x. |
Pushed a noop to see if I get the same thing again... |
@bjornharrtell Why is it necessary to introduce unsafe code in the bindings? |
@szekerest it's not strictly necessary but AFAIK alternatives are inefficient and involve use of Marshal which I believe are not any safer anyway. (see https://stackoverflow.com/questions/10773440/conversion-in-net-native-utf-8-managed-string for a bit more in-depth) |
@bjornharrtell The thread you are referring to contains a way to calculate the byte length as follows:
Wouldn't that be sufficient for the purpose? If we would like to avoid the extra copy of using the Marshal class, why don't we do the UTF8->Unicode conversion at the unmanaged side, so that PtrToStringUni could be used directly, something like:
Not tested this approach, though. |
@bjornharrtell I also wanted to ask for an example where we can go wrong with the original approach. I've just created a small netcore project where I couldn't really reproduce the wrong length calculation (probably I just couldn't find out such input strings). |
@szekerest I would need to dive deeper into the subject to argue for/against using Marshal, so for now I'll stay mostly neutral on the subject. It seems the issue is only affecting .NET Core for unknown reasons as the original code does seem buggy to me. I can only imagine there is a subtle difference in the underlying implementation but I would expect that to have been caught by others/tests. A full description of the case is at MaxRev-Dev/gdal.netcore#18. Incidentally, I think that repo/effort is a good start making a .NET Core compatible build which is not easy in my opinion. I've forked that repo to include more drivers and against GDAL with this patch and I'm actually using that is production with as of yet good results. |
I took a brief look at https://www.mono-project.com/docs/advanced/pinvoke/ and it's a deep and murky subject which I'm not sure I will ever dive far into. I remain unconvinced that unsafe is any more dangerous than Marshal and P/Invoke when used in a controlled manner for the purpose of native bindings. In fact I feel more secure with the unsafe variant in this case as at least I can understand what is going on (i.e I'm not 100% sure what Marshal.PtrToStringUni is doing and that it's 100% portable). |
@bjornharrtell According to the documentation https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.marshal.ptrtostringuni?view=netcore-3.1 it just copies the contents of the unmanaged memory to a managed string up to the null terminator character. |
As far as I know using the unsafe flag doesn't affect the runtime/GC. Using the unsafe flag has security implications for the program as a whole but no higher/worse than already required by the use of P/Invoke, so as far as I know (and from experience) external projects are not affected. |
As soon as it won't introduce any limitation for the users consuming the gdal library, I'm not against the changes, however I'd prefer to add /unsafe only to ogr_csharp.dll at this stage which seems to be affected by the csout typemap. What about creating a new PR with that specific change only? I'm not sure we need to change the CI build setup, especially disabling trusty-clang within the scope of this PR (I don't recall what was the build issue with that however). |
@szekerest good points, I will try to find the time to rework this PR to your suggestions. |
The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 21 days and is being automatically marked as "stale". If you think this pull request should be merged, please check - that all unit tests are passing - that all comments by reviewers have been addressed - that there is enough information for reviewers, in particular
|
Seems you can proceed now |
This is now reworked and rebased on master and contains only the unsafe fixed implementation of Utf8BytesToString. I had to add unsafe flag for all four dlls as Utf8BytesToString is referenced in sources for all of them. |
Windows build fails due to me not understanding how to provide the unsafe flag in that environment. |
Looks like affected csproj files need: <PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup> But as they are generated it's a pain to fix. Annoying that this seem to be the only way to do it with.NET Core. |
This is now all green and ready for review/merge @rouault and @szekerest. |
Looks good to me |
The existing implementation has shown to corrupt strings when bindings are generated and used with .NET Core.
I'm not sure why the existing implementation seem to work in .NET Framework. It does seem buggy - length is number of characters not bytes.
The proposed implementation in this PR has been verified working in .NET Core but is not verified with .NET Framework but I believe it should work.