Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSharp bindings: Fixed implementation of Utf8BytesToString #2649

Merged
merged 6 commits into from
Sep 1, 2020

Conversation

bjornharrtell
Copy link
Contributor

The existing implementation has shown to corrupt strings when bindings are generated and used with .NET Core.

I'm not sure why the existing implementation seem to work in .NET Framework. It does seem buggy - length is number of characters not bytes.

The proposed implementation in this PR has been verified working in .NET Core but is not verified with .NET Framework but I believe it should work.

@bjornharrtell
Copy link
Contributor Author

@szekerest how does this look to you?

@stale
Copy link

stale bot commented Jun 26, 2020

The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 21 days and is being automatically marked as "stale". If you think this pull request should be merged, please check - that all unit tests are passing - that all comments by reviewers have been addressed - that there is enough information for reviewers, in particular

  • link to any issues which this pull request fixes
  • add a description of workflows which this pull request fixes
  • add screenshots if applicable
  • that you have written unit tests where possible In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request. If there is no further activity on this pull request, it will be closed in a week.

@stale stale bot added the stale label Jun 26, 2020
@stale
Copy link

stale bot commented Jul 4, 2020

While we hate to see this happen, this PR has been automatically closed because it has not had any activity in the last 28 days. If this pull request should be reconsidered, please follow the guidelines in the previous comment and reopen this pull request. Or, if you have any further questions, just ask! We love to help, and if there's anything the GDAL project can do to help push this PR forward please let us know how we can assist.

@stale stale bot closed this Jul 4, 2020
@bjornharrtell
Copy link
Contributor Author

@szekerest and @rouault this is an important bugfix and it would be sad to leave it closed in my opinion.

@rouault
Copy link
Member

rouault commented Jul 4, 2020

Re-opening but this was causing issues on the CI, which would have requested perhaps adapting it

@rouault rouault reopened this Jul 4, 2020
@stale stale bot removed the stale label Jul 4, 2020
@bjornharrtell
Copy link
Contributor Author

I think it only failed for macos_build for what appeared to be unrelated reasons. But I'll look into any persistent CI failures.

@rouault
Copy link
Member

rouault commented Jul 4, 2020

There was in the trusty_clang config that was related to csharp. That said, as trusty is an oldish platform, we could just deprecate/remove C# testing on it

@bjornharrtell
Copy link
Contributor Author

Ah I see it's this part:

LC_ALL=C mono createdata.exe Data pointlayer
The assembly mscorlib.dll was not found or could not be loaded.
It should have been installed in the `/usr/lib/mono/2.0/mscorlib.dll' directory.
make: *** [vagrant_safe_test] Error 1

I can only guess it's something related to the use of unsafe (which I think is correct here) and the old version of mono in trusty. I think it makes sense to remove the C# testing on trusty.

@rouault
Copy link
Member

rouault commented Jul 4, 2020

I think it makes sense to remove the C# testing on trusty.

agreed

@bjornharrtell
Copy link
Contributor Author

Rebased on master and removed mono on trusty.

But it looks like mono is not tested elsewhere? Will fork this PR and try to add it to Ubuntu 18.04 testing..

@rouault
Copy link
Member

rouault commented Jul 5, 2020

you also need to modify the install.sh scripts to remove / add the compilation of the bindings

@bjornharrtell
Copy link
Contributor Author

Ah, easy to miss as I have as of yet failed to run the same ci tests locally. It would be good if the CI setup allows for easy local execution using docker. It looks like this might be easier to support with GitHub actions, is that something you are trying to transition to?

@rouault
Copy link
Member

rouault commented Jul 5, 2020

It looks like this might be easier to support with GitHub actions, is that something you are trying to transition to?

Yes I've recently added .github/workflows/ubuntu_20.04.yml

@bjornharrtell
Copy link
Contributor Author

Now failing with:

[ERROR] FATAL UNHANDLED EXCEPTION: System.TypeInitializationException: The type initializer for 'OSGeo.OGR.Ogr' threw an exception. ---> System.EntryPointNotFoundException: CSharp_OSGeofOGR_wkb25DBit_get___

Can't explain it... The closest I have locally is Debian 10 where make test seems to be working, but it has mono 5.x compared to Ubuntu 18.04 which has mono 4.x.

@bjornharrtell
Copy link
Contributor Author

Pushed a noop to see if I get the same thing again...

@szekerest
Copy link
Contributor

@bjornharrtell Why is it necessary to introduce unsafe code in the bindings?

@bjornharrtell
Copy link
Contributor Author

bjornharrtell commented Jul 23, 2020

@szekerest it's not strictly necessary but AFAIK alternatives are inefficient and involve use of Marshal which I believe are not any safer anyway. (see https://stackoverflow.com/questions/10773440/conversion-in-net-native-utf-8-managed-string for a bit more in-depth)

@szekerest
Copy link
Contributor

szekerest commented Jul 23, 2020

@bjornharrtell The thread you are referring to contains a way to calculate the byte length as follows:

while (Marshal.ReadByte(nativeUtf8, len) != 0) ++len;

Wouldn't that be sufficient for the purpose?
I'd prefer to avoid adding the /unsafe switch to the entire build just because of this change. The utf8 pathes cannot be such long where the Marshal.Copy would decrease the performance significantly. By using the Marshal class to implement the interaction between the managed and unmanaged code it is more difficult to run into such unstable situations which are hard to indentify and manage. A good description about the potential problems can be found here: https://www.mono-project.com/docs/advanced/pinvoke/ and I confirm that we have also run into such situations (like early GC causing app crashes) with the stock implementations of the SWIG wrappers and that's why the implementation of swig_csharp_extensions.i has been added.

If we would like to avoid the extra copy of using the Marshal class, why don't we do the UTF8->Unicode conversion at the unmanaged side, so that PtrToStringUni could be used directly, something like:

%typemap(ctype) (const char *utf8_path)  "wchar_t*"
%typemap(out) (const char *utf8_path) %{ $result = CPLRecodeToWChar( $1, CPL_ENC_UTF8, CPL_ENC_UCS2); %}
%typemap(csout, excode=SWIGEXCODE) (const char *utf8_path) {
        /* %typemap(csout) (const char *utf8_path) */
        IntPtr cPtr = $imcall;
        string ret = Marshal.PtrToStringUni(cPtr);
        $excode
        return ret;
}

Not tested this approach, though.

@szekerest
Copy link
Contributor

@bjornharrtell I also wanted to ask for an example where we can go wrong with the original approach. I've just created a small netcore project where I couldn't really reproduce the wrong length calculation (probably I just couldn't find out such input strings).
In case if this issue is related to .net core specifically, I think it would be a right chance to add some build support in gdal, so that we can also include that configuration in the CI builds :-)

@bjornharrtell
Copy link
Contributor Author

@szekerest I would need to dive deeper into the subject to argue for/against using Marshal, so for now I'll stay mostly neutral on the subject.

It seems the issue is only affecting .NET Core for unknown reasons as the original code does seem buggy to me. I can only imagine there is a subtle difference in the underlying implementation but I would expect that to have been caught by others/tests. A full description of the case is at MaxRev-Dev/gdal.netcore#18. Incidentally, I think that repo/effort is a good start making a .NET Core compatible build which is not easy in my opinion. I've forked that repo to include more drivers and against GDAL with this patch and I'm actually using that is production with as of yet good results.

@bjornharrtell
Copy link
Contributor Author

I took a brief look at https://www.mono-project.com/docs/advanced/pinvoke/ and it's a deep and murky subject which I'm not sure I will ever dive far into. I remain unconvinced that unsafe is any more dangerous than Marshal and P/Invoke when used in a controlled manner for the purpose of native bindings. In fact I feel more secure with the unsafe variant in this case as at least I can understand what is going on (i.e I'm not 100% sure what Marshal.PtrToStringUni is doing and that it's 100% portable).

@szekerest
Copy link
Contributor

szekerest commented Jul 23, 2020

@bjornharrtell According to the documentation https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.marshal.ptrtostringuni?view=netcore-3.1 it just copies the contents of the unmanaged memory to a managed string up to the null terminator character.
But regarding to the documentation it is unclear what we get or loose if we switch on the /unsafe option for the build. Does it affect the overall memory management in some way or it is just to allow using the unsafe/fixed keywords in the code? Also, how the memory management is affected by the unsafe keyword itself? How a class or method is managed by the GC in this case or can that be framework dependent? How the dll-s marked as unsafe affect the external projects referring to that?

@bjornharrtell
Copy link
Contributor Author

As far as I know using the unsafe flag doesn't affect the runtime/GC. Using the unsafe flag has security implications for the program as a whole but no higher/worse than already required by the use of P/Invoke, so as far as I know (and from experience) external projects are not affected.

@szekerest
Copy link
Contributor

As soon as it won't introduce any limitation for the users consuming the gdal library, I'm not against the changes, however I'd prefer to add /unsafe only to ogr_csharp.dll at this stage which seems to be affected by the csout typemap. What about creating a new PR with that specific change only? I'm not sure we need to change the CI build setup, especially disabling trusty-clang within the scope of this PR (I don't recall what was the build issue with that however).
Enabling mono on ubuntu should also go to another PR IMHO.

@bjornharrtell
Copy link
Contributor Author

@szekerest good points, I will try to find the time to rework this PR to your suggestions.

@bjornharrtell
Copy link
Contributor Author

Removal of trusty target done broken out into #2802.

SWIG 4+ compatibility and Ubuntu 20.04 test target done in #2802.

If those are accepted, I can make a new PR with only the fixed unsafe variant and hopefully make a test case verification for it.

@stale
Copy link

stale bot commented Aug 19, 2020

The GDAL project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 21 days and is being automatically marked as "stale". If you think this pull request should be merged, please check - that all unit tests are passing - that all comments by reviewers have been addressed - that there is enough information for reviewers, in particular

  • link to any issues which this pull request fixes
  • add a description of workflows which this pull request fixes
  • add screenshots if applicable
  • that you have written unit tests where possible In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request. If there is no further activity on this pull request, it will be closed in a week.

@stale stale bot added the stale label Aug 19, 2020
@rouault
Copy link
Member

rouault commented Aug 21, 2020

If those are accepted, I can make a new PR with only the fixed unsafe variant and hopefully make a test case verification for it.

Seems you can proceed now

@stale stale bot removed the stale label Aug 21, 2020
@bjornharrtell bjornharrtell changed the title Fixed/compatible implementation of Utf8BytesToString Fixed implementation of Utf8BytesToString Aug 30, 2020
@bjornharrtell bjornharrtell changed the title Fixed implementation of Utf8BytesToString CSharp bindings: Fixed implementation of Utf8BytesToString Aug 30, 2020
@bjornharrtell
Copy link
Contributor Author

This is now reworked and rebased on master and contains only the unsafe fixed implementation of Utf8BytesToString.

I had to add unsafe flag for all four dlls as Utf8BytesToString is referenced in sources for all of them.

@bjornharrtell
Copy link
Contributor Author

Windows build fails due to me not understanding how to provide the unsafe flag in that environment.

@bjornharrtell
Copy link
Contributor Author

bjornharrtell commented Aug 31, 2020

Looks like affected csproj files need:

<PropertyGroup>
  <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>

But as they are generated it's a pain to fix. Annoying that this seem to be the only way to do it with.NET Core.

@bjornharrtell
Copy link
Contributor Author

This is now all green and ready for review/merge @rouault and @szekerest.

@szekerest
Copy link
Contributor

Looks good to me

@szekerest szekerest merged commit 1d35ff4 into OSGeo:master Sep 1, 2020
@bjornharrtell bjornharrtell deleted the patch-2 branch September 1, 2020 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants