Implements 'surrogateescape' encoding error handler #545
I like the two-pass encoding idea. Do you have test cases for these? Might be useful to include in the testing. There are a few failures related to my comment about |
This reverts commit 864c44c.
Frankly, I do not know how to do it in a single pass (short of rewriting all .NET encoding classes from scratch). The good thing is that if there are no decoding/encoding errors (which will be the vast majority of practical cases), the decoding/encoding will still be done in one pass. I do have C# unit tests for I also have a few tests in Python, which are less extensive but focus on (potential) differences with CPython. Funny, but locally all tests were passing. Does CI do more tests? |
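For readers following along, the two-pass idea described above can be sketched in plain Python. This is only an illustration of the control flow (strict fast path first, escaping slow path only on failure), not the PR's .NET implementation; the function name `decode_two_pass` is hypothetical.

```python
def decode_two_pass(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical sketch: strict decode first, escape bad bytes only on failure."""
    # Pass 1: fast path. Clean input is decoded exactly once.
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        pass  # fall through to the slow path

    # Pass 2: slow path. Decode the valid runs and map each undecodable
    # byte 0xNN to the lone surrogate U+DCNN, as PEP 383 describes.
    parts = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            parts.append(chunk.decode(encoding))
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are offsets into `chunk`.
            parts.append(chunk[:exc.start].decode(encoding))
            bad = chunk[exc.start:exc.end]
            if any(b < 0x80 for b in bad):
                raise  # only bytes 0x80..0xFF can be escaped
            parts.extend(chr(0xDC00 + b) for b in bad)
            pos += exc.end
    return "".join(parts)
```

On well-formed input the second pass never runs, which is the property the comment above relies on.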
The CI runs tests on .NET Framework + .NET Core on Windows and Mono + .NET Core on Linux and macOS. Not sure why it would pass for you but fail on the CI (I managed to reproduce the failure over here). You can include the C# unit tests in

    namespace IronPythonTest {
        [TestFixture(Category="IronPython")]
        public class SurrogateEscape {
            [Test]
            public void Utf8() {
                // ...
            }
            // etc...
        }
    } |
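A Python-side counterpart to such a fixture might look like the sketch below. It is not taken from the PR's test suite; the class and method names are made up, and the expected values are CPython's behavior for the built-in `surrogateescape` handler.

```python
import unittest


class SurrogateEscapeTest(unittest.TestCase):
    """Hypothetical round-trip checks against CPython's behavior."""

    def test_utf8_roundtrip(self):
        raw = b"abc\xff\xfe"
        text = raw.decode("utf-8", "surrogateescape")
        # Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
        self.assertEqual(text, "abc\udcff\udcfe")
        self.assertEqual(text.encode("utf-8", "surrogateescape"), raw)

    def test_ascii_roundtrip(self):
        raw = b"caf\xe9"
        text = raw.decode("ascii", "surrogateescape")
        self.assertEqual(text, "caf\udce9")
        self.assertEqual(text.encode("ascii", "surrogateescape"), raw)

    def test_strict_encode_rejects_escaped(self):
        # Without the handler, a lone surrogate cannot be encoded.
        with self.assertRaises(UnicodeEncodeError):
            "\udcff".encode("utf-8")
```

Saved in a file, this can be run with `python -m unittest`.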
I have reverted the changes to |
Sorry, I should have been clear that it should have been a partial revert of #539 since the changes to |
The Windows tests pass but the Linux/macOS tests fail. The main cause is The I have done some tests to see how .NET handles decoding errors, and sure enough, they do not get escaped as lone surrogates but are simply replaced by U+FFFD. So information is lost and we will never have full CPython behavior here. What we can still do is make sure that we've got the types right; that would be good enough for practical purposes. This means that The question is where to do the conversion of strings to bytes: Since module |
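To make the information loss concrete, here is a small CPython sketch contrasting replacement with U+FFFD against `surrogateescape`. The sample bytes and helper names are illustrative only; the `replace` handler stands in for the .NET behavior described above.

```python
RAW = b"name-\xe9\xff"  # illustrative bytes that are not valid UTF-8


def via_replace(raw: bytes) -> bytes:
    # Undecodable bytes collapse to U+FFFD, so the originals are gone.
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def via_surrogateescape(raw: bytes) -> bytes:
    # Undecodable bytes become lone surrogates and are restored on encode.
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


print(via_replace(RAW) == RAW)          # False: round trip is lossy
print(via_surrogateescape(RAW) == RAW)  # True: round trip is exact
```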
@BCSharp I would probably fork I'm also completely fine with just having the |
Great. I'll leave updating `EnvironmentDictionaryStorage` for another PR. |
Thanks for adding the tests. Looks like Mono is having issues with |
I have noticed that one test on Mono was failing but I haven’t had time to look into it yet. I will probably have some time tonight (Pacific time). Thank you for providing some insight into the problem. What is interesting is that the test that is failing was meant to be run only on .NET Framework on Windows, as it uses an By the way, is there a better way to mark NUnit tests as applicable only to a specific framework or operating system? I was looking for some attributes to apply to the test method, but haven’t found any. |
Hmm, while cp1252 is primarily a Windows encoding I think it should be available on all platforms. CPython supports it (via a charmap encoding?) and .NET also supports it. In the case of .NET Core, it's not available natively, but it is still available via the

    #if NETCOREAPP2_1
    Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    #endif

We have an explicit hook up for cp1252 encoding on .NET Core (see Maybe if we didn't hook up encodings (other than the Python builtin ones) then we wouldn't have to deal with these compatibility issues when using normal Python. However, we should still support the interop scenario: Looks like the standard method for skipping the test would be something like:

    if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) {
        Assert.Ignore("some message");
    } |
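As a reference point for the cp1252 discussion, this is how CPython's own cp1252 codec behaves with `surrogateescape`. It is a sketch of CPython only; a .NET or Mono code page 1252 implementation may map the undefined bytes differently, and the helper name is hypothetical.

```python
def cp1252_roundtrip(raw: bytes):
    """Decode and re-encode with CPython's cp1252 codec and surrogateescape."""
    text = raw.decode("cp1252", "surrogateescape")
    return text, text.encode("cp1252", "surrogateescape")


# 0x80 is the euro sign in cp1252; 0x81 is undefined in CPython's table,
# so it is escaped as the lone surrogate U+DC81 and restored on encode.
text, back = cp1252_roundtrip(b"\x80\x81")
print(ascii(text), back)
```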
No, it is not. What is documented is that if a
Strange, but when I do |
Odd about Encoding 1252, the test works fine for me... If you're happy with your PR as it is I'll take another look tomorrow and merge it in if everything looks good. |
I have just noticed that |
The reference should have been inherited from |
Adding the reference to |
Thanks again for the PR! |
Implements PEP 383 and Issue #2.
Works with all .NET encodings: ASCII, Latin1, UTF-8, UTF-7, UTF-16, UTF-32, etc.
The challenge is that the .NET encoder and decoder fallback interfaces are only character-oriented. This was also a problem in CPython, but there the interface could be extended to allow fallbacks to produce output at the byte level (see the discussion in PEP 383).
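The byte-level extension on the CPython side can be illustrated with a custom error handler. The sketch below re-implements the escaping logic under the made-up name `demo_escape`; it is not CPython's actual implementation, only a demonstration that an encode handler registered through `codecs.register_error` may return `bytes` directly.

```python
import codecs


def demo_escape(exc):
    """Hypothetical handler mimicking surrogateescape in both directions."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if any(b < 0x80 for b in bad):
            raise exc  # only bytes 0x80..0xFF can be escaped
        # Decoding: the replacement is a str of lone surrogates U+DC80..U+DCFF.
        return "".join(chr(0xDC00 + b) for b in bad), exc.end
    if isinstance(exc, UnicodeEncodeError):
        out = bytearray()
        for ch in exc.object[exc.start:exc.end]:
            code = ord(ch)
            if not 0xDC80 <= code <= 0xDCFF:
                raise exc  # not an escaped byte, so it is a real error
            out.append(code - 0xDC00)
        # Encoding: the replacement is bytes, copied verbatim to the output.
        return bytes(out), exc.end
    raise exc


codecs.register_error("demo_escape", demo_escape)

text = b"ab\xffcd".decode("ascii", "demo_escape")
print(ascii(text), text.encode("ascii", "demo_escape"))
```

A character-only fallback interface has no equivalent of the `bytes` return value in the encode branch, which is the limitation on the .NET side described above.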