Remove PythonAsciiEncoding#959
Conversation
|
|
||
| [Test] | ||
| public void TestIncrementalWithtPythonAscii() { | ||
| public void TestIncrementalWithtLatin1() { |
There was a problem hiding this comment.
Typo in test name Witht. Looks like the surrounding tests have the same typo.
| expected = "'ascii' codec can't decode byte 0xe2 in position 16: " \ | ||
| "ordinal not in range(128)" | ||
| if sys.implementation.name == 'ironpython': | ||
| expected = "'us-ascii' codec can't decode byte 0xe2 in position 16: " \ |
There was a problem hiding this comment.
Is there a reason why we have us-ascii instead of the standard Python name ascii? I tracked it down to StringOps.GetEncodingName and it looks like it's hardcoded for the common encodings.
There was a problem hiding this comment.
It is hardcoded to us-ascii to make .NET Core ASCII codec have the same name as under .NET Framework 4.6. On 4.6, this code path is not reached and the name is pulled from encoding.HeaderName above, which produces us-ascii. Also codepage 28591 has WebName iso-8859-1, rather than latin-1 as CPython would say.
I kept the .NET names preferred because these were names that IronPython was already using when I joined the project. Also, only a subset of encodings is enumerated here, the rest will simply get cpXXXX (on .NET Core at least).
However, I can move the switch before the .NET 4.6 part and override what .NET encoding says about itself, at least for a subset of most common encodings.
|
Not too worried about the marshaling stuff. It's mainly for |
Normally I try hard to match an exeption text message to CPython's if it is being tested for by a test from StdLib, but this time I believe it is not worth pursuing. With
PythonAsciiEncodinggone, the IronPython's message is produced by .NETDecodingExceptionHandler, which has a different text. Technically, it is still possible to match CPython's message, for instance by always usingStringOps.ExeptionFallbackwith a specially crafted message text for the case of ASCII encoding, but I was hoping that that fallback would be a temporary workaround for the .NET Core UTF-8 bug and will disappear one day. It is practically impossible to match CPython's messages exactly for every case, because .NET encodings do not provide any infomation on why a fallback occurs. For ASCII, the reason is trivial, it is always "not in range(128)", but for e.g. UTF-8 there could be a variety of reasons. So modifyingStringOps.ExeptionFallbackwould be just a complex workaround just to pass this single test.Another thing is the code in
MarshalOps.ReadAsciiString(). It is being used to read marshalled data with typecodet, which is never produced by IronPython 3 so I assume it is for compatibility with IronPython 2. The IronPython 2 code usesPythonAsciiEncoding.Instancefor this purpose, which is a "non-throwing " version, in practice equivalent to Latin-1. So the replacement encoding here is Latin-1 to maintain exactly the same semantics as before. However, I wonder whether it is necessary. For one, IronPython 2 will never marshal any non-ASCII strings with typecode 't' (at least as far as the recorded history on GitHub reaches), and second, is this backward-compatibility really needed? Perhaps the whole typecodetcan be removed.