Remove PythonAsciiEncoding by BCSharp · Pull Request #959 · IronLanguages/ironpython3

BCSharp · 2020-10-04T03:48:33Z

Normally I try hard to match an exeption text message to CPython's if it is being tested for by a test from StdLib, but this time I believe it is not worth pursuing. With PythonAsciiEncoding gone, the IronPython's message is produced by .NET DecodingExceptionHandler, which has a different text. Technically, it is still possible to match CPython's message, for instance by always using StringOps.ExeptionFallback with a specially crafted message text for the case of ASCII encoding, but I was hoping that that fallback would be a temporary workaround for the .NET Core UTF-8 bug and will disappear one day. It is practically impossible to match CPython's messages exactly for every case, because .NET encodings do not provide any infomation on why a fallback occurs. For ASCII, the reason is trivial, it is always "not in range(128)", but for e.g. UTF-8 there could be a variety of reasons. So modifying StringOps.ExeptionFallback would be just a complex workaround just to pass this single test.

Another thing is the code in MarshalOps.ReadAsciiString(). It is being used to read marshalled data with typecode t, which is never produced by IronPython 3 so I assume it is for compatibility with IronPython 2. The IronPython 2 code uses PythonAsciiEncoding.Instance for this purpose, which is a "non-throwing " version, in practice equivalent to Latin-1. So the replacement encoding here is Latin-1 to maintain exactly the same semantics as before. However, I wonder whether it is necessary. For one, IronPython 2 will never marshal any non-ASCII strings with typecode 't' (at least as far as the recorded history on GitHub reaches), and second, is this backward-compatibility really needed? Perhaps the whole typecode t can be removed.

slozier · 2020-10-04T13:30:27Z


            [Test]
-            public void TestIncrementalWithtPythonAscii() {
+            public void TestIncrementalWithtLatin1() {


Typo in test name Witht. Looks like the surrounding tests have the same typo.

slozier · 2020-10-04T13:59:07Z

-        expected = "'ascii' codec can't decode byte 0xe2 in position 16: " \
-                   "ordinal not in range(128)"
+        if sys.implementation.name == 'ironpython':
+            expected = "'us-ascii' codec can't decode byte 0xe2 in position 16: " \


Is there a reason why we have us-ascii instead of the standard Python name ascii? I tracked it down to StringOps.GetEncodingName and it looks like it's hardcoded for the common encodings.

It is hardcoded to us-ascii to make .NET Core ASCII codec have the same name as under .NET Framework 4.6. On 4.6, this code path is not reached and the name is pulled from encoding.HeaderName above, which produces us-ascii. Also codepage 28591 has WebName iso-8859-1, rather than latin-1 as CPython would say.

I kept the .NET names preferred because these were names that IronPython was already using when I joined the project. Also, only a subset of encodings is enumerated here, the rest will simply get cpXXXX (on .NET Core at least).

However, I can move the switch before the .NET 4.6 part and override what .NET encoding says about itself, at least for a subset of most common encodings.

slozier · 2020-10-04T14:01:36Z

Not too worried about the marshaling stuff. It's mainly for pyc support (which we don't have right now).

slozier

Looks good to me!

Remove PythonAsciiEncoding

ed5704f

slozier reviewed Oct 4, 2020

View reviewed changes

BCSharp added 2 commits October 4, 2020 13:36

Fix typo

5f84e9a

Align encoding names with CPython

1fd1585

slozier approved these changes Oct 5, 2020

View reviewed changes

slozier merged commit e60c887 into IronLanguages:master Oct 5, 2020

BCSharp deleted the delete_pythonascii branch October 5, 2020 15:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove PythonAsciiEncoding#959

Remove PythonAsciiEncoding#959
slozier merged 3 commits intoIronLanguages:masterfrom
BCSharp:delete_pythonascii

BCSharp commented Oct 4, 2020

Uh oh!

slozier Oct 4, 2020

Uh oh!

slozier Oct 4, 2020

Uh oh!

BCSharp Oct 4, 2020

Uh oh!

slozier commented Oct 4, 2020

Uh oh!

slozier left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

BCSharp commented Oct 4, 2020

Uh oh!

slozier Oct 4, 2020

Choose a reason for hiding this comment

Uh oh!

slozier Oct 4, 2020

Choose a reason for hiding this comment

Uh oh!

BCSharp Oct 4, 2020

Choose a reason for hiding this comment

Uh oh!

slozier commented Oct 4, 2020

Uh oh!

slozier left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants