Skip to content

Remove PythonAsciiEncoding#959

Merged
slozier merged 3 commits intoIronLanguages:masterfrom
BCSharp:delete_pythonascii
Oct 5, 2020
Merged

Remove PythonAsciiEncoding#959
slozier merged 3 commits intoIronLanguages:masterfrom
BCSharp:delete_pythonascii

Conversation

@BCSharp
Copy link
Copy Markdown
Member

@BCSharp BCSharp commented Oct 4, 2020

Normally I try hard to match an exeption text message to CPython's if it is being tested for by a test from StdLib, but this time I believe it is not worth pursuing. With PythonAsciiEncoding gone, the IronPython's message is produced by .NET DecodingExceptionHandler, which has a different text. Technically, it is still possible to match CPython's message, for instance by always using StringOps.ExeptionFallback with a specially crafted message text for the case of ASCII encoding, but I was hoping that that fallback would be a temporary workaround for the .NET Core UTF-8 bug and will disappear one day. It is practically impossible to match CPython's messages exactly for every case, because .NET encodings do not provide any infomation on why a fallback occurs. For ASCII, the reason is trivial, it is always "not in range(128)", but for e.g. UTF-8 there could be a variety of reasons. So modifying StringOps.ExeptionFallback would be just a complex workaround just to pass this single test.

Another thing is the code in MarshalOps.ReadAsciiString(). It is being used to read marshalled data with typecode t, which is never produced by IronPython 3 so I assume it is for compatibility with IronPython 2. The IronPython 2 code uses PythonAsciiEncoding.Instance for this purpose, which is a "non-throwing " version, in practice equivalent to Latin-1. So the replacement encoding here is Latin-1 to maintain exactly the same semantics as before. However, I wonder whether it is necessary. For one, IronPython 2 will never marshal any non-ASCII strings with typecode 't' (at least as far as the recorded history on GitHub reaches), and second, is this backward-compatibility really needed? Perhaps the whole typecode t can be removed.

Comment thread Src/IronPythonTest/EncodingTest.cs Outdated

[Test]
public void TestIncrementalWithtPythonAscii() {
public void TestIncrementalWithtLatin1() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in test name Witht. Looks like the surrounding tests have the same typo.

expected = "'ascii' codec can't decode byte 0xe2 in position 16: " \
"ordinal not in range(128)"
if sys.implementation.name == 'ironpython':
expected = "'us-ascii' codec can't decode byte 0xe2 in position 16: " \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we have us-ascii instead of the standard Python name ascii? I tracked it down to StringOps.GetEncodingName and it looks like it's hardcoded for the common encodings.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hardcoded to us-ascii to make .NET Core ASCII codec have the same name as under .NET Framework 4.6. On 4.6, this code path is not reached and the name is pulled from encoding.HeaderName above, which produces us-ascii. Also codepage 28591 has WebName iso-8859-1, rather than latin-1 as CPython would say.

I kept the .NET names preferred because these were names that IronPython was already using when I joined the project. Also, only a subset of encodings is enumerated here, the rest will simply get cpXXXX (on .NET Core at least).

However, I can move the switch before the .NET 4.6 part and override what .NET encoding says about itself, at least for a subset of most common encodings.

@slozier
Copy link
Copy Markdown
Contributor

slozier commented Oct 4, 2020

Not too worried about the marshaling stuff. It's mainly for pyc support (which we don't have right now).

Copy link
Copy Markdown
Contributor

@slozier slozier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@slozier slozier merged commit e60c887 into IronLanguages:master Oct 5, 2020
@BCSharp BCSharp deleted the delete_pythonascii branch October 5, 2020 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants