mixing py3 str/bytes with classic unicode/str

I need some help in my understanding. Perhaps it's a documentation request (there's some documentation in "What else you need to know" already, but I'd like it in a table, I think, with a bit more motivation and talking about consequences). Maybe at the end of writing this I'll understand your reasoning better. :)

python-future could be used to incrementally upgrade a larger codebase to Python 3 on a per-module basis. Is this a valid use case? Even if it isn't, futurized code is likely to call into non-futurized Python 2 code at some point. So this is a use case at the inter-package level in any case, and inter-module use should be okay too.

In this case it's possible for backported Python 3 str and bytes objects to interact with Python 2 unicode/str objects. For convenience I'll call them py3str, py3bytes, py2unicode, py2str.

I expected the system to behave as follows:

```
py3str + py2unicode -> py2unicode

py3str + py2str -> py2unicode (or unicode error if py2str is not ascii)

py3bytes + py2unicode -> py2unicode (or unicode error if py3bytes is not ascii)

py3bytes + py2str -> py2str
```

i.e. the system falls back to Python 2 behavior when mixing Python 2 objects with Python 3 objects, to ensure compatibility with non-Python 3 code.

Instead, the behavior is as follows:

```
py3str + py2unicode -> py3str

py3str + py2str -> py3str

py3bytes + py2unicode -> illegal

py3bytes + py2str -> py3bytes
```

i.e. the system upcasts combinations to the Py3 versions and makes one interaction illegal.
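
A rough sketch of how these combinations could be checked. `describe_sum` is a hypothetical helper of mine, not part of python-future, and it assumes the `future` package is installed when run on Python 2 (on Python 3 the same import just gives the native types). The py3str + py2str case is left out because Python 3 has no py2str.

```python
# On Python 2 this import needs the `future` package; on Python 3 it is the stdlib.
from builtins import bytes, str


def describe_sum(left, right):
    """Hypothetical helper: name the exact type of left + right."""
    try:
        result = left + right
    except TypeError:
        return "illegal"
    # Compare exact types: isinstance() may also accept the native types.
    if type(result) is str:
        return "py3str"
    if type(result) is bytes:
        return "py3bytes"
    return type(result).__name__


print(describe_sum(str(u"abc"), u"def"))    # py3str + unicode literal
print(describe_sum(bytes(b"abc"), u"def"))  # py3bytes + unicode literal
print(describe_sum(bytes(b"abc"), b"def"))  # py3bytes + byte literal
```

On Python 3 this prints `py3str`, `illegal`, `py3bytes`, which is the same pattern as the table above, so the backported types seem to follow native Python 3 semantics here.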

I can see why py3bytes should not be allowed to interact with py2unicode, even in Python 2 code, and why this can be made illegal to enforce it: if you _know_ you're dealing with a byte str in Python 2, there's no sane way to add unicode to it, so you shouldn't do it anyway.

I can also see that, for the sake of working with literals, it makes sense to upcast to py3str.

And then, as the documentation describes, py3str may be combined with py2str to support backwards compatibility. So instead of downcasting things, upcasting is needed to make it possible to work sanely with literals, and there's a concession to backward compatibility by allowing mixing with py2str.

It could still lead to problems down the line in Python 2 code, as that code might not encode the text before writing it to a file, for instance. But in that case the situation is the same as passing unicode into code that isn't unicode-safe, so the bug exists in Python 2 already.
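
To make the file-writing concern concrete, here is a minimal sketch of what unicode-safe code has to do: turn text into bytes explicitly before it reaches a binary stream. `write_text` and the UTF-8 default are my own assumptions for illustration, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io


def write_text(stream, text, encoding="utf-8"):
    """Hypothetical helper: encode text explicitly, then write the bytes."""
    stream.write(text.encode(encoding))


buf = io.BytesIO()
write_text(buf, u"caf\xe9")  # non-ASCII text, so the encoding matters

# Writing the text directly is rejected by the binary stream.
try:
    io.BytesIO().write(u"caf\xe9")
    wrote_unencoded = True
except TypeError:
    wrote_unencoded = False
```

Code that skips the explicit encode step only works by accident for ASCII input, which is the pre-existing Python 2 bug I mean.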

Is my reasoning correct? Could I help writing docs on this?


mixing py3 str/bytes with classic unicode/str #27
