Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

basestring fixer is extremely disruptive #127

Closed
eevee opened this issue Dec 13, 2014 · 9 comments
Closed

basestring fixer is extremely disruptive #127

eevee opened this issue Dec 13, 2014 · 9 comments

Comments

@eevee
Copy link

eevee commented Dec 13, 2014

futurize --stage2 replaces all uses of basestring with str (and also makes str always be the text type), which breaks isinstance checks on native strings on Python 2.

If there were a string_types available, maybe futurize could use that instead :)

@QuLogic
Copy link
Contributor

QuLogic commented Dec 13, 2014

Wouldn't stage 2 imply that you're properly following Python 3 recommendations, and correctly segregating str from bytes? You wouldn't test for basestring then, because they shouldn't be mixed.

@eevee
Copy link
Author

eevee commented Dec 13, 2014

The code in question is a library that explicitly (and sloppily, and incorrectly... and unfixably) allows both native strings and text strings. For Python 2 that means basestring; for Python 3 it means only str.

I ran stage 2 because it includes a lot of fairly mundane fixes that are necessary for running on Python 3, e.g. fixing iterables that became lazier or imports that moved around. At worst, those just make the code slower. It's only a handful of fixes like basestring (and long, come to think of it) that actively sabotages Python 2 code. And because both basestring and unicode are converted to str, it's difficult to go through the resulting code and figure out what the original intent was.

@QuLogic
Copy link
Contributor

QuLogic commented Dec 13, 2014

Instead of stage 2, you could specify fixers explicitly, with -f. You can list them with -l, or for just stage 2 -l -2. Then you could probably skip the basestring change.

@eevee
Copy link
Author

eevee commented Dec 13, 2014

I'll definitely try that next time, thanks.

Seems a shame to not make any distinction between fixers that will break working Python 2 code. Perhaps a stage 1½ is in order. :)

@edschofield
Copy link
Contributor

Thanks for your comments. @eevee: I see your point. It would be better if stage 2 didn't cause previously working code to break on Py2.

If stage 2 imported basestring from past.builtins, the resulting code would no longer break on Py2 but it would be unlikely to run on Py3. Would this be an improvement? I think it might -- because, as you say, it would preserve the original intent of the code.

Do you have any reason to prefer something like six's string_types tuple over the basestring object in past.builtins?

@QuLogic: would you agree to this change to fix_basestring.py?

The recent change in fix_division_safe.py was somewhat similar.

@eevee
Copy link
Author

eevee commented Feb 6, 2015

I'm pretty sure I use basestring for the same reason as 99% of anyone uses basestring. I have a Python 2 API that wants text, and I would love for it to be unicode, but the world is a cold and bitter place full of legacy concerns and a bad default string type, so I accept str as well. In Python 2 you can largely get away with this thanks to the nearly-identical APIs and silent conversions.

I don't think I've ever seen basestring used in any way other than to test whether a value is text-ish, and Python 3 only has one text-ish type: str. Thus I've always changed basestring to six.string_types when porting a codebase.

past's basestring would work as a substitute (and be an improvement over producing code that doesn't work at all), but it would also preserve semantics that don't really make any sense in Python 3. Importing past is probably against the spirit of a tool called futurize, anyway. :)

I can think of one other case where basestring is useful: for detecting whether a value is a container but not string-like. But that's only necessary on Python 3 anyway, since str and unicode don't have __iter__ on Python 2.

@eestrada
Copy link

Wouldn't it make more sense to replace isinstance(var, basestring) with:

from builtins import bytes, str

isinstance(var, (bytes, str))

The intention is clear, works in both versions, and doesn't require anything from past. As has been pointed out, it still may break since bytes and str can't mix in Py3 as they can in Py2, but at least it would continue to work in Py2 until the code base could be updated to work with Py3.

Perhaps this could be added as an alternative fixer for basestring.

@edschofield
Copy link
Contributor

@eevee and @eestrada : thanks for your input.

I've taken the conservative route to fixing this in v0.15.0 by importing the basestring class from past.builtins. This preserves the same behaviour on Py2.x and at least marks the code as needing attention on Py3 through the reference to past. I suppose the best translation of basestring on Py3 will depend on the circumstances: sometimes str or bytes, other times just str.

The main reason I chose this versus replacing basestring with the sensible suggestion of (bytes, str) was robustness and simplicity of implementation of the libfuturize fixer. I'm open to changing it to (bytes, str) in the future if I see enough of a need.

I believe the issue is resolved now, but, if not, please feel free to reopen it.

@eevee
Copy link
Author

eevee commented Jul 28, 2015

I'm cool with that, thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants