Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix default encoding #670

Closed
wants to merge 4 commits into from
Closed

Conversation

nachoaIvarez
Copy link

On macOS High Sierra, running built-in python 2.7.10, the script updateHostsFile.py throws:

image

This is happening because the program doesn't ensure UTF8 as default, although in some places it's done ad-hoc in the shape of .decode("UTF-8") calls.

This PR ensures the default encoding is set to UTF8, so it's less likely the program might fail on different environments.

After this change:
image

Hope this helps.

@welcome
Copy link

welcome bot commented Jun 18, 2018

Thank you for submitting this pull request! We’ll get back to you as soon as we can!

@StevenBlack
Copy link
Owner

Hi Nacho @nachoaIvarez! Thank you for this.

As you can see, Travis is complaining. It looks like our Python linter isn't liking some aspect of this.

Can you please amend this PR with a fix for that? I'm guessing the linter is not liking code mixed-in with the import declarations. That's the only thing I see, otherwise, looks great!

@funilrys
Copy link
Contributor

funilrys commented Jun 18, 2018

This PR is great but what do the community thinks about https://stackoverflow.com/a/17628350/5308363 ? @gfyoung @ScriptTiger && others

This is not a safe thing to do, though: this is obviously a hack, since sys.setdefaultencoding() is purposely removed from sys when Python starts. Reenabling it and changing the default encoding can break code that relies on ASCII being the default (this code can be third-party, which would generally make fixing it impossible or dangerous).

@funilrys
Copy link
Contributor

funilrys commented Jun 18, 2018

@nachoaIvarez normally such issue can be fixed with the following (what I'm doing when working with a terminal which is restricted to ASCII data).

Linux

export LC_ALL=C.UTF-8
export LANG=C.UTF-8

Most other UNIX systems

(Please replace with your country "code")

export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8

Start the script

python updateHostsFile.py --auto

@StevenBlack
Copy link
Owner

/remind me to close this PR in 3 days if no reply from @nachoaIvarez

@StevenBlack
Copy link
Owner

@bkeepers sorry to bug you. I have @probot reminders configured. The docs seem to indicate we should see an ack here?

@nachoaIvarez
Copy link
Author

Sorry guys for the late. The World Cup is really having my complete attention 😅. So, what do you want the outcome of all this to be? Should I add a couple of lines to the docs to account for the requirement of LANG and LC_ALL variables? Change the code so the linter doesn't fails?

Regarding the quote from @funilrys, I want to remember that the code already counts on UTF-8 to be the encoding, given all the .decode('UTF-8') calls in the existing code. So, either we remove all of them, or we set the default encoding at top to make sure all the existing code works.

I think working-by-default is what we should aim for here, but this is just my opinion.

BTW, great work @StevenBlack, I love your project and I'm grateful for your investment in it.

@StevenBlack
Copy link
Owner

I agree with Nacho @nachoaIvarez when he says, we want this working by default.

Guys (and gals), I have a question for readers outside North America and Western Europe.

If we drop support for Python 2.7, to what degree does all this become easier?

Deprecating Python 2.7 would certainly shrink and simplify the code in some places...

@funilrys
Copy link
Contributor

Well, we then can merge this. Too bad nobody really answered my question but will get my answer by myself elsewhere ...

In my opinion, dropping the support for 2.7 have to be done today or in >= 2 years (cf. https://pythonclock.org/). So it's just a matter of time.

I will just point out: https://twitter.com/llanga/status/1009104547866742786 (Take the time to understand both side and you may understand the problem of Python 2 vs Python 3 and Experienced vs Inexperienced users ...)

Cheers

@StevenBlack
Copy link
Owner

Nissar @funilrys, not merging this with failing lint under Travis. Ping Nacho @nachoaIvarez!

@nachoaIvarez
Copy link
Author

Great! Submitting the updated PR with linter compliance now.

@nachoaIvarez
Copy link
Author

Added a new commit to include the reload import. While in Python 2.x, this was a builtin, but in 3.x, it's in the importlib module.

@nachoaIvarez
Copy link
Author

Travis was still failing on python 3, because the imp module was deprecated in Python 3.4 in favor of the importlib module. Added a new commit to do the proper import for python 3, specifically.

@nachoaIvarez
Copy link
Author

nachoaIvarez commented Jun 26, 2018

So, made a step forward, now python 3 is succesfully reloading sys, but turns out setdefaultencoding is only available on python 2 but not in python 3.

Should we leave python 3 alone and do this under the python 2 conditional?

Not that of an expert in python honestly, so if someone has a better idea, please let the rest of us know!

@StevenBlack
Copy link
Owner

Thanks Nacho @nachoaIvarez for tending to this.

Oh the joys of maintaining codebases across incompatible versions.

I think we'll drop Python 2.7 support sooner than later.

Maybe we abandon this path, and recommend what Nissar @funilrys suggests in this comment above?

@StevenBlack
Copy link
Owner

Closing.

@StevenBlack StevenBlack closed this Jul 3, 2018
@StevenBlack StevenBlack mentioned this pull request Jul 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants