Ansible permits ssh to send local LANG and LC_ALL env vars (file misencoding, filename corruption) #10698

Closed
ringerc opened this issue Apr 14, 2015 · 10 comments
Labels
bug This issue/PR relates to a bug.

Comments

@ringerc
Contributor

ringerc commented Apr 14, 2015

Newer ssh versions like to send LANG and LC_ALL to the remote sshd, so that the remote ssh session has the same locale as the local user's session.

This makes Ansible playbook execution, including things like files created on the remote end, dependent on the environment of the user running the playbook. Files may have different language text and/or be in a different text encoding. File names may have different encodings too, so that the file named Álvaro in en_US.utf-8 will be listed as ??lvaro in en_US, even though it'll actually match the glob ?lvaro.

E.g.

$ LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 touch "Álvaro"
$ LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 ls
Álvaro
$ LANG=en_US LC_ALL=en_US ls
??lvaro
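
The `??lvaro` listing can be reproduced without touching a filesystem. What follows is only a rough Python model, not what `ls` does internally: it assumes the name was written as UTF-8 and that the viewing side shows every byte outside printable ASCII as `?`. The helper name `display_as_ascii` is made up for this illustration.

```python
# Rough model of why a UTF-8 file name shows up as "??lvaro".
# Assumption: the viewer prints "?" for every byte outside printable ASCII.

name = "\u00c1lvaro"          # "Álvaro", written with a precomposed A-acute
raw = name.encode("utf-8")    # the bytes a UTF-8 session writes to disk


def display_as_ascii(data: bytes) -> str:
    """Replace each byte outside printable ASCII with '?'."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in data)


shown = display_as_ascii(raw)
print(len(name), len(raw))    # 6 characters, but 7 bytes
print(shown)                  # ??lvaro
```

The single character Á occupies two bytes in UTF-8, which is why two question marks appear in place of one letter.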

As far as I can tell Ansible doesn't support overriding environment variables for all tasks that run on a given host / hostgroup, nor does it support the environment keyword at the playbook level under the hosts dictionary. So it's not currently easy to say "Always use the xx_XX locale for this host".

These errors are often treated as minor and ignore-able, but they're anything but, especially when they result in mixing utf-8 and 1-byte encodings.

This does not (just) affect people who use non-English languages. I noticed the problem because I use the en_AU.utf-8 locale on my workstation, but most of the hosts I manage only have the C, C.UTF-8 and en_US.UTF-8 locales configured. The system falls back to C.

perl is particularly noisy about this:

$ perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_PAPER = "en_AU.utf8",
    LC_MONETARY = "en_AU.utf8",
    LC_NUMERIC = "en_AU.utf8",
    LC_MEASUREMENT = "en_AU.utf8",
    LC_TIME = "en_AU.utf8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

The locale command will also complain:

$ locale
locale: Cannot set LC_ALL to default locale: No such file or directory

The best solution, IMO, is for Ansible to use the remote end's default locale, by suppressing sending of LANG and LC_ALL over ssh by default.

This is most visible on Debian and Ubuntu systems, which default to

AcceptEnv LANG LC_*

in /etc/ssh/sshd_config.

As a workaround for this issue, users may wish to remove the above AcceptEnv directive from the sshd_config on the server, thus preventing the ssh client from setting the locale. Alternatively, on the client side, users may remove:

SendEnv LANG LC_*

from their /etc/ssh/ssh_config. It does not appear to be possible to disable this on a per-host basis, unfortunately; see:

So the ideal would be for Ansible to override the LC_ variables and LANG before invoking ssh, or for Ansible to use its own global configuration file for ssh.
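
To make the "override before invoking ssh" idea concrete, here is a minimal sketch of what a caller could do. It is not Ansible's actual code. `strip_locale_env` and `run_ssh` are hypothetical names, and including `LANGUAGE` in the list of removed variables is my assumption. The sketch relies on ssh only being able to forward variables that are set in its own environment.

```python
import os
import subprocess


def strip_locale_env(env):
    """Return a copy of env without LANG, LANGUAGE, or any LC_* variable."""
    return {
        key: value
        for key, value in env.items()
        if key not in ("LANG", "LANGUAGE") and not key.startswith("LC_")
    }


def run_ssh(host, remote_command, env=None):
    """Hypothetical wrapper: invoke ssh with the locale variables removed.

    With nothing locale-related left in the client environment, a
    "SendEnv LANG LC_*" directive has nothing to forward.
    """
    clean = strip_locale_env(os.environ if env is None else env)
    return subprocess.run(["ssh", host, remote_command], env=clean, check=False)


example = strip_locale_env(
    {"PATH": "/usr/bin", "LANG": "en_AU.UTF-8", "LC_TIME": "en_AU.UTF-8"}
)
print(example)  # only PATH is left
```

The remote session then falls back to whatever default locale the remote host is configured with, which is the behaviour proposed above.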


The simplest workaround is to create a minimal ansible_ssh_config in your Ansible project trees, like:

Host *
        ForwardAgent no
        ControlMaster=auto
        ControlPersist=60s

and then override ansible's default ssh command in an ansible.cfg in your project tree, like:

[ssh_connection]
ssh_args=-F ansible_ssh_config

The downside is that if Ansible's default ssh command changes, you won't see the change.

@ringerc ringerc changed the title Ansible permits ssh to send local LANG and LC_ALL env vars Ansible permits ssh to send local LANG and LC_ALL env vars (file misencoding, filename corruption) Apr 14, 2015
@mscherer
Contributor

Yeah, this can have some annoying effects, especially if you parse the output of tools (personal example with salt and chkconfig: saltstack/salt-bootstrap#558).

I think it would just be easier to force LC_ALL=C in the ssh plugin.

mscherer added a commit to mscherer/ansible that referenced this issue Apr 14, 2015
This might have side effects on command-line UIs, which will be in
English, and this might cause issues for people, but I think
that's a lesser problem than having silent breakage and corruption
of filenames.
@mscherer
Contributor

In fact, maybe the code should be pushed higher in the stack, since the issue could occur in different plugins.

@mscherer
Contributor

So what about having ansible_env_LC_ALL=C to set the environment variable used for that connection, like we have ansible_user= etc. for inventory?

@ringerc
Contributor Author

ringerc commented Apr 15, 2015

or, perhaps:

ansible_env:
    LC_ALL: C
    LANG: C

?

Some smarts are needed, though, because it's common for LC_ALL to be unset, and for the individual LC_ parameters to be set directly. (Though, digging further, it looks like LC_ALL overrides the individual LC_ vars, so that might be OK).

It might make sense to have an ansible_locale or similar, which sets:

LANG
LC_ALL
LC_CTYPE
LC_NUMERIC
LC_TIME
LC_COLLATE
LC_MONETARY
LC_MESSAGES
LC_PAPER
LC_NAME
LC_ADDRESS
LC_TELEPHONE
LC_MEASUREMENT
LC_IDENTIFICATION
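
As a sketch of what such a setting might expand to: `ansible_locale` is only a proposal in this thread, and `locale_env` below is a hypothetical helper, not an existing Ansible function. It just maps one locale name onto every variable in the list above.

```python
# Hypothetical expansion of a single "ansible_locale" value into the
# full set of locale variables listed above.
LOCALE_VARS = (
    "LANG", "LC_ALL", "LC_CTYPE", "LC_NUMERIC", "LC_TIME", "LC_COLLATE",
    "LC_MONETARY", "LC_MESSAGES", "LC_PAPER", "LC_NAME", "LC_ADDRESS",
    "LC_TELEPHONE", "LC_MEASUREMENT", "LC_IDENTIFICATION",
)


def locale_env(locale_name):
    """Build an environment dict that pins every locale variable."""
    return {var: locale_name for var in LOCALE_VARS}


pinned = locale_env("C")
print(pinned["LC_ALL"], pinned["LC_PAPER"])
```

Since LC_ALL takes precedence over the individual categories, setting all fourteen is belt and braces. It mainly guards against anything that reads an individual variable directly.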

Separately, though, environment: support at the role and host/hostgroup level would certainly be handy.

@ringerc
Contributor Author

ringerc commented Apr 15, 2015

See also comments on #10714 (comment)

@FurcyPin

Maybe it would be nice if someone could write a short explanation of the current behavior of Ansible
regarding locale settings in the documentation.

I ran into similar issues yesterday and was about to open a ticket on the same topic,
but then I saw this one and I finally managed to fix my end of the problem, but I will try to
retrace it here in case it helps:

First I should say that having C as the locale instead of en_US.UTF-8 may indeed be a problem sometimes: when I tried to set up a postgresql server, the shell command pg_createcluster failed with the same kind of noisy perl error...

I started digging and found #7060, which was supposed to fix the problem but did not for me. When I tried to run the following test task with ansible 1.7.2 (in which the patch was applied, I checked):

- name: test
  shell: echo $LANG
  environment:
    LANG: "en_US.UTF-8"

I got C as output (with or without the environment: option), while when I ran the command echo $LANG on both my local host and the remote host via ssh (without ansible), I had en_US.UTF-8 as output.

I finally noticed that my locale on the local host was:

LANG="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_MONETARY="fr_FR.UTF-8"
LC_PAPER="fr_FR.UTF-8"
LC_NAME="fr_FR.UTF-8"
LC_ADDRESS="fr_FR.UTF-8"
LC_TELEPHONE="fr_FR.UTF-8"
LC_MEASUREMENT="fr_FR.UTF-8"
LC_IDENTIFICATION="fr_FR.UTF-8"

so I changed everything to en_US.UTF-8 and it finally worked (after relogging).

I am not sure about it, but I think what happened is this:

Ansible (or ssh) tries to send LANG to the remote sshd, but
fails to do so because fr_FR.UTF-8 is not configured on the remote host,
and then it silently goes for C instead, completely overriding the locale settings, and even the environment option.

Please let me know if it makes sense...

@ringerc
Contributor Author

ringerc commented Apr 16, 2015

@FurcyPin Amusingly, I ran into this issue while using Ansible to automate deployment of a couple of PostgreSQL buildfarm members out at the OSU OSL.

Right now Ansible doesn't care about the locale, as far as I can tell. So what happens is down to:

  • Your local LC_ vars and LANG
  • Your local ssh_config's SendEnv directives
  • The remote sshd_config's AcceptEnv directives
  • The remote default locale

If you have LC_* or LC_ALL and/or LANG set in your local environment, your ssh has SendEnv LANG LC_ALL LC_CTYPE .... and the destination server has AcceptEnv LANG LC_ALL LC_CTYPE ... in its sshd_config then your local machine's locale environment will be exported to the remote machine.

So if you run the remote command perl -e 'exit;' using Ansible's command task and your local environment has LANG=en_AU.UTF-8 LC_ALL=en_AU.UTF-8 it's as if you actually run the remote command:

LANG=en_AU.UTF-8 LC_ALL=en_AU.UTF-8 perl -e 'exit';

On Debian/Ubuntu this will spit errors because Debian systems don't include all locales by default. On Red Hat / Fedora systems it'll silently run in a different locale to the system default, because all locales are installed. You'll probably only notice problems if the locale's collation order is different to what you expect or its LC_CTYPE specifies a different character encoding, but more subtle issues like formatting of numbers can also arise.
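
The interaction of the first three factors can be modelled in a few lines. This is a simplified sketch, not how OpenSSH is implemented: `forwarded_env` is a hypothetical helper, and `fnmatchcase` is used as a stand-in for ssh's pattern matching (it agrees for the plain `*` and `?` wildcards used here). A variable reaches the remote session only if it is set locally, matches a SendEnv pattern, and matches an AcceptEnv pattern.

```python
from fnmatch import fnmatchcase


def forwarded_env(local_env, send_env, accept_env):
    """Return the local variables that would reach the remote session.

    send_env and accept_env are lists of patterns such as ["LANG", "LC_*"],
    mirroring the SendEnv and AcceptEnv directives.
    """
    def matches(name, patterns):
        return any(fnmatchcase(name, pattern) for pattern in patterns)

    return {
        name: value
        for name, value in local_env.items()
        if matches(name, send_env) and matches(name, accept_env)
    }


local = {"LANG": "en_AU.UTF-8", "LC_TIME": "en_AU.UTF-8", "HOME": "/home/me"}
debian_defaults = ["LANG", "LC_*"]

print(forwarded_env(local, debian_defaults, debian_defaults))
print(forwarded_env(local, debian_defaults, []))  # server accepts nothing
```

Removing either directive empties the result, which is why both the server-side and client-side workarounds mentioned earlier have the same effect.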


#7060 looks like it actually unmasked this issue; prior to that commit, everything would be done in the C locale, which could well be wrong if the remote's default is C.UTF-8, but is at least consistent. Now different remote management nodes could use different locales and the server yet another different locale. #10714 effectively reverts #7060, which isn't the right thing to do (that change was made for good reasons; it's just that since then ssh has started to default to sending the locale environment).


BTW by setting:

  environment:
    LANG: "en_US.UTF-8"

you're overriding LANG but not the LC_ variables. You should override LC_ALL if you override LANG, e.g.

  environment:
    LANG: "en_US.UTF-8"
    LC_ALL: "en_US.UTF-8"

This won't work if you've applied #10714 though.
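
The reason LC_ALL matters is the standard lookup order for a locale category: LC_ALL first, then the category's own variable, then LANG. Below is a small sketch of that order. `effective_locale` is a name invented here, and falling back to "C" when nothing is set is an assumption (the default is up to the implementation).

```python
def effective_locale(env, category):
    """Resolve one locale category: LC_ALL, then LC_<category>, then LANG.

    Empty values count as unset. The final "C" fallback is an assumption.
    """
    for var in ("LC_ALL", category, "LANG"):
        value = env.get(var)
        if value:
            return value
    return "C"


# Overriding only LANG leaves a forwarded LC_MONETARY in charge of its category.
mixed = {"LANG": "en_US.UTF-8", "LC_MONETARY": "fr_FR.UTF-8"}
print(effective_locale(mixed, "LC_MONETARY"))   # fr_FR.UTF-8
print(effective_locale(mixed, "LC_CTYPE"))      # en_US.UTF-8

# Adding LC_ALL overrides every category at once.
pinned_all = dict(mixed, LC_ALL="en_US.UTF-8")
print(effective_locale(pinned_all, "LC_MONETARY"))  # en_US.UTF-8
```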

@jimi-c
Member

jimi-c commented Jul 12, 2015

Ansible does provide ANSIBLE_MODULE_LANG (or just module_lang, in the [defaults] section of ansible.cfg) to set the LANG value. This currently defaults to en_US.UTF-8, so you should not need to set it as above.

A quick example here shows things working as I'd expect:

# ansible -m file -a "state=touch path=/tmp/Álvaro" localhost -c local
localhost | SUCCESS => {
    "changed": true, 
    "dest": "/tmp/Álvaro", 
    "gid": 0, 
    "group": "root", 
    "mode": "0644", 
    "owner": "root", 
    "secontext": "unconfined_u:object_r:user_tmp_t:s0", 
    "size": 0, 
    "state": "file", 
    "uid": 0
}
# ll /tmp/Álvaro
-rw-r--r--. 1 root root 0 Jul 12 00:25 /tmp/Álvaro

Based on this, I'm going to close this issue.

If you continue seeing any problems related to this issue, or if you have any further questions, please let us know by stopping by one of the two mailing lists, as appropriate:

Because this project is very active, we're unlikely to see comments made on closed tickets, but the mailing list is a great way to ask questions, or post if you don't think this particular issue is resolved.

Thank you!

@jean

jean commented Oct 3, 2016

postgres needs LANGUAGE not LANG, so we still need hacks like a-chernykh/railsbox#29 (comment)

@rgarrigue

I know the topic is a bit old, but I ran into the same kind of issue.

With Ansible 2.2 running on Ubuntu 16.10 in French to provision an out-of-the-box CentOS 7.3, I had some weird "Unable to communicate with yum" error (sorry, I don't remember the exact message).

It turned out the CentOS network was improperly configured and yum ended up with a lot of "Connexion refusée", meaning Connection refused... And Ansible failed on this in both my playbook's package: task and bennojoy.nginx's.

After a bit of browsing, I ended up commenting out the AcceptEnv LANG LC etc. lines in CentOS's sshd_config and reloading sshd before any yum task, which solved my issue.

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 6, 2018
@ansible ansible locked and limited conversation to collaborators Apr 25, 2019
8 participants