Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file lookup is not binary safe #3518

Closed
tyll opened this issue Jul 12, 2013 · 14 comments
Closed

file lookup is not binary safe #3518

tyll opened this issue Jul 12, 2013 · 14 comments
Labels
bug This issue/PR relates to a bug.

Comments

@tyll
Copy link
Contributor

tyll commented Jul 12, 2013

0xe4 is a latin1 encoded ä. It cannot be transfered using the FILE lookup and copy module:

$ echo -e '\xe4' > ae.txt
$ ansible test -m copy --user root -a 'dest=/tmp/test content="$FILE(ae.txt)"'
test | success >> {
"changed": true,
"dest": "/tmp/test",
"gid": 0,
"group": "root",
"md5sum": "d41d8cd98f00b204e9800998ecf8427e",
"mode": "0600",
"owner": "root",
"size": 0,
"src": "/root/.ansible/tmp/ansible-1373660975.11-58729570197455/source",
"state": "file",
"uid": 0
}
$ md5sum ae.txt
6935d28700cd9056e616f514ddaf7399 ae.txt
$ md5sum /dev/null
d41d8cd98f00b204e9800998ecf8427e /dev/null

The returned md5sum of the copy modules corresponds to the empty string and not to the one of the file ae.txt

@tyll
Copy link
Contributor Author

tyll commented Jul 12, 2013

The file lookup plugin always assumes to read utf8 and removes whitespace,
which would be fixed by this patch, but since the rest of ansible can not cope
with this, more needs to be changed for this to work:

diff --git a/lib/ansible/runner/lookup_plugins/file.py b/lib/ansible/runner/lookup_plugins/file.py
index d5291c2..9079aa0 100644
--- a/lib/ansible/runner/lookup_plugins/file.py
+++ b/lib/ansible/runner/lookup_plugins/file.py
@@ -39,7 +39,7 @@ class LookupModule(object):
             if not os.path.exists(path):
                 raise errors.AnsibleError("%s does not exist" % path)

-            ret.append(codecs.open(path, encoding="utf8").read().rstrip())
+            ret.append(codecs.open(path).read())


     return ret

@mpdehaan
Copy link
Contributor

You can of course use the src= option in this case.

@tyll
Copy link
Contributor Author

tyll commented Jul 13, 2013

Yes, this is just a minimal example to make the bug reproducable. I noticed this when I looked into issue 3214.

@srgvg
Copy link
Contributor

srgvg commented Aug 8, 2013

The md5sums being different, is perfectly normal due to the strip(). So the initial issue here is not really a bug to me.

I couldn't reproduce any problem with echo -e '\xe4' > ae.txt as this didn't show the ä character in that file, but manually editing ae.txt with that character, I got this error:

"msg": "could not write content temp file: 'ascii' codec can't encode character u'\\xe4' in position 5: ordinal not in range(128)"

So there does seem to be a problem here. After some research I came to this
http://mypy.pythonblogs.com/12_mypy/archive/1253_workaround_for_python_bug_ascii_codec_cant_encode_character_uxa0_in_position_111_ordinal_not_in_range128.html

Tried a patch, and this solves that ascii error. Sound like a valid fix to me, but not sure if this is a proper fix. I'll propose a PR for evaluation.

@bcoca
Copy link
Member

bcoca commented Aug 8, 2013

@serge, I've used that approach with success in the past though many warn
against it (not sure why).

I was also thinking of adding a 'encoding' parameter to lookup to safely
decode the file (default utf8), which may or may not match the local
encoding.

Brian Coca
Stultorum infinitus est numerus
0110000101110010011001010110111000100111011101000010000001111001011011110111010100100000011100110110110101100001011100100111010000100001
Pedo mellon a minno

@srgvg
Copy link
Contributor

srgvg commented Aug 8, 2013

FWIW, I checked the variables types, only to notice that everything was and remained utf8. The problem remains in the fd's write method only.

@tyll
Copy link
Contributor Author

tyll commented Aug 9, 2013

On Thu, Aug 08, 2013 at 01:11:57PM -0700, Serge van Ginderachter wrote:

The md5sums being different, is perfectly normal due to the strip(). So the initial issue here is not really a bug to me.

strip() removes whitespace, therefore it is not expected that strip()
changes the file:
In [1]: '\xe4'.strip()
Out[1]: '\xe4'

I couldn't reproduce any problem with echo -e '\xe4' > ae.txt as this didn't show the ä character in that file, but manually editing ae.txt with that character, I got this error:

It is the character encoded in latin1. Please re-try and check whether
the local and remote file are the same.

"msg": "could not write content temp file: 'ascii' codec can't encode character u'\\xe4' in position 5: ordinal not in range(128)"

So there does seem to be a problem here. After some research I came to this
http://mypy.pythonblogs.com/12_mypy/archive/1253_workaround_for_python_bug_ascii_codec_cant_encode_character_uxa0_in_position_111_ordinal_not_in_range128.html

Tried a patch, and this solves that ascii error. Sound like a valid fix to me, but not sure if this is a proper fix. I'll propose a PR for evaluation.

It suppresses the exception but is not a valid fix, unless you want
non binary safe behaviour for the file lookup. But then why do you want
it?

@tyll
Copy link
Contributor Author

tyll commented Aug 9, 2013

On Thu, Aug 08, 2013 at 01:38:14PM -0700, Brian Coca wrote:

I was also thinking of adding a 'encoding' parameter to lookup to safely
decode the file (default utf8), which may or may not match the local
encoding.

Why does ansible need to know the encoding of the file and cannot just
take the file as it is? What is the advantage of this?

@srgvg
Copy link
Contributor

srgvg commented Aug 9, 2013

strip() removes whitespace, therefore it is not expected that strip()
changes the file:
In [1]: '\xe4'.strip()
Out[1]: '\xe4'

You created your test file with echo -e '\xe4' > ae.txt which means there is a trailing newline that gets stripped.

It suppresses the exception but is not a valid fix, unless you want
non binary safe behaviour for the file lookup. But then why do you want
it?

AFAICS it doesn't suppress the error, it somehow forces the fd.write() method to not use ASCII, but UTF8.

@tyll
Copy link
Contributor Author

tyll commented Aug 9, 2013

On Fri, Aug 09, 2013 at 12:34:30AM -0700, Serge van Ginderachter wrote:

strip() removes whitespace, therefore it is not expected that strip()
changes the file:
In [1]: '\xe4'.strip()
Out[1]: '\xe4'

You created your test file with echo -e '\xe4' > ae.txt which means there is a trailing newline that gets stripped.

Ok, this explains why it is different, but now why the target file is
empty. I showed that the md5sum of the target file is the md5sum of the
empty string.

It suppresses the exception but is not a valid fix, unless you want
non binary safe behaviour for the file lookup. But then why do you want
it?

AFAICS it doesn't suppress the error, it somehow forces the fd.write() method to not use ASCII, but UTF8.

But if the input file is not encoded in UTF8, this is wrong. Ansible
casts from str() to unicode() when the file lookup is performed. Then
the unicode() object is transfered via json and later it is casted from
unicode() to str() when the file is written in the copy module. Why do
you want that the output of the second cast does not always match the
input of the first cast?

@hngkr
Copy link

hngkr commented Nov 5, 2013

This fix actually breaks Ansible when it's running with redirected stdin (such as when a PlayBook is being executed in a Celery/Django task from Python code).

The culprint seems to be the reload(sys) that's introduced here. It resets the redirected stdin - which messes with what the runner expects. I'm getting this fatal error:

fatal: [127.0.0.1] => Traceback (most recent call last):
  File "/Users/henning/django/lib/python2.7/site-packages/ansible/runner/__init__.py", line 378, in _executor
    self._new_stdin = os.fdopen(os.dup(sys.stdin.fileno()))
ValueError: I/O operation on closed file

From this very simple playbook:

---
# file: site1.yml
- hosts: 127.0.0.1
  connection: local
  gather_facts: False
  tasks:
  - copy: src="/tmp/blarg" dest="/tmp/blob"

It works when running ansible-playbook in the console without any redirections, so I don't know if this is considered a fringe use-case?

@jctanner
Copy link
Contributor

jctanner commented Nov 5, 2013

@hngkr can you open a new bug please?

@rajasaur
Copy link
Contributor

rajasaur commented Dec 3, 2013

I have the same problem as @hngkr . Created #5146. Removing the patch makes it work just fine.

@mpdehaan
Copy link
Contributor

mpdehaan commented Dec 5, 2013

Thanks for opening a new ticket. It's easy to lose comments on closed tickets since they don't show up in the open bug list.

@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 6, 2018
@ansible ansible locked and limited conversation to collaborators Apr 24, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug This issue/PR relates to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants