Encoding fixes required for group1 tests to pass without LC_ALL #58638

Open
wants to merge 4 commits into
base: devel
from

Conversation

@sivel
Member

commented Jul 2, 2019

SUMMARY

Encoding fixes required for group1 tests to pass without LC_ALL

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

many

ADDITIONAL INFORMATION

@sivel sivel requested review from abadger, mattclay and samdoran Jul 2, 2019

@sivel

Member Author

commented Jul 2, 2019

I'll add LC_ALL back after tests run.

@ansibot

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2019

@sivel

Member Author

commented Jul 2, 2019

I may have requested reviews too early. Much more is still not working, and I am still fixing it.

@sivel sivel changed the title Encoding fixes required for group1 tests to pass without LC_ALL [WIP] Encoding fixes required for group1 tests to pass without LC_ALL Jul 2, 2019

@ansibot ansibot added the WIP label Jul 2, 2019

@sivel sivel closed this Jul 2, 2019

@sivel sivel reopened this Jul 3, 2019

@sivel sivel force-pushed the sivel:group1-no-lc-all branch from e091a63 to 6cb80fd Jul 3, 2019

@sivel sivel changed the title [WIP] Encoding fixes required for group1 tests to pass without LC_ALL Encoding fixes required for group1 tests to pass without LC_ALL Jul 3, 2019

@sivel

Member Author

commented Jul 3, 2019

Ok, I believe I've gotten posix/group1 taken care of. I've re-opened this PR now.

I did address some of the network issues too. win_copy is failing, but I did not look at it. It can be taken care of later.

@@ -506,7 +506,6 @@ def raw_command(cmd, capture=False, env=None, data=None, cwd=None, explain=False
 def common_environment():
     """Common environment used for executing all programs."""
     env = dict(
-        LC_ALL='en_US.UTF-8',

@sivel

sivel Jul 3, 2019

Author Member

This line should be reverted before merging.

lib/ansible/errors/__init__.py (outdated)
if sys.version_info[0] == 3:
    contents = f.read().decode('utf-8')
else:
    contents = f.read()

@abadger

abadger Jul 10, 2019

Member

Couple notes:

  • Instead of doing a version check here, we could just use to_native(). Your code will work but I don't necessarily trust that the next person to edit this will understand what's going on with the python version check and decode.
  • For controller code we make everything text or explicitly leave it as bytes with a named variable. I believe that @mattclay wants the same convention for ansible-test code. This is kind of a plugin to ansible-test but I think we should probably follow the same convention here?
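A minimal sketch of the two notes above. It uses a hypothetical stand-in, `to_text_sketch()`, instead of Ansible's own helpers in `ansible.module_utils._text`, so that it runs without Ansible installed. The sample data is invented. The idea is that one helper call replaces the version branch, and bytes stay visibly marked until they are converted.

```python
import io


def to_text_sketch(obj, encoding='utf-8', errors='strict'):
    """Hypothetical stand-in for a to_text()-style helper: always return text."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    return obj


# Bytes read from a binary file object keep a b_ prefix ...
b_contents = io.BytesIO(u'caf\xe9\n'.encode('utf-8')).read()

# ... and are converted exactly once, with no sys.version_info branch.
contents = to_text_sketch(b_contents)
```

Passing text to the helper returns it unchanged, so callers do not need to know which type they hold.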

@abadger

abadger Jul 10, 2019

Member

From slack:

mattclay: sivel abadger1999 We've avoided a lot of unicode issues in ansible-test by running with LC_ALL set, so most of the encoding work in ansible-test has been on an as-needed basis. Working towards removing the use of LC_ALL we'll want to go with the unicode sandwich, but we don't yet have good functions in place like _text provides to do that.

So we do want to go unicode sandwich here which means contents should become text on both python2 and python3. Then other code later on will have to be changed to be sure that it's not mixing text and bytes.
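For readers unfamiliar with the term, here is a rough sketch of a unicode sandwich: decode bytes at the input edge, work only with text in the middle, and encode again at the output edge. The function name and sample data are made up for illustration and are not part of ansible-test.

```python
def count_nonblank_lines(b_data, encoding='utf-8'):
    """Decode at the input edge, work on text, encode at the output edge."""
    text = b_data.decode(encoding)  # bytes -> text (input edge)

    # Everything in the middle handles text only.
    nonblank = [line for line in text.splitlines() if line.strip()]
    report = u'%d non-blank lines' % len(nonblank)

    return report.encode(encoding)  # text -> bytes (output edge)


b_source = u'na\xefve = 1\n\nprint(na\xefve)\n'.encode('utf-8')
b_report = count_nonblank_lines(b_source)
```

Because the conversion happens only at the two edges, the code in between behaves the same on Python 2 and Python 3.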

 for line, text in enumerate(path_fd.readlines()):
-    match = re.search(r'((^\s*import\s+six\b)|(^\s*from\s+six\b))', text)
+    match = re.search(br'((^\s*import\s+six\b)|(^\s*from\s+six\b))', text)

@abadger

abadger Jul 10, 2019

Member

Instead of changing the regex to a byte regex, I think it's better to convert the lines into text. The current code won't fail because it never touches the string that's been matched. However, it would quickly produce mangled output on Python 3 if an update to this code were to use the matched text. In some cases, converting to text does have a knock-on effect (making a text version of path to use in the print(), and changing the print format string to u"%s:%d:%d").

So this depends on whether mattclay is doing unicode sandwich for ansible-test code or not.
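If the text route is taken, the check could look roughly like the sketch below. The regex is the one from the diff. The function name, the `example.py` path and the sample lines are assumptions for illustration, not the actual sanity test.

```python
import re

SIX_RE = re.compile(r'((^\s*import\s+six\b)|(^\s*from\s+six\b))')


def find_six_imports(path, b_lines):
    """Return 'path:line:col' strings for lines that import six."""
    hits = []
    for index, b_line in enumerate(b_lines):
        line = b_line.decode('utf-8')  # bytes -> text once, at the boundary
        match = SIX_RE.search(line)
        if match:
            # path, the format string and the result are all text.
            hits.append(u'%s:%d:%d' % (path, index + 1, match.start(1) + 1))
    return hits


hits = find_six_imports(u'example.py', [b'import os\n', b'  from six import moves\n'])
```

Since the matched text is already text, a later change that prints it would not need another conversion.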

 for i, line in enumerate(f.readlines()):
     matches = ASSERT_RE.findall(line)

     if matches:
         lineno = i + 1
-        colno = line.index('assert') + 1
-        print('%s:%d:%d: raise AssertionError instead of: %s' % (path, lineno, colno, matches[0][colno - 1:]))
+        colno = line.index(b'assert') + 1

@abadger

abadger Jul 10, 2019

Member

The b_ convention was missed for line.

@abadger

abadger Jul 10, 2019

Member

I'm conflicted as to whether all of the matches-style variables should also have a b_ prefix... I kinda lean towards yes. They aren't byte strings themselves but containers of byte strings. Still, the convention is designed to make it obvious when you're combining text and bytes, and that doesn't work for matches unless you've marked that it contains byte strings.
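A small sketch of what the b_ convention might look like if the code stays in bytes, with the prefix applied to the container as well. `B_ASSERT_RE` and the sample line are hypothetical. The real `ASSERT_RE` is not shown in this diff.

```python
import re

# Hypothetical bytes pattern, for illustration only.
B_ASSERT_RE = re.compile(br'^\s*assert[^a-z0-9_:]')

b_line = b'    assert value\n'
b_matches = B_ASSERT_RE.findall(b_line)  # a list of byte strings, so it gets b_ too

colno = b_line.index(b'assert') + 1  # an int, so no prefix is needed

# Convert to text only at the point where text is required (the message).
message = u'%d: raise AssertionError instead of: %s' % (
    colno, b_matches[0].decode('utf-8').strip())
```

With the prefix on `b_matches`, the explicit `.decode()` in the last statement is easy to spot as the only place where bytes become text.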

self._templar.environment.loader.searchpath = j2_searchpath = []
for p in searchpath:
    j2_searchpath.append(
        to_native(

@abadger

abadger Jul 10, 2019

Member

Whenever we double encode or decode, we should probably add a comment explaining why. This one is particularly tricky because it deals with jinja2's expectations and native strings instead of either bytes or text. So it probably deserves some explicit spelling out of the different cases we expect to encounter.

@@ -46,6 +47,7 @@ class LookupModule(LookupBase):
     def run(self, terms, variables, **kwargs):

         ret = []
+        basedir = to_bytes(self._loader.get_basedir(), errors='surrogate_or_strict')

@abadger

abadger Jul 10, 2019

Member

b_basedir ?

@@ -56,12 +58,12 @@ def run(self, terms, variables, **kwargs):
 https://github.com/ansible/ansible/issues/6550
 '''
-term = str(term)
+term = to_bytes(term, errors='surrogate_or_strict')

@abadger

abadger Jul 10, 2019

Member

b_term
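The renames suggested in these last two comments could be sketched as follows. `to_bytes_sketch()` is a hypothetical stand-in for Ansible's `to_bytes()` and leaves out the `errors='surrogate_or_strict'` handling, which is Ansible-specific. The paths are invented.

```python
import os


def to_bytes_sketch(obj, encoding='utf-8'):
    """Hypothetical stand-in for a to_bytes()-style helper: always return bytes."""
    if isinstance(obj, bytes):
        return obj
    return obj.encode(encoding)


b_basedir = to_bytes_sketch(u'/srv/playbooks')
b_term = to_bytes_sketch(u'files/*.yml')

# Both operands are visibly bytes, so the join cannot silently mix types.
b_pattern = os.path.join(b_basedir, b_term)
```

On Python 3, joining a bytes path with a text path raises `TypeError`, so the prefixes make that class of mistake visible at review time rather than at run time.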
