Fix #167: allow non-ascii unicode chars in user and group names #178

adobeDan · 2017-05-05T06:11:17Z

With this fix, we are unicode-compliant (even in Python 2.7):

add a --config-file-encoding command-line parameter, so the config files (which contain the group mapping) can contain non-ascii characters.
add a string_encoding setting to the directory and ldap configurations, which controls the encoding of the input values.
Update the sample config files to show the new settings.
require v2.4.1 of umapi-client, to ensure that it's unicode-capable and matches its validations to the server-side validations.
Fix Allow secure credentials to be retrieved from environment, not disk #159 again by catching some edge cases around empty credential files.
Fix excluded user counts are wrong if adobe-only-user-action is exclude #173 again by actually counting the number of primary org users read from Adobe.

With this fix, we are unicode-compliant (even in Python 2.7): * add a `--config-file-encoding` command-line parameter, so the config files (which contain the group mapping) can contain non-ascii characters. * add a `string_encoding` setting to the directory and ldap configurations, which controls the encoding of the input values. * Update the sample config files to show the new settings. * require v2.4.1 of umapi-client, to ensure that it's unicode-capable and matches its validations to the server-side validations. * Fix #159 again by catching some edge cases around empty credential files. * Fix #173 again by actually counting the number of primary org users read from Adobe.

phil-levy

Only question in my mind is if 'rb' mode is correct for reading. Does it need to be 'b' because the internal io code looks for line ending and would work improperly if encoding was utf-16 or something like that?

adobeDan · 2017-05-05T19:54:56Z

I use 'rb' mode because non-binary modes are not guaranteed to allow all byte sequences on all platforms; that is, 'rb' is the reliable way to tell the OS not to try to do line buffering or implicit encoding (e.g., some OSes won't allow null characters in non-binary streams). Since I am going to be doing the decoding explicitly, I need to make sure the OS is not standing in the way.

In fact the YAML parser doesn't care what kinds of line-endings are used; YAML is very explicit about all sequences of line-ending characters (including form feed as well as CR and LF) being interpreted as a single line end. So in fact doing a binary stream in this context is guaranteed safe because the OS not doing line-end detection can't affect the parse.

In my testing of this feature I am actually using files with both kinds of line end on both platforms. It works flawlessly, which is very reassuring :).

adobeDan requested a review from phil-levy May 5, 2017 06:11

Set default 'ascii' for --config-file-encoding command-line arg.

3b09719

phil-levy approved these changes May 5, 2017

View reviewed changes

adobeDan merged commit 2211f2f into v2 May 5, 2017

adobeDan deleted the issue-167 branch May 5, 2017 19:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix #167: allow non-ascii unicode chars in user and group names #178

Fix #167: allow non-ascii unicode chars in user and group names #178

Uh oh!

adobeDan commented May 5, 2017

Uh oh!

phil-levy left a comment

Uh oh!

adobeDan commented May 5, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix #167: allow non-ascii unicode chars in user and group names #178

Fix #167: allow non-ascii unicode chars in user and group names #178

Uh oh!

Conversation

adobeDan commented May 5, 2017

Uh oh!

phil-levy left a comment

Choose a reason for hiding this comment

Uh oh!

adobeDan commented May 5, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants