Skip to content

Conversation

@adobeDan
Copy link
Contributor

@adobeDan adobeDan commented May 5, 2017

With this fix, we are unicode-compliant (even in Python 2.7):

  • add a --config-file-encoding command-line parameter, so the config files (which contain the group mapping) can contain non-ascii characters.
  • add a string_encoding setting to the directory and ldap configurations, which controls the encoding of the input values.
  • Update the sample config files to show the new settings.
  • require v2.4.1 of umapi-client, to ensure that it's unicode-capable and matches its validations to the server-side validations.
  • Fix Allow secure credentials to be retrieved from environment, not disk #159 again by catching some edge cases around empty credential files.
  • Fix excluded user counts are wrong if adobe-only-user-action is exclude #173 again by actually counting the number of primary org users read from Adobe.

With this fix, we are unicode-compliant (even in Python 2.7):
* add a `--config-file-encoding` command-line parameter, so the config files (which contain the group mapping) can contain non-ascii characters.
* add a `string_encoding` setting to the directory and ldap configurations, which controls the encoding of the input values.
* Update the sample config files to show the new settings.
* require v2.4.1 of umapi-client, to ensure that it's unicode-capable and matches its validations to the server-side validations.
* Fix #159 again by catching some edge cases around empty credential files.
* Fix #173 again by actually counting the number of primary org users read from Adobe.
@adobeDan adobeDan requested a review from phil-levy May 5, 2017 06:11
Copy link
Contributor

@phil-levy phil-levy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only question in my mind is if 'rb' mode is correct for reading. Does it need to be 'b' because the internal io code looks for line ending and would work improperly if encoding was utf-16 or something like that?

@adobeDan
Copy link
Contributor Author

adobeDan commented May 5, 2017

I use 'rb' mode because non-binary modes are not guaranteed to allow all byte sequences on all platforms; that is, 'rb' is the reliable way to tell the OS not to try to do line buffering or implicit encoding (e.g., some OSes won't allow null characters in non-binary streams). Since I am going to be doing the decoding explicitly, I need to make sure the OS is not standing in the way.

In fact the YAML parser doesn't care what kinds of line-endings are used; YAML is very explicit about all sequences of line-ending characters (including form feed as well as CR and LF) being interpreted as a single line end. So in fact doing a binary stream in this context is guaranteed safe because the OS not doing line-end detection can't affect the parse.

In my testing of this feature I am actually using files with both kinds of line end on both platforms. It works flawlessly, which is very reassuring :).

@adobeDan adobeDan merged commit 2211f2f into v2 May 5, 2017
@adobeDan adobeDan deleted the issue-167 branch May 5, 2017 19:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

excluded user counts are wrong if adobe-only-user-action is exclude Allow secure credentials to be retrieved from environment, not disk

3 participants