Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nickname profile strips more whitespace characters than it should. #29

Closed
byllyfish opened this issue Oct 29, 2023 · 3 comments · Fixed by #32
Closed

Nickname profile strips more whitespace characters than it should. #29

byllyfish opened this issue Oct 29, 2023 · 3 comments · Fixed by #32

Comments

@byllyfish
Copy link
Owner

The Nickname profile includes an "additional mapping rule" that strips white space characters from both the beginning and end of the string. The current implementation uses the str.strip() method without a parameter. Python's default behavior defines a number of control characters as white space. I believe this is a bug; the code should be str.strip(' ') to just strip the standard ascii space.

The result of the Nickname enforce step is not affected by this bug. The code is simply over-permissive; it corrects the input more than it should. A diagnostic should tell the person to clean up their nickname string. A proper fix will include a diagnostic error message that reports "invalid_white_space". This will be a new error message.

Current behavior that is not technically correct:

Nickname profile:
  " a\n" -> "a"
  "\x1fb\x1f" -> "b"
  "\tc\r\n" -> "c"

Note that non-ascii Zs white space are converted to ASCII space 0x20 before any stripping takes place. The Precis spec only talks about stripping the Zs white space chars.

The following characters are stripped by Python:

  • U+09 (Cc)
  • U+0A (Cc)
  • U+0B (Cc)
  • U+0C (Cc)
  • U+0D (Cc)
  • U+1C (Cc)
  • U+1D (Cc)
  • U+1E (Cc)
  • U+1F (Cc)
  • U+85 (Cc)
  • U+2028 (Zl)
  • U+2029 (Zp)

Of these, I'm only worried about \n (U+0A). I might just continue to strip these, even though the spec doesn't say I have to. This is mostly for compatibility with existing software use.

@byllyfish
Copy link
Owner Author

I will continue to strip "\t", "\n", and "\r" when they appear at the beginning and end of a nickname. The other control characters will no longer be stripped and the enforce() operation will fail with "DISALLOWED/controls".

The Zl, Zp chars no longer be stripped, and the enforce() operation will fail with "DISALLOWED/other".

@SamWhited
Copy link

IIRC all three of those characters (\t, \n, and \r) should be disallowed by the underlying Freeform class, no? I don't think you need to strip them as an error should have already been returned before you even get to the nickname profile mapping rules.

@byllyfish
Copy link
Owner Author

You are correct that the characters \t, \r, and \n will result in a "DISALLOWED/controls" error if I don't strip them. They are caught when the FreeForm class is checked as the last step in enforce.

My concern was that it might be surprising if I stopped stripping them now in the Nickname profile. After sleeping on it, I realize it's better to emit the "DISALLOWED/controls" diagnostic as I had deviated from the spec which only mentions Zs class white space characters. A Nickname is just a display name; it's not a user/login name.

Basically, a nickname of "name\n" will now be DISALLOWED due to control characters. The previous version of the library automatically stripped the trailing "\n" and enforced the nickname to "name". (And it did this for all the leading/trailing Cc class chars that Python3 considers white space).

byllyfish added a commit that referenced this issue Oct 31, 2023
Don't strip \t, \r, \n either. (See #29)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants