gh-69619: Add whitespace term to glossary and reference in stdtypes.rst
#132568
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Thanks, LGTM.
@@ -2092,8 +2092,9 @@ expression support in the :mod:`re` module).

 Return a copy of the string with leading characters removed. The *chars*
 argument is a string specifying the set of characters to be removed. If omitted
-or ``None``, the *chars* argument defaults to removing whitespace. The *chars*
-argument is not a prefix; rather, all combinations of its values are stripped::
+or ``None``, the *chars* argument defaults to removing :term:`whitespace`.
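
For readers of this hunk, a minimal sketch of the behaviour the paragraph describes may help: *chars* acts as a set of characters rather than a prefix, and the default strips Unicode whitespace, not only ASCII. The variable names are illustrative only and are not part of the PR.

```python
# Default: leading whitespace is removed.
default_strip = "   spacious   ".lstrip()

# chars is a set of characters, not a prefix: every leading character
# found in "cmowz." is removed until one outside the set is reached.
set_strip = "www.example.com".lstrip("cmowz.")

# For str, the default also covers non-ASCII whitespace such as
# EM SPACE (U+2003) and NO-BREAK SPACE (U+00A0).
unicode_strip = "\u2003\u00a0text".lstrip()

print(repr(default_strip))
print(repr(set_strip))
print(repr(unicode_strip))
```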
Better to link to https://docs.python.org/3/library/stdtypes.html#str.isspace?
I'll yield, but I think a glossary term is the right way to go here. The `isspace()` methods should document their own quirks, not necessarily provide the definition of whitespace.
@@ -3243,8 +3245,8 @@ produce new objects.
 *chars* argument is a binary sequence specifying the set of byte values to
 be removed - the name refers to the fact this method is usually used with
 ASCII characters. If omitted or ``None``, the *chars* argument defaults
-to removing ASCII whitespace. The *chars* argument is not a prefix;
-rather, all combinations of its values are stripped::
+to removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is
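
As a rough illustration of the bytes variant described in this hunk (again a sketch, with illustrative variable names that do not come from the PR): the default removes only the six ASCII whitespace bytes, and *chars* is still treated as a set of byte values.

```python
# Default: only ASCII whitespace bytes are removed.
default_strip = b"   spacious   ".lstrip()

# chars is a set of byte values, not a prefix.
set_strip = b"www.example.com".lstrip(b"cmowz.")

# The six ASCII whitespace bytes: space, tab, newline, carriage return,
# vertical tab, form feed.
ascii_ws = b" \t\n\r\x0b\x0c"
all_ascii_stripped = (ascii_ws + b"data").lstrip()

# 0xA0 is not ASCII whitespace, so the default leaves it in place.
non_ascii_kept = b"\xa0data".lstrip()

print(default_strip, set_strip, all_ascii_stripped, non_ascii_kept)
```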
Likewise, better to link to https://docs.python.org/3/library/stdtypes.html#bytes.isspace?
@@ -1443,6 +1443,32 @@ Glossary
 A computer defined entirely in software. Python's virtual machine
 executes the :term:`bytecode` emitted by the bytecode compiler.

+whitespace
If we decide to keep this glossary entry (see other comments), it should mention Unicode first, and reduce the table to an in-line description (see the entry for bytes.isspace()) to take up less space.
Oh, I suggested the table. I didn't realize there was precedent for the inline format.
I find the table significantly easier to read, though.
The glossary page is very long; we should avoid making it longer. Perhaps split up the characters though, e.g. "` ` (space), `\t` (horizontal tab), ...".
Why is it bad for the glossary to be long? I don't think people read it in order, they just click on terms elsewhere and get redirected. I would think that users prefer more information on individual terms rather than the overall glossary page being short.
It's not that a long glossary is bad, but that this would make it longer than it needs to be. A full table isn't needed to describe six characters, and, as mentioned, it takes the focus away from Unicode whitespace, which is the default set of whitespace operated on unless you are using bytes/buffer functions or `re.ASCII`. The more common case (Unicode) should be the focus, and we should avoid giving readers the expectation that whitespace is limited to the ASCII set.
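
To make the Unicode-versus-ASCII distinction in this comment concrete, here is a small sketch using EM SPACE (U+2003) as the sample character. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = "a\u2003b"  # "a", EM SPACE (U+2003), "b"

# str methods and the default re mode treat EM SPACE as whitespace.
str_is_space = "\u2003".isspace()
unicode_split = re.split(r"\s", text)

# With re.ASCII, \s matches only the ASCII whitespace characters.
ascii_split = re.split(r"\s", text, flags=re.ASCII)

# bytes methods only know ASCII whitespace, so the UTF-8 encoding of
# EM SPACE is not considered whitespace.
bytes_is_space = "\u2003".encode("utf-8").isspace()
ascii_bytes_is_space = b" \t\n\r\x0b\x0c".isspace()

print(str_is_space, unicode_split, ascii_split)
print(bytes_is_space, ascii_bytes_is_space)
```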
Hmm, fair point. Maybe there's a better way to emphasize Unicode here? I'm really not a fan of the inline version based on `bytes.isspace`.
Continues: #14753
The remainder of the files will be split into smaller PRs. Only stdtypes is included here.
📚 Documentation preview 📚: https://cpython-previews--132568.org.readthedocs.build/