
gh-69619: Add whitespace term to glossary and reference in stdtypes.rst #132568


Open: wants to merge 5 commits into main


@StanFromIreland (Contributor) commented Apr 15, 2025

Continues: #14753

The remaining files will be split into smaller PRs; this one covers only stdtypes.rst.


📚 Documentation preview 📚: https://cpython-previews--132568.org.readthedocs.build/

StanFromIreland and others added 2 commits April 16, 2025 09:04
Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
@ZeroIntensity (Member) left a comment

Thanks, LGTM.

@@ -2092,8 +2092,9 @@ expression support in the :mod:`re` module).

 Return a copy of the string with leading characters removed. The *chars*
 argument is a string specifying the set of characters to be removed. If omitted
-or ``None``, the *chars* argument defaults to removing whitespace. The *chars*
-argument is not a prefix; rather, all combinations of its values are stripped::
+or ``None``, the *chars* argument defaults to removing :term:`whitespace`.
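A minimal sketch of the str.lstrip() behavior described in the hunk above: with no argument, leading whitespace is removed using the Unicode definition (so U+00A0 and U+3000 count), and a *chars* argument is a set of characters, not a prefix. The "Arthur: three!" sample mirrors the existing stdtypes example; str.removeprefix() (Python 3.9+) is shown only for contrast.

```python
# With no argument, lstrip() removes leading whitespace. str methods use the
# Unicode definition, so U+00A0 (NO-BREAK SPACE) and U+3000 (IDEOGRAPHIC SPACE)
# are stripped along with the ASCII tab and spaces.
padded = "\u00a0\u3000\t  hello "
stripped = padded.lstrip()
print(repr(stripped))  # 'hello '

# chars is a set of characters, not a prefix: every leading character that
# appears anywhere in "Arthur: " is removed, so the "thr" of "three" goes too.
line = "Arthur: three!"
as_set = line.lstrip("Arthur: ")
print(as_set)  # ee!

# To remove an exact prefix instead, use str.removeprefix().
as_prefix = line.removeprefix("Arthur: ")
print(as_prefix)  # three!
```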
Member

Member

I'll yield, but I think a glossary term is the right way to go here. The isspace() methods should document their own quirks, not necessarily provide the definition of whitespace.

@@ -3243,8 +3245,8 @@ produce new objects.
 *chars* argument is a binary sequence specifying the set of byte values to
 be removed - the name refers to the fact this method is usually used with
 ASCII characters. If omitted or ``None``, the *chars* argument defaults
-to removing ASCII whitespace. The *chars* argument is not a prefix;
-rather, all combinations of its values are stripped::
+to removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is
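A small sketch contrasting the bytes wording above with str: bytes.lstrip() strips only the six ASCII whitespace bytes by default, so a Latin-1 no-break space (byte 0xA0) survives until the data is decoded to str. The sample values are arbitrary illustrations.

```python
# bytes.lstrip() with no argument removes only ASCII whitespace:
# b' \t\n\r\x0b\x0c' (space, tab, newline, carriage return, vertical tab, form feed).
data = b"\t\r\n\x0b\x0c hello"
print(data.lstrip())  # b'hello'

# Byte 0xA0 is a no-break space in Latin-1, but it is not ASCII whitespace,
# so bytes.lstrip() leaves it in place...
latin1 = b"\xa0hello"
print(latin1.lstrip())  # b'\xa0hello'

# ...whereas after decoding, U+00A0 is Unicode whitespace and str.lstrip() removes it.
print(latin1.decode("latin-1").lstrip())  # hello

# As with str, chars is a set of byte values rather than a prefix.
print(b"www.example.com".lstrip(b"cmowz."))  # b'example.com'
```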
Member

@@ -1443,6 +1443,32 @@ Glossary
 A computer defined entirely in software. Python's virtual machine
 executes the :term:`bytecode` emitted by the bytecode compiler.

+whitespace
Member

If we decide to keep this glossary entry (see other comments), it should mention Unicode first, and reduce the table to an in-line description (see the entry for bytes.isspace()) to take up less space.

Member

Oh, I suggested the table. I didn't realize there was precedent for the inline format.

I find the table significantly easier to read, though.

Member

The glossary page is very long; we should avoid making it longer. Perhaps split up the characters, though, e.g. " (space), \t (horizontal tab), ...".
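For reference, a sketch of what the inline listing covers, checked against the interpreter: bytes.isspace() accepts exactly six byte values, the same characters held by string.whitespace. The name ASCII_WHITESPACE is purely illustrative, not a stdlib constant.

```python
import string

# The six ASCII whitespace characters, listed inline:
# " " (space), \t (horizontal tab), \n (line feed), \r (carriage return),
# \x0b (vertical tab, \v) and \x0c (form feed, \f).
ASCII_WHITESPACE = b" \t\n\r\x0b\x0c"  # illustrative name, not a stdlib constant

# Collect every byte value (0-255) that bytes.isspace() accepts.
accepted = {b for b in range(256) if bytes([b]).isspace()}
print(accepted == set(ASCII_WHITESPACE))  # True

# string.whitespace holds the same six characters, as a str.
print(string.whitespace.encode("ascii") == ASCII_WHITESPACE)  # True
```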

Member

Why is it bad for the glossary to be long? I don't think people read it in order; they just click on terms elsewhere and get redirected. I would think users prefer more information on individual terms over a shorter glossary page.

@AA-Turner (Member) commented Apr 16, 2025

It's not bad for it to be long, but it shouldn't be longer than it needs to be. A full table isn't needed to describe six characters, and, as mentioned, it takes the focus away from Unicode whitespace, which is the default set of whitespace operated on unless you are using bytes/buffer functions or re.ASCII. The more common case (Unicode) should be the focus, and we should avoid giving readers the impression that whitespace is limited to the ASCII set.
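A short sketch of the point above that Unicode whitespace is the default and ASCII-only behavior is the exception. The three sample characters are arbitrary picks from the Unicode space separators (category Zs); each is whitespace for str.isspace(), str.split() and a str-pattern \s, but not for \s under re.ASCII.

```python
import re

# Three non-ASCII characters that Unicode classifies as spaces.
samples = {
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
}

results = {}
for name, ch in samples.items():
    results[name] = (
        ch.isspace(),                                   # str methods: Unicode whitespace
        re.fullmatch(r"\s", ch) is not None,            # str patterns: \s is Unicode-aware
        re.fullmatch(r"\s", ch, re.ASCII) is not None,  # re.ASCII: only [ \t\n\r\f\v]
    )
    print(name, results[name])  # (True, True, False) for each sample

# str.split() with no separator uses the same Unicode definition.
words = "a\u3000b\u00a0c".split()
print(words)  # ['a', 'b', 'c']
```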

Member

Hmm, fair point. Maybe there's a better way to emphasize Unicode here? I'm really not a fan of the inline version based on bytes.isspace.

Labels: awaiting core review, docs (Documentation in the Doc dir), skip news
Projects: Status: Todo

3 participants