ENH: Add formatters for numpydoc section ordering and name/type spacing #132

DWesl · 2022-07-21T18:14:04Z

Related to #125; I implemented the parts I could figure out how to automate for the docstring style I actually use (numpydoc).

Should probably extend the tests for more types of docstrings and more sections within each docstring.

Ideally the section test would have docstrings on

and would test the ordering of some subset of these sections:

Formatters for:

name-colon parameter spacing (x : float not x:float or x: float)
Section ordering
Section spacing (blank line between sections)
Length of line of hyphens after section header (should be same length as section header)
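As a rough illustration of the first item, name-colon spacing can be normalised with a regular expression. This is only a sketch, not the code in this PR: the pattern and the function name `normalize_name_colon_type` are hypothetical, and it assumes it is only applied to lines inside a parameter-like section (otherwise any `word: text` line would be rewritten too).

```python
import re

# Hypothetical pattern: optional indent, a single name (optionally *args or
# **kwargs style), a colon with any spacing around it, then the type.
_NAME_COLON_TYPE = re.compile(r"^(\s*)(\*{0,2}\w+)\s*:\s*(\S.*)$")


def normalize_name_colon_type(line: str) -> str:
    """Rewrite ``x:float`` or ``x: float`` as ``x : float``."""
    match = _NAME_COLON_TYPE.match(line)
    if match is None:
        # Not a "name : type" line; leave it untouched.
        return line
    indent, name, type_ = match.groups()
    return f"{indent}{name} : {type_}"
```

Lines that already use `x : float` come back unchanged, so running it twice gives the same result as running it once.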

codecov · 2022-07-21T18:42:11Z

Codecov Report

Merging #132 (03833db) into main (edd53a0) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #132   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           20        21    +1     
  Lines          490       576   +86     
=========================================
+ Hits           490       576   +86

Impacted Files	Coverage Δ
...tringformatter/_configuration/arguments_manager.py	`100.00% <ø> (ø)`
pydocstringformatter/_formatting/__init__.py	`100.00% <100.00%> (ø)`
pydocstringformatter/_formatting/base.py	`100.00% <100.00%> (ø)`
...stringformatter/_formatting/formatters_numpydoc.py	`100.00% <100.00%> (ø)`

Pierre-Sassoulas

Looks pretty good. I was wondering if we should detect that a docstring is in numpy style, but it can be handled in configuration. (Do not add the formatter if you don't have numpy-style docstrings.)

pydocstringformatter/_formatting/base.py

DanielNoord

I don't have the time for a full review right now but this is highly appreciated!

That said, I would like to handle numpy docstrings behind a --style= flag. I think for future maintainability that will be much easier.
Since in reality projects can use different styles at the same time, --style should probably be an append type with the default being ["pep257"]. For the tests in this PR we would then use --style=numpy.

I should have probably documented this somewhere in the issue. Sorry about that!

We could also first create the PR that adds --style to keep the size of PRs limited. I might be able to do this myself but that might take 2/3 weeks.

@DWesl I have quickly hacked together a PR to add the --style flag as I had some spare time. See #138. After that it should be trivial to add "numpy" to the available choices for the option.

DWesl · 2022-07-28T19:33:07Z

To add numpy to the options, yes, but actually having that option do something will take a bit more work. I think I could arrange for "numpydoc" in run.config.style to imply both options I have here without terribly much trouble, but it sounded like you had a different idea for how the code should work.

DanielNoord · 2022-07-30T11:12:12Z

Yeah I haven't thought this through completely but I was thinking of perhaps adding a style attribute to Formatter which is "pep257" by default but can also be "numpy" and then add formatters to the list of formatters to run based on the config.style attribute.

Does that sound like it would work?

DWesl · 2022-07-30T12:09:54Z

The straightforward way to do that involves having the default for the formatter options depend on the --style value, which I don't know how to do with argparse

DWesl · 2022-07-30T15:06:30Z

It turns out BooleanOptionalAction doesn't type-check the default, so I can use None as the default for checkers associated with a particular style, then loop through the formatter options after parsing the options and set any values that are still None (i.e., they were not set on the command line) based on the values in self.namespace.style.
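A minimal sketch of that approach, assuming Python 3.9+ (where `argparse.BooleanOptionalAction` exists). The option name `--numpydoc-section-order`, the style name `numpydoc`, and the helper `parse` are made up for illustration and are not necessarily what the project uses.

```python
import argparse


def parse(argv):
    """Parse options, then fill style-dependent defaults that are still None."""
    parser = argparse.ArgumentParser()
    # default=None so that "not given" can be told apart from an explicit list.
    parser.add_argument(
        "--style", action="append", choices=["pep257", "numpydoc"], default=None
    )
    # None means "not set on the command line"; True/False come from
    # --numpydoc-section-order / --no-numpydoc-section-order.
    parser.add_argument(
        "--numpydoc-section-order",
        action=argparse.BooleanOptionalAction,
        default=None,
    )
    namespace = parser.parse_args(argv)
    styles = namespace.style or ["pep257"]
    if namespace.numpydoc_section_order is None:
        # Hypothetical rule: the formatter is on exactly when its style is selected.
        namespace.numpydoc_section_order = "numpydoc" in styles
    return namespace
```

An explicit `--no-numpydoc-section-order` still wins over `--style=numpydoc`, because only values left at `None` are filled in afterwards.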

DanielNoord · 2022-08-08T18:44:33Z

@DWesl Just want to say this is on my radar. I'm on holiday currently but I expect to get to this PR somewhere this week! 😄

DanielNoord

I did a first review of the code associated with the adding of the new style. The formatters themselves look relatively straightforward and I didn't see anything obvious just now.

I'm really happy you worked on this as this seems like a very nice addition. Let me know if there is anything I can do to help, I'd like to merge and release this ASAP 😄

docs/usage.rst

pydocstringformatter/_configuration/arguments_manager.py

pydocstringformatter/_configuration/formatter_options.py

pydocstringformatter/_formatting/base.py

DanielNoord

I have removed all the stuff that was related to how formatters are being run and merged it in #145. That way we can focus on the numpy stuff here. Sorry for the push to your branch, but that seemed the most effective way forward.

Let me know what you think!

pydocstringformatter/_formatting/base.py

pydocstringformatter/_formatting/formatters_numpydoc.py

tests/test_config.py

pydocstringformatter/_configuration/arguments_manager.py

pydocstringformatter/_formatting/formatters_numpydoc.py

tests/test_config.py

pydocstringformatter/_formatting/formatters_numpydoc.py

DanielNoord

This PR is becoming quite large. If you think of any other formatters please add them in another PR.

I think these might be my last comments 😄

DanielNoord · 2022-08-11T07:20:37Z

docs/usage.rst

                                [files ...]

    positional arguments:
      files                 The directory or files to format.

-    options:
+    optional arguments:


Which interpreter version are you using locally?

Cygwin CPython 3.9; should I be using 3.8 or 3.7 for this?

No I think it might actually be 3.10 that is giving different results here... 😓

Anyway, let's fix this at the end of this PR.

Was hoping it might be because I did pip install -r requirements-test.txt instead of pip install -U -r requirements-test.txt, but adding the -U didn't seem to change anything

DanielNoord · 2022-08-11T07:23:52Z

pydocstringformatter/_formatting/base.py

+        # Rejoin sections
+        new_lines = [line for section in new_sections.values() for line in section]
+        # Ensure the last line puts the quotes in the right spot
+        # Enforces indented closing quotes on the last line


This probably shouldn't be handled by this formatter. Can we store whether the closing quotes were already on a new line and do this according to that?

We like to keep each formatter responsible for a single thing to provide optimal customisation for users.

So, new test for that. Would it help if I made all the test input files symlinks of numpydoc_style.py, with only the changes for that formatter reflected in the corresponding .py.out file?

That might work well in this case yeah!

Whole bunch of symlinks created and .args files expanded. I hope git knows how to set these up on other computers; I've had problems before.

DanielNoord · 2022-08-11T07:25:37Z

pydocstringformatter/_formatting/formatters_numpydoc.py

+    ) -> OrderedDict[str, list[str]]:
+        """Sort the numpydoc sections into the numpydoc order."""
+        new_sections = OrderedDict([sections.popitem(last=False)])
+        new_sections.update(


Can't we remove L36 and just return a sorted OrderedDict?

I don't see an OrderedDict.sort method here, and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.

To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.

> I don't see an OrderedDict.sort method here, and the implementation here looks like the ordering is kept by a linked list rather than a list I can sort.

👍

> To merge line 36 with 37-44, I would need a consistent name for the summary/deprecation warning/extended summary section, which currently uses the summary as the name. I can look into that, as it would simplify other things.

Yeah, Summary seems fine to me. We could add a comment somewhere that it also includes the other two.

Initial section now named "Summary" unless the initial section is zero lines, in which case the initial section shares a name with the next section and gets disappeared by the dict constructor.
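The "disappearing" behaviour is plain dict semantics: when the same key appears twice in the constructor input, the later value replaces the earlier one while the key keeps its original position. A small demonstration, assuming (hypothetically) that the splitter yields `(name, lines)` pairs:

```python
from collections import OrderedDict

# Hypothetical splitter output for a docstring that starts directly with a
# section header: an empty initial section that shares its name with the
# real first section.
pairs = [
    ("Parameters", []),
    ("Parameters", ["Parameters", "----------", "x : float"]),
    ("Returns", ["Returns", "-------", "float"]),
]

sections = OrderedDict(pairs)
section_names = list(sections)
```

Only one `"Parameters"` entry survives, holding the non-empty lines.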

pydocstringformatter/_formatting/formatters_numpydoc.py

DanielNoord · 2022-08-11T07:28:15Z

pydocstringformatter/_formatting/formatters_numpydoc.py

+        new_sections = OrderedDict([])
+        first_section = True
+        for section_name, section_lines in sections.items():
+            if first_section:


But what if the first section isn't a summary? Could you add a test for this?

If it starts with

    """Parameters
    ----------

then there is no summary section; I think the current code would change this to

    """ Parameters
    ------------

To fix this further, I would need a consistent name for the summary/deprecation warning/extended summary section.

See above.

Although I think this change is good, it should probably also be its own formatter. But let's add that in a follow up PR.

I think I made it so that starting with a section header on the first line will work. Should I change a test to make sure?

Yes, I think that would be good!

"Works" now, although interaction with the section ordering formatter can produce strange results:

    """ Parameters
    ----------
    ...
    Returns
    -------
    ...

I think there's a formatter to strip the leading space before "Parameters" (may need two runs to finish); not sure about indenting Returns properly.

DWesl · 2022-08-11T10:23:16Z

pydocstringformatter/_formatting/formatters_numpydoc.py

+        new_sections = OrderedDict([sections.popitem(last=False)])
+        new_sections.update(
+            OrderedDict(
+                [
+                    (sec_name, sections[sec_name])
+                    for sec_name in sorted(
+                        sections.keys(),
+                        key=self.numpydoc_section_order.index,
+                    )
+                ]
+            )
+        )
+        return new_sections


Another option, using OrderedDict features and avoiding the KeyError on weird sections:

Suggested change

-        new_sections = OrderedDict([sections.popitem(last=False)])
-        new_sections.update(
-            OrderedDict(
-                [
-                    (sec_name, sections[sec_name])
-                    for sec_name in sorted(
-                        sections.keys(),
-                        key=self.numpydoc_section_order.index,
-                    )
-                ]
-            )
-        )
-        return new_sections
+        new_sections = sections.copy()
+        for sec_name in self.numpydoc_section_order:
+            try:
+                new_sections.move_to_end(sec_name)
+            except KeyError:
+                pass
+        # Snapshot the keys first: a keys view cannot be sliced, and the
+        # dict is reordered inside the loop.
+        for sec_name in list(new_sections)[1:]:
+            if sec_name not in self.numpydoc_section_order:
+                new_sections.move_to_end(sec_name)
+        return new_sections

Too bad there isn't a move_to_front. But this seems to work!

There's move_to_end(key, last=False), which should do the same thing. My main reason for ordering it this way is now moot, so I should be able to iterate through reversed(self.numpydoc_section_order) without a problem.
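A sketch of that variant, iterating over the reversed order and moving each known section to the front. The section list here is an abbreviated, illustrative subset of the numpydoc order, and `sort_sections` is a hypothetical free function rather than the method in this PR.

```python
from collections import OrderedDict

# Abbreviated, illustrative subset of the numpydoc section order.
NUMPYDOC_SECTION_ORDER = (
    "Summary",
    "Parameters",
    "Returns",
    "Raises",
    "See Also",
    "Notes",
    "Examples",
)


def sort_sections(sections):
    """Return a copy with known sections first, in numpydoc order."""
    new_sections = sections.copy()
    # Moving names to the front in reverse order leaves them in forward order.
    for sec_name in reversed(NUMPYDOC_SECTION_ORDER):
        if sec_name in new_sections:
            new_sections.move_to_end(sec_name, last=False)
    return new_sections
```

Sections whose names are not in the list are never moved, so they end up after the known ones and keep their relative order, with no `KeyError` to catch.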

DanielNoord

Lots of different things going on in this PR, but I feel like we're getting there 😄

DanielNoord · 2022-08-11T20:24:18Z

pydocstringformatter/_formatting/base.py

+
+        # Everything before the first section header is in a single
+        # summary/deprecation warning/extended summary section.  This
+        # ends up called "Summary".


Suggested change

-        # ends up called "Summary".
+        # ends up being called "Summary".

DanielNoord · 2022-08-11T20:31:38Z

pydocstringformatter/_formatting/base.py

+        )
+        if section_hyphen_lines and section_hyphen_lines[0] == 1:
+            # No summary/deprecation warning/extended summary section
+            section_starts.pop(0)


Can section_starts be a set to avoid this?

I need that list sorted and I'm not sure that sets are ordered the way dicts have been since 3.7 or so. This is the last time that variable is used, and dictionary semantics means most other problems are already dealt with; would you prefer I negated the condition and only included the other branch?

Yeah, let's only do the other branch! I didn't see it wasn't used anymore.

pydocstringformatter/_formatting/formatters_numpydoc.py

DanielNoord · 2022-08-11T20:36:36Z

pydocstringformatter/_formatting/formatters_numpydoc.py

+    ) -> OrderedDict[str, list[str]]:
+        """Ensure proper spacing between sections."""
+        new_sections = OrderedDict([])
+        for section_name, section_lines in sections.items():


Shall we replace in-place here as well?

pydocstringformatter/_formatting/formatters_numpydoc.py

DanielNoord · 2022-08-11T20:37:30Z

tests/data/format/numpydoc/numpydoc_header_line.py.out

+
+def sincos(theta):
+    """Returns
+    -------


This doesn't seem correct? Shouldn't it be a little longer?

It matches the length of the word "Returns", and is shorter in the file by three characters, matching the three double quotes starting that line.

It might be clearer as

def sincos(theta):
    """\
    Returns
    -------
    ...
    """

(same number of hyphens)
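For reference, the rule described here (the hyphen line is exactly as long as the header text, not counting the opening quotes) can be sketched as follows. This is an illustration under the assumption that the quotes have already been stripped from the docstring body; `fix_underline_lengths` is a hypothetical name.

```python
def fix_underline_lengths(lines):
    """Make each all-hyphen line as long as the header text above it."""
    fixed = list(lines)
    for index in range(1, len(fixed)):
        underline = fixed[index].strip()
        header = fixed[index - 1].strip()
        # Only touch non-empty lines made purely of hyphens that follow a header.
        if header and underline and set(underline) == {"-"}:
            indent_width = len(fixed[index]) - len(fixed[index].lstrip())
            fixed[index] = fixed[index][:indent_width] + "-" * len(header)
    return fixed
```

The underline keeps its own indentation; only the number of hyphens changes.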

I mean, there isn't really a style guide for this I think, but imo it makes sense to add more - here so that the second line covers both the quotes and Returns.

DanielNoord · 2022-08-11T20:38:54Z

tests/data/format/numpydoc/numpydoc_section_ordering.py.out

+
+
+def sincos(theta):
+    """    Parameters


This should probably be fixed.

One option is a variant of textwrap.dedent that ignores the indentation of the first line when determining the common whitespace, followed by a similar variant of textwrap.indent at the end; that might work.

I would need to test how both handle

    """\
    A summary line, indented the same as the others for all tools.
    Section Header
    --------------
    ...
    """

Another option might be to replace the f"{quotes:s}{body:s}{quotes:s}" reconstructing the new docstring with f"{quotes:s}\\\n{body:s}{quotes:s}" if the first character of the body is whitespace, but that still leaves the question of what to do with that "Returns" at the left margin.

Can't we just check which line we are inserting on and if it's 0 we remove any indent and if higher we add an indent if it is missing?

That's another option.

The numpydoc style guide doesn't specify whether it prefers the first line of the docstring on the same line as the quotes or on a different line, but has examples of both. The numpydoc validation tool suggests the

    """\
    Summary.
    Extended summary...
    """

form is preferred to the

    """Summary.
    Extended summary...
    """

form, which doesn't seem to agree with the style guide. It also insists on extended summary and "See Also" sections, which I don't think are always necessary.

Supporting the first form is relatively straightforward; looping through the first lines of each section and doing line.lstrip() on the first section or textwrap.indent(line) for subsequent sections is also straightforward.
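The per-section loop could look roughly like this. It is a sketch under assumptions: sections are lists of lines, `indent_length` is the docstring's indentation in spaces, and the function name `fix_first_line_indent` is hypothetical.

```python
def fix_first_line_indent(sections, indent_length):
    """Strip the indent of the first section's first line; indent the others."""
    first_section = True
    for section in sections:
        if not section:
            # Nothing to adjust in an empty section.
            continue
        if first_section:
            # The first line sits right after the opening quotes.
            section[0] = section[0].lstrip()
            first_section = False
        elif section[0] and not section[0][0].isspace():
            # Later section headers get the docstring's indentation if missing.
            section[0] = " " * indent_length + section[0]
    return sections
```

Only the first line of each section is touched; the remaining lines are assumed to be indented correctly already.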

DanielNoord · 2022-08-11T20:39:22Z

tests/data/format/numpydoc/numpydoc_section_ordering.py.out

+    theta: float
+        the angle at which to calculate the sine and cosine.
+
+Returns


This as well.

DanielNoord · 2022-08-16T18:27:37Z

pydocstringformatter/_formatting/base.py

+        # Everything before the first section header is in a single
+        # summary/deprecation warning/extended summary section.  This
+        # ends up being called "Summary".


Suggested change

-        # Everything before the first section header is in a single
-        # summary/deprecation warning/extended summary section.  This
-        # ends up being called "Summary".

Now that I look at it again, this no longer makes sense here.

DanielNoord · 2022-08-16T18:29:09Z

pydocstringformatter/_formatting/base.py

+                section[0] = section[0].lstrip()
+            elif not section[0][0].isspace():
+                section[0] = f"{' ' * indent_length:s}{section[0]:s}"
+            first_section = False


Suggested change

-                section[0] = section[0].lstrip()
-            elif not section[0][0].isspace():
-                section[0] = f"{' ' * indent_length:s}{section[0]:s}"
-            first_section = False
+                section[0] = section[0].lstrip()
+                first_section = False
+            elif not section[0][0].isspace():
+                section[0] = f"{' ' * indent_length:s}{section[0]:s}"

Saves some reassignments 😄

DanielNoord · 2022-08-16T18:31:35Z

tests/data/format/numpydoc/numpydoc_header_line.py.out

+
+def sincos(theta):
+    """Returns
+    -------


I mean, there isn't really a style guide for this I think, but imo it makes sense to add more - here so that the second line covers both the quotes and Returns.

Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>
Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

This commit adds ``NumpydocNameColonTypeFormatter``, ``NumpydocSectionHyphenLengthFormatter``, ``NumpydocSectionOrderingFormatter`` and ``NumpydocSectionSpacingFormatter``.

Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>

github-actions · 2022-08-21T11:11:08Z

According to the primer, this change has no effect on the checked open source code. 🤖🎉

DanielNoord · 2022-08-21T11:12:16Z

@DWesl Thanks for all the work you put into this PR. I'm going to make a release immediately after this has been merged 😄

Pierre-Sassoulas added the enhancement New feature or request label Jul 21, 2022

Pierre-Sassoulas added this to the 0.7.0 milestone Jul 21, 2022

Pierre-Sassoulas reviewed Jul 21, 2022

DanielNoord self-requested a review July 21, 2022 19:13

DanielNoord reviewed Jul 21, 2022

DanielNoord self-requested a review August 8, 2022 18:44

DanielNoord reviewed Aug 9, 2022

DanielNoord requested changes Aug 10, 2022

DanielNoord requested changes Aug 11, 2022

DWesl commented Aug 11, 2022

DanielNoord requested changes Aug 11, 2022

DanielNoord requested changes Aug 16, 2022

Add numpydoc style and NumpydocSectionFormatter

9a59dff

Co-authored-by: Daniël van Noord <13665637+DanielNoord@users.noreply.github.com>
Co-authored-by: Pierre Sassoulas <pierre.sassoulas@gmail.com>

DanielNoord force-pushed the numpydoc-simple-reformatting branch from ad56aeb to 03833db Compare August 21, 2022 11:10

DanielNoord approved these changes Aug 21, 2022

DanielNoord enabled auto-merge (rebase) August 21, 2022 11:11

DanielNoord merged commit 9e60bb9 into DanielNoord:main Aug 21, 2022

DanielNoord mentioned this pull request Aug 21, 2022

Feature Request: Support for Numpy, Google, and rst Docstrings #125

Open
