Skip to content

Fix documentation - remove extra periods between paragraphs#460

Merged
chmreid merged 4 commits intomasterfrom
chmreid-fix-docs-1
Nov 6, 2019
Merged

Fix documentation - remove extra periods between paragraphs#460
chmreid merged 4 commits intomasterfrom
chmreid-fix-docs-1

Conversation

@chmreid
Copy link
Copy Markdown
Contributor

@chmreid chmreid commented Oct 15, 2019

This will address one of the items brought up in issue #443

The documentation has a paragraph with stray commas everywhere, this fixes the problem

Copy link
Copy Markdown
Contributor

@jessebrennan jessebrennan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DailyDreaming added the periods which was an artifact of some changes that fixed the way help text was displayed (this can be confirmed by digging through the git history). Compare the help diffs with and without them

...
Files are always downloaded to a cache / filestore directory called '.hca'. This directory is created in the
current directory where download is initiated. A copy of the manifest used is also written to the current
directory. This manifest has an added column that lists the paths of the files within the '.hca' filestore.
.
The default layout is none. In this layout all of the files are downloaded to the filestore and the
recommended way of accessing the files in by parsing the manifest copy that's written to the download
directory.
.
The bundle layout still downloads all of files to the filestore. For each bundle mentioned in the
manifest a directory is created. All relevant metadata files for each bundle are linked into these
directories in addition to relevant data files mentioned in the manifest.
.
Each row in the manifest represents one file in DSS. The manifest must have a header row. The header row
must declare the following columns.
.
bundle_uuid - the UUID of the bundle containing the file in DSS.
bundle_version - the version of the bundle containing the file in DSS.
file_name - the name of the file as specified in the bundle.
file_uuid - the UUID of the file in the DSS.
file_sha256 - the SHA-256 hash of the file.
file_size - the size of the file.
.
The TSV may have additional columns. Those columns will be ignored. The ordering of the columns is
insignificant because the TSV is required to have a header row.
.
This download format will serve as the main storage format for downloaded files. If a user specifies a different
format for download (coming in the future) the files will first be downloaded in this format, then hard-linked
to the user's preferred format.
...

vs

...
Files are always downloaded to a cache / filestore directory called '.hca'. This directory is created in the
current directory where download is initiated. A copy of the manifest used is also written to the current
directory. This manifest has an added column that lists the paths of the files within the '.hca' filestore.
The default layout is none. In this layout all of the files are downloaded to the filestore and the
recommended way of accessing the files in by parsing the manifest copy that's written to the download
directory.
The bundle layout still downloads all of files to the filestore. For each bundle mentioned in the
manifest a directory is created. All relevant metadata files for each bundle are linked into these
directories in addition to relevant data files mentioned in the manifest.
Each row in the manifest represents one file in DSS. The manifest must have a header row. The header row
must declare the following columns:
bundle_uuid - the UUID of the bundle containing the file in DSS.

bundle_version - the version of the bundle containing the file in DSS.

file_name - the name of the file as specified in the bundle.

file_uuid - the UUID of the file in the DSS.

file_sha256 - the SHA-256 hash of the file.

file_size - the size of the file.
The TSV may have additional columns. Those columns will be ignored. The ordering of the columns is
insignificant because the TSV is required to have a header row.
This download format will serve as the main storage format for downloaded files. If a user specifies a different
format for download (coming in the future) the files will first be downloaded in this format, then hard-linked
to the user's preferred format.
...

I'm not really sure what the best resolution is. Either add this fix and reopen the ticket about the misformatted --help message, abandon this change, or find a fix for both.

@chmreid
Copy link
Copy Markdown
Contributor Author

chmreid commented Oct 15, 2019

@jessebrennan by "the way the help text is displayed" do you mean the help printed on the command line? (Is that what you pasted above?)

This PR addresses an issue with the sphinx documentation. IMHO the --help flag should be able to handle two consecutive line breaks without modification. There are also multiple docstrings that are formatted the same as the one I've changed (e.g., the function right above it), so is the --help flag breaking the output for those too?

@jessebrennan
Copy link
Copy Markdown
Contributor

@chmreid, sorry yes I should have been more clear. If you run

hca dss download-manifest --help

You will see the formatting of the output is messed up by your change.

This PR addresses an issue with the sphinx documentation. IMHO the --help flag should be able to handle two consecutive line breaks. There are also multiple docstrings that are formatted the same as the one I've changed (for example, the docstring of the function right above this one), so is the --help flag breaking the output for those too?

I don't understand what you are saying here.

@chmreid
Copy link
Copy Markdown
Contributor Author

chmreid commented Oct 16, 2019

To clarify what I was asking:

I'm looking in the same file that I modified in this PR, and looking at the function download() right above the function download_manifest() that I modified, and looking at the docstring of the download() function:

dcp-cli/hca/dss/__init__.py

Lines 202 to 233 in 7213d95

"""
Download a bundle and save it to the local filesystem as a directory.
:param str bundle_uuid: The uuid of the bundle to download
:param str replica: the replica to download from. The supported replicas are: `aws` for Amazon Web Services, and
`gcp` for Google Cloud Platform. [aws, gcp]
:param str version: The version to download, else if not specified, download the latest. The version is a
timestamp of bundle creation in RFC3339
:param str download_dir: The directory into which to download
:param iterable metadata_filter: One or more shell patterns against which all metadata files in the bundle will
be matched case-sensitively. A file is considered a metadata file if the
`indexed` property in the manifest is set. If and only if a metadata file
matches any of the patterns in `metadata_files` will it be downloaded.
:param iterable data_filter: One or more shell patterns against which all data files in the bundle will be
matched case-sensitively. A file is considered a data file if the `indexed` property
in the manifest is not set. The file will be downloaded only if a data file matches
any of the patterns in `data_files` will it be downloaded.
:param no_metadata: Exclude metadata files. Cannot be set when --metadata-filter is also set.
:param no_data: Exclude data files. Cannot be set when --data-filter is also set.
:param int num_retries: The initial quota of download failures to accept before exiting due to failures.
The number of retries increase and decrease as file chucks succeed and fail.
:param float min_delay_seconds: The minimum number of seconds to wait in between retries.
Download a bundle and save it to the local filesystem as a directory.
By default, all data and metadata files are downloaded. To disable the downloading of data files,
use `--data-files ''` if using the CLI (or `data_files=()` if invoking `download` programmatically).
Likewise for metadata files.
If a retryable exception occurs, we wait a bit and retry again. The delay increases each time we fail and
decreases each time we successfully read a block. We set a quota for the number of failures that goes up with
every successful block read and down with each failure.
"""

That docstring is formatted exactly the same way as my new proposed/changed docstring. So I was wondering askingif using the command line interface with the --help flag to display help for download() messes up the formatting the same way that the --help flag to display help for download_manifest() messes up the formatting.


To try and answer this question, I ran the hca dss download --help command and saw that in fact the --help flag is turning any arbitrary number of newline characters into a single newline character, so the docstring

        Download a bundle and save it to the local filesystem as a directory.
        By default, all data and metadata files are downloaded. To disable the downloading of data files,
        use `--data-files ''` if using the CLI (or `data_files=()` if invoking `download` programmatically).
        Likewise for metadata files.
        If a retryable exception occurs, we wait a bit and retry again.  The delay increases each time we fail and
        decreases each time we successfully read a block.  We set a quota for the number of failures that goes up with
        every successful block read and down with each failure.

is rendered (via hca dss download --help) as:

Download a bundle and save it to the local filesystem as a directory.
By default, all data and metadata files are downloaded. To disable the downloading of data files,
use --data-files '' if using the CLI (or data_files=() if invoking download programmatically).
Likewise for metadata files.
If a retryable exception occurs, we wait a bit and retry again.  The delay increases each time we fail and
decreases each time we successfully read a block.  We set a quota for the number of failures that goes up with
every successful block read and down with each failure.

Note the missing two newlines.


If I add a bullet list to the docstring, like so:

        Download a bundle and save it to the local filesystem as a directory.

        By default, all data and metadata files are downloaded. To disable the downloading of data files,
        use `--data-files ''` if using the CLI (or `data_files=()` if invoking `download` programmatically).
        Likewise for metadata files.

        - I will not talk during class
        - I will not talk during class
        - I will not talk during class
        - I will not talk during class
        - I will not talk during class

        If a retryable exception occurs, we wait a bit and retry again.  The delay increases each time we fail and
        decreases each time we successfully read a block.  We set a quota for the number of failures that goes up with
        every successful block read and down with each failure.

here is the corresponding output (via hca dss download --help):

Download a bundle and save it to the local filesystem as a directory.
By default, all data and metadata files are downloaded. To disable the downloading of data files,
use --data-files '' if using the CLI (or data_files=() if invoking download programmatically).
Likewise for metadata files.
I will not talk during class

I will not talk during class

I will not talk during class

I will not talk during class

I will not talk during class
If a retryable exception occurs, we wait a bit and retry again.  The delay increases each time we fail and
decreases each time we successfully read a block.  We set a quota for the number of failures that goes up with
every successful block read and down with each failure.

So, the docstrings are being formatted using a kind of quasi-ReST (restructured text) so that Sphinx can interpret it and turn it into nice fancy HTML. But the command line --help option does not do that, and mangles it in some way that does not preserve formatting.

Will keep looking into this to see if there is a solution that works for both. Ultimately whatever the --help flag does to mangle docstrings before displaying them should be fixed, but it's uncertain how easy/difficult that will be.

@chmreid
Copy link
Copy Markdown
Contributor Author

chmreid commented Oct 16, 2019

passing a formatter_class argument to Argparse (when telling it to display func.__doc__ for the help string) allows defining custom behavior for how Argparse mangles the docstring. See this stack overflow answer and related info in argparse documentation.

I will add that change in a separate pull request, and then we can merge this PR if/when the formatter class is changed so that both sphinx and the CLI --help are formatted correctly.

Copy link
Copy Markdown
Contributor

@jessebrennan jessebrennan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I don't think the --help stuff has to be blocking this PR. I think it's totally reasonable to merge this, and make a new ticket to address the --help output formatting.

Also for reference, here's the original PR that created those changes #395

edit: although I wouldn't consider my review sufficient approval to merge; you may want a PL to take a look.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Oct 16, 2019

Codecov Report

Merging #460 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #460      +/-   ##
==========================================
+ Coverage   85.35%   85.42%   +0.07%     
==========================================
  Files          41       41              
  Lines        1864     1873       +9     
==========================================
+ Hits         1591     1600       +9     
  Misses        273      273
Impacted Files Coverage Δ
hca/dss/__init__.py 90.13% <ø> (ø) ⬆️
hca/util/_docs.py 97.61% <100%> (+0.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3789edf...f228429. Read the comment docs.

@chmreid
Copy link
Copy Markdown
Contributor Author

chmreid commented Oct 17, 2019

Looking into this more, in util/_docs.py there is a utility _parse_docstring(), which uses Sphinx's ReST parser to parse the docstrings, and that function is used to create help for the command line utilities in hca/util/__init__.py.

This _parse_docstring() function is where the ReST bullet list is converted to plain text (and mangled in the process). From this debugger session you can see that the RSTParser object identifies this portion of the docstring as a bullet list:

(Pdb) where
  /Users/charles/codes/dcp-cli/vp/bin/hca(11)<module>()
-> load_entry_point('hca==6.4.0', 'console_scripts', 'hca')()
  /Users/charles/codes/dcp-cli/vp/lib/python3.6/site-packages/hca-6.4.0-py3.6.egg/hca/cli.py(108)main()
-> parser = get_parser(help_menu=True)
  /Users/charles/codes/dcp-cli/vp/lib/python3.6/site-packages/hca-6.4.0-py3.6.egg/hca/cli.py(97)get_parser()
-> dss_cli.add_commands(parser._subparsers, help_menu=help_menu)
  /Users/charles/codes/dcp-cli/vp/lib/python3.6/site-packages/hca-6.4.0-py3.6.egg/hca/dss/cli.py(12)add_commands()
-> dss_cli_client.build_argparse_subparsers(dss_subparsers, help_menu=help_menu)
  /Users/charles/codes/dcp-cli/vp/lib/python3.6/site-packages/hca-6.4.0-py3.6.egg/hca/util/__init__.py(612)build_argparse_subparsers()
-> method_args = _parse_docstring(docstring)
> /Users/charles/codes/dcp-cli/vp/lib/python3.6/site-packages/hca-6.4.0-py3.6.egg/hca/util/_docs.py(92)_parse_docstring()
-> if node.tagname is 'paragraph' and method_args['summary'] == '':

(Pdb) p node
<bullet_list: <list_item...><list_item...><list_item...><list_item...> ...>

But the _parse_docstring() function does not treat bullet lists differently, so it ends up converting each bullet as separate paragraphs, with no leading/trailing newline, as mentioned above:

(Pdb) p node.astext()
'bundle_uuid - the UUID of the bundle containing the file in DSS.\n\nbundle_version - the version of the bundle containing the file in DSS.\n\nfile_name - the name of the file as specified in the bundle.\n\nfile_uuid - the UUID of the file in the DSS.\n\nfile_sha256 - the SHA-256 hash of the file.\n\nfile_size - the size of the file.'

To resolve this, I wrote a function called render_bullet_list() that turns a bullet list into text and added it to this PR.

@chmreid chmreid force-pushed the chmreid-fix-docs-1 branch 2 times, most recently from d2e1360 to 7c926f5 Compare October 17, 2019 21:48
@chmreid chmreid self-assigned this Oct 30, 2019
@chmreid chmreid force-pushed the chmreid-fix-docs-1 branch from 7c926f5 to 76391fd Compare November 1, 2019 19:00
@chmreid
Copy link
Copy Markdown
Contributor Author

chmreid commented Nov 1, 2019

Rebasing onto master

@chmreid chmreid mentioned this pull request Nov 1, 2019
13 tasks
Copy link
Copy Markdown
Contributor

@DailyDreaming DailyDreaming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@chmreid chmreid merged commit 8008c51 into master Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants