Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NF: Intersphinx mapping to link to handbook content (fixes gh-4034) #4046

Merged
merged 1 commit into from Feb 6, 2020

Conversation

mih
Copy link
Member

@mih mih commented Jan 18, 2020

Demo link is included. All rendered references are version to match the
current "major" release.

Looks like this: --help output and help(clone) in Python

  handbook:3-001 (http://handbook.datalad.org/symbols)
    More information on Remote Indexed Archive (RIA) stores

The given URL points to a dedicated index section in the handbook, where
we can maintain any number of references that point to arbitrary
locations. The name 'symbols' is used, because this is how sphinx lists
numeric references. The ?-???? scheme aims to mimic manpage sections
(where we can come up with some categorization, if needed, and
incremental integer IDs).

  • 1: commands
  • 2: concepts
  • 3: use cases

Ping @adswa for awareness (I have pushed a corresponding change to the
handbook; directly, no PR).

@mih mih added documentation merge-if-ok labels Jan 18, 2020
@codecov
Copy link

codecov bot commented Jan 18, 2020

Codecov Report

Merging #4046 into master will increase coverage by 0.53%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4046      +/-   ##
==========================================
+ Coverage   89.12%   89.66%   +0.53%     
==========================================
  Files         274      274              
  Lines       36759    36764       +5     
==========================================
+ Hits        32761    32963     +202     
+ Misses       3998     3801     -197     
Impacted Files Coverage Δ
datalad/tests/test_utils.py 95.98% <0.00%> (+0.26%) ⬆️
datalad/support/tests/test_annexrepo.py 94.90% <0.00%> (+0.30%) ⬆️
datalad/config.py 97.14% <0.00%> (+0.31%) ⬆️
datalad/support/tests/test_network.py 100.00% <0.00%> (+0.36%) ⬆️
datalad/interface/tests/test_rerun.py 100.00% <0.00%> (+0.40%) ⬆️
datalad/support/network.py 86.40% <0.00%> (+0.46%) ⬆️
datalad/support/archives.py 89.47% <0.00%> (+0.52%) ⬆️
datalad/ui/dialog.py 88.75% <0.00%> (+0.59%) ⬆️
datalad/customremotes/tests/test_archives.py 89.54% <0.00%> (+0.65%) ⬆️
datalad/ui/progressbars.py 83.10% <0.00%> (+0.67%) ⬆️
... and 23 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 828cb93...e56769a. Read the comment docs.

@adswa
Copy link
Member

adswa commented Jan 18, 2020

It might be me or my setup, but when I build the docs locally on this commit, it only references the handbook correctly in the Python module reference of clone, but not in the commandline reference:

Python API works

image

cmdline reference doesn't

image

I'm not sure why, or what I might be doing wrong

@kyleam
Copy link
Collaborator

kyleam commented Jan 18, 2020

It might be me or my setup [...]

FWIW I was just in the process of writing the same thing.

@mih
Copy link
Member Author

mih commented Jan 18, 2020

That is kinda how it works. Only rendered refs look nice automatically. Like with any other ref, we can tweak the raw text version (which what is in docstring and manpage).

:ref:`better text <handook:usecase-scalable-datastore`

I have another implementation that uses a custom link role, but that has the same issue, because it can only show what is in the docstring. We could extend the docstring preprocessing to discover such refs and somehow make them better, but I had no good idea yet.

@adswa
Copy link
Member

adswa commented Jan 18, 2020

I think I found the culprits:

docs = re.sub(
r':\S+:`[^`]*`[\\]*',
lambda match: ':'.join(match.group(0).split(':')[2:]).strip('`\\'),
docs,
flags=re.MULTILINE | re.DOTALL)

# capitalize variables and remove backticks to uniformize with
# argparse output
docs = re.sub(
r'`\S*`',
lambda match: match.group(0).strip('`').upper(),
docs)

# Remove RST paragraph markup
docs = re.sub(
r'^.. \S+::',
lambda match: match.group(0)[3:-2].upper(),
docs,
flags=re.MULTILINE)

But without them, the commandline help contains rst markup:

datalad clone --help
Usage: datalad clone [-h] [-d DATASET] [-D DESCRIPTION] [--reckless [{auto}]]
                     SOURCE [PATH]

Obtain a dataset (copy) from a URL or local directory

The purpose of this command is to obtain a new clone (copy) of a dataset
and place it into a not-yet-existing or empty directory. As such `clone`
provides a strict subset of the functionality offered by `install`. Only a
single dataset can be obtained, and immediate recursive installation of
subdatasets is not supported. However, once a (super)dataset is installed
via `clone`, any content, including subdatasets can be obtained by a
subsequent `get` command.

Primary differences over a direct `git clone` call are 1) the automatic
initialization of a dataset annex (pure Git repositories are equally
supported); 2) automatic registration of the newly obtained dataset as a
subdataset (submodule), if a parent dataset is specified; and 3) support
for additional resource identifiers (DataLad resource identifiers as used
on datasets.datalad.org, and RIA store URLs as used for store.datalad.org;
see examples); and 4) automatic configurable generation of alternative
access URL for common cases (such as appending '.git' to the URL in case
the accessing the base URL failed).

.. seealso::

  :ref:`handbook:usecase-scalable-datastore`
    More information on Remote Indexed Archive (RIA) stores

@adswa
Copy link
Member

adswa commented Jan 18, 2020

oh sorry, I didn't see your earlier comment...

@kyleam
Copy link
Collaborator

kyleam commented Jan 18, 2020

Stepping back, I'm negative on this change in general because it makes the local --help and pydoc output less useful. I would prefer the existing problem---a working link that points to latest version rather the source-matched version---to a change that leads to a more targeted reference in one representation and an obscure reference in other representations.

@mih
Copy link
Member Author

mih commented Jan 19, 2020

I concur with the general criticism, and I am fine to not merge this PR.

That being said: The status quo (full URL) is rather fragile, in addition to being bulky. Here is, for example, a URL referencing a section in the handbook:

http://handbook.datalad.org/en/0.12/usecases/datastorage_for_institutions.html#the-data-store-as-a-git-annex-ria-remote

120 chars that are not only dependent on the specific source file name, but also on the particular wording of the headline. Given the state of the handbook (fast development, frequent structure optimizations), these will break in any representation of the documentation. Only the intersphinx approach gives us a compensation layer, as we only have to maintain the link anchors in the handbook sources, of which we can have any number, and the cost of making them stable "forever" is very low.

I have pushed an alternative solution based on extlinks that would lessen the pain of URL size, but remains fragile.

The last thing I can think about is to use index-markup to identify points of interested for linking from the docstring. Those would show up at http://handbook.datalad.org/en/0.12/genindex.htm so we could still have a versioned page, but would leave the final navigation step to users coming from the docstring/cmdline help. We cannot use the intersphinx resolution approach for the docstrings, because it would be way too slow.

Personally, without any such helper, I would not start adding handbook links to the docstrings, because they would be too painful to maintain.

ATM the following approach seems the most viable:

:ref:`DataLad handbook usecase 13 <handbook:usecase-scalable-datastore>`

Where 13 is some kind of identifier or numbering that is discoverable by a human via the table of contents. ATM the handbook does that at the chapter level for the "Basics" already. I would be fine to
have the docstring references be approximate/coarse, when the HTML and PDF links are up-to-date and precise.

@adswa
Copy link
Member

adswa commented Jan 20, 2020

FTR: I have begun to restructure the handbook toctree and numbered the use cases. This should make easy navigation based on a numerical identifier possible, and improves the index page of the handbook regardless of this proposed feature. Personally, I agree that the full URL is both ugly and fragile. What you refer to as "the most viable approach" seems very sensible and workable to me, compared with everything else, and I believe that at least for some functionality such as the RIA store URLS it is beneficial to point to handbook content.

@kyleam
Copy link
Collaborator

kyleam commented Jan 20, 2020

Perhaps I'm being obtuse or not understanding what's being proposed, but I don't think I'd look at

:ref:`DataLad handbook usecase 13 <handbook:usecase-scalable-datastore>`

and have much more sense what to do with it than

:ref:`handbook:usecase-scalable-datastore`

At any rate, please don't waste time on coming up with a solution to appease one dissenting opinion. We weigh the tradeoffs differently, but given that you two are the ones working on the handbook, your assessment matters more here. (And I don't think most datalad users share my fondness for local help/documentation.)

I do think you should try to not restructure the handbook too often. You want people to link to it, and no one likes broken URLs.

@mih
Copy link
Member Author

mih commented Jan 20, 2020

Perhaps I'm being obtuse or not understanding what's being proposed, but I don't think I'd look at

:ref:`DataLad handbook usecase 13 <handbook:usecase-scalable-datastore>`

and have much more sense what to do with it than

:ref:`handbook:usecase-scalable-datastore`

True in many ways. For the machine it is the same info, for a human reader I was hoping it would be enough, hence I proposed the latter initially, but it wasn't well received. The former is turned into

DataLad handbook usecase 13

which I would argue is damn close to something like Ezekiel 17, and as such proven to be decipherable for more than a thousand years ;-)

I do think you should try to not restructure the handbook too often. You want people to link to it, and no one likes broken URLs.

Same here. It will stop immediately, once an optimum is reached ;-)

@kyleam
Copy link
Collaborator

kyleam commented Jan 22, 2020

All right, I'm ok with being the dissenting opinion here, but not ok with it being blocking. I say let's reset to the initial commit (5a1ca18) and merge.

@mih
Copy link
Member Author

mih commented Jan 22, 2020

Dont worry, I am just distracted. You are not blocking anything.

@mih
Copy link
Member Author

mih commented Jan 23, 2020

FTR: @adswa is investigating how to best implement "medieval navigation" for the handbook. Once that is figured out, we can reset this to 5a1ca18 -- possibly implement a docstring rendering improvement -- and start filling the docs up with links.

…4034)

Demo link is included. All rendered references are version to match the
current "major" release.

Looks like this:

--help output and help(clone) in Python

```
  handbook:3-001 (http://handbook.datalad.org/symbols)
    More information on Remote Indexed Archive (RIA) stores
```

The given URL points to a dedicated index section in the handbook, where
we can maintain any number of references that point to arbitrary
locations. The name 'symbols' is used, because this is how sphinx lists
numeric references. The ?-???? scheme aims to mimic manpage sections
(where we can come up with some categorization, if needed, and
incremental integer IDs).

Ping @adswa for awareness (I have pushed a corresponding change to the
handbook; directly, no PR).
@mih
Copy link
Member Author

mih commented Feb 6, 2020

Ok, reworked the beast with @adswa Should be functional now. Top-comment edited to reflect current approach.

@mih
Copy link
Member Author

mih commented Feb 6, 2020

Unless there are strong objections, I will merge this tomorrow morning. This is mostly relevant for handbook content and @adswa was cool with it. The few links we will have in the docstrings are easy to change in structure and content later on.

@kyleam
Copy link
Collaborator

kyleam commented Feb 6, 2020

Unless there are strong objections, I will merge this tomorrow morning.

From my end, no new objections to the approach, and no objection to merging despite my objections.

@mih
Copy link
Member Author

mih commented Feb 6, 2020

Thx @kyleam !

@mih mih merged commit 8048696 into datalad:master Feb 6, 2020
15 of 17 checks passed
@mih mih deleted the nf-handbooklinks branch Feb 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation merge-if-ok
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants