The metacpan.org/module/* url space should not include scripts #176

Closed
timbunce opened this Issue July 25, 2011 · 33 comments

9 participants

Tim Bunce Leo Lapworth Olaf Alders Moritz Onken Olivier Mengué Mike Doherty Randy Stauner Jay Allen Daniel Perrett
Tim Bunce
Collaborator

Scripts, like mcpani, have a url like https://metacpan.org/module/mcpani

That will cause a problem if a module is ever released with the same name as an existing script.

PAUSE manages the namespace for modules but there's no such namespace management for scripts.

I suggest scripts get put in a /script/ url like https://metacpan.org/script/mcpani

Olaf Alders
Owner

Excellent idea. I would say this is, on some level, related to this issue: CPAN-API/cpan-api#110, i.e. we need to define what should or should not exist in the module namespace.

Moritz Onken
Owner
monken commented July 25, 2011

I agree. However we have to keep in mind that there is also documentation, which consists of POD only (.pod extension). Should they live in the module namespace or the script namespace?

Tim Bunce
Collaborator

They can't live in the module namespace because PAUSE manages that namespace allocation and doesn't pay attention to .pod files, only to perl code with "package ...;".

I suggest they simply use source/... for now.

I can't stress enough how important it is to think through the mapping to urls. It's better to use explicit source/... urls that include the dist version for anything that hasn't been thought-through yet. http://www.w3.org/Provider/Style/URI.html

Moritz Onken
Owner
monken commented July 26, 2011

But there are several modules that ship with Module.pm and Module.pod in the same directory. /module/Module should then show the .pod instead of the .pm, because the .pm doesn't contain any pod. That's how sco does it, and that's how we do it at the moment. That's a special case though and I generally agree that we should have only modules listed in the pause module index under the /module namespace.

Olivier Mengué
dolmen commented July 26, 2011

@monken: But currently, when we ask to view the source of some /module/Module, we get empty content, because the documentation is stored in a .pod and the code in a .pm, and MetaCPAN does not automatically switch from one to the other.

Moritz Onken
Owner
monken commented July 26, 2011

damn regressions, used to work in the pre-catalyst era :)

Moritz Onken
Owner

I'm dragging @clintongormley into this.

I'd like to discuss a uri scheme that we can then implement.

  • /module/Moose::Meta::Class
  • /module/FLORA/Moose-2.0204/lib/Moose/Meta/Class.pm

This will return the latest version of Moose::Meta::Class. If we find documentation for that package in the file or the corresponding .pod file, we will use that and render it as html. If we don't, we will show the source code of the package.

  • /pod/cpanm
  • /pod/Catalyst::Manual::About
  • /pod/BOBTFISH/Catalyst-Manual-5.9000/lib/Catalyst/Manual/About.pod

Those files don't have a package declaration and are thus to be handled separately. Since there is no central index that makes sure that there are only unique names, we have to apply some kind of heuristic. What I do currently is to look only in releases that are flagged as "latest". Releases are flagged as latest if they contain a package that is in the 02packages.details.txt file.
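
A minimal sketch of that "latest" flagging rule, written in Python purely for illustration (it is not MetaCPAN's code). It assumes the index body consists of whitespace-separated `package version path` lines after the blank line that ends the header. The function name and the sample data are hypothetical.

```python
def latest_releases(index_text):
    """Return the set of release archive paths referenced by the package index."""
    # The header is separated from the body by the first blank line.
    _, _, body = index_text.partition("\n\n")
    latest = set()
    for line in body.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # fields: package name, version, path of the release archive
            latest.add(fields[2])
    return latest


# Sample data for illustration only, not real index content.
SAMPLE_INDEX = """File: 02packages.details.txt
Description: sample data

Moose 2.0802 E/ET/ETHER/Moose-2.0802.tar.gz
Moose::Meta::Class 2.0802 E/ET/ETHER/Moose-2.0802.tar.gz
Foo::Bar 1.23 A/AU/AUTHOR/Foo-Bar-1.23.tar.gz
"""

flagged = latest_releases(SAMPLE_INDEX)
```

Any release whose archive path is absent from `flagged` would not be considered "latest", so its package-less files would be skipped by the heuristic.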

People are also requesting a third case: accessing files by distribution, independent of the release version.

  • /distribution/Moose/README
  • /distribution/Catalyst/lib/Catalyst/Manual/About.pod

This always requires a full path to the file, but without the full release name (author + version).

Anything else?

Tim Bunce
Collaborator
  • /module/Moose::Meta::Class

Good. Simple, obvious, direct, short, unambiguous.

  • /module/FLORA/Moose-2.0204/lib/Moose/Meta/Class.pm

I'm really not keen on this. It seems like an abuse of the /module/ url namespace. I think /pod and /file could be used to access the pod and source of specific files.

  • /pod/cpanm

Not safe. There's no namespace management for scripts. Two dists could easily contain scripts with the same name. Let's avoid a debate about which one to show.

  • /pod/Catalyst::Manual::About

Redundant given /module/Catalyst::Manual::About

  • /pod/BOBTFISH/Catalyst-Manual-5.9000/lib/Catalyst/Manual/About.pod

Fine.

  • /distribution/Moose/README
  • /distribution/Catalyst/lib/Catalyst/Manual/About.pod

That's a good idea but is ambiguous about pod vs raw.

Alternative

It seems to me there are two things the url is trying to achieve: a) specifying what's being referred to, and b) specifying how to display information about it.

The What:

  • module/Foo::Bar
  • release/AUTHOR/Foo-Bar-1.23/$filepath <-- filepath could be empty to refer to the release file itself
  • distribution/Foo-Bar/$filepath <-- is this robust without an AUTHOR? Maybe allow an optional AUTHOR
  • script/Foo-Bar/$scriptname <-- just a suggestion

The How to present it:

  • pod - show pod if there is any, else show source
  • file - show source
  • raw - return the content with no formatting at all
  • meta - return metacpan data about the specified thing (formatted in HTML or JSON per the Accept http header)

Since the What can include an arbitrary number of slashes it makes sense to put the How in front:

  • /$how/$what

Examples:

  • /pod/module/Foo::Bar
  • /file/module/Foo::Bar
  • /raw/release/AUTHOR/Foo-Bar-1.23/Changes
  • /raw/release/AUTHOR/Foo-Bar-1.23 <-- return the tarball

Your previous examples would be:

  • /pod/module/Moose::Meta::Class
  • /pod/release/FLORA/Moose-2.0204/lib/Moose/Meta/Class.pm
  • /pod/script/App::cpanm/cpanm
  • /pod/module/Catalyst::Manual::About
  • /pod/release/BOBTFISH/Catalyst-Manual-5.9000/lib/Catalyst/Manual/About.pod
  • /file/distribution/Moose/README
  • /file/distribution/Catalyst/lib/Catalyst/Manual/About.pod
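
To make the /$how/$what split concrete, here is a rough sketch in Python (for illustration only; nothing here is MetaCPAN's routing code). The function name `parse_url` and the two sets are hypothetical, populated from the lists above.

```python
HOWS = {"pod", "file", "raw", "meta"}
WHATS = {"module", "release", "distribution", "script"}


def parse_url(path):
    """Split '/$how/$what' into (how, what_type, what_args)."""
    parts = path.strip("/").split("/")
    # The How comes first because the What may contain any number of slashes.
    if len(parts) < 3 or parts[0] not in HOWS or parts[1] not in WHATS:
        raise ValueError(f"not a /$how/$what url: {path!r}")
    how, what_type, *args = parts
    return how, what_type, args
```

For example, `parse_url("/raw/release/AUTHOR/Foo-Bar-1.23/Changes")` yields the How `raw`, the What type `release`, and the remaining segments as arguments, while an old-style `/module/Foo::Bar` is rejected because it has no How.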
Tim Bunce
Collaborator

And the existing /requires/module/DBI fits nicely into that scheme.

And suggests interesting extensions: /requires/distribution/Foo-Bar could show all the distributions that depend on any of the modules in Foo-Bar.

Moritz Onken
Owner

A problem with this scheme came up that we were not able to solve.

We somehow have to define a common endpoint for documentation because documentation of modules contains links to other modules (or scripts or pods) without any indication. So it's both L<cpanm> and L<Moose>, which we have to translate to some common uri.

RE /pod/Catalyst::Manual::About being redundant to /module/...: Catalyst::Manual::About is not a module, because the documentation is in a .pod file, there is no .pm file and no package declaration. Only modules listed in the 02packages file are considered modules.

Olivier Mengué

The fact is that there are two namespaces for code of scripts and modules, but a single common one for POD. But this is only a problem for single word entries (multiple words separated with '::' are always modules) without extension ("foo.pl" is a script).
For the remaining ambiguous names (/\A\w+\z/), we could use these heuristics:

  • starts with capital => look up a module; if that fails, look up a script
  • known pragma (lookup with Module::CoreList) => module, perldoc.perl.org ?
  • else look up a script; if that fails, look up a module
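
A small sketch of those heuristics, in Python for illustration. `KNOWN_PRAGMAS` is a hard-coded stand-in for a Module::CoreList lookup, `lookup_order` is a hypothetical name, and treating every other non-matching name (such as "foo.pl") as a script is an assumption of this sketch.

```python
import re

# Stand-in for a Module::CoreList lookup (assumption: a tiny fixed set).
KNOWN_PRAGMAS = {"strict", "warnings", "utf8"}

# Python's \Z plays the role of Perl's \z: match only at the very end.
AMBIGUOUS = re.compile(r"\A\w+\Z")


def lookup_order(name):
    """Return the kinds of target to try, in order, for a POD link name."""
    if "::" in name:
        return ["module"]            # multi-part names are always modules
    if not AMBIGUOUS.match(name):
        return ["script"]            # e.g. "foo.pl" has an extension
    if name[0].isupper():
        return ["module", "script"]  # capitalised: module first
    if name in KNOWN_PRAGMAS:
        return ["module"]            # lower-case pragma such as "strict"
    return ["script", "module"]      # everything else: script first
```

With this ordering, `cpanm` is tried as a script before falling back to a module, while `Moose` is tried as a module first.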

And note that I don't think that we should distinguish URLs for POD whether it is stored in a .pm or in a .pod if a corresponding .pm or script exists. It is the author's choice to embed the POD in the code or not, and that should not impact how the doc is displayed: this is the current behavior of perldoc and this rule should not be broken.

@monken An endpoint has to be defined to present links in the HTML view of POD. But this endpoint does not have to be the final destination. Of course, it would be better if it were. But for ambiguous (as defined above) POD links, a temporary endpoint could redirect to the final endpoint with an HTTP 302. So the (maybe costly) resolution of ambiguous names could be done only for files to be displayed, not for any link.

Tim Bunce
Collaborator

@dolmen The redirect is a good idea.

Automatically redirect a /pod/$foo to the resolved /pod/module/$foo or /pod/script/$dist/$foo if there's only one current possibility. If there's more than one then show a 'disambiguate' page that lists the possibilities so the user can choose.

(The non-ambiguous cases could be cached and used to present the 'right' link in the source document when it's rendered.)
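
As a rough illustration of the redirect-or-disambiguate rule (Python, not MetaCPAN code): the function name, the tuple-shaped return values, and the "not_found" outcome for zero matches are all assumptions of this sketch, and the candidate urls are expected to come from some lookup that is not shown.

```python
def resolve_pod_link(name, candidates):
    """Decide what /pod/$name should do, given the urls that currently match it."""
    matches = sorted(set(candidates))
    if not matches:
        return ("not_found", name)      # assumption: the thread does not cover this case
    if len(matches) == 1:
        return ("redirect", matches[0])  # exactly one possibility: redirect to it
    return ("disambiguate", matches)     # several: let the user choose
```

Only the `("redirect", url)` results are candidates for the cache mentioned above, since they are the ones that could be substituted directly into rendered links.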

And I agree that the form the author chooses to release module pod as shouldn't alter the endpoint url for the module pod.

Mike Doherty
doherty commented May 31, 2012

Perhaps a helpful illustration: https://metacpan.org/release/App-p links to https://metacpan.org/module/p for the script 'p'. When App::p was uploaded, that page was the documentation that came with App::p. But now, it shows something else - the docs for a different dist that also has a script called 'p' that was uploaded later. So we're not even consistent!

Leo Lapworth
Owner

Another example where /module/XXX doesn't work: http://search.cpan.org/dist/CPAN/scripts/cpan doesn't match https://metacpan.org/module/cpan

Moritz Onken
Owner

@ranguard why should it? If anything, it should match http://search.cpan.org/perldoc?cpan (but doesn't). There is no index for app documentation so metacpan makes up its own.

Leo Lapworth
Owner

@monken - https://metacpan.org/module/cpanm works - which I appreciate is probably a side effect, not a feature. I was just giving another example where this hack doesn't work.

I'd like a URL to link to these that isn't a hack see issue #805

Moritz Onken
Owner

me too, but this issue is about something else :)

Randy Stauner
Owner

From the perspective of a pod parser:

perlpod describes the L<> construct as a link to "a Perl manual page" and has specific examples for modules (Net::Ping) and perldocs (perlsyn). (Also man pages but that's less relevant.)

It doesn't mention scripts or other dist files, but it does use the same construct to link to indexed modules and non-indexed pod (perlsyn is not in 02packages).

perlpodspec however adds scripts to the list of what an L<> can link to:

the name of a Pod page like L<Foo::Bar> (which might be a real Perl module or program in an @INC / PATH directory, or a .pod file in those places)

So I think I'd vote for the single endpoint that would make a best guess as to where to redirect them (module, then pod, then script, then whatever else), and offer a disambiguation page if the identifier is non-unique.

I also wonder if there might be value in having that decision be made by the api, but i haven't really thought that through yet.

Of course that isn't really what this ticket was about, which was offering a different endpoint (other than module) that would show the html version of pod for non-module files.

Randy Stauner
Owner

I'm going to take a crack at a /pod/type/path controller and we'll go from there.

Randy Stauner
Owner

So along the lines of @timbunce's "alternative" suggestion...

  • /pod/module/Moose::Meta::Class
    • Latest version of indexed module
    • Same as /module/Moose::Meta::Class
  • /pod/module/Catalyst::Manual::About
    • If .pod exists for indexed module use it.
    • If not indexed we could look for .pod and redirect them to release or distribution url or possibly give them a 404 with suggestions.
  • /pod/release/FLORA/Moose-2.0204/lib/Moose/Meta/Class.pm
    • Specific version of a module file
    • Could have a rel=canonical to /module/$1 if it's indexed
    • Non-indexed could possibly rel=canonical to /pod/distribution/$1 if #796/#797 gets worked out
  • /pod/release/BOBTFISH/Catalyst-Manual-5.9000/lib/Catalyst/Manual/About.pod
    • Specific version of pod file
    • Essentially the same as previous
  • /pod/distribution/Catalyst-Manual/lib/Catalyst/Manual/About.pod
    • File in latest version of a dist
    • This suffers from #796/#797 but may be less prone to conflicts since the full file path is used.
    • If there's a conflict, show a disambiguation page.
    • Otherwise essentially the same as /pod/release/$1
  • /pod/script/cpanm
    • Look for files with exactly that basename
    • (We could prefer things in bin/, script/, or scripts/ but I'm not sure that gains us anything).
    • Offer disambiguation page (linking to release or distribution url) if not unique.
  • /pod/find/perlsyn
    • Primarily for the sake of L<> pod links which are, by specification, ambiguous.
    • Prefer:
      • Indexed module (probably redirect to /module/$1).
      • Non-indexed pod file (probably redirect to release or distribution url).
      • Script (probably redirect to release or distribution url).
    • Show Disambiguation page if not unique.
    • We wouldn't have to redirect... (not redirecting would reduce the number of API calls being made...) but it does seem the more appropriate thing to do.

For any redirects we could prefer /pod/distribution/$1 but we may want to wait
until dist-names are unique or #796/#797 otherwise gets resolved.

We could also have a /file/ controller that finds files the same way
but shows the source instead of the pod.

Old /module namespace

Should we keep the old /module/Foo::Bar as the canonical url?
It's probably the most commonly used/desired url on the site.
We could redirect non-indexed modules to one of the other urls
(to fix the /module/scriptname behavior).

Comments?

Moritz Onken
Owner
monken commented June 27, 2013

my 2 cents

  1. /script/cpanm doesn't solve the issue that anyone can upload a cpanm script, which makes the name non-unique. IMO scripts should only be accessed through the version independent distribution endpoint (/distribution/App-cpanminus/bin/cpanm). Authors can still upload an App-cpanminus distribution and take over that link. However, that's less likely than someone releasing his/her local::lib.
  2. Rename /module/{arg} to /pod/{arg}. If {arg} is not a registered module (i.e. listed in 02packages.details.txt), redirect to the version independent distribution endpoint (/distribution/{dist}/{full path to file}). That way we save a redirect for the majority of documentation, which is module documentation, and provide a pretty stable url for scripts and .pod files.
  3. /pod/{release with version}/{full path to file} will behave just like /module/{...} today

Pod parsers will then use /pod/ as a base for L<>. In case the requested resource is not an indexed module, we redirect to the canonical version independent distribution endpoint (CVIDE). If more than one piece of documentation has the same name, we show a page where the user can select where he wants to go.

To summarize:

  • /pod/Moose

    show latest Moose documentation

  • /pod/cpanm

    redirect to /distribution/App-cpanminus/bin/cpanm

  • /pod/Catalyst::Manual::About

    redirect to /distribution/Catalyst-Manual/lib/Catalyst/Manual/About.pod

  • /pod/cpan

    show list of documentation with title cpan to choose from (distributions WAIT, CPAN, App-Cpan)

  • /pod/ETHER/Moose-2.0802/lib/Moose.pm

    version specific documentation

Jay Allen

fwiw, I think Tim's points/proposal, rwstauner's summary and monken's points are sound. I especially like the twist of potentially ambiguous links like /pod/cpan automatically rendering a search/disambiguation page. However, there's one quirky case to consider:

  1. miyagawa introduces cpanm. Gods and men rejoice. /pod/cpanm renders the docs.
  2. Some time later, some jerk includes a cpanm script in his distribution
  3. BOOM, all anchored links to the documentation break on the shiny, new disambiguation page

Sure, you could apply the same anchors to all disambiguation results so that the next click gets you where you want to go, but it may throw some people off.

And IMHO, I think the /pod/ should be canonical, letting /module/ die out as it will.

Moritz Onken
Owner
monken commented June 27, 2013

@jayallen if an author wants his/her module/script to be indexed, he should add a package cpanm; line to the file and /pod/cpanm will always link to this module/script. Not doing that is just calling for trouble. It has the nice side effect that cpan cpanm will actually work.

Jay Allen

Agreed. Don't add complexity to cater to laziness. :)

Randy Stauner
Owner

I'm fine with not doing 1. (/script/blah)... i probably would have left it until last and never gotten around to it anyway.

I really like the idea of the identifier (/pod/$type/@args) because it seems simple, reads well, and allows for future expansion (which would have been nice to have from the start).

I might be able to warm up to the idea of 2. because it seems to cover the cases, however the /distribution/@path endpoint bothers me:

  • It's a misnomer (which is technically how this issue began... "incorrect" urls).
  • It will be limited to showing pod even though it sounds like it ought to be showing distribution metadata (favorites, bugs, requirements, relations, rev-deps... I'm not sure what else could go here it just seems to me like "files" isn't it).
  • It would become another endpoint that's showing pod, and we still don't have a CVIDE for non-pod files. I could easily see wanting to be able to link to a script in the examples/ directory of the latest version of a dist or a certain dist's dist.ini or cpanfile or something.

Number 3. seems too limiting.

We might be able to do a combination:
A single arg could be the magic redirect of 2. (or a disambiguation page):

  • /pod/Module::Name
  • /pod/script_name
  • /pod/Documentation::Name

and we could use the qualifiers for other entries (more than one arg):

  • /pod/release/AUTHOR/NAME-VER/@path
  • /pod/distribution/NAME/@path

Then a file endpoint could have consistent urls for source code: /file/release/..., /file/distribution/...
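
A sketch of how that combination might be routed, in Python for illustration only. The function name `route` and the returned tuple shape are hypothetical. The single-argument form is only described for /pod above, so this sketch rejects it for /file.

```python
def route(path):
    """Map a /pod/... or /file/... url to (view, kind, fields)."""
    view, *args = path.strip("/").split("/")
    if view not in ("pod", "file") or not args:
        raise ValueError(f"unsupported url: {path!r}")
    if len(args) == 1:
        # Single argument: the "magic" lookup (redirect or disambiguation page).
        if view != "pod":
            raise ValueError(f"single-argument form is only sketched for /pod: {path!r}")
        return (view, "lookup", {"name": args[0]})
    kind, *rest = args
    if kind == "release" and len(rest) >= 3:
        author, release, *file_path = rest
        return (view, kind, {"author": author, "release": release,
                             "path": "/".join(file_path)})
    if kind == "distribution" and len(rest) >= 2:
        distribution, *file_path = rest
        return (view, kind, {"distribution": distribution,
                             "path": "/".join(file_path)})
    raise ValueError(f"unknown qualifier or missing arguments: {path!r}")
```

Because /pod and /file share the same qualifier handling, `/pod/release/AUTHOR/NAME-VER/lib/Foo.pm` and `/file/release/AUTHOR/NAME-VER/lib/Foo.pm` differ only in the first element of the result.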

As for the non-unique dist names, here's a stupid idea: we could let people link to files that are in the dist of a named (indexed) module... something like /pod/module/App::cpanminus/bin/cpanm. (NOTE: I'm only mildly serious on this one, it seems silly, but it would be a solution.)

Olaf Alders
Owner

I really like the way this discussion has progressed. This last comment from @rwstauner sounds very good to me. As far as the final (silly?) solution, I think something along those lines might be a great idea. It's a memorable URL and it would give you what you expect to get from it.

Moritz Onken
Owner
monken commented July 11, 2013

/pod/$type/@args requires a redirect for L<> pod links if we want to establish it as the canonical url. And that is something I'm not comfortable with. That will add another huge delay to page loads.
Another reason why I think /pod/$type/@args is not a good idea for a canonical url is that $type might actually change. People will realize that their script should actually be a package and that will change $type, making the old link 404.

If authors want /pod/$module to do the right thing, they just define a proper package. It's not that hard! We shouldn't encourage people to upload dists that are not properly packaged by building workarounds.

The disambiguation page is a nice way to let authors know that the namespace is not taken yet and they simply have to define that package name.

Randy Stauner
Owner

I agree with linking to /pod/$arg and using that as the canonical where possible (no redirect).

How do you feel about using the qualifier for the other (direct) links (release, distribution, etc)?

I think that gives us the maximum flexibility and the /pod/$single_arg gives us a good (stable) url.

Best of both worlds.

Daniel Perrett
pdl commented July 12, 2013

Can we use a more SEO-friendly word than pod? docs? help?

Jay Allen

I'd say POD IS more SEO friendly since when I'm searching for perl module documentation, I'm going to be looking for POD. Did you mean more non-perl-developer-human-friendly?

Leo Lapworth
Owner

@rwstauner has merged his changes (3 months ago) - so closing

Leo Lapworth ranguard closed this December 22, 2013