
Add recommendation support #253

Open
yanick opened this Issue · 50 comments
@yanick
Collaborator

Issue placeholder for discussions related to http://babyl.dyndns.org/techblog/entry/metacpan-recommendations.

@tobyink

I think there's a lot more value to recommendations if people can justify the recommendation.

To avoid it becoming yet another reviews site though, perhaps instead of putting a note by recommendations, allow people to tag their recommendation with pre-defined categories, such as: "more features", "better API", "tiny", "in core", etc.

For example, on the LWP::UserAgent page I might want to recommend both WWW::Mechanize ("more features") and also HTTP::Tiny ("in core", "tiny").
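To make the tagging idea concrete, here is a minimal Python sketch that accepts only a fixed vocabulary of categories. The `Recommendation` class and the `ALLOWED_TAGS` set are hypothetical illustrations, not part of any MetaCPAN API. The tag names are the ones suggested above.

```python
from dataclasses import dataclass, field

# Hypothetical fixed vocabulary, using the categories suggested above.
ALLOWED_TAGS = frozenset({"more features", "better API", "tiny", "in core"})


@dataclass(frozen=True)
class Recommendation:
    """A user recommends `alternative` on the page of `module`."""
    user: str
    module: str
    alternative: str
    tags: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Only pre-defined categories are accepted, never free-form notes.
        unknown = set(self.tags) - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"unknown tags: {sorted(unknown)}")


recs = [
    Recommendation("tobyink", "LWP::UserAgent", "WWW::Mechanize",
                   frozenset({"more features"})),
    Recommendation("tobyink", "LWP::UserAgent", "HTTP::Tiny",
                   frozenset({"in core", "tiny"})),
]

for r in recs:
    print(f"{r.alternative}: {', '.join(sorted(r.tags))}")
```

Rejecting unknown tags keeps the feature from drifting into free-text reviews.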

@tobyink

Also, a category which would need to be displayed separately: recommend modules which aren't an alternative to the current module, but are good partners for it. For example on the List::Util page, recommend List::MoreUtils.

@monken
Owner

Let's keep it simple for the first release. We can still over-engineer later on :)

@monken
Owner

@yanick here is my take on how to implement this:
I like the idea of bundling the recommendation with a ++. In fact, we could extend the /favorite endpoint to include the inferior modules in the favorite document.

Example: user A recommends Moo over Moose and Mo, the resulting favorite entry would be:

{
  "user": 1,
  "distribution": "Moo",
  "instead_of": ["Moose", "Mo"]
}

Pretty easy to implement and easy to query, too.
Another thing to keep in mind: in my opinion we should use distribution names instead of module names. When people look at Moose::Role, they still want to see the recommendation for Moo, even though the user didn't explicitly recommend Moo::Role over Moose::Role.
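Here is a rough sketch of why the `instead_of` structure is easy to query. The favorite documents are held in a plain Python list, which stands in for the real index (this is not MetaCPAN code). Finding what users recommend over a given distribution then takes a single pass. The helper name `recommended_instead_of` is hypothetical.

```python
# Favorite documents extended with an `instead_of` list, as proposed above.
# The data is made up for illustration.
favorites = [
    {"user": 1, "distribution": "Moo", "instead_of": ["Moose", "Mo"]},
    {"user": 2, "distribution": "Moo", "instead_of": ["Moose"]},
    {"user": 3, "distribution": "Moose"},  # a plain ++, no alternative
]


def recommended_instead_of(dist, docs):
    """Count, per distribution, how many users recommend it over `dist`."""
    counts = {}
    for doc in docs:
        if dist in doc.get("instead_of", []):
            name = doc["distribution"]
            counts[name] = counts.get(name, 0) + 1
    return counts


print(recommended_instead_of("Moose", favorites))  # {'Moo': 2}
print(recommended_instead_of("Mo", favorites))     # {'Moo': 1}
```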

@oalders
Owner

My vote is for simple to start with as well. I think we should also take into account the ideas presented by @timbunce here http://blog.timbunce.org/2013/03/10/suggested-alternatives-as-a-metacpan-feature/. I would say for this thread we could limit discussion of Tim's blog post to the points where it overlaps with what @yanick has proposed. We really need a road map moving forward here. My inclination would be to break it up into small, simple, deployable chunks with a view to expanding the functionality down the road if it turns out to be as useful as we think/hope it will be.

I like the UI which Tim has proposed for the recommendations and I tend to agree that modules make more sense than distributions for the suggested alternatives. However, having ++ refer to dists and alternatives referring to modules could lead to some confusion. At the very least, I think it's a conversation worth having and maybe getting some wider input on.

Seeing the way @monken has laid out the ++ entry, it looks really clean and easy to use, but what if I want to recommend Mojo::UserAgent as an alternative to LWP::UserAgent? Now we're talking about

{
  "user": 1,
  "distribution": "Mojolicious",
  "instead_of": ["libwww-perl"]
}

@yanick
Collaborator

[incorporate Tim's ideas here as well] Absolutely.

[road map] I agree that many small steps is the way to go. The sooner we have a core to play with, the sooner we can have a snowball effect, with hundreds, nay, thousands of hackers poring over the feature and submitting patches. ... Okay, I might be harbouring too much hope here, but you know what I mean. ;-)

[modules versus distributions] I see the point for modules. For most modules/distributions it won't make a lot of difference as, typically, one dist == one functionality == one main module. What I see playing against a per-module recommendation is dilution/confusion: anything that recommends Moose::Util, Moose::Meta::Class or Moose::Role really boils down to recommending Moose. Now, it's true that the flip side is that for distributions that are an umbrella for many functionalities (@oalders's example of Mojolicious is a good one), the recommendation might look odd, but I think that's still better than having a more diffuse module selection.

@timbunce
Collaborator

[road map] I agree that many small steps is the way to go, but picking the right direction is important! :)

[modules versus distributions] Firstly, I would argue that the current placement of the ++ is misleading. Rather than:

Gisle Aas / libwww-perl-6.04 / LWP::UserAgent [47 ++]

I'd suggest:

Gisle Aas / libwww-perl-6.04 [47 ++] – LWP::UserAgent

To take your example @yanick, it's unlikely that anyone would recommend Moose::Util, Moose::Meta::Class, or Moose::Role specifically unless they had good reason. They'd simply refer to Moose instead. If they did have a good reason then they wouldn't be able to express it clearly if they had to do so at the distro level.

Also, consider the case of a large distribution with many modules (Moose, DBIx::Class etc.) where someone has developed a separate distribution that contains a single module that improves on the functionality of just one of the bundled modules. Clearly that distro isn't a "suggested alternative" for the original distro, but the module is a "suggested alternative" for a specific module in that distro.

(I can see an argument for calling the new distro a "complementary distro". So if you choose to implement at the distro level then the implementation should support different relationship types from the start.)

[API] I'm nervous about having this functionality piggyback on favourites, but I don't know the API well enough to know how valid that concern is. Clearly it's only appropriate if you choose to implement this at the distro level.

There'll need to be API support for the other side of the relationship as well, i.e. "Suggested as the alternative to X other modules by Y people" and "Suggested as complementary with X other modules by Y people".
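The reverse lookup could be computed from the same records. The sketch below assumes hypothetical `(user, module, alternative)` triples. It produces the X and Y for the "Suggested as the alternative to X other modules by Y people" line. The names and data are illustrative only.

```python
# Hypothetical triples: `user` suggests `alternative` on the page of `module`.
suggestions = [
    ("alice", "LWP::UserAgent", "HTTP::Tiny"),
    ("bob", "LWP::UserAgent", "HTTP::Tiny"),
    ("bob", "HTTP::Lite", "HTTP::Tiny"),
    ("carol", "Moose", "Moo"),
]


def suggested_as_alternative_summary(target, triples):
    """Return (X, Y): distinct modules `target` is suggested for, by Y people."""
    modules = {module for _, module, alt in triples if alt == target}
    people = {user for user, _, alt in triples if alt == target}
    return len(modules), len(people)


x, y = suggested_as_alternative_summary("HTTP::Tiny", suggestions)
print(f"Suggested as the alternative to {x} other modules by {y} people")
```

Counting distinct users separately from distinct modules keeps one enthusiastic user from inflating Y.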

[Naming] Either "suggested alternative" and "suggested addition" or "recommended alternative" and "recommended addition". Umm, "alternative" seems clear but "addition" doesn't seem quite right; "extra" is a bit vague and "complementary" is a bit of a mouthful. I'll let you bike-shed that one :)

@monken
Owner

[placement of ++]
@timbunce Please open a separate ticket for that. :+1:

[module vs distro]
libwww-perl and Mojo are the exception and don't really follow the idea of CPAN where each dist tackles a certain problem or functionality. If we do it on a dist level that will also motivate people to split up their large dists.
One might argue that the perl dist has many modules that are candidates for recommendations. My argument against that is that most of these modules are dual-lived and have their own dist that we would recommend instead of the perl dist.
Again, let's keep it simple; I feel like having the recommendation on a module level would cause us a lot of headaches.

Worst case scenario:
A user looks at LWP::UserAgent and recommends Mojo::UserAgent, which will result in

{
  "user": 1,
  "distribution": "Mojolicious",
  "instead_of": ["libwww-perl"]
}

Looking at any of the libwww-perl modules will result in showing a recommendation for Mojolicious (instead of Mojo::UserAgent). I'm totally fine with that. But others might disagree.

[API]
Both queries you suggested should be supported. I like putting it in the favorite table because it relates and makes the implementation easier (in my mind).

[Naming]
I vote for "recommended alternative", but it's quite a mouthful for an API key, so I'd still vouch for instead_of.

@dagolden

Please, please, please, use module names. They are stable and reliable. Dist names are not. Dist names aren't even unique unless paired with the uploader name (AUTHOR/Foo-Bar-1.23.tar.gz), so how do you track recommendations as different maintainers release? You're going to be in a heuristic pickle. (You might be there already with "previous dist".)
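To illustrate the uniqueness point, the sketch below splits a release path of the form `AUTHOR/Foo-Bar-1.23.tar.gz` into author, distribution name and version. The regular expression is a simplified assumption, since real CPAN release names come in more shapes than this. Both uploaders are hypothetical.

```python
import re

# Simplified, assumed pattern for paths like "AUTHOR/Foo-Bar-1.23.tar.gz";
# real CPAN release names come in more shapes than this handles.
RELEASE_RE = re.compile(
    r"^(?P<author>[A-Z][A-Z0-9-]*)/"
    r"(?P<dist>.+?)-(?P<version>v?\d[\w.]*?)"
    r"\.(?P<suffix>tar\.gz|tgz|zip|tar\.bz2)$"
)


def parse_release(path):
    """Split a release path into (author, distribution, version)."""
    m = RELEASE_RE.match(path)
    if m is None:
        raise ValueError(f"unrecognised release path: {path}")
    return m.group("author"), m.group("dist"), m.group("version")


first = parse_release("FIRST/Foo-Bar-1.23.tar.gz")    # hypothetical uploader
second = parse_release("SECOND/Foo-Bar-2.00.tar.gz")  # hypothetical uploader

print(first[1] == second[1])    # True: the dist name alone cannot tell them apart
print(first[:2] == second[:2])  # False: author plus dist name can
```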

Modules are also precise. Forget the Mojolicious example, how about Scalar::Util and List::Util? Different alternatives will apply to each.

You can always roll it up on a distribution page and show other suggested distributions based on modules contained.
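Here is a minimal sketch of such a roll-up, assuming a module-to-distribution mapping is available. The mapping and suggestion data are made up for illustration, and `roll_up` is a hypothetical helper. Module-level suggestions are grouped by the distribution of the module being viewed and reported as suggested distributions.

```python
from collections import Counter

# Hypothetical module -> distribution mapping; in practice it would come
# from the current index and can change when a module is split out.
module_to_dist = {
    "List::Util": "Scalar-List-Utils",
    "Scalar::Util": "Scalar-List-Utils",
    "List::MoreUtils": "List-MoreUtils",
    "Params::Util": "Params-Util",
}

# Module-level suggestions: (module being viewed, suggested module).
module_suggestions = [
    ("List::Util", "List::MoreUtils"),
    ("List::Util", "List::MoreUtils"),
    ("Scalar::Util", "Params::Util"),
]


def roll_up(dist, suggestions, mapping):
    """Suggested distributions for every module that belongs to `dist`."""
    counts = Counter()
    for module, suggested in suggestions:
        if mapping.get(module) == dist:
            # Report the suggested module's dist, falling back to its name.
            counts[mapping.get(suggested, suggested)] += 1
    return counts


print(roll_up("Scalar-List-Utils", module_suggestions, module_to_dist))
```

The suggestions themselves are stored per module. Moving a module into another distribution therefore only changes the mapping, and its suggestions follow it without any data migration.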

@dagolden

[naming]

I suggest using see also -- this is softer than "recommended alternative" and could eventually be enhanced with comments or tagging. It would allow for "recommendation" or "alternative" or "for use with" semantics.

It solves the discoverability problem without making direct value judgments. And it's general enough that you can use it to weight search rankings.

@yanick
Collaborator

[naming]

see also is, IMHO, too soft. The razor's edge we are walking, I think, is to have something that won't degenerate into bloodbaths but still provides a venue to recommend solution X over solution Y.

[modules versus distributions]

Question for MetaCPAN peeps: assuming that we go with module names, is it easy or costly to have aggregation of those results done for the distribution? I'm asking because I think that we have to show some form of results at the dist level (I do not want to click through the many modules of DBIx::Class to know what peeps recommend for DBIx::Class as a whole). If per-dist aggregations of the module results is costly, then that would be a strong argument against the per-module recommendations. If not... then the fight can go on. :-)

@oalders
Owner

[naming]

I had also thought "see also" would be a nice, succinct way of naming this. The argument I had against it is that "See Also" seems to be a fairly common header in module documentation and the meaning given there is probably wider in scope. It seems to be, "if you like this, then you might just want to look at these", with no implied judgement. However, I could live with our version of "See Also" having a narrower definition.

@dagolden

[naming]

I think it's good to start soft and general, because you can always make it harder with more specific annotations or tagging. On the other hand, if you try to create the right ontology first and get it wrong, then you're sort of locked in.

Go back to @tobyink 's comment about comments -- I think starting with soft will allow greater insight into how it's being used, then more rigor can be figured out on the basis of actual usage.

@timbunce
Collaborator

All @dagolden's points are strong ones and I agree with them.

Re @yanick's query on cost of calculating the distro level recommendations, that could be done async as a batch job. As mentioned before, most recommendations will be made to the 'root' module of a distro, and the module level recommendations on that page would be updated immediately. I don't see a problem with the distro page not getting the change till later.

@monken
Owner

@dagolden when I talk about distributions I'm talking about Foo-Bar and not AUTHOR/Foo-Bar-1.23.tar.gz, which is a release in my mind. So the issue to track releases of different authors doesn't really apply.

Someone might recommend List::MoreUtils over List::Util. There is no harm if that also shows up for Scalar::Util. It's one distribution, it's one bucket of modules that try to solve a common issue: provide utilities to Perl data structures.

@timbunce

most recommendations will be made to the 'root' module of a distro

I agree and that's why we could just stick with recommending distributions because in all those cases, the distribution matches the module name. I guess I still have trouble understanding why we should recommend based on module names when we collapse on a dist-level anyway.

@dagolden

@monken "distributions" in the way you describe it don't exist as far as PAUSE is concerned. They don't exist as far as users are concerned because they can't be installed. They are a fiction invented by search.cpan.org and mimicked by metacpan.org. Perpetuating that design mistake would be unfortunate.

@shadowcat-mst

"recommendations" is wrong. For a start, now what do you call the relationship going the other way?

SEE ALSO is exactly the CPAN tradition for this since it doesn't mean "if you like this module", it means "if you're looking at this module you should also look at".

Generally in a deprecation situation, optimally you'd create the relationship going both ways.

Plus many recommendations would be conditional on some factor - there's no one universal best practice or we wouldn't be having this discussion, we'd just be picking one best module for each task and moving on.

As an example - I might add a link on DBIx::Class saying "see also DBIx::Lite if you don't need objects" and they might add a link back for the 'do need objects' case. Those are both conditional recommendations.

We can't assume a concept of 'obsoletes' and 'obsoleted by': the Mojo/LWP example is good there since, while the Mojo API is a lot nicer for a lot of cases, sri told me he doesn't want it to become 'the' HTTP API because he doesn't want to take on the backcompat requirements, so it's not a straight replacement even at the user agent level.

Another example would be PAR and App::FatPacker. fatpacker is way nicer to deal with than PAR for the cases it supports ... because by refusing to handle XS I managed to avoid 90% of the complications. So I'd like to think that it obsoletes PAR for most pure perl packing, but I still recommend PAR when you need XS support.

So I think calling it 'module relationships', displaying it as 'see also', and letting people put 'obsoletes' or 'obsoleted by' in the tags is probably the sensible way forward. We can't capture a lot of the useful information otherwise, and it leaves us providing mechanism and then seeing what the users shake out in terms of policy.
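One way to sketch this "mechanism, not policy" approach is to store plain module-to-module links with optional free-form tags. The links are shown as "see also" on both ends, and only tags with a known inverse are flipped. The class, the `INVERSE_TAGS` table and the choice not to mirror conditional tags are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical table of tags that have a well-defined inverse.
INVERSE_TAGS = {"obsoletes": "obsoleted by", "obsoleted by": "obsoletes"}


class Relationships:
    """Module-to-module links, shown as "see also" on both modules' pages."""

    def __init__(self):
        self._links = defaultdict(list)  # module -> [(other, tags), ...]

    def add(self, module, other, tags=()):
        tags = tuple(tags)
        self._links[module].append((other, tags))
        # Mirror the link, keeping only tags that can be inverted; free-form
        # conditions do not carry over to the reverse direction.
        mirrored = tuple(INVERSE_TAGS[t] for t in tags if t in INVERSE_TAGS)
        self._links[other].append((module, mirrored))

    def see_also(self, module):
        return list(self._links.get(module, []))


rels = Relationships()
rels.add("App::FatPacker", "PAR", ["obsoletes", "pure perl only"])
rels.add("DBIx::Class", "DBIx::Lite", ["if you don't need objects"])

print(rels.see_also("PAR"))         # [('App::FatPacker', ('obsoleted by',))]
print(rels.see_also("DBIx::Lite"))  # [('DBIx::Class', ())]
```

This sketch deliberately drops conditional tags from the mirrored link. A condition such as "if you don't need objects" explains why you would move from one module to the other, and it does not read correctly in the reverse direction.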

@monken
Owner

You rate distributions, you file bugs against distributions, cpan testers is organized by distributions. I think that term and fiction is well established in the Perl community and ecosystem. I understand that PAUSE follows a different approach, but I don't think it's practical to think in terms of modules for many use cases.

@shadowcat-mst

@monken CPAN works the way dagolden describes, not the way you describe. rt.cpan.org creates a queue based on the name of the first tarball to contain a new module, then uses that module's permissions to determine the maintainer, and the result is that bugs have to be re-opened when modules are split out. Not actually a feature, just a historical thing.

A see also attached to Sub::Quote pointing to Eval::Closure should not stay with Moo if I split the module out. A see also on Path::Router pointing to Web::Dispatch should not stay pointing at Web::Simple if I split the module out.

Making links to mojolicious would be completely futile if it was dist level only, too.

So for this use case it evidently isn't practical to think in terms of distributions alone. So the remaining question is whether we initially support only modules, or whether we need distributions as well. Can you provide a concrete example of a case where distributions work and modules don't?

@dagolden

I'm actually curious what happens on RT if there's an identically named distribution. If I were more evil, I would upload Moose-3.000.tar.gz containing a legal, unindexed module (NotReallyMoose.pm) and see what blows up as a result.

Since Metabase started, internally, all reports are full AUTHOR/DIST-VERSION.SUFFIX. It's only the display stuff that hasn't been updated.

Regardless of that, I think @shadowcat-mst makes the stronger case -- as modules move between distribution, recommendations/see-also should follow them.

I can't make you rewind the clock and have metacpan.org stop using "distribution" the way you are. But I do encourage you not to hang any more stuff off a non-unique key.

@dagolden

[naming]

How about Related Modules? Not the "See Also" we're used to seeing; establishes that there is a relationship; but is generic to support future distinctions.

@yanick yanick referenced this issue from a commit in yanick/metacpan-web
@yanick yanick First stab at reports for CPAN-API/cpan-api#253
I'll let people fight over nomenclature. Meanwhile here's some first
stab at the report. :-)

I'm still not sure that per-module makes more
sense than per-distribution, but just for the sake of having something
running, I've assumed that the data is per module. If that changes,
the logic will be easy enough to move around.

I'm not terribly happy about the name of the keys, they'll probably
have to be changed.

I've also sorted the recommendations per total activity, and report both
the recommendations in favor of alternatives and against in the same
list, which seems to me the most succinct way to present the information.
eb47fcb
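The ordering described in that commit message could look roughly like the Python sketch below. It is not the code from the commit: the record shape `(user, module, better_alternative)` and the function `report` are assumptions. Each row lists another module with the count of votes in favor of it and against it, and rows are sorted by the total.

```python
from collections import Counter

# Hypothetical records: `user` recommends `better` over `module`.
votes = [
    ("a", "LWP::UserAgent", "HTTP::Tiny"),
    ("b", "LWP::UserAgent", "HTTP::Tiny"),
    ("c", "HTTP::Tiny", "LWP::UserAgent"),
    ("d", "LWP::UserAgent", "Mojo::UserAgent"),
    ("e", "HTTP::Lite", "LWP::UserAgent"),
]


def report(module, records):
    """Rows of (other, in_favor_of_other, against_other), busiest first."""
    in_favor = Counter()  # `other` recommended over `module`
    against = Counter()   # `module` recommended over `other`
    for _user, mod, better in records:
        if mod == module:
            in_favor[better] += 1
        elif better == module:
            against[mod] += 1
    rows = [(o, in_favor[o], against[o]) for o in set(in_favor) | set(against)]
    # Sort by total activity; ties are broken by name for stable output.
    rows.sort(key=lambda row: (-(row[1] + row[2]), row[0]))
    return rows


for row in report("LWP::UserAgent", votes):
    print(row)
```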
@monken monken referenced this issue in CPAN-API/metacpan-web
Closed

Make the module / dist link clearer in search results #798

@oalders
Owner

[naming]

I like "Related Modules", but I also do think "See Also" is the most succinct, even if I have some reservations about it.

[modules versus distributions]

I feel like at this point we've settled on modules and can move on from this. We can always add a dist recommendation system if needed, but I can't see a use case for that just yet. Correct me if I'm wrong.

@tobyink

As per http://blogs.perl.org/users/neilb/2013/03/whats-wrong-with-cpan.html#comment-405091 it would be nice if an author's own recommendations could be handled specially.

OK, so authors already get to put whatever recommendations they like in the pod, but the recommendation system would be machine-queryable.

@neilbowers

I think there are some overlapping concepts that are possibly getting mixed up here, including at least:

  • identifying all modules in a group. Eg all modules related to defining constants. It's a set, not an ordered list, but obviously might be presented in some order based on metadata, including the next point. I think of SEE ALSO as this list, though as someone above pointed out, some SEE ALSOs suggest specific (other) modules to use in certain situations.
  • 'suggested alternate' module(s). Eg "if you're using Readonly I think you should look at Const::Fast (or ...) instead".

I think the first type could be solved by tagging: Const::Fast might be tagged with "constants", and "immutable variables". These could be displayed next to every module (that has them), and clicking on them would list all modules in that group (ie tagged with that tag). So if someone knows they're thinking about immutable variables, they'll click on that, and get a shorter list.

I think the two concepts can be tied together by making the alternate modules model be "I think module::A is better/worse/equivalent than/to module::B [for tag]".
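Here is a small sketch of the two concepts tied together as described. Tags define unordered groups, and comparisons of the form "A is better/worse/equivalent than/to B [for tag]" are stored separately. All names here (`Verdict`, `add_tags`, `compare`) are hypothetical, and the data only echoes the Readonly/Const::Fast example above.

```python
from collections import defaultdict
from enum import Enum


class Verdict(Enum):
    BETTER = "better than"
    WORSE = "worse than"
    EQUIVALENT = "equivalent to"


groups = defaultdict(set)  # tag -> unordered set of modules
comparisons = []           # (module_a, verdict, module_b, tag or None)


def add_tags(module, *names):
    for name in names:
        groups[name].add(module)


def compare(a, verdict, b, for_tag=None):
    comparisons.append((a, verdict, b, for_tag))


add_tags("Const::Fast", "constants", "immutable variables")
add_tags("Readonly", "constants", "immutable variables")
add_tags("constant", "constants")
compare("Const::Fast", Verdict.BETTER, "Readonly", for_tag="immutable variables")

print(sorted(groups["immutable variables"]))  # ['Const::Fast', 'Readonly']
for a, verdict, b, t in comparisons:
    suffix = f" [for {t}]" if t else ""
    print(f"{a} is {verdict.value} {b}{suffix}")
```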

@yanick
Collaborator

Just a quick word to say that I'm chugging along with the UI part at https://github.com/yanick/metacpan-web/tree/recommendations. I have stubbed in a MetaCPAN::Web::Controller::Account::Recommend, and I should be in a position to hook into the ElasticSearch backend as soon as I have an hour or two more to sink into this project. So... if any of you MetaCPANers feel like carving me a REST URI for that, that might come in handy rrrrreal soon. :-)

@yanick
Collaborator

Because I'm obviously bonkers, I began to look at the cpan-api side of things. Result: https://github.com/yanick/cpan-api/tree/recommendations. I have absolutely no idea what I'm doing... but I have tests that are passing for the creation and removal of recommendations.

Anyway, that part is far from being done, but I just wanted to give everybody a fair warning that the fox is in the henhouse. It's not too late to grab a shovel and come give it a good whack before it does too much damage. :-)

@ranguard
Owner

@yanick I don't know the metacpan code enough to do a review - just wanted to say keep it up even if you are damaging the henhouse - I'm sure someone will help patch it up after :)

@yanick
Collaborator

@ranguard That's the plan. :-) If nothing else, it gives me an excuse to learn ElasticSearch, which I wanted to do for some time now.

I'll push my latest code in a few moments. It seems that I can push changes to the db just fine (yay!). Now remains the thornier question of how ES does its searching.

@yanick
Collaborator

I... I think I have a working prototype. https://github.com/yanick/metacpan-web/tree/recommendations and https://github.com/yanick/cpan-api/tree/recommendations

In the database, I have Recommendation documents that have a user / module / alternative triplet, which can be pushed via

/recommendation/[user]/[module]/alternative/[better module]

In metacpan-web, the lesser and better alternatives to the current module are gathered in, respectively, 'instead_of' and 'supplanted_by'.

And that's pretty much it. Oh, and I put in the restriction that a user can only give one alternative for each module (to keep things simple).
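Here is a Python approximation of the data flow described above. The route pattern mirrors the URI given in the comment. The parsing, the in-memory store, the `RecommendationStore` name and the `Old::Constants` module are assumptions made for illustration. The actual prototype lives in the linked branches and uses ElasticSearch.

```python
import re
from urllib.parse import unquote

# Assumed route shape: /recommendation/[user]/[module]/alternative/[better module]
ROUTE = re.compile(r"^/recommendation/([^/]+)/([^/]+)/alternative/([^/]+)$")


class RecommendationStore:
    """In-memory stand-in for the user / module / alternative documents."""

    def __init__(self):
        # Keyed by (user, module): one alternative per user per module, so
        # pushing a new one replaces the previous choice.
        self._docs = {}

    def push(self, path):
        m = ROUTE.match(path)
        if m is None:
            raise ValueError(f"not a recommendation path: {path}")
        user, module, better = (unquote(part) for part in m.groups())
        self._docs[(user, module)] = better

    def remove(self, user, module):
        self._docs.pop((user, module), None)

    def for_module(self, module):
        """The lesser (instead_of) and better (supplanted_by) alternatives."""
        instead_of = {mod for (_, mod), better in self._docs.items()
                      if better == module}
        supplanted_by = {better for (_, mod), better in self._docs.items()
                         if mod == module}
        return {"instead_of": sorted(instead_of),
                "supplanted_by": sorted(supplanted_by)}


store = RecommendationStore()
store.push("/recommendation/alice/Readonly/alternative/Const::Fast")
store.push("/recommendation/bob/Readonly/alternative/Const::Fast")
store.push("/recommendation/carol/Old::Constants/alternative/Readonly")
print(store.for_module("Readonly"))
# {'instead_of': ['Old::Constants'], 'supplanted_by': ['Const::Fast']}
```

Keying the store on `(user, module)` is one simple way to enforce the "one alternative per module" restriction: a second push overwrites rather than accumulates.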

@yanick
Collaborator

I think that I went as far as I could go. For the next step, I'd need somebody from the MetaCPAN team to look at what I did, and provide feedback for the, uh, well-meant atrocities I did to the model. Not to mention that I also need feedback on the UI: placement / nomenclature / etc.

@oalders
Owner

Sounds good. We'll get you some feedback. :)

@thaljef

[distributions versus modules]

Working on Pinto, I've had to think about modules and distributions a lot, so I'll toss in my 2 cents. My own conclusion thus far is that modules are the only real thing. That's what PAUSE indexes, that's what you use in code, and that's what you put in the prerequisites. There is no such thing as a distribution. At most, there are only archives (tarballs), which are just bags of modules at specific versions. So from that viewpoint, any recommendation system probably ought to work at the module level.

@thaljef

Something else just occurred to me (and I think you all might have had a similar thought):

One way to generalize this might be to create various types of "associations" between modules. An example of an association might be "similar to" or "extended by" or "plugin for" or "superseded with". Some of those associations could be bi-directional, and some might only be uni-directional. Some pairs of associations could even be reciprocal. For example, a "slower than" association in one direction creates a "faster than" association in the other direction.

Any MetaCPAN user could create a link between any two modules, using one of the predefined association types. Then users could up (or down) vote on the associations. For each module, MetaCPAN tracks the vote counts and displays the most favored module for each type of association.

In other words, there are several axes of discovery within MetaCPAN. So you define some of those axes (i.e. associations), let users nominate the endpoints (i.e. modules) and let them assign weight to each candidate (i.e. voting).

I have no idea how the UI would work out for this, but you get the idea. Thanks for listening.
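Here is a rough sketch of the association model, assuming a predefined table of reciprocal types. The table entries, the `vote` and `most_favored` helpers, and the module pairings are all illustrative. Voting on a link also votes on its reciprocal, and each module shows the most favored target per association type.

```python
from collections import Counter

# Hypothetical predefined association types and their reciprocals.
# Symmetric types map to themselves; types absent here get no mirror link.
RECIPROCAL = {
    "faster than": "slower than",
    "slower than": "faster than",
    "similar to": "similar to",
    "extended by": "plugin for",
    "plugin for": "extended by",
}

link_votes = Counter()  # (module, association, other) -> net votes


def vote(module, association, other, delta=1):
    """Up-vote (delta=1) or down-vote (delta=-1) a link and its reciprocal."""
    link_votes[(module, association, other)] += delta
    inverse = RECIPROCAL.get(association)
    if inverse is not None:
        link_votes[(other, inverse, module)] += delta


def most_favored(module, association):
    """The highest-voted target for one association type, or None."""
    candidates = [(n, other) for (m, a, other), n in link_votes.items()
                  if m == module and a == association and n > 0]
    if not candidates:
        return None
    # max() keeps the first maximal candidate, so ties go to the earliest link.
    return max(candidates, key=lambda c: c[0])[1]


vote("Dancer", "similar to", "Mojolicious")
vote("Dancer", "similar to", "Mojolicious")
vote("Dancer", "similar to", "Catalyst")
vote("JSON::XS", "faster than", "JSON::PP")

print(most_favored("Dancer", "similar to"))     # Mojolicious
print(most_favored("JSON::PP", "slower than"))  # JSON::XS
```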

@monken
Owner

@yanick great work! Could you please make this a branch in the CPAN-API org? I think that's some solid ground work but needs some fine tuning :)

@oalders
Owner

@yanick You now have full access to do terrible things to MetaCPAN. :)

@thaljef

I guess this model has one big decision point: are the association types pre-defined, or are they free-form and created by the users?

Maybe a bit of both.

The total number of useful association types is not that large. Maybe 10 or 20. More than that will probably be overwhelming to the user and difficult to present visually. So I don't think you want to leave it completely wide open. And personally, I would want to avoid people creating associations like "cooler" or "buggier" -- that kind of stuff is better left in full-text reviews where someone has to really own their words.

If you have bi-directional and reciprocal associations, then the system needs to know about those in advance so it can make the connection going the "other way" as well. For example, if you have predefined "faster than" and "slower than" associations, you probably don't want someone to invent a "quicker" association.

But you could still leave the door open for people to suggest new associations. Perhaps they would be excluded from the official score. But if you see patterns in the suggestions, then you'll have some clues about which associations should be part of your predefined set.
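The "predefined plus suggestions" split could be sketched as follows. Submissions naming a predefined association count toward scores. Anything else is tallied separately, so recurring suggestions can be spotted. The `PREDEFINED` set and the `submit` helper are hypothetical.

```python
from collections import Counter

# Hypothetical predefined set; the system knows how these relate.
PREDEFINED = {"similar to", "faster than", "slower than",
              "extended by", "plugin for"}

official = []            # (module, association, other): counts toward scores
suggestions = Counter()  # free-form association names, tallied for review


def submit(module, association, other):
    """Record a link; return True if it counts toward the official score."""
    name = association.strip().lower()
    if name in PREDEFINED:
        official.append((module, name, other))
        return True
    # Kept out of the score, but recurring names hint at missing types.
    suggestions[name] += 1
    return False


submit("Dancer", "Similar to", "Mojolicious")
submit("JSON::XS", "quicker", "JSON::PP")
submit("Cpanel::JSON::XS", "quicker", "JSON::PP")

print(official)                    # [('Dancer', 'similar to', 'Mojolicious')]
print(suggestions.most_common(1))  # [('quicker', 2)]
```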

One caveat: this whole idea requires the voters to have experience with two modules. And perhaps that is the whole point -- to make relative comparisons. But the number of folks who have used both Mojolicious and Dancer (for example) is certainly less than the number of people who have used only one of them. So a simple tagging system might actually get more user input, albeit less specific and noisier.

@rwstauner
Owner

welcome aboard, @yanick! thanks for helping out! :-)

@yanick
Collaborator

*cough* *cough*

Poke?

@oalders
Owner

Poke appreciated. I will review this over the weekend. :+1:

@oalders oalders added the QA Hackathon label
@timbunce
Collaborator

Poke! QA Hackathon?

@oalders
Owner

It's officially on the list. I've got a few things to work through today. I imagine I'll look at this tomorrow. :)

@yanick yanick referenced this issue from a commit in CPAN-API/metacpan-web
@yanick yanick First stab at reports for CPAN-API/cpan-api#253
2bd76c5
@oalders
Owner

I've just rebased both of the branches -- I forked metacpan-web because the rebase was a bit hairier. This is a wip.

@oalders oalders removed the QA Hackathon label
@ranguard ranguard closed this
@timbunce
Collaborator

Closed without comment. Odd. What's the status?

@ranguard
Owner

This needs a champion to actually work on it - was just going to point the next person that mentioned it here :)

@timbunce
Collaborator

Wouldn't it be better to label the issue 'Volunteer needed' rather than closing it?

@ranguard
Owner

After further discussion... yes... (though I've never seen someone actually take on 'Volunteer needed', so we have renamed - 'Champion required'...)

@ranguard ranguard reopened this