Filter by author / dist name #1

Closed
bigpresh opened this Issue Oct 11, 2011 · 14 comments

I think it'd be useful to be able to provide patterns that are matched only against the author / dist name, to filter results down. For instance, you could look for your own name, but not in your own modules, as a random example usage.

If I get sufficient tuits I'll fork the repo, try to get a dev version of this up and running, and submit a pull request, but just thought I'd raise it as a "wishlist" issue item to share the idea at least.

Owner
dgl commented Oct 11, 2011

Yes, definitely, there's a TODO file in the repo with that sort of thing on it ("complex search") :).

Currently it takes the rather simple brute-force regex approach, so filtering (depending on how it's done) could be a large change. One idea I had is to actually use the MetaCPAN API for some of it, to avoid reimplementing "searching CPAN", but I don't know how workable that would be in reality.

I'm afraid it's a little hard to set up at the moment; I haven't had a chance to clean it up, but let me know if you have trouble and I'll see what I can do.

Contributor

Yeah, I really need this feature as well. I'd like to be able to search against just one dist for something simple, but I can't seem to get any sort of RE to work to that effect. For example:

DBIx::Class::Schema::Loader.+ucfirst
DBIx::Class::Schema::Loader[.\n]+ucfirst
DBIx::Class::Schema::Loader(.|\n)+ucfirst
DBIx::Class::Schema::Loader[\0-\xff]+ucfirst

These don't work. First one doesn't have single line support a la //s. Second one doesn't interpret the . right. The last two would work, but I get a RE too greedy error. So, I have no way of really doing this currently.
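To illustrate why the first two patterns fail, here is a minimal sketch using Python's `re` module as a stand-in. This is an assumption for illustration only: cpangrep's own regex engine may differ in details, and the "RE too greedy" error is specific to the site and is not reproduced here. The sample text is made up.

```python
import re

# Hypothetical file contents: the dist name and the target word sit on different lines.
text = "DBIx::Class::Schema::Loader\nsub name { ucfirst $_[0] }\n"

patterns = {
    # "." does not match a newline unless single-line (//s) mode is on.
    "dot": r"DBIx::Class::Schema::Loader.+ucfirst",
    # Inside [...] a "." is a literal dot, so this only spans dots and newlines.
    "class": r"DBIx::Class::Schema::Loader[.\n]+ucfirst",
    # Alternation with \n can cross line boundaries.
    "alternation": r"DBIx::Class::Schema::Loader(.|\n)+ucfirst",
    # Inline (?s) is the regex-level equivalent of the //s modifier.
    "dotall": r"(?s)DBIx::Class::Schema::Loader.+ucfirst",
}

results = {name: re.search(p, text) is not None for name, p in patterns.items()}
print(results)
```

In this stand-in engine only the alternation and `(?s)` forms find a match; whether the live site accepts them is a separate question of its size and greediness limits.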

I think in terms of implementation, it can simply be done post-match, which would throw out the results if it doesn't match the author/dist. This could actually be done on the same singular field as well, similar to Google's extra variables.
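The post-match idea above can be sketched roughly as follows. Everything here is hypothetical: the `Hit` record, the `post_filter` helper, the author IDs, and the file paths are invented for illustration and are not taken from the cpangrep code.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    """One grep result; the field names are assumptions for this sketch."""
    author: str
    dist: str
    file: str
    line: str

def post_filter(hits, author=None, dist=None):
    """Keep only hits whose author/dist match the given regexes (None means no filter)."""
    kept = []
    for hit in hits:
        if author is not None and not re.search(author, hit.author):
            continue
        if dist is not None and not re.search(dist, hit.dist):
            continue
        kept.append(hit)
    return kept

# Made-up results from a plain "ucfirst" search across everything.
hits = [
    Hit("AUTHORA", "DBIx-Class-Schema-Loader", "lib/Example/Base.pm", "ucfirst $name"),
    Hit("AUTHORB", "Moose", "lib/Example/Util.pm", "ucfirst $type"),
]

only_loader = post_filter(hits, dist=r"^DBIx-Class-Schema-Loader$")
print([h.dist for h in only_loader])
```

The trade-off of filtering after the match is that the full-text search still scans every distribution, so it saves nothing on the expensive step; it only trims what is shown.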

fixed:     (fixed strings; the new default search method, so keyword is optional)
re:        (old default now needs this keyword and possibly delimiters to capture spaces)
OR AND | & (keyword only parsed if exact case and lone word)
author:    (author filter; also supports author:re: or author:fixed:)
dist:      (distribution filter; also supports dist:re: or dist:fixed:)
file:      (filename filter; not including dist directory; supports *? globbing, file:re:,
    and file:fixed: for NO globbing)
ln:        (line number range; supports stuff like 2,5,50-200)

attr:[{section}/]attr.in.dot.format:  (MetaCPAN property filter; section = release (default
    if not there), file; also supports attr:...:re:)

- (exclude)
[delimiters] (support for delimiters like re:/blah blah/ or re:{blah blah} or re:'blah blah',
   delimiters would also support RE ops on tail end: m, s, i)

These are all the same:
    dist:Moose
    dist:{Moose}
    dist:"Moose"
    dist:re:^Moose$
    dist:re:{^Moose$}ms
    attr:distribution:Moose
    attr:release/distribution:re:{^Moose$}ms

A grep search is required, so at least one fixed: (implicit or explicit keyword) or re: is
    required.
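As a rough illustration of how the `dist:` forms listed as "all the same" could reduce to one regex, here is a small sketch. It covers only the proposed syntax from this comment (not what was eventually implemented), skips the `attr:` variants, and assumes that a fixed `dist:` string must equal the whole dist name. `compile_dist_term` and `strip_delims` are hypothetical helper names.

```python
import re

# Opening delimiter -> closing delimiter, per the proposal's examples.
DELIMS = {"{": "}", '"': '"', "'": "'", "/": "/"}
FLAGS = {"m": re.MULTILINE, "s": re.DOTALL, "i": re.IGNORECASE}

def strip_delims(value):
    """Split '{body}flags' (or a bare value) into (body, flag_letters)."""
    if value and value[0] in DELIMS:
        close = value.rfind(DELIMS[value[0]])
        if close > 0:
            return value[1:close], value[close + 1:]
    return value, ""

def compile_dist_term(term):
    """Turn one proposed 'dist:' term into a compiled regex."""
    if not term.startswith("dist:"):
        raise ValueError("not a dist: term")
    value = term[len("dist:"):]
    is_re = value.startswith("re:")
    if is_re:
        value = value[len("re:"):]
    elif value.startswith("fixed:"):
        value = value[len("fixed:"):]
    body, flag_letters = strip_delims(value)
    flags = 0
    for letter in flag_letters:
        flags |= FLAGS[letter]  # an unknown flag letter raises KeyError
    # Assumption: a fixed string has to match the entire dist name.
    pattern = body if is_re else "^" + re.escape(body) + "$"
    return re.compile(pattern, flags)

terms = ["dist:Moose", "dist:{Moose}", 'dist:"Moose"',
         "dist:re:^Moose$", "dist:re:{^Moose$}ms"]
compiled = [compile_dist_term(t) for t in terms]
print(all(rx.search("Moose") and not rx.search("MooseX-Types") for rx in compiled))
```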

Yes, this is far more than is on this request, but since we are venturing into the realm of expanding beyond the simple "this whole input line is a regular expression", I felt it important to document all possible features we would use for this new parser. Yes, this would destroy some backwards compatibility, but the result would be far more powerful than what we have right now. And by documenting what we could have in there, we would no longer need to break BC again when we want to expand.

Implementing most of this would be somewhat easy, however. The things you could save for later would be: more than one fixed/re filter, ln: filter, negative filters, custom MetaCPAN attribute filters.

Owner
dgl commented Apr 30, 2012

So actually most of this is implemented already, just there's a leak in Redis connections which I've not yet tracked down so it's not live.

See the dev copy for an example:
http://cpangrep3.default.dgl.uk0.bigv.io/?q=file:\.(?i:xs|c(?:c?|pp))$+-dist:^perl$+SVt_REGEXP

I don't really see the point in making a distinction between RE and non-RE: as it accepts regexps anyway, it's less confusing if everything is one.
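To read the dev URL: `+` is a URL-encoded space, so the query has three space-separated terms. The sketch below splits them in that spirit. It is only an illustration, not the real `_parse_search`; the `parse_query` helper is hypothetical, it knows only the `file:` and `dist:` prefixes visible in the URL, and it assumes a leading `-` means "exclude".

```python
import re
from urllib.parse import unquote_plus

def parse_query(raw_q):
    """Split a URL-encoded q= value into (field, regex, negated) filters and plain patterns."""
    filters, patterns = [], []
    for term in unquote_plus(raw_q).split():
        m = re.match(r"(-?)(file|dist):(.+)", term)
        if m:
            negate, field, pattern = m.groups()
            filters.append((field, pattern, negate == "-"))
        else:
            patterns.append(term)
    return filters, patterns

q = r"file:\.(?i:xs|c(?:c?|pp))$+-dist:^perl$+SVt_REGEXP"
filters, patterns = parse_query(q)
print(filters)
print(patterns)

# The file filter keeps .xs, .c, .cc and .cpp files, case-insensitively.
file_rx = re.compile(filters[0][1])
names = ["regcomp.c", "Foo.XS", "lib/Foo.pm", "x.cpp", "y.cc"]
matching = [n for n in names if file_rx.search(n)]
print(matching)
```

So the query reads as: search for `SVt_REGEXP` in C/XS source files, excluding the `perl` distribution itself. Splitting on whitespace means a pattern cannot contain a literal space in this sketch.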

dgl closed this Apr 30, 2012

Contributor

Sorry, I guess I was trying to make it more like Google, but still keep the power of the RE via the re: flag. It would make some of the queries less ugly, though it does make it more confusing for existing CPAN.GREP users. So, I'm not real hung up on the whole fixed/re thing.

What's left from the list from above? I can put in the rest as a new issue.

Owner
dgl commented May 1, 2012
  • attr: stuff, I can see searching on e.g. the deps of a release could be cool, but I don't know exactly how that would integrate into metacpan. (I also want at some point to get metacpan using the API, so I don't want to introduce a circular dependency).
  • ln:, although I don't really see the value of searching on that
  • OR/AND, although currently AND is implied and I can't think of any useful searches that can't just be written using | in RE.

Contributor

  • attr: - That's fine, as this was more of a catch-all for any other types of searches. Release deps could be its own flag, if we want to go that route. This might be the only new issue I put in here, as attr: looks to be pretty useful, but we'll need to further discuss the ramifications of implementation.
  • ln: - Ditto. This was more of a "science without application" blue-sky idea. Only thing I could think of would be if somebody was trying to find code that was only within the basic headers of the module (Exporter stuff, package definition, global vars, etc.).
  • OR/AND - This had more to do with usage with the fixed: flag. However, OR could be used with other flags. For example, /\bdgl\b/ OR author:DGL OR file:{^DGL/}i.

BTW, just to confirm, have delimiters been implemented? With m/s/i support?

Also, shouldn't this issue be open until it's been committed into master?

Contributor

This seems to fail, I'm assuming because of the dashes: dist:"dh-make-perl" LICENSE

Even fails on escaping: http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3Adh\-make\-perl+LICENSE

Owner
dgl commented May 3, 2012

On May 2, 2012 11:12 PM, "Brendan Byrd" <reply@reply.github.com> wrote:

> This seems to fail, I'm assuming because of the dashes:
> dist:"dh-make-perl" LICENSE

Are you sure dh-make-perl is indexed on CPAN? MetaCPAN seems to call it DhMakePerl.

> Even fails on escaping:
> http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3Adh\-make\-perl+LICENSE



Contributor

Yeah, you're right. But what about this?

http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3ADhMakePerl+{LICENSE}i

I guess that answers my question about delimiters and m/s/i support, unless you have it in a different format.

Owner
dgl commented May 3, 2012

On 3 May 2012 13:04, Brendan Byrd <reply@reply.github.com> wrote:

> Yeah, you're right. But what about this?
>
> http://cpangrep3.default.dgl.uk0.bigv.io/?q=dist%3ADhMakePerl+{LICENSE}i
>
> I guess that answers my question about delimiters and m/s/i support, unless you have it in a different format.

The normal regexp syntax of (?i) or (?i:...) works. I don't see a reason for multiple syntaxes, but yes, some docs would be good.
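For readers who have also forgotten the inline-flag syntax, here is a short sketch of the two forms, again using Python's `re` as a stand-in for the site's engine (an assumption; the scoped form needs Python 3.6 or later). The sample strings are made up.

```python
import re

# (?i) at the start makes the whole pattern case-insensitive.
whole = re.compile(r"(?i)license")

# (?i:...) limits case-insensitivity to the group; " file" stays case-sensitive.
scoped = re.compile(r"(?i:license) file")

print(bool(whole.search("LICENSE")))
print(bool(scoped.search("License file")))
print(bool(scoped.search("LICENSE FILE")))
```

The first two searches match and the third does not, because only the `license` part is covered by the scoped flag. On that reading, the earlier `{LICENSE}i` query would be written as `(?i)LICENSE`.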

Contributor

Oh yeah, forgot about that silly syntax. Yeah, just needs some docs to remind us.

I think that covers everything. So, where is the new code at within this repo? You said that you still had a leak to hunt down?

Owner
dgl commented May 6, 2012

The majority of the interesting code for this is: https://github.com/dgl/cpangrep/blob/master/lib/WWW/CPANGrep/Search.pm particularly _parse_search and filter_results.

As for the leak I haven't yet tracked it down, I suspect a circular ref., maybe between some of the coderefs in that file, maybe elsewhere.

Contributor

Oh okay, so it's in master here, but this code isn't quite put on the production web site?

Owner
dgl commented May 8, 2012

Exactly.
