Find CPAN Module Counts on search.cpan.org #1

chromatic commented Dec 20, 2010

The Perl module stats on cpan.org represent a fraction of the modules on the CPAN. search.cpan.org has the full count.

This patch adds a suggested migration to change the data source for Perl modules.

miyagawa commented Dec 20, 2010

RubyGems and PyPI count distributions, not namespaces. So CPAN should also be matched against Distributions (on search.cpan.org), not Modules.

perigrin commented Dec 20, 2010

I'm curious to know where search.cpan.org itself gets its stats. The numbers on http://stats.cpantesters.org/statscpan.html are about 4K higher. The larger number (24K) is roughly what I would expect from a quick scan of a (mini)cpan mirror:

$ find ~/Dropbox/minicpan/ -type f -name '*.tar.gz' | wc -l
24361

Isn't it wonderful what a horrible thing trying to make some comparative metrics between languages is?

wchristian commented Dec 21, 2010

perigrin, searching that way won't find all dists, since some files use zip or tar.bz2, or possibly even other formats. As far as I can tell, search.cpan counts the unique release files mentioned in 02_packages (which, since it serves to count namespaces, has duplicates). So the correct way, which the author is already aware of at this point, is to extract the releases mentioned in 02_packages, strip the version information, and collapse the remainder.
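
For illustration, here is a minimal sketch of that extract, strip, and collapse approach. It assumes "02_packages" means the 02packages.details.txt index, with a header block ended by a blank line and then one `package version path` row per namespace. The function names, the list of archive suffixes, and the version pattern are all hypothetical choices for this example, and the sample rows are made up. This is not necessarily how search.cpan.org computes its own number.

```python
import re

# Archive suffixes assumed for this sketch; the real index may contain others.
ARCHIVE_SUFFIX = re.compile(r"\.(tar\.gz|tar\.bz2|tgz|zip)$", re.IGNORECASE)
# Assumed version pattern: a trailing "-<version>", optionally prefixed with "v".
VERSION_SUFFIX = re.compile(r"-v?\d[\w.]*$")


def dist_name(path):
    """Reduce 'A/AU/AUTHOR/Foo-Bar-1.0.tar.gz' to 'Foo-Bar'."""
    filename = path.rsplit("/", 1)[-1]          # drop the author directories
    stem = ARCHIVE_SUFFIX.sub("", filename)     # drop the archive extension
    return VERSION_SUFFIX.sub("", stem)         # drop the version


def count_distributions(lines):
    """Count unique distribution names in a 02packages-style listing."""
    dists = set()
    in_body = False
    for line in lines:
        if not in_body:
            # The header ends at the first blank line.
            in_body = line.strip() == ""
            continue
        fields = line.split()
        if len(fields) >= 3:
            dists.add(dist_name(fields[2]))     # third column is the release path
    return len(dists)


# Hypothetical sample rows: two namespaces share one release file.
SAMPLE = """File: 02packages.details.txt
Line-Count: 4

Foo::Bar          1.0    A/AU/AUTHOR/Foo-Bar-1.0.tar.gz
Foo::Bar::Util    1.0    A/AU/AUTHOR/Foo-Bar-1.0.tar.gz
Baz::Qux          undef  A/AU/AUTHOR/Baz-Qux-v0.2.1.zip
Quux              0.03   O/OT/OTHER/Quux-0.03.tar.bz2
""".splitlines()

print(count_distributions(SAMPLE))
```

Collapsing into a set is what removes the per-namespace duplicates, and matching on the path column rather than on a file extension means zip and tar.bz2 releases are counted too.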

edebill closed this Jan 31, 2013