Generate list of Perl versions from CPAN mirror and parsed reports #7

Open
preaction opened this Issue Aug 10, 2017 · 2 comments

preaction commented Aug 10, 2017

There is a perl_version table which caches the known list of Perl versions for easy reference. This table was maintained by the main report processing task (CPAN::Testers::Data::Generator). We need a way to build this table from scratch using data from the local CPAN mirror.

We should have one module, CPAN::Testers::Backend::ProcessPerlVersion. This module should be a runnable module (Beam::Runnable) that, when run, does the following:

  • Clear the current cache completely (optional, with --clear command-line option)
  • Insert cache rows by reading the local CPAN mirror directory (directory must be configurable as an attribute)

This will require building a DBIx::Class result class to read and write the perl_version table (CPAN::Testers::Schema::Result::PerlVersion, in cpan-testers/cpantesters-schema). It may be easier to put the method that reads the CPAN directory in a CPAN::Testers::Schema::ResultSet::PerlVersion class, since it is better design to push as much data and business logic as possible into the model layer.
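The two steps above (optional clear, then insert from the mirror) could look roughly like the following. This is only an illustrative sketch in Python with SQLite, not the Beam::Runnable / DBIx::Class implementation the issue asks for. The tarball name pattern, the single `version` column, and the function names `find_perl_versions` and `populate` are all assumptions made for the example.

```python
import re
import sqlite3
from pathlib import Path

# Assumed naming scheme for Perl release tarballs, e.g. perl-5.26.1.tar.gz
PERL_TARBALL = re.compile(r"^perl-(5\.\d+\.\d+(?:-RC\d+)?)\.tar\.(?:gz|bz2|xz)$")


def find_perl_versions(mirror_root):
    """Return the sorted, de-duplicated Perl versions found under mirror_root."""
    versions = set()
    for path in Path(mirror_root).rglob("perl*.tar.*"):
        match = PERL_TARBALL.match(path.name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)


def populate(db, mirror_root, clear=False):
    """Fill the perl_version cache; `clear` plays the role of --clear."""
    if clear:
        db.execute("DELETE FROM perl_version")
    # OR IGNORE keeps the insert idempotent when `version` is a unique key
    db.executemany(
        "INSERT OR IGNORE INTO perl_version (version) VALUES (?)",
        [(v,) for v in find_perl_versions(mirror_root)],
    )
    db.commit()
```

Keeping the directory scan separate from the database write mirrors the suggestion to put the scanning logic in the model layer, where it can be tested without a database.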

preaction added a commit that referenced this issue Jun 9, 2018

add script to populate perl versions from cpan
This is necessary but not sufficient: We need to still be ensuring that
the Perl version from the incoming reports exists as well...

Refs #7

preaction commented Jun 9, 2018

I've added the script that will populate Perl versions from the local CPAN mirror, but this is insufficient. Two things remain on this ticket:

  • A cron job needs to be configured to run the scan from the CPAN mirror. This should be added to the Rexfile, unless cpan-testers/cpantesters-deploy#31 is done, in which case the cron job should be configured in cpantesters-deploy.
  • The ProcessReports module needs to ensure that the Perl version in the report is also in the database.

The second task is necessary mostly because of patched Perls and other Perls that are not released on CPAN. For every report it processes, ProcessReports should call the ensure_exists method of the PerlVersion resultset (see cpan-testers/cpantesters-schema#23).
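As a rough illustration of that second task, the sketch below calls an insert-if-missing helper once per report. It is written in Python with SQLite purely for illustration. The real ensure_exists lives in cpantesters-schema and its signature is not shown in this thread, so the helper here, the `process_reports` function, and the `perl` key on each report are assumptions.

```python
import sqlite3


def ensure_exists(db, version):
    """Insert `version` into perl_version unless a row for it already exists."""
    db.execute(
        "INSERT OR IGNORE INTO perl_version (version) VALUES (?)",
        (version,),
    )


def process_reports(db, reports):
    """Hypothetical stand-in for ProcessReports: one ensure_exists call per report."""
    for report in reports:
        ensure_exists(db, report["perl"])
        # ... the rest of the report processing would happen here ...
    db.commit()
```

Because the helper does nothing for versions already present, calling it for every report is safe even when most reports use a Perl that the CPAN scan has already recorded.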

@preaction preaction self-assigned this Jun 9, 2018

preaction commented Aug 24, 2018

We also need to fix the existing data: there are cpanstats rows without a corresponding row in the perl_version table. We should add a FixPerlVersions command that will look for these rows and add the missing versions. The query to find just the missing entries is:

select distinct cpanstats.perl
from cpanstats
left join perl_version on perl_version.version = cpanstats.perl
where cpanstats.perl != '0' and perl_version.version is null;

The "0" Perl version seems to be from some reporter back in the day. None of these records are more recent than 2010.
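A FixPerlVersions command built around that query might look like the sketch below. It uses Python with SQLite only for illustration. The function name `fix_perl_versions` and the single-column perl_version layout are assumptions, and the real command would presumably be a Perl module using the schema classes.

```python
import sqlite3

# The query from this comment, with the "0" placeholder version excluded
MISSING_VERSIONS = """
    SELECT DISTINCT cpanstats.perl
    FROM cpanstats
    LEFT JOIN perl_version ON perl_version.version = cpanstats.perl
    WHERE cpanstats.perl != '0' AND perl_version.version IS NULL
"""


def fix_perl_versions(db):
    """Add a perl_version row for each version that appears only in cpanstats."""
    missing = [row[0] for row in db.execute(MISSING_VERSIONS)]
    db.executemany(
        "INSERT INTO perl_version (version) VALUES (?)",
        [(v,) for v in missing],
    )
    db.commit()
    return sorted(missing)
```

Returning the list of added versions gives the command something to log, and a second run should return an empty list once the data is fixed.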
