some final prep work before CT2.0
barbie committed Mar 18, 2010
1 parent b09890d commit 04933b0
Showing 5 changed files with 652 additions and 329 deletions.
6 changes: 6 additions & 0 deletions CHANGES
@@ -1,5 +1,11 @@
 Revision history for Perl module CPAN::Testers::Data::Generator.
 
+0.41 current
+- fixes to change the 'id' (was NNTP ID) to an auto incremental field.
+- reworked logic to better fit latest changes.
+- added repository to META.yml.
+- documentation updates.
+
 0.40 02/02/2010
 - fixes to accommodate GUID changes.
 - added support for 'type' field.
5 changes: 3 additions & 2 deletions META.yml
@@ -1,6 +1,6 @@
 --- #YAML:1.0
 name: CPAN-Testers-Data-Generator
-version: 0.40
+version: 0.41
 abstract: Download and summarize CPAN Testers data
 author:
 - Barbie <barbie@cpan.org>
@@ -37,7 +37,7 @@ build_requires:
 provides:
 CPAN::Testers::Data::Generator:
 file: lib/CPAN/Testers/Data/Generator.pm
-version: 0.40
+version: 0.41
 
 no_index:
 directory:
@@ -47,6 +47,7 @@ no_index:
 resources:
 license: http://dev.perl.org/licenses/
 bugtracker: http://rt.cpan.org/Public/Dist/Display.html?Name=CPAN-Testers-Data-Generator
+repository: http://github.com/barbie/cpan-testers-data-generator.git
 
 meta-spec:
 version: 1.4
308 changes: 228 additions & 80 deletions README
@@ -1,115 +1,263 @@
+perl v5.10.0 2010-02-23 CPAN::Testers::Data::Generator(3)

 NAME
-CPAN::WWW::Testers::Generator - Download and summarize CPAN Testers data
+CPAN::Testers::Data::Generator - Download and summarize CPAN Testers
+data

 SYNOPSIS
-% cpanstats
-# ... wait patiently
-# ... then use cpanstats.db, an SQLite database
+% cpanstats
+# ... wait patiently, very patiently
+# ... then use cpanstats.db, an SQLite database
+# ... or the MySQL database

 DESCRIPTION
-This distribution was originally written by Leon Brocard to download and
-summarize CPAN Testers data. However, much of the original code has been
-rewritten to use the CPAN Testers Statistics database generation code. This
-now means that all the CPAN Testers sites including the Reports site, the
-Statistics site and the CPAN Dependencies site, can use the same database.
+This distribution was originally written by Leon Brocard to download
+and summarize CPAN Testers data. However, all of the original code has
+been rewritten to use the CPAN Testers Statistics database generation
+code. This now means that all the CPAN Testers sites, including the
+Reports site, the Statistics site and the CPAN Dependencies site, can
+use the same database.

+This module downloads articles from the cpan-testers newsgroup,
+generating or updating an SQLite database containing all the most
+important information. You can then query this database, or use
+CPAN::WWW::Testers to present it over the web.

+A good example query for Acme-Colour would be:

+SELECT version, state, count(*) FROM cpanstats WHERE
+dist = 'Acme-Colour' GROUP BY version, state;

+Creating a database from scratch can take several days, as there are
+now over 2 million articles in the newsgroup. As such, updating from a
+known copy of the database is much more advisable. If you don't want to
+generate the database yourself, you can obtain the latest official copy
+(compressed with gzip) at http://devel.cpantesters.org/cpanstats.db.gz

+With over 6 million articles in the archive, if you do plan to run this
+software to generate the databases it is recommended you utilise a
+high-end processor machine. Even with a reasonable processor it can
+take a week!

+SQLite DATABASE SCHEMA
+The cpanstats database schema is very straightforward: one main table
+with several index tables to speed up searches. The main table is as
+below:

++--------------------------------+
+| cpanstats                      |
++----------+---------------------+
+| id       | INTEGER PRIMARY KEY |
+| state    | TEXT                |
+| postdate | TEXT                |
+| tester   | TEXT                |
+| dist     | TEXT                |
+| version  | TEXT                |
+| platform | TEXT                |
+| perl     | TEXT                |
+| osname   | TEXT                |
+| osvers   | TEXT                |
+| date     | TEXT                |
+| guid     | TEXT                |
+| type     | INTEGER             |
++----------+---------------------+

+It should be noted that 'postdate' refers to the YYYYMM formatted date,
+whereas the 'date' field refers to the YYYYMMDDhhmm formatted date and
+time.

-This module downloads articles from the cpan-testers newsgroup, generating or
-updating an SQLite database containing all the most important information. You
-can then query this database, or use CPAN::WWW::Testers to present it over the
-web.
+The articles database schema is again very straightforward, and
+consists of one table, as below:

-A good example query for Acme-Colour would be:
++--------------------------------+
+| articles                       |
++----------+---------------------+
+| id       | INTEGER PRIMARY KEY |
+| article  | TEXT                |
++----------+---------------------+

-SELECT version, status, count(*) FROM reports WHERE
-distribution = "Acme-Colour" group by version, status;
+SIGNIFICANT CHANGES
+v0.31 CHANGES
+With the release of v0.31, a number of changes to the codebase were
+made as a further move towards CPAN Testers 2.0. The first change is
+the name for this distribution. Now titled
+'CPAN-Testers-Data-Generator', this now fits more appropriately within
+the CPAN-Testers namespace on CPAN.

-To create a database from scratch can take several hours, as there are now over
-1.5 million articles in the newsgroup. As such updating from a known copy of the
-database is much more advisable. If you don't want to generate the database
-yourself, you can obtain the latest official copy (compressed with gzip) at
-http://devel.cpantesters.org/cpanstats.db.gz
+The second significant change is that the module now references a MySQL
+cpanstats database. The SQLite version is still updated as before, as a
+number of other websites and toolsets still rely on that database file
+format. However, an SQLite database is not really appropriate for a
+high-demand website, so the MySQL database is used to make the CPAN
+Testers Reports website more dynamic.

+The database creation code is now available as a standalone program, in
+the examples directory, and all the database communication is now
+handled by the new distribution CPAN-Testers-Common-DBUtils.

+v0.41 CHANGES
+In the next stage of development of CPAN Testers 2.0, the id field used
+within the database schema above for the cpanstats table no longer
+matches the NNTP ID value, although the id in the articles table does
+still reference the NNTP ID.

+In order to correctly reference the id in the articles table, you will
+need to use the function guid_to_nntp() from
+CPAN::Testers::Common::Utils, using the new guid field in the cpanstats
+table.

+As of this release the cpanstats id field is a unique auto-incrementing
+field.

+The next release of this distribution will be focused on generation of
+stats using the Metabase storage API.

 INTERFACE
-The Constructor
-* new
-Instantiates the object CPAN::WWW::Testers::Generator.
+The Constructor
+· new

+Instantiates the object CPAN::Testers::Data::Generator. Accepts a
+hash containing values to prepare the object. These are described
+as:

+my $obj = CPAN::Testers::Data::Generator->new(
+    logfile => './here/logfile',
+    config  => './here/config.ini'
+);

+Where 'logfile' is the location to write log messages. Log messages
+are only written if a logfile entry is specified, and will always
+append to any existing file. The 'config' should contain the path
+to the configuration file, used to define the database access and
+general operation settings.

-Methods
-* logfile
+In addition the boolean keys 'ignore' and 'nostore' are
+available. 'ignore' is used to skip NNTP entries which return no
+article and continue processing, while 'nostore' will
+delete all articles, except the last one received, thus reducing
+space in the SQL database.

-Accessor to set/get where the logging information is to be kept. Note
-that if this is not set, no logging occurs.
+Public Methods
+· generate

-* database
+Starting from the last recorded article, retrieves all the more
+recent articles from the NNTP server, parsing each and recording
+the articles that are either upload announcements or reports.

-Accessor to set/get the database full path.
+· rebuild

-* directory
+In the event that the cpanstats database needs regenerating, either
+in part or for the whole database, this method allows you to do so.
+You may supply parameters as to the 'start' and 'end' values
+(inclusive), where all records are assumed by default. Note that
+the 'nostore' option is ignored and no records are deleted from the
+articles database.

-Accessor to set/get the directory where the database is to be created.
+· reparse

-* generate
+Rather than a complete rebuild, the option to selectively reparse
+entries is useful if there are posts which have since been
+identified as valid and now have supporting parsing code within the
+codebase.

-Starting from the last recorded article, retrieves all the more recent
-articles from the NNTP server, parsing each and recording the articles
-that either upload announcements or reports.
+In addition there is the option to exclude fields from parsing
+checks, where they may be corrupted, and these can later be amended
+using the 'cpanstats-update' tool.

-* insert_article
+Private Methods
+· cleanup

-Inserts the components of a parsed article into the database.
+In the event that you do not wish to store all the articles
+permanently in the articles database, this method removes all but
+the most recent entry, which is kept to ensure that subsequent runs
+will start from the correct article. To enable this feature,
+specify 'nostore' within the hash passed to new().

-DATABASE SCHEMA
+· commit

-The database schema is very straightforward, one main table with several
-index tables to speed up searches. The main table is as below:
+To speed up the transaction process, a commit is performed every 50
+inserts. This method is used as part of the clean-up process to
+ensure all transactions are completed.

-+--------------------------------+
-| cpanstats                      |
-+----------+---------------------+
-| id       | INTEGER PRIMARY KEY |
-| state    | TEXT                |
-| postdate | TEXT                |
-| tester   | TEXT                |
-| dist     | TEXT                |
-| version  | TEXT                |
-| platform | TEXT                |
-| perl     | TEXT                |
-| osname   | TEXT                |
-| osvers   | TEXT                |
-| archname | TEXT                |
-+----------+---------------------+
+· nntp_connect

+Sets up the connection to the NNTP server.

+· parse_article

+Parses an article, extracting the metadata required for the stats
+database.

+· insert_article

+Inserts an article into the articles database.

+· insert_stats

+Inserts the components of a parsed article into the statistics
+database.

 HISTORY
-The CPAN testers was conceived back in May 1998 by Graham Barr and Chris
-Nandor as a way to provide multi-platform testing for modules. Today there
-are over 1.5 million tester reports and more than 100 testers each month
-giving valuable feedback for users and authors alike.
+CPAN Testers was conceived back in May 1998 by Graham Barr and
+Chris Nandor as a way to provide multi-platform testing for modules.
+Today there are over 2 million tester reports and more than 100 testers
+each month giving valuable feedback for users and authors alike.

 BECOME A TESTER
-The objective of the CPAN Testers is to test as many of the distributions
-on CPAN as possible, on as many platforms as possible. The ultimate goal is
-to improve the portability of the distributions on CPAN, and provide good
-feedback to the authors.
+Whether you have a common platform or a very unusual one, you can help
+by testing modules you install and submitting reports. There are plenty
+of module authors who could use test reports and helpful feedback on
+their modules and distributions.

+If you'd like to get involved, please take a look at the CPAN Testers
+Wiki, where you can learn how to install and configure one of the
+recommended smoke tools.

-Whether you have a common platform or a very unusual one, you can help by
-testing modules you install and submitting reports. There are plenty of
-module authors who could use test reports and helpful feedback on their
-modules and distributions.
+For further help and advice, please subscribe to the CPAN Testers
+discussion mailing list.

-If you'd like to get involved, please take a look at the CPAN Testers Wiki,
-where you can learn how to install and configure one of the recommended
-smoke tools.
+CPAN Testers Wiki
+- http://wiki.cpantesters.org
+CPAN Testers Discuss mailing list
+- http://lists.cpan.org/showlist.cgi?name=cpan-testers-discuss

-For further help and advice, please subscribe to the CPAN Testers
-discussion mailing list.
+BUGS, PATCHES & FIXES
+There are no known bugs at the time of this release. However, if you
+spot a bug or are experiencing difficulties that are not explained
+within the POD documentation, please send bug reports and patches to
+the RT Queue (see below).

-CPAN Testers Wiki - http://wiki.cpantesters.org
-CPAN Testers Discuss mailing list
-- http://lists.cpan.org/showlist.cgi?name=cpan-testers-discuss
+Fixes are dependent upon their severity and my availability. Should a
+fix not be forthcoming, please feel free to (politely) remind me.

+RT Queue -
+http://rt.cpan.org/Public/Dist/Display.html?Name=CPAN-Testers-Data-Generator

 SEE ALSO
 CPAN::Testers::WWW::Statistics

 http://www.cpantesters.org/, http://stats.cpantesters.org/,
 http://wiki.cpantesters.org/

 AUTHOR
-Original author: Leon Brocard <acme@astray.com> (C) 2002-2008
-Current maintainer: Barbie <barbie@cpan.org> (C) 2008
+It should be noted that the original code for this distribution began
+life under another name. The original distribution generated data for
+the original CPAN Testers website. However, in 2008 the code was
+reworked to generate data in the format for the statistics data
+analysis, which in turn was reworked to drive the redesign of all
+the CPAN Testers websites. To reflect the code changes, a new name was
+given to the distribution.

+CPAN-WWW-Testers-Generator
+Original author: Leon Brocard <acme@astray.com> (C) 2002-2008
+Current maintainer: Barbie <barbie@cpan.org> (C) 2008-2010

+CPAN-Testers-Data-Generator
+Original author: Barbie <barbie@cpan.org> (C) 2008-2010

 LICENSE
-This code is distributed under the same license as Perl.
+This code is distributed under the Artistic License 2.0.



+perl v5.10.0 2010-02-23 CPAN::Testers::Data::Generator(3)
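The README distinguishes the cpanstats 'postdate' field (YYYYMM) from the 'date' field (YYYYMMDDhhmm). The following is a minimal Perl sketch of how the two relate. It assumes 'date' is always stored as a 12-digit string. The helpers postdate_from_date() and split_date() are hypothetical and are not part of CPAN::Testers::Data::Generator.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the YYYYMM 'postdate' from a YYYYMMDDhhmm
# 'date' value. Assumes a 12-digit string; returns undef otherwise.
sub postdate_from_date {
    my ($date) = @_;
    return unless defined $date && $date =~ /^(\d{4})(\d{2})\d{6}\z/;
    return "$1$2";
}

# Hypothetical helper: split a 'date' value into named components.
# Returns a hash reference, or undef if the value is not 12 digits.
sub split_date {
    my ($date) = @_;
    my @parts = $date =~ /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})\z/
        or return;
    my %p;
    @p{qw(year month day hour minute)} = @parts;
    return \%p;
}

my $date = '201003181427';
print postdate_from_date($date), "\n";    # prints 201003
my $p = split_date($date);
printf "%s-%s-%s %s:%s\n", @{$p}{qw(year month day hour minute)};
```

Keeping both fields as text, as the schema does, means 'postdate' is simply the first six characters of 'date'.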

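The README describes rebuild() as taking inclusive 'start' and 'end' values, with all records assumed by default. The selection rule can be sketched on a plain list of ids. select_rebuild_ids() is a hypothetical helper and does not reproduce the module's actual rebuild code, which works against the database.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the ids a rebuild would touch. Both bounds
# are inclusive; a missing bound defaults to the lowest/highest id
# present, so calling it without options selects every record.
sub select_rebuild_ids {
    my ($ids, %opt) = @_;
    my @sorted = sort { $a <=> $b } @$ids;
    return unless @sorted;
    my $start = $opt{start} // $sorted[0];
    my $end   = $opt{end}   // $sorted[-1];
    return grep { $_ >= $start && $_ <= $end } @sorted;
}

my @ids = (5, 1, 3, 2, 4);
print join(',', select_rebuild_ids(\@ids)), "\n";                        # prints 1,2,3,4,5
print join(',', select_rebuild_ids(\@ids, start => 2, end => 4)), "\n";  # prints 2,3,4
print join(',', select_rebuild_ids(\@ids, start => 4)), "\n";            # prints 4,5
```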

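The 'nostore' clean-up described in the README keeps only the most recent article, so that the next generate() run resumes in the right place. This sketch uses an in-memory hash as a stand-in for the articles table. It assumes the next run starts at the id immediately after the last one kept. cleanup_articles() and next_article_id() are hypothetical names, not the module's API.

```perl
use strict;
use warnings;
use List::Util qw(max);

# In-memory stand-in for the articles table: NNTP id => article text.
my %articles = (
    1001 => 'article one',
    1002 => 'article two',
    1003 => 'article three',
);

# Hypothetical 'nostore' clean-up: delete every article except the one
# with the highest id, and return the id that was kept.
sub cleanup_articles {
    my ($store) = @_;
    return unless %$store;
    my $last = max keys %$store;
    delete @{$store}{ grep { $_ != $last } keys %$store };
    return $last;
}

# Assumption: the next run resumes at the article after the last one kept.
sub next_article_id {
    my ($store) = @_;
    my $last = max(keys %$store) // 0;
    return $last + 1;
}

my $kept = cleanup_articles(\%articles);
print "kept $kept, next run starts at ", next_article_id(\%articles), "\n";
# prints "kept 1003, next run starts at 1004"
```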
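The README says commit() is triggered every 50 inserts and is also called during clean-up, so no pending work is left uncommitted. The following minimal sketch shows that batching pattern. Code references stand in for real database calls. The BatchCommitter package is hypothetical and not part of the distribution.

```perl
use strict;
use warnings;

# Hypothetical batching helper: runs the commit callback after every
# 'batch' inserts, and once more from flush() for any remainder.
package BatchCommitter;

sub new {
    my ($class, %args) = @_;
    my $self = {
        batch   => $args{batch} // 50,   # README: commit every 50 inserts
        insert  => $args{insert},        # code ref performing one insert
        commit  => $args{commit},        # code ref committing the transaction
        pending => 0,
    };
    return bless $self, $class;
}

sub insert {
    my ($self, @row) = @_;
    $self->{insert}->(@row);
    $self->{pending}++;
    $self->flush if $self->{pending} >= $self->{batch};
    return;
}

# Commit any outstanding inserts; intended to be called during clean-up.
sub flush {
    my ($self) = @_;
    return unless $self->{pending};
    $self->{commit}->();
    $self->{pending} = 0;
    return;
}

package main;

my (@rows, $commits);
my $bc = BatchCommitter->new(
    insert => sub { push @rows, [@_] },
    commit => sub { $commits++ },
);
$bc->insert($_) for 1 .. 120;   # automatic commits at 50 and 100
$bc->flush;                     # final commit for the remaining 20
print scalar(@rows), " rows, $commits commits\n";   # prints "120 rows, 3 commits"
```

With DBI, the commit callback would typically wrap `$dbh->commit` on a handle opened with `AutoCommit` turned off.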