-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
303 lines (226 loc) · 12 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
CPAN::Testers::Data::Generator(3) CPAN::Testers::Data::Generator(3)
NAME
CPAN::Testers::Data::Generator - Download and summarize CPAN Testers
data
SYNOPSIS
% cpanstats
# ... wait patiently, very patiently
# ... then use cpanstats.db, an SQLite database
# ... or the MySQL database
DESCRIPTION
This distribution was originally written by Leon Brocard to download
and summarize CPAN Testers data. However, all of the original code has
been rewritten to use the CPAN Testers Statistics database generation
code. This now means that all the CPAN Testers sites including the
Reports site, the Statistics site and the CPAN Dependencies site, can
use the same database.
This module downloads articles from the cpan-testers newsgroup,
generating or updating an SQLite database containing all the most
important information. You can then query this database, or use
CPAN::WWW::Testers to present it over the web.
A good example query for Acme-Colour would be:
SELECT version, status, count(*) FROM cpanstats WHERE
distribution = "Acme-Colour" group by version, state;
To create a database from scratch can take several days, as there are
now over 2 million articles in the newgroup. As such updating from a
known copy of the database is much more advisable. If you don't want to
generate the database yourself, you can obtain the latest official copy
(compressed with gzip) at http://devel.cpantesters.org/cpanstats.db.gz
With over 6 million articles in the archive, if you do plan to run this
software to generate the databases it is recommended you utilise a
high-end processor machine. Even with a reasonable processor it can
take a week!
SQLite DATABASE SCHEMA
The cpanstats database schema is very straightforward, one main table
with several index tables to speed up searches. The main table is as
below:
+--------------------------------+
| cpanstats |
+----------+---------------------+
| id | INTEGER PRIMARY KEY |
| state | TEXT |
| postdate | TEXT |
| tester | TEXT |
| dist | TEXT |
| version | TEXT |
| platform | TEXT |
| perl | TEXT |
| osname | TEXT |
| osvers | TEXT |
| date | TEXT |
| guid | TEXT |
| type | INTEGER |
+----------+---------------------+
It should be noted that 'postdate' refers to the YYYYMM formatted date,
whereas the 'date' field refers to the YYYYMMDDhhmm formatted date and
time.
The metabase database schema is again very straightforward, and
consists of one table, as below:
+--------------------------------+
| metabase |
+----------+---------------------+
| guid | TEXT PRIMARY KEY |
| report | TEXT |
+----------+---------------------+
The report field is JSON encoded, and is a cached version of the one
extracted from Metabase::Librarian.
SIGNIFICANT CHANGES
v0.31 CHANGES
With the release of v0.31, a number of changes to the codebase were
made as a further move towards CPAN Testers 2.0. The first change is
the name for this distribution. Now titled
'CPAN-Testers-Data-Generator', this now fits more appropriately within
the CPAN-Testers namespace on CPAN.
The second significant change is to now reference a MySQL cpanstats
database. The SQLite version is still updated as before, as a number
of other websites and toolsets still rely on that database file format.
However, in order to make the CPAN Testers Reports website more
dynamic, an SQLite database is not really appropriate for a high demand
website.
The database creation code is now available as a standalone program, in
the examples directory, and all the database communication is now
handled by the new distribution CPAN-Testers-Common-DBUtils.
v0.41 CHANGES
In the next stage of development of CPAN Testers 2.0, the id field used
within the database schema above for the cpanstats table no longer
matches the NNTP ID value, although the id in the articles does still
reference the NNTP ID.
In order to correctly reference the id in the articles table, you will
need to use the function guid_to_nntp() with
CPAN::Testers::Common::Utils, using the new guid field in the cpanstats
table.
As of this release the cpanstats id field is a unique auto incrementing
field.
The next release of this distribution will be focused on generation of
stats using the Metabase storage API.
v1.00 CHANGES
Moved to Metabase API. The change to a definite major version number
hopefully indicates that this is a major interface change. All previous
NNTP access has been dropped and is no longer relavent. All report
updates are now fed from the Metabase API.
INTERFACE
The Constructor
· new
Instatiates the object CPAN::Testers::Data::Generator. Accepts a
hash containing values to prepare the object. These are described
as:
my $obj = CPAN::Testers::Data::Generator->new(
logfile => './here/logfile',
config => './here/config.ini'
);
Where 'logfile' is the location to write log messages. Log messages
are only written if a logfile entry is specified, and will always
append to any existing file. The 'config' should contain the path
to the configuration file, used to define the database access and
general operation settings.
Public Methods
· generate
Starting from the last cached report, retrieves all the more recent
reports from the Metabase Report Submission server, parsing each
and recording each report in both the cpanstats databases (MySQL &
SQLite) and the metabase cache database.
· regenerate
For a given date range, retrieves all the reports from the Metabase
Report Submission server, parsing each and recording each report in
both the cpanstats databases (MySQL & SQLite) and the metabase
cache database.
Note that as only 2500 can be returned at any one time due to
Amazon SimpleDB restrictions, this method will only process the
guids returned from a given start data, up to a maxiumu of 2500
guids.
This methog will return the guid of the last report processed.
· rebuild
In the event that the cpanstats database needs regenerating, either
in part or for the whole database, this method allow you to do so.
You may supply parameters as to the 'start' and 'end' values
(inclusive), where all records are assumed by default. Records are
rebuilt using the local metabase cache database.
· reparse
Rather than a complete rebuild the option to selective reparse
selected entries is useful if there are reports which were
previously unable to correctly supply a particular field, which now
has supporting parsing code within the codebase.
In addition there is the option to exclude fields from parsing
checks, where they may be corrupted, and can be later amended using
the 'cpanstats-update' tool.
Private Methods
· commit
To speed up the transaction process, a commit is performed every
500 inserts. This method is used as part of the clean up process
to ensure all transactions are completed.
· get_next_guids
Get the list of GUIDs for the reports that have been submitted
since the last cached report.
· already_saved
Given a guid, determines whether it has already been saved in the
local metabase cache.
· get_fact
Get a specific report factfor a given GUID.
· parse_report
Parses a report extracting the metadata required for the cpanstats
database.
· reparse_report
Parses a report (from a local metabase cache) extracting the
metadata required for the stats database.
· retrieve_report
Given a guid will attempt to return the report metadata from the
cpanstats database.
· store_report
Inserts the components of a parsed report into the cpanstats
database.
· cache_report
Inserts a serialised report into a local metabase cache database.
· cache_update
For the current report will update the local metabase cache with
the id used within the cpanstats database.
HISTORY
The CPAN testers was conceived back in May 1998 by Graham Barr and
Chris Nandor as a way to provide multi-platform testing for modules.
Today there are over 2 million tester reports and more than 100 testers
each month giving valuable feedback for users and authors alike.
BECOME A TESTER
Whether you have a common platform or a very unusual one, you can help
by testing modules you install and submitting reports. There are plenty
of module authors who could use test reports and helpful feedback on
their modules and distributions.
If you'd like to get involved, please take a look at the CPAN Testers
Wiki, where you can learn how to install and configure one of the
recommended smoke tools.
For further help and advice, please subscribe to the the CPAN Testers
discussion mailing list.
CPAN Testers Wiki
- http://wiki.cpantesters.org
CPAN Testers Discuss mailing list
- http://lists.cpan.org/showlist.cgi?name=cpan-testers-discuss
BUGS, PATCHES & FIXES
There are no known bugs at the time of this release. However, if you
spot a bug or are experiencing difficulties, that is not explained
within the POD documentation, please send bug reports and patches to
the RT Queue (see below).
Fixes are dependent upon their severity and my availability. Should a
fix not be forthcoming, please feel free to (politely) remind me.
RT Queue -
http://rt.cpan.org/Public/Dist/Display.html?Name=CPAN-Testers-Data-Generator
SEE ALSO
CPAN::Testers::Report, Metabase, Metabase::Fact,
CPAN::Testers::Fact::LegacyReport, CPAN::Testers::Fact::TestSummary,
CPAN::Testers::Metabase::AWS
CPAN::Testers::WWW::Statistics
http://www.cpantesters.org/, http://stats.cpantesters.org/,
http://wiki.cpantesters.org/
AUTHOR
It should be noted that the original code for this distribution began
life under another name. The original distribution generated data for
the original CPAN Testers website. However, in 2008 the code was
reworked to generate data in the format for the statistics data
analysis, which in turn was reworked to drive the redesign of the all
the CPAN Testers websites. To reflect the code changes, a new name was
given to the distribution.
CPAN-WWW-Testers-Generator
Original author: Leon Brocard <acme@astray.com> (C) 2002-2008
Current maintainer: Barbie <barbie@cpan.org> (C) 2008-2010
CPAN-Testers-Data-Generator
Original author: Barbie <barbie@cpan.org> (C) 2008-2011
LICENSE
This code is distributed under the Artistic License 2.0.
perl v5.10.1 2011-07-04 CPAN::Testers::Data::Generator(3)