=head1 NAME

 - validate URLs located in module code and POD
=head1 SYNOPSIS

B<> [B<-d|--dir> path] [B<-v|--verbose>] [B<-?|-h|--help>]
[B<-o|--outfile> filename]
=head1 DESCRIPTION

Checks code and POD of all bioperl-live modules for URLs, and validates them.
Output is a series of lines containing two fields, tab-separated.
The first field is the file with the bad URL, the second is the URL itself.
The whole URL is not retrieved; only an HTTP "HEAD" request is done
to see if the URL exists on the server. The request is made using
B<LWP::Simple>, so the B<http_proxy> environment variable will be
honored.
The URL parsing may not be perfect - although I use the B<Regexp::Common::URI>
module, I have to manually clean up some URLs which are embedded in Perl
strings to convert the matched URL to a more probable real-world URL,
e.g. most URLs don't end in "'," or ")" :-)

=cut
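As a self-contained illustration of the clean-up step just described (not part of this script), the trailing-punctuation substitution can be exercised on a couple of invented strings:

```perl
use strict;
use warnings;

# The same substitution the script applies: strip trailing characters
# that belong to the surrounding Perl source rather than the URL itself.
sub clean_url {
    my ($url) = @_;
    $url =~ s/\s*[.,;'")]*\s*$//;
    return $url;
}

# Invented examples of URLs as they might be matched inside Perl strings:
print clean_url(q{http://bioperl.org/'),}), "\n";    # http://bioperl.org/
print clean_url(q{http://example.com/page).}), "\n"; # http://example.com/page
```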
use strict;
use Data::Dumper;
use File::Find;
use Getopt::Long;
use Regexp::Common qw(URI);
use LWP::Simple qw($ua head);
# command line options
my ($verbose, $dir, $help) = (0, '../Bio/', undef);
my $file;
GetOptions(
    'v|verbose'   => \$verbose,
    'd|dir:s'     => \$dir,
    'o|outfile:s' => \$file,
    'h|help|?'    => sub{ exec('perldoc',$0); exit(0) },
);
my $fh;
if (defined $file) {
    open($fh, '>', $file) || die "Can't open file '$file': $!";
} else {
    $fh = \*STDOUT;
}
# cache of checked URLs (declared before find() so check_url can use it);
# failures are printed as they are found
my %cached_urls;

# find all modules and validate the unique URLs they contain
find( \&find_modules, $dir );
sub check_url {
    my ($url, $file) = @_;
    if (exists $cached_urls{$url}) {
        print STDERR "$url checked in ".$cached_urls{$url}[0].": ".$cached_urls{$url}[1]."\n" if $verbose;
        print $fh "$file\t$url\n" if $cached_urls{$url}[1] ne 'ok';
        return;
    }
    print STDERR "Checking $url in $file... " if $verbose;
    my $ok = head($url);
    my $status = $ok ? 'ok' : 'FAIL!';
    print STDERR "$status\n" if $verbose;
    print $fh "$file\t$url\n" if !$ok;
    $cached_urls{$url} = [$file, $status];
}
close $fh if defined $file; # don't close STDOUT
# this is where the action is
sub find_modules {
    # only want files ending in .pm
    return unless m/\.pm$/;
    return unless -f $_;
    my $fname = $_;
    # slurp in the file
    my $text = do { local( @ARGV, $/ ) = $fname; <> };
    # extract and check each URL found in the text
    while ($text =~ m/$RE{URI}{HTTP}{-keep}/g) {
        my $url = $1 or next;
        # remove trailing Perl punctuation if the URL was embedded in a string
        $url =~ s/\s*[.,;'")]*\s*$//;
        check_url($url, $File::Find::name);
    }
}
=head1 OPTIONS

=over 3

=item B<-d | --dir> path

Overrides the default directory to recursively look for .pm files
(default is '../Bio').

=item B<-o | --outfile> filename

Write the failure report to this file instead of STDOUT.

=item B<-v | --verbose>

Show progress through the files as they are checked.

=item B<-? | -h | --help>

This help text.

=back
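Because the failure report is plain tab-separated text, it is easy to post-process. As a sketch (the file name C<bad_urls.txt> is hypothetical, assuming the report was saved with B<-o>), this tallies bad URLs per module:

```perl
use strict;
use warnings;

# Count failing URLs per module file from the script's tab-separated
# output ("file<TAB>url" per line). 'bad_urls.txt' is a hypothetical name.
my %fails;
open my $in, '<', 'bad_urls.txt' or die "Can't open bad_urls.txt: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($file, $url) = split /\t/, $line, 2;
    $fails{$file}++ if defined $url;
}
close $in;
printf "%s\t%d\n", $_, $fails{$_} for sort keys %fails;
```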
=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists
=head2 Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web.
=head1 AUTHOR - Torsten Seemann
Email: torsten-dot-seemann-at-infotech-dot-monash-dot-edu-dot-au