added Juan Tena's split_wig.pl script, which is useful for uploading WIG files from genomes with lots of scaffolds.
lstein committed May 17, 2013
1 parent bc26f9d commit f74fd94
Showing 4 changed files with 85 additions and 0 deletions.
2 changes: 2 additions & 0 deletions Changes
@@ -7,6 +7,8 @@
optionally pass font names like "Helvetica-12:Italic" to the track "font" option).
* Clicking on the "Go" button when an annotation plugin is selected now turns on the corresponding track.
* Clicking on the "Go" button when a filter plugin is selected opens up the configuration dialog.
* Added Juan Tena's split_wig.pl script, which is useful for uploading WIG files from genomes with lots
of scaffolds.

2.54
* Version 2.53 introduced a bad bug into track configuration such
1 change: 1 addition & 0 deletions MANIFEST
@@ -26,6 +26,7 @@ bin/process_ncbi_human.pl
bin/process_sgd.pl
bin/report_missing_language_tags.pl
bin/scan_gbrowse.pl
bin/split_wig.pl
bin/ucsc_genes2gff.pl
bin/wiggle2gff3.pl
Build.PL
74 changes: 74 additions & 0 deletions bin/split_wig.pl
@@ -0,0 +1,74 @@
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Temp qw/tempdir/;
use Getopt::Long;

##
## Splits a wig file (variable or fixed step format) into several wig files with a maximum of 900 scaffolds each
## and runs the wiggle2gff3.pl script to prepare these files for upload to GBrowse2.
## Usage: split_wig.pl -w FILE.wig -p DATABASE_PATH
## The whole path is needed, since the gff files will point to their respective wib files.
## After running this script, you can run it again for a different wig file, and all gff files will be pooled
## together in the same folder. To upload the data to GBrowse2, the MySQL Backend is recommended:
## bp_seqfeature_load.pl -f -a DBI::mysql -d DATABASE gff3_files/*.gff3
## The data track should be configured in your DATABASE.conf file, setting the 'feature' field with the name of
## your original wig file (without extension).
##
## Juan J. Tena, CABD 2013
## jjtenagu@upo.es
##

my ($wig,$path)=('','');
GetOptions
(
    "w=s" => \$wig,
    "p=s" => \$path,
);


if (!$wig || !$path) {die "Usage: split_wig.pl -w FILE.wig -p DATABASE_PATH\n";}

mkdir "$path/wib_files";
mkdir "$path/gff3_files";

my $count=0;       # scaffolds seen so far in the current chunk
my $chr_old='';    # scaffold named by the previous declaration line
my $dir=tempdir(CLEANUP => 1);
my $out=File::Temp->new(DIR => $dir, UNLINK => 0, SUFFIX => '.dat');
my $header=`head -n 1 $wig`;    # first line of the input, repeated at the top of every later chunk
open my $in, '<', $wig or die "Cannot open $wig: $!\n";
while (<$in>) {
    my $line=$_;
    chomp $line;
    if ($line=~/chrom/) {
        if ($count>=900) {    # current chunk is full: start a new one
            $out=File::Temp->new(DIR => $dir, UNLINK => 0, SUFFIX => '.dat');
            print $out $header;
            $count=0;
        }
        my @fields=split /\s/,$line;
        my $chr=$fields[1];
        $chr=~s/chrom=//;
        if ($chr ne $chr_old) {
            $count++;
        }
        $chr_old=$chr;
    }
    print $out "$line\n";
}
close $in;

my @files=<$dir/*.dat>;
my @filepath=split /\//,$wig;
my @filename=split /\./,$filepath[-1];    # $filename[0] is the wig file name up to its first dot
my $suf=1;
foreach (@files) {
    my $tmpout=File::Temp->new();
    my $outfile="$path/gff3_files/$filename[0]_$suf.gff3";
    system("wiggle2gff3.pl --path=$path/wib_files $_ > $tmpout");
    system ("sed 's/microarray_oligo/$filename[0]/' $tmpout > $outfile");    # rename the feature type after the wig file
    $suf++;
}

exit;
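
The splitting loop above can be hard to follow because it mixes file handling with the counting logic. The following is a minimal in-memory sketch of the counting alone, not the committed code. The helper name chunk_wig_lines and the sample data are hypothetical. Unlike split_wig.pl, the sketch starts a new chunk only when the scaffold name changes, and it does not copy the first line of the input into later chunks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of split_wig.pl): group WIG lines into
# chunks, each declaring at most $max consecutive distinct scaffolds.
sub chunk_wig_lines {
    my ($lines, $max) = @_;
    my @chunks = ([]);
    my ($count, $chr_old) = (0, '');
    for my $line (@$lines) {
        # Declaration lines look like "fixedStep chrom=chr1 start=1 step=1".
        if ($line =~ /^(?:fixedStep|variableStep)\s.*?\bchrom=(\S+)/) {
            my $chr = $1;
            if ($chr ne $chr_old) {
                # Open a new chunk rather than exceed the limit.
                if ($count >= $max) {
                    push @chunks, [];
                    $count = 0;
                }
                $count++;
                $chr_old = $chr;
            }
        }
        push @{ $chunks[-1] }, $line;
    }
    return \@chunks;
}

# Made-up sample: three scaffolds, limit of two scaffolds per chunk.
my @wig = (
    'fixedStep chrom=scaffold1 start=1 step=1', '0.5', '0.7',
    'fixedStep chrom=scaffold2 start=1 step=1', '1.0',
    'variableStep chrom=scaffold3', '10 2.5',
);
my $chunks = chunk_wig_lines(\@wig, 2);
printf "%d chunks\n", scalar @$chunks;
printf "chunk %d: %d lines\n", $_ + 1, scalar @{ $chunks->[$_] } for 0 .. $#$chunks;
```

With the sample data, the first two scaffolds land in one chunk (5 lines) and the third scaffold opens a second chunk (2 lines). In split_wig.pl the limit is fixed at 900 and each chunk is a temporary .dat file rather than an array.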
8 changes: 8 additions & 0 deletions bin/wiggle2gff3.pl
@@ -155,6 +155,14 @@ =head2 Example WIG File
chr19 example example 59304701 59308020 . . . Name=variableStep;wigfile=/var/gbrowse/db/track002.chr19.1199828298.wig
chr19 example example 59307401 59310400 . . . Name=fixedStep;wigfile=/var/gbrowse/db/track003.chr19.1199828298.wig
=head1 PROBLEMS
This script has trouble with wig files from very fragmented genomes
(>100K scaffolds). In this case, you may wish to run split_wig.pl,
which splits the original wig file into a series of smaller files with
a maximum of 900 scaffolds each. It then runs wiggle2gff3.pl for each
subfile and stores the results in separate folders.
=head1 SEE ALSO
L<Bio::DB::GFF>, L<bp_bulk_load_gff.pl>, L<bp_fast_load_gff.pl>,
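
Before deciding whether a file needs split_wig.pl, it can help to know how many scaffolds it declares. The following is a small sketch under stated assumptions, not part of either script: the helper name count_scaffolds is hypothetical, and it assumes every data block starts with a fixedStep or variableStep declaration line carrying a chrom= field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the distinct scaffold names declared in
# WIG data read from an open filehandle.
sub count_scaffolds {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        $seen{$1} = 1
            if $line =~ /^(?:fixedStep|variableStep)\s.*?\bchrom=(\S+)/;
    }
    return scalar keys %seen;
}

# Made-up sample read through an in-memory filehandle; for a real file,
# open it with: open my $fh, '<', $path or die ...
my $data = join "\n",
    'track type=wiggle_0 name="example"',
    'fixedStep chrom=scaffold1 start=1 step=1', '0.5',
    'fixedStep chrom=scaffold2 start=1 step=1', '1.0',
    'fixedStep chrom=scaffold1 start=100 step=1', '0.2',
    'variableStep chrom=scaffold3 span=10', '10 2.5',
    '';
open my $fh, '<', \$data or die "Cannot open in-memory data: $!\n";
my $n = count_scaffolds($fh);
close $fh;
print "$n scaffolds\n";    # prints "3 scaffolds"
```

The count can then be compared with the scale mentioned in the PROBLEMS section (more than 100K scaffolds) to judge whether splitting is worthwhile.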