chromosome order in "bwtool window" #47

Open
steffenheyne opened this issue Sep 21, 2015 · 4 comments

Comments

@steffenheyne

What determines the chromosome order in "bwtool window"? For example, I get this order from my bigWigs (just look at the first column):
GL456396.1 21200 21225 111
GL456354.1 195950 195975 2
GL456382.1 23125 23150 2
JH584298.1 184150 184175 2
GL456367.1 42025 42050 0
GL456216.1 66625 66650 8
GL456381.1 25825 25850 2
JH584297.1 205750 205775 0
GL456366.1 47025 47050 0
GL456394.1 24275 24300 14
GL456379.1 72350 72375 2
JH584296.1 199325 199350 4
MT 16250 16275 33
GL456393.1 55675 55700 2
GL456378.1 31575 31600 6
JH584295.1 1950 1975 0
GL456392.1 23600 23625 63
GL456350.1 227925 227950 4
GL456213.1 39300 39325 2
JH584294.1 191875 191900 0
JH584304.1 114425 114450 842
GL456212.1 153575 153600 2
JH584293.1 207925 207950 4
JH584303.1 158050 158075 0
GL456239.1 40025 40050 48
GL456389.1 28725 28750 0
GL456390.1 24625 24650 29
GL456211.1 241700 241725 0
19 61431525 61431550 0
18 90702600 90702625 0
17 94987225 94987250 0
16 98207725 98207750 0
15 104043650 104043675 0
14 124902200 124902225 0
13 120421600 120421625 0
12 120128975 120129000 0
11 122082500 122082525 0
10 130694950 130694975 0
JH584292.1 14900 14925 4
JH584302.1 155800 155825 1
GL456210.1 169700 169725 0
JH584301.1 259850 259875 0
GL456359.1 22925 22950 8
GL456360.1 31675 31700 6
GL456387.1 24650 24675 0
JH584300.1 182300 182325 0
GL456372.1 28625 28650 14
GL456221.1 206925 206950 4
GL456385.1 35200 35225 2
GL456219.1 175925 175950 0
GL456370.1 26725 26750 0
Y 91744650 91744675 0
X 171031250 171031275 0
GL456233.1 336900 336925 0
9 124595075 124595100 0
8 129401175 129401200 0
7 145441425 145441450 0
6 149736500 149736525 0
5 151834650 151834675 0
4 156508075 156508100 0
3 160039650 160039675 0
2 182113175 182113200 0
1 195471925 195471950 0

The order seems a bit random. My problem is that I only want to use columns 4 through the last column of the "bwtool window" output, for example to concatenate this output from multiple files. But then I need to be sure about the order of the regions/chromosomes. Can I be sure that the chromosome order is always the same for similar bigWigs?
Thanks!
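A minimal Python sketch of the concatenation step, assuming each input is plain `bwtool window` text output with the region in columns 1-3 (the function name `paste_window_outputs` is hypothetical, not part of bwtool). Rather than trusting that the chromosome order matches, it compares the region columns row by row and refuses to paste on any mismatch:

```python
from typing import List


def paste_window_outputs(outputs: List[List[str]]) -> List[str]:
    """Paste columns 4+ of several bwtool window outputs side by side.

    `outputs` holds the lines of each file. The region columns (1-3) are
    compared row by row and a ValueError is raised on any mismatch, so
    outputs with a different chromosome order are never pasted silently.
    """
    if not outputs:
        return []
    # Split on any whitespace; region names and comma-joined values contain none.
    tables = [[line.split() for line in lines if line.strip()] for lines in outputs]
    reference = [row[:3] for row in tables[0]]
    for i, rows in enumerate(tables[1:], start=1):
        if [row[:3] for row in rows] != reference:
            raise ValueError(f"output {i} does not match the region order of output 0")
    pasted = []
    for row_idx, region in enumerate(reference):
        # Columns 4+ of every file, in file order, after one copy of the region.
        values = [v for rows in tables for v in rows[row_idx][3:]]
        pasted.append("\t".join(region + values))
    return pasted
```

Checking the region columns is cheap and turns a silent ordering problem into an explicit error.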

@pkhoueiry

Why don't you create a bed file from your "bwtool window" output and use it to extract data from several bigWigs with the "bwtool extract" or "bwtool aggregate" tools? That way you control exactly which features you fetch.
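One way to build that bed file, sketched in Python under the assumption that the window output is whitespace-separated with chrom, start and end in the first three columns (`window_output_to_bed` is a hypothetical helper):

```python
from typing import List


def window_output_to_bed(lines: List[str]) -> List[str]:
    """Reduce bwtool window output lines to BED3 (chrom, start, end).

    The window coordinates appear to be 0-based and half-open already
    (e.g. "chr1 0 2" covers the first two bases), so they are copied as-is.
    """
    bed = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        bed.append("\t".join(fields[:3]))
    return bed
```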

@andypohl

Sorry, I saw this and forgot to reply. I don't think the chromosome order is different for the window program than for the other programs. Without having looked closely yet, I believe the program just follows the same chromosome order as the index written inside the bigWig file. Generally, I would prefer the chromosomes to be listed in largest-to-smallest order, but for the human genome that would mean reading chromosomes in the order 16, 17, 18, 20, 19, 22, 21 instead of numerical order, and I already know it doesn't do that. Are you using any options with your bwtool window command? Hypothetically, you could make a bed file of the whole genome, one line per chromosome, in the order you wish, and use the -regions=bed option to control the order.
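The two orderings mentioned above can be sketched in Python as follows. `natural_key`, `order_chroms` and `to_regions_bed` are hypothetical helpers, and the input is assumed to be (name, length) pairs such as those in a chrom sizes file. The output is the kind of one-line-per-chromosome BED that the -regions option takes:

```python
import re
from typing import List, Tuple


def natural_key(name: str) -> list:
    """Split 'chr10' into ['chr', 10, ''] so digit runs compare as numbers."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def order_chroms(sizes: List[Tuple[str, int]], by: str = "name") -> List[Tuple[str, int]]:
    """Order (chrom, length) pairs by natural name order or by decreasing length."""
    if by == "name":
        return sorted(sizes, key=lambda cs: natural_key(cs[0]))
    if by == "size":
        # Largest first; ties broken by name so the result is deterministic.
        return sorted(sizes, key=lambda cs: (-cs[1], natural_key(cs[0])))
    raise ValueError(f"unknown ordering: {by!r}")


def to_regions_bed(sizes: List[Tuple[str, int]]) -> List[str]:
    """One whole-chromosome BED line per entry, in the given order."""
    return [f"{chrom}\t0\t{length}" for chrom, length in sizes]
```

With illustrative lengths chr19=59, chr20=63, chr21=48 and chr22=51, ordering by size gives chr20, chr19, chr22, chr21 (the non-numerical order described above), while ordering by name keeps them numerical.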

I say hypothetically because bwtool window should [probably] only be run on whole genomes if the genome is small (bacterial, etc.). bwtool window is rather slow, and a command like "bwtool window 1000 human_H3K27me3.bw > windows.txt" would write a file many terabytes in size. I've used bwtool window on full human and mouse bigWigs before, but in those cases my window is usually small, and I use a cluster of computers to run the window program on each chromosome separately.
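To see why the output gets so large, here is a rough Python estimate. It assumes the 1 bp default step seen in the window examples in this thread, about 5 bytes per printed value (e.g. "0.00,") and about 25 bytes for the region columns; these numbers and the ~3.1 Gb genome length are approximations, and the helper names are hypothetical:

```python
from typing import Iterable


def count_windows(chrom_len: int, window: int, step: int = 1) -> int:
    """Number of full windows of `window` bp whose starts are `step` bp apart."""
    if chrom_len < window:
        return 0
    return (chrom_len - window) // step + 1


def estimate_output_bytes(chrom_lens: Iterable[int], window: int, step: int = 1,
                          bytes_per_value: int = 5, bytes_per_region: int = 25) -> int:
    """Rough size of bwtool window text output.

    Assumes each value prints like "0.00," (5 bytes) and that the
    chrom/start/end columns take about 25 bytes per line.
    """
    per_line = bytes_per_region + window * bytes_per_value
    return sum(count_windows(n, window, step) * per_line for n in chrom_lens)


# Approximate human genome (~3.1 Gb), treated as a single sequence.
print(f"{estimate_output_bytes([3_100_000_000], 1000) / 1e12:.1f} TB")
```

Under these assumptions the 1000 bp example comes to roughly 15 TB: about one 5 kB line per base of genome.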

@andypohl

OK, I meant to reply earlier, but now that I've looked into it, I can say a little more. bwtool window traverses the chromosomes in the reverse of the order in which they were written into the wig/bigWig. Here is a test wig file (test.wig):

fixedStep chrom=chr1 start=1 step=1
4
5
6
7
1
3
4
6
fixedStep chrom=chr7 start=1 step=1
9
8
2
0
1
fixedStep chrom=chr2 start=1 step=1
3
5
2
1
3
5

Here is the chromosome sizes file (chroms.txt):

chr1    8
chr7    5
chr2    6

I'll make it into a bigWig:

$ wigToBigWig test.wig chroms.txt test.bw

Now I'll run window on this:

$ bwtool window 2 test.bw
chr7    0   2   9.00,8.00
chr7    1   3   8.00,2.00
chr7    2   4   2.00,0.00
chr7    3   5   0.00,1.00
chr2    0   2   3.00,5.00
chr2    1   3   5.00,2.00
chr2    2   4   2.00,1.00
chr2    3   5   1.00,3.00
chr2    4   6   3.00,5.00
chr1    0   2   4.00,5.00
chr1    1   3   5.00,6.00
chr1    2   4   6.00,7.00
chr1    3   5   7.00,1.00
chr1    4   6   1.00,3.00
chr1    5   7   3.00,4.00
chr1    6   8   4.00,6.00

If I want the chromosomes in a certain order I can make an additional bed file (chroms.bed):

chr1    0   8
chr2    0   6
chr7    0   5

and run the program again with that using the -regions option:

$ bwtool -regions=chroms.bed window 2 test.bw
chr1    0   2   4.00,5.00
chr1    1   3   5.00,6.00
chr1    2   4   6.00,7.00
chr1    3   5   7.00,1.00
chr1    4   6   1.00,3.00
chr1    5   7   3.00,4.00
chr1    6   8   4.00,6.00
chr2    0   2   3.00,5.00
chr2    1   3   5.00,2.00
chr2    2   4   2.00,1.00
chr2    3   5   1.00,3.00
chr2    4   6   3.00,5.00
chr7    0   2   9.00,8.00
chr7    1   3   8.00,2.00
chr7    2   4   2.00,0.00
chr7    3   5   0.00,1.00
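If rerunning with -regions is inconvenient, the default output could also be reordered afterwards. This is a minimal Python sketch (the helper name is hypothetical). It relies on each chromosome's windows being contiguous, as they are in the default output above, and it holds everything in memory, so it only suits small outputs:

```python
from typing import List


def reorder_by_chrom(lines: List[str], chrom_order: List[str]) -> List[str]:
    """Reorder window output so chromosomes follow `chrom_order`.

    sorted() is stable, so windows keep their original order within
    each chromosome. A chromosome missing from `chrom_order` raises KeyError.
    """
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    return sorted(lines, key=lambda line: rank[line.split()[0]])
```

For a whole genome, the -regions route avoids sorting a huge file after the fact.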

I hope that helps. At any rate, my warning about running bwtool window on large genomes still stands.

The order of chromosomal output is indeed deterministic, but I'm hesitant to alter that beyond reversing what's there. After all, there isn't really any biological continuity between different chromosomes.

@steffenheyne
Author

Hi, thanks a lot for the clarification and for looking into this. I had also overlooked the general option -regions=chroms.bed, because I was always looking at the help output of "bwtool window -h".

Currently I'm using this:
bwtool window 1 -step=25 -decimals=0 my.bw > genome_bs25_tiling.tab

What I want is a full-genome tiling/binning of the bigWig in a bedGraph/fixed-width-wig-like manner. So it works for me with bwtool now.
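If the goal is a fixed-width wig rather than a table, rows shaped like those in the first comment (chrom, start, end and a single value in column 4) could be converted into fixedStep blocks. This is a Python sketch under those assumptions; `rows_to_fixed_step` is hypothetical, and it uses the 1-based `start` convention seen in test.wig plus the optional `span` field of the fixedStep format:

```python
from typing import List


def rows_to_fixed_step(lines: List[str], step: int) -> List[str]:
    """Turn 'chrom start end value' rows into fixedStep wig lines.

    A new header is written whenever the chromosome changes, a bin is
    skipped, or the bin width (span) changes. wig starts are 1-based,
    so start=0 in the rows becomes start=1 in the header.
    """
    out: List[str] = []
    prev_chrom, prev_start, prev_span = None, None, None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start_s, end_s, value = line.split()[:4]
        start, end = int(start_s), int(end_s)
        span = end - start
        if chrom != prev_chrom or start != prev_start + step or span != prev_span:
            out.append(f"fixedStep chrom={chrom} start={start + 1} step={step} span={span}")
        out.append(value)
        prev_chrom, prev_start, prev_span = chrom, start, span
    return out
```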

My only issue was the chromosome order, as I generated my tiling from a common genome.fa.fai index file and wanted exactly the same order from the bigWig. Many other tools only output variable-width wigs, merging adjacent regions with the same value, and some do not output chromosomes that have no data (NA) at all.
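A regions file that follows the .fai order could be generated like this in Python (a sketch; the helper names are hypothetical). It takes the name and length columns of the index and writes one whole-sequence line per entry, which could then be passed as -regions=... in the way shown in the earlier reply. How bwtool handles a sequence that has no data in the bigWig has not been checked here:

```python
from typing import List


def fai_to_regions_bed(fai_lines: List[str]) -> List[str]:
    """Whole-sequence BED lines in the order of a samtools .fai index.

    Only the first two tab-separated .fai columns (sequence name and
    length) are used; offsets and line widths are ignored.
    """
    bed = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank lines
        bed.append(f"{fields[0]}\t0\t{int(fields[1])}")
    return bed


def write_regions_bed(fai_path: str, bed_path: str) -> None:
    """Hypothetical convenience wrapper: read a .fai file, write the BED file."""
    with open(fai_path) as fai:
        bed = fai_to_regions_bed(fai.readlines())
    with open(bed_path, "w") as out:
        out.writelines(line + "\n" for line in bed)
```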

One question remains: is there a faster way (with bwtool) to get this from the bigWig than bwtool window? :-)
I don't want a variable-width wig first, because I would need to convert it again; I was looking for a tool that gives this fixed tiling as output!
