Document dependencies for bigWig file creation #195

Closed
lbeltrame opened this Issue Dec 3, 2013 · 14 comments

@lbeltrame
Contributor

lbeltrame commented Dec 3, 2013

Unless the functionality is deprecated, some more documentation would be needed as so far I haven't been able to generate the bigWig files directly (only by launching the script manually).

@chapmanb

Member

chapmanb commented Dec 3, 2013

Luca;
I've been hoping to deprecate the bigWig conversion, so I neglected it. The major issue is that wigToBigWig uses large amounts of memory on deeply covered regions and it's not easy to predict expected usage. Does bigWig provide better viewing in UCSC, or do you have another reason to prefer them over BAM?

@lbeltrame

Contributor

lbeltrame commented Dec 3, 2013

> covered regions and it's not easy to predict expected usage. Does bigWig
> provide better viewing in UCSC or do you have another use why you prefer
> them over BAM?

I can distribute wiggle and bigwig files to the wet lab part of the group so
that they can view the results in IGV without having to load BAM files that
can be quite large.

I'm not saying this solution is the best, but providing a simple graph of
coverage would be best.

@chapmanb

Member

chapmanb commented Dec 3, 2013

Luca;
If you provide BAM + the index file, IGV will load those in sections as opposed to the whole file. Will that work for visualization needs, or are they still running into issues in deep coverage regions?

@lbeltrame

Contributor

lbeltrame commented Dec 3, 2013

> If you provide BAM + the index file, IGV will load those in sections as
> opposed to the whole file. Will that work for visualization needs, or are

The reasoning is that BAM files are heavy with regards to size, that is why I
was looking at "lighter" alternatives (except, like you said, that the
conversion is not light at all).

@chapmanb

Member

chapmanb commented Dec 4, 2013

Yes, that was my initial idea as well but unfortunately there don't appear to be any alternatives to wigToBigWig with control over resource usage. In the end bigWig files are also fairly large so I've been writing it off as a not-that-great experiment.

@tanglingfung

Contributor

tanglingfung commented Dec 27, 2013

Does it make sense to generate bigWig for selected regions (specified by a BED file)? That can serve as a quick check for 'positive' control for RNA-Seq/ChIP-seq experiments. I have been using HOMER to generate the bigWig file. You can limit the file size by compromising on resolution. Hope it helps.
http://biowhat.ucsd.edu/homer/ngs/ucsc.html
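
To illustrate the resolution trade-off, here is a minimal Python sketch, not HOMER's method: it averages per-base depth into fixed-width bins and formats them as bedGraph lines (0-based, half-open), which UCSC's bedGraphToBigWig can then convert. The function names and the `bin_size` parameter are hypothetical, chosen only for this example.

```python
def bin_coverage(depths, bin_size):
    """Average per-base depths into fixed-width bins.

    depths: per-base depth values for one region (assumed input).
    bin_size: bases per bin (hypothetical parameter; larger = smaller file).
    Returns (start_offset, end_offset, mean_depth) tuples with 0-based,
    half-open offsets relative to the region start.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    bins = []
    for start in range(0, len(depths), bin_size):
        chunk = depths[start:start + bin_size]
        # The last bin may be shorter than bin_size.
        bins.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    return bins


def to_bedgraph(chrom, region_start, bins):
    """Format bins as bedGraph lines: chrom, start, end, value."""
    return [
        f"{chrom}\t{region_start + s}\t{region_start + e}\t{mean:.2f}"
        for s, e, mean in bins
    ]


if __name__ == "__main__":
    # Five bases of depth starting at chr1:100, collapsed into 2-base bins.
    for line in to_bedgraph("chr1", 100, bin_coverage([2, 4, 6, 8, 10], 2)):
        print(line)
```

A larger `bin_size` gives fewer lines and a smaller file, at the cost of smoothing over narrow peaks.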

@chapmanb

Member

chapmanb commented Dec 28, 2013

Paul;
Good idea, this is definitely something to explore. Enabling more specific queries than "all the coverage over the whole genome" is a good use for BigWig. I believe the main conversion issue is related to read depth, not necessarily file size, so it will definitely take some work to avoid the memory spikes seen in the ucsc BigWig conversion tools.

@tanglingfung

Contributor

tanglingfung commented Dec 28, 2013

Agreed. I thought Luca's concern was that 'BAM files are heavy with regards to size, that is why I was looking at "lighter" alternatives'.

@chapmanb chapmanb closed this in 60ba1b9 Mar 30, 2014

@chapmanb

Member

chapmanb commented Mar 30, 2014

Luca and Paul;
I removed the BigWig generation code since this was not scaling and we're starting to move in the direction of having too many files. Generally I'd like to move towards a compressed representation like CRAM and then have ways to make this available for viewing in sections. This is more of a long-term project, but BigWig generation was too slow and error prone to make it worth investing more effort in it. Hope the current BAM viewing approaches will work okay for you both.

@lbeltrame

Contributor

lbeltrame commented Mar 30, 2014

> worth investing more effort it. Hope the current BAM viewing approaches
> will work okay for you both.

For internal purposes I made a rather crude "get coverage" script that uses
pybedtools and the BED file to do some basic calculation and a plot with
matplotlib. I'm thinking this is more downstream than in the pipeline, but if
there's interest...

@chapmanb

Member

chapmanb commented Mar 30, 2014

Luca;
It would be great to try and collect these types of downstream scripts so folks can share and re-use. I agree it's tough to make automated since they are custom by their nature but having pointers to them would help begin to make them into something more general. What is the best way to do this? Do you want to add a section to the documentation on downstream tools and provide pointers/usage?

@mjafin

Contributor

mjafin commented Mar 30, 2014

Agree, would be nice to have a collection of useful downstream scripts and programs. We're currently working on a generic annotation script that can be yaml configured to annotate from any number of vcf/bed files etc. We'll open a ticket for discussion soon-ish..

@lbeltrame

Contributor

lbeltrame commented Mar 31, 2014

Well, since I mentioned this, here we go:

https://gist.github.com/lbeltrame/9888910

  • It runs on Python 3 but should work with Python 2
  • Lots of dependencies (pathlib, sarge) because it was meant for internal use ;)
  • Uses bedtools directly as pybedtools doesn't work with Python 3
  • Might be completely incorrect calculation wise - feel free to point out the glaring mistakes
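
For illustration only (this is not the linked script), a per-region coverage calculation of this kind might look like the sketch below. It assumes per-base depth rows in the three-column layout that `samtools depth` prints (chrom, 1-based position, depth) and BED regions in 0-based, half-open coordinates. The function names are hypothetical, and positions missing from the depth input are treated as depth 0.

```python
def parse_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples (0-based, half-open)."""
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and UCSC track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def mean_depth_per_region(regions, depth_lines):
    """Return (chrom, start, end, mean_depth) for each non-empty region.

    depth_lines: tab-separated rows of chrom, 1-based position, depth
    (assumed layout). Positions absent from the input count as depth 0.
    """
    depth_at = {}
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, depth = line.split("\t")[:3]
        # Convert the 1-based position to 0-based to match BED coordinates.
        depth_at.setdefault(chrom, {})[int(pos) - 1] = int(depth)

    results = []
    for chrom, start, end in regions:
        length = end - start
        if length <= 0:
            continue  # ignore empty or malformed intervals
        per_base = depth_at.get(chrom, {})
        total = sum(per_base.get(p, 0) for p in range(start, end))
        results.append((chrom, start, end, total / length))
    return results


if __name__ == "__main__":
    bed = ["chr1\t0\t4", "chr2\t10\t12"]
    depth = ["chr1\t1\t10", "chr1\t2\t20", "chr1\t4\t10", "chr2\t11\t5"]
    for row in mean_depth_per_region(parse_bed(bed), depth):
        print(*row, sep="\t")
```

Holding every position in a dictionary keeps the sketch short but is only practical for small targeted panels; whole-genome data would need a streaming approach.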

chapmanb added a commit that referenced this issue Apr 16, 2014

Documentation: Add wget to required software for installation. Thanks to Tim Hughes. Add pointers to downstream analysis tools, thanks to @lbeltrame #195
@chapmanb

Member

chapmanb commented Apr 16, 2014

Luca -- brilliant, thank you. I added a pointer to this in the documentation:

https://bcbio-nextgen.readthedocs.org/en/latest/contents/outputs.html

Happy to add more tools or pointers that would help folks working with the outputs.
