no coverage file produced #403

Closed
imerelli opened this issue Jan 8, 2021 · 8 comments

@imerelli imerelli commented Jan 8, 2021

Hi Felix,
In previous analyses with bismark_methylation_extractor (v0.20.0), a command line such as:
bismark_methylation_extractor -p -o /analysis/results/ --gzip --bedGraph --multicore 4 ETRs_1_deduplicated.bam
always generated the coverage (.cov) file, which is essential for statistical analysis with R/methylKit. Now, using the same command line with bismark_methylation_extractor (v0.23.0), the coverage file is no longer generated. The computation seems OK, since all the other files are present (CpG_OT, CpG_OB, CHH_OT, CHH_OB, CHG_OB), and the .M-bias.txt and splitting_report.txt files also look fine. Below is the end of the log, which also looks good.
Any hints as to what is happening?

C methylated in CpG context: 73.4%
C methylated in CHG context: 0.6%
C methylated in CHH context: 0.5%

Merging individual M-bias reports into overall M-bias statistics from these 4 individual files:
ETRs_1_deduplicated_splitting_report.txt.1.mbias
ETRs_1_deduplicated_splitting_report.txt.2.mbias
ETRs_1_deduplicated_splitting_report.txt.3.mbias
ETRs_1_deduplicated_splitting_report.txt.4.mbias

Determining maximum read lengths for M-Bias plots
Maximum read length of Read 1: 101
Maximum read length of Read 2: 96

Perl module GD::Graph::lines is not installed, skipping drawing M-bias plots (only writing out M-bias plot table)
Determining maximum read lengths for M-Bias plots
Maximum read length of Read 1: 101
Maximum read length of Read 2: 96

Perl module GD::Graph::lines is not installed, skipping drawing M-bias plots (only writing out M-bias plot table)
Deleting unused files ...

CpG_OT_ETRs_1_deduplicated.txt.gz contains data -> kept
CpG_CTOT_ETRs_1_deduplicated.txt.gz was empty -> deleted
CpG_CTOB_ETRs_1_deduplicated.txt.gz was empty -> deleted
CpG_OB_ETRs_1_deduplicated.txt.gz contains data -> kept
CHG_OT_ETRs_1_deduplicated.txt.gz contains data -> kept
CHG_CTOT_ETRs_1_deduplicated.txt.gz was empty -> deleted
CHG_CTOB_ETRs_1_deduplicated.txt.gz was empty -> deleted
CHG_OB_ETRs_1_deduplicated.txt.gz contains data -> kept
CHH_OT_ETRs_1_deduplicated.txt.gz contains data -> kept
CHH_CTOT_ETRs_1_deduplicated.txt.gz was empty -> deleted
CHH_CTOB_ETRs_1_deduplicated.txt.gz was empty -> deleted
CHH_OB_ETRs_1_deduplicated.txt.gz contains data -> kept
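
A quick way to confirm which result files a run actually left behind is to test for them by name. The sketch below assumes the usual Bismark naming, where the bedGraph and coverage files take the BAM basename plus `.bedGraph.gz` and `.bismark.cov.gz`; the function name `check_bismark_outputs` is made up for illustration and is not part of Bismark.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bismark): report which of the expected
# bedGraph/coverage outputs exist for a given BAM basename.
# Assumes the naming <base>.bedGraph.gz and <base>.bismark.cov.gz.
check_bismark_outputs() {
    dir=$1
    base=$2
    missing=0
    for suffix in bedGraph.gz bismark.cov.gz; do
        f="$dir/$base.$suffix"
        if [ -s "$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is absent or empty
    return "$missing"
}

# Usage sketch (paths taken from the command in this post):
# check_bismark_outputs /analysis/results ETRs_1_deduplicated
```

The non-zero exit status makes the check usable in a wrapper script that should stop when the coverage file is absent.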

Owner

@FelixKrueger FelixKrueger commented Jan 8, 2021

Hi @imerelli

I really can't say why this isn't working. I slightly modified the command and ran it over here:

 bismark_methylation_extractor -p -o ~/VersionControl/Bismark/consistency_filtering/outs/ --gzip --bedGraph --multicore 4 SRR4290315_1_val_1_bismark_bt2_pe.genome2.bam

And it runs through to the coverage step just fine:

/bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CHH_OB_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz contains data ->    kept


Using these input files: /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CpG_OT_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CpG_OB_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CHG_OT_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CHG_OB_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CHH_OT_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz /bi/home/fkrueger/VersionControl/Bismark/consistency_filtering/outs/CHH_OB_SRR4290315_1_val_1_bismark_bt2_pe.genome2.txt.gz

Summary of parameters for bismark2bedGraph conversion:
======================================================
...

Finished BedGraph conversion ...

This is also using version 0.23.0. So in conclusion, it seems to be something on your side, but I am not sure what it could be. Could you take a small number of lines from the initial BAM file (say 1000 lines) and see if that works? Maybe also drop some of the options, such as --multicore 4 or -o ...?
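
The suggested small test file can be cut from the BAM by keeping the header plus the first N alignment records. The sketch below assumes `samtools` is installed and that mates of a pair sit on adjacent lines (so an even N keeps pairs intact); `first_n_records` and the output file name are made-up names for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep every SAM header line plus the first N alignment
# records from a SAM stream on stdin. Use an even N for paired-end data so
# that read pairs are not split (assumes mates are on adjacent lines).
first_n_records() {
    awk -v n="$1" '
        /^@/ { print; next }               # header lines are always kept
        seen < n { print; seen++; next }   # pass through the first N records
        { exit }                           # stop reading after N records
    '
}

# Usage sketch, assuming samtools is available:
# samtools view -h ETRs_1_deduplicated.bam | first_n_records 1000 \
#   | samtools view -b -o ETRs_1_small.bam -
```

Filtering on a leading `@` is safe for SAM input because read names may not start with that character.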

Author

@imerelli imerelli commented Jan 8, 2021

Hi Felix,
I can confirm that with smaller BAM files everything works as expected. I'm running other tests. Are there any memory concerns to consider when using larger files (~80GB)? I'm using 32GB with 4 cores. May I use more cores? Anything else?

Owner

@FelixKrueger FelixKrueger commented Jan 8, 2021

The number of cores is only relevant for the extraction part (which seems to have been completed fine in your case). More cores should generally mean it will be faster.

The bedGraph conversion step runs on a single core and is indeed memory dependent: it uses a default buffer of 2GB and caches the rest out to disk (in the folder you specified with -o). So allowing more memory here (e.g. --buffer 20G) might speed it up somewhat. I am a little concerned that you didn't see any text on screen (or was that the end of a log file?). Maybe the process was still running but hadn't yet printed anything to the log due to lag?
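
The behaviour described here, a fixed in-memory buffer with overflow written to a scratch folder, is the same trade-off that `sort` exposes through `-S` (buffer size) and `-T` (temporary directory). The sketch below is not Bismark's code; it only illustrates that trade-off on a made-up file of chromosome and position columns, and it assumes a `sort` that supports both options. `sort_positions` is a hypothetical name.

```shell
#!/bin/sh
# Illustration only: sort chromosome/position records with an explicit memory
# buffer (-S) and an explicit directory for on-disk temporary chunks (-T).
# sort_positions is a made-up name; the two-column input layout is an assumption.
sort_positions() {
    buffer=$1    # e.g. 2G or 10G
    tmpdir=$2    # where sort writes chunks that do not fit in the buffer
    infile=$3
    # Sort by chromosome name, then numerically by position
    sort -S "$buffer" -T "$tmpdir" -k1,1 -k2,2n "$infile"
}

# Usage sketch:
# sort_positions 10G /analysis/results positions.txt > positions.sorted.txt
```

With a disk-backed sort of this kind, a small buffer is slower rather than wrong, provided the temporary directory has enough free space for the spilled chunks.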

You should be able to start the bedGraph conversion directly, starting from the CpG* files, just type bismark2bedGraph --help for some instructions.

Author

@imerelli imerelli commented Jan 8, 2021

Ok, I will go on with the tests and let you know. Is there a way, starting from the CpG* files, to generate the coverage files?

Owner

@FelixKrueger FelixKrueger commented Jan 8, 2021

yes indeed, e.g.:

bismark2bedGraph -o output.bedGraph --gzip --buffer 10G CpG*

Author

@imerelli imerelli commented Jan 8, 2021

Ok, I get it. This command makes both the .bedGraph.gz and the .cov.gz files.
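
Before handing the coverage file to downstream tools, a light sanity check can catch a truncated or empty result. The sketch below assumes the Bismark coverage layout of six tab-separated columns (chromosome, start, end, methylation percentage, count methylated, count unmethylated); `check_cov` is a made-up helper and the file name in the usage line is a placeholder.

```shell
#!/bin/sh
# Hypothetical sanity check for a gzipped Bismark coverage file.
# Assumes six tab-separated columns:
#   chromosome, start, end, methylation %, count methylated, count unmethylated
# Prints a line count and fails if the file is missing/empty or if any line
# has a different number of columns.
check_cov() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    gzip -dc "$1" | awk -F '\t' '
        NF != 6 { bad++ }
        END {
            printf "%d lines, %d malformed\n", NR, bad
            exit (bad > 0)
        }
    '
}

# Usage sketch (placeholder file name):
# check_cov your_sample.bismark.cov.gz
```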

Author

@imerelli imerelli commented Jan 9, 2021

Hi Felix, it was a memory problem. After increasing the buffer to 10G, the problem disappeared (both when computing directly with bismark_methylation_extractor and when using bismark2bedGraph). Nonetheless, it would be interesting to understand why switching from memory to disk when sorting files (if I understand correctly) causes problems.

Owner

@FelixKrueger FelixKrueger commented Jan 9, 2021

Excellent, glad this is now working. If you ever find out more, feel free to post it here.
