very slow nucleoatac run #34

Closed
kenkclam opened this issue Jan 25, 2016 · 2 comments

@kenkclam

Hi Alicia,

Thanks for the great package.
I installed nucleoatac and tried the provided example.
The example worked and finished in around 5 minutes.

Then I ran it on my own bam file (human data) and it was very slow. The bed file is based on the human genome and was modified with slopBed (-b 600). Chromosomes are labelled with numbers only (3, rather than chr3 or chrIII), and this is consistent across the bed, bam and fa files.
Here is what I ran:
/package/NucleoATAC-0.3.1/bin/nucleoatac run --bed /data/hs38_genes_b600.bed --bam /data/Sample_Lib_A376_2/Bowtie2/50K.bam --fasta /data/repository/organisms/GRCh38_ensembl/genome_fasta/genome.fa --out Sample_Lib_A376_2

Bowtie generated these bam files very quickly, but with the same computational resources nucleoatac ran for 4 days and was still running. It created the output file and kept modifying it. There were no error messages.

The bam file contains 40 million reads. Is it normal for it to run for a week? I am new to data analysis, so it would be nice if you could point me to the relevant settings that I should check.

Many Thanks!!
Ken

Python 2.7.5
NucleoATAC (0.3.1)
$ pip list
cycler (0.9.0)
Cython (0.23.4)
matplotlib (1.5.0)
numpy (1.10.1)
pip (1.4.1)
pyparsing (2.0.5)
pysam (0.8.3)
python-dateutil (2.4.2)
pytz (2015.7)
scipy (0.16.1)
setuptools (0.9.8)
six (1.10.0)
wsgiref (0.1.2)

@kenkclam
Author

Actually, I cannot run nucleoatac with all the active genes.
When I reduced the bed file a lot, it ran fine.

@AliciaSchep
Contributor

You can use the --cores flag to run on multiple cores and speed things up. For example, set --cores 10 to use 10 cores.
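
As a rough sketch, the command from the original post with --cores added could be assembled like this. The helper name `build_nucleoatac_cmd` and the short file names are hypothetical placeholders; only flags that appear in this thread are used.

```python
import subprocess

def build_nucleoatac_cmd(bed, bam, fasta, out, cores=1):
    """Assemble a `nucleoatac run` argument list (hypothetical helper)."""
    return [
        "nucleoatac", "run",
        "--bed", bed,
        "--bam", bam,
        "--fasta", fasta,
        "--out", out,
        "--cores", str(cores),
    ]

# Placeholder file names, not the paths from the original post.
cmd = build_nucleoatac_cmd("peaks.bed", "sample.bam", "genome.fa",
                           "Sample_out", cores=10)
print(" ".join(cmd))

# To actually launch the run (requires nucleoatac on PATH):
# subprocess.check_call(cmd)
```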

Even using multiple cores, the program is not intended to be run on a whole genome or anything close to that. As most of the genome will have very little coverage (and the little coverage there is will be background noise), running it on anything but open chromatin peaks is a waste of time.
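
One way to check whether a bed file is "close to whole genome" before starting a run is to add up how many base pairs it covers. The sketch below is only an illustration: `merged_bed_size` is a hypothetical helper, it assumes the first three columns are chromosome, start and end, and it counts overlapping regions once.

```python
def merged_bed_size(lines):
    """Return the base pairs covered by BED intervals, counting overlaps once."""
    intervals = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))

    total = 0
    for ivs in intervals.values():
        ivs.sort()
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total

# Toy input; chromosomes are named with numbers only, as in the post above.
example = ["3\t100\t700", "3\t600\t1300", "7\t0\t500"]
covered = merged_bed_size(example)
print(covered)
```

For a real file, pass an open file handle instead of the toy list. If the total is a large fraction of the roughly 3 billion base pairs of the human genome, the region set is probably much broader than a set of open chromatin peaks.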
