Error in performing process_atac #27
Comments
Hi there, sorry about the delay. It's hard to tell without looking at the actual file, but it sounds like the peaks file you are using may not be tab-separated (DAStk expects a tab as the delimiter). Could you check if that is the case?
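One way to check the delimiter is to count tab-separated fields on the first few lines. The following is only a sketch: the helper name `count_fields` and the sample lines are made up for illustration and are not part of DAStk.

```python
import io


def count_fields(lines, delimiter="\t", max_lines=5):
    """Return the field count of each of the first few non-empty lines."""
    counts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        counts.append(len(line.split(delimiter)))
        if len(counts) >= max_lines:
            break
    return counts


# Hypothetical sample data: the same record with tabs and with spaces.
tabbed = io.StringIO("chr1\t10090\t10206\tpeak_1\n")
spaced = io.StringIO("chr1 10090 10206 peak_1\n")

print(count_fields(tabbed))  # [4]
print(count_fields(spaced))  # [1]
```

For a real file, pass an open file handle instead of the `io.StringIO` objects. A count of 1 on every line means the line contains no tabs at all.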
I reformatted my .broadPeak file using awk to insert "\t" tab delimiters between columns. The error I now receive is: I am either not making it to the previous error or have solved that problem.
Please take a closer look at the README. If you have an internet connection, it should be reaching out to UCSC to get the chromosome sizes for the reference genome that you are using. Alternatively, you can provide your own file with the chromosome sizes (the README also has instructions on how to create it).
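The README's own instructions take precedence here. As a rough sketch of one common approach, a two-column chromosome sizes file can be derived from a FASTA index (`.fai`), whose first two tab-separated columns are the sequence name and its length. The function name and the sample rows below are illustrative, and the two-column layout is an assumption about what the tool expects.

```python
def chrom_sizes_from_fai(fai_lines):
    """Yield 'name<TAB>length' rows from the lines of a FASTA index."""
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        yield f"{fields[0]}\t{fields[1]}"


# Illustrative .fai rows (name, length, offset, bases per line, bytes per line).
sample_fai = [
    "chr1\t248956422\t6\t60\t61\n",
    "chr2\t242193529\t253105708\t60\t61\n",
]

sizes = list(chrom_sizes_from_fai(sample_fai))
for row in sizes:
    print(row)
```

Writing the yielded rows to a file, one per line, gives a tab-delimited table of chromosome names and lengths.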
I have created the genome.chrom.sizes file as described in the README and included the --chromosomes argument in the process_atac call, but I am still getting the same error. I uninstalled and reinstalled DAStk with pip. Is there an alternative installation method that may help?
At this point the easiest way to debug this would be if you could send us the exact commands that you are running, and at least a sample of the peaks file. One last thing to check: have you checked that all files involved (peaks, motif binding sites, etc.) are sorted by the same criteria?
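A quick way to test the sorting question is to verify that each file is ordered by chromosome name and then by numeric start. The sketch below assumes that ordering (plain string comparison on the name, as with a byte-wise sort) and tab-delimited input. The thread does not say which criteria DAStk requires, so treat the key as an assumption and adjust it if your files use a different order.

```python
def is_sorted_bed(lines):
    """Return True if BED-like lines are ordered by (chromosome, start)."""
    previous = None
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]))  # chromosome name, start position
        if previous is not None and key < previous:
            return False
        previous = key
    return True


# Made-up records for illustration.
ordered = ["chr1\t5\t10\n", "chr1\t20\t30\n", "chr2\t1\t2\n"]
shuffled = ["chr2\t1\t2\n", "chr1\t5\t10\n"]

print(is_sorted_bed(ordered))   # True
print(is_sorted_bed(shuffled))  # False
```

Running the same check on the peaks file and on each motif sites file shows whether they all pass under one ordering. A line without tabs raises an `IndexError` here, which also flags a delimiter problem.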
I double checked my sorting and they are all consistent. Here is what I am working with:
Other than setting directories, the only command I have is:

$CHROMSIZES directs to a file generated by:

$GENOME is at:

and $MOTIF was downloaded from your repository.

My input file (Sm25_ATACseq-Ctrl2_R1_peaks_clean_chr.broadPeak) was generated by macs2 callpeak and is only altered by intersecting out the blacklist and adding "chr" to the beginning of each line, so the sample output looks like:

I am using the RMACC Summit computing cluster and I have allocated sufficient memory, nodes and time for the project (I previously ran DAStk successfully on other samples). However, each time I submit now, I get the same error posted above:

I have tried the following so far:

I am worried that the problem may lie in my conda environment and the versions of Python and the Python libraries I am currently using (I have experienced this with other programs such as igvtools, which runs with certain versions but not others).

Thanks again for your help.
Sorry about the delay, I was presenting at a conference and missed this notification. Something else I didn't point out: the

The bottom line: try the above without specifying the
OK, I am back to receiving my original error (posted at the top of this thread: pandas.errors.ParserError: Too many columns specified: expected 3 and found 1), so it seems the chromosome sizes issue has been fixed.

chr1 10090 10206 Sm25_ATACseq-Ctrl2_R1_peak_1 26 +
I used the awk command
That’s odd, the error really does mean the input is not tab-delimited. The output from MACS is usually tab-delimited by default, so why did you pass it through awk? If you do need awk, you may have to specify the output field separator explicitly by adding OFS="\t". At this point, the easiest way forward would be to try the broadPeak file directly, or to email me the broadPeak file at ignacio.tripodi (at) colorado.edu so we can test it, because we can't reproduce the error.
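For comparison, the effect of setting the output field separator to a tab can be sketched in Python: split each line on runs of whitespace and rejoin the fields with single tabs. The helper name `retab` is made up, and the approach assumes that no field contains spaces.

```python
def retab(line):
    """Split a line on any run of whitespace and rejoin with single tabs."""
    return "\t".join(line.split())


# The space-separated record quoted earlier in this thread.
record = "chr1 10090 10206 Sm25_ATACseq-Ctrl2_R1_peak_1 26 +"
fixed = retab(record)

print(repr(fixed))
print(len(fixed.split("\t")))  # 6
```

Applying `retab` to every line of a file and writing the results out, one per line, gives a tab-delimited copy without touching the original.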
Closing this one. After discussing this offline, it looks like the problem was a zero-sized bed file in the motif sites directory.
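A check like the following could catch this kind of problem before a run. It is a minimal sketch: the function name, the `*.bed` pattern, and the file names in the demonstration are assumptions, so point it at your own motif sites directory and adjust the pattern to match your files.

```python
import tempfile
from pathlib import Path


def find_empty_files(directory, pattern="*.bed"):
    """Return sorted paths of zero-byte files matching pattern in directory."""
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if path.is_file() and path.stat().st_size == 0
    )


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "MOTIF_A.bed").write_text("chr1\t100\t120\n")
    Path(tmp, "MOTIF_B.bed").touch()  # zero bytes
    empty_names = [path.name for path in find_empty_files(tmp)]

print(empty_names)  # ['MOTIF_B.bed']
```

Any file the check reports can then be regenerated or removed before calling process_atac.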
Hi Margaret,
I am trying to process ATAC-seq data in the form of broadPeak files (generated from macs2) which are in bedGraph format for motif displacement. I have downloaded HOCOMOCO v11 hg38 motifs from your repository. The error I return is:
Traceback (most recent call last):
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/bin/process_atac", line 10, in <module>
    sys.exit(process_atac())
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/lib/python2.7/site-packages/DAStk/__init__.py", line 7, in process_atac
    p.main()
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/lib/python2.7/site-packages/DAStk/process_atac.py", line 253, in main
    [md_score, small_window, large_window, motif_site_count, heat] = get_md_score(filename, int(args.mp_threads), args.atac_peaks_filename, CHROMOSOMES, args.radius)
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/lib/python2.7/site-packages/DAStk/process_atac.py", line 133, in get_md_score
    CHROMOSOMES)
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/lib/python2.7/multiprocessing/pool.py", line 253, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/projects/agupta06@xsede.org/applications/conda/env/bio-env/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
pandas.errors.ParserError: Too many columns specified: expected 3 and found 1
My broadPeak files have 9 columns and appear similar to previous broadPeak files I have successfully analyzed with process_atac. Can you please advise as to where my error is?
Thanks