Problem with de novo miRNA annotation with SS 4.0.2 #138

Closed
FlaviaPavan opened this issue Aug 23, 2023 · 4 comments

@FlaviaPavan

Hi Dr Axtell,

I am predicting de novo miRNAs with SS 4.0.2, and the analysis failed for one of my BAM files (SS works perfectly for the others). I previously predicted miRNAs with SS 3.8.5 from the same BAM file without any problems. Here is the error that appeared in the log:

Analyzing cluster properties using 2 threads
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 1778, in quant
    for row in reader:
_csv.Error: field larger than field limit (131072)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 3588, in <module>
    qdata, pmir_bedfile = quant_controller(args, merged_bam, cluster_bed, read_count)
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 1996, in quant_controller
    q_results = pool.starmap(quant, q_iter)
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
_csv.Error: field larger than field limit (131072)

Can you tell me what the problem is?
Thanks,
Flavia Pavan

@MikeAxtell
Owner

MikeAxtell commented Aug 23, 2023

The error message _csv.Error: field larger than field limit (131072) indicates that you have some unusual lines in your BAM file. Specifically, one or more lines appear to have more than 131072 characters in a single tab-delimited SAM field. I can't see a reason why any field in a valid small RNA-seq BAM file would have over one hundred thousand characters. Did you make this BAM file with ShortStack's aligner? Are there very long reads in it? Are there header lines in the BAM that are extremely long?

The 131072 character limit is a default for Python's csv parser, which ShortStack uses to quickly parse SAM data.
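
As a rough illustration of that default, the sketch below reads the limit with `csv.field_size_limit()`, triggers the same error on a synthetic tab-delimited line, and then parses the line again with a temporarily raised limit. The helper `parse_tsv` and the test data are hypothetical and are not taken from ShortStack's code.

```python
import csv
import io

# Per-field limit of Python's csv module; 131072 in a fresh interpreter.
default_limit = csv.field_size_limit()


def parse_tsv(text, limit=None):
    """Parse tab-delimited text, optionally with a temporary field limit."""
    old = csv.field_size_limit()
    if limit is not None:
        csv.field_size_limit(limit)
    try:
        return list(csv.reader(io.StringIO(text), delimiter="\t"))
    finally:
        # Restore the previous limit so other parsing is unaffected.
        csv.field_size_limit(old)


# Synthetic line whose second field is one character over the limit.
long_line = "read1\t" + "A" * (default_limit + 1) + "\n"

try:
    parse_tsv(long_line)
    raised = False
except csv.Error as exc:
    raised = True
    print(exc)

# With a larger limit, the same line parses without error.
rows = parse_tsv(long_line, limit=10 * default_limit)
print(len(rows[0][1]))
```

The limit is a process-wide setting, so code that parses in worker processes may need to set it in each worker, depending on how the workers are started.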

@FlaviaPavan
Author

Thanks for your quick reply. My data are small RNA-seq. I use bowtie for mapping and give SS a sorted BAM file. I have re-run the mapping and checked that the new BAM file is correct, but I still have the same problem when predicting miRNAs. The new BAM file works fine in my other scripts.
I would like to avoid mapping with SS because I want to keep the same pipeline for all my samples. Do you think I could solve the problem in another way?

@MikeAxtell
Owner

MikeAxtell commented Aug 24, 2023 via email

@MikeAxtell
Owner

Another user also reported this error (#144) ... I could not trace exactly why it happens (the csv fields are never really that large), but I did make a simple hack to fix it, as of commit 2b8c4c5. This will be included in the next release. Thanks again for the bug report.
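
One possible explanation, which is an assumption and not something confirmed in this thread, is the csv module's default quote handling: a field that begins with `"` (a valid Phred+33 quality character) opens a quoted field that keeps absorbing tabs and newlines until the next `"`, so the parser can see a huge "field" even though every real SAM field is short. The sketch below reproduces that on synthetic SAM-like lines and shows that `quoting=csv.QUOTE_NONE` avoids it. The helpers `sam_line` and `count_rows` are hypothetical, and this is not necessarily what commit 2b8c4c5 changed.

```python
import csv
import io


def sam_line(name, qual):
    """Build one synthetic line with the 11 mandatory SAM columns."""
    cols = [name, "0", "chr1", "100", "255", "5M", "*", "0", "0", "ACGTA", qual]
    return "\t".join(cols) + "\n"


# The first read's QUAL starts with a double quote; the rest are ordinary.
lines = [sam_line("read0", '"IIII')]
lines += [sam_line(f"read{i}", "IIIII") for i in range(1, 5001)]
text = "".join(lines)


def count_rows(text, **reader_kwargs):
    """Count the rows csv.reader yields for tab-delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", **reader_kwargs)
    return sum(1 for _ in reader)


# Default quoting: the opening quote swallows the following lines
# until the accumulated field exceeds the limit.
try:
    count_rows(text)
    default_error = None
except csv.Error as exc:
    default_error = str(exc)

# QUOTE_NONE treats the double quote as an ordinary character.
rows_no_quoting = count_rows(text, quoting=csv.QUOTE_NONE)

# The longest real field, measured with a plain split on tabs.
longest_real_field = max(
    len(field) for line in lines for field in line.rstrip("\n").split("\t")
)

print(default_error)
print(rows_no_quoting, longest_real_field)
```

If this is the cause, raising the field limit would only hide the error while rows are silently merged, whereas disabling quoting keeps one row per SAM line.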
