Problem with de novo miRNA annotation with SS 4.0.2 #138

FlaviaPavan · 2023-08-23T10:43:11Z

Hi Dr Axtell,

I am predicting de novo miRNAs with SS 4.0.2, and the analysis failed for one of my BAM files (SS works perfectly for the others). I previously predicted miRNAs with SS 3.8.5 from the same BAM without any problems. Here is the error that appeared in the log:

Analyzing cluster properties using 2 threads
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 1778, in quant
for row in reader:
_csv.Error: field larger than field limit (131072)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 3588, in
qdata, pmir_bedfile = quant_controller(args, merged_bam, cluster_bed, read_count)
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/bin/ShortStack", line 1996, in quant_controller
q_results = pool.starmap(quant, q_iter)
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/eep/softwares/miniconda/envs/shortstack-4.0.2/lib/python3.10/multiprocessing/pool.py", line 774, in get
raise self._value
_csv.Error: field larger than field limit (131072)"

Can you tell me what the problem is?
Thanks,
Flavia Pavan

MikeAxtell · 2023-08-23T19:17:46Z

The error message _csv.Error: field larger than field limit (131072) indicates that you have some unusual lines in your BAM file. Specifically, it appears there are one or more lines where there are more than 131072 characters in one of the tab-delimited SAM fields. I can't see a reason why any field in a valid small RNA-seq BAM file would have over one hundred thousand characters. Did you make this BAM file with ShortStack's aligner? Are there very long reads in it? Are there header lines in the BAM that are extremely long?
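One way to check the questions above is to scan the SAM text for its longest tab-delimited field. The following is a minimal sketch, not part of ShortStack: `longest_field` is a hypothetical helper name, the records are made up, and it splits on tabs with plain string methods so it is not subject to the csv module's field limit.

```python
def longest_field(lines):
    """Return (length, line_number, column) of the longest tab-delimited field.

    line_number and column are 1-based; returns (0, 0, 0) for empty input.
    On ties, the first field encountered is kept.
    """
    best = (0, 0, 0)
    for line_number, line in enumerate(lines, start=1):
        for column, field in enumerate(line.rstrip("\n").split("\t"), start=1):
            if len(field) > best[0]:
                best = (len(field), line_number, column)
    return best


# Tiny in-memory example standing in for SAM text (made-up records).
seq = "ACGT" * 5 + "A"   # 21 nt, a typical small RNA length
qual = "I" * len(seq)    # matching quality string
demo = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "\t".join(["read1", "0", "chr1", "100", "255", "21M", "*", "0", "0", seq, qual]) + "\n",
]
length, line_number, column = longest_field(demo)
```

On a real file, the same function could be given `sys.stdin` while piping in the output of `samtools view -h`, assuming samtools is installed. A longest field of a few dozen characters would suggest that the BAM itself does not contain an oversized field.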

The 131072 character limit is a default for Python's csv parser, which ShortStack uses to quickly parse SAM data.
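The limit can be reproduced without any BAM file. This is a small sketch using only Python's standard csv module; `parse_tsv` is a hypothetical helper and does not reflect how ShortStack calls the parser internally.

```python
import csv
import io

# Current per-field limit of the csv module (131072 unless it has been changed).
default_limit = csv.field_size_limit()


def parse_tsv(text):
    """Parse tab-delimited text into rows with csv.reader (default quoting)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


# A short field parses normally.
ok_rows = parse_tsv("read1\t" + "A" * 100 + "\n")

# A field longer than the limit raises csv.Error.
try:
    parse_tsv("read1\t" + "A" * (default_limit + 1) + "\n")
    too_long_error = None
except csv.Error as exc:
    too_long_error = str(exc)
```

Here `too_long_error` holds the same "field larger than field limit" message that appears in the traceback above.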

FlaviaPavan · 2023-08-24T13:26:13Z

Thanks for your quick reply. My data are small RNA-seq; I use bowtie for mapping and give SS a sorted BAM file. I re-ran the mapping, checked that the new BAM was correct, and I still get the same problem when predicting miRNAs. I have used the new BAM file in other scripts, which work fine.
I would like to avoid mapping with SS because I want to keep the same pipeline for all my samples. Do you think I could solve the problem in another way?

MikeAxtell · 2023-08-24T13:28:47Z

I suspect the BAM is corrupt in some way .. it appears to have an unusually large field with more than 100000 characters in one or more lines. If you want to post it somewhere where I can get it (use my regular email not github) I can take a look.

MikeAxtell · 2023-11-25T14:46:48Z

Another user also reported this error (#144) ... I could not trace exactly why it happens (the csv fields are never really that large), but I did make a simple hack to fix it, as of commit 2b8c4c5. This will be included in the next release. Thanks again for the bug report.
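One way Python's csv parser can report an oversized field even though no real field is that long is its default quote handling. This is offered as a possible explanation under stated assumptions, not a confirmed diagnosis of this issue, and the thread does not say what commit 2b8c4c5 changed. With default settings, a field that begins with a double quote (a legal character in Phred+33 quality strings) opens a quoted field that continues across line breaks until a closing quote is found. The sketch below uses made-up records to show the effect and how `quoting=csv.QUOTE_NONE` avoids it.

```python
import csv
import io

# Made-up tab-delimited records. The last field of the first record starts
# with a double quote, as a quality string legitimately can.
text = (
    'read1\tACGT\t"III\n'
    "read2\tACGT\tIIII\n"
    "read3\tACGT\tIIII\n"
)

# Default quoting: the leading quote opens a quoted field that swallows the
# following lines, so everything ends up in a single row.
default_rows = list(csv.reader(io.StringIO(text), delimiter="\t"))

# QUOTE_NONE: quote characters are treated as ordinary data, so each line
# is parsed as its own row.
plain_rows = list(
    csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
)
```

In a real file with millions of alignments, such a merged field could easily exceed 131072 characters. Raising the limit with `csv.field_size_limit` is another commonly used workaround, though it would not stop lines from being merged if quoting were the cause.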

MikeAxtell mentioned this issue Nov 21, 2023

_csv.Error: field larger than field limit (131072) #144

Closed

MikeAxtell closed this as completed Nov 25, 2023
