Fix identification of full length informative reads when using single end reads #133

timothymillar · 2020-05-20T01:53:51Z

If single end reads are used then TEF should still be able to use soft-clips (but not full danglers).

At the moment this fails because the extraction process incorrectly identifies unmapped single end reads as having a mapped mate.

This happens because read.mate_is_unmapped (used here) always returns False for a single end read, but the code currently assumes a paired end read, in which case it only returns False if the mate is mapped.

Because of this, the extraction script recognises an unmapped single end read as having a mapped mate and therefore treats the read as an informative full-length dangler.
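The failure mode can be reproduced from the SAM flag bits alone, without pysam. The following is a minimal sketch, not the TEF code: `naive_is_full_length_dangler` is a hypothetical reduction of the faulty check, and `mate_is_unmapped` here is a local helper that reads the 0x8 bit in the same way the pysam property does.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_PAIRED = 0x1         # read is one of a pair
FLAG_UNMAPPED = 0x4       # the read itself is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the mate is unmapped (only meaningful if paired)


def mate_is_unmapped(flag):
    """Local helper: True only if the mate-unmapped bit is set."""
    return bool(flag & FLAG_MATE_UNMAPPED)


def naive_is_full_length_dangler(flag):
    """Hypothetical reduction of the faulty check (assumes the read is paired)."""
    read_unmapped = bool(flag & FLAG_UNMAPPED)
    return read_unmapped and not mate_is_unmapped(flag)


paired_dangler = FLAG_PAIRED | FLAG_UNMAPPED  # unmapped read, mapped mate
single_unmapped = FLAG_UNMAPPED               # unmapped single end read, no mate

print(naive_is_full_length_dangler(paired_dangler))   # True, as intended
print(naive_is_full_length_dangler(single_unmapped))  # True, but there is no mate
```

The 0x8 bit is never set for a single end read, so "mate is not unmapped" cannot be taken to mean "mate is mapped" unless the paired bit has been checked first.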

Later on, the script calls read.next_reference_name to get the name of the element that the mate read mapped to, but this returns None for a single end read, which ultimately results in a pysam error:

Traceback (most recent call last):
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/bin/tef-extract-informative", line 12, in <module>
    main()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/bin/tef-extract-informative", line 8, in main
    Program.from_cli(sys.argv[1:]).run()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 249, in run
    self._run_pipeline()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 197, in _run_pipeline
    self.mate_element_tag)
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 512, in tag_danglers
    read.tags += [(tag, mate_element_dict[read.qname])]
  File "pysam/libcalignedsegment.pyx", line 2763, in pysam.libcalignedsegment.AlignedSegment.tags.__set__
  File "pysam/libcalignedsegment.pyx", line 2571, in pysam.libcalignedsegment.AlignedSegment.set_tags
  File "pysam/libcalignedsegment.pyx", line 418, in pysam.libcalignedsegment.pack_tags
ValueError: could not deduce typecode for value None

This can be avoided by checking whether a read has a mate with read.is_paired before trying to identify whether it is a full-length informative dangler.
Checking for soft-clips should always use read.reference_name instead of read.next_reference_name to avoid this issue.
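A minimal sketch of the proposed guard is shown here. It is illustrative only and not the actual TEF implementation: `StubRead` is a hypothetical stand-in so the example runs without pysam, and `is_full_length_dangler` is an assumed helper name. The three attributes it reads (`is_paired`, `is_unmapped`, `mate_is_unmapped`) are the ones exposed by pysam's AlignedSegment.

```python
from dataclasses import dataclass


@dataclass
class StubRead:
    """Hypothetical stand-in exposing the AlignedSegment attributes used below."""
    is_paired: bool
    is_unmapped: bool
    mate_is_unmapped: bool


def is_full_length_dangler(read):
    """Return True for an unmapped read whose mate is mapped.

    Single end reads are rejected up front, because mate_is_unmapped
    is always False for them and says nothing about a mate.
    """
    if not read.is_paired:
        return False
    return read.is_unmapped and not read.mate_is_unmapped


single_unmapped = StubRead(is_paired=False, is_unmapped=True, mate_is_unmapped=False)
paired_dangler = StubRead(is_paired=True, is_unmapped=True, mate_is_unmapped=False)
paired_both_unmapped = StubRead(is_paired=True, is_unmapped=True, mate_is_unmapped=True)

print(is_full_length_dangler(single_unmapped))       # False
print(is_full_length_dangler(paired_dangler))        # True
print(is_full_length_dangler(paired_both_unmapped))  # False
```

With this guard in place, a single end read never reaches the code path that looks up read.next_reference_name, so no None value is written as a tag.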

This issue is already partially resolved in the dev version, which prioritises soft-clips over full-length danglers, because the use of full-length reads can be disabled when using single end reads.

timothymillar added the bug label May 20, 2020
