Fix identification of full length informative reads when using single end reads #133

Open
timothymillar opened this issue May 20, 2020 · 0 comments
@timothymillar
Collaborator

If single end reads are used then TEF should still be able to use soft-clips (but not full danglers).

At the moment this fails, though, because the extraction process incorrectly identifies unmapped single end reads as having a mapped mate.

This happens because read.mate_is_unmapped (used here) always returns False for a single end read, but the code currently assumes a paired end read, in which case it only returns False if the mate is mapped.
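The flag behaviour can be illustrated without pysam by looking at the SAM flag bits directly. This is only a minimal sketch: it assumes that read.mate_is_unmapped amounts to a plain test of bit 0x8, and the helper names below are hypothetical rather than part of TEFingerprint.

```python
# SAM flag bits as defined in the SAM specification.
FLAG_PAIRED = 0x1         # read is part of a pair
FLAG_UNMAPPED = 0x4       # the read itself is unmapped
FLAG_MATE_UNMAPPED = 0x8  # the mate is unmapped (only meaningful when paired)


def mate_is_unmapped(flag: int) -> bool:
    """Plain bit test on 0x8 (assumed to mirror read.mate_is_unmapped)."""
    return bool(flag & FLAG_MATE_UNMAPPED)


def naive_has_mapped_mate(flag: int) -> bool:
    """The flawed check: treats 'mate not flagged unmapped' as 'mate is mapped'."""
    return not mate_is_unmapped(flag)


# An unmapped single end read: no mate exists, so bit 0x8 is never set.
single_end_unmapped = FLAG_UNMAPPED
# An unmapped paired end read whose mate is mapped.
paired_mate_mapped = FLAG_PAIRED | FLAG_UNMAPPED
# An unmapped paired end read whose mate is also unmapped.
paired_mate_unmapped = FLAG_PAIRED | FLAG_UNMAPPED | FLAG_MATE_UNMAPPED

for name, flag in [
    ("single end, unmapped", single_end_unmapped),
    ("paired, mate mapped", paired_mate_mapped),
    ("paired, mate unmapped", paired_mate_unmapped),
]:
    print(name, "-> naive 'has mapped mate':", naive_has_mapped_mate(flag))
```

The single end read and the paired read with a mapped mate are indistinguishable to the naive check, which is the misclassification described here.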

Because of this, the extraction script recognises an unmapped single end read as having a mapped mate and therefore treats it as an informative full length dangler.

Later on, the script calls read.next_reference_name to get the name of the element that the mate read mapped to, but this returns None for a single end read, which ultimately results in a pysam error:

Traceback (most recent call last):
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/bin/tef-extract-informative", line 12, in <module>
    main()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/bin/tef-extract-informative", line 8, in main
    Program.from_cli(sys.argv[1:]).run()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 249, in run
    self._run_pipeline()
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 197, in _run_pipeline
    self.mate_element_tag)
  File "/workspace/appscratch/miniconda/cfltxm_tefingerprint_v0.3.2/lib/python3.7/site-packages/tefingerprint/_applications/extract_informative.py", line 512, in tag_danglers
    read.tags += [(tag, mate_element_dict[read.qname])]
  File "pysam/libcalignedsegment.pyx", line 2763, in pysam.libcalignedsegment.AlignedSegment.tags.__set__
  File "pysam/libcalignedsegment.pyx", line 2571, in pysam.libcalignedsegment.AlignedSegment.set_tags
  File "pysam/libcalignedsegment.pyx", line 418, in pysam.libcalignedsegment.pack_tags
ValueError: could not deduce typecode for value None

This can be avoided by checking whether a read has a pair with read.is_paired before trying to identify whether it is a full-length informative dangler.
Checking for soft-clips should always use read.reference_name instead of read.next_reference_name to avoid this issue.
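A rough sketch of what such a guard might look like is given below. It is not the actual TEFingerprint code: the function names are hypothetical, soft-clip detection itself is omitted, and the reads are simple stand-in objects that only carry the pysam attribute names mentioned in this issue (is_paired, is_unmapped, mate_is_unmapped, reference_name, next_reference_name).

```python
from types import SimpleNamespace


def is_full_length_dangler(read) -> bool:
    """True only for an unmapped read that genuinely has a mapped mate."""
    # Checking is_paired first stops single end reads from passing,
    # since their mate_is_unmapped flag is always False.
    return bool(read.is_paired and read.is_unmapped and not read.mate_is_unmapped)


def informative_element(read):
    """Return the element name to tag the read with, or None if there is none."""
    if is_full_length_dangler(read):
        # Only here is next_reference_name meaningful: the mate's element.
        return read.next_reference_name
    if not read.is_unmapped:
        # Soft-clip candidates use the element the read itself mapped to.
        return read.reference_name
    return None


# Stand-in reads (hypothetical values, for illustration only).
single_unmapped = SimpleNamespace(
    is_paired=False, is_unmapped=True, mate_is_unmapped=False,
    reference_name=None, next_reference_name=None,
)
single_mapped = SimpleNamespace(
    is_paired=False, is_unmapped=False, mate_is_unmapped=False,
    reference_name="element_A", next_reference_name=None,
)
paired_dangler = SimpleNamespace(
    is_paired=True, is_unmapped=True, mate_is_unmapped=False,
    reference_name=None, next_reference_name="element_B",
)

for read in (single_unmapped, single_mapped, paired_dangler):
    print(is_full_length_dangler(read), informative_element(read))
```

With this ordering an unmapped single end read is never tagged, so no None value would reach the tag assignment that raises the ValueError.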

This issue is already partially resolved in the dev version, which prioritises soft-clips over full length danglers, because the use of full-length reads can be disabled when using single end reads.
