Dynast count error: #6

Closed
mav-mit opened this issue Aug 13, 2021 · 7 comments

@mav-mit

mav-mit commented Aug 13, 2021

Despite running Dynast Count on the same data, there seems to be a difference between runs with "TC,GA", "GA", and "TC". There appears to be an increase in whichever conversion you call for (i.e., higher TC when you look for TC).

210809_dual_labeling.pdf

@Lioscro
Collaborator

Lioscro commented Aug 14, 2021

Hi, @MartinaVillanueva,
What exactly are you plotting here?
Are these the mutation rates?

Assuming those are what you are plotting here, it is likely due to how UMI deduplication works. When multiple reads with the same cell barcode and UMI that map to the same gene are observed, the read with the most conversions of interest is selected.

@Xiaojieqiu
Contributor

Thanks! My understanding is that when Martina calls for TC,GA (with the --conversion TC,GA argument in dynast count), the TC and GA mutation rates differ from those obtained when calling for TC or GA separately (with the --conversion TC or --conversion GA argument in dynast count). And when calling for TC or GA, the corresponding TC/GA mutation rate is higher than when you don't look for it. Is there any special treatment of the mutations requested via --conversion compared with the rest of the mutations?

@mav-mit
Author

mav-mit commented Aug 16, 2021

Exactly @Xiaojieqiu! Does that make sense @Lioscro ?

@mav-mit
Author

mav-mit commented Aug 16, 2021

Take a look at the GA conversion and how it is lower when we don't look for the conversion (last slide) vs. when we do look for it (the top 2 slides).

210809_dual_labeling.pdf

@Lioscro
Collaborator

Lioscro commented Aug 16, 2021

I see what you mean. This happens because, in the UMI deduplication step, which read is selected depends on the number of conversions (see my previous comment). When you supply --conversion TC,GA, the read with the most TC+GA conversions is selected; when you supply --conversion GA, the read with the most GA conversions is selected; and likewise, when you supply --conversion TC, the read with the most TC conversions is selected.
(To be exact, the order of priority is 1) the read that maps to the transcriptome (exon only), 2) the read that has the highest alignment score, 3) the read with the highest sum of the provided --conversion types.)

Does that make sense? So it seems that you have many reads per UMI that map to the same gene, do not map to exons only, have (equal) maximum alignment score, but have quite different conversion numbers.
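
To make the priority order above concrete, here is a minimal Python sketch. It is only an illustration of the selection rule as described in this comment, not dynast's actual code: the `Read` class, its field names, and `select_read` are hypothetical, and the read values are made up. It shows how the same group of reads (same cell barcode, UMI, and gene) can yield a different representative read depending on which conversions are requested.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence


@dataclass
class Read:
    """Hypothetical per-read record; field names are illustrative, not dynast's."""
    exon_only: bool          # True if the read maps to the transcriptome (exons only)
    alignment_score: int     # aligner-reported score
    conversions: Dict[str, int] = field(default_factory=dict)  # e.g. {"TC": 2, "GA": 0}


def select_read(reads: Sequence[Read], requested: Sequence[str]) -> Read:
    """Pick one read from a (cell barcode, UMI, gene) group.

    Priority: 1) exon-only reads, 2) highest alignment score,
    3) highest sum of the requested conversion counts.
    """
    def priority(read: Read):
        n_requested = sum(read.conversions.get(c, 0) for c in requested)
        return (read.exon_only, read.alignment_score, n_requested)

    return max(reads, key=priority)


# Made-up example: three reads tied on the first two criteria,
# so the requested conversions alone decide which read is kept.
group = [
    Read(exon_only=False, alignment_score=98, conversions={"TC": 3, "GA": 0}),
    Read(exon_only=False, alignment_score=98, conversions={"TC": 0, "GA": 2}),
    Read(exon_only=False, alignment_score=98, conversions={"TC": 1, "GA": 1}),
]

for requested in (["TC"], ["GA"], ["TC", "GA"]):
    chosen = select_read(group, requested)
    print(",".join(requested), "->", chosen.conversions)
```

In this toy group, requesting TC keeps the read with 3 T>C conversions, while requesting GA keeps the read with 2 G>A conversions. Because the tie-break always favors the read with the most requested conversions, the rate of whichever conversion is requested would tend to come out higher than when it is not requested.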

@mav-mit
Author

mav-mit commented Aug 23, 2021

I see. So the reason we see changes in the other conversions (see blue and yellow circles) is that selecting reads for the conversion of interest also changes which reads contribute to the background. Is that right?

Would you expect this to affect the accuracy of calling new / old transcripts?
210809_dual_labeling_2.pdf

@github-actions

github-actions bot commented Dec 4, 2021

This issue is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 4, 2021
@github-actions github-actions bot closed this as completed Dec 9, 2021