Missing data for tetraploid multiparenting population #25

PaulaEB · 2022-12-01T15:46:47Z

Hello David,
Thanks for developing updog!
My project's goal is to identify QTLs for pest resistance. We have a multiparent population similar to a NAM population (4 pollen recipients and one pollen donor), so we have four half-sib families. We are currently treating each family separately, but I'd like to know your thoughts on whether it's possible to use the whole population for genotype calling.

My last question is about missing data in the geno field. In the multidog$inddf output we don't see any missing data; is this normal?

Thank you very much!
Paula E

dcgerard · 2022-12-09T15:59:39Z

Hey @PaulaEB,

Thanks for trying out {updog}!

I haven't gotten around to allowing for multiparent populations yet. Some things you can look into:

Are the genotypes estimated to be the same for the same parent for runs on different populations?
Are the sequencing error rates, allele biases, and overdispersions estimated to be about the same at the same SNP?

If the answer is yes to both, then combining the different populations would not help much. The benefit of a larger sample size is more accurate estimation of the parent genotypes and of those parameters.
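One way to check the second point is to fit each half-sib family separately and line up the per-SNP estimates. The sketch below assumes each run's SNP-level results are in a data frame with columns `snp`, `bias`, `seq`, and `od`, as in the `snpdf` element returned by `multidog()`. The helper name `compare_runs()` and the toy values are hypothetical, not part of {updog} and not real estimates.

```r
# Hypothetical helper: line up per-SNP estimates from two separate updog runs
# (e.g., the snpdf data frames from two half-sib families) and report how far
# apart they are. Assumes columns snp, bias, seq, and od in both inputs.
compare_runs <- function(snpdf_a, snpdf_b) {
  keep <- c("snp", "bias", "seq", "od")
  merged <- merge(snpdf_a[, keep], snpdf_b[, keep],
                  by = "snp", suffixes = c("_a", "_b"))
  data.frame(
    snp       = merged$snp,
    # bias is multiplicative, so compare it on the log2 scale
    log2_bias = log2(merged$bias_a / merged$bias_b),
    diff_seq  = merged$seq_a - merged$seq_b,
    diff_od   = merged$od_a - merged$od_b
  )
}

# Toy values standing in for two families' snpdf (made up for illustration)
fam1 <- data.frame(snp = c("s1", "s2"), bias = c(1.0, 0.8),
                   seq = c(0.002, 0.004), od = c(0.010, 0.020))
fam2 <- data.frame(snp = c("s2", "s1"), bias = c(0.8, 2.0),
                   seq = c(0.004, 0.002), od = c(0.020, 0.015))
cmp <- compare_runs(fam1, fam2)
cmp
```

SNPs whose log2 bias ratio or parameter differences stand out are where the families disagree; if these are all close to zero, that would support running the families separately. The same merge works for any other per-SNP column you want to compare.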

As for the missing data: if an individual has NA listed, then {updog} should return NA in the output. If it has 0 listed for the read-depth, then {updog} will impute the genotype from the prior distribution (which is the best you can do if you aren't using information from other SNPs). E.g., consider:

library(updog)

# Individual 3 has a read-depth of 0
refvec <- c(3, 4, 0, 8, 3)
sizevec <- c(10, 10, 0, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno
# Individual 3's posterior matches the estimated genotype distribution (the prior)
plot(fout$postmat[3, ], fout$gene_dist)
abline(0, 1)

# Individual 3 is now marked as missing (NA)
refvec <- c(3, 4, NA, 8, 3)
sizevec <- c(10, 10, NA, 10, 10)
fout <- flexdog(refvec = refvec, sizevec = sizevec, ploidy = 4)
fout$geno
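A zero read-depth yields a prior-based call because the probability of observing 0 reference reads out of 0 is 1 for every genotype, so Bayes' rule returns the prior unchanged. The base-R sketch below illustrates this with a deliberately simplified model: a plain binomial likelihood with reference-read probability k/ploidy, ignoring the sequencing error, allele bias, and overdispersion that `flexdog()` actually models. The function name `simple_posterior()` and the prior values are hypothetical.

```r
# Hypothetical, simplified genotype posterior for one individual.
# Assumes a plain binomial read model with P(ref read | k ref copies) = k / ploidy;
# flexdog() additionally models sequencing error, allele bias, and overdispersion.
simple_posterior <- function(ref, size, prior, ploidy = 4) {
  k <- 0:ploidy
  lik <- dbinom(ref, size = size, prob = k / ploidy)
  post <- prior * lik
  post / sum(post)
}

prior <- c(0.05, 0.25, 0.40, 0.25, 0.05)  # made-up prior over dosages 0..4

# Individual with reads: the data pull the posterior away from the prior
post_reads <- simple_posterior(ref = 3, size = 10, prior = prior)

# Individual with zero depth: dbinom(0, 0, p) = 1 for every p, so post == prior
post_zero <- simple_posterior(ref = 0, size = 0, prior = prior)
all.equal(post_zero, prior)  # TRUE
which.max(post_zero) - 1     # 2: the "call" is just the prior mode
```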

Best,
David

PaulaEB · 2023-11-03T16:33:54Z

Hello @dcgerard, many thanks for your clarification! I am returning to this data, but I would like to keep the missing values (0) as missing, since GATK marks missing values in DP as DP=0 (https://gatk.broadinstitute.org/hc/en-us/articles/6012243429531-GenotypeGVCFs-and-the-death-of-the-dot)

Is it possible to change that within updog, or should I do it in the VCF with another tool?

Thanks again
Paula

dcgerard · 2023-11-06T20:14:38Z

Hey @PaulaEB,

You can do that in R really easily.

E.g., suppose this is the matrix containing the read-depths:

sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE)

Then we can convert those 0's to NA's via:

sizemat[sizemat == 0] <- NA
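When the reference counts come from the same VCF, a zero in `sizemat` normally sits alongside a zero in `refmat`, so it is safest to mask both matrices at the same cells. The helper below is a hypothetical sketch (the name `mask_zero_depth()` is not part of {updog}, and the toy `refmat` is made up); it keeps the SNP and sample names, and the commented `multidog()` call only indicates where the masked matrices would go.

```r
# Hypothetical helper: treat zero total depth as missing in both count matrices.
# Assumes refmat and sizemat have the same dimensions (SNPs in rows, samples in columns).
mask_zero_depth <- function(refmat, sizemat) {
  stopifnot(identical(dim(refmat), dim(sizemat)))
  miss <- !is.na(sizemat) & sizemat == 0  # cells where GATK reported DP = 0
  refmat[miss] <- NA
  sizemat[miss] <- NA
  list(refmat = refmat, sizemat = sizemat, n_masked = sum(miss))
}

sizemat <- matrix(c(0, 1, 2, 1,
                    1, 0, 1, 1,
                    1, 2, 1, 0), ncol = 4, byrow = TRUE,
                  dimnames = list(paste0("snp", 1:3), paste0("ind", 1:4)))
refmat <- matrix(c(0, 1, 1, 0,
                   1, 0, 0, 1,
                   0, 2, 1, 0), ncol = 4, byrow = TRUE,
                 dimnames = dimnames(sizemat))
masked <- mask_zero_depth(refmat, sizemat)
masked$n_masked  # 3
# The masked matrices could then be passed on, e.g.
# multidog(refmat = masked$refmat, sizemat = masked$sizemat, ploidy = 4)
```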

Cheers,
David

dcgerard added the question label Dec 9, 2022
