Add support for multi-generational pedigrees #388

Closed
arq5x opened this Issue Mar 9, 2015 · 15 comments

@arq5x
Owner

arq5x commented Mar 9, 2015

Currently, we only support two-generation nuclear families with the auto_* and de_novo tools.

@arq5x arq5x added the critical label Mar 9, 2015

@arq5x arq5x added this to the 0.15.0 milestone Apr 23, 2015

@brentp

Collaborator

brentp commented May 5, 2015

Here are the criteria we had decided:

autosomal recessive:

  • anyone who is not affected must not be homalt
  • anyone who is affected must be homalt

autosomal dominant:

  • anyone who is affected must be het
  • anyone who isn't affected must not be het or homalt.

de novo:

  • count number of kids that have it. (what about the following generation?)

Shall these be the defaults? E.g., if they choose auto_rec and it's a multi-generation pedigree, should it only output a variant if all family members match? Or should there be a flag to require the more stringent tests?
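As a rough illustration of the recessive and dominant criteria listed above, here is a minimal Python sketch. The genotype constants, the `(affected, genotype)` tuple layout, and the function names are assumptions made for this example; this is not gemini's actual code.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def fits_auto_rec(family):
    """family: list of (affected, genotype) pairs covering every generation."""
    # Affected must be hom-alt; unaffected must not be hom-alt.
    return all((gt == HOM_ALT) == affected for affected, gt in family)


def fits_auto_dom(family):
    """Same input layout as fits_auto_rec."""
    # Affected must be het; unaffected must be neither het nor hom-alt.
    return all(
        gt == HET if affected else gt not in (HET, HOM_ALT)
        for affected, gt in family
    )


# Three generations: affected grandmother, affected father,
# unaffected mother, affected child.
dominant_family = [(True, HET), (True, HET), (False, HOM_REF), (True, HET)]
# Carrier parents with one affected child.
recessive_family = [(False, HET), (False, HET), (True, HOM_ALT)]
```

Because both checks iterate over every sample, the same function covers a nuclear family and a multi-generation pedigree; "only output if all family members match" is then just the return value of the check.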

@arq5x

Owner

arq5x commented May 5, 2015

If you have time, we would love feedback from @jxchong, @dgaston and others about this idea. One complexity is family members with an unknown phenotype, but I think the logic still stands. For example, for auto_rec, unknowns would still be lumped into "not affected".

@jxchong

Contributor

jxchong commented May 5, 2015

Need to handle compound hets too, in which case:

compound heterozygote:

  • anyone who is affected must be compound het for the same comphet id
  • anyone who is unaffected cannot be homalt at any of the variants in the comp_het id found in the affecteds or comphet for the same comphet id as the affected
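A minimal sketch of these two compound-het rules for a single candidate pair of variants within one family. It assumes illustrative genotype codes and a hypothetical `(affected, gt1, gt2)` layout, and it treats "het at both sites" as compound het without checking phase, which a real implementation would need to address.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def fits_comp_het(family):
    """family: list of (affected, gt_at_variant_1, gt_at_variant_2)."""
    for affected, gt1, gt2 in family:
        # Phase is ignored here: het at both sites counts as compound het.
        is_comp_het = gt1 == HET and gt2 == HET
        if affected and not is_comp_het:
            return False
        # Unaffected: no hom-alt at either variant, and not the same comp het.
        if not affected and (is_comp_het or HOM_ALT in (gt1, gt2)):
            return False
    return True


# Each parent carries one variant; the affected child carries both.
trio = [(False, HET, HOM_REF), (False, HOM_REF, HET), (True, HET, HET)]
```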

I think autosomal dominant should be multi-generation by default. Not clear if recessive should be multi-generation by default because you don't expect to see multi-generational recessive except possibly for very common phenotypes and/or highly consanguineous families. I lean towards yes, making multi-generational by default because I think as a user, I would assume that it is always doing multi-generational unless told otherwise.

De novos that are passed on to the next generation are then dominant -- I don't think gemini can currently handle this, but it would certainly be nice... Also, I don't know that I would restrict de novo to n=1 among the offspring, because of gonadal mosaicism, which could appear as unaffected, homref parents and 1 or more affected offspring. Alternatively, even somatic mosaicism in a parent could appear as unaffected homref parents and 1+ affected offspring. Requiring de novo to mean exactly n=1 kids seems artificially limiting and prevents you from looking for de novo mosaics. Instead I would do:

de novo:

  • anyone who is affected should be het, all others should be ref
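A minimal sketch of that de novo rule, assuming illustrative genotype codes and a hypothetical `(affected, genotype)` layout. Requiring at least one affected sample is an extra assumption of this sketch, not part of the rule as stated.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def fits_de_novo(family):
    """family: list of (affected, genotype) pairs."""
    affected_gts = [gt for affected, gt in family if affected]
    other_gts = [gt for affected, gt in family if not affected]
    return (
        bool(affected_gts)  # assumption: need at least one affected sample
        and all(gt == HET for gt in affected_gts)
        and all(gt == HOM_REF for gt in other_gts)
    )


# Possible gonadal mosaicism: hom-ref parents, two affected het kids.
mosaic_family = [(False, HOM_REF), (False, HOM_REF), (True, HET), (True, HET)]
```

Note that the check does not count affected offspring, so a family with more than one affected het child still passes.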
@jmcelwee

jmcelwee commented May 6, 2015

I agree that dominant needs to be multi-generational. I also think recessive should have the ability to be multi-generational, as many families being analyzed with this tool are highly consanguineous.

I would think any de novo that's passed down through a pedigree then would appropriately fit the dominant model (at least according to the criteria above), so probably wouldn't need a specific analysis in that case.

@oleraj

oleraj commented May 6, 2015

Aaron, one thought regarding "family members with an unknown phenotype". I posted an example a little while ago in the google group of how I expected the built-in tools to handle this: https://groups.google.com/forum/#!searchin/gemini-variation/unknown$20phenotype/gemini-variation/FtUvjsL7UUg/Nle09N2VrV4J

Basically, it seems to me that individuals with unknown phenotype shouldn't contribute to the filtering. They should be allowed to have any genotype. Sometimes we don't know if an individual has the phenotype yet -- e.g., late onset, variable penetrance, or else they have a similar phenotype but not quite the same so we don't want to use that individual for strict filtering. But we would like to include them in the analysis so that when we look at the candidate genes, we can look at the genotypes of the unknowns along-side those who have the phenotype.
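To make the difference concrete, here is a small sketch contrasting the two treatments of unknown phenotypes for the recessive check. The `None` encoding for "unknown", the genotype codes, and the `unknown_as_unaffected` parameter are all assumptions made up for this example.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def fits_auto_rec(family, unknown_as_unaffected=False):
    """family: list of (phenotype, genotype); phenotype is True, False or None."""
    for phenotype, gt in family:
        if phenotype is None:
            if not unknown_as_unaffected:
                continue  # unknowns may carry any genotype
            phenotype = False  # lump unknowns in with the unaffected
        if (gt == HOM_ALT) != phenotype:
            return False
    return True


# Carrier parents, one affected child, one hom-alt sibling of unknown status
# (e.g. late onset).
family = [(False, HET), (False, HET), (True, HOM_ALT), (None, HOM_ALT)]
```

With unknowns skipped, the variant is kept and the sibling's genotype can still be displayed alongside the affected samples; with unknowns lumped into "not affected", the same variant is filtered out.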

@jxchong

Contributor

jxchong commented May 6, 2015

completely 100% agree with @oleraj

@arq5x

Owner

arq5x commented May 7, 2015

Thanks all!

compound heterozygote:

Just to clarify: the rule that "anyone who is affected must be compound het for the same comphet id" applies only within a given family. The same gene can be affected by other compound hets in different families and still meet the --min-kindreds criteria. The rule that "anyone who is unaffected cannot be homalt at any of the variants in the comp_het id found in the affecteds or comphet for the same comphet id as the affected" is indeed an enhancement we need to make.

autosomal recessive and dominant:

It seems to me that both should be multi-generational by default.

de novo:

The rule that "anyone who is affected should be het, all others should be ref" works in the context of disease mapping, but not as a general method for finding all spontaneous mutations in a non-disease context. Perhaps the default should be in a disease context using these rules and we can have a switch for the general case.

unknown phenotype:

Thanks @oleraj, this makes sense

@brentp brentp modified the milestones: 0.15.0, 0.16.0 May 18, 2015

@brentp

Collaborator

brentp commented May 19, 2015

Hi All, we're revamping the inheritance stuff following your suggestions. I put up a compilation of what it's doing here:

https://gist.github.com/brentp/4b3cbeebfaa7360b5ce6

Please let me know any suggestions / changes / errors as this is a large enough change that we would like to cover most uses and fix existing problems.

@brentp

Collaborator

brentp commented May 21, 2015

I'm working through these and the comments on the gist.

Here's another question. Is this de novo?

mom: HOM_ALT
dad: HOM_ALT
kid: HET

seems so since the choice of reference is arbitrary.
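A minimal sketch of a trio check that is symmetric with respect to the reference allele, assuming illustrative genotype codes and a hypothetical function name: both parents share the same homozygous genotype and the kid is het.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def is_de_novo_trio(mom, dad, kid):
    """True if both parents are homozygous for the same allele and the kid is het."""
    parents_same_hom = mom == dad and mom in (HOM_REF, HOM_ALT)
    return parents_same_hom and kid == HET
```

For example, `is_de_novo_trio(HOM_ALT, HOM_ALT, HET)` and `is_de_novo_trio(HOM_REF, HOM_REF, HET)` are treated the same way, while a HOM_REF x HOM_ALT cross with a het kid is ordinary Mendelian transmission and is rejected.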

@jxchong

Contributor

jxchong commented May 21, 2015

Absolutely IMO.


@arq5x

Owner

arq5x commented May 22, 2015

Yep, I agree. This logic exists in the current de_novo tool.

@brentp

Collaborator

brentp commented May 22, 2015

yes, it's in current tool, I adjusted. how about this case:

  # dad:un, mom:un, kid:aff, kid2:un
  >>> fam.gt_types = [HOM_REF, HOM_REF, HET, HET]

should that be de_novo? how should it depend on strict?
we had that all unaffected must be hom_ref (or hom_alt).

what about this under non-strict? (unaffected kid has de-novo):

 >>> fam.gt_types = [HOM_REF, HOM_REF, HOM_REF, HET]
@jxchong

Contributor

jxchong commented May 22, 2015

Under strict, for the purposes of Mendelian filtering, it's not de novo because the 2nd kid is unaffected.

Under non-strict, I'm not sure. Again, for Mendelian analysis, you don't care about variants that are de novo in unaffected kids. Not sure about whether people use the built-in inheritance tools for other purposes?

Maybe you solve this by having a --only-affected option for de_novo?
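One hypothetical way to read the strict and only-affected switches for the two cases above, as a sketch. The function name, parameters, and sample names are made up for illustration, the genotype codes are assumed, and parents are simplified to hom-ref only; this is not how gemini implements `--only-affected`.

```python
# Illustrative genotype codes; the values are assumed for this sketch.
HOM_REF, HET, UNKNOWN, HOM_ALT = 0, 1, 2, 3


def de_novo_kids(parents, kids, strict=True, only_affected=True):
    """parents: list of genotypes; kids: list of (name, affected, genotype).

    Returns the names of kids reported as carrying a candidate de novo.
    """
    # Simplification: only hom-ref parents are considered here.
    if not all(gt == HOM_REF for gt in parents):
        return []
    # Strict: any unaffected kid carrying the variant rules it out.
    if strict and any(gt != HOM_REF for _, affected, gt in kids if not affected):
        return []
    return [
        name
        for name, affected, gt in kids
        if gt == HET and (affected or not only_affected)
    ]


parents = [HOM_REF, HOM_REF]
both_het = [("kid", True, HET), ("kid2", False, HET)]
unaffected_het = [("kid", True, HOM_REF), ("kid2", False, HET)]
```

Under this reading, `both_het` is rejected when strict and reports only `kid` when non-strict, while `unaffected_het` is reported only when both strict and only-affected are turned off.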

@brentp

Collaborator

brentp commented May 22, 2015

yeah, I noticed again after posting that's exactly what current gemini does with --only-affected.

@brentp

Collaborator

brentp commented Jun 15, 2015

The current state of development along with a representation that we are using for testing is:
https://github.com/brentp/gemini/blob/inheritance-revamp/inheritance.ipynb

We will continue testing since this is a major change.

In addition:

  • phase-by-transmission is implemented and will be the default for the comp_het tool.
  • all tools will use bcolz when available.

@brentp brentp closed this Nov 6, 2015
