Skip to content

Commit

Permalink
add --families to auto_*, de_novo, and comp_hets
Browse files Browse the repository at this point in the history
  • Loading branch information
arq5x committed Feb 18, 2015
1 parent 0d89f70 commit bd24731
Show file tree
Hide file tree
Showing 15 changed files with 527 additions and 6 deletions.
56 changes: 56 additions & 0 deletions docs/content/tools.rst
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,19 @@ only individuals with an affected phenotype using the ``--only-affected`` option
$ gemini comp_hets --only-affected my.db
--------------------
``--families``
--------------------
By default, candidate compound heterozygous variants are reported for families
in the database. One can restrict the analysis to variants in
specific familes with the ``--families`` option. Families should be provided
as a comma-separated list

.. code-block:: bash
$ gemini comp_hets --families 1 my.db
$ gemini comp_hets --families 1,7 my.db
---------------------
``--ignore-phasing``
Expand Down Expand Up @@ -306,6 +319,20 @@ only individuals with an affected phenotype using the ``--only-affected`` option
$ gemini de_novo --only-affected my.db
--------------------
``--families``
--------------------
By default, candidate de novo variants are reported for families
in the database. One can restrict the analysis to variants in
specific familes with the ``--families`` option. Families should be provided
as a comma-separated list

.. code-block:: bash
$ gemini de_novo --families 1 my.db
$ gemini de_novo --families 1,7 my.db
============================================================================
``autosomal_recessive``: Find variants meeting an autosomal recessive model.
============================================================================
Expand Down Expand Up @@ -411,6 +438,20 @@ to be the _same_ variant) observed in at least two kindreds, use the following:
1 1_dad(father; unaffected),1_mom(mother; unaffected),1_kid(child; affected) T/C,T/C,C/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH
2 2_dad(father; unaffected),2_mom(mother; unaffected),2_kid(child; affected) T/C,T/C,C/C 59,49,64 WDR37 chr10 1142207 1142208 T C stop_loss HIGH
--------------------
``--families``
--------------------
By default, candidate autosomal recessive variants are reported for families
in the database. One can restrict the analysis to variants in
specific familes with the ``--families`` option. Families should be provided
as a comma-separated list

.. code-block:: bash
$ gemini autosomal_recessive --families 1 my.db
$ gemini autosomal_recessive --families 1,7 my.db
---------------------
``--filter``
---------------------
Expand Down Expand Up @@ -577,6 +618,21 @@ to be the _same_ variant) observed in at least two kindreds, use the following:
2 2_dad(father; unaffected),2_mom(mother; affected),2_kid(child; affected) T/T,T/C,T/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) T/C,T/T,T/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH
--------------------
``--families``
--------------------
By default, candidate autosomal dominant variants are reported for families
in the database. One can restrict the analysis to variants in
specific familes with the ``--families`` option. Families should be provided
as a comma-separated list

.. code-block:: bash
$ gemini autosomal_dominant --families 1 my.db
$ gemini autosomal_dominant --families 1,7 my.db
---------------------
``--filter``
---------------------
Expand Down
9 changes: 8 additions & 1 deletion gemini/gemini_inheritance_model_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -53,12 +53,19 @@ def _report_candidates(self):
"\t",
print row

def _cull_families(self):
"""
If the user has asked to restric the analysis to a specific set
of families, then we need to prune the list of possible families
to that specific subset.
"""

def _get_family_info(self):
"""
Extract the relevant genotype filters, as well all labels
for each family in the database.
"""
families = subjects.get_families(self.args.db)
families = subjects.get_families(self.args.db, self.args.families)
self.family_ids = []
self.family_masks = []
self.family_gt_labels = []
Expand Down
18 changes: 17 additions & 1 deletion gemini/gemini_main.py
Original file line number Diff line number Diff line change
Expand Up @@ -585,9 +585,13 @@ def dbinfo_fn(parser, args):
parser_comp_hets.add_argument('--only-affected',
dest='only_affected',
action='store_true',
help='Report solely those compund heterozygotes impacted a sample \
help='Report solely those compound heterozygotes impacted a sample \
labeled as affected.',
default=False)
parser_comp_hets.add_argument('--families',
dest='families',
help='Restrict analysis to a specific set of 1 or more (comma) separated) families',
default=None)
parser_comp_hets.add_argument('--ignore-phasing',
dest='ignore_phasing',
action='store_true',
Expand Down Expand Up @@ -764,6 +768,10 @@ def lof_interactions_fn(parser, args):
type=int,
default=1,
help='The min. number of kindreds that must have a candidate variant in a gene.')
parser_auto_rec.add_argument('--families',
dest='families',
help='Restrict analysis to a specific set of 1 or more (comma) separated) families',
default=None)
parser_auto_rec.add_argument('-d',
dest='min_sample_depth',
type=int,
Expand Down Expand Up @@ -799,6 +807,10 @@ def autosomal_recessive_fn(parser, args):
type=int,
default=1,
help='The min. number of kindreds that must have a candidate variant in a gene.')
parser_auto_dom.add_argument('--families',
dest='families',
help='Restrict analysis to a specific set of 1 or more (comma) separated) families',
default=None)
parser_auto_dom.add_argument('-d',
dest='min_sample_depth',
type=int,
Expand Down Expand Up @@ -841,6 +853,10 @@ def autosomal_dominant_fn(parser, args):
help='Report solely those de novos that impact a sample \
labeled as affected.',
default=False)
parser_de_novo.add_argument('--families',
dest='families',
help='Restrict analysis to a specific set of 1 or more (comma) separated) families',
default=None)
parser_de_novo.add_argument('-d',
dest='min_sample_depth',
type=int,
Expand Down
20 changes: 17 additions & 3 deletions gemini/gemini_subjects.py
Original file line number Diff line number Diff line change
Expand Up @@ -653,7 +653,7 @@ def get_subject_depth_labels(self):
return subjects


def get_families(db):
def get_families(db, selected_families=None):
"""
Query the samples table to return a list of Family
objects that each contain all of the Subjects in a Family.
Expand All @@ -668,6 +668,8 @@ def get_families(db):
ORDER BY family_id"
c.execute(query)

# create a mapping of family_id to the list of
# individuals that are members of the family.
families_dict = {}
for row in c:
subject = Subject(row)
Expand All @@ -678,10 +680,22 @@ def get_families(db):
families_dict[family_id] = []
families_dict[family_id].append(subject)

# if the user has specified a set of selected families
# to which the analysis should be restricted, then
# first sanity check that the family ids they specified are valid.
if selected_families is not None:
for family in selected_families.split(','):
if family not in families_dict:
sys.exit("ERROR: family \"%s\" is not a valid family_id\n" % family)

families = []
for fam in families_dict:
family = Family(families_dict[fam])
families.append(family)
if selected_families is None:
family = Family(families_dict[fam])
families.append(family)
elif fam in selected_families:
family = Family(families_dict[fam])
families.append(family)
return families

def get_family_dict(args):
Expand Down
6 changes: 6 additions & 0 deletions gemini/tool_compound_hets.py
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,12 @@ def get_compound_hets(args):
for idx, gt_type in enumerate(gt_types):
if gt_type == HET:
sample = idx_to_sample[idx]

# make sure the sample is in the right family if the
# user has asked for specific families to be analyzed.
if args.families is not None:
if subjects_dict[sample].family_id not in args.families:
continue

if args.only_affected and not subjects_dict[sample].affected:
continue
Expand Down
1 change: 1 addition & 0 deletions test/data-setup.sh
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ gemini load --skip-gene-tables --test-mode -v test.region.vep.vcf --skip-gerp-bp
gemini load --skip-gene-tables --test-mode -v test.burden.vcf --skip-gerp-bp --skip-cadd -t VEP -p test.burden.ped test.burden.db
gemini load --skip-gene-tables --test-mode -v test.comp_het.vcf --skip-gerp-bp --skip-cadd -t snpEff -p test.comp_het.ped test.comp_het.db
gemini load --skip-gene-tables --test-mode -v test.comp_het.2.vcf --skip-gerp-bp --skip-cadd -t snpEff test.comp_het_default.db
gemini load --skip-gene-tables --test-mode -v test.comp_het.3.vcf --skip-gerp-bp --skip-cadd -t snpEff -p test.comp_het.ped test.comp_het_default.2.db
gemini load --skip-gene-tables --test-mode -v test.auto_dom.vcf --skip-gerp-bp --skip-cadd -t snpEff -p test.auto_dom.ped test.auto_dom.db
gemini load --skip-gene-tables --test-mode -v test.auto_dom.no_parents.vcf --skip-gerp-bp --skip-cadd -t snpEff -p test.auto_dom.no_parents.ped test.auto_dom.no_parents.db
gemini load --skip-gene-tables --test-mode -v test.auto_dom.no_parents.2.vcf --skip-gerp-bp --skip-cadd -t snpEff -p test.auto_dom.no_parents.2.ped test.auto_dom.no_parents.2.db
Expand Down
57 changes: 57 additions & 0 deletions test/test-auto-dom.sh
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
check()
{
if diff $1 $2; then
echo ok
else
echo fail
fi
}
export -f check
###################################################################
# 1. Test basic auto_dominant functionality
###################################################################
Expand Down Expand Up @@ -290,3 +299,51 @@ gemini autosomal_dominant \
test.auto_dom.no_parents.5.db &> obs
check obs exp
rm obs exp

###################################################################
# 17. Test with --families
###################################################################
echo " auto_dom.t17...\c"
echo "family_id family_members family_genotypes family_genotype_depths gene chrom start end ref alt impact impact_severity
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) C/T,C/C,C/T 39,29,24 ASAH2C chr10 48003991 48003992 C T non_syn_coding MED
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) C/T,C/C,C/T 39,29,24 ASAH2C chr10 48004991 48004992 C T non_syn_coding MED
2 2_dad(father; unaffected),2_mom(mother; affected),2_kid(child; affected) C/C,C/T,C/T 39,29,24 ASAH2C chr10 48003991 48003992 C T non_syn_coding MED
2 2_dad(father; unaffected),2_mom(mother; affected),2_kid(child; affected) C/C,C/T,C/T 39,29,24 ASAH2C chr10 48004991 48004992 C T non_syn_coding MED
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) G/A,G/G,G/A 39,29,24 SPRN chr10 135336655 135336656 G A intron LOW
2 2_dad(father; unaffected),2_mom(mother; affected),2_kid(child; affected) T/T,T/C,T/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) T/C,T/T,T/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH" > exp
gemini autosomal_dominant \
--columns "gene, chrom, start, end, ref, alt, impact, impact_severity" \
--families 2,3 \
test.auto_dom.db &> obs
check obs exp
rm obs exp

###################################################################
# 18. Test with --families
###################################################################
echo " auto_dom.t18...\c"
echo "family_id family_members family_genotypes family_genotype_depths gene chrom start end ref alt impact impact_severity
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) C/T,C/C,C/T 39,29,24 ASAH2C chr10 48003991 48003992 C T non_syn_coding MED
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) C/T,C/C,C/T 39,29,24 ASAH2C chr10 48004991 48004992 C T non_syn_coding MED
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) G/A,G/G,G/A 39,29,24 SPRN chr10 135336655 135336656 G A intron LOW
3 3_dad(father; affected),3_mom(mother; unknown),3_kid(child; affected) T/C,T/T,T/C 39,29,24 WDR37 chr10 1142207 1142208 T C stop_loss HIGH" > exp
gemini autosomal_dominant \
--columns "gene, chrom, start, end, ref, alt, impact, impact_severity" \
--families 3 \
test.auto_dom.db &> obs
check obs exp
rm obs exp

###################################################################
# 19. Test with --families but not valid
###################################################################
echo " auto_dom.t19...\c"
echo "WARNING: Unable to identify at least one affected individual for family (3). Consequently, GEMINI will not screen for variants in this family.
family_id family_members family_genotypes family_genotype_depths gene chrom start end ref alt impact impact_severity" > exp
gemini autosomal_dominant \
--columns "gene, chrom, start, end, ref, alt, impact, impact_severity" \
--families 3 \
test.auto_dom.no_parents.5.db &> obs
check obs exp
rm obs exp
20 changes: 20 additions & 0 deletions test/test-auto-rec.sh
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
check()
{
if diff $1 $2; then
echo ok
else
echo fail
fi
}
export -f check
###################################################################
# 1. Test basic auto_recessive functionality
###################################################################
Expand Down Expand Up @@ -267,5 +276,16 @@ gemini autosomal_recessive \
check obs exp
rm obs exp

###################################################################
# 17. Test with --families
###################################################################
echo " auto_rec.t17...\c"
echo "family_id family_members family_genotypes family_genotype_depths gene chrom start end ref alt impact impact_severity
3 3_dad(father; unaffected),3_mom(mother; unaffected),3_kid(child; unknown) T/C,T/C,C/C 39,29,24 SYCE1 chr10 135369531 135369532 T C non_syn_coding MED" > exp
gemini autosomal_recessive \
--columns "gene, chrom, start, end, ref, alt, impact, impact_severity" \
--families 3 test.auto_rec.no_parents.5.db &> obs
check obs exp
rm obs exp


41 changes: 41 additions & 0 deletions test/test-comphet.sh
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
check()
{
if diff $1 $2; then
echo ok
else
echo fail
fi
}
export -f check
###############################################################################
# 1. Test basic comp-het functionality (for phased genotypes)
###############################################################################
Expand Down Expand Up @@ -42,6 +51,38 @@ gemini comp_hets --ignore-phasing \
check obs exp
rm obs exp

###############################################################################
# 4. Test basic comp-het functionality with --families
###############################################################################
echo " comp_het.t4...\c"
echo "family sample comp_het_id chrom start end gene alt
3 dad_3 1 chr1 17362 17366 WASH7P T
3 dad_3 1 chr1 17729 17730 WASH7P A
3 child_3 2 chr1 17362 17366 WASH7P T
3 child_3 2 chr1 17729 17730 WASH7P A" > exp
gemini comp_hets \
--ignore-phasing \
--columns "chrom, start, end" \
--families 3 \
test.comp_het_default.2.db > obs
check obs exp
rm obs exp

###############################################################################
# 5. Test basic comp-het functionality with --families and --only-affected
###############################################################################
echo " comp_het.t5...\c"
echo "family sample comp_het_id chrom start end gene alt
3 child_3 1 chr1 17362 17366 WASH7P T
3 child_3 1 chr1 17729 17730 WASH7P A" > exp
gemini comp_hets \
--ignore-phasing \
--columns "chrom, start, end" \
--families 3 \
--only-affected \
test.comp_het_default.2.db > obs
check obs exp
rm obs exp



Expand Down

0 comments on commit bd24731

Please sign in to comment.