# Loeys-Dietz syndrome
Loeys-Dietz syndrome (LDS) is an autosomal dominant aortic aneurysm syndrome with multisystemic involvement. The most typical clinical triad comprises hypertelorism, bifid uvula or cleft palate, and aortic aneurysm with tortuosity. Affected individuals may experience aortic dissection at smaller aortic diameters and may develop aneurysms throughout the arterial tree. The genetic cause is heterogeneous and includes mutations in genes encoding components of the transforming growth factor beta (TGFβ) signalling pathway: TGFBR1, TGFBR2, SMAD2, SMAD3, TGFB2 and TGFB3 (see [Velchev JD, et al. (2021). Loeys-Dietz Syndrome. Adv Exp Med Biol](https://pubmed.ncbi.nlm.nih.gov/34807423/)).

This notebook explores whether phenotypic features differ significantly between the genetic forms of LDS. We load cohorts for all six genes, including TGFBR1 and TGFBR2 (the first forms of LDS to be identified), and compare selected pairs of subtypes.

In [15]:
import gpsea
import hpotk
from pyphetools.visualization import PhenopacketIngestor

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2023-10-09
Using gpsea version 0.5.1.dev0


## LDS1

[Loeys-Dietz syndrome 1 (OMIM:609192)](https://omim.org/entry/609192) is caused by heterozygous mutation in the TGFBR1 gene.

In [16]:
tgfbr1_symbol = 'TGFBR1'
tgfbr1_mane_tx_id = 'NM_004612.4'
tgfbr1_mane_protein_id = 'NP_004603.1'  # TGF-beta receptor type-1 isoform 1 precursor
lds1_disease_id = "OMIM:609192"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(tgfbr1_symbol))
tgfbr1_len = len(phenopackets)
lds1_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds1_disease_id)

Returning 23 phenopackets for disease OMIM:609192, ommiting 18 phenopackets.
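The log line reports how many phenopackets were kept for the requested disease and how many were omitted. As a minimal stdlib sketch of that kind of filter, the hypothetical `filter_by_disease` below assumes phenopackets are plain dicts following the GA4GH Phenopacket JSON layout (`diseases[].term.id`); it is an illustration, not pyphetools' `PhenopacketIngestor.filter_phenopackets` implementation, and the toy records (including the placeholder `OMIM:XXXXXX`) are invented.

```python
from typing import Iterable


def filter_by_disease(phenopackets: Iterable[dict], disease_id: str) -> tuple[list[dict], int]:
    """Keep phenopackets that list `disease_id` among their diseases.

    Returns the kept phenopackets and the number of omitted ones.
    """
    kept: list[dict] = []
    omitted = 0
    for ppkt in phenopackets:
        # Phenopacket JSON stores each diagnosis under diseases[].term.id
        disease_ids = {d.get("term", {}).get("id") for d in ppkt.get("diseases", [])}
        if disease_id in disease_ids:
            kept.append(ppkt)
        else:
            omitted += 1
    return kept, omitted


# Hypothetical toy input: one LDS1 phenopacket and one with a placeholder diagnosis
toy = [
    {"id": "P1", "diseases": [{"term": {"id": "OMIM:609192", "label": "Loeys-Dietz syndrome 1"}}]},
    {"id": "P2", "diseases": [{"term": {"id": "OMIM:XXXXXX", "label": "placeholder"}}]},
]
kept, omitted = filter_by_disease(toy, "OMIM:609192")
print(f"Returning {len(kept)} phenopackets, omitting {omitted}")
```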


## LDS2
[Loeys-Dietz syndrome 2 (OMIM:610168)](https://omim.org/entry/610168) is caused by heterozygous mutation in the TGFBR2 gene.

In [17]:
tgfbr2_symbol = 'TGFBR2'
tgfbr2_mane_tx_id = 'NM_003242.6'
tgfbr2_mane_protein_id = 'NP_003233.4' # TGF-beta receptor type-2 isoform B precursor
lds2_disease_id = "OMIM:610168"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(tgfbr2_symbol))
tgfbr2_len = len(phenopackets)
lds2_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds2_disease_id)

Returning 47 phenopackets for disease OMIM:610168, ommiting 0 phenopackets.


## LDS3
[Loeys-Dietz syndrome-3 (LDS3)](https://omim.org/entry/613795) is caused by heterozygous mutation in the SMAD3 gene.


In [18]:
smad3_symbol = 'SMAD3'
smad3_mane_tx_id = 'NM_005902.4'
smad3_mane_protein_id = 'NP_005893.1' # mothers against decapentaplegic homolog 3
lds3_disease_id = "OMIM:613795"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(smad3_symbol))

lds3_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds3_disease_id)

Returning 49 phenopackets for disease OMIM:613795, ommiting 0 phenopackets.


## LDS4
[Loeys-Dietz syndrome-4 (LDS4)](https://omim.org/entry/614816) is caused by heterozygous mutation in the TGFB2 gene.

In [19]:
tgfb2_symbol = 'TGFB2'
tgfb2_mane_tx_id = 'NM_003238.6'
tgfb2_mane_protein_id = 'NP_003229.1' # transforming growth factor beta-2 proprotein isoform 2 preproprotein
lds4_disease_id = "OMIM:614816"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(tgfb2_symbol))

lds4_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds4_disease_id)

Returning 36 phenopackets for disease OMIM:614816, ommiting 0 phenopackets.


## LDS5
[Loeys-Dietz syndrome-5 (LDS5)](https://omim.org/entry/615582) is caused by heterozygous mutation in the TGFB3 gene.

In [20]:
tgfb3_symbol = 'TGFB3'
tgfb3_mane_tx_id = 'NM_003239.5'
tgfb3_mane_protein_id = 'NP_003230.1'  # transforming growth factor beta-3 proprotein isoform 1 preproprotein
lds5_disease_id = "OMIM:615582"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(tgfb3_symbol))

lds5_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds5_disease_id)

Returning 43 phenopackets for disease OMIM:615582, ommiting 0 phenopackets.


## LDS6
[Loeys-Dietz syndrome-6 (LDS6)](https://omim.org/entry/619656) is caused by heterozygous mutation in the SMAD2 gene.

In [21]:
smad2_symbol = 'SMAD2'
smad2_mane_tx_id = 'NM_005901.6'
smad2_mane_protein_id = 'NP_005892.1' # mothers against decapentaplegic homolog 2 isoform 1

lds6_disease_id = "OMIM:619656"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(smad2_symbol))

lds6_phenopackets = PhenopacketIngestor.filter_phenopackets(ppkt_list=phenopackets, disease_id=lds6_disease_id)

Returning 16 phenopackets for disease OMIM:619656, ommiting 0 phenopackets.
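The six cells above repeat the same pattern with different identifiers. Collecting those identifiers in one table (the name `LDS_SUBTYPES` is hypothetical) makes the design space explicit: six subtypes allow 15 unordered pairwise comparisons, of which this notebook runs four. This is a plain-Python sketch; the gene symbols, MANE transcripts and OMIM IDs are copied from the cells above.

```python
from itertools import combinations

# Gene, MANE transcript and OMIM disease ID per subtype, copied from the cells above
LDS_SUBTYPES = {
    "LDS1": ("TGFBR1", "NM_004612.4", "OMIM:609192"),
    "LDS2": ("TGFBR2", "NM_003242.6", "OMIM:610168"),
    "LDS3": ("SMAD3", "NM_005902.4", "OMIM:613795"),
    "LDS4": ("TGFB2", "NM_003238.6", "OMIM:614816"),
    "LDS5": ("TGFB3", "NM_003239.5", "OMIM:615582"),
    "LDS6": ("SMAD2", "NM_005901.6", "OMIM:619656"),
}

# Every unordered pair of subtypes that could be compared
all_pairs = list(combinations(LDS_SUBTYPES, 2))
print(f"{len(all_pairs)} possible pairwise comparisons")
for a, b in all_pairs[:3]:
    print(a, LDS_SUBTYPES[a][0], "vs", b, LDS_SUBTYPES[b][0])
```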


## LDS1 vs LDS2
Here, we search for significant differences between LDS1 and LDS2.
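Each comparison reduces, for every HPO term, to counting individuals with the feature among those annotated for it in each diagnosis group. This is why denominators in the result tables (e.g. 12/13 for Dolichocephaly in LDS1) are smaller than the cohort size when `missing_implies_excluded=False`. The hypothetical `count_term` helper below is a simplified stdlib sketch under an assumed record layout; unlike the real analysis, it does not propagate annotations along the HPO hierarchy.

```python
from collections import defaultdict


def count_term(individuals, term_id, missing_implies_excluded=False):
    """Count (present, total) per diagnosis label for one HPO term.

    Each individual is a dict with a "diagnosis" label and a "terms" dict
    mapping HPO IDs to True (observed) or False (explicitly excluded).
    """
    counts = defaultdict(lambda: [0, 0])  # label -> [present, total]
    for ind in individuals:
        status = ind["terms"].get(term_id)
        if status is None:
            if not missing_implies_excluded:
                continue  # unannotated individuals are left out of the denominator
            status = False
        counts[ind["diagnosis"]][1] += 1
        if status:
            counts[ind["diagnosis"]][0] += 1
    return {label: tuple(c) for label, c in counts.items()}


# Hypothetical toy cohort
toy_cohort = [
    {"diagnosis": "LDS1", "terms": {"HP:0000268": True}},
    {"diagnosis": "LDS1", "terms": {}},
    {"diagnosis": "LDS2", "terms": {"HP:0000268": False}},
]
print(count_term(toy_cohort, "HP:0000268"))  # {'LDS1': (1, 1), 'LDS2': (0, 1)}
print(count_term(toy_cohort, "HP:0000268", missing_implies_excluded=True))  # {'LDS1': (1, 2), 'LDS2': (0, 1)}
```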

In [23]:
from ppktstore.registry import configure_phenopacket_registry
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets

lds1_and_lds2_phenopackets = list()
lds1_and_lds2_phenopackets.extend(lds1_phenopackets)
lds1_and_lds2_phenopackets.extend(lds2_phenopackets)
print(f"Got {len(lds1_and_lds2_phenopackets)} LDS1 and LDS2 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds1_lds2_cohort, validation = load_phenopackets(
    phenopackets=lds1_and_lds2_phenopackets,
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 70 LDS1 and LDS2 phenopackets
Individuals Processed: 100%|██████████| 70/70 [01:16<00:00,  1.10s/individuals]
Validated under permissive policy


In [24]:
from gpsea.analysis.mtc_filter import HpoMtcFilter
from gpsea.analysis.pcats import HpoTermAnalysis
from gpsea.analysis.pcats.stats import FisherExactTest
from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest

# Skip uninformative HPO terms before multiple-testing correction
mtc_filter = HpoMtcFilter.default_filter(
    hpo=hpo,
)
# Benjamini-Hochberg false discovery rate control at 5%
mtc_correction = 'fdr_bh'
mtc_alpha = 0.05
# Compare per-term counts between the two groups with Fisher's exact test
count_statistic = FisherExactTest()

analysis = HpoTermAnalysis(
    count_statistic=count_statistic,
    mtc_filter=mtc_filter,
    mtc_correction=mtc_correction,
    mtc_alpha=mtc_alpha,
)
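`FisherExactTest` compares, for each HPO term, the 2x2 table of individuals with and without the feature in the two groups. The stdlib sketch below implements the usual two-sided Fisher exact test (sum the hypergeometric probabilities of all tables no more probable than the observed one); the function name is hypothetical and this is not gpsea's code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. LDS1, LDS2); columns are term present / excluded.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability that group 1 holds k of the col1 "present" individuals
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for k in range(lo, hi + 1):
        p = prob(k)
        # Small relative tolerance so tables tied with the observed one are included
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Dolichocephaly (12/13 vs 1/13) and Joint contracture (3/14 vs 4/4) from LDS1 vs LDS2
print(f"{fisher_exact_two_sided(12, 1, 1, 12):.6f}")  # 0.000033
print(f"{fisher_exact_two_sided(3, 11, 4, 0):.6f}")  # 0.011438
```

Both values agree with the uncorrected p values reported for these two terms in the LDS1 vs LDS2 results.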

In [30]:
from gpsea.analysis.predicate.genotype import diagnosis_predicate
from gpsea.view import MtcStatsViewer

lds1_lds2_pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=lds1_lds2_cohort,
    hpo=hpo,
    missing_implies_excluded=False,
    min_n_of_patients_with_term=2,
)
print(f"Total of {len(lds1_lds2_pheno_predicates)} LDS1/LDS2 phenotype predicates")

lds_1_2_disease_predicate = diagnosis_predicate(
    diagnoses=(lds1_disease_id, lds2_disease_id),
    labels=('LDS1', 'LDS2'),
)
print(lds_1_2_disease_predicate.display_question())
result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds1_lds2_cohort,
    gt_predicate=lds_1_2_disease_predicate,
    pheno_predicates=lds1_lds2_pheno_predicates,
)


viewer = MtcStatsViewer()
viewer.process(result)

What disease was diagnosed: OMIM:609192, OMIM:610168


| Code | Reason | Count |
|---|---|---|
| HMF01 | Skipping term with maximum frequency that was less than threshold 0.2 | 10 |
| HMF03 | Skipping term because of a child term with the same individual counts | 11 |
| HMF05 | Skipping term because one genotype had zero observations | 1 |
| HMF08 | Skipping general term | 61 |
| HMF09 | Skipping term with maximum annotation frequency that was less than threshold 0.25 | 132 |
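The filter report lists why terms were skipped before any test was run. As a hypothetical, simplified reading of two of the reason strings (HMF05 and HMF01, with the 0.2 threshold taken from the report), the sketch below screens one term from its per-group `(present, total)` counts. It is not gpsea's `HpoMtcFilter` code and ignores the ontology-based rules (HMF03, HMF08) as well as HMF09.

```python
def screen_term(counts, min_max_freq=0.2):
    """Return a skip code for one term, or None to keep testing it.

    `counts` maps a group label to (present, total) for the term.
    """
    # HMF05 (as read here): some group has no annotated individuals at all
    if any(total == 0 for _, total in counts.values()):
        return "HMF05"
    # HMF01 (as read here): the feature is rare in every group
    if max(present / total for present, total in counts.values()) < min_max_freq:
        return "HMF01"
    return None


print(screen_term({"LDS1": (1, 13), "LDS2": (2, 13)}))   # HMF01
print(screen_term({"LDS1": (3, 11), "LDS6": (0, 0)}))    # HMF05
print(screen_term({"LDS1": (12, 13), "LDS2": (1, 13)}))  # None
```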


In [31]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=result)

| HPO term | OMIM:609192 count | OMIM:609192 % | OMIM:610168 count | OMIM:610168 % | Corrected p value | p value |
|---|---|---|---|---|---|---|
| Dolichocephaly [HP:0000268] | 12/13 | 92% | 1/13 | 8% | 0.002844 | 0.000033 |
| Joint contracture [HP:0034392] | 3/14 | 21% | 4/4 | 100% | 0.497549 | 0.011438 |
| Epicanthus [HP:0000286] | 3/13 | 23% | 19/30 | 63% | 0.629165 | 0.021695 |
| Talipes equinovarus [HP:0001762] | 2/17 | 12% | 11/24 | 46% | 0.850727 | 0.039114 |
| Bifid uvula [HP:0000193] | 4/16 | 25% | 20/36 | 56% | 1.000000 | 0.069543 |
| ... | ... | ... | ... | ... | ... | ... |
| Abnormal sternum morphology [HP:0000766] | 9/9 | 100% | 17/17 | 100% | 1.000000 | 1.000000 |
| Abnormal thorax morphology [HP:0000765] | 9/9 | 100% | 17/17 | 100% | 1.000000 | 1.000000 |
| Abnormal brain morphology [HP:0012443] | 1/1 | 100% | 19/19 | 100% | 1.000000 | 1.000000 |
| Downslanted palpebral fissures [HP:0000494] | 1/2 | 50% | 7/20 | 35% | 1.000000 | 1.000000 |
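The corrected p values come from the Benjamini-Hochberg procedure selected with `mtc_correction = 'fdr_bh'`, applied across all terms that passed the filter, not only the rows displayed. A minimal stdlib sketch (hypothetical function name, not gpsea's or statsmodels' code): multiply each p value by m/rank, then take a running minimum from the largest p value down so adjusted values never decrease with rank.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p value down to enforce monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])  # [0.04, 0.0533, 0.0533, 0.2]
```

The running minimum explains tied corrected values such as the repeated 0.01892136 in the LDS1 vs LDS3 results; those rows appear consistent with 38 tested terms (e.g. 0.004481374 * 38 / 9 ≈ 0.018921 for Hypertelorism, ninth by p value).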


## LDS1 vs LDS3

In [28]:
from ppktstore.registry import configure_phenopacket_registry
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets

lds1_and_lds3_phenopackets = list()
lds1_and_lds3_phenopackets.extend(lds1_phenopackets)
lds1_and_lds3_phenopackets.extend(lds3_phenopackets)
print(f"Got {len(lds1_and_lds3_phenopackets)} LDS1 and LDS3 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds1_lds3_cohort, validation = load_phenopackets(
    phenopackets=lds1_and_lds3_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 72 LDS1 and LDS3 phenopackets
Individuals Processed: 100%|██████████| 72/72 [00:13<00:00,  5.19individuals/s] 
Validated under permissive policy


In [32]:
lds1_lds3_pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=lds1_lds3_cohort,
    hpo=hpo,
    missing_implies_excluded=False,
    min_n_of_patients_with_term=2,
)
print(f"Total of {len(lds1_lds3_pheno_predicates)} LDS1/LDS3 phenotype predicates")

lds_1_3_disease_predicate = diagnosis_predicate(
    diagnoses=(lds1_disease_id, lds3_disease_id),
    labels=('LDS1', 'LDS3'),
)
print(lds_1_3_disease_predicate.display_question())
lds1_3_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds1_lds3_cohort,
    gt_predicate=lds_1_3_disease_predicate,
    pheno_predicates=lds1_lds3_pheno_predicates,
)


viewer = MtcStatsViewer()
viewer.process(lds1_3_result)

What disease was diagnosed: OMIM:609192, OMIM:613795


| Code | Reason | Count |
|---|---|---|
| HMF01 | Skipping term with maximum frequency that was less than threshold 0.2 | 18 |
| HMF02 | Skipping term because no genotype has more than one observed HPO count | 1 |
| HMF03 | Skipping term because of a child term with the same individual counts | 6 |
| HMF05 | Skipping term because one genotype had zero observations | 7 |
| HMF08 | Skipping general term | 57 |
| HMF09 | Skipping term with maximum annotation frequency that was less than threshold 0.25 | 117 |


In [33]:
summarize_hpo_analysis(hpo=hpo, result=lds1_3_result)

| HPO term | OMIM:609192 count | OMIM:609192 % | OMIM:613795 count | OMIM:613795 % | Corrected p value | p value |
|---|---|---|---|---|---|---|
| Aortic root aneurysm [HP:0002616] | 11/11 | 100% | 0/22 | 0% | 1.963452e-07 | 5.166978e-09 |
| Osteoarthritis [HP:0002758] | 0/11 | 0% | 26/38 | 68% | 0.0008817118 | 4.640589e-05 |
| Abnormal sternum morphology [HP:0000766] | 9/9 | 100% | 6/20 | 30% | 0.008862236 | 0.0006996502 |
| Craniosynostosis [HP:0001363] | 4/8 | 50% | 0/27 | 0% | 0.01270053 | 0.001336898 |
| Scoliosis [HP:0002650] | 18/21 | 86% | 20/43 | 47% | 0.01892136 | 0.003014785 |
| Varicose veins [HP:0002619] | 1/12 | 8% | 14/22 | 64% | 0.01892136 | 0.003143993 |
| Pes planus [HP:0001763] | 11/14 | 79% | 4/17 | 24% | 0.01892136 | 0.003847595 |
| Aortic aneurysm [HP:0004942] | 11/11 | 100% | 26/48 | 54% | 0.01892136 | 0.004326643 |
| Hypertelorism [HP:0000316] | 15/19 | 79% | 13/35 | 37% | 0.01892136 | 0.004481374 |
| Pectus carinatum [HP:0000768] | 7/18 | 39% | 0/14 | 0% | 0.03980349 | 0.0104746 |


## LDS3 vs LDS6
(SMAD3 and SMAD2)

In [34]:
lds3_and_lds6_phenopackets = list()
lds3_and_lds6_phenopackets.extend(lds3_phenopackets)
lds3_and_lds6_phenopackets.extend(lds6_phenopackets)
print(f"Got {len(lds3_and_lds6_phenopackets)} LDS3 and LDS6 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds3_lds6_cohort, validation = load_phenopackets(
    phenopackets=lds3_and_lds6_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 65 LDS3 and LDS6 phenopackets
Individuals Processed: 100%|██████████| 65/65 [00:00<00:00, 615.11individuals/s]
Validated under permissive policy


In [35]:
lds3_lds6_pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=lds3_lds6_cohort,
    hpo=hpo,
    missing_implies_excluded=False,
    min_n_of_patients_with_term=2,
)
print(f"Total of {len(lds3_lds6_pheno_predicates)} LDS3/LDS6 phenotype predicates")

lds_3_6_disease_predicate = diagnosis_predicate(
    diagnoses=(lds3_disease_id, lds6_disease_id),
    labels=('LDS3', 'LDS6'),
)
print(lds_3_6_disease_predicate.display_question())
lds3_6_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds3_lds6_cohort,
    gt_predicate=lds_3_6_disease_predicate,
    pheno_predicates=lds3_lds6_pheno_predicates,
)


viewer = MtcStatsViewer()
viewer.process(lds3_6_result)

What disease was diagnosed: OMIM:613795, OMIM:619656


| Code | Reason | Count |
|---|---|---|
| HMF01 | Skipping term with maximum frequency that was less than threshold 0.2 | 8 |
| HMF03 | Skipping term because of a child term with the same individual counts | 6 |
| HMF08 | Skipping general term | 47 |
| HMF09 | Skipping term with maximum annotation frequency that was less than threshold 0.25 | 58 |


In [36]:
summarize_hpo_analysis(hpo=hpo, result=lds3_6_result)

| HPO term | OMIM:613795 count | OMIM:613795 % | OMIM:619656 count | OMIM:619656 % | Corrected p value | p value |
|---|---|---|---|---|---|---|
| Thoracic aortic aneurysm [HP:0012727] | 0/22 | 0% | 10/15 | 67% | 0.000319 | 9e-06 |
| Aortic aneurysm [HP:0004942] | 26/48 | 54% | 10/10 | 100% | 0.168124 | 0.009088 |
| Soft skin [HP:0000977] | 23/37 | 62% | 0/4 | 0% | 0.332516 | 0.030216 |
| Tall stature [HP:0000098] | 8/8 | 100% | 5/10 | 50% | 0.332516 | 0.035948 |
| Bifid uvula [HP:0000193] | 13/38 | 34% | 1/14 | 7% | 0.577972 | 0.078104 |
| Osteochondritis dissecans [HP:0010886] | 13/18 | 72% | 0/2 | 0% | 0.681579 | 0.110526 |
| Disproportionate tall stature [HP:0001519] | 8/19 | 42% | 0/5 | 0% | 0.689441 | 0.130435 |
| Varicose veins [HP:0002619] | 14/22 | 64% | 4/12 | 33% | 0.700293 | 0.151415 |
| High palate [HP:0000218] | 12/20 | 60% | 5/15 | 33% | 0.721842 | 0.175583 |
| Intervertebral disc degeneration [HP:0008419] | 18/20 | 90% | 3/4 | 75% | 1.0 | 0.436759 |


## LDS1 vs LDS6

In [37]:
lds1_and_lds6_phenopackets = list()
lds1_and_lds6_phenopackets.extend(lds1_phenopackets)
lds1_and_lds6_phenopackets.extend(lds6_phenopackets)
print(f"Got {len(lds1_and_lds6_phenopackets)} LDS1 and LDS6 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds1_lds6_cohort, validation = load_phenopackets(
    phenopackets=lds1_and_lds6_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 39 LDS1 and LDS6 phenopackets
Individuals Processed: 100%|██████████| 39/39 [00:00<00:00, 261.39individuals/s]
Validated under permissive policy


In [38]:
lds1_lds6_pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=lds1_lds6_cohort,
    hpo=hpo,
    missing_implies_excluded=False,
    min_n_of_patients_with_term=2,
)
print(f"Total of {len(lds1_lds6_pheno_predicates)} LDS1/LDS6 phenotype predicates")

lds_1_6_disease_predicate = diagnosis_predicate(
    diagnoses=(lds1_disease_id, lds6_disease_id),
    labels=('LDS1', 'LDS6'),
)
print(lds_1_6_disease_predicate.display_question())
lds1_6_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds1_lds6_cohort,
    gt_predicate=lds_1_6_disease_predicate,
    pheno_predicates=lds1_lds6_pheno_predicates,
)


viewer = MtcStatsViewer()
viewer.process(lds1_6_result)

What disease was diagnosed: OMIM:609192, OMIM:619656


| Code | Reason | Count |
|---|---|---|
| HMF01 | Skipping term with maximum frequency that was less than threshold 0.2 | 17 |
| HMF03 | Skipping term because of a child term with the same individual counts | 8 |
| HMF05 | Skipping term because one genotype had zero observations | 25 |
| HMF08 | Skipping general term | 58 |
| HMF09 | Skipping term with maximum annotation frequency that was less than threshold 0.25 | 94 |


In [39]:
summarize_hpo_analysis(hpo=hpo, result=lds1_6_result)

| HPO term | OMIM:609192 count | OMIM:609192 % | OMIM:619656 count | OMIM:619656 % | Corrected p value | p value |
|---|---|---|---|---|---|---|
| Aortic root aneurysm [HP:0002616] | 11/11 | 100% | 0/5 | 0% | 0.01076 | 0.000229 |
| Abnormal sternum morphology [HP:0000766] | 9/9 | 100% | 3/13 | 23% | 0.010866 | 0.000462 |
| Dolichocephaly [HP:0000268] | 12/13 | 92% | 3/11 | 27% | 0.034928 | 0.002229 |
| Hypertelorism [HP:0000316] | 15/19 | 79% | 4/14 | 29% | 0.066583 | 0.005667 |
| Osteoarthritis [HP:0002758] | 0/11 | 0% | 2/2 | 100% | 0.120513 | 0.012821 |
| Scoliosis [HP:0002650] | 18/21 | 86% | 5/12 | 42% | 0.128188 | 0.016364 |
| Pectus carinatum [HP:0000768] | 7/18 | 39% | 0/10 | 0% | 0.165202 | 0.030171 |
| High palate [HP:0000218] | 12/16 | 75% | 5/15 | 33% | 0.165202 | 0.031952 |
| Tall stature [HP:0000098] | 9/9 | 100% | 5/10 | 50% | 0.165202 | 0.032508 |
| Pes planus [HP:0001763] | 11/14 | 79% | 3/10 | 30% | 0.165202 | 0.035149 |
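The pairwise comparison sections repeat the same steps with different inputs. A hypothetical helper, `compare_lds_subtypes`, could wrap them; it is a sketch that reuses only the gpsea calls and arguments already used in the cells above, with imports placed inside the function so it can be defined even where gpsea is not installed. The `(label, disease_id, phenopackets)` triple layout is an assumption of this sketch.

```python
def compare_lds_subtypes(hpo, analysis, first, second):
    """Run one pairwise LDS comparison and return the analysis result.

    `first` and `second` are (label, disease_id, phenopackets) triples,
    e.g. ("LDS1", lds1_disease_id, lds1_phenopackets).
    """
    # Imported lazily; these are the same gpsea calls used in the cells above
    from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets
    from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest
    from gpsea.analysis.predicate.genotype import diagnosis_predicate

    (label_a, id_a, ppkts_a), (label_b, id_b, ppkts_b) = first, second
    phenopackets = [*ppkts_a, *ppkts_b]
    print(f"Got {len(phenopackets)} {label_a} and {label_b} phenopackets")

    cohort, validation = load_phenopackets(
        phenopackets=phenopackets,
        cohort_creator=configure_caching_cohort_creator(hpo),
    )
    validation.summarize()

    pheno_predicates = prepare_predicates_for_terms_of_interest(
        cohort=cohort,
        hpo=hpo,
        missing_implies_excluded=False,
        min_n_of_patients_with_term=2,
    )
    print(f"Total of {len(pheno_predicates)} {label_a}/{label_b} phenotype predicates")

    gt_predicate = diagnosis_predicate(diagnoses=(id_a, id_b), labels=(label_a, label_b))
    print(gt_predicate.display_question())
    return analysis.compare_genotype_vs_phenotypes(
        cohort=cohort,
        gt_predicate=gt_predicate,
        pheno_predicates=pheno_predicates,
    )


# Example (requires gpsea and the variables defined in the cells above):
# lds1_3_result = compare_lds_subtypes(
#     hpo, analysis,
#     ("LDS1", lds1_disease_id, lds1_phenopackets),
#     ("LDS3", lds3_disease_id, lds3_phenopackets),
# )
# summarize_hpo_analysis(hpo=hpo, result=lds1_3_result)
```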
