# Loeys-Dietz syndrome
Loeys-Dietz syndrome (LDS) is an autosomal dominant aortic aneurysm syndrome characterized by multisystemic involvement. The most typical clinical triad includes hypertelorism, bifid uvula or cleft palate, and aortic aneurysm with tortuosity. Affected individuals may experience aortic dissection at smaller aortic diameters, as well as arterial aneurysms throughout the arterial tree. The genetic cause is heterogeneous and includes mutations in genes encoding components of the transforming growth factor beta (TGFβ) signalling pathway: TGFBR1, TGFBR2, SMAD2, SMAD3, TGFB2 and TGFB3 (see [Velchev JD, et al. (2021). Loeys-Dietz Syndrome. Adv Exp Med Biol](https://pubmed.ncbi.nlm.nih.gov/34807423/)).

This notebook explores whether phenotypic features differ significantly between the genetic forms of LDS. We include TGFBR1 and TGFBR2 (the first forms of LDS to be identified) together with SMAD3, TGFB2, TGFB3 and SMAD2.

In [1]:
import gpsea
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo()
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2024-12-12
Using gpsea version 0.9.1


# LDS1

[Loeys-Dietz syndrome 1 (OMIM:609192)](https://omim.org/entry/609192) is caused by heterozygous mutation in the TGFBR1 gene.

In [2]:
from ppktstore.registry import configure_phenopacket_registry

tgfbr1_symbol = 'TGFBR1'
tgfbr1_mane_tx_id = 'NM_004612.4'
tgfbr1_mane_protein_id = 'NP_004603.1' # TGF-beta receptor type-1 isoform 1 precursor
lds1_disease_id = "OMIM:609192"


phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.23") as ps:
    lds1_phenopackets = tuple(ps.iter_cohort_phenopackets(tgfbr1_symbol))
tgfbr1_len = len(lds1_phenopackets)
print(f"{tgfbr1_len} LDS1 phenopackets")

41 LDS1 phenopackets


# LDS2
[Loeys-Dietz syndrome 2 (OMIM:610168)](https://omim.org/entry/610168) is caused by heterozygous mutation in the TGFBR2 gene.

In [3]:
tgfbr2_symbol = 'TGFBR2'
tgfbr2_mane_tx_id = 'NM_003242.6'
tgfbr2_mane_protein_id = 'NP_003233.4' # TGF-beta receptor type-2 isoform B precursor
lds2_disease_id = "OMIM:610168"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.23") as ps:
    lds2_phenopackets = tuple(ps.iter_cohort_phenopackets(tgfbr2_symbol))
print(f"{len(lds2_phenopackets)} LDS2 phenopackets")

53 LDS2 phenopackets


# LDS3
[Loeys-Dietz syndrome-3 (LDS3)](https://omim.org/entry/613795) is caused by heterozygous mutation in the SMAD3 gene.


In [4]:
smad3_symbol = 'SMAD3'
smad3_mane_tx_id = 'NM_005902.4'
smad3_mane_protein_id = 'NP_005893.1' # mothers against decapentaplegic homolog 3
lds3_disease_id = "OMIM:613795"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.23") as ps:
    lds3_phenopackets = tuple(ps.iter_cohort_phenopackets(smad3_symbol))

print(f"{len(lds3_phenopackets)} LDS3 phenopackets")

49 LDS3 phenopackets


# LDS4
[Loeys-Dietz syndrome-4 (LDS4)](https://omim.org/entry/614816) is caused by heterozygous mutation in the TGFB2 gene.

In [5]:
tgfb2_symbol = 'TGFB2'
tgfb2_mane_tx_id = 'NM_003238.6'
tgfb2_mane_protein_id = 'NP_003229.1' # transforming growth factor beta-2 proprotein isoform 2 preproprotein
lds4_disease_id = "OMIM:614816"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    lds4_phenopackets = tuple(ps.iter_cohort_phenopackets(tgfb2_symbol))

print(f"{len(lds4_phenopackets)} LDS4 phenopackets")

36 LDS4 phenopackets


# LDS5
[Loeys-Dietz syndrome-5 (LDS5)](https://omim.org/entry/615582) is caused by heterozygous mutation in the TGFB3 gene.

In [6]:
tgfb3_symbol = 'TGFB3'
tgfb3_mane_tx_id = 'NM_003239.5'
tgfb3_mane_protein_id = 'NP_003230.1' # transforming growth factor beta-3 proprotein isoform 1 preproprotein
lds5_disease_id = "OMIM:615582"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.23") as ps:
    lds5_phenopackets = tuple(ps.iter_cohort_phenopackets(tgfb3_symbol))

print(f"{len(lds5_phenopackets)} LDS5 phenopackets")

75 LDS5 phenopackets


# LDS6
[Loeys-Dietz syndrome-6 (LDS6)](https://omim.org/entry/619656) is caused by heterozygous mutation in the SMAD2 gene.

In [7]:
smad2_symbol = 'SMAD2'
smad2_mane_tx_id = 'NM_005901.6'
smad2_mane_protein_id = 'NP_005892.1' # mothers against decapentaplegic homolog 2 isoform 1

lds6_disease_id = "OMIM:619656"

from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store("0.1.20") as ps:
    lds6_phenopackets = tuple(ps.iter_cohort_phenopackets(smad2_symbol))

print(f"{len(lds6_phenopackets)} LDS6 phenopackets")

16 LDS6 phenopackets


# Create cohort
We create a cohort with phenopackets representing individuals with all six forms of LDS.


In [23]:
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets

lds_phenopackets = list()
lds_phenopackets.extend(lds1_phenopackets)
lds_phenopackets.extend(lds2_phenopackets)
lds_phenopackets.extend(lds3_phenopackets)
lds_phenopackets.extend(lds4_phenopackets)
lds_phenopackets.extend(lds5_phenopackets)
lds_phenopackets.extend(lds6_phenopackets)


cohort_creator = configure_caching_cohort_creator(hpo)
lds_cohort, validation = load_phenopackets(
    phenopackets=lds_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Individuals Processed: 100%|██████████| 270/270 [00:50<00:00,  5.37individuals/s]
Validated under permissive policy
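
As a quick sanity check, the per-form counts printed above should add up to the 270 individuals processed for the pooled cohort. The sketch below only repeats that arithmetic in plain Python; the dictionary and the `pooled_size` helper are illustrative names and are not part of gpsea.

```python
# Phenopacket counts as printed by the loading cells above
cohort_sizes = {
    "LDS1": 41,
    "LDS2": 53,
    "LDS3": 49,
    "LDS4": 36,
    "LDS5": 75,
    "LDS6": 16,
}


def pooled_size(*forms: str) -> int:
    """Number of individuals obtained by pooling the given LDS forms (hypothetical helper)."""
    return sum(cohort_sizes[form] for form in forms)


total = pooled_size(*cohort_sizes)
print(f"{total} individuals in the pooled LDS cohort")
```

The same helper can be used to cross-check the sizes of the pairwise cohorts built later in the notebook, for example `pooled_size("LDS1", "LDS3")`.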


In [24]:
from gpsea.analysis.pcats import configure_hpo_term_analysis
from gpsea.analysis.clf import prepare_classifiers_for_terms_of_interest

analysis = configure_hpo_term_analysis(hpo)

pheno_clfs = prepare_classifiers_for_terms_of_interest(
    cohort=lds_cohort,
    hpo=hpo,
)

In [25]:
from gpsea.analysis.clf import diagnosis_classifier
from gpsea.view import MtcStatsViewer


lds_1_2_disease_clf = diagnosis_classifier(
    diagnoses=(lds1_disease_id, lds2_disease_id),
    labels=('LDS1', 'LDS2'),
)
lds_1_2_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds_cohort,
    gt_clf=lds_1_2_disease_clf,
    pheno_clfs=pheno_clfs,
)

viewer = MtcStatsViewer()
viewer.process(lds_1_2_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,131
HMF08,Skipping general term,76
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,249
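
The HMF01 reason above reports terms whose maximum frequency fell below the 0.4 threshold. The following is a minimal sketch of that idea in plain Python, assuming that "maximum frequency" means the largest per-group fraction of assessed individuals who have the term. It is not the gpsea implementation, and the function names and the counts for the rare term are hypothetical.

```python
def max_group_frequency(counts: dict[str, tuple[int, int]]) -> float:
    """counts maps a group label to (n_with_term, n_assessed).

    Returns the largest observed frequency across groups, or 0.0 if no
    group has any assessed individuals.
    """
    return max(
        (n_with / n_assessed for n_with, n_assessed in counts.values() if n_assessed > 0),
        default=0.0,
    )


def passes_frequency_filter(counts: dict[str, tuple[int, int]], threshold: float = 0.4) -> bool:
    """Keep a term for testing only if at least one group reaches the threshold."""
    return max_group_frequency(counts) >= threshold


# Hypothetical counts, chosen only to illustrate both outcomes
common_term = {"LDS1": (18, 21), "LDS2": (20, 43)}
rare_term = {"LDS1": (2, 20), "LDS2": (5, 40)}

print(passes_frequency_filter(common_term))  # kept for testing
print(passes_frequency_filter(rare_term))    # skipped
```

Filtering before testing reduces the number of hypotheses, which in turn lowers the multiple-testing burden for the terms that remain.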


In [26]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=lds_1_2_result)

HPO term,OMIM:609192 Count,OMIM:609192 Percent,OMIM:610168 Count,OMIM:610168 Percent,p values


# LDS1 vs LDS3

In [13]:
lds1_and_lds3_phenopackets = list()
lds1_and_lds3_phenopackets.extend(lds1_phenopackets)
lds1_and_lds3_phenopackets.extend(lds3_phenopackets)
print(f"Got {len(lds1_and_lds3_phenopackets)} LDS1 and LDS3 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds1_lds3_cohort, validation = load_phenopackets(
    phenopackets=lds1_and_lds3_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 90 LDS1 and LDS3 phenopackets
Individuals Processed: 100%|██████████| 90/90 [00:22<00:00,  4.09individuals/s] 
Validated under permissive policy


In [31]:
lds1_lds3_clfs = prepare_classifiers_for_terms_of_interest(
    cohort=lds1_lds3_cohort,
    hpo=hpo,
)

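lds_1_3_disease_clf = diagnosis_classifier(
    diagnoses=(lds1_disease_id, lds3_disease_id),
    labels=('LDS1', 'LDS3'),
)

# Note: this comparison runs on the pooled `lds_cohort` with the pooled `pheno_clfs`;
# `lds1_lds3_cohort` and `lds1_lds3_clfs` defined above are not used here,
# and the recorded output below corresponds to this call.
lds1_3_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds_cohort,
    gt_clf=lds_1_3_disease_clf,
    pheno_clfs=pheno_clfs,
)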

viewer = MtcStatsViewer()
viewer.process(lds1_3_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,207
HMF08,Skipping general term,76
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,173


In [29]:
summarize_hpo_analysis(hpo=hpo, result=lds1_3_result)

HPO term,OMIM:609192 Count,OMIM:609192 Percent,OMIM:613795 Count,OMIM:613795 Percent,Corrected p values,p values
Osteoarthritis [HP:0002758],0/11,0%,26/38,68%,0.000975,4.6e-05
Scoliosis [HP:0002650],18/21,86%,20/43,47%,0.023527,0.003015
Aortic aneurysm [HP:0004942],11/11,100%,26/48,54%,0.023527,0.004327
Hypertelorism [HP:0000316],15/19,79%,13/35,37%,0.023527,0.004481
Joint hypermobility [HP:0001382],12/19,63%,12/36,33%,0.197692,0.04707
High palate [HP:0000218],12/16,75%,12/20,60%,1.0,0.481498
Inguinal hernia [HP:0000023],6/14,43%,12/39,31%,1.0,0.514672
Arterial tortuosity [HP:0005116],9/17,53%,11/26,42%,1.0,0.545009
Disproportionate tall stature [HP:0001519],9/17,53%,8/19,42%,1.0,0.738795
Abnormal oral cavity morphology [HP:0000163],17/17,100%,24/24,100%,1.0,1.0
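
To make the "p values" column easier to interpret, here is a small standard-library sketch of a two-sided Fisher's exact test applied to the Osteoarthritis row (0/11 in LDS1 versus 26/38 in LDS3). It assumes the analysis uses a two-sided Fisher's exact test on the 2x2 table of present/absent counts; the function below is an illustration, not gpsea's code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are "term present" and "term absent".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x "present" individuals in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Osteoarthritis [HP:0002758]: LDS1 0 present / 11 absent, LDS3 26 present / 12 absent
p_osteoarthritis = fisher_exact_two_sided(0, 11, 26, 12)
print(f"{p_osteoarthritis:.1e}")
```

Under these assumptions the result agrees with the uncorrected p value of about 4.6e-05 reported for Osteoarthritis in the table above.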


# LDS3 vs LDS6
(SMAD3 and SMAD2)

In [17]:
lds3_and_lds6_phenopackets = list()
lds3_and_lds6_phenopackets.extend(lds3_phenopackets)
lds3_and_lds6_phenopackets.extend(lds6_phenopackets)
print(f"Got {len(lds3_and_lds6_phenopackets)} LDS3 and LDS6 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds3_lds6_cohort, validation = load_phenopackets(
    phenopackets=lds3_and_lds6_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 65 LDS3 and LDS6 phenopackets
Individuals Processed: 100%|██████████| 65/65 [00:12<00:00,  5.25individuals/s]
Validated under permissive policy


In [18]:
lds3_lds6_pheno_clfs = prepare_classifiers_for_terms_of_interest(
    cohort=lds3_lds6_cohort,
    hpo=hpo,
)

lds_3_6_disease_clf = diagnosis_classifier(
    diagnoses=(lds3_disease_id, lds6_disease_id),
    labels=('LDS3', 'LDS6'),
)
lds3_6_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds3_lds6_cohort,
    gt_clf=lds_3_6_disease_clf,
    pheno_clfs=lds3_lds6_pheno_clfs,
)

viewer = MtcStatsViewer()
viewer.process(lds3_6_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,21
HMF03,Skipping term because of a child term with the same individual counts,4
HMF08,Skipping general term,49
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,71


In [19]:
summarize_hpo_analysis(hpo=hpo, result=lds3_6_result)

HPO term,OMIM:613795 Count,OMIM:613795 Percent,OMIM:619656 Count,OMIM:619656 Percent,Corrected p values,p values
Thoracic aortic aneurysm [HP:0012727],0/22,0%,10/15,67%,0.000181,9e-06
Aortic aneurysm [HP:0004942],26/48,54%,10/10,100%,0.095422,0.009088
Soft skin [HP:0000977],23/37,62%,0/4,0%,0.211514,0.030216
Varicose veins [HP:0002619],14/22,64%,4/12,33%,0.737449,0.151415
High palate [HP:0000218],12/20,60%,5/15,33%,0.737449,0.175583
Arterial tortuosity [HP:0005116],11/26,42%,1/1,100%,1.0,0.444444
Inguinal hernia [HP:0000023],12/39,31%,5/11,45%,1.0,0.475054
Umbilical hernia [HP:0001537],12/39,31%,2/4,50%,1.0,0.585479
Osteoarthritis [HP:0002758],26/38,68%,2/2,100%,1.0,1.0
Arthritis [HP:0001369],26/26,100%,2/2,100%,1.0,1.0


# LDS1 vs LDS6

In [20]:
lds1_and_lds6_phenopackets = list()
lds1_and_lds6_phenopackets.extend(lds1_phenopackets)
lds1_and_lds6_phenopackets.extend(lds6_phenopackets)
print(f"Got {len(lds1_and_lds6_phenopackets)} LDS1 and LDS6 phenopackets")

cohort_creator = configure_caching_cohort_creator(hpo)
lds1_lds6_cohort, validation = load_phenopackets(
    phenopackets=lds1_and_lds6_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Got 57 LDS1 and LDS6 phenopackets
Individuals Processed: 100%|██████████| 57/57 [00:00<00:00, 377.55individuals/s]
Validated under permissive policy


In [21]:
lds1_lds6_pheno_clfs = prepare_classifiers_for_terms_of_interest(
    cohort=lds1_lds6_cohort,
    hpo=hpo,
)

lds_1_6_disease_clf = diagnosis_classifier(
    diagnoses=(lds1_disease_id, lds6_disease_id),
    labels=('LDS1', 'LDS6'),
)

lds1_6_result = analysis.compare_genotype_vs_phenotypes(
    cohort=lds1_lds6_cohort,
    gt_clf=lds_1_6_disease_clf,
    pheno_clfs=lds1_lds6_pheno_clfs,
)

viewer = MtcStatsViewer()
viewer.process(lds1_6_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,46
HMF03,Skipping term because of a child term with the same individual counts,2
HMF08,Skipping general term,63
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,157


In [22]:
summarize_hpo_analysis(hpo=hpo, result=lds1_6_result)

HPO term,OMIM:609192 Count,OMIM:609192 Percent,OMIM:619656 Count,OMIM:619656 Percent,Corrected p values,p values
Dolichocephaly [HP:0000268],12/13,92%,3/11,27%,0.035671,0.002229
Hypertelorism [HP:0000316],15/19,79%,4/14,29%,0.045333,0.005667
Scoliosis [HP:0002650],18/21,86%,5/12,42%,0.087277,0.016364
High palate [HP:0000218],12/16,75%,5/15,33%,0.112478,0.031952
Pes planus [HP:0001763],11/14,79%,3/10,30%,0.112478,0.035149
Thoracic aortic aneurysm [HP:0012727],11/11,100%,10/15,67%,0.140468,0.052676
Joint hypermobility [HP:0001382],12/19,63%,1/5,20%,0.324756,0.142081
Abnormal oral cavity morphology [HP:0000163],17/17,100%,6/6,100%,1.0,1.0
Abnormal palate morphology [HP:0000174],17/17,100%,6/6,100%,1.0,1.0
Abnormal axial skeleton morphology [HP:0009121],21/21,100%,9/9,100%,1.0,1.0
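
The "Corrected p values" column can be related to the raw p values with a Benjamini-Hochberg adjustment. The sketch below rests on two assumptions that are not stated in the notebook: that the correction is Benjamini-Hochberg, and that 16 terms were tested in this comparison (inferred from the ratio of the first corrected to the first raw p value). Because the table is sorted by p value and its last displayed row is already 1.0, the six rows that are not displayed are taken to have p = 1.0.

```python
def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Raw p values from the LDS1 vs LDS6 table (rounded as displayed)
shown = [0.002229, 0.005667, 0.016364, 0.031952, 0.035149,
         0.052676, 0.142081, 1.0, 1.0, 1.0]
# Assumption: 16 tests in total; the rows not displayed have p = 1.0
all_pvals = shown + [1.0] * 6

corrected = benjamini_hochberg(all_pvals)
for raw, adj in zip(shown, corrected):
    print(f"{raw:.6f} -> {adj:.6f}")
```

Because the inputs are the rounded values shown in the table, the adjusted values agree with the "Corrected p values" column only up to rounding.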


# Summary

In [None]:
from gpseacs.report import GpseaAnalysisReport, GPAnalysisResultSummary

f_results = (
  GPAnalysisResultSummary.from_multi( result=lds_1_2_result,  ),
)


caption = """."""
report = GpseaAnalysisReport(name="Loeys-Dietz syndrome", 
                             cohort=lds_cohort, 
                             fet_results=f_results,
                             gene_symbol="n/a",
                             mane_tx_id="n/a",
                             mane_protein_id="n/a",
                             caption=caption)