# Kabuki 1 & Kabuki 2 Syndrome 

[Kabuki 1 Syndrome - OMIM:147920](https://omim.org/entry/147920) can be caused by variants in [KMT2D](https://omim.org/entry/602113).
[Kabuki 2 Syndrome - OMIM:300867](https://www.omim.org/entry/300867) is caused by variants in [KDM6A](https://www.omim.org/entry/300128).
This notebook uses GPSEA to characterize phenotypic differences between Kabuki syndrome types 1 and 2.

In [1]:
import gpsea
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo()
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2025-01-16
Using gpsea version 0.9.6.dev0


In [2]:
kabuk1_cohort = 'KMT2D'  # KABUK1
mane_tx_id_1 = 'NM_003482.4'
mane_px_id_1 = 'NP_003473.3' 

kabuk2_cohort = 'KDM6A'  # KABUK2
mane_tx_id_2 = 'NM_001291415.2'
mane_px_id_2 = 'NP_001278344.1' 

In [3]:
from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store('0.1.23') as ps:
    kabuk1_phenopackets = tuple(ps.iter_cohort_phenopackets(kabuk1_cohort))
print(f"Extracted {len(kabuk1_phenopackets)} phenopackets for Kabuki syndrome 1")

Extracted 65 phenopackets for Kabuki syndrome 1


In [4]:
from ppktstore.registry import configure_phenopacket_registry
phenopacket_registry = configure_phenopacket_registry()
with phenopacket_registry.open_phenopacket_store('0.1.24') as ps:
    kabuk2_phenopackets = tuple(ps.iter_cohort_phenopackets(kabuk2_cohort))
print(f"Extracted {len(kabuk2_phenopackets)} phenopackets for Kabuki syndrome 2")

Extracted 81 phenopackets for Kabuki syndrome 2


## Combine cohorts
Here, we combine the phenopackets for Kabuki syndrome types 1 and 2.

In [5]:
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets

kabuki_phenopackets = list()
kabuki_phenopackets.extend(kabuk1_phenopackets)
kabuki_phenopackets.extend(kabuk2_phenopackets)

cohort_creator = configure_caching_cohort_creator(hpo)
kabuki_1_and_2_cohort, validation = load_phenopackets(
    phenopackets=kabuki_phenopackets, 
    cohort_creator=cohort_creator,
)

validation.summarize()

Individuals Processed: 100%|██████████| 146/146 [00:12<00:00, 11.72 individuals/s]
Validated under permissive policy
Phenopackets
  patient #23
    individual
     ·ontology_class of the time_at_last_encounter field cannot be parsed into age. Consider formatting the age as ISO8601 duration (e.g., "P31Y2M" for 31 years and 2 months)
     ·ontology_class of the time_of_death field cannot be parsed into age. Consider formatting the age as ISO8601 duration (e.g., "P31Y2M" for 31 years and 2 months)
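
The validation message above asks for ages to be written as ISO 8601 durations such as `P31Y2M`. As a rough illustration of what that format encodes, the sketch below parses the date part of such a string with the standard library only. The helper name `parse_iso8601_age` is hypothetical and is not part of GPSEA or the phenopacket tooling; it handles only years, months, and days and ignores any time component.

```python
import re

# Date part of an ISO 8601 duration: optional years, months, and days.
_AGE_PATTERN = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def parse_iso8601_age(value):
    """Return a dict of years/months/days, or None if `value` is not a duration.

    Hypothetical helper for illustration; not a GPSEA function.
    """
    match = _AGE_PATTERN.match(value)
    if match is None or value == "P":
        return None
    years, months, days = (int(g) if g else 0 for g in match.groups())
    return {"years": years, "months": months, "days": days}


age = parse_iso8601_age("P31Y2M")       # the example from the validation message
not_an_age = parse_iso8601_age("adult")  # free text cannot be parsed
print(age, not_an_age)
```

A value that yields `None` here is the kind of entry the permissive validation policy reports while still loading the individual.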


# Genotype-Phenotype Correlation (GPC) Analysis

This section compares the phenotypic features of Kabuki syndrome types 1 and 2.

In [6]:
from gpsea.analysis.pcats import configure_hpo_term_analysis
from gpsea.analysis.clf import prepare_classifiers_for_terms_of_interest

analysis = configure_hpo_term_analysis(hpo)

pheno_clfs = prepare_classifiers_for_terms_of_interest(
    cohort=kabuki_1_and_2_cohort,
    hpo=hpo,
)

In [7]:
from gpsea.analysis.clf import diagnosis_classifier
from gpsea.view import MtcStatsViewer

# Reuse the phenotype classifiers prepared in the previous cell
kabuki_clf = pheno_clfs

kabuki_1_disease_id = "OMIM:147920"
kabuki_2_disease_id = "OMIM:300867"

kabuki_disease_clf = diagnosis_classifier(
    diagnoses=(kabuki_1_disease_id, kabuki_2_disease_id),
    labels=('KABUK1', 'KABUK2'),
)
kabuki_result = analysis.compare_genotype_vs_phenotypes(
    cohort=kabuki_1_and_2_cohort,
    gt_clf=kabuki_disease_clf,
    pheno_clfs=kabuki_clf,
)

viewer = MtcStatsViewer()
viewer.process(kabuki_result)

Reason,Count
Skip terms if all counts are identical to counts for a child term,4
Skipping term because one genotype had zero observations,4
"Skipping ""general"" level terms",120
Skipping terms that are rare on the cohort level (in less than 40% of the cohort members),442


In [8]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=kabuki_result)

Diagnosis,OMIM:147920,OMIM:300867,Corrected p values,p values
Feeding difficulties [HP:0011968],8/25 (32%),55/63 (87%),2.5e-05,7.406525e-07
Motor delay [HP:0001270],4/10 (40%),58/61 (95%),0.001776,0.0001044696
Recurrent infections [HP:0002719],30/41 (73%),22/22 (100%),0.066156,0.005837319
Neonatal hypoglycemia [HP:0001998],0/5 (0%),31/55 (56%),0.157953,0.02174398
Patent foramen ovale [HP:0001655],5/44 (11%),0/45 (0%),0.157953,0.02616405
Atrial septal defect [HP:0001631],11/20 (55%),16/61 (26%),0.157953,0.02787402
Hypotonia [HP:0001252],12/21 (57%),52/64 (81%),0.194847,0.04011551
Short stature [HP:0004322],22/31 (71%),22/46 (48%),0.25828,0.06077172
Hearing impairment [HP:0000365],22/40 (55%),8/26 (31%),0.290575,0.07691681
Seizure [HP:0001250],4/25 (16%),17/47 (36%),0.350491,0.1030855


# Summary

In [9]:
from gpseacs.report import GpseaAnalysisReport, GPAnalysisResultSummary

f_results = (
    GPAnalysisResultSummary.from_multi(result=kabuki_result),
)


caption = """."""
report = GpseaAnalysisReport(name="Kabuki", 
                             cohort=kabuki_1_and_2_cohort, 
                             fet_results=f_results,
                             gene_symbol="n/a",
                             mane_tx_id="n/a",
                             mane_protein_id="n/a",
                             caption=caption)

In [10]:
from gpseacs.report import GpseaNotebookSummarizer
summarizer = GpseaNotebookSummarizer(hpo=hpo, gpsea_version=gpsea.__version__)
summarizer.summarize_report(report=report)

HPO Term,OMIM:147920,OMIM:300867,p-val,adj. p-val
Feeding difficulties [HP:0011968],8/25 (32%),55/63 (87%),7.41e-07,2.52e-05
Motor delay [HP:0001270],4/10 (40%),58/61 (95%),0.000104,0.002
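
The relationship between the nominal and adjusted p values can be sketched with a Benjamini-Hochberg correction. Two assumptions are made here that the notebook does not state: that Benjamini-Hochberg is the correction in use, and that 34 terms were tested in total (a number inferred from the ratio of the smallest adjusted to the smallest nominal p value). The function `benjamini_hochberg` and its `n_tests` parameter are illustrative and not part of GPSEA. The input is the ten unrounded nominal p values from the `summarize_hpo_analysis` output, assumed to be the ten smallest.

```python
def benjamini_hochberg(p_values, n_tests=None):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    `n_tests` is the total number of tests; it defaults to len(p_values).
    """
    m = n_tests if n_tests is not None else len(p_values)
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_min = 1.0
    # Walk from the largest to the smallest p value so that each adjusted
    # value is never larger than the one ranked after it.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Nominal p values reported for the ten top-ranked HPO terms.
nominal = [
    7.406525e-07, 0.0001044696, 0.005837319, 0.02174398, 0.02616405,
    0.02787402, 0.04011551, 0.06077172, 0.07691681, 0.1030855,
]
adjusted = benjamini_hochberg(nominal, n_tests=34)  # 34 is an assumption
for p, q in zip(nominal, adjusted):
    print(f"{p:.3e} -> {q:.3e}")
```

The sketch only sees ten of the assumed 34 p values, so it matches the reported corrections only insofar as the unseen, larger p values do not lower the running minimum.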


In [11]:
summarizer.process_latex(report=report)

Output to ../../supplement/tex/Kabuki_summary_draft.tex
