# POGZ and Associated Syndrome


[White-Sutton syndrome](https://omim.org/entry/616364) is a neurodevelopmental disorder caused by pathogenic variants in the [POGZ](https://omim.org/entry/614787) gene.


One previous publication has reported candidate genotype-phenotype correlations in White-Sutton syndrome.


[Nagy et al. (2022)](https://pubmed.ncbi.nlm.nih.gov/35052493/) summarized data on 117 individuals with White-Sutton syndrome, including 12 novel individuals. They wrote:

> A severity scoring system was developed for the comparison. Mild and severe phenotypes were compared with the types and location of the variants and the predicted presence or absence of nonsense-mediated RNA decay (NMD). Missense variants were more often associated with mild phenotypes (p = 0.0421) and truncating variants predicted to escape NMD presented with more severe phenotypes (p < 0.0001). Within this group, variants in the prolin-rich region of the POGZ protein were associated with the most severe phenotypes (p = 0.0004).
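
The NMD prediction mentioned in the quoted abstract is often approximated with the "50-nucleotide rule": a premature termination codon (PTC) is predicted to escape NMD if it lies in the last exon or within roughly 50 nucleotides upstream of the last exon-exon junction. The sketch below illustrates that general heuristic only; it is not necessarily the rule used in the cited study. The function name, the 50-nt window, and the exon lengths in the tests are illustrative assumptions.

```python
from itertools import accumulate


def predicts_nmd_escape(ptc_cds_pos, exon_cds_lengths, window=50):
    """Apply the 50-nt rule to a premature termination codon (PTC).

    ptc_cds_pos: 1-based coding position of the PTC (hypothetical input).
    exon_cds_lengths: coding length of each exon, ordered 5' to 3'.
    window: distance upstream of the last junction that still escapes NMD.
    """
    boundaries = list(accumulate(exon_cds_lengths))
    if not boundaries or not 1 <= ptc_cds_pos <= boundaries[-1]:
        raise ValueError("PTC position lies outside the coding sequence")
    if len(boundaries) < 2:
        # A single coding exon has no downstream exon-exon junction.
        return True
    # Last coding base before the final exon starts.
    last_junction = boundaries[-2]
    # True for PTCs in the last exon or fewer than `window` nt upstream of it.
    return ptc_cds_pos > last_junction - window
```

For real variants, the exon coordinates would come from the transcript annotation rather than from hand-entered lengths.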



TODO: find a set of HPO terms that matches the severity score used by the authors:

> Example: Therefore, we tested missense vs other variants (inspection of the distribution of variants showed that the other variant categories in our dataset are [list other variant categories], all of which we deemed to be null variants for the purposes of this analysis). We also tested the two most common missense variants ([DETAILS]) for correlations.


## Imports 

In [4]:
import gpsea

from gpsea.analysis.predicate import PatientCategories
from gpsea.preprocessing import load_phenopacket_folder
from gpsea.preprocessing import configure_caching_cohort_creator
from gpsea.model import FeatureType, VariantEffect
from gpsea.view import CohortViewable
from gpsea.preprocessing import UniprotProteinMetadataService
from gpsea.model.genome import GRCh38
from gpsea.preprocessing import VVMultiCoordinateService
from gpsea.view import ProteinVisualizable, ProteinVisualizer, ProteinViewable
import hpotk
from IPython.display import display, HTML

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f"Using genophenocorr version {gpsea.__version__}")


Loaded HPO v2023-10-09
Using genophenocorr version 0.3.1.dev0


## Loading Phenopackets & Gene Variant Data
The preferred (MANE) transcript can be found by searching for the gene symbol in [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/). Entering the transcript accession number in [NCBI Nucleotide](https://www.ncbi.nlm.nih.gov/nuccore/?) then gives the corresponding protein accession number.

In [None]:
[GeneName]_MANE_transcript = 'NM_...'
[GeneName]_protein_id = "NP_..."
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
phenopacket_input_folder = "../../../../GIT/phenopacket-store/notebooks/[GeneName]/phenopackets/"
cohort = load_phenopacket_folder(pp_directory=phenopacket_input_folder, cohort_creator=cohort_creator)
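
Before loading the cohort, it can help to confirm that the input folder actually contains readable phenopackets, since a wrong relative path otherwise only shows up as an empty or failed cohort. The sketch below is a hypothetical pre-flight helper that uses only the standard library; it is not part of GPSEA, and it assumes the phenopackets are stored as `*.json` files with a top-level `phenotypicFeatures` list, as in the Phenopacket Schema JSON format.

```python
import json
from pathlib import Path


def summarize_phenopacket_folder(folder):
    """Count readable phenopacket JSON files and their phenotypic features."""
    paths = sorted(Path(folder).glob("*.json"))
    n_readable = 0
    n_features = 0
    unreadable = []
    for path in paths:
        try:
            packet = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Covers I/O failures, malformed JSON, and decoding errors.
            unreadable.append(path.name)
            continue
        if not isinstance(packet, dict):
            unreadable.append(path.name)
            continue
        n_readable += 1
        n_features += len(packet.get("phenotypicFeatures", []))
    return {
        "n_files": len(paths),
        "n_readable": n_readable,
        "n_phenotypic_features": n_features,
        "unreadable": unreadable,
    }
```

Calling `summarize_phenopacket_folder(phenopacket_input_folder)` before `load_phenopacket_folder` gives a quick count to compare against the number of individuals expected in the cohort.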

## Define Configuration & Run Analysis

In [None]:
cv = CohortViewable(hpo=hpo)
html = cv.process(cohort=cohort, transcript_id=[GeneName]_MANE_transcript)
display(HTML(html))

In [None]:
pms = UniprotProteinMetadataService()
protein_meta = pms.annotate([GeneName]_protein_id)
# TODO: Check Genome Build
txc_service = VVMultiCoordinateService(genome_build=GRCh38)
tx_coordinates = txc_service.fetch([GeneName]_MANE_transcript)
pvis = ProteinVisualizable(tx_coordinates=tx_coordinates, protein_meta=protein_meta, cohort=cohort)

In [None]:
viewer = ProteinViewable()
html_prot = viewer.process(cohort, pvis)
display(HTML(html_prot))

In [None]:
drawer = ProteinVisualizer()
drawer.draw_fig(pvis=pvis);

## Correlation Analysis

In [None]:
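# Assumption: in this gpsea version, both names are exported from gpsea.analysis.
from gpsea.analysis import configure_cohort_analysis, CohortAnalysisConfiguration

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = False
# TODO: Check HPO_observed_frequency
analysis_config.heuristic_strategy(threshold_HPO_observed_frequency=0.2)
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)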
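
The `threshold_HPO_observed_frequency=0.2` setting limits the number of HPO terms that are tested, which reduces the multiple-testing burden. The sketch below illustrates the general idea under a simplifying assumption: a term is kept only if it is observed in at least the given fraction of all individuals. This is not GPSEA's implementation, which may apply the threshold differently (for example per genotype group); the function name and input layout are hypothetical.

```python
from collections import Counter


def filter_terms_by_frequency(patient_terms, threshold=0.2):
    """Keep HPO term IDs observed in at least `threshold` of individuals.

    patient_terms: mapping of individual ID -> set of observed HPO term IDs.
    """
    n_patients = len(patient_terms)
    if n_patients == 0:
        return []
    # Count in how many individuals each term was observed.
    counts = Counter(term for terms in patient_terms.values() for term in terms)
    return sorted(
        term for term, count in counts.items() if count / n_patients >= threshold
    )
```

A lower threshold tests more terms at the cost of a stronger multiple-testing correction; a higher threshold does the opposite.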

In [None]:
# FeatureType and VariantEffect are already imported from gpsea.model in the Imports cell.

# TODO: Check compare_by_variant_effect parameters and/or variables
missense = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=[GeneName]_MANE_transcript)
missense.summarize(hpo, PatientCategories.YES)

In [None]:
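# TODO: Check compare_by_variant_key parameters and/or variables
# NOTE: this variant key is on chromosome 12, but POGZ is located on chromosome 1 (1q21.3).
# Replace it with a variant key that occurs in this cohort.
feature = analysis.compare_by_variant_key(variant_key="12_114385521_114385521_C_T")
feature.summarize(hpo, PatientCategories.YES)

In [None]:
# TODO: Check compare_by_variant_key parameters and/or variables
# NOTE: this is also a chromosome 12 key; replace it as in the previous cell.
feature = analysis.compare_by_variant_key(variant_key="12_114401830_114401830_C_T")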
feature.summarize(hpo, PatientCategories.YES)
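
The variant keys passed to `compare_by_variant_key` in the cells above appear to follow the pattern `chromosome_start_end_ref_alt`. Assuming that five-field layout (other key forms, such as those for structural variants, are not handled), the hypothetical helper below splits a key into its parts. This makes it easy to check, for instance, that a key lies on the chromosome of the gene under study (chromosome 1 for POGZ).

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key):
    """Split a 'chrom_start_end_ref_alt' key into typed fields."""
    parts = key.split("_")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 underscore-separated fields, got {len(parts)}: {key!r}")
    chrom, start, end, ref, alt = parts
    return VariantKey(chrom, int(start), int(end), ref, alt)
```

For example, `parse_variant_key("12_114385521_114385521_C_T").chrom` returns `"12"`.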