# Arginine-glutamic acid dipeptide repeats (RERE)

The arginine-glutamic acid dipeptide repeats (RERE) gene encodes a nuclear receptor coregulator that positively regulates retinoic acid signaling [Fregeau et al., 2016](https://pubmed.ncbi.nlm.nih.gov/27087320/). A high percentage of RERE pathogenic variants affect a histidine-rich region of 21 amino acids (amino acids 1425–1445) in the Atrophin-1 domain [Jordan et al., 2018](https://pubmed.ncbi.nlm.nih.gov/29330883/). The authors noted: "Of the 19 individuals with NEDBEH described here and by Fregeau et al. (2016), nine (47%) carry sequence variants that affect a histidine-rich region of the Atrophin-1 domain that spans 21 amino acids (1425–1445). The amino acid sequence in this region is 100% conserved down to Xenopus and zebrafish, but the functional significance of this domain is currently unknown."

Let us try to reproduce Table 3 of https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5903952/.
To do this, we will map each row of the table to an HPO term, for instance Abnormal brain morphology (HP:0012443) for "Brain anomalies", and use the multiple testing correction (MTC) option that tests only one term at a time. With a single term per test, no Bonferroni correction is applied, just as in the original publication.

Take a look at `analysis_config.specify_terms_strategy()`.
Do this once for each item in Table 3, using the appropriate HPO term.
Presumably we will not reach statistical significance either, but a result comparable to the authors' will be fine.
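
To make the "one term at a time, no Bonferroni" idea concrete, here is a minimal sketch of a two-sided Fisher exact test on a single 2x2 table (genotype group by presence or absence of one HPO term). It rests on several assumptions: that a Fisher exact test is a reasonable stand-in for the test used in the publication (not confirmed here), that the nominal p value is reported as is, and that the helper name `fisher_exact_two_sided` is hypothetical. It does not use the genophenocorr API, and the counts are placeholders, not values from Table 3.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are the two genotype groups; columns are "term present" / "term absent".
    Hypothetical helper for illustration only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Hypergeometric probability of seeing x "present" individuals in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    p_value = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Placeholder counts (NOT taken from the publication or the cohort).
nominal_p = fisher_exact_two_sided(3, 0, 0, 3)
print(f"Nominal p value for one term, no correction: {nominal_p:.3f}")
```

Because each row of Table 3 is tested on its own, the p value printed here would be reported without any adjustment for the other rows.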

As a stretch goal, we can work out how to reproduce the "Number of defects per individual" comparison.
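
One possible starting point for the stretch goal is sketched below: count, for each individual, how many defect categories are reported, then collect the counts by genotype group. All individuals, category labels, and group names are hypothetical placeholders, and the helper `defects_per_group` is not part of genophenocorr. The sketch also assumes that each individual's HPO annotations have already been mapped to the Table 3 categories. Which statistical test to apply to the resulting counts is left open.

```python
from statistics import mean

# Hypothetical placeholder data: individual -> defect categories reported as present.
# These values are NOT taken from the publication or from the phenopackets.
defects_by_individual = {
    "individual_A": {"brain", "eye"},
    "individual_B": {"brain"},
    "individual_C": set(),
}

# Hypothetical genotype group for each individual.
group_by_individual = {
    "individual_A": "group_1",
    "individual_B": "group_2",
    "individual_C": "group_2",
}


def defects_per_group(defects, groups):
    """Return a dict mapping each group to the list of per-individual defect counts."""
    counts = {}
    for individual, categories in defects.items():
        counts.setdefault(groups[individual], []).append(len(categories))
    return counts


counts = defects_per_group(defects_by_individual, group_by_individual)
for group, values in sorted(counts.items()):
    print(f"{group}: counts={values}, mean={mean(values):.2f}")
```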


In [1]:
import genophenocorr
from genophenocorr.preprocessing import load_phenopacket_folder
from genophenocorr.preprocessing import configure_caching_cohort_creator
import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2023-10-09')
print(f'Loaded HPO v{hpo.version}')
print(f"Using genophenocorr version {genophenocorr.__version__}")

Loaded HPO v2023-10-09
Using genophenocorr version 0.1.1dev


In [2]:
RERE_MANE_transcript = 'NM_012102.4'
RERE_protein_id = "NP_036234.3"
cohort_creator = configure_caching_cohort_creator(hpo, timeout=20)
phenopacket_input_folder = "../../../phenopacket-store/notebooks/RERE/phenopackets/"
cohort = load_phenopacket_folder(pp_directory=phenopacket_input_folder, cohort_creator=cohort_creator)

ValueError: `/Users/robin/PycharmProjects/phenopacket-store/notebooks/RERE/phenopackets` does not point to a directory
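
The error above indicates that the relative path did not resolve to an existing directory on the machine where the notebook was run. A small check before loading can make this failure easier to diagnose. The helper `require_directory` is a hypothetical name and is not part of genophenocorr; the sketch assumes only the standard library.

```python
from pathlib import Path


def require_directory(path_str):
    """Resolve path_str and raise a descriptive error if it is not a directory."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_dir():
        raise FileNotFoundError(f"Expected a phenopacket directory at {path}")
    return path


# Same relative path as in the cell above; whether it resolves depends on the local checkout.
phenopacket_input_folder = "../../../phenopacket-store/notebooks/RERE/phenopackets/"
try:
    resolved = require_directory(phenopacket_input_folder)
    print(f"Found phenopacket directory: {resolved}")
except FileNotFoundError as error:
    print(error)
```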

In [None]:
from IPython.display import display, HTML
from genophenocorr.view import CohortViewable

cv = CohortViewable(hpo=hpo)
html = cv.process(cohort=cohort, transcript_id=RERE_MANE_transcript)

display(HTML(html))

In [None]:
from genophenocorr.preprocessing import UniprotProteinMetadataService
from genophenocorr.model.genome import GRCh38
from genophenocorr.preprocessing import VVTranscriptCoordinateService
from genophenocorr.view import ProteinVisualizable, ProteinVisualizer, ProteinViewable

pms = UniprotProteinMetadataService()
protein_meta = pms.annotate(RERE_protein_id)
txc_service = VVTranscriptCoordinateService(genome_build=GRCh38)
tx_coordinates = txc_service.fetch(RERE_MANE_transcript)
pvis = ProteinVisualizable(tx_coordinates=tx_coordinates, protein_meta=protein_meta, cohort=cohort)

In [None]:
viewer = ProteinViewable()
html_prot = viewer.process(cohort, pvis)
display(HTML(html_prot))

In [None]:
drawer = ProteinVisualizer()
drawer.draw_fig(pvis=pvis)

In [None]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories
from genophenocorr.model.genome import Region

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = False

analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)

In [None]:
from genophenocorr.model import VariantEffect

frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=RERE_MANE_transcript)
frameshift.summarize(hpo, PatientCategories.YES)

In [None]:
from genophenocorr.analysis import configure_cohort_analysis, CohortAnalysisConfiguration
from genophenocorr.analysis.predicate import PatientCategories

analysis_config = CohortAnalysisConfiguration()
analysis_config.missing_implies_excluded = False
analysis_config.heuristic_strategy()
analysis = configure_cohort_analysis(cohort, hpo, config=analysis_config)

In [None]:
frameshift = analysis.compare_by_variant_effect(VariantEffect.FRAMESHIFT_VARIANT, tx_id=RERE_MANE_transcript)
frameshift.summarize(hpo, PatientCategories.YES)

In [None]:
missense = analysis.compare_by_variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=RERE_MANE_transcript)
missense.summarize(hpo, PatientCategories.YES)

In [None]:
from genophenocorr.model import FeatureType

feature = analysis.compare_by_protein_feature_type(FeatureType.DOMAIN, tx_id=RERE_MANE_transcript)
feature.summarize(hpo, PatientCategories.YES)