# RNU4-2
[ReNU syndrome (RENU)](https://omim.org/entry/620851) is caused by heterozygous mutation in the RNU4-2 gene.

RNU4-2 is one of two U4 snRNA genes.

[Chen Y, et al. (2024)](https://pubmed.ncbi.nlm.nih.gov/38991538/) report:
> The n.64_65insT variant is one of six single base insertions that we observe in the 18 bp critical region in individuals with NDD, in a total of 100 individuals across cohorts. By contrast, single base insertions are very rare in population cohorts. Although we do also observe some SNVs in this region in individuals with NDD, our initial data suggest these SNVs may result in a milder phenotype. However, given this observation is based on only four fully phenotyped individuals, it needs to be confirmed in larger cohorts. 

In [1]:
import hpotk
import gpsea

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2024-08-13')
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2024-08-13
Using gpsea version 0.9.2


# RNU4-2
We use the [Matched Annotation from NCBI and EMBL-EBI (MANE)](https://www.ncbi.nlm.nih.gov/refseq/MANE/) transcript for RNU4-2. RNU4-2 is non-coding, so there is no protein identifier.

In [2]:
gene_symbol = 'RNU4-2'
mane_tx_id = 'NR_003137.3'

In [3]:
from ppktstore.registry import configure_phenopacket_registry
from gpsea.preprocessing import load_phenopackets
from gpsea.preprocessing import configure_caching_cohort_creator

phenopacket_store_release = '0.1.21'  # Update, if necessary
registry = configure_phenopacket_registry()

with registry.open_phenopacket_store(release=phenopacket_store_release) as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(gene_symbol))

cohort_creator = configure_caching_cohort_creator(hpo)

cohort, qc_results = load_phenopackets(phenopackets, cohort_creator)  
qc_results.summarize()

Individuals Processed: 100%|██████████| 61/61 [00:16<00:00,  3.72 individuals/s]
Validated under permissive policy


In [4]:
from gpsea.view import CohortViewer
viewer = CohortViewer(hpo)
viewer.process(cohort=cohort, transcript_id=mane_tx_id)

n,HPO Term
49,Absent speech
48,Intellectual disability
46,Delayed ability to walk
45,Seizure
42,Hypotonia
38,Short stature
37,Failure to thrive
35,Severe global developmental delay
35,Constipation
32,Strabismus

n,Disease
61,ReNU syndrome

n,Variant key,HGVS,Variant Class
53,12_120291839_120291839_T_TA,12_120291839_120291839_T_TA (None),
2,12_120291828_120291828_G_A,12_120291828_120291828_G_A (None),
2,12_120291835_120291835_G_A,12_120291835_120291835_G_A (None),
1,12_120291835_120291835_G_GT,12_120291835_120291835_G_GT (None),
1,12_120291839_120291839_T_TC,12_120291839_120291839_T_TC (None),
1,12_120291839_120291839_T_C,12_120291839_120291839_T_C (None),
1,12_120291826_120291826_T_TA,12_120291826_120291826_T_TA (None),

Variant effect,Count
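
The variant keys in the table above encode chromosome, start, end, reference allele, and alternate allele. As a minimal sketch (the helper `classify_variant_key` is hypothetical and not part of GPSEA), the keys can be split into single-base insertions and SNVs, the two groups that Chen et al. contrast:

```python
from collections import Counter


def classify_variant_key(key: str) -> str:
    """Classify a key of the form CHROM_START_END_REF_ALT by allele lengths.

    Hypothetical helper for illustration; assumes simple REF/ALT alleles.
    """
    chrom, start, end, ref, alt = key.split("_")
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


# Counts copied from the variant table above
variant_counts = {
    "12_120291839_120291839_T_TA": 53,
    "12_120291828_120291828_G_A": 2,
    "12_120291835_120291835_G_A": 2,
    "12_120291835_120291835_G_GT": 1,
    "12_120291839_120291839_T_TC": 1,
    "12_120291839_120291839_T_C": 1,
    "12_120291826_120291826_T_TA": 1,
}

totals = Counter()
for key, n in variant_counts.items():
    totals[classify_variant_key(key)] += n

print(dict(totals))  # {'insertion': 56, 'SNV': 5}
```

The totals add up to the 61 individuals in the cohort.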


# Genotype-phenotype correlation analysis

In [5]:
from gpsea.analysis.pcats import configure_hpo_term_analysis
analysis = configure_hpo_term_analysis(hpo)

from gpsea.analysis.clf import prepare_classifiers_for_terms_of_interest
pheno_predicates = prepare_classifiers_for_terms_of_interest(
    cohort=cohort,
    hpo=hpo,
)

In [6]:
from gpsea.analysis.predicate import variant_key
from gpsea.analysis.clf import monoallelic_classifier


is_n64_65insT = variant_key("12_120291839_120291839_T_TA")  # n.64_65insT
is_b = variant_key("12_120291835_120291835_G_GT")  # single-base insertion (G>GT)
is_c = variant_key("12_120291839_120291839_T_TC")  # single-base insertion (T>TC)
is_d = variant_key("12_120291826_120291826_T_TA")  # single-base insertion (T>TA)

# Any of the four single-base insertions observed in the cohort
is_insertion = is_n64_65insT | is_b | is_c | is_d


is_n64_65insT_predicate = monoallelic_classifier(
    a_predicate=is_n64_65insT,
    b_predicate=~is_n64_65insT,
    a_label= "n.64_65insT", 
    b_label="other"
)

is_insertion_predicate = monoallelic_classifier(
    a_predicate=is_insertion,
    b_predicate=~is_insertion,
    a_label="insertion", 
    b_label="other"
)

In [7]:
from gpsea.view import MtcStatsViewer

n64_65insT_result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_clf=is_n64_65insT_predicate,
    pheno_clfs=pheno_predicates,
)

viewer = MtcStatsViewer()
viewer.process(n64_65insT_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,145
HMF03,Skipping term because of a child term with the same individual counts,10
HMF08,Skipping general term,92
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,234


In [8]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=n64_65insT_result)

HPO term,n.64_65insT count,n.64_65insT percent,other count,other percent,Corrected p value,p value
Severe global developmental delay [HP:0011344],33/38,87%,2/7,29%,0.091922,0.003469
Moderate global developmental delay [HP:0011343],5/38,13%,5/7,71%,0.091922,0.003469
Absent speech [HP:0001344],45/51,88%,4/8,50%,0.393489,0.022273
Hypotonia [HP:0001252],46/49,94%,5/8,62%,0.407792,0.030777
Hearing impairment [HP:0000365],6/38,16%,3/7,43%,1.0,0.130678
Primary microcephaly [HP:0011451],19/34,56%,1/5,20%,1.0,0.181764
Seizure [HP:0001250],37/52,71%,8/8,100%,1.0,0.18244
Constipation [HP:0002019],32/48,67%,3/7,43%,1.0,0.241929
Delayed ability to walk [HP:0031936],40/41,98%,6/7,86%,1.0,0.27305
Strabismus [HP:0000486],29/49,59%,3/7,43%,1.0,0.446515
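
The uncorrected p values in the table above are consistent with a two-sided Fisher exact test on the 2x2 table of counts. The sketch below is a plain-Python illustration, not GPSEA's own implementation. It assumes the usual two-sided definition, which sums the probabilities of all tables that are no more likely than the observed one, and applies it to the "Severe global developmental delay" row (33/38 vs. 2/7):

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x "present" counts in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# n.64_65insT: 33 with the term, 5 without; other: 2 with, 5 without
p_severe_gdd = fisher_exact_two_sided(33, 5, 2, 5)
print(round(p_severe_gdd, 6))  # 0.003469
```

The "Moderate global developmental delay" row (5/38 vs. 5/7) is the mirror image of the same table, which is why both rows share the same p value.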


In [9]:
insertion_result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_clf=is_insertion_predicate,
    pheno_clfs=pheno_predicates,
)
summarize_hpo_analysis(hpo=hpo, result=insertion_result)

HPO term,insertion count,insertion percent,other count,other percent,Corrected p value,p value
Moderate global developmental delay [HP:0011343],6/41,15%,4/4,100%,0.03735,0.001409
Severe global developmental delay [HP:0011344],35/41,85%,0/4,0%,0.03735,0.001409
Absent speech [HP:0001344],47/54,87%,2/5,40%,0.535189,0.030294
High forehead [HP:0000348],3/56,5%,2/5,40%,0.652308,0.049231
Diabetes insipidus [HP:0000873],9/48,19%,3/5,60%,0.659341,0.070212
Hypotonia [HP:0001252],48/52,92%,3/5,60%,0.659341,0.080878
Failure to thrive [HP:0001508],36/50,72%,1/4,25%,0.659341,0.087083
Constipation [HP:0002019],34/51,67%,1/4,25%,0.869172,0.131196
Delayed ability to walk [HP:0031936],43/44,98%,3/4,75%,0.950158,0.161348
Short stature [HP:0004322],36/46,78%,2/4,50%,1.0,0.239917
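
The corrected p values above are consistent with a Benjamini-Hochberg adjustment. This is an inference from the numbers, since the notebook does not state the correction method. Below is a minimal sketch of the procedure in plain Python, run on toy p values because the full list of tested terms is not shown here:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy input for illustration only (not taken from the cohort)
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(p, 4) for p in toy_adjusted])  # [0.02, 0.04, 0.04, 0.02]

# Two tied smallest p values share rank 2. Assuming 53 tests, as reported
# for the n.64_65insT comparison, the top insertion result becomes:
top_adjusted = 0.001409 * 53 / 2
print(round(top_adjusted, 4))  # 0.0373
```

The small difference from the tabulated 0.03735 comes from using the rounded p value as input.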


In [10]:
from gpsea.analysis.clf import sex_classifier
mf_result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_clf=sex_classifier(),
    pheno_clfs=pheno_predicates,
)

summarize_hpo_analysis(hpo, mf_result)

HPO term,FEMALE count,FEMALE percent,MALE count,MALE percent,Corrected p value,p value
Reduced circulating growth hormone concentration [HP:0034323],0/11,0%,11/22,50%,0.261289,0.005025
Feeding difficulties [HP:0011968],16/18,89%,22/22,100%,1.0,0.196154
Gastroesophageal reflux [HP:0002020],9/24,38%,17/31,55%,1.0,0.277773
Delayed gross motor development [HP:0002194],19/20,95%,27/27,100%,1.0,0.425532
Short stature [HP:0004322],18/22,82%,20/28,71%,1.0,0.511611
Intrauterine growth retardation [HP:0001511],8/17,47%,9/27,33%,1.0,0.525863
Hypotonia [HP:0001252],24/26,92%,27/31,87%,1.0,0.67794
Facial hypotonia [HP:0000297],4/10,40%,5/16,31%,1.0,0.692449
Severe global developmental delay [HP:0011344],15/20,75%,20/25,80%,1.0,0.731035
Absent speech [HP:0001344],23/27,85%,26/32,81%,1.0,0.741246


# Summary

In [11]:
from gpseacs.report import GpseaAnalysisReport, GPAnalysisResultSummary


fet_results = (
    GPAnalysisResultSummary.from_multi(
        result=n64_65insT_result,
    ),
    GPAnalysisResultSummary.from_multi(
        result=insertion_result,
    ),
    GPAnalysisResultSummary.from_multi(
        result=mf_result
    )
)

caption = "No previous statistical analysis of genotype-phenotype correlations for RNU4-2 variants was identified in the medical literature."
report = GpseaAnalysisReport(name=gene_symbol, 
                             cohort=cohort, 
                             fet_results=fet_results,
                             gene_symbol=gene_symbol,
                             mane_tx_id=mane_tx_id,
                             mane_protein_id="",
                             caption=caption)

In [12]:
from gpseacs.report import GpseaNotebookSummarizer
summarizer = GpseaNotebookSummarizer(hpo=hpo, gpsea_version=gpsea.__version__)
summarizer.summarize_report(report=report)

Genotype (A),Genotype (B),Tests performed,Significant tests
n.64_65insT,other,53,0

HPO Term,insertion,other,p-val,adj. p-val
Severe global developmental delay [HP:0011344],35/41 (85%),0/4 (0%),0.001,0.037
Moderate global developmental delay [HP:0011343],6/41 (15%),4/4 (100%),0.001,0.037

Genotype (A),Genotype (B),Tests performed,Significant tests
FEMALE,MALE,52,0


In [13]:
summarizer.process_latex(report=report)

Output to ../../supplement/tex/RNU4-2_summary_draft.tex
