# RNU4-2
[ReNU syndrome (RENU)](https://omim.org/entry/620851) is caused by a heterozygous mutation in the RNU4-2 gene.

RNU4-2 is one of two U4 snRNA genes.

[Chen Y, et al. (2024)](https://pubmed.ncbi.nlm.nih.gov/38991538/) report:
> The n.64_65insT variant is one of six single base insertions that we observe in the 18 bp critical region in individuals with NDD, in a total of 100 individuals across cohorts. By contrast, single base insertions are very rare in population cohorts. Although we do also observe some SNVs in this region in individuals with NDD, our initial data suggest these SNVs may result in a milder phenotype. However, given this observation is based on only four fully phenotyped individuals, it needs to be confirmed in larger cohorts. 

In [16]:
import hpotk
import gpsea

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2024-08-13')
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2024-08-13
Using gpsea version 0.5.1.dev0


# RNU4-2
We use the [Matched Annotation from NCBI and EMBL-EBI (MANE)](https://www.ncbi.nlm.nih.gov/refseq/MANE/) transcript for RNU4-2. RNU4-2 is non-coding, so there is no protein identifier.

In [17]:
gene_symbol = 'RNU4-2'
mane_tx_id = 'NR_003137.3'

In [18]:
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopacket_folder
from ppktstore.registry import configure_phenopacket_registry

phenopacket_store_release = '0.1.21'  # Update, if necessary
registry = configure_phenopacket_registry()

# Alternative (disabled): load the cohort from a Phenopacket Store release
# instead of a local folder.
# with registry.open_phenopacket_store(release=phenopacket_store_release) as ps:
#     phenopackets = tuple(ps.iter_cohort_phenopackets(gene_symbol))
# print(f'Loaded {len(phenopackets)} phenopackets')

cohort_name = "RNU4-2"

# Load the phenopackets from a local folder and run the Q/C checks.
cohort_creator = configure_caching_cohort_creator(hpo)
pp_dir = '/Users/robin/GIT/phenopacket-store/notebooks/RNU4-2/phenopackets/'
cohort, qc_results = load_phenopacket_folder(pp_dir, cohort_creator)
qc_results.summarize()


Individuals Processed: 50individuals [00:03, 15.93individuals/s]
Validated under permissive policy
Phenopackets
  patient #12
    phenotype-features
     errors:



In [19]:
from gpsea.view import CohortViewable
viewer = CohortViewable(hpo)
viewer.process(cohort=cohort, transcript_id=mane_tx_id)

HPO Term,ID,Seen in n individuals
Delayed ability to walk,HP:0031936,46
Intellectual disability,HP:0001249,40
Absent speech,HP:0001344,40
Seizure,HP:0001250,39
Hypotonia,HP:0001252,39
Short stature,HP:0004322,38
Severe global developmental delay,HP:0011344,35
Failure to thrive,HP:0001508,30
Constipation,HP:0002019,30
Gastroesophageal reflux,HP:0002020,24

Count,Variant key,Variant Name,Protein Variant,Variant Class
43,12_120291839_120291839_T_TA,12_120291839_120291839_T_TA,,
2,12_120291828_120291828_G_A,12_120291828_120291828_G_A,,
1,12_120291835_120291835_G_GT,12_120291835_120291835_G_GT,,
1,12_120291839_120291839_T_TC,12_120291839_120291839_T_TC,,
1,12_120291835_120291835_G_A,12_120291835_120291835_G_A,,
1,12_120291839_120291839_T_C,12_120291839_120291839_T_C,,
1,12_120291826_120291826_T_TA,12_120291826_120291826_T_TA,,

Disease Name,Disease ID,Annotation Count
ReNU syndrome,OMIM:620851,50

Variant effect,Annotation Count


In [20]:
from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest
from gpsea.analysis.pcats.stats import FisherExactTest
from gpsea.analysis.mtc_filter import HpoMtcFilter
from gpsea.analysis.pcats import HpoTermAnalysis

pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=cohort,
    hpo=hpo,
)

mtc_filter = HpoMtcFilter.default_filter(hpo=hpo, term_frequency_threshold=0.4, annotation_frequency_threshold=0.4)
mtc_correction = 'fdr_bh'
statistic = FisherExactTest()

analysis = HpoTermAnalysis(
    count_statistic=statistic,
    mtc_filter=mtc_filter,
    mtc_correction=mtc_correction,
    mtc_alpha=0.05,
)
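
The `'fdr_bh'` label conventionally denotes the Benjamini-Hochberg false discovery rate procedure. As a rough illustration of what that correction does to a list of nominal p values, here is a minimal pure-Python sketch. The function name `benjamini_hochberg` and the toy p values are assumptions made for this example; they are not part of GPSEA and are not taken from the cohort.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvals)
    # Indices of the p values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input (illustrative values only, not cohort results).
toy_pvals = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_pvals)
print([round(p, 4) for p in toy_adjusted])
```

Because each nominal p value is scaled by the number of tested terms divided by its rank, reducing the number of tested terms with the MTC filter directly reduces the size of the correction.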

In [25]:
from gpsea.analysis.predicate.genotype import VariantPredicates, monoallelic_predicate

# The most common variant in the cohort
is_n64_65insT = VariantPredicates.variant_key("12_120291839_120291839_T_TA")  # n.64_65insT

# The other single-base insertions observed in the cohort
is_b = VariantPredicates.variant_key("12_120291835_120291835_G_GT")
is_c = VariantPredicates.variant_key("12_120291839_120291839_T_TC")
is_d = VariantPredicates.variant_key("12_120291826_120291826_T_TA")

# Any single-base insertion
is_insertion = is_n64_65insT | is_b | is_c | is_d


is_n64_65insT_predicate = monoallelic_predicate(
    a_predicate=is_n64_65insT,
    b_predicate=~is_n64_65insT,
    names=("n.64_65insT", "other")
)

is_insertion_predicate = monoallelic_predicate(
    a_predicate=is_insertion,
    b_predicate=~is_insertion,
    names=("insertion", "other")
)

is_n64_65insT_predicate.display_question()

'Allele group: n.64_65insT, other'
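
The insertion group is assembled by listing variant keys by hand. As a cross-check, the same grouping can be sketched from the keys alone by comparing the lengths of the reference and alternate alleles. This is a standalone illustration that does not use GPSEA; the helper `classify_variant_key` is hypothetical, and the counts are copied from the variant table shown earlier in this notebook.

```python
from collections import Counter


def classify_variant_key(key):
    """Classify a chrom_start_end_ref_alt key by its allele lengths."""
    # Split from the right so the last four fields are start, end, ref, alt.
    _chrom, _start, _end, ref, alt = key.rsplit("_", 4)
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "other"


# Variant keys and individual counts from the cohort summary table.
variant_counts = {
    "12_120291839_120291839_T_TA": 43,
    "12_120291828_120291828_G_A": 2,
    "12_120291835_120291835_G_GT": 1,
    "12_120291839_120291839_T_TC": 1,
    "12_120291835_120291835_G_A": 1,
    "12_120291839_120291839_T_C": 1,
    "12_120291826_120291826_T_TA": 1,
}

totals = Counter()
for key, count in variant_counts.items():
    totals[classify_variant_key(key)] += count

print(dict(totals))
```

Under this classification, the four keys used for `is_insertion` are the only insertions in the table, and the remaining three keys are single-nucleotide variants.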

In [23]:
from gpsea.view import MtcStatsViewer

result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_predicate=is_n64_65insT_predicate,
    pheno_predicates=pheno_predicates,
)

viewer = MtcStatsViewer()
viewer.process(result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,131
HMF03,Skipping term because of a child term with the same individual counts,12
HMF08,Skipping general term,59
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,103


In [24]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=result)

HPO term,n.64_65insT (count),n.64_65insT (percent),other (count),other (percent),Corrected p value,p value
Severe global developmental delay [HP:0011344],33/38,87%,2/7,29%,0.08325,0.003469
Moderate global developmental delay [HP:0011343],5/38,13%,5/7,71%,0.08325,0.003469
Absent speech [HP:0001344],37/41,90%,3/7,43%,0.160025,0.010002
Constipation [HP:0002019],28/38,74%,2/6,33%,0.847419,0.070618
Hearing impairment [HP:0000365],6/38,16%,3/7,43%,1.0,0.130678
Delayed ability to walk [HP:0031936],40/40,100%,6/7,86%,1.0,0.148936
Primary microcephaly [HP:0011451],19/34,56%,1/5,20%,1.0,0.181764
Secondary microcephaly [HP:0005484],6/34,18%,2/5,40%,1.0,0.267693
Seizure [HP:0001250],32/42,76%,7/7,100%,1.0,0.318569
Autistic behavior [HP:0000729],21/39,54%,1/5,20%,1.0,0.344867
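
To make the nominal p values in this table easier to interpret, the following sketch recomputes a two-sided Fisher exact test for a single 2x2 table using only the standard library. It is a hand-rolled hypergeometric sum written for illustration, not the GPSEA implementation; the function name `fisher_exact_two_sided` is an assumption. The counts are taken from the "Severe global developmental delay" row above (33 of 38 versus 2 of 7).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at most as likely as the observed one.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# n.64_65insT: 33 with the term, 5 without; other: 2 with, 5 without.
p_severe_gdd = fisher_exact_two_sided(33, 5, 2, 5)
print(f"{p_severe_gdd:.6f}")
```

The result agrees with the nominal p value of 0.003469 reported for this term in the table.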


In [26]:
result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_predicate=is_insertion_predicate,
    pheno_predicates=pheno_predicates,
)
summarize_hpo_analysis(hpo=hpo, result=result)

HPO term,insertion (count),insertion (percent),other (count),other (percent),Corrected p value,p value
Moderate global developmental delay [HP:0011343],6/41,15%,4/4,100%,0.031008,0.001409
Severe global developmental delay [HP:0011344],35/41,85%,0/4,0%,0.031008,0.001409
Absent speech [HP:0001344],39/44,89%,1/4,25%,0.174119,0.011872
Failure to thrive [HP:0001508],30/40,75%,0/3,0%,0.24186,0.023175
Constipation [HP:0002019],30/41,73%,0/3,0%,0.24186,0.027484
Delayed ability to walk [HP:0031936],43/43,100%,3/4,75%,0.624113,0.085106
Intrauterine growth retardation [HP:0001511],6/41,15%,2/4,50%,0.877076,0.139535
Nystagmus [HP:0000639],18/37,49%,0/3,0%,1.0,0.238462
Short stature [HP:0004322],36/46,78%,2/4,50%,1.0,0.239917
Seizure [HP:0001250],35/45,78%,4/4,100%,1.0,0.568663


In [27]:
from gpsea.analysis.predicate.genotype import sex_predicate
from gpsea.view import summarize_hpo_analysis

result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_predicate=sex_predicate(),
    pheno_predicates=pheno_predicates,
)
summary_df = summarize_hpo_analysis(hpo, result)
summary_df

HPO term,FEMALE (count),FEMALE (percent),MALE (count),MALE (percent),Corrected p value,p value
Hypothyroidism [HP:0000821],0/11,0%,10/22,45%,0.593472,0.012902
Feeding difficulties [HP:0011968],13/15,87%,20/20,100%,1.0,0.176471
Gastroesophageal reflux [HP:0002020],8/18,44%,16/26,62%,1.0,0.358767
Short stature [HP:0004322],18/22,82%,20/28,71%,1.0,0.511611
Nystagmus [HP:0000639],6/16,38%,12/24,50%,1.0,0.525515
Seizure [HP:0001250],16/21,76%,23/28,82%,1.0,0.725622
Severe global developmental delay [HP:0011344],15/20,75%,20/25,80%,1.0,0.731035
Failure to thrive [HP:0001508],11/17,65%,19/26,73%,1.0,0.735697
Primary microcephaly [HP:0011451],9/16,56%,11/23,48%,1.0,0.747527
Constipation [HP:0002019],11/17,65%,19/27,70%,1.0,0.747711
