# POLR1A

POLR1A is the largest subunit of RNA polymerase I (Pol I), which catalyzes DNA-dependent synthesis of ribosomal RNA ([OMIM:616404](https://omim.org/entry/616404)).

Pathogenic POLR1A variants are associated with

- [Acrofacial dysostosis, Cincinnati type, OMIM:616462](https://omim.org/entry/616462), and
- [Leukodystrophy, hypomyelinating, 27, OMIM:620675](https://omim.org/entry/620675).

[Acrofacial dysostosis, Cincinnati type](https://omim.org/entry/616462) is characterized by highly prevalent craniofacial anomalies reminiscent of Treacher Collins syndrome and variable limb defects, as well as numerous additional phenotypes including neurodevelopmental abnormalities and structural cardiac defects.
[Hypomyelinating leukodystrophy-27](https://omim.org/entry/620675) is an autosomal recessive neurologic disorder characterized by global developmental delay with impaired motor and intellectual development apparent from infancy.

> Genotype-phenotype correlations related to specific variants or variant categories were not identified in the published literature at the time of this writing (2024-09-19). However, the clinical manifestations of Acrofacial dysostosis, Cincinnati type, an autosomal dominant disease, and Hypomyelinating leukodystrophy-27, an autosomal recessive disease, are distinct. We were able to identify only four published cases of Hypomyelinating leukodystrophy-27, which limits statistical power, but we test here the clinical manifestations of monoallelic vs. biallelic genotypes.

In [1]:
import gpsea
import hpotk
from IPython.display import display, HTML

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo()
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2024-08-13
Using gpsea version 0.4.1.dev0


In [2]:
from ppktstore.registry import configure_phenopacket_registry
registry = configure_phenopacket_registry()
cohort_name = "POLR1A"
with registry.open_phenopacket_store(release="0.1.19") as ps:
    phenopackets = list(ps.iter_cohort_phenopackets(cohort_name))
print(f"Found cohort with {len(phenopackets)} phenopackets for {cohort_name}")

Found cohort with 22 phenopackets for POLR1A


In [3]:
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets

POLR1A_MANE_transcript = 'NM_015425.6'  # Homo sapiens RNA polymerase I subunit A (POLR1A), mRNA
POLR1A_protein_id = 'NP_056240.2'  # DNA-directed RNA polymerase I subunit RPA1

# Build the cohort from the phenopackets and collect the validation results
cohort_creator = configure_caching_cohort_creator(hpo)
cohort, validation = load_phenopackets(
    phenopackets=phenopackets,
    cohort_creator=cohort_creator,
)

Individuals Processed: 100%|██████████| 22/22 [00:00<00:00, 423.20individuals/s]


In [4]:
validation.summarize()  

Validated under none policy


In [5]:
from gpsea.view import CohortViewable
cv = CohortViewable(hpo=hpo)
report = cv.process(cohort=cohort, transcript_id=POLR1A_MANE_transcript)
display(HTML(report))

HPO Term,ID,Seen in n individuals
Global developmental delay,HP:0001263,12
Hypotonia,HP:0001252,11
Hypertelorism,HP:0000316,9
Micrognathia,HP:0000347,7
Seizure,HP:0001250,7
Low-set ears,HP:0000369,6
Abnormality of limbs,HP:0040064,6
Cleft palate,HP:0000175,5
Brain imaging abnormality,HP:0410263,5
Microcephaly,HP:0000252,5

Count,Variant key,Variant Name,Protein Variant,Variant Class
2,2_86075199_86075199_C_T,c.1442G>A,p.Arg481Lys,MISSENSE_VARIANT
2,2_86030290_86030290_C_A,c.4685G>T,p.Cys1562Phe,MISSENSE_VARIANT
2,6_43519367_43519367_A_T,6_43519367_43519367_A_T,,
2,2_86045702_86045702_G_A,c.2801C>T,p.Ser934Leu,MISSENSE_VARIANT
2,2_86078193_86078193_C_T,c.1178G>A,p.Arg393His,MISSENSE_VARIANT
2,2_86038743_86038746_TCTC_T,c.3988_3990del,p.Glu1330del,INFRAME_DELETION
2,2_86065407_86065407_G_T,c.1925C>A,p.Thr642Asn,MISSENSE_VARIANT
1,2_86028600_86028600_C_T,c.4891G>A,p.Val1631Met,MISSENSE_VARIANT
1,2_86100059_86100060_CA_C,c.190del,p.Cys64AlafsTer42,FRAMESHIFT_VARIANT
1,2_86048931_86048935_GATCA_G,c.2583_2586del,p.Asp862Ter,FRAMESHIFT_VARIANT

Disease Name,Disease ID,Annotation Count
"Acrofacial dysostosis, Cincinnati type",OMIM:616462,18
"Leukodystrophy, hypomyelinating, 27",OMIM:620675,4

Variant effect,Annotation Count
STOP_GAINED,1
MISSENSE_VARIANT,14
INFRAME_DELETION,2
FRAMESHIFT_VARIANT,3


In [8]:
from gpsea.model.genome import GRCh38
from gpsea.preprocessing import configure_default_protein_metadata_service, VVMultiCoordinateService
txc_service = VVMultiCoordinateService(genome_build=GRCh38)
pms = configure_default_protein_metadata_service()
tx_coordinates = txc_service.fetch(POLR1A_MANE_transcript)
#protein_meta = pms.annotate(POLR1A_protein_id)

ValueError: A required `transcripts` field is missing in the response from Variant Validator API: 
{
  "error": "Unable to recognise gene symbol LOC90784",
  "requested_symbol": "NM_015425.6"
}

In [15]:
from gpsea.view import CohortVariantViewer

viewer = CohortVariantViewer(tx_id=POLR1A_MANE_transcript)
report = viewer.process(cohort)
display(HTML(report))

Variant key,Variant (cDNA),Variant (protein),Effects,Count
2_86078193_86078193_C_T,c.1178G>A,p.Arg393His,missense,2
2_86075199_86075199_C_T,c.1442G>A,p.Arg481Lys,missense,2
2_86065407_86065407_G_T,c.1925C>A,p.Thr642Asn,missense,2
2_86045702_86045702_G_A,c.2801C>T,p.Ser934Leu,missense,2
6_43519367_43519367_A_T,6_43519367_43519367_A_T,,,2
2_86038743_86038746_TCTC_T,c.3988_3990del,p.Glu1330del,inframe deletion,2
2_86030290_86030290_C_A,c.4685G>T,p.Cys1562Phe,missense,2
2_86040411_86040411_C_T,c.3721G>A,p.Val1241Ile,missense,1
2_86100059_86100060_CA_C,c.190del,p.Cys64AlafsTer42,frameshift,1
2_86048931_86048935_GATCA_G,c.2583_2586del,p.Asp862Ter,frameshift,1


In [25]:
from gpsea.model import VariantEffect
from gpsea.analysis.predicate.genotype import VariantPredicates, ModeOfInheritancePredicate

is_missense = VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=POLR1A_MANE_transcript)
gt_predicate = ModeOfInheritancePredicate.autosomal_recessive(is_missense)

In [26]:
gt_predicate.get_categorizations()

(Categorization(category=HOM_REF),
 Categorization(category=HET),
 Categorization(category=BIALLELIC_ALT))

In [27]:
from gpsea.analysis.predicate.genotype import filtering_predicate

gt_predicate = filtering_predicate(gt_predicate, gt_predicate.get_categorizations()[1:])
gt_predicate.display_question()

'What is the genotype group: HET, BIALLELIC_ALT'
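The three categories listed above can be read as the number of alleles that satisfy the variant predicate, with the `HOM_REF` group dropped by the filtering step. The sketch below illustrates that reading in plain Python. It is an assumption about the grouping logic, not gpsea's implementation, and `genotype_group` and `ALLOWED_GROUPS` are hypothetical names that do not exist in gpsea.

```python
from typing import Optional

# Hypothetical names for illustration only; not part of the gpsea API.
ALLOWED_GROUPS = ("HET", "BIALLELIC_ALT")  # groups kept after filtering


def genotype_group(matching_allele_count: int) -> Optional[str]:
    """Assign a genotype group from the number of alleles that match
    the variant predicate (assumed here: missense variants)."""
    if matching_allele_count == 0:
        group = "HOM_REF"
    elif matching_allele_count == 1:
        group = "HET"
    elif matching_allele_count == 2:
        group = "BIALLELIC_ALT"
    else:
        raise ValueError(f"Unexpected allele count: {matching_allele_count}")
    # Individuals outside the allowed groups are left out of the comparison.
    return group if group in ALLOWED_GROUPS else None


for count in (0, 1, 2):
    print(count, genotype_group(count))
```

Under this reading, an individual without any matching allele is assigned no group and does not contribute to either column of the comparison.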

In [17]:
from gpsea.analysis.mtc_filter import HpoMtcFilter
from gpsea.analysis.pcats import HpoTermAnalysis
from gpsea.analysis.pcats.stats import FisherExactTest
from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest

# One phenotype predicate per HPO term of interest in the cohort
pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=cohort,
    hpo=hpo,
)

# Filter out HPO terms that are not worth testing before multiple-testing correction
mtc_filter = HpoMtcFilter.default_filter(hpo=hpo, term_frequency_threshold=0.2)
mtc_correction = 'fdr_bh'
statistic = FisherExactTest()

analysis = HpoTermAnalysis(
    count_statistic=statistic,
    mtc_filter=mtc_filter,
    mtc_correction=mtc_correction,
    mtc_alpha=0.05,
)
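The `mtc_correction = 'fdr_bh'` setting is assumed here to select the Benjamini-Hochberg procedure, which is what that label denotes in statsmodels. The following is a minimal standard-library sketch of that procedure, not the code gpsea runs; `benjamini_hochberg` is a hypothetical helper name. The nominal p-values are copied, already rounded, from the summary table at the end of this notebook, so the adjusted values agree with the reported corrected p-values only up to rounding.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Nominal p-values (rounded) from the summary table at the end of the notebook.
nominal = [0.006061, 0.006061, 0.022222, 0.090909, 0.090909,
           0.190476, 0.216783, 0.428571, 0.515152]
for p, q in zip(nominal, benjamini_hochberg(nominal)):
    print(f"{p:.6f} -> {q:.6f}")
```

The monotonicity step explains why Ventriculomegaly and Hypotonia share one corrected value in the table even though their nominal p-values differ.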

In [31]:
result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_predicate=gt_predicate,
    pheno_predicates=pheno_predicates,
)

In [32]:
from gpsea.view import MtcStatsViewer

viewer = MtcStatsViewer()
report = viewer.process(result)
display(HTML(report))

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.2,31
HMF02,Skipping term because no genotype has more than one observed HPO count,2
HMF03,Skipping term because of a child term with the same individual counts,1
HMF04,Skipping term because all genotypes have same HPO observed proportions,8
HMF05,Skipping term because one genotype had zero observations,13
HMF06,Skipping term with less than 7 observations (not powered for 2x2),129
HMF08,Skipping general term,61


In [33]:
from gpsea.view import summarize_hpo_analysis

report = summarize_hpo_analysis(hpo=hpo, result=result)
report

HPO term,HET (Count),HET (Percent),BIALLELIC_ALT (Count),BIALLELIC_ALT (Percent),Corrected p value,p value
Truncal ataxia [HP:0002078],0/8,0%,3/3,100%,0.027273,0.006061
Ataxia [HP:0001251],0/8,0%,3/3,100%,0.027273,0.006061
Macrocephaly [HP:0000256],0/8,0%,2/2,100%,0.066667,0.022222
Developmental regression [HP:0002376],0/8,0%,2/4,50%,0.163636,0.090909
Relative macrocephaly [HP:0004482],0/8,0%,2/4,50%,0.163636,0.090909
Ventriculomegaly [HP:0002119],1/6,17%,3/4,75%,0.278721,0.190476
Hypotonia [HP:0001252],7/9,78%,1/4,25%,0.278721,0.216783
Seizure [HP:0001250],4/4,100%,2/4,50%,0.482143,0.428571
Global developmental delay [HP:0001263],6/8,75%,4/4,100%,0.515152,0.515152
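
The nominal p-values in this table can be checked by hand from the counts. The sketch below assumes that `FisherExactTest` performs a standard two-sided Fisher exact test on a 2x2 table of genotype group by phenotype presence. It uses only the standard library, and `fisher_exact_two_sided` is a hypothetical helper, not a gpsea function.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        p for p in map(table_probability, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Rows: HET, BIALLELIC_ALT; columns: phenotype present, phenotype absent.
truncal_ataxia = fisher_exact_two_sided(0, 8, 3, 0)  # 0/8 vs. 3/3
hypotonia = fisher_exact_two_sided(7, 2, 1, 3)       # 7/9 vs. 1/4
print(f"Truncal ataxia: {truncal_ataxia:.6f}")
print(f"Hypotonia: {hypotonia:.6f}")
```

For Truncal ataxia the observed table is the single most extreme arrangement of 3 affected individuals among 11, so the p-value is 1/165; with only three biallelic individuals this is the smallest p-value the term could reach.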
