# POLR1A

POLR1A is the largest subunit of RNA polymerase I (Pol I), which catalyzes DNA-dependent synthesis of ribosomal RNA ([OMIM:616404](https://omim.org/entry/616404)).

Pathogenic POLR1A variants are associated with

- [Acrofacial dysostosis, Cincinnati type, OMIM:616462](https://omim.org/entry/616462), and
- [Leukodystrophy, hypomyelinating, 27, OMIM:620675](https://omim.org/entry/620675).
	


[Acrofacial dysostosis, Cincinnati type](https://omim.org/entry/616462) is characterized by highly prevalent craniofacial anomalies reminiscent of Treacher Collins syndrome and variable limb defects, together with numerous additional phenotypes including neurodevelopmental abnormalities and structural cardiac defects.
[Hypomyelinating leukodystrophy-27](https://omim.org/entry/620675) is an autosomal recessive neurologic disorder characterized by global developmental delay with impaired motor and intellectual development apparent from infancy. 

> Genotype-phenotype correlations related to specific variants or variant categories were not identified in the published literature at the time of this writing (2024-09-19). However, the clinical manifestations of Acrofacial dysostosis, Cincinnati type, an autosomal dominant disease, and Hypomyelinating leukodystrophy-27, an autosomal recessive disease, are distinct. We were able to identify only four published cases of Hypomyelinating leukodystrophy-27, which limits statistical power, but we test here for differences in the clinical manifestations of monoallelic vs. biallelic genotypes.

In [4]:
import gpsea
import hpotk
from IPython.display import display, HTML

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo()
print(f'Loaded HPO v{hpo.version}')
print(f"Using gpsea version {gpsea.__version__}")

Loaded HPO v2024-08-13
Using gpsea version 0.7.1


# POLR1A
We use the [Matched Annotation from NCBI and EMBL-EBI (MANE)](https://www.ncbi.nlm.nih.gov/refseq/MANE/) transcript and the corresponding protein identifier for POLR1A.

In [2]:
gene_symbol = 'POLR1A'
mane_tx_id =  'NM_015425.6'
mane_protein_id = 'NP_056240.2' #  DNA-directed RNA polymerase I subunit RPA1

In [5]:
from ppktstore.registry import configure_phenopacket_registry
from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets, load_phenopacket_folder

phenopacket_store_release = '0.1.21'  # Update, if necessary
registry = configure_phenopacket_registry()

with registry.open_phenopacket_store(release=phenopacket_store_release) as ps:
    phenopackets = tuple(ps.iter_cohort_phenopackets(gene_symbol))


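# Build the cohort from a local checkout of phenopacket-store rather than from the release loaded above.
cohort_creator = configure_caching_cohort_creator(hpo)
cohort, qc = load_phenopacket_folder("../../../phenopacket-store/notebooks/POLR1A/phenopackets", cohort_creator)

# Alternative: build the cohort from the `phenopackets` tuple read from the registry.
#cohort, qc = load_phenopackets(
#    phenopackets=phenopackets,
#    cohort_creator=cohort_creator,
#)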
print(f'Loaded {len(cohort)} individuals from {cohort}')
qc.summarize()

Individuals Processed: 22individuals [00:05,  4.36individuals/s]
Loaded 22 individuals from Cohort(members=(Patient(labels:Individual 3[PMID_37075751_Individual_3], sex:Sex.FEMALE, age:Age(days=56.0, timeline=Timeline.GESTATIONAL), vital_status:VitalStatus(status=<Status.DECEASED: 2>, age_of_death=Age(days=56.0, timeline=Timeline.GESTATIONAL)), variants:(Variant(variant_info=VariantInfo(variant_coordinates=VariantCoordinates(region=GenomicRegion(contig=2, start=86100058, end=86100060, strand=+), ref=CA, alt=C, change_length=-1), sv_info=None), tx_annotations=(TranscriptAnnotation(gene_id:POLR1A,transcript_id:NM_015425.6,hgvs_cdna:NM_015425.6:c.190del,is_preferred:True,variant_effects:(<VariantEffect.FRAMESHIFT_VARIANT: 'SO:0001589'>,),overlapping_exons:(2,),protein_id:NP_056240.2,hgvsp:NP_056240.2:p.Cys64AlafsTer42,protein_effect_location:Region(start=63, end=64)),), genotypes=Genotypes(['Individual 3[PMID_37075751_Individual_3]=0/1'])),), phenotypes:[DefaultTermId(idx=2, value=HP_0000

In [6]:
from gpsea.view import CohortViewer
cv = CohortViewer(hpo=hpo)
cv.process(cohort=cohort)

HPO Term,ID,Seen in n individuals
Global developmental delay,HP:0001263,12
Hypotonia,HP:0001252,11
Hypertelorism,HP:0000316,9
Short stature,HP:0004322,8
Micrognathia,HP:0000347,7
Seizure,HP:0001250,7
Low-set ears,HP:0000369,6
Microcephaly,HP:0000252,5
Cleft palate,HP:0000175,5
Ptosis,HP:0000508,5

Seen in n individuals,Variant key,Variant Name,Protein Effect,Variant Class
2,2_86065407_86065407_G_T,c.1925C>A,,
2,2_86100074_86100074_T_A,c.176A>T,,
2,2_86045702_86045702_G_A,c.2801C>T,,
2,2_86078193_86078193_C_T,c.1178G>A,,
2,2_86030290_86030290_C_A,c.4685G>T,,
2,2_86075199_86075199_C_T,c.1442G>A,,
2,2_86038743_86038746_TCTC_T,c.3988_3990del,,
1,2_86100059_86100060_CA_C,c.190del,,
1,2_86040411_86040411_C_T,c.3721G>A,,
1,2_86040482_86040483_TG_T,c.3649del,,

Name,ID,N diagnosed individuals
"Acrofacial dysostosis, Cincinnati type",OMIM:616462,18
"Leukodystrophy, hypomyelinating, 27",OMIM:620675,4
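
The variant keys in the table above appear to follow a `contig_start_end_ref_alt` pattern. The sketch below splits such a key into its fields and labels the change by comparing allele lengths. `VariantKey`, `parse_variant_key`, and `classify` are hypothetical helpers written for illustration and are not part of gpsea; the five-field layout is an assumption inferred from the keys shown here and may not hold for other key types.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical container for the fields of a variant key."""
    contig: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant_key(key: str) -> VariantKey:
    """Split a key such as '2_86100059_86100060_CA_C' into five fields.

    Assumes the underscore-separated layout seen in the table above.
    """
    parts = key.split("_")
    if len(parts) != 5:
        raise ValueError(f"Expected 5 fields, got {len(parts)}: {key!r}")
    contig, start, end, ref, alt = parts
    return VariantKey(contig, int(start), int(end), ref, alt)


def classify(vk: VariantKey) -> str:
    """Label the change by comparing the lengths of the ref and alt alleles."""
    if len(vk.ref) == 1 and len(vk.alt) == 1:
        return "SNV"
    if len(vk.ref) > len(vk.alt):
        return "deletion"
    if len(vk.ref) < len(vk.alt):
        return "insertion"
    return "MNV"


for key in ("2_86065407_86065407_G_T", "2_86038743_86038746_TCTC_T"):
    vk = parse_variant_key(key)
    print(key, classify(vk), len(vk.alt) - len(vk.ref))
```

For the second key, the length difference of -3 agrees with the three deleted bases of `c.3988_3990del` listed in the same row.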


In [8]:
from gpsea.model.genome import GRCh38
from gpsea.model import ProteinMetadata
from gpsea.preprocessing import VVMultiCoordinateService, configure_default_protein_metadata_service
import matplotlib.pyplot as plt
from gpsea.view import ProteinVisualizer

txc_service = VVMultiCoordinateService(genome_build=GRCh38)
tx_coordinates = txc_service.fetch(mane_tx_id) 
pms = configure_default_protein_metadata_service()
#protein_meta = pms.annotate(mane_protein_id)

protein_meta = ProteinMetadata.from_uniprot_json(protein_id=mane_protein_id, 
                                                 label="POLR1A",
                                                 uniprot_json="O95602.json",
                                                 protein_length=1720)

polr1a_fig, ax = plt.subplots(figsize=(15, 8))
visualizer = ProteinVisualizer()
visualizer.draw_protein_diagram(
    tx_coordinates,
    protein_meta,
    cohort,
    ax=ax,
)
polr1a_fig.tight_layout()

ValueError: A required `transcripts` field is missing in the response from Variant Validator API: 
{
  "error": "Unable to recognise gene symbol LOC90784",
  "requested_symbol": "NM_015425.6"
}

In [7]:
from gpsea.view import CohortVariantViewer

viewer = CohortVariantViewer(tx_id=mane_tx_id)
report = viewer.process(cohort)
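# `viewer.process` returns an HtmlGpseaReport rather than a string, so wrapping it in HTML()
# raised "TypeError: 'HtmlGpseaReport' object is not subscriptable". Display the report object directly.
display(report)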

In [8]:
from gpsea.analysis.pcats import configure_hpo_term_analysis
analysis = configure_hpo_term_analysis(hpo)

from gpsea.analysis.predicate.phenotype import prepare_predicates_for_terms_of_interest
pheno_predicates = prepare_predicates_for_terms_of_interest(
    cohort=cohort,
    hpo=hpo,
)

In [10]:
from gpsea.model import VariantEffect
from gpsea.analysis.predicate.genotype import VariantPredicates, biallelic_predicate
from gpsea.view import MtcStatsViewer

is_missense = VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=mane_tx_id)
missense_predicate = biallelic_predicate(
    a_predicate=is_missense,
    b_predicate=~is_missense,
    a_label="missense",
    b_label="other",
    partitions=((0,1),(2,))
)
missense_result = analysis.compare_genotype_vs_phenotypes(
    cohort=cohort,
    gt_predicate=missense_predicate,
    pheno_predicates=pheno_predicates,
)


viewer = MtcStatsViewer()
viewer.process(missense_result)

Code,Reason,Count
HMF01,Skipping term with maximum frequency that was less than threshold 0.4,236
HMF08,Skipping general term,75
HMF09,Skipping term with maximum annotation frequency that was less than threshold 0.4,93
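
The reason text for HMF01 suggests that a term is skipped when its frequency stays below the threshold (0.4 here) in every genotype group. The following is a minimal sketch of that idea only, not the gpsea implementation: `max_group_frequency` and `passes_frequency_filter` are hypothetical names, the counts are made-up illustration data, and the exact rule gpsea applies (including how HMF09 differs) is not reproduced.

```python
from typing import Dict, Tuple

# Maps a genotype group label to (individuals with the term, individuals assessed).
GroupCounts = Dict[str, Tuple[int, int]]


def max_group_frequency(counts: GroupCounts) -> float:
    """Return the highest per-group frequency, ignoring groups nobody was assessed in."""
    freqs = [with_term / assessed for with_term, assessed in counts.values() if assessed > 0]
    return max(freqs, default=0.0)


def passes_frequency_filter(counts: GroupCounts, threshold: float = 0.4) -> bool:
    """Keep a term only if at least one group reaches the frequency threshold."""
    return max_group_frequency(counts) >= threshold


# Hypothetical counts, for illustration only.
rare_term = {"group A": (1, 10), "group B": (1, 5)}      # frequencies 0.1 and 0.2
enriched_term = {"group A": (0, 10), "group B": (3, 5)}  # frequencies 0.0 and 0.6

print(passes_frequency_filter(rare_term))      # skipped: no group reaches 0.4
print(passes_frequency_filter(enriched_term))  # kept: one group reaches 0.6
```

Filtering before testing reduces the number of hypotheses, which in turn softens the multiple-testing correction applied to the terms that remain.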


In [12]:
from gpsea.view import summarize_hpo_analysis

summarize_hpo_analysis(hpo=hpo, result=missense_result)

Allele group,missense/missense OR missense/other,missense/missense OR missense/other,other/other,other/other,
HPO term,Count,Percent,Count,Percent,p values


In [33]:
from gpsea.view import summarize_hpo_analysis

report = summarize_hpo_analysis(hpo=hpo, result=result)
report

What is the genotype group,HET,HET,BIALLELIC_ALT,BIALLELIC_ALT,,
HPO term,Count,Percent,Count,Percent,Corrected p values,p values
Truncal ataxia [HP:0002078],0/8,0%,3/3,100%,0.027273,0.006061
Ataxia [HP:0001251],0/8,0%,3/3,100%,0.027273,0.006061
Macrocephaly [HP:0000256],0/8,0%,2/2,100%,0.066667,0.022222
Developmental regression [HP:0002376],0/8,0%,2/4,50%,0.163636,0.090909
Relative macrocephaly [HP:0004482],0/8,0%,2/4,50%,0.163636,0.090909
Ventriculomegaly [HP:0002119],1/6,17%,3/4,75%,0.278721,0.190476
Hypotonia [HP:0001252],7/9,78%,1/4,25%,0.278721,0.216783
Seizure [HP:0001250],4/4,100%,2/4,50%,0.482143,0.428571
Global developmental delay [HP:0001263],6/8,75%,4/4,100%,0.515152,0.515152
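
The p values in the table above are consistent with a two-sided Fisher's exact test on each 2x2 table of counts, followed by Benjamini-Hochberg correction across the nine tested terms. This is an inference from the numbers rather than something the notebook states. The sketch below recomputes both columns in plain Python from the counts in the table; `fisher_exact_two_sided` and `benjamini_hochberg` are illustrative helpers written for this check, not gpsea functions.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# (term, HET with term, HET assessed, biallelic with term, biallelic assessed)
rows = [
    ("Truncal ataxia", 0, 8, 3, 3),
    ("Ataxia", 0, 8, 3, 3),
    ("Macrocephaly", 0, 8, 2, 2),
    ("Developmental regression", 0, 8, 2, 4),
    ("Relative macrocephaly", 0, 8, 2, 4),
    ("Ventriculomegaly", 1, 6, 3, 4),
    ("Hypotonia", 7, 9, 1, 4),
    ("Seizure", 4, 4, 2, 4),
    ("Global developmental delay", 6, 8, 4, 4),
]

pvals = [
    fisher_exact_two_sided(h, h_n - h, b, b_n - b)
    for _, h, h_n, b, b_n in rows
]
corrected = benjamini_hochberg(pvals)

for (term, *_), p, q in zip(rows, pvals, corrected):
    print(f"{term}: p={p:.6f}, corrected p={q:.6f}")
```

For truncal ataxia, for example, only one of the 165 ways of placing three affected individuals among eleven puts all three in the biallelic group, so the nominal p value is 1/165.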
