<h1>WWOX: Mignot, et al. (2015)</h1>
<p>This notebook processes the cohort reported in <a href="https://pubmed.ncbi.nlm.nih.gov/25411445/" target="_blank">Mignot, et al. (2015) WWOX-related encephalopathies: delineation of the phenotypical spectrum and emerging genotype-phenotype correlation</a>.</p>
<p>According to the authors, the phenotype in four patients carrying two predicted null alleles was characterised by (1) little if any psychomotor acquisitions, poor spontaneous motility and absent eye contact from birth, (2) pharmacoresistant epilepsy starting in the first weeks of life, and (3) possible retinal degeneration, acquired microcephaly and premature death. This contrasted with the less severe autosomal recessive spinocerebellar ataxia type 12 phenotype due to hypomorphic alleles.</p>

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None) # show entire column contents, important!
from IPython.display import display, HTML
from pyphetools.creation import *
from pyphetools.visualization import *
from pyphetools.validation import *
import pyphetools
print(f"Using pyphetools version {pyphetools.__version__}")

Using pyphetools version 0.8.30


<h2>Importing HPO data</h2>

In [2]:
parser = HpoParser()
hpo_cr = parser.get_hpo_concept_recognizer()
hpo_version = parser.get_version()
hpo_ontology = parser.get_ontology()
PMID = "PMID:25411445"
title = "WWOX-related encephalopathies: delineation of the phenotypical spectrum and emerging genotype-phenotype correlation"
metadata = MetaData(created_by="ORCID:0000-0002-5648-2155", pmid=PMID, pubmed_title=title)
metadata.default_versions_with_hpo(version=hpo_version)
print(f"HPO version {hpo_version}")

HPO version 2023-10-09


<h2>Importing the supplemental table</h2>

In [3]:
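df = pd.read_excel('./input/PMID_25411445.xlsx')
# The table has one column per patient; transpose it so that each row is one patient
df = df.set_index('Patient').T.reset_index()
df['Age'] = 0  # placeholder: every individual is encoded with age P0Y
df['patient_id'] = df['index']
df.head()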

Patient,index,Sex,Age,Mutation,DNA level (NM_016373.2),Second mutation,Protein level (NM_057457),Microcephaly,Profound global developmental delay,Axial hypotonia,...,Hypokinesia,Rigidity,Thin corpus callosum,Delayed myelination,Cerebral atrophy,Seizure,EEG abnormality,Reduced eye contact,Sudden death,patient_id
0,1,F,0,CH for two deletions of several exons,c.366_516del,c.517-?_1056+ ?del,p.[0];[His173_Met352del],Yes,Yes,Yes,...,Yes,Yes,Yes,No,No,Yes,Yes,Yes,No,1
1,2,F,0,CH: deletion of exon 6/ nonsense exon 8,c.1005G>A,c.517-?_605+?del,p.[His173Alafs*67];[Trp335*],Yes,Yes,Yes,...,Yes,Yes,No,Yes,No,Yes,Yes,Yes,Yes,2
2,3,M,0,CH: frameshift exon,c.45_48delGGAC,c.140C>G,p.[Asp16Serfs*63];[Pro47Arg],No,Yes,Yes,...,No,No,No,No,No,Yes,Yes,No,No,3
3,4,F,0,1/missense exon 2,c.45_48delGGAC,c.140C>G,,No,Yes,Yes,...,No,No,No,No,No,Yes,Yes,No,No,4
4,5,F,0,CH: complete deletion/ nonsense exon 8,c.889A>T,c.-366-? *871+?del,p.[0];[Lys297*],No,Yes,Yes,...,Yes,Yes,No,No,Yes,Yes,Yes,Yes,Yes,5


<h2>Column mappers</h2>

In [4]:
generator = SimpleColumnMapperGenerator(df=df, observed='Yes', excluded='No', hpo_cr=hpo_cr)
column_mapper_d = generator.try_mapping_columns()

In [5]:
from IPython.display import display, HTML
display(HTML(generator.to_html()))

Result,Columns
Mapped,Microcephaly; Profound global developmental delay; Axial hypotonia; Cerebellar ataxia; Spasticity; Hypokinesia; Rigidity; Thin corpus callosum; Delayed myelination; Cerebral atrophy; Seizure; EEG abnormality; Reduced eye contact; Sudden death
Unmapped,index; Sex; Age; Mutation; DNA level (NM_016373.2); Second mutation; Protein level (NM_057457); patient_id
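<p>The mapper above treats a cell containing 'Yes' as an observed feature and 'No' as an explicitly excluded one. The following is a minimal sketch of that three-way decision in plain Python. It is not the pyphetools implementation; <code>classify_cell</code> is a hypothetical helper introduced only for illustration.</p>

```python
from typing import Optional


def classify_cell(value: str, observed: str = "Yes", excluded: str = "No") -> Optional[bool]:
    """Return True (observed), False (excluded) or None (not recorded)."""
    text = str(value).strip()
    if text == observed:
        return True
    if text == excluded:
        return False
    return None


# Three cells taken from the row of patient 1 in the table above
row = {"Microcephaly": "Yes", "Delayed myelination": "No", "Sudden death": "No"}
status = {feature: classify_cell(cell) for feature, cell in row.items()}
print(status)
```

<p>Returning <code>None</code> for anything other than the two symbols keeps "not recorded" distinct from "excluded", which matters because only the latter should appear as an excluded feature in a phenopacket.</p>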


<h2>Variant Data</h2>
<p>The supplemental table lists the HGVS variants with respect to NM_016373.2. In this notebook they are validated against NM_016373.4.</p>

In [6]:
genome = 'hg38'
default_genotype = 'compound heterozygous'
WWOX_transcript='NM_016373.4'
wwox_id = "HGNC:12799"
wwox_symbol = "WWOX"
vvalidator = VariantValidator(genome_build=genome, transcript=WWOX_transcript)
all_variants = set()
variant_d = {}
patient_id_to_hgvs1_d = {}
patient_id_to_hgvs2_d = {}
for idx, row in df.iterrows():
    patient_id = str(row['patient_id'])
    v1 = row['DNA level (NM_016373.2)']
    v2 = row['Second mutation']
    all_variants.add(v1)
    all_variants.add(v2)
    patient_id_to_hgvs1_d[patient_id] = v1
    patient_id_to_hgvs2_d[patient_id] = v2
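# The following exon-level and whole-gene deletions have unknown breakpoints ("?")
# and cannot easily be interpreted by VariantValidator.
# We encode them as structural variants, since they are almost certainly loss of function.
# The strings must match the table cells exactly, including their irregular spacing.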
structural_vars = {
    "c.517-?_605+?del",
    "c.517-?_1056+ ?del",
    "c.-366-? *871+?del"
}
for v in all_variants:
    print(f"Encoding {v}")
    if v in structural_vars:
        var = StructuralVariant.chromosomal_deletion(cell_contents=v, gene_symbol=wwox_symbol, gene_id=wwox_id)
    else:
        var = vvalidator.encode_hgvs(v)
    variant_d[v] = var
print(f"Extracted {len(variant_d)} unique variants")

Encoding c.1005G>A
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.4%3Ac.1005G>A/NM_016373.4?content-type=application%2Fjson
Encoding c.45_48delGGAC
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.4%3Ac.45_48delGGAC/NM_016373.4?content-type=application%2Fjson
Encoding c.366_516del
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.4%3Ac.366_516del/NM_016373.4?content-type=application%2Fjson
Encoding c.140C>G
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.4%3Ac.140C>G/NM_016373.4?content-type=application%2Fjson
Encoding c.-366-? *871+?del
Encoding c.517-?_605+?del
Encoding c.517-?_1056+ ?del
Encoding c.889A>T
https://rest.variantvalidator.org/VariantValidator/variantvalidator/hg38/NM_016373.4%3Ac.889A>T/NM_016373.4?content-type=application%2Fjson
Extracted 8 unique variants
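<p>In the cell above, the deletions with unknown breakpoints are listed by hand. As a hedged alternative, the sketch below selects them by looking for the HGVS "?" placeholder. It assumes that "?" appears only in such variants, which holds for this table but is not guaranteed in general; <code>has_unknown_breakpoint</code> is a hypothetical helper, not part of pyphetools.</p>

```python
def has_unknown_breakpoint(hgvs: str) -> bool:
    """True if the HGVS string uses '?' for a position that was not determined."""
    return "?" in hgvs


# Four of the eight cell values from the table, copied verbatim
cells = ["c.366_516del", "c.517-?_605+?del", "c.889A>T", "c.-366-? *871+?del"]
structural = [c for c in cells if has_unknown_breakpoint(c)]
for c in cells:
    route = "structural variant" if c in structural else "VariantValidator"
    print(f"{c}: {route}")
```

<p>A hand-written set, as used in the notebook, has the advantage of being explicit about which variants were reviewed by the curator.</p>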


<h2>Demographic data</h2>
<p>pyphetools can be used to capture information about age, sex, and individual identifiers. This information is stored in a map of "IndividualMapper" objects. Special treatment may be required for the identifiers, which may be used as the column names or as the row index.</p>
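<p>When the identifiers are the column names, the table has to be turned around so that each individual becomes one record with its own identifier field. The notebook does this with pandas; the sketch below shows the same idea in plain Python on a hypothetical miniature table (the values mirror the first three patients).</p>

```python
# Hypothetical miniature of a table whose columns are patient identifiers
header = ["Patient", "1", "2", "3"]
rows = [
    ["Sex", "F", "F", "M"],
    ["Seizure", "Yes", "Yes", "Yes"],
]

patient_ids = header[1:]
records = []
for col, pid in enumerate(patient_ids, start=1):
    # Start each record with the identifier taken from the column name
    record = {"patient_id": pid}
    for row in rows:
        feature_name = row[0]
        record[feature_name] = row[col]
    records.append(record)

for record in records:
    print(record)
```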

In [7]:
ageMapper = AgeColumnMapper.by_year('Age')
ageMapper.preview_column(df['Age'])

Unnamed: 0,original column contents,age
0,0,P0Y
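<p>The preview shows the value 0 rendered as <code>P0Y</code>, an ISO 8601 duration. A minimal sketch of such a conversion from whole years follows; <code>years_to_iso8601</code> is a hypothetical helper and is not claimed to be how pyphetools implements it.</p>

```python
def years_to_iso8601(years: int) -> str:
    """Format a non-negative whole number of years as an ISO 8601 duration."""
    if years < 0:
        raise ValueError("age in years must not be negative")
    return f"P{years}Y"


print(years_to_iso8601(0))
```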


In [8]:
sexMapper = SexColumnMapper(male_symbol='M', female_symbol='F', column_name='Sex')
sexMapper.preview_column(df['Sex'])

Unnamed: 0,original column contents,sex
0,F,FEMALE
1,F,FEMALE
2,M,MALE
3,F,FEMALE
4,F,FEMALE


In [9]:
encoder = CohortEncoder(df=df, 
                        hpo_cr=hpo_cr, 
                        column_mapper_d=column_mapper_d, 
                        individual_column_name="index", 
                        agemapper=ageMapper, 
                        sexmapper=sexMapper,
                        metadata=metadata,
                        pmid=PMID)
dee28 = Disease(disease_id='OMIM:616211', disease_label='Developmental and epileptic encephalopathy 28')
encoder.set_disease(dee28)

In [10]:
individuals = encoder.get_individuals()

<h2>Variants</h2>
<p>Add the variants for each individual. With two alleles per individual this step is hard to automate, so the alleles are assigned in an explicit loop.</p>

In [11]:
for i in individuals:
    v1 = patient_id_to_hgvs1_d.get(i.id)
    v2 = patient_id_to_hgvs2_d.get(i.id)
    print(f"v1 {v1} v2 {v2}")
    if v1 == v2:
        var = variant_d.get(v1)
        var.set_homozygous()
        i.add_variant(var)
    else:
        var1 = variant_d.get(v1)
        var1.set_heterozygous()
        i.add_variant(var1)
        var2 = variant_d.get(v2)
        var2.set_heterozygous()
        i.add_variant(var2)

v1 c.366_516del v2 c.517-?_1056+ ?del
v1 c.1005G>A v2 c.517-?_605+?del
v1 c.45_48delGGAC v2 c.140C>G
v1 c.45_48delGGAC v2 c.140C>G
v1 c.889A>T v2 c.-366-? *871+?del
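<p>The loop above takes the variant objects straight from <code>variant_d</code>, so individuals 3 and 4 receive the same objects. That is harmless here because every allele is heterozygous. If <code>add_variant</code> stores a reference rather than a copy (an assumption, not verified against pyphetools), a cohort that mixes homozygous and heterozygous carriers of one variant could have the zygosity overwritten. The sketch below illustrates the same assignment rule with a copy per individual; <code>ToyVariant</code> and <code>alleles_for</code> are hypothetical stand-ins, not pyphetools classes.</p>

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyVariant:
    hgvs: str
    zygosity: Optional[str] = None


def alleles_for(v1: str, v2: str, variants: Dict[str, ToyVariant]) -> List[ToyVariant]:
    """Return per-individual copies: one homozygous or two heterozygous alleles."""
    if v1 == v2:
        var = copy.copy(variants[v1])
        var.zygosity = "homozygous"
        return [var]
    result = []
    for key in (v1, v2):
        var = copy.copy(variants[key])
        var.zygosity = "heterozygous"
        result.append(var)
    return result


variants = {v: ToyVariant(v) for v in ("c.45_48delGGAC", "c.140C>G")}
compound = alleles_for("c.45_48delGGAC", "c.140C>G", variants)
# Hypothetical homozygous carrier, not present in this cohort
hypothetical_hom = alleles_for("c.140C>G", "c.140C>G", variants)
print([v.zygosity for v in compound], hypothetical_hom[0].zygosity)
```

<p>Because each call works on copies, assigning the hypothetical homozygous individual does not change the zygosity already recorded for the compound heterozygous one.</p>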


In [12]:
cvalidator = CohortValidator(cohort=individuals, ontology=hpo_ontology, min_hpo=1, allelic_requirement=AllelicRequirement.BI_ALLELIC)
qc = QcVisualizer(ontology=hpo_ontology, cohort_validator=cvalidator)
display(HTML(qc.to_summary_html()))

In [13]:
individuals = cvalidator.get_error_free_individual_list()
table = PhenopacketTable(individual_list=individuals, metadata=metadata)
display(HTML(table.to_html()))

Individual,Disease,Genotype,Phenotypic features
1 (FEMALE; P0Y),Developmental and epileptic encephalopathy 28 (OMIM:616211),NM_016373.4:c.366_516del (heterozygous) c.517-?_1056+ ?del: chromosomal_deletion (SO:1000029),Microcephaly (HP:0000252); Profound global developmental delay (HP:0012736); Axial hypotonia (HP:0008936); Hypokinesia (HP:0002375); Rigidity (HP:0002063); Thin corpus callosum (HP:0033725); Seizure (HP:0001250); EEG abnormality (HP:0002353); Reduced eye contact (HP:0000817); excluded: Ataxia (HP:0001251); excluded: Spasticity (HP:0001257); excluded: Delayed myelination (HP:0012448); excluded: Cerebral atrophy (HP:0002059); excluded: Sudden death (HP:0001699)
2 (FEMALE; P0Y),Developmental and epileptic encephalopathy 28 (OMIM:616211),NM_016373.4:c.1005G>A (heterozygous) c.517-?_605+?del: chromosomal_deletion (SO:1000029),Microcephaly (HP:0000252); Profound global developmental delay (HP:0012736); Axial hypotonia (HP:0008936); Spasticity (HP:0001257); Hypokinesia (HP:0002375); Rigidity (HP:0002063); Delayed myelination (HP:0012448); Seizure (HP:0001250); EEG abnormality (HP:0002353); Reduced eye contact (HP:0000817); Sudden death (HP:0001699); excluded: Ataxia (HP:0001251); excluded: Thin corpus callosum (HP:0033725); excluded: Cerebral atrophy (HP:0002059)
3 (MALE; P0Y),Developmental and epileptic encephalopathy 28 (OMIM:616211),NM_016373.4:c.46_49del (heterozygous) NM_016373.4:c.140C>G (heterozygous),Profound global developmental delay (HP:0012736); Axial hypotonia (HP:0008936); Seizure (HP:0001250); EEG abnormality (HP:0002353); excluded: Microcephaly (HP:0000252); excluded: Ataxia (HP:0001251); excluded: Spasticity (HP:0001257); excluded: Hypokinesia (HP:0002375); excluded: Rigidity (HP:0002063); excluded: Thin corpus callosum (HP:0033725); excluded: Delayed myelination (HP:0012448); excluded: Cerebral atrophy (HP:0002059); excluded: Reduced eye contact (HP:0000817); excluded: Sudden death (HP:0001699)
4 (FEMALE; P0Y),Developmental and epileptic encephalopathy 28 (OMIM:616211),NM_016373.4:c.46_49del (heterozygous) NM_016373.4:c.140C>G (heterozygous),Profound global developmental delay (HP:0012736); Axial hypotonia (HP:0008936); Seizure (HP:0001250); EEG abnormality (HP:0002353); excluded: Microcephaly (HP:0000252); excluded: Ataxia (HP:0001251); excluded: Spasticity (HP:0001257); excluded: Hypokinesia (HP:0002375); excluded: Rigidity (HP:0002063); excluded: Thin corpus callosum (HP:0033725); excluded: Delayed myelination (HP:0012448); excluded: Cerebral atrophy (HP:0002059); excluded: Reduced eye contact (HP:0000817); excluded: Sudden death (HP:0001699)
5 (FEMALE; P0Y),Developmental and epileptic encephalopathy 28 (OMIM:616211),NM_016373.4:c.889A>T (heterozygous) c.-366-? *871+?del: chromosomal_deletion (SO:1000029),Profound global developmental delay (HP:0012736); Axial hypotonia (HP:0008936); Spasticity (HP:0001257); Hypokinesia (HP:0002375); Rigidity (HP:0002063); Cerebral atrophy (HP:0002059); Seizure (HP:0001250); EEG abnormality (HP:0002353); Reduced eye contact (HP:0000817); Sudden death (HP:0001699); excluded: Microcephaly (HP:0000252); excluded: Ataxia (HP:0001251); excluded: Thin corpus callosum (HP:0033725); excluded: Delayed myelination (HP:0012448)


In [14]:
output_directory = "phenopackets"
Individual.output_individuals_as_phenopackets(individual_list=individuals,
                                             metadata=metadata,
                                             outdir=output_directory)

We output 5 GA4GH phenopackets to the directory phenopackets
