# Create Custom GenePanels From Public Databases

Let's create a series of dummy gene panels.

*Disclaimer:* I am not a variant scientist. Please use your best judgement and consult a real expert when creating actual data for medical purposes.

Given a phenotype file that looks like this:

| Gene | Transcript (NM_) | Coding DNA change (c.) | Protein change (p.) |
|---|---|---|---|
| HBB | NM_000518.5 | c.33C>A | p.Ala11Ala |
| HBB | NM_000518.5 | c.316-197C>T | NA |
| BRCA2 | NM_000059.4 | c.9976A>T | p.Lys3326Ter |
| MSH6 | NM_000179.2 | c.30C>A | p.Phe10Leu |
| RET | NM_020975.6 | c.1832G>A | p.Cys611Tyr |

We are going to generate some dummy data for Ella given the above phenotypes.
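The c. column uses HGVS coding-DNA notation. If you want to work with the table programmatically, a small parser can split simple substitutions into a CDS position, an intronic offset, and the two alleles. This is a minimal sketch that assumes single-nucleotide substitutions only (every row above is one); `parse_c_substitution` is a hypothetical helper, not part of any library.

```python
import re

# Single-nucleotide substitutions only: c.33C>A, c.9976A>T, or intronic c.316-197C>T.
# Indels, duplications, UTR positions (c.-14, c.*5) and ranges are not handled.
C_SUBSTITUTION = re.compile(
    r"^c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_c_substitution(hgvs_c):
    """Split a c. substitution into CDS position, intronic offset, ref and alt."""
    match = C_SUBSTITUTION.match(hgvs_c.strip())
    if match is None:
        raise ValueError("Unsupported c. notation: {!r}".format(hgvs_c))
    offset = match.group("offset")
    return {
        "cds_pos": int(match.group("pos")),
        # c.316-197 is 197 bases before c.316, in the preceding intron;
        # a "+" offset would point into the following intron
        "intron_offset": int(offset) if offset else 0,
        "ref": match.group("ref"),
        "alt": match.group("alt"),
    }


for variant in ["c.33C>A", "c.316-197C>T", "c.9976A>T", "c.30C>A", "c.1832G>A"]:
    print(variant, parse_c_substitution(variant))
```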


## Example - Query Public Databases for HBB c.33C>A p.Ala11Ala

First, search gnomAD for the first **HBB** record. Make sure you are using gnomAD v2 (GRCh37) to be compatible with Ella Anno!

Scroll to the bottom and you will see a variant table. 

![GnomAD Variant Table](./gnomad-variant-table.png)

If you filter this table on **c.33C>A** nothing is found, but if you filter on **p.Ala11Ala** you'll get two records. [Click the first record](https://gnomad.broadinstitute.org/variant/11-5248219-G-T?dataset=gnomad_r2_1) and you will see more information, including an [RS#](https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs35799536) and a [Clinvar ID](https://www.ncbi.nlm.nih.gov/clinvar/variation/439155/). 

Click on the [RS#](https://www.ncbi.nlm.nih.gov/snp/rs35799536) to get the SNP report.
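The gnomAD, dbSNP and ClinVar links above follow simple URL patterns, so you can build them from a variant's coordinates or IDs. A sketch; the helper names are hypothetical, and the URL formats are copied from the links in this notebook, so they may break if those sites are redesigned.

```python
# URL patterns copied from the links used in this notebook
GNOMAD_V2_DATASET = "gnomad_r2_1"


def gnomad_variant_url(chrom, pos, ref, alt, dataset=GNOMAD_V2_DATASET):
    """gnomAD variant IDs are CHROM-POS-REF-ALT on the forward strand (GRCh37 for v2)."""
    return "https://gnomad.broadinstitute.org/variant/{}-{}-{}-{}?dataset={}".format(
        chrom, pos, ref, alt, dataset)


def dbsnp_url(rsid):
    return "https://www.ncbi.nlm.nih.gov/snp/{}".format(rsid)


def clinvar_variation_url(variation_id):
    return "https://www.ncbi.nlm.nih.gov/clinvar/variation/{}/".format(variation_id)


print(gnomad_variant_url(11, 5248219, "G", "T"))
print(dbsnp_url("rs35799536"))
print(clinvar_variation_url(439155))
```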

![SNP Report](./hbb-dbsnp-rs-report.png)

The important things to note there are the alleles and the **GRCh37** coordinates. Make sure you are using the **GRCh37/hg19 reference genome**!

| Type | Value |
| --- | --- |
| Alleles | G>A / G>T |
| GRCh37.p13 | chr 11	NC_000011.9:g.5248219G>A |
| GRCh37.p13 | chr 11	NC_000011.9:g.5248219G>T |
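The GRCh37 rows above are genomic HGVS strings written against a RefSeq chromosome accession. A sketch that turns them into VCF-style (chromosome, position, ref, alt) tuples, assuming simple substitutions only; `parse_g_substitution` is a hypothetical helper. The accession version is what ties the coordinates to an assembly (e.g. `NC_000011.9` in the GRCh37 rows above), so check it before trusting the position.

```python
import re

# Genomic substitutions such as NC_000011.9:g.5248219G>T
G_SUBSTITUTION = re.compile(
    r"^NC_0*(?P<chrom>\d+)\.\d+:g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)
# NC_000023 and NC_000024 are the X and Y chromosomes
SPECIAL_CHROMOSOMES = {"23": "X", "24": "Y"}


def parse_g_substitution(hgvs_g):
    """Return (chrom, pos, ref, alt) suitable for a VCF line."""
    match = G_SUBSTITUTION.match(hgvs_g.strip())
    if match is None:
        raise ValueError("Unsupported g. notation: {!r}".format(hgvs_g))
    chrom = SPECIAL_CHROMOSOMES.get(match.group("chrom"), match.group("chrom"))
    return chrom, int(match.group("pos")), match.group("ref"), match.group("alt")


print(parse_g_substitution("NC_000011.9:g.5248219G>A"))
print(parse_g_substitution("NC_000011.9:g.5248219G>T"))
```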

In [1]:
import os
import pandas as pd
import requests
from pprint import pprint
import numpy as np
from datetime import datetime
import json

In [2]:
# I am really bad at naming things. 
GENEPANEL_NAME="test_all"
GENEPANEL_VERSION="v01"

BASE_PATH="/data"

## Prepare the GenePanel CSV

In [3]:
genepanel_transcript_columns = ["#chromosome","txStart", "txEnd","refseq", "score", 
                                "strand", "geneSymbol", "HGNC", "Omim gene entry", "geneAlias", "eGeneID",
                                "eTranscriptID", "cdsStart", "cdsEnd", "exonsStarts", "exonEnds"]

genepanel_phenotype_columns = ['#gene symbol',
 'HGNC',
 'remove (add x)',
 'phenotype',
 'inheritance',
 'omim_number',
 'pmid',
 'inheritance info',
 'comment']

all_genepanel_transcripts_records = []
all_genepanel_phenotypes_records = []

vcf_info_line = """##fileformat=VCFv4.1
##contig=<ID=13>
##FILTER=<ID=PASS,Description="All filters passed">
"""

vcf_columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "nrl_test_sample_1"]

def get_genepanel_info_line(genepanel_name, genepanel_version, date=None):
    if date is None:
        date = datetime.today().strftime('%Y-%m-%d')
    return "# Genepanel: {} Version: {} Date: {}\n".format(genepanel_name, genepanel_version, date)


def write_pandas_csv_with_info_line(file_name, info_line, df):
    with open(file_name, 'w') as fp:
        fp.write(info_line)
        df.to_csv(fp, index=False, sep="\t")


def print_genepanel_phenotypes_file(data, genepanel_name, genepanel_version, date=None):
    # Forward the caller's date; get_genepanel_info_line falls back to today
    info_line = get_genepanel_info_line(genepanel_name, genepanel_version, date=date)
    # Only the listed columns are kept; missing values become empty cells
    df = pd.DataFrame(columns=genepanel_phenotype_columns, data=data)
    df = df.replace(np.nan, '', regex=True)
    # genepanel_version already includes the "v" (e.g. "v01"), so don't add another
    genepanel_phenotypes_file_name = "{name}_{version}.phenotypes.csv".format(name=genepanel_name,
                                                                            version=genepanel_version)

    write_pandas_csv_with_info_line(genepanel_phenotypes_file_name, info_line, df)
    return df


def print_genepanel_transcripts_file(data, genepanel_name, genepanel_version, date=None):
    info_line = get_genepanel_info_line(genepanel_name, genepanel_version, date=date)
    df = pd.DataFrame(columns=genepanel_transcript_columns, data=data)
    df = df.replace(np.nan, '', regex=True)

    genepanel_transcripts_file_name = "{name}_{version}.transcripts.csv".format(name=genepanel_name,
                                                                              version=genepanel_version)

    write_pandas_csv_with_info_line(genepanel_transcripts_file_name, info_line, df)
    return df
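To double-check an output file, you can read it back while skipping the `# Genepanel:` info line. A sketch; `read_genepanel_file` is a hypothetical helper that assumes exactly one info line followed by a tab-separated table, as `write_pandas_csv_with_info_line` produces. The demo writes a small file by hand in a temporary directory.

```python
import os
import tempfile

import pandas as pd


def read_genepanel_file(file_name):
    """Return (info_line, DataFrame) for a tab-separated file with a one-line info header."""
    with open(file_name) as fp:
        info_line = fp.readline().rstrip("\n")
        # The rest is the table. Read everything as text and keep empty cells as ""
        # so the round trip matches what was written.
        df = pd.read_csv(fp, sep="\t", dtype=str, keep_default_na=False)
    return info_line, df


# Demo on a small hand-written file in a temporary directory
demo_path = os.path.join(tempfile.mkdtemp(), "demo_v01.transcripts.csv")
with open(demo_path, "w") as fp:
    fp.write("# Genepanel: demo Version: v01 Date: 2020-01-01\n")
    fp.write("#chromosome\ttxStart\tgeneSymbol\tscore\n")
    fp.write("13\t32889644\tBRCA2\t\n")

info_line, demo_df = read_genepanel_file(demo_path)
print(info_line)
print(demo_df)
```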

## Get Gene Panel Data from MyGene

In [4]:
def get_gene_data_mygene(query, exon_index=0):
    """Fetch hg19 coordinates and gene annotations for one gene from MyGene.info.

    Despite its name, exon_index selects a transcript: each entry of 'exons_hg19'
    describes one transcript and holds that transcript's list of exons.
    """
    response = requests.get("https://mygene.info/v3/gene/{}".format(query), timeout=30)
    response.raise_for_status()
    gene_data = response.json()
    assert 'exons_hg19' in gene_data, "No hg19 exons returned for {}".format(query)
    assert 'genomic_pos_hg19' in gene_data, "No hg19 position returned for {}".format(query)

    genomic_data = gene_data['genomic_pos_hg19']
    chrom = genomic_data['chr']  # renamed from chr to avoid shadowing the built-in

    transcripts = gene_data['exons_hg19']
    if isinstance(transcripts, dict):  # a single transcript may come back as a bare object
        transcripts = [transcripts]
    # Strand, bounds and exons must all come from the same transcript
    transcript = transcripts[exon_index]
    strand = '+' if transcript['strand'] == 1 else '-'

    mygene_exons = np.array(transcript['position'], dtype=int)
    mygene_exon_starts = mygene_exons[:, 0]
    mygene_exon_ends = mygene_exons[:, 1]

    hgnc = gene_data['HGNC']
    omim = gene_data['MIM']
    alias = gene_data['alias']
    if isinstance(alias, str):  # a single alias may come back as a bare string
        alias = [alias]
    etranscript_id = gene_data['ensembl']['transcript'][0]

    return {
        "#chromosome": chrom,
        "strand": strand,
        "txStart": transcript['txstart'],
        "txEnd": transcript['txend'],
        "cdsStart": transcript['cdsstart'],
        "cdsEnd": transcript['cdsend'],
        "exonsStarts": ','.join(np.array(mygene_exon_starts, dtype=str)),
        "exonEnds": ','.join(np.array(mygene_exon_ends, dtype=str)),
        # Must match the "Omim gene entry" column name, or the value is dropped on export
        "Omim gene entry": omim,
        "HGNC": hgnc,
        "geneAlias": ','.join(alias),
        "eTranscriptID": etranscript_id
    }
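`get_gene_data_mygene` reads the first `exons_hg19` entry by default, which is not necessarily the RefSeq transcript you later set as `refseq`. A sketch for finding the matching entry, assuming each entry carries an unversioned RefSeq accession under a `transcript` key (worth confirming against a real MyGene response); the helper and the toy data are hypothetical.

```python
def select_exons_for_refseq(gene_data, refseq):
    """Return (index, entry) of the exons_hg19 entry whose transcript matches refseq.

    Assumes each entry stores an unversioned RefSeq accession under 'transcript'
    (e.g. 'NM_000059'); versions are stripped on both sides before comparing.
    """
    entries = gene_data["exons_hg19"]
    if isinstance(entries, dict):  # tolerate a single transcript returned as one object
        entries = [entries]
    accession = refseq.split(".")[0]  # 'NM_000059.3' -> 'NM_000059'
    for index, entry in enumerate(entries):
        if str(entry.get("transcript", "")).split(".")[0] == accession:
            return index, entry
    available = [entry.get("transcript") for entry in entries]
    raise KeyError("{} not found in exons_hg19; available: {}".format(accession, available))


# Toy response with made-up values, shaped like the fields get_gene_data_mygene reads
toy_gene_data = {"exons_hg19": [
    {"transcript": "NM_999999", "strand": 1, "txstart": 100, "txend": 200,
     "cdsstart": 120, "cdsend": 180, "position": [[100, 200]]},
    {"transcript": "NM_000059", "strand": 1, "txstart": 1000, "txend": 5000,
     "cdsstart": 1100, "cdsend": 4900, "position": [[1000, 1500], [4000, 5000]]},
]}
index, entry = select_exons_for_refseq(toy_gene_data, "NM_000059.3")
print(index, entry["txstart"])
```

The returned index can then be passed to `get_gene_data_mygene` as `exon_index`.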


## Get the SNP Data for BRCA2 c.9976A>T p.Lys3326Ter

BRCA2 NM_000059.4 c.9976A>T p.Lys3326Ter

### BRCA2 Transcripts File

In [5]:
# You may have some of these IDs already when you are creating a real gene panel. 
# Either fetch them from mygene or add them in manually
# It's also a very good double check!
# HGNC = 1101
# Omim_gene_entry = 600185
# geneAlias = "FAD,FAD1,BRCC2,XRCC11"

refseq = "NM_000059.3"
geneSymbol = "BRCA2"
eTranscriptID = "ENST00000544455"

# Query by the eGeneID - NO VERSION
eGeneID = "ENSG00000139618"

In [6]:
gene_data = get_gene_data_mygene(eGeneID)
# Update with our reference info
gene_data['refseq'] = refseq
gene_data['geneSymbol'] = geneSymbol
gene_data['eGeneID'] = eGeneID
gene_data['eTranscriptID'] = eTranscriptID

pd.DataFrame.from_records([gene_data])

|  | #chromosome | strand | txStart | txEnd | cdsStart | cdsEnd | exonsStarts | exonEnds | Omim_gene_entry | HGNC | geneAlias | eTranscriptID | refseq | geneSymbol | eGeneID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13 | + | 32889644 | 32974405 | 32890597 | 32972907 | 32889644,32890558,32893213,32899212,32900237,3... | 32889804,32890664,32893462,32899321,32900287,3... | 600185 | 1101 | BRCC2,BROVCA2,FACD,FAD,FAD1,FANCD,FANCD1,GLM3,... | ENST00000544455 | NM_000059.3 | BRCA2 | ENSG00000139618 |


### Phenotype Data

This file is mostly created manually from OMIM and ClinVar. You will have to use your best judgement when creating a phenotypes file.

OMIM is an excellent resource; they ask you to register if you want to query their database programmatically. The HTML below was copied directly from the OMIM gene page and is for demonstration purposes only.

In [7]:
html_string = """<table class="table table-bordered table-condensed table-hover mim-table-padding small"> <thead> <tr class="active"> <th> Location </th> <th> Phenotype </th> <th> Phenotype <br>MIM number </th> <th> Inheritance </th> <th> Phenotype <br>mapping key </th> </tr></thead> <tbody> <tr> <td rowspan="8"> <span class="mim-font"> <a href="/geneMap/13/77?start=-3&amp;limit=10&amp;highlight=77"> 13q13.1 </a> </span> </td><td> <span class="mim-font">{Breast cancer, male, susceptibility to}</span> </td><td> <span class="mim-font"> <a href="/entry/114480"> 114480 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="51" oldtitle="Autosomal dominant" title="">AD</abbr>, <abbr class="mim-hint" data-hasqtip="52" oldtitle="Somatic mutation" title="">SMu</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="53" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Breast-ovarian cancer, familial, 2}</span> </td><td> <span class="mim-font"> <a href="/entry/612555"> 612555 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="54" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="55" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Glioblastoma 3}</span> </td><td> <span class="mim-font"> <a href="/entry/613029"> 613029 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="56" oldtitle="Autosomal recessive" title="">AR</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="57" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Medulloblastoma}</span> </td><td> <span class="mim-font"> <a href="/entry/155255"> 
155255 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="58" oldtitle="Autosomal dominant" title="">AD</abbr>, <abbr class="mim-hint" data-hasqtip="59" oldtitle="Autosomal recessive" title="">AR</abbr>, <abbr class="mim-hint" data-hasqtip="60" oldtitle="Somatic mutation" title="">SMu</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="61" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Pancreatic cancer 2}</span> </td><td> <span class="mim-font"> <a href="/entry/613347"> 613347 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="62" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Prostate cancer}</span> </td><td> <span class="mim-font"> <a href="/entry/176807"> 176807 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="63" oldtitle="Autosomal dominant" title="">AD</abbr>, <abbr class="mim-hint" data-hasqtip="64" oldtitle="Somatic mutation" title="">SMu</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="65" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Fanconi anemia, complementation group D1 </span> </td><td> <span class="mim-font"> <a href="/entry/605724"> 605724 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="66" oldtitle="Autosomal recessive" title="">AR</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="67" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Wilms tumor </span> </td><td> <span class="mim-font"> <a href="/entry/194070"> 194070 </a> </span> </td><td> 
<span class="mim-font"> <abbr class="mim-hint" data-hasqtip="68" oldtitle="Autosomal dominant" title="">AD</abbr>, <abbr class="mim-hint" data-hasqtip="69" oldtitle="Somatic mutation" title="">SMu</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="70" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr></tbody> </table>"""
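from io import StringIO  # passing literal HTML to read_html is deprecated since pandas 2.1
dfs = pd.read_html(StringIO(html_string))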
dfs[0]

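`pd.read_html` needs an HTML parser: without `lxml` installed, this cell fails with `ImportError: lxml not found, please install it`. Install it first (for example with `pip install lxml`).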

In [None]:
phenotype_data = []
for index, row in dfs[0].iterrows():
    phenotype_data.append({
    "#gene symbol": "BRCA2",
    "HGNC": int(gene_data["HGNC"]),
    "phenotype": row["Phenotype"],
    "inheritance": row["Inheritance"],
    "omim_number": row["Phenotype MIM number"],
    "pmid": "",
    "inheritance info": "",
    "comment": "",
    "remove (add x)": "",
    })
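The same loop is repeated for every gene below, so it could be factored into a helper. A sketch; `omim_table_to_phenotype_records` is hypothetical and assumes the `Phenotype`, `Phenotype MIM number` and `Inheritance` column names that `pd.read_html` produces for the OMIM table above. It also turns OMIM's blank inheritance cells into empty strings. The demo rows are copied from the BRCA2 OMIM table.

```python
import pandas as pd


def omim_table_to_phenotype_records(omim_df, gene_symbol, hgnc):
    """Turn an OMIM gene-phenotype table (as parsed by pd.read_html) into phenotype rows."""
    records = []
    for _, row in omim_df.iterrows():
        inheritance = row["Inheritance"]
        records.append({
            "#gene symbol": gene_symbol,
            "HGNC": int(hgnc),
            "remove (add x)": "",
            "phenotype": row["Phenotype"],
            # OMIM leaves some Inheritance cells empty; read_html turns them into NaN
            "inheritance": "" if pd.isna(inheritance) else inheritance,
            "omim_number": row["Phenotype MIM number"],
            "pmid": "",
            "inheritance info": "",
            "comment": "",
        })
    return records


# Two rows copied from the BRCA2 OMIM table above
demo_omim = pd.DataFrame({
    "Phenotype": ["Fanconi anemia, complementation group D1", "{Pancreatic cancer 2}"],
    "Phenotype MIM number": [605724, 613347],
    "Inheritance": ["AR", None],
})
demo_records = omim_table_to_phenotype_records(demo_omim, "BRCA2", 1101)
print(demo_records)
```

For each gene, `all_genepanel_phenotypes_records.extend(omim_table_to_phenotype_records(dfs[0], geneSymbol, gene_data["HGNC"]))` would replace both the loop and the append cell.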


In [None]:
all_genepanel_transcripts_records.append(gene_data)
for p in phenotype_data:
    all_genepanel_phenotypes_records.append(p)

## Get the SNP Data for HBB c.33C>A p.Ala11Ala

HBB NM_000518.5 c.33C>A p.Ala11Ala

https://www.ncbi.nlm.nih.gov/gene/3043

Ensembl:ENSG00000244734 MIM:141900

SNP Info for p.Ala11Ala

RS - https://www.ncbi.nlm.nih.gov/snp/rs35799536

GRCh37.p13 chr 11	NC_000011.9:g.5248219G>A

GRCh37.p13 chr 11	NC_000011.9:g.5248219G>T

I can't find the exact variant the variant scientist asked for, so I will come back to it. I can find `p.Ala11Ala`, but not `c.33C>A`.

### ClinVar c.33C>A p.Ala11Ala

https://www.ncbi.nlm.nih.gov/clinvar/variation/439155/

11: 5248219 (GRCh37) C -> A
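`c.33C>A` is written relative to the HBB transcript, while gnomAD and dbSNP report alleles on the forward strand of the reference (`g.5248219G>T` above). HBB lies on the minus strand of chromosome 11, so the transcript alleles are the complements of the genomic ones. A small sketch for converting alleles (the helper names are hypothetical):

```python
# Complement table for short DNA alleles
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def c_alleles_to_forward(ref, alt, strand):
    """Convert transcript-oriented alleles to forward-strand alleles.

    strand is '+' or '-', as stored in the transcripts file.
    """
    if strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt


print(c_alleles_to_forward("C", "A", "-"))  # HBB c.33C>A -> ('G', 'T')
```

For a single-base substitution the position stays the same and only the alleles flip. Longer variants would also need their genomic positions recomputed, which this sketch does not do.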

In [None]:
refseq = "NM_000518.5"
geneSymbol = "HBB"
# We're grabbing the canonical transcriptID from gnomad
eTranscriptID = "ENST00000335295"

# Query by the eGeneID - NO VERSION
eGeneID = "ENSG00000244734"

In [None]:
gene_data = get_gene_data_mygene(eGeneID)
gene_data['refseq'] = refseq
gene_data['geneSymbol'] = geneSymbol
gene_data['eGeneID'] = eGeneID
gene_data['eTranscriptID'] = eTranscriptID
pd.DataFrame.from_records([gene_data])

In [None]:
html_string = """<table class="table table-bordered table-condensed table-hover mim-table-padding small"> <thead> <tr class="active"> <th> Location </th> <th> Phenotype </th> <th> Phenotype <br>MIM number </th> <th> Inheritance </th> <th> Phenotype <br>mapping key </th> </tr></thead> <tbody> <tr> <td rowspan="9"> <span class="mim-font"> <a href="/geneMap/11/108?start=-3&amp;limit=10&amp;highlight=108"> 11p15.4 </a> </span> </td><td> <span class="mim-font">{Malaria, resistance to}</span> </td><td> <span class="mim-font"> <a href="/entry/611162"> 611162 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="54" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Delta-beta thalassemia </span> </td><td> <span class="mim-font"> <a href="/entry/141749"> 141749 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="55" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="56" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Erythrocytosis 6 </span> </td><td> <span class="mim-font"> <a href="/entry/617980"> 617980 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="57" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Heinz body anemia </span> </td><td> <span class="mim-font"> <a href="/entry/140700"> 140700 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="58" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="59" oldtitle="3 - The molecular basis of the disorder is known" 
title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Hereditary persistence of fetal hemoglobin </span> </td><td> <span class="mim-font"> <a href="/entry/141749"> 141749 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="60" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="61" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Methemoglobinemia, beta type </span> </td><td> <span class="mim-font"> <a href="/entry/617971"> 617971 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="62" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Sickle cell anemia </span> </td><td> <span class="mim-font"> <a href="/entry/603903"> 603903 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="63" oldtitle="Autosomal recessive" title="">AR</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="64" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Thalassemia-beta, dominant inclusion-body </span> </td><td> <span class="mim-font"> <a href="/entry/603902"> 603902 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="65" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Thalassemia, beta </span> </td><td> <span class="mim-font"> <a href="/entry/613985"> 613985 </a> </span> </td><td> <span class="mim-font"> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="66" oldtitle="3 - The molecular basis of the disorder is 
known" title="">3</abbr> </span> </td></tr></tbody></table>"""
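from io import StringIO  # passing literal HTML to read_html is deprecated since pandas 2.1
dfs = pd.read_html(StringIO(html_string))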

phenotype_data = []
for index, row in dfs[0].iterrows():
    phenotype_data.append({
    "#gene symbol": geneSymbol,
    "HGNC": int(gene_data["HGNC"]),
    "phenotype": row["Phenotype"],
    "inheritance": row["Inheritance"],
    "omim_number": row["Phenotype MIM number"],
    "pmid": "",
    "inheritance info": "",
    "comment": "",
    "remove (add x)": "",
    })


In [None]:
all_genepanel_transcripts_records.append(gene_data)
for p in phenotype_data:
    all_genepanel_phenotypes_records.append(p)

## Get the SNP Data for MSH6 c.30C>A p.Phe10Leu

MSH6 NM_000179.2 c.30C>A p.Phe10Leu

eGeneID: ENSG00000116062.10

gnomAD variant ID for p.Phe10Leu: https://gnomad.broadinstitute.org/variant/2-48010400-T-C?dataset=gnomad_r2_1

RS - https://www.ncbi.nlm.nih.gov/snp/rs773861137

GRCh37.p13 chr 2	NC_000002.11:g.48010400T>C

In [None]:
eGeneID="ENSG00000116062"
eTranscriptID="ENST00000234420"
refseq="NM_000179.2"
geneSymbol="MSH6"

In [None]:
gene_data = get_gene_data_mygene(eGeneID)
gene_data['refseq'] = refseq
gene_data['geneSymbol'] = geneSymbol
gene_data['eGeneID'] = eGeneID
gene_data['eTranscriptID'] = eTranscriptID

pd.DataFrame.from_records([gene_data])

In [None]:
html_string = """<table class="table table-bordered table-condensed table-hover mim-table-padding small"> <thead> <tr class="active"> <th> Location </th> <th> Phenotype </th> <th> Phenotype <br>MIM number </th> <th> Inheritance </th> <th> Phenotype <br>mapping key </th> </tr></thead> <tbody> <tr> <td rowspan="3"> <span class="mim-font"> <a href="/geneMap/2/227?start=-3&amp;limit=10&amp;highlight=227"> 2p16.3 </a> </span> </td><td> <span class="mim-font">{Endometrial cancer, familial}</span> </td><td> <span class="mim-font"> <a href="/entry/608089"> 608089 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="54" oldtitle="Autosomal dominant" title="">AD</abbr>, <abbr class="mim-hint" data-hasqtip="55" oldtitle="Somatic mutation" title="">SMu</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="56" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Colorectal cancer, hereditary nonpolyposis, type 5 </span> </td><td> <span class="mim-font"> <a href="/entry/614350"> 614350 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="57" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="58" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Mismatch repair cancer syndrome </span> </td><td> <span class="mim-font"> <a href="/entry/276300"> 276300 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="59" oldtitle="Autosomal recessive" title="">AR</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="60" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr></tbody> </table>"""

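from io import StringIO  # passing literal HTML to read_html is deprecated since pandas 2.1
dfs = pd.read_html(StringIO(html_string))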
dfs[0]

In [None]:
phenotype_data = []
for index, row in dfs[0].iterrows():
    phenotype_data.append({
    "#gene symbol": geneSymbol,
    "HGNC": int(gene_data["HGNC"]),
    "phenotype": row["Phenotype"],
    "inheritance": row["Inheritance"],
    "omim_number": row["Phenotype MIM number"],
    "pmid": "",
    "inheritance info": "",
    "comment": "",
    "remove (add x)": "",
    })

In [None]:
all_genepanel_transcripts_records.append(gene_data)
for p in phenotype_data:
    all_genepanel_phenotypes_records.append(p)

## Get the SNP Data for RET c.1832G>A p.Cys611Tyr

RET NM_020975.6 c.1832G>A p.Cys611Tyr

Ensembl gene ID: ENSG00000165731.13

Ensembl canonical transcript: ENST00000355710.3


### gnomAD - p.Cys618Arg

I can't find the requested variant (p.Cys611Tyr) in gnomAD, so I'm using this variant instead: p.Cys618Arg.

https://gnomad.broadinstitute.org/variant/10-43609096-T-C?dataset=gnomad_r2_1

https://www.ncbi.nlm.nih.gov/snp/rs76262710

GRCh37.p13 chr 10	NC_000010.10:g.43609096T>A

GRCh37.p13 chr 10	NC_000010.10:g.43609096T>C

GRCh37.p13 chr 10	NC_000010.10:g.43609096T>G

### ClinVar - p.Cys611Tyr

NM_020975.6(RET):c.1832G>A (p.Cys611Tyr)

https://www.ncbi.nlm.nih.gov/clinvar/RCV000412987/

Chr10: 43609076 (on Assembly GRCh37)
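A c. position determines its codon: the codon number is `(c + 2) // 3` and the position within the codon is `(c - 1) % 3 + 1`. This also lets you cross-check the two RET records. The only single-base change from a Cys codon (TGT/TGC) to an Arg codon is T>C at the first base, so p.Cys618Arg sits at c.1852 (codon 618 spans c.1852 to c.1854). RET is on the plus strand, so assuming both bases lie in the same exon, the GRCh37 distance between the records should equal their distance in c. coordinates. A sketch (`codon_of` is a hypothetical helper):

```python
def codon_of(cds_pos):
    """Return (codon number, position within the codon) for a 1-based c. position."""
    return (cds_pos + 2) // 3, (cds_pos - 1) % 3 + 1


print(codon_of(1832))  # (611, 2): the middle base of codon 611
print(codon_of(1852))  # (618, 1): the first base of codon 618

# GRCh37 positions from the ClinVar and gnomAD records above
clinvar_pos = 43609076  # c.1832G>A, p.Cys611Tyr
gnomad_pos = 43609096   # p.Cys618Arg at c.1852
# Plus-strand gene, same exon assumed: genomic distance == c. distance
print(gnomad_pos - clinvar_pos, 1852 - 1832)  # 20 20
```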

In [None]:
geneSymbol="RET"
eGeneID="ENSG00000165731"
eTranscriptID="ENST00000355710"
refseq="NM_020975.6"
gene_data = get_gene_data_mygene(eGeneID)
gene_data['refseq'] = refseq
gene_data['geneSymbol'] = geneSymbol
gene_data['eGeneID'] = eGeneID
gene_data['eTranscriptID'] = eTranscriptID

pd.DataFrame.from_records([gene_data])

In [None]:
html_string = """<table class="table table-bordered table-condensed table-hover mim-table-padding small"> <thead> <tr class="active"> <th> Location </th> <th> Phenotype </th> <th> Phenotype <br>MIM number </th> <th> Inheritance </th> <th> Phenotype <br>mapping key </th> </tr></thead> <tbody> <tr> <td rowspan="7"> <span class="mim-font"> <a href="/geneMap/10/147?start=-3&amp;limit=10&amp;highlight=147"> 10q11.21 </a> </span> </td><td> <span class="mim-font">{Hirschsprung disease, protection against}</span> </td><td> <span class="mim-font"> <a href="/entry/142623"> 142623 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="49" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="50" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font">{Hirschsprung disease, susceptibility to, 1}</span> </td><td> <span class="mim-font"> <a href="/entry/142623"> 142623 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="51" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="52" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Central hypoventilation syndrome, congenital </span> </td><td> <span class="mim-font"> <a href="/entry/209880"> 209880 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="53" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="54" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Medullary thyroid carcinoma </span> </td><td> <span class="mim-font"> <a href="/entry/155240"> 155240 </a> </span> </td><td> <span 
class="mim-font"> <abbr class="mim-hint" data-hasqtip="55" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="56" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Multiple endocrine neoplasia IIA </span> </td><td> <span class="mim-font"> <a href="/entry/171400"> 171400 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="57" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="58" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Multiple endocrine neoplasia IIB </span> </td><td> <span class="mim-font"> <a href="/entry/162300"> 162300 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="59" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="60" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr><tr> <td> <span class="mim-font"> Pheochromocytoma </span> </td><td> <span class="mim-font"> <a href="/entry/171300"> 171300 </a> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="61" oldtitle="Autosomal dominant" title="">AD</abbr> </span> </td><td> <span class="mim-font"> <abbr class="mim-hint" data-hasqtip="62" oldtitle="3 - The molecular basis of the disorder is known" title="">3</abbr> </span> </td></tr></tbody> </table>"""
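from io import StringIO  # passing literal HTML to read_html is deprecated since pandas 2.1
dfs = pd.read_html(StringIO(html_string))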
dfs[0]

In [None]:
phenotype_data = []
for index, row in dfs[0].iterrows():
    phenotype_data.append({
    "#gene symbol": geneSymbol,
    "HGNC": int(gene_data["HGNC"]),
    "phenotype": row["Phenotype"],
    "inheritance": row["Inheritance"],
    "omim_number": row["Phenotype MIM number"],
    "pmid": "",
    "inheritance info": "",
    "comment": "",
    "remove (add x)": "",
    })

In [None]:
all_genepanel_transcripts_records.append(gene_data)

for p in phenotype_data:
    all_genepanel_phenotypes_records.append(p)

In [None]:
df = print_genepanel_transcripts_file(all_genepanel_transcripts_records, 'test-ALL', 'v01')
df
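Before handing the transcripts file to Ella, a quick structural check on each row can catch problems such as an exon list taken from a different transcript than the bounds. A sketch; `check_transcript_record` is hypothetical, assumes UCSC-style rows where the first exon starts at `txStart` and the last ends at `txEnd`, and only checks internal consistency, not whether the coordinates are biologically right.

```python
def check_transcript_record(record):
    """Return a list of consistency problems in one transcripts-file row (empty if none)."""
    problems = []
    starts = [int(x) for x in str(record["exonsStarts"]).split(",") if x]
    ends = [int(x) for x in str(record["exonEnds"]).split(",") if x]
    tx_start, tx_end = int(record["txStart"]), int(record["txEnd"])
    cds_start, cds_end = int(record["cdsStart"]), int(record["cdsEnd"])

    if len(starts) != len(ends):
        problems.append("exon start and end counts differ")
    if any(start >= end for start, end in zip(starts, ends)):
        problems.append("an exon starts at or after its end")
    if starts != sorted(starts):
        problems.append("exon starts are not in ascending order")
    if starts and ends and (starts[0] != tx_start or ends[-1] != tx_end):
        problems.append("first/last exon does not match txStart/txEnd")
    if not tx_start <= cds_start <= cds_end <= tx_end:
        problems.append("CDS is not inside the transcript")
    return problems


# Toy rows with made-up coordinates
good = {"txStart": 100, "txEnd": 900, "cdsStart": 150, "cdsEnd": 850,
        "exonsStarts": "100,400", "exonEnds": "200,900"}
bad = dict(good, cdsEnd=950)
print(check_transcript_record(good))
print(check_transcript_record(bad))
```

In the notebook, `{r["geneSymbol"]: check_transcript_record(r) for r in all_genepanel_transcripts_records}` would list the problems per gene.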

In [None]:
df = print_genepanel_phenotypes_file(all_genepanel_phenotypes_records, 'test-ALL', 'v01')
df
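The `vcf_info_line` defined near the top only declares contig 13, but the variants in this notebook sit on chromosomes 2, 10, 11 and 13. A sketch of writing a dummy VCF with one contig line per chromosome actually used; `build_vcf_text`, the GT-only FORMAT column and the default `0/1` genotype are assumptions for test data, not Ella requirements. The coordinates are the GRCh37 positions collected above.

```python
def chrom_sort_key(chrom):
    """Numeric chromosomes in numeric order, then X, Y and MT."""
    if chrom.isdigit():
        return int(chrom)
    return {"X": 23, "Y": 24, "MT": 25}.get(chrom, 26)


def build_vcf_text(variants, sample_name="nrl_test_sample_1", genotype="0/1"):
    """variants: dicts with 'chrom' (str), 'pos' (int), 'ref', 'alt' and optional 'id'."""
    ordered = sorted(variants, key=lambda v: (chrom_sort_key(v["chrom"]), v["pos"]))
    contigs = []
    for v in ordered:
        if v["chrom"] not in contigs:
            contigs.append(v["chrom"])

    lines = ["##fileformat=VCFv4.1"]
    # One contig line per chromosome that actually carries a variant
    lines += ["##contig=<ID={}>".format(c) for c in contigs]
    lines.append('##FILTER=<ID=PASS,Description="All filters passed">')
    lines.append('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
    lines.append("\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                            "FILTER", "INFO", "FORMAT", sample_name]))
    for v in ordered:
        lines.append("\t".join([v["chrom"], str(v["pos"]), v.get("id", "."),
                                v["ref"], v["alt"], ".", "PASS", ".", "GT", genotype]))
    return "\n".join(lines) + "\n"


variants = [
    {"chrom": "11", "pos": 5248219, "ref": "G", "alt": "T", "id": "rs35799536"},
    {"chrom": "2", "pos": 48010400, "ref": "T", "alt": "C", "id": "rs773861137"},
    {"chrom": "10", "pos": 43609096, "ref": "T", "alt": "C", "id": "rs76262710"},
]
vcf_text = build_vcf_text(variants)
print(vcf_text)
```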

## Copy Over the Gene Panels

In [None]:
SAMPLE_NAME="test_sample_1"
ANALYSIS_NAME="test_all"
GENE_PANEL_NAME="test_all"
GENE_PANEL_VERSION="v01"

In [None]:
#! mkdir -p /home/jovyan/project/dev/dev-data/gene_panels/NRL-test-ALL_v01/
# make an analysis directory for the output VCF
#! mkdir -p /home/jovyan/project/dev/dev-data/analysis/complete/nrl_test_all-NRL-test-ALL_v01/
#! cp -rf NRL-test-ALL*csv /home/jovyan/project/dev/dev-data/gene_panels/NRL-test-ALL_v01/
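The commented `mkdir`/`cp` commands can also be done from Python. A sketch using `glob` and `shutil`; `copy_genepanel_files` is hypothetical, and the demo runs in temporary directories so it doesn't touch the gene panel folders above.

```python
import glob
import os
import shutil
import tempfile


def copy_genepanel_files(pattern, destination_dir):
    """Copy every file matching a glob pattern into destination_dir, creating it if needed."""
    os.makedirs(destination_dir, exist_ok=True)
    # shutil.copy returns the path of each new copy
    return [shutil.copy(path, destination_dir) for path in sorted(glob.glob(pattern))]


# Demo in temporary directories so the real gene panel folders are not touched
source_dir = tempfile.mkdtemp()
for name in ["demo_v01.transcripts.csv", "demo_v01.phenotypes.csv", "notes.txt"]:
    with open(os.path.join(source_dir, name), "w"):
        pass
target_dir = os.path.join(tempfile.mkdtemp(), "gene_panels", "demo_v01")
copied = copy_genepanel_files(os.path.join(source_dir, "demo_v01*csv"), target_dir)
print(copied)
```

With the paths above, the call would be `copy_genepanel_files("NRL-test-ALL*csv", "/home/jovyan/project/dev/dev-data/gene_panels/NRL-test-ALL_v01/")`.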