# Example HBB VCF and Annotation

Let's say we want to custom-design a gene panel to integrate with Ella. We'll use HBB in this case and generate some dummy VCFs: one homozygous for the reference allele, one heterozygous, and one homozygous for the alternate allele.

[OMIM](http://www.omim.org/entry/141900)

**HBB c.20A>T (p.Glu7Val; legacy numbering Glu6Val): hemoglobin S phenotype**

[ClinVar - NM_000518.5(HBB):c.20A>T (p.Glu7Val) AND HEMOGLOBIN S](https://www.ncbi.nlm.nih.gov/clinvar/RCV000016573.11/)

[rs334](https://www.ncbi.nlm.nih.gov/snp/rs334)

**Location** - Chr11: 5248232 (GRCh37)

NM_000518.5

Alleles
T>A / T>C / T>G

**Genomic Placements**

GRCh37.p13 chr 11	NC_000011.9:g.5248232T>A

GRCh37.p13 chr 11	NC_000011.9:g.5248232T>C

GRCh37.p13 chr 11	NC_000011.9:g.5248232T>G
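The genomic placements above already carry the fields a VCF record needs: chromosome, position, reference base and alternate base. As a minimal sketch, assuming only simple single-nucleotide `g.` substitutions on GRCh37 `NC_0000NN` RefSeq chromosome accessions, a hypothetical helper `hgvs_g_snv_to_vcf_fields` can pull those fields out with a regular expression. It is not a general HGVS parser, and chromosomes X and Y would come out as 23 and 24.

```python
import re

# Hypothetical helper: handles only simple SNVs written as
# "<RefSeq chromosome accession>:g.<position><REF>><ALT>",
# e.g. "NC_000011.9:g.5248232T>A". Not a general HGVS parser.
HGVS_G_SNV = re.compile(r"^NC_0*(\d+)\.\d+:g\.(\d+)([ACGT])>([ACGT])$")


def hgvs_g_snv_to_vcf_fields(hgvs):
    """Return the VCF #CHROM/POS/REF/ALT fields for a simple g. SNV."""
    match = HGVS_G_SNV.match(hgvs)
    if match is None:
        raise ValueError("Not a simple g. SNV: {}".format(hgvs))
    chrom, pos, ref, alt = match.groups()
    # The accession number encodes the chromosome (NC_000011 -> 11);
    # X and Y (NC_000023, NC_000024) would come out as 23 and 24.
    return {"#CHROM": int(chrom), "POS": int(pos), "REF": ref, "ALT": alt}


for placement in ["NC_000011.9:g.5248232T>A",
                  "NC_000011.9:g.5248232T>C",
                  "NC_000011.9:g.5248232T>G"]:
    print(hgvs_g_snv_to_vcf_fields(placement))
```

The three placements differ only in `ALT`, which is why the dummy VCF records below can share `#CHROM`, `POS` and `REF`.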


**HBB c.19G>A (p.Glu7Lys; legacy numbering Glu6Lys)**

[ClinVar HBB c.19G>A (p.Glu7Lys)](https://www.ncbi.nlm.nih.gov/clinvar/variation/15126/)

[rs33930165](https://www.ncbi.nlm.nih.gov/snp/rs33930165)

**Location** - Chr11: 5248233 (GRCh37)

NM_000518.4

Alleles
C>G / C>T
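HBB is transcribed from the minus (reverse) strand of chromosome 11, so the coding alleles are the complements of the genomic alleles: c.20A>T appears as genomic T>A. It also explains why c.19, one base earlier in the transcript, sits at the next-*higher* genomic coordinate (5248233 vs 5248232). The sketch below makes that arithmetic explicit. It assumes a minus-strand gene and an anchor pair (a c. position and its genomic coordinate) in the same exon, which holds here because c.19 and c.20 are adjacent bases; `coding_to_genomic_minus` is a hypothetical name, not part of Ella or any annotation tool.

```python
# Watson-Crick complement of a single DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def coding_to_genomic_minus(c_pos, c_ref, c_alt, anchor_c_pos, anchor_g_pos):
    """Map a c. substitution on a minus-strand gene to genomic POS/REF/ALT.

    Assumes the variant and the anchor (a known c. position and its
    genomic coordinate) lie in the same exon, with no intron between them.
    """
    # Moving 3' along the transcript moves to LOWER genomic coordinates
    # on the minus strand.
    g_pos = anchor_g_pos - (c_pos - anchor_c_pos)
    # The genomic (plus-strand) bases are the complements of the c. bases.
    return g_pos, COMPLEMENT[c_ref], COMPLEMENT[c_alt]


# Anchor: rs334, c.20 <-> chr11:5248232 on GRCh37 (from the section above).
print(coding_to_genomic_minus(20, "A", "T", 20, 5248232))  # (5248232, 'T', 'A')
print(coding_to_genomic_minus(19, "G", "A", 20, 5248232))  # (5248233, 'C', 'T')
```

By the same arithmetic, c.19G>A is genomic C>T, while the genomic C>G allele corresponds to c.19G>C.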

## Generate Dummy VCF Data for SNPs Associated with HBB

This is a total hack to generate some VCF data that we can then load into Ella. It is only meant for demonstration purposes.

In [1]:
vcf_data = {
    "sample_a_hom_ref": [
        {
            # rs334, GRCh37 chr11:5248232
            '#CHROM': 11,
            'POS': 5248232,
            'ID': 'CM082515',
            'REF': "T",
            'ALT': "A",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            # The genotype column must be named exactly like the sample key above;
            # otherwise pandas leaves it empty (NaN) when the VCF is written
            'sample_a_hom_ref': '0/0:107,80:187:99:2048,0,2917'
        },
        {
            # rs33930165, GRCh37 chr11:5248233
            '#CHROM': 11,
            'POS': 5248233,
            'ID': 'CM082516',
            'REF': "C",
            'ALT': "G",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            'sample_a_hom_ref': '0/0:107,80:187:99:2048,0,2917'
        }
    ],
    "hbb_sample_b_het": [
        {
            '#CHROM': 11,
            'POS': 5248232,
            'ID': 'CM082515',
            'REF': "T",
            'ALT': "A",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            'hbb_sample_b_het': '0/1:107,80:187:99:2048,0,2917'
        },
        {
            '#CHROM': 11,
            'POS': 5248233,
            'ID': 'CM082516',
            'REF': "C",
            'ALT': "G",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            'hbb_sample_b_het': '0/1:107,80:187:99:2048,0,2917'
        }
    ],
    "hbb_sample_c_hom_alt": [
        {
            '#CHROM': 11,
            'POS': 5248232,
            'ID': 'CM082515',
            'REF': "T",
            'ALT': "A",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            'hbb_sample_c_hom_alt': '1/1:107,80:187:99:2048,0,2917'
        },
        {
            '#CHROM': 11,
            'POS': 5248233,
            'ID': 'CM082516',
            'REF': "C",
            'ALT': "G",
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            'hbb_sample_c_hom_alt': '1/1:107,80:187:99:2048,0,2917'
        }
    ]
}
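The three samples above differ only in the genotype (`GT`) at the start of the sample column; the `AD:DP:GQ:PL` part is the same placeholder everywhere. A more compact way to build the same structure, sketched here under the assumption that those placeholder values stay as they are, is to keep one list of shared variant fields and stamp in a genotype per sample. `build_sample_records`, `HBB_VARIANTS` and `generated_vcf_data` are hypothetical names. The sample column key is set from the sample name, because the writer below selects columns by that name.

```python
# Shared fields for the two dummy HBB variants (GRCh37), copied from above.
HBB_VARIANTS = [
    {'#CHROM': 11, 'POS': 5248232, 'ID': 'CM082515', 'REF': "T", 'ALT': "A"},
    {'#CHROM': 11, 'POS': 5248233, 'ID': 'CM082516', 'REF': "C", 'ALT': "G"},
]
# Placeholder AD:DP:GQ:PL values reused for every record, as in the hand-written data.
DUMMY_CALL_STATS = "107,80:187:99:2048,0,2917"


def build_sample_records(sample, genotype, variants=HBB_VARIANTS):
    """Return one VCF record dict per variant, all with the same genotype."""
    records = []
    for variant in variants:
        record = dict(variant)  # copy so the shared variant dicts are not mutated
        record.update({
            'QUAL': 5000.0,
            'FILTER': 'PASS',
            'INFO': '.',
            'FORMAT': 'GT:AD:DP:GQ:PL',
            # Column name is the sample name, so it lines up with the writer.
            sample: "{}:{}".format(genotype, DUMMY_CALL_STATS),
        })
        records.append(record)
    return records


generated_vcf_data = {
    sample: build_sample_records(sample, genotype)
    for sample, genotype in [
        ("sample_a_hom_ref", "0/0"),
        ("hbb_sample_b_het", "0/1"),
        ("hbb_sample_c_hom_alt", "1/1"),
    ]
}
```

Deriving the column key from the sample name makes a key/column mismatch impossible by construction.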

## Write out VCF Files

In [2]:
import copy
import os
import pandas as pd
import requests
from pprint import pprint
import numpy as np
from datetime import datetime
import json

In [3]:
# These paths are local to the JupyterHub notebook.

# A /data volume is mounted into all the containers.
# If you're using a different volume or file system, change this.
BASE_PATH = "/data"

ANALYSIS_INCOMING_PATH = os.path.join(BASE_PATH, "analysis/incoming")
ANALYSIS_COMPLETE_PATH = os.path.join(BASE_PATH, "analysis/complete")

os.makedirs(ANALYSIS_INCOMING_PATH, exist_ok=True)
os.makedirs(ANALYSIS_COMPLETE_PATH, exist_ok=True)

# Gene panel name and version, used to build the analysis names below
genepanel_name = "test_HBB"
genepanel_version = "v01"

In [4]:
# VCF meta-information lines; the contig ID must match the #CHROM values (chromosome 11)
vcf_info_line = """##fileformat=VCFv4.1
##contig=<ID=11>
##FILTER=<ID=PASS,Description="All filters passed">
"""

vcf_columns = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

def write_pandas_csv_with_info_line(file_name, info_line, df):
    with open(file_name, 'w') as fp:
        fp.write(info_line)
        df.to_csv(fp, index=False, sep="\t")
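The `##contig` line in the header has to name the chromosome the records are actually on. One way to avoid a mismatch, sketched here as a hypothetical helper `build_vcf_header`, is to derive the contig lines from the records themselves instead of typing them by hand. It keeps the same `##fileformat` and `##FILTER` lines as the header string above and adds nothing else.

```python
def build_vcf_header(records, fileformat="VCFv4.1"):
    """Build VCF meta-information lines whose ##contig entries match the records."""
    lines = ["##fileformat={}".format(fileformat)]
    # One ##contig line per distinct chromosome, in first-seen order.
    seen = []
    for record in records:
        chrom = str(record["#CHROM"])
        if chrom not in seen:
            seen.append(chrom)
    lines.extend("##contig=<ID={}>".format(chrom) for chrom in seen)
    lines.append('##FILTER=<ID=PASS,Description="All filters passed">')
    # Trailing newline so the column header row starts on its own line.
    return "\n".join(lines) + "\n"


example_records = [{"#CHROM": 11, "POS": 5248232}, {"#CHROM": 11, "POS": 5248233}]
print(build_vcf_header(example_records), end="")
```

Its return value has the same shape as `vcf_info_line`, so it could be passed as the `info_line` argument of `write_pandas_csv_with_info_line`.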

In [5]:
for sample, vcf_records in vcf_data.items():
    # One VCF per sample: the fixed VCF columns plus a genotype column named after the sample
    columns = vcf_columns + [sample]
    file_name = os.path.join(ANALYSIS_INCOMING_PATH, "{}.vcf".format(sample))

    df = pd.DataFrame(columns=columns, data=vcf_records)
    write_pandas_csv_with_info_line(file_name, vcf_info_line, df)
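Since these files are written by hand rather than by a variant caller, a quick structural check before annotation can save a confusing failure later. The sketch below uses only the standard library and checks a few basic things: `##fileformat` comes first, the `#CHROM` line has the eight fixed columns plus `FORMAT` and at least one sample, every data row has as many fields as the header, no field is empty (an empty sample column is what a sample-name mismatch produces), and every chromosome appears in a `##contig` line. `check_vcf` is a hypothetical helper and far from a full VCF validator.

```python
import os
import tempfile

FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf(path):
    """Return a list of problems found in a small, hand-written VCF file."""
    problems = []
    contigs = set()
    header = None
    with open(path) as fp:
        lines = fp.read().splitlines()
    if not lines or not lines[0].startswith("##fileformat="):
        problems.append("first line is not ##fileformat")
    for number, line in enumerate(lines, start=1):
        if line.startswith("##contig=<ID="):
            # "##contig=<ID=11>" or "##contig=<ID=11,length=...>" -> "11"
            contigs.add(line[len("##contig=<ID="):].split(",")[0].rstrip(">"))
        elif line.startswith("##"):
            continue
        elif line.startswith("#"):
            header = line.split("\t")
            if header[:8] != FIXED_COLUMNS or len(header) < 10:
                problems.append("unexpected header columns: {}".format(header))
        elif header is None:
            problems.append("line {}: data before #CHROM header".format(number))
        else:
            fields = line.split("\t")
            if len(fields) != len(header):
                problems.append("line {}: {} fields, expected {}".format(
                    number, len(fields), len(header)))
            if any(field == "" for field in fields):
                problems.append("line {}: empty field".format(number))
            if fields[0] not in contigs:
                problems.append("line {}: contig {} not declared".format(number, fields[0]))
    return problems


# Usage on a tiny example file written to a temporary directory.
example = (
    "##fileformat=VCFv4.1\n"
    "##contig=<ID=11>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdemo\n"
    "11\t5248232\t.\tT\tA\t5000.0\tPASS\t.\tGT\t0/1\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.vcf")
    with open(demo_path, "w") as fp:
        fp.write(example)
    demo_problems = check_vcf(demo_path)
print(demo_problems)  # []
```

In the notebook it could be run over each `{sample}.vcf` in `ANALYSIS_INCOMING_PATH`, expecting an empty list for every file.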

## Write out the Annotation Commands

This assumes that you are using the Docker Compose dev stack in this GitHub repo.

A quick note about the annotation: Ella expects the VCF file name (without the `.vcf` extension) to have the format `{analysis_name}.{genepanel_name}_{genepanel_version}`. The commands below use that name for both the output directory and the VCF file inside it.
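The naming rule can be captured in a pair of small helpers; the first does the same string formatting as the command-generating cell. Both names are hypothetical, and the way `split_analysis_vcf_name` splits (on the last `.`, then the last `_`) is an assumption that works for the names in this notebook, such as `test_HBB`, but is not taken from Ella's own parsing code.

```python
def analysis_vcf_name(analysis_name, genepanel_name, genepanel_version):
    """Compose "{analysis_name}.{genepanel_name}_{genepanel_version}"."""
    return "{}.{}_{}".format(analysis_name, genepanel_name, genepanel_version)


def split_analysis_vcf_name(name):
    """Split a name built as above back into its three parts.

    Splits on the LAST "." and then the LAST "_", so underscores inside the
    gene panel name (e.g. "test_HBB") are preserved. Assumes the version
    contains no "_" and the gene panel part contains no ".".
    """
    analysis_name, _, genepanel = name.rpartition(".")
    genepanel_name, _, genepanel_version = genepanel.rpartition("_")
    return analysis_name, genepanel_name, genepanel_version


example_name = analysis_vcf_name("hbb_sample_b_het", "test_HBB", "v01")
print(example_name)                           # hbb_sample_b_het.test_HBB_v01
print(split_analysis_vcf_name(example_name))  # ('hbb_sample_b_het', 'test_HBB', 'v01')
```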

In [6]:
for sample in vcf_data.keys():
    name = "{}.{}_{}".format(sample, genepanel_name, genepanel_version)
    
    command = """
# SAMPLE: {sample}

docker-compose exec  ella-anno bash  -c "/anno/bin/annotate \\
        --vcf {incoming}/{sample}.vcf \\
        -o {complete}/{name}"
        
docker-compose exec  ella-anno bash  -c " mv \\
       {complete}/{name}/VCFANNO/output.vcf \\
       {complete}/{name}/{name}.vcf"
       
docker-compose exec ella-web bash -c "ella-cli deposit analysis \\
       {complete}/{name}/{name}.vcf"

    """.format(name=name, sample=sample, incoming=ANALYSIS_INCOMING_PATH, complete=ANALYSIS_COMPLETE_PATH)
    print(command)
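Printing the commands and pasting them into a shell works, but the same three steps (annotate, rename the output, deposit) can also be expressed as argument lists for Python's `subprocess.run`, which avoids one layer of shell quoting. This is a sketch under assumptions: it would be run from the host where the compose stack lives (not necessarily from inside the notebook), `docker-compose` is on the `PATH`, and the `ella-anno` and `ella-web` service names match the dev stack, exactly as in the printed commands. It drops the `bash -c` wrapper and adds `-T`, `docker-compose exec`'s flag for disabling pseudo-TTY allocation when not running from an interactive terminal; if the tools rely on a shell environment, keep `bash -c` instead. `annotation_steps` and `run_steps` are hypothetical helpers.

```python
import subprocess


def annotation_steps(name, sample, incoming, complete):
    """Return the annotate / rename / deposit steps as argv lists (not run)."""
    vcf_dir = "{}/{}".format(complete, name)
    final_vcf = "{}/{}.vcf".format(vcf_dir, name)
    return [
        # Annotate the raw sample VCF into the per-analysis output directory.
        ["docker-compose", "exec", "-T", "ella-anno", "/anno/bin/annotate",
         "--vcf", "{}/{}.vcf".format(incoming, sample),
         "-o", vcf_dir],
        # Rename the annotated output to the name Ella expects.
        ["docker-compose", "exec", "-T", "ella-anno", "mv",
         "{}/VCFANNO/output.vcf".format(vcf_dir), final_vcf],
        # Deposit the analysis into Ella.
        ["docker-compose", "exec", "-T", "ella-web",
         "ella-cli", "deposit", "analysis", final_vcf],
    ]


def run_steps(steps):
    """Run each step in order; check=True stops at the first failing step."""
    for argv in steps:
        subprocess.run(argv, check=True)
```

For example, `run_steps(annotation_steps(name, sample, ANALYSIS_INCOMING_PATH, ANALYSIS_COMPLETE_PATH))` inside the loop would replace `print(command)`, provided the assumptions above hold.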


# SAMPLE: sample_a_hom_ref

docker-compose exec  ella-anno bash  -c "/anno/bin/annotate \
        --vcf /data/analysis/incoming/sample_a_hom_ref.vcf \
        -o /data/analysis/complete/sample_a_hom_ref.test_HBB_v01"
        
docker-compose exec  ella-anno bash  -c " mv \
       /data/analysis/complete/sample_a_hom_ref.test_HBB_v01/VCFANNO/output.vcf \
       /data/analysis/complete/sample_a_hom_ref.test_HBB_v01/sample_a_hom_ref.test_HBB_v01.vcf"
       
docker-compose exec ella-web bash -c "ella-cli deposit analysis \
       /data/analysis/complete/sample_a_hom_ref.test_HBB_v01/sample_a_hom_ref.test_HBB_v01.vcf"

    

# SAMPLE: hbb_sample_b_het

docker-compose exec  ella-anno bash  -c "/anno/bin/annotate \
        --vcf /data/analysis/incoming/hbb_sample_b_het.vcf \
        -o /data/analysis/complete/hbb_sample_b_het.test_HBB_v01"
        
docker-compose exec  ella-anno bash  -c " mv \
       /data/analysis/complete/hbb_sample_b_het.test_HBB_v01/VCFANNO/output.vcf \
       /data/ana