# Genome plotter

This IPython notebook creates a visual representation of the human genome that reflects its chemical composition (GC content), the distribution of protein-coding genes, and the location of known genome-wide association signals.

### Requirements:

* Previously downloaded and processed genomic data (prepared by `prepare_data.sh`): chunked chromosomes in BED format, the GWAS catalog, and the GENCODE file.
* Non-standard Python libraries: [pandas](http://pandas.pydata.org/), [numpy](http://www.numpy.org/), [cairosvg](http://cairosvg.org/), [pybedtools](https://pythonhosted.org/pybedtools/)
* For the record: once cairo has been installed on macOS using Homebrew, Python won't be able to link to its libraries by default. The path pointing to the `cairo/lib` directory therefore has to be exported:

```bash
export DYLD_FALLBACK_LIBRARY_PATH=~/homebrew/Cellar/cairo/1.14.12/lib/
```


### About the data:

This work uses the GRCh38 build of the human genome (Ensembl release 83) and the GENCODE annotation (version not recorded). The GWAS catalog was downloaded on 2015.06.16 (genomic coordinates also in GRCh37) and extended with a positive control collection maintained in-house.


### Further readings and references:

* Human genome:

* GWAS:

### Data sources:

* [Ensembl](Ensembl.org): the European genome database, which aggregates known genomic information including genes, transcripts, variations, regulatory elements and more. It provides access to the human genome via an FTP server.

* [GWAS catalog](https://www.ebi.ac.uk/gwas/): a manually curated collection of variations in the human genome with known phenotypic associations, including [breast size](https://www.ebi.ac.uk/gwas/search?query=Breast%20size) and [political ideology](https://www.ebi.ac.uk/gwas/search?query=Political%20ideology).

* [GENCODE](http://www.gencodegenes.org/): a comprehensive resource of annotated genetic elements, for example genes, transcripts and exons.

In [3]:
import time
print ("Last modified:", (time.strftime("%d/%m/%Y")))

Last modified: 19/01/2018


## Stage 1.

In the first stage we further process the sequence data, integrate the GENCODE and GWAS data into a single dataframe, and save it for plotting.

### Steps:

1. Read the BED files with the chunk numbers and the GC content.
2. Run bedtools to find which chunks overlap with protein-coding genes and GWAS hits.
3. Assign a color to each chunk based on its GC content and the overlapping features.
4. Construct a dataframe from the data above.
5. Save the dataframe in a binary file (pickle).
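
Step 5 is not shown in the cells of this notebook. A minimal sketch of it, using only the standard library, could look like the following. The file name `chr21_chunks.pkl` and the list of dictionaries standing in for the dataframe are assumptions made for illustration; with pandas, `DataFrame.to_pickle` and `pd.read_pickle` do the same for a real dataframe.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the chunk dataframe: one record per chunk.
chunks = [
    {"chr": "21", "start": 0, "end": 450, "GC_ratio": None, "x": 0, "y": 0},
    {"chr": "21", "start": 450, "end": 900, "GC_ratio": None, "x": 1, "y": 0},
]

def save_chunks(records, path):
    """Write the records to a binary pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(records, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_chunks(path):
    """Read the records back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# Round trip through a temporary directory (the file name is an assumption).
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "chr21_chunks.pkl")
    save_chunks(chunks, pickle_path)
    restored = load_chunks(pickle_path)
```

Saving the processed data this way means the slow per-chunk processing only has to run once per chromosome, while the plotting can be repeated.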

In [4]:
'''
Importing libraries:
'''

import gzip # For reading data
import pybedtools # For finding overlap between our chunks and genes and GWAS signals
import pandas as pd # Data handling
import numpy as np # Working with large arrays
import colorsys # Generating color gradients
import cairosvg # Converting svg to image
import pickle # Saving dataframes to disk
import os.path # Checking if the data files are already there
from time import gmtime, strftime

### Color functions

In [5]:
# Functions to generate and modify colors
import colorsys

def hex_to_RGB(hex_code):
    ''' "#FFFFFF" -> [255,255,255] '''
    # Pass 16 to the integer function for change of base
    return [int(hex_code[i:i+2], 16) for i in range(1,6,2)]


def RGB_to_hex(RGB):
    ''' [255,255,255] -> "#ffffff" (lowercase hex digits) '''
    # Components need to be integers for hex to make sense
    RGB = [int(x) for x in RGB]
    # Each component is zero-padded to two hex digits
    return "#"+"".join(["{0:02x}".format(v) for v in RGB])

def linear_gradient(start_hex, finish_hex="#FFFFFF", n=10):
    '''
    Returns a gradient list of (n) colors between
    two hex colors. start_hex and finish_hex
    should be the full six-digit color string,
    including the number sign (e.g. "#FFFFFF")
    '''
    # Starting and ending colors in RGB form
    s = hex_to_RGB(start_hex)
    f = hex_to_RGB(finish_hex)

    # Initialize a list of the output colors with the starting color
    RGB_list = [start_hex]

    # Calculate a color at each evenly spaced value of t from 1 to n-1
    for t in range(1, n):

        # Interpolate RGB vector for color at the current value of t
        curr_vector = [
            int(s[j] + (float(t)/(n-1))*(f[j]-s[j]))
            for j in range(3)
        ]

        # Add it to our list of output colors
        RGB_list.append(RGB_to_hex(curr_vector))

    return RGB_list

def color_darkener(row, width, threshold, max_diff_value):
    '''
    width - how many chunks do we have in one line.
    threshold - at which column the darkening starts
    max_diff_value - the max value of darkening

    Once the colors are assigned, we make the colors darker for those columns
    that are at the end of the plot.
    '''

    color = row['color']
    col_frac = row['x'] / width

    # Ok, we have the color, now based on the column we make it a bit darker:
    if col_frac > threshold:
        diff = (col_frac - threshold)/(1 - threshold)
        factor = 1 - max_diff_value*diff

        # Get the RGB code of the hex code:
        rgb_code = hex_to_RGB(color)

        # Get the HLS code of the RGB code:
        hls_code = colorsys.rgb_to_hls(rgb_code[0]/float(255),
                                       rgb_code[1]/float(255),
                                       rgb_code[2]/float(255))

        # Get the modified RGB code (only the lightness is scaled):
        new_rgb = colorsys.hls_to_rgb(hls_code[0],
                                      hls_code[1]*factor,
                                      hls_code[2])

        # Get the modified hex code:
        color = RGB_to_hex([new_rgb[0] * 255,
                           new_rgb[1] * 255,
                           new_rgb[2] * 255])
    return color
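
The darkening rule in `color_darkener` can be easier to follow when the factor and the color change are separated. The sketch below is a standalone illustration, not part of the notebook: `darkening_factor` and `darken` are hypothetical helper names, and the arithmetic is assumed to mirror the function above (lightness is scaled linearly from the threshold column to the right edge).

```python
import colorsys

def darkening_factor(x, width, threshold, max_diff_value):
    """Lightness multiplier for column x of a plot that is `width` chunks wide."""
    col_frac = x / width
    if col_frac <= threshold:
        return 1.0  # Left of the threshold the color is unchanged
    # Fraction of the way from the threshold to the right edge (0..1)
    diff = (col_frac - threshold) / (1 - threshold)
    return 1 - max_diff_value * diff

def darken(hex_color, factor):
    """Scale the HLS lightness of a "#rrggbb" color by `factor`."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in range(1, 6, 2))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    new_rgb = colorsys.hls_to_rgb(h, l * factor, s)
    return "#" + "".join("{:02x}".format(int(v * 255)) for v in new_rgb)

unchanged = darkening_factor(150, 200, 0.75, 0.15)   # exactly at the threshold
right_edge = darkening_factor(199, 200, 0.75, 0.15)  # last column of a 200-wide plot
half_white = darken("#ffffff", 0.5)                  # white at half lightness
```

With the settings used in this notebook (`darkStart = 0.75`, `darkMax = 0.15`), the last column keeps roughly 85% of its original lightness, so the effect is a mild shading of the right quarter of the plot.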


### Helper functions:

In [6]:
# Helper functions:
def generate_xy(df, min_pos, chunk_size, width, position_column = 'start', x = 'x', y = 'y'):
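    '''
    Generating x,y coordinates from genomic position and
    the provided plot width and chunk size.

    df - a single row of a dataframe (the function is used with apply, axis = 1)
    min_pos - the smallest genomic position; it is mapped to chunk 0
    chunk_size - the number of basepairs pooled together in one chunk.
    width - number of chunks in one row
    position_column - the column holding the genomic position
    x, y - names of the columns the coordinates are written to
    '''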
    pos = int(df[position_column]) - min_pos
    chunk = int(int(pos)/chunk_size)
    df[x] = (chunk - int(chunk / width)*width)
    df[y] = int(chunk / width)
    return (df)

def get_now():
    return strftime("%H:%M:%S", gmtime())
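
The mapping done by `generate_xy` can be checked on plain integers. The following sketch is an illustration under the assumption that positions are never smaller than `min_pos` (for negative offsets, floor division would behave differently from the `int()` truncation used above); `chunk_to_xy` is a hypothetical name.

```python
def chunk_to_xy(position, min_pos, chunk_size, width):
    """Map a genomic position to the (column, row) of its chunk."""
    chunk = (position - min_pos) // chunk_size  # index of the chunk along the chromosome
    row, column = divmod(chunk, width)          # chunks wrap after `width` columns
    return column, row

# Positions from the chromosome 21 run in this notebook: 450 bp chunks, 200 chunks per row.
band_end = chunk_to_xy(3100000, min_pos=0, chunk_size=450, width=200)
gene_chunk = chunk_to_xy(7744950, min_pos=0, chunk_size=450, width=200)
```

The first position is the end of cytoband p13 and the second is the first chunk of SMIM11B; the resulting coordinates agree with the `x`, `y2` and `x`, `y` values printed in the corresponding outputs of this notebook.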

### Defining variables that are submitted from the command line

In [88]:
chromosome = '21'
dimension = 200
axis = 1
pixel = 9
darkStart = 0.75
darkMax = 0.15
test = '' #30000

# Understanding the provided axis:
fixed_dim = "Width"
if axis == 2:
    fixed_dim = "Height"

# Print report:
print("[Info %s] Processing chromosome: %s." % (get_now(), chromosome))
print("[Info %s] Fixed dimension: %s, length: %s chunks." % (get_now(), fixed_dim, dimension))

workingDir = os.getcwd()
print("[Info %s] Working directory: %s." % (get_now(), workingDir))

# Input and output locations (the files are not checked for existence here):
cytoband_file = workingDir + '/data/cytoBand.GRCh38.bed.bgz'
Chromosome_file_loc = workingDir + "/data/Processed_chr%s.bed.gz"
GWAS_file_loc = workingDir + "/data/processed_GWAS.bed.gz"
GENCODE_file_loc = workingDir + "/data/GENCODE.merged.bed.gz"
outputDir = workingDir + "/plots"

[Info 02:35:06] Processing chromosome: 21.
[Info 02:35:06] Fixed dimension: Width, length: 200 chunks.
[Info 02:35:06] Working directory: /Users/ds26/Projects/GenomePlotter.


### Reading genome file

In [89]:
# Reading datafile:
dataFile = Chromosome_file_loc % chromosome
print("[Info %s] reading file %s... " % (get_now(), dataFile))
chr_dataf = pd.read_csv(dataFile, compression='gzip', sep='\t', quotechar='"')

# If we want to do a test, we can restrict the dataframe to the first n rows:
if test:
    chr_dataf = chr_dataf.iloc[0:test]  # .ix has been removed from recent pandas versions
    print("[Info %s] Test mode is on. The first %s rows will be kept." %(get_now(), test))

# the start position will be the new index:
chr_dataf = chr_dataf.set_index('start', drop = False)

chunk_size = chr_dataf.head(1).apply(lambda x: x['end'] - x['start'], axis = 1).tolist()[0]
min_pos  = chr_dataf.start.min()
print("[Info %s] Number of chunks read: %s" %(get_now(), chr_dataf.shape[0]))

# Get the width (number of chunks plotted in one row)
width = dimension # We are dealing with fixed column numbers
if axis == 2:  width = int(chr_dataf.shape[0]/dimension) # We are dealing with fixed number of rows.

# Adding plot coordinates to the dataframe:
print("[Info %s] Chunk size: %s bp, plot width: %s" %(get_now(), chunk_size, width))
print("[Info %s] Calculating plot coordinates for each chunk." %(get_now()))
chr_dataf = chr_dataf.apply(generate_xy, args = (min_pos, chunk_size, width), axis =1)
print(chr_dataf.head())


[Info 02:35:25] reading file /Users/ds26/Projects/GenomePlotter/data/Processed_chr21.bed.gz... 
[Info 02:35:25] Number of chunks read: 103799
[Info 02:35:25] Chunk size: 450.0 bp, plot width: 200
[Info 02:35:25] Calculating plot coordinates for each chunk.
        chr   start     end  GC_ratio  x  y
start                                      
0      21.0     0.0   450.0       NaN  0  0
450    21.0   450.0   900.0       NaN  1  0
900    21.0   900.0  1350.0       NaN  2  0
1350   21.0  1350.0  1800.0       NaN  3  0
1800   21.0  1800.0  2250.0       NaN  4  0
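
The choice between a fixed number of columns (`axis = 1`) and a fixed number of rows (`axis = 2`) can be sketched on plain numbers. This is an illustration only; `plot_width` and `n_rows` are hypothetical helpers, and the rounding-up variant at the end is a suggestion rather than something the notebook does.

```python
import math

def plot_width(n_chunks, dimension, axis):
    """Number of chunks per row, following the rule used in the cell above."""
    if axis == 2:
        return int(n_chunks / dimension)  # fixed number of rows requested
    return dimension                      # fixed number of columns

def n_rows(n_chunks, width):
    """Rows needed to place every chunk when each row holds `width` chunks."""
    return math.ceil(n_chunks / width)

n_chunks = 103799  # chunk count reported for chromosome 21

fixed_width = plot_width(n_chunks, 200, axis=1)
fixed_height = plot_width(n_chunks, 200, axis=2)
rows_floor = n_rows(n_chunks, fixed_height)
# Assumption: rounding the width up keeps the plot within the requested rows.
rows_ceil = n_rows(n_chunks, math.ceil(n_chunks / 200))
```

Because the width is rounded down, asking for 200 rows on chromosome 21 gives 518 chunks per row and the plot ends up 201 rows high; rounding the width up to 519 would keep it at 200 rows.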


### Reading GENCODE file and merge with genome

In [90]:
# Reading GENCODE file:
print("[Info %s] Reading GENCODE file (%s)..." % (get_now(), GENCODE_file_loc))
GENCODE_bed = pybedtools.BedTool(GENCODE_file_loc)

# Run intersectbed:
chr_data_bed = pybedtools.BedTool(dataFile)

print("[Info %s] Selecting intersecting features." %(get_now()))
GencodeIntersect = chr_data_bed.intersect(GENCODE_bed, wa = True, wb = True)
GC_INT = GencodeIntersect.to_dataframe()

# Assign features to each chunk:
print("[Info %s] Assign intersecting features to each chunk." % (get_now()))
GENCODE_chunks = GC_INT.groupby('start').apply(lambda x: 'exon' if 'exon' in x.thickEnd.unique() else 'gene' )
GENCODE_chunks.name = "GENCODE"

chr_dataf['GENCODE'] = 'intergenic' # By default, all GENCODE values are intergenic
chr_dataf.GENCODE.update(GENCODE_chunks) # This value will be overwritten if overlaps with exon or gene
print(chr_dataf.head())

[Info 02:37:38] Reading GENCODE file (/Users/ds26/Projects/GenomePlotter/data/GENCODE.merged.bed.gz)...
[Info 02:37:38] Selecting intersecting features.
[Info 02:37:38] Assign intersecting features to each chunk.
        chr   start     end  GC_ratio  x  y     GENCODE
start                                                  
0      21.0     0.0   450.0       NaN  0  0  intergenic
450    21.0   450.0   900.0       NaN  1  0  intergenic
900    21.0   900.0  1350.0       NaN  2  0  intergenic
1350   21.0  1350.0  1800.0       NaN  3  0  intergenic
1800   21.0  1800.0  2250.0       NaN  4  0  intergenic
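
The labelling rule applied after the intersect (an exon overlap takes priority over a gene overlap, everything else stays intergenic) can be sketched without pandas. The overlap list below is made-up illustration data, and it is an assumption that the `thickEnd` column of the intersect result carries the feature type.

```python
from collections import defaultdict

# Hypothetical intersect result: (chunk start, overlapping feature type)
overlaps = [
    (7744950, "gene"),
    (7744950, "exon"),
    (7745400, "gene"),
    (7745400, "exon"),
    (7745850, "gene"),
]
chunk_starts = [900, 7744950, 7745400, 7745850]

# Collect the feature types seen for each chunk
features = defaultdict(set)
for start, feature in overlaps:
    features[start].add(feature)

def label(start):
    """'exon' wins over 'gene'; chunks without any overlap stay 'intergenic'."""
    if start not in features:
        return "intergenic"
    return "exon" if "exon" in features[start] else "gene"

labels = {start: label(start) for start in chunk_starts}
```

The default value plays the same role as the `'intergenic'` column initialised before `update` is called in the cell above.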


### Reading cytoband file

In [91]:
# Reading cytoband file:
print("[Info %s] Opening and processing cytoband file..." %(get_now()))
cyb_df = pd.read_csv(cytoband_file, compression='gzip', sep='\t')
cyb_df = cyb_df.loc[cyb_df.chr == chromosome] # Selecting only the relevant rows

# Extracting the centromere location (the 'acen' bands); it is used below to label the overlapping chunks:
centromer_loc = cyb_df.loc[(cyb_df.chr == chromosome) & (cyb_df.type == 'acen'),['start', 'end']]
centromer_loc = (int(centromer_loc.start.min()),
                 int(centromer_loc.end.max()))

# Calculating the y coordinate for both the start and end position of each cytoband (the x coordinate won't be used):
cyb_df = cyb_df.apply(generate_xy, axis = 1, args = (min_pos, chunk_size, width), y = 'y1')
cyb_df = cyb_df.apply(generate_xy, axis = 1, args = (min_pos, chunk_size, width), position_column = 'end', y = 'y2')

print(cyb_df.head())

# Assigning centromere:
chr_dataf.loc[(chr_dataf.end > centromer_loc[0]) & (chr_dataf.start < centromer_loc[1]) , 'GENCODE'] = 'centromere'

# Assigning heterochromatin where no GC ratio is available:
chr_dataf.loc[chr_dataf.GC_ratio.isnull(), 'GENCODE'] = 'heterochromatin'

[Info 02:37:46] Opening and processing cytoband file...
    chr     start       end   name   type    x   y1   y2
616  21         0   3100000    p13   gvar   88    0   34
617  21   3100000   7000000    p12  stalk  155   34   77
618  21   7000000  10900000  p11.2   gvar   22   77  121
619  21  10900000  12000000  p11.1   acen   66  121  133
620  21  12000000  13000000  q11.1   acen   88  133  144


### Assigning colors

In [92]:
# Basic color assignment and adjustment:
colors_GENCODE = {
    'centromere'      : linear_gradient('#9393FF', n=20),
    'heterochromatin' : linear_gradient('#F9D2C2', finish_hex='#ffc6af', n=20), # Monochrome, no gradient!
    'intergenic'      : linear_gradient('#A3E0D1', n=20),
    'exon'            : linear_gradient('#FFD326', n=20),
    'gene'            : linear_gradient('#6CB8CC', n=20)
}

print("[Info %s] Assigning colors to each chunk." % (get_now()))
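# The gradient index is capped at 19 so that a GC ratio of exactly 1.0 stays within the 20-step gradient:
chr_dataf['color'] = chr_dataf.apply(
    lambda x: colors_GENCODE[x['GENCODE']][min(int(x['GC_ratio']*20), 19)]
    if not np.isnan(x['GC_ratio']) else colors_GENCODE['heterochromatin'][0], axis = 1)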

# Get the colors darker based on column number:
print("[Info %s] Apply darkness filter." % (get_now()))
chr_dataf['color'] = chr_dataf.apply(color_darkener, axis = 1, args=(width, darkStart, darkMax))
print(chr_dataf.head())

[Info 02:37:50] Assigning colors to each chunk.
[Info 02:37:54] Apply darkness filter.
        chr   start     end  GC_ratio  x  y          GENCODE    color
start                                                                
0      21.0     0.0   450.0       NaN  0  0  heterochromatin  #F9D2C2
450    21.0   450.0   900.0       NaN  1  0  heterochromatin  #F9D2C2
900    21.0   900.0  1350.0       NaN  2  0  heterochromatin  #F9D2C2
1350   21.0  1350.0  1800.0       NaN  3  0  heterochromatin  #F9D2C2
1800   21.0  1800.0  2250.0       NaN  4  0  heterochromatin  #F9D2C2
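
How a GC ratio selects a color from a 20-step gradient can be reproduced in isolation. The sketch below re-implements the gradient compactly (assuming `n` is at least 2) and adds a hypothetical `gc_color` helper that caps the index, so that a ratio of exactly 1.0 does not run past the end of the list. The `#FFC300` gradient is the one used for highlighted exons in a later cell of this notebook.

```python
def hex_to_rgb(hex_code):
    """ "#FFC300" -> [255, 195, 0] """
    return [int(hex_code[i:i + 2], 16) for i in range(1, 6, 2)]

def linear_gradient(start_hex, finish_hex="#FFFFFF", n=10):
    """n colors, evenly spaced from start_hex to finish_hex (n must be at least 2)."""
    s, f = hex_to_rgb(start_hex), hex_to_rgb(finish_hex)
    colors = []
    for t in range(n):
        vector = [int(s[j] + (t / (n - 1)) * (f[j] - s[j])) for j in range(3)]
        colors.append("#" + "".join("{:02x}".format(v) for v in vector))
    return colors

def gc_color(gradient, gc_ratio):
    """Pick the gradient step for a GC ratio between 0 and 1 (inclusive)."""
    index = min(int(gc_ratio * len(gradient)), len(gradient) - 1)
    return gradient[index]

exon_gradient = linear_gradient("#FFC300", n=20)
color_066 = gc_color(exon_gradient, 0.66)
color_062 = gc_color(exon_gradient, 0.62)
color_100 = gc_color(exon_gradient, 1.0)
```

A higher GC ratio selects a step closer to white, so GC-rich chunks appear lighter. The two colors computed for ratios 0.66 and 0.62 match those printed for the first two SMIM11B exon chunks in the gene highlighting output.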


### Selecting GWAS signals

In [93]:
## Overlapping GWAS signals:
print("[Info %s] Opening GWAS file.." %(get_now()))
GWAS_file = pybedtools.BedTool(GWAS_file_loc)
full_GWAS = GWAS_file.to_dataframe()
GWAS_df = full_GWAS[full_GWAS.chrom == str(chromosome)]
print("[Info %s] Calculating plot coordinates for overlapping GWAS signals." %(get_now()))
GWAS_df = GWAS_df.apply(generate_xy, args = (min_pos, chunk_size, width), axis =1)


[Info 02:37:59] Opening GWAS file..
[Info 02:38:00] Calculating plot coordinates for overlapping GWAS signals.


In [94]:
# Reading the list of genes to be added as extra annotation:
genes_df = pd.read_table('temp/genes.txt', sep = " ")
genes_bed = pybedtools.BedTool.from_dataframe(genes_df)

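# Extracting the chunks overlapping with the genes.
# to_dataframe() applies the default BED column names, so the chunk columns
# appended by the intersect end up as: strand = chromosome, thickStart = chunk start,
# thickEnd = chunk end, itemRgb = GC ratio.
genes_Intersect = genes_bed.intersect(chr_data_bed, wa = True, wb = True)
genes_Intersect_df = genes_Intersect.to_dataframe()
genes_Intersect_df[['x', 'y', 'GENCODE', 'color']] = genes_Intersect_df.apply(lambda x: chr_dataf.loc[x['thickStart']][['x', 'y', 'GENCODE', 'color']], axis =1)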

# Assigning colors to the selected dataframe:
gene_select_color = {
    'gene'   : linear_gradient('#FD25BC', n=20),
    'exon'   : linear_gradient('#FFC300', n=20)
}

print("[Info %s] Assigning colors to each chunk in the selected gene." % (get_now()))
genes_Intersect_df['color'] = genes_Intersect_df.apply(lambda x: gene_select_color[x['GENCODE']][int(x['itemRgb']*20)], axis = 1)
genes_Intersect_df['color'] = genes_Intersect_df.apply(color_darkener, axis = 1, args=(width, darkStart, darkMax))
print(genes_Intersect_df.head())


[Info 02:38:04] Assigning colors to each chunk in the selected gene.
   chrom    start      end     name            score  strand  thickStart  \
0     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7744950   
1     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7745400   
2     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7745850   
3     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7746300   
4     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7746750   

   thickEnd   itemRgb   x   y GENCODE    color  
0   7745400  0.660000  11  86    exon  #ffecae  
1   7745850  0.620000  12  86    exon  #ffe8a1  
2   7746300  0.388889  13  86    gene  #fd75d4  
3   7746750  0.437778  14  86    gene  #fd80d8  
4   7747200  0.446667  15  86    gene  #fd80d8  


### Plotting chromosome:

In [97]:
# Note: SVG_plot is defined in the cell below (In [96]), which has to be run first.
outputFileName = outputDir + "/cicaful"
plot = SVG_plot(chr_dataf.x.max(), chr_dataf.y.max(), pixel, margins = [350, 100, 100, 100])

print("[Info %s] Drawing chunks (%s of them)... might take a while to complete." % (get_now(), chr_dataf.shape[0]))
chr_dataf.apply(plot.draw_chunk, axis = 1)

print("[Info %s] Adding selected genes (%s chunks)." % (get_now(), genes_Intersect_df.shape[0]))
genes_Intersect_df.apply(plot.draw_chunk, axis = 1, y_scale = 2, y_shift = -0.5)

print("[Info %s] Adding GWAS hits." %(get_now()))
GWAS_df.apply(plot.draw_GWAS, axis = 1 )

print("[Info %s] Marking centromere." %(get_now()))
plot.mark_centromere(int(centromer_loc[0]/chunk_size), int(centromer_loc[1]/chunk_size))

GWAS_df.sample(frac = 0.20).apply(plot.add_assoc, axis = 1 )

print("[Info %s] Adding cytoband ruler." %(get_now()))
cyb_df.loc[cyb_df.chr == chromosome].apply(plot.draw_cytoband, axis = 1)

print("[Info %s] Saving svg file.." % (get_now()))
plot.save_svg(outputFileName + ".svg")

print("[Info %s] Saving png file.." % (get_now()))
plot.save_png(outputFileName + ".png")

[Info 02:43:43] Drawing chunks (103799 of them)... might take a while to complete.
[Info 02:46:54] Adding selected genes (293 chunks).
[Info 02:46:55] Adding GWAS hits.
[Info 02:46:57] Marking centromere.
[Info 02:46:57] Adding cytoband ruler.
[Info 02:46:57] Saving svg file..
[Info 02:46:58] Saving png file..


![caption](plots/cicaful.png)

In [96]:
# The svg class 
import cairosvg
import math
import sys

class SVG_plot:
    '''
    This class contains all the methods and data to create the genome plot in svg format.
    Once the process is done, the user can choose to save the svg or render it using cairosvg.

    margins = array of four integers corresponding to the left, upper, right and bottom margin.

    At this point the class won't check if the other arguments are correct.
    '''

    def __init__(self, width, height, pixel, margins = [0,0,0,0]):

        # Testing if margins are proper:
        try:
            # Margins are used as given (in SVG pixels); they are not scaled by the chunk size
            self.margins = [int(x) for x in margins]
        except (TypeError, ValueError):
            sys.exit("[Error] Margins were not properly formatted! Expecting an array of four numbers.")

        self.width = width
        self.height = height
        self.pixel = pixel

        # Constants for the cytoband plot:
        self.cband_x1 = self.margins[0] * 0.5
        self.cband_x2 = self.margins[0] * 0.7
        self.cband_width = (self.cband_x2 - self.cband_x1)
        self.cband_colors = {
            'gneg'    : '#FFFFFF', # White
            'gpos25'  : '#E5E5E5', # Gray90
            'gpos50'  : '#CCCCCC', # Gray80
            'gpos75'  : '#B3B3B3', # Gray70
            'gpos100' : '#999999', # Gray60
            'acen'    : '#CCCCCC', # Gray80
            'gvar'    : '#999999', # Gray60
            'stalk'   : '#E5E5E5', # Gray90
            'border'  : '#999999'  # Gray60
        }

        # This is a scaled star for plotting custom annotation:
        self.star = [e for ts in zip([self.__rotate(0, pixel*2, math.radians(alpha*36)) for alpha in range(0,9,2)],
                                    [self.__rotate(0, pixel*2, math.radians(alpha*36), 1.7) for alpha in range(1,10,2)],) for e in ts]

        self.plot =  '<svg width="%s" height="%s">\n' % (width*pixel + margins[0] + margins[2], height*pixel + margins[1] + margins[3])

    def __adjust_coord(self, x, y):
        ''' This simple function just shifts the coordinates by the margins and corrects for the pixel size.'''
        return (int(x*self.pixel + self.margins[0]),
                int(y*self.pixel + self.margins[1]))

    def add_assoc(self, row):
        ''' Adding a star to the plot; the rsID and trait annotation will be added later. '''
        (x,y) = self.__adjust_coord(row['x'], row['y'])

        # Offsetting the star:
        self.plot += ('<polygon stroke="white" fill="#E25FDA" stroke-width="2" points="%s" />\n' %
                      " ".join([",".join([str(x + a[0] + self.pixel/2),str(y+a[1] + self.pixel/2)]) for a in self.star]))

        # Adding annotation:
        # to be implemented.... not yet there...


    # Adding square:
    def draw_chunk(self, row, y_scale = 1, y_shift = 0):
        '''
        This function expects a row of a dataframe in which there must be the
        following three columns:
            x - column number
            y - row number
            color - color of the field in hexadecimal code.
            
        y_scale: the pre-defined pixel size will be scaled by this factor.
        y_shift: the box will be shifted on the y axis by this factor (in pixels)
        '''
        (x,y) = self.__adjust_coord(row['x'],row['y'])
        color = row['color']
        self.plot += ('<rect x="%s" y="%s" width="%s" height="%s" style="stroke-width:1;stroke:%s; fill: %s" />\n' %
            (x, (y + self.pixel * y_shift), self.pixel , self.pixel * y_scale, color, color))

    # Adding dot:
    def draw_GWAS(self, row):
        (x,y) = self.__adjust_coord(row['x'],row['y'])
        self.plot += ('<circle cx="%s" cy="%s" r="%s" stroke="%s" stroke-width="1" fill="%s" />\n' %
            (x + self.pixel/2, y + self.pixel/2, self.pixel * 0.75, "black", "black"))

    def mark_centromere(self, centr_start, centr_end):

        # calculating the y coordinates of the centromeres based on the chunk count:
        start_y = int(centr_start / self.width)
        end_y = int(centr_end / self.width)

        # Marking centromere on the left:
        self.plot += ('<path d="M %s %s C %s %s, %s %s, %s %s C %s %s, %s %s, %s %s Z" fill="white"/>\n' %
                  ( self.__adjust_coord(0, start_y) +
                    self.__adjust_coord(0, (start_y+end_y)/2) +
                    self.__adjust_coord((end_y-start_y), (start_y+end_y)/2) +
                    self.__adjust_coord((end_y-start_y)*2, (start_y+end_y)/2) +
                    self.__adjust_coord((end_y-start_y), (start_y+end_y)/2) +
                    self.__adjust_coord(0, (start_y+end_y)/2) +
                    self.__adjust_coord(0, end_y )))

        # Marking centromere on the right:
        self.plot += ('<path d="M %s %s C %s %s, %s %s, %s %s C %s %s, %s %s, %s %s Z" fill="white"/>\n' %
                  ( self.__adjust_coord(self.width+1, start_y) +
                    self.__adjust_coord(self.width+1, (start_y+end_y)/2) +
                    self.__adjust_coord(self.width+1-(end_y-start_y), (start_y+end_y)/2) +
                    self.__adjust_coord(self.width+1-(end_y-start_y)*2, (start_y+end_y)/2) +
                    self.__adjust_coord(self.width+1-(end_y-start_y), (start_y+end_y)/2) +
                    self.__adjust_coord(self.width+1, (start_y+end_y)/2) +
                    self.__adjust_coord(self.width+1, end_y)))
    # Adding legend?
    # Adding centromere? DONE
    # Adding frame?
    # Adding custom annotation.
    # Adding cytobands.
    # Rotating and scaling vector
    def __rotate(self, x, y, angle, scale=1):
        return ((x * math.cos(angle) - y * math.sin(angle))*scale,
                (x * math.sin(angle) + y * math.cos(angle))*scale)

    # Close svg document:
    def __close(self):
        self.plot += '</svg>\n'

    # Drawing the cytoband ruler next to the chromosome:
    def draw_cytoband(self, row):

        (x1,y1) = self.__adjust_coord(row['x'],row['y1'])
        (x2,y2) = self.__adjust_coord(row['x'],row['y2'])

        # Centromeres are plotted as triangles facing against each other:
        if row['type'] == 'acen' and 'p' in row['name']:
            self.plot += ('<polygon points="%s,%s %s,%s %s,%s" style="fill:%s;stroke:%s;stroke-width:1;fill-rule:nonzero;" />\n' %
                  (self.cband_x1, y1, self.cband_x2, y1, (self.cband_x1 + self.cband_x2) / 2 , y2, self.cband_colors[row['type']], self.cband_colors['border']))
        elif row['type'] == 'acen' and 'q' in row['name']:
            self.plot += ('<polygon points="%s,%s %s,%s %s,%s" style="fill:%s;stroke:%s;stroke-width:1;fill-rule:nonzero;" />\n' %
                  (self.cband_x1, y2, self.cband_x2, y2, (self.cband_x1 + self.cband_x2) / 2, y1, self.cband_colors[row['type']], self.cband_colors['border']))

        # Regular bands are plotted as rectangles, where the fill color is set based on the type of the band:
        else:
            self.plot += ('<rect x="%s" y="%s" width="%s" height="%s" style="stroke-width:1;stroke:%s; fill: %s" />\n' %
                (self.cband_x1, y1, self.cband_width , y2 - y1, self.cband_colors['border'], self.cband_colors[row['type']]))

        # Adding the name of the band:
        self.plot += ('<text x="%s" y="%s" text-anchor="end" font-family="sans-serif" font-size="50px" fill="%s">%s</text>' %(self.cband_x1*0.8, y1 + (y2 - y1)/2, self.cband_colors['border'], row['name']))

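    # Save svg file:
    def save_svg(self, file_name):
        # Closing the svg tag, unless it has been closed already:
        if not self.plot.endswith('</svg>\n'):
            self.__close()
        with open(file_name, 'w') as f:
            f.write(self.plot)

    # Render png file:
    def save_png(self, file_name):
        # The document has to be closed before rendering, even if save_svg was not called first:
        if not self.plot.endswith('</svg>\n'):
            self.__close()
        cairosvg.svg2png(bytestring=self.plot.encode('utf-8'), write_to=file_name)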
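
The way the class turns chunk coordinates into SVG markup can be tried out without cairosvg. The sketch below is a reduced, standalone version of the coordinate shift and of `draw_chunk`; the function names are hypothetical, and the `xmlns` attribute on the `<svg>` tag is an addition that the class above does not emit (it is generally required for a standalone SVG file to render in a browser).

```python
def adjust_coord(x, y, pixel, margins):
    """Convert chunk coordinates to SVG pixels: scale by the chunk size, shift by the margins."""
    return int(x * pixel + margins[0]), int(y * pixel + margins[1])

def rect(x, y, pixel, color, margins):
    """One chunk as an SVG <rect> element."""
    px, py = adjust_coord(x, y, pixel, margins)
    return ('<rect x="%s" y="%s" width="%s" height="%s" '
            'style="stroke-width:1;stroke:%s; fill: %s" />' % (px, py, pixel, pixel, color, color))

margins = [350, 100, 100, 100]  # left, upper, right, bottom
pixel = 9                       # edge length of one chunk in SVG pixels

# A single chunk in column 11, row 86 of a plot that is 200 columns wide and 87 rows high
body = rect(11, 86, pixel, "#ffecae", margins)
svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="%s" height="%s">\n%s\n</svg>\n'
       % (200 * pixel + margins[0] + margins[2], 87 * pixel + margins[1] + margins[3], body))
```

Writing `svg` to a file gives a document that can be opened directly, which is a quick way to check margins and chunk size before drawing a whole chromosome.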


### Adding gene as annotation


```bash
cd GenomePlotter/temp
genes="TPTE KCNE1B SMIM11B"

cat <(echo "chr start end name id") <(for gene in ${genes}; do wget -q "rest.ensembl.org/lookup/symbol/homo_sapiens/${gene}?content-type=application/json" -O - | jq -r '. |"\(.seq_region_name) \(.start) \(.end) \(.display_name) \(.id)"'; done | sort -k1,1 -k2,2n ) > genes.txt
```

```
chr  start     end       name     id
21   7744962   7777853   SMIM11B  ENSG00000273590
21   7816675   7829926   KCNE1B   ENSG00000276289
21   10521553  10606140  TPTE     ENSG00000274391
```
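
The same table can be assembled in Python. The sketch below is a hedged equivalent of the shell command: it assumes the REST endpoint is reachable over `https`, and the function names are hypothetical. Only the formatting part runs in the example, so no network access is needed to try it.

```python
import json
import urllib.request

LOOKUP_URL = ("https://rest.ensembl.org/lookup/symbol/homo_sapiens/"
              "%s?content-type=application/json")

def fetch_gene(symbol):
    """Query the Ensembl REST lookup endpoint for one gene symbol (needs network access)."""
    with urllib.request.urlopen(LOOKUP_URL % symbol) as response:
        return json.loads(response.read().decode("utf-8"))

def gene_line(record):
    """Format one lookup record like the jq expression in the shell command."""
    fields = ("seq_region_name", "start", "end", "display_name", "id")
    return " ".join(str(record[field]) for field in fields)

def genes_table(records):
    """Header plus one line per gene, sorted by chromosome and start position."""
    ordered = sorted(records, key=lambda r: (str(r["seq_region_name"]), int(r["start"])))
    return "\n".join(["chr start end name id"] + [gene_line(r) for r in ordered]) + "\n"

# Offline example with the SMIM11B values listed in genes.txt
sample = {"seq_region_name": "21", "start": 7744962, "end": 7777853,
          "display_name": "SMIM11B", "id": "ENSG00000273590"}
table = genes_table([sample])
```

With network access, `genes_table([fetch_gene(g) for g in ["TPTE", "KCNE1B", "SMIM11B"]])` would be expected to produce the content of `genes.txt`.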

In [53]:
# reading gene annotation:
genes_df = pd.read_table('temp/genes.txt', sep = " ")
genes_df.head()

   chr     start       end     name               id
0   21   7744962   7777853  SMIM11B  ENSG00000273590
1   21   7816675   7829926   KCNE1B  ENSG00000276289
2   21  10521553  10606140     TPTE  ENSG00000274391


In [71]:
# Reading the list of genes to be added as extra annotation:
genes_df = pd.read_table('temp/genes.txt', sep = " ")
genes_bed = pybedtools.BedTool.from_dataframe(genes_df)

# Extracting the chunks overlapping with the genes:
genes_Intersect = genes_bed.intersect(chr_data_bed, wa = True, wb = True)
genes_Intersect_df = genes_Intersect.to_dataframe()
genes_Intersect_df[['x', 'y', 'GENCODE', 'color']] = genes_Intersect_df.apply(lambda x: chr_dataf.loc[x['thickStart']][['x', 'y', 'GENCODE', 'color']], axis =1)

# Assigning colors to the selected dataframe:
gene_select_color = {
    'gene'   : linear_gradient('#FD25BC', n=20),
    'exon'   : linear_gradient('#DAF7A6', n=20)
}

print("[Info %s] Assigning colors to each chunk in the selected gene." % (get_now()))
genes_Intersect_df['color'] = genes_Intersect_df.apply(lambda x: gene_select_color[x['GENCODE']][int(x['itemRgb']*20)], axis = 1)
genes_Intersect_df['color'] = genes_Intersect_df.apply(color_darkener, axis = 1, args=(width, darkStart, darkMax))
print(genes_Intersect_df.head())


[Info 02:10:33] Assigning colors to each chunk in the selected gene.
   chrom    start      end     name            score  strand  thickStart  \
0     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7744950   
1     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7745400   
2     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7745850   
3     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7746300   
4     21  7744962  7777853  SMIM11B  ENSG00000273590      21     7746750   

   thickEnd   itemRgb   x   y GENCODE    color  
0   7745400  0.660000  11  86    exon  #edbfff  
1   7745850  0.620000  12  86    exon  #eab4ff  
2   7746300  0.388889  13  86    gene  #fd75d4  
3   7746750  0.437778  14  86    gene  #fd80d8  
4   7747200  0.446667  15  86    gene  #fd80d8  


In [70]:
genes_Intersect_df.head().color

0    #edbfff
1    #eab4ff
2    #fd75d4
3    #fd80d8
4    #fd80d8
Name: color, dtype: object