# Advanced random IS
Stage: model Zero - a simple model  
This tool generates random integration sites (IS) on the hg38 human genome assembly for an AAV vector.

#### Authors: 
Elvira Mingazova, Saira Afzal, Raffaele Fronza 2017
#### Parameters:
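n - number of random intervals (integration sites) to generate  
r - range: upper bound of the random value added to the start position  
d - delta: fixed value added on top of the range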

First, we need a way to pass the parameters to the program. For this we use the sys and argparse modules
(inspired by https://dmorgan.info/posts/argparse-intro/).

In [18]:
#parser setup
import sys
import argparse
parser = argparse.ArgumentParser()
parser.prog = 'progName.py'
parser.description = 'You can provide the program with three parameters through the terminal'
parser.add_argument("-n", type=int, help='Number of integration sites generated')
parser.add_argument("-r", type=int, help='Range-value: defines an interval where the IS is located')
parser.add_argument("-d", type=int, help='Delta-value: expands the range with the value provided by user')

_StoreAction(option_strings=['-d'], dest='d', nargs=None, const=None, default=None, type=<type 'int'>, choices=None, help='Delta-value: expands the range with the value provided by user', metavar=None)

In [19]:
parser.parse_args(['--help'])

usage: progName.py [-h] [-n N] [-r R] [-d D]

You can provide the program with three parameters through the terminal

optional arguments:
  -h, --help  show this help message and exit
  -n N        Number of integration sites generated
  -r R        Range-value: defines an interval where the IS is located
  -d D        Delta-value: expands the range with the value provided by user


SystemExit: 0

Now we are ready to parse some arguments. The parse_args method expects a list of strings as input. Typically this list comes from sys.argv (the input from the terminal), but in a Jupyter Notebook you provide the parameters of your choice directly inside the cell. For instance:

In [3]:
namespace=parser.parse_args('-n 5 -r 0 -d 1'.split())
print namespace

Namespace(d=1, n=5, r=0)


The parameters are stored in a Namespace object. Each parameter is accessible as namespace.n, namespace.r or namespace.d. In this example the values are namespace.n=5, namespace.r=0 and namespace.d=1.
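
As a small illustration of how the parsed values can be read, the sketch below rebuilds the same three-option parser and accesses the result both through attributes and as a dictionary via `vars()`. It is written to run under both Python 2 and Python 3. The variable name `params` and the shortened help strings are introduced here only for illustration.

```python
import argparse

# Rebuild the parser from the cells above (same three integer options).
parser = argparse.ArgumentParser()
parser.add_argument("-n", type=int, help="Number of integration sites generated")
parser.add_argument("-r", type=int, help="Range-value")
parser.add_argument("-d", type=int, help="Delta-value")

namespace = parser.parse_args("-n 5 -r 0 -d 1".split())

# Attribute access: each option is stored under its own name.
n, r, d = namespace.n, namespace.r, namespace.d

# vars() exposes the Namespace as a plain dictionary ('params' is an illustrative name).
params = vars(namespace)
print(sorted(params.items()))  # [('d', 1), ('n', 5), ('r', 0)]
```

The dictionary form can be convenient when the parameters need to be logged or passed on as keyword arguments.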

Now we are ready to set up the random model Zero:

In [26]:
import random
#create a dictionary for the chromosome lengths
chrLen = {}
chrLen[1]=224999719
chrLen[2]=237712649
chrLen[3]=194704827
chrLen[4]=187297063
chrLen[5]=177702766
chrLen[6]=167273993
chrLen[7]=154952424
chrLen[8]=142612826
chrLen[9]=120312298
chrLen[10]=131624737
chrLen[11]=131130853
chrLen[12]=130303534
chrLen[13]=95559980
chrLen[14]=88290585
chrLen[15]=81341915
chrLen[16]=78884754
chrLen[17]=77800220
chrLen[18]=74656155
chrLen[19]=55785651
chrLen[20]=59505254
chrLen[21]=34171998
chrLen[22]=34893953
chrLen[23]=151058754
chrLen[24]=57741652 #is that number true? 57741652i was written in getRegion.awk
#define a count variable to visualise the order number of the output line
count = 1
#loop on n random IS
print "chr   #", '\t', 'Start', '\t\t', 'End', '\t\trnd #'
for i in range(namespace.n):
    #select a chromosome
    chrom = random.randint(1,24) #the name "chrom" was chosen because "chr" is a built-in function

    #select a site on that chromosome
    start = random.randint(1,chrLen[chrom])

    #select a random region
    end = start + random.randint(0,namespace.r) + namespace.d
    if len(str(chrom))==1:
        print "chr  ",chrom, "\t", start, "\t", end, "\trnd", count
    else:
        print "chr ", chrom, "\t", start, "\t", end, "\trnd", count
    count +=1

chr   # 	Start 		End 		rnd #
chr  20 	30591710 	30591711 	rnd 1
chr  18 	60218260 	60218261 	rnd 2
chr  15 	54590444 	54590445 	rnd 3
chr   2 	70629299 	70629300 	rnd 4
chr   2 	188315716 	188315717 	rnd 5


In this example the program generated 5 random integration sites.
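
The sampling step can also be sketched as a small function, which makes the interval arithmetic explicit: the length `end - start` always lies between `d` and `r + d`. This is a minimal sketch, not the notebook's own code. The function name `random_sites`, the `seed` argument and the two-chromosome dictionary (values copied from `chrLen`) are assumptions made for illustration.

```python
import random

def random_sites(n, r, d, chr_len, seed=None):
    """Return n tuples (chrom, start, end) drawn as in model Zero."""
    rng = random.Random(seed)      # private generator; a fixed seed makes runs repeatable
    chroms = sorted(chr_len)       # chromosomes are chosen uniformly, not by length
    sites = []
    for _ in range(n):
        chrom = rng.choice(chroms)
        start = rng.randint(1, chr_len[chrom])  # randint includes both endpoints
        end = start + rng.randint(0, r) + d     # length lies between d and r + d
        sites.append((chrom, start, end))
    return sites

# Illustrative subset of the chromosome-length dictionary.
subset = {21: 34171998, 22: 34893953}
sites = random_sites(5, 10, 1, subset, seed=42)
for chrom, start, end in sites:
    print("chr{}\t{}\t{}".format(chrom, start, end))
```

Like the original loop, this sketch does not check that `end` stays within the chromosome length, so a site drawn close to the end of a chromosome can extend past it.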

Now that argparse takes care of the input validation, we can turn the code into a script, store it in a .py file and run it from the terminal.
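
To illustrate what that validation covers, the sketch below shows that argparse rejects a non-integer value by exiting with status 2, while an option that is simply omitted is stored as `None`. The helper names `build_parser` and `try_parse` are hypothetical. Passing `required=True` to `add_argument` is one way to make an option mandatory. It is shown here as an assumption about what a script might want, not as something the notebook does.

```python
import argparse

def build_parser(required=False):
    """Build a parser with the three integer options -n, -r and -d."""
    parser = argparse.ArgumentParser(prog="getRegion.py")
    for flag in ("-n", "-r", "-d"):
        parser.add_argument(flag, type=int, required=required)
    return parser

def try_parse(parser, args):
    """Return the Namespace, or the exit status if argparse aborts."""
    try:
        return parser.parse_args(args)
    except SystemExit as exc:
        return exc.code

lenient = build_parser()
strict = build_parser(required=True)

bad_type = try_parse(lenient, ["-n", "five"])    # type=int fails: exit status 2
missing = try_parse(lenient, ["-n", "5"])        # -r and -d are stored as None
missing_strict = try_parse(strict, ["-n", "5"])  # required options missing: exit status 2
```

With the optional arguments used in the notebook, omitting `-n` leaves `namespace.n` as `None`, and `range(None)` then raises a `TypeError`. Making the options required moves that failure into argparse, which prints a usage message instead.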

### Making a script

In [35]:
%%writefile getRegion.py
import sys
import argparse
parser = argparse.ArgumentParser()
parser.prog = 'getRegion.py'
parser.description = 'You can provide the program with three parameters through the terminal'
parser.add_argument("-n", type=int, help='Number of integration sites generated')
parser.add_argument("-r", type=int, help='Range-value: defines an interval where the IS is located')
parser.add_argument("-d", type=int, help='Delta-value: expands the range with the value provided by user')

#input parsing
namespace = parser.parse_args((sys.argv[1:]))
import random
#create a dictionary for the chromosome lengths
chrLen = {}
chrLen[1]=224999719
chrLen[2]=237712649
chrLen[3]=194704827
chrLen[4]=187297063
chrLen[5]=177702766
chrLen[6]=167273993
chrLen[7]=154952424
chrLen[8]=142612826
chrLen[9]=120312298
chrLen[10]=131624737
chrLen[11]=131130853
chrLen[12]=130303534
chrLen[13]=95559980
chrLen[14]=88290585
chrLen[15]=81341915
chrLen[16]=78884754
chrLen[17]=77800220
chrLen[18]=74656155
chrLen[19]=55785651
chrLen[20]=59505254
chrLen[21]=34171998
chrLen[22]=34893953
chrLen[23]=151058754
chrLen[24]=57741652
count = 1
#loop on n random IS
print "chr   #", '\t', 'Start', '\t\t', 'End', '\t\trnd #'
for i in range(namespace.n):
    #select a chromosome
    chrom = random.randint(1,24) #the name "chrom" was chosen because "chr" is a built-in function

    #select a site on that chromosome
    start = random.randint(1,chrLen[chrom])

    #select a random region
    end = start + random.randint(0,namespace.r) + namespace.d
    if len(str(chrom))==1:
        print "chr  ",chrom, "\t", start, "\t", end, "\trnd", count
    else:
        print "chr ", chrom, "\t", start, "\t", end, "\trnd", count
    count +=1

Overwriting getRegion.py


The %%writefile magic function saves the contents of the cell to the file named on its first line, here getRegion.py. Now we can run this script in the terminal.

In [ ]:
$ python getRegion.py -n 5 -r 0 -d 1
chr   #         Start           End             rnd #
chr  14         13705298        13705299        rnd 1
chr   7         16256478        16256479        rnd 2
chr  16         72191862        72191863        rnd 3
chr  23         47153563        47153564        rnd 4
chr  17         58457521        58457522        rnd 5