# Explore usage of PyBedtools for Marker Design I/O

See:

- http://bedtools.readthedocs.org/en/latest/
- http://pythonhosted.org/pybedtools/


BedTool files can also be tabix-indexed; see http://daler.github.io/pybedtools/autodocs/pybedtools.bedtool.BedTool.tabix.html#pybedtools.bedtool.BedTool.tabix

-------

### Explore usage with FASTA

This could be used to get the sequence of an amplicon:

http://pythonhosted.org/pybedtools/autodocs/pybedtools.bedtool.BedTool.seq.html#pybedtools.bedtool.BedTool.seq

Also consider other pure-Python tools, e.g. pyfaidx: https://pypi.python.org/pypi/pyfaidx

In [4]:
!pip freeze | grep pybedtools

pybedtools==0.7.6


In [5]:
from pybedtools import BedTool

In [6]:
from pybedtools import example_filename

### Example from docs

In [7]:
a = BedTool("""
chr1 1 10
chr1 50 55""", from_string=True)
fasta = example_filename('test.fa')
a = a.sequence(fi=fasta)
print(open(a.seqfn).read())

>chr1:1-10
GATGAGTCT
>chr1:50-55
CCATC



**NOTE:** VCF positions are 1-based, whereas BED and Python coordinates are 0-based (BED intervals are half-open). GFF, like VCF, is 1-based.
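A minimal sketch of that conversion for a feature given as 1-based, fully closed `start..end` (VCF `POS`, or GFF columns 4 and 5, which use the same convention). The helper names are hypothetical, not pybedtools API.

```python
def one_based_to_bed(start, end):
    """1-based, fully closed (VCF/GFF style) -> 0-based, half-open (BED/Python)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end, got %d..%d" % (start, end))
    return start - 1, end


def bed_to_one_based(start, end):
    """0-based, half-open (BED/Python) -> 1-based, fully closed (VCF/GFF)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end, got %d-%d" % (start, end))
    return start + 1, end


# A SNP at VCF POS 1147 is a single base: BED 1146-1147, length 1.
snp = one_based_to_bed(1147, 1147)
print(snp)              # (1146, 1147)
print(snp[1] - snp[0])  # 1
```

Only the start moves; the end coordinate is numerically the same in both conventions.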

### Do the same with the original test data

In [9]:
ls /Users/cfljam/Documents/pcr_marker_design/test/test-data/

384um_251453690362217.txt  targets.fasta
targets                    targets.gff


#### Need an indexed genome file

In [10]:
!samtools faidx /Users/cfljam/Documents/pcr_marker_design/test/test-data/targets.fasta

In [11]:
cat ../test/test-data/targets.fasta.fai

k69_93535	1628	11	60	61
k69_98089	749	1678	60	61
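The `.fai` columns are: sequence name, length, byte offset of the first base, bases per line, and bytes per line (bases plus newline). A minimal pure-Python sketch, with hypothetical helper names, of reading the index and doing the random-access arithmetic faidx relies on, using the two rows above as input:

```python
def parse_fai(lines):
    """Parse samtools .fai lines into {name: (length, offset, linebases, linewidth)}."""
    index = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        index[fields[0]] = tuple(int(x) for x in fields[1:5])
    return index


def base_byte_offset(index, name, pos):
    """Byte offset in the FASTA file of 0-based position `pos` on sequence `name`."""
    length, offset, linebases, linewidth = index[name]
    if not 0 <= pos < length:
        raise IndexError("%d is outside %s (length %d)" % (pos, name, length))
    # Skip complete lines (bases plus newline bytes), then the bases on the last line.
    return offset + (pos // linebases) * linewidth + pos % linebases


def chrom_sizes(index):
    """name -> length, i.e. the two columns bedtools reads from a genome file."""
    return dict((name, fields[0]) for name, fields in index.items())


fai_text = "k69_93535\t1628\t11\t60\t61\nk69_98089\t749\t1678\t60\t61\n"
index = parse_fai(fai_text.splitlines())
print(chrom_sizes(index))
print(base_byte_offset(index, "k69_98089", 0))  # 1678: first base of the second record
```

With the real file this would be `with open('../test/test-data/targets.fasta.fai') as fh: index = parse_fai(fh)`.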


In [12]:
cat ../test/test-data/targets

k69_93535:SAMTOOLS:SNP:1147
k69_93535:SAMTOOLS:SNP:1336
k69_98089:SAMTOOLS:SNP:30
k69_98089:SAMTOOLS:SNP:550
k69_98089:SAMTOOLS:SNP:625
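Each target ID ends in a 1-based SNP position, so the whole file can be turned into BED text rather than typing intervals by hand. A minimal sketch, assuming the first `:`-separated field is always the contig and the last is always the position (helper names hypothetical):

```python
def target_id_to_bed(target_id):
    """'k69_93535:SAMTOOLS:SNP:1147' -> BED line 'k69_93535<TAB>1146<TAB>1147'.

    Assumes the first ':'-field is the contig and the last is a 1-based position.
    """
    fields = target_id.strip().split(":")
    contig, pos = fields[0], int(fields[-1])
    return "%s\t%d\t%d" % (contig, pos - 1, pos)


def targets_to_bed(lines):
    """Convert each non-blank line of a targets file into a BED line."""
    return "\n".join(target_id_to_bed(line) for line in lines if line.strip())


print(targets_to_bed(["k69_93535:SAMTOOLS:SNP:1147", "k69_98089:SAMTOOLS:SNP:30"]))
```

The resulting text could then be loaded with `BedTool(bed_text, from_string=True)`, the same call used for the single target below.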


In [17]:
PRIMER_PRODUCT_SIZE_RANGE=[60,120]

In [20]:
PRIMER_PRODUCT_SIZE_RANGE[1]

120

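### Make a target BED interval

The SNP at 1-based position 1147 (`k69_93535:SAMTOOLS:SNP:1147`) becomes the 0-based, half-open interval 1146-1147: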

b=BedTool("k69_93535 1146 1147",from_string=True)

### Check the properties of this and how to access them

See https://pythonhosted.org/pybedtools/intervals.html

In [26]:
b[0]

Interval(k69_93535:1146-1147)

In [30]:
b[0].start

1146

In [32]:
b[0].length

1

### Attach a sequence to it

In [15]:
b=b.sequence(fi='/Users/cfljam/Documents/pcr_marker_design/test/test-data/targets.fasta')

In [21]:
print(b)

k69_93535	1146	1147
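
Note that `print(b)` shows only the intervals; `sequence()` writes the extracted sequence to a FASTA file whose path is stored in `b.seqfn`.
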
In [23]:
c=BedTool('/Users/cfljam/Documents/pcr_marker_design/test/test-data/targets.gff')

In [25]:
print(c.intersect(b))

k69_93535	SAMTOOLS	SNP	1147	1147	999	.	.	ID=k69_93535:SAMTOOLS:SNP:1147;Variant_seq=G;Reference_seq=C;DP=2645;VDB=0.0371;AF1=0.3527;G3=0.2771,0.7229,6.934e-153;HWE=0.0248;AC1=8;DP4=733,804,447,519;MQ=42;FQ=999;PV4=0.51,0,0.027,1



### Can use subtraction to exclude our target

Here we get the features in our design window and then remove the target to create an exclusion list.
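The intended logic (keep features that touch the design window, then drop any feature that overlaps the target) is easy to state on plain 0-based, half-open `(start, end)` tuples on one contig. A minimal sketch with hypothetical helper names; note that it removes whole overlapping features, which corresponds to `intersect -v` rather than to `subtract`, which trims only the overlapping bases.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def exclusion_regions(features, window, target):
    """Features overlapping `window`, minus any feature that overlaps `target`."""
    return [f for f in features if overlaps(f, window) and not overlaps(f, target)]


snps = [(1140, 1142), (1146, 1148)]  # the two GFF SNP records near the target
print(exclusion_regions(snps, (1026, 1267), (1146, 1147)))  # [(1140, 1142)]
```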

In [113]:
max_product_size=PRIMER_PRODUCT_SIZE_RANGE[1]
target_exclude=c.intersect(b.slop(b=max_product_size,g='../test/test-data/targets.fasta.fai')).subtract(b)
print(target_exclude)

k69_93535	SAMTOOLS	SNP	1141	1142	999	.	.	ID=k69_93535:SAMTOOLS:SNP:1141;Variant_seq=T;Reference_seq=C;DP=2644;VDB=0.0374;AF1=0.1882;AC1=5;DP4=748,786,225,294;MQ=42;FQ=999;PV4=0.037,0,0.036,0.39
k69_93535	SAMTOOLS	SNP	1147	1148	999	.	.	ID=k69_93535:SAMTOOLS:SNP:1147;Variant_seq=G;Reference_seq=C;DP=2645;VDB=0.0371;AF1=0.3527;G3=0.2771,0.7229,6.934e-153;HWE=0.0248;AC1=8;DP4=733,804,447,519;MQ=42;FQ=999;PV4=0.51,0,0.027,1
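
Note that the target SNP (`k69_93535:SAMTOOLS:SNP:1147`) is still listed, so `subtract()` has not produced a clean exclusion list here. The `c.intersect(target_window) - b` used further down drops every feature that overlaps the target (pybedtools implements `-` as `intersect(..., v=True)`), which is why only the 1141 SNP ends up in `SEQUENCE_EXCLUDED_REGION`.
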
In [48]:
print(b.slop(b=max_product_size,g='../test/test-data/targets.fasta.fai'))

k69_93535	1026	1267



### Use getfasta to extract the window of sequence

- `slop` needs a genome file (sequence sizes) to clip at contig ends; the `.fai` index works for this
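For a single interval, `slop(b=...)` amounts to the arithmetic below. A minimal sketch, assuming symmetric padding clipped at the contig ends, which is where the genome file's lengths come in (helper name hypothetical):

```python
def slop_interval(start, end, flank, contig_length):
    """Pad a 0-based half-open interval by `flank` bases on each side,
    clipped to [0, contig_length], as bedtools slop -b does with a genome file."""
    return max(0, start - flank), min(contig_length, end + flank)


# Target SNP, max product size 120, contig length 1628 from the .fai
window = slop_interval(1146, 1147, 120, 1628)
print(window)                 # (1026, 1267), matching slop's output
print(window[1] - window[0])  # 241 bp of template
```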

In [56]:
target_window=b.slop(b=max_product_size,g='../test/test-data/targets.fasta.fai')

In [57]:
print(target_window)

k69_93535	1026	1267



In [60]:
target_window=target_window.sequence(fi='../test/test-data/targets.fasta')

In [81]:
open(target_window.seqfn).read()

'>k69_93535:1026-1267\nAGATGAATCAGACTCTTCAGTTGCTTCCTGCCCTCCTACACTTAATGAAGGAAAGAAAAAAAGGACAGGGAAGCTTCATAGGCCTTTGAGTCTGAACGCATTTGACATAATTTCCTTTTCCAGAGGATTTGATCTTTCAGGTTTGTTTGAAGAAACGGGAGATGAAACAAGATTTGTGTCGGGTGAAACGATACCAAACATCATATCGAAATTGGAGGAGATTGCAAAAGTGGGTAGTTTC\n'

In [82]:
fo=open(target_window.seqfn)

In [84]:
fo.readlines()[1].strip('\n')

'AGATGAATCAGACTCTTCAGTTGCTTCCTGCCCTCCTACACTTAATGAAGGAAAGAAAAAAAGGACAGGGAAGCTTCATAGGCCTTTGAGTCTGAACGCATTTGACATAATTTCCTTTTCCAGAGGATTTGATCTTTCAGGTTTGTTTGAAGAAACGGGAGATGAAACAAGATTTGTGTCGGGTGAAACGATACCAAACATCATATCGAAATTGGAGGAGATTGCAAAAGTGGGTAGTTTC'
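`readlines()[1]` relies on getfasta writing a single record with its whole sequence on one line. A minimal sketch of a reader that also copes with wrapped lines and several records (helper name hypothetical):

```python
def read_fasta(lines):
    """Return {header: sequence} from FASTA lines, joining wrapped sequence lines."""
    records = {}
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


print(read_fasta([">chr1:1-10\n", "GATGA\n", "GTCT\n", ">chr1:50-55\n", "CCATC\n"]))
```

For the window above this would be `with open(target_window.seqfn) as fo: template = read_fasta(fo)['k69_93535:1026-1267']`.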

### HOWTO: shift all annotations relative to the reference slice

We should be able to use bedtools [shift](http://bedtools.readthedocs.org/en/latest/content/tools/shift.html?highlight=shift), but that doesn't seem to be wrapped in pybedtools (0.7.6 here).
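Without a `shift` wrapper, the shift can be done by hand. A minimal sketch, assuming 0-based, half-open `(start, end)` tuples on the same contig as the window; features straddling a window edge are clipped so every coordinate stays inside the template (helper name hypothetical). pybedtools' `BedTool.each()`, which applies a function to every interval, could carry the same logic on real `Interval` objects.

```python
def to_window_coords(intervals, window_start, window_end):
    """Shift intervals to be relative to window_start, dropping those that miss
    the window and clipping those that straddle its edges."""
    shifted = []
    for start, end in intervals:
        start, end = max(start, window_start), min(end, window_end)
        if start >= end:
            continue  # no overlap with the window
        shifted.append((start - window_start, end - window_start))
    return shifted


# The two SNP records in the window k69_93535:1026-1267
print(to_window_coords([(1140, 1142), (1146, 1148)], 1026, 1267))
# [(114, 116), (120, 122)]
```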

In [90]:
print(c.intersect(target_window))

k69_93535	SAMTOOLS	SNP	1141	1142	999	.	.	ID=k69_93535:SAMTOOLS:SNP:1141;Variant_seq=T;Reference_seq=C;DP=2644;VDB=0.0374;AF1=0.1882;AC1=5;DP4=748,786,225,294;MQ=42;FQ=999;PV4=0.037,0,0.036,0.39
k69_93535	SAMTOOLS	SNP	1147	1148	999	.	.	ID=k69_93535:SAMTOOLS:SNP:1147;Variant_seq=G;Reference_seq=C;DP=2645;VDB=0.0371;AF1=0.3527;G3=0.2771,0.7229,6.934e-153;HWE=0.0248;AC1=8;DP4=733,804,447,519;MQ=42;FQ=999;PV4=0.51,0,0.027,1



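#### Just need tuples of the form (start, length) for primer3-py

*How come intervals for SNP are 2 bp??* The GFF records themselves span two bases (`1147	1148` in the outputs above). GFF is 1-based and fully closed, so `1147..1148` becomes the 0-based, half-open interval 1146-1148, i.e. length 2. A single-base SNP record would normally have end equal to start, so this looks like an off-by-one in whatever wrote `targets.gff`.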

In [95]:
print(target_window)

k69_93535	1026	1267



In [120]:
[(X.start,X.length) for X in c.intersect(target_window)]

[(1140, 2), (1146, 2)]

In [121]:
offset=target_window[0].start
offset

1026

### Adjust annotations for design like so

In [122]:
[(X.start - offset,X.length) for X in c.intersect(target_window)]

[(114, 2), (120, 2)]

### Create a design target dict like so

In [154]:
target_dict={'SEQUENCE_ID': 'test'}

In [155]:
target_dict['SEQUENCE_TARGET']=[b[0].start - offset,b[0].length]

In [156]:
target_dict['SEQUENCE_EXCLUDED_REGION']=[(X.start - offset,X.length) for X in c.intersect(target_window) - b]

In [157]:
with open(target_window.seqfn) as fo:
    target_dict['SEQUENCE_TEMPLATE']=fo.readlines()[1].strip('\n')

In [158]:
target_dict

{'SEQUENCE_EXCLUDED_REGION': [(114, 2)],
 'SEQUENCE_ID': 'test',
 'SEQUENCE_TARGET': [120, 1],
 'SEQUENCE_TEMPLATE': 'AGATGAATCAGACTCTTCAGTTGCTTCCTGCCCTCCTACACTTAATGAAGGAAAGAAAAAAAGGACAGGGAAGCTTCATAGGCCTTTGAGTCTGAACGCATTTGACATAATTTCCTTTTCCAGAGGATTTGATCTTTCAGGTTTGTTTGAAGAAACGGGAGATGAAACAAGATTTGTGTCGGGTGAAACGATACCAAACATCATATCGAAATTGGAGGAGATTGCAAAAGTGGGTAGTTTC'}
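For bulk design the steps above could be folded into one function that takes contig coordinates and returns the primer3-py sequence arguments. A minimal sketch, assuming 0-based, half-open `(start, end)` tuples and a template that begins at `window_start` on the contig; the function name is hypothetical, and it raises rather than silently passing regions that fall outside the template.

```python
def build_seq_args(seq_id, template, window_start, target, excluded):
    """Assemble primer3-py sequence arguments from contig coordinates.

    `target` and each entry of `excluded` are 0-based, half-open (start, end)
    tuples on the contig; primer3 wants (start, length) relative to the template.
    """
    def rel(interval):
        start, end = interval
        rel_start = start - window_start
        if rel_start < 0 or end - window_start > len(template):
            raise ValueError("%r falls outside the template" % (interval,))
        return (rel_start, end - start)

    return {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        "SEQUENCE_TARGET": list(rel(target)),
        "SEQUENCE_EXCLUDED_REGION": [rel(iv) for iv in excluded],
    }
```

Called as `build_seq_args('test', template, 1026, (1146, 1147), [(1140, 1142)])` with the 241 bp window sequence, this gives the same target `[120, 1]` and excluded region `(114, 2)` as the dict above.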

In [166]:
import primer3 as P3

In [160]:
p3_test_globals={
        'PRIMER_OPT_SIZE': 20,
        'PRIMER_PICK_INTERNAL_OLIGO': 1,
        'PRIMER_INTERNAL_MAX_SELF_END': 8,
        'PRIMER_MIN_SIZE': 18,
        'PRIMER_MAX_SIZE': 25,
        'PRIMER_OPT_TM': 60.0,
        'PRIMER_MIN_TM': 57.0,
        'PRIMER_MAX_TM': 63.0,
        'PRIMER_MIN_GC': 20.0,
        'PRIMER_MAX_GC': 80.0,
        'PRIMER_MAX_POLY_X': 100,
        'PRIMER_INTERNAL_MAX_POLY_X': 100,
        'PRIMER_SALT_MONOVALENT': 50.0,
        'PRIMER_DNA_CONC': 50.0,
        'PRIMER_MAX_NS_ACCEPTED': 0,
        'PRIMER_MAX_SELF_ANY': 12,
        'PRIMER_MAX_SELF_END': 8,
        'PRIMER_PAIR_MAX_COMPL_ANY': 12,
        'PRIMER_PAIR_MAX_COMPL_END': 8,
        'PRIMER_PRODUCT_SIZE_RANGE': [[75,100]],
    }

In [167]:
P3.designPrimers(target_dict,p3_test_globals)

{'PRIMER_INTERNAL_0': (135L, 27L),
 'PRIMER_INTERNAL_0_GC_PERCENT': 40.74074074074074,
 'PRIMER_INTERNAL_0_HAIRPIN_TH': 39.4598148393772,
 'PRIMER_INTERNAL_0_PENALTY': 9.73623253472806,
 'PRIMER_INTERNAL_0_SELF_ANY_TH': 0.0,
 'PRIMER_INTERNAL_0_SELF_END_TH': 0.0,
 'PRIMER_INTERNAL_0_SEQUENCE': 'TTCAGGTTTGTTTGAAGAAACGGGAGA',
 'PRIMER_INTERNAL_0_TM': 57.26376746527194,
 'PRIMER_INTERNAL_1': (135L, 27L),
 'PRIMER_INTERNAL_1_GC_PERCENT': 40.74074074074074,
 'PRIMER_INTERNAL_1_HAIRPIN_TH': 39.4598148393772,
 'PRIMER_INTERNAL_1_PENALTY': 9.73623253472806,
 'PRIMER_INTERNAL_1_SELF_ANY_TH': 0.0,
 'PRIMER_INTERNAL_1_SELF_END_TH': 0.0,
 'PRIMER_INTERNAL_1_SEQUENCE': 'TTCAGGTTTGTTTGAAGAAACGGGAGA',
 'PRIMER_INTERNAL_1_TM': 57.26376746527194,
 'PRIMER_INTERNAL_2': (135L, 27L),
 'PRIMER_INTERNAL_2_GC_PERCENT': 40.74074074074074,
 'PRIMER_INTERNAL_2_HAIRPIN_TH': 39.4598148393772,
 'PRIMER_INTERNAL_2_PENALTY': 9.73623253472806,
 'PRIMER_INTERNAL_2_SELF_ANY_TH': 0.0,
 'PRIMER_INTERNAL_2_SELF_END_TH': 0
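The result is one flat dict keyed by primer3 output tags, so reading off pairs by hand gets tedious. A minimal sketch that groups the numbered records into one dict per pair, assuming the standard primer3 tags `PRIMER_PAIR_NUM_RETURNED`, `PRIMER_LEFT_<i>_SEQUENCE`, `PRIMER_RIGHT_<i>_SEQUENCE`, `PRIMER_INTERNAL_<i>_SEQUENCE`, `PRIMER_PAIR_<i>_PRODUCT_SIZE` and `PRIMER_PAIR_<i>_PENALTY`; the helper names are hypothetical.

```python
def _tag(result, kind, i, field):
    """Look up e.g. PRIMER_LEFT_0_SEQUENCE; None if primer3 did not return it."""
    return result.get("PRIMER_%s_%d_%s" % (kind, i, field))


def summarize_pairs(result):
    """Group the numbered primer3 output tags into one dict per returned pair."""
    pairs = []
    for i in range(result.get("PRIMER_PAIR_NUM_RETURNED", 0)):
        pairs.append({
            "left": _tag(result, "LEFT", i, "SEQUENCE"),
            "right": _tag(result, "RIGHT", i, "SEQUENCE"),
            "internal": _tag(result, "INTERNAL", i, "SEQUENCE"),
            "product_size": _tag(result, "PAIR", i, "PRODUCT_SIZE"),
            "penalty": _tag(result, "PAIR", i, "PENALTY"),
        })
    return pairs
```

Usage would be along the lines of `for pair in summarize_pairs(P3.designPrimers(target_dict, p3_test_globals)): print(pair)`.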

In [164]:
!gister -d 'How to use pybedtools to drive bulk design' 2016-01-24PyBedToolsforMarkerDesign.ipynb

https://gist.github.com/3b24004be19d1f55ad25


In [168]:
!gister -e https://gist.github.com/3b24004be19d1f55ad25 2016-01-24PyBedToolsforMarkerDesign.ipynb

https://gist.github.com/3b24004be19d1f55ad25
