Testing ipyrad functions for speed improvements with numba

In [16]:
import numpy as np
from numba import jit

In [2]:
def normtest(x):
    k = 0
    for i in xrange(x):
        k = np.sqrt(2+x)
    return k

In [3]:
%%timeit 
normtest(int(1e6))

1 loops, best of 3: 1.18 s per loop


In [4]:
@jit
def jnormtest(x):
    k = 0
    for i in xrange(x):
        k = np.sqrt(2+x)
    return k

In [5]:
%%time
jnormtest(int(1e6))

CPU times: user 54.3 ms, sys: 12.1 ms, total: 66.4 ms
Wall time: 64.2 ms


1000.0009999995
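
The `%%time` cell above measures a single call, which for a jitted function may include compilation. As a rough sketch (not part of the original notebook), the first and second calls can be timed separately. It is written for Python 3 (`range` rather than `xrange`), uses `njit` instead of `jit`, and assumes a fallback to plain Python when numba is not installed; the variable names are illustrative.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same code uncompiled so the sketch still works.
    def njit(func):
        return func


@njit
def jnormtest(x):
    # Same loop as the notebook's jnormtest, with a float initial value.
    k = 0.0
    for i in range(x):
        k = np.sqrt(2 + x)
    return k


n = 100000

# First call: includes compilation time when numba is available.
start = time.perf_counter()
first = jnormtest(n)
first_call = time.perf_counter() - start

# Second call: reuses the already compiled function.
start = time.perf_counter()
second = jnormtest(n)
second_call = time.perf_counter() - start

print("first call:  %.6f s" % first_call)
print("second call: %.6f s" % second_call)
```

Comparing the two numbers gives a sense of how much of a one-off `%%time` measurement is compilation rather than execution.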

In [43]:
arr = np.empty([1,20],dtype="S1")

In [44]:
def fillarr(arr, let):
    for i,j in enumerate(let):
        arr[0,i] = j
    return arr

In [45]:
@jit
def jfillarr(arr, let):
    for i,j in enumerate(let):
        arr[0,i] = j
    return arr

### Holy cow
Using jit is much faster for the numeric loop above (64 ms in a single %%time call versus 1.18 s per loop). We just need to rewrite the code to fill preallocated arrays instead of appending to lists, and then we should be able to use jit for steps 1, 2, 4, 5, and 7.
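
To illustrate the rewrite described here, the following sketch contrasts appending to a list with filling a preallocated array inside a jitted loop. The function names and the toy computation (squares) are hypothetical and are not taken from ipyrad. It assumes Python 3 and falls back to plain Python if numba is unavailable.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same code uncompiled so the sketch still works.
    def njit(func):
        return func


def squares_list(n):
    # List-appending style: the result grows one element at a time.
    out = []
    for i in range(n):
        out.append(i * i)
    return out


@njit
def squares_fill(out):
    # Array-filling style: the caller preallocates, the loop only writes.
    for i in range(out.shape[0]):
        out[i] = i * i
    return out


n = 10
filled = squares_fill(np.empty(n, dtype=np.int64))
print(filled)
```

Both versions compute the same values; the second keeps the loop body to typed array writes, which is the pattern jit tends to compile well.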

In [46]:
%%timeit
fillarr(arr, list("abcdefgh"))

The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 2.99 µs per loop


In [47]:
%%timeit
jfillarr(arr, list("abcdefgh"))

The slowest run took 778.38 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 209 µs per loop
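
Here the jitted version is slower than pure Python. One likely contributor is that a Python list of strings is built and handed to the jitted function on every call, which adds conversion overhead. A possible alternative, sketched below under that assumption, is to pass the letters as a `uint8` array of byte codes. The name `fill_codes` and the `uint8` encoding are illustrative choices, not what the notebook does.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same code uncompiled so the sketch still works.
    def njit(func):
        return func


@njit
def fill_codes(arr, codes):
    # Copy byte codes into the first row of a preallocated 2D array.
    for i in range(codes.shape[0]):
        arr[0, i] = codes[i]
    return arr


# Preallocate with zeros so unused slots have a known value.
arr = np.zeros((1, 20), dtype=np.uint8)

# Encode the letters once as numeric byte codes (copy to get a writable array).
codes = np.frombuffer(b"abcdefgh", dtype=np.uint8).copy()

fill_codes(arr, codes)

# Decode the filled slots back to text for inspection.
text = arr[0, :codes.shape[0]].tobytes().decode("ascii")
print(text)
```

Whether this is actually faster than the pure Python version for such a tiny input would need to be measured with `%%timeit`.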


In [1]:
arr

NameError: name 'arr' is not defined

In [2]:
## pure python
def findbcode(cut, longbar, read1):
    search = read1[1][:longbar+len(cut)]
    countcuts = search.count(cut)
    if countcuts == 1:
        barcode = search.split(cut, 1)[0]
    elif countcuts == 2:
        ## barcode presumably contains a cut site: split at the last occurrence
        barcode = search.rsplit(cut, 1)[0]
    else:
        barcode = ""
    return barcode

In [42]:
## jit version
@jit
def jfindbarcode(cut, longbar, read1):
    search = read1[1][:longbar+len(cut)]
    countcuts = search.count(cut)
    if countcuts == 1:
        barcode = search.split(cut, 1)[0]
    elif countcuts == 2:
        ## barcode presumably contains a cut site: split at the last occurrence
        barcode = search.rsplit(cut, 1)[0]
    else:
        barcode = ""
    return barcode

In [47]:
cut = "TGCAG"
longbar = 6
read1 = ['fakeread','AAACCCTGCAGAAAAAAAAAAAAAAAAA']
nread1 = np.array(['fakeread','AAACCCTGCAGAAAAAAAAAAAAAAAAA'])


In [48]:
%%timeit 
findbcode(cut, longbar, read1)

The slowest run took 5.48 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 914 ns per loop


In [59]:
%%timeit
findbcode(cut, longbar, nread1)

The slowest run took 11.24 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.08 µs per loop


In [51]:
%%timeit 
jfindbarcode(cut, longbar, nread1)

10000 loops, best of 3: 53.5 µs per loop
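
The jitted string version is far slower than pure Python here, which suggests that Python string methods gain nothing from jit in this setup. As a hedged sketch of an array-based alternative, the search for the cut site can be written as an explicit loop over `uint8` byte codes. The helpers `encode` and `find_cut` are hypothetical, and the sketch only handles the first cut site, not the two-cut case in `findbcode`.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without numba, run the same code uncompiled so the sketch still works.
    def njit(func):
        return func


def encode(seq):
    # Hypothetical helper: ASCII string -> writable uint8 array of byte codes.
    return np.frombuffer(seq.encode("ascii"), dtype=np.uint8).copy()


@njit
def find_cut(seq, cut, window):
    # Index of the first occurrence of cut that lies fully within seq[:window], or -1.
    end = min(window, seq.shape[0]) - cut.shape[0] + 1
    for start in range(end):
        match = True
        for j in range(cut.shape[0]):
            if seq[start + j] != cut[j]:
                match = False
                break
        if match:
            return start
    return -1


cut = encode("TGCAG")
longbar = 6
seq = encode("AAACCCTGCAGAAAAAAAAAAAAAAAAA")

# Same search window as findbcode: longbar + len(cut) bases.
idx = find_cut(seq, cut, longbar + cut.shape[0])
barcode = seq[:idx].tobytes().decode("ascii") if idx >= 0 else ""
print(barcode)
```

For the example read this yields the same barcode as `findbcode`; its speed relative to the pure Python version would still need to be timed.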


In [37]:
read1[1][:longbar+6].count("TGCAG")

1