# Faster Data Types

Python is simple because of its flexibility: it tries to find what will work best.

## Specifying data types

Python runs a series of checks to choose the correct data type, even when a type is given. The more specific you can be about your data, the fewer checks are done.

In [1]:
import timeit as ti
iterMax = 100000

def makeFloat( ):
    valB = float( '5' )
    
def makeInt( ):
    # No base given (int defaults to base 10) - slowest in the timings below
    valB = int( '5' )
        
def makeIntMore( ):
    # Provide the base - fewer checks, so it should be faster
    valB = int( '5', 10 )

print 'Float:', ti.timeit( makeFloat,     number = iterMax )
print 'Int:  ', ti.timeit( makeInt,       number = iterMax )
print 'Int+: ', ti.timeit( makeIntMore,   number = iterMax )

Float: 0.0240831375122
Int:   0.0716269016266
Int+:  0.0291101932526
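
As a small sketch of what the base argument means (not part of the timed cell above): calling int with no base parses the string as base 10, an explicit base tells it how to read the digits, and base 0 asks it to infer the base from a prefix. The variable names below are illustrative only.

```python
# int() with no base argument parses the string as base 10.
default_value = int('5')
explicit_value = int('5', 10)

# An explicit base tells int() how to read the digits.
hex_value = int('ff', 16)    # hexadecimal digits
bin_value = int('101', 2)    # binary digits

# Base 0 asks int() to infer the base from a prefix such as 0x or 0b.
inferred_value = int('0x1f', 0)

print('%d %d %d %d %d' % (default_value, explicit_value,
                          hex_value, bin_value, inferred_value))
```

Both calls on the string '5' return the same integer, so the difference measured above is purely in how long the conversion takes.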


## Avoiding loops

Loops tend to be slow in Python. Loops carry overhead in any language, but compilers for compiled languages typically unroll them for efficiency. Let's look at some better ways of doing things.

In [2]:
import numpy as np
someList = range(50)

In [3]:
%%timeit
out=[]
for i in range( len( someList ) ):
    out.append( someList[i] * 2 )

100000 loops, best of 3: 6.98 µs per loop


In [4]:
%%timeit
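# Note: a generator expression is lazy, so this times building the
# generator object, not the multiplications themselves
twoXSomeList = ( someList[i] * 2 for i in range( len( someList ) ) )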

The slowest run took 4.84 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.23 µs per loop
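
One caveat when reading the two timings above: a generator expression defers the element-wise work until it is consumed, so the second cell is not doing the same amount of work as the explicit loop. The sketch below (names such as lazyDoubled are illustrative, not from the cells above) shows the generator, an eager list comprehension, and the explicit loop producing the same values once the generator is actually consumed.

```python
someList = list(range(50))

# A generator expression is lazy: this line only builds the generator
# object, and no multiplication has happened yet.
lazyDoubled = (value * 2 for value in someList)

# The multiplications run only when the generator is consumed.
doubledFromGenerator = list(lazyDoubled)

# A list comprehension does the same work eagerly, in a single expression.
doubledFromComprehension = [value * 2 for value in someList]

# Reference result built with the explicit loop.
doubledFromLoop = []
for i in range(len(someList)):
    doubledFromLoop.append(someList[i] * 2)

print(doubledFromGenerator == doubledFromComprehension == doubledFromLoop)
```

For a like-for-like timing against the loop, the list comprehension (or list() around the generator) is the fairer candidate, since it produces the full result.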


enumerate and zip can also be used for efficiency if you need to loop over multiple variables.

In [7]:
import timeit as ti
iterMax = 10000
incVal = 1
aTot = range( 0, 500, incVal )
bTot = range( 1000, 1500, incVal )

def bothInd():
    for indVal in range( len( aTot ) ):
        out = aTot[indVal] * bTot[indVal]

def indVal():
    for indVal, aValue in enumerate( aTot ):
        out = aValue * bTot[indVal]
        
def zipped():
    for aValue, bValue in zip(aTot, bTot):
        out = aValue * bValue

print 'bothInd:', ti.timeit( bothInd, number = iterMax )
print 'indVal: ', ti.timeit( indVal,  number = iterMax )
print 'zipped: ', ti.timeit( zipped,  number = iterMax )

bothInd: 0.5216588974
indVal:  0.371380090714
zipped:  0.410984039307
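
The three loop styles timed above discard their results, so here is a minimal sketch that keeps them and checks that all three compute the same products. The result names (byIndex, byEnumerate, byZip) are illustrative additions.

```python
aTot = list(range(0, 500))
bTot = list(range(1000, 1500))

# Index-based loop: two list lookups per iteration.
byIndex = []
for indVal in range(len(aTot)):
    byIndex.append(aTot[indVal] * bTot[indVal])

# enumerate: yields the index and the aTot element together,
# leaving one list lookup per iteration.
byEnumerate = []
for indVal, aValue in enumerate(aTot):
    byEnumerate.append(aValue * bTot[indVal])

# zip: yields matching elements from both lists, no indexing needed.
byZip = [aValue * bValue for aValue, bValue in zip(aTot, bTot)]

print(byIndex == byEnumerate == byZip)
```

Note that zip stops at the end of the shorter input, so it only matches the index-based loop when both lists have the same length, as they do here.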


## NumPy structures

NumPy has built-in structures for creating efficient arrays. There is some overhead associated with creating the arrays, so larger arrays see greater savings.

In [8]:
import numpy as np
import timeit as ti
iterMax = 10000

def numpyAdd(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    return a + b

def listAdd(n):
    a = [i ** 2 for i in range(n)]
    b = [i ** 3 for i in range(n)]
    return [a[i] + b[i] for i in range(n)]

print '#-# 10 '
print 'Numpy:', ti.timeit( 'numpyAdd(10)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(10)',  'from __main__ import listAdd',  number = iterMax )
print '#-# 100 '
print 'Numpy:', ti.timeit( 'numpyAdd(100)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(100)',  'from __main__ import listAdd',  number = iterMax )
print '#-# 1000 '
print 'Numpy:', ti.timeit( 'numpyAdd(1000)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(1000)',  'from __main__ import listAdd',  number = iterMax )

#-# 10 
Numpy: 0.0705468654633
List:  0.0451550483704
#-# 100 
Numpy: 0.0726268291473
List:  0.300966024399
#-# 1000 
Numpy: 0.149797201157
List:  2.74000096321
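
Before comparing speeds, it is worth confirming that the two functions return the same values. The sketch below repeats the definitions from the cell above and compares their output with np.array_equal for the three sizes that were timed.

```python
import numpy as np

def numpyAdd(n):
    # Vectorised: each operation runs over the whole array at once.
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    return a + b

def listAdd(n):
    # Pure Python: every element is handled by the interpreter.
    a = [i ** 2 for i in range(n)]
    b = [i ** 3 for i in range(n)]
    return [a[i] + b[i] for i in range(n)]

for n in (10, 100, 1000):
    same = np.array_equal(numpyAdd(n), np.array(listAdd(n)))
    print('n = %d, results match: %s' % (n, same))
```

One difference to keep in mind: NumPy integer arrays use fixed-width integers, so for much larger n the cubes can overflow, whereas Python integers in the list version grow as needed.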


In [13]:
import numpy as np
someList = range( 500 )
someNpArray = np.array( someList )

In [14]:
%%timeit
out=[]
for i in range( len( someList ) ):
    out.append( someList[i] * 2 )

The slowest run took 4.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 63.6 µs per loop


In [15]:
%%timeit
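# Note: a generator expression is lazy, so this times building the
# generator object, not the multiplications themselves
twoXSomeList = ( someList[i] * 2 for i in range( len( someList ) ) )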

The slowest run took 4.55 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.88 µs per loop


In [16]:
%%timeit
twoXSomeArray = someNpArray * 2

The slowest run took 14.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.06 µs per loop
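
The array version works because multiplying a NumPy array by a scalar is applied to every element. The same operator on a plain list means something different, which the short sketch below illustrates (using a shorter list than the cells above, for readability).

```python
import numpy as np

someList = list(range(5))
someNpArray = np.array(someList)

# For a NumPy array, * 2 multiplies every element.
doubledArray = someNpArray * 2

# For a plain list, * 2 repeats the list instead of multiplying elements.
repeatedList = someList * 2

print(doubledArray.tolist())
print(repeatedList)
```

So someList * 2 is not a drop-in replacement for the loop; the list has to be converted to an array first, as was done with someNpArray.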


## Pandas structures