# Faster Data Types

Python is simple because of its flexibility: it tries to find what will work best.

## Specifying data types

Python runs a series of checks to work out how to handle a value, even when you tell it which data type you want. The more specific you can be about your data, the fewer checks are done.

In [101]:
import timeit as ti
iterMax = 100000

def makeFloat( ):
    valB = float( '5' )
    
def makeInt( ):
    # No base given (defaults to base 10) - measured slowest here
    valB = int( '5' )
        
def makeIntMore( ):
    # Provide the base - fewer checks, so should be faster
    valB = int( '5', 10 )

print 'Float:', ti.timeit( makeFloat,     number = iterMax )
print 'Int:  ', ti.timeit( makeInt,       number = iterMax )
print 'Int+: ', ti.timeit( makeIntMore,   number = iterMax )

Float: 0.0639600753784
Int:   0.115952014923
Int+:  0.0929849147797
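
To make the difference between the two `int` calls concrete, here is a small sketch of how the optional base argument behaves. It is written for Python 3 (the cells above use Python 2 `print` statements), and the variable names are illustrative only. When no base is given, `int` parses the string as base 10; passing a base of 0 asks Python to infer the base from a prefix such as `0x`.

```python
# Default: the string is parsed as a base-10 integer.
default_base = int('5')

# An explicit base of 10 gives the same value.
explicit_base = int('5', 10)

# Other bases must be requested explicitly.
from_hex = int('ff', 16)      # 255
from_binary = int('101', 2)   # 5

# Base 0 infers the base from the literal's prefix.
inferred = int('0x1f', 0)     # 31

print(default_base, explicit_base, from_hex, from_binary, inferred)
```

So the timing gap measured above is between two ways of asking for the same base-10 conversion, not between different results.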


## Avoiding Loops

Loops tend to be slow in Python. (Loops carry overhead in any language, but compilers for compiled languages can often unroll or otherwise optimize them.) Let's look at some better ways of doing things.

In [135]:
import numpy as np
someList = range(50)

In [136]:
%%timeit
out=[]
for i in range( len( someList ) ):
    out.append( someList[i] * 2 )

10000 loops, best of 3: 20.8 µs per loop


In [137]:
%%timeit
twoXSomeList = ( someList[i] * 2 for i in range( len( someList ) ) )

1000000 loops, best of 3: 1.62 µs per loop
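
A caveat when reading this timing: written in parentheses, the expression is a generator expression, which is lazy. Building it does not multiply anything until the values are consumed, so the cell above mostly measures the cost of creating the generator object. The following Python 3 sketch (names are illustrative, not from the original notebook) shows the difference from a list comprehension, which does the work immediately.

```python
some_list = list(range(50))

# Generator expression: nothing is computed yet.
lazy_doubled = (some_list[i] * 2 for i in range(len(some_list)))

# List comprehension: all 50 products are computed right away.
eager_doubled = [value * 2 for value in some_list]

# Consuming the generator is what performs the multiplications.
from_generator = list(lazy_doubled)

# A generator can only be consumed once; a second pass yields nothing.
second_pass = list(lazy_doubled)

print(from_generator == eager_doubled)
print(len(second_pass))
```

For a like-for-like comparison with the `for` loop, it is probably fairer to time the list comprehension, or to wrap the generator in `list(...)` inside the timed cell.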


`enumerate` and `zip` can also make loops more efficient when you need to iterate over multiple variables at once.

In [165]:
import timeit as ti
iterMax = 10000
incVal = 1
aTot = range( 0, 500, incVal )
bTot = range( 1000, 1500, incVal )

def bothInd():
    for indVal in range( len( aTot ) ):
        out = aTot[indVal] * bTot[indVal]

def indVal():
    for indVal, aValue in enumerate( aTot ):
        out = aValue * bTot[indVal]
        
def zipped():
    for aValue, bValue in zip(aTot, bTot):
        out = aValue * bValue

print 'bothInd:', ti.timeit( bothInd, number = iterMax )
print 'indVal: ', ti.timeit( indVal,  number = iterMax )
print 'zipped: ', ti.timeit( zipped,  number = iterMax )

bothInd: 1.94408679008
indVal:  1.61185789108
zipped:  1.42466115952
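
As a quick check that the three loop styles are interchangeable, this Python 3 sketch collects their results and compares them. The shortened ranges and the `*_products` names are assumptions made for illustration.

```python
a_tot = list(range(0, 5))
b_tot = list(range(1000, 1005))

# Index into both lists.
index_products = [a_tot[i] * b_tot[i] for i in range(len(a_tot))]

# enumerate yields (index, value) pairs for the first list.
enumerate_products = [a_value * b_tot[i] for i, a_value in enumerate(a_tot)]

# zip yields one value from each list per step, with no indexing.
zip_products = [a_value * b_value for a_value, b_value in zip(a_tot, b_tot)]

print(zip_products)
```

Note that `zip` stops at the shorter input, so it silently drops trailing items when the lists differ in length.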


## Numpy structures

NumPy has built-in structures for creating efficient arrays. There is some overhead associated with creating the arrays, so the savings grow with array size.

In [160]:
import numpy as np
import timeit as ti
iterMax = 10000

def numpyAdd(n):
    a = np.arange(n) ** 2
    b = np.arange(n) ** 3
    return a + b

def listAdd(n):
    a = [i ** 2 for i in range(n)]
    b = [i ** 3 for i in range(n)]
    return [a[i] + b[i] for i in range(n)]

print '#-# 10 '
print 'Numpy:', ti.timeit( 'numpyAdd(10)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(10)',  'from __main__ import listAdd',  number = iterMax )
print '#-# 100 '
print 'Numpy:', ti.timeit( 'numpyAdd(100)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(100)',  'from __main__ import listAdd',  number = iterMax )
print '#-# 1000 '
print 'Numpy:', ti.timeit( 'numpyAdd(1000)', 'from __main__ import numpyAdd', number = iterMax )
print 'List: ', ti.timeit( 'listAdd(1000)',  'from __main__ import listAdd',  number = iterMax )


#-# 10 
Numpy: 0.0701229572296
List:  0.151982069016
#-# 100 
Numpy: 0.0660808086395
List:  1.06911015511
#-# 1000 
Numpy: 0.143600940704
List:  9.38206911087
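
To see where the creation overhead mentioned above comes in, one option is to time array creation and array arithmetic separately. This is a Python 3 sketch, assuming NumPy is installed; the function names and the size `n = 1000` are illustrative, and the timings will vary by machine.

```python
import timeit

import numpy as np

n = 1000
iter_max = 1000

def create_only():
    # Only builds the array.
    return np.arange(n)

# Array built once, outside the timed functions.
base = np.arange(n)

def arithmetic_only():
    # Reuses the existing array; only the vectorized math is timed.
    return base ** 2 + base ** 3

def list_version():
    # Pure-Python equivalent for comparison.
    return [i ** 2 + i ** 3 for i in range(n)]

# The two approaches should agree on the values.
same_values = np.array_equal(arithmetic_only(), np.array(list_version()))

print('create:    ', timeit.timeit(create_only, number=iter_max))
print('arithmetic:', timeit.timeit(arithmetic_only, number=iter_max))
print('list:      ', timeit.timeit(list_version, number=iter_max))
print('same values:', same_values)
```

Passing the functions to `timeit.timeit` directly avoids the `from __main__ import ...` setup strings used in the cell above.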


In [161]:
import numpy as np
someList = range( 50 )
someNpArray = np.array( someList )

In [162]:
%%timeit
out=[]
for i in range( len( someList ) ):
    out.append( someList[i] * 2 )

10000 loops, best of 3: 20.5 µs per loop


In [163]:
%%timeit
twoXSomeList = ( someList[i] * 2 for i in range( len( someList ) ) )

1000000 loops, best of 3: 1.67 µs per loop


In [164]:
%%timeit
twoXSomeArray = someNpArray * 2

The slowest run took 33.78 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.36 µs per loop
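
One thing to watch for when switching between the two structures: `*` means different things for lists and NumPy arrays. A short Python 3 sketch, assuming NumPy is installed (variable names are illustrative):

```python
import numpy as np

some_list = list(range(5))
some_np_array = np.array(some_list)

# On an array, * multiplies every element.
doubled_array = some_np_array * 2

# On a list, * repeats the list instead.
repeated_list = some_list * 2

# The list equivalent of the array operation is a comprehension.
doubled_list = [value * 2 for value in some_list]

print(doubled_array.tolist())
print(repeated_list)
```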


## Pandas structures