**Optimizing Source Code**
===

Many code snippets and examples do not make full use of the computational power available in a programming language.

Successful optimization requires profiling and careful analysis of the source code. The language version also matters, because function implementations may differ, e.g. between Python 2.7 and Python 3.0.




Sources: 

Hans Petter Langtangen - "Python Scripting for Computational Science" (https://www.amazon.de/Python-Scripting-Computational-Science-Engineering/dp/3540739157)

http://www.scipy-lectures.org/advanced/optimizing/index.html
    
https://wiki.python.org/moin/PythonSpeed/PerformanceTips


**Overview of the notebook:**
- Loops
- Module prefixes
- NumPy functions with scalars
- Resizing arrays
- if-else vs. try-except


In [1]:
import numpy as np

In [2]:
a = np.arange(1000)
%timeit a**2

The slowest run took 5502.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.27 µs per loop


**Doing things in parallel**

https://sebastianraschka.com/Articles/2014_multiprocessing.html

https://medium.com/@urban_institute/using-multiprocessing-to-make-python-code-faster-23ea5ef996ba

https://nnc3.com/mags/LJ_1994-2014/LJ/217/11238.html

In [3]:
# Create one process per CPU using the multiprocessing module

import multiprocessing
from multiprocessing import Process


def f(proc_id):
    # Dummy worker function taking a single parameter
    return


if __name__ == '__main__':

    # Get the number of CPUs (named n_cpus so it does not shadow NumPy's np)
    n_cpus = multiprocessing.cpu_count()
    print('You have {0:1d} CPUs'.format(n_cpus))

    # Create and start the processes
    p_list = []
    for i in range(1, n_cpus + 1):
        p = Process(target=f, name='Process' + str(i), args=(i,))
        p_list.append(p)
        print('Process:: ', p.name)
        p.start()
        print('Was assigned PID:: ', p.pid)

    # Wait for all the processes to finish
    for p in p_list:
        p.join()

You have 4 CPUs
Process::  Process1
Was assigned PID::  2395
Process::  Process2
Was assigned PID::  2396
Process::  Process3
Was assigned PID::  2397
Process::  Process4
Was assigned PID::  2398


In [6]:

#futures_thread_pool_map.py

from concurrent import futures
import threading
import time


def task(n):
    print('{}: sleeping {}'.format(
        threading.current_thread().name,
        n)
    )
    time.sleep(n / 10)
    print('{}: done with {}'.format(
        threading.current_thread().name,
        n)
    )
    return n / 10


ex = futures.ThreadPoolExecutor(max_workers=3)
print('main: starting')
results = ex.map(task, range(5, 0, -1))
print('main: unprocessed results {}'.format(results))
print('main: waiting for real results')
real_results = list(results)
print('main: results: {}'.format(real_results))



<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_0: sleeping 5main: starting
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_1: sleeping 4<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_2: sleeping 3


main: unprocessed results <generator object Executor.map.<locals>.result_iterator at 0x10cf42ba0>
main: waiting for real results
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_2: done with 3
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_2: sleeping 2
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_1: done with 4
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_1: sleeping 1
<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_0: done with 5<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_2: done with 2

<concurrent.futures.thread.ThreadPoolExecutor object at 0x10cfea390>_1: done with 1
main: results: [0.5, 

1) Avoiding Loops
===

When a vectorized NumPy expression can do the same job, use it!

Do not loop over lists or arrays when there are methods available that give the same result.
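
As a minimal sketch of this advice, the following compares an explicit Python loop with the equivalent vectorized NumPy expression. The function names `squares_loop` and `squares_vectorized` are hypothetical and chosen only for this illustration; actual timings depend on the machine and the NumPy version.

```python
import timeit

import numpy as np


def squares_loop(values):
    # Explicit Python loop: one interpreter-level iteration per element
    result = np.empty(len(values), dtype=values.dtype)
    for i in range(len(values)):
        result[i] = values[i] ** 2
    return result


def squares_vectorized(values):
    # Vectorized expression: the loop runs inside NumPy's compiled code
    return values ** 2


a = np.arange(1000)

# Both variants give the same result
same = np.array_equal(squares_loop(a), squares_vectorized(a))
print('same result:', same)

# Compare the run times (numbers vary from machine to machine)
t_loop = timeit.timeit(lambda: squares_loop(a), number=200)
t_vec = timeit.timeit(lambda: squares_vectorized(a), number=200)
print('loop:       {:.6f} s'.format(t_loop))
print('vectorized: {:.6f} s'.format(t_vec))
```

The vectorized version is usually faster by a large factor, because the per-element work no longer passes through the Python interpreter.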




2) Avoid module prefixes in loops
===

import mod <br/>
func = mod.func<br/>

for x in hugelist:<br/>
    func(x)
    
    
   
**will run faster than the following, where `mod.func` is looked up on the module in every iteration:**


import mod<br/>
for x in hugelist:<br/>
    mod.func(x)    

In [22]:
import math
def fu(d):
    for x in range(1,d):
        math.sin(x)

In [28]:
%timeit fu(10000000)

1 loop, best of 3: 1.47 s per loop


In [20]:
import math
sin=math.sin

def fu2(d):
    for x in range(1,d):
        sin(x)

In [29]:
%timeit fu2(10000000)

1 loop, best of 3: 1.21 s per loop
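
The same measurement can be reproduced outside of IPython with the standard `timeit` module. The sketch below also adds a third variant, `fu3` (a hypothetical name, not part of the original notebook), which binds `math.sin` to a local variable inside the function. Local name lookups are typically cheaper than global ones in CPython, but the size of the gain varies between Python versions.

```python
import math
import timeit

sin = math.sin  # module-level alias, as in fu2 above


def fu(d):
    # Attribute lookup math.sin on every iteration
    for x in range(1, d):
        math.sin(x)


def fu2(d):
    # Global name lookup on every iteration
    for x in range(1, d):
        sin(x)


def fu3(d):
    # Hypothetical extra variant: bind the function to a local name once
    local_sin = math.sin
    for x in range(1, d):
        local_sin(x)


# Time each variant with the same workload (results vary by machine)
timings = {}
for func in (fu, fu2, fu3):
    timings[func.__name__] = timeit.timeit(lambda: func(100000), number=10)
    print('{:4s}: {:.4f} s'.format(func.__name__, timings[func.__name__]))
```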


3) Avoid using NumPy functions with scalar arguments
===


In [7]:
import numpy as np
import math

print('np.sin(2) : ',np.sin(2))
print(20*'-')
%timeit np.sin(2)

print(20*'=')

print('math.sin(2) : ',math.sin(2))
print(20*'-')
%timeit math.sin(2)

np.sin(2) :  0.909297426826
--------------------
The slowest run took 18.98 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 966 ns per loop
math.sin(2) :  0.9092974268256817
--------------------
The slowest run took 14.21 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 119 ns per loop
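
One way to act on this, sketched under the assumption that a function has to accept both scalars and arrays, is to dispatch on the argument type. The helper name `fast_sin` is hypothetical and not part of NumPy or the `math` module.

```python
import math

import numpy as np


def fast_sin(x):
    # Hypothetical helper: use math.sin for Python scalars
    # and np.sin for everything else (e.g. arrays and lists)
    if isinstance(x, (int, float)):
        return math.sin(x)
    return np.sin(x)


print(fast_sin(2))
print(fast_sin(np.array([0.0, 2.0])))
```

The `isinstance` check has a cost of its own, so whether this pays off in a given program is something to measure rather than assume.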


4) Avoid resizing NumPy arrays
===
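
A minimal sketch of why resizing is expensive: NumPy arrays have a fixed size, and `np.append` returns a new array into which all existing elements are copied. Growing an array element by element inside a loop therefore copies the data again on every iteration, whereas allocating the full array once and filling it in place does not. The function names `grow_by_append` and `fill_preallocated` are hypothetical.

```python
import timeit

import numpy as np


def grow_by_append(n):
    # np.append allocates a new array and copies the old data on every call
    result = np.array([], dtype=float)
    for i in range(n):
        result = np.append(result, i * 0.5)
    return result


def fill_preallocated(n):
    # Allocate the final size once, then fill the array in place
    result = np.empty(n, dtype=float)
    for i in range(n):
        result[i] = i * 0.5
    return result


n = 5000
same = np.array_equal(grow_by_append(n), fill_preallocated(n))
print('same result:', same)

# Compare the run times (numbers vary from machine to machine)
t_append = timeit.timeit(lambda: grow_by_append(n), number=5)
t_prealloc = timeit.timeit(lambda: fill_preallocated(n), number=5)
print('np.append:    {:.4f} s'.format(t_append))
print('preallocated: {:.4f} s'.format(t_prealloc))
```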

5) if-else statements are faster than try-except
===

In [10]:
from math import sqrt
def f1(x):
    if x > 0:
        return sqrt(x)
    else:
        return 0.0
    
    
def f2(x):
    try:
        return sqrt(x)
    except ValueError:  # math.sqrt raises ValueError for negative x
        return 0.0

In [11]:
%timeit f1(2)

The slowest run took 14.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 181 ns per loop


In [12]:
%timeit f2(2)

The slowest run took 19.08 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 157 ns per loop


In [13]:
%timeit f1(-1)

The slowest run took 17.02 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 109 ns per loop


In [14]:
%timeit f2(-1)

The slowest run took 10.45 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 396 ns per loop


For a valid argument, such as **√2** above, if-else and try-except are about equally fast.

But when the argument is invalid, such as **√-1** above, the **if-else construction is roughly 4 times faster** than the try-except block.

Raising and handling an exception is expensive. So when such constructs are applied to large datasets in which the failing case occurs often, if-else tests are preferable and can improve the performance of the code.
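
To illustrate the point about large datasets, the sketch below applies both functions to a list in which half of the values are negative. It uses the standard `timeit` module instead of `%timeit`, and `except ValueError` instead of a bare `except`, since `math.sqrt` raises `ValueError` for negative input. The data set is made up for this example.

```python
import timeit
from math import sqrt


def f1(x):
    # Test the condition before calling sqrt
    if x > 0:
        return sqrt(x)
    else:
        return 0.0


def f2(x):
    # Call sqrt and handle the failure afterwards
    try:
        return sqrt(x)
    except ValueError:
        return 0.0


# Made-up data: alternating negative and positive values
data = [(-1) ** i * float(i) for i in range(1, 100001)]

res_if = [f1(x) for x in data]
res_try = [f2(x) for x in data]
print('same result:', res_if == res_try)

# Compare the run times (numbers vary from machine to machine)
t_if = timeit.timeit(lambda: [f1(x) for x in data], number=5)
t_try = timeit.timeit(lambda: [f2(x) for x in data], number=5)
print('if-else:    {:.4f} s'.format(t_if))
print('try-except: {:.4f} s'.format(t_try))
```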

In [31]:
import numpy as np
a = np.random.rand(20, 2**18)
b = np.random.rand(20, 2**18)
c = np.ascontiguousarray(a.T)

In [34]:
%timeit c = np.ascontiguousarray(a.T)

100 loops, best of 3: 13.7 ms per loop


In [32]:
%timeit np.dot(b, a.T)

The slowest run took 7.62 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 8.21 ms per loop


In [33]:
%timeit np.dot(b, c)

100 loops, best of 3: 9.1 ms per loop


In [36]:
print(1e7)

10000000.0


In [38]:
a = np.zeros(10000000)

In [39]:
%timeit a.copy()

The slowest run took 5.36 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 24.7 ms per loop


In [43]:
%timeit a+1

10 loops, best of 3: 25.1 ms per loop


In [44]:
print(a)

[ 0.  0.  0. ...,  0.  0.  0.]


In [45]:
%timeit global a ; a = 0*a

10 loops, best of 3: 26.9 ms per loop


In [46]:
%timeit global a ; a *= 0

100 loops, best of 3: 7.06 ms per loop
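
The last two timings differ because `0*a` allocates a new array for the result, whereas `a *= 0` writes into the existing buffer. A small sketch that makes this visible; `np.shares_memory` is used here only to check whether a result reuses the original memory, and the array size is chosen arbitrarily.

```python
import numpy as np

a = np.ones(1000)
original = a  # second reference to the same buffer

b = 0 * a     # out-of-place: allocates a new array for the result
new_buffer = not np.shares_memory(b, original)

a *= 0        # in-place: overwrites the existing buffer
same_buffer = a is original

# The same in-place update, written with the ufunc's out argument
np.multiply(a, 0, out=a)

print('0*a allocates a new array:', new_buffer)
print('a *= 0 reuses the buffer: ', same_buffer)
```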


In [19]:
import numpy as np
A=np.array([[2,4,3],[1,2,3],[-2,4,5]])
#A=np.ones((3,3))
B=np.ones((3,3))



print(A)
print(20*'-')
print(B)
print(20*'-')
print(A*B)
print(20*'-')
print(np.dot(A,B))

[[ 2  4  3]
 [ 1  2  3]
 [-2  4  5]]
--------------------
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
--------------------
[[ 2.  4.  3.]
 [ 1.  2.  3.]
 [-2.  4.  5.]]
--------------------
[[ 9.  9.  9.]
 [ 6.  6.  6.]
 [ 7.  7.  7.]]
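
The output above contrasts `A*B`, which multiplies element by element, with `np.dot(A,B)`, which computes the matrix product. A short sketch of the same distinction, using the `@` operator (available since Python 3.5) as an equivalent spelling of the matrix product for 2-D arrays:

```python
import numpy as np

A = np.array([[2, 4, 3], [1, 2, 3], [-2, 4, 5]])
B = np.ones((3, 3))

elementwise = A * B        # element-by-element product
matrix_product = A @ B     # matrix product, same as np.dot(A, B) for 2-D arrays

print(elementwise)
print(matrix_product)
print(np.array_equal(matrix_product, np.dot(A, B)))
```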
