# Table of Contents
 <p><div class="lev1"><a href="#summary"><span class="toc-item-num">1&nbsp;&nbsp;</span>summary</a></div><div class="lev2"><a href="#Direct-view"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Direct view</a></div><div class="lev2"><a href="#LoadBalancedView"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>LoadBalancedView</a></div>

# summary

In [1]:
from __future__ import print_function, division # legacy support
import pandas as pd
import numpy as np

In [2]:
%matplotlib inline

## Direct view

A direct view is a simple way to perform parallel calculations in IPython, but it has some drawbacks.

Advantages:
- Easy

Disadvantages:
- While the parallel calculation runs, the notebook kernel is busy. You can still edit the notebook, but no other cell is evaluated until the parallel calculation has finished. (For example, if you want to make a figure during the parallel calculation, the figure code only runs once the calculation is done.)
- With the direct view you have to 'scatter' your scenarios before you can start the simulations. This can be a drawback. For example, suppose you have 10 cores and 1000 scenarios. The scatter command gives 100 scenarios to every core. That seems reasonable, but if some scenarios are very fast and others very slow, some cores finish early and sit idle while the others still have a lot of work to do.
- Fewer options
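
The load-imbalance drawback can be illustrated without a cluster. The following is a minimal sketch in plain Python, not ipyparallel code: the helper names `static_makespan` and `dynamic_makespan` and the workload (100 slow scenarios followed by 900 fast ones) are hypothetical and chosen only to make the effect visible. It compares the finish time when the scenarios are divided into fixed blocks up front with the finish time when each free core takes the next waiting scenario.

```python
import heapq

def static_makespan(durations, n_engines):
    """Finish time when the tasks are split up front into contiguous blocks."""
    block = -(-len(durations) // n_engines)  # ceiling division
    return max(sum(durations[i:i + block])
               for i in range(0, len(durations), block))

def dynamic_makespan(durations, n_engines):
    """Finish time when each free engine takes the next waiting task."""
    engines = [0.0] * n_engines  # time at which each engine becomes free
    heapq.heapify(engines)
    for d in durations:
        free_at = heapq.heappop(engines)      # earliest available engine
        heapq.heappush(engines, free_at + d)  # it is busy for d seconds
    return max(engines)

# Hypothetical workload: 100 slow scenarios (10 s) followed by 900 fast ones (1 s)
durations = [10.0] * 100 + [1.0] * 900
static_time = static_makespan(durations, 10)
dynamic_time = dynamic_makespan(durations, 10)
print(static_time, dynamic_time)  # 1000.0 190.0
```

With the fixed split, the core that receives all the slow scenarios determines the total run time, while the other nine cores are idle for most of it.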


In [3]:
# Import client for parallel calculation
from ipyparallel import Client
# Make client using the cores started with --profile='nbserver'
c = Client(profile='nbserver')
# Make a DirectView of all engines, used to pass data/objects to the cores
dview = c[:]
# Print the IDs of all engines
print(c.ids)

[0, 1, 2, 3]


In [4]:
%%px --local
# Execute on all cores and also locally (for serial calculations); %%px has to be on the FIRST line of the cell
# This code only needs to be run when using self-defined classes
# Load dill, a more general pickling module
import dill
# fallback to pickle instead of cPickle, so that dill can take over
import pickle
from ipykernel import serialize
serialize.pickle = pickle

In [5]:
# Import on all cores
with c[:].sync_imports():
    import math

importing math on engine(s)


There are two options to load your functions and other variables on the different cores:
1. Use "%%px --local" to define everything in the cell both in the local environment and on the cores
2. Use "dview['data'] = data" to pass local variables to all the cores

In [6]:
%%px --local
def get_sum(pararray = None):
    return sum(pararray)

In [7]:
samples = 5000
scenarios = np.random.random([samples,3])

In [8]:
# Example for using 4 parallel cores
def mul(a, b):
    return a * b

In [9]:
res = dview.map(mul, [5, 6, 7, 8], [8, 9, 10, 11])

In [10]:
res.result()

[40, 54, 70, 88]

In [11]:
res = dview.apply(mul, 5, 6)

In [12]:
res.get()

[30, 30, 30, 30]

In a DirectView interface, we can use either blocking (synchronous) execution, in which the call returns only after all results have been computed, 
or non-blocking (asynchronous) execution, in which the call returns immediately and we receive the results as they finish.

- Blocking: get the results directly, without calling .result()
    <pre><code>
    out = dview.map_sync(get_sum, scenarios)
    </code></pre>
- Non-blocking: create an AsyncResult object, from which you obtain the results by calling .result()
    <pre><code>
    out = dview.map_async(get_sum, scenarios)
    </code></pre>
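
The difference between the two call styles can be tried out without a running cluster. The sketch below is an analogy that uses the standard library's `concurrent.futures` instead of ipyparallel, so the pool, the `rows` data and the variable names are assumptions made for illustration only. The blocking call hands back the finished results, whereas the non-blocking call hands back handles whose values are requested later with `.result()`.

```python
from concurrent.futures import ThreadPoolExecutor

def get_sum(pararray):
    return sum(pararray)

# Small hypothetical stand-in for the scenarios array
rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Blocking style: collect every result before moving on
    sync_out = list(pool.map(get_sum, rows))

    # Non-blocking style: the handles come back at once,
    # the values are requested later
    futures = [pool.submit(get_sum, row) for row in rows]
    # ... other work could be done here while the tasks run ...
    async_out = [f.result() for f in futures]

print(sync_out, async_out)  # [6, 15, 24] [6, 15, 24]
```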

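Note that "apply" runs the function only once per core, so the number of evaluations equals the number of cores.
To process all scenarios while controlling the partitioning ourselves, we "scatter" the scenarios over the different cores, evaluate each part with "%px", and "gather" the results.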

In [13]:
dview.scatter('scenarios',scenarios)

<AsyncResult: scatter>

In [14]:
%px out = [get_sum(scen) for scen in scenarios]

In [15]:
out = dview.gather('out')
out = out.get()
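
What the scatter, `%px` and gather steps above do to the data can be emulated locally with NumPy. This is only a sketch under assumptions: `n_engines`, `demo_scenarios` and the other names are hypothetical, `np.array_split` stands in for the scatter step, and ipyparallel's own partitioning may differ in detail.

```python
import numpy as np

def get_sum(pararray):
    return sum(pararray)

n_engines = 4  # assumed number of engines, matching the four IDs printed earlier
rng = np.random.default_rng(0)
demo_scenarios = rng.random((5000, 3))

# "scatter": split the rows into one contiguous part per engine
parts = np.array_split(demo_scenarios, n_engines)

# "%px": every engine evaluates only its own part
partial_out = [[get_sum(scen) for scen in part] for part in parts]

# "gather": concatenate the per-engine results in engine order
demo_out = np.concatenate(partial_out)

print([len(part) for part in parts], demo_out.shape)
```

Because the parts are concatenated in engine order, the gathered results line up with the rows of the original array, which is what the next cell relies on when it stacks them side by side.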

In [16]:
df_out = pd.DataFrame(np.hstack([scenarios, np.atleast_2d(out).T]), 
             columns=['par1','par2','par3','sum'])

In [17]:
df_out.head()

|   | par1 | par2 | par3 | sum |
|---|------|------|------|-----|
| 0 | 0.905388 | 0.433759 | 0.017719 | 1.356865 |
| 1 | 0.883372 | 0.157088 | 0.026599 | 1.06706 |
| 2 | 0.807046 | 0.814463 | 0.527293 | 2.148802 |
| 3 | 0.408789 | 0.909581 | 0.504522 | 1.822892 |
| 4 | 0.772068 | 0.009461 | 0.342701 | 1.124229 |


## LoadBalancedView

LoadBalancedView is a more advanced way of performing parallel calculations.

Advantages:
- The parallel calculations are not directly tied to your notebook, so the notebook stays as responsive as usual while they run. For very fast functions the difference is negligible (the overhead can even make it slower), but as the load increases you will see the advantage of the LoadBalancedView.
- The cores are used efficiently, because the work is assigned to them dynamically during the calculation. This avoids an unbalanced workload across the cores.
- More options
- Anything printed during a simulation is not shown, which keeps your notebook tidy

Disadvantages:
- Nothing is printed during the calculation and errors do not pop up, so if something goes wrong in a simulation you do not see an error right away. The advantage is that the next calculation is started automatically, so no time is lost. The disadvantage is that when all simulations fail, you only find out when you ask for the results.
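
The delayed error reporting can be reproduced with the standard library. The sketch below is an analogy based on `concurrent.futures`, not ipyparallel itself, and the `simulate` function and its inputs are hypothetical. The failing task raises nothing when it is submitted, and the other tasks still run; the exception only appears when the result is requested.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(x):
    # Hypothetical simulation that fails for negative input
    if x < 0:
        raise ValueError("negative input: %r" % x)
    return x ** 0.5

with ThreadPoolExecutor(max_workers=2) as pool:
    # No error is shown here, although the second task fails
    futures = [pool.submit(simulate, x) for x in [4.0, -1.0, 9.0]]

results, errors = [], []
for f in futures:
    try:
        results.append(f.result())  # the exception is raised here
    except ValueError as exc:
        errors.append(str(exc))

print(results, errors)  # [2.0, 3.0] ['negative input: -1.0']
```

Requesting the results of a small test run early is therefore a cheap way to notice a systematic failure before a long calculation has finished.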

In a LoadBalancedView interface, we can use either blocking (synchronous) execution, in which the call returns only after all results have been computed, 
or non-blocking (asynchronous) execution, in which the call returns immediately and we receive the results as they finish.

- Blocking: get the results directly, without calling .result()
    <pre><code>
    out = lview.map_sync(get_sum, scenarios)
    </code></pre>
- Non-blocking: create an AsyncResult object, from which you obtain the results by calling .result()
    <pre><code>
    out = lview.map_async(get_sum, scenarios)
    </code></pre>

In [18]:
lview = c.load_balanced_view()

The chunksize is important in this case, because each calculation takes very little time. With small chunks (the chunksize is the number of elements of scenarios that are sent to an engine at once), the overhead is high due to the communication between scheduler and engines.

In [19]:
out = lview.map_async(get_sum, scenarios, chunksize=10)
out.wait_interactive()
print('the total calculation time is: %g'%out.wall_time)

 500/500 tasks finished after    2 s
done
the total calculation time is: 2.36536


In [20]:
out = lview.map_async(get_sum, scenarios, chunksize=100)
out.wait_interactive()
print('the total calculation time is: %g'%out.wall_time)

  50/50 tasks finished after    0 s
done
the total calculation time is: 0.305344
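
The task counts in the two progress lines above are consistent with one task per chunk. As a rough check, assuming the number of tasks is the number of scenarios divided by the chunksize and rounded up (the helper `n_tasks` is hypothetical), the sketch below reproduces the counts and divides the reported wall times by them.

```python
import math

def n_tasks(n_items, chunksize):
    """Number of chunks needed to cover n_items (assumed: one task per chunk)."""
    return math.ceil(n_items / chunksize)

samples = 5000
tasks_small = n_tasks(samples, 10)   # chunksize=10
tasks_large = n_tasks(samples, 100)  # chunksize=100
print(tasks_small, tasks_large)  # 500 50

# Wall times reported in the two runs above, divided by the task counts
per_task_small = 2.36536 / tasks_small
per_task_large = 0.305344 / tasks_large
print('%.1f ms and %.1f ms per task' % (1e3 * per_task_small, 1e3 * per_task_large))
```

In these two runs the time per task is of the same order even though the larger chunks contain ten times as many scenarios, which is what one would expect when the per-task overhead dominates the actual computation.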


In [21]:
df_out = pd.DataFrame(np.hstack([scenarios, np.atleast_2d(out.result()).T]), 
             columns=['par1','par2','par3','sum'])

In [22]:
df_out.head()

|   | par1 | par2 | par3 | sum |
|---|------|------|------|-----|
| 0 | 0.905388 | 0.433759 | 0.017719 | 1.356865 |
| 1 | 0.883372 | 0.157088 | 0.026599 | 1.06706 |
| 2 | 0.807046 | 0.814463 | 0.527293 | 2.148802 |
| 3 | 0.408789 | 0.909581 | 0.504522 | 1.822892 |
| 4 | 0.772068 | 0.009461 | 0.342701 | 1.124229 |
