## Simple Parallel Computation


In parallel computing, an [embarrassingly parallel workload](https://en.wikipedia.org/wiki/Embarrassingly_parallel) is one where little or no effort is needed to separate the problem into a number of parallel tasks. This is often the case where there is little or no dependency or need for communication between those parallel tasks. 

Data scientists often deal with this type of workflow, so they can focus on building a single-threaded computation graph. Greenflow provides a `SimpleParallelNode` to parallelize it with Dask. This notebook shows a simple example of it.

Let's start the Dask cluster:

In [1]:
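from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Start a local cluster with one Dask worker per visible GPU.
cluster = LocalCUDACluster()
# Connect a client so later computations are scheduled on this cluster.
client = Client(cluster)
client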

Client: Scheduler: tcp://127.0.0.1:46799, Dashboard: http://127.0.0.1:8787/status
Cluster: Workers: 4, Cores: 4, Memory: 270.39 GB


We will use a simple taskgraph borrowed from the `09_gquant_machine_leanring` notebook. It generates a random dataset and adds a few categorical variables to it.

In [2]:
from greenflow.dataframe_flow import TaskGraph
task_graph = TaskGraph.load_taskgraph('../taskgraphs/xgboost_example/data_generator.gq.yaml')
task_graph.draw()

GreenflowWidget(sub=HBox(), value=[OrderedDict([('id', 'data_gen'), ('type', 'ClassificationData'), ('conf', {…

The above workflow is a typical embarrassingly parallel workload: each GPU can generate its own chunk of the dataset independently.

Greenflow provides a `SimpleParallelNode` that takes a single-GPU/CPU workflow and converts its output into a Dask DataFrame. Each partition of the Dask DataFrame is computed in parallel on a different GPU or CPU.

In the `SimpleParallelNode` configuration, the user only needs to set the taskgraph, its inputs and outputs, and its context parameters. Most importantly, the node can map the iteration id (that is, the Dask DataFrame partition id) to any number-typed configuration item of the taskgraph. For example, it can be used to set the random seed for each iteration run.
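
The idea of mapping an iteration id onto a numeric configuration item can be sketched in plain Python. This is only an illustration under stated assumptions, not Greenflow's implementation: `BASE_CONF`, its key names, `conf_for_iteration`, and `generate` are all hypothetical, and the standard-library `random` module stands in for the real data generator.

```python
import copy
import random

# Hypothetical single-run configuration; the key names are illustrative only.
BASE_CONF = {"n_samples": 300, "random_state": 0}


def conf_for_iteration(base_conf, iteration_id, target_key):
    """Return a copy of base_conf with target_key set to the iteration id."""
    if not isinstance(base_conf.get(target_key), (int, float)):
        raise TypeError(f"{target_key!r} must be a number-typed item")
    conf = copy.deepcopy(base_conf)  # leave the shared base config untouched
    conf[target_key] = iteration_id
    return conf


def generate(conf):
    """Toy data generator driven entirely by its configuration."""
    rng = random.Random(conf["random_state"])
    return [rng.gauss(0.0, 1.0) for _ in range(conf["n_samples"])]


# One configuration per iteration; only the seed differs between runs.
confs = [conf_for_iteration(BASE_CONF, i, "random_state") for i in range(4)]
samples = [generate(c) for c in confs]
print([c["random_state"] for c in confs])  # [0, 1, 2, 3]
```

Because each run gets its own seed, the runs produce different data, yet any single run can be reproduced from its iteration id alone.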

In the following taskgraph, we run the above workflow 4 times, and each run generates a dataframe of 300 data points. We take the outputs from two output ports. For each port, `SimpleParallelNode` combines the cuDF dataframe results into a Dask DataFrame with 4 partitions.
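
The shape of this computation (four independent runs of 300 rows each, kept as four ordered partitions) can be sketched with the standard library alone. This is a rough analogy and not how `SimpleParallelNode` works internally: a thread pool stands in for the Dask workers, a list of lists stands in for the Dask DataFrame, and `generate_partition` and `run_all` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor

N_ITERATIONS = 4    # number of runs, one per output partition
ROWS_PER_RUN = 300  # rows generated by each run


def generate_partition(iteration_id):
    """Hypothetical stand-in for one run of the single-GPU workflow."""
    rng = random.Random(iteration_id)  # seed taken from the iteration id
    return [
        {"x0": rng.gauss(0.0, 1.0), "y": rng.randint(0, 1)}
        for _ in range(ROWS_PER_RUN)
    ]


def run_all():
    """Run every iteration independently and keep the results in order."""
    with ThreadPoolExecutor(max_workers=N_ITERATIONS) as pool:
        # map() yields results in input order, so partition i comes from run i.
        return list(pool.map(generate_partition, range(N_ITERATIONS)))


partitions = run_all()
print(len(partitions), [len(p) for p in partitions])  # 4 [300, 300, 300, 300]
```

No run reads another run's output, which is what makes the workload embarrassingly parallel and lets each partition be computed on a separate worker.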

In [3]:
task_graph = TaskGraph.load_taskgraph('../taskgraphs/xgboost_example/simple_parallel.gq.yaml')
task_graph.draw()

GreenflowWidget(sub=HBox(), value=[OrderedDict([('id', 'paralell'), ('type', 'SimpleParallelNode'), ('conf', {…

Running the taskgraph produces two Dask DataFrames, each with 4 partitions:

In [4]:
result = task_graph.run()
result['paralell.drop_x2_x3@out']


| npartitions=4 | x0 | x1 | x4 | x5 | x6 | x7 | x8 | x9 | y | x3_0 | x3_1 | x2_0 | x2_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | int64 | float64 | float64 | float64 | float64 |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |


In [5]:
result['paralell.x3_to_sign@out']

| npartitions=4 | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | y | x2_sign | x3_sign |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | float64 | int64 | int64 | int64 |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
|  | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |


We can now evaluate them. Note that `SimpleParallelNode` calls `persist` on the output Dask DataFrames, so calling `compute` does not re-run the graph.

In [6]:
result['paralell.drop_x2_x3@out'].compute()

|  | x0 | x1 | x4 | x5 | x6 | x7 | x8 | x9 | y | x3_0 | x3_1 | x2_0 | x2_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.590499 | -0.109641 | -1.201097 | 0.373905 | 0.860087 | -0.008752 | 0.415462 | 0.425636 | 1 | 1.0 | 0.0 | 1.0 | 0.0 |
| 1 | 0.063491 | 1.515215 | -0.461497 | -0.576744 | 1.325481 | 1.286008 | 0.351541 | 0.607517 | 0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 2 | -0.826528 | 1.100968 | 0.215652 | 0.378117 | -0.325190 | 0.210768 | 1.526683 | -1.798531 | 0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 3 | 4.000650 | 2.296431 | 0.399628 | -1.098406 | -1.673600 | 1.775856 | -2.146905 | -0.302625 | 0 | 1.0 | 0.0 | 1.0 | 0.0 |
| 4 | -1.642885 | -0.006057 | 0.223185 | 1.228473 | -0.877286 | 0.455420 | 0.364706 | 0.096409 | 1 | 1.0 | 0.0 | 1.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | -0.867480 | -3.107077 | -0.404139 | 0.802885 | -1.900449 | -2.221317 | -0.680082 | -0.136953 | 1 | 0.0 | 1.0 | 1.0 | 0.0 |
| 296 | -0.628308 | -1.522425 | -2.406415 | 1.651044 | 1.197216 | -1.236540 | -0.764979 | 0.514345 | 0 | 0.0 | 1.0 | 0.0 | 1.0 |
| 297 | 2.058393 | -0.069698 | 0.297251 | -0.278816 | 0.974293 | 0.357927 | -1.581650 | 1.008222 | 1 | 0.0 | 1.0 | 1.0 | 0.0 |
| 298 | 0.278533 | -1.640604 | 1.422526 | 1.279790 | 1.899960 | -0.236495 | -1.531827 | -0.210135 | 1 | 0.0 | 1.0 | 0.0 | 1.0 |


In [7]:
result['paralell.x3_to_sign@out'].compute()

|  | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | y | x2_sign | x3_sign |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.590499 | -0.109641 | -1.564527 | -1.738593 | -1.201097 | 0.373905 | 0.860087 | -0.008752 | 0.415462 | 0.425636 | 1 | 0 | 0 |
| 1 | 0.063491 | 1.515215 | -1.249759 | -0.217541 | -0.461497 | -0.576744 | 1.325481 | 1.286008 | 0.351541 | 0.607517 | 0 | 0 | 0 |
| 2 | -0.826528 | 1.100968 | -1.211564 | -0.307162 | 0.215652 | 0.378117 | -0.325190 | 0.210768 | 1.526683 | -1.798531 | 0 | 0 | 0 |
| 3 | 4.000650 | 2.296431 | -0.584589 | -2.410844 | 0.399628 | -1.098406 | -1.673600 | 1.775856 | -2.146905 | -0.302625 | 0 | 0 | 0 |
| 4 | -1.642885 | -0.006057 | -1.400736 | -0.326703 | 0.223185 | 1.228473 | -0.877286 | 0.455420 | 0.364706 | 0.096409 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 295 | -0.867480 | -3.107077 | -0.652265 | 0.493787 | -0.404139 | 0.802885 | -1.900449 | -2.221317 | -0.680082 | -0.136953 | 1 | 0 | 1 |
| 296 | -0.628308 | -1.522425 | 0.779262 | 1.245632 | -2.406415 | 1.651044 | 1.197216 | -1.236540 | -0.764979 | 0.514345 | 0 | 1 | 1 |
| 297 | 2.058393 | -0.069698 | -1.882360 | 0.442483 | 0.297251 | -0.278816 | 0.974293 | 0.357927 | -1.581650 | 1.008222 | 1 | 0 | 1 |
| 298 | 0.278533 | -1.640604 | 1.021632 | 1.354946 | 1.422526 | 1.279790 | 1.899960 | -0.236495 | -1.531827 | -0.210135 | 1 | 1 | 1 |


## Clean Up

In [8]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}