<img src="https://www.colorado.edu/rc/sites/default/files/page/logo.png"
     alt="Logo for Research Computing @ University of Colorado Boulder"
     width="400" />

# Scatter/Gather of data

A common task in parallel programming is distributing (scattering) a list of numbers among
the different processes, or collating (gathering) a distributed list of numbers back onto the hub process.
This example illustrates the basic mechanics of scattering and gathering.
 
<img src="scatter.png"
     alt="Concept of scattering a list to the engines"
     width="215" />
<img src="gather.png"
     alt="Concept of gathering the values of a list from the engines"
     width="200" />

In [2]:
!ipcluster start -n 4 --daemonize

In [4]:
import ipyparallel

rc = ipyparallel.Client(profile='default')
nengines = len(rc)
nengines

4

## Create list of data

In [5]:
all_proc = rc[:]        # view over all engines
all_proc.block = True   # wait for each call to finish before returning

lsize = 6 * nengines    # six values per engine
a = [i**2 for i in range(lsize)]

a

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361,
 400,
 441,
 484,
 529]

## Scatter the list

We scatter the list `a` from the hub out to all engines.
Each engine stores its portion of `a` locally in the variable `mylist`.

In [6]:
all_proc.scatter('mylist',a)

In [7]:
%%px
print(mylist)

[stdout:0] [0, 1, 4, 9, 16, 25]
[stdout:1] [36, 49, 64, 81, 100, 121]
[stdout:2] [144, 169, 196, 225, 256, 289]
[stdout:3] [324, 361, 400, 441, 484, 529]
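
The output above shows each engine receiving one contiguous block of six values. A minimal pure-Python sketch of that kind of partitioning follows. The helper name `scatter_blocks` is hypothetical and not part of ipyparallel, and the handling of lengths that do not divide evenly (earlier blocks get one extra item) is an assumption for illustration, since this example only demonstrates the evenly divisible case.

```python
def scatter_blocks(seq, nparts):
    """Split seq into nparts contiguous blocks, as evenly as possible.

    Hypothetical helper for illustration only; not an ipyparallel function.
    """
    base, extra = divmod(len(seq), nparts)
    blocks = []
    start = 0
    for p in range(nparts):
        # Assumption: the first `extra` blocks each take one additional item.
        size = base + (1 if p < extra else 0)
        blocks.append(seq[start:start + size])
        start += size
    return blocks


a = [i**2 for i in range(24)]    # same data as the list `a` above
blocks = scatter_blocks(a, 4)    # four blocks, one per engine
for engine_id, block in enumerate(blocks):
    print(engine_id, block)
```

For the 24-element list and four engines used here, each block holds six consecutive values, matching the `[stdout:N]` lines above.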


## Pull `mylist` into a list of lists

Create a variable on the controller that holds the contents of `mylist` from each engine.
`sub_lists` is a nested list: `sub_lists[i]` holds the value of `mylist` on engine `i`.

In [8]:
sub_lists = all_proc['mylist']

print('\n ',nengines," Python engines are active.\n")

print(' ')
for i in range(nengines):
    istr = '{:02d}'.format(i)  # returns a 2-digit string whose value is i
    msg = 'Engine '+istr+':   list segment = '
    print(msg, sub_lists[i])
print(' ')


  4  Python engines are active.

 
Engine 00:   list segment =  [0, 1, 4, 9, 16, 25]
Engine 01:   list segment =  [36, 49, 64, 81, 100, 121]
Engine 02:   list segment =  [144, 169, 196, 225, 256, 289]
Engine 03:   list segment =  [324, 361, 400, 441, 484, 529]
 


## Gather the `mylist` data

Gather `mylist` back to the controller and store the contents in a list named `gathered`.

In [9]:
gathered = all_proc.gather('mylist')
print('Gathered list: ', gathered[:], type(gathered))

Gathered list:  [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529] <class 'list'>


In [10]:
print(sub_lists)

[[0, 1, 4, 9, 16, 25], [36, 49, 64, 81, 100, 121], [144, 169, 196, 225, 256, 289], [324, 361, 400, 441, 484, 529]]
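
Comparing the two outputs, `gathered` is a single flat list, while `sub_lists` keeps one inner list per engine. The sketch below, which needs no running cluster, shows how the two relate: concatenating the inner lists in engine order reproduces the original data. The literal values are copied from the output above; the names `flattened` and `original` are introduced here for illustration only.

```python
from itertools import chain

# Per-engine segments, copied from the printed value of sub_lists.
sub_lists = [
    [0, 1, 4, 9, 16, 25],
    [36, 49, 64, 81, 100, 121],
    [144, 169, 196, 225, 256, 289],
    [324, 361, 400, 441, 484, 529],
]

# Concatenate the segments in engine order to get one flat list.
flattened = list(chain.from_iterable(sub_lists))

# The flat list should match the data that was scattered at the start.
original = [i**2 for i in range(24)]
print(flattened == original)
```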


In [11]:
!ipcluster stop

2019-05-20 21:39:18.609 [IPClusterStop] Stopping cluster [pid=32568] with [signal=<Signals.SIGINT: 2>]
