# `Distribute` Parallel Processing Basics
Due to a number of limitations involving data passed to processes using `multiprocessing.Pool()`, I've implemented a similar class called `Distribute()`. The primary difference is that `Distribute` is meant to distribute chunks of data for parallel processing, so your map function should process multiple values at a time. `Distribute` currently provides two methods:

* `.map_chunk()` applies a function to a chunk (a list of elements) and returns a list of parsed elements.
* `.map_insert()` applies a function to a single element and stores the result as a row in a doctable.

In [1]:
#from IPython import get_ipython
import sys
sys.path.append('..')
import doctable

## `.map_chunk()` Method
Allows you to write map functions that process a chunk of your data at a time. This is the lowest-level method for distributed processing.

In [3]:
# map function to multiply 1.275 by each num and return a list
def multiply_nums(nums):
    return [num*1.275 for num in nums]

# use Distribute(3) to create three separate processes
nums = list(range(1000))
with doctable.Distribute(3) as d:
    %time res = d.map_chunk(multiply_nums, nums)

# Distribute(1) won't create a new process at all, which is good for testing
with doctable.Distribute(1) as d:
    %time res = d.map_chunk(multiply_nums, nums)
res[:3]

CPU times: user 1.76 ms, sys: 24.3 ms, total: 26 ms
Wall time: 36.6 ms
CPU times: user 161 µs, sys: 0 ns, total: 161 µs
Wall time: 166 µs


[0.0, 1.275, 2.55]
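
To make the chunking idea concrete, here is a minimal sketch of how a chunk-based map could be built on top of the standard library. It is not the `Distribute` implementation: `split_chunks()` and `map_chunk_sketch()` are hypothetical names, and the choice to split the input into one contiguous chunk per worker is an assumption made for illustration.

```python
from multiprocessing import Pool


def split_chunks(elements, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal chunks."""
    size, extra = divmod(len(elements), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when workers outnumber elements
            chunks.append(elements[start:end])
        start = end
    return chunks


def map_chunk_sketch(func, elements, n_workers):
    """Apply func to each chunk and flatten the per-chunk results in order."""
    if n_workers == 1:
        # single worker: stay in the current process, which is easy to debug
        return list(func(elements))
    chunks = split_chunks(elements, n_workers)
    with Pool(n_workers) as pool:
        chunk_results = pool.map(func, chunks)  # one call to func per chunk
    return [item for chunk in chunk_results for item in chunk]


# the map function receives a whole chunk, not a single element
def multiply_nums(nums):
    return [num * 1.275 for num in nums]
```

Calling `map_chunk_sketch(multiply_nums, nums, 1)` runs entirely in the current process. With more than one worker, the map function must be defined at module level so it can be pickled, and on platforms that start processes with `spawn` the call should sit under an `if __name__ == "__main__":` guard.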

## `.map_insert()` Method
Allows you to write map functions that each store a single row in a database. Note how `multiply_and_insert()` inserts a single element into the database, and the DocTable instance is passed using the `dt_inst` keyword parameter.

In [3]:
## make a new doctable instance with two columns
#db = doctable.DocTable(schema=(
#    ('idcol', 'id'), 
#    ('float', 'num', dict(unique=True)), 
#), target='tmp_distributed_basics.db', new_db=True)
#
## function to apply to each number then store in doctable
#def multiply_and_insert(num, db):
#    db.insert({'num': num*1.275}, ifnotunique='replace')
#
## use .map_insert() while passing the DocTable instance through dt_inst
#with doctable.Distribute(2) as d:
#    %time res = d.map_insert(multiply_and_insert, nums, dt_inst=db)
#db.select_df(limit=10)

CPU times: user 9.66 ms, sys: 11.7 ms, total: 21.3 ms
Wall time: 1.79 s


|   | id | num |
|---|---|---|
| 0 | 2001 | 0.0 |
| 1 | 2002 | 1.275 |
| 2 | 2003 | 2.55 |
| 3 | 2004 | 3.825 |
| 4 | 2005 | 5.1 |
| 5 | 2006 | 6.375 |
| 6 | 2007 | 7.65 |
| 7 | 2008 | 8.925 |
| 8 | 2009 | 637.5 |
| 9 | 2010 | 638.775 |
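
The "one element in, one row stored" pattern can also be sketched without doctable, using only the standard-library `sqlite3` module. This is a single-process illustration, not the `.map_insert()` implementation: `map_insert_sketch()` and the `nums` table are hypothetical, and it assumes that `ifnotunique='replace'` behaves roughly like SQLite's `INSERT OR REPLACE`.

```python
import os
import sqlite3
import tempfile


def multiply_and_insert(num, conn):
    """Process one element and store the result as a single row."""
    # INSERT OR REPLACE overwrites an existing row with the same unique value
    conn.execute("INSERT OR REPLACE INTO nums (num) VALUES (?)", (num * 1.275,))


def map_insert_sketch(func, elements, db_path):
    """Call func(element, conn) for every element, then commit once."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS nums ("
            "id INTEGER PRIMARY KEY, num REAL UNIQUE)"
        )
        for element in elements:
            func(element, conn)
        conn.commit()
        return conn.execute("SELECT num FROM nums ORDER BY num").fetchall()
    finally:
        conn.close()


# hypothetical database file in a fresh temporary directory
db_path = os.path.join(tempfile.mkdtemp(), "sketch.db")
rows = map_insert_sketch(multiply_and_insert, range(5), db_path)
# inserting the same values again replaces rows instead of raising an error
rows_again = map_insert_sketch(multiply_and_insert, range(5), db_path)
```

Because the `num` column is unique, the second run leaves the table with the same five values rather than duplicating them.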
