Simple parallelized reduce for Python, an alternative to the serial functools.reduce.
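The core idea behind parallelizing reduce is pairwise (tree) reduction: in each round, independent pairs of elements are combined, and those pairs can run concurrently. The sketch below illustrates that general technique with a thread pool; it is an illustration of the idea only, not reduce_p's actual implementation, and it assumes the combining function is associative (max is, so results match functools.reduce).

from concurrent.futures import ThreadPoolExecutor

def tree_reduce(func, items, pool):
    # Combine elements in pairs until one value remains. Each round's
    # pairs are independent, so the pool can evaluate them concurrently.
    items = list(items)
    while len(items) > 1:
        pairs = [(items[k], items[k + 1]) for k in range(0, len(items) - 1, 2)]
        leftover = [items[-1]] if len(items) % 2 else []  # odd element carries over
        items = list(pool.map(lambda p: func(*p), pairs)) + leftover
    return items[0]

with ThreadPoolExecutor() as pool:
    print(tree_reduce(max, range(10), pool))  # prints 9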
pip install git+https://github.com/IncubatorShokuhou/python-parallel-reduce.git
or
git clone https://github.com/IncubatorShokuhou/python-parallel-reduce.git
then
python setup.py install
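After installing, a quick sanity check (just an import test, not part of the project's docs) confirms the package is available:

python -c "from reduce_p import reduce_p, handle_none"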
import time
from functools import reduce

from reduce_p import reduce_p, handle_none

@handle_none
def max_new(i, j):
    # Note: the function should be decorated with `handle_none` so that it
    # can cope with receiving None as an argument (see the API notes below).
    time.sleep(0.1)  # simulate an expensive combine step
    return max(i, j)
a = time.time()
p_result = reduce_p(max_new, range(500))
b = time.time()
p_time = b - a

a = time.time()
s_result = reduce(max_new, range(500))
b = time.time()
s_time = b - a

print("serial version takes", s_time, "seconds, result is", s_result)
print("parallelized version takes", p_time, "seconds, result is", p_result)
>>> print("serial version spends ",s_time," seconds, result is ",s_result)
serial version spends 49.99384617805481 seconds, result is 499
>>> print("parallelized version spends ",p_time," seconds, result is ",p_result)
parallelized version spends 5.494153022766113 seconds, result is 499
reduce_p(function, iterable[, n_jobs, initializer])
function, iterable, initializer: same as in functools.reduce.
n_jobs: number of threads; by default all CPUs are used.
reduce_p is recommended only when the length of iterable is much larger than the number of CPUs; otherwise it is not necessarily faster than functools.reduce.
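For example, to cap the worker count explicitly (passed positionally, which is what the signature above suggests; treat the exact calling convention as an assumption, since the project's docs do not spell it out):

p_result = reduce_p(max_new, range(500), 4)  # assumed: third positional argument is n_jobs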
function should be able to handle the case where one of its arguments is None (this is what the handle_none decorator in the example above is for), or errors may occur, especially when the length of iterable is small; the sketch below shows the intuition.
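The None requirement arises because a parallel reduction can come up one element short in a chunk or pairing round, and the library appears to fill the gap with None. Below is a minimal sketch of a None-tolerant wrapper; this is an assumption about what handle_none does, not the library's actual code:

from functools import wraps

def handle_none_sketch(func):
    # Hypothetical stand-in for reduce_p's handle_none decorator:
    # if either argument is missing, return the other unchanged,
    # so None acts as an identity element for the reduction.
    @wraps(func)
    def wrapper(i, j):
        if i is None:
            return j
        if j is None:
            return i
        return func(i, j)
    return wrapper

@handle_none_sketch
def max_pair(i, j):
    return max(i, j)

print(max_pair(None, 7))  # 7: the None side is ignored
print(max_pair(3, 7))     # 7: normal behavior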