**Author**: J W Debelius<br/>
**Date**: 6 March 2015<br/>
**Scikit-bio version**: 0.2.2<br/>
**virtualenv**: Playground

In [2]:
%%javascript
IPython.load_extensions('calico-spell-check', 'calico-document-tools')

<IPython.core.display.Javascript object>

I'm going to play with multiprocessing today. I got multiprocessing to work in an IPython notebook last week, but I couldn't pass the tests in the browser. So, I'm going to try an alternative. I'd like to work through the multiprocessing examples. (BTW, it's unlikely this notebook will serve as anything for anyone else.)

Also, I wanted to test the spell-check functionality. Because I clearly need it. It seems to work okay. The little red lines are sort of obnoxious while I'm typing, but it's definitely awesome, and you can still look up the correct spellings on Google.

So, I think for my code, I need to use the `Pool` class from the **`multiprocessing`** module.
This example comes from the [Python documentation](https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers).

In [4]:
from multiprocessing import Pool

def f(x):
    return x*x

pool = Pool(processes=1)
result = pool.apply_async(f, [10])
print result.get(timeout=1)
print pool.map(f, range(10))

100
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


In [5]:
p = Pool(5)

p.map(f, [1, 2, 3])

[1, 4, 9]

Before, on [27 Feb 2015](../2015_2/20150227_power_update.ipynb), I could run the code in an IPython notebook where both the pool function and the code calling it were defined in the notebook. When I try to run my test code, I get the following error message:

```
Macintosh:stats$ python tests/test_power.py
......Exception in thread Thread-2:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
```

This suggests there's an issue with importing and pickling the function. I'm going to try the same thing here and see if I can define a function inline and then import the skbio functions.
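To make the pickling issue concrete, here is a minimal sketch that is not part of the original run. `Pool` has to pickle the function it ships to the workers, and the standard `pickle` module stores functions by name, so anything without an importable name (a lambda, for instance) fails. The helper `can_pickle` and the names `square` and `square_lambda` are hypothetical and only serve the illustration.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Different Python versions raise different errors for
        # functions that cannot be looked up by name.
        return False


def square(x):
    # A named, module-level function: pickle stores a reference to its name.
    return x * x


# A lambda has no importable name, so pickle cannot look it up again.
square_lambda = lambda x: x * x

if __name__ == "__main__":
    print(can_pickle(square))
    print(can_pickle(square_lambda))
```

If the function handed to the pool behaves like `square_lambda` here, that would be consistent with the `PicklingError` in the traceback, although I have not confirmed that this is the cause in `tests/test_power.py`.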

To do that, I'm going to restart this notebook in **skbio-dev**.

---------
Okay, so now we're running skbio-dev instead of Playground. I'm going to use `subsample_power` because profiling is easier that way. I'm going to start with a simple example: samples drawn from two normal distributions.

In [1]:
import numpy as np
import scipy
from skbio.stats.power import subsample_power

In [2]:
# Draws two samples with the same standard deviation (1) and means that differ by 0.75.
samp1 = np.random.randn(50)
samp2 = np.random.randn(50) + 0.75

In [3]:
# Sets up a function to test the difference in the populations
f = lambda x: scipy.stats.ttest_ind(x[0], x[1])[1]

In [4]:
# Checks the function on the two samples
f([samp1, samp2])

5.9533417547358208e-06

I'd also like to profile the code to make sure it's functioning the way I expect. This is, once again, based on the information presented by Bryan Helmig in his blog post, "[Profiling Python Like a Boss](https://zapier.com/engineering/profiling-python-boss/)". I'm just going to use the time wrapper here.

In [8]:
import time

def timefunc(f):
    def f_timer(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        end = time.time()
        print f.__name__, 'took', end - start, 'time'
        return result
    return f_timer

In [9]:
@timefunc
def test_samps_1():
    subsample_power(f, [samp1, samp2], num_cpus=1, num_iter=10, num_runs=3)

@timefunc
def test_samps_4():
    subsample_power(f, [samp1, samp2], num_cpus=4, num_iter=10, num_runs=3)

In [None]:
test_samps_4()

I appear to be getting a silent error with timing, so I'm just going to try calling the function without the timer.

In [None]:
subsample_power(f, [samp1, samp2], num_cpus=1, num_iter=10, num_runs=3)

I just straight-up get a silent error. I have to call it in a terminal interpreter (an IPython instance) to see the error.

So, I *do* need a way to solve the pickling error. 
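One option I could try, sketched here under assumptions and not yet tested against `subsample_power`, is to hand the pool a named function defined with `def` at module level in place of a lambda. The statistic `mean_difference` below is a hypothetical stand-in that only uses the standard library; it is not the t-test from the cells above. The `if __name__ == "__main__":` guard keeps the pool from being created when the module is imported by a worker process.

```python
from multiprocessing import Pool


def mean_difference(samples):
    """Hypothetical stand-in statistic: difference between two sample means."""
    first, second = samples
    return sum(second) / float(len(second)) - sum(first) / float(len(first))


if __name__ == "__main__":
    # Each item is one (sample 1, sample 2) pair to evaluate.
    pairs = [([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]),
             ([0.0, 0.0], [4.0, 6.0])]
    pool = Pool(processes=2)
    try:
        # The named function can be pickled and sent to the workers.
        print(pool.map(mean_difference, pairs))
    finally:
        pool.close()
        pool.join()
```

The same idea would presumably apply to the p-value function: writing it as a `def` in an importable module in place of the `lambda` assigned to `f`. Whether that is enough for `subsample_power` with `num_cpus` greater than 1 is something I still need to check.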

In [None]:
test_samps_1()