## Sklearn inside a Process


Sklearn gridsearch memory

- https://stackoverflow.com/questions/24406937/scikit-learn-joblib-bug-multiprocessing-pool-self-value-out-of-range-for-i-fo/24411581#24411581


Working with numerical data in shared memory (memmaping)
- https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping

Get the number of CPU cores in Julia
- https://stackoverflow.com/questions/27931026/obtain-the-number-of-cpu-cores-in-julia

In [1]:
import sklearn
import multiprocessing as mp
import numpy as np
import pandas as pd
from sklearn import linear_model

In [2]:
MNIST_path = "/home/david/Datasets/MNIST/train_mnist.csv"

In [3]:
# The first CSV column holds the digit label; the remaining 784 are pixels.
X_tr = pd.read_csv(MNIST_path, header=None)
X_tr = X_tr.values  # as_matrix() was removed from recent pandas versions
y_tr = X_tr[:,0]
X_tr = X_tr[:,1:]

X_te = pd.read_csv("/home/david/Datasets/MNIST/test_mnist.csv", header=None)
X_te = X_te.values
y_te = X_te[:,0]
X_te = X_te[:,1:]

In [4]:
X_tr.shape, X_te.shape

((60000, 784), (10000, 784))

### Spawn a single Process

Information about the multiprocessing module
- https://docs.python.org/2/library/multiprocessing.html

In [5]:
sklearn.linear_model.Perceptron()

Perceptron(alpha=0.0001, class_weight=None, eta0=1.0, fit_intercept=True,
      n_iter=5, n_jobs=1, penalty=None, random_state=0, shuffle=True,
      verbose=0, warm_start=False)

In [6]:
m1 = linear_model.Perceptron(n_jobs=2)

The following code snippet prints `True` because the main Python process (the one executing the notebook) is not blocked by the spawned process `p`.

In [7]:
p = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p.start()
p.is_alive() 

True

The following code snippet prints `False` because `p.join()` blocks the main Python process (the one executing the notebook) until `p` ends.

In [8]:
p = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p.start()
p.join()
p.is_alive() 

False
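
One caveat is worth illustrating: `m1.fit` runs on a copy of `m1` inside the child process, so the fitted attributes do not appear on the parent's `m1`. The sketch below is not part of the original notebook. It uses a hypothetical stand-in class `TinyModel` instead of a scikit-learn estimator, and it assumes a POSIX system where the `fork` start method is available. It shows the effect and one possible way to send the fitted object back through a `multiprocessing.Queue`; the helper names `fit_and_send` and `fit_in_child` are illustrative.

```python
import multiprocessing as mp

# Assumption: POSIX system (the notebook's paths suggest Linux), so "fork" exists.
ctx = mp.get_context("fork")


class TinyModel:
    """Hypothetical stand-in for an estimator: fit() stores the mean of y."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


def fit_and_send(model, X, y, queue):
    """Fit inside the child and ship the fitted model back to the parent."""
    queue.put(model.fit(X, y))


def fit_in_child(model, X, y):
    """Return a fitted copy of `model` that was trained in a separate process."""
    queue = ctx.Queue()
    p = ctx.Process(target=fit_and_send, args=(model, X, y, queue))
    p.start()
    fitted = queue.get()  # read before join() so a large result cannot block the child
    p.join()
    return fitted


model = TinyModel()
p = ctx.Process(target=model.fit, args=([[0], [1]], [2.0, 4.0]))
p.start()
p.join()
print(model.mean_)   # None: only the child's copy was fitted

fitted = fit_in_child(model, [[0], [1]], [2.0, 4.0])
print(fitted.mean_)  # 3.0
```

The object travels through the queue by pickling, so the same pattern should carry over to estimators that can be pickled.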

### Spawn multiple processes in parallel

We can define several Python Process objects and start them in parallel. 

In [9]:
m1 = linear_model.Perceptron(n_jobs=1)
m2 = linear_model.LogisticRegression(n_jobs=1)

In [10]:
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))

In [11]:
p1.start()
p2.start()

In [12]:
p1.is_alive()

True

In [13]:
p2.is_alive()

True

### Measure the execution time of a process

In [14]:
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [15]:
%%time
p1.start()
p2.start()
p3.start()
p4.start()
p1.join()
p2.join()
p3.join()
p4.join()

CPU times: user 4 ms, sys: 64 ms, total: 68 ms
Wall time: 58.6 s


In [16]:
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=2, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [17]:
%%time
p1.start()
p2.start()
p1.join()
p2.join()
p3.start()
p4.start()
p3.join()
p4.join()

CPU times: user 0 ns, sys: 72 ms, total: 72 ms
Wall time: 1min


In [18]:
m1 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=4, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [19]:
%%time
p1.start()
p1.join()
p2.start()
p2.join()
p3.start()
p3.join()
p4.start()
p4.join()

CPU times: user 0 ns, sys: 64 ms, total: 64 ms
Wall time: 1min


In [20]:
m1 = linear_model.Perceptron(n_jobs=3, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=3, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [21]:
%%time
p1.start()
p2.start()
p1.join()
p2.join()
p3.start()
p4.start()
p3.join()
p4.join()

CPU times: user 8 ms, sys: 64 ms, total: 72 ms
Wall time: 1min 16s
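
The start/join patterns timed in cells `In [15]` to `In [21]` differ only in how many processes are started before the first `join()`. A minimal sketch of that idea as a reusable helper follows. The name `run_in_waves` is hypothetical, `time.sleep` stands in for `fit`, and the `fork` start method is assumed (POSIX only).

```python
import multiprocessing as mp
import time

ctx = mp.get_context("fork")  # assumption: POSIX system, as in the notebook


def run_in_waves(jobs, wave_size):
    """Run (target, args) pairs `wave_size` at a time; return wall-clock seconds.

    Every process in a wave is started before any of them is joined, and the
    next wave starts only after the whole previous wave has finished.
    """
    started = time.perf_counter()
    for i in range(0, len(jobs), wave_size):
        wave = [ctx.Process(target=t, args=a) for t, a in jobs[i:i + wave_size]]
        for p in wave:
            p.start()
        for p in wave:
            p.join()
    return time.perf_counter() - started


# Stand-in workload: four 0.2 s sleeps instead of four Perceptron fits.
jobs = [(time.sleep, (0.2,))] * 4
all_at_once = run_in_waves(jobs, wave_size=4)  # pattern of In [15]
in_pairs = run_in_waves(jobs, wave_size=2)     # pattern of In [17] and In [21]
one_by_one = run_in_waves(jobs, wave_size=1)   # pattern of In [19]
print(all_at_once, in_pairs, one_by_one)
```

With the notebook's objects, the job list would look like `[(m.fit, (X_tr, y_tr)) for m in (m1, m2, m3, m4)]`.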


#### Timing individually

In [22]:
%%time
m1 = linear_model.Perceptron(n_jobs=4, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

CPU times: user 8 ms, sys: 12 ms, total: 20 ms
Wall time: 15.8 s


In [23]:
%%time
m1 = linear_model.Perceptron(n_jobs=3, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

CPU times: user 8 ms, sys: 4 ms, total: 12 ms
Wall time: 16.1 s


In [24]:
%%time
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

CPU times: user 0 ns, sys: 16 ms, total: 16 ms
Wall time: 17.8 s


In [10]:
%%time
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

CPU times: user 8 ms, sys: 8 ms, total: 16 ms
Wall time: 19.4 s


#### Two at a time

In [12]:
%%time
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p1.start()
p2.start()
p2.join()
p1.join()

CPU times: user 0 ns, sys: 32 ms, total: 32 ms
Wall time: 24.5 s


In [17]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p2.join()
p1.join()
p3.start()
p4.start()
p3.join()
p4.join()

CPU times: user 8 ms, sys: 68 ms, total: 76 ms
Wall time: 53.7 s


In [25]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p2.join()
p1.join()
p3.join()
p4.join()

CPU times: user 4 ms, sys: 88 ms, total: 92 ms
Wall time: 38.8 s


#### More applications than resources

In [20]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))
m5 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p5 = mp.Process(target=m5.fit, args=(X_tr, y_tr))
m6 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p6 = mp.Process(target=m6.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p5.start()
p6.start()

p2.join()
p1.join()
p3.join()
p4.join()
p5.join()
p6.join()

CPU times: user 8 ms, sys: 144 ms, total: 152 ms
Wall time: 59 s


In [24]:
%%time
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))
m5 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p5 = mp.Process(target=m5.fit, args=(X_tr, y_tr))
m6 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p6 = mp.Process(target=m6.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p2.join()
p5.start()
p6.start()
p1.join()
p3.join()
p4.join()
p5.join()
p6.join()

CPU times: user 8 ms, sys: 112 ms, total: 120 ms
Wall time: 58.3 s
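
When there are more fits than worker slots, an alternative to hand-written `start()`/`join()` sequences is a fixed-size pool: a queued task starts as soon as a worker becomes free. The sketch below is an illustration under assumptions, not the notebook's method. It uses `multiprocessing.Pool`, the hypothetical stand-in `simulated_fit` replaces a real `fit`, and the `fork` start method is assumed.

```python
import multiprocessing as mp
import os

ctx = mp.get_context("fork")  # assumption: POSIX system


def simulated_fit(task_id):
    """Stand-in for model.fit: burn some CPU and report which worker ran it."""
    total = 0
    for i in range(200_000):
        total += i * i
    return task_id, os.getpid()


def run_with_pool(n_tasks, n_workers):
    """Run n_tasks on at most n_workers processes; return (task_id, pid) pairs."""
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(simulated_fit, range(n_tasks))


results = run_with_pool(n_tasks=6, n_workers=4)
worker_pids = {pid for _, pid in results}
print(len(results), "tasks ran on", len(worker_pids), "worker processes")
```

Six tasks never occupy more than four processes here, which mirrors the six-model experiment without oversubscribing the machine.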


#### Without intermediate `join()` calls

In [26]:
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=2, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [27]:
%%time
p1.start()
p2.start()
p3.start()
p4.start()

p1.join()
p2.join()
p3.join()
p4.join()

CPU times: user 16 ms, sys: 52 ms, total: 68 ms
Wall time: 50.6 s


#### With joblib


- https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/

In [None]:
import joblib
from joblib import Memory

In [None]:
mem = Memory(cachedir="/tmp/joblib")

In [None]:
from joblib import Parallel, delayed

In [None]:
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

models = [m1, m2, m3, m4]
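
The cells above import `Parallel` and `delayed` but stop before using them. A minimal sketch of how a list of models could be fitted with joblib follows. `MeanModel` and `fit_model` are hypothetical stand-ins introduced only for illustration, so the example runs without the MNIST data.

```python
from joblib import Parallel, delayed


class MeanModel:
    """Hypothetical stand-in for an estimator such as Perceptron."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


def fit_model(model, X, y):
    """Fit in a worker and return the fitted model to the caller."""
    return model.fit(X, y)


X_demo = [[0.0], [1.0], [2.0]]
y_demo = [1.0, 2.0, 3.0]
demo_models = [MeanModel() for _ in range(4)]

# Up to two fits run at the same time; results come back in input order.
fitted_models = Parallel(n_jobs=2)(
    delayed(fit_model)(m, X_demo, y_demo) for m in demo_models
)
print([m.mean_ for m in fitted_models])  # [2.0, 2.0, 2.0, 2.0]
```

With the notebook's objects the call would presumably read `Parallel(n_jobs=4)(delayed(fit_model)(m, X_tr, y_tr) for m in models)`. Unlike the bare `mp.Process` cells, the fitted models are returned to the notebook process.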

### Control resource usage
- https://docs.python.org/2/library/resource.html#module-resource
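
The `CPU times` reported by the `%%time` cells above are a few milliseconds even when the wall time is close to a minute, which suggests they count only the notebook process and not its children. One way to see the CPU time consumed by child processes is `resource.getrusage(resource.RUSAGE_CHILDREN)`. The sketch below assumes a POSIX system; `burn_cpu` and `child_cpu_seconds` are hypothetical helper names.

```python
import multiprocessing as mp
import resource

ctx = mp.get_context("fork")  # assumption: POSIX system; `resource` is Unix-only


def burn_cpu(n):
    """Stand-in for model.fit: a pure-Python CPU-bound loop."""
    total = 0
    for i in range(n):
        total += i * i


def child_cpu_seconds(target, args=()):
    """Run target in a child process and return its user + system CPU seconds.

    RUSAGE_CHILDREN accumulates over all children that have terminated and
    been waited for, so the difference before/after isolates this child as
    long as no other child finishes in between.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join()
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user + system


cpu_seconds = child_cpu_seconds(burn_cpu, (3_000_000,))
print("child CPU seconds:", cpu_seconds)
```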

## Scheduling processes/resources

We would like to, given a set of models to train, control what resources each model is going to use.
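
As a rough illustration of this goal, the sketch below runs a set of jobs under a fixed core budget: each job declares how many cores it needs (for a model this could be its `n_jobs`), and a job starts only when that many cores are free. Everything here is an assumption made for illustration: the name `schedule`, the job tuple layout, the polling loop, and the `fork` start method. It does not pin processes to cores; it only limits how many declared cores are in use at once.

```python
import multiprocessing as mp
import time

ctx = mp.get_context("fork")  # assumption: POSIX system


def schedule(jobs, total_cores, poll_seconds=0.01):
    """Run jobs given as (name, target, args, cores) within a core budget.

    The declared cores of running jobs never sum to more than `total_cores`.
    Returns the job names in the order in which they were started.
    """
    for name, _, _, cores in jobs:
        if cores > total_cores:
            raise ValueError("job %r needs more cores than the budget" % name)

    pending = list(jobs)
    running = []          # list of (process, cores) pairs
    free = total_cores
    start_order = []
    while pending or running:
        # Reclaim the cores of finished processes.
        still_running = []
        for proc, cores in running:
            if proc.is_alive():
                still_running.append((proc, cores))
            else:
                proc.join()
                free += cores
        running = still_running

        # First fit: start every pending job that fits into the free budget.
        for job in list(pending):
            name, target, args, cores = job
            if cores <= free:
                proc = ctx.Process(target=target, args=args)
                proc.start()
                running.append((proc, cores))
                free -= cores
                start_order.append(name)
                pending.remove(job)
        time.sleep(poll_seconds)
    return start_order


# Stand-in jobs: sleeps instead of fits, with declared cores 3, 3, 1, 1.
demo_jobs = [
    ("a", time.sleep, (0.1,), 3),
    ("c", time.sleep, (0.1,), 3),
    ("b", time.sleep, (0.5,), 1),
    ("d", time.sleep, (0.1,), 1),
]
order = schedule(demo_jobs, total_cores=4)
print(order)  # ['a', 'b', 'c', 'd']
```

Job `b` overtakes `c` because it fits next to `a` within the four-core budget, while `c` has to wait until `a` releases its three cores.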

#### n_jobs

    Number of jobs to run in parallel.


Some models accept an `n_jobs` argument, which lets the implementation split its work into `n_jobs` jobs that run in parallel on `n_jobs` CPU threads. Not all models have the `n_jobs` parameter.
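
Because not every model takes `n_jobs`, code that builds models generically may want to check first. The sketch below does this by inspecting the constructor signature. The helpers `accepts_n_jobs` and `make_estimator` and the two stand-in classes are hypothetical, and the approach assumes that constructor parameters are listed explicitly in `__init__`.

```python
import inspect


def accepts_n_jobs(estimator_cls):
    """Return True if the class constructor has an `n_jobs` parameter."""
    return "n_jobs" in inspect.signature(estimator_cls.__init__).parameters


def make_estimator(estimator_cls, n_jobs, **params):
    """Build an estimator, passing n_jobs only when the constructor accepts it."""
    if accepts_n_jobs(estimator_cls):
        params["n_jobs"] = n_jobs
    return estimator_cls(**params)


class ParallelModel:
    """Stand-in for a model that supports n_jobs."""

    def __init__(self, alpha=1.0, n_jobs=1):
        self.alpha = alpha
        self.n_jobs = n_jobs


class SerialModel:
    """Stand-in for a model without an n_jobs parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha


parallel = make_estimator(ParallelModel, n_jobs=2, alpha=0.5)
serial = make_estimator(SerialModel, n_jobs=2, alpha=0.5)
print(parallel.n_jobs, hasattr(serial, "n_jobs"))  # 2 False
```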

#### pre_dispatch

```
Controls the number of jobs that get dispatched during parallel execution.

Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
    
    - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
    - An int, giving the exact number of total jobs that are spawned
    - A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
```

Some functions, such as `sklearn.model_selection.GridSearchCV`, accept the `pre_dispatch` argument.
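
`GridSearchCV` takes `pre_dispatch` alongside `n_jobs`. A minimal sketch of passing both follows. The synthetic dataset, the parameter grid, and the chosen values are assumptions made for illustration (this is not the MNIST setup used above).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset so the sketch runs quickly.
X_small, y_small = make_classification(n_samples=200, n_features=10, random_state=0)

search = GridSearchCV(
    Perceptron(random_state=0),
    param_grid={"penalty": [None, "l2", "l1"]},
    cv=3,
    n_jobs=2,               # two fits run at a time
    pre_dispatch="n_jobs",  # dispatch only as many jobs as there are workers
)
search.fit(X_small, y_small)
print(search.best_params_)
```

Lowering `pre_dispatch` trades a little scheduling latency for fewer jobs held in memory at once, which matters more when each job carries a large dataset.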
    