<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Sklearn-inside-a-Process" data-toc-modified-id="Sklearn-inside-a-Process-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Sklearn inside a Process</a></span><ul class="toc-item"><li><span><a href="#Spawn-a-single-Process" data-toc-modified-id="Spawn-a-single-Process-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Spawn a single Process</a></span></li><li><span><a href="#Spawn-multiple-processes-in-parallel" data-toc-modified-id="Spawn-multiple-processes-in-parallel-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Spawn multiple processes in parallel</a></span></li><li><span><a href="#Encapsulating-more-than-a-simple-fit" data-toc-modified-id="Encapsulating-more-than-a-simple-fit-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Encapsulating more than a simple fit</a></span></li><li><span><a href="#Measure-Time-execution-time-of-a-process" data-toc-modified-id="Measure-Time-execution-time-of-a-process-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Measure Time execution time of a process</a></span><ul class="toc-item"><li><span><a href="#Timing-individually" data-toc-modified-id="Timing-individually-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>Timing individually</a></span></li><li><span><a href="#Two-at-a-time" data-toc-modified-id="Two-at-a-time-1.4.2"><span class="toc-item-num">1.4.2&nbsp;&nbsp;</span>Two at a time</a></span></li><li><span><a href="#More-applications-than-resources" data-toc-modified-id="More-applications-than-resources-1.4.3"><span class="toc-item-num">1.4.3&nbsp;&nbsp;</span>More applications than resources</a></span></li><li><span><a href="#Without-the-join.()" data-toc-modified-id="Without-the-join.()-1.4.4"><span class="toc-item-num">1.4.4&nbsp;&nbsp;</span>Without the join.()</a></span></li><li><span><a href="#With-joblib" data-toc-modified-id="With-joblib-1.4.5"><span class="toc-item-num">1.4.5&nbsp;&nbsp;</span>With 
joblib</a></span></li></ul></li><li><span><a href="#Control-resource-usage" data-toc-modified-id="Control-resource-usage-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Control resource usage</a></span></li></ul></li><li><span><a href="#Scheduling--processes/resources" data-toc-modified-id="Scheduling--processes/resources-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Scheduling  processes/resources</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#n_jobs" data-toc-modified-id="n_jobs-2.0.1"><span class="toc-item-num">2.0.1&nbsp;&nbsp;</span>n_jobs</a></span></li><li><span><a href="#pre_dispatch" data-toc-modified-id="pre_dispatch-2.0.2"><span class="toc-item-num">2.0.2&nbsp;&nbsp;</span>pre_dispatch</a></span></li></ul></li></ul></li></ul></div>

## Sklearn inside a Process


Sklearn gridsearch memory

- https://stackoverflow.com/questions/24406937/scikit-learn-joblib-bug-multiprocessing-pool-self-value-out-of-range-for-i-fo/24411581#24411581


Working with numerical data in shared memory (memmapping)
- https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping

Obtaining the number of CPU cores in Julia
- https://stackoverflow.com/questions/27931026/obtain-the-number-of-cpu-cores-in-julia

In [5]:
import sklearn
import multiprocessing as mp
import numpy as np
import pandas as pd
from sklearn import linear_model

In [17]:
MNIST_path = "/home/david/Datasets/MNIST/train_mnist.csv"

In [43]:
import os
from os.path import expanduser
home = expanduser("~")

X_tr = pd.read_csv(os.path.join(home,"Datasets/MNIST/train_mnist.csv"), header=None)
X_tr = X_tr.values
y_tr = X_tr[:,0]
X_tr = X_tr[:,1:]

X_te = pd.read_csv(os.path.join(home,"Datasets/MNIST/test_mnist.csv"), header=None)
X_te = X_te.values
y_te = X_te[:,0]
X_te = X_te[:,1:]

In [44]:
X_tr.shape, X_te.shape

((60000, 784), (10000, 784))

### Spawn a single Process

Information about the multiprocessing module
- https://docs.python.org/2/library/multiprocessing.html

In [5]:
sklearn.linear_model.Perceptron()

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
      fit_intercept=True, max_iter=None, n_iter=None, n_iter_no_change=5,
      n_jobs=None, penalty=None, random_state=0, shuffle=True, tol=None,
      validation_fraction=0.1, verbose=0, warm_start=False)

In [6]:
m1 = linear_model.Perceptron(n_jobs=2)

The following code snippet prints `True` because the main Python process (the one executing the notebook) is not blocked by the spawned process `p`, which is still fitting the model when `p.is_alive()` is called.

In [7]:
p = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p.start()
p.is_alive() 

True



The following code snippet prints `False` because `p.join()` blocks the main Python process (the one executing the notebook) until `p` ends.

In [8]:
p = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p.start()
p.join()
p.is_alive() 



False
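
Because `fit` runs in a separate process, the child works on its own copy of the model, and the object held by the notebook process is not modified. The sketch below illustrates this and sends a result back through a `multiprocessing.Queue`. `ToyModel`, `fit_in_child` and `fit_and_fetch` are hypothetical names (`ToyModel` is only a stand-in for an sklearn estimator), and the snippet assumes a POSIX system because it requests the `fork` start method explicitly.

```python
import multiprocessing as mp


class ToyModel:
    """Hypothetical stand-in for an estimator: `fit` stores the mean of y."""

    def __init__(self):
        self.mean_ = None

    def fit(self, y):
        self.mean_ = sum(y) / len(y)
        return self


def fit_in_child(model, y, queue):
    # Runs in the child process: fit the child's copy and send the result back.
    queue.put(model.fit(y).mean_)


def fit_and_fetch(model, y):
    """Fit `model` in a child process and return the fitted mean."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    queue = ctx.Queue()
    p = ctx.Process(target=fit_in_child, args=(model, y, queue))
    p.start()
    fitted_mean = queue.get()  # read the result before joining the child
    p.join()
    return fitted_mean


model = ToyModel()
fitted_mean = fit_and_fetch(model, [1.0, 2.0, 3.0])
print("parent model.mean_:", model.mean_)
print("value computed in the child:", fitted_mean)
```

The same idea is what makes a shared container such as `Manager().dict()` necessary when predictions computed in a child have to reach the parent.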

### Spawn multiple processes in parallel

We can define several Python Process objects and start them in parallel. 

In [152]:
m1 = linear_model.Perceptron(n_jobs=1)
m2 = linear_model.Perceptron(n_jobs=1)

In [153]:
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))

In [11]:
p1.start()
p2.start()



In [12]:
p1.is_alive()

True

In [13]:
p2.is_alive()

True

### Encapsulating more than a simple fit

Assume we want to fit different models in parallel and, once they are fitted, generate predictions for a test set.

Let us create a function that receives a model, the data used to train and validate it, and test data on which to generate predictions. That is `fit_and_return_preds(p_id, return_predictions, model, X, y, tr_idx, va_idx, X_te, evaluation_metric)`.




In [1]:
# y_preds = np.zeros(sample_submission.shape[0])
# y_oof   = np.zeros(X_train.shape[0])
# 
#     clf = pipe_cv.best_estimator_
#     X_tr, X_vl = X_train.iloc[tr_idx, :], X_train.iloc[val_idx, :]
#     y_tr, y_vl = y_train.iloc[tr_idx], y_train.iloc[val_idx]
#     clf.fit(X_tr, y_tr)
#     y_pred_train = clf.predict_proba(X_vl)[:,1]
#     y_oof[val_idx] = y_pred_train
#     print('ROC AUC {}'.format(roc_auc_score(y_vl, y_pred_train)))
#     y_preds+= clf.predict_proba(X_test)[:,1]/EPOCHS

In [239]:

from typing import Union

def fit_and_return_preds(p_id,
                         return_predictions,
                         model, 
                         X:      Union[pd.DataFrame, np.ndarray],
                         y:      Union[pd.DataFrame, pd.Series, np.ndarray],
                         tr_idx: np.ndarray,
                         va_idx: np.ndarray, 
                         X_te:   Union[pd.DataFrame, np.ndarray],
                         evaluation_metric):
    """
    Train `model` on the rows of `X` specified by `tr_idx`.
    Then compute the validation metric on the rows specified by `va_idx`.
    Finally store the predictions on `X_te` in `return_predictions[p_id]`.
    """
    assert type(X) in [pd.DataFrame, np.ndarray], "type(X)={} but it should be a pd.DataFrame or np.ndarray".format(type(X))
    assert type(y) in [pd.DataFrame, pd.Series, np.ndarray], "type(y)={} but it should be a pd.DataFrame, pd.Series or np.ndarray".format(type(y))
    assert type(tr_idx) == np.ndarray, "type(tr_idx)={} but it should be a np.ndarray".format(type(tr_idx))
    assert type(va_idx) == np.ndarray, "type(va_idx)={} but it should be a np.ndarray".format(type(va_idx))
    
    # Split into training and validation rows (positional indexing in both cases)
    if type(X) == pd.DataFrame:
        X_tr, X_va = X.iloc[tr_idx, :],  X.iloc[va_idx, :]
        y_tr, y_va = y.iloc[tr_idx],     y.iloc[va_idx]
    else:
        X_tr, X_va = X[tr_idx, :],  X[va_idx, :]
        y_tr, y_va = y[tr_idx],     y[va_idx]
        
    model = model.fit(X_tr, y_tr)
        
    # Probability of the positive class; requires a binary classifier with `predict_proba`
    y_tr_pred = model.predict_proba(X_tr)[:,1]
    y_va_pred = model.predict_proba(X_va)[:,1]
    print('{} train: {}'.format(evaluation_metric.__name__, evaluation_metric(y_tr, y_tr_pred)))
    print('{} valid: {}'.format(evaluation_metric.__name__, evaluation_metric(y_va, y_va_pred)))

    y_te_pred = model.predict_proba(X_te)[:,1]
    
    # `return_predictions` is a shared dict, so the parent process can read the result
    return_predictions[p_id] = y_te_pred


In [2]:
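import copy
import multiprocessing as mp
import numpy as np
import sklearn.metrics
import sklearn.model_selection

def fit_and_average_k_models(model, X, y, X_te, k):
    """
    Fit `k` copies of `model`, one per fold and each in its own process,
    and average their predictions on `X_te`.
    """
    # A Manager dict is shared across processes, so children can write results into it
    manager            = mp.Manager()
    return_predictions = manager.dict()
    evaluation_function = sklearn.metrics.roc_auc_score
    kf = sklearn.model_selection.KFold(n_splits = k, shuffle = True)
    
    processes=[]
    for p_id, (tr_idx, va_idx) in enumerate(kf.split(X, y)):
        processes.append(mp.Process(target=fit_and_return_preds, 
                                    args=(p_id, return_predictions, 
                                          copy.deepcopy(model),
                                          X, y , tr_idx, va_idx, X_te, evaluation_function)))

    # Start all k processes, then wait for all of them to finish
    for p in processes:
        p.start()
        
    for p in processes:
        p.join()
        
    # Average the k prediction vectors
    y_pred = np.zeros(X_te.shape[0])
    for y_k in return_predictions.values():
        y_pred += y_k
    
    y_pred = y_pred/k
    
    return y_pred, return_predictions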


In [3]:
inds_01 = np.logical_or(y_tr==0,y_tr==1)
X_tr_01 = X_tr[inds_01,:]
y_tr_01 = y_tr[inds_01]

inds_01 = np.logical_or(y_te==0,y_te==1)
X_te_01 = X_te[inds_01,:]


In [4]:
len(y_tr_01), len(X_tr_01), len(X_te_01)


In [251]:
import sklearn.neural_network
model = sklearn.neural_network.MLPClassifier(hidden_layer_sizes=[30], max_iter=2)
y_pred, return_predictions = fit_and_average_k_models(model, X_tr_01, y_tr_01, X_te_01, 4)



roc_auc_score train: 0.999880379961358
roc_auc_score valid: 0.999308584193171
roc_auc_score train: 0.9997705203676407
roc_auc_score valid: 0.9999991962060928
roc_auc_score train: 0.9998824024211869
roc_auc_score valid: 0.9999800033034543
roc_auc_score train: 0.9999927026591687
roc_auc_score valid: 0.9999843467099393


### Measure Time execution time of a process

In [None]:
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [None]:
%%time
p1.start()
p2.start()
p3.start()
p4.start()
p1.join()
p2.join()
p3.join()
p4.join()

In [None]:
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=2, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [None]:
%%time
p1.start()
p2.start()
p1.join()
p2.join()
p3.start()
p4.start()
p3.join()
p4.join()

In [None]:
m1 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=4, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=4, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [None]:
%%time
p1.start()
p1.join()
p2.start()
p2.join()
p3.start()
p3.join()
p4.start()
p4.join()

In [None]:
m1 = linear_model.Perceptron(n_jobs=3, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=3, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [None]:
%%time
p1.start()
p2.start()
p1.join()
p2.join()
p3.start()
p4.start()
p3.join()
p4.join()

#### Timing individually

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=4, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=3, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p1.start()
p1.join()

#### Two at a time

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p1.start()
p2.start()
p2.join()
p1.join()

In [None]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p2.join()
p1.join()
p3.start()
p4.start()
p3.join()
p4.join()

In [None]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p2.join()
p1.join()
p3.join()
p4.join()
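
The pattern used in the cells above (start a group of processes, join them, then start the next group) can be wrapped in a small helper and timed with `time.perf_counter` when the `%%time` magic is not available. This is only a sketch: `run_in_batches` and `busy_work` are hypothetical names, `busy_work` stands in for `model.fit(X_tr, y_tr)`, and the `fork` start method is assumed (POSIX systems).

```python
import multiprocessing as mp
import time


def busy_work(n):
    # Hypothetical CPU-bound stand-in for `model.fit(X_tr, y_tr)`.
    total = 0
    for i in range(n):
        total += i * i


def run_in_batches(targets, batch_size):
    """Run (function, args) pairs in child processes, `batch_size` at a time.

    Returns the elapsed wall-clock time in seconds.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    start = time.perf_counter()
    for i in range(0, len(targets), batch_size):
        batch = [ctx.Process(target=f, args=args)
                 for f, args in targets[i:i + batch_size]]
        for p in batch:
            p.start()
        for p in batch:
            p.join()  # wait for the whole batch before starting the next one
    return time.perf_counter() - start


targets = [(busy_work, (200_000,))] * 4
for batch_size in (1, 2, 4):
    elapsed = run_in_batches(targets, batch_size)
    print("batch_size={}: {:.3f} s".format(batch_size, elapsed))
```

With `batch_size=1` the tasks run one after another, with `batch_size=2` they run two at a time, and with `batch_size=4` all four run at once, which mirrors the three orderings timed by hand in this notebook.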

#### More applications than resources

In [None]:
%%time

m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))
m5 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p5 = mp.Process(target=m5.fit, args=(X_tr, y_tr))
m6 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p6 = mp.Process(target=m6.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p5.start()
p6.start()

p2.join()
p1.join()
p3.join()
p4.join()
p5.join()
p6.join()

In [None]:
%%time
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))
m5 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p5 = mp.Process(target=m5.fit, args=(X_tr, y_tr))
m6 = linear_model.Perceptron(n_jobs=2, n_iter=30)
p6 = mp.Process(target=m6.fit, args=(X_tr, y_tr))

p1.start()
p2.start()
p3.start()
p4.start()
p2.join()
p5.start()
p6.start()
p1.join()
p3.join()
p4.join()
p5.join()
p6.join()
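
When there are more fits than cores, an alternative to ordering the `start()` and `join()` calls by hand is a `multiprocessing.Pool` with a fixed number of workers: tasks wait in a queue and each worker picks up the next one as soon as it is free. The sketch below assumes the `fork` start method (POSIX systems); `train_one` and `train_all` are hypothetical names, and `train_one` is a stand-in for fitting one model.

```python
import multiprocessing as mp
import os


def train_one(task_id):
    # Hypothetical stand-in for fitting model number `task_id`.
    checksum = sum(i * i for i in range(100_000))
    return task_id, os.getpid(), checksum


def train_all(n_tasks, n_workers):
    """Run `n_tasks` tasks on a pool of at most `n_workers` processes."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    with ctx.Pool(processes=n_workers) as pool:
        # `map` blocks until every task is done and keeps the input order
        return pool.map(train_one, range(n_tasks))


results = train_all(n_tasks=6, n_workers=2)
worker_pids = {pid for _, pid, _ in results}
print("tasks completed:", len(results))
print("distinct worker processes:", len(worker_pids))
```

Six tasks are completed while never more than two processes compute at the same time, so the number of busy cores is bounded by `n_workers` rather than by the number of models.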

#### Without the join.()

In [None]:
m1 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=2, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=2, n_iter=30)

p1 = mp.Process(target=m1.fit, args=(X_tr, y_tr))
p2 = mp.Process(target=m2.fit, args=(X_tr, y_tr))
p3 = mp.Process(target=m3.fit, args=(X_tr, y_tr))
p4 = mp.Process(target=m4.fit, args=(X_tr, y_tr))

In [None]:
%%time
p1.start()
p2.start()
p3.start()
p4.start()

p1.join()
p2.join()
p3.join()
p4.join()

#### With joblib


- https://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/

In [None]:
import joblib
from joblib import Memory

In [None]:
mem = Memory(location="/tmp/joblib")

In [None]:
from joblib import Parallel, delayed

In [None]:
m1 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m2 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m3 = linear_model.Perceptron(n_jobs=1, n_iter=30)
m4 = linear_model.Perceptron(n_jobs=1, n_iter=30)

models = [m1, m2, m3, m4]
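
A minimal sketch of how `Parallel` and `delayed` could be used to run several tasks on a fixed number of workers. It assumes `joblib` is installed; `fit_one` is a hypothetical stand-in for `model.fit(X_tr, y_tr)` so that the snippet runs without the MNIST files.

```python
from joblib import Parallel, delayed


def fit_one(seed):
    # Hypothetical stand-in for `model.fit(X_tr, y_tr)`; returns a fake "score".
    return sum((seed + i) % 7 for i in range(1000))


# Run the four tasks on two workers; results come back in input order.
scores = Parallel(n_jobs=2)(delayed(fit_one)(seed) for seed in range(4))
print(scores)
```

Unlike `mp.Process`, `Parallel` hands the return values back to the calling process. Since `fit` returns the fitted estimator, something like `Parallel(n_jobs=2)(delayed(m.fit)(X_tr, y_tr) for m in models)` would be expected to return the four fitted models.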

### Control resource usage
- https://docs.python.org/2/library/resource.html#module-resource
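
As an illustration of the `resource` module linked above, the sketch below lowers the soft CPU-time limit inside a child process, so only that child is affected. It assumes a Unix system (the `resource` module is not available on Windows) and the `fork` start method; `limited_task` and `run_with_cpu_limit` are hypothetical names.

```python
import multiprocessing as mp
import resource


def limited_task(cpu_seconds, queue):
    # Runs in the child: lower the soft CPU-time limit for this process only.
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    if hard != resource.RLIM_INFINITY:
        cpu_seconds = min(cpu_seconds, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))
    # ... the model fit would go here ...
    queue.put(resource.getrlimit(resource.RLIMIT_CPU)[0])


def run_with_cpu_limit(cpu_seconds):
    """Start a child with a CPU-time limit; return the soft limit it reports."""
    ctx = mp.get_context("fork")  # assumption: POSIX system with fork available
    queue = ctx.Queue()
    p = ctx.Process(target=limited_task, args=(cpu_seconds, queue))
    p.start()
    reported = queue.get()
    p.join()
    return reported


parent_before = resource.getrlimit(resource.RLIMIT_CPU)
child_soft = run_with_cpu_limit(60)
parent_after = resource.getrlimit(resource.RLIMIT_CPU)
print("child soft CPU limit (s):", child_soft)
print("parent limits unchanged:", parent_before == parent_after)
```

A process that exceeds its soft `RLIMIT_CPU` receives `SIGXCPU`. Other limits, such as `resource.RLIMIT_AS` for the address space, are set the same way.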

## Scheduling  processes/resources

Given a set of models to train, we would like to control which resources each model uses.

#### n_jobs

    Number of jobs to run in parallel.


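Some models accept an `n_jobs` argument, which lets the implementation run up to `n_jobs` jobs in parallel on separate CPU cores. For `Perceptron`, for example, it parallelizes the one-versus-all computation in multi-class problems. Not all models have the `n_jobs` parameter.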
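
The value of `n_jobs` does not have to be a positive count. Assuming the convention documented by joblib (negative values count back from the number of CPUs, and `None` is treated as 1 outside a `parallel_backend` context), the mapping can be sketched as follows. `resolve_n_jobs` is a hypothetical helper written for illustration, not a library function.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper mirroring the joblib convention for `n_jobs`."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # None means "run sequentially"
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


for value in (None, 1, 3, -1, -2):
    print(value, "->", resolve_n_jobs(value, n_cpus=4))
```

This matters for scheduling: four processes that each request `n_jobs=-1` all ask for every core, so the total demand is four times the number of CPUs.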

#### pre_dispatch

```
Controls the number of jobs that get dispatched during parallel execution.

Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
    
    - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
    - An int, giving the exact number of total jobs that are spawned
    - A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
```

Some functions, like `sklearn.model_selection.GridSearchCV`, accept a `pre_dispatch` argument in addition to `n_jobs`.
    