# Tutorial on speeding up Python code on a computing cluster

* Machine learning jobs can take a long time to run.
* How can your code exploit the number of cores available?


## Assumptions

This file is a Jupyter notebook that you can download to your own machine.
I suggest you paste the Python code into a text file to run on the cluster.

This basic tutorial was created by Craig McNeile  https://www.plymouth.ac.uk/staff/craig-mcneile

##  Using multiple cores to run machine learning jobs

*  Examples 1 and 2 were developed from this page: https://machinelearningmastery.com/multi-core-machine-learning-in-python/
*  See the scikit-learn reference material on running parallel jobs: https://scikit-learn.org/stable/computing/parallelism.html


### Tip

If you want an explanation of what some of the Python code is doing, you can write
**Explain the python code** and paste the code into ChatGPT https://chat.openai.com/
See the example below.

<img src="https://github.com/cmcneile/HPC-tutorial/blob/main/ChatGPt.png?raw=true" alt="ChatGPT example" />


###  Example 1

In [1]:
from time import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, ConfusionMatrixDisplay
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from scipy.stats import randint



We create a synthetic data set on which the classifier can be run.

In [4]:
# define dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=3)

# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Build the classifier using the random forest method

In [6]:
number_of_cores = 4 
# define the model
model = RandomForestClassifier(n_estimators=500, n_jobs=number_of_cores)
# record the time before fitting
start = time()
# fit the model
model.fit(X_train, y_train)
# record the time after fitting
end = time()
# the elapsed time (end - start) is reported at the end of this cell

# evaluate the model on the test data set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)

print("The confusion matrix")
print(cm)

result = end - start
print('Time taken %.3f seconds' % result, "with ", number_of_cores, "cores")



Accuracy: 0.9595
The confusion matrix
[[966  48]
 [ 33 953]]
Time taken 3.365 seconds with  4 cores


### Questions

* Try running the above code with number_of_cores = 1, 2, 4, 8, 16, 32. How does the time taken change?

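One way to approach this question is to put the fit inside a loop over the core counts. The sketch below is not part of the original example: it uses a smaller data set and fewer trees (an assumption made so that the scan finishes quickly), and the names `core_counts` and `timings` are illustrative. Values of `n_jobs` larger than the number of cores allocated to your job are not expected to give any further speed-up.

```python
from time import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Smaller problem than Example 1 so that the scan finishes quickly (assumption)
X, y = make_classification(n_samples=2000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=3)

core_counts = [1, 2, 4]   # extend to 8, 16, 32 when running on the cluster
timings = {}              # maps number of cores -> seconds spent in fit

for number_of_cores in core_counts:
    model = RandomForestClassifier(n_estimators=100, n_jobs=number_of_cores)
    start = time()
    model.fit(X, y)
    timings[number_of_cores] = time() - start
    print('Time taken %.3f seconds' % timings[number_of_cores],
          "with", number_of_cores, "cores")
```

Plotting `timings` against the number of cores shows where the speed-up starts to level off.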

### Example 2

An important part of machine learning is tuning the hyperparameters to obtain good performance from the algorithm.
The hyperparameters can be scanned over a grid of possible values (grid search), or random points can be sampled from a region of the hyperparameter space (random search).
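To make the difference between the two strategies concrete, the sketch below lists the candidate settings each one would try, without training any model. It uses `ParameterGrid` and `ParameterSampler` from `sklearn.model_selection`; the grid values are illustrative assumptions, while the random ranges mirror the ones used in the search in this example.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Grid search: every combination of the listed values is tried (3 x 3 = 9 candidates)
grid = {'n_estimators': [50, 200, 500],   # illustrative values (assumption)
        'max_depth': [5, 10, 20]}
grid_points = list(ParameterGrid(grid))
print(len(grid_points), "grid candidates")

# Random search: a fixed number of points is drawn from the distributions
param_dist = {'n_estimators': randint(50, 500),
              'max_depth': randint(1, 20)}
random_points = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for point in random_points:
    print(point)
```

The cost of a grid grows multiplicatively with each extra hyperparameter, whereas the cost of a random search is set directly by `n_iter`.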


In [20]:
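# IPython magic that clears the notebook namespace; delete this line when running as a plain Python script
%reset -f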
from time import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, ConfusionMatrixDisplay
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from scipy.stats import randint



In [21]:
# create the dataset again
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15, n_redundant=5, random_state=3)

# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [22]:
param_dist = {'n_estimators': randint(50,500),
              'max_depth': randint(1,20)}

# Create a random forest classifier
modelA = RandomForestClassifier()

njob = 4
# Use random search to find the best hyperparameters
rand_search = RandomizedSearchCV(modelA, 
                                 param_distributions = param_dist, 
                                 n_iter=5, 
                                 cv=5, n_jobs=njob)

# Fit the random search object to the data

print("Starting the random search")

start = time()
rand_search.fit(X_train, y_train)
end = time()

result = end - start
print("No cores = ", njob , 'Time taken %.3f seconds' % result)




Starting the random search




No cores =  4 Time taken 26.115 seconds


In [23]:
# evaluate the model
y_pred = rand_search.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create the confusion matrix
cm = confusion_matrix(y_test, y_pred)

##ConfusionMatrixDisplay(confusion_matrix=cm).plot()
print(cm)


Accuracy: 0.9475
[[911  74]
 [ 31 984]]


### Questions

*  Try running the above code with njob = 1, 2, 4, 8, 16, 32. How does the time taken change?
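When scanning over the number of jobs on a cluster, it helps to know how many cores the job has actually been given, since asking for more workers than cores typically brings no further gain. The helper below is a hedged sketch, not part of the original tutorial: `available_cores` is a hypothetical name, and the use of the `SLURM_CPUS_PER_TASK` environment variable assumes the cluster is managed by Slurm.

```python
import os

def available_cores():
    """Best-effort estimate of the cores usable by this process (hypothetical helper)."""
    # Assumption: on a Slurm cluster this variable is set when --cpus-per-task is requested
    slurm = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm and slurm.isdigit():
        return max(1, int(slurm))
    # On Linux, the CPU affinity mask reflects the cores this process may run on
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: all cores of the machine (cpu_count can return None)
    return os.cpu_count() or 1

# Cap the requested number of jobs at what is actually available
njob = min(4, available_cores())
print("Using", njob, "of", available_cores(), "available cores")
```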

### Example 3 

* Some Python code can be sped up by using libraries
* One such library is https://numba.pydata.org/
* Numba has options to run code in parallel
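As a minimal sketch of the parallel option, the function below uses `@njit(parallel=True)` together with `prange`, which lets Numba split the loop iterations across cores. The function name `sum_tanh` and the array size are illustrative assumptions, and the `try`/`except` fallback is only there so that the sketch still runs (slowly, as plain Python) where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption: fall back to plain Python if Numba is not available
    prange = range
    def njit(*args, **kwargs):
        return lambda func: func

@njit(parallel=True)
def sum_tanh(a):
    total = 0.0
    # prange marks the loop as parallel; Numba treats "total +=" as a reduction
    for i in prange(a.shape[0]):
        total += np.tanh(a[i])
    return total

x = np.linspace(0.0, 1.0, 100_000)
print(sum_tanh(x))
```

The number of threads Numba uses can be limited with the `NUMBA_NUM_THREADS` environment variable, which is useful when the job has been allocated fewer cores than the node provides.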


In [8]:
from time import time
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

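# Both decorators are commented out, so go_fast runs as plain Python here.
# Uncomment one of them to compile the function with Numba.
#@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit

#@jit(parallel = True)
def go_fast(a): # With a decorator enabled, the function is compiled to machine code when called the first time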
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

start = time()
print(go_fast(x))
end = time()

result = end - start
print('Time taken %.3f seconds' % result)

start = time()
ans = go_fast(x)
end = time()
result = end - start   # recompute the elapsed time for the second call
print('Time taken %.3f seconds' % result)

[[  9.  10.  11.  12.  13.  14.  15.  16.  17.  18.]
 [ 19.  20.  21.  22.  23.  24.  25.  26.  27.  28.]
 [ 29.  30.  31.  32.  33.  34.  35.  36.  37.  38.]
 [ 39.  40.  41.  42.  43.  44.  45.  46.  47.  48.]
 [ 49.  50.  51.  52.  53.  54.  55.  56.  57.  58.]
 [ 59.  60.  61.  62.  63.  64.  65.  66.  67.  68.]
 [ 69.  70.  71.  72.  73.  74.  75.  76.  77.  78.]
 [ 79.  80.  81.  82.  83.  84.  85.  86.  87.  88.]
 [ 89.  90.  91.  92.  93.  94.  95.  96.  97.  98.]
 [ 99. 100. 101. 102. 103. 104. 105. 106. 107. 108.]]
Time taken 0.001 seconds
Time taken 0.001 seconds
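The two identical timings above come from running `go_fast` as plain Python. The sketch below repeats the measurement with the `@njit` decorator enabled, so that the first call includes the compilation and the second call runs the compiled code only. The use of `perf_counter` and the variable names `first` and `second` are choices made for this sketch, and the `try`/`except` fallback is an assumption so that it still runs where Numba is not installed.

```python
from time import perf_counter
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if Numba is not available
    def njit(func):
        return func

@njit
def go_fast(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

x = np.arange(100).reshape(10, 10)

start = perf_counter()
go_fast(x)                      # first call: Numba compiles the function
first = perf_counter() - start

start = perf_counter()
go_fast(x)                      # second call: reuses the compiled code
second = perf_counter() - start

print('First call  (includes compilation) %.6f seconds' % first)
print('Second call (compiled code only)   %.6f seconds' % second)
```

With Numba installed, the first call is usually much slower than the second, so benchmarks should exclude it.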
