## Demonstrating serial vs parallel processing

**Week02, Example 1**

ISM6562 

&copy; 2023 Dr. Tim Smith


<a target="_blank" href="https://colab.research.google.com/github/prof-tcsmith/bd-f23/blob/main/W02/W02.1-multiprocessing-ex1.ipynb#offline=1">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

---

In [1]:
# uncomment the following line to install the package on google colab
# ! pip install multiprocess

# uncomment the following line to install the package on your local machine with conda
# ! conda install -c conda-forge multiprocess

In [2]:
import multiprocess
from multiprocess import Pool

number_of_cores = multiprocess.cpu_count()

print(f"The computer you are running this on has {number_of_cores} cores.")

The computer you are running this on has 8 cores.


> NOTE: The difference between single-process and multiprocess speeds depends on how many cores your CPU has. The more cores, the more parallelization is possible. My desktop has 32 cores; if yours has a different number, you will see a different speedup.

In [4]:
# the number of cores to use depends on a number of factors
#  * how many other processes are running on your computer
#  * how many cores your computer has
#  * some newer CPUs have p-cores (performance cores) and e-cores (efficiency cores)
#       - the p-cores are faster but use more power, the e-cores are slower but use less power

# Uncomment one of the following to set the number of cores to use


#number_of_cores_to_use = 2 # if you are on colaboratory, you have 2 cores

# if you are running this on your local machine, you likely have 4 or more cores
number_of_cores_to_use = number_of_cores - 1 # leave one core for the OS 
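
The "leave one core free" rule above can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: `pick_worker_count` and its `reserve` parameter are hypothetical names, and it uses the standard library's `os.cpu_count()` rather than `multiprocess.cpu_count()`. The one thing it adds is a floor of 1, so a single-core machine never ends up with zero workers.

```python
import os


def pick_worker_count(reserve=1):
    """Return a worker count that leaves `reserve` cores free, but never less than 1.

    Hypothetical helper for illustration only.
    """
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, total - reserve)


print(f"Using {pick_worker_count()} worker process(es).")
```

The result could be passed to `Pool(...)` in place of a hand-set value.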

In this notebook we will create a task that can be run in parallel. It is defined as a function (you can give it whatever name you like; I've called it `task`).

The task has a certain computational complexity. In this case, it is defined by the number of iterations in the for loop. The more iterations, the more complex the task. The more complex the task, the more time it takes to complete.

Also, I've defined the number of tasks that will be run. The more tasks that are run, the more time it takes to complete.

In [18]:
task_complexity = 100000
number_of_tasks = 1000

In [19]:
def task(num):
    val = 0
    for i in range(task_complexity):
        val += i / 23
    return val

Now, let's do this job a number of times on a single core and then on multiple cores. We'll time the results and compare.

Single core results...

In [20]:
%%time
data = []
for i in range(number_of_tasks):
    data.append(task(i))

print(data[:10])

[217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262, 217389130.43478262]
CPU times: total: 1.92 s
Wall time: 5.37 s


> NOTE: **Wall time** is the total elapsed time from the start to the end of the call. It includes time spent waiting for network, CPU, and disk I/O resources, and it is what you would measure as the elapsed time between the start and end of a Python script.
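
To make the distinction between wall time and CPU time concrete, here is a minimal sketch using only the standard library. The `measure` helper is a hypothetical name introduced for illustration; `time.perf_counter()` tracks elapsed (wall) time, while `time.process_time()` tracks CPU time used by the current process only.

```python
import time


def measure(fn):
    """Run fn() and return (wall_seconds, cpu_seconds) for the current process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping uses up wall time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall: {wall:.3f} s, cpu: {cpu:.3f} s")
```

Because `time.process_time()` covers only the calling process, CPU time spent inside worker processes would not show up in a measurement like this one.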


Multiple core results...

In [26]:
%%time

with Pool(number_of_cores_to_use) as p:
    # task() takes a single argument, so use map() rather than starmap()
    data = p.map(task, range(number_of_tasks))

print(data[:10])
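
`Pool.map` calls the function with each item as a single argument, while `Pool.starmap` unpacks each item into positional arguments. That is why a one-parameter function such as `task` cannot be fed 2-tuples through `starmap`. The sketch below illustrates the calling convention with the built-in `map` and `itertools.starmap` as stand-ins: they follow the same argument-passing rule but run serially. `one_arg` and `two_args` are hypothetical helpers.

```python
from itertools import starmap


def one_arg(x):
    return x * 2


def two_args(x, y):
    return x + y


# map: each item is passed as a single argument
print(list(map(one_arg, [1, 2, 3])))                 # [2, 4, 6]

# starmap: each item is unpacked into positional arguments
print(list(starmap(two_args, [(1, 10), (2, 20)])))   # [11, 22]

# Feeding 2-tuples to a one-parameter function through starmap fails
try:
    list(starmap(one_arg, [(1, 10)]))
except TypeError as err:
    print("TypeError:", err)
```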

### Summary

Multiprocessing (here via the `multiprocess` package) can speed up CPU-bound code considerably. However, if the task is not complex enough, the overhead of creating the processes and passing data to them outweighs the gain from parallelization, and the code runs slower than the serial version. The same overhead is why the speedup is not linear in the number of cores: the more cores you use, the more overhead you incur. As task complexity increases, that overhead becomes a smaller percentage of the total time.
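
The trade-off can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement: it assumes the work divides evenly across workers and that each worker adds a fixed startup cost. The function names and the 0.05 s overhead figure are hypothetical.

```python
def estimated_parallel_time(serial_time, workers, overhead_per_worker):
    """Toy model: evenly divided work plus a fixed startup cost per worker."""
    return serial_time / workers + overhead_per_worker * workers


def estimated_speedup(serial_time, workers, overhead_per_worker):
    """Ratio of serial time to modelled parallel time (values below 1 mean a slowdown)."""
    parallel_time = estimated_parallel_time(serial_time, workers, overhead_per_worker)
    return serial_time / parallel_time


# Assumed values: 8 workers, 0.05 s of overhead per worker (both hypothetical)
for serial_time in (0.1, 5.0, 500.0):
    speedup = estimated_speedup(serial_time, workers=8, overhead_per_worker=0.05)
    print(f"serial {serial_time:>6.1f} s -> estimated speedup {speedup:.2f}x")
```

Under these assumptions the smallest job is slower in parallel than in serial, while the largest approaches, but never reaches, an 8x speedup.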