**Section 4.3 - Example: Adding Two Arrays**

We will write functions to add two 1D arrays in a few different ways.


1.   In serial, using a `for` loop
2.   Vectorized, using `numpy`
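3.   In parallel, using `Pool` and `starmap()` from `multiprocessing` (called "multi-threaded" in the code, although `Pool` uses worker processes rather than threads)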






In [2]:
# Script with different functions to add arrays
# Import necessary modules
import random
import numpy as np
# your code here...
from multiprocessing import Pool

# Create the arrays to be added
N = 10         # number of entries in the array
array1 = np.random.randint(1, 100, size=N)
array2 = np.random.randint(1, 100, size=N)



# Define serial function
# your code here...
def add_arrays_sequential(arr1, arr2):
    result = []
    for i in range(len(arr1)):
        result.append(arr1[i] + arr2[i])
    return result

# Define vectorized function
# your code here...
def add_arrays_vectorized(arr1, arr2):
    result = np.add(arr1, arr2)
    return result

# Define multi-threaded function
# your code here...
def add_arrays_multithreaded(arr1, arr2):
    array_pairs = list(zip(arr1, arr2))
    with Pool(processes=5) as p:
        result = p.starmap(np.add, array_pairs)
    return result

# Run each function and print the output
# your code here...
seq_array = add_arrays_sequential(array1, array2)
print("Sequential array is", seq_array)

vec_array = add_arrays_vectorized(array1, array2)
print("Vectorized array is", vec_array)

mult_array = add_arrays_multithreaded(array1, array2)
print("Multithreaded array is", mult_array)


Sequential array is [40, 101, 151, 127, 66, 57, 98, 47, 108, 78]
Vectorized array is [ 40 101 151 127  66  57  98  47 108  78]
Multithreaded array is [40, 101, 151, 127, 66, 57, 98, 47, 108, 78]


In [2]:
# Import necessary modules
import random
import numpy as np
# your code here...
from multiprocessing import Pool

# Create the arrays to be added
N = 10         # number of entries in the array
array1 = np.random.randint(1, 100, size=N)
array2 = np.random.randint(1, 100, size=N)

print(array1)
print(array2)

array_pairs = list(zip(array1, array2))
print(array_pairs)

[21 85 70 67 75 75 73 86 40 69]
[65 87 35 70 95 40 41 78 71  5]
[(21, 65), (85, 87), (70, 35), (67, 70), (75, 95), (75, 40), (73, 41), (86, 78), (40, 71), (69, 5)]


**Section 5 - Analyzing Performance**

To best demonstrate how to check the performance of Python code, let's consider the examples below showing different ways to use the `timeit` module.

This is a built-in Python module that can be used either from the command line or within a script.

In this script, we'll compare two ways of building a string by concatenating the numbers 0-99. We will define two functions, then compare how long they take to run using `timeit.timeit()`.

The general syntax is `timeit.timeit(stmt, setup, timer, number)`
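As a minimal sketch of how these arguments fit together: `stmt` is the code being timed, `setup` runs once before timing starts, and `number` is how many times `stmt` is executed. The statement and setup strings below are illustrative only and are not part of the exercise.

```python
import timeit

# setup runs once and is not timed; stmt runs `number` times
total_seconds = timeit.timeit(
    stmt="sorted(values)",
    setup="values = list(range(100, 0, -1))",
    number=1000,
)
print("Total time for 1000 runs:", total_seconds)
```

The `timer` argument is left at its default here, which is usually what you want.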






In [6]:
# Script to time two different functions with timeit
# your code here...
import timeit

def method1():
    newstring = ""
    for x in range(100):
        newstring += str(x)
    return newstring

def method2():
    newstring = "".join([str(x) for x in range(100)])
    return newstring

output1 = method1()
print("The first output is", output1)
output2 = method2()
print("The second output is", output2)

# Get the total time for the default 1,000,000 calls of each function
time1 = timeit.timeit(method1)
time2 = timeit.timeit(method2)

print("Method 1 time:", time1)
print("Method 2 time:", time2)

The first output is 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
The second output is 0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
Method 1 time: 27.2454920001328
Method 2 time: 25.0220317998901


`timeit` reports the total time, in seconds, taken to run the code `number` times. Because we did not pass `number`, each function ran the default 1,000,000 times, which took roughly 25-27 seconds in total (about 25-27 microseconds per call). Consider the differences between the two methods and why you might choose one over the other.
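Because `timeit.timeit()` returns a total, dividing it by the number of runs gives an average per-call time. The sketch below also uses `timeit.repeat()`, which reruns the whole measurement several times so that the minimum can be taken as a less noisy estimate. It assumes a much smaller `number` than the default so that it finishes quickly; the variable names `runs`, `totals`, and `per_call` are illustrative.

```python
import timeit

def method1():
    newstring = ""
    for x in range(100):
        newstring += str(x)
    return newstring

def method2():
    return "".join([str(x) for x in range(100)])

runs = 10_000  # far fewer than the default of 1,000,000

for func in (method1, method2):
    # repeat=3 returns three independent totals; the minimum is the least disturbed
    totals = timeit.repeat(func, number=runs, repeat=3)
    per_call = min(totals) / runs
    print(f"{func.__name__}: {per_call * 1e6:.2f} microseconds per call")
```

The absolute numbers depend on the machine, so compare the two methods against each other rather than against the output shown above.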


Another method of timing code with `timeit` is to pass the code as a string to the timeit function. This is made simple by encasing our formatted code snippet in triple quotes, and storing it as a variable. The same can be done for any setup code.

In [7]:
# Script to time code in triple quotes
import timeit

# setup snippet to be executed only once
mysetup = ""

# code snippet whose execution time is to be measured
# your code here...
mycode = '''
def method1():
    newstring = ""
    for x in range(100):
        newstring += str(x)
    return newstring

method1()
'''

# timeit statement
# your code here...
print("Code snippet runtime:", timeit.timeit(setup=mysetup, stmt=mycode, number=100000))

Code snippet runtime: 2.7665488000493497


Another (often better) way to find out where your code spends its time is to profile it with the `profile` or `cProfile` modules. These record every function call as the program runs, giving a more holistic view of where a speedup is most needed.

Of the two modules, `cProfile` is preferred because of its reduced overhead and suitability for long-running programs.

Let's use `cProfile` with a script that contains two functions to perform the same task - summing a series of random numbers. The original script is provided to edit.

In [9]:
# Python script without cProfile incorporated
import random

# set number of integers in range
N = 100       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def test_methods(N):
  # Call function A
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  # Call function B
  second_result = function_B(N)
  print("Result of function_B:", second_result)

# The `if __name__ == "__main__"` guard ensures test_methods() runs only when the script is executed directly, not when it is imported
if __name__ == "__main__":
  test_methods(N)


Result of function_A: 5370
Result of function_B: 5019


Now that we have our script, we can edit it to incorporate `cProfile`. To do this, import `cProfile` at the top of the script, rename `test_methods()` to `main()`, and run `main()` through `cProfile.run()`.

Keep the above cell intact for reference, and edit the copy below.

In [11]:
# Copy of Python script without cProfile incorporated - to edit!
import random
import cProfile

# set number of integers in range
N = 10000       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def main(N):
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  second_result = function_B(N)
  print("Result of function_B:", second_result)

if __name__ == "__main__":
  cProfile.run('main(N)')


Result of function_A: 503807
Result of function_B: 501278
         175644 function calls in 0.100 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.055    0.055 3125828966.py:16(function_B)
    10001    0.008    0.000    0.052    0.000 3125828966.py:17(<genexpr>)
        1    0.000    0.000    0.100    0.100 3125828966.py:21(main)
        1    0.007    0.007    0.045    0.045 3125828966.py:9(function_A)
        1    0.000    0.000    0.100    0.100 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 iostream.py:137(_event_pipe)
        1    0.000    0.000    0.000    0.000 iostream.py:258(schedule)
        8    0.000    0.000    0.000    0.000 iostream.py:519(_is_master_process)
        8    0.000    0.000    0.000    0.000 iostream.py:546(_schedule_flush)
        8    0.000    0.000    0.000    0.000 iostream.py:624(write)
    20000    0.022    0.000    0.028    0.000 random.py

This is a useful output, but we can still improve it! We can use the `pstats` module to sort the output, here by cumulative time, and to limit how many rows are printed.

In [15]:
# Another copy of the Python script to edit!
import random
import cProfile
import pstats
from pstats import SortKey

# set number of integers in range
N = 100000       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def main(N):
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  second_result = function_B(N)
  print("Result of function_B:", second_result)

if __name__ == "__main__":
    cProfile.run('main(N)', filename='stats.prof')
    p = pstats.Stats('stats.prof')
    # SortKey.CUMULATIVE is the enum equivalent of the string 'cumtime'
    p.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(10)


Result of function_A: 5063887
Result of function_B: 5057077
Tue May 14 19:48:22 2024    stats.prof

         1755884 function calls in 1.067 seconds

   Ordered by: cumulative time
   List reduced from 31 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.067    1.067 {built-in method builtins.exec}
        1    0.000    0.000    1.067    1.067 <string>:1(<module>)
        1    0.000    0.000    1.067    1.067 681358055.py:23(main)
   200000    0.138    0.000    0.890    0.000 random.py:358(randint)
   200000    0.375    0.000    0.751    0.000 random.py:284(randrange)
        1    0.000    0.000    0.536    0.536 681358055.py:18(function_B)
        1    0.020    0.020    0.536    0.536 {built-in method builtins.sum}
        1    0.076    0.076    0.531    0.531 681358055.py:11(function_A)
   100001    0.080    0.000    0.516    0.000 681358055.py:19(<genexpr>)
   200000    0.236    0.000    0.301    0

The output can be sorted by any of the columns and truncated to any number of rows. Pass the number of rows to print as the argument to `print_stats()`, as in `print_stats(10)` above.
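As a sketch of these options, the snippet below profiles a small stand-in function and prints the same statistics twice: once sorted by call count and once by internal time, each limited to five rows. The function `busy_work` is hypothetical and not part of the exercise, and the profile is kept in memory rather than written to `stats.prof`.

```python
import cProfile
import pstats
import random
from pstats import SortKey

def busy_work(n):
    # Hypothetical workload: sum n random integers
    return sum(random.randint(1, 100) for _ in range(n))

# Profile only the call we care about
profiler = cProfile.Profile()
profiler.enable()
busy_work(10_000)
profiler.disable()

stats = pstats.Stats(profiler).strip_dirs()
stats.sort_stats(SortKey.CALLS).print_stats(5)  # most-called functions first
stats.sort_stats(SortKey.TIME).print_stats(5)   # largest tottime first
```

Sorting by `SortKey.TIME` tends to highlight the functions doing the actual work, while `SortKey.CUMULATIVE` highlights the callers whose subtrees are most expensive.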

Now, let's return to the previous array-adding example to get some information about the timing. We have three distinct functions, so let's time them using the `timeit` module. Edit the provided script to incorporate the timing portion.

In [18]:
# Code from earlier - to edit!

import random
import numpy as np
from multiprocessing import Pool
import timeit

# set the arrays to be used
N = 10
array1 = np.random.randint(1, 100, size=N)
array2 = np.random.randint(1, 100, size=N)


def add_arrays_sequential(arr1, arr2):
  result = []
  for i in range(len(arr1)):
    result.append(arr1[i] + arr2[i])
  return result

def add_arrays_vectorized(arr1, arr2):
  result = np.add(arr1, arr2)
  return result

def add_arrays_multithreaded(arr1, arr2):
  array_pairs = list(zip(arr1, arr2))
  with Pool(processes=5) as p:
      result = p.starmap(np.add, array_pairs)
  return result


# Run our various functions

seq_array = add_arrays_sequential(array1, array2)
print("Sequential array is",seq_array)

vec_array = add_arrays_vectorized(array1, array2)
print("Vectorized array is",vec_array)

mult_array = add_arrays_multithreaded(array1, array2)
print("Multithreaded array is",mult_array)


# Timing section
time_seq = timeit.timeit(lambda: add_arrays_sequential(array1, array2), number=10)
print("The sequential time is:", time_seq)

time_vec = timeit.timeit(lambda: add_arrays_vectorized(array1, array2), number=10)
print("The vectorized time is:", time_vec)

time_mult = timeit.timeit(lambda: add_arrays_multithreaded(array1, array2), number=10)
print("The multithreaded time is:", time_mult)


Sequential array is [111, 151, 106, 155, 90, 129, 106, 122, 103, 134]
Vectorized array is [111 151 106 155  90 129 106 122 103 134]
Multithreaded array is [111, 151, 106, 155, 90, 129, 106, 122, 103, 134]
The sequential time is: 0.00010000006295740604
The vectorized time is: 5.150004290044308e-05
The multithreaded time is: 8.357957100030035


Take a look at the output. Why might the multi-threaded function take longer? Which parallelization method was best for this task?