**Section 4.3 - Example: Adding Two Arrays**

We will write functions to add two 1D arrays in a few different ways.


1.   In serial, using a `for` loop
2.   Vectorized, using `numpy`
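3.   In parallel, using `Pool` and `map()` (called "multi-threaded" below, although `multiprocessing.Pool` uses worker processes rather than threads)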






In [None]:
# Script with different functions to add arrays

# Import necessary modules
import random
import numpy as np
# your code here...


# Create the arrays to be added
N = 10         # number of entries in the array
array1 = np.random.randint(1, 100, size=N)
array2 = np.random.randint(1, 100, size=N)


# Define serial function
# your code here...


# Define vectorized function
# your code here...


# Define multi-threaded function
# your code here...


# Run each function and print the output
# your code here...

**Section 5 - Analyzing Performance**

To best demonstrate how to check the performance of Python code, let's consider the examples below showing different ways to use the `timeit` module.

This is a built-in Python module that can be used either from the command line or within a script.

In this script, we'll compare two methods of concatenating the numbers 1-99 into a string. We will define two functions, then compare how long they take to run using `timeit.timeit()`.

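The general syntax is `timeit.timeit(stmt, setup, timer, number)`, where `stmt` is the code to time, `setup` is code that runs once before timing starts, `timer` is the clock function to use, and `number` is how many times `stmt` is executed (1,000,000 by default).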
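As a minimal sketch of this call, assuming two hypothetical helper functions (`concat_plus` and `concat_join` are illustrative names, not part of the workshop script), a callable can be passed directly as `stmt`, leaving `setup` and `timer` at their defaults:

```python
import timeit

def concat_plus():
    # Build the string by repeated concatenation in a loop
    s = ""
    for n in range(1, 100):
        s += str(n)
    return s

def concat_join():
    # Build the same string with a single join over a generator
    return "".join(str(n) for n in range(1, 100))

# `number` is how many times each function is called; the return value is
# the total elapsed time in seconds for all of those calls combined
t_plus = timeit.timeit(concat_plus, number=10_000)
t_join = timeit.timeit(concat_join, number=10_000)

print(f"+= loop: {t_plus:.4f} s total")
print(f"join:    {t_join:.4f} s total")
```

The measured times depend on the machine and on `number`, so treat them as relative comparisons rather than absolute figures.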






In [None]:
# Script to time two different functions with timeit
# your code here...


`timeit.timeit()` returns the total time, in seconds, taken by all `number` executions of the code, not the time per iteration. In our example, this is ~2 seconds to run our functions 100,000 times. Consider the differences in the methods and why you might want to use one over the other.


Another method of timing code with `timeit` is to pass the code as a string to `timeit.timeit()`. This is made simple by enclosing the formatted code snippet in triple quotes and storing it in a variable. The same can be done for any setup code.
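A small sketch of the string-based form is shown here. The statement being timed (a list of square roots) is only a hypothetical placeholder; any snippet could take its place:

```python
import timeit

# Setup snippet: executed once, before timing starts
mysetup = """
import math
"""

# Statement to time: executed `number` times
mycode = """
roots = [math.sqrt(x) for x in range(100)]
"""

# Names imported in the setup string are visible to the timed statement
elapsed = timeit.timeit(stmt=mycode, setup=mysetup, number=10_000)
print(f"Total time for 10,000 runs: {elapsed:.4f} s")
```

Because the code is passed as a string, it runs in its own namespace, which is why the import has to live in the setup snippet.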

In [None]:
# Script to time code in triple quotes
import timeit

# setup snippet to be executed only once
mysetup = ""

# code snippet whose execution time is to be measured
# your code here...


# timeit statement
# your code here...

Another (maybe better) way to check the timing of your code is to profile it using the `profile` or `cProfile` modules. These profile your entire code as it runs and allow for a more holistic view to identify where speedup is most needed.

Of the two modules, `cProfile` is preferred because of its reduced overhead and suitability for long-running programs.

Let's use `cProfile` with a script that contains two functions to perform the same task - summing a series of random numbers. The original script is provided to edit.

In [30]:
# Python script without cProfile incorporated
import random

# set number of integers in range
N = 1000       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def main(N):
  # Call function A
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  # Call function B
  second_result = function_B(N)
  print("Result of function_B:", second_result)

# the `if __name__ == "__main__"` guard ensures main() runs only when the script is executed directly, not when it is imported
if __name__ == "__main__":
  main(N)


Result of function_A: 48952
Result of function_B: 50375


Now that we have our script, we can edit it to incorporate `cProfile`. To do this, import `cProfile` at the beginning of the code and edit how `main()` is run.
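One way this can look is sketched here with a hypothetical function `sum_random` standing in for `main()`. The function is handed to a `cProfile.Profile` object, which records every call made while it runs:

```python
import cProfile
import random

def sum_random(n):
    # Hypothetical stand-in for the script's main(): sum n random integers
    return sum(random.randint(1, 100) for _ in range(n))

profiler = cProfile.Profile()

# runcall() executes the function under the profiler and returns its result
result = profiler.runcall(sum_random, 1000)
print("Result:", result)

# Print one row per function called, with call counts and timings
profiler.print_stats()
```

In a script or notebook cell where `main` and `N` are defined at the top level, the one-line form `cProfile.run("main(N)")` is a commonly used alternative.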

Keep the original cell above intact for reference, and edit the copy below.

In [33]:
# Copy of Python script without cProfile incorporated - to edit!
import random


# set number of integers in range
N = 1000       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def main(N):
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  second_result = function_B(N)
  print("Result of function_B:", second_result)

if __name__ == "__main__":
  main(N)


Result of function_A: 48742
Result of function_B: 48271


This is a useful output, but we can still improve it! We can use the `pstats` module to sort the output according to the number of calls.

In [None]:
# Another copy of the Python script to edit!
import random


# set number of integers in range
N = 1000       # this can be increased or decreased


def function_A(N):
  total = 0
  for x in range(N):
    total += random.randint(1, 100)
  return total


def function_B(N):
  total = sum(random.randint(1, 100) for x in range(N))
  return total


def main(N):
  first_result = function_A(N)
  print("Result of function_A:", first_result)

  second_result = function_B(N)
  print("Result of function_B:", second_result)

if __name__ == "__main__":
  main(N)


The output can be sorted according to any of the columns, and can be shortened to however many lines you like. Specify the number of lines to output as the argument to `print_stats()`, for example `print_stats(10)`.
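A short sketch of sorting and truncating the report follows. The function `sum_random` is a hypothetical stand-in for the workshop script, and the choice of five lines is arbitrary:

```python
import cProfile
import pstats
import random

def sum_random(n):
    # Hypothetical workload: sum n random integers
    return sum(random.randint(1, 100) for _ in range(n))

# Collect profiling data for one call
profiler = cProfile.Profile()
profiler.runcall(sum_random, 1000)

# Wrap the collected data in a Stats object, sort by number of calls,
# and print only the first 5 lines of the table
stats = pstats.Stats(profiler)
stats.sort_stats("ncalls").print_stats(5)
```

Other sort keys such as `"cumulative"` or `"tottime"` can be swapped in to order the table by cumulative or internal time instead.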

Now, let's return to the previous array-adding example to get some information about the timing. We have three distinct functions, so let's time them using the `timeit` module. Edit the provided script to incorporate the timing portion.
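Since these functions take arguments, one option is to wrap each call in a `lambda` so that `timeit` can invoke it with no arguments. The sketch below assumes a hypothetical function `add_lists` in place of the array-adding functions:

```python
import timeit

def add_lists(a, b):
    # Hypothetical stand-in for one of the array-adding functions
    return [x + y for x, y in zip(a, b)]

a = list(range(10))
b = list(range(10, 20))

# The lambda captures a and b, giving timeit a zero-argument callable
elapsed = timeit.timeit(lambda: add_lists(a, b), number=10_000)
print(f"add_lists: {elapsed:.4f} s for 10,000 calls")
```

The same wrapper pattern applies to each function in turn, keeping `number` identical across them so the totals are comparable.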

In [2]:
# Code from earlier - to edit!

import random
import numpy as np
from multiprocessing import Pool

# set the arrays to be used
N = 10
array1 = np.random.randint(1, 100, size=N)
array2 = np.random.randint(1, 100, size=N)


def add_arrays_sequential(arr1, arr2):
  result = []
  for i in range(len(arr1)):
    result.append(arr1[i] + arr2[i])
  return result

def add_arrays_vectorized(arr1, arr2):
  result = np.add(arr1, arr2)
  return result

def add_arrays_multithreaded(arr1, arr2):
  # Note: multiprocessing.Pool distributes the work across worker processes, not threads
  array_pairs = list(zip(arr1, arr2))
  with Pool(processes=5) as p:
    result = p.starmap(np.add, array_pairs)
  return result


# Run our various functions

seq_array = add_arrays_sequential(array1, array2)
print("Sequential array is",seq_array)

vec_array = add_arrays_vectorized(array1, array2)
print("Vectorized array is",vec_array)

mult_array = add_arrays_multithreaded(array1, array2)
print("Multithreaded array is",mult_array)


Sequential array is [19, 124, 160, 56, 19, 28, 117, 124, 156, 113]
Vectorized array is [ 19 124 160  56  19  28 117 124 156 113]
Multithreaded array is [19, 124, 160, 56, 19, 28, 117, 124, 156, 113]


Take a look at the output. Why might the multi-threaded function take longer? Which parallelization method was best for this task?