In [2]:
import numpy as np

1. Benchmarks to test contiguity: list, array, np array. Are Numpy arrays contiguous by default?

2. Numpy calls an underlying C implementation

3. Call BLAS through Numpy somehow?

In [3]:
# This creates an array without initializing it
a_numpy = np.empty((20000, 10000))

In [4]:
# As we can see, the array is contiguous
np.info(a_numpy)

class:  ndarray
shape:  (20000, 10000)
strides:  (80000, 8)
itemsize:  8
aligned:  True
contiguous:  True
fortran:  False
data pointer: 0x280000000
byteorder:  little
byteswap:  False
type: float64


In [5]:
print("Size of a: ", a_numpy.nbytes / 1024 / 1024, " MB")
print("Nr bytes: ", a_numpy.nbytes)

Size of a:  1525.87890625  MB
Nr bytes:  1600000000


Why 1600000000 bytes? Because we have float64 elements (8 bytes each), and there are 200 million of them (20000 × 10000).
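The same arithmetic as a quick sanity check. It uses the shape from the cell above but only allocates a tiny `np.empty` array, so nothing large is created:

```python
import numpy as np

shape = (20000, 10000)                     # shape used in the cell above
itemsize = np.dtype(np.float64).itemsize   # 8 bytes per float64 element

n_elements = shape[0] * shape[1]           # 200,000,000 elements
n_bytes = n_elements * itemsize            # 1,600,000,000 bytes
print(n_elements, n_bytes, n_bytes / 1024 / 1024)  # 200000000 1600000000 1525.87890625

# nbytes follows the same rule for any array: number of elements * itemsize
small = np.empty((20, 10))
print(small.nbytes == small.size * small.itemsize)  # True
```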

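<font color="red"> Interesting: Colab doesn't show this memory as used yet, even though the array should fill most of the RAM. This is most likely because np.empty only reserves virtual memory; the operating system assigns physical pages only when they are first written. </font>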

Let's see how much more memory a Python list uses than a Numpy array

In [6]:
from scipy.stats import randint

In [7]:
# Generate random ints in [0, 9000), i.e. 0-8999 (uniform distribution)
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.randint.html
a_numpy = randint.rvs(low=0, high=9000, size=1000000)
a_list = list(a_numpy)

In [9]:
from pympler.asizeof import asizeof

print("Python list: Size in MB: ", asizeof(a_list) / 1024 / 1024)

Python list: Size in MB:  37.71116638183594


In [10]:
print("Numpy array: Size in MB: ", asizeof(a_numpy) / 1024 / 1024)

Numpy array: Size in MB:  7.6295166015625


<font color="orange">True: the list takes about 5x as much memory as the Numpy array, as claimed in [this post](https://medium.com/swlh/numpy-why-is-it-so-fast-8087f4da4d79)</font>
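A rough sketch of where the gap comes from, assuming a 64-bit CPython (exact object sizes are implementation details). The Numpy array stores raw 8-byte integers in one buffer. The list stores one pointer per element plus a separate scalar object for each element, and `sys.getsizeof(lst)` counts only the pointers:

```python
import sys
import numpy as np

n = 100_000
a = np.arange(n, dtype=np.int64)
lst = list(a)  # each element becomes a separate boxed np.int64 object

# Numpy: one contiguous buffer of raw 8-byte values
array_bytes = a.nbytes
# List: one pointer per element plus the list header (what getsizeof counts) ...
pointer_bytes = sys.getsizeof(lst)
# ... plus a separate scalar object per element, which getsizeof(lst) does not count
object_bytes = sum(sys.getsizeof(x) for x in lst)

print("Numpy buffer MB:  ", array_bytes / 1024**2)
print("list pointers MB: ", pointer_bytes / 1024**2)
print("list objects MB:  ", object_bytes / 1024**2)
```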

#### Testing bad code found [here](https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347)

In [34]:
import time

start_time = time.time()
count = 1000000000 # one billion

normalRange = range(count)
print(sum(normalRange))

print("Time taken: %s seconds" % (time.time() - start_time))

499999999500000000
Time taken: 8.36018705368042 seconds


In [35]:
import time
import numpy as np

start_time = time.time()
count = 1000000000 # one billion

numpyRange = np.arange(count)
print(numpyRange.sum())

print("Time taken: %s seconds" % (time.time() - start_time))

499999999500000000
Time taken: 1.2841331958770752 seconds


<font color="orange">But this benchmark makes the mistake of timing more than just the target function: it also includes creating the range or array!</font>
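Outside IPython, one way to time only the target function is the standard `timeit` module's `setup` argument. The setup code runs before timing and is excluded from the measurement. A minimal sketch on a smaller size (1 million instead of 1 billion) so it runs quickly:

```python
import timeit

n = 1_000_000
# The setup code runs before timing and is excluded from the measurement,
# so only the statement itself (the summation) is timed.
t_numpy = timeit.timeit("r.sum()", setup=f"import numpy as np; r = np.arange({n})", number=5)
t_python = timeit.timeit("sum(r)", setup=f"r = range({n})", number=5)

print(f"Numpy sum:  {t_numpy / 5:.6f} s per call")
print(f"Python sum: {t_python / 5:.6f} s per call")
```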

In [38]:
%%timeit

normalRange = range(count)

83 ns ± 0.0346 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
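The 83 ns result is expected. `range` is lazy: it stores only start, stop and step and produces integers on demand. `np.arange(count)`, by contrast, has to allocate and fill all one billion int64 values (8 bytes each, about 8 GB). A small check using only `sys.getsizeof` and `nbytes`:

```python
import sys
import numpy as np

big = range(10**9)   # no elements are materialized
small = range(10)

# A range object has the same size no matter how many values it represents
print(sys.getsizeof(big) == sys.getsizeof(small))  # True
print(len(big))  # 1000000000, computed from start/stop/step

# np.arange materializes every element, so memory grows with the length
arr = np.arange(10**6, dtype=np.int64)
print(arr.nbytes)  # 8000000
```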


In [40]:
%%timeit

numpyRange = np.arange(count)

835 ms ± 10 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<font color="orange">Let's run the benchmark without all the clutter</font>

In [36]:
%%timeit

sum(normalRange)

9.61 s ± 821 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [37]:
%%timeit

np.sum(numpyRange)

160 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


<font color="red">About 60x faster!</font>

Now let's test contiguity


We're going to use a simple program that sums the elements by accessing them all sequentially. If one structure is contiguous and the other is not, we should see a large difference in access times.
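For Numpy arrays, contiguity doesn't have to be inferred from timings. The `flags` attribute reports it directly, and `strides` shows how many bytes to step along each axis. A small sketch on a toy 3x4 array (not the benchmark data) showing a contiguous array, a non-contiguous strided view, and a contiguous copy:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.flags["C_CONTIGUOUS"], a.strides)            # True (32, 8)

# Taking every other column creates a view that skips elements in memory
view = a[:, ::2]
print(view.flags["C_CONTIGUOUS"], view.strides)      # False (32, 16)

# np.ascontiguousarray copies the view into a fresh contiguous buffer
packed = np.ascontiguousarray(view)
print(packed.flags["C_CONTIGUOUS"], packed.strides)  # True (16, 8)
```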

In [49]:
a_numpy = randint.rvs(low=0, high=9000, size=100000000) # I increased this
a_list = list(a_numpy)

In [50]:
%%timeit

summation = 0

for i in range(len(a_list)):
    summation += a_list[i]

# print(summation)

4.51 s ± 31.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [51]:
%%timeit

summation = 0

for i in range(len(a_numpy)):
    summation += a_numpy[i]

# print(summation)

6.93 s ± 61.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<font color="red">Wow! Numpy access is about 50% slower than regular Python lists. That's surprising at first, because lists are heterogeneous: they can hold elements of different types and sizes. But a list actually stores same-sized pointers to its elements, so indexing is still a simple offset computation. The extra cost on the Numpy side comes from creating a new Python scalar object on every `a_numpy[i]` access. See [this great StackOverflow post](https://stackoverflow.com/questions/35020604/why-is-numpy-list-access-slower-than-vanilla-python) for more details on why this happens</font>
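A quick way to see this boxing, using a toy array rather than the 100-million-element one: indexing a list hands back the same existing object each time, while indexing a Numpy array builds a fresh scalar object on every access.

```python
import numpy as np

a_numpy = np.arange(5, dtype=np.int64)
a_list = list(a_numpy)  # the list holds references to already-created objects

# Indexing a list returns a reference to an object that already exists
print(a_list[0] is a_list[0])    # True

# Indexing a Numpy array creates ("boxes") a new scalar object on every access
print(a_numpy[0] is a_numpy[0])  # False
print(type(a_numpy[0]))          # <class 'numpy.int64'>
```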

And yet, when we try this

In [None]:
%%timeit

sum(a_list)

In [None]:
%%timeit

summation = np.sum(a_numpy)
# print(summation)

In [None]:
print(np.sum(a_numpy))

<font color="orange">Numpy is dramatically faster here, so much so that I had to remove the print() because it noticeably affected the runtime. But sum(list) is also 2x faster!</font>

<font color="red">So Numpy isn't "faster" by some kind of magic; in some cases it's even slower. Generally, if your code loops over array elements in Python, you're not taking advantage of Numpy. See my Part 2 for a deep dive into how Numpy works</font>
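To make the point concrete, here is a minimal sketch with two hypothetical helpers, `loop_sum` and `vectorized_sum`, on a smaller random array. Both compute the same total, but only the second keeps the loop inside Numpy's compiled code. Timings vary by machine, so none are claimed here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 9000, size=100_000)

def loop_sum(arr):
    # Python-level loop: every arr[i] boxes a scalar and goes through the interpreter
    total = 0
    for i in range(len(arr)):
        total += arr[i]
    return total

def vectorized_sum(arr):
    # One call; the loop runs in compiled code over the raw buffer
    return arr.sum()

print(loop_sum(x) == vectorized_sum(x))  # True

t_loop = timeit.timeit(lambda: loop_sum(x), number=3)
t_vec = timeit.timeit(lambda: vectorized_sum(x), number=3)
print(f"loop: {t_loop / 3:.5f} s   vectorized: {t_vec / 3:.5f} s")
```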


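Now let's compare to [Python arrays](https://docs.python.org/3/library/array.html). To do this, we need to specify the data type we're using.
As we can see below, the Numpy array has type int64. On 64-bit Linux (which Colab runs), that corresponds to typecode 'l' (C `long`) in Python arrays; 'q' (C `long long`) is the portable 8-byte choice, since 'l' is only 4 bytes on Windows.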
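A small sketch for checking typecode sizes on your own machine before copying data out of an int64 Numpy array. It uses a toy 10-element array, and the choice of 'q' is my assumption for portability, not something the notebook requires:

```python
from array import array
import numpy as np

a_numpy = np.arange(10, dtype=np.int64)

# Item size of each candidate typecode on this platform
for code in ("l", "q"):
    print(code, array(code).itemsize)

# Pick the typecode whose item size matches the Numpy dtype
typecode = "q"
assert array(typecode).itemsize == a_numpy.dtype.itemsize
a_array = array(typecode, a_numpy.tolist())
print(a_array.buffer_info()[1], a_array.itemsize)  # 10 8
```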

In [None]:
np.info(a_numpy)

In [None]:
from array import array

a_array = array('l', a_list)

#### Let's see how much memory Python arrays take vs. Numpy vs. Lists

In [None]:
import sys

print("Python array: Size in MB: ", sys.getsizeof(a_array) / 1024 / 1024)

<font color="orange">As we can see, Python arrays take almost exactly the same amount of space as a Numpy array</font>
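Both structures store the same raw 8-byte values in one contiguous buffer, so their payloads should match exactly. The small extra that `sys.getsizeof` reports for the array is its object header (and possibly some spare capacity). A sketch comparing payload sizes on a 1-million-element example:

```python
import sys
from array import array
import numpy as np

a_numpy = np.arange(1_000_000, dtype=np.int64)
a_array = array("q", a_numpy.tolist())

# Raw payload: number of elements times bytes per element
numpy_payload = a_numpy.nbytes
array_payload = a_array.buffer_info()[1] * a_array.itemsize
print(numpy_payload == array_payload)  # True

# getsizeof adds the object header (and any spare capacity) on top of the payload
print(sys.getsizeof(a_array) - array_payload)
```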


### Let's sum up the elements (sequential access)

In [None]:
%%timeit

summation = 0

for i in range(len(a_array)):
    summation += a_array[i]

# print(summation)

<font color="orange">But arrays are supposed to have faster access! :(</font>

Let's try the built-in sum:

In [None]:
%%timeit

sum(a_array)

Ah, there's the benefit we expected from arrays: 4.26 s vs. 10 s to run, about 2x as fast as summing lists.


This is most likely due to contiguous memory access. A list's own array of pointers is contiguous, but I cannot directly see whether the objects those pointers refer to are stored contiguously.
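In CPython, `id()` returns an object's memory address (documented as an implementation detail), so we can at least peek at where a list's elements live. A sketch assuming CPython, on a toy 10-element list: the gaps between consecutive objects depend on how and when the allocator placed them, so no particular values are claimed. The Numpy array, by contrast, keeps its values back-to-back, one itemsize apart.

```python
import numpy as np

a_list = list(np.arange(10, dtype=np.int64))

# Memory addresses of the boxed np.int64 objects the list points to (CPython only)
addresses = [id(x) for x in a_list]
gaps = [nxt - cur for cur, nxt in zip(addresses, addresses[1:])]
print(gaps)  # spacing between consecutive objects, in bytes

# A Numpy array stores its values in one buffer, exactly itemsize bytes apart
arr = np.arange(10, dtype=np.int64)
print(arr.strides)  # (8,)
```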

In [None]:
# Delete these objects because Colab Free runs out of memory
del a_array
del a_list

## Let's test the hypothesis that Numpy parallelizes computation

##### Let's create another pair of arrays and do some matrix-matrix multiplication while watching CPU core usage

Check out [this cool library](https://github.com/InfuseAI/colab-xterm) to do this on Colab

In [7]:
import numpy as np
from scipy.stats import randint

In [10]:
a_numpy = randint.rvs(low=0, high=9000, size=(1000, 20000))
b_numpy = randint.rvs(low=0, high=9000, size=(20000, 1000))

In [11]:
%%timeit -n 1

a_numpy @ b_numpy

1.67 s ± 6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
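One caveat when reading CPU usage here, as far as I know: Numpy dispatches matrix multiplication to BLAS only for floating-point (and complex) dtypes. The arrays above are int64, so this product most likely runs through Numpy's own non-BLAS loop. A minimal sketch comparing the two paths on smaller, made-up sizes; `np.show_config()` prints which BLAS library your Numpy build links against.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a_int = rng.integers(0, 9000, size=(200, 1000))  # int64, like the arrays above
b_int = rng.integers(0, 9000, size=(1000, 200))
a_f, b_f = a_int.astype(np.float64), b_int.astype(np.float64)

start = time.perf_counter()
c_int = a_int @ b_int   # integer dtype: Numpy's own loop, no BLAS
t_int = time.perf_counter() - start

start = time.perf_counter()
c_f = a_f @ b_f         # float64: handed to the BLAS library Numpy was built with
t_f = time.perf_counter() - start

# Every entry is below 2**53 (max 1000 * 8999**2), so the float result is exact
print(np.array_equal(c_int, c_f.astype(np.int64)))  # True
print(f"int64: {t_int:.3f} s   float64: {t_f:.3f} s")
```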


In [None]:
np.info(a_numpy)

#### Now let's test [Dask](https://www.dask.org/get-started)

In [13]:
del a_numpy
del b_numpy

In [1]:
import numpy as np
from scipy.stats import randint
import dask.array

In [2]:
a_dask = dask.array.from_array(randint.rvs(low=0, high=9000, size=(10000, 10000)))
b_dask = dask.array.from_array(randint.rvs(low=0, high=9000, size=(10000, 10000)))

In [None]:
%%timeit

dask.array.matmul(a_dask, b_dask).compute()
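Dask's potential speedup comes from splitting each array into chunks and scheduling the per-chunk products as separate tasks that can run on different cores. The sketch below shows only that tiling idea in plain Numpy, run sequentially with no scheduler. `blocked_matmul` and its `block` size are hypothetical names for illustration, not how Dask is implemented internally.

```python
import numpy as np

def blocked_matmul(a, b, block=256):
    """Tiled product: each C tile is the sum over k of A[i, k] tile @ B[k, j] tile."""
    n, m = a.shape
    m2, p = b.shape
    assert m == m2, "inner dimensions must match"
    c = np.zeros((n, p), dtype=np.result_type(a, b))
    for i in range(0, n, block):
        for j in range(0, p, block):
            for k in range(0, m, block):
                # Each tile product is independent work a scheduler could run in parallel;
                # only the accumulation into the same C tile ties them together
                c[i:i + block, j:j + block] += a[i:i + block, k:k + block] @ b[k:k + block, j:j + block]
    return c

rng = np.random.default_rng(0)
a = rng.integers(0, 9000, size=(500, 700))
b = rng.integers(0, 9000, size=(700, 300))
print(np.array_equal(blocked_matmul(a, b, block=128), a @ b))  # True
```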

Let's also sum up all elements of the product

In [None]:
dask.array.sum(dask.array.matmul(a_dask, b_dask)).compute()

In [None]:
%%timeit

# This was for int64
dask.array.matmul(a_dask, b_dask).compute()

In [20]:
del a_dask
del b_dask

<font color="orange">Trying lower precision</font>

In [None]:
a = randint.rvs(low=0, high=9000, size=(10000, 100000))
np.info(a)

In [None]:
b = a.astype(np.int16)
np.info(b)
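Lower precision cuts memory proportionally: int16 uses 2 bytes per element instead of 8. The catch is that values up to 8999 fit in int16 (max 32767), but their products and sums don't, and Numpy integer arrays wrap around silently on overflow rather than raising. A toy 2x2 sketch:

```python
import numpy as np

a = np.full((2, 2), 9000, dtype=np.int64)
b = a.astype(np.int16)           # 9000 fits: int16 max is 32767

print(a.nbytes, b.nbytes)        # 32 8
print(np.iinfo(np.int16).max)    # 32767

exact = a @ a      # each entry is 9000*9000 + 9000*9000 = 162,000,000
wrapped = b @ b    # computed in int16, so it overflows and silently wraps around
print(exact[0, 0], wrapped[0, 0], wrapped.dtype)
```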

https://medium.com/swlh/numpy-why-is-it-so-fast-8087f4da4d79


This guy says


1. Lists cost more memory than np arrays, since each element carries its own metadata

2. For the same reason, np arrays need no per-element type checking when reading

3. Np arrays are stored contiguously


https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347

This one doesn't elaborate on reasons why

https://towardsdatascience.com/is-your-numpy-optimized-for-speed-c1d2b2ba515

This guy gives the real reason but doesn't explain why. He also shows how to choose a BLAS library.


https://www.geeksforgeeks.org/why-numpy-is-faster-in-python/


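This one is plain wrong. Numpy's element-wise operations (like sum) don't spread work across CPU cores; only some operations, such as BLAS-backed matrix multiplication, may run multithreaded. If multi-core parallelism were the explanation, the speedup would be limited by the number of CPU cores you have, yet the speedups measured above are far larger than that.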

<font color="red">Twice as fast when B is in Fortran (column-major) order</font>
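Fortran order means column-major storage: consecutive elements of a column sit next to each other in memory. Each entry of `a @ b` combines a row of A with a column of B, so a plausible explanation for the speedup is that a column-major B turns those column reads into contiguous ones. That is my reading, not something measured here. The strides of a toy array show the layout difference:

```python
import numpy as np

b = np.arange(6, dtype=np.int64).reshape(2, 3)  # C (row-major) order
b_f = np.asfortranarray(b)                       # same values, column-major copy

print(b.strides, b.flags["C_CONTIGUOUS"])        # (24, 8) True
print(b_f.strides, b_f.flags["F_CONTIGUOUS"])    # (8, 16) True
print(np.array_equal(b, b_f))                    # True: only the layout differs
```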

In [None]:
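# Requires a_numpy and b_numpy from the matrix-multiplication cells above
# (re-run those cells if they were deleted)
b_numpy_fortran = np.asfortranarray(b_numpy)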

In [None]:
%%timeit

a_numpy @ b_numpy_fortran

In [None]:
# del a_numpy
del b_numpy_fortran