#Abstract

Lists and numpy arrays are among the most general and most widely used data types in Python. This assignment investigated whether numpy arrays are faster than lists for certain computations. In the end, numpy arrays turned out to be faster for every operation in this assignment. Depending on the type of operation, arrays were between roughly 1.3 and 150 times faster.

#Introduction 

The goal of this assignment is to generate random numbers and perform different operations on them using lists and numpy arrays, and then to determine which data type handles those computations faster.

Hypothesis: for the tasks in this assignment, numpy arrays should perform better than lists, because:
*   Lists may contain entries of different types, so before an operation such as addition Python has to check, for each entry, whether it supports that operation. In an array, by contrast, all entries have the same type
*   Array entries are stored compactly in contiguous memory, so accessing them in sequence is faster
*   Numpy can split an operation into parts and execute them in parallel (for example, with vectorized SIMD instructions inside its compiled loops)
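
The first two points can be illustrated with a small sketch. It assumes standard CPython and numpy; the variable names are illustrative and not part of the assignment code.

```python
import sys
import numpy as np

n = 1000
values_list = [float(i) for i in range(n)]  # each entry is a separate Python float object
values_array = np.array(values_list)        # one contiguous buffer of 64-bit floats

# A list may mix types, so Python must inspect every entry before operating on it.
mixed = [1, "two", 3.0]
print(sorted(type(v).__name__ for v in mixed))

# An array has a single dtype shared by all entries.
print(values_array.dtype, values_array.itemsize, values_array.nbytes)

# The list holds references; every float object adds its own overhead on top.
list_bytes = sys.getsizeof(values_list) + sum(sys.getsizeof(v) for v in values_list)
print(list_bytes, ">", values_array.nbytes)
```

The array needs exactly 8 bytes per entry, while the list needs a reference plus a full float object per entry, which is one reason sequential access is cheaper for arrays.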

Arrays and lists will be generated, and different computations will then be performed on them. Every step will be timed, and the results will be summarized in a table and analysed.

#Methodology

The following steps were performed for both arrays and lists:

*   Part A: Arrays/lists $a$ and $b$ of size 1000 were generated and filled with random entries
*   Part B: Array/list $c$ was created, with entries $c[i]=a[i]+b[i]$
*   Part C: For $a$, $b$ and $c$, the minimum, maximum, mean and root mean square (rms) were calculated
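
For reference, the root mean square in Part C is taken here in its standard form for a sequence $x$ of length $n$ (the symbol $\mathrm{rms}$ is notation introduced only for this note):

$$\mathrm{rms}(x)=\sqrt{\frac{1}{n}\sum_{i=0}^{n-1} x[i]^2}$$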

The time was measured using the `%timeit` command. It executes a given line of code many times in a loop and computes the average time per loop. It then repeats this process 5 times and reports the best result, in order to minimize the effect of background processes.
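
Outside a notebook, a similar measurement can be sketched with the standard `timeit` module. The function name `make_list` and the fixed loop count of 1000 are assumptions chosen for illustration; `%timeit` picks its loop count automatically.

```python
import timeit
from random import random

n = 1000
loops = 1000  # assumed loop count; %timeit chooses this automatically


def make_list():
    """Part A for lists: build a list of n random numbers."""
    return [random() for _ in range(n)]


# Five repetitions of `loops` executions each; each entry is the total time of one repetition.
totals = timeit.repeat(make_list, repeat=5, number=loops)

# Keep the best repetition and convert it to microseconds per loop.
best_per_loop_us = min(totals) / loops * 1e6
print(f"{loops} loops, best of 5: {best_per_loop_us:.1f} µs per loop")
```

Taking the minimum rather than the mean of the repetitions follows the same reasoning as above: slower repetitions are more likely to reflect interference from other processes than the cost of the code itself.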

#Results
Task|List time ($\mu s$)|Array time ($\mu s$)|Ratio (list time / array time)
-|-|-|-
Generate a|97.3|9.33|10.4
Generate b|96.6|9.32|10.4
Generate c|166|1.08|154
Calculate min of a|17.4|5.08|3.43
Calculate max of a|17.8|5.11|3.48
Calculate mean of a|4.39|3.30|1.33
Calculate rms of a|118|9.18|12.9
Calculate min of b|17.5|5.19|3.37
Calculate max of b|17.9|4.99|3.59
Calculate mean of b|4.69|3.24|1.45
Calculate rms of b|117|9.32|12.6
Calculate min of c|17.4|5.07|3.43
Calculate max of c|17.8|5.00|3.56
Calculate mean of c|4.59|3.22|1.43
Calculate rms of c|119|9.18|13.0

Numpy arrays performed better than lists in every one of the assigned tasks.
Generating arrays $a$ and $b$ filled with random numbers was about 10 times faster. Generating array $c$ was 154 times faster. Calculating the minimum and maximum was on average about 3.5 times faster. Calculating the mean was 1.3 to 1.5 times faster. Calculating the root mean square was about 13 times faster.
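
The ratio column is simply the list time divided by the array time. A short sketch of that arithmetic, using a few rows copied from the table (the dictionary name `timings_us` is illustrative):

```python
# (list time, array time) in microseconds, copied from the results table
timings_us = {
    "Generate a": (97.3, 9.33),
    "Generate c": (166, 1.08),
    "Calculate min of a": (17.4, 5.08),
    "Calculate mean of a": (4.39, 3.30),
    "Calculate rms of a": (118, 9.18),
}

# Ratio > 1 means the array version was faster.
ratios = {task: list_t / array_t for task, (list_t, array_t) in timings_us.items()}

for task, ratio in ratios.items():
    print(f"{task}: {ratio:.3g}")
```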

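This is most likely due to vectorization: numpy handles whole arrays inside compiled loops that can process several entries at once. For $a$ and $b$ it generates all random numbers and fills the array in a single call, for $c$ it adds all entries in one operation, and so on, whereas the list versions run a Python-level loop entry by entry. The small gain for the mean is plausibly because the built-in `sum` already runs in compiled code.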

#Conclusion
Consistent with the hypothesis, numpy arrays were indeed faster than lists. This is most likely because numpy performs operations on whole arrays in vectorized form rather than entry by entry. The topic can be investigated further by varying the size of the arrays to learn whether numpy arrays remain faster and whether the speed ratio increases.
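
One possible starting point for that follow-up is sketched below. It times only the element-wise addition from Part B for a few sizes; the helper name `time_addition`, the chosen sizes and the loop counts are assumptions, not part of the original measurements.

```python
import timeit
from random import random

import numpy as np


def time_addition(n, repeat=5, number=200):
    """Return the best per-loop times in seconds (list, array) for adding two sequences of size n."""
    a_list = [random() for _ in range(n)]
    b_list = [random() for _ in range(n)]
    a_arr = np.array(a_list)
    b_arr = np.array(b_list)

    # Same operation as Part B, once with a list comprehension and once with numpy.
    t_list = min(timeit.repeat(lambda: [x + y for x, y in zip(a_list, b_list)],
                               repeat=repeat, number=number)) / number
    t_arr = min(timeit.repeat(lambda: a_arr + b_arr,
                              repeat=repeat, number=number)) / number
    return t_list, t_arr


for size in (10, 100, 1000, 10000):
    t_list, t_arr = time_addition(size)
    print(f"n={size}: list {t_list * 1e6:.2f} µs, array {t_arr * 1e6:.2f} µs, "
          f"ratio {t_list / t_arr:.1f}")
```

The printed ratios depend on the machine and the numpy version, so no values are shown here.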

#References
https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347

In [None]:
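import numpy as np
from random import random

n = 1000  # number of entries in every list/array


def rms_list(x):
    """Root mean square of a list, computed with an explicit Python loop."""
    s = 0
    for i in range(n):
        s = s + x[i] * x[i]
    rms = (s / n) ** 0.5
    return rms


def rms_array(x):
    """Root mean square of a numpy array, computed with vectorized operations."""
    return np.sqrt(1 / n * np.sum(x**2))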
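
Before comparing timings, it may be worth confirming that the two helpers return the same value. The sketch below repeats both definitions so that it runs on its own; the names `sample` and `difference` and the tolerance `1e-12` are assumptions for illustration.

```python
from random import random

import numpy as np

n = 1000


def rms_list(x):
    s = 0
    for i in range(n):
        s = s + x[i] * x[i]
    return (s / n) ** 0.5


def rms_array(x):
    return np.sqrt(1 / n * np.sum(x**2))


# Evaluate both versions on the same data; they may differ only by floating-point rounding.
sample = [random() for _ in range(n)]
difference = abs(rms_list(sample) - rms_array(np.array(sample)))
print(difference < 1e-12)
```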

Lists Part A

In [None]:
a=[random() for i in range(n)]
%timeit a=[random() for i in range(n)]

10000 loops, best of 5: 97.3 µs per loop


In [None]:
b=[random() for i in range(n)]
%timeit b=[random() for i in range(n)]

10000 loops, best of 5: 96.6 µs per loop


Lists Part B

In [None]:
c=[a[i]+b[i] for i in range(n)]
%timeit c=[a[i]+b[i] for i in range(n)]

10000 loops, best of 5: 166 µs per loop


Lists Part C

In [None]:
%timeit min(a)

100000 loops, best of 5: 17.4 µs per loop


In [None]:
%timeit max(a)

100000 loops, best of 5: 17.8 µs per loop


In [None]:
%timeit sum(a)/n

The slowest run took 6.20 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.39 µs per loop


In [None]:
%timeit rms_list(a)

10000 loops, best of 5: 118 µs per loop


In [None]:
%timeit min(b)

100000 loops, best of 5: 17.5 µs per loop


In [None]:
%timeit max(b)

100000 loops, best of 5: 17.9 µs per loop


In [None]:
%timeit sum(b)/n

The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.69 µs per loop


In [None]:
%timeit rms_list(b)

10000 loops, best of 5: 117 µs per loop


In [None]:
%timeit min(c)

100000 loops, best of 5: 17.4 µs per loop


In [None]:
%timeit max(c)

100000 loops, best of 5: 17.8 µs per loop


In [None]:
%timeit sum(c)/n

The slowest run took 4.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.59 µs per loop


In [None]:
%timeit rms_list(c)

10000 loops, best of 5: 119 µs per loop


Arrays Part A

In [None]:
a=np.random.rand(n)
%timeit a=np.random.rand(n)

The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.33 µs per loop


In [None]:
b=np.random.rand(n)
%timeit b=np.random.rand(n)

The slowest run took 4.35 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.32 µs per loop


Arrays Part B

In [None]:
c=a+b
%timeit c=a+b

The slowest run took 22.38 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 1.08 µs per loop


Arrays Part C

In [None]:
%timeit np.amin(a)

The slowest run took 91.53 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 5.08 µs per loop


In [None]:
%timeit np.amax(a)

The slowest run took 14.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 5.11 µs per loop


In [None]:
%timeit a.sum()/n

The slowest run took 31.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.3 µs per loop


In [None]:
%timeit rms_array(a)

The slowest run took 12.19 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.18 µs per loop


In [None]:
%timeit np.amin(b)

The slowest run took 32.50 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 5.19 µs per loop


In [None]:
%timeit np.amax(b)

The slowest run took 16.92 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 4.99 µs per loop


In [None]:
%timeit b.sum()/n

The slowest run took 21.84 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.24 µs per loop


In [None]:
%timeit rms_array(b)

The slowest run took 13.12 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.32 µs per loop


In [None]:
%timeit np.amin(c)

The slowest run took 14.03 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 5.07 µs per loop


In [None]:
%timeit np.amax(c)

The slowest run took 17.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 5 µs per loop


In [None]:
%timeit c_mean=c.sum()/n

The slowest run took 27.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 3.22 µs per loop


In [None]:
%timeit rms_array(c)

The slowest run took 13.77 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 5: 9.18 µs per loop
