<img src="numpy-hd.jpg">

Numpy's most useful feature is the n-dimensional array object, known as `ndarray`.

In [10]:
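import numpy as np  # Import numpy under the conventional np alias
import sys  # Provides sys.getsizeof() for measuring object sizes
import time  # Provides time.time() for timing operations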

**To create an array**<br>
We call Numpy's `array` function and pass it a list as an argument.

In [4]:
a = np.array([1, 2, 3])  # Create a 1-D array from a Python list
a

array([1, 2, 3])
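The "n-dimensional" part means the same call also accepts nested lists. A minimal sketch (the name `b` is just for illustration) showing a 2-D array and the `shape`, `ndim` and `dtype` attributes that every ndarray has:

```python
import numpy as np

# A nested list becomes a 2-D array: 2 rows, 3 columns
b = np.array([[1, 2, 3],
              [4, 5, 6]])

print(b.shape)  # (2, 3): number of rows and columns
print(b.ndim)   # 2: number of dimensions
print(b.dtype)  # an integer type; its width (int32 or int64) depends on the platform
```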

The result looks and behaves very much like a list. For example, you can access the elements by index:

In [5]:
a[0]  # The first element of the array, just as with a list

1
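Indexing goes a little further than with lists. A short sketch of negative indexing, slicing, and the `[row, column]` form that multi-dimensional arrays accept (the 2-D array `m` is made up for the example):

```python
import numpy as np

a = np.array([1, 2, 3])
print(a[-1])    # 3: negative indices count from the end, as with lists
print(a[1:])    # [2 3]: a slice is a view of the same data, nothing is copied

m = np.array([[1, 2, 3],
              [4, 5, 6]])
print(m[1, 2])  # 6: row 1, column 2 in a single pair of brackets
print(m[:, 0])  # [1 4]: every row, first column
```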

**Why do I need the Numpy array?**<br>
There are three main benefits of using a Numpy array over a Python list:<br>
 - Less Memory
 - Fast
 - Convenient
 
**Less Memory - Comparing a Python list to a Numpy array**

In [8]:
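l = range(1000)  # 1000 integers (a range object in Python 3)
print(sys.getsizeof(5) * len(l))  # Estimate: bytes per int object x number of elements

array = np.arange(1000)  # Numpy array of the integers 0 to 999
print(array.size * array.itemsize)  # Number of elements x bytes per element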

14000
4000


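1.) *<font color = blue>l = range(1000)</font>* - Creates a sequence of 1000 integers. In Python 3, `range` returns a range object rather than a list, but `len()` works on it the same way, which is all this estimate needs<br>
2.) _<font color = blue>print(sys.getsizeof(5) \* len(l))</font>_ - Estimates the memory taken up by the list's elements
 - `sys.getsizeof(5)` gives the size in bytes of a single Python integer object, which we multiply by the number of elements. Any small integer would do; we chose 5<br>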

3.) *<font color = blue>array = np.arange(1000)</font>* - Next, we create a Numpy array with 1000 elements<br>
 - `arange` is Numpy's counterpart to `range`: it takes similar arguments but returns a Numpy array<br>

4.) _<font color = blue>print(array.size * array.itemsize)</font>_ - To get the size of the array's data, we multiply the number of elements (array.size) by the size in bytes of each individual element (array.itemsize)<br>
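Numpy also exposes the product from step 4 directly as the `nbytes` attribute. The list figure, on the other hand, is only an estimate: it counts the integer objects but not the list's own array of pointers. A slightly fuller sketch, assuming CPython, where `sys.getsizeof` reports an object's own size and not the objects it refers to:

```python
import sys
import numpy as np

l = list(range(1000))
array = np.arange(1000)

# nbytes is the same product as size * itemsize: the size of the data buffer
assert array.nbytes == array.size * array.itemsize

# Rougher but fuller list footprint: the list object itself
# (header plus pointer array) plus every int object it points to
list_bytes = sys.getsizeof(l) + sum(sys.getsizeof(x) for x in l)
print("list + ints:", list_bytes)
print("array data:", array.nbytes)
```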

**Results**<br>
We see that the estimate for the Python list is 14,000 bytes while the Numpy array is only 4,000 bytes. This is because each Python integer is a full object that carries extra bookkeeping data, which makes it much bigger than a raw Numpy array element. On the machine this notebook was run on, each list element took 14 bytes and each array element 4 bytes.<br><br>
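These byte counts depend on the platform: on 64-bit CPython a small integer usually takes 28 bytes, and Numpy's default integer is often 8 bytes rather than 4. Passing an explicit `dtype` pins the element size regardless of the machine, as this sketch shows:

```python
import numpy as np

a32 = np.arange(1000, dtype=np.int32)  # 4 bytes per element on every platform
a64 = np.arange(1000, dtype=np.int64)  # 8 bytes per element on every platform

print(a32.itemsize, a32.nbytes)  # 4 4000
print(a64.itemsize, a64.nbytes)  # 8 8000
```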

The diagram below shows the memory representation of lists and Numpy arrays. At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object, like the Python integer we measured with `sys.getsizeof(5)`. This is why there is such a difference between a Numpy array and a Python list. For small arrays you will not notice much difference, but for very large amounts of data it makes more sense to use a Numpy array.
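The single contiguous block in the diagram can be checked from Python. `flags` and `strides` are standard ndarray attributes; this sketch uses an explicit `int64` dtype so the stride is the same on every platform:

```python
import numpy as np

array = np.arange(5, dtype=np.int64)

# The data sits in one contiguous block, so the array reports C-contiguity
print(array.flags["C_CONTIGUOUS"])  # True

# Moving to the next element means stepping 8 bytes through that block
print(array.strides)  # (8,)
```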

<img src="arrayvlist.png" height="500" width="500">

**Fast**<br>

In [12]:
#Timing how long a Python list and a Numpy array take to complete the same operation
size = 1000 # Number of operations to be performed

# Creating the Python sequences (range objects, which iterate just like lists)
l1 = range(size)
l2 = range(size)

#Creating Numpy arrays
a1 = np.arange(size)
a2 = np.arange(size)

# Timing the Python operation (elapsed time is printed in milliseconds)
start = time.time()
result = [(x + y) for x, y in zip(l1, l2)]
print("Python list took", (time.time() - start) * 1000)

# Timing the Numpy operation (also in milliseconds)
start = time.time()
result = a1 + a2
print("Numpy array took", (time.time() - start) * 1000)

Python list took 0.99945068359375
Numpy array took 0.0
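`time.time()` reads the wall clock and can be too coarse for intervals this short. The standard library's `time.perf_counter()` is intended for measuring short durations; a minimal sketch using it (the helper name `elapsed_ms` is made up for this example):

```python
import time
import numpy as np

def elapsed_ms(func):
    """Run func once and return how long it took, in milliseconds."""
    start = time.perf_counter()
    func()
    return (time.perf_counter() - start) * 1000

size = 1000
l1, l2 = list(range(size)), list(range(size))
a1, a2 = np.arange(size), np.arange(size)

print("Python list took", elapsed_ms(lambda: [x + y for x, y in zip(l1, l2)]), "ms")
print("Numpy array took", elapsed_ms(lambda: a1 + a2), "ms")
```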


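As we said before, with a small number of elements the difference is negligible (the `0.0` above only means the Numpy addition finished faster than `time.time()` could measure, not that it took no time). So we are going to run the code again, this time with 1,000,000 elements, and see what the time difference is then...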

In [14]:
#Timing how long a Python list and a Numpy array take to complete the same operation
size = 1000000 # Number of operations to be performed

# Creating the Python sequences (range objects, which iterate just like lists)
l1 = range(size)
l2 = range(size)

#Creating Numpy arrays
a1 = np.arange(size)
a2 = np.arange(size)

# Timing the Python operation (elapsed time is printed in milliseconds)
start = time.time()
result = [(x + y) for x, y in zip(l1, l2)]
print("Python list took", (time.time() - start) * 1000)

# Timing the Numpy operation (also in milliseconds)
start = time.time()
result = a1 + a2
print("Numpy array took", (time.time() - start) * 1000)

Python list took 124.94564056396484
Numpy array took 14.99032974243164


We can see now that the Numpy array was roughly 8 times faster than the Python list at the same operation (about 15 ms versus 125 ms; exact timings will vary from machine to machine). So when we are processing millions of numbers, it makes much more sense to use a Numpy array.<br><br>
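A single run can be skewed by whatever else the machine happens to be doing. The standard-library `timeit` module repeats a statement, which usually gives steadier numbers. This sketch also checks that both approaches produce the same values, so the comparison is like for like; the timings it prints will differ between machines:

```python
import timeit
import numpy as np

size = 1_000_000
l1, l2 = list(range(size)), list(range(size))
a1, a2 = np.arange(size), np.arange(size)

# The same element-wise sum computed both ways
list_result = [x + y for x, y in zip(l1, l2)]
array_result = a1 + a2
assert np.array_equal(np.array(list_result), array_result)

# timeit returns total seconds for `number` runs; convert to milliseconds per run
runs = 5
list_ms = timeit.timeit(lambda: [x + y for x, y in zip(l1, l2)], number=runs) / runs * 1000
array_ms = timeit.timeit(lambda: a1 + a2, number=runs) / runs * 1000
print(f"list: {list_ms:.1f} ms per run, array: {array_ms:.1f} ms per run")
```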
**Convenient**<br>
You can see that to add two lists together element by element, plain Python needs a list comprehension with `zip` to get the job done, whereas in Numpy a single `+` does it. The other arithmetic operators work element-wise in the same way...
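The comprehension is needed because the operators mean something different on plain lists: `+` joins two lists end to end and `*` with an integer repeats a list. A short side-by-side sketch:

```python
import numpy as np

list_sum = [1, 2, 3] + [4, 5, 6]  # concatenation, not addition
list_rep = [1, 2, 3] * 2          # repetition, not multiplication
print(list_sum)  # [1, 2, 3, 4, 5, 6]
print(list_rep)  # [1, 2, 3, 1, 2, 3]

# On arrays, the same operators work element by element
arr_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
arr_scaled = np.array([1, 2, 3]) * 2
print(arr_sum)     # [5 7 9]
print(arr_scaled)  # [2 4 6]
```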

In [17]:
a3 = np.array([1,2,3])
a4 = np.array([4,5,6])

a3 * a4

array([ 4, 10, 18])

In [18]:
a4 / a3

array([4. , 2.5, 2. ])

In [19]:
a4 - a3

array([3, 3, 3])

All very simple, and much more convenient than doing the same with Python lists.
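The same pattern covers the remaining arithmetic operators, and a plain number can be combined with an array too: Numpy applies it to every element (this is a simple case of what Numpy calls broadcasting). A few more examples on the same `a3` and `a4` values:

```python
import numpy as np

a3 = np.array([1, 2, 3])
a4 = np.array([4, 5, 6])

print(a4 // a3)  # [4 2 2]: floor division
print(a4 % a3)   # [0 1 0]: remainder
print(a3 ** 2)   # [1 4 9]: each element squared
print(a3 * 10)   # [10 20 30]: the scalar is applied to every element
```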