# Intro to numpy arrays
Why numpy arrays rather than the standard python iterables and `for` loops?
- Readability
- Performance

Two key concepts of today:
- vectorization
- broadcasting

Differences between numpy arrays and Python lists (built-in types):
- numpy arrays (`ndarray` objects) must be homogeneous: every element has the same data type (`dtype`)
- ndarray objects have a fixed size, set at creation -> changing the size creates a new array and discards the old one
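
A minimal sketch of both points (the names `mixed`, `a` and `b` are only illustrative): mixing integers and floats in one array upcasts everything to a single dtype, and `np.append` returns a new, larger array rather than growing the original in place.

```python
import numpy as np

# Mixed int/float input is stored with one common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# "Growing" an array returns a new array; the original keeps its size
a = np.arange(3)
b = np.append(a, 3)
print(a.size, b.size)     # 3 4
print(b is a)             # False
```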

Relevant properties of ndarray:
- ndarray.ndim
- ndarray.shape
- ndarray.size
- ndarray.dtype
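
As a quick illustration of these four properties, here is a small sketch on a made-up 2x3 array (`m` is just an example name):

```python
import numpy as np

m = np.zeros((2, 3))   # illustrative 2x3 array of floats
print(m.ndim)    # 2         -> number of axes
print(m.shape)   # (2, 3)    -> length along each axis
print(m.size)    # 6         -> total number of elements
print(m.dtype)   # float64   -> data type shared by all elements
```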

It's worth having a look at the numpy documentation; there is a lot more than we can cover here!
- https://numpy.org/doc/stable/user/whatisnumpy.html
- https://numpy.org/doc/stable/user/quickstart.html

In [1]:
import numpy as np

## Creating 1D arrays

In [2]:
arr1 = np.arange(15)
print(arr1)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


With `arange` you can specify the step size (here it is 2).

In [3]:
arr2 = np.arange(3,25,2)
print(arr2)

[ 3  5  7  9 11 13 15 17 19 21 23]


In [4]:
np.array([])

array([], dtype=float64)

You can also create a numpy array with specific values by passing in a python list or tuple.

In [5]:
a = np.array([2, 3, 4])
print(a)

[2 3 4]


But remember the brackets!

In [6]:
a = np.array(2, 3, 4)
print(a)

TypeError: array() takes from 1 to 2 positional arguments but 3 were given

In [7]:
np.full(10,2.5)

array([2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5])

In [8]:
np.ones(10) # Same as np.full(10,1.)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [9]:
np.zeros(10)# Same as np.full(10, 0.)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [10]:
np.zeros(0) # Same as np.array([])

array([], dtype=float64)

You can specify the data type using dtype.

In [11]:
print(np.ones(10, dtype=int))
print(np.ones(10))
print(np.ones(10, dtype=complex))

[1 1 1 1 1 1 1 1 1 1]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]
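
The dtype of an existing array can also be converted with the `astype` method. The sketch below uses made-up values; note that converting floats to integers truncates rather than rounds.

```python
import numpy as np

floats = np.array([1.7, 2.2, -0.5])
ints = floats.astype(int)     # conversion truncates toward zero
print(ints)                   # [1 2 0]
print(floats.dtype, ints.dtype)
```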


In [13]:
# Check the size of an array
print(arr1)
print(arr1.size)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
15


`arange` is useful, but it is hard to predict the number of elements in the array (why could this be?). `linspace` allows you to set this, by specifying the bounds and the number of elements you want.

In [14]:
np.linspace(10,100,50)

array([ 10.        ,  11.83673469,  13.67346939,  15.51020408,
        17.34693878,  19.18367347,  21.02040816,  22.85714286,
        24.69387755,  26.53061224,  28.36734694,  30.20408163,
        32.04081633,  33.87755102,  35.71428571,  37.55102041,
        39.3877551 ,  41.2244898 ,  43.06122449,  44.89795918,
        46.73469388,  48.57142857,  50.40816327,  52.24489796,
        54.08163265,  55.91836735,  57.75510204,  59.59183673,
        61.42857143,  63.26530612,  65.10204082,  66.93877551,
        68.7755102 ,  70.6122449 ,  72.44897959,  74.28571429,
        76.12244898,  77.95918367,  79.79591837,  81.63265306,
        83.46938776,  85.30612245,  87.14285714,  88.97959184,
        90.81632653,  92.65306122,  94.48979592,  96.32653061,
        98.16326531, 100.        ])
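
One possible reason the length of an `arange` result is hard to predict is floating-point rounding when the step is not an integer. The sketch below (with illustrative bounds) prints the sizes so you can compare them on your own machine; only the `linspace` length is guaranteed by construction.

```python
import numpy as np

# With a float step, the length depends on how (stop - start) / step rounds
stepped = np.arange(1.0, 1.3, 0.1)
print(stepped.size, stepped)

# linspace takes the number of points directly and includes both endpoints
spaced = np.linspace(1.0, 1.3, 4)
print(spaced.size, spaced)
```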

In [15]:
print(arr1)
arr4 = np.concatenate([arr1,arr1])
print(arr4)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14  0  1  2  3  4  5  6  7  8
  9 10 11 12 13 14]


## Large numpy arrays

In [16]:
print(np.arange(9000))

[   0    1    2 ... 8997 8998 8999]


You can force numpy to print every element by raising the print threshold with `np.set_printoptions` (here set to `sys.maxsize`).

In [17]:
import sys

np.set_printoptions(threshold=sys.maxsize) 
print(np.arange(9000))

[   0    1    2    3    4    5    6    7    8    9   10   11   12   13
   14   15   16   17   18   19   20   21   22   23   24   25   26   27
   28   29   30   31   32   33   34   35   36   37   38   39   40   41
   42   43   44   45   46   47   48   49   50   51   52   53   54   55
   56   57   58   59   60   61   62   63   64   65   66   67   68   69
   70   71   72   73   74   75   76   77   78   79   80   81   82   83
   84   85   86   87   88   89   90   91   92   93   94   95   96   97
   98   99  100  101  102  103  104  105  106  107  108  109  110  111
  112  113  114  115  116  117  118  119  120  121  122  123  124  125
  126  127  128  129  130  131  132  133  134  135  136  137  138  139
  140  141  142  143  144  145  146  147  148  149  150  151  152  153
  154  155  156  157  158  159  160  161  162  163  164  165  166  167
  168  169  170  171  172  173  174  175  176  177  178  179  180  181
  182  183  184  185  186  187  188  189  190  191  192  193  194  195
  196 

## Indexing in 1D

In [18]:
print(arr4)
print(arr4[4])

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14  0  1  2  3  4  5  6  7  8
  9 10 11 12 13 14]
4


In [19]:
print(arr4[2:6])

[2 3 4 5]


In [20]:
print(arr4[:3]) # Same as arr4[0:3]

[0 1 2]


In [21]:
print(arr4[3:])

[ 3  4  5  6  7  8  9 10 11 12 13 14  0  1  2  3  4  5  6  7  8  9 10 11
 12 13 14]


In [22]:
print(arr4[:]) # Same as arr4

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14  0  1  2  3  4  5  6  7  8
  9 10 11 12 13 14]


In [23]:
print(arr4[:-1]) # Doesn't contain the last one!

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14  0  1  2  3  4  5  6  7  8
  9 10 11 12 13]


## Simple array operations

In [24]:
atens = np.linspace(10,100,10)
print(atens)

[ 10.  20.  30.  40.  50.  60.  70.  80.  90. 100.]


Here we see vectorization at work!

In [25]:
print(atens * 0.01)

[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
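
For comparison, here is a sketch of the same scaling written once with an explicit Python loop and once vectorized. The names `scaled_loop` and `scaled_vec` are illustrative only.

```python
import numpy as np

atens = np.linspace(10, 100, 10)

# Loop version: visit each element in Python and build a list
scaled_loop = []
for value in atens:
    scaled_loop.append(value * 0.01)

# Vectorized version: one expression, the loop runs inside numpy
scaled_vec = atens * 0.01

print(np.allclose(scaled_loop, scaled_vec))   # True
```

The vectorized line is shorter to read and, for large arrays, typically much faster.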


Here we see broadcasting at work!

In [26]:
# So these two lines create the same array:
atest1 = np.zeros(20) + 3.
atest2 = np.full(20, 3.)
print(atest1)
print(atest2)

[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]


In [28]:
print(arr1)
asquare = arr1 ** 2
print(asquare)
aneg = -asquare + arr1
print(aneg)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
[  0   1   4   9  16  25  36  49  64  81 100 121 144 169 196]
[   0    0   -2   -6  -12  -20  -30  -42  -56  -72  -90 -110 -132 -156
 -182]


In [34]:
# Note that this is fine...
print(np.arange(10) + np.arange(10))

# ...but this results in an error:
#print(np.arange(10) + np.arange(12))

print(np.arange(10) + 3.0)

[ 0  2  4  6  8 10 12 14 16 18]
[ 3.  4.  5.  6.  7.  8.  9. 10. 11. 12.]
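
To see the error without stopping the notebook, the mismatched addition can be wrapped in `try`/`except`. This is a small sketch; `message` and `stretched` are illustrative names.

```python
import numpy as np

try:
    np.arange(10) + np.arange(12)   # shapes (10,) and (12,) are incompatible
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# A scalar, or an array of length 1, is stretched to match the other operand
stretched = np.arange(10) + np.array([3.0])
print(stretched)
```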


## Comparison operates on an element-by-element basis!

In [35]:
print(aneg)
islarge = aneg > -100
print(islarge)

[   0    0   -2   -6  -12  -20  -30  -42  -56  -72  -90 -110 -132 -156
 -182]
[ True  True  True  True  True  True  True  True  True  True  True False
 False False False]


## Aggregate functions

In [36]:
# Sum
arr3 = np.arange(10)
print(arr3)
np.sum(arr3)

[0 1 2 3 4 5 6 7 8 9]


45

In [37]:
# Product
arr5 = np.linspace(3.,5.,3)
print(arr5)
arr6 = np.prod(arr5)
print(arr6)

[3. 4. 5.]
60.0


In [38]:
# Calculate average of x**2 between -1 and 5 using linspace, sum, and size
s=np.linspace(-1,5,1000)
b=s**2
SUM=sum(b)  # Python's built-in sum; np.sum(b) does the same job faster
SIZE=np.size(b)
MEAN=SUM/SIZE
print("Calculating the mean by hand", MEAN)
print("Using the mean method", b.mean())

Calculating the mean by hand 7.006006006006001
Using the mean method 7.006006006006006


In [39]:
print(arr5)
np.mean(arr5), np.min(arr5), np.max(arr5)

[3. 4. 5.]


(4.0, 3.0, 5.0)

##### Side note: inside math operations, `True` has value 1 and `False` 0

In [40]:
print(f"{True + False=}")
print(f"{True * False=}")
print(f"{True + True=}")
print(f"{True * np.pi=}")
print(f"{False * np.pi=}")

True + False=1
True * False=0
True + True=2
True * np.pi=3.141592653589793
False * np.pi=0.0


This means we can use `np.sum` to count how many elements fulfill our criterion:

In [41]:
print(aneg)
count = np.sum(aneg > -40)
print(f"{count} numbers above -40.")

[   0    0   -2   -6  -12  -20  -30  -42  -56  -72  -90 -110 -132 -156
 -182]
7 numbers above -40.
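
Two closely related calls, sketched here on the same values as `aneg`: `np.count_nonzero` counts the `True` entries directly, and `np.mean` of a boolean array gives the fraction of elements that pass the test.

```python
import numpy as np

arr1 = np.arange(15)
aneg = -(arr1 ** 2) + arr1             # same values as aneg above

count = np.count_nonzero(aneg > -40)   # counts the True entries
fraction = np.mean(aneg > -40)         # mean of a boolean array = fraction of True
print(count, fraction)
```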


In [42]:
# any is True if at least one element is True; all is True only if every element is True
print(aneg)
print(np.any(aneg < -20))
print(np.any(aneg < -200))
print(np.all(aneg < 10))
print(np.all(aneg < 0))
print(np.all(aneg <= 0))

[   0    0   -2   -6  -12  -20  -30  -42  -56  -72  -90 -110 -132 -156
 -182]
True
False
True
False
True


How can we check that arrays atest1 and atest2 are the same?

In [44]:
print(atest1)
print(atest2)
np.all(atest1 == atest2)

[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]


True
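
numpy also provides `np.array_equal` for exact comparison and `np.allclose` for comparison within a tolerance, which is often the safer choice for floats. A short sketch (`x` and `y` are illustrative):

```python
import numpy as np

atest1 = np.zeros(20) + 3.
atest2 = np.full(20, 3.)
print(np.array_equal(atest1, atest2))   # True: same shape and same values

# Floating-point results that "should" match may differ in the last digits
x = np.array([0.1 + 0.2])
y = np.array([0.3])
print(np.all(x == y))       # False
print(np.allclose(x, y))    # True: equal within a small tolerance
```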

## Moving up to $n$ dimensions

In [45]:
print(atest1)
print(atest1.shape)
print(atest1.size)

[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
(20,)
20


In [48]:
print(atest1)
arr2d = atest1.reshape(4,5)
print(arr2d)
print(arr2d.shape)
# arr2d_2 = atest1.reshape(3,6)  # Error: 3*6 = 18 does not match the 20 elements

[3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
[[3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3.]
 [3. 3. 3. 3. 3.]]
(4, 5)


In [49]:
np.zeros((5,5))
# same as :
# np.zeros(25).reshape(5,5)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [50]:
ran2d = np.arange(100).reshape(10,10)
print(ran2d)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]


In [51]:
rand2d_tr = np.transpose(ran2d)
print(rand2d_tr)

[[ 0 10 20 30 40 50 60 70 80 90]
 [ 1 11 21 31 41 51 61 71 81 91]
 [ 2 12 22 32 42 52 62 72 82 92]
 [ 3 13 23 33 43 53 63 73 83 93]
 [ 4 14 24 34 44 54 64 74 84 94]
 [ 5 15 25 35 45 55 65 75 85 95]
 [ 6 16 26 36 46 56 66 76 86 96]
 [ 7 17 27 37 47 57 67 77 87 97]
 [ 8 18 28 38 48 58 68 78 88 98]
 [ 9 19 29 39 49 59 69 79 89 99]]


In [52]:
print(ran2d.shape, ran2d.size)

(10, 10) 100


In numpy, indexing doesn't have to be done one dimension at a time, as it does with nested lists:

In [None]:
#row, column
print(ran2d[2][1])
print(ran2d[1][2])

Instead, we can access elements directly with multiple indices (faster and more readable):

In [53]:
#row, column
ran2d[2,1]

21
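
The sketch below contrasts the two styles on a small made-up list of lists (`nested`): chained indexing works for both lists and arrays, while the comma form only works for arrays.

```python
import numpy as np

nested = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]   # illustrative list of lists
arr = np.array(nested)

print(nested[2][1], arr[2][1], arr[2, 1])   # 21 21 21

list_failed = False
try:
    nested[2, 1]            # lists do not accept a tuple of indices
except TypeError as err:
    list_failed = True
    print("TypeError:", err)
```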

We can also flatten the multi-dimensional array out with the ravel method.

In [None]:
print(rand2d_tr.ravel())

### With this syntax I can also access slices of my arrays along any axis, as well as "chunks":

In [54]:
print(ran2d)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]


In [55]:
hslice = ran2d[1,:] # Same as ran2d[1] (same as with a list)
print(hslice)

[10 11 12 13 14 15 16 17 18 19]


In [56]:
vslice = ran2d[:,1] # Not so straightforward with lists!
print(vslice)

[ 1 11 21 31 41 51 61 71 81 91]
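
A "chunk" is obtained by giving a slice for each axis. The sketch below picks an arbitrary block of `ran2d` and also uses a step in the row slice.

```python
import numpy as np

ran2d = np.arange(100).reshape(10, 10)

# Rows 2-3 and columns 5-7 (the stop index is excluded, as with lists)
chunk = ran2d[2:4, 5:8]
print(chunk)
# [[25 26 27]
#  [35 36 37]]

# Every second row of the first column
print(ran2d[::2, 0])   # [ 0 20 40 60 80]
```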


Create a 3D array from np.arange(100) 

In [57]:
ran3d = ran2d.reshape(2,5,10)
print(ran3d)

[[[ 0  1  2  3  4  5  6  7  8  9]
  [10 11 12 13 14 15 16 17 18 19]
  [20 21 22 23 24 25 26 27 28 29]
  [30 31 32 33 34 35 36 37 38 39]
  [40 41 42 43 44 45 46 47 48 49]]

 [[50 51 52 53 54 55 56 57 58 59]
  [60 61 62 63 64 65 66 67 68 69]
  [70 71 72 73 74 75 76 77 78 79]
  [80 81 82 83 84 85 86 87 88 89]
  [90 91 92 93 94 95 96 97 98 99]]]


What parts of my 3D array do these lines of code access?

In [58]:
ran3d[0,:,:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])

In [59]:
ran3d[:,0,:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59]])

In [60]:
ran3d[:,:,0]

array([[ 0, 10, 20, 30, 40],
       [50, 60, 70, 80, 90]])

## Operating along different axes

In [None]:
# 3 rows, 2 columns
testmat = np.arange(6).reshape(3,2)
print(testmat)

In [None]:
np.sum(testmat) # Default of the `axis` parameter is None, 
                # which simply applies the function to all 
                # the elements in the array (in this case sum)

In [None]:
print(np.sum(testmat, axis=0)) # This sums along axis 0 
                               # (i.e. column by column through the rows)
print(np.sum(testmat, axis=1)) # This sums along axis 1
                               # (i.e. row by row through the columns)
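
For reference, here is the same 3x2 example with the results written out. Summing along an axis removes that axis from the shape.

```python
import numpy as np

testmat = np.arange(6).reshape(3, 2)
# [[0 1]
#  [2 3]
#  [4 5]]

print(np.sum(testmat))           # 15
print(np.sum(testmat, axis=0))   # [6 9]    -> one value per column, shape (2,)
print(np.sum(testmat, axis=1))   # [1 5 9]  -> one value per row, shape (3,)
```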

## Advanced indexing

#### Example 1: indexing with arrays

In [None]:
atest = np.arange(20) ** 2 + 7
print(atest)
aselect = np.array([3,5,6])
print(atest[aselect])

#### Example 2: indexing with `np.where`

In [None]:
ilarge = np.where(atest > 70)
print(ilarge)
print(atest[ilarge])

In [None]:
imedium = np.where(np.logical_and(atest > 70,
                                  atest < 200)
                  )
print(imedium)
print(atest[imedium])

In [None]:
iextremes = np.where(np.logical_or(atest < 70,
                                   atest > 200)
                  )
print(iextremes) # Doesn't contain indices 8-13 
print(atest[iextremes])
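
A boolean array can also be used as an index directly, without going through `np.where`. The sketch below uses the `&` operator as a shorthand for `np.logical_and`; `mask` is an illustrative name.

```python
import numpy as np

atest = np.arange(20) ** 2 + 7

mask = (atest > 70) & (atest < 200)   # parentheses are required around each comparison
print(atest[mask])                    # [ 71  88 107 128 151 176]

# Same selection as the np.where / np.logical_and version
imedium = np.where(np.logical_and(atest > 70, atest < 200))
print(np.array_equal(atest[mask], atest[imedium]))   # True
```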

#### Example 3: `argmax` and `argmin`

In [None]:
print(atest)
maxval = np.max(atest)  # This gives me the maximum value of my array
print(maxval)
imax = np.argmax(atest) # This gives me the *index* of the maximum-value element
print(imax)
print(atest[imax])      # Same as np.max(atest)
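
`np.argmin` works the same way for the smallest element. A brief sketch, including a made-up array (`repeated`) in which the minimum occurs twice:

```python
import numpy as np

atest = np.arange(20) ** 2 + 7
imin = np.argmin(atest)
print(imin, atest[imin])     # 0 7

# With repeated extremes, the first occurrence is returned
repeated = np.array([4, 1, 7, 1])
print(np.argmin(repeated))   # 1
```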

#### Example 4 (advanced)

In [None]:
aran = np.arange(20,40).reshape(5,2,2) # Take a look at this example if you feel like 
                                       # working out how np.where works in n>1 dimensions!
                                       # (In a nutshell: np.where returns a tuple of
                                       # arrays. Each of these arrays gives the indices
                                       # along an axis of the original array)
print("Original array:\n", aran)


ieven = np.where(aran % 2 == 0) # Let's say we want the even elements
print("Even indices:\n", ieven)

aeven = aran[ieven]
print("Even values:\n", aeven) # This is how np.where accesses them  

aeven_manual = aran[:,:,0]
print("Accessing manually:\n", aeven_manual.ravel()) # This is how I would access them "manually", which
                                                     # is easy for this simple case, but in real-life examples,
                                                     # np.where is often the most practical option.

## Useful functions & methods (copied from the numpy manual)
![image.png](attachment:22a1d5e9-fe2a-4212-8ce8-84d4214619c5.png)

### Example: sinusoidal signals

In [None]:
atheta = np.linspace(-np.pi, np.pi,20)
acos = np.cos(atheta)
asin = np.sin(atheta) + 0.5 # We add an offset

print(acos)

In [None]:
from matplotlib import pyplot as plt # We'll talk more about 
                                     # plotting in the coming weeks

#           x-axis    y-axis
plt.scatter(atheta,   acos,                  label="cos(x)")
plt.scatter(atheta,   asin,                  label="sin(x) + 0.5")
plt.plot   (atheta,   np.zeros(atheta.size)) # Draw a line at y==0

plt.legend(loc='upper left')

#### Get the maximum values of asin and acos

In [None]:
cosmax = np.max(acos) 
sinmax = np.max(asin)
print(f"{cosmax=:.3f}, {sinmax=:.3f}")

#### Get the value of asin where the value of acos is max

In [None]:
imaxcos = np.argmax(acos) # Note: this gives you the *first* occurrence of the maximum value!
sinval = asin[imaxcos]
print(f"Value of asin when acos is max: {sinval:.3f} at i={imaxcos}")

#### Select only the positive values of acos

In [None]:
acos[np.where(acos>0)]

#### Select the values of atheta, asin and acos in the range where acos is positive 

In [None]:
isel = np.where(acos > 0)
athetasel = atheta[isel]
asinsel = asin[isel]
acossel = acos[isel]

# Plot old arrays
plt.scatter(atheta,   acos,                  label="cos(x)")
plt.scatter(atheta,   asin,                  label="sin(x) + 0.5")
plt.plot   (atheta,   np.zeros(atheta.size)) # Draw a line at y=0

# Plot new selected values as large circles
#           x-axis      y-axis
plt.scatter(athetasel,  acossel, alpha=0.5, s=200)
plt.scatter(athetasel,  asinsel, alpha=0.5, s=200)              

plt.legend(loc='upper left')

#### Create a "truncated" array with only the values of acos where they're negative, and zero where they're positive   

In [None]:
atrunc = acos.copy() # copy the original cosine array

isel = np.where(acos > 0)
atrunc[isel] = 0 # replace positive elements with zero


plt.scatter(atheta, acos, label="Normal cosine")  # Plot old array
plt.scatter(atheta, atrunc, label="Truncated cosine")  # Plot the new truncated array

plt.legend(loc='upper left')

### A note on `copy()`:

In [None]:
# use id() to get a unique identifier for the array
arr1 = np.array([1,0,-3])
print(arr1)
print(id(arr1))

In [None]:
print("Case 1:")
arr1 = np.array([1,0,-3])
arr2 = arr1
print(np.all(arr1 == arr2)) # Same values in the array
print(id(arr2) == id(arr1)) # Same location in memory.
                            # This means arr1 and arr2 are
                            # actually references to the same 
                            # array, and if I change arr2, I
                            # will also be changing arr1!

print("Case 2:")
arr3 = np.array([1,0,-3])
arr4 = arr3.copy()
print(np.all(arr3 == arr4)) # Same values in the array
print(id(arr3) == id(arr4)) # Different locations in memory
                            # (I've truly created a new array)

print("Case 3:")
arr5 = np.array([1,0,-3])
arr6 = arr5 * 1             # Here I operated on the array
print(np.all(arr5 == arr6)) # Same values in the array (because
                            # I just multiplied by 1)
print(id(arr5) == id(arr6)) # Different locations in memory.
                            # Because I've "changed" the 
                            # original array, python creates a 
                            # new array, just like with copy()                            
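
A related case worth knowing, sketched here with illustrative names (`base`, `view`, `safe`): basic slicing returns a *view* that shares memory with the original array, so writing through the slice changes the original unless you call `copy()`.

```python
import numpy as np

base = np.arange(5)
view = base[1:4]             # basic slicing returns a view, not a copy
view[0] = 99
print(base)                  # [ 0 99  2  3  4]
print(np.shares_memory(base, view))      # True

safe = base[1:4].copy()      # an explicit copy is independent
safe[0] = -1
print(base[1])               # 99
```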

### **Important note:** There are many, many useful functions in the numpy package. 

### So when you're faced with a new task, check the numpy documentation online, and forums like Stack Overflow, because there may be a function out there that does the job faster!

In the example at hand:

In [None]:
newarr = np.clip(acos, -1, 0)

plt.scatter(atheta, acos, label="Normal cosine")  # Plot old array
plt.scatter(atheta, newarr, label="Truncated cosine")  # Plot the new truncated array
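
As a sanity check, the sketch below compares `np.clip` with two other ways of getting the same truncated cosine. The names `clipped`, `capped` and `replaced` are illustrative, and which variant reads best is a matter of taste.

```python
import numpy as np

atheta = np.linspace(-np.pi, np.pi, 20)
acos = np.cos(atheta)

clipped = np.clip(acos, -1, 0)
capped = np.minimum(acos, 0)             # element-wise minimum with 0
replaced = np.where(acos > 0, 0, acos)   # three-argument form: condition, if-true, if-false

print(np.array_equal(clipped, capped))     # True
print(np.array_equal(clipped, replaced))   # True
```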