## The basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation; many scientific Python libraries (pandas, SciPy, scikit-learn, and others) are built around the NumPy array.

NumPy arrays allow you to organize and manipulate data. Arrays are somewhat similar to Python lists; however, in an array all elements must be of the same type (e.g., `int`, `float`, `bool`, `uint8`).
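As a small illustration of the single-type requirement, the sketch below builds arrays from different inputs and inspects the resulting `dtype`. The variable names are made up for this example, and the exact integer type depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                    # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])                 # ints mixed with a float -> every element becomes a float
small = np.array([1, 2, 3], dtype=np.uint8)   # the dtype can also be requested explicitly

print(ints.dtype)    # an integer type such as int64 (platform dependent)
print(mixed.dtype)   # float64
print(small.dtype)   # uint8
```

Unlike a list, the array never holds a mix of types: NumPy picks one type that can represent every element.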

In [6]:
#create some sample arrays 
import numpy as np #start with importing numpy
np.set_printoptions(legacy='1.25') #this makes it easy to visualize some results, it is optional

np.random.seed(42) #adding this at the start of your code ensures that it is reproducible, as the "random" numbers produced by numpy will be the same
#the number can be anything, 42 is a common choice as it is the answer to life, the universe, and everything.

x1 = np.random.randint(10, size=6)  # One-dimensional array -- ie, a vector
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array -- ie, a 2d matrix
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array -- ie, a 3d matrix 

NumPy arrays are objects. As such, they have attributes that describe them:

In [9]:
print("x2 ndim: ", x2.ndim) #number of dimensions in the array
print("x2 shape:", x2.shape) # shape of the array (rows, columns for a 2d array or matrix)
print("x2 size: ", x2.size) #number of elements in the array 

x2 ndim:  2
x2 shape: (3, 4)
x2 size:  12


In [10]:
#what will be printed?
print("x3 ndim: ", x3.ndim) #number of dimensions in the array
print("x3 shape:", x3.shape) # shape of the array (size along each of the three dimensions)
print("x3 size: ", x3.size) #number of elements in the array 

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


In [11]:
#all the elements in the array are of the same type
print(x3.dtype)

int64


## Array Indexing

You can access elements in the array in a similar way to using lists in Python.

In [16]:
x1


array([6, 3, 7, 4, 6, 9])

In [12]:
#what will be printed?
x1[0]

6

In [13]:
#what will be printed?
x1[4]

6

In [14]:
#what will be printed?
x1[-1]

9

In [15]:
#what will be printed?
x1[-2]

6

In multi-dimensional arrays, items are accessed using a comma-separated list of indices:

In [17]:
x2

array([[2, 6, 7, 4],
       [3, 7, 7, 2],
       [5, 4, 1, 7]])

In [18]:
#what will be printed?
x2[0,0]

2

In [19]:
#what will be printed?
x2[2,-1]

7

In [25]:
x3

array([[[5, 1, 4, 0, 9],
        [5, 8, 0, 9, 2],
        [6, 3, 8, 2, 4],
        [2, 6, 4, 8, 6]],

       [[1, 3, 8, 1, 9],
        [8, 9, 4, 1, 3],
        [6, 7, 2, 0, 3],
        [1, 7, 3, 1, 5]],

       [[5, 9, 3, 5, 1],
        [9, 1, 9, 3, 7],
        [6, 8, 7, 4, 1],
        [4, 7, 9, 8, 8]]])

In [26]:
#what will be printed?
x3[0,0,0]

5

In [27]:
#what will be printed?
x3[-1,-1,-1]

8

You can also use slices (`start:stop:step`) to grab parts of an array:

In [28]:
x1[:3] # first 3 elements

array([6, 3, 7])

In [29]:
x1[::2] # every other element

array([6, 7, 6])

In [30]:
x1[1::2] # every other element, starting at index 1

array([3, 4, 9])

In [31]:
x1[::-1]  # all elements, reversed

array([9, 6, 4, 7, 3, 6])

In [32]:
x2[:2, :3]  # two rows, three columns

array([[2, 6, 7],
       [3, 7, 7]])

In [33]:
x2[:, 0]  # all rows, first column of x2

array([2, 3, 5])

In [34]:
x2[0, :]# first row, all columns of x2

array([2, 6, 7, 4])

In [35]:
x2

array([[2, 6, 7, 4],
       [3, 7, 7, 2],
       [5, 4, 1, 7]])

In [None]:
[[7,1,4,5],
[2,7,7,3],
[4,7,6,2]]

In [36]:
x2[::-1,::-1]  # all rows and all columns, both in reversed order

array([[7, 1, 4, 5],
       [2, 7, 7, 3],
       [4, 7, 6, 2]])
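The same slicing syntax extends to arrays with more dimensions, with one slice per dimension. The following is a minimal sketch on a made-up array (`cube` is not one of the arrays defined earlier), chosen so the values are easy to follow:

```python
import numpy as np

# Hypothetical sample array: 2 blocks, each with 3 rows and 4 columns
cube = np.arange(24).reshape(2, 3, 4)

first_block = cube[0]             # the whole first block, shape (3, 4)
first_rows = cube[:, 0, :]        # first row of every block, shape (2, 4)
even_cols = cube[:, :, ::2]       # every other column, shape (2, 3, 2)
flipped = cube[::-1, ::-1, ::-1]  # all three dimensions reversed

print(first_rows)
print(even_cols.shape)
```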

## Reshaping Arrays
Another common action is to reshape the dimensions of an array. The `.reshape()` method is the easiest way to do this.

In [40]:
x1r = x1.reshape(3,2) #reshaping x1 to have 3 rows and 2 columns
print(x1)
print(x1r)

[6 3 7 4 6 9]
[[6 3]
 [7 4]
 [6 9]]


In [43]:
#NumPy can automatically identify one of the dimensions for you
x1r = x1.reshape(3,-1) #reshaping x1 to have 3 rows and as many columns as needed 
print(x1)
print(x1r)

[6 3 7 4 6 9]
[[6 3]
 [7 4]
 [6 9]]


In [44]:
#NumPy can automatically identify one of the dimensions for you
x1r = x1.reshape(-1,2) #reshaping x1 to have 2 columns, as many rows as needed 
print(x1)
print(x1r)

[6 3 7 4 6 9]
[[6 3]
 [7 4]
 [6 9]]
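Reshaping only works when the new shape holds exactly the same number of elements as the original array. The sketch below (using a made-up array `values`) shows a valid reshape with `-1` and one that fails because 6 elements cannot be split evenly into 4 rows:

```python
import numpy as np

values = np.arange(6)  # six elements: 0..5

# 6 elements fit into 2 rows, so NumPy infers 3 columns
print(values.reshape(2, -1).shape)

# 6 elements cannot be split evenly into 4 rows, so NumPy raises a ValueError
try:
    values.reshape(4, -1)
    reshape_ok = True
except ValueError as err:
    reshape_ok = False
    print("reshape failed:", err)
```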


## Concatenation of arrays
There are several functions to concatenate arrays in NumPy; `np.concatenate`, `np.vstack`, and `np.hstack` are the most common.

In [45]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [46]:
# Concatenating multiple arrays
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


In [49]:
# Concatenating 2-dimensional arrays
grid = np.array([[1, 2, 3,],
                 [4, 5, 6]])

np.concatenate([grid,grid], axis=0) # the axis here is not needed as 0 is the default value

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [48]:
# Concatenating along the second axis (zero-indexed)
np.concatenate([grid,grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

In [53]:
np.vstack([x,grid])

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

In [56]:
np.hstack([x,grid])  # fails: x is 1-D with shape (3,), while grid is 2-D with shape (2, 3)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
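One way to make horizontal stacking work is to give it arrays with the same number of dimensions and the same number of rows. A minimal sketch, where `column` is a made-up array introduced only for this example:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
column = np.array([[99],
                   [99]])  # 2-D, shape (2, 1): same number of rows as grid

wide = np.hstack([grid, column])  # adds one column to grid
print(wide)
# [[ 1  2  3 99]
#  [ 4  5  6 99]]

# Two 1-D arrays can also be stacked horizontally; the result is a longer 1-D array
print(np.hstack([x, x]))
```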

## Computation on NumPy Arrays: Universal Functions

NumPy offers the ability to vectorize operations, which makes operating on arrays fast and concise. This is especially useful when you need to repeat the same operation on every element of an array.

Without vectorization, an operation on an array means looping over all of its elements and performing the desired operation on each one.
Say you want to create a function that computes the reciprocal of each element in an array:

In [58]:

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5) #create a random 1-D array of ints from 1 to 9 (the upper bound, 10, is excluded; starting at 1 avoids division by zero)
values, compute_reciprocals(values)

(array([1, 8, 8, 3, 1]),
 array([1.        , 0.125     , 0.125     , 0.33333333, 1.        ]))

This is fine for small arrays, but what about large arrays?

In [59]:
# Let's time this on a big array:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

738 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


It takes about 0.74 seconds to run the loop over all the data.

It turns out that the bottleneck here is not the operations themselves. It is the type-checking and function dispatches that CPython (the standard Python interpreter, which is written in C) must do at each cycle of the loop. Each time the reciprocal is computed, Python first checks the object's type and performs a dynamic lookup of the correct function for that type. If we were working in compiled code instead, this type specification would be known before the code executes, and the result could be computed much more efficiently.

In NumPy, all the elements in an array are of the same type, so this per-element type-checking is not needed: the loop runs in compiled code, and the process is much faster.

In [60]:
%timeit (1.0/big_array)

1.08 ms ± 7.79 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


It now takes about 1 ms to perform the same operation: roughly 680 times faster, which is a reduction in run time of more than 99%.

These vectorized operations are implemented as universal functions (**ufuncs**), which are designed to speed up array manipulation.
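Arithmetic operators on arrays are thin wrappers around named ufuncs. The sketch below, on a small made-up array, checks that the explicit loop, the operator form, and the named ufunc all produce the same values:

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0, 8.0])  # floats, so the reciprocals are not truncated

# Explicit Python loop, one element at a time
looped = np.empty(len(values))
for i in range(len(values)):
    looped[i] = 1.0 / values[i]

vectorized = 1.0 / values       # operator form: the division runs over the whole array
named = np.reciprocal(values)   # the same operation through a named ufunc

print(np.allclose(looped, vectorized), np.allclose(vectorized, named))
print(np.array_equal(values + values, np.add(values, values)))  # + corresponds to np.add
```

Note that `np.reciprocal` is only a safe substitute here because the array holds floats; on integer arrays it performs integer arithmetic.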

These operations can also be applied to multidimensional arrays:

In [61]:
def compute_square(values):
    rows, columns = values.shape
    output = np.empty((rows,columns))
    
    for i in range(rows):
        for j in range(columns):
            output[i,j] = values[i,j]**2
    return output
        
values = np.random.randint(1, 10, size=(1000,1000)) #create a random 1000x1000 array of ints from 1 to 9 (the upper bound, 10, is excluded)
%timeit compute_square(values)

182 ms ± 506 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [62]:
%timeit (values**2)

1 ms ± 157 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Broadcasting

Broadcasting is the mechanism NumPy uses to vectorize operations between arrays of different shapes.

In [63]:
#NumPy will happily add two arrays with the same shape
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

array([5, 6, 7])

**Broadcasting** allows you to do this kind of operation on arrays that have different shapes. For instance, the previous cell can be reduced to:



In [64]:
a+5

array([5, 6, 7])

NumPy will *fill* the missing dimensions for you so the operation can be performed.  
This works for higher-dimensional arrays as well.

In [65]:
M  = np.ones((3,3)) #array of 1 with dimension 3x3
print(f'M = {M}\n') 
print(f'M+5 = {M + 5}')

M = [[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

M+5 = [[6. 6. 6.]
 [6. 6. 6.]
 [6. 6. 6.]]


In [66]:
#you can extend this to higher dimensions
M + np.array([0,1,2])

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

In [67]:
M + np.array([[0],[1],[2]])

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.]])

In [68]:
np.array([[0],[1],[2]]) + np.array([0,1,2])

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

Conceptually, NumPy stretches the smaller array to match the shape of the larger one so the operation can be performed; no extra copies of the data are actually made.

![Image showing NumPy broadcasting](broadcasting.png)


### Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
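The three rules above can be sketched as a small function that works only on shapes. `broadcast_shape` is a hypothetical helper written for this illustration, not part of NumPy; its result is compared against the shape that NumPy itself reports through `np.broadcast`.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b:
            result.append(size_a)
        elif size_a == 1:      # Rule 2: stretch the dimension of size 1
            result.append(size_b)
        elif size_b == 1:
            result.append(size_a)
        else:                  # Rule 3: sizes differ and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are incompatible")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))   # (2, 3)
print(broadcast_shape((3, 1), (3,)))   # (3, 3)

# Compare with what NumPy computes for real arrays of those shapes
print(np.broadcast(np.empty((3, 1)), np.empty(3)).shape)

try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print(err)
```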

In [69]:
#Example 1
M = np.ones((2, 3))
a = np.arange(3)

Let's consider an operation on these two arrays. The shapes of the arrays are:

- `M.shape = (2, 3)`
- `a.shape = (3,)`
  
We see by rule 1 that the array `a` has fewer dimensions, so we pad it on the left with ones:

- `M.shape -> (2, 3)`
- `a.shape -> (1, 3)`
  
By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:

- `M.shape -> (2, 3)`
- `a.shape -> (2, 3)`
  
The shapes match, and we see that the final shape will be `(2, 3)`:

In [71]:
M, a, M + a

(array([[1., 1., 1.],
        [1., 1., 1.]]),
 array([0, 1, 2]),
 array([[1., 2., 3.],
        [1., 2., 3.]]))

In [73]:
#Example 2
a = np.arange(3).reshape((3, 1))
b = np.arange(3)

Again, we'll start by writing out the shape of the arrays:

- `a.shape = (3, 1)`
- `b.shape = (3,)`
  
Rule 1 says we must pad the shape of b with ones:

- `a.shape -> (3, 1)`
- `b.shape -> (1, 3)`
  
And rule 2 tells us that we upgrade each of these ones to match the corresponding size of the other array:

- `a.shape -> (3, 3)`
- `b.shape -> (3, 3)`
  
Because the result matches, these shapes are compatible. We can see this here:

In [74]:
a, b , a+b

(array([[0],
        [1],
        [2]]),
 array([0, 1, 2]),
 array([[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4]]))

In [78]:
#Example 3
M = np.ones((3, 2))
a = np.arange(3)

Again, we'll start by writing out the shapes of the arrays:

- `M.shape = (3, 2)`
- `a.shape = (3,)`
  
Rule 1 says we must pad the shape of `a` with ones:

- `M.shape -> (3, 2)`
- `a.shape -> (1, 3)`
  
By rule 2, the first dimension of `a` is stretched to match `M`:

- `M.shape -> (3, 2)`
- `a.shape -> (3, 3)`
  
The final shapes do not match (2 versus 3 in the last dimension, and neither is 1), so by rule 3 these two arrays are incompatible.

In [79]:
M, a

(array([[1., 1.],
        [1., 1.],
        [1., 1.]]),
 array([0, 1, 2]))

In [80]:
M+a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
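If the intent was to add `a` down the rows of `M`, one option is to turn `a` into a column first, so that padding on the left is no longer needed. A minimal sketch using `np.newaxis` (the name `a_column` is made up for this example):

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

a_column = a[:, np.newaxis]  # shape (3,) -> (3, 1)
result = M + a_column        # (3, 2) and (3, 1) broadcast to (3, 2)

print(a_column.shape)
print(result)
# [[1. 1.]
#  [2. 2.]
#  [3. 3.]]
```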

## NumPy operators

NumPy allows you to use comparison operators with arrays, including:
- `==` or  `np.equal()`
- `<` or `np.less()`
- `>` or `np.greater()`
- `!=` or `np.not_equal()`
- `<=` or `np.less_equal()`
- `>=` or `np.greater_equal()`

These operations return a Boolean array with `True` or `False`.
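The operator and the function form are interchangeable. A quick check on a small made-up array:

```python
import numpy as np

x = np.array([1, 6, 5, 8])

with_operator = x <= 5
with_function = np.less_equal(x, 5)

print(with_operator)                                 # [ True False  True False]
print(np.array_equal(with_operator, with_function))  # True
```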


In [81]:
x = np.random.randint(10, size=(3, 4))
x

array([[1, 6, 5, 8],
       [9, 2, 7, 4],
       [8, 4, 0, 8]])

In [82]:
x < 6

array([[ True, False,  True, False],
       [False,  True, False,  True],
       [False,  True,  True, False]])

In [83]:
#you can combine these operations into complex expressions
(x>4) & (x<6)

array([[False, False,  True, False],
       [False, False, False, False],
       [False, False, False, False]])

In [84]:
(x==0) | ((x>4) & (x<6))

array([[False, False,  True, False],
       [False, False, False, False],
       [False, False,  True, False]])

In [85]:
# These operations can also be used for indexing the array; these are known as masks
x[(x==0) | ((x>4) & (x<6))]

array([5, 0])
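Boolean arrays can also be counted and tested directly. The sketch below re-creates the array `x` shown above with explicit values, so it runs on its own, and counts how many elements satisfy a condition:

```python
import numpy as np

# Same values as the x array printed above
x = np.array([[1, 6, 5, 8],
              [9, 2, 7, 4],
              [8, 4, 0, 8]])

print(np.count_nonzero(x < 6))  # number of elements smaller than 6 -> 6
print(np.sum(x < 6, axis=1))    # count per row; True counts as 1 -> [2 2 2]
print(np.any(x > 8))            # is at least one element larger than 8? -> True
print(np.all(x >= 0))           # are all elements non-negative? -> True
```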

## Aggregators: Min, Max, and others

NumPy offers aggregation functions that summarize the values in an array:

In [86]:
#Maximum 
print(x.max())

9


In [87]:
#Maximum along a dimension
print(f'Maximum along dimension 0: {x.max(axis = 0)}')
print(f'Maximum along dimension 1: {x.max(axis = 1)}')

Maximum along dimension 0: [9 6 7 8]
Maximum along dimension 1: [8 9 8]


In [88]:
#Minimum 
x.min()

0

In [None]:
#sum
print(x.sum())
print(x.sum(axis=0))
print(x.sum(axis=1))

In [None]:
#mean 
print(x.mean())
print(x.mean(axis=0))
print(x.mean(axis=1))

In [None]:
#standard deviation 
print(x.std())
print(x.std(axis=0))
print(x.std(axis=1))
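The `axis` argument names the dimension that is collapsed by the aggregation. A minimal sketch on a small made-up array (`table` is not defined elsewhere in this notebook), where the results are easy to check by hand:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(table.sum())         # all elements -> 21
print(table.sum(axis=0))   # collapse the rows: one value per column -> [5 7 9]
print(table.sum(axis=1))   # collapse the columns: one value per row -> [ 6 15]
print(table.mean())        # 3.5
print(table.mean(axis=0))  # [2.5 3.5 4.5]
```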

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
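The difference between a regular aggregation and its NaN-safe version shows up as soon as the data contains a missing value. A short sketch with made-up data:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])  # one missing value

print(np.mean(data))       # nan: the missing value propagates to the result
print(np.nanmean(data))    # 2.0: the NaN is ignored
print(np.nanmax(data))     # 3.0
print(np.nanargmax(data))  # 2: index of the largest non-NaN value
```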
