# NumPy

### What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, basic linear algebra, basic statistical operations, random simulation and much more.

You may be familiar with Python's list data structure. Similarly, NumPy offers multi-dimensional arrays, called **n-dimensional arrays** or **ndarrays**. Operations on **ndarrays** are much faster than the equivalent operations on lists because NumPy implements its array routines in optimized, compiled code. We will therefore use **ndarrays** from here on.

### Learning Objectives

After reading, you should be able to:
1. Understand the difference between one-, two- and n-dimensional arrays in NumPy;
2. Understand how to apply some linear algebra operations to n-dimensional arrays without using for-loops;
3. Understand axis and shape properties for n-dimensional arrays.

Let's import NumPy:


In [1]:
import numpy as np # we import it as np so we can type 'np' instead of 'numpy' for calling functions

## The Basics

NumPy’s main object is the _homogeneous_ multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy dimensions are called **_axes_**.

For example, the array for the coordinates of a point in 3D space, ``[1, 2, 1]``, has one axis. That axis has 3 elements in it, so we say it has a length of 3.

In the example below, the array has 2 axes. The first axis has a length of 2 because it contains 2 elements (the inner arrays), and the second axis has a length of 3 because each inner array contains 3 elements.

In [None]:
"""
[[1., 0., 0.],
 [0., 1., 2.]]
 
"""
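As a quick check of the description above, the sketch below builds that same 2 x 3 array and asks NumPy for its number of axes and the length of each one. The variable name `example` is only for illustration; the `ndim` and `shape` attributes are covered in more detail later in this notebook.

```python
import numpy as np

# The 2 x 3 array from the example above
example = np.array([[1., 0., 0.],
                    [0., 1., 2.]])

print(example.ndim)      # number of axes -> 2
print(example.shape)     # length of each axis -> (2, 3)
print(len(example))      # length of the first axis -> 2
print(len(example[0]))   # length of the second axis -> 3
```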

## Array Creation
There are several ways to create arrays.

**Method 1:**
For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

In [2]:
list_1D = [1, 2, 3]
list_2D = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [3]:
list_1D

[1, 2, 3]

In [4]:
list_2D

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [5]:
vector = np.array(list_1D)
matrix = np.array(list_2D)

In [6]:
vector

array([1, 2, 3])

In [7]:
matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

**Method 2:** Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy offers several functions to create arrays with initial placeholder content. These minimize the necessity of growing arrays, an expensive operation.

The function ``zeros`` creates an array full of zeros, the function ``ones`` creates an array full of ones, and the function ``empty`` creates an array without initializing its entries, so its content is arbitrary and depends on the state of the memory. By default, the dtype of the created array is ``float64``, but it can be specified via the keyword argument ``dtype``.

Let's try each of these

In [8]:
np.zeros((3,4)) # this takes a tuple as an argument to specify the dimensions of the array

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [9]:
np.ones((2,3,4), dtype=np.int16) # we can also specify the data type of the array

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [11]:
np.empty((2, 3)) # this creates an uninitialized array; its values depend on the state of the memory (here they happen to be zeros)

array([[0., 0., 0.],
       [0., 0., 0.]])

To create sequences of numbers, NumPy provides the ``arange`` function which is analogous to the Python built-in ``range``, but returns an array.

In [12]:
np.arange(10, 30, 5) # this creates an array with values from 10 up to (but not including) 30 with a step of 5

array([10, 15, 20, 25])

In [13]:
np.arange(0, 2, 0.3)  # it accepts float arguments

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

When ``arange`` is used with floating point arguments, it is generally not possible to predict the number of elements obtained, due to the finite floating point precision. For this reason, it is usually better to use the function ``linspace``, which takes the number of elements we want as an argument instead of the step:

In [14]:
np.linspace(0, 2, 9) # this creates an array with 9 evenly spaced values between 0 and 2

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])
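To make the difference concrete, here is a small sketch comparing the two functions. The bounds (1 to 1.3) are chosen only for illustration; with them, floating point rounding can cause `arange` to return one more element than expected, while `linspace` always returns exactly the number of points requested.

```python
import numpy as np

# With a float step, the element count depends on rounding in (stop - start) / step,
# so the result may contain 3 or 4 values and may include a value close to the stop.
with_arange = np.arange(1, 1.3, 0.1)
print(len(with_arange), with_arange)

# linspace takes the number of points directly and includes the endpoint by default.
with_linspace = np.linspace(1, 1.3, 4)
print(len(with_linspace), with_linspace)
```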

In [16]:
# example usecase:
x = np.linspace(0, 2*np.pi, 100) # we create an array with 100 values between 0 and 2*pi
f = np.sin(x) # we apply the sin function to the array x and store the result in f 

# so we are evaluating the sin function at 100 points between 0 and 2*pi
f

array([ 0.00000000e+00,  6.34239197e-02,  1.26592454e-01,  1.89251244e-01,
        2.51147987e-01,  3.12033446e-01,  3.71662456e-01,  4.29794912e-01,
        4.86196736e-01,  5.40640817e-01,  5.92907929e-01,  6.42787610e-01,
        6.90079011e-01,  7.34591709e-01,  7.76146464e-01,  8.14575952e-01,
        8.49725430e-01,  8.81453363e-01,  9.09631995e-01,  9.34147860e-01,
        9.54902241e-01,  9.71811568e-01,  9.84807753e-01,  9.93838464e-01,
        9.98867339e-01,  9.99874128e-01,  9.96854776e-01,  9.89821442e-01,
        9.78802446e-01,  9.63842159e-01,  9.45000819e-01,  9.22354294e-01,
        8.95993774e-01,  8.66025404e-01,  8.32569855e-01,  7.95761841e-01,
        7.55749574e-01,  7.12694171e-01,  6.66769001e-01,  6.18158986e-01,
        5.67059864e-01,  5.13677392e-01,  4.58226522e-01,  4.00930535e-01,
        3.42020143e-01,  2.81732557e-01,  2.20310533e-01,  1.58001396e-01,
        9.50560433e-02,  3.17279335e-02, -3.17279335e-02, -9.50560433e-02,
       -1.58001396e-01, -

**Method 3:** Randomly initializing values for a given shape. This is very useful and common, for example to set the weights and biases of a neural network before training begins.

In [18]:
np.random.rand(2,3) # this creates an array with random values from a uniform distribution between 0 and 1

array([[0.33167216, 0.57955807, 0.92707037],
       [0.5431244 , 0.81816156, 0.66017697]])

In [19]:
np.random.uniform(-1, 1, (2,3)) # this creates an array with random values from a uniform distribution between -1 and 1

# np.random.uniform(low, high, size): low is the lower bound of the distribution, 
# high is the upper bound of the distribution, size is the shape of the array

array([[ 0.63689836,  0.20136838, -0.80998536],
       [-0.14837714,  0.97002562, -0.02762426]])

In [20]:
np.random.randint(0, 10, (2,3)) # this creates an array with random integers from 0 (inclusive) to 10 (exclusive)

array([[1, 0, 7],
       [9, 6, 3]])

In [21]:
# if you aren't sure what a function does, you can use the help function
help(np.random.randint)

Help on built-in function randint:

randint(...) method of numpy.random.mtrand.RandomState instance
    randint(low, high=None, size=None, dtype=int)
    
    Return random integers from `low` (inclusive) to `high` (exclusive).
    
    Return random integers from the "discrete uniform" distribution of
    the specified dtype in the "half-open" interval [`low`, `high`). If
    `high` is None (the default), then results are from [0, `low`).
    
    .. note::
        New code should use the `~numpy.random.Generator.integers`
        method of a `~numpy.random.Generator` instance instead;
        please see the :ref:`random-quick-start`.
    
    Parameters
    ----------
    low : int or array-like of ints
        Lowest (signed) integers to be drawn from the distribution (unless
        ``high=None``, in which case this parameter is one above the
        *highest* such integer).
    high : int or array-like of ints, optional
        If provided, one above the largest (signed) integer t
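The note in the help text above recommends the newer `Generator` interface for new code. A minimal sketch of the same three kinds of random arrays using `np.random.default_rng` follows; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeding makes the results reproducible

uniform_01 = rng.random((2, 3))                # floats in [0, 1), like np.random.rand(2, 3)
uniform_pm1 = rng.uniform(-1, 1, size=(2, 3))  # floats in [-1, 1)
integers = rng.integers(0, 10, size=(2, 3))    # integers from 0 (inclusive) to 10 (exclusive)

print(uniform_01.shape, uniform_pm1.shape, integers.shape)

# A generator created with the same seed produces the same numbers again
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform_01, rng_again.random((2, 3))))  # True
```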

Now that we know how to create ndarrays, let's look at how we can check the properties of an ndarray

``ndarray.ndim``
    : the number of axes (dimensions) of the array.

``ndarray.shape``
    : the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.

``ndarray.size``
    : the total number of elements of the array. This is equal to the product of the elements of shape.

``ndarray.dtype``
    : an object describing the type of the elements in the array. One can create or specify dtypes using standard Python types. Additionally, NumPy provides types of its own; numpy.int32, numpy.int16, and numpy.float64 are some examples.

In [22]:
arr = np.random.randint(0, 10, (2,3))

print(f"Type: {type(arr)}")
print(f"Shape of the array: {arr.shape}")
print(f"Number of dimensions: {arr.ndim}")
print(f"Number of elements: {arr.size}")
print(f"Data type of the elements: {arr.dtype}")

Type: <class 'numpy.ndarray'>
Shape of the array: (2, 3)
Number of dimensions: 2
Number of elements: 6
Data type of the elements: int32
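The relationships between these attributes can be checked directly. The sketch below uses a hypothetical 3-axis array `t` and confirms that `ndim` is the length of `shape` and that `size` is the product of its entries.

```python
import numpy as np

t = np.zeros((2, 3, 4), dtype=np.int16)

print(t.ndim == len(t.shape))      # True: one shape entry per axis
print(t.size == np.prod(t.shape))  # True: 2 * 3 * 4 = 24 elements
print(t.dtype)                     # int16
```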


## Basic Arithmetic Operations

Arithmetic operators on arrays apply _elementwise_, i.e., element by element. A new array is created and filled with the result.

This is different from normal Python lists. Let's see how:

In [23]:
python_list = [1, 2, 3]
numpy_array = np.array(python_list)

In [24]:
python_list + python_list # this concatenates the lists

[1, 2, 3, 1, 2, 3]

In [25]:
numpy_array + numpy_array # this adds the arrays element-wise

array([2, 4, 6])

In [26]:
# similarly, we can subtract, multiply, divide, exponentiate, etc. element-wise
numpy_array**2

array([1, 4, 9])

Let's look at some examples of operations on ndarrays. Note that the original arrays are not changed; instead, a new array is returned as the result.

In [38]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
print(a, b)

[20 30 40 50] [0 1 2 3]


In [39]:
c = a - b # element-wise subtraction
c

array([20, 29, 38, 47])

In [40]:
b**2 # element-wise exponentiation

array([0, 1, 4, 9])

In [41]:
10 * np.sin(a) # element-wise multiplication with a scalar and a function

array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])

In [42]:
a < 35 # element-wise comparison

array([ True,  True, False, False])
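These elementwise operators are what let us avoid explicit for-loops. As an illustration, the sketch below computes the same subtraction twice, once with a Python loop and once with the array operator, and checks that the results agree. The names `looped` and `vectorized` are only for this example.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

# Explicit loop: subtract pair by pair and collect the results
looped = []
for x, y in zip(a, b):
    looped.append(x - y)

# Array operator: the same subtraction in one expression
vectorized = a - b

print(vectorized)                                    # [20 29 38 47]
print(np.array_equal(np.array(looped), vectorized))  # True
```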

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ``ndarray`` class.

In [51]:
a = np.array([[1, 5], [-3, 1], [2, 3]]) # creating an array
a

array([[ 1,  5],
       [-3,  1],
       [ 2,  3]])

In [52]:
a.sum() # sum of all elements

9

In [53]:
a.min() # minimum value of the array

-3

In [54]:
a.max() # maximum value of the array

5

In [55]:
a.argmax() # index of the maximum value, counted in the flattened array

1

In [56]:
a.argmin() # index of the minimum value, counted in the flattened array

2
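Note that `argmax` and `argmin` return a position in the flattened array, which is why the results above are single integers even though `a` has two axes. If a (row, column) position is needed, one option is `np.unravel_index`, sketched below.

```python
import numpy as np

a = np.array([[1, 5], [-3, 1], [2, 3]])

flat_index = a.argmax()                           # 1: position in the flattened array
row, col = np.unravel_index(flat_index, a.shape)  # convert to one index per axis

print(row, col)      # 0 1
print(a[row, col])   # 5, the maximum value
```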

By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the ``axis`` parameter you can apply an operation along the specified axis of an array:

In [58]:
b = np.arange(12).reshape(3, 4) # we will talk about the reshape function in detail later on
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [59]:
b.sum(axis=0) # sum of each column

array([12, 15, 18, 21])

In [60]:
b.min(axis=1) # minimum of each row

array([0, 4, 8])

In [61]:
b.cumsum(axis=1) # cumulative sum along each row

array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])
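One way to remember which axis is which: for reductions such as `sum` and `min`, the axis you pass is the one that disappears from the shape of the result. A small sketch using the same 3 x 4 array:

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

print(b.sum(axis=0).shape)  # (4,): axis 0 (length 3) is collapsed, one sum per column
print(b.sum(axis=1).shape)  # (3,): axis 1 (length 4) is collapsed, one sum per row
print(b.sum(axis=1))        # [ 6 22 38]
```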

Unlike in many matrix languages, the product operator ``*`` operates elementwise on NumPy arrays. The matrix product can be performed using the ``@`` operator (in Python >= 3.5) or the ``dot`` function or method:

In [62]:
A = np.array([[1, 1],
              [0, 1]])

B = np.array([[2, 0],
              [3, 4]])

In [63]:
A * B # elementwise product

array([[2, 0],
       [0, 4]])

In [64]:
A @ B # matrix product

array([[5, 4],
       [3, 4]])

In [65]:
A.dot(B) # matrix product (another way), using the dot method

array([[5, 4],
       [3, 4]])
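For reference, each entry of the matrix product is the sum of products of a row of `A` with a column of `B`. The sketch below spells this out with explicit loops (the array `manual` exists only for this illustration) and checks the result against `@`.

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

# manual[i, j] = sum over k of A[i, k] * B[k, j]
manual = np.zeros((2, 2), dtype=A.dtype)
for i in range(2):
    for j in range(2):
        for k in range(2):
            manual[i, j] += A[i, k] * B[k, j]

print(manual)                         # same values as A @ B
print(np.array_equal(manual, A @ B))  # True
```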

## Indexing, Slicing and Iterating
One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.

In [68]:
a = np.arange(10)**3
print(a, end='\n\n')
print(a[2], end='\n\n')
print(a[2:5], end='\n\n')

# from start to position 6, exclusive, set every 2nd element to 1000
a[:6:2] = 1000
print(a, end='\n\n')
print(a[::-1]) # reverse the array

[  0   1   8  27  64 125 216 343 512 729]

8

[ 8 27 64]

[1000    1 1000   27 1000  125  216  343  512  729]

[ 729  512  343  216  125 1000   27 1000    1 1000]


Multidimensional arrays can have one index per axis. These indices are given as a comma-separated tuple:

In [72]:
b = np.array([[  0,  1,  2,  3],
                [10, 11, 12, 13],
                [20, 21, 22, 23],
                [30, 31, 32, 33],
                [40, 41, 42, 43]])

print(b[2,3], end='\n\n') # element at row 2, column 3 (note that indexing starts from 0)
print(b[0:5, 1], end='\n\n') # second column of all rows
print(b[:,1], end='\n\n') # equivalent to the previous one
print(b[1:3, :], end='\n\n') # second and third row

23

[ 1 11 21 31 41]

[ 1 11 21 31 41]

[[10 11 12 13]
 [20 21 22 23]]
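When fewer indices are given than there are axes, the missing ones are treated as complete slices. The sketch below illustrates this with the same array.

```python
import numpy as np

b = np.array([[ 0,  1,  2,  3],
              [10, 11, 12, 13],
              [20, 21, 22, 23],
              [30, 31, 32, 33],
              [40, 41, 42, 43]])

print(b[-1])                            # last row: [40 41 42 43]
print(np.array_equal(b[-1], b[-1, :]))  # True: b[-1] is shorthand for b[-1, :]
```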



In [73]:
# Iterating over multidimensional arrays is done with respect to the first axis:
for row in b:
    print(row)

[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


However, if one wants to perform an operation on each element in the array, one can use the ``flat`` attribute which is an iterator over all the elements of the array:

In [74]:
for element in b.flat:
    print(element)  # this will print all the elements of the array

0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43
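The `flat` iterator visits the elements row by row, which is the same order that `ravel()` (covered in the next section) uses by default. A quick check, assuming a small 2 x 3 array `m` for illustration:

```python
import numpy as np

m = np.array([[0, 1, 2],
              [10, 11, 12]])

visited = [int(element) for element in m.flat]
print(visited)                        # [0, 1, 2, 10, 11, 12]
print(visited == m.ravel().tolist())  # True
```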


## Shape Manipulation

This is one of the most frequent operations that we will perform on arrays. While building ML models, we often change the shape of our input data to match the input shape the model expects. Even inside the model, shape manipulations are performed during calculations.

### Changing the shape of an array
An array has a shape given by the number of elements along each axis:

In [81]:
a = np.random.randint(0, 10, (3,4))
a, a.shape

(array([[9, 8, 6, 1],
        [9, 7, 6, 7],
        [3, 1, 2, 3]]),
 (3, 4))

The shape of an array can be changed with various commands. Note that the following three commands all return a **modified** array, but do **not** change the original array:

In [82]:
a.ravel() # returns the array, flattened

array([9, 8, 6, 1, 9, 7, 6, 7, 3, 1, 2, 3])

The ``reshape()`` method accepts a new shape, provided it is valid. A shape is valid when the product of its dimensions equals the product of the original dimensions, i.e. the total number of elements. In this example the product is 3 x 4 = 12, so any shape whose dimensions multiply to 12 is valid, for example 6 x 2:

In [83]:
a.reshape(6, 2) # returns the array with a modified shape

array([[9, 8],
       [6, 1],
       [9, 7],
       [6, 7],
       [3, 1],
       [2, 3]])
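If the product of the new dimensions does not match the number of elements, `reshape` raises a `ValueError`. The sketch below shows one valid and one invalid request on a 12-element array; the array `arr12` is a stand-in for the random array above.

```python
import numpy as np

arr12 = np.arange(12).reshape(3, 4)   # 12 elements

print(arr12.reshape(2, 6).shape)      # (2, 6): valid, since 2 * 6 = 12

try:
    arr12.reshape(5, 2)               # invalid: 5 * 2 = 10, not 12
except ValueError as err:
    print("reshape failed:", err)
```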

In [84]:
a.T # returns the array, transposed

array([[9, 9, 3],
       [8, 7, 1],
       [6, 6, 2],
       [1, 7, 3]])

In [86]:
a.T.shape, a.shape # notice that the array is in fact transposed

((4, 3), (3, 4))

In [88]:
a.reshape(4, -1) # if a dimension is given as -1 in a reshaping operation, it is calculated automatically from the array size and the other dimensions

array([[9, 8, 6],
       [1, 9, 7],
       [6, 7, 3],
       [1, 2, 3]])
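Two points from this section can be verified together: `-1` lets NumPy infer one dimension, and none of `ravel`, `reshape`, or `T` changes the shape of the original array. A minimal sketch, using `np.arange(12)` in place of the random values above:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(a.reshape(4, -1).shape)      # (4, 3): 12 / 4 = 3 is inferred
print(a.reshape(-1, 6).shape)      # (2, 6): 12 / 6 = 2 is inferred
print(a.ravel().shape, a.T.shape)  # (12,) (4, 3)

print(a.shape)                     # (3, 4): the original array keeps its shape
```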