# Basics

Loading the numpy library

In [19]:
import numpy as np

Let's define a Python list

In [20]:
arr = [1,2,3]
arr

[1, 2, 3]

This list can be converted to a numpy array as follows

In [21]:
nparr = np.array(arr)
nparr

array([1, 2, 3])
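One practical reason to convert is that arithmetic operators behave differently on lists and on numpy arrays. A small sketch comparing the two (the names `py_list` and `np_arr` are just for illustration):

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# On a Python list, * repeats the list and + concatenates
print(py_list * 2)        # [1, 2, 3, 1, 2, 3]
print(py_list + py_list)  # [1, 2, 3, 1, 2, 3]

# On a numpy array, the same operators act element by element
print(np_arr * 2)         # [2 4 6]
print(np_arr + np_arr)    # [2 4 6]
```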

Let's define some new arrays and do some basic algebra

In [22]:
a = np.array([1,2])
b = np.array([2,-1])
c = a + b
d = a - b
print(c,d)

[3 1] [-1  3]


Element-wise multiplication and division can be done in the same manner. Note that `/` returns floats even when both arrays contain integers.

In [23]:
m = a*b
n = a/b
print(m,n)

[ 2 -2] [ 0.5 -2. ]


You can check the dimensions of an array with its `shape` attribute

In [24]:
a.shape

(2,)

The shape is `(2,)` because `a` is a one-dimensional array with two entries; the trailing comma is how Python writes a tuple with a single element.
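A one-dimensional array has no separate notion of rows and columns. If you need an explicit single row or single column, one way is to reshape it. A short sketch:

```python
import numpy as np

a = np.array([1, 2])
row = a.reshape(1, 2)   # explicitly one row with 2 columns
col = a.reshape(2, 1)   # explicitly one column with 2 rows

print(a.shape, a.ndim)      # (2,) 1
print(row.shape, row.ndim)  # (1, 2) 2
print(col.shape, col.ndim)  # (2, 1) 2
```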

Let's define a two-dimensional array and check its dimensions

In [25]:
dim1 = np.array([[3, 1, 2], [-4, 8, -1]])
dim1.shape

(2, 3)

Notice that here the first number is the number of rows of the array and the second is the number of columns. Alternatively, the first number can be thought of as the number of subarrays the array contains, and the second as the number of entries in each of those subarrays.

To get the total number of entries in an array (the product of the numbers in its shape), we can use

In [26]:
dim1.size

6

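The `len()` function returns the length of the first axis: how many rows an array has or, alternatively, how many subarrays it contains. For the one-dimensional array `a`, that is simply its number of entries.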

In [27]:
len(a)

2
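For a two-dimensional array, `len()` only reports the first axis, so it can differ from `size`. A quick comparison using `dim1` from above (redefined so the snippet runs on its own):

```python
import numpy as np

dim1 = np.array([[3, 1, 2], [-4, 8, -1]])

print(len(dim1))      # 2: length of the first axis (number of rows)
print(dim1.shape[0])  # 2: the same value, read from the shape tuple
print(dim1.ndim)      # 2: number of dimensions (axes)
print(dim1.size)      # 6: total number of entries, 2 * 3
```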

# Data types
Just as Python has its data types, numpy has its own.

In [28]:
ex1 = np.array([1,2,3])
ex1.dtype

dtype('int32')

In [29]:
ex2 = np.array([1.1, 2, 3])
ex2.dtype

dtype('float64')

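Notice how a single `float` entry turns the whole array into a floating-point (`float64`) array. The default integer type shown for `ex1` depends on the platform, so you may see `int64` instead of `int32`. Let's see what happens when we have `string` data.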

In [30]:
ex3 = np.array(['abc', 'defgj'])
ex3.dtype

dtype('<U5')

Notice that the data type is shown as `<U5`: Unicode strings (`U`) of at most 5 characters, the length of the longest entry, stored in little-endian byte order (`<`).
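Because the maximum length is part of the data type, a longer string assigned into this array does not fit. A small sketch showing that numpy silently truncates it:

```python
import numpy as np

ex3 = np.array(['abc', 'defgj'])
print(ex3.dtype)   # <U5

# The array stores strings of at most 5 characters,
# so a longer value is cut off when it is assigned
ex3[0] = 'abcdefgh'
print(ex3[0])      # abcde
```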

There are many more data types; see the full documentation [here](https://numpy.org/doc/stable/user/basics.types.html).
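Instead of letting numpy infer the type, you can choose it with the `dtype` argument, or convert an existing array with `astype()`. A brief sketch:

```python
import numpy as np

# Request a data type up front with the dtype argument
f = np.array([1, 2, 3], dtype=np.float64)
print(f.dtype)   # float64

# Convert an existing array; float-to-int conversion truncates toward zero
g = np.array([1.7, -2.5, 3.0]).astype(np.int64)
print(g)         # [ 1 -2  3]
```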

# Initializing arrays

There are many ways to initialize a numpy array. At the start of this notebook, we initialized arrays by explicitly typing in each entry. That is only practical for a small number of entries; otherwise, typing everything in by hand quickly becomes cumbersome.

To create a 3 by 4 matrix entirely filled with zeros, we can use the `zeros()` function, passing the shape as a tuple.

In [31]:
zer = np.zeros((3,4))
zer

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

Similarly, we can create an array filled with ones by using `ones()`. To create an array with every entry set to 7, we can create an array of ones and multiply it by 7.

In [32]:
oarr = np.ones((2,3))*7
oarr

array([[7., 7., 7.],
       [7., 7., 7.]])
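numpy also provides `np.full()`, which fills an array of a given shape with one value directly, without the multiplication step. A small sketch; note that the data type follows the fill value:

```python
import numpy as np

oarr = np.ones((2, 3)) * 7
full7 = np.full((2, 3), 7.0)   # a float fill value gives a float array, like oarr

print(full7)
print(np.array_equal(oarr, full7))     # True

# With an integer fill value, the result is an integer array instead
print(np.full((2, 3), 7).dtype.kind)   # i
```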

To create an identity matrix of order 5, we can use

In [33]:
iden = np.eye(5)
iden

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

To generate an array within a given range, we have two options available: `arange` and `linspace`.

`arange` takes the boundaries of the range followed by the step value, that is, the spacing between consecutive points starting from the left boundary. So, to generate the sequence 3, 6, 9, ... up to 33, we use the following.

In [34]:
ex4 = np.arange(3,33,3)
ex4

array([ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30])

Note that it excludes the value on the right boundary.
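If you do want 33 in the sequence, one simple approach with an integer step is to set the stop value just above it:

```python
import numpy as np

# Stop is exclusive, so move it past 33 to include 33 itself
ex4_incl = np.arange(3, 34, 3)
print(ex4_incl)        # [ 3  6  9 12 15 18 21 24 27 30 33]
print(len(ex4_incl))   # 11
```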

`linspace` works in a similar fashion, but instead of a step it takes the number of samples to generate within the range. So, to pick 10 evenly spaced numbers in the range 3 to 33, we use

In [35]:
ex5 = np.linspace(3, 33, 10)
ex5

array([ 3.        ,  6.33333333,  9.66666667, 13.        , 16.33333333,
       19.66666667, 23.        , 26.33333333, 29.66666667, 33.        ])

Note that it includes the right boundary, unlike `arange`.
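With the right boundary included, `linspace` spaces the samples by (stop - start) / (num - 1), which is why the values above step by 10/3. Two optional arguments make this visible or change it: `retstep` and `endpoint`. A sketch:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
ex5, step = np.linspace(3, 33, 10, retstep=True)
print(step)   # (33 - 3) / 9, about 3.333

# endpoint=False excludes the right boundary, like arange
ex6 = np.linspace(3, 33, 10, endpoint=False)
print(ex6)    # [ 3.  6.  9. 12. 15. 18. 21. 24. 27. 30.]
```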

We can also generate an array and turn it into a matrix using `reshape()`. The code below creates an array with the numbers 0 to 9 and reshapes it into a 2 by 5 array, or matrix.

In [36]:
reshp = np.arange(10).reshape((2,5))
reshp

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
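The new shape must hold exactly as many entries as the original array. You can also pass `-1` for one dimension and let numpy infer it from the total size. A sketch:

```python
import numpy as np

nums = np.arange(10)

# -1 asks numpy to work out that dimension from the total size (10 / 5 = 2)
print(nums.reshape((5, -1)).shape)   # (5, 2)

# A shape that does not hold exactly 10 entries is rejected
try:
    nums.reshape((3, 4))             # 3 * 4 = 12, not 10
except ValueError as err:
    print("ValueError:", err)
```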

To create a 2 by 4 array of random numbers, we can use

In [37]:
randarr = np.random.random((2,4))
randarr

array([[0.28422684, 0.42877185, 0.2838451 , 0.12477088],
       [0.84390864, 0.37828582, 0.43191335, 0.96509912]])

Note that the numbers generated this way lie in the half-open interval from 0 (inclusive) to 1 (exclusive). To generate 15 random integers from 0 up to, but not including, 10, we can use

In [38]:
randints = np.random.randint(0,10,size=15)
randints

array([1, 6, 0, 4, 9, 3, 9, 4, 2, 1, 3, 0, 3, 2, 8])

We can also create a matrix of random values by passing a tuple as the `size=` argument.

In [39]:
# Generate a matrix with values from 0 to 24 with 4 rows and 3 columns
randmat = np.random.randint(0, 25, size=(4, 3))
randmat

array([[ 6, 10, 14],
       [ 7, 11,  3],
       [13, 21, 13],
       [17,  7,  0]])
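The random outputs above will differ every time the cells are run. For reproducible results, one option is numpy's newer `Generator` interface, created with `np.random.default_rng()` and a fixed seed (the seed `42` below is an arbitrary choice). A sketch mirroring the calls above:

```python
import numpy as np

# 42 is an arbitrary seed; any fixed integer gives a repeatable sequence
rng1 = np.random.default_rng(42)
rng2 = np.random.default_rng(42)

floats = rng1.random((2, 4))               # 2 by 4 floats in [0, 1)
ints = rng1.integers(0, 25, size=(4, 3))   # 4 by 3 integers from 0 to 24

# A second generator with the same seed reproduces the same numbers
print(np.array_equal(floats, rng2.random((2, 4))))               # True
print(np.array_equal(ints, rng2.integers(0, 25, size=(4, 3))))   # True
```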

To demonstrate a three-dimensional array, consider the following example.

In [40]:
arr3d = np.random.randint(0, 11, size=(4, 3, 5))
arr3d

array([[[ 2,  2,  4,  9,  2],
        [ 3,  7,  8, 10,  2],
        [ 5,  1,  2,  6,  9]],

       [[ 0,  2,  9,  1,  4],
        [ 3,  7,  2,  3,  5],
        [ 1,  6,  1,  5,  9]],

       [[ 8,  6,  9,  1, 10],
        [ 2,  8, 10,  2,  0],
        [10,  7,  3,  8,  5]],

       [[ 8,  2,  4,  2, 10],
        [ 5,  6,  5,  5,  3],
        [ 6,  0, 10,  7,  6]]])

The easiest way to understand a three-dimensional array is as follows: `size=(4,3,5)` has three values in the tuple. The first, 4, is the number of matrices contained in the array. The second and third give the order of each of those matrices (3 rows and 5 columns), just as in the previous example.
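Indexing follows the same reading: the first index picks a matrix, and the next two pick a row and a column within it. A sketch (the values will differ from the output above, since they are random):

```python
import numpy as np

arr3d = np.random.randint(0, 11, size=(4, 3, 5))

print(arr3d.ndim)      # 3
print(arr3d[0].shape)  # (3, 5): the first of the 4 matrices
print(arr3d[0, 2])     # third row of the first matrix (5 values)
print(arr3d[0, 2, 4])  # a single entry: matrix 0, row 2, column 4
```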

# Matrix manipulation

We saw some basic arithmetic in the Basics section. We can cover some more operations here.

Suppose that we are given two vectors $$v_1=3\textbf i-4\textbf j+2\textbf k,\quad v_2=-5\textbf i+2\textbf j-2\textbf k$$ and we wish to find their dot product. We can do it as follows.

In [41]:
v1 = np.array([3, -4, 2])
v2 = np.array([-5, 2, -2])
v1.dot(v2)

-27
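The dot product is the sum of the products of corresponding components, $3(-5) + (-4)(2) + 2(-2) = -15 - 8 - 4 = -27$. The same value can be obtained by summing the element-wise product, or with the `@` operator:

```python
import numpy as np

v1 = np.array([3, -4, 2])
v2 = np.array([-5, 2, -2])

print(np.sum(v1 * v2))  # -27: multiply component by component, then add
print(v1 @ v2)          # -27: @ is the matrix-multiplication operator
```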

The `dot` method is also used to multiply two matrices. Keep in mind the condition for the product to exist: the number of columns of the first matrix must equal the number of rows of the second.

In [42]:
mat1 = np.array([
    [2,3],
    [1,2]
])
mat2 = np.array([
    [-1,4],
    [3,-6]
])
mat1.dot(mat2)

array([[  7, -10],
       [  5,  -8]])
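The sketch below shows `@` as an equivalent of `dot`, contrasts it with the element-wise `*`, and checks the shape rule on a non-square matrix (`rect` is a made-up example):

```python
import numpy as np

mat1 = np.array([[2, 3], [1, 2]])
mat2 = np.array([[-1, 4], [3, -6]])

print(mat1 @ mat2)   # same result as mat1.dot(mat2)
print(mat1 * mat2)   # element-wise products, NOT a matrix product

rect = np.array([[1, 0, 2], [0, 1, 3]])   # shape (2, 3)
print((mat1 @ rect).shape)                # (2, 2) times (2, 3) gives (2, 3)

try:
    rect @ mat1                           # (2, 3) times (2, 2): 3 columns != 2 rows
except ValueError as err:
    print("ValueError:", err)
```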

We can also transpose matrices easily with `transpose()`.

In [43]:
mat1.transpose()

array([[2, 1],
       [3, 2]])
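On a non-square matrix, transposing swaps the numbers of rows and columns. A sketch using the `.T` attribute, which is shorthand for `transpose()` (the matrix `rect` is just an example):

```python
import numpy as np

rect = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

print(rect.T)                     # same as rect.transpose()
print(rect.shape, rect.T.shape)   # (2, 3) (3, 2)
```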

We can use the standard operators for adding or subtracting matrices, and multiplying by a scalar works just as intuitively. Alternatively, some people like to use the functions provided by numpy for matrix arithmetic.

In [44]:
np.add(mat1, mat2)

array([[ 1,  7],
       [ 4, -4]])

In [45]:
np.subtract(mat1, mat2)

array([[ 3, -1],
       [-2,  8]])

Some prefer this for clarity: `mat1 + mat2` is shorter to write, but on its own it does not tell the reader what `mat1` and `mat2` are, whereas `numpy.add()` signals that we are adding two numpy arrays or matrices.
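Whichever style you choose, the results are identical: on numpy arrays, the `+`, `-` and `*` operators perform the same element-wise operations as `np.add()`, `np.subtract()` and `np.multiply()`. A quick check, including multiplication by a scalar:

```python
import numpy as np

mat1 = np.array([[2, 3], [1, 2]])
mat2 = np.array([[-1, 4], [3, -6]])

print(np.array_equal(np.add(mat1, mat2), mat1 + mat2))       # True
print(np.array_equal(np.subtract(mat1, mat2), mat1 - mat2))  # True
print(np.array_equal(np.multiply(3, mat1), 3 * mat1))        # True
print(3 * mat1)   # every entry multiplied by 3
```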

# Numpy mathematical functions

Suppose we are given an array `x` of values and wish to apply some predefined function to each of them. For example, we can compute `y=e^x` for each value in `x`.

In [46]:
x = np.linspace(1, 5, 100)  # create an array of 100 evenly spaced values from 1 to 5, inclusive
y = np.exp(x)               # apply e^x to each element
y[:10]                      # view the first 10 elements

array([2.71828183, 2.83036036, 2.94706005, 3.06857143, 3.19509289,
       3.32683101, 3.46400087, 3.60682644, 3.7555409 , 3.91038707])

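Similarly, we could apply functions such as `sqrt`, `sin` and `log`, to name a few. Keep each function's domain in mind: out-of-domain inputs, such as negative numbers passed to `sqrt` or `log`, do not raise an exception in numpy but produce `nan` together with a `RuntimeWarning`.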
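A short sketch of what happens with out-of-domain inputs, using `np.errstate` to suppress the warnings and `np.isnan` to locate the invalid results:

```python
import numpy as np

vals = np.array([4.0, -1.0, 0.0])

# Suppress the RuntimeWarnings that invalid inputs would normally trigger
with np.errstate(invalid="ignore", divide="ignore"):
    roots = np.sqrt(vals)  # sqrt(-1) has no real value -> nan
    logs = np.log(vals)    # log(-1) -> nan, log(0) -> -inf

print(roots)            # [ 2. nan  0.]
print(logs)
print(np.isnan(roots))  # [False  True False]
```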

# Basic statistics

First, we create a random one-dimensional array (also sometimes called a vector) to show how basic statistical features can be extracted using numpy.

In [47]:
test1 = np.random.randint(1, 100, 1000) # generate 1000 random integers from 1 to 99 (100 is excluded)
test1[:5]                               #view the first five entries

array([80, 27, 24,  1, 93])

In [48]:
test1.sum()     #Sum of all the entries

50146

In [49]:
test1.min()     #minimum value from the array

1

In [50]:
test1.max()     #maximum value from the array

99

Keep in mind that `np.random.randint()` excludes the upper bound, which is why the maximum found in the array is 99, not 100.

We can also find the mean and median of the data.

In [51]:
test1.mean()

50.146

In [52]:
np.median(test1)

49.0

There is no built-in `median()` method on numpy arrays, which is why we call the `numpy.median()` function instead. The array does, however, have a built-in `mean()` method, as shown above.
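The median is the middle value of the sorted data; with an even number of entries, as in `test1` (1000 values), it is the mean of the two middle values. A small sketch with a made-up six-element array, checking `np.median()` against that definition:

```python
import numpy as np

data = np.array([7, 1, 9, 4, 3, 8])   # made-up example data
s = np.sort(data)                     # [1 3 4 7 8 9]

mid = len(s) // 2
manual = (s[mid - 1] + s[mid]) / 2    # mean of the two middle values: (4 + 7) / 2
print(manual)           # 5.5
print(np.median(data))  # 5.5
```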

We will now create a random three dimensional array and get the basic statistics for it across its various dimensions. 

In [53]:
mat3d = np.random.randint(0, 11, size=(3, 3, 3)) 
mat3d

array([[[10,  5,  8],
        [ 4,  4, 10],
        [ 8,  6,  2]],

       [[ 2,  9,  7],
        [ 2,  9,  8],
        [ 8,  6,  7]],

       [[ 6,  5,  8],
        [ 1,  4,  4],
        [ 8,  6,  1]]])

What if we try using the methods we had learnt previously?

In [54]:
mat3d.sum()

158

This returns the sum of all the elements.

In [55]:
mat3d.max()

10

It is clear that the methods we used before work on the entire array. But what if we wanted the `sum` along a particular axis, or the `max` of a particular matrix in the array?

In [56]:
mat3d.sum(axis=0)

array([[18, 19, 23],
       [ 7, 17, 22],
       [24, 18, 10]])

Now what did we just get? First, recall that our array has a shape of (3,3,3): the first number is how many matrices we have, the second the number of rows, and the third the number of columns. Passing `axis=0` to the `sum()` method sums along axis 0 (the first axis), which indexes the matrices. So the result here is simply the entry-by-entry sum of the three matrices!
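This can be checked directly by adding the three matrices by hand. The sketch below types in the same values as `mat3d` above so that the result is reproducible:

```python
import numpy as np

# The same values as mat3d above, typed in so the result is reproducible
mat3d = np.array([[[10, 5, 8], [4, 4, 10], [8, 6, 2]],
                  [[2, 9, 7], [2, 9, 8], [8, 6, 7]],
                  [[6, 5, 8], [1, 4, 4], [8, 6, 1]]])

by_axis = mat3d.sum(axis=0)
by_hand = mat3d[0] + mat3d[1] + mat3d[2]   # add the three matrices entry by entry

print(np.array_equal(by_axis, by_hand))    # True
print(by_axis[0, 0])                       # 18 = 10 + 2 + 6
```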

In [57]:
mat3d.sum(axis=1) # collapses the rows: each output row holds the column sums of one matrix

array([[22, 15, 20],
       [12, 24, 22],
       [15, 15, 13]])

In [58]:
mat3d.sum(axis=2) # collapses the columns: each output row holds the row sums of one matrix

array([[23, 18, 16],
       [18, 19, 21],
       [19,  9, 15]])
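Each axis you sum over disappears from the result's shape. Passing a tuple of axes collapses several at once, which gives the total of each individual matrix. A sketch using the same values as `mat3d` above:

```python
import numpy as np

mat3d = np.array([[[10, 5, 8], [4, 4, 10], [8, 6, 2]],
                  [[2, 9, 7], [2, 9, 8], [8, 6, 7]],
                  [[6, 5, 8], [1, 4, 4], [8, 6, 1]]])

print(mat3d.sum(axis=1).shape)        # (3, 3): axis 1 is gone
print(mat3d.sum(axis=(1, 2)))         # [57 58 43]: total of each matrix
print(mat3d.sum(axis=(1, 2)).sum())   # 158, the same as mat3d.sum()
```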

Similarly, the `max()` with `axis=0` will compare the corresponding entries of each matrix in the array, and choose the maximum from each to form a new matrix of maximums.

In [59]:
mat3d.max(axis=0)

array([[10,  9,  8],
       [ 4,  9, 10],
       [ 8,  6,  7]])

Here, `max()` takes the entry in the first row and first column of each matrix, compares the three values, and keeps the largest. It repeats this for every position and collects the maxima into the matrix it returns.
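A tuple of axes also gives the maximum of each individual matrix, and `argmax()` reports which matrix each maximum came from rather than its value (in case of a tie, the first occurrence wins). A sketch with the same values as `mat3d`:

```python
import numpy as np

mat3d = np.array([[[10, 5, 8], [4, 4, 10], [8, 6, 2]],
                  [[2, 9, 7], [2, 9, 8], [8, 6, 7]],
                  [[6, 5, 8], [1, 4, 4], [8, 6, 1]]])

print(mat3d.max(axis=(1, 2)))  # [10  9  8]: largest entry in each matrix
print(mat3d.argmax(axis=0))    # index (0, 1 or 2) of the matrix holding each maximum
```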

Try out the other axes on your own to understand how everything works.