# Lab: Introduction to NumPy
NumPy is the fundamental library for scientific computing with Python. NumPy is centered around a powerful N-dimensional array object, and it also contains useful linear algebra, Fourier transform, and random number functions.

## Creating arrays
Let's import NumPy. Most people import it as `np`:

In [1]:
import numpy as np

The `zeros` function creates an array containing any number of zeros:

In [2]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

It's just as easy to create a 2D array (i.e., a matrix) by providing a tuple with the desired number of rows and columns. For example, here's a 3x4 matrix:

In [5]:
np.zeros((3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

## Some vocabulary
* In NumPy, each dimension is called an **axis**.
* The number of axes is called the **rank** (available as the `ndim` attribute).
    - For example, the above 3x4 matrix is an array of rank 2 (it is 2-dimensional).
    - The first axis has length 3, the second has length 4.
* An array's list of axis lengths is called the **shape** of the array.
    - For example, the above matrix's shape is `(3, 4)`.
    - The rank is equal to the shape's length.
* The **size** of an array is the total number of elements, which is the product of all axis lengths (e.g., 3*4=12).

In [7]:
a = np.zeros((3,4))
a.shape

(3, 4)

In [8]:
a.ndim

2

In [9]:
a.size

12

## N-dimensional arrays
You can also create an N-dimensional array of arbitrary rank. For example, here's a 3D array (rank=3), with shape (2,3,4):

In [10]:
a = np.zeros((2,3,4))
a

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

NumPy arrays have the type `ndarray`:

In [11]:
type(a)

numpy.ndarray

Many other NumPy functions create ndarrays.

Here's a 3x4 matrix full of ones:
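As a minimal sketch, assuming the `ones` function, which takes a shape in the same way as `zeros`:

```python
import numpy as np

# Create a 3x4 matrix filled with ones (float64 by default).
ones_matrix = np.ones((3, 4))
print(ones_matrix)
```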

You can also create an array of a given shape initialized with a given value using `full`. Here's a 3x4 matrix full of π:

In [17]:
np.full((3,4), np.pi)

array([[3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265]])

You can create an uninitialized 2x3 array with `empty` (its content is not predictable, as it is whatever is in memory at that point):

In [15]:
np.empty((2,3))

array([[0., 0., 0.],
       [0., 0., 0.]])

You can initialize an ndarray from a regular Python list (or nested lists). Just call the `array` function:

In [19]:
np.array([[1,2,3,4],[10,20,30,40]])

array([[ 1,  2,  3,  4],
       [10, 20, 30, 40]])

You can create an ndarray using NumPy's `arange` function, which is similar to Python's built-in `range` and returns evenly spaced values within a given interval:
`arange(start, stop, step)`. The default start is 0, the default step size is 1, and the stop value is excluded.

In [20]:
np.arange(1,6,1)

array([1, 2, 3, 4, 5])

In [22]:
np.arange(1,30,2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])

When working with floats, use the `linspace` function, which returns an array containing a specific number of points evenly distributed between two values: `linspace(start, stop, num_of_values)`. Unlike `arange`, it includes the stop value by default.

In [23]:
np.linspace(0,1.5,5)

array([0.   , 0.375, 0.75 , 1.125, 1.5  ])

A number of functions are available in NumPy's random module to create ndarrays initialized with random values. For example, here is a 3x4 matrix initialized with random floats between 0 and 1 (uniform distribution):

In [None]:
np.random.rand(3,4)

Here's a 3x4 matrix containing random floats sampled from a univariate normal distribution (Gaussian distribution) of mean 0 and variance 1:
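As a rough sketch of that case, assuming the legacy `np.random.randn` function (the normal-distribution counterpart of `rand`). The seed value is an arbitrary choice, added only to make the output repeatable:

```python
import numpy as np

np.random.seed(42)  # arbitrary seed (an assumption), only for reproducibility

# 3x4 matrix of floats drawn from a normal distribution (mean 0, variance 1).
normal_matrix = np.random.randn(3, 4)
print(normal_matrix)
```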

NumPy's ndarrays are efficient in part because all their elements must have the same type (usually numbers). You can check what the data type is by looking at the `dtype` attribute:

In [24]:
c = np.arange(1,5)
print(c.dtype,c)

int32 [1 2 3 4]


In [25]:
c = np.linspace(0, 1.5, 5)
print(c.dtype, c)

float64 [0.    0.375 0.75  1.125 1.5  ]


## Reshaping an array

Changing the shape of an ndarray is as simple as setting its `shape` attribute (in place). However, the array's size must remain the same.

In [26]:
g = np.arange(24)
print(g)
print("Rank:", g.ndim)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Rank: 1


In [30]:
g.shape = (3,8)
print(g)
print("Rank:", g.ndim)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]
Rank: 2


The `reshape` function gives a new shape to an array without changing its data. Note that it returns a new `ndarray` object that, whenever possible, points at the same data. This means that modifying one array will also modify the other.

In [31]:
g2 = g.reshape(4,6)
print(g)
print("Rank:", g.ndim)
print(g2)
print("Rank:", g2.ndim)

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]
Rank: 2
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
Rank: 2


Set element at row 1, col 2 to 999 (more about indexing below):

In [36]:
g2[1,2] = 999
g2

array([[  0,   1,   2,   3,   4,   5],
       [  6,   7, 999,   9,  10,  11],
       [ 12,  13,  14,  15,  16,  17],
       [ 18,  19,  20,  21,  22,  23]])

The corresponding element in `g` has been modified as well:

In [37]:
g

array([[  0,   1,   2,   3,   4,   5,   6,   7],
       [999,   9,  10,  11,  12,  13,  14,  15],
       [ 16,  17,  18,  19,  20,  21,  22,  23]])

Finally, the `ravel` function returns a new one-dimensional ndarray that also points to the same data:

In [38]:
g.ravel()

array([  0,   1,   2,   3,   4,   5,   6,   7, 999,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23])

## Mathematical and statistical functions
Many mathematical and statistical functions are available for ndarrays. Here are a few useful functions.

Note that the `mean` function computes the mean of all elements in the ndarray, regardless of its shape.

In [40]:
a = np.array([[-2,3.1,7],[10,11,12]])
print(a)
print("mean is ", a.mean())

[[-2.   3.1  7. ]
 [10.  11.  12. ]]
mean is  6.8500000000000005


In [42]:
for func in (a.min, a.max, a.sum, a.prod, a.std, a.var):
    print(func.__name__, "=", func())

min = -2.0
max = 12.0
sum = 41.1
prod = -57288.0
std = 4.934149707227511
var = 24.34583333333333


These functions accept an optional argument `axis` which lets you ask for the operation to be performed on elements along the given axis. For example:

In [44]:
c = np.arange(24).reshape(6,4)
c

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [45]:
# sum down each column (reduces axis 0, the rows)
c.sum(axis = 0)

array([60, 66, 72, 78])

In [46]:
# sum along each row (reduces axis 1, the columns)
c.sum(axis = 1)

array([ 6, 22, 38, 54, 70, 86])

In [3]:
# sum across both rows and columns
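One possible way to fill in this cell, as a sketch: passing a tuple of axes to `sum` reduces over both axes at once, which for a 2D array gives the same result as calling `sum()` with no argument. The array `c` is rebuilt here so the block runs on its own.

```python
import numpy as np

c = np.arange(24).reshape(6, 4)

# Reduce over both axes at once by passing a tuple of axes.
total = c.sum(axis=(0, 1))
print(total)    # 276
print(c.sum())  # same value: no axis means "all elements"
```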

## Array indexing
**One-dimensional** NumPy arrays can be accessed with the basic slice syntax `i:j:k` where `i` is the starting index, `j` is the stopping index, and `k` is the step (`k!=0`). 

In [48]:
a = np.array([12,5,4,19,13,7,3])
a

array([12,  5,  4, 19, 13,  7,  3])

In [49]:
a[0]

12

In [50]:
a[3]

19

In [51]:
a[2:5]

array([ 4, 19, 13])

In [52]:
a[2:5:2]

array([ 4, 13])

Negative starting index `i` and stopping index `j` are interpreted as `n + i` and `n + j` where `n` is the number of elements in the corresponding dimension.

In [54]:
a[-1]

3

In [53]:
a[-2]

7

If `i` is not given, it defaults to 0 for `k > 0` and `n - 1` for `k < 0`.

If `j` is not given, it defaults to `n` for `k > 0` and `-n-1` for `k < 0`.

If `k` is not given, it defaults to 1.

A negative step `k` makes stepping go towards smaller indices.

In [55]:
print(a)
a[::-1]

[12  5  4 19 13  7  3]


array([ 3,  7, 13, 19,  4,  5, 12])

You can modify elements using the index:

You can also modify an ndarray slice:
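A brief sketch of both kinds of assignment, reusing the values of `a` from above (redefined here so the block runs on its own):

```python
import numpy as np

a = np.array([12, 5, 4, 19, 13, 7, 3])

a[3] = 999         # modify a single element by index
a[1:3] = [-1, -2]  # assign a sequence of matching length to a slice
a[5:] = 0          # a scalar is broadcast to every element of the slice
print(a)
```

Because the assignments happen in place, `a` itself changes; no new array is created.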

**Multi-dimensional** NumPy arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas:

In [58]:
b = np.arange(48).reshape(4,12)
b

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]])

In [61]:
# row 1, all columns
b[1,:]

array([12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])

In [4]:
# row 1, col 2

In [5]:
# all rows, column 1
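As a sketch of how the remaining cells might be filled in (with `b` rebuilt so the block is self-contained), a single element takes one index per axis, and a full slice `:` selects everything along that axis:

```python
import numpy as np

b = np.arange(48).reshape(4, 12)

single = b[1, 2]   # row 1, column 2
column = b[:, 1]   # all rows, column 1
print(single)      # 14
print(column)      # [ 1 13 25 37]
```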

Everything works just as well with **higher dimensional arrays**:

In [64]:
# 4 small arrays/matrices, each array is 2x6
c = b.reshape(4,2,6)
c

array([[[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11]],

       [[12, 13, 14, 15, 16, 17],
        [18, 19, 20, 21, 22, 23]],

       [[24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35]],

       [[36, 37, 38, 39, 40, 41],
        [42, 43, 44, 45, 46, 47]]])

In [7]:
# matrix 2, row 1, col 4

In [8]:
# matrix 2, all rows, col 3
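One possible way to fill in the two cells above, as a sketch that rebuilds `c` so it runs on its own. Each index position corresponds to one axis (matrix, row, column):

```python
import numpy as np

c = np.arange(48).reshape(4, 2, 6)

element = c[2, 1, 4]  # matrix 2, row 1, col 4
column = c[2, :, 3]   # matrix 2, all rows, col 3
print(element)        # 34
print(column)         # [27 33]
```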

**Iterating** over multidimensional arrays is done with respect to the first axis.

In [9]:
# Create a 3-D array (composed of two 3x4 matrices)


If you want to **iterate on all elements** in the ndarray, simply iterate over the `flat` attribute:
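For instance, a short sketch using a small 3-D array (rebuilt here so the block stands alone), where `flat` visits every element in row-major order:

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# c.flat is an iterator over every element, regardless of the array's shape.
flat_items = [int(x) for x in c.flat]
print(flat_items[:5])
```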

## Stacking arrays
It is often useful to stack together different arrays. NumPy offers several functions to do just that. Let's start by creating a few arrays.

In [65]:
q1 = np.full((2,4), 1)
q1

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [66]:
q2 = np.full((4,4), 2)
q2

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

In [67]:
q3 = np.full((2,4), 3)
q3

array([[3, 3, 3, 3],
       [3, 3, 3, 3]])

You can stack them vertically using `vstack`:

In [68]:
q4 = np.vstack((q1,q2,q3))
q4

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2],
       [3, 3, 3, 3],
       [3, 3, 3, 3]])

You can also stack arrays horizontally using `hstack`:

In [70]:
q5 = np.hstack((q1,q3))
q5

array([[1, 1, 1, 1, 3, 3, 3, 3],
       [1, 1, 1, 1, 3, 3, 3, 3]])

In [None]:
q5.shape

In [71]:
# this does not work because q1 has 2 rows, but q2 has 4 rows
q6 = np.hstack((q1,q2))

ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 2 and the array at index 1 has size 4

The `stack` function stacks arrays along a new axis. All arrays have to have the same shape.
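A small sketch of the `stack` function mentioned above, using two arrays of identical shape (redefined here so the block is self-contained). The optional `axis` argument chooses where the new axis is inserted:

```python
import numpy as np

q1 = np.full((2, 4), 1)
q3 = np.full((2, 4), 3)

# Stack along a new first axis: two 2x4 arrays become one 2x2x4 array.
q7 = np.stack((q1, q3))
print(q7.shape)  # (2, 2, 4)

# Stack along a new last axis instead.
q8 = np.stack((q1, q3), axis=-1)
print(q8.shape)  # (2, 4, 2)
```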

## Splitting arrays
Splitting is the opposite of stacking. For example, let's use the `vsplit` function to split a matrix vertically.

First let's create a 6x4 matrix:

In [72]:
r = np.arange(24).reshape(6,4)
r

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

Now let's split it into three equal parts, vertically (row-wise), using `vsplit`:

In [74]:
r1, r2, r3 = np.vsplit(r,3)
r1

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

Now let's split it into four equal parts, horizontally (column-wise), using `hsplit`:

In [76]:
r1, r2, r3, r4 = np.hsplit(r,4)
r1

array([[ 0],
       [ 4],
       [ 8],
       [12],
       [16],
       [20]])

There is also a `split` function which splits an array along any given axis. Calling `vsplit` is equivalent to calling `split` with `axis=0`, and calling `hsplit` is equivalent to calling `split` with `axis=1`.
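A brief sketch of that equivalence, with the 6x4 matrix `r` rebuilt so the block runs on its own (the names `top`, `middle`, `bottom`, `left` and `right` are illustrative):

```python
import numpy as np

r = np.arange(24).reshape(6, 4)

# Split into three 2x4 blocks along axis 0 (same as np.vsplit(r, 3)).
top, middle, bottom = np.split(r, 3, axis=0)

# Split into two 6x2 blocks along axis 1 (same as np.hsplit(r, 2)).
left, right = np.split(r, 2, axis=1)

print(top.shape, left.shape)  # (2, 4) (6, 2)
```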

# What next?
Now you know some fundamentals of NumPy, but there are many more options available. The best way to learn more is to experiment with NumPy, and go through the [NumPy Reference](https://numpy.org/doc/stable/reference/index.html) to find more functions and features you may be interested in.