## A brief NumPy tutorial

In this notebook, we'll look at some basic functionality of the [NumPy](https://numpy.org/doc/stable/index.html) package for Python, along with some resources for learning how to use it fully. NumPy is one of the most widely used scientific computing packages for Python, so if you're interested in working with data you will likely find it helpful. Once it is installed (for example with `pip install numpy`), we import it under its conventional alias, `np`.

In [None]:
import numpy as np

### Arrays

The central structure in the NumPy package is the array. Think of an array as a rectangular grid of numbers that all share the same data type. An array can be a single row of numbers:


In [None]:
arr1 = np.array([4,1,7,5])
print(arr1)

[4 1 7 5]


a matrix (a two-dimensional grid of rows and columns):

In [None]:
arr2 = np.array([[9,8,7],[6,5,4],[3,2,1]])
print(arr2)

[[9 8 7]
 [6 5 4]
 [3 2 1]]


or even higher-dimensional structures! Don't worry about those for now. The key rule for matrices is that every row must have the same number of elements, because an array is always rectangular.
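To see how many dimensions an array has, you can check its `ndim` and `shape` attributes. This is a minimal sketch; the three-dimensional `arr_3d` is a hypothetical example built with `np.zeros` purely for illustration.

```python
import numpy as np

arr1 = np.array([4, 1, 7, 5])                       # one row of numbers
arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])  # a 3x3 matrix
arr_3d = np.zeros((2, 3, 4))                        # hypothetical: 2 stacked 3x4 matrices

for a in (arr1, arr2, arr_3d):
    # ndim counts the axes; shape gives the length along each axis
    print(a.ndim, a.shape)
```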

### Indexing

To access elements of an array, we *index* it, that is, we give the position of the element we want returned.

Python (and therefore NumPy) uses 0-based indexing: the first element is at index 0, so index 3 in the example below refers to the 4th element of the array.

In [None]:
print(arr1[3])

5
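Indices can also be negative, and a range of elements can be selected with a *slice* written `start:stop:step`. A short sketch on the same `arr1`:

```python
import numpy as np

arr1 = np.array([4, 1, 7, 5])

print(arr1[-1])    # negative indices count from the end: the last element, 5
print(arr1[1:3])   # start:stop includes start but excludes stop: [1 7]
print(arr1[::-1])  # a step of -1 walks backwards: [5 7 1 4]
```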


In a two-dimensional array, providing a single index returns the corresponding row:

In [None]:
print(arr2[1])

[6 5 4]


And providing both indices, as `arr2[row, column]`, returns a single element:

In [None]:
print(arr2[1,1])

5
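Slices work in each dimension too, which lets you pull out a whole column or a sub-block. In the sketch below, a bare `:` means "every row" (or "every column").

```python
import numpy as np

arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

print(arr2[:, 1])     # every row, column 1: [8 5 2]
print(arr2[0:2, 1:])  # rows 0-1, columns 1 onward: [[8 7] [5 4]]
print(arr2[-1, -1])   # last row, last column: 1
```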


### What else can we do with NumPy?

- Initialize arrays of all zeroes

In [None]:
arr_0 = np.zeros(4)
print(arr_0)

[0. 0. 0. 0.]
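`np.zeros` has a few close relatives. The sketch below shows a tuple shape, `np.ones`, `np.arange`, and the `dtype` argument. Note that `np.zeros` returns floating-point values by default, which is why the output above shows `0.` rather than `0`.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))         # a tuple gives the shape: a 2x3 matrix of zeros
ones = np.ones(4)                   # [1. 1. 1. 1.]
counting = np.arange(5)             # integers from 0 up to (not including) 5: [0 1 2 3 4]
int_zeros = np.zeros(4, dtype=int)  # integer zeros instead of the default floats

print(zeros_2d)
print(ones)
print(counting)
print(int_zeros)
```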


- Combine two arrays by concatenation (the result contains floats because `arr_0` does)

In [None]:
print(np.concatenate((arr1, arr_0)))

[4. 1. 7. 5. 0. 0. 0. 0.]
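Two details worth knowing about `np.concatenate`: mixing integer and float arrays produces a float result, and for two-dimensional arrays the `axis` argument chooses whether rows or columns are added. A minimal sketch:

```python
import numpy as np

arr1 = np.array([4, 1, 7, 5])
arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

# Integers combined with floats are upcast to a floating-point result
mixed = np.concatenate((arr1, np.zeros(4)))
print(mixed.dtype)  # float64

# axis=0 stacks the arrays on top of each other, axis=1 side by side
stacked_rows = np.concatenate((arr2, arr2), axis=0)  # shape (6, 3)
stacked_cols = np.concatenate((arr2, arr2), axis=1)  # shape (3, 6)
print(stacked_rows.shape, stacked_cols.shape)
```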


- Sort a one-dimensional array

In [None]:
print(np.sort(arr1))

[1 4 5 7]


- Sort each row of a two-dimensional array (by default, `np.sort` sorts along the last axis)

In [None]:
print(np.sort(arr2))

[[7 8 9]
 [4 5 6]
 [1 2 3]]
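The `axis` parameter of `np.sort` changes which direction is sorted, and `np.argsort` returns the positions that would sort the values rather than the values themselves. A short sketch on `arr2`:

```python
import numpy as np

arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

print(np.sort(arr2, axis=0))     # sort each column: [[3 2 1] [6 5 4] [9 8 7]]
print(np.sort(arr2, axis=None))  # flatten first, then sort: [1 2 3 4 5 6 7 8 9]
print(np.argsort(arr2[0]))       # positions that would sort row 0: [2 1 0]
```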


- Get the number of elements in an array

In [None]:
arr2.size

9

- Get the shape of the array, as (rows, columns)

In [None]:
arr2.shape

(3, 3)

- Given these values, reshape the array into any other shape with the same number of elements

In [None]:
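arr3 = arr2.reshape(9, 1)  # 9 rows x 1 column; 9 * 1 must equal arr2.size
print(arr3)                # reshape returns a new array; arr2 itself is unchanged

[[9]
 [8]
 [7]
 [6]
 [5]
 [4]
 [3]
 [2]
 [1]]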
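When reshaping, one dimension can be given as `-1` and NumPy works it out from the array's size, while `flatten` returns a one-dimensional copy. Asking for a shape whose size does not match raises a `ValueError`. A minimal sketch:

```python
import numpy as np

arr2 = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

row_vec = arr2.reshape(1, -1)  # -1 lets NumPy infer the length: shape (1, 9)
flat = arr2.flatten()          # one dimension: [9 8 7 6 5 4 3 2 1]
print(row_vec.shape, flat)

try:
    arr2.reshape(2, 4)         # 2 * 4 = 8 elements, but arr2 has 9
except ValueError as err:
    print("ValueError:", err)
```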


### Indexing with logic

Let's initialize a larger array with 6 rows and 5 columns, and print the mean of all 30 values.

In [None]:
arr4 = np.array([[ 26,  4, 91, 25, 32],
                 [ 26, 80, 33, 16,  8],
                 [100, 86, 50, 21, 82],
                 [ 54, 90, 95, 99, 43],
                 [ 64, 67, 67, 38, 92],
                 [ 25, 20, 50, 69, 75]])
print(np.mean(arr4))

54.266666666666666

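You can also index an array with a condition. Comparing an array with an operator such as `<` or `==` produces a Boolean (`True`/`False`) array of the same shape, and indexing with that Boolean array returns every element where it is `True`, as a one-dimensional array (see the two cells below).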

In [None]:
print(arr4[arr4<50])

[26  4 25 32 26 33 16  8 21 43 38 25 20]


In [None]:
print(arr4[arr4 % 3 == 0])

[33 21 54 90 99 69 75]
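Boolean masks can also be counted, combined, and used for assignment. In the sketch below, `&` means "and" (`|` would mean "or"), each condition needs its own parentheses, and `np.count_nonzero` counts the `True` entries. The cap of 90 is an arbitrary illustrative value.

```python
import numpy as np

arr4 = np.array([[ 26,  4, 91, 25, 32],
                 [ 26, 80, 33, 16,  8],
                 [100, 86, 50, 21, 82],
                 [ 54, 90, 95, 99, 43],
                 [ 64, 67, 67, 38, 92],
                 [ 25, 20, 50, 69, 75]])

mask = arr4 < 50                 # Boolean array with the same shape as arr4
print(mask.shape)                # (6, 5)
print(np.count_nonzero(mask))    # how many values are below 50: 13

between = arr4[(arr4 > 20) & (arr4 < 30)]
print(between)                   # [26 25 26 21 25]

capped = arr4.copy()             # copy so arr4 itself is left untouched
capped[capped >= 90] = 90        # a mask on the left-hand side assigns in place
print(capped.max())              # 90
```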


### Doing math with NumPy

Let's create two arrays to do some calculations on:

In [None]:
arr5 = np.array([1,2,3,4,5])
arr6 = np.array([1,1,1,1,1])

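We'll use NumPy's own functions rather than Python's built-in `max`, `min`, and `sum`: the built-ins only loop over the first axis (the rows) of an array, while the NumPy versions handle arrays of any shape and are faster on large data. For example, we can calculate the maximum,

In [None]:
print(np.max(arr5))

5

the minimum,

In [None]:
print(np.min(arr5))

1

the sum,

In [None]:
print(np.sum(arr5))

15

the mean,

In [None]:
print(np.mean(arr5))

3.0
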
the element-wise difference,

In [None]:
print(arr5-arr6)

[0 1 2 3 4]


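and element-wise division. Here `arr4` has shape `(6, 5)` while `arr5` has shape `(5,)`, so NumPy *broadcasts* `arr5` across every row of `arr4`: each column of `arr4` is divided by the matching entry of `arr5`.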

In [None]:
print(arr4/arr5)

[[ 26.           2.          30.33333333   6.25         6.4       ]
 [ 26.          40.          11.           4.           1.6       ]
 [100.          43.          16.66666667   5.25        16.4       ]
 [ 54.          45.          31.66666667  24.75         8.6       ]
 [ 64.          33.5         22.33333333   9.5         18.4       ]
 [ 25.          10.          16.66666667  17.25        15.        ]]
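Broadcasting behaves as if the smaller array were copied along the missing dimension. The sketch below checks that against an explicit copy made with `np.tile`, then divides each *row* instead by giving the divisor shape `(6, 1)`; the divisors 1 to 6 are arbitrary illustrative values.

```python
import numpy as np

arr4 = np.array([[ 26,  4, 91, 25, 32],
                 [ 26, 80, 33, 16,  8],
                 [100, 86, 50, 21, 82],
                 [ 54, 90, 95, 99, 43],
                 [ 64, 67, 67, 38, 92],
                 [ 25, 20, 50, 69, 75]])
arr5 = np.array([1, 2, 3, 4, 5])

# Copy arr5 onto each of arr4's 6 rows by hand; broadcasting gives the same result
explicit = arr4 / np.tile(arr5, (arr4.shape[0], 1))
print(np.array_equal(arr4 / arr5, explicit))  # True

# A (6, 1) column of divisors scales each row by its own value
row_divisors = np.arange(1, 7).reshape(6, 1)
by_row = arr4 / row_divisors
print(by_row[:, 0])  # first column divided by 1, 2, ..., 6
```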


Numerical computation is what NumPy was built for, so this is only a small sample of what it can do. Check out [this documentation](https://numpy.org/doc/stable/reference/routines.math.html) for the full list of mathematical functions provided by the package.
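Many of these functions accept an `axis` argument, which matters for matrices such as the exercise data. A minimal sketch using `arr4` (redefined so the cell runs on its own): `axis=0` produces one result per column, `axis=1` one result per row, and `np.argmax` returns the position of the largest value rather than the value itself.

```python
import numpy as np

arr4 = np.array([[ 26,  4, 91, 25, 32],
                 [ 26, 80, 33, 16,  8],
                 [100, 86, 50, 21, 82],
                 [ 54, 90, 95, 99, 43],
                 [ 64, 67, 67, 38, 92],
                 [ 25, 20, 50, 69, 75]])

col_means = np.mean(arr4, axis=0)  # collapse the rows: one mean per column, shape (5,)
row_means = np.mean(arr4, axis=1)  # collapse the columns: one mean per row, shape (6,)
print(col_means)
print(row_means)

print(np.argmax(row_means))        # index of the row with the largest mean: 3
print(np.argmax(arr4, axis=1))     # column index of each row's maximum: [2 1 0 3 4 4]
```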

## Exercises

In this section, we will work with a small matrix of simulated data. Imagine each row corresponds to one patient, and each column to a biomarker associated with bladder cancer. We're going to use NumPy to explore the data.


In [None]:
ex_arr = np.array([[6.87, 6.88, 8.95, 8.16, 7.12],
                  [0.47, 8.76, 0.12, 8.18, 5.14],
                  [1.5, 2.65, 1.87, 2.61, 1.54],
                  [2.55, 3.12, 3.57, 3.91, 3.84],
                  [3.14, 0.16, 7.29, 9, 1.64]])
ex_arr

array([[6.87, 6.88, 8.95, 8.16, 7.12],
       [0.47, 8.76, 0.12, 8.18, 5.14],
       [1.5 , 2.65, 1.87, 2.61, 1.54],
       [2.55, 3.12, 3.57, 3.91, 3.84],
       [3.14, 0.16, 7.29, 9.  , 1.64]])

### 1

Sort each patient's biomarker values. Which biomarker does each patient have their highest score for? Can you use the documentation above to calculate this automatically?

### 2

Calculate the mean value for each patient and for each biomarker. Which patient has the highest average? Which biomarker?

### 3

How many values in the matrix are greater than 5? What about 8?

### 4

Biological data often needs to be *normalized*, meaning that variables are rescaled so they can be compared directly. Can you normalize this matrix by dividing each biomarker (column) by its mean value?