## NumPy
### BIOINF 575 - Fall 2022



_____


### NumPy - Numerical Python <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png" alt="NumPy logo" width = "100">

____
#### A list contains references to each of its values.
#### An array refers to a block of memory containing all values one after the other.
- <b>that is why we need to know the size of the array when it is created, and why the array size cannot change</b> <br>


<img src = "https://www.python-course.eu/images/list_structure.png" width = 350 /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src = "https://www.python-course.eu/images/array_structure.png" width = 350 />
____

#### Arrays of different dimensions (`shape` gives the number of elements on each dimension):
<img src="https://raw.githubusercontent.com/elegant-scipy/elegant-scipy/master/figures/NumPy_ndarrays_v2.svg" alt="data structures" width="600">  

https://github.com/elegant-scipy/elegant-scipy
_____


#### <b>NumPy basics</b>

Arrays are designed to:
* <b>handle vectorized operations (lists cannot do that)</b>
    - if you apply a function it is performed on every item in the array, rather than on the whole array object
    - both arrays and lists have 0-based indexing
* <b>store multiple items of the same data type</b>
* <b>handle missing values </b>
    - missing numerical values are represented using the `np.nan` object (not a number)
    - the object `np.inf` represents infinity  
* <b>have an unchangeable size</b>
    - an array's size cannot be changed in place; create a new array if you need a different size
    - you know how much space the array needs when you create it, and that will not change  
* <b>have efficient memory usage</b>
    - an equivalent numpy array occupies much less space than a python list of lists
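
The properties listed above can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook, and the exact byte counts depend on the platform, so they are printed rather than stated.

```python
import sys
import numpy as np

# All items share one data type: mixing ints and floats upcasts everything to float64
values = np.array([1, 2, 3.5])
print(values.dtype)              # float64

# Missing values are represented with np.nan; plain aggregates propagate it
measurements = np.array([1.0, np.nan, 3.0])
print(np.isnan(measurements))    # boolean array marking the missing value
print(measurements.mean())       # nan
print(np.nanmean(measurements))  # 2.0, the mean ignoring the missing value

# np.inf represents infinity
print(np.inf > 1e308)            # True

# Memory: the array stores raw values in one block,
# the list stores references to separate Python int objects
numbers_list = list(range(1000))
numbers_array = np.arange(1000)
print(numbers_array.nbytes)         # bytes used by the array data
print(sys.getsizeof(numbers_list))  # bytes used by the list object alone, not counting its items
```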

#### <b>Basic array attributes:</b>
* shape: array dimension - tuple with the number of elements in each dimension
* size: Number of elements in array
* ndim: Number of array dimensions (len(arr.shape))
* dtype: Data-type of the array

#### <b>Importing NumPy</b>
The recommended convention to import numpy is to use the <b>np</b> alias:

In [None]:
import numpy as np


##### -----

In [None]:
# all functionality available in numpy
# dir(np)


##### -----

#### <b>Documentation and help</b>
https://numpy.org/doc/

In [None]:
# np.lookfor('sum')  # note: np.lookfor was removed in NumPy 2.0; search the online documentation instead

In [None]:
np.me*?

In [None]:
# np.mean?

In [None]:
# help(np.mean)

#### <b>Motivating example</b> - transform temperatures from Celsius to Fahrenheit

In [None]:
temp_list_C = [-20, 25, 3, 10]

In [None]:
# using lists we need a loop to apply the formula to 
# each element of the list

temp_list_F = []

for temp in temp_list_C:
    temp_list_F.append(temp * 1.8 + 32)

temp_list_F

In [None]:
# using arrays we can apply the formula directly to the array and 
# it will be applied to each element

temp_array_C = np.array(temp_list_C)
temp_array_C

In [None]:
temp_array_F = temp_array_C * 1.8 + 32
temp_array_F

#### <b>Functions for creating arrays</b>
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

##### np.array() - array from lists - e.g. 2D array from a list of lists

In [None]:
# help(np.array)
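
A minimal sketch of `np.array()` with a flat list and with a list of lists. The variable names are illustrative and not from the original notebook.

```python
import numpy as np

# 1D array from a flat list
vector = np.array([1, 2, 3])
print(vector)

# 2D array from a list of lists: each inner list becomes one row
table = np.array([[1, 2, 3], [4, 5, 6]])
print(table)
print(table.shape)         # (2, 3)

# The data type can be set explicitly when the array is created
table_float = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
print(table_float.dtype)   # float64
```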




##### -----

In [None]:
# all functionality of a numpy array
# dir(np.array([1]))

'T', 
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view'


##### -----

##### np.arange() - vector of evenly spaced values from a range (arange) given by start, stop and step

In [None]:
# help(np.arange)
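
A short sketch of `np.arange()`. As with the built-in `range`, the stop value is excluded. The variable names are illustrative.

```python
import numpy as np

# Only stop given: start defaults to 0 and step to 1
first_five = np.arange(5)         # values 0, 1, 2, 3, 4

# start, stop, step: the stop value itself is not included
evens = np.arange(0, 10, 2)       # values 0, 2, 4, 6, 8

# A negative step counts down
countdown = np.arange(10, 0, -3)  # values 10, 7, 4, 1

print(first_five, evens, countdown)
```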



##### np.linspace() - vector of evenly spaced values (known number, linspace) given by start, stop and number of points

In [None]:
# help(np.linspace)
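
A short sketch of `np.linspace()`. Unlike `np.arange()`, you give the number of points instead of the step, and the stop value is included by default. The variable names are illustrative.

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both ends included
points = np.linspace(0, 1, 5)                       # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False leaves the stop value out
open_points = np.linspace(0, 1, 5, endpoint=False)  # 0.0, 0.2, 0.4, 0.6, 0.8

print(points)
print(open_points)
```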



##### np.zeros() - array of zeros (e.g. 3D array), there is also a np.ones()

In [None]:
# help(np.zeros)
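
A minimal sketch of `np.zeros()` and `np.ones()`. The shape is passed as a tuple and the default data type is float. The variable names are illustrative.

```python
import numpy as np

# 3D array of zeros: 2 blocks, each with 3 rows and 4 columns
cube = np.zeros((2, 3, 4))
print(cube.shape)   # (2, 3, 4)
print(cube.dtype)   # float64

# 2D array of ones with an integer data type
ones_int = np.ones((2, 2), dtype=int)
print(ones_int)
```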



##### More functions to create special arrays:      
    np.identity(n) - 2D square array filled with 1 on the diagonal      
    np.eye(n,m) - 2D array filled with 1 on the diagonal      
    np.full((n,m), val) - array filled with a given value     
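
The three functions above can be tried as follows. This is a small sketch with illustrative sizes and names.

```python
import numpy as np

# 3 x 3 identity matrix
identity3 = np.identity(3)
print(identity3)

# 2 x 3 array with 1 on the main diagonal; it does not need to be square
eye_2x3 = np.eye(2, 3)
print(eye_2x3)

# 2 x 3 array filled with the value 7
sevens = np.full((2, 3), 7)
print(sevens)
```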

#### <b>Basic array attributes:</b>
* shape: array dimensions - tuple with the number of elements in each dimension
* size: Number of elements in array
* ndim: Number of array dimensions (len(arr.shape))
* dtype: Data-type of the array

In [None]:
# nested lists give us multi dimensional arrays

matrix = np.array([[1,2,3],[4,5,6]])
matrix

In [None]:
# dir(matrix)

In [None]:
# .size - total number of elements in the array



In [None]:
# .shape tells us the size on each dimension and, implicitly, the number of dimensions



In [None]:
# .ndim - number of array dimensions



In [None]:
# .dtype - type of the data stored in the array



In [None]:
matrix

In [None]:
# .T - transpose of the array (rows and columns switched)


#### <b>Reshaping</b> - changing the numbers of rows and columns - data and size stay the same

In [None]:
# .reshape((n,m)) - Reshaping
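
A minimal sketch of `.reshape()`. The target shape `(3, 2)` and the variable names are assumptions chosen for illustration, not values from the original notebook.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# 2 x 3 -> 3 x 2: the values are the same, read row by row
reshaped = matrix.reshape((3, 2))
print(reshaped)

# -1 asks NumPy to infer that dimension from the array size
flat = matrix.reshape(-1)
print(flat)

# The new shape must hold exactly the same number of elements
try:
    matrix.reshape((4, 2))
except ValueError as error:
    print("reshape failed:", error)
```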



#### <b>Indexing/Slicing(subsetting): [][] or [,]</b>
___
<img src = "http://scipy-lectures.org/_images/numpy_indexing.png" width = 400/>

In [None]:
matrix = np.full((6,6),range(6)) + 10 * np.full((6,6),range(6)).T
matrix

#### Indexing/Slicing

In [None]:
# [][] - List-like 




In [None]:
# [,] - Using both rows and columns indices to get a value
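
A sketch of both indexing styles on the 6 x 6 matrix, where the element in row `i` and column `j` equals `10*i + j`. The matrix is rebuilt here so the block runs on its own; the variable names are illustrative.

```python
import numpy as np

# Same values as the 6 x 6 matrix above: element (i, j) equals 10*i + j
matrix = 10 * np.arange(6).reshape(6, 1) + np.arange(6)

# [][] - list-like: take row 2 first, then item 3 of that row
list_like = matrix[2][3]      # 23

# [,] - row and column index in a single step
combined = matrix[2, 3]       # 23

# Slices work on each dimension
row_part = matrix[0, 3:5]     # values 3, 4
corner = matrix[4:, 4:]       # rows 44, 45 and 54, 55
column = matrix[:, 2]         # values 2, 12, 22, 32, 42, 52
strided = matrix[2::2, ::2]   # rows 20, 22, 24 and 40, 42, 44

# Pitfall: chained slices both act on rows, so [:2][:3] is NOT [:2, :3]
chained = matrix[:2][:3]      # shape (2, 6)
sub_matrix = matrix[:2, :3]   # shape (2, 3)
print(chained.shape, sub_matrix.shape)
```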


In [None]:
# matrix_reshaped is assumed to have been assigned in the Reshaping cell above
matrix_reshaped

In [None]:
# Using both rows and columns indices to get a sub-matrix

matrix_reshaped[:2,:3]

In [None]:
# Fun with arrays - build and display a checkers board array
checkers_board = np.zeros((6,6),dtype=int)
print(checkers_board)

In [None]:
checkers_board[1::2,::2] = 1
print(checkers_board)

In [None]:
checkers_board[::2,1::2] = 1
print(checkers_board)

#### Subsetting with an array of indices - use an array/list of indices to select only the elements at those indices

In [None]:
matrix 

In [None]:
indices = [0,2,3]
matrix[indices,]

In [None]:
# columns
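
A sketch of index-list subsetting for rows, columns, and individual elements on the same 6 x 6 matrix (rebuilt here so the block is self-contained). The variable names are illustrative.

```python
import numpy as np

# Same values as the 6 x 6 matrix above: element (i, j) equals 10*i + j
matrix = 10 * np.arange(6).reshape(6, 1) + np.arange(6)
indices = [0, 2, 3]

# Rows 0, 2 and 3, all columns
rows = matrix[indices, :]
print(rows.shape)      # (3, 6)

# Columns 0, 2 and 3, all rows
columns = matrix[:, indices]
print(columns.shape)   # (6, 3)

# Two index lists are paired up: this picks elements (0, 1) and (2, 3)
elements = matrix[[0, 2], [1, 3]]
print(elements)        # values 1 and 23
```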



#### Conditional subsetting - use an array of booleans to select only the elements where the boolean array is True

In [None]:
matrix

In [None]:
# conditional subsetting
matrix[(matrix[:,0] > 20)]

In [None]:
# deconstruct
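
The conditional subsetting expression can be taken apart step by step. This is a minimal sketch on the same 6 x 6 matrix (rebuilt here so the block is self-contained); the intermediate variable names are illustrative.

```python
import numpy as np

# Same values as the 6 x 6 matrix above: element (i, j) equals 10*i + j
matrix = 10 * np.arange(6).reshape(6, 1) + np.arange(6)

# Step 1: the first column
first_column = matrix[:, 0]   # values 0, 10, 20, 30, 40, 50

# Step 2: the comparison is applied to every element and gives a boolean array
mask = first_column > 20      # False, False, False, True, True, True

# Step 3: the boolean array keeps only the rows where it is True
selected = matrix[mask]
print(selected)

# A mask with the same shape as the matrix returns a flat array of the matching values
large = matrix[matrix > 50]
print(large)
```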



In [None]:
matrix

In [None]:
# multiple conditions  
(matrix[:,0] > 20) & (matrix[:,0] <= 40)
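
A sketch of how combined conditions are used for subsetting. Each condition needs its own parentheses, and the element-wise operators `&` (and), `|` (or) and `~` (not) replace the Python keywords `and`, `or` and `not`. The variable names are illustrative.

```python
import numpy as np

# Same values as the 6 x 6 matrix above: element (i, j) equals 10*i + j
matrix = 10 * np.arange(6).reshape(6, 1) + np.arange(6)
first_column = matrix[:, 0]

# Both conditions must hold: rows starting with 30 and 40
in_range = (first_column > 20) & (first_column <= 40)
print(matrix[in_range])

# Negation: all the other rows
out_of_range = ~in_range
print(matrix[out_of_range])

# Either condition may hold: rows starting with 0 and 50
at_the_ends = (first_column < 10) | (first_column > 40)
print(matrix[at_the_ends])
```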

#### <b>Matrix operations</b>

https://www.tutorialspoint.com/matrix-manipulation-in-python<br>
Arithmetic operators on arrays apply element-wise. <br> 
A new array is created and filled with the result.


#### <b>Array broadcasting</b><br>

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html<br>
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. <br>
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

<img src = "https://www.tutorialspoint.com/numpy/images/array.jpg" height=10/>


https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

In [None]:
matrix = np.arange(1,13).reshape(3,4)
matrix


In [None]:
# create an array with 4 values



In [None]:
# addition using a data row



In [None]:
####

In [None]:
# create an array with 3 values



In [None]:
matrix

In [None]:
# addition using a data column



In [None]:
##########

matrix


In [None]:
# column vector



In [None]:
# addition using a data row - error if dimensions do not match




In [None]:
##########

matrix

In [None]:
# column vec



In [None]:
# multiplication with a data column
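
The broadcasting cases from the cells above, worked through in one sketch. The values in `row` and `values` are illustrative assumptions, not data from the original notebook.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)   # 3 rows, 4 columns

# An array with 4 values matches the 4 columns: it is added to every row
row = np.array([10, 20, 30, 40])
added_row = matrix + row
print(added_row)

# An array with 3 values has shape (3,), which does not match 4 columns
values = np.array([100, 200, 300])
try:
    matrix + values
except ValueError as error:
    print("shapes do not match:", error)

# Reshaped into a column vector of shape (3, 1), it is added to every column
column = values.reshape(3, 1)
added_column = matrix + column
print(added_column)

# Multiplication broadcasts the same way: each row is scaled by its own value
scaled = matrix * column
print(scaled)
```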




#### Simple multiplication `*` of two matrices of the same shape multiplies the elements at the same indices (element-wise)
#### Mathematical matrix multiplication of two matrices (`n1 x m1`, `n2 x m2`) can be done using the `.dot` method or the `@` operator, but the dimensions need to be compatible: `m1 == n2`
* the resulting matrix will be `n1 x m2`: it has `n1` rows and `m2` columns
* each value in the resulting matrix is the sum of the products of the paired elements from the respective row and column

<img src = "https://miro.medium.com/max/1400/1*YGcMQSr0ge_DGn96WnEkZw.png" width = "400"/>
     
https://towardsdatascience.com/a-complete-beginners-guide-to-matrix-multiplication-for-data-science-with-python-numpy-9274ecfc1dc6
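
A small sketch contrasting element-wise multiplication with matrix multiplication. The matrices `a` and `b` are illustrative values, not from the original notebook.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])       # 2 x 3
b = np.array([[7, 8],
              [9, 10],
              [11, 12]])        # 3 x 2

# Element-wise: same shape required, each element multiplied with its counterpart
elementwise = a * a
print(elementwise)

# Matrix multiplication: (2 x 3) @ (3 x 2) gives a 2 x 2 result
product = a @ b
print(product)

# Top-left value: row 0 of a paired with column 0 of b
top_left = 1 * 7 + 2 * 9 + 3 * 11   # 58

# .dot gives the same result as @
same_product = a.dot(b)

# Incompatible dimensions, (2 x 3) @ (2 x 3), raise an error
try:
    a @ a
except ValueError as error:
    print("dimensions are not compatible:", error)
```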
     

#### <b>More matrix computation</b> - basic aggregate functions are available - min, max, sum, mean

In [None]:
matrix

#### Use the axis argument to compute the mean for each column or row
#### axis = 0 - one value per column (aggregates down the rows)
#### axis = 1 - one value per row (aggregates across the columns)

In [None]:
help(matrix.sum)

In [None]:
matrix

In [None]:
# col sum 




In [None]:
# row sum
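
A minimal sketch of the `axis` argument on the 3 x 4 `matrix` used in the broadcasting section (rebuilt here so the block runs on its own). The variable names are illustrative.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)   # 3 rows, 4 columns

# No axis: a single value for the whole array
total = matrix.sum()              # 78

# axis=0: one value per column, so 4 results
column_sums = matrix.sum(axis=0)  # values 15, 18, 21, 24

# axis=1: one value per row, so 3 results
row_sums = matrix.sum(axis=1)     # values 10, 26, 42

# The other aggregate functions take the same argument
column_means = matrix.mean(axis=0)   # values 5, 6, 7, 8
row_max = matrix.max(axis=1)         # values 4, 8, 12

print(total, column_sums, row_sums, column_means, row_max)
```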




https://www.w3resource.com/python-exercises/numpy/index.php


Create a matrix of 2 rows and 3 columns with every fifth number starting from 1 (e.g. 1,6,11,16,...)


In [None]:
matrix = np.arange(1, 2*3*5+1, 5).reshape(2,3)

matrix

#### <font color = "red">Exercise:</font>   


Normalize the values in the matrix to be between 0 and 1 (min-max normalization).     
Subtract the minimum value and divide by the maximum value of the resulting values.

#### <font color = "red">Exercise:</font>   

Do the same normalization at the row level

#### <font color = "red">Exercise:</font>   


* Return the even numbers from the matrix.
* Try to return the indices of the even numbers (hint: look at the `np.where` function).

In [None]:
# help(np.where)

In [None]:
matrix

In [None]:
# positions where the matrix equals 16, a value present in the matrix created above
pos = np.where(matrix == 16)
pos

In [None]:
matrix[pos]

#### RESOURCES

http://scipy-lectures.org/intro/numpy/array_object.html#what-are-numpy-and-numpy-arrays   
https://www.python-course.eu/numpy.php   
https://numpy.org/devdocs/user/quickstart.html#universal-functions   
https://www.geeksforgeeks.org/python-numpy/

_____

### Pandas
<img src = "https://upload.wikimedia.org/wikipedia/commons/e/ed/Pandas_logo.svg" width = 200/>

https://commons.wikimedia.org/wiki/File:Pandas_logo.svg

[Pandas](https://pandas.pydata.org/) is a high-performance library that makes familiar data structures, like `data.frame` from R, and appropriate data analysis tools available to Python users.

<img src = "https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png" width = 550/>

https://www.geeksforgeeks.org/python-pandas-dataframe/