## NumPy
### BIOINF 575 



_____


### NumPy - Numerical Python <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png" alt="NumPy logo" width = "100">

____
#### A list contains references to each of its values.
#### An array refers to a block of memory containing all values one after the other.
- <b>that is why we need to know the size of the array in advance, and why the array size cannot change</b> <br>


<img src = "https://www.python-course.eu/images/list_structure.png" width = 350 /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src = "https://www.python-course.eu/images/array_structure.png" width = 350 />
____

#### Arrays of different dimensions (`shape` gives the number of elements on each dimension):
<img src="https://raw.githubusercontent.com/elegant-scipy/elegant-scipy/master/figures/NumPy_ndarrays_v2.svg" alt="data structures" width="600">  

https://github.com/elegant-scipy/elegant-scipy
_____


#### <b>NumPy basics</b>

Arrays are designed to:
* <b>handle vectorized operations (lists cannot do that)</b>
    - if you apply a function it is performed on every item in the array, rather than on the whole array object
    - both arrays and lists have 0-based indexing
* <b>store multiple items of the same data type</b>
* <b>handle missing values </b>
    - missing numerical values are represented using the `np.nan` object (not a number)
    - the object `np.inf` represents infinity  
* <b>have an unchangeable size</b>
    - array size cannot be changed; create a new array if you need a different size
    - you know when you create the array how much space you need for it and that will not change  
* <b>have efficient memory usage</b>
    - an equivalent numpy array occupies much less space than a python list of lists
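
The properties listed above can be checked with a short sketch. The variable names (`values`, `with_missing`, and so on) are only illustrative, and the byte counts are a rough comparison that depends on the platform and Python version.

```python
import sys
import numpy as np

values = list(range(1000))   # plain Python list
arr = np.array(values)       # NumPy array holding the same numbers

# Vectorized operation: applied to every element, no explicit loop.
doubled = arr * 2

# All elements share a single data type.
print(arr.dtype)

# Missing values are stored as np.nan; NaN-aware functions skip them.
with_missing = np.array([1.0, np.nan, 3.0])
nan_mask = np.isnan(with_missing)
mean_ignoring_nan = np.nanmean(with_missing)
print(nan_mask, mean_ignoring_nan)

# Rough memory comparison: the array's data buffer versus the list
# plus the separate int objects that the list references.
array_bytes = arr.nbytes
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(array_bytes, list_bytes)
```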

#### <b>Basic array attributes:</b>
* shape: array dimensions - tuple with the number of elements in each dimension
* size: number of elements in the array
* ndim: number of array dimensions (len(arr.shape))
* dtype: Data-type of the array

#### <b>Importing NumPy</b>
The recommended convention to import numpy is to use the <b>np</b> alias:

In [2]:
[1,2,3] + 5

TypeError: can only concatenate list (not "int") to list

In [4]:
[1,2,3] + [5,6,7]

[1, 2, 3, 5, 6, 7]

In [None]:
["A", 1, [1,2,3]]

In [6]:
import numpy as np


##### -----

In [10]:
# all functionality available in numpy
# dir(np)


##### -----

#### <b>Documentation and help</b>
https://numpy.org/doc/

In [14]:
# np.lookfor('sum') 

In [16]:
np.me*?

np.mean
np.median
np.memmap
np.meshgrid

In [18]:
np.mean?

Signature:      
np.mean(
    a,
    axis=None,
    dtype=None,
    out=None,
    keepdims=<no value>,
    *,
    where=<no value>,
)
Call signature:  np.mean(*args, **kwargs)
Type:            _ArrayFunctionDispatcher
String form:     <function mean at 0x1080f7ba0>
File:

In [20]:
help(np.mean)

Help on _ArrayFunctionDispatcher in module numpy:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
    Compute the arithmetic mean along the specified axis.

    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.

    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the means are computed. The default is to
        compute the mean of the flattened array.

        .. versionadded:: 1.7.0

        If this is a tuple of ints, a mean is performed over multiple axes,
        instead of a single axis or all the axes as before.
    dtype : data-type, optional
        Type to use in computing the mean.  For

#### <b>Motivating example</b> - transform temperatures from Celsius to Fahrenheit

In [24]:
temp_list_C = [-20, 25, 3, 10]

In [26]:
# using lists we need a loop to apply the formula to 
# each element of the list

temp_list_F = []

for temp in temp_list_C:
    temp_list_F.append(temp * 1.8 + 32)

temp_list_F

[-4.0, 77.0, 37.4, 50.0]

In [28]:
[temp * 1.8 + 32 for temp in temp_list_C]

[-4.0, 77.0, 37.4, 50.0]

In [30]:
# using arrays we can apply the formula directly to the array and 
# it will be applied to each element

temp_array_C = np.array(temp_list_C)
temp_array_C

array([-20,  25,   3,  10])

In [32]:
temp_array_F = temp_array_C * 1.8 + 32
temp_array_F

array([-4. , 77. , 37.4, 50. ])

In [38]:
mat = np.array([[1,2,3], [4,5,6]])
mat

array([[1, 2, 3],
       [4, 5, 6]])

#### <b>Functions for creating arrays</b>
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

##### np.array() - array from lists - e.g. 2D array from a list of lists

In [46]:
# help(np.array)
mat = np.array([[[1, 11],[2, 22],[3, 33]], [[4, 44],[5, 55],[6, 66]]])
mat


array([[[ 1, 11],
        [ 2, 22],
        [ 3, 33]],

       [[ 4, 44],
        [ 5, 55],
        [ 6, 66]]])

In [48]:
dir(mat)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__buffer__',
 '__class__',
 '__class_getitem__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__dlpack__',
 '__dlpack_device__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '


##### -----

In [None]:
# all functionality of a numpy array
# dir(np.array([1]))

'T', 
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view'

In [50]:
mat = np.array([[1,2,3], [4,5,6]])
mat

array([[1, 2, 3],
       [4, 5, 6]])

In [52]:
mat.T

array([[1, 4],
       [2, 5],
       [3, 6]])


##### -----

##### np.arange() - vector of evenly spaced values from a range (arange) given by start, stop and step

In [54]:
# help(np.arange)

range(5)

range(0, 5)

In [56]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [58]:
np.arange?

Docstring:
arange([start,] stop[, step,], dtype=None, *, like=None)

Return evenly spaced values within a given interval.

``arange`` can be called with a varying number of positional arguments:

* ``arange(stop)``: Values are generated within the half-open interval
  ``[0, stop)`` (in other words, the interval including `start` but
  excluding `stop`).
* ``arange(start, stop)``: Values are generated within the half-open
  interval ``[start, stop)``.
* ``arange(start, stop, step)`` Values are generated within the half-open
  interval ``[start, stop)``, with spacing between values given by
  ``step``.

For integer arguments the function is roughly equivalent to the Python
built-in :py:class:`range`, but returns an ndarray rather than a ``range``
instance.

When using a non-integer step, such as 0.1, it is often better to use
`numpy.linspace`.


Parameters
----------
start : integer or real, optional
    Start of interval.  The interval includes this value.  The default
    st

In [60]:
mat

array([[1, 2, 3],
       [4, 5, 6]])

In [62]:
mat.size

6

In [64]:
mat.ndim

2

In [66]:
mat.shape

(2, 3)

In [68]:
mat.dtype

dtype('int64')

In [74]:
a = np.array(["A", 1, 2, "T"])
a

array(['A', '1', '2', 'T'], dtype='<U21')
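
The output above shows that the integers were turned into strings: an array stores a single data type, so mixed input is converted to a common one. A minimal sketch of this behaviour follows; the variable names are illustrative, and `dtype=object` is shown only as one way to keep the original Python objects (at the cost of the speed and memory benefits).

```python
import numpy as np

ints_and_floats = np.array([1, 2.5])                    # integers are upcast to float
text_and_ints = np.array(["A", 1, 2, "T"])              # everything becomes a string
as_objects = np.array(["A", 1, 2, "T"], dtype=object)   # keeps the Python objects

print(ints_and_floats.dtype)     # float64
print(text_and_ints.dtype.kind)  # 'U' means unicode string
print(type(as_objects[1]))       # <class 'int'>

# astype converts explicitly, here from numeric strings back to integers.
numbers = np.array(["1", "2", "3"]).astype(int)
print(numbers.sum())             # 6
```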

##### np.linspace() - vector of evenly spaced values (known number, linspace) given by start, stop and number of points

In [80]:
# help(np.linspace)

np.linspace(0,100, 5)

array([  0.,  25.,  50.,  75., 100.])
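
As a small comparison sketch: `np.arange` is driven by a step and excludes `stop`, while `np.linspace` is driven by the number of points and includes `stop` unless `endpoint=False` is passed. The values below are chosen only for illustration.

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)                       # step of 0.25, stop excluded
by_count = np.linspace(0, 1, 5)                       # 5 points, stop included
no_endpoint = np.linspace(0, 1, 4, endpoint=False)    # 4 points, stop excluded

print(by_step)      # [0.   0.25 0.5  0.75]
print(by_count)     # [0.   0.25 0.5  0.75 1.  ]
print(no_endpoint)  # [0.   0.25 0.5  0.75]
```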

##### np.zeros() - array of zeros (e.g. 3D array), there is also a np.ones()

In [88]:
# help(np.zeros)

np.zeros(20,dtype = int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [92]:
np.zeros((7, 5),dtype = int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

##### More functions to create special arrays:      
    np.identity(n) - 2D square array filled with 1 on the diagonal      
    np.eye(n,m) - 2D array filled with 1 on the diagonal      
    np.full((n,m), val) - array filled with a given value     

In [96]:
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [98]:
np.eye(3,4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])

In [100]:
np.eye(4,3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 0., 0.]])

In [108]:
np.full((2,3), "C", dtype=str)

array([['C', 'C', 'C'],
       ['C', 'C', 'C']], dtype='<U1')

#### <b>Basic array attributes:</b>
* shape: array dimensions - tuple with the number of elements in each dimension
* size: number of elements in the array
* ndim: number of array dimensions (len(arr.shape))
* dtype: Data-type of the array

In [110]:
# nested lists give us multi dimensional arrays

matrix = np.array([[1,2,3],[4,5,6]])
matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# dir(matrix)

In [112]:
# .size - total number of elements in the array
matrix.size


6

In [114]:
# .shape tells us the size of each dimension and, implicitly, the number of dimensions

matrix.shape

(2, 3)

In [116]:
# .ndim - number of array dimensions

matrix.ndim

2

In [118]:
# .dtype - type of the data stored in the array

matrix.dtype

dtype('int64')

In [120]:
matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [122]:
# .T - transpose of the array (rows and columns switched)
matrix.T

array([[1, 4],
       [2, 5],
       [3, 6]])

#### <b>Reshaping</b> - changing the numbers of rows and columns - data and size stay the same

In [124]:
# .reshape((n,m)) - Reshaping

matrix.reshape(6,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6]])

In [132]:
matrix.reshape(1,6)

array([[1, 2, 3, 4, 5, 6]])

In [128]:
matrix.reshape(6)

array([1, 2, 3, 4, 5, 6])
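
Two further details of `reshape`, sketched here on the same 2 x 3 example: `-1` lets NumPy infer one dimension from the total size, and a shape that does not account for all elements raises a `ValueError`. The last lines illustrate that, for a contiguous array like this one, the reshaped result shares memory with the original, so writing to one changes the other.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total size (6 elements).
three_by_two = matrix.reshape(-1, 2)
flat = matrix.reshape(-1)
print(three_by_two.shape, flat.shape)   # (3, 2) (6,)

# 6 elements cannot be arranged into a shape with 4 elements.
try:
    matrix.reshape(4)
except ValueError as err:
    print("reshape failed:", err)

# The reshaped array is a view on the same data here.
shares = np.shares_memory(matrix, flat)
flat[0] = 100
print(shares, matrix[0, 0])             # True 100
```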

#### <b>Indexing/Slicing(subsetting): [][] or [,]</b>
___
<img src = "http://scipy-lectures.org/_images/numpy_indexing.png" width = 400/>

In [None]:
matrix = np.full((6,6),range(6)) + 10 * np.full((6,6),range(6)).T
matrix

#### Indexing/Slicing

In [144]:
# [][] - List-like 

np.full((6,6), [3,4,5,6,7,8])


array([[3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8],
       [3, 4, 5, 6, 7, 8]])

In [146]:
np.full((6,6),range(6))

array([[0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]])

In [150]:
np.full((6,6),range(6)).T *10

array([[ 0,  0,  0,  0,  0,  0],
       [10, 10, 10, 10, 10, 10],
       [20, 20, 20, 20, 20, 20],
       [30, 30, 30, 30, 30, 30],
       [40, 40, 40, 40, 40, 40],
       [50, 50, 50, 50, 50, 50]])

In [156]:
matrix = np.full((6,6),range(6)).T *10 + np.full((6,6),range(6))
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [166]:
# [,] - Using both row and column indices to get a value or a sub-array
matrix[1:2,1:]

array([[11, 12, 13, 14, 15]])

In [168]:
matrix[1,1:]

array([11, 12, 13, 14, 15])
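
The two notations from the heading, `[][]` and `[,]`, give the same result for a single element but not for slices. The sketch below rebuilds the same 6 x 6 matrix and compares them; the variable names are illustrative.

```python
import numpy as np

matrix = np.full((6, 6), range(6)).T * 10 + np.full((6, 6), range(6))

# Single element: both notations select row 1, column 2.
single_a = matrix[1][2]
single_b = matrix[1, 2]
print(single_a, single_b)    # 12 12

# [,] with two slices selects rows 1-2 and columns 0-1.
block = matrix[1:3, 0:2]
print(block)                 # [[10 11]
                             #  [20 21]]

# Chained slices do not: the second slice is applied to the rows again,
# so all six columns are kept.
chained = matrix[1:3][0:2]
print(chained.shape)         # (2, 6)
```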

In [170]:
matrix.size

36

In [174]:
matrix_reshaped = matrix.reshape(4,9)
matrix_reshaped

array([[ 0,  1,  2,  3,  4,  5, 10, 11, 12],
       [13, 14, 15, 20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35, 40, 41, 42],
       [43, 44, 45, 50, 51, 52, 53, 54, 55]])

In [176]:
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [178]:
# Using both rows and columns indices to get a sub-matrix

matrix_reshaped[:2,:3]

array([[ 0,  1,  2],
       [13, 14, 15]])

In [180]:
matrix_reshaped

array([[ 0,  1,  2,  3,  4,  5, 10, 11, 12],
       [13, 14, 15, 20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35, 40, 41, 42],
       [43, 44, 45, 50, 51, 52, 53, 54, 55]])

In [182]:
# Fun arrays - display a checkers_board list
checkers_board = np.zeros((6,6),dtype=int)
print(checkers_board)

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]


In [184]:
checkers_board[1::2,::2] = 1
print(checkers_board)

[[0 0 0 0 0 0]
 [1 0 1 0 1 0]
 [0 0 0 0 0 0]
 [1 0 1 0 1 0]
 [0 0 0 0 0 0]
 [1 0 1 0 1 0]]


In [186]:
checkers_board[::2,1::2] = 1
print(checkers_board)

[[0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]
 [0 1 0 1 0 1]
 [1 0 1 0 1 0]]


In [188]:
checkers_board[::2,1::2] += 1
print(checkers_board)

[[0 2 0 2 0 2]
 [1 0 1 0 1 0]
 [0 2 0 2 0 2]
 [1 0 1 0 1 0]
 [0 2 0 2 0 2]
 [1 0 1 0 1 0]]


#### Array of indices subsetting - use array/list of indices to subset array with only the elements given by the indices

In [192]:
matrix 

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [190]:
indices = [0,2,3]
matrix[indices,]

array([[ 0,  1,  2,  3,  4,  5],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35]])

In [196]:
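# columns - note that matrix[:,] alone returns every row and column; matrix[:, indices] would pick the columns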

matrix[:,]

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])
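
A short sketch of index-list subsetting along each axis, using the same 6 x 6 matrix. Passing a list for both axes pairs the indices element by element; `np.ix_` is one way to get the full row-by-column block instead. The index values are chosen only for illustration.

```python
import numpy as np

matrix = np.full((6, 6), range(6)).T * 10 + np.full((6, 6), range(6))
indices = [0, 2, 3]

picked_rows = matrix[indices, :]    # rows 0, 2, 3 with all columns
picked_cols = matrix[:, indices]    # all rows with columns 0, 2, 3
print(picked_rows.shape, picked_cols.shape)   # (3, 6) (6, 3)

# Two index lists are paired up: elements (0,1), (2,3) and (3,5).
paired = matrix[[0, 2, 3], [1, 3, 5]]
print(paired)                       # [ 1 23 35]

# np.ix_ builds the cross product: rows 0 and 2, columns 1 and 3.
sub_block = matrix[np.ix_([0, 2], [1, 3])]
print(sub_block)                    # [[ 1  3]
                                    #  [21 23]]
```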

#### conditional subsetting - use array of booleans to subset array with only the elements where the bool array is True

In [198]:
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [200]:
matrix[:,0]

array([ 0, 10, 20, 30, 40, 50])

In [None]:
# deconstruct



In [202]:
matrix[:,0] > 20

array([False, False, False,  True,  True,  True])

In [None]:
# conditional subsetting
matrix[(matrix[:,0] > 20)]

In [204]:
matrix[[False, False, False,  True,  True,  True],]

array([[30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [None]:
matrix

In [208]:
(matrix[:,0] > 20) and (matrix[:,0] <= 40)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [212]:
(matrix[:,0] > 20)

array([False, False, False,  True,  True,  True])

In [214]:
(matrix[:,0] <= 40)

array([ True,  True,  True,  True,  True, False])

In [210]:
[1,2,3] and 5

5

In [216]:
# multiple conditions  
(matrix[:,0] > 20) & (matrix[:,0] <= 40)

array([False, False, False,  True,  True, False])
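
Python's `and` fails on arrays because it needs a single truth value, so conditions are combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` or `<=`. A minimal sketch on the same matrix, with illustrative variable names:

```python
import numpy as np

matrix = np.full((6, 6), range(6)).T * 10 + np.full((6, 6), range(6))
first_col = matrix[:, 0]

# and: rows whose first value is in (20, 40]
mask = (first_col > 20) & (first_col <= 40)
selected = matrix[mask]
print(selected[:, 0])    # [30 40]

# or: rows whose first value is below 10 or above 40
either = matrix[(first_col < 10) | (first_col > 40)]
print(either[:, 0])      # [ 0 50]

# not: every row that the mask left out
inverted = matrix[~mask]
print(inverted[:, 0])    # [ 0 10 20 50]
```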

#### <b>Matrix operations</b>

https://www.tutorialspoint.com/matrix-manipulation-in-python<br>
Arithmetic operators on arrays apply element-wise. <br> 
A new array is created and filled with the result.


#### <b>Array broadcasting</b><br>

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html<br>
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. <br>
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

<img src = "https://www.tutorialspoint.com/numpy/images/array.jpg" height=10/>


https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm
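
The constraint can be stated briefly: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below checks a few shape pairs with `np.broadcast_shapes`, which is assumed to be available (it was added in NumPy 1.20), and uses `np.newaxis` as one way to turn a 1D vector into a column.

```python
import numpy as np

# Shapes are compared from the trailing (rightmost) dimension backwards.
row_case = np.broadcast_shapes((3, 4), (4,))     # (4,) is treated as (1, 4)
col_case = np.broadcast_shapes((3, 4), (3, 1))   # the size-1 axis is stretched
print(row_case, col_case)                        # (3, 4) (3, 4)

try:
    np.broadcast_shapes((3, 4), (3,))            # trailing sizes 4 and 3 clash
    compatible = True
except ValueError:
    compatible = False
print(compatible)                                # False

# np.newaxis adds a size-1 axis, turning a (3,) vector into a (3, 1) column.
v = np.array([10, 20, 30])
column = v[:, np.newaxis]
result = np.arange(1, 13).reshape(3, 4) + column
print(column.shape, result[2, 3])                # (3, 1) 42
```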

In [218]:
matrix = np.arange(1,13).reshape(3,4)
matrix


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [228]:
# create an array with 4 values
v = np.array([10,20,30,40])


In [230]:
# addition using a data row
matrix + v


array([[11, 22, 33, 44],
       [15, 26, 37, 48],
       [19, 30, 41, 52]])

In [None]:
####

In [232]:
# create an array with 3 values
v = np.array([10,20,30])


In [234]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [236]:
# addition using a data column - fails: shape (3,) does not align with the 4 columns
matrix + v


ValueError: operands could not be broadcast together with shapes (3,4) (3,) 

In [242]:
##########

matrix


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [244]:
# column vector
v = np.array([10,20,30]).reshape(3,1)
v


array([[10],
       [20],
       [30]])

In [246]:
# addition using a data column - works now that v has shape (3, 1)

matrix + v


array([[11, 12, 13, 14],
       [25, 26, 27, 28],
       [39, 40, 41, 42]])

In [None]:
##########

matrix

In [248]:
# column vec
v


array([[10],
       [20],
       [30]])

In [250]:
# multiplication with a data column

matrix * v


array([[ 10,  20,  30,  40],
       [100, 120, 140, 160],
       [270, 300, 330, 360]])

In [252]:
np.array([[1,2],[3,4]]) * np.array([[10,20],[30,40]])

array([[ 10,  40],
       [ 90, 160]])

#### Simple multiplication `*` of two matrices of the same shape multiplies the elements at matching indices (element-wise).
#### Mathematical matrix multiplication of two matrices (`n1 x m1`, `n2 x m2`) can be done using the `.dot` method or the `@` operator, but the dimensions need to be compatible: `m1 == n2` 
* the resulting matrix will be `n1 x m2`: it has `n1` rows and `m2` columns
* each value in the resulting matrix is the sum of the products of the paired elements from the respective row and column 

<img src = "https://miro.medium.com/max/1400/1*YGcMQSr0ge_DGn96WnEkZw.png" width = "400"/>
     
https://towardsdatascience.com/a-complete-beginners-guide-to-matrix-multiplication-for-data-science-with-python-numpy-9274ecfc1dc6
     

In [262]:
np.array([[1,2,3],[4,5,6]]) @ np.array([[10,11],[20,21], [30,31]])

array([[140, 146],
       [320, 335]])
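
To connect the result above with the rule, this sketch recomputes one entry by hand from a row of the first matrix and a column of the second, and checks that `.dot` and `@` agree. The names `a`, `b` and `top_left` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])            # 2 x 3
b = np.array([[10, 11], [20, 21], [30, 31]])    # 3 x 2

product = a @ b
print(product.shape)                  # (2, 2)

# Entry [0, 0]: row 0 of a paired with column 0 of b.
top_left = (a[0, :] * b[:, 0]).sum()  # 1*10 + 2*20 + 3*30
print(top_left, product[0, 0])        # 140 140

# The .dot method gives the same matrix as the @ operator.
same = np.array_equal(product, a.dot(b))
print(same)                           # True
```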

#### <b>More matrix computation</b> - basic aggregate functions are available - min, max, sum, mean

In [None]:
matrix

#### Use the axis argument to aggregate (e.g. sum or mean) over each column or row
#### axis = 0 - one result per column
#### axis = 1 - one result per row

In [264]:
help(matrix.sum)

Help on built-in function sum:

sum(...) method of numpy.ndarray instance
    a.sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)

    Return the sum of the array elements over the given axis.

    Refer to `numpy.sum` for full documentation.

    See Also
    --------
    numpy.sum : equivalent function



In [266]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [268]:
matrix.sum()

78

In [270]:
matrix.sum(axis = 0)

array([15, 18, 21, 24])

In [272]:
matrix.sum(1)

array([10, 26, 42])

In [274]:
# col sum 

matrix.sum(0)


array([15, 18, 21, 24])

In [276]:
# row sum

matrix.sum(axis = 1)



array([10, 26, 42])
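
The same `axis` argument works for the other aggregate functions. A brief sketch on the 3 x 4 matrix used above; the axis passed in is the one that is collapsed, which is why `axis=0` leaves one value per column.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)

col_means = matrix.mean(axis=0)   # collapses the 3 rows: one mean per column
row_max = matrix.max(axis=1)      # collapses the 4 columns: one max per row
overall_min = matrix.min()        # no axis: aggregate over all elements

print(col_means)                  # [5. 6. 7. 8.]
print(row_max)                    # [ 4  8 12]
print(overall_min)                # 1
print(col_means.shape, row_max.shape)   # (4,) (3,)
```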

https://www.w3resource.com/python-exercises/numpy/index.php


Create a matrix of 2 rows and 3 columns with every fifth number starting from 1 (e.g. 1,6,11,16,...)


In [278]:
matrix = np.arange(1, 2*3*5+1, 5).reshape(2,3)

matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

#### <font color = "red">Exercise:</font>   


Normalize the values in the matrix to be between 0 and 1 (min-max normalization).     
Subtract the minimum value, then divide by the maximum of the resulting values.

In [284]:
np.max*?

np.max
np.maximum
np.maximum_sctype

In [294]:
m = matrix - matrix.min()
m = m/m.max()
m

array([[0. , 0.2, 0.4],
       [0.6, 0.8, 1. ]])

#### <font color = "red">Exercise:</font>   

Do the same normalization at the row level

In [300]:
matrix.min(1)

array([ 1, 16])

In [316]:
m = matrix - matrix.min(1).reshape(matrix.shape[0],1)
m

array([[ 0,  5, 10],
       [ 0,  5, 10]])

In [318]:
m = m/m.max(1).reshape(matrix.shape[0],1)
m

array([[0. , 0.5, 1. ],
       [0. , 0.5, 1. ]])
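
An alternative sketch of the row-level normalization: passing `keepdims=True` keeps the reduced axis with size 1, so the per-row minimum and maximum already have shape `(2, 1)` and broadcast against the matrix without a `reshape`. This assumes no row is constant; a row with equal minimum and maximum would cause a division by zero.

```python
import numpy as np

matrix = np.arange(1, 2 * 3 * 5 + 1, 5).reshape(2, 3)

row_min = matrix.min(axis=1, keepdims=True)   # shape (2, 1)
row_max = matrix.max(axis=1, keepdims=True)   # shape (2, 1)

# Min-max normalization within each row.
normalized = (matrix - row_min) / (row_max - row_min)
print(row_min.shape)   # (2, 1)
print(normalized)      # [[0.  0.5 1. ]
                       #  [0.  0.5 1. ]]
```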

#### <font color = "red">Exercise:</font>   


* Return the even numbers from the matrix.
* Try to return the indices of the even numbers (hint: look at the `np.where` function).

In [323]:
# help(np.where)

In [325]:
matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

In [331]:
pos = np.where(matrix == 11)
pos

(array([0]), array([2]))

In [333]:
matrix[pos]

array([11])

In [343]:
matrix[matrix % 2 == 0]

array([ 6, 16, 26])

In [341]:
matrix % 2 == 0

array([[False,  True, False],
       [ True, False,  True]])

In [347]:
matrix[np.where(matrix % 2 == 0)]

array([ 6, 16, 26])
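
For the second part of the exercise, a possible sketch: with a single condition, `np.where` returns one index array per dimension, while `np.argwhere` returns the same positions as (row, column) pairs. The three-argument form of `np.where` is shown as an extra; the labels `"even"` and `"odd"` are only illustrative.

```python
import numpy as np

matrix = np.arange(1, 2 * 3 * 5 + 1, 5).reshape(2, 3)
is_even = matrix % 2 == 0

# One array of row indices and one array of column indices.
rows, cols = np.where(is_even)
print(rows, cols)          # [0 1 1] [1 0 2]

# The same positions as (row, column) pairs.
pairs = np.argwhere(is_even)
print(pairs.tolist())      # [[0, 1], [1, 0], [1, 2]]

# Three arguments: choose a value per element depending on the condition.
labels = np.where(is_even, "even", "odd")
print(labels[0])           # ['odd' 'even' 'odd']
```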

In [1]:
# Create two 4x5 matrices and multiply them (element-wise and matrix product)

import numpy as np

In [7]:
x = np.full((4,5),2)
x

array([[2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2]])

In [13]:
y = np.arange(20).reshape(4,5)
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [17]:
x * y

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28],
       [30, 32, 34, 36, 38]])

In [19]:
x @ y

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 4 is different from 5)

In [21]:
x @ y.T

array([[ 20,  70, 120, 170],
       [ 20,  70, 120, 170],
       [ 20,  70, 120, 170],
       [ 20,  70, 120, 170]])

In [23]:
y.T

array([[ 0,  5, 10, 15],
       [ 1,  6, 11, 16],
       [ 2,  7, 12, 17],
       [ 3,  8, 13, 18],
       [ 4,  9, 14, 19]])

In [25]:
y.shape

(4, 5)

In [27]:
y.ndim

2

In [29]:
y.size

20

In [33]:
# dir(y)

In [35]:
len(y)

4

In [37]:
y

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

#### RESOURCES

http://scipy-lectures.org/intro/numpy/array_object.html#what-are-numpy-and-numpy-arrays   
https://www.python-course.eu/numpy.php   
https://numpy.org/devdocs/user/quickstart.html#universal-functions   
https://www.geeksforgeeks.org/python-numpy/

_____

### Pandas
<img src = "https://upload.wikimedia.org/wikipedia/commons/e/ed/Pandas_logo.svg" width = 200/>

https://commons.wikimedia.org/wiki/File:Pandas_logo.svg

[Pandas](https://pandas.pydata.org/) is a high-performance library that makes familiar data structures, like `data.frame` from R, and appropriate data analysis tools available to Python users.

<img src = "https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png" width = 550/>

https://www.geeksforgeeks.org/python-pandas-dataframe/