## NumPy
### BIOINF 575 - Fall 2022



In [1]:
[1,2,3] + [4,5,6]

[1, 2, 3, 4, 5, 6]

_____


### NumPy - Numerical Python <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/NumPy_logo.svg/1200px-NumPy_logo.svg.png" alt="NumPy logo" width = "100">

____
#### A list contains references to each of its values.
#### An array refers to a single block of memory containing all the values one after the other.
- <b>That is why the size of an array must be known when it is created and cannot change afterwards.</b><br>


<img src = "https://www.python-course.eu/images/list_structure.png" width = 350 /> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<img src = "https://www.python-course.eu/images/array_structure.png" width = 350 />
____

#### Arrays of different dimensions (`shape` gives the number of elements on each dimension):
<img src="https://raw.githubusercontent.com/elegant-scipy/elegant-scipy/master/figures/NumPy_ndarrays_v2.svg" alt="data structures" width="600">  

https://github.com/elegant-scipy/elegant-scipy
_____


#### <b>NumPy basics</b>

Arrays are designed to:
* <b>handle vectorized operations (lists cannot do that)</b>
    - an operation or function applied to an array is performed on every item, rather than on the array object as a whole
    - both arrays and lists use 0-based indexing
* <b>store multiple items of the same data type</b>
* <b>handle missing values</b>
    - missing numerical values are represented by `np.nan` (not a number)
    - `np.inf` represents infinity  
* <b>have an unchangeable size</b>
    - the size of an array cannot be changed; to change the size, create a new array
    - the space an array needs is known when it is created and does not change  
* <b>use memory efficiently</b>
    - a NumPy array typically occupies much less space than the equivalent Python list of lists
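
The points in the list above can be sketched in a few lines. This is only an illustration: the variable names are made up for this example and are not part of the original notebook.

```python
import numpy as np

# One array, one dtype: every element here is stored as float64
values = np.array([1.5, np.nan, 3.0, np.inf])

# Vectorized operation: applied to every element, no loop needed
doubled = values * 2

# np.nan marks a missing value, np.inf marks infinity
n_missing = int(np.isnan(values).sum())
finite_mean = values[np.isfinite(values)].mean()   # mean of 1.5 and 3.0

# "Growing" an array really builds a new one; the original is unchanged
bigger = np.append(values, 4.0)

print(doubled, n_missing, finite_mean, values.size, bigger.size)
```

`np.append` returns a new array rather than resizing `values`, which is why appending inside a loop is usually slower than building a list first and converting it once.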

#### <b>Basic array attributes:</b>
* shape: array dimensions - tuple with the number of elements in each dimension
* size: number of elements in the array
* ndim: number of array dimensions (`len(arr.shape)`)
* dtype: data type of the array elements

#### <b>Importing NumPy</b>
The recommended convention is to import NumPy under the <b>np</b> alias:

In [2]:
import numpy as np


##### -----

In [4]:
# all functionality available in numpy
# dir(np)


##### -----

#### <b>Documentation and help</b>
https://numpy.org/doc/

In [5]:
# np.lookfor is available in NumPy 1.x; it was removed in NumPy 2.0
np.lookfor('sum')

Search results for 'sum'
------------------------
numpy.sum
    Sum of array elements over a given axis.
numpy.cumsum
    Return the cumulative sum of the elements along a given axis.
numpy.einsum
    einsum(subscripts, *operands, out=None, dtype=None, order='K',
numpy.nansum
    Return the sum of array elements over a given axis treating Not a
numpy.nancumsum
    Return the cumulative sum of array elements over a given axis treating Not a
numpy.einsum_path
    Evaluates the lowest cost contraction order for an einsum expression by
numpy.trace
    Return the sum along diagonals of the array.
numpy.ma.sum
    Return the sum of the array elements over the given axis.
numpy.Bytes0.sum
    Scalar method identical to the corresponding array attribute.
numpy.polyadd
    Find the sum of two polynomials.
numpy.ma.cumsum
    Return the cumulative sum of the array elements over the given axis.
numpy.logaddexp
    Logarithm of the sum of exponentiations of the inputs.
numpy.Bytes0.cumsum
    Scal

In [6]:
np.me*?

np.mean
np.median
np.memmap
np.meshgrid

In [7]:
np.mean?

Signature:
np.mean(
    a,
    axis=None,
    dtype=None,
    out=None,
    keepdims=<no value>,
    *,
    where=<no value>,
)
Docstring:
Compute the arithmetic mean along the specified axis.

Returns the average of the array elements.  The average is taken over
the flattened array by default, otherwise over the specified axis.
`float64` intermediate and return values are used for integer inputs.

Parameters
----------
a :

In [8]:
help(np.mean)

Help on function mean in module numpy:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
    Compute the arithmetic mean along the specified axis.
    
    Returns the average of the array elements.  The average is taken over
    the flattened array by default, otherwise over the specified axis.
    `float64` intermediate and return values are used for integer inputs.
    
    Parameters
    ----------
    a : array_like
        Array containing numbers whose mean is desired. If `a` is not an
        array, a conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which the means are computed. The default is to
        compute the mean of the flattened array.
    
        .. versionadded:: 1.7.0
    
        If this is a tuple of ints, a mean is performed over multiple axes,
        instead of a single axis or all the axes as before.
    dtype : data-type, optional
        Type to use in computing the mean.

#### <b>Motivating example</b> - transform temperatures from Celsius to Fahrenheit

In [11]:
temp_list_C = [-20, 25, 3, 10]

In [12]:
# using lists we need a loop to apply the formula to 
# each element of the list

temp_list_F = []

for temp in temp_list_C:
    temp_list_F.append(temp * 1.8 + 32)

temp_list_F

[-4.0, 77.0, 37.4, 50.0]

In [14]:
list(map(lambda x: x * 1.8 + 32, temp_list_C))

[-4.0, 77.0, 37.4, 50.0]

In [15]:
temp_list_C * 1.8 + 32

TypeError: can't multiply sequence by non-int of type 'float'

In [16]:
# using arrays we can apply the formula directly to the array and 
# it will be applied to each element

temp_array_C = np.array(temp_list_C)
temp_array_C

array([-20,  25,   3,  10])

In [17]:
temp_array_F = temp_array_C * 1.8 + 32
temp_array_F

array([-4. , 77. , 37.4, 50. ])

#### <b>Functions for creating arrays</b>
https://docs.scipy.org/doc/numpy-1.13.0/user/basics.creation.html

##### np.array() - array from lists - e.g. 2D array from a list of lists

In [18]:
help(np.array)



Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
          like=None)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', 'C', 'F'}, optional
        Specify the memory layout of the array. If object is not an array


##### -----

In [None]:
# all functionality of a numpy array
# dir(np.array([1]))

'T', 
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tobytes',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view'


##### -----

In [19]:
a = np.array([1,2,3])

In [20]:
a

array([1, 2, 3])

In [21]:
a.shape

(3,)

In [22]:
a.ndim

1

In [23]:
a.size

3

In [24]:
a.dtype

dtype('int64')

In [25]:
a2D = np.array([[1,2],[3,4],[5,6]])

In [26]:
a2D

array([[1, 2],
       [3, 4],
       [5, 6]])

In [27]:
a2D.ndim

2

In [28]:
a2D.shape

(3, 2)

In [29]:
a2D.size

6

In [30]:
a2D.dtype

dtype('int64')

In [31]:
a_general = np.array([2,"A",4,6])

In [32]:
a_general

array(['2', 'A', '4', '6'], dtype='<U21')

In [33]:
a_general.dtype

dtype('<U21')
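
The `<U21` result above follows from the rule that an array holds a single data type: when the inputs disagree, NumPy picks one type that can represent all of them. The sketch below shows two such conversions, plus `dtype=object` as one way to keep the original Python objects. The variable names are illustrative only.

```python
import numpy as np

# Integers mixed with a string: everything is stored as text
mixed = np.array([2, "A", 4, 6])

# Integers mixed with a float: everything is stored as float64
upcast = np.array([1, 2, 3.5])

# dtype=object keeps the original Python objects, at the cost of
# losing the compact memory layout and fast vectorized arithmetic
as_objects = np.array([2, "A", 4, 6], dtype=object)

print(mixed.dtype, upcast.dtype, as_objects.dtype)
print(as_objects[0] + 1)   # still an int, so arithmetic works
```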

##### np.arange() - vector of evenly spaced values from a range (arange) given by start, stop and step

In [35]:
list(range(1,20,2))

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

In [36]:
# help(np.arange)

np.arange(1,20,2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

##### np.linspace() - vector of evenly spaced values (known number, linspace) given by start, stop and number of points

In [37]:
# help(np.linspace)

np.linspace(1,21,5)

array([ 1.,  6., 11., 16., 21.])
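
The practical difference between the two functions is what you fix in advance: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. A small side-by-side sketch, with names chosen just for this example:

```python
import numpy as np

# arange: fixed step, stop value is excluded
by_step = np.arange(0, 1, 0.25)

# linspace: fixed number of points, stop value is included by default
by_count = np.linspace(0, 1, 5)

# retstep=True also returns the spacing that linspace computed
points, spacing = np.linspace(0, 1, 5, retstep=True)

print(by_step)
print(by_count)
print(spacing)
```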

##### np.zeros() - array of zeros (e.g. 3D array), there is also a np.ones()

In [38]:
 help(np.zeros)



Help on built-in function zeros in module numpy:

zeros(...)
    zeros(shape, dtype=float, order='C', *, like=None)
    
    Return a new array of given shape and type, filled with zeros.
    
    Parameters
    ----------
    shape : int or tuple of ints
        Shape of the new array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        The desired data-type for the array, e.g., `numpy.int8`.  Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    like : array_like
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
   

##### More functions to create special arrays:      
    np.identity(n) - 2D square array filled with 1 on the diagonal      
    np.eye(n,m) - 2D array filled with 1 on the diagonal      
    np.full((n,m), val) - array filled with a given value     
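
The three helpers listed above can be tried out as follows. This is a minimal sketch and the shapes and fill value are arbitrary choices.

```python
import numpy as np

identity_3 = np.identity(3)      # 3 x 3, ones on the diagonal
eye_2x3 = np.eye(2, 3)           # 2 x 3, ones on the main diagonal
sevens = np.full((2, 2), 7)      # 2 x 2, every element is 7

print(identity_3)
print(eye_2x3)
print(sevens)
```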

In [39]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [41]:
np.zeros((5,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [43]:
np.ones((4,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

#### <b>Basic array attributes:</b>
* shape: array dimensions - tuple with the number of elements in each dimension
* size: number of elements in the array
* ndim: number of array dimensions (`len(arr.shape)`)
* dtype: data type of the array elements

In [44]:
# nested lists give us multi dimensional arrays

matrix = np.array([[1,2,3],[4,5,6]])
matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [45]:
dir(matrix)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__

In [46]:
# .size - total number of elements in the array
matrix.size


6

In [47]:
# .shape tells us the size of each dimension and, implicitly, the number of dimensions

matrix.shape

(2, 3)

In [48]:
# .ndim - number of array dimensions
matrix.ndim


2

In [49]:
# .dtype - type of the data stored in the array

matrix.dtype

dtype('int64')

In [50]:
matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [51]:
# .T - transpose of the array (rows and columns switched)

matrix.T

array([[1, 4],
       [2, 5],
       [3, 6]])

#### <b>Reshaping</b> - changing the numbers of rows and columns - data and size stay the same

In [52]:
# .reshape((n,m)) - Reshaping - has to be compatible with the size

matrix

array([[1, 2, 3],
       [4, 5, 6]])

In [53]:
matrix.reshape(2,1)

ValueError: cannot reshape array of size 6 into shape (2,1)

In [55]:
matrix.reshape((3,2))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [56]:
matrix.reshape(3,2)

array([[1, 2],
       [3, 4],
       [5, 6]])
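
Two details of `reshape` are worth a short sketch: one dimension can be given as `-1` so that NumPy infers it from the array size, and for a contiguous array such as this one the result shares its data with the original. The variable names below are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the array size
tall = matrix.reshape(-1, 2)     # 6 elements / 2 columns -> 3 rows
flat = matrix.reshape(-1)        # 1D array with all 6 elements

# For a contiguous array like this one, reshape returns a view,
# so both arrays share the same underlying data
shares_data = np.shares_memory(matrix, tall)

print(tall.shape, flat.shape, shares_data)
```

Because the data is shared here, assigning into `tall` would also change `matrix`; call `.copy()` on the result when an independent array is needed.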

In [57]:
test_list = ["A", "B", "C"]
test_list[1]

'B'

In [58]:
test_list[:2]

['A', 'B']

In [60]:
# fills an array of a given shape with a given value
np.full((4,5), 100)

array([[100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100]])

#### <b>Indexing/Slicing(subsetting): [][] or [,]</b>
___
<img src = "http://scipy-lectures.org/_images/numpy_indexing.png" width = 400/>

In [69]:
matrix = np.full((6,6),range(6)) + 10 * np.full((6,6),range(6)).T
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [62]:
np.full((6,6),range(6))

array([[0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5],
       [0, 1, 2, 3, 4, 5]])

In [63]:
np.full((6,6),range(6)) + 10

array([[10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15],
       [10, 11, 12, 13, 14, 15]])

In [65]:
np.full((6,6),range(6)).T

array([[0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5, 5]])

In [66]:
np.full((6,6),range(6)).T * 10

array([[ 0,  0,  0,  0,  0,  0],
       [10, 10, 10, 10, 10, 10],
       [20, 20, 20, 20, 20, 20],
       [30, 30, 30, 30, 30, 30],
       [40, 40, 40, 40, 40, 40],
       [50, 50, 50, 50, 50, 50]])

In [67]:
np.full((6,6),range(6)).T * 10 + np.full((6,6),range(6))

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [70]:
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

#### Indexing/Slicing

In [71]:
# [][] - List-like 

matrix[:2]


array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15]])

In [73]:
matrix[2][3]

23

In [74]:
# [,] - Using both rows and columns indices to get a value

matrix[2,3]

23

In [77]:
matrix[:2,:3]

array([[ 0,  1,  2],
       [10, 11, 12]])

In [78]:
matrix_reshaped = matrix.reshape(4,9)
matrix_reshaped

array([[ 0,  1,  2,  3,  4,  5, 10, 11, 12],
       [13, 14, 15, 20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35, 40, 41, 42],
       [43, 44, 45, 50, 51, 52, 53, 54, 55]])

In [79]:
# Using both rows and columns indices to get a sub-matrix

matrix_reshaped[:2,:3]

array([[ 0,  1,  2],
       [13, 14, 15]])
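
The same `start:stop:step` slice syntax used for lists works on each dimension of an array. The sketch below rebuilds the 6 x 6 matrix from the cells above and shows a few more patterns; the variable names are made up for this example.

```python
import numpy as np

# Same 6 x 6 matrix as above: value = 10 * row + column
matrix = np.full((6, 6), range(6)) + 10 * np.full((6, 6), range(6)).T

last_row = matrix[-1]            # negative index counts from the end
third_column = matrix[:, 2]      # all rows, column 2
corner = matrix[4:, 4:]          # bottom-right 2 x 2 block
strided = matrix[2::2, ::2]      # rows 2 and 4, columns 0, 2 and 4

print(last_row)
print(third_column)
print(corner)
print(strided)
```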

In [80]:
# Fun with arrays - build and display a checkerboard array
checkers_board = np.zeros((6,6),dtype=int)
print(checkers_board)

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]


In [81]:
checkers_board[1::2,::2] = 1
print(checkers_board)

[[0 0 0 0 0 0]
 [1 0 1 0 1 0]
 [0 0 0 0 0 0]
 [1 0 1 0 1 0]
 [0 0 0 0 0 0]
 [1 0 1 0 1 0]]


In [84]:
# change values in my array using subsetting or indexing
checkers_board[::2,1::2] = 2
print(checkers_board)

[[0 2 0 2 0 2]
 [1 0 1 0 1 0]
 [0 2 0 2 0 2]
 [1 0 1 0 1 0]
 [0 2 0 2 0 2]
 [1 0 1 0 1 0]]


#### Array of indices subsetting - use array/list of indices to subset array with only the elements given by the indices

In [85]:
matrix 

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [86]:
indices = [0,2,3]
matrix[indices,]

array([[ 0,  1,  2,  3,  4,  5],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35]])

In [88]:
# columns

indices = [0,2,3]
matrix[:,indices]

array([[ 0,  2,  3],
       [10, 12, 13],
       [20, 22, 23],
       [30, 32, 33],
       [40, 42, 43],
       [50, 52, 53]])
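
When index lists are given for both rows and columns at the same time, NumPy pairs them up element by element instead of returning a sub-matrix. The sketch below contrasts that with `np.ix_`, which builds the full rows-by-columns selection. The names `rows`, `cols`, `paired` and `block` are illustrative.

```python
import numpy as np

matrix = np.full((6, 6), range(6)) + 10 * np.full((6, 6), range(6)).T

rows = [0, 2, 3]
cols = [1, 4, 5]

# Two index lists are paired up: elements (0,1), (2,4) and (3,5)
paired = matrix[rows, cols]

# np.ix_ selects every combination of the given rows and columns
block = matrix[np.ix_(rows, cols)]

print(paired)
print(block)
```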

#### conditional subsetting - use array of booleans to subset array with only the elements where the bool array is True

In [89]:
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [90]:
# conditional subsetting
matrix[(matrix[:,0] > 20)]

array([[30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [91]:
matrix[:,0]

array([ 0, 10, 20, 30, 40, 50])

In [92]:
matrix[:,0] > 20

array([False, False, False,  True,  True,  True])

In [93]:
# returns only the rows where the condition is True
# the logical array must have as many elements as the matrix has rows

la = matrix[:,0] > 20
matrix[la]


array([[30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [None]:
# deconstruct

# See above

In [94]:
matrix

array([[ 0,  1,  2,  3,  4,  5],
       [10, 11, 12, 13, 14, 15],
       [20, 21, 22, 23, 24, 25],
       [30, 31, 32, 33, 34, 35],
       [40, 41, 42, 43, 44, 45],
       [50, 51, 52, 53, 54, 55]])

In [95]:
matrix[:,0] > 20 and matrix[:,0] > 30

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [96]:
(matrix[:,0] > 20)

array([False, False, False,  True,  True,  True])

In [97]:
(matrix[:,0] <= 40)

array([ True,  True,  True,  True,  True, False])

In [98]:
# multiple conditions  
(matrix[:,0] > 20) & (matrix[:,0] <= 40)

array([False, False, False,  True,  True, False])
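
Python's `and`, `or` and `not` work on single truth values, which is why the `and` attempt above fails. Boolean arrays are combined element by element with `&`, `|` and `~`, and each comparison needs its own parentheses. A short sketch with illustrative names:

```python
import numpy as np

matrix = np.full((6, 6), range(6)) + 10 * np.full((6, 6), range(6)).T
first_col = matrix[:, 0]                       # 0, 10, 20, 30, 40, 50

in_range = (first_col > 20) & (first_col <= 40)    # element-wise AND
extremes = (first_col < 10) | (first_col > 40)     # element-wise OR
outside = ~in_range                                # element-wise NOT

# A boolean array selects the rows where it is True
selected = matrix[in_range]

print(selected)
print(extremes)
print(matrix[outside].shape)
```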

#### <b>Matrix operations</b>

https://www.tutorialspoint.com/matrix-manipulation-in-python<br>
Arithmetic operators on arrays apply element-wise. <br> 
A new array is created and filled with the result.


#### <b>Array broadcasting</b><br>

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html<br>
The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. <br>
Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.

<img src = "https://www.tutorialspoint.com/numpy/images/array.jpg" height=10/>


https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

In [99]:
matrix = np.arange(1,13).reshape(3,4)
matrix


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [101]:
# create an array with 3 values (the matrix has 4 columns)
row_to_add = np.array([10,20,30])
row_to_add

array([10, 20, 30])

In [102]:
matrix + row_to_add

ValueError: operands could not be broadcast together with shapes (3,4) (3,) 

In [103]:
# addition using a data row
row_to_add = np.array([10,20,30, 40])
matrix + row_to_add

array([[11, 22, 33, 44],
       [15, 26, 37, 48],
       [19, 30, 41, 52]])

In [104]:
matrix


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [105]:
row_to_add

array([10, 20, 30, 40])

In [None]:
####

In [106]:
# create an array with 3 values
column_to_add = np.array([10,20,30])
column_to_add

array([10, 20, 30])

In [107]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [108]:
# addition using a data column

matrix + column_to_add

ValueError: operands could not be broadcast together with shapes (3,4) (3,) 

In [109]:
##########

matrix


array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [110]:
column_to_add.shape

(3,)

In [111]:
# column vector

column_to_add = column_to_add.reshape(3,1)
column_to_add

array([[10],
       [20],
       [30]])

In [112]:
column_to_add.shape

(3, 1)

In [113]:
# addition using a data column - works now that the shape is (3, 1)

matrix + column_to_add


array([[11, 12, 13, 14],
       [25, 26, 27, 28],
       [39, 40, 41, 42]])

In [114]:
##########

matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [115]:
# column vec
column_to_add


array([[10],
       [20],
       [30]])

In [116]:
# multiplication with a data column

matrix * column_to_add


array([[ 10,  20,  30,  40],
       [100, 120, 140, 160],
       [270, 300, 330, 360]])
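
The errors and fixes above follow from how shapes are compared: NumPy lines the shapes up from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below repeats the column case using `np.newaxis`, an alternative to `reshape(3, 1)`; the variable names are illustrative.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)      # shape (3, 4)
per_row = np.array([10, 20, 30])             # shape (3,), one value per row

# Shapes are compared from the last dimension backwards:
# (3, 4) against (3,) compares 4 with 3, which fails
try:
    matrix + per_row
    mismatch = None
except ValueError as err:
    mismatch = str(err)

# Adding a trailing axis gives shape (3, 1); a dimension of size 1
# is stretched to match, so 4 against 1 is compatible
as_column = per_row[:, np.newaxis]
result = matrix + as_column

print(mismatch)
print(as_column.shape)
print(result)
```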

#### Simple multiplication `*` of two matrices of the same shape multiplies the elements at matching indices (element-wise).
#### Mathematical matrix multiplication of two matrices (`n1 x m1`, `n2 x m2`) can be done using the `.dot` method or the `@` operator, but the dimensions need to be compatible: `m1 == n2`
* the resulting matrix is `n1 x m2`: it has `n1` rows and `m2` columns
* each value in the resulting matrix is the sum of the products of the paired elements from the respective row of the first matrix and column of the second

<img src = "https://miro.medium.com/max/1400/1*YGcMQSr0ge_DGn96WnEkZw.png" width = "400"/>
     
https://towardsdatascience.com/a-complete-beginners-guide-to-matrix-multiplication-for-data-science-with-python-numpy-9274ecfc1dc6
     

In [117]:
m1 = np.array([[1,2,3],[4,5,6]])
m2 = np.array([[10,11],[20,21],[30,31]])

In [118]:
m1

array([[1, 2, 3],
       [4, 5, 6]])

In [119]:
m2

array([[10, 11],
       [20, 21],
       [30, 31]])

In [120]:
m1 @ m2

array([[140, 146],
       [320, 335]])

In [121]:
m1.dot(m2)

array([[140, 146],
       [320, 335]])
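
To make the row-times-column rule concrete, the sketch below recomputes two entries of the product by hand and compares them with the result of `@`. It also shows element-wise `*` for contrast. The names `entry_00`, `entry_01` and `elementwise` are made up for this example.

```python
import numpy as np

m1 = np.array([[1, 2, 3], [4, 5, 6]])            # 2 x 3
m2 = np.array([[10, 11], [20, 21], [30, 31]])    # 3 x 2

product = m1 @ m2                                # 2 x 2

# Entry [0, 0]: row 0 of m1 paired with column 0 of m2
entry_00 = (m1[0, :] * m2[:, 0]).sum()           # 1*10 + 2*20 + 3*30
# Entry [0, 1]: row 0 of m1 paired with column 1 of m2
entry_01 = (m1[0, :] * m2[:, 1]).sum()           # 1*11 + 2*21 + 3*31

# Element-wise multiplication needs matching (or broadcastable) shapes
elementwise = m1 * m1

print(product)
print(entry_00, entry_01)
print(elementwise)
```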

#### <b>More matrix computation</b> - basic aggregate functions are available - min, max, sum, mean

In [122]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

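#### Use the axis argument to compute an aggregate (e.g. sum or mean) for each column or row
#### axis = 0 - aggregate down the rows, one result per column
#### axis = 1 - aggregate across the columns, one result per row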

In [123]:
help(matrix.sum)

Help on built-in function sum:

sum(...) method of numpy.ndarray instance
    a.sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)
    
    Return the sum of the array elements over the given axis.
    
    Refer to `numpy.sum` for full documentation.
    
    See Also
    --------
    numpy.sum : equivalent function



In [124]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

In [125]:
# total sum 

matrix.sum()


78

In [126]:
np.sum(matrix)

78

In [128]:
# col sum - axis = 0
# axis 0 is the first dimension (rows); summing along it gives one value per column

matrix.sum(axis = 0)

array([15, 18, 21, 24])

In [129]:
# row sum - axis =  1

matrix.sum(1)


array([10, 26, 42])
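
The `axis` argument works the same way for the other aggregate methods. The sketch below applies it to `mean` and also uses `keepdims=True`, which keeps the reduced axis with size 1 so the result can be combined with the original matrix directly. The variable names are illustrative.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)

col_means = matrix.mean(axis=0)    # one mean per column, shape (4,)
row_means = matrix.mean(axis=1)    # one mean per row, shape (3,)

# keepdims=True keeps the result as a (3, 1) column
row_totals = matrix.sum(axis=1, keepdims=True)

# Each value as a fraction of its row total; every row now sums to 1
row_shares = matrix / row_totals

print(col_means)
print(row_means)
print(row_totals.shape)
```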

In [130]:
matrix

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

https://www.w3resource.com/python-exercises/numpy/index.php


Create a matrix of 2 rows and 3 columns with every fifth number starting from 1 (e.g. 1,6,11,16,...)


In [150]:
# arange will always create a 1-dimensional array
matrix = np.arange(1, 2*3*5 + 1, 5)
matrix

array([ 1,  6, 11, 16, 21, 26])

In [152]:
matrix = np.arange(1, 2*3*5+1, 5).reshape(2,3)

matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

#### <font color = "red">Exercise:</font>   


Normalize the values in the matrix to be between 0 and 1 (min-max normalization).     
Subtract the minimum value, then divide by the maximum of the resulting values.

In [153]:
matrix


array([[ 1,  6, 11],
       [16, 21, 26]])

In [155]:
min_val = matrix.min()
min_val

1

In [156]:
min_matrix = matrix - min_val
min_matrix

array([[ 0,  5, 10],
       [15, 20, 25]])

In [159]:
max_val = min_matrix.max()
max_val

25

In [160]:
norm_matrix = min_matrix/max_val
norm_matrix

array([[0. , 0.2, 0.4],
       [0.6, 0.8, 1. ]])
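
The steps above can be wrapped in a small helper. `min_max_normalize` is a hypothetical name chosen for this sketch, and the function assumes the input contains at least two different values, since otherwise the division would be by zero.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values to [0, 1]; assumes the values are not all identical."""
    shifted = values - values.min()     # smallest value becomes 0
    return shifted / shifted.max()      # largest value becomes 1

matrix = np.arange(1, 2 * 3 * 5 + 1, 5).reshape(2, 3)
normalized = min_max_normalize(matrix)

print(normalized)
```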

#### <font color = "red">Exercise:</font>   

Do the same normalization at the row level

In [161]:
matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

In [166]:
min_vals = matrix.min(1)
min_vals

array([ 1, 16])

In [165]:
matrix - min_vals

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 

In [168]:
min_vals = min_vals.reshape(2,1)
min_vals

array([[ 1],
       [16]])

In [170]:
minv_matrix = matrix - min_vals
minv_matrix

array([[ 0,  5, 10],
       [ 0,  5, 10]])

In [171]:
max_vals = minv_matrix.max(1)
max_vals

array([10, 10])

In [173]:
max_vals = max_vals.reshape(2,1)
max_vals

array([[10],
       [10]])

In [174]:
minv_matrix/max_vals

array([[0. , 0.5, 1. ],
       [0. , 0.5, 1. ]])
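
The `reshape(2, 1)` calls above turn the per-row results into columns so they broadcast against the matrix. A possible alternative, sketched here with illustrative variable names, is to pass `keepdims=True` so the results already have shape `(2, 1)`.

```python
import numpy as np

matrix = np.arange(1, 2 * 3 * 5 + 1, 5).reshape(2, 3)

# keepdims=True returns shape (2, 1) instead of (2,)
row_min = matrix.min(axis=1, keepdims=True)
row_range = matrix.max(axis=1, keepdims=True) - row_min

# Broadcasting applies each row's own minimum and range to that row
row_normalized = (matrix - row_min) / row_range

print(row_normalized)
```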

#### <font color = "red">Exercise:</font>   


* Return the even numbers from the matrix.
* Try to return the indices of the even numbers (hint: look at the `np.where` function).

In [175]:
matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

In [176]:
4%2

0

In [177]:
matrix%2

array([[1, 0, 1],
       [0, 1, 0]])

In [178]:
matrix%2 == 0

array([[False,  True, False],
       [ True, False,  True]])

In [180]:
cond = matrix%2 == 0
cond

array([[False,  True, False],
       [ True, False,  True]])

In [181]:
matrix[cond]

array([ 6, 16, 26])

In [182]:
help(np.where)

Help on function where in module numpy:

where(...)
    where(condition, [x, y])
    
    Return elements chosen from `x` or `y` depending on `condition`.
    
    .. note::
        When only `condition` is provided, this function is a shorthand for
        ``np.asarray(condition).nonzero()``. Using `nonzero` directly should be
        preferred, as it behaves correctly for subclasses. The rest of this
        documentation covers only the case where all three arguments are
        provided.
    
    Parameters
    ----------
    condition : array_like, bool
        Where True, yield `x`, otherwise yield `y`.
    x, y : array_like
        Values from which to choose. `x`, `y` and `condition` need to be
        broadcastable to some shape.
    
    Returns
    -------
    out : ndarray
        An array with elements from `x` where `condition` is True, and elements
        from `y` elsewhere.
    
    See Also
    --------
    choose
    nonzero : The function that is called when x and y

In [184]:
matrix

array([[ 1,  6, 11],
       [16, 21, 26]])

In [183]:
pos = np.where(matrix == 3)
pos

(array([], dtype=int64), array([], dtype=int64))

In [185]:
matrix[pos]

array([], dtype=int64)

In [186]:
pos = np.where(matrix == 6)
pos

(array([0]), array([1]))

In [187]:
matrix[pos]

array([6])

In [188]:
pos = np.where(matrix % 2 == 0)
pos

(array([0, 1, 1]), array([1, 0, 2]))

In [189]:
matrix


array([[ 1,  6, 11],
       [16, 21, 26]])

In [190]:
matrix[pos]

array([ 6, 16, 26])
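
The cells above use `np.where` with only a condition, which returns one index array per dimension. The help text also describes the three-argument form, which picks values from `x` or `y` element by element. The sketch below shows that form together with `np.argwhere`, which returns the positions as (row, column) pairs. The variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 6, 11], [16, 21, 26]])
is_even = matrix % 2 == 0

# Three-argument form: choose a value for every element
labels = np.where(is_even, "even", "odd")
odds_zeroed = np.where(is_even, matrix, 0)

# argwhere: one (row, column) pair per True element
even_positions = np.argwhere(is_even)

print(labels)
print(odds_zeroed)
print(even_positions)
```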

#### RESOURCES

http://scipy-lectures.org/intro/numpy/array_object.html#what-are-numpy-and-numpy-arrays   
https://www.python-course.eu/numpy.php   
https://numpy.org/devdocs/user/quickstart.html#universal-functions   
https://www.geeksforgeeks.org/python-numpy/

_____

### Pandas
<img src = "https://upload.wikimedia.org/wikipedia/commons/e/ed/Pandas_logo.svg" width = 200/>

https://commons.wikimedia.org/wiki/File:Pandas_logo.svg

[Pandas](https://pandas.pydata.org/) is a high-performance library that makes familiar data structures, like `data.frame` from R, and appropriate data analysis tools available to Python users.

<img src = "https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png" width = 550/>

https://www.geeksforgeeks.org/python-pandas-dataframe/