# Python arrays

Python has built-in array-like containers (lists), but they are poorly suited to numerical work. The following example uses pure Python lists, not `numpy` arrays.

In [1]:

python_array1 = [1,2,3]
python_array2 = [4,5,6]

python_array1 + python_array2

[1, 2, 3, 4, 5, 6]

We see that `+` concatenates the Python lists instead of adding them element-wise: there is no mathematical interpretation of Python lists! That's why we need `numpy` arrays!
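For comparison, here is a minimal sketch of what element-wise addition looks like with plain lists. The names `list_a`, `list_b` and `elementwise_sum` are only illustrative.

```python
# Two plain Python lists (illustrative names)
list_a = [1, 2, 3]
list_b = [4, 5, 6]

# + concatenates the two lists
concatenated = list_a + list_b

# Element-wise addition has to be written out explicitly
elementwise_sum = [x + y for x, y in zip(list_a, list_b)]

print(concatenated)     # [1, 2, 3, 4, 5, 6]
print(elementwise_sum)  # [5, 7, 9]
```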

# Numpy arrays

We need to import the `numpy` package.

We can define `numpy` arrays by `np.array([])`.

In [1]:
import numpy as np

numpy_array1 = np.array([1,2,3,4])
numpy_array2 = np.array([4,5,6,7])

numpy_array_sum = numpy_array1 + numpy_array2
numpy_array_sum

array([ 5,  7,  9, 11])

We see that `numpy` interprets arrays mathematically: `+` adds them element-wise!
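The same element-wise behaviour holds for the other arithmetic operators. A short sketch, with the illustrative names `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 5, 6, 7])

difference = b - a   # element-wise subtraction
product = a * b      # element-wise multiplication, not a dot product
scaled = 2 * a       # the scalar is applied to every element

print(difference)  # [3 3 3 3]
print(product)     # [ 4 10 18 28]
print(scaled)      # [2 4 6 8]
```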

## Multidimensional arrays / Matrices

By passing several same-length lists to `np.array()`, we can create a multidimensional array.

Note: enclose the whole matrix within `np.array` in an outer pair of `[ ]`.



In [72]:
matrix = np.array([[1,0,0], [0,1,0], [0,0,1]])
matrix

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

`np.arange(i)` quickly sets up an array of integers from 0 (inclusive) to i (exclusive).

In [2]:
#array from 0 (inclusive) to 4 (exclusive):
np.arange(4)


array([0, 1, 2, 3])

`np.linspace(i,j,s)` quickly sets up an array from i (inclusive) to j (inclusive), consisting of s equidistant elements.

In [5]:
#array from -1 (inclusive) to 1 (inclusive) with 3 equidistant elements:
np.linspace(-1,1,3)

array([-1.,  0.,  1.])
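Both functions accept more arguments than shown above. The following sketch uses `np.arange(start, stop, step)` and a finer `np.linspace` grid; the variable names are illustrative.

```python
import numpy as np

# arange also accepts (start, stop, step); stop stays exclusive
evens = np.arange(2, 10, 2)

# linspace includes both end points; here 5 points from 0 to 1
grid = np.linspace(0, 1, 5)

print(evens)  # [2 4 6 8]
print(grid)   # [0.   0.25 0.5  0.75 1.  ]
```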

## Reshaping arrays

To obtain multidimensional arrays (matrices), we can reshape an array to the desired dimensions by using `reshape()`. The total number of elements must stay the same.



In [31]:
print(numpy_array_sum.reshape(2,2))

[[ 5  7]
 [ 9 11]]


If we don't know the exact size of one dimension, we can pass -1 for it, and `numpy` infers the right size from the number of elements and the remaining dimensions. Only one dimension may be -1.

In [33]:
#If we don't know the 2nd dimension:
print(numpy_array_sum.reshape(2,-1))

print('\n')

#Forming a vector:
print(numpy_array_sum.reshape(-1,1))

[[ 5  7]
 [ 9 11]]


[[ 5]
 [ 7]
 [ 9]
 [11]]


We can always flatten a matrix into a 1-dimensional array by passing a single value of -1.

In [34]:
numpy_array_sum.reshape(-1)

array([ 5,  7,  9, 11])
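A slightly larger sketch of the same idea, using an illustrative 12-element array. It also shows what happens when the requested shape does not fit the number of elements.

```python
import numpy as np

values = np.arange(12)           # 12 elements: 0 .. 11

as_3x4 = values.reshape(3, 4)    # 3 rows, 4 columns
as_2xN = values.reshape(2, -1)   # -1 is inferred as 6
flat = as_3x4.reshape(-1)        # back to 1-dim

print(as_3x4.shape)  # (3, 4)
print(as_2xN.shape)  # (2, 6)
print(flat.shape)    # (12,)

# 12 elements cannot be arranged in 5 rows, so this raises a ValueError
try:
    values.reshape(5, -1)
except ValueError as err:
    print("reshape failed:", err)
```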

Using `shape`, we can output the dimension of an array.

In [36]:
matrix.shape

(3, 3)

Transposing works simply with `.T`.

Note: in `numpy`, a 1-dimensional array is neither a row nor a column vector, so `.T` leaves it unchanged. In products such as `dot()`, it is interpreted in whichever orientation makes the shapes fit.

In [115]:
#this does not transpose anything:
numpy_array_sum.T

array([ 5,  7,  9, 11])
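To see an actual transpose, the array needs two dimensions. The sketch below transposes an illustrative 2x3 matrix and builds explicit column and row vectors with `reshape()`.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
v = np.array([5, 7, 9, 11])            # shape (4,)

print(m.T.shape)  # (3, 2)
print(v.T.shape)  # (4,)  unchanged for a 1-dim array

column = v.reshape(-1, 1)  # explicit column vector, shape (4, 1)
row = column.T             # explicit row vector, shape (1, 4)
print(column.shape, row.shape)  # (4, 1) (1, 4)
```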

## Tensors

We can extend an array (1-dim) to a matrix (2-dim) and a matrix to a tensor (3-dim) by simply stacking matrices together.

Note: enclose each matrix in `[ ]` and the whole tensor in `[ ]`.

In [42]:
tensor = np.array([[[1,2,3],[4,5,6],[7,8,9]], [[9,8,7],[6,5,4],[3,2,1]], [[1,0,0],[0,1,0],[0,0,1]]])
print(tensor)
print(tensor.shape)

[[[1 2 3]
  [4 5 6]
  [7 8 9]]

 [[9 8 7]
  [6 5 4]
  [3 2 1]]

 [[1 0 0]
  [0 1 0]
  [0 0 1]]]
(3, 3, 3)


## Accessing arrays (slicing)

Accessing any element $\text{e}_{ijk}$ in a tensor works with `tensor[i][j][k]`.
Alternatively: `tensor[i,j,k]`

In [44]:
tensor[1,0,2]

7
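Both index notations address the same element, and leaving out trailing indices returns the corresponding sub-array. A short sketch with the tensor from above:

```python
import numpy as np

tensor = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                   [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
                   [[1, 0, 0], [0, 1, 0], [0, 0, 1]]])

print(tensor[1][0][2])  # 7
print(tensor[1, 0, 2])  # 7

second_matrix = tensor[1]   # the whole second matrix, shape (3, 3)
first_row = tensor[1, 0]    # first row of the second matrix
print(second_matrix.shape)  # (3, 3)
print(first_row)            # [9 8 7]
```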

We can grab a subset of an array by using `[start_index:stop_index:step_size]`

Note: `start_index` is inclusive, `stop_index` is exclusive.

Note: if `step_size` is left out, then default is 1.

Note: if `start_index` is left out, then default is 0.

Note: if `stop_index` is left out, then the slice runs to the end of the array (including the last element).

Note: if simply `[:]` is called, then it returns the whole array.

In [60]:
print(numpy_array_sum)
print(numpy_array_sum[0:2:1])
print(numpy_array_sum[2:])

[ 5  7  9 11]
[5 7]
[ 9 11]
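A few more slicing patterns, sketched on an illustrative array `arr`. Negative indices and negative step sizes are not covered in the notes above; they count from the end of the array and walk backwards, respectively.

```python
import numpy as np

arr = np.array([5, 7, 9, 11])

every_second = arr[::2]    # step size 2 over the whole array
last_two = arr[-2:]        # negative start index counts from the end
reversed_arr = arr[::-1]   # negative step size walks backwards
whole = arr[:]             # the whole array

print(every_second)  # [5 9]
print(last_two)      # [ 9 11]
print(reversed_arr)  # [11  9  7  5]
print(whole)         # [ 5  7  9 11]
```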


Accessing a matrix works the same.

`[i,:]` accesses the i-th row.
`[i]` is the same.

`[:,i]` accesses the i-th column.


In [73]:
print(matrix)

print('\n')

print(matrix[1,:])

print('\n')

matrix[:,1]

[[1 0 0]
 [0 1 0]
 [0 0 1]]


[0 1 0]




array([0, 1, 0])

We can select an upper-left submatrix by using `[:i,:j]`, which selects everything up to the ith row (exclusive) and everything up to the jth column (exclusive).

In [78]:
#corresponds to upper left submatrix:
matrix[:2,:2]

array([[1, 0],
       [0, 1]])

We can select an arbitrary submatrix by using `[i:k,j:l]`, which selects everything from the ith row (inclusive) to the kth row (exclusive) and everything from the jth column (inclusive) to the lth column (exclusive).

In [84]:
big_matrix = np.array([[1,2,3,4,5],[12,32,21,53,4],[0,1,2,1,3],[83,13,1,2,3],[2,4,13,21,2]])
print(big_matrix)

#corresponds to inner vector [21,2,1]:
big_matrix[1:4,2:3]

[[ 1  2  3  4  5]
 [12 32 21 53  4]
 [ 0  1  2  1  3]
 [83 13  1  2  3]
 [ 2  4 13 21  2]]


array([[21],
       [ 2],
       [ 1]])
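Note the difference between a slice and a single index in one position: a slice such as `2:3` keeps that dimension, while a plain index such as `2` drops it. A minimal sketch with the same `big_matrix`:

```python
import numpy as np

big_matrix = np.array([[1, 2, 3, 4, 5],
                       [12, 32, 21, 53, 4],
                       [0, 1, 2, 1, 3],
                       [83, 13, 1, 2, 3],
                       [2, 4, 13, 21, 2]])

as_column = big_matrix[1:4, 2:3]  # slice in both positions: stays 2-dim
as_vector = big_matrix[1:4, 2]    # plain column index: result is 1-dim

print(as_column.shape)  # (3, 1)
print(as_vector.shape)  # (3,)
print(as_vector)        # [21  2  1]
```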

## Mathematical Operations

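Linear algebra operations in `numpy` are within `np.linalg`.

1. `np.linalg.matrix_power()` gives the power of a matrix, i.e. repeated matrix multiplication.

Note: `np.power()` is not an alternative here: it raises each element to the given power (element-wise), which is generally a different result.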

In [114]:
big_matrix

array([[ 1,  2,  3,  4,  5],
       [12, 32, 21, 53,  4],
       [ 0,  1,  2,  1,  3],
       [83, 13,  1,  2,  3],
       [ 2,  4, 13, 21,  2]])

In [100]:
#calculates ^2 for matrix:
np.linalg.matrix_power(big_matrix,2)

array([[ 367,  141,  120,  226,   44],
       [4803, 1774,  855, 1955,  418],
       [ 101,   59,   65,  120,   19],
       [ 411,  621,  565, 1089,  482],
       [1797,  426,  163,  317,  132]])
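The difference between the matrix power and the element-wise power is easiest to see on a small example. The 2x2 matrix `m` below is only illustrative.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# Matrix power: m multiplied with itself (matrix multiplication)
matrix_square = np.linalg.matrix_power(m, 2)

# Element-wise power: every entry is squared on its own
elementwise_square = np.power(m, 2)

print(matrix_square)
# [[ 7 10]
#  [15 22]]
print(elementwise_square)
# [[ 1  4]
#  [ 9 16]]
```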

2. `a.dot(b)` gives the dot product, i.e. matrix multiplication (not element-wise multiplication).

Alternatively: `np.dot(a,b)`

If `a` is a matrix and `b` is a matrix, then `a.dot(b)` is the matrix-matrix multiplication.

If `a` is a matrix and `b` is a vector, then `a.dot(b)` is the matrix-vector multiplication.

Note: in `numpy`, 1-dimensional arrays are not distinguished as row or column vectors; `dot()` interprets them in whichever orientation makes the shapes fit.


In [108]:
#dot-product of big_matrix's rows with b:
b = np.array([1,2,3,4,5])
big_matrix.dot(b)

array([ 55, 371,  27, 135, 143])
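A small sketch contrasting the dot product with element-wise multiplication. The arrays `a`, `b` and `v` are illustrative, and the `@` operator is used as an equivalent spelling of the matrix product for 2-dim arrays.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])
v = np.array([1, 1])

matmul_dot = a.dot(b)   # matrix-matrix product
matmul_at = a @ b       # same product written with the @ operator
elementwise = a * b     # element-wise product, a different result

mat_vec = a.dot(v)      # v is treated as a column vector here
vec_mat = v.dot(a)      # v is treated as a row vector here

print(matmul_dot)   # [[2 1]
                    #  [4 3]]
print(elementwise)  # [[0 2]
                    #  [3 0]]
print(mat_vec)      # [3 7]
print(vec_mat)      # [4 6]
```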

3. Calculation of eigenvalues: given a matrix $\mathbf{A}$, we want to find all eigenvectors $v$ and eigenvalues $\lambda$ that solve the equation $\mathbf{A} v = \lambda v$.

We can do this using the function `np.linalg.eig()`.

It returns two arrays: an array with all the eigenvalues and a matrix whose columns are the eigenvectors. The i-th eigenvalue corresponds to the i-th column of the eigenvector matrix.

Note: The returned eigenvectors are normalized to unit length, but eigenvectors are only determined up to scaling, i.e. any non-zero scalar multiple of such a vector is also an eigenvector.

In [133]:
eig_mat = np.array([[2,0,0],[0,3,4],[0,4,9]])

eigvals, eigvecs = np.linalg.eig(eig_mat)
print(eigvals)
print(eigvecs)

[11.  1.  2.]
[[ 0.          0.          1.        ]
 [ 0.4472136   0.89442719  0.        ]
 [ 0.89442719 -0.4472136   0.        ]]


Now to verify this, the two products below must be the same.

Note: `np.dot()` is used for the second product because calling `.dot()` directly on the scalar `eigvals[0]` raised an error here. Plain multiplication `eigvals[0] * eigvecs[:,0]` works as well.

In [141]:
#eig_mat * first_eigenvector:
print(eig_mat.dot(eigvecs[:,0]))
#first_eigenvalue * first_eigenvector:
print(np.dot(eigvals[0], eigvecs[:,0]))


#this throws error:
#eigvals[0].dot(eigvecs[:,0])

[0.         4.91934955 9.8386991 ]
[0.         4.91934955 9.8386991 ]
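The same check can be repeated for every eigenpair. The sketch below uses `np.allclose()` to compare both sides of $\mathbf{A} v = \lambda v$ up to floating-point error, and also confirms that each returned eigenvector has unit length.

```python
import numpy as np

eig_mat = np.array([[2, 0, 0], [0, 3, 4], [0, 4, 9]])
eigvals, eigvecs = np.linalg.eig(eig_mat)

# Check A v = lambda v for every column of eigvecs
for i in range(len(eigvals)):
    v = eigvecs[:, i]
    lhs = eig_mat.dot(v)
    rhs = eigvals[i] * v
    print(i, np.allclose(lhs, rhs))

# All pairs at once: column j of eigvecs is scaled by eigvals[j]
all_pairs_ok = np.allclose(eig_mat.dot(eigvecs), eigvecs * eigvals)

# Each column has Euclidean norm 1
unit_length = np.allclose(np.linalg.norm(eigvecs, axis=0), 1.0)

print(all_pairs_ok)  # True
print(unit_length)   # True
```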


# Scipy

`Scipy` is a package for scientific computing that builds on `numpy`; among other things, it provides sparse matrices and basic tools for statistical analysis.

## Sparse matrices

We can use `scipy.sparse.csr_matrix()` to store sparse matrices. This is a special scipy object that only stores the values from a numpy array that are different from 0, together with their positions.

In [10]:
import numpy as np
import scipy.sparse

matrix = np.array([[1.,0.,1.,0.,0.], [0.,1.,0.,0.,0.], [0.,0.,1.,0.,0.], [1.,0.,0.,0.,0.], [1.,1.,0.,0.,1.]])
print(matrix)

#Turn matrix into sparse object:
sparse_matrix = scipy.sparse.csr_matrix(matrix)
print(sparse_matrix)

[[1. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0.]
 [1. 1. 0. 0. 1.]]
  (0, 0)	1.0
  (0, 2)	1.0
  (1, 1)	1.0
  (2, 2)	1.0
  (3, 0)	1.0
  (4, 0)	1.0
  (4, 1)	1.0
  (4, 4)	1.0


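We can turn this sparse matrix object back into a dense matrix by using `todense()`. Note that this returns a `numpy.matrix` rather than a plain `numpy` array; `toarray()` returns a plain array.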

In [11]:
#Turn sparse object back into full matrix:
sparse_matrix.todense()

matrix([[1., 0., 1., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [1., 0., 0., 0., 0.],
        [1., 1., 0., 0., 1.]])
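A minimal sketch of the two conversion methods on an illustrative 3x3 matrix. The attribute `nnz` reports how many entries the sparse object actually stores.

```python
import numpy as np
import scipy.sparse

dense = np.array([[1., 0., 1.], [0., 1., 0.], [0., 0., 0.]])
sparse = scipy.sparse.csr_matrix(dense)

as_matrix = sparse.todense()   # numpy.matrix
as_array = sparse.toarray()    # plain numpy.ndarray

print(type(as_matrix).__name__)          # matrix
print(type(as_array).__name__)           # ndarray
print(sparse.nnz)                        # 3 stored non-zero values
print(np.array_equal(as_array, dense))   # True
```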

Using `scipy.sparse.linalg.eigs()`, we can compute eigenvalues and eigenvectors of such a sparse object.

Note: `k` (the number of eigenvalues and eigenvectors to compute) needs to be specified here and must be smaller than N-1, where N is the size of the matrix.

Note: for this, the matrix must consist of `floats` (or doubles).

In [12]:
import scipy.sparse.linalg


eigenvalues, eigenvectors = scipy.sparse.linalg.eigs(sparse_matrix, k=1)
print(eigenvalues)
print(eigenvectors)

[1.00000165-2.85665178e-06j]
[[ 2.85665179e-06+1.64929754e-06j]
 [-8.83308335e-15+5.10030386e-15j]
 [ 9.42293752e-12-5.44075895e-12j]
 [ 2.85664237e-06+1.64930298e-06j]
 [-9.42293752e-12+1.00000000e+00j]]
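To make the result easier to check, the following sketch applies `eigs()` to an illustrative diagonal matrix whose eigenvalues are known in advance (1.0 to 10.0). It assumes the default selection of `eigs()`, which returns the eigenvalues of largest magnitude.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg

# Diagonal test matrix with known eigenvalues 1.0 ... 10.0 (floats)
diag_sparse = scipy.sparse.csr_matrix(np.diag(np.arange(1.0, 11.0)))

# Ask for the k=3 eigenvalues of largest magnitude
vals, vecs = scipy.sparse.linalg.eigs(diag_sparse, k=3)

# eigs returns complex numbers; compare the real parts with 8, 9, 10
largest = np.sort(vals.real)
print(largest)
print(vecs.shape)  # (10, 3): one column per eigenvector
```

The results are iterative approximations, so they should be compared with a tolerance rather than for exact equality.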


## Random variables

With scipy, we can easily generate random variables using `scipy.stats`.

There is a whole list of classes to use: (https://docs.scipy.org/doc/scipy/reference/stats.html)

1. Standard normal distribution ($\mu = 0, \sigma = 1$): `scipy.stats.norm.rvs()`

In [18]:
import scipy.stats

#array of 10 random realisations of standard normal distribution:
st_norm_distr = scipy.stats.norm.rvs(size=10)
st_norm_distr

array([-0.61951921,  1.56037512, -0.15284348, -0.29612157,  2.62640533,
       -0.70112341, -0.62759112,  0.15304096, -0.49894647, -0.02954968])

2. Normal distribution (any $\mu ,\sigma$): $\mu$ + $\sigma$ * `scipy.stats.norm.rvs()`

In [19]:
#array of 10 random realisations of normal distribution with mean=10 and standard deviation = 3:
norm_distr = 10 + 3 * scipy.stats.norm.rvs(size=10)
norm_distr

array([ 9.10052206,  5.38098479,  7.46567614,  7.09464648,  8.08305   ,
       14.35653247,  5.46115573,  6.70520216, 12.65241056, 14.91100315])
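Instead of shifting and scaling by hand, `rvs()` also accepts `loc` (the mean) and `scale` (the standard deviation) directly. The sketch below additionally fixes `random_state` so that the draws are reproducible; the seed value 0 is arbitrary.

```python
import numpy as np
import scipy.stats

# Mean 10 and standard deviation 3 passed directly to rvs()
direct = scipy.stats.norm.rvs(loc=10, scale=3, size=10, random_state=0)

# Same seed, shifted and scaled by hand
manual = 10 + 3 * scipy.stats.norm.rvs(size=10, random_state=0)

# Same seed again gives the same draws
repeated = scipy.stats.norm.rvs(loc=10, scale=3, size=10, random_state=0)

print(direct.shape)                       # (10,)
print(np.allclose(direct, manual))        # True
print(np.array_equal(direct, repeated))   # True
```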

3. CDF (cumulative distribution function) of the normal distribution: `scipy.stats.norm.cdf()`

In [22]:
#The Gaussian bell curve has half of its mass to the left of x=0:
scipy.stats.norm.cdf(x=0)

0.5
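A few more uses of the CDF, as a sketch: the probability mass within one standard deviation, the inverse of the CDF via `ppf()`, and the CDF of a shifted and scaled normal distribution using `loc` and `scale`.

```python
import scipy.stats

# Probability that a standard normal value lies between -1 and 1 (about 0.68)
within_one_sigma = scipy.stats.norm.cdf(1) - scipy.stats.norm.cdf(-1)

# ppf() is the inverse of cdf(): the point below which half of the mass lies
median = scipy.stats.norm.ppf(0.5)

# For mean 10 and standard deviation 3, half of the mass lies below x=10
shifted_half = scipy.stats.norm.cdf(10, loc=10, scale=3)

print(within_one_sigma)
print(median)        # 0.0
print(shifted_half)  # 0.5
```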