June 9, 2019

Reference: An Essential Guide to Numpy for Machine Learning in Python [by Siddharth Dikshit](https://becominghuman.ai/an-essential-guide-to-numpy-for-machine-learning-in-python-5615e1758301)

In [1]:
import numpy as np

In [4]:
# create a vector as a row
vector_row = np.array([7,4,1])
print(vector_row, vector_row.shape)

# column
vector_col = np.array([[1],[2],[4]])
print(vector_col, vector_col.shape) # 3 rows, 1 column

[7 4 1] (3,)
[[1]
 [2]
 [4]] (3, 1)


In [6]:
matrix = np.array([[1,4,7],[2,5,8],[3,6,9]]) 
print(matrix)

[[1 4 7]
 [2 5 8]
 [3 6 9]]


https://docs.scipy.org/doc/numpy/user/basics.indexing.html

In [13]:
from scipy.sparse import csr_matrix
    
# boolean mask index array: True where the element is even
b = matrix % 2 == 0
print(b)

# SciPy sparsity: zero out the even entries, then store the result as CSR
dense_matrix = np.copy(matrix)
dense_matrix[b] = 0 
print(dense_matrix)

matrix_sparse = csr_matrix(dense_matrix)
print(matrix_sparse)


[[False  True False]
 [ True False  True]
 [False  True False]]
[[1 0 7]
 [0 5 0]
 [3 0 9]]
  (0, 0)	1
  (0, 2)	7
  (1, 1)	5
  (2, 0)	3
  (2, 2)	9


In [17]:
matrix_sparse.todense()

matrix([[1, 0, 7],
        [0, 5, 0],
        [3, 0, 9]], dtype=int32)

In [14]:
from scipy.sparse import dok_matrix
S = dok_matrix((5, 5), dtype=np.float32)
for i in range(5):
    for j in range(5):
        S[i, j] = i + j    # Update element
print(S)

  (0, 1)	1.0
  (0, 2)	2.0
  (0, 3)	3.0
  (0, 4)	4.0
  (1, 0)	1.0
  (1, 1)	2.0
  (1, 2)	3.0
  (1, 3)	4.0
  (1, 4)	5.0
  (2, 0)	2.0
  (2, 1)	3.0
  (2, 2)	4.0
  (2, 3)	5.0
  (2, 4)	6.0
  (3, 0)	3.0
  (3, 1)	4.0
  (3, 2)	5.0
  (3, 3)	6.0
  (3, 4)	7.0
  (4, 0)	4.0
  (4, 1)	5.0
  (4, 2)	6.0
  (4, 3)	7.0
  (4, 4)	8.0


*Question:* There is a lot more to sparse matrices than I knew. Why are there so many different representations that seem quite similar?

Source: https://datascience.stackexchange.com/questions/31352/understanding-scipy-sparse-matrix-types

a) Sparse types used to construct the matrices:

- DOK (Dictionary Of Keys): a dictionary that maps (row, column) to the value of the elements. It uses a hash table so it's efficient to set elements.

- LIL (LIst of Lists): stores one list per row. Because the lil_matrix format is row-based, converting it to CSR for later operations is efficient, whereas converting it to CSC is less so.

- COO (COOrdinate list): stores a list of (row, column, value) tuples.

b) Sparse types that support efficient access, arithmetic operations, column or row slicing, and matrix-vector products:

- CSR (Compressed Sparse Row): similar to COO, but compresses the row indices. Holds all the nonzero entries of M in left-to-right, top-to-bottom ("row-major") order (all elements in the first row, then all elements in the second row, and so on). More efficient for row indexing and row slicing, because elements in the same row are stored contiguously in memory.

- CSC (Compressed Sparse Column): similar to CSR, except that values are read column by column. More efficient for column indexing and column slicing.

Once a matrix has been built using one of the a) types, we should convert it to either CSC or CSR format before performing manipulations such as multiplication or inversion.
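To make the build-then-convert workflow concrete, here is a minimal sketch. It assumes the same nonzero values as the earlier `dense_matrix` example; the names `builder`, `csr`, `x` and `y` are made up for illustration.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build incrementally in LIL, which is cheap to modify element by element.
builder = lil_matrix((3, 3), dtype=np.float64)
builder[0, 0] = 1.0
builder[0, 2] = 7.0
builder[1, 1] = 5.0
builder[2, 0] = 3.0
builder[2, 2] = 9.0

# Convert once to CSR before doing arithmetic.
csr = builder.tocsr()

x = np.array([1.0, 2.0, 3.0])
y = csr @ x        # sparse matrix-vector product, returns a dense 1-D array
print(y)
print(csr.nnz)     # number of stored (nonzero) entries
```

The conversion is done once, after all element assignments, so the cost of changing the sparsity structure is paid in the construction format rather than in CSR.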

In [20]:
vector_row = matrix.flatten()
print(matrix)
print(matrix[1,1], "2nd row 2nd col") # 2nd row 2nd col
print(vector_row[2])
print(vector_row[-3:], "last 3 elements") # last 3 elements
print(matrix[:2,1])  # first two rows, second column


[[1 4 7]
 [2 5 8]
 [3 6 9]]
5 2nd row 2nd col
7
[3 6 9] last 3 elements
[4 5]


In [21]:
print(matrix.size) # number of elements (rows*columns)
print(matrix.ndim) # number of dimensions


9
2


https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

In [25]:
# NumPy's vectorize class
# - wraps a function so that it is applied to each element of an input array
double_it = lambda i: i*2
vectorized_double_it = np.vectorize(double_it)
print(matrix)
print(vectorized_double_it(matrix))

[[1 4 7]
 [2 5 8]
 [3 6 9]]
[[ 2  8 14]
 [ 4 10 16]
 [ 6 12 18]]
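A side note on the cell above: the NumPy documentation describes `np.vectorize` as a convenience that is essentially a for loop, not a performance tool. For simple arithmetic like doubling, plain array operations give the same result. A small sketch to compare the two (the names `looped` and `broadcast` are only illustrative):

```python
import numpy as np

matrix = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

vectorized_double_it = np.vectorize(lambda i: i * 2)
looped = vectorized_double_it(matrix)   # calls the lambda once per element
broadcast = matrix * 2                  # a single array-level operation

print(np.array_equal(looped, broadcast))
```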


In [26]:
# built in multi-dim functions
print(np.max(matrix))
print(np.min(matrix))
print(np.max(matrix,axis=0)) # per column
print(np.max(matrix,axis=1)) # per row

9
1
[3 6 9]
[7 8 9]


In [32]:
print(np.mean(matrix, axis =0))
print(np.std(matrix))
print(np.var(matrix, axis=1))
print(np.sum(matrix,axis=1))

[2. 5. 8.]
2.581988897471611
[6. 6. 6.]
[12 15 18]


In [36]:
# reshaping arrays
print(matrix)
print(matrix.reshape(matrix.size,1)) # (9,1)
print(matrix.reshape(1,-1)) # -1 lets NumPy infer the number of columns

[[1 4 7]
 [2 5 8]
 [3 6 9]]
[[1]
 [4]
 [7]
 [2]
 [5]
 [8]
 [3]
 [6]
 [9]]
[[1 4 7 2 5 8 3 6 9]]


In [41]:
print(matrix.T) # transpose
print(matrix)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 4 7]
 [2 5 8]
 [3 6 9]]


In [42]:
# determinant
print(np.linalg.det(matrix))
# rank
print(np.linalg.matrix_rank(matrix))

0.0
2


In [45]:
print(matrix.diagonal()) # principal diagonal
print(matrix.diagonal(offset=1))
print(matrix.diagonal(offset=-1))


[1 5 9]
[4 8]
[2 6]


In [46]:
print(matrix.trace()) # sum of the principal diagonal

15


In [57]:
# eigenvalues and eigenvectors
# Av = Kv, where A is a square matrix, K is an eigenvalue and v is the corresponding eigenvector
matrix = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(matrix)
print("-"*matrix.shape[1])

eigenvalues, eigenvectors = np.linalg.eig(matrix)
print(eigenvalues)
print("---")
print(eigenvectors)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
---
[ 1.61168440e+01 -1.11684397e+00 -9.75918483e-16]
---
[[-0.23197069 -0.78583024  0.40824829]
 [-0.52532209 -0.08675134 -0.81649658]
 [-0.8186735   0.61232756  0.40824829]]


In [69]:
eigenvalues*eigenvectors # element-wise multiplication: broadcasting scales column j by eigenvalue j

array([[-3.73863537e+00,  8.77649763e-01, -3.98417052e-16],
       [-8.46653421e+00,  9.68877101e-02,  7.96834105e-16],
       [-1.31944331e+01, -6.83874343e-01, -3.98417052e-16]])

In [70]:
matrix @ eigenvectors

array([[-3.73863537e+00,  8.77649763e-01, -4.44089210e-16],
       [-8.46653421e+00,  9.68877101e-02, -4.44089210e-16],
       [-1.31944331e+01, -6.83874343e-01, -8.88178420e-16]])

In [61]:
np.dot(matrix,eigenvectors)

array([[-3.73863537e+00,  8.77649763e-01, -3.88578059e-16],
       [-8.46653421e+00,  9.68877101e-02, -3.33066907e-16],
       [-1.31944331e+01, -6.83874343e-01, -7.21644966e-16]])
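The three outputs above agree up to floating-point round-off, which is what Av = Kv predicts: each column of `eigenvectors` is one eigenvector. Rather than comparing the printed numbers by eye, the check can be done with `np.allclose`. A minimal sketch, with `A` and `scaled` as illustrative names:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Broadcasting scales column j of the eigenvector matrix by eigenvalue j.
scaled = eigenvalues * eigenvectors
print(np.allclose(A @ eigenvectors, scaled))

# Same check, one eigenpair at a time.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]          # k-th eigenvector is the k-th column
    print(np.allclose(A @ v, eigenvalues[k] * v))
```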

In [66]:
v1 = np.array([1,2,3])
v2 = np.array([4,5,6])
print(v1@v2)
print(np.dot(v2,v1))

32
32


In [71]:
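# matrix is singular (determinant 0, rank 2), so it has no true inverse;
# the huge values below come from floating-point round-off, not a valid inverse
print(np.linalg.inv(matrix))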

[[ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]
 [-6.30503948e+15  1.26100790e+16 -6.30503948e+15]
 [ 3.15251974e+15 -6.30503948e+15  3.15251974e+15]]
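Entries of order 1e15 are a warning sign: this matrix has rank 2, so it has no true inverse. A sketch of how one might detect that before inverting, and fall back to the Moore-Penrose pseudo-inverse instead. Using `np.linalg.pinv` here is my own suggestion, not something from the referenced guide.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

rank = np.linalg.matrix_rank(A)   # less than 3 means A is singular
cond = np.linalg.cond(A)          # a very large condition number signals trouble
print(rank, cond)

# The pseudo-inverse is defined even for singular matrices.
pinv = np.linalg.pinv(A)
print(np.allclose(A @ pinv @ A, A))   # defining property of the pseudo-inverse
```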


In [75]:
# random number generation with a seed
np.random.seed(297)
print(np.random.randint(0,11,3))

[ 6  0 10]


In [77]:
np.random.seed(299)
print(np.random.randint(0,11,3))

[10 10  5]


In [78]:
# 3 samples from a normal distribution with mean 1 and std 2
np.random.normal(1.0,2.0,3) 

array([2.43976594, 3.18554496, 2.93333037])
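The cells above seed NumPy's global random state. Newer NumPy code usually goes through `np.random.default_rng`, which returns an independent `Generator` object. A hedged sketch of the equivalent calls follows; the seed is reused from above, but the generator produces a different stream than `np.random.seed`, so the numbers will not match the outputs shown earlier.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(297)
rng_b = np.random.default_rng(297)

ints_a = rng_a.integers(0, 11, size=3)   # upper bound is exclusive, like randint
ints_b = rng_b.integers(0, 11, size=3)
print(np.array_equal(ints_a, ints_b))

# 3 samples from a normal distribution with mean 1 and std 2
normals = rng_a.normal(loc=1.0, scale=2.0, size=3)
print(normals.shape)
```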