# Linear Regression

Linear regression is an old method from statistics for describing the relationships between variables. In machine learning it is often used to predict numerical values in simpler regression problems. The linear regression equation is commonly summarized in linear algebra notation:

y = A · b

where y is the vector of output values, A is the data matrix, and b is the vector of model coefficients.
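
As a minimal sketch of how this equation can be used, the coefficients b may be estimated by least squares with NumPy's `lstsq`. The data values below are made up for illustration, and the column of ones is an assumption added here to model an intercept.

```python
import numpy as np

# Hypothetical data that follows y = 2*x + 1 exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Data matrix A: one column for x and one column of ones for the intercept
A = np.column_stack((x, np.ones_like(x)))

# Solve y = A · b in the least-squares sense
b, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(b)

# Predictions from the fitted coefficients
y_hat = A.dot(b)
print(y_hat)
```

Here `b[0]` is the slope and `b[1]` is the intercept, because of the column order chosen for `A`.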

## Principal Component Analysis
Methods for automatically reducing the number of columns of a dataset are called dimensionality reduction methods, and perhaps the most popular of them is principal component analysis, or PCA for short.

The core of the PCA method is a matrix factorization method from linear algebra. The eigendecomposition can be used, while more robust implementations may use the singular-value decomposition (SVD).
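
The two factorization routes can be sketched side by side. This is only an illustration on a made-up dataset, not a full PCA implementation: it centers the columns, takes the eigendecomposition of the covariance matrix, and checks that the SVD of the centered data gives the same component variances.

```python
import numpy as np

# Hypothetical dataset: 5 samples (rows), 2 columns
X = np.array([
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [7.0, 9.0],
    [9.0, 10.0],
])

# Center each column on its mean
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
cov = np.cov(Xc, rowvar=False)
eig_values, eig_vectors = np.linalg.eigh(cov)
order = np.argsort(eig_values)[::-1]  # largest variance first
eig_values = eig_values[order]
eig_vectors = eig_vectors[:, order]

# Route 2: SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = s ** 2 / (X.shape[0] - 1)

# Reduce to one column by projecting onto the first principal component
projected = Xc.dot(eig_vectors[:, :1])

print(eig_values)
print(svd_variances)
print(projected.shape)
```

The signs of the component vectors may differ between the two routes, so only the variances are compared directly.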


## Latent Semantic Analysis
In the sub-field of machine learning called natural language processing, it is common to represent documents as large matrices of word occurrences. For example, the columns of the matrix may be the known words in the vocabulary and the rows may be sentences, paragraphs, pages, or documents of text, with each cell holding the count or frequency of the word in that piece of text.
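
A small sketch of such a matrix, built from a made-up three-document corpus. Latent semantic analysis then applies a matrix factorization (the SVD) to the count matrix; the choice of `k = 2` retained dimensions below is an arbitrary assumption for illustration.

```python
import numpy as np

# Hypothetical toy corpus
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

# Vocabulary: the sorted set of known words
vocab = sorted({word for doc in docs for word in doc.split()})

# Rows are documents, columns are words, cells are occurrence counts
counts = np.array([[doc.split().count(word) for word in vocab] for doc in docs])
print(vocab)
print(counts)

# Keep only the k largest singular values (truncated SVD)
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = U[:, :k] * s[:k]
print(doc_vectors.shape)
```

Most cells of `counts` are zero, which is why matrices like this are usually stored in a sparse format.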

## Introduction to NumPy Arrays

In [1]:
from numpy import array

l=[1.0, 2.0, 3.0]
a=array(l)

print(a)
print(a.shape)
print(a.dtype)

[ 1.  2.  3.]
(3,)
float64


### Functions to create Arrays

In [2]:
import numpy as np

# empty()
a = np.empty([3,3])
print(a)

# zeros()
a = np.zeros([3,5])
print(a)

# ones()
a = np.ones([5])
print(a)


[[  0.00000000e+000   2.49009086e-321   2.47032823e-323]
 [  9.13056385e-312   0.00000000e+000   2.47032823e-323]
 [  9.13056386e-312   0.00000000e+000   3.23815565e-319]]
[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[ 1.  1.  1.  1.  1.]


### Combining arrays

In [3]:
# Vertical stack

a1 = array([1,2,3])
print(a1)

a2 = array([4,5,6])
print(a2)

a3 = np.vstack((a1,a2))
print(a3)
print(a3.shape)

# Horizontal Stack
a3 = np.hstack((a1,a2))
print(a3)
print(a3.shape)


[1 2 3]
[4 5 6]
[[1 2 3]
 [4 5 6]]
(2, 3)
[1 2 3 4 5 6]
(6,)


## Index, Slice and Reshape NumPy Arrays

### One- and Two-Dimensional Lists to Arrays

In [4]:
# one dimensional list of data 
data = [11, 22, 33, 44, 55] 
# array of data 
data = array(data) 
print(data) 
print(type(data))

# two dimensional list of data 
data = [[11, 22], [33, 44], [55, 66]] 
# array of data 
data = array(data) 
print(data) 
print(type(data))

print(data[0,0])

[11 22 33 44 55]
<class 'numpy.ndarray'>
[[11 22]
 [33 44]
 [55 66]]
<class 'numpy.ndarray'>
11


### Array Slicing 

In [5]:
# one dimensional slicing
data = np.array([11,22,33,44,55])

print(data[:])
print(data[0:1])
print(data[-2:])

# two dimensional slicing
data = np.array([
    [1,2,3],
    [4,5,6],
    [7,8,9]
])

X,y = data[:,:-1], data[:,-1]

print(X)
print(y)

# split data (for train and test)
split = 2
train, test = data[:split,:], data[split:,0:2]
print(train)
print(test)

[11 22 33 44 55]
[11]
[44 55]
[[1 2]
 [4 5]
 [7 8]]
[3 6 9]
[[1 2 3]
 [4 5 6]]
[[7 8]]


### Array Reshaping

In [6]:
# shape of one dimensional array
data = np.array([11,22,33,44,55])
print (data.shape)

# shape of two dimensional array
data = np.array([
    [1,2],
    [4,5],
    [7,8]
])
print (data)
print (data.shape[0])
print (data.shape[1])

print('Reshaping 1D to 2D array')
data = np.array([11,22,33,44,55])
print(data.shape[0])
print(data.reshape((data.shape[0],1)))


print('Reshaping 2D to 3D array')
data = np.array([
    [1,2],
    [4,5],
    [7,8]
])

print(data.reshape((2,data.shape[0],1)))

(5,)
[[1 2]
 [4 5]
 [7 8]]
3
2
Reshaping 1D to 2D array
5
[[11]
 [22]
 [33]
 [44]
 [55]]
Reshaping 2D to 3D array
[[[1]
  [2]
  [4]]

 [[5]
  [7]
  [8]]]


### Array Broadcasting

In [7]:
a = np.array([1,2,3])
print(a+2)

[3 4 5]
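
Broadcasting also applies between arrays of different shapes, not only between an array and a scalar. The sketch below uses made-up values to show a one-dimensional array stretched across the rows of a matrix, and a column vector stretched across its columns.

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

# b has shape (3,): it is added to every row of A (shape (2, 3))
b = np.array([10, 20, 30])
row_sum = A + b
print(row_sum)

# c has shape (2, 1): it is added to every column of A
c = np.array([[100], [200]])
col_sum = A + c
print(col_sum)
```

Shapes are compared from the last dimension backwards, and each pair of dimensions must either match or contain a 1.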


### Vector Norms

#### Vector L1 Norm
The L1 norm is calculated as the sum of the absolute vector values.

In [8]:
from numpy.linalg import norm
a = np.array([1,2,3])

l1 = norm(a,1)
print(l1)

6.0


#### Vector L2 norm
The L2 norm is calculated as the square root of the sum of the squared vector values.

In [9]:
l2 = norm(a) # default order is the L2 norm: sqrt(1^2 + 2^2 + 3^2)
print(l2)

3.74165738677


#### Max Vector norm
The max norm is calculated as the maximum absolute value in the vector.

In [10]:
print(np.inf)
maxnorm = norm(a,np.inf)
print(maxnorm)

inf
3.0
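
To make the three definitions concrete, the sketch below computes each norm by hand and compares it with `numpy.linalg.norm`. The vector is a made-up example with a negative entry, chosen so that the role of the absolute value is visible.

```python
import numpy as np
from numpy.linalg import norm

a = np.array([1, -2, 3])

# L1: sum of absolute values
l1_manual = np.sum(np.abs(a))

# L2: square root of the sum of squares
l2_manual = np.sqrt(np.sum(a ** 2))

# Max norm: largest absolute value
max_manual = np.max(np.abs(a))

print(l1_manual, norm(a, 1))
print(l2_manual, norm(a))
print(max_manual, norm(a, np.inf))
```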


### Matrix Operations

In [11]:
# Matrix multiplication
A = array([
    [1,2],
    [3,4],
    [5,6]
])

B = array([
    [1,2],
    [3,4]
])

C = A.dot(B) # or A @ B
print(C)

[[ 7 10]
 [15 22]
 [23 34]]


In [12]:
# diagonal matrix
M = array([
    [1,2,3],
    [1,2,3],
    [1,2,3]
])

d = np.diag(M)
print(d)
print(np.diag(d))

[1 2 3]
[[1 0 0]
 [0 2 0]
 [0 0 3]]


In [13]:
# Identity matrix
print(np.identity(3))

[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]


### Matrix Transpose, Inverse, Determinant and Rank

In [14]:
# Matrix Transpose
print(A.T)

[[1 3 5]
 [2 4 6]]


In [15]:
# Inverse of a matrix
from numpy.linalg import inv
from numpy.linalg import det
A = array([
    [1,2],
    [3,4]
])
B = inv(A)
print(B)
print(B.dot(A))

# determinant
print(det(A))

[[-2.   1. ]
 [ 1.5 -0.5]]
[[  1.00000000e+00   0.00000000e+00]
 [  1.11022302e-16   1.00000000e+00]]
-2.0
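
One common use of the inverse is solving a linear system A · x = b. The sketch below reuses the same 2×2 matrix with a made-up right-hand side, and compares the explicit inverse with `numpy.linalg.solve`, which solves the system without forming the inverse.

```python
import numpy as np
from numpy.linalg import inv, solve

A = np.array([
    [1, 2],
    [3, 4],
])
b = np.array([5, 6])  # hypothetical right-hand side

# Solve via the explicit inverse
x_inv = inv(A).dot(b)

# Solve the system directly
x_solve = solve(A, b)

print(x_inv)
print(x_solve)

# Both solutions reproduce b when multiplied by A
print(A.dot(x_solve))
```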


In [16]:
from numpy.linalg import matrix_rank

A = array([
    [1,2],
    [3,4],
    [0,0]
])

print(matrix_rank(A))

A = array([
    [1,2],
    [3,6],
])
print(matrix_rank(A))

2
1


### Sparse matrix
A sparse matrix is a matrix in which most of the values are zero.

#### Data Preparation
Several common data preparation encodings produce sparse matrices:
- One hot encoding
- Count encoding
- TF-IDF encoding
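
As an illustration of why these encodings lead to sparse data, here is a minimal one-hot encoding sketch written with plain NumPy. The `colors` column is a made-up example; each row of the result contains a single 1 and is zero everywhere else.

```python
import numpy as np

# Hypothetical categorical column
colors = ["red", "green", "blue", "green"]

# One output column per distinct category: ['blue', 'green', 'red']
categories = sorted(set(colors))

# One row per value, with a 1 in the column of its category
one_hot = np.zeros((len(colors), len(categories)), dtype=int)
for row, value in enumerate(colors):
    one_hot[row, categories.index(value)] = 1
print(one_hot)

# Fraction of cells that are zero
sparsity = 1 - np.count_nonzero(one_hot) / one_hot.size
print(sparsity)
```

With more categories the fraction of zeros grows, since each row still holds only one nonzero value.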

#### Sparse Matrix Representations
Several data structures can be used to store a sparse matrix efficiently:
- Dictionary of Keys
- List of Lists
- Coordinate List
- Compressed Sparse Row
- Compressed Sparse Column

In [20]:
from numpy import array
from scipy.sparse import csr_matrix
# create dense matrix
A = array([
[1, 0, 0, 1, 0, 0],
[0, 0, 2, 0, 0, 1],
[0, 0, 0, 2, 0, 0]])
print(A)
# convert to sparse matrix (CSR method)
S = csr_matrix(A)
print(S)
# reconstruct dense matrix
B = S.todense()
print(B)
sparsity = 1 - np.count_nonzero(B) / B.size
print(sparsity)

[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]
  (0, 0)	1
  (0, 3)	1
  (1, 2)	2
  (1, 5)	1
  (2, 3)	2
[[1 0 0 1 0 0]
 [0 0 2 0 0 1]
 [0 0 0 2 0 0]]
0.7222222222222222
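
The other storage formats listed earlier are also available in `scipy.sparse`. The sketch below builds each one from the same dense matrix and then prints the three arrays that make up the CSR representation. The `formats` dictionary is just a helper introduced for this illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix, csc_matrix, csr_matrix, dok_matrix, lil_matrix

A = np.array([
    [1, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 0, 1],
    [0, 0, 0, 2, 0, 0],
])

# Same data, five different storage formats
formats = {
    "Dictionary of Keys": dok_matrix(A),
    "List of Lists": lil_matrix(A),
    "Coordinate List": coo_matrix(A),
    "Compressed Sparse Row": csr_matrix(A),
    "Compressed Sparse Column": csc_matrix(A),
}
for name, S in formats.items():
    # Each format stores the same number of nonzero values
    print(name, S.format, S.nnz)

# CSR stores three flat arrays instead of the full matrix
S = csr_matrix(A)
print(S.data)     # nonzero values, row by row
print(S.indices)  # column index of each value
print(S.indptr)   # offsets into data where each row starts
```

As a rough guide, the dictionary and list formats are convenient for building a matrix incrementally, while the compressed formats are better suited to arithmetic and slicing.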
