# Cost and Loss Function of Logistic Regression

## Vectorization
We use vectorization to reduce running time. An explicit for loop makes the cost function slow to compute, so we replace the for loop with vectorized array operations. Vectorization does not require a GPU to run.

In [1]:
import numpy as np
a = np.array([1, 2, 3, 4])
print(a)

[1 2 3 4]


In [2]:
#Vectorized version
import time
a = np.random.rand(1000000)
b = np.random.rand(1000000)

tic =time.time()
c = np.dot(a,b)
toc = time.time()

print(c)
print("Vectorized version: " + str(1000*(toc-tic))+" ms")

250301.22180833318
Vectorized version: 1.953125 ms


In [3]:
#Non-vectorized version
c = 0 
tic = time.time()
for i in range(1000000):
    c+=a[i]*b[i]
toc = time.time()

print(c)
print("Non-Vectorized version: " + str(1000*(toc-tic))+" ms")

250301.22180833225
Non-Vectorized version: 567.2075748443604 ms


Both CPUs and GPUs have SIMD (single instruction, multiple data) instructions, which apply one operation to several values at once.

Avoiding explicit loops makes the program faster:
- Whenever possible try to reduce explicit for loops
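
The advice above can be applied to the logistic regression cost itself. The sketch below is not from the original notebook: the function names `cost_loop` and `cost_vectorized`, the `(n_features, m)` layout of `X`, and the random test data are assumptions chosen for illustration. It computes the same cross-entropy cost once with explicit loops and once with array operations.

```python
import numpy as np

def sigmoid(z):
    # Works on scalars and on NumPy arrays (element-wise).
    return 1.0 / (1.0 + np.exp(-z))

def cost_loop(w, b, X, Y):
    # Assumed layout: X has shape (n_features, m), Y has shape (m,).
    n, m = X.shape
    total = 0.0
    for i in range(m):                 # loop over examples
        z = b
        for j in range(n):             # loop over features
            z += w[j] * X[j, i]
        a = sigmoid(z)
        total += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
    return total / m

def cost_vectorized(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w, X) + b)      # shape (m,), no explicit loop
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Hypothetical random data, only to check that both versions agree.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))
w = rng.standard_normal(3)
b = 0.1
Y = (rng.random(50) > 0.5).astype(float)

loop_cost = cost_loop(w, b, X, Y)
vec_cost = cost_vectorized(w, b, X, Y)
print(np.isclose(loop_cost, vec_cost))
```

Both functions return the same value up to floating-point rounding; the vectorized one replaces two nested loops with a single `np.dot` call.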

## Vectors and matrix valued functions
Say you need to apply the exponential operation to every element of a matrix or vector. It is faster to do this with a single vectorized call than with a loop over the elements.

NumPy (imported with import numpy as np) provides element-wise functions such as np.abs, np.log and np.exp.
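
As a minimal illustration of these element-wise functions, the sketch below applies each one to a small vector and checks `np.exp` against a plain Python loop. The vector `v` and the variable names are made up for this example.

```python
import math
import numpy as np

v = np.array([-1.0, 0.0, 1.0, 2.0])

exp_v = np.exp(v)            # e**x for every element
abs_v = np.abs(v)            # absolute value of every element
log_v = np.log(abs_v + 1.0)  # natural log; +1 keeps the argument positive

# The same exponential computed with an explicit loop, for comparison.
exp_loop = np.array([math.exp(x) for x in v])

print(exp_v)
print(abs_v)
print(log_v)
print(np.allclose(exp_v, exp_loop))
```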

In [4]:
# Broadcasting in NumPy makes the code run faster

import numpy as np

A = np.array([[56.0,0.0,4.4,68.0],[1.2,104.0,52.0,8.0],[1.8,135.0,99.0,0.9]])
A

array([[ 56. ,   0. ,   4.4,  68. ],
       [  1.2, 104. ,  52. ,   8. ],
       [  1.8, 135. ,  99. ,   0.9]])

In [5]:
cal = A.sum(axis =0)
cal

array([ 59. , 239. , 155.4,  76.9])

In [6]:
percentage = 100*A/ cal.reshape(1,4)
percentage

array([[94.91525424,  0.        ,  2.83140283, 88.42652796],
       [ 2.03389831, 43.51464435, 33.46203346, 10.40312094],
       [ 3.05084746, 56.48535565, 63.70656371,  1.17035111]])

NumPy broadcasting conceptually repeats the values of the smaller matrix or vector so that its shape matches the other operand. Here the (1, 4) row of column sums is repeated across the 3 rows of A.
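
To make the repetition explicit, the sketch below builds the repeated matrix by hand with `np.tile` and confirms that it gives the same percentages as broadcasting. The names `cal_repeated`, `explicit`, and `broadcast` are introduced only for this comparison.

```python
import numpy as np

A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])

cal = A.sum(axis=0).reshape(1, 4)       # column sums, shape (1, 4)

# Explicit repetition: stack the row of sums 3 times to get shape (3, 4).
cal_repeated = np.tile(cal, (3, 1))
explicit = 100 * A / cal_repeated

# Broadcasting: NumPy stretches the (1, 4) row without the manual copy.
broadcast = 100 * A / cal

print(cal_repeated.shape)
print(np.allclose(explicit, broadcast))
```

Broadcasting avoids materializing `cal_repeated`, which is why it is preferred over manual tiling.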

In [7]:
# Python Numpy Vectors (Tips & Tricks)

a = np.random.randn(5)  # Draws 5 random values from a standard normal distribution
print(a)

[-0.61925025 -0.36120829 -2.1757893   0.15440956 -1.61420643]


In [8]:
print(a.shape)  # Rank-1 array: neither a row vector nor a column vector

(5,)


In [9]:
print(a.T)  # Transposing a rank-1 array returns the same array

[-0.61925025 -0.36120829 -2.1757893   0.15440956 -1.61420643]


In [10]:
print(np.dot(a, a.T))  # Inner product of a rank-1 array with itself: a scalar

7.877506100857941


In [11]:
a =np.random.randn(5,1) # Column vector
print(a)

[[ 0.51346824]
 [ 0.27842153]
 [ 1.23329882]
 [-0.54000091]
 [-0.43419021]]


In [12]:
print(a.T)

[[ 0.51346824  0.27842153  1.23329882 -0.54000091 -0.43419021]]


In [13]:
print(np.dot(a, a.T))  # Outer product: (5, 1) times (1, 5) gives a (5, 5) matrix

[[ 0.26364963  0.14296061  0.63325978 -0.27727332 -0.22294288]
 [ 0.14296061  0.07751855  0.34337695 -0.15034788 -0.1208879 ]
 [ 0.63325978  0.34337695  1.52102598 -0.66598249 -0.53548628]
 [-0.27727332 -0.15034788 -0.66598249  0.29160099  0.23446311]
 [-0.22294288 -0.1208879  -0.53548628  0.23446311  0.18852114]]


In [14]:
assert(a.shape ==(5,1))
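
When a rank-1 array does turn up, it can be converted to an explicit column or row vector with `reshape`. The following is a small sketch of that fix; the names `col`, `row`, `outer`, and `inner` are chosen for illustration.

```python
import numpy as np

a = np.random.randn(5)       # rank-1 array, shape (5,)

col = a.reshape(5, 1)        # explicit column vector
row = a.reshape(1, 5)        # explicit row vector
assert col.shape == (5, 1)
assert row.shape == (1, 5)

outer = np.dot(col, col.T)   # (5, 1) times (1, 5) gives (5, 5)
inner = np.dot(col.T, col)   # (1, 5) times (5, 1) gives (1, 1)

print(outer.shape)
print(inner.shape)
```

With explicit shapes, the transpose behaves as expected and the inner and outer products are no longer ambiguous.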

In [15]:
a = np.random.randn(2, 3) # a.shape = (2, 3)
b = np.random.randn(2, 1) # b.shape = (2, 1)
c = a + b  # b is broadcast across the 3 columns, so c.shape = (2, 3)

In [16]:
c

array([[-0.6156138 , -0.58611695, -1.25381958],
       [-0.90661593,  1.14097381,  0.38904799]])

In [21]:
a = np.random.randn(4, 3) # a.shape = (4, 3)
b = np.random.randn(3, 2) # b.shape = (3, 2)
c = a*b  # Element-wise product: shapes (4, 3) and (3, 2) cannot be broadcast
c

ValueError: operands could not be broadcast together with shapes (4,3) (3,2) 
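
The error arises because `*` is an element-wise product, and broadcasting requires each pair of trailing dimensions to be equal or for one of them to be 1. Here the last dimensions are 3 and 2, so the shapes are incompatible. The sketch below catches the error and then uses `np.dot`, which performs a matrix product and only needs the inner dimensions to match. The variable names `failed` and `c` are used here for illustration.

```python
import numpy as np

a = np.random.randn(4, 3)
b = np.random.randn(3, 2)

failed = False
try:
    c = a * b                # element-wise: (4, 3) vs (3, 2) do not broadcast
except ValueError as err:
    failed = True
    print("Element-wise product failed:", err)

c = np.dot(a, b)             # matrix product: (4, 3) times (3, 2)
print(c.shape)               # (4, 2)
```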

In [19]:
a = np.random.randn(12288, 150) # a.shape = (12288, 150)
b = np.random.randn(150, 45) # b.shape = (150, 45)
c = np.dot(a,b)
c.shape

(12288, 45)

In [23]:
a = np.random.randn(3, 3)
b = np.random.randn(3, 1)
c = a*b  # Element-wise product: b is broadcast across the columns, so c.shape = (3, 3)
c

array([[-1.26041266,  0.2512311 , -0.27921017],
       [-0.10569755,  0.11727577,  0.01393659],
       [-0.42151011, -0.09623704,  0.09833676]])
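
Because `b` has shape (3, 1), the element-wise product scales row `i` of `a` by `b[i, 0]`; this is different from the matrix product `np.dot(a, b)`. A short check, with the names `elementwise` and `matrix` introduced only for this comparison:

```python
import numpy as np

a = np.random.randn(3, 3)
b = np.random.randn(3, 1)

elementwise = a * b          # shape (3, 3): row i of a multiplied by b[i, 0]
matrix = np.dot(a, b)        # shape (3, 1): ordinary matrix-vector product

print(elementwise.shape, matrix.shape)
print(np.allclose(elementwise[1], a[1] * b[1, 0]))
```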