# Logistic Regression as a Neural Network

## Binary Classification

## Logistic Regression

### Basics for logistic regression

### Logistic regression cost function



### Logistic regression gradient descent

<img src="images/gd with 1 example.png" width="600">

<img src="images/gd with m examples.png" width="600">

# Vectorization

## What is vectorization?  

In deep learning, vectorization means replacing an operation that acts on scalar values one at a time with an equivalent operation that acts on whole vectors or matrices of those values. It is a key technique for making deep learning code fast, because array operations can process large amounts of data efficiently by taking advantage of parallel computing architectures.

Vectorization lets us remove explicit for loops from our code, which saves a lot of running time on large datasets.

## Vectorization Example

In logistic regression, we compute $z = w^{T}x+b$, where $w$ and $x$ are both column vectors. Below is a comparison between the "for loop" method and the "vectorization" method for computing a dot product such as $w^{T}x$.

In [2]:
import numpy as np
import time
a = np.array([1,2,3,4])
print(a)

[1 2 3 4]


In [3]:
a = np.random.rand(1000000) 
b = np.random.rand(1000000)

tic=time.time()
c=np.dot(a,b)
toc=time.time()
print(c)
x=str(1000*(toc-tic))
print("vectorization version:" + x +"ms")

c=0
tic=time.time()
for i in range(1000000):
  c+=a[i]*b[i]
toc=time.time()
xx=str(1000*(toc-tic))

print(c)
print("for loop version:" + xx +"ms")
print("the for loop version spent " + str(round(float(xx)/float(x),1)) + " times the time of the vectorization version.")

250322.06153138122
vectorization version:458.3883285522461ms
250322.06153138046
for loop version:568.7224864959717ms
the for loop version spent 1.2 times the time of the vectorization version.
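
A single `time.time()` measurement is noisy, and the ratio between the two versions can vary a lot from machine to machine. The following is a minimal sketch, assuming the standard-library `timeit` module is acceptable here, that repeats each measurement and keeps the best run. The helper name `loop_dot` and the smaller array size are illustrative choices, not part of the original notes.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the run is repeatable
n = 100_000                      # smaller than the 1,000,000 used above, to keep the loop quick
a = rng.random(n)
b = rng.random(n)


def loop_dot(u, v):
    """Dot product computed with an explicit Python for loop."""
    total = 0.0
    for i in range(len(u)):
        total += u[i] * v[i]
    return total


# Best of several repeats, converted from seconds to milliseconds.
vec_ms = 1000 * min(timeit.repeat(lambda: np.dot(a, b), number=1, repeat=5))
loop_ms = 1000 * min(timeit.repeat(lambda: loop_dot(a, b), number=1, repeat=3))

print(f"vectorized version: {vec_ms:.3f} ms")
print(f"for loop version:   {loop_ms:.3f} ms")

# Both approaches compute the same number, up to floating-point rounding.
print(np.isclose(np.dot(a, b), loop_dot(a, b)))
```

Taking the minimum over repeats reduces the influence of one-off delays such as other processes competing for the CPU.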


## Vectorizing Logistic Regression

We use $n_x=4$ and $m=3$ in our example.  
Recall that $n_x$ is the number of input features (the input size) and $m$ is the number of examples in the dataset. Each column of $X$ is one example $x^{(i)}$, and $x_j^{(i)}$ denotes feature $j$ of example $i$:

$X=\left[\begin{array}{lll}x_1^{(1)} & x_1^{(2)} & x_1^{(3)} \\ x_2^{(1)} & x_2^{(2)} & x_2^{(3)} \\ x_3^{(1)} & x_3^{(2)} & x_3^{(3)}\\ x_4^{(1)} & x_4^{(2)} & x_4^{(3)}\end{array}\right] \quad w^{T}=\left[w_1, w_2, w_3, w_4\right]$  

X.shape=(4,3); wT.shape=(1,4)  
$\begin{aligned} Z=\left[z^{(1)}, z^{(2)}, z^{(3)}\right] & =w^T X+b \\ & = \left[\begin{array}{l}w_1, w_2, w_3, w_4\end{array}\right]\left[\begin{array}{lll}x_1^{(1)} & x_1^{(2)} & x_1^{(3)} \\ x_2^{(1)} & x_2^{(2)} & x_2^{(3)} \\ x_3^{(1)} & x_3^{(2)} & x_3^{(3)}\\ x_4^{(1)} & x_4^{(2)} & x_4^{(3)}\end{array}\right]+[b,b,b]\\
& =[w^T x^{(1)}+b,w^T x^{(2)}+b,w^T x^{(3)}+b]
\end{aligned}$  
In Python, we compute this with the command Z=np.dot(wT,X)+b.  

Here, $b$ is a real number. When we add it to the $1 \times m$ row vector $w^T X$, NumPy automatically expands $b$ into the $1 \times m$ matrix $[b,b,b]$, which is called **"broadcasting"**.  
Next, we apply the sigmoid function element-wise: $A=[a^{(1)},a^{(2)},a^{(3)}]=[\sigma(z^{(1)}),\sigma(z^{(2)}),\sigma(z^{(3)})]$
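
The forward pass described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the `sigmoid` helper, the random inputs, and the bias value `0.5` are made up for the example, and the loop is only there to check that the vectorized result matches the per-example computation.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


n_x, m = 4, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))   # one example per column, shape (4, 3)
w = rng.standard_normal((n_x, 1))   # column vector of weights, shape (4, 1)
b = 0.5                             # scalar bias (arbitrary illustrative value)

# Vectorized forward pass: the scalar b is broadcast across all m columns.
Z = np.dot(w.T, X) + b              # shape (1, m)
A = sigmoid(Z)                      # shape (1, m)

# Reference: the same computation one example at a time.
Z_loop = np.zeros((1, m))
for i in range(m):
    Z_loop[0, i] = np.dot(w[:, 0], X[:, i]) + b

print(Z.shape, A.shape)
print(np.allclose(Z, Z_loop))
```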

## Vectorizing Logistic Regression's Gradient Output

In the previous notes, we derived that $dz=a-y$. Applying this to each example gives:  
$dz^{(1)}=a^{(1)}-y^{(1)},dz^{(2)}=a^{(2)}-y^{(2)},dz^{(3)}=a^{(3)}-y^{(3)}$  
$dZ$, $A$, and $Y$ are each $1 \times m$ matrices, so:  
$dZ=A-Y=[a^{(1)}-y^{(1)},a^{(2)}-y^{(2)},a^{(3)}-y^{(3)}]$  
Without vectorization:  
- the calculation logic for $dw$ is:  
$\begin{array}{l}d w=0 \\ dw +=x^{(1)} d z^{(1)} \\ d w+=x^{(2)} d z^{(2)} \\ d w+=x^{(3)} d z^{(3)} \\ \vdots \\ d w=d w / m\end{array}$  
- the calculation logic for $db$ is:  
$\begin{array}{l}d b=0 \\ d b+=d z^{(1)} \\ d b+=d z^{(2)} \\ d b+=d z^{(3)} \\ \vdots \\ d b=d b / m\end{array}$

The non-vectorized computation above, written out for two features ($n_x=2$), can be summarized as:  

$\begin{array}{l} J=0;\ d w_1=0;\ d w_2=0;\ d b=0 \\ \text{For } i=1 \text{ to } m: \\ \qquad z^{(i)}=w^{T} x^{(i)}+b \\ \qquad a^{(i)}=\sigma \left(z^{(i)}\right) \\ \qquad J+=-\left[y^{(i)} \log a^{(i)}+\left(1-y^{(i)}\right) \log \left(1-a^{(i)}\right)\right] \\ \qquad d z^{(i)}=a^{(i)}-y^{(i)} \\ \qquad d w_1+= x_1^{(i)} d z^{(i)} \\ \qquad d w_2+=x_2^{(i)} d z^{(i)} \\ \qquad d b+=d z^{(i)} \\ J /=m \\ d w_1 /=m ;\ d w_2 /=m ;\ d b /=m \end{array}$
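
The loops above can be replaced by matrix operations. Summing $x^{(i)} dz^{(i)}$ over all examples is the matrix product $X\,dZ^T$, and summing $dz^{(i)}$ is a sum over the entries of $dZ$, so $dw=\frac{1}{m} X\,dZ^T$ and $db=\frac{1}{m}\sum_i dz^{(i)}$. The sketch below checks this against the loop version; the random data, the `sigmoid` helper, and the `*_loop` variable names are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


n_x, m = 4, 3
rng = np.random.default_rng(1)
X = rng.standard_normal((n_x, m))       # one example per column
Y = rng.integers(0, 2, size=(1, m))     # binary labels, shape (1, m)
w = rng.standard_normal((n_x, 1))
b = 0.0

# Vectorized forward and backward pass.
A = sigmoid(np.dot(w.T, X) + b)         # shape (1, m)
dZ = A - Y                              # shape (1, m)
dw = np.dot(X, dZ.T) / m                # shape (n_x, 1)
db = np.sum(dZ) / m                     # scalar

# Loop version, following the accumulation steps listed above.
dw_loop = np.zeros((n_x, 1))
db_loop = 0.0
for i in range(m):
    x_i = X[:, i].reshape(n_x, 1)       # i-th example as a column vector
    a_i = sigmoid(np.dot(w.T, x_i) + b) # shape (1, 1)
    dz_i = a_i.item() - Y[0, i]
    dw_loop += x_i * dz_i
    db_loop += dz_i
dw_loop /= m
db_loop /= m

print(np.allclose(dw, dw_loop), np.isclose(db, db_loop))
```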



## Broadcasting  

### What is broadcasting in Python?  

The examples and explanations below are enough to understand and use broadcasting in this course. [More details are given on the NumPy website](https://numpy.org/doc/stable/user/basics.broadcasting.html).

### Example

<img src="images/broadcasting example.png" width="400">

The goal is to sum each of the four columns of this matrix to get the total number of calories in 100 grams of apples, beef, eggs, and potatoes, and then to divide every entry by its column total to get percentages.

In [7]:
import numpy as np

A=np.array([[56.0,0.0,4.4,68.0],
            [1.2,104.0,52.0,8.0],
            [1.8,135.0,99.0,0.9]])
print(A)

[[ 56.    0.    4.4  68. ]
 [  1.2 104.   52.    8. ]
 [  1.8 135.   99.    0.9]]


In [8]:
cal = A.sum(axis=0) #axis=0 means sum vertically
print(cal)

[ 59.  239.  155.4  76.9]


In [9]:
percentage = 100*A/cal.reshape(1,4) # cal has shape (4,), so reshape(1,4) is optional here
print(percentage)

[[94.91525424  0.          2.83140283 88.42652796]
 [ 2.03389831 43.51464435 33.46203346 10.40312094]
 [ 3.05084746 56.48535565 63.70656371  1.17035111]]
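
A quick sanity check on this result is that every column of the percentage matrix should sum to 100. The sketch below also uses `keepdims=True`, an existing parameter of `ndarray.sum`, as one possible alternative to calling `reshape(1,4)` afterwards; it is shown as an option, not as the method the course prescribes.

```python
import numpy as np

A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])

# keepdims=True keeps the summed axis, so cal has shape (1, 4) instead of (4,).
cal = A.sum(axis=0, keepdims=True)
percentage = 100 * A / cal          # (3, 4) / (1, 4): cal is broadcast down the rows

print(cal.shape)
print(percentage.sum(axis=0))       # each column should total 100, up to rounding
```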


Two cases in which NumPy broadcasts:  
- a matrix plus a real number:  
    
  $\left[\begin{array}{l}1 \\ 2 \\ 3 \\ 4\end{array}\right]+100=\left[\begin{array}{l}1 \\ 2 \\ 3 \\ 4\end{array}\right]+\left[\begin{array}{l}100 \\ 100 \\ 100 \\ 100\end{array}\right]=\left[\begin{array}{l}101 \\ 102 \\ 103 \\ 104\end{array}\right]$
- a matrix plus a row vector or a column vector:   
    
  $\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6\end{array}\right]+\left[\begin{array}{lll}100 & 200 & 300\end{array}\right]=\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6\end{array}\right]+\left[\begin{array}{lll}100 & 200 & 300 \\ 100 & 200 & 300\end{array}\right]=\left[\begin{array}{lll}101 & 202 & 303 \\ 104 & 205 & 306\end{array}\right]$  
    
    
  $\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6\end{array}\right]+\left[\begin{array}{l}100 \\ 200\end{array}\right]=\left[\begin{array}{lll}1 & 2 & 3 \\ 4 & 5 & 6\end{array}\right]+\left[\begin{array}{ccc}100 & 100 & 100 \\ 200 & 200 & 200\end{array}\right]=\left[\begin{array}{ccc}101 & 102 & 103 \\ 204 & 205 & 206\end{array}\right]$
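
The three sums above can be reproduced directly in NumPy. This is a small sketch; the variable names `col`, `M`, `row`, and `column` are chosen here for illustration.

```python
import numpy as np

col = np.array([[1], [2], [3], [4]])    # shape (4, 1)
M = np.array([[1, 2, 3],
              [4, 5, 6]])               # shape (2, 3)
row = np.array([[100, 200, 300]])       # shape (1, 3)
column = np.array([[100], [200]])       # shape (2, 1)

print(col + 100)     # the scalar is stretched to shape (4, 1)
print(M + row)       # the row is copied down the 2 rows
print(M + column)    # the column is copied across the 3 columns
```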

In [12]:
a = np.random.randn(5)
print(a)
print(a.shape)
print(a.T)
print(np.dot(a,a.T)) 

[ 0.82236043 -0.23316852  1.70631536  0.05589387 -0.07809429]
(5,)
[ 0.82236043 -0.23316852  1.70631536  0.05589387 -0.07809429]
3.6513791981434345


If we generate $a$ this way, it is a rank-1 array with shape (5,): $a$ and $a^T$ are identical, and their product is a single number. So it is better to specify the full shape of $a$ explicitly when generating it.

In [14]:
a = np.random.randn(5,1)
print(a)
print(a.shape)

[[ 0.43003489]
 [-0.33991356]
 [-0.38556682]
 [-0.3753623 ]
 [ 0.30747038]]
(5, 1)


In [16]:
print(a.T)

[[ 0.43003489 -0.33991356 -0.38556682 -0.3753623   0.30747038]]


In [17]:
print(np.dot(a,a.T)) 

[[ 0.18493    -0.14617469 -0.16580718 -0.16141889  0.13222299]
 [-0.14617469  0.11554123  0.13105939  0.12759074 -0.10451335]
 [-0.16580718  0.13105939  0.14866177  0.14472725 -0.11855038]
 [-0.16141889  0.12759074  0.14472725  0.14089686 -0.11541279]
 [ 0.13222299 -0.10451335 -0.11855038 -0.11541279  0.09453804]]
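
If a rank-1 array does turn up, one way to handle it is to reshape it explicitly and guard the shape with an assertion. The following is a minimal sketch of that habit, not a required step; the choice to reshape into a column vector is an assumption for this example.

```python
import numpy as np

a = np.random.randn(5)          # rank-1 array, shape (5,)
a = a.reshape(5, 1)             # make it an explicit column vector
assert a.shape == (5, 1)        # cheap guard against shape surprises

print(np.dot(a.T, a).shape)     # inner product, shape (1, 1)
print(np.dot(a, a.T).shape)     # outer product, shape (5, 5)
```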


In [20]:
x = np.random.randn(4,3)
y = np.random.randn(1,3)
c=x*y
print(c.shape)

(4, 3)


In [21]:
x=np.array([[[1],[2]],[[3],[4]]])
print(x.shape)

(2, 2, 1)
