In [1]:
# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import h5py
from scipy.special import expit

In [2]:
def load_dataset():
    """Load the cat / non-cat train and test sets from the HDF5 files in ./datasets."""
    # `with` closes each HDF5 file once its arrays have been copied into memory
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # train set features (images)
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # train set labels

    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # test set features (images)
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # test set labels
        classes = np.array(test_dataset["list_classes"][:])        # the list of classes

    # Store the labels as row vectors of shape (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

<h2>Loading the data (cat/non-cat)</h2> 
<p>We added <b>"_orig"</b> at the end of the image datasets (train and test) because we are going to preprocess them. <br>After preprocessing, we will end up with train_set_x and test_set_x (the labels, train_set_y_orig and test_set_y_orig, don't need any preprocessing).<br> Each entry along the first axis of train_set_x_orig and test_set_x_orig is an array of shape (num_px, num_px, 3) representing one image.</p>

In [None]:
train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes = load_dataset()
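The labels are 0/1 values and `classes` is described only as "the list of classes". The sketch below assumes that each label indexes into `classes` and that the class names are stored as byte strings (common for HDF5 string data, but not confirmed by this notebook). `label_name` is a hypothetical helper, and the `demo_` arrays are made-up stand-ins so the real data is not overwritten.

```python
import numpy as np

# Made-up stand-ins for the arrays returned by load_dataset().
demo_classes = np.array([b"non-cat", b"cat"])  # assumption: names stored as bytes
demo_y = np.array([[0, 1, 1, 0]])              # labels as a (1, m) row vector

def label_name(labels, index, classes):
    """Hypothetical helper: class name of example `index`, assuming labels index into `classes`."""
    label = int(labels[0, index])              # read the 0/1 label of that example
    return classes[label].decode("utf-8")      # bytes -> str

print(label_name(demo_y, 1, demo_classes))  # cat
```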

<h2>How is an image stored on a computer?</h2>
<img style="display:block;" src="./img/1.png" alt="Photo 1"/>
<p>An image is stored in the computer as three separate matrices corresponding to the Red, Green, and Blue color channels of the image. The three matrices have the same size as the image: for example, the resolution of the cat image is 64 pixels × 64 pixels, so the three (RGB) matrices are 64 × 64 each.
The value in each cell is a pixel intensity, and these values are used to build an n-dimensional feature vector. In pattern recognition and machine learning, a feature vector represents an object, in this case a cat or a non-cat.
To create a feature vector $x$, the pixel intensity values of each color channel are "unrolled" (reshaped) into a single column. The dimension of the input feature vector $x$ is <b>$n_x$</b> $= 64 \times 64 \times 3 = 12288$.</p>
<img style="display:block;" src="./img/2.png" alt="Photo 2"/>
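The unrolling described above can be checked on a synthetic image (random intensities, not a real picture). Note that NumPy's default row-major order keeps the three channel values of each pixel next to each other, whereas a channel-by-channel unroll (all red, then all green, then all blue) needs the channel axis moved first. Either ordering works for logistic regression as long as every train and test image is flattened the same way.

```python
import numpy as np

# Synthetic 64 x 64 RGB image with random 0-255 intensities (stands in for one picture).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Unroll all pixel intensities into one column vector x of shape (n_x, 1).
x = image.reshape(-1, 1)
n_x = x.shape[0]
print(n_x)  # 12288

# NumPy's default (C) order keeps the R, G, B values of each pixel adjacent.
assert x[0, 0] == image[0, 0, 0] and x[1, 0] == image[0, 0, 1]

# To unroll channel by channel (all red, then all green, then all blue),
# move the channel axis to the front before reshaping.
x_by_channel = image.transpose(2, 0, 1).reshape(-1, 1)
assert x_by_channel[0, 0] == image[0, 0, 0]        # first red value
assert x_by_channel[64 * 64, 0] == image[0, 0, 1]  # first green value
```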

In [10]:
m_train = train_set_x_orig.shape[0]
m_test  = test_set_x_orig.shape[0]
num_px  = train_set_x_orig.shape[1]

- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)
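Reading `num_px` from `shape[1]` silently assumes square RGB images of the same size in both sets. A small defensive check makes that assumption explicit; `dataset_dims` is a hypothetical helper and the zero arrays below use made-up sizes.

```python
import numpy as np

def dataset_dims(train_x_orig, test_x_orig):
    """Hypothetical helper: (m_train, m_test, num_px) for arrays of shape (m, num_px, num_px, 3)."""
    m_train, height, width, channels = train_x_orig.shape
    m_test = test_x_orig.shape[0]
    # num_px is only meaningful if images are square RGB and both sets match.
    assert height == width and channels == 3, "expected square RGB images"
    assert test_x_orig.shape[1:] == train_x_orig.shape[1:], "train/test image shapes differ"
    return m_train, m_test, height

# Synthetic arrays (made-up sizes) just to exercise the function.
demo_train = np.zeros((5, 64, 64, 3), dtype=np.uint8)
demo_test = np.zeros((2, 64, 64, 3), dtype=np.uint8)
print(dataset_dims(demo_train, demo_test))  # (5, 2, 64)
```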

<h2>Reshape the training and test data sets</h2>
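<p>The goal is to flatten images of shape (num_px, num_px, 3) into single vectors of shape (num_px $\times$ num_px $\times$ 3, 1). After flattening, each column of train_set_x_flatten and test_set_x_flatten holds one example.</p>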
<img src="./img/3.png" style="display: block;width:55%;" alt="Photo 3"/>

In [14]:
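# Flatten each image into one row, then transpose so that each column is one example:
# shapes become (num_px * num_px * 3, m_train) and (num_px * num_px * 3, m_test)
train_set_x_flatten = train_set_x_orig.reshape((m_train, -1)).T
test_set_x_flatten  = test_set_x_orig.reshape((m_test, -1)).T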
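One way to convince yourself that `reshape((m, -1)).T` puts exactly one image in each column is to run the same idiom on a small array whose values are all distinct (the sizes here are made up). The last lines show a common pitfall: reshaping straight to `(-1, m)` gives the right shape but mixes pixels from different images into each column.

```python
import numpy as np

m, num_px = 4, 8  # small made-up sizes
# Distinct values 0, 1, 2, ... make any mixing of images easy to detect.
X_orig = np.arange(m * num_px * num_px * 3).reshape(m, num_px, num_px, 3)

# Same idiom as the notebook: one row per image, then transpose.
X_flat = X_orig.reshape((m, -1)).T
print(X_flat.shape)  # (192, 4)

# Column j is exactly image j unrolled.
for j in range(m):
    assert np.array_equal(X_flat[:, j], X_orig[j].ravel())

# Pitfall: correct shape, but column 0 is NOT image 0.
X_wrong = X_orig.reshape((-1, m))
print(np.array_equal(X_wrong[:, 0], X_orig[0].ravel()))  # False
```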

<h2>Standardize dataset</h2>
<p>To represent color images, the red, green and blue channels (RGB) must be specified for each pixel, so each pixel value is actually a vector of three numbers ranging from 0 to 255.

One common preprocessing step in machine learning is to center and standardize your dataset: subtract the mean of the whole numpy array from each example, then divide each example by the standard deviation of the whole numpy array. For picture datasets, however, it is simpler, more convenient, and works almost as well to divide every entry of the dataset by 255 (the maximum value of a pixel channel), which scales all features into the range [0, 1].</p>

In [16]:
train_set_x = train_set_x_flatten / 255
test_set_x  = test_set_x_flatten / 255
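To make the two preprocessing options concrete, here is a sketch on a tiny made-up `uint8` array (not the cat data). Dividing by 255 maps every entry into [0, 1]; full standardization yields mean 0 and standard deviation 1.

```python
import numpy as np

# Tiny synthetic "dataset": 3 features (rows) x 4 examples (columns), uint8 pixels.
X = np.array([[0, 128, 255, 64],
              [10, 20, 30, 40],
              [255, 255, 0, 0]], dtype=np.uint8)

# Option used in the notebook: scale every entry into [0, 1].
X_scaled = X / 255                     # true division promotes uint8 to float64
print(X_scaled.min(), X_scaled.max())  # 0.0 1.0

# General alternative: center and standardize with the mean/std of the whole array.
X_float = X.astype(np.float64)
X_std = (X_float - X_float.mean()) / X_float.std()
print(np.isclose(X_std.mean(), 0.0), np.isclose(X_std.std(), 1.0))  # True True
```

When standardizing real data, the mean and standard deviation are normally computed on the training set and reused unchanged for the test set.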

<h1>General Architecture of the learning algorithm</h1>
<p>The following figure explains why <b>Logistic Regression</b> is actually a very simple <b>Neural Network</b>!<br>Logistic regression is a learning algorithm used in supervised learning problems where the output labels $y$ are all either zero or one. Its goal is to minimize the error, measured by the cross-entropy loss $\mathcal{L}$ defined below, between its predictions and the training labels.</p>
<p><b>Example:</b> <br>Cat vs. non-cat: given an image represented by a feature vector $x$, the algorithm estimates the probability that a cat is in that image.</p>
<img src="./img/0.png" style="display: block;width:65%;" alt="Photo 3"/>
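The single-neuron view in the figure can be sketched in a few lines of NumPy. This is an illustrative sketch, not the notebook's own implementation: `sigmoid` and `predict_one` are hypothetical helper names, and the 0.5 decision threshold is an assumption. (`scipy.special.expit`, imported at the top, computes the same sigmoid and avoids overflow warnings for very negative inputs.)

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_one(w, b, x, threshold=0.5):
    """Return (P(cat | x), 0/1 decision) for one example x of shape (n_x, 1)."""
    z = (w.T @ x).item() + b           # linear part: w^T x + b
    y_hat = sigmoid(z)                 # activation squashes z into (0, 1)
    return y_hat, int(y_hat > threshold)

# Toy example with n_x = 3 (made-up weights and input).
w = np.array([[0.5], [-1.0], [2.0]])
x = np.array([[1.0], [0.0], [0.0]])
print(predict_one(w, 0.1, x))          # probability about 0.646, decision 1
```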

<h3 style="text-align:center">Mathematical expression of the algorithm:</h3>
<p style="text-align:center">${z}^{(i)} = w^{T}x^{(i)} + b$</p>
<p style="text-align:center">$\hat{y}^{(i)} = \sigma({z}^{(i)}) = \frac{1}{1 + e^{-{z}^{(i)}}} = P(y^{(i)} = 1 \mid x^{(i)}); \quad 0 \leq \hat{y}^{(i)} \leq 1$</p>
<p style="text-align:center">$\mathcal{L}(\hat{y}^{(i)}, y^{(i)}) = - (y^{(i)}\log{\hat{y}^{(i)}} + (1 - y^{(i)})\log({1 - \hat{y}^{(i)}}))$</p>
<p style="text-align:center">$\mathcal{J}(w, b) = \frac {1}{m}\left( \sum_{i=1}^m \mathcal{L}(\hat{y}^{(i)}, y^{(i)}) \right) $</p>
<p style="text-align:center">$\frac{\partial \mathcal{J}(w, b)}{\partial w} = \frac{1}{m}\sum_{i=1}^m \frac{\partial \mathcal{L}(\hat{y}^{(i)}, y^{(i)})}{\partial \hat{y}^{(i)}} \times \frac{\partial \hat{y}^{(i)}}{\partial z^{(i)}} \times \frac{\partial z^{(i)}}{\partial w} = \frac{1}{m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})\, x^{(i)}$</p>
<p style="text-align:center">$\frac{\partial \mathcal{J}(w, b)}{\partial b} = \frac{1}{m}\sum_{i=1}^m \frac{\partial \mathcal{L}(\hat{y}^{(i)}, y^{(i)})}{\partial \hat{y}^{(i)}} \times \frac{\partial \hat{y}^{(i)}}{\partial z^{(i)}} \times \frac{\partial z^{(i)}}{\partial b} = \frac{1}{m}\sum_{i=1}^m (\hat{y}^{(i)} - y^{(i)})$</p>
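The cost and gradient formulas can be vectorized and checked numerically. The sketch below assumes examples are stacked as columns of `X` (as in train_set_x) and labels form a `(1, m)` row `Y`, so the sum over $i$ of $(\hat{y}^{(i)} - y^{(i)})\,x^{(i)}$ becomes `X @ (A - Y).T`. `propagate` is a hypothetical helper name, and the finite-difference check runs on random data rather than the cat images.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Cost J and gradients dJ/dw, dJ/db for logistic regression.

    w: (n_x, 1) weights, b: scalar bias,
    X: (n_x, m) one example per column, Y: (1, m) labels in {0, 1}.
    """
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                   # y_hat for all examples, shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # J(w, b)
    dw = (X @ (A - Y).T) / m                                   # (1/m) sum_i (y_hat_i - y_i) x_i
    db = np.sum(A - Y) / m                                     # (1/m) sum_i (y_hat_i - y_i)
    return cost, dw, db

# Central-difference check of the analytic gradients on random data.
rng = np.random.default_rng(0)
n_x, m = 5, 8
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m))
w = rng.normal(scale=0.1, size=(n_x, 1))
b = 0.2

cost, dw, db = propagate(w, b, X, Y)
eps = 1e-6
num_dw = np.zeros_like(w)
for k in range(n_x):
    e = np.zeros_like(w)
    e[k, 0] = eps
    num_dw[k, 0] = (propagate(w + e, b, X, Y)[0] - propagate(w - e, b, X, Y)[0]) / (2 * eps)
num_db = (propagate(w, b + eps, X, Y)[0] - propagate(w, b - eps, X, Y)[0]) / (2 * eps)

print(np.allclose(dw, num_dw, atol=1e-6), np.isclose(db, num_db, atol=1e-6))  # True True
```

If $\hat{y}$ saturates to exactly 0 or 1, `np.log` returns `-inf`; clipping `A` slightly away from 0 and 1 is a common safeguard.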