This is the last notebook in Week 1. It implements a neural network (NN) that recognizes the handwritten digits 0 and 1.
* Binary Classification
* Implemented in TensorFlow first, where training converged
* Replicated in NumPy using the weights from TensorFlow
* No backpropagation
* Recognizing all ten digits (0-9) will be done later

In [2]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt



Inputs

In [3]:
x_input = np.load("handwritten_X.npy")
x = x_input[:1000]
y_input = np.load("handwritten_Y.npy")
y = y_input[:1000]
x.shape, y.shape

((1000, 400), (1000, 1))
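
Each row of `x` holds 400 values, one per pixel of a flattened image. As a minimal sketch of how such a row maps back to a 2-D grid, the snippet below uses a synthetic vector (`row` is a stand-in, not the real data) and assumes row-major pixel order. The actual files may store pixels in a different order, in which case a transpose would be needed before display.

```python
import numpy as np

# Stand-in for one row of x: 400 values (hypothetical data, not a real image)
row = np.arange(400, dtype=float)

# Unflatten into a 20x20 grid (assumes row-major pixel order)
img = row.reshape(20, 20)
print(img.shape)

# Flattening again recovers the original 400-vector
flat_again = img.reshape(-1)
print(np.array_equal(flat_again, row))
```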

# Model

The neural network is shown in the figure below.    
- This has three dense layers with sigmoid activations.
    - Recall that our inputs are pixel values of digit images.
    - Since the images are of size $20\times20$, this gives us $400$ inputs  
    
<img src="./images/C2_W1_Assign1.PNG" width="400" height="200">

- The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $1$ output unit in layer 3. 

    - Recall that the dimensions of these parameters are determined as follows:
        - If a network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then 
            - $W$ will be of dimension $s_{in} \times s_{out}$.
            - $b$ will be a vector with $s_{out}$ elements.
  
    - Therefore, the shapes of `W` and `b` are 
        - layer1: The shape of `W1` is (400, 25) and the shape of `b1` is (25,)
        - layer2: The shape of `W2` is (25, 15) and the shape of `b2` is (15,)
        - layer3: The shape of `W3` is (15, 1) and the shape of `b3` is (1,)
>**Note:** The bias vector `b` could be represented as a 1-D (n,) or 2-D (1,n) array. TensorFlow uses a 1-D representation, and this lab maintains that convention. 
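
The shape rule above also determines how many parameters each layer has: $s_{in} \times s_{out}$ weights plus $s_{out}$ biases. The following sketch works through that arithmetic for the layer sizes given in the text. The variable names (`layer_sizes`, `param_counts`) are illustrative only.

```python
# Layer sizes from the text: 400 inputs -> 25 units -> 15 units -> 1 unit
layer_sizes = [400, 25, 15, 1]

param_counts = []
for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    w_shape = (s_in, s_out)          # one weight per (input, unit) pair
    b_shape = (s_out,)               # one bias per unit
    n_params = s_in * s_out + s_out
    param_counts.append(n_params)
    print(f"W: {w_shape}, b: {b_shape}, parameters: {n_params}")

total_params = sum(param_counts)
print(f"total: {total_params}")      # total: 10431
```

These counts (10025, 390 and 16) are the same per-layer numbers that `model.summary()` reports further down.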
               

# TensorFlow implementation

In [4]:
model = tf.keras.Sequential([
    tf.keras.Input(shape=(400,)),  # each example is a 400-dimensional vector (a flattened 20x20 image)
    tf.keras.layers.Dense(units=25, activation="sigmoid",name="L1"),
    tf.keras.layers.Dense(units=15, activation="sigmoid",name="L2"),
    tf.keras.layers.Dense(units=1, activation="sigmoid",name="L3"),
])

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 L1 (Dense)                  (None, 25)                10025     
                                                                 
 L2 (Dense)                  (None, 15)                390       
                                                                 
 L3 (Dense)                  (None, 1)                 16        
                                                                 
Total params: 10,431
Trainable params: 10,431
Non-trainable params: 0
_________________________________________________________________


In [6]:
model.compile(
    optimizer = tf.keras.optimizers.Adam(0.001),
    loss = tf.keras.losses.BinaryCrossentropy(),
)
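
For reference, binary cross-entropy for a single example with label $y$ and predicted probability $p$ is $-(y \log p + (1-y)\log(1-p))$, averaged over the examples. The sketch below computes it in NumPy. The helper name `binary_cross_entropy` and the `eps` clipping value are assumptions made for illustration; this is not the Keras implementation, which may differ in details.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy (illustrative helper, not the Keras internals)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities to avoid log(0); the eps value is an assumption
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(losses))

# Confident and correct predictions give a small loss ...
low = binary_cross_entropy([0, 1], [0.02, 0.99])
# ... while confident and wrong predictions give a large one
high = binary_cross_entropy([0, 1], [0.99, 0.02])
print(low, high)
```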

In [7]:
model.fit(x,y,
          epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x129d61160>

Predictions

In [8]:
print("zero shape prediction - ",model.predict(x[0].reshape(1,400)))
print("one shape prediction - ",model.predict(x[500].reshape(1,400)))

zero shape prediction -  [[0.02239261]]
one shape prediction -  [[0.9883701]]
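
The model outputs a probability, not a class label. One common way to turn it into a 0/1 decision is to threshold at 0.5. That threshold is an assumption here, not something the notebook fixes. The sketch below applies it to the two probabilities printed above.

```python
import numpy as np

# Probabilities printed above for the "zero" image and the "one" image
probs = np.array([[0.02239261], [0.9883701]])

# Threshold at 0.5 (a common default, assumed here)
labels = (probs >= 0.5).astype(int)
print(labels.ravel())   # [0 1]
```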


# Same model in NumPy (forward propagation)

sigmoid

In [9]:
g = lambda z: 1 / (1 + np.exp(-z))
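
The one-line sigmoid is fine for this notebook. For very negative inputs, though, `np.exp(-z)` can overflow and emit a runtime warning. A possible variant, sketched below under the hypothetical name `sigmoid_stable`, picks an algebraically equivalent form for each sign of `z`. It is an optional alternative, not what the rest of the notebook uses.

```python
import numpy as np

def sigmoid_stable(z):
    """Sigmoid that avoids overflow in np.exp for large |z| (illustrative variant)."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) <= 1, so this form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
s = sigmoid_stable(z)
print(s)
```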

dense layer

In [10]:
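def dense(L_in, w, b):
    """
    Dense layer with sigmoid activation.

    Equivalent ways to compute the pre-activation f:
        option 1: f = np.matmul(L_in, w) + b
        option 2: f = np.dot(L_in, w) + b
        option 3 (for a single point prediction only):

            neurons = w.shape[1]
            L_out = np.zeros(neurons)

            for i in range(neurons):
                f = np.dot(L_in, w[:,i]) + b[i]
                L_out[i] = g(f)
    """
    f = np.matmul(L_in, w) + b  # linear step; b is broadcast over the rows
    L_out = g(f)                # element-wise sigmoid

    # Show how the shapes combine: input x weights = output
    print(f"{L_in.shape} x {w.shape} = {L_out.shape}")
    return L_out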
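
As a quick check that the per-neuron loop and the matrix product compute the same thing, the sketch below runs both on small random inputs. The sizes (4 inputs, 3 units) and the names `a_in`, `out_vec` and `out_loop` are made up for the example.

```python
import numpy as np

def g(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)
a_in = rng.normal(size=4)       # one example with 4 features (made-up size)
w = rng.normal(size=(4, 3))     # 4 inputs -> 3 units
b = rng.normal(size=3)

# Vectorized: a single matrix product covers all units
out_vec = g(np.matmul(a_in, w) + b)

# Loop: one dot product per unit, i.e. per column of w
out_loop = np.zeros(w.shape[1])
for j in range(w.shape[1]):
    out_loop[j] = g(np.dot(a_in, w[:, j]) + b[j])

print(np.allclose(out_vec, out_loop))   # True
```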

sequential

In [11]:
def sequential(x, w1, b1, w2, b2, w3, b3):

    L1 = dense(x, w1, b1)
    L2 = dense(L1, w2, b2)
    L3 = dense(L2, w3, b3)

    return L3
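
The same idea extends to any number of layers by looping over a list of `(w, b)` pairs. The sketch below is one possible generalization. The `forward` helper is hypothetical, and the weights are random placeholders, not the trained ones.

```python
import numpy as np

def forward(x, params):
    """Apply sigmoid dense layers in order; params is a list of (w, b) pairs."""
    a = x
    for w, b in params:
        a = 1 / (1 + np.exp(-(np.matmul(a, w) + b)))
    return a

rng = np.random.default_rng(0)
sizes = [400, 25, 15, 1]
# Random placeholder weights and zero biases, NOT the trained parameters
params = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

batch = rng.random((5, 400))    # 5 fake examples
probs = forward(batch, params)
print(probs.shape)              # (5, 1)
```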

copy weights from Tensorflow model

In [12]:
w1, b1 = model.get_layer("L1").get_weights()
w2, b2 = model.get_layer("L2").get_weights()
w3, b3 = model.get_layer("L3").get_weights()
print(f"w1.shape = {w1.shape}, b1.shape = {b1.shape}")
print(f"w2.shape = {w2.shape}, b2.shape = {b2.shape}")
print(f"w3.shape = {w3.shape}, b3.shape = {b3.shape}")

w1.shape = (400, 25), b1.shape = (25,)
w2.shape = (25, 15), b2.shape = (15,)
w3.shape = (15, 1), b3.shape = (1,)


Prediction

In [13]:
sequential(x[0], w1, b1, w2, b2, w3, b3) # prediction for a "zero" image; prints the shapes through all 3 layers

(400,) x (400, 25) = (25,)
(25,) x (25, 15) = (15,)
(15,) x (15, 1) = (1,)


array([0.02239261])

In [14]:
sequential(x, w1, b1, w2, b2, w3, b3) # predictions for all training examples

(1000, 400) x (400, 25) = (1000, 25)
(1000, 25) x (25, 15) = (1000, 15)
(1000, 15) x (15, 1) = (1000, 1)


array([[0.02239261],
       [0.0226944 ],
       [0.02353783],
       [0.02193246],
       [0.021432  ],
       [0.02173528],
       [0.02167218],
       [0.02669841],
       [0.02161833],
       [0.02225749],
       [0.02167044],
       [0.02210152],
       [0.02156783],
       [0.02435134],
       [0.02138937],
       [0.02321397],
       [0.02158349],
       [0.02145828],
       [0.02200736],
       [0.02192445],
       [0.0215527 ],
       [0.02194   ],
       [0.0217298 ],
       [0.02676002],
       [0.02227146],
       [0.02194191],
       [0.02519341],
       [0.02275094],
       [0.03014468],
       [0.02206775],
       [0.02945926],
       [0.02138226],
       [0.02133963],
       [0.0220685 ],
       [0.02134249],
       [0.02193717],
       [0.02202698],
       [0.02160501],
       [0.02138243],
       [0.02129947],
       [0.02138487],
       [0.02146596],
       [0.02238427],
       [0.02214385],
       [0.02166786],
       [0.0214178 ],
       [0.02136066],
       [0.021

# NumPy Broadcasting

In [15]:
a = np.ones(shape=((10,)))
b = 5

c = a + b
print(f"{c}, ---> , {c.shape}")

[6. 6. 6. 6. 6. 6. 6. 6. 6. 6.], ---> , (10,)


In [16]:
c = a * b
print(f"{c}, ---> , {c.shape}")

[5. 5. 5. 5. 5. 5. 5. 5. 5. 5.], ---> , (10,)


In [17]:
a = np.ones(shape=((10,1)))
b = 5 * np.ones(shape=(1,5))

c = a + b
print(f"{c}, ---> , {c.shape}")

[[6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]
 [6. 6. 6. 6. 6.]], ---> , (10, 5)


In [18]:
c = a * b
print(f"{c}, ---> , {c.shape}")

[[5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]
 [5. 5. 5. 5. 5.]], ---> , (10, 5)
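
Broadcasting is also what lets a 1-D bias of shape `(25,)` be added to a whole batch of shape `(1000, 25)` in the `dense` function. The sketch below shows that case with made-up values, along with a shape combination that NumPy rejects.

```python
import numpy as np

z = np.zeros((1000, 25))      # e.g. a batch of pre-activations
b = np.arange(25.0)           # one bias per unit, shape (25,)

out = z + b                   # b is stretched across all 1000 rows
print(out.shape)              # (1000, 25)

# Trailing dimensions that neither match nor equal 1 do not broadcast
try:
    z + np.ones(1000)         # (1000, 25) + (1000,) is incompatible
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
print(broadcast_failed)       # True
```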
