## Chapter 2 - A first look at a neural network

In machine learning, a *category* in a classification problem is called a *class*. Data points are called *samples*. The class associated with a specific sample is called a *label*.

In [2]:
from tensorflow.keras.datasets import mnist
(train_images,train_labels),(test_images,test_labels) = mnist.load_data()


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [3]:
train_images.shape

(60000, 28, 28)

### The Network Architecture

In [4]:
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential([
    layers.Dense(512,activation='relu'),
    layers.Dense(10,activation='softmax')
])



## Compilation Step

- An optimizer: the mechanism through which the model will update itself based on the training data it sees, so as to improve its performance
- A loss function: how the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction
- Metrics to monitor during training and testing: here, only accuracy (the fraction of the images that were correctly classified)

In [5]:
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
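To make the loss and metric above concrete, here is a small NumPy sketch of what sparse categorical crossentropy and accuracy compute from integer labels and predicted probabilities. The function names and the toy numbers are illustrative assumptions; this is a hand-written approximation, not Keras's own code.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    labels: integer class indices, shape (samples,)
    probs:  predicted probabilities (e.g. softmax output), shape (samples, classes)
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(true_class_probs)))

def accuracy(labels, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

# Toy batch: 3 samples, 3 classes (made-up numbers for illustration)
labels = np.array([0, 2, 1])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.5, 0.3, 0.2]])
loss = sparse_categorical_crossentropy(labels, probs)
acc = accuracy(labels, probs)  # 2 of the 3 predictions are correct
print(loss, acc)
```

The loss rewards putting high probability on the correct class even when the argmax is already right, which is why it is used for training while accuracy is only monitored.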

## Preparing the image data

In [8]:
train_images = train_images.reshape((60000,28*28))
train_images = train_images.astype("float32")/255

test_images = test_images.reshape((10000,28*28))
test_images = test_images.astype("float32")/255


In [9]:
model.fit(train_images,train_labels,epochs=5,batch_size=128)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f6234774ca0>

### Making Predictions

In [12]:
test_digits = test_images[0:10]
predictions = model.predict(test_digits)
predictions[0].argmax()



7
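predictions[0] is a vector of 10 probabilities produced by the softmax output layer, and argmax returns the index of the largest one, i.e. the predicted digit. A minimal NumPy sketch of that last step, using made-up scores rather than the model's real outputs (the `softmax` helper is hypothetical, written here for illustration):

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the 10 digit classes of one image
scores = np.array([0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -2.0, 3.5, 0.4, 1.0])
probs = softmax(scores)
print(probs.sum())     # 1 (up to floating-point rounding)
print(probs.argmax())  # 7: the index of the highest probability
```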

## Evaluating on New Data

In [13]:
test_loss, test_acc = model.evaluate(test_images,test_labels)





## Data Representation in a NN

- A *tensor* is a container for data, usually numerical data. A matrix, for instance, is a rank-2 tensor.
- *Scalars* (rank-0 tensors): tensors that contain only one number.

In [14]:
import numpy as np
x = np.array(12)
print(x,x.ndim)


12 0


- *Vectors* (rank-1 tensors): arrays of numbers with exactly one axis.

In [15]:
x = np.array([12,3,6,14,7])
x

In [16]:
x.ndim

1

- *Matrices* (rank-2 tensors): arrays of vectors. A matrix has two axes (rows and columns).

In [19]:
x = np.array([[12,3,6,14,7],
             [12,3,6,14,7],
             [12,3,6,14,7],
             [12,3,6,14,7]])
x

array([[12,  3,  6, 14,  7],
       [12,  3,  6, 14,  7],
       [12,  3,  6, 14,  7],
       [12,  3,  6, 14,  7]])

In [20]:
x.ndim

2

- *Rank-3 and higher-rank tensors*: packing matrices into a new array gives a rank-3 tensor; packing rank-3 tensors into an array gives a rank-4 tensor, and so on.

In [23]:
x = np.array([[[12,3,6,14,7],
               [12,3,6,14,7],
               [12,3,6,14,7],
               [12,3,6,14,7]],
              [[12,3,6,14,7],
               [12,3,6,14,7],
               [12,3,6,14,7],
               [12,3,6,14,7]]])
x.ndim

3

## Manipulating Tensors in NumPy

In [28]:
## Slicing
train_images2 = train_images.reshape(60000,28,28)
my_slice = train_images2[10:100]
my_slice.shape

(90, 28, 28)

In [36]:
## Select the bottom-right 14x14 pixel patch of every image
train_images2[:,14:,14:].shape

(60000, 14, 14)

## Key Attributes

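A tensor is defined by three key attributes:
- Number of axes (rank): for instance, a rank-3 tensor has three axes (NumPy `.ndim`)
- Shape: a tuple of integers that describes how many dimensions the tensor has along each axis (NumPy `.shape`); for example, the MNIST training images loaded above have shape (60000, 28, 28)
- Data type: the type of the data contained in the tensor, such as float16, float32, float64, or uint8 (NumPy `.dtype`)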
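The three attributes can be read directly off a NumPy array. This sketch uses a zero-filled stand-in with the same shape and dtype (uint8) as the MNIST training images returned by mnist.load_data(), so it runs without downloading the dataset:

```python
import numpy as np

# Stand-in for the MNIST training images: 60000 grayscale 28x28 images stored as uint8
images = np.zeros((60000, 28, 28), dtype=np.uint8)

print(images.ndim)   # 3: the rank (number of axes)
print(images.shape)  # (60000, 28, 28): the size along each axis
print(images.dtype)  # uint8: the type of each element
```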

## Data Batches

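- In general, the first axis (axis 0) in all data tensors you'll come across is the *samples axis*.
- Deep learning models don't process an entire dataset at once; they break the data into small batches. For such a batch tensor, the first axis (axis 0) is called the *batch axis*.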

In [38]:
batch = train_images2[128:256]
batch.shape

(128, 28, 28)

In [40]:
n = 3
batch = train_images[128*n:128*(n+1)]
batch.shape

(128, 784)
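Slicing batch after batch like this is usually wrapped in a loop. A minimal sketch, assuming a hypothetical helper `iterate_batches` and a small zero-filled stand-in for the flattened images; note that the last batch is smaller when the number of samples is not a multiple of the batch size:

```python
import numpy as np

def iterate_batches(data, batch_size=128):
    """Yield consecutive slices of `data` along axis 0 (the batch axis)."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Stand-in for the flattened training images: 1000 samples of 784 features
data = np.zeros((1000, 784), dtype="float32")
batch_shapes = [batch.shape for batch in iterate_batches(data)]
print(len(batch_shapes))  # 8 batches: 7 full ones plus a final partial one
print(batch_shapes[-1])   # (104, 784): 1000 - 7*128 = 104 samples left over
```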

Real-world examples of data tensors:
- Vector data: rank-2 tensors of shape (samples, features)
- Time-series or sequence data: rank-3 tensors of shape (samples, timesteps, features), where each sample is a sequence (of length timesteps) of feature vectors
- Images: rank-4 tensors of shape (samples, height, width, channels), where each sample is a 2D grid of pixels and each pixel is represented by a vector of values ("channels")
- Video: rank-5 tensors of shape (samples, frames, height, width, channels), where each sample is a sequence (of length frames) of images
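A quick way to check these conventions is to build zero-filled stand-ins and inspect their rank. The sizes below are arbitrary assumptions chosen to keep memory use small, not real datasets:

```python
import numpy as np

# Zero-filled stand-ins; all sizes are illustrative assumptions
vector_data = np.zeros((100, 20))         # (samples, features)
timeseries = np.zeros((250, 390, 3))      # (samples, timesteps, features)
images = np.zeros((8, 64, 64, 3))         # (samples, height, width, channels)
video = np.zeros((2, 10, 64, 64, 3))      # (samples, frames, height, width, channels)

for name, tensor in [("vector", vector_data), ("timeseries", timeseries),
                     ("images", images), ("video", video)]:
    print(name, tensor.ndim, tensor.shape)
```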

# 2.3 The Gears of Neural Networks: Tensor Operations

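output = relu(dot(input, W) + b)

This expression, computed by a Dense layer with relu activation, chains three tensor operations:
- A dot product (dot) between the input tensor and a tensor named W
- An addition (+) between the resulting matrix and a vector b
- A relu operation: relu(x) is max(x, 0)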

In practice, when dealing with NumPy arrays, these operations are available as well-optimized built-in NumPy functions, which themselves delegate the heavy lifting to a BLAS (Basic Linear Algebra Subprograms) implementation.

#### Element-wise Operations

In [41]:
import numpy as np

x = np.random.random((20,100))
y = np.random.random((20,100))

z = x + y




In [43]:
z = np.maximum(z,0.)
z

array([[1.48394464, 1.33798854, 1.23484468, ..., 1.37347341, 1.17851908,
        0.9391144 ],
       [0.86010941, 0.86324667, 0.96773175, ..., 0.78550363, 0.4666251 ,
        1.42922936],
       [1.29972545, 0.60485555, 1.13141772, ..., 1.09798347, 0.73080414,
        0.40373412],
       ...,
       [1.42453232, 0.94235561, 1.2148877 , ..., 0.30865083, 0.46137207,
        0.74007746],
       [0.66938938, 1.33344567, 0.96352043, ..., 1.38288867, 1.05470122,
        1.09951635],
       [0.89108557, 0.81403623, 0.89881206, ..., 0.56076223, 1.22000215,
        1.77467402]])
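To see what the vectorized calls above save us, here is a sketch of naive, loop-based versions of relu and element-wise addition for rank-2 tensors (`naive_relu` and `naive_add` are written here for illustration). They produce the same values as np.maximum and +, but the NumPy built-ins run as compiled code and are much faster on large arrays:

```python
import numpy as np

def naive_relu(x):
    """Element-wise relu written as explicit Python loops (x must be rank 2)."""
    assert x.ndim == 2
    x = x.copy()  # avoid overwriting the input
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def naive_add(x, y):
    """Element-wise addition of two rank-2 tensors with the same shape."""
    assert x.ndim == 2 and x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 100))
y = rng.normal(size=(20, 100))

print(np.allclose(naive_relu(x), np.maximum(x, 0.)))  # True
print(np.allclose(naive_add(x, y), x + y))            # True
```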

#### Broadcasting

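When the shapes of the two tensors in an element-wise operation (such as + or np.maximum) differ but are compatible, the smaller tensor is *broadcast* to match the larger one:
- Axes (called broadcast axes) are added to the smaller tensor to match the ndim of the larger tensor.
- The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.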
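The two broadcasting steps can be spelled out by hand and compared with NumPy's automatic broadcasting. A sketch with a (32, 10) tensor and a (10,) vector, using np.expand_dims and np.concatenate to perform the steps explicitly:

```python
import numpy as np

x = np.random.random((32, 10))  # larger tensor
y = np.random.random((10,))     # smaller tensor

# Step 1: add a broadcast axis so y has the same ndim as x -> shape (1, 10)
y_expanded = np.expand_dims(y, axis=0)
# Step 2: repeat y 32 times along that new axis -> shape (32, 10)
y_repeated = np.concatenate([y_expanded] * 32, axis=0)

explicit = x + y_repeated
implicit = x + y  # NumPy broadcasting does both steps without materializing the copies
print(y_repeated.shape)                 # (32, 10)
print(np.allclose(explicit, implicit))  # True
```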

In [46]:
x = np.random.random((64,3,32,10))
y = np.random.random((32,10))

z = np.maximum(x,y)
z.shape

(64, 3, 32, 10)

#### Tensor Product 

In [47]:
x = np.random.random((32,))
y = np.random.random((32,))


In [52]:
# Naive dot product: sum of the element-wise products x[i] * y[i]
m = 0
for i in range(x.shape[0]):
    m += x[i] * y[i]
m

6.606005790514937

In [50]:
z = np.dot(x,y)
z

6.6060057905149385
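The same loop extends from vectors to matrices: the dot product of a matrix of shape (a, b) and one of shape (b, c) has shape (a, c), and each entry is the dot product of a row of the first with a column of the second. A sketch with hypothetical helpers `naive_vector_dot` and `naive_matrix_dot`, checked against np.dot:

```python
import numpy as np

def naive_vector_dot(x, y):
    """Dot product of two vectors with the same number of elements."""
    assert x.ndim == 1 and y.ndim == 1 and x.shape[0] == y.shape[0]
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

def naive_matrix_dot(x, y):
    """Matrix product: entry (i, j) is the dot product of row i of x and column j of y."""
    assert x.ndim == 2 and y.ndim == 2
    assert x.shape[1] == y.shape[0]  # inner dimensions must match
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            z[i, j] = naive_vector_dot(x[i, :], y[:, j])
    return z

a = np.random.random((3, 4))
b = np.random.random((4, 5))
print(naive_matrix_dot(a, b).shape)                        # (3, 5)
print(np.allclose(naive_matrix_dot(a, b), np.dot(a, b)))   # True
```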

#### Tensor Reshaping

In [54]:
x = np.array([[0,1],
            [2,3],
            [4,5]])

x

array([[0, 1],
       [2, 3],
       [4, 5]])

In [55]:
x.shape

(3, 2)

In [60]:
x.reshape((6,1)) # Reshaping keeps the total number of coefficients (3*2 == 6*1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5]])

In [63]:
# Transposition
x = np.zeros((2,6))
x

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [64]:
np.transpose(x)

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])
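Because the zeros above make it hard to see which element went where, here is a sketch with distinct values. It checks that transposition swaps rows and columns (t[i, j] == x[j, i]) and that reshape only rearranges coefficients; passing -1 lets NumPy infer one axis size:

```python
import numpy as np

x = np.arange(12).reshape((2, 6))  # values 0..11 make the rearrangement visible

t = np.transpose(x)  # same as x.T
print(t.shape)             # (6, 2)
print(t[4, 1] == x[1, 4])  # True: transposing swaps rows and columns

# reshape only rearranges coefficients; -1 asks NumPy to infer that axis's size
r = x.reshape((3, -1))
print(r.shape)             # (3, 4)
print(r.size == x.size)    # True: same number of coefficients
```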

# The Engine of Neural Networks: Gradient-Based Optimization

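output = relu(dot(input,W) + b)
- W = weights, or trainable parameters, of the layer
- b = bias

W and b are attributes of the layer, known as its kernel and bias attributes.

At the start, the weights and bias are filled with small random values (random initialization). They are then gradually adjusted based on a feedback signal; this gradual adjustment is called *training*.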

Example of a training loop, repeated as long as necessary:
- Draw a batch of training samples, x, and corresponding targets, y_true
- Run the model on x (a step called the forward pass) to obtain predictions, y_pred
- Compute the loss of the model on the batch, a measure of the mismatch between y_pred and y_true
- Update all weights of the model in a way that slightly reduces the loss on this batch

#### Stochastic Gradient Descent

1. Draw a batch of training samples, x, and corresponding targets, y_true
2. Run the model on x to obtain predictions, y_pred (forward pass)
3. Compute the loss of the model on the batch, a measure of the mismatch between y_pred and y_true
4. Compute the gradient of the loss with regard to the model's parameters (backward pass)
5. Move the parameters a little in the opposite direction from the gradient, for example W -= learning_rate * gradient, where the learning rate is a scalar factor modulating the speed of the descent

The procedure above is called mini-batch stochastic gradient descent (mini-batch SGD).
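The five steps can be sketched end to end in NumPy. This assumes a toy problem that is not from the chapter: a single-weight linear model (y_pred = x @ W + b) fitted to synthetic data generated from y = 3x - 2 with a mean squared error loss, so the gradients can be written out by hand instead of using a framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (an assumption for illustration): y = 3*x - 2 plus noise
x_all = rng.uniform(-1, 1, size=(1000, 1))
y_all = 3 * x_all - 2 + rng.normal(scale=0.1, size=(1000, 1))

W = rng.normal(scale=0.01, size=(1, 1))  # random initialization
b = np.zeros((1,))
learning_rate = 0.1
batch_size = 32

for epoch in range(20):
    order = rng.permutation(len(x_all))
    for start in range(0, len(x_all), batch_size):
        idx = order[start:start + batch_size]
        x, y_true = x_all[idx], y_all[idx]          # 1. draw a batch
        y_pred = x @ W + b                          # 2. forward pass
        loss = np.mean((y_pred - y_true) ** 2)      # 3. mean squared error
        grad_pred = 2 * (y_pred - y_true) / len(x)  # 4. backward pass (chain rule)
        grad_W = x.T @ grad_pred
        grad_b = grad_pred.sum(axis=0)
        W -= learning_rate * grad_W                 # 5. step against the gradient
        b -= learning_rate * grad_b

print(W.ravel(), b)  # should end up close to [3.] and [-2.]
```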

# The Backpropagation Algorithm 

In [67]:
# Backpropagation is the application of the chain rule to a computation graph
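As a small example of the chain rule at work, consider a graph that computes loss = (w * x + b - y_true)^2. The backward pass multiplies the local derivatives of each node along the path from the loss back to w; a finite-difference estimate serves as a check. The values are arbitrary assumptions:

```python
# Graph: x1 = w * x  ->  x2 = x1 + b  ->  loss = (x2 - y_true) ** 2
w, b, x, y_true = 0.5, 0.1, 2.0, 3.0

# Forward pass: compute and keep each intermediate node
x1 = w * x
x2 = x1 + b
loss = (x2 - y_true) ** 2

# Backward pass: multiply local derivatives along the path from loss back to w
grad_loss_wrt_x2 = 2 * (x2 - y_true)
grad_x2_wrt_x1 = 1.0
grad_x1_wrt_w = x
grad_loss_wrt_w = grad_loss_wrt_x2 * grad_x2_wrt_x1 * grad_x1_wrt_w

# Check against a forward finite-difference estimate
eps = 1e-6
numeric = (((w + eps) * x + b - y_true) ** 2 - loss) / eps
print(grad_loss_wrt_w, numeric)  # both approximately -7.6
```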


In [69]:
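# GradientTape records operations so that gradients can be computed afterwards
import tensorflow as tf

x = tf.Variable(0.)

with tf.GradientTape() as tape:
    y = 2 * x + 3
# tape.gradient(target, sources): the gradient of y with respect to x (dy/dx = 2)
grad_of_y_wrt_x = tape.gradient(y, x)


In [71]:
grad_of_y_wrt_x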

In [72]:
import tensorflow as tf

class NaiveDense:
    def __init__(self,input_size,output_size,activation):
        self.activation = activation

        # Weight matrix W, initialized with small random values
        w_shape = (input_size,output_size)
        w_initial_value = tf.random.uniform(w_shape,minval=0,maxval=1e-1)
        self.W = tf.Variable(w_initial_value)

        # Bias vector b, initialized with zeros
        b_shape = (output_size,)
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    # Forward pass: activation(dot(inputs, W) + b)
    def __call__(self,inputs):
        return self.activation(tf.matmul(inputs,self.W)+self.b)

    # The layer's trainable weights
    @property
    def weights(self):
        return [self.W,self.b]
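Stacking such layers gives a full model. Since the class above needs a TensorFlow install, here is a hypothetical NumPy counterpart of NaiveDense (`NumpyDense`, `relu` and `softmax` are names introduced for illustration) that chains two layers with the same sizes as the Keras model earlier, 784 to 512 to 10, and runs a forward pass on a fake batch:

```python
import numpy as np

class NumpyDense:
    """Hypothetical NumPy counterpart of NaiveDense: activation(inputs @ W + b)."""
    def __init__(self, input_size, output_size, activation, rng):
        self.activation = activation
        self.W = rng.uniform(0, 1e-1, size=(input_size, output_size))  # random init
        self.b = np.zeros((output_size,))                              # zero bias

    def __call__(self, inputs):
        return self.activation(inputs @ self.W + self.b)

def relu(x):
    return np.maximum(x, 0.)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # row-wise, numerically stable
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Same layer sizes as the Keras model earlier: 784 -> 512 (relu) -> 10 (softmax)
model_layers = [NumpyDense(28 * 28, 512, relu, rng), NumpyDense(512, 10, softmax, rng)]

batch = rng.random((128, 28 * 28))  # a fake batch of flattened images
out = batch
for layer in model_layers:
    out = layer(out)
print(out.shape)                          # (128, 10)
print(np.allclose(out.sum(axis=1), 1.0))  # True: each row is a probability distribution
```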

