
In [None]:
import tensorflow as tf
import numpy as np

In [None]:
print("TensorFlow version:", tf.__version__)


TensorFlow version: 2.15.0


Instead of standard Python data structures, TensorFlow uses “tensors.”

To convert a Python object such as a list, int, or str into a tensor, use tf.constant.


In [None]:
tf.constant([1.0, 2.0, 3.0])

In [None]:
tf.constant('hello world')

Use .numpy() to get a NumPy value from a tensor. For a string tensor, the result is a bytes object.

In [None]:
tf.constant('hello world').numpy()  # returns bytes; append .decode() to get a Python str
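
As a small illustration of the point above, this sketch shows the bytes-to-str step explicitly. The variable names are chosen only for this example.

```python
import tensorflow as tf

greeting = tf.constant('hello world')

raw = greeting.numpy()        # a bytes object, not a str
text = raw.decode('utf-8')    # decode the bytes to get a Python string

print(type(raw), type(text))
```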

tf.random is useful for generating initial values for weight and bias matrices

In [None]:
x = tf.random.normal([2, 2])
W = tf.random.uniform([2, 2])
print(x)
print(W)

Addition and multiplication are overloaded in TensorFlow: the + and * operators on tensors call tf.add and tf.multiply, so the two forms in each cell below give the same result

In [None]:
print(tf.add(x, W))
print(x + W)

In [None]:
print(tf.multiply(x, W))
print(x * W)

**Remember:** tf.multiply is **not** matrix multiplication, but elementwise multiplication. For dense layers, use tf.matmul:

In [None]:
tf.matmul(x, W)
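
Because x and W above are random, the difference is easier to see with fixed values. The following is a minimal sketch with two hand-picked 2x2 matrices (a and b are illustrative names, not part of the demo).

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

# Elementwise product: each entry is a[i, j] * b[i, j]
elementwise = tf.multiply(a, b)

# Matrix product: each entry is a row of a dotted with a column of b
matrix_product = tf.matmul(a, b)

# The @ operator is overloaded as matrix multiplication
also_matrix_product = a @ b

print(elementwise)
print(matrix_product)
```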

**Useful operations in TF**

**Note** that many TensorFlow operations have the same names and behavior as NumPy operations.

This does not mean that you should use NumPy operations on tensors anywhere in your model: TensorFlow cannot track gradients through NumPy code, so this will break autodiff. Only use NumPy operations in data preprocessing or in evaluation.
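
To make the autodiff warning concrete, here is a small sketch that computes the same sum of squares twice, once with TensorFlow ops and once through NumPy. It uses tf.GradientTape, which is introduced later in this demo; the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])

# Computed with TensorFlow ops: the tape can trace the computation back to v.
with tf.GradientTape() as tape:
    loss_tf = tf.reduce_sum(v * v)
grad_tf = tape.gradient(loss_tf, v)

# Computed through NumPy: the tape sees no TensorFlow ops connecting loss and v.
with tf.GradientTape() as tape:
    loss_np = tf.constant(np.sum(v.numpy() ** 2))
grad_np = tape.gradient(loss_np, v)

print(grad_tf)  # gradient of sum(v^2) is 2 * v
print(grad_np)  # None: no gradient path was recorded
```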


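**Argmax** returns the index of the largest value, which is useful for turning logits into a predicted class. Note that tf.argmax reduces along axis 0 by default, while np.argmax with no axis argument flattens the array first, so the two calls below do not return the same thing.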

In [None]:
print(x)
print(tf.argmax(x))
print(np.argmax(x))
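
The default-axis behavior of the two argmax functions is easy to miss with a random x. A minimal sketch with fixed values (scores is a made-up matrix with one row per example):

```python
import numpy as np
import tensorflow as tf

scores = tf.constant([[1.0, 9.0, 3.0],
                      [4.0, 2.0, 8.0]])

per_column = tf.argmax(scores)            # axis=0 by default: one index per column
per_row = tf.argmax(scores, axis=1)       # one index per row (what predictions need)
flat_index = np.argmax(scores.numpy())    # no axis: index into the flattened array

print(per_column.numpy())  # [1 0 1]
print(per_row.numpy())     # [1 2]
print(flat_index)          # 1
```

For predictions over a batch of logits, passing axis=1 explicitly avoids the ambiguity in either library.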

**Transpose** will be used in later assignments. The transpose of a matrix is the same matrix with its rows and columns swapped (https://en.wikipedia.org/wiki/Transpose).

In [None]:
print(tf.transpose(x))
print(np.transpose(x))

**Shape** is useful for getting the dimensions of your tensor. This often comes in handy when debugging

In [None]:
print(tf.shape(x))
print(np.shape(x))
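
When debugging, it can help to know that a tensor exposes its shape in two ways. This sketch (with an illustrative tensor t) contrasts tf.shape, which returns the shape as a tensor, with the .shape attribute, which returns a TensorShape object.

```python
import tensorflow as tf

t = tf.zeros([2, 3])

dynamic_shape = tf.shape(t)   # an int32 tensor holding the shape
static_shape = t.shape        # a TensorShape object

print(dynamic_shape)
print(static_shape)
print(static_shape.as_list()) # plain Python list of dimensions
```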

**Reduce_mean** takes the mean along the specified axis, or over all values if no axis is given. This is useful in loss calculations.

In [None]:
print(tf.reduce_mean(x)) #similar idea for sum (reduce_sum)
print(np.mean(x))
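
The effect of the axis argument is easiest to see on a small fixed matrix. A minimal sketch (m is an illustrative name):

```python
import tensorflow as tf

m = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])

mean_all = tf.reduce_mean(m)            # mean over every entry
mean_cols = tf.reduce_mean(m, axis=0)   # collapse rows: one mean per column
mean_rows = tf.reduce_mean(m, axis=1)   # collapse columns: one mean per row
total = tf.reduce_sum(m)                # same idea, but summing

print(mean_all.numpy())   # 2.5
print(mean_cols.numpy())  # [2. 3.]
print(mean_rows.numpy())  # [1.5 3.5]
print(total.numpy())      # 10.0
```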

**Reshape** is useful for changing the dimension of your data.

In [None]:
print(tf.reshape(x, [1,4])) #a shape of [-1] flattens into 1-D
print(np.reshape(x, [1,4]))
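
A -1 in the target shape asks TensorFlow to infer that dimension from the number of elements. The sketch below uses an illustrative tensor of six values to show this.

```python
import tensorflow as tf

t = tf.range(6)                      # [0 1 2 3 4 5]

grid = tf.reshape(t, [2, 3])         # 2 rows, 3 columns
flat = tf.reshape(grid, [-1])        # back to 1-D
inferred = tf.reshape(t, [-1, 2])    # -1 is inferred as 3, giving shape (3, 2)

print(grid.numpy())
print(flat.numpy())
print(inferred.shape)
```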

Variables are Tensors that can change (or be updated). This makes them useful for our weight and bias matrices (or parameters) for the model.

In [None]:
x = tf.Variable([2.1, 3.5, 9.1])
b = tf.Variable([1.0, 0.2, 0.3])

tf.add(x,b)
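
What "can change" means in practice is that a Variable can be updated in place. A minimal sketch using the assign family of methods (w is an illustrative name):

```python
import numpy as np
import tensorflow as tf

w = tf.Variable([2.1, 3.5, 9.1])

w.assign([1.0, 2.0, 3.0])        # overwrite the stored values
w.assign_add([0.5, 0.5, 0.5])    # in-place addition
w.assign_sub([1.0, 1.0, 1.0])    # in-place subtraction

print(w.numpy())
```

Optimizers rely on this kind of in-place update when they adjust weights during training.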

**Building a neural network**

Now that we've got the basics out of the way, let's see how to define a one-layer neural network model for classifying MNIST digits:

In [None]:
#load MNIST data as training and test sets
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)


In [None]:
# flatten each 28x28 image into a 784-element vector and scale pixel values to [0, 1]
x_train = tf.reshape(x_train / 255.0, [-1,784])
x_test = tf.reshape(x_test / 255.0, [-1,784])

# set batch size
batch_size = 500

In [None]:
class Model():
  def __init__(self):
    # declare weights and bias matrices
    self.W = tf.Variable(tf.random.uniform([784,10], -1, 1, dtype=tf.float64))
    self.b = tf.Variable(tf.random.uniform([10], -1, 1, dtype=tf.float64))

  def get_logits(self, x):
    return tf.matmul(x,self.W) + self.b

  def get_loss(self, data, label):
    # getting mean loss across examples
    return tf.reduce_mean(
      # crossentropy where the label = index of correct answer in logits
      # from_logits=True: the inputs are raw logits, so the loss applies softmax internally
      tf.losses.SparseCategoricalCrossentropy(from_logits=True)(
          label, self.get_logits(data)
      )
    )
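
To see what the loss in get_loss computes, the following sketch compares SparseCategoricalCrossentropy(from_logits=True) against a hand-written version on two made-up examples. The logits and labels are illustrative values, not MNIST data.

```python
import tensorflow as tf

# Two examples, three classes; each label is the index of the correct class.
logits = tf.constant([[2.0, 1.0, 0.0],
                      [0.0, 1.0, 2.0]])
labels = tf.constant([0, 2])

# Library version: takes raw logits and applies softmax internally.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
keras_loss = loss_fn(labels, logits)

# By hand: log-softmax, pick the log-probability of the correct class, average.
log_probs = tf.nn.log_softmax(logits, axis=1)
picked = tf.reduce_sum(tf.one_hot(labels, depth=3) * log_probs, axis=1)
manual_loss = -tf.reduce_mean(picked)

print(keras_loss.numpy(), manual_loss.numpy())
```

With its default settings the loss object already returns the mean over the batch, so wrapping it in tf.reduce_mean, as get_loss does, leaves the value unchanged.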



Notice that we use random numbers to initialize our weights and biases. Why do we do that? Why not initialize them to zero, like we did with the perceptron algorithm? Think about how gradient descent works in a network with hidden layers: if all of the weights feeding a layer start out with the same value (e.g. zero), then every unit in that layer computes the same output, and the derivative of the loss with respect to each unit's weights is the same. The consequence: the gradient updates for those units are identical, so they end up with exactly the same weights and behave like a single unit. This is clearly not good behavior, because it reduces the power of our neural network. (The one-layer model above has no hidden units, so it would not actually suffer from this, but random initialization is the habit to build for deeper models.)

To keep this from happening, we initialize the weights with small random values, so that their derivatives are all different, and gradient descent can then push them in different directions.

In the Model class above, we also randomly initialize the biases. This is not strictly necessary (randomly initializing the weights alone is enough to avoid the problem described earlier), but it's common practice.
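
The symmetry argument can be checked numerically. The sketch below is not part of the original demo: it builds a hypothetical two-layer network on made-up data, and compares the first-layer gradient when every hidden unit starts as an identical copy against the gradient under small random initialization. All names and values are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 2 examples, 3 features, 2 classes.
inputs = tf.constant([[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]])
labels = tf.constant([0, 1])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def first_layer_gradient(w1_init, w2_init):
    """Gradient of the loss w.r.t. the first-layer weights of a tiny 2-layer net."""
    w1, w2 = tf.Variable(w1_init), tf.Variable(w2_init)
    with tf.GradientTape() as tape:
        hidden = tf.tanh(tf.matmul(inputs, w1))   # 4 hidden units
        logits = tf.matmul(hidden, w2)
        loss = loss_fn(labels, logits)
    return tape.gradient(loss, w1).numpy()

# Every hidden unit starts as an identical copy of the others:
# identical incoming weights (columns of w1) and outgoing weights (rows of w2).
same = first_layer_gradient(tf.fill([3, 4], 0.5),
                            tf.tile([[0.3, -0.2]], [4, 1]))

# Small random values break the symmetry.
tf.random.set_seed(0)
rand = first_layer_gradient(tf.random.uniform([3, 4], -0.1, 0.1),
                            tf.random.uniform([4, 2], -0.1, 0.1))

print(same)  # each column belongs to one hidden unit; all columns match
print(rand)  # columns differ, so the units can learn different things
```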


Here's how we train the model. Note the use of tf.GradientTape(): it records the operations executed inside its with block, so that TensorFlow can walk them backward and compute gradients.

In [None]:

# declare model to setup variables
model = Model()

# use stochastic gradient descent
optimizer = tf.optimizers.SGD()

def train(inputs, outputs):
    # use gradient tape to record the loss calculation
    with tf.GradientTape() as tape:
      loss = model.get_loss(inputs, outputs)

    # use tape.gradient to retrieve ∂(loss) / ∂(weight_i), etc
    grads = tape.gradient(loss, [model.W, model.b])
    # apply the gradients to our weights and bias
    optimizer.apply_gradients(zip(grads, [model.W, model.b]))

# step through data one batch at a time, and apply training step

for start in range(0, x_train.shape[0], batch_size):
  print(f'processed {start}/{x_train.shape[0]}')
  train(x_train[start: start + batch_size, :], y_train[start: start + batch_size])
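
To show what one training step does numerically, here is a minimal sketch on a made-up two-element variable. It records a loss with the tape, retrieves the gradient, and checks that apply_gradients with plain SGD performs the update w ← w − learning_rate · gradient. The learning rate of 0.1 and the toy loss are assumptions chosen to keep the arithmetic simple.

```python
import numpy as np
import tensorflow as tf

learning_rate = 0.1
w = tf.Variable([1.0, -2.0])

# Record the loss computation so the tape can differentiate it.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w * w)
grad = tape.gradient(loss, w)          # d(sum(w^2))/dw = 2 * w

# The update plain SGD should perform, computed by hand.
expected = w.numpy() - learning_rate * grad.numpy()

optimizer = tf.optimizers.SGD(learning_rate=learning_rate)
optimizer.apply_gradients([(grad, w)])  # updates w in place

print(grad.numpy())  # [ 2. -4.]
print(w.numpy())     # matches the hand-computed update
```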




Alternatively, you can use optimizer.minimize, which is syntactic sugar that encapsulates all of the gradient tape steps above:

In [None]:
model = Model()

optimizer = tf.optimizers.SGD()

def train(inputs, outputs):
    # .minimize() applies all the steps in our train function
    optimizer.minimize(
      lambda: model.get_loss(inputs, outputs),
      [model.W, model.b]
    )


for start in range(0, x_train.shape[0], batch_size):
  print(f'processed {start}/{x_train.shape[0]}')
  train(x_train[start: start + batch_size, :], y_train[start: start + batch_size])


Finally, we can evaluate the loss on our test set, as well as the prediction accuracy on our test set:

In [None]:
loss = model.get_loss(x_test, y_test)
print(loss)

pred = np.argmax(model.get_logits(x_test), axis=1)
acc = np.mean(pred == y_test)
print(acc)
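
The accuracy calculation can be traced by hand on a tiny made-up batch. In this sketch the logits and labels are illustrative values only: four examples, three classes, with one deliberate mistake.

```python
import numpy as np
import tensorflow as tf

toy_logits = tf.constant([[2.0, 1.0, 0.0],
                          [0.0, 3.0, 1.0],
                          [1.0, 0.0, 5.0],
                          [4.0, 0.0, 1.0]])
toy_labels = np.array([0, 1, 1, 0])

# The predicted class is the index of the largest logit in each row.
toy_pred = np.argmax(toy_logits.numpy(), axis=1)

# Comparing gives an array of booleans; their mean is the fraction correct.
toy_acc = np.mean(toy_pred == toy_labels)

print(toy_pred)  # [0 1 2 0]
print(toy_acc)   # 0.75
```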

Acknowledgements: This demo is a modified version of the one originally created by Daniel Ritchie