# Intro to Tensorflow/Keras and Tensors
Notebook originally by Josh Murr 2022, updated by Terence Broad 2023.

A few words about the course: this course assumes a certain level of programming knowledge by now, but don't worry, I won't assume too much. I'll only assume that you understand _some_ basic terminology (arrays, loops, variables, etc.), that you have _some_ experience programming in __Python__ specifically, and that you have used packages such as __NumPy__ and __Matplotlib__. What matters most is that you know how to import a Python package and use it. If you are a modular student who is new to the world of Python and feel a bit lost, get in touch with Terence Broad and he can show you the ropes.


## Installing TensorFlow / Keras
### Option A: If you are an M1 Mac user:

This is our first time using TensorFlow, and on an M1 Mac it needs to be installed in a special way.

1. Open a terminal window
2. Activate your conda environment, e.g. by typing `conda activate nlp`
3. Install the TensorFlow dependencies for Mac by typing `conda install -c apple tensorflow-deps`
4. Install the macOS version of TensorFlow by typing `python -m pip install tensorflow-macos==2.9`
5. Install tensorflow-metal by typing `python -m pip install tensorflow-metal==0.5.0`
6. Close this notebook and kill the current notebook process using Ctrl+C (or by closing all terminal windows). Then open a new terminal and launch a new notebook using the `jupyter notebook` command.

Then continue on where it says "Everyone: Continue here." below.


### OR, Option B: If you are NOT an M1 Mac user:

Run `!pip install tensorflow` in the cell below.

In [None]:
!pip install tensorflow

### OR, I don't know what an M1 Mac is:

If this is the case, speak to Terence and he will help you figure out what kind of computer you have.

## Everyone: Continue here.

In [None]:
# Install keras if you haven't already
!pip install keras

### Let's import some libraries we will be using

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow import keras

## What is a Tensor?

First... 

### 1. A Number

I'm sure you know what a __number__ is... like `1`, or `10`, or `284.38`. A single lonely number out on its own.

In Python we can assign a number to a variable like so:

In [None]:
lonely_number = 101 # Sorry, this is just for completeness :P

### 2. An Array

I'm also assuming you know what an __array__ is... Well, an array is just a list of __numbers__ like so: `[1, 10, 284.38]`. In mathematics an array is called a __vector__. Given that machine learning is pretty much a hybrid study of computer science and maths, there is a lot of crossover with terminology - but if you hear __vector__ think __array__.

> Yes, technically, with a [loosely typed language](https://www.computerhope.com/jargon/l/looslang.htm) like Python you could have an array with anything in it, not just numbers, but let's try to be mathsy and just focus on numbers for now.

In Python an array is declared with square brackets:

In [None]:
vector_or_array = [1, 10, 284.38]

### 3. A Matrix

I think you probably also know what a __matrix__ is? In layman's terms, it's a rectangular grid of numbers, arranged in rows and columns, like so:

![A Matrix](./images/a-matrix.svg)

Another way of thinking about a matrix is that it is a __list of arrays__ (or vectors). So we could think of the previous matrix as a list of __3__ arrays of __length 2__:

![Row Major Matrix](./images/row-major.svg)

Or indeed as a list of __2__ arrays of __length 3__:

![Column Major Matrix](./images/col-major.svg)

Mathematically this distinction is irrelevant, but we work with computers, so we need to think about how the computer will store and work with these numbers, and that is where it becomes quite important. With Python, NumPy, TensorFlow, Keras etc., 9 times out of 10 we only need to think about the __former__, which is known as __row major__ (while the latter is __column major__).
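
To make the row-major/column-major distinction concrete, here is a small NumPy sketch (an addition, not from the original notebook) that reads the same matrix out in both orders with `flatten(order=...)`. In NumPy, `"C"` means row-major order and `"F"` (as in Fortran) means column-major order. The name `mat_np` is just for illustration.

```python
import numpy as np

mat_np = np.array([[-1.3, 0.6],
                   [20.4, 5.5],
                   [ 9.7, -6.2]])

# Row-major ("C" order): read along each row, then move on to the next row
row_major = mat_np.flatten(order="C")

# Column-major ("F" order): read down each column, then move on to the next column
col_major = mat_np.flatten(order="F")

print(row_major.tolist())  # [-1.3, 0.6, 20.4, 5.5, 9.7, -6.2]
print(col_major.tolist())  # [-1.3, 20.4, 9.7, 0.6, 5.5, -6.2]

# NumPy stores new arrays in row-major order by default
print(mat_np.flags["C_CONTIGUOUS"])  # True
```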

So in Python notation we could declare our matrix like so:

In [None]:
mat = [[-1.3, 0.6],
       [20.4, 5.5],
       [ 9.7,-6.2]]

See how each __row__ in our matrix is just an __array__ of length 2?

In [None]:
print(f"Row 0: {mat[0]}")
print(f"Row 1: {mat[1]}")
print(f"Row 2: {mat[2]}")

The __length__ of the matrix is the __total number of rows__, while the length of any row is the __total number of columns__.

In [None]:
num_rows = len(mat)
num_cols = len(mat[0])

print(f"Our matrix has {num_cols} columns and {num_rows} rows!")

It is important to feel confident in forming arrays in code and also accessing particular elements of those arrays. Honestly, it's what you end up doing _most of the time_ when peeking into datasets, creating datasets, cleaning datasets, exploring the output of a model, looking at the weights of a model.... __Everything is in arrays of arrays__. Which leads neatly onto:

### 4. A Tensor

Recap: 

- A number can be thought of as a __0-Dimensional array__ (a point, a single number): `1`
- A series of numbers, or a vector, is a __1-Dimensional array__: `[1, 2, 3, 4, 5]`
- A matrix is a series of arrays, which is a __2-Dimensional array__: `[[1,2],[3,4]]`
- So... a Tensor... is a series of matrices! Or an __N-Dimensional array__. You can think of it as stacked matrices:

![A Tensor](./images/tensor.svg)
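
The recap above maps directly onto NumPy's `.ndim` (number of dimensions) and `.shape` (length of each dimension). A quick sketch, with variable names chosen just for this example:

```python
import numpy as np

scalar = np.array(1)                   # 0-D: a single number
vector = np.array([1, 2, 3, 4, 5])     # 1-D: a list of numbers
matrix = np.array([[1, 2], [3, 4]])    # 2-D: a list of lists
tensor = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]]])  # 3-D: a stack of two 2x2 matrices

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    # .ndim counts the dimensions; .shape lists the length of each one
    print(name, arr.ndim, arr.shape)
```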

So if we look again at our single matrix:

In [None]:
mat = [[-1.3, 0.6],
       [20.4, 5.5],
       [ 9.7,-6.2]]

We now want to stack three of those on top of one another. How? We just need to add another dimension to our list (I will admit this is where it can get confusing):

In [None]:
tensor = [[[-1.3, 0.6],
           [20.4, 5.5],
           [ 9.7,-6.2]],
          [[-1.3, 0.6],
           [20.4, 5.5],
           [ 9.7,-6.2]],
          [[-1.3, 0.6],
           [20.4, 5.5],
           [ 9.7,-6.2]]]

print(tensor)

Now, I appreciate this is not easy to read. It is rare that you will ever need to declare a tensor like this, but it is important to understand this structure. All you really need to be able to do is to understand which __dimension__ refers to what, and how you can index each dimension to get to the data you need.

I'm going to make a simpler tensor for demonstration purposes:

In [None]:
simple_tensor = [[[0, 0],
                  [0, 0],
                  [0, 0]],
                 [[1, 1],
                  [1, 1],
                  [1, 1]],
                 [[2, 2],
                  [2, 2],
                  [2, 2]]]

print(simple_tensor)

The first dimension, in this case, refers to each particular matrix:

In [None]:
print(simple_tensor[0])
print(simple_tensor[1])
print(simple_tensor[2])

The second dimension is then the __row__ in that matrix:

In [None]:
print(simple_tensor[0][1])

The third dimension is then the __column__ in that row:

In [None]:
print(simple_tensor[0][1][1]) # Col 1, of row 1, of matrix 0

Given that tensors are __N-Dimensional__, there is really no limit to the number of dimensions one could have. The best advice I can give right now is to try to think in terms of stacked matrices, and then possibly stacked tensors.
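
The "stacked matrices, then stacked tensors" idea can be sketched with `np.stack`, which joins arrays of the same shape along a new first axis. This is an illustrative addition, and the names `mat_np`, `tensor_3d` and `tensor_4d` are hypothetical:

```python
import numpy as np

mat_np = np.array([[-1.3, 0.6],
                   [20.4, 5.5],
                   [ 9.7, -6.2]])  # shape (3, 2)

# Stacking three matrices along a new first axis gives a 3-D tensor
tensor_3d = np.stack([mat_np, mat_np, mat_np])   # shape (3, 3, 2)

# Stacking two of those 3-D tensors gives a 4-D tensor
tensor_4d = np.stack([tensor_3d, tensor_3d])     # shape (2, 3, 3, 2)

print(tensor_3d.shape, tensor_3d.ndim)  # (3, 3, 2) 3
print(tensor_4d.shape, tensor_4d.ndim)  # (2, 3, 3, 2) 4

# Each index peels off one dimension: tensor -> matrix -> row
print(tensor_4d[1].shape)     # (3, 3, 2)
print(tensor_4d[1, 0].shape)  # (3, 2)
print(tensor_4d[1, 0, 2].tolist())  # [9.7, -6.2], i.e. row 2 of that matrix
```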

Let's look at this same example in NumPy:

In [None]:
# Convert our simple_tensor into a NumPy array
np_tensor = np.array(simple_tensor)

print(np_tensor)

In [None]:
print(np_tensor.shape)

Just to illustrate the point, here are some big __N__-Dimensional tensors (or, arrays).

> NB. The first argument to `np.empty()` is the _shape_ of the N-d array you want to make.

In [None]:
a = np.empty((64,16,16,3))
print(a.shape)

b = np.empty((1,1,1,1,8,1))
print(b.shape)

# The following will probably give you a memory error! It did for me, anyway.

# c = np.empty((8,8,8,8,8,128,256,8,8))

# It is a BIG N-d array! Each entry is a 64-bit float, so we are trying to allocate:
# 8*8*8*8*8*128*256*8*8*64 (bits) = 2^42 bits ~= 4.398*10^12 bits = 512 GiB, lol
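
You can check that arithmetic without trying to allocate anything: the number of elements is the product of the shape, and `np.dtype(...).itemsize` gives the bytes per element. A small sketch (the variable names are just for illustration), which also shows that switching to `float32` would halve the memory:

```python
import numpy as np

shape = (8, 8, 8, 8, 8, 128, 256, 8, 8)

# Number of elements = product of all the dimensions in the shape
num_elements = int(np.prod(shape, dtype=np.int64))

# np.empty defaults to float64: 8 bytes (64 bits) per element
bytes_per_element = np.dtype(np.float64).itemsize
total_bytes = num_elements * bytes_per_element

# The same shape in float32 needs only 4 bytes per element
total_bytes_f32 = num_elements * np.dtype(np.float32).itemsize

print(num_elements)               # 68719476736, which is 2**36
print(total_bytes / 2**30)        # 512.0 (GiB)
print(total_bytes_f32 / 2**30)    # 256.0 (GiB)
```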

OK this might seem a bit boring but honestly, you're messing about with N-d arrays probably 90% of the time when working with ML frameworks or datasets.

Here are a few more NumPy array methods which will come in handy:

In [None]:
# np.empty does not initialise values, which technically can be faster,
# but you get whatever junk is left in memory at that location.
empty = np.empty((10,2))
print(empty)

# np.zeros/np.ones fills the values with 0's or 1's respectively.
zeros = np.zeros((2,3))
print(zeros)
ones = np.ones((2,3))
print(ones)

# .fill lets you fill an array with whatever you want
my_fill = np.empty((4,2))
my_fill.fill(28)
print(my_fill)

# np.full combines the above 2 operations
full = np.full((2, 4), 32)
print(full)

# Most NumPy array creation functions accept a dtype argument to specify
# the datatype you need: float, int, uint8, float64... etc.
# Can you guess why the array is full of 44's despite me specifying 300...?
# (Depending on your NumPy version, this line may raise an OverflowError instead.)
uint8_array = np.full((3, 2), 300, dtype=np.uint8)
print(uint8_array)

Take a look at the [NumPy array creation routines](https://numpy.org/doc/stable/reference/routines.array-creation.html) for many more.
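
If you want to check your guess about the 44's: a `uint8` can only hold whole numbers from 0 to 255, and casting a larger integer down to `uint8` keeps only the lowest 8 bits, which is the same as taking the value modulo 256. This sketch uses `.astype()` on an `int64` array (rather than `np.full`) to show the wrap-around; `big` and `wrapped` are illustrative names:

```python
import numpy as np

# The range of values a uint8 can represent
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255

# Casting int64 values down to uint8 keeps only the lowest 8 bits,
# i.e. each value wraps around modulo 256
big = np.array([255, 256, 300, 511], dtype=np.int64)
wrapped = big.astype(np.uint8)

print(wrapped.tolist())  # [255, 0, 44, 255]
print(300 % 256)         # 44
```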

> ### Task!
> Open up `./tasks/01_building_n-d_arrays.ipynb` and work through the tasks there.
>
> They get progressively harder, naturally, and some may seem a little off topic, but they're all good exercises to get you better at working with N-d arrays.

## What does a machine learning library like Tensorflow do?

> We're going to be using TensorFlow and Keras for much of this course. To be honest, though, all ML frameworks seem to be converging on a similar workflow, so the skills you learn in one framework are transferable.

At its heart, TensorFlow/PyTorch/Keras are just frameworks for manipulating N-d arrays, much like NumPy. NumPy is really great, so why don't we just use that?

Well, ML frameworks give us a few more things:

- They can leverage hardware accelerators such as GPUs and TPUs.
- They can automatically compute the gradient of arbitrary differentiable tensor expressions.
- They allow computation to be distributed across many devices on a single machine, and across many machines (potentially with multiple devices each).

Until we start making models, working with tensors in any ML framework is a lot like working with NumPy. But the syntax can be a bit more obtuse.
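
As a rough side-by-side, here is the same small calculation in NumPy and in TensorFlow (assuming TensorFlow 2.x running eagerly, which is the default). `tf.convert_to_tensor` turns a NumPy array into a tensor, `tf.reduce_sum` plays the role of `np.sum`, and `.numpy()` converts back; the variable names are only for this example:

```python
import numpy as np
import tensorflow as tf

np_array = np.array([[1.0, 2.0], [3.0, 4.0]])

# NumPy arrays can be converted into TensorFlow tensors (dtype is kept: float64)
tf_tensor = tf.convert_to_tensor(np_array)

# The same operation in both libraries: double every value, then sum each column
np_result = np.sum(np_array * 2, axis=0)
tf_result = tf.reduce_sum(tf_tensor * 2, axis=0)

print(np_result.tolist())          # [8.0, 12.0]
print(tf_result.numpy().tolist())  # [8.0, 12.0]
```

The main visible differences are the function names (`reduce_sum` rather than `sum`) and the need for `.numpy()` when you want plain values back.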

It is rare that you will need to declare tensors explicitly, but it is good to know how, and also how to access parts of tensors. It also highlights some fundamental details about how these frameworks actually work.

You can declare a tensor in TensorFlow in a number of ways, much like NumPy. Something to bear in mind is that a TensorFlow tensor holds much more information about its state than a simple N-d array. This is what allows us to build complex computation graphs (models) and hand them over to TensorFlow to perform complex operations like automatic differentiation and backpropagation.

With that in mind, some values will be mutable (changeable), while others should be immutable (never change), so we can declare a tensor as either a `constant` or a `Variable`.

#### Constants

In [None]:
constant_tensor = tf.constant([[5, 2], [1, 3]])
print(constant_tensor)

In [None]:
# We can use familiar methods:
print(constant_tensor.shape)

In [None]:
# and access elements:
print(constant_tensor[0][1])

In [None]:
# Note that even individual elements in the tensor are also tensors!
# To get the simple number we use .numpy() to convert to a NumPy array:
value = constant_tensor[0][1].numpy()
print(value)

# You can also use the following syntax if you prefer:
value = constant_tensor[0, 1].numpy()
print(value)

Much like NumPy we can also declare tensors with `.ones()` and `.zeros()`.

> NB. These methods return constant tensors.

In [None]:
print(tf.ones(shape=(2, 1)))
print(tf.zeros(shape=(2, 1)))

Tensorflow also has its fair share of random functions:

In [None]:
# Random numbers from a random normal/Gaussian distribution:
a = tf.random.normal(shape=(2, 2), mean=0.0, stddev=1.0)
print(a)

# Random numbers from a uniform distribution of specified limits:
b = tf.random.uniform(shape=(2, 2), minval=0, maxval=10, dtype="int32")
print(b)
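
Two details worth knowing about these random functions, sketched below under the assumption of TensorFlow 2.x in eager mode. First, `tf.random.set_seed` makes the random values repeatable: as we read the TensorFlow documentation, calling it again resets the internal counters, so the same sequence comes out. Second, for integer dtypes `tf.random.uniform` samples from the half-open range `[minval, maxval)`, so `maxval` itself is never returned. The names `first`, `second` and `ints` are illustrative:

```python
import tensorflow as tf

# Resetting the global seed makes the sequence of random values repeatable
tf.random.set_seed(42)
first = tf.random.normal(shape=(2, 2))

tf.random.set_seed(42)
second = tf.random.normal(shape=(2, 2))

print(bool(tf.reduce_all(first == second)))  # True

# Integer samples are drawn from [minval, maxval): 10 never appears here
ints = tf.random.uniform(shape=(1000,), minval=0, maxval=10, dtype="int32")
print(int(tf.reduce_min(ints)), int(tf.reduce_max(ints)))
```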

#### Variables

Variables are special tensors used to store mutable state (such as the weights of a neural network). You create a Variable using some initial value:

In [None]:
initial_value = tf.random.normal(shape=(2, 2))
my_variable = tf.Variable(initial_value)
print(my_variable)

You update the value of a Variable by using the methods `.assign(value)`, `.assign_add(increment)`, or `.assign_sub(decrement)`:

> Again, you probably will never do this, but Tensorflow is doing this kinda thing behind the scenes.

In [None]:
new_value = tf.random.normal(shape=(2, 2))
my_variable.assign(new_value)
for i in range(2):
    for j in range(2):
        assert my_variable[i, j] == new_value[i, j]

added_value = tf.random.normal(shape=(2, 2))
my_variable.assign_add(added_value)
for i in range(2):
    for j in range(2):
        assert my_variable[i, j] == new_value[i, j] + added_value[i, j]
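
`.assign_sub()` is mentioned above but not used in that cell. Here is a tiny sketch with a hypothetical integer `counter` variable, where the arithmetic is exact and easy to check by hand:

```python
import tensorflow as tf

# An integer Variable (dtype int32), updated in place
counter = tf.Variable(10)

counter.assign_sub(3)  # 10 - 3 = 7
counter.assign_add(1)  # 7 + 1 = 8

print(counter.numpy())  # 8
```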

You cannot do the same as above on a _constant_ tensor:

In [None]:
# This will give you an error
new_value = tf.random.normal(shape=constant_tensor.shape)
constant_tensor.assign(new_value)
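
The error happens because a constant tensor simply has no `.assign()` method. If you need an updatable copy, you can wrap the constant's value in a `tf.Variable`; the original constant stays unchanged. A small sketch (the names `const` and `var` are illustrative):

```python
import tensorflow as tf

const = tf.constant([[5, 2], [1, 3]])

# Constant tensors have no assign method at all
print(hasattr(const, "assign"))  # False

# Wrapping the value in a Variable gives us something we can update
var = tf.Variable(const)
var.assign([[0, 0], [0, 0]])

print(var.numpy().tolist())    # [[0, 0], [0, 0]]
print(const.numpy().tolist())  # [[5, 2], [1, 3]], the original is untouched
```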

Maths operations are again very similar to how you would do them with NumPy. Remember that these operations are being performed on N-d arrays.

In [None]:
d = tf.random.normal(shape=(2, 2))
e = tf.random.normal(shape=(2, 2))

f = d + e
h = tf.square(f)
i = tf.exp(h)

print(i)

The element-wise nature of the operations might be more clear like so:

In [None]:
a = tf.constant([[1,2],[3,4]])
b = tf.constant([[5,6],[7,8]])

print(a * b)

Whereas matrix multiplication looks like so:

In [None]:
print(a @ b)
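
To see where the numbers from `@` come from: entry `(i, j)` of a matrix product is the dot product of row `i` of the first matrix with column `j` of the second. Below is a deliberately naive pure-Python version (`naive_matmul` is a hypothetical helper written for illustration, not a library function) checked against NumPy's `*` and `@`:

```python
import numpy as np

a_list = [[1, 2], [3, 4]]
b_list = [[5, 6], [7, 8]]

def naive_matmul(x, y):
    """Entry (i, j) is the dot product of row i of x and column j of y."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [[sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

# e.g. top-left entry: 1*5 + 2*7 = 19
print(naive_matmul(a_list, b_list))                         # [[19, 22], [43, 50]]
print((np.array(a_list) @ np.array(b_list)).tolist())       # [[19, 22], [43, 50]]

# Element-wise multiplication just multiplies matching positions
print((np.array(a_list) * np.array(b_list)).tolist())       # [[5, 12], [21, 32]]
```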

#### Gradients

Here's another big difference from NumPy: you can automatically retrieve the gradient of any differentiable expression. Our gradient descent algorithms use these gradients to backpropagate the error signal through the model, and therefore to update the weights during training.

Being able to track which tensor connects to which, and then pass the gradients backwards through the model, is the key difference between TensorFlow tensors and NumPy arrays. If this all seems a bit complicated, don't worry: it can be, but TensorFlow is designed to handle all of this for us, so we usually don't have to think about it!
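
To get a feel for what "the gradient of an expression" means, here is a NumPy-only sketch (an illustration, not how TensorFlow computes gradients) using the same expression as the next cell, c = sqrt(a² + b²). Calculus says dc/da = a / c, and we can check that numerically with a central finite difference: nudge `a` up and down by a tiny `h` and see how much `c` changes. The function `c_fn` and the input values are made up for this example:

```python
import numpy as np

def c_fn(a, b):
    # c = sqrt(a^2 + b^2), applied element-wise
    return np.sqrt(np.square(a) + np.square(b))

a_vals = np.array([[3.0, -1.0], [0.5, 2.0]])
b_vals = np.array([[4.0, 2.0], [1.5, -2.0]])

# Analytical derivative: dc/da = a / sqrt(a^2 + b^2) = a / c
analytic = a_vals / c_fn(a_vals, b_vals)

# Central finite difference; each c only depends on its own a,
# so nudging every element at once still gives element-wise derivatives
h = 1e-6
numeric = (c_fn(a_vals + h, b_vals) - c_fn(a_vals - h, b_vals)) / (2 * h)

print(analytic[0, 0])                             # 0.6, since 3 / sqrt(3^2 + 4^2) = 3 / 5
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```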

In [None]:
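a = tf.Variable(d)
b = tf.Variable(e)

# Operations on a and b inside the `with` block are recorded by the tape
with tf.GradientTape() as tape:
    c = tf.sqrt(tf.square(a) + tf.square(b))

# Ask the tape for dc/da once we have left the `with` block
dc_da = tape.gradient(c, a)
print(dc_da)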
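
Because this particular expression has a simple closed-form derivative, dc/da = a / c, we can check the tape's answer directly. A sketch with fixed, made-up input values (`a_var`, `b_var`) instead of the random `d` and `e`, so the result is predictable; it assumes TensorFlow 2.x:

```python
import tensorflow as tf

a_var = tf.Variable([[3.0, -1.0], [0.5, 2.0]])
b_var = tf.Variable([[4.0, 2.0], [1.5, -2.0]])

with tf.GradientTape() as tape:
    c = tf.sqrt(tf.square(a_var) + tf.square(b_var))

dc_da = tape.gradient(c, a_var)

# Closed-form derivative of sqrt(a^2 + b^2) with respect to a
expected = a_var / c

print(dc_da.numpy())
print(bool(tf.reduce_all(tf.abs(dc_da - expected) < 1e-6)))  # True
```

Since `c` is not a single number here, `tape.gradient` effectively sums over all of `c`'s elements; because each element of `c` depends only on the matching element of `a_var`, that sum is just the element-wise derivative.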