# Tensor: basic unit for data

A tensor is the basic unit of data in TensorFlow. It can be thought of as a multidimensional array:
- scalar (array with 0 dimensions)
- vector (array with 1 dimension)
- matrix (array with 2 dimensions)
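
As a small illustration of the list above, the sketch below builds one tensor of each kind and reads its number of dimensions. The variable names are chosen for illustration only, and `shape.rank` is just one way to read the dimension count.

```python
import tensorflow as tf

# One tensor of each kind from the list above (names are illustrative).
scalar = tf.constant(3.0)                  # 0 dimensions
vector = tf.constant([1.0, 2.0, 3.0])      # 1 dimension
matrix = tf.constant([[1.0, 2.0],
                      [3.0, 4.0]])         # 2 dimensions

# Collect the number of dimensions (rank) of each tensor.
ranks = [t.shape.rank for t in (scalar, vector, matrix)]
print(ranks)  # [0, 1, 2]
```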

In [1]:
import tensorflow as tf

In [2]:
random_float = tf.random.uniform(shape=())             # Declare a random float (scalar).
print(random_float)

tf.Tensor(0.38893163, shape=(), dtype=float32)


In [3]:
zero_vector = tf.zeros(shape=(2,))                     # Declare a zero vector with two elements (default dtype: float32).
print(zero_vector)
zero_vector1 = tf.zeros(shape=(2,), dtype=tf.int32)    # Same zero vector, with the dtype changed from float32 to int32.
print(zero_vector1)

tf.Tensor([0. 0.], shape=(2,), dtype=float32)
tf.Tensor([0 0], shape=(2,), dtype=int32)


In [4]:
A = tf.constant([[1., 2.], [3., 4.]])                  # Declare two 2×2 constant matrices, A and B.
print(A)
B = tf.constant([[5., 6.], [7., 8.]])
print(B)

tf.Tensor(
[[1. 2.]
 [3. 4.]], shape=(2, 2), dtype=float32)
tf.Tensor(
[[5. 6.]
 [7. 8.]], shape=(2, 2), dtype=float32)


A tensor has three important attributes: shape, data type and value. You can read them with the shape attribute, the dtype attribute and the numpy() method. For example:

In [5]:
# View the shape, type and value of matrix A.
print(A.shape)      
print(A.dtype)      
print(A.numpy())     # The numpy() method returns a NumPy array whose values equal those of the tensor.

(2, 2)
<dtype: 'float32'>
[[1. 2.]
 [3. 4.]]


In [6]:
C = tf.add(A, B)            # Compute the elementwise sum of A and B.
D = tf.matmul(A, B)         # Compute the matrix product of A and B.
print(C)
print(D)

tf.Tensor(
[[ 6.  8.]
 [10. 12.]], shape=(2, 2), dtype=float32)
tf.Tensor(
[[19. 22.]
 [43. 50.]], shape=(2, 2), dtype=float32)
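
Note that tf.matmul() is a matrix product, not an elementwise product. The sketch below contrasts the two on the same matrices A and B; the names `elementwise` and `matrix_prod` are illustrative.

```python
import tensorflow as tf

A = tf.constant([[1., 2.], [3., 4.]])
B = tf.constant([[5., 6.], [7., 8.]])

elementwise = tf.multiply(A, B)   # same as A * B: multiplies entries in matching positions
matrix_prod = tf.matmul(A, B)     # same as A @ B: rows of A times columns of B

print(elementwise.numpy())        # entries 5, 12, 21, 32
print(matrix_prod.numpy())        # entries 19, 22, 43, 50
```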


# Automatic differentiation mechanism
In machine learning, we often need to compute derivatives of functions. TensorFlow provides a powerful automatic differentiation mechanism for this purpose. The following code shows how to use tf.GradientTape() to compute the derivative of the function $y(x) = x^2$ at $x = 3$:

In [7]:
import tensorflow as tf

# By default, a tf.Variable is tracked by TensorFlow's automatic differentiation mechanism,
# so variables are typically used to define the parameters of ML models.
x = tf.Variable(initial_value=3.)

# All computation steps inside the tf.GradientTape() context are recorded for differentiation.
with tf.GradientTape() as tape:
    y = tf.square(x)
    
# Compute the derivative of y with respect to x.
y_grad = tape.gradient(y, x)        
print([y, y_grad])

[<tf.Tensor: id=28, shape=(), dtype=float32, numpy=9.0>, <tf.Tensor: id=32, shape=(), dtype=float32, numpy=6.0>]
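
The gradient 6.0 agrees with the analytic derivative $y'(x) = 2x$ at $x = 3$. As an independent cross-check, here is a minimal sketch that approximates the same derivative with a central difference. The helper `numerical_derivative` and its step size `h` are assumptions made for this illustration, not part of TensorFlow.

```python
def numerical_derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference (h is an assumed step size)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def square(x):
    return x * x

approx = numerical_derivative(square, 3.0)
print(approx)  # very close to 6.0, up to floating-point error
```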


The more common case in machine learning is partial differentiation of multivariable functions as well as differentiation of vectors and matrices.

The following code shows how to use tf.GradientTape() to compute the partial derivatives of the function

$$L(w, b) = \frac{1}{2}\|Xw + b - y\|^2$$

with respect to $w$ and $b$, where

$$X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad y = \begin{bmatrix} 1 \\ 2\end{bmatrix}$$

and the derivatives are evaluated at

$$w = (1, 2)^T, \quad b = 1$$

In [8]:
X = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1.], [2.]])
w = tf.Variable(initial_value = [[1.], [2.]])
b = tf.Variable(initial_value = 1.)

with tf.GradientTape() as tape:
    L = 0.5 * tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
    
w_grad, b_grad = tape.gradient(L, [w, b])              # Compute the partial derivative of L(w, b) with respect to w and b.
print([L.numpy(), w_grad.numpy(), b_grad.numpy()])

[62.5, array([[35.],
       [50.]], dtype=float32), 15.0]


From the output we can see that TensorFlow has computed:

$$L((1, 2)^T, 1) = 62.5$$

$$\frac{\partial L(w, b)}{\partial w}|_{w = (1, 2)^T, b = 1} = \begin{bmatrix} 35 \\ 50\end{bmatrix}$$

$$\frac{\partial L(w, b)}{\partial b} |_{w = (1, 2)^T, b = 1} = 15$$
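
These values can be cross-checked by hand. For the loss as implemented in the code above (which includes the factor 0.5), the gradients are $X^T r$ with respect to $w$ and the sum of the entries of $r$ with respect to $b$, where $r = Xw + b - y$ is the residual. The following NumPy sketch evaluates these expressions; the variable names are chosen for illustration.

```python
import numpy as np

X = np.array([[1., 2.], [3., 4.]])
y = np.array([[1.], [2.]])
w = np.array([[1.], [2.]])
b = 1.0

r = X @ w + b - y                 # residual: [[5.], [10.]]
loss = 0.5 * np.sum(r ** 2)       # 0.5 * (25 + 100) = 62.5
w_grad = X.T @ r                  # [[35.], [50.]]
b_grad = np.sum(r)                # 15.0

print(loss, w_grad.ravel(), b_grad)
```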

In [9]:
# tf.square() squares each element of the input tensor without changing its shape.
# tf.reduce_sum() sums all the elements of the input tensor and returns a scalar tensor with an empty shape ().
# (The axis parameter selects the dimensions to sum over; if it is omitted, all elements are summed.)
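
To make the role of the axis parameter concrete, the sketch below sums a small 2×3 tensor in three ways. The tensor `t` and the result names are illustrative.

```python
import tensorflow as tf

t = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

total = tf.reduce_sum(t)              # all elements -> scalar with shape ()
col_sums = tf.reduce_sum(t, axis=0)   # sum over the rows -> shape (3,)
row_sums = tf.reduce_sum(t, axis=1)   # sum over the columns -> shape (2,)

print(total.numpy())      # 21.0
print(col_sums.numpy())   # entries 5, 7, 9
print(row_sums.numpy())   # entries 6, 15
```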

# A basic example: Linear regression

Consider a practical problem. The real estate prices of a city between 2013 and 2017 are listed below:

| Year | 2013 | 2014 | 2015 | 2016 | 2017 |
| --- | --- | --- | --- | --- | --- |
| Price | 12000 | 14000 | 15000 | 16500 | 17500 |

Now we wish to perform a linear regression on this data, that is, to fit the data above with the linear model $y = ax + b$, where $a$ and $b$ are parameters yet to be determined.

## Define the data and conduct basic normalization

In [10]:
import numpy as np

X_raw = np.array([2013, 2014, 2015, 2016, 2017], dtype = np.float32)
y_raw = np.array([12000, 14000, 15000, 16500, 17500], dtype = np.float32)

X = (X_raw - X_raw.min()) / (X_raw.max() - X_raw.min())                     # Min-max normalize X to the range [0, 1].
print(X)

y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())                     # Min-max normalize y in the same way.
print(y)

[0.   0.25 0.5  0.75 1.  ]
[0.         0.36363637 0.54545456 0.8181818  1.        ]
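
Because the model is fitted on normalized values, its predictions have to be mapped back to the original units before they can be read as prices. A minimal sketch of the scaling and its inverse follows; the helpers `min_max_normalize` and `min_max_restore` are hypothetical names introduced here for illustration.

```python
import numpy as np

def min_max_normalize(values):
    """Scale values to [0, 1]; also return (min, max) so the scaling can be undone."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), (lo, hi)

def min_max_restore(scaled, bounds):
    """Invert min_max_normalize using the stored (min, max) pair."""
    lo, hi = bounds
    return scaled * (hi - lo) + lo

y_raw = np.array([12000, 14000, 15000, 16500, 17500], dtype=np.float64)
y_scaled, y_bounds = min_max_normalize(y_raw)
y_back = min_max_restore(y_scaled, y_bounds)

print(y_scaled)   # values between 0 and 1
print(y_back)     # the original prices again, up to rounding
```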


## Gradient descent to find the parameters a and b.

To find a <font color=red>local minimum</font> of a multivariable function $f(x)$, the process of gradient descent is as follows:

- Initialize the independent variable to $x_0$, $k=0$.
- Iterate the following steps until the convergence criterion is met:
    - Find the gradient $\nabla f(x_k)$ of the function $f(x)$ with respect to the independent variable.

    - Update the independent variable: $x_{k+1} = x_{k} - \gamma \nabla f(x_k)$, where $\gamma$ is the learning rate (i.e., the step size of one gradient descent update).

    - $k \leftarrow k+1$.
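
The steps above can be sketched as a short generic loop. This is an illustration under stated assumptions rather than a definitive implementation: the function name `gradient_descent`, the stopping rule (gradient norm below `tol`) and the test function $f(x) = \|x - c\|^2$ are all choices made for this example.

```python
import numpy as np

def gradient_descent(grad_f, x0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Iterate x <- x - learning_rate * grad_f(x) until the gradient is tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)                    # gradient at the current point
        if np.linalg.norm(g) < tol:      # assumed convergence criterion
            break
        x = x - learning_rate * g        # update step
    return x

# Example: f(x) = ||x - c||^2 has gradient 2 (x - c) and its minimum at x = c.
c = np.array([1.0, -2.0])
x_min = gradient_descent(lambda x: 2.0 * (x - c), x0=[0.0, 0.0])
print(x_min)  # close to [1, -2]
```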

Next, we implement gradient descent in code to solve the linear regression problem

$$\min_{a, b} L(a, b) = \frac{1}{2}\sum_{i=1}^n(ax_i + b - y_i)^2$$

## Linear regression with NumPy
- np.dot(A, B) <--> A.dot(B): the dot product. For 2-D arrays this is matrix multiplication; for 1-D arrays it is the inner product.
- A * B: elementwise multiplication of the entries in corresponding positions.
- np.sum(): the sum of all elements.
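
The three operations listed above behave as follows on small arrays. The arrays `u`, `v`, `M` and `N` are illustrative.

```python
import numpy as np

u = np.array([1., 2., 3.])
v = np.array([4., 5., 6.])

print(u * v)           # elementwise product: [ 4. 10. 18.]
print(np.dot(u, v))    # inner product of two 1-D arrays: 32.0
print(u.dot(v))        # method form of the same operation: 32.0
print(np.sum(u * v))   # summing the elementwise product gives the same value: 32.0

M = np.array([[1., 2.], [3., 4.]])
N = np.array([[5., 6.], [7., 8.]])
print(np.dot(M, N))    # for 2-D arrays, np.dot is matrix multiplication
```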

In [11]:
# (1) initialize the parameters a and b
a, b = 0, 0

# (2) initialize the number of training epochs and the learning rate
num_epoch = 10000
learning_rate = 1e-3

# (3) training
for e in range(num_epoch):
    # (3.1) compute predicted value
    y_pred = a * X + b
    
    # (3.2) Manually compute the gradient of the loss function with respect to the model parameters a and b.
    grad_a, grad_b = (y_pred - y).dot(X), (y_pred - y).sum()

    # (3.3) Update parameters.
    a, b = a - learning_rate * grad_a, b - learning_rate * grad_b

print(a, b)

0.9763702027872221 0.057564988311377796
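
For a problem this small, the least-squares line can also be computed directly, which gives a reference point for the gradient descent result. The sketch below uses np.polyfit for that purpose; it is a swapped-in closed-form technique, not the method used above, and the names `a_ls` and `b_ls` are illustrative.

```python
import numpy as np

X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_raw = np.array([12000, 14000, 15000, 16500, 17500], dtype=np.float64)
y = (y_raw - y_raw.min()) / (y_raw.max() - y_raw.min())

# Fit a degree-1 polynomial y = a * x + b by least squares.
a_ls, b_ls = np.polyfit(X, y, deg=1)
print(a_ls, b_ls)  # roughly 0.9818 and 0.0545
```

The values printed by the gradient descent loop above (about 0.976 and 0.058) are close to this solution but not identical, which is consistent with gradient descent not having fully converged after 10000 steps at this learning rate.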


However, you may have noticed two pain points when implementing ML models with conventional scientific computing libraries:

- You often have to <font color=red>derive the partial derivatives with respect to the parameters yourself</font>. This may be easy for simple functions, but it becomes very painful or even infeasible once the functions get complex.

- You often have to <font color=red>update the parameters from the derivative results yourself</font>. Here we used gradient descent, the most basic approach, so updating the parameters was not hard. However, the process becomes much more complicated with more advanced update rules (e.g., Adam or Adagrad).

The emergence of DL frameworks such as TensorFlow has largely solved these problems and made implementing ML models considerably more convenient.

## Linear regression with TensorFlow
TensorFlow's eager execution mode works much like the NumPy code above, while providing features that are crucial for deep learning, such as faster computation (GPU support), automatic differentiation and optimizers. Here TensorFlow accomplishes two crucial tasks for us:

- Using tape.gradient(ys, xs) to compute the gradients automatically
- Using optimizer.apply_gradients(grads_and_vars) to update the model parameters automatically

In [12]:
# (0) data
X = tf.constant(X)
y = tf.constant(y)

# (1) initialize the parameters a and b
a = tf.Variable(initial_value = 0.)
b = tf.Variable(initial_value = 0.)
variables = [a, b]

# (2) initialize the number of training epochs, the learning rate and the optimizer
num_epoch = 10000
optimizer = tf.keras.optimizers.SGD(learning_rate = 1e-3)

# (3) training
for e in range(num_epoch):
    # (3.1) Use tf.GradientTape() to record the computation of the loss so that its gradient can be derived.
    with tf.GradientTape() as tape:
        y_pred = a * X + b
        loss = 0.5 * tf.reduce_sum(tf.square(y_pred - y))
        
    # (3.2) TensorFlow computes the gradients of the loss function with respect to independent variables (model parameters) automatically.
    grads = tape.gradient(loss, variables)
    
    # (3.3) TensorFlow updates parameters according to the gradient automatically.
    optimizer.apply_gradients(grads_and_vars = zip(grads, variables))

print(a, b)

<tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.97637> <tf.Variable 'Variable:0' shape=() dtype=float32, numpy=0.057565063>


- tf.keras.optimizers.SGD(learning_rate=1e-3): a gradient descent optimizer. It updates the model parameters based on the computed gradients, thereby minimizing the loss function.
- optimizer.apply_gradients: the method that performs this update.
- grads_and_vars: the variables to be updated (variables in the code above) together with the partial derivatives of the loss function with respect to them (grads in the code above).
- Specifically, you need to pass in a Python list (or any iterable, such as the result of zip()) whose elements are (partial derivative, variable) pairs, e.g., [(grad_a, a), (grad_b, b)] in this case.
- grads = tape.gradient(loss, variables): the partial derivatives of loss, as recorded in tape, with respect to each variable in variables = [a, b], i.e., grads = [grad_a, grad_b]. Python's zip() function then pairs grads = [grad_a, grad_b] with variables = [a, b] to produce the argument we need.
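
To show what the optimizer does with these pairs, the sketch below performs a single training step in two ways: once with optimizer.apply_gradients() and once by hand with assign_sub(). It assumes plain SGD without momentum, for which the update is `variable -= learning_rate * gradient`; the helper `one_step_gradients` and the variable names are hypothetical.

```python
import tensorflow as tf

X = tf.constant([0.0, 0.25, 0.5, 0.75, 1.0])
y = tf.constant([0.0, 0.36363637, 0.54545456, 0.8181818, 1.0])
learning_rate = 1e-3

def one_step_gradients(a, b):
    """Return [dL/da, dL/db] for the loss used in the training loop above."""
    with tf.GradientTape() as tape:
        loss = 0.5 * tf.reduce_sum(tf.square(a * X + b - y))
    return tape.gradient(loss, [a, b])

# Path 1: let the optimizer consume the (gradient, variable) pairs.
a1, b1 = tf.Variable(0.0), tf.Variable(0.0)
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
optimizer.apply_gradients(zip(one_step_gradients(a1, b1), [a1, b1]))

# Path 2: apply the plain SGD rule by hand to each pair.
a2, b2 = tf.Variable(0.0), tf.Variable(0.0)
for grad, var in zip(one_step_gradients(a2, b2), [a2, b2]):
    var.assign_sub(learning_rate * grad)

print(a1.numpy(), a2.numpy())   # the two values should match
print(b1.numpy(), b2.numpy())   # the two values should match
```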