# **Introduction to Tensors and Variables**

**Learning objectives**

1. Understand basic and advanced `tensor` concepts
2. Understand single-axis and multi-axis indexing
3. Create `tensors` and `Variables`

## **Introduction**

In this notebook, we look at tensors, which are multi-dimensional arrays with a uniform type (called a `dtype`). Tensors are (kind of) like NumPy arrays (`np.array`). All tensors are immutable, like Python numbers and strings: you can never update the contents of a tensor, only create a new one.

We also look at variables. A `tf.Variable` represents a tensor whose value can be changed by running operations (ops) on it. Specific ops allow you to read and modify the value of this tensor. Higher-level libraries like `tf.keras` use `tf.Variable` to store model parameters.
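The difference between an immutable tensor and a mutable variable can be sketched as follows. This is a minimal illustration that assumes eager execution (the TensorFlow 2 default); the names `t`, `t_plus_one` and `v` are chosen here for the example only.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Tensors do not support item assignment
try:
    t[0] = 10
except TypeError as e:
    print("Tensor is immutable:", e)

# An op on a tensor returns a new tensor; the original is unchanged
t_plus_one = t + 1
print(t.numpy(), t_plus_one.numpy())

# A variable, in contrast, can be updated in place
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])
v.assign_add([1, 1, 1])
print(v.numpy())
```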

## **Load necessary libraries**

Import necessary libraries

In [3]:
import tensorflow as tf
import numpy as np

print("TensorFlow version:", tf.version.VERSION)

TensorFlow version: 2.4.1


## **Basic and advanced tensor concepts**

### **Basics**

Let's create some basic tensors. 

#### **Tensor objects**
A **scalar** is a rank-0 tensor. A scalar contains a single value and no *axes*.

In [4]:
# A scalar tensor contains a single value and no axes
rank_0_tensor = tf.constant(4)
rank_0_tensor

<tf.Tensor: shape=(), dtype=int32, numpy=4>

A **vector** is a rank-1 tensor. A vector is like a list of values; it has 1 axis.

In [5]:
# A vector tensor has 1 axis
rank_1_tensor = tf.constant([2.0, 3.0, 4.0])
rank_1_tensor

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([2., 3., 4.], dtype=float32)>

A **matrix** is a rank-2 tensor. It has 2 axes.

In [6]:
# A matrix tensor has 2 axes
# dtype can be specified at creation
rank_2_tensor = tf.constant([[1, 2],
                             [3, 4],
                             [5, 6]], dtype=tf.float16)
rank_2_tensor

<tf.Tensor: shape=(3, 2), dtype=float16, numpy=
array([[1., 2.],
       [3., 4.],
       [5., 6.]], dtype=float16)>

<center><img src="img/tensors.png" width=500 height=500></center>

Tensors may have more axes. Here is a tensor with 3 axes:

In [7]:
# There can be an arbitrary number of axes
rank_3_tensor = tf.constant([
    [[0, 1, 2, 3, 4],
     [5, 6, 7, 8, 9]],
    [[10, 11, 12, 13, 14],
     [15, 16, 17, 18, 19]],
    [[20, 21, 22, 23, 24],
     [25, 26, 27, 28, 29]]
])

rank_3_tensor

<tf.Tensor: shape=(3, 2, 5), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]], dtype=int32)>

Tensors with more than 2 axes can be visualised in many ways:

<center><img src="img/multi-tensors.png" width=600 height=600></center>

A tensor can be converted to a NumPy array using either `np.array` or the `tensor.numpy()` method.

In [8]:
np.array(rank_2_tensor)

array([[1., 2.],
       [3., 4.],
       [5., 6.]], dtype=float16)

In [9]:
rank_2_tensor.numpy()

array([[1., 2.],
       [3., 4.],
       [5., 6.]], dtype=float16)

Tensors often contain `float` and `int` types, but may have many other types, including:
- complex numbers
- strings

The base `tf.Tensor` class requires tensors to be *rectangular* - that is, **along each axis, every element must be the same size**. However, some specialised types of tensors can handle different shapes:
- ragged tensors
- sparse tensors

#### **Basic math**

Basic math can be performed on tensors, including addition, element-wise multiplication, and matrix multiplication.

In [10]:
a = tf.constant([[1, 2],
                 [3, 4]])
b = tf.constant([[1, 1],
                 [1, 1]])

**Addition**

In [11]:
print(tf.add(a, b))

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)


In [12]:
print(a + b)

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)


**Element-wise multiplication**

In [13]:
print(tf.multiply(a, b))

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)


In [14]:
print(a * b)

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)


**Matrix multiplication**

In [15]:
print(tf.matmul(a, b))

tf.Tensor(
[[3 3]
 [7 7]], shape=(2, 2), dtype=int32)


In [16]:
print(a @ b)

tf.Tensor(
[[3 3]
 [7 7]], shape=(2, 2), dtype=int32)


**Tensor ops**

In [17]:
c = tf.constant([[4.0, 5.0], 
                 [10.0, 1.0]])

# Find the largest value
tf.reduce_max(c)

# Find the INDEX of the largest value in each column (`tf.argmax` reduces along axis 0 by default)
tf.argmax(c)

# Compute the softmax
tf.nn.softmax(c)

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[2.6894143e-01, 7.3105860e-01],
       [9.9987662e-01, 1.2339458e-04]], dtype=float32)>
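In a notebook cell only the last expression is displayed, so the results of `tf.reduce_max` and `tf.argmax` above are not shown. The sketch below prints each result and makes the reduction axis explicit. The tensor `m` is a hypothetical example chosen so that the two axes give different answers.

```python
import tensorflow as tf

m = tf.constant([[4.0, 5.0, 0.0],
                 [10.0, 1.0, 7.0]])

# Largest value over the whole tensor
overall_max = tf.reduce_max(m)

# Index of the largest value in each column (axis 0) and in each row (axis 1)
argmax_per_column = tf.argmax(m, axis=0)
argmax_per_row = tf.argmax(m, axis=1)

# Index of the largest value in the flattened tensor
flat_argmax = tf.argmax(tf.reshape(m, [-1]))

# Softmax is applied along the last axis, so each row sums to 1
row_sums = tf.reduce_sum(tf.nn.softmax(m), axis=-1)

print(overall_max.numpy())
print(argmax_per_column.numpy(), argmax_per_row.numpy(), flat_argmax.numpy())
print(row_sums.numpy())
```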

### **Tensor shapes**

Tensors have shapes. Some vocabulary:
- **Shape** - The length (number of elements) of each of the axes of a tensor
- **Rank** - Number of tensor axes. A **scalar** is rank-0, a **vector** is rank-1, a **matrix** is rank-2
- **Axis** or **Dimension** - A particular dimension of a tensor
- **Size** - Total number of items in a tensor, that is, the product of the elements of the shape vector

Tensors and `tf.TensorShape` objects have convenient properties for accessing these.

Note: although there may be references to *tensors of 2 dimensions*, a rank-2 tensor usually does not describe a 2D space.

In [18]:
# `tf.zeros` creates a tensor with all elements set to 0
rank_4_tensor = tf.zeros([3, 2, 4, 5])
rank_4_tensor

<tf.Tensor: shape=(3, 2, 4, 5), dtype=float32, numpy=
array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]],


       [[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]],


       [[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]]], dtype=float32)>

<center><img src="img/rank-4-tensor.png" width=500 height=500></center>

In [19]:
print("Type of every element:", rank_4_tensor.dtype)
print("Number of dimensions:", rank_4_tensor.ndim)
print("Shape of tensor:", rank_4_tensor.shape)
print("Elements along axis 0 of tensor:", rank_4_tensor.shape[0])
print("Elements along the last axis of tensor:", rank_4_tensor.shape[-1])
print("Total number of elements (3*2*4*5):", tf.size(rank_4_tensor).numpy())

Type of every element: <dtype: 'float32'>
Number of dimensions: 4
Shape of tensor: (3, 2, 4, 5)
Elements along axis 0 of tensor: 3
Elements along the last axis of tensor: 5
Total number of elements (3*2*4*5): 120
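The properties above are plain Python attributes. As a small sketch, the same information is also available as tensors through `tf.shape`, `tf.rank` and `tf.size`, and the size can be checked against the product of the shape. The variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

t = tf.zeros([3, 2, 4, 5])

static_shape = t.shape        # a tf.TensorShape, available on the Python side
dynamic_shape = tf.shape(t)   # a rank-1 integer tensor holding the same values
rank = tf.rank(t)             # a scalar integer tensor
size = tf.size(t)             # a scalar integer tensor

# The size is the product of the elements of the shape vector
expected_size = int(np.prod(static_shape.as_list()))

print(static_shape, dynamic_shape.numpy(), rank.numpy(), size.numpy(), expected_size)
```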


While axes are often referred to by their indices, you should always keep track of the meaning of each. **Axes are ordered from global to local**: 
1. **batch** axis
2. **spatial** axes
3. **features** for each location

This way feature vectors are contiguous regions of memory.

<center><img src="img/tf-axes.png" width=300 height=300></center>
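As an illustration of the global-to-local ordering, the sketch below assumes a hypothetical batch of 8 images of 32x32 pixels with 3 feature channels, laid out as `[batch, height, width, features]`. The sizes are arbitrary and chosen only for the example.

```python
import tensorflow as tf

# Hypothetical batch: 8 images, 32x32 pixels, 3 features per pixel
batch = tf.zeros([8, 32, 32, 3])

# Indexing the batch axis gives one image
first_image = batch[0]

# Indexing batch and spatial axes gives the feature vector of one location
pixel_features = batch[0, 10, 20]

# Indexing the last axis gives one feature for every location of every image
first_feature = batch[..., 0]

print(first_image.shape, pixel_features.shape, first_feature.shape)
```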

## **Single-axis and multi-axis indexing**

### **Single-axis indexing**

TensorFlow follows standard Python and NumPy indexing rules:

- indices start at `0`
- negative indices count backwards from the end
- colons `:` are used for slicing as `start:stop:step`

In [20]:
rank_1_tensor = tf.constant([0, 1, 1, 2, 3, 5, 8, 13, 21, 34])
rank_1_tensor.numpy()

array([ 0,  1,  1,  2,  3,  5,  8, 13, 21, 34], dtype=int32)

Indexing with a scalar **removes the dimension**

In [21]:
print("First:", rank_1_tensor[0].numpy())
print("Second:", rank_1_tensor[1].numpy())
print("Last:", rank_1_tensor[-1].numpy())

First: 0
Second: 1
Last: 34


Indexing with a slice `:` **keeps the dimension**

In [22]:
print("Everything:", rank_1_tensor[:].numpy())
print("Before 4: ", rank_1_tensor[:4].numpy())
print("From 4 to the end:", rank_1_tensor[4:].numpy())
print("From 2, before 7:", rank_1_tensor[2:7].numpy())
print("Every other item:", rank_1_tensor[::2].numpy())
print("Reversed:", rank_1_tensor[::-1].numpy())

Everything: [ 0  1  1  2  3  5  8 13 21 34]
Before 4:  [0 1 1 2]
From 4 to the end: [ 3  5  8 13 21 34]
From 2, before 7: [1 2 3 5 8]
Every other item: [ 0  1  3  8 21]
Reversed: [34 21 13  8  5  3  2  1  1  0]


### **Multi-axis indexing**

Higher rank tensors are indexed by passing multiple indices. The exact same rules as in the single-axis case apply to each axis independently.

In [23]:
rank_2_tensor.numpy()

array([[1., 2.],
       [3., 4.],
       [5., 6.]], dtype=float16)

Passing an integer for each index results in a scalar.

In [24]:
# Pull out a single value from a rank-2 tensor
rank_2_tensor[1, 1].numpy()

4.0

Indexing can be performed using any combination of integers and slices

In [25]:
# Get row and column tensors
print("Second row:", rank_2_tensor[1,:].numpy())
print("Second column:", rank_2_tensor[:, 1].numpy())
print("Last row:", rank_2_tensor[-1,:].numpy())
print("First item in the last column:", rank_2_tensor[0, -1].numpy())
print("Skip the first row:")
print(rank_2_tensor[1:, :].numpy())

Second row: [3. 4.]
Second column: [2. 4. 6.]
Last row: [5. 6.]
First item in the last column: 2.0
Skip the first row:
[[3. 4.]
 [5. 6.]]


An example with a 3-axis tensor

In [26]:
# Select the last feature (index 4) across all locations in each example of the batch
rank_3_tensor[:, :, 4]

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[ 4,  9],
       [14, 19],
       [24, 29]], dtype=int32)>

<center><img src="img/rank-3-tensor.png" width=500 height=500></center>

## **Manipulating shapes**

Reshaping a tensor is of great utility. The `tf.reshape` operation is fast and cheap as the underlying data does not need to be duplicated.

In [27]:
# Shape returns a `tf.TensorShape` object that shows the size of each dimension
var_x = tf.Variable(tf.constant([[1], [2], [3]]))
var_x.shape

TensorShape([3, 1])

In [28]:
# `tf.TensorShape` can be converted into a Python list
var_x.shape.as_list()

[3, 1]

A tensor can be reshaped into a new shape.

In [29]:
reshaped = tf.reshape(var_x, [1, 3])

In [30]:
print(var_x.shape)
print(reshaped.shape)

(3, 1)
(1, 3)


The data maintains its layout in memory and a new tensor is created, with the requested shape, pointing to the same data. TensorFlow uses C-style "row-major" memory ordering, where incrementing the right-most index corresponds to a single step in memory.

In [31]:
rank_3_tensor

<tf.Tensor: shape=(3, 2, 5), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]], dtype=int32)>

A flattened tensor shows the order it is laid out in memory

In [32]:
# A `-1` passed in the `shape` argument says "Whatever fits"
tf.reshape(rank_3_tensor, [-1])

<tf.Tensor: shape=(30,), dtype=int32, numpy=
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29], dtype=int32)>

Typically, the only reasonable uses of `tf.reshape` are to combine or split adjacent axes (or add/remove `1`s). For this 3x2x5 tensor, reshaping to (3x2)x5 or 3x(2x5) are both reasonable things to do, as the slices do not mix.

In [33]:
tf.reshape(rank_3_tensor, [3*2, 5])

<tf.Tensor: shape=(6, 5), dtype=int32, numpy=
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]], dtype=int32)>

In [34]:
tf.reshape(rank_3_tensor, [3, -1])

<tf.Tensor: shape=(3, 10), dtype=int32, numpy=
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]], dtype=int32)>

<center><img src="img/good-reshapes.png" width=800 height=700></center>

Reshaping will succeed for any new shape with the same total number of elements, but it won't do anything useful if the order of the axes is not respected.

Swapping axes with `tf.reshape` does not work; that requires `tf.transpose`.

In [35]:
# Bad reshaping examples

print(tf.reshape(rank_3_tensor, [2, 3, 5]), "\n")

# This is a mess
print(tf.reshape(rank_3_tensor, [5, 6]), "\n")

# This doesn't work at all
try:
    tf.reshape(rank_3_tensor, [7, -1])
except Exception as e: print(e)

tf.Tensor(
[[[ 0  1  2  3  4]
  [ 5  6  7  8  9]
  [10 11 12 13 14]]

 [[15 16 17 18 19]
  [20 21 22 23 24]
  [25 26 27 28 29]]], shape=(2, 3, 5), dtype=int32) 

tf.Tensor(
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]], shape=(5, 6), dtype=int32) 

Input to reshape is a tensor with 30 values, but the requested shape requires a multiple of 7 [Op:Reshape]


<center><img src="img/bad-reshapes.png" width=800 height=700></center>
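To make the difference concrete, the following sketch swaps the first two axes of the 3x2x5 tensor with `tf.transpose` and compares the result with a `tf.reshape` to the same target shape. The tensor is rebuilt here with `tf.range` so that the block is self-contained; it holds the same values as `rank_3_tensor` above.

```python
import tensorflow as tf

# Same values as the rank-3 tensor used above
rank_3_tensor = tf.reshape(tf.range(30), [3, 2, 5])

# `perm` lists the source axis for each output axis: swap axes 0 and 1
transposed = tf.transpose(rank_3_tensor, perm=[1, 0, 2])

# Same target shape, but the elements are simply re-chunked in memory order
reshaped = tf.reshape(rank_3_tensor, [2, 3, 5])

print(transposed.shape, reshaped.shape)

# transposed[i, j] equals rank_3_tensor[j, i]; this does not hold for reshaped
print(transposed[0, 1].numpy())
print(reshaped[0, 1].numpy())
```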

## **More on `dtypes`**

To inspect the data type of a `tf.Tensor`, use the `Tensor.dtype` property.

When creating a `tf.Tensor` from a Python object, you may optionally specify the data type. If you do not, TensorFlow selects a data type that can represent the data. TensorFlow converts Python integers to `tf.int32` and Python floating point numbers to `tf.float32`. Otherwise, TensorFlow uses the same rules NumPy uses when converting to arrays.

Casting from type to type is possible.

In [36]:
# `tf.cast` casts a tensor to a new type
f64_tensor = tf.constant([2.2, 3.3, 4.4], dtype=tf.float64)
print(f64_tensor)
f16_tensor = tf.cast(f64_tensor, dtype=tf.float16)
print(f16_tensor)
# Now let's cast to uint8 and lose the decimal precision
u8_tensor = tf.cast(f16_tensor, dtype=tf.uint8)
print(u8_tensor)

tf.Tensor([2.2 3.3 4.4], shape=(3,), dtype=float64)
tf.Tensor([2.2 3.3 4.4], shape=(3,), dtype=float16)
tf.Tensor([2 3 4], shape=(3,), dtype=uint8)
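The default type selection described above can be checked directly. The sketch below also shows the usual situation where an explicit `tf.cast` is needed: with default settings, ops do not promote mixed dtypes automatically. The variable names are illustrative only.

```python
import numpy as np
import tensorflow as tf

int_tensor = tf.constant(1)                    # Python int -> tf.int32
float_tensor = tf.constant(1.0)                # Python float -> tf.float32
np_tensor = tf.constant(np.array([1.0, 2.0]))  # NumPy float64 array -> tf.float64

print(int_tensor.dtype, float_tensor.dtype, np_tensor.dtype)

# Combining tensors of different dtypes is expected to fail with default settings
try:
    int_tensor + float_tensor
except Exception as e:
    print("Mixed dtypes failed:", type(e).__name__)

# Cast explicitly so that both operands share a dtype
total = tf.cast(int_tensor, tf.float32) + float_tensor
print(total)
```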


## **Broadcasting**

Broadcasting is a concept borrowed from the equivalent feature in NumPy. In short, under certain conditions, **smaller tensors are "stretched" automatically to fit larger tensors when running combined operations on them**.

The simplest and most common case is the multiplication or addition of a tensor and a scalar. In that case, the scalar is broadcast to be the same shape as the other argument.

In [37]:
x = tf.constant([1, 2, 3])
y = tf.constant(2)
z = tf.constant([2, 2, 2])

print(tf.multiply(x, 2))
print(x * y)
print(x * z)

tf.Tensor([2 4 6], shape=(3,), dtype=int32)
tf.Tensor([2 4 6], shape=(3,), dtype=int32)
tf.Tensor([2 4 6], shape=(3,), dtype=int32)


Likewise, 1-sized dimensions can be stretched out to match the other arguments. Both arguments can be stretched in the same computation. In the following case, a 3x1 matrix is element-wise multiplied by a 1x4 matrix to produce a 3x4 matrix.

In [38]:
x = tf.reshape(x, [3, 1])
y = tf.range(1, 5)
print(x, "\n")
print(y, "\n")
print(tf.multiply(x, y))

tf.Tensor(
[[1]
 [2]
 [3]], shape=(3, 1), dtype=int32) 

tf.Tensor([1 2 3 4], shape=(4,), dtype=int32) 

tf.Tensor(
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]], shape=(3, 4), dtype=int32)


<center><img src="img/broadcast.png" width=400 height=400></center>

The same operation is done without broadcasting in the following

In [39]:
x_stretch = tf.constant([[1, 1, 1, 1],
                         [2, 2, 2, 2],
                         [3, 3, 3, 3]])
y_stretch = tf.constant([[1, 2, 3, 4],
                         [1, 2, 3, 4],
                         [1, 2, 3, 4]])

print(x_stretch * y_stretch)

tf.Tensor(
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]], shape=(3, 4), dtype=int32)


Most of the time, broadcasting is both time and memory efficient, as the broadcast operation never materialises the expanded tensors in memory. Broadcasting can be looked at using `tf.broadcast_to`.

In [40]:
tf.broadcast_to(tf.constant([1, 2, 3]), [3, 3])

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]], dtype=int32)>

Unlike a mathematical op, `tf.broadcast_to` does nothing special to save memory: here, the expanded tensor is materialised.
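Whether two shapes can be broadcast together can be checked without building the tensors. The sketch below uses `tf.broadcast_static_shape` for that purpose; the shapes are the 3x1 and 4-element shapes used earlier, plus a deliberately incompatible pair chosen for illustration.

```python
import tensorflow as tf

# A (3, 1) shape and a (4,) shape broadcast to (3, 4)
result_shape = tf.broadcast_static_shape(tf.TensorShape([3, 1]),
                                         tf.TensorShape([4]))
print(result_shape)

# The same shape results from an actual broadcasting op
product = tf.ones([3, 1]) * tf.ones([4])
print(product.shape)

# Shapes (3,) and (4,) cannot be broadcast: neither dimension is 1
incompatible = False
try:
    tf.broadcast_static_shape(tf.TensorShape([3]), tf.TensorShape([4]))
except ValueError as e:
    incompatible = True
    print(e)
```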

## **`tf.convert_to_tensor`**

Most ops, like `tf.matmul` and `tf.reshape`, take arguments of class `tf.Tensor`. However, Python objects shaped like tensors are also frequently passed. Most, but not all, ops call `convert_to_tensor` on non-tensor arguments. There is a registry of conversions, and most object classes like NumPy `ndarray`, `TensorShape`, `list` and `tf.Variable` will convert automatically.
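A short sketch of explicit and implicit conversion follows. The inputs are arbitrary values chosen for illustration; the last call relies on `tf.matmul` converting its non-tensor arguments itself.

```python
import numpy as np
import tensorflow as tf

# Explicit conversion from a nested list, a NumPy array and a variable
from_list = tf.convert_to_tensor([[1, 2], [3, 4]])
from_numpy = tf.convert_to_tensor(np.array([1.5, 2.5], dtype=np.float32))
from_variable = tf.convert_to_tensor(tf.Variable([1.0, 2.0]))

print(from_list.dtype, from_numpy.dtype, from_variable.dtype)

# Implicit conversion: a list and a NumPy array passed straight to an op
product = tf.matmul([[1, 2], [3, 4]],
                    np.array([[1, 1], [1, 1]], dtype=np.int32))
print(product)
```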

## **Ragged tensors**

A tensor with variable numbers of elements along some axis is called **ragged**. Use `tf.RaggedTensor` for ragged data.

For example, this cannot be represented as a regular tensor:

<center><img src="img/ragged-tensor.png" width=250 height=250></center>

In [41]:
ragged_list = [
    [0, 1, 2, 3],
    [4, 5],
    [6, 7, 8],
    [9]
]

In [42]:
try:
    tensor = tf.constant(ragged_list)
except Exception as e: print(e)

Can't convert non-rectangular Python sequence to Tensor.


Instead create a `tf.RaggedTensor` using `tf.ragged.constant` 

In [44]:
ragged_tensor = tf.ragged.constant(ragged_list)
ragged_tensor

<tf.RaggedTensor [[0, 1, 2, 3], [4, 5], [6, 7, 8], [9]]>

In [45]:
ragged_tensor.shape

TensorShape([4, None])
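The `None` in the shape stands for the axis whose length varies. As a small sketch, the per-row lengths can be read with `row_lengths`, and the ragged tensor can be padded into a regular tensor with `to_tensor`. The padding value `-1` is an arbitrary choice for this example.

```python
import tensorflow as tf

ragged_tensor = tf.ragged.constant([[0, 1, 2, 3],
                                    [4, 5],
                                    [6, 7, 8],
                                    [9]])

# Length of each row along the ragged axis
row_lengths = ragged_tensor.row_lengths()
print(row_lengths.numpy())

# Pad the shorter rows to obtain a rectangular tensor
padded = ragged_tensor.to_tensor(default_value=-1)
print(padded.numpy())
```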

## **String tensors**

`tf.string` is a `dtype`, which is to say we can represent data as strings (variable-length byte arrays) in tensors.

**Strings are atomic and cannot be indexed in a Pythonic way**. The length of the string is not one of the dimensions of the tensor. Here is a scalar string tensor:

In [46]:
# Tensors can be strings, too
# Here is a scalar string
scalar_string_tensor = tf.constant("Foo bar")
scalar_string_tensor

<tf.Tensor: shape=(), dtype=string, numpy=b'Foo bar'>

And a vector of strings

<center><img src="img/string-vector-tensor.png" width=250 height=250></center>

In [51]:
# A tensor can hold three strings of different lengths; this is OK
tensor_of_strings = tf.constant(["Foo bar",
                                 "Gray big wolf",
                                 "Lazy dog"])
# Note that the shape is (3,); the string length is not included
tensor_of_strings

<tf.Tensor: shape=(3,), dtype=string, numpy=array([b'Foo bar', b'Gray big wolf', b'Lazy dog'], dtype=object)>

In the above printout the `b` prefix indicates that `tf.string` dtype is not a unicode string, but a byte-string.

If you pass Unicode characters, they are UTF-8 encoded.

In [52]:
tf.constant("🥳👍")

<tf.Tensor: shape=(), dtype=string, numpy=b'\xf0\x9f\xa5\xb3\xf0\x9f\x91\x8d'>
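Because the string is stored as bytes, its length in bytes differs from its length in characters. The sketch below uses `tf.strings.length` with both units to show this for the same two-emoji string.

```python
import tensorflow as tf

emoji = tf.constant("🥳👍")

# Default unit is bytes: each of these emoji takes 4 bytes in UTF-8
byte_length = tf.strings.length(emoji)

# Count UTF-8 encoded characters instead of bytes
char_length = tf.strings.length(emoji, unit="UTF8_CHAR")

print(byte_length.numpy(), char_length.numpy())
```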

Some basic functions with strings can be found in `tf.strings`, including `tf.strings.split`

In [53]:
# We can use split to split a string into a vector of substrings
tf.strings.split(scalar_string_tensor, sep=" ")

<tf.Tensor: shape=(2,), dtype=string, numpy=array([b'Foo', b'bar'], dtype=object)>

In [54]:
# ...but it turns into a `tf.RaggedTensor` if we split up a tensor of strings,
# as each string might be split into a different number of parts
tf.strings.split(tensor_of_strings)

<tf.RaggedTensor [[b'Foo', b'bar'], [b'Gray', b'big', b'wolf'], [b'Lazy', b'dog']]>

<center><img src="img/ragged-string-tensor.png" width=350 height=350></center>
    

And `tf.strings.to_number`

In [55]:
# `tf.strings.to_number` converts each string in the input tensor to the specified numeric type
text = tf.constant("1 10 100")
tf.strings.to_number(tf.strings.split(text, " "))

<tf.Tensor: shape=(3,), dtype=float32, numpy=array([  1.,  10., 100.], dtype=float32)>

Although you can't use `tf.cast` to turn a string tensor into numbers, you can convert it into bytes, and then into numbers.

In [57]:
# Split string elements of input into bytes using `tf.strings.bytes_split`
byte_strings = tf.strings.bytes_split(tf.constant("Duck"))
# `tf.io.decode_raw` reinterprets the bytes of a string as a vector of numbers
byte_ints = tf.io.decode_raw(tf.constant("Duck"), tf.uint8)
print("Byte strings:", byte_strings)
print("Bytes:", byte_ints)

Byte strings: tf.Tensor([b'D' b'u' b'c' b'k'], shape=(4,), dtype=string)
Bytes: tf.Tensor([ 68 117  99 107], shape=(4,), dtype=uint8)


In [58]:
# Or split it up as unicode and then decode it
unicode_bytes = tf.constant("アヒル 🦆")
# `tf.strings.unicode_split` splits each string in input into a sequence of single-character substrings
unicode_char_bytes = tf.strings.unicode_split(unicode_bytes, "UTF-8")
# `tf.strings.unicode_decode` decodes each string in input into a sequence of Unicode code points
unicode_values = tf.strings.unicode_decode(unicode_bytes, "UTF-8")

print("\nUnicode bytes:", unicode_bytes)
print("\nUnicode chars:", unicode_char_bytes)
print("\nUnicode values:", unicode_values)


Unicode bytes: tf.Tensor(b'\xe3\x82\xa2\xe3\x83\x92\xe3\x83\xab \xf0\x9f\xa6\x86', shape=(), dtype=string)

Unicode chars: tf.Tensor([b'\xe3\x82\xa2' b'\xe3\x83\x92' b'\xe3\x83\xab' b' ' b'\xf0\x9f\xa6\x86'], shape=(5,), dtype=string)

Unicode values: tf.Tensor([ 12450  12498  12523     32 129414], shape=(5,), dtype=int32)


The `tf.string` dtype is used for all raw bytes data in TensorFlow. The `tf.io` module contains functions for **converting data to and from bytes**, including decoding images and parsing CSV.

## **Sparse tensors**

Sometimes, your data is sparse, like a very wide embedding space. TensorFlow supports `tf.sparse.SparseTensor` and related ops to store sparse data efficiently.

<center><img src="img/sparse-tensor.png" width=300 height=250></center>

In [63]:
# Sparse tensors store values by index in a memory-efficient manner
sparse_tensor = tf.sparse.SparseTensor(indices=[[0,0], [1, 2]],
                                       values=[1, 2],
                                       dense_shape=[3, 4])
print(sparse_tensor, "\n")

# We can convert sparse tensors to dense
print(tf.sparse.to_dense(sparse_tensor))

SparseTensor(indices=tf.Tensor(
[[0 0]
 [1 2]], shape=(2, 2), dtype=int64), values=tf.Tensor([1 2], shape=(2,), dtype=int32), dense_shape=tf.Tensor([3 4], shape=(2,), dtype=int64)) 

tf.Tensor(
[[1 0 0 0]
 [0 0 2 0]
 [0 0 0 0]], shape=(3, 4), dtype=int32)
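The conversion also works in the other direction. As a sketch, `tf.sparse.from_dense` builds a sparse tensor from a dense one by recording only its non-zero entries; the dense input below repeats the 3x4 example above.

```python
import tensorflow as tf

dense = tf.constant([[1, 0, 0, 0],
                     [0, 0, 2, 0],
                     [0, 0, 0, 0]])

# Keep only the non-zero entries, with their indices
sparse = tf.sparse.from_dense(dense)
print(sparse.indices.numpy())
print(sparse.values.numpy())
print(sparse.dense_shape.numpy())

# Converting back recovers the original dense tensor
round_trip = tf.sparse.to_dense(sparse)
print(round_trip.numpy())
```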


## **Introduction to variables**

Variables are created and tracked via the `tf.Variable` class. A `tf.Variable` represents a **tensor whose value can be changed by running ops on it**.

A TensorFlow **variable** is the recommended way to represent shared, persistent state that your program manipulates. This section covers how to create, update, and manage instances of `tf.Variable` in TensorFlow. Specific ops allow you to read and modify the value of a `tf.Variable`. Higher-level libraries like `tf.keras` use `tf.Variable` to store model parameters.

### **Setup**

In [64]:
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

### **Create a variable**

To create a variable, provide an initial value. The `tf.Variable` will have the same `dtype` as the initialisation value.

In [69]:
# Let's set an initial value
my_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
# `tf.Variable` will have the same `dtype`
my_variable = tf.Variable(my_tensor)

# Variables can be all kinds of types, just like tensors
bool_variable = tf.Variable([False, False, False, True])
complex_variable = tf.Variable([5 + 4j, 6 + 1j])

A variable looks like and acts like a tensor, and, in fact, is a data structure backed by `tf.Tensor`. Like tensors, they have a `dtype` and a shape, and can be exported to NumPy.

In [70]:
print("Shape:", my_variable.shape)
print("dtype:", my_variable.dtype)
print("As NumPy:", my_variable.numpy())

Shape: (2, 2)
dtype: <dtype: 'float32'>
As NumPy: [[1. 2.]
 [3. 4.]]


Most tensor operations work on variables as expected, although variables cannot be reshaped in place.

In [72]:
print("A variable:", my_variable)
print("\nViewed as a tensor:", tf.convert_to_tensor(my_variable))
print("\nIndex of highest value:", tf.argmax(my_variable))

# This creates a new tensor; it does not reshape the initial variable
print("\nCopying and reshaping:", tf.reshape(my_variable, ([1, 4])))

A variable: <tf.Variable 'Variable:0' shape=(2, 2) dtype=float32, numpy=
array([[1., 2.],
       [3., 4.]], dtype=float32)>

Viewed as a tensor: tf.Tensor(
[[1. 2.]
 [3. 4.]], shape=(2, 2), dtype=float32)

Index of highest value: tf.Tensor([1 1], shape=(2,), dtype=int64)

Copying and reshaping: tf.Tensor([[1. 2. 3. 4.]], shape=(1, 4), dtype=float32)


As noted above, variables are backed by tensors. You can reassign the tensor using `tf.Variable.assign`. Calling `assign` does not (usually) allocate a new tensor; instead, the existing tensor memory is reused.

In [73]:
a = tf.Variable([2.0, 3.0])
# This will keep the same `dtype`, `float32`
a.assign([1, 2])
# Not allowed as it resizes the variable
try:
    a.assign([1.0, 2.0, 3.0])
except Exception as e: print(e)

Cannot assign to variable Variable:0 due to variable shape (2,) and value shape (3,) are incompatible


If you use a variable like a tensor in operations, you will usually operate on the backing tensor. Creating new variables from existing variables duplicates the backing tensors. Two variables will not share the same memory.

In [84]:
a = tf.Variable([2.0, 3.0])
# Create `b` based on the value of `a`
b = tf.Variable(a)
a.assign([5, 6])

# `a` and `b` are different
print(a.numpy())
print(b.numpy())

# There are other versions of assign
print(a.assign_add([2, 3]).numpy())
print(a.assign_sub([7, 9]).numpy())

[5. 6.]
[2. 3.]
[7. 9.]
[0. 0.]


### **Lifecycles, naming and watching**

In Python-based TensorFlow, `tf.Variable` instances have the same lifecycle as other Python objects. When there are no references to a variable it is automatically deallocated.

Variables can also be named, which can help you track and debug them. You can give two variables the same name.

In [86]:
# Create `a` and `b`; they have the same name but are backed by different tensors
a = tf.Variable(my_tensor, name="Mark")
# A new variable with the same name but different value
# Note that the scalar added is broadcast
b = tf.Variable(my_tensor + 1, name="Mark")

# These are element-wise unequal, despite having the same name
a == b

<tf.Tensor: shape=(2, 2), dtype=bool, numpy=
array([[False, False],
       [False, False]])>

Variable names are preserved when saving and loading models. By default, variables in models will acquire unique variable names automatically, so you don't need to assign them yourself unless you want to.

Although variables are important for differentiation, some variables do not need to be differentiated. You can turn off gradients for a variable by setting `trainable` to `False` at creation. An example of a variable that would not need gradients is a training step counter.

In [94]:
# You can turn off gradients for a variable by setting `trainable=False` at creation
step_counter = tf.Variable(1, trainable=False)
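The effect of `trainable=False` can be observed with `tf.GradientTape`, which watches trainable variables automatically but not the others. The sketch below is a minimal illustration; the variables `weight` and `frozen` and the expression `y` are hypothetical and not part of the notebook.

```python
import tensorflow as tf

weight = tf.Variable(3.0)                   # trainable by default
frozen = tf.Variable(2.0, trainable=False)  # not watched by the tape

with tf.GradientTape() as tape:
    y = weight * weight * frozen

grad_weight, grad_frozen = tape.gradient(y, [weight, frozen])

# d(y)/d(weight) = 2 * weight * frozen
print(grad_weight)
# No gradient is recorded for the non-trainable variable
print(grad_frozen)
```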

### **Placing variables and tensors**

For better performance, TensorFlow will attempt to place tensors and variables on the fastest device compatible with its `dtype`. This means most variables are placed on **a GPU** if one is available.

However, we can override this. In this snippet, we place a float tensor and a variable on the CPU, even if a GPU is available. By turning on device placement logging, we can see where the variable is placed.

Note: Although manual placement works, using distribution strategies can be a more convenient and scalable way to optimise your computation.

If you run this notebook on different backends with and without a GPU you will see different logging. *Note that logging device placement must be turned on at the start of the session*.

In [98]:
# If you would like a particular operation to run on a device of your choice instead of what's 
# automatically selected for you, you can use `tf.device` to create a device context

with tf.device("CPU:0"):
    # Create some tensors
    a = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)
    
c

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[22., 28.],
       [49., 64.]], dtype=float32)>

It is possible to set the location of a variable or tensor on one device and do the computation on another device. This will introduce delay, as data needs to be copied between the devices.

You might do this, however, if you had multiple GPU workers but only want one copy of the variables.

In [99]:
with tf.device("CPU:0"):
    a = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.Variable([[1.0, 2.0, 3.0]])
    
with tf.device("GPU:0"):
    # Element-wise multiply
    k = a * b

k

<tf.Tensor: shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [ 4., 10., 18.]], dtype=float32)>
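Besides placement logging, the placement of an individual tensor can be inspected through its `device` attribute, and the available GPUs can be listed with `tf.config.list_physical_devices`. The sketch below assumes eager execution and only pins a tensor to the CPU, so it should behave the same with or without a GPU.

```python
import tensorflow as tf

# An empty list means no GPU is visible to TensorFlow
gpus = tf.config.list_physical_devices("GPU")
print("Number of GPUs available:", len(gpus))

with tf.device("CPU:0"):
    cpu_tensor = tf.constant([1.0, 2.0])

# The full device name ends with the device type and index
print(cpu_tensor.device)
```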