# Introduction to TensorFlow

TensorFlow is one of the main libraries used for deep learning. Initially, it was designed for distributed computation on tensors (n-dimensional arrays) and was later extended with facilities for machine learning and deep learning.

In [1]:
import numpy as np
import tensorflow as tf



## tf.Tensor

`Tensor` is the fundamental class of TensorFlow. A major difference from NumPy arrays is that `Tensor` objects are immutable: once created, their values cannot be changed. A `Tensor` can represent either an array or a tree of operations on arrays (an array being an operation tree with a single node).

In [2]:
t = tf.constant([[2,3],[4,5]])
print(t)

tf.Tensor(
[[2 3]
 [4 5]], shape=(2, 2), dtype=int32)


You cannot modify a `Tensor`, because it is immutable:

In [3]:
try:
    t[1,0]=1
except Exception as e:
    print(f"{type(e).__name__}: {e}")
    

TypeError: 'tensorflow.python.framework.ops.EagerTensor' object does not support item assignment
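
Since item assignment is not available, one option is to build a new tensor that contains the updated value. The sketch below uses `tf.tensor_scatter_nd_update`, which returns a modified copy and leaves the original tensor untouched. The name `t_updated` is only illustrative.

```python
import tensorflow as tf

t = tf.constant([[2, 3], [4, 5]])

# Build a NEW tensor in which element [1, 0] is replaced by 1.
# `indices` lists the positions to update, `updates` the new values.
t_updated = tf.tensor_scatter_nd_update(t, indices=[[1, 0]], updates=[1])

print(t.numpy())          # the original tensor is unchanged
print(t_updated.numpy())  # the copy holds the new value
```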


Some ways to create `Tensor`s more easily:

In [4]:
a = tf.zeros((2, 2))
print(a)

tf.Tensor(
[[0. 0.]
 [0. 0.]], shape=(2, 2), dtype=float32)


In [5]:
b = tf.ones((3, 3))
print(b)

tf.Tensor(
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]], shape=(3, 3), dtype=float32)


In [6]:
c = tf.fill((2, 2), 7.)
print(c)

tf.Tensor(
[[7. 7.]
 [7. 7.]], shape=(2, 2), dtype=float32)


In [7]:
d = tf.constant([[1, 2, 3], [4, 5, 6]])
print(d)

tf.Tensor(
[[1 2 3]
 [4 5 6]], shape=(2, 3), dtype=int32)


In [8]:
e = tf.eye(4)
print(e)

tf.Tensor(
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]], shape=(4, 4), dtype=float32)


Most simple operations available on an `ndarray` are also available on a `Tensor`. The result of an operation is itself a `Tensor`.

In [9]:
# element wise operations
a = tf.constant([[1, 2, 3], [4, 5, 6]])
b = tf.constant([[7, 8, 9], [10, 11, 12]])
c = a + b 
d = 2 * c  
print(d)

tf.Tensor(
[[16 20 24]
 [28 32 36]], shape=(2, 3), dtype=int32)


In [10]:
# matrix multiplication

a = tf.constant([[1, 2, 3], [4, 5, 6]])
b = tf.constant([[7], [8] , [9]])
c = a @ b   
print(c)

tf.Tensor(
[[ 50]
 [122]], shape=(2, 1), dtype=int32)


You can get the value of a `Tensor` in the form of an `ndarray` with the `numpy()` method:

In [11]:
print(type(c))
print(c)
d = c.numpy()
print(type(d))
print(d)

<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[ 50]
 [122]], shape=(2, 1), dtype=int32)
<class 'numpy.ndarray'>
[[ 50]
 [122]]
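
The conversion also works in the other direction. As a minimal sketch, assuming a small `int32` array chosen for illustration, `tf.convert_to_tensor` turns an `ndarray` into a `Tensor`, and TensorFlow operations accept an `ndarray` directly:

```python
import numpy as np
import tensorflow as tf

arr = np.array([[1, 2], [3, 4]], dtype=np.int32)

# ndarray -> Tensor
t = tf.convert_to_tensor(arr)

# TensorFlow operations also accept an ndarray and return a Tensor
doubled = tf.multiply(arr, 2)

# Tensor -> ndarray
back = doubled.numpy()
print(type(t).__name__, type(back).__name__)
print(back)
```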


## tf.Variable
So far, we have only seen stateless computation, in the style of _functional programming_. To perform _imperative programming_, we need mutable objects that can record changing data.

TensorFlow provides the `Variable` class for that purpose. An initial value must be provided to the `Variable` constructor.


In [12]:
a = tf.Variable([2.0, 3.0])
print(a)

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>


As with `Tensor`, you can get an `ndarray` copy of a `Variable`:

In [13]:
b = a.numpy()
print(type(b))
print(b)

<class 'numpy.ndarray'>
[2. 3.]


The values of a `Variable` can be modified with the `assign()` method, but its shape cannot be changed.

In [14]:
a.assign([1, 2])
print(a)

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([1., 2.], dtype=float32)>


In [15]:
try:
    a.assign([[1, 2], [3, 4]])
except Exception as e:
    print(f"{type(e).__name__}: {e}")

ValueError: Cannot assign to variable Variable:0 due to variable shape (2,) and value shape (2, 2) are incompatible
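
Besides replacing the whole value, a `Variable` can also be updated partially. The following sketch assumes that slicing a variable and calling `assign()` on the slice is available in your TensorFlow 2 version; it also shows `assign_sub()`, which subtracts a value in place.

```python
import tensorflow as tf

v = tf.Variable([2.0, 3.0])

# Update only the first element; shape and dtype stay the same.
v[0].assign(10.0)
print(v.numpy())

# Subtract a value from the variable in place.
v.assign_sub([1.0, 1.0])
print(v.numpy())
```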


You can read and modify a variable in a single operation, for example with `assign_add()`:

In [16]:
a = tf.Variable([2.0, 3.0])
print(a)
a.assign_add([1.0, 2.0])
print(a)

<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([2., 3.], dtype=float32)>
<tf.Variable 'Variable:0' shape=(2,) dtype=float32, numpy=array([3., 5.], dtype=float32)>


Variables will be used later as memory inside the computation engine used by TensorFlow (CPU/GPU/TPU), for example as the parameters of a model to be trained.

## Gradient

Operations recorded by a `GradientTape` can be differentiated automatically.

In [17]:
x = tf.Variable([2.0])
a = tf.constant([3.0])

with tf.GradientTape() as tape:
    fx = (3*a-x)**2

# The first argument is the tensor to differentiate
# The second argument is the variable (or list of variables) with respect to which the gradient is computed
d_fx_d_x = tape.gradient(fx,x)

print(fx)
print(d_fx_d_x)


tf.Tensor([48.999985], shape=(1,), dtype=float32)
tf.Tensor([-13.999998], shape=(1,), dtype=float32)
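
In the cell above, the gradient is taken with respect to a `Variable`, which the tape tracks automatically when it is trainable. A plain tensor such as a `tf.constant` is not tracked unless it is registered with `tape.watch()`. The sketch below illustrates the difference on the hypothetical function $y = x^2$:

```python
import tensorflow as tf

x = tf.constant(2.0)  # a plain tensor, not a Variable

# Without watch(): the tape does not track x, so the gradient is None.
with tf.GradientTape() as tape:
    y = x ** 2
unwatched = tape.gradient(y, x)

# With watch(): the tape records the operations involving x.
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2
watched = tape.gradient(y, x)  # dy/dx = 2 * x

print(unwatched, watched)
```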


## Example: minimizing a function by gradient descent

Let's try to minimize $(3a-w)^2$ over $w$ where $a$ is a constant.

In [18]:
w = tf.Variable([2.0])
a = tf.constant([3.0])

nstep = 20
lr = 1e-1

def myfunc(w,a):
    return (3*a-w)**2

for i in range(nstep):
    
    # Computing the function while recording it on a gradient tape
    with tf.GradientTape() as tape:
        f = myfunc(w,a)
    print("Step %d: f(%s)=%s" % (i, w.numpy(), f.numpy()))
    
    # Computing the gradient through the tape
    d_f_d_w = tape.gradient(f,w)
    # Doing a gradient descent step
    w.assign_add(-lr*d_f_d_w)

f = myfunc(w,a)
print("Step %d: f(%s)=%s" % (nstep, w.numpy(), f.numpy()))

Step 0: f([2.])=[48.999985]
Step 1: f([3.3999999])=[31.359993]
Step 2: f([4.5199995])=[20.070396]
Step 3: f([5.4159994])=[12.8450575]
Step 4: f([6.1327996])=[8.220837]
Step 5: f([6.7062397])=[5.261336]
Step 6: f([7.164992])=[3.367254]
Step 7: f([7.5319934])=[2.1550431]
Step 8: f([7.825595])=[1.379227]
Step 9: f([8.060476])=[0.8827048]
Step 10: f([8.248381])=[0.5649316]
Step 11: f([8.398705])=[0.36155617]
Step 12: f([8.518964])=[0.2313958]
Step 13: f([8.615171])=[0.14809303]
Step 14: f([8.692137])=[0.09477977]
Step 15: f([8.75371])=[0.06065886]
Step 16: f([8.802968])=[0.0388216]
Step 17: f([8.842375])=[0.0248457]
Step 18: f([8.873899])=[0.01590135]
Step 19: f([8.899119])=[0.0101769]
Step 20: f([8.919295])=[0.00651325]
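
The values printed above can be cross-checked without TensorFlow. The sketch below is a plain-Python check that assumes the hand-derived gradient $\frac{d}{dw}(3a-w)^2 = -2(3a-w)$. With this gradient, each step multiplies the gap $3a - w$ by $(1 - 2\,\mathrm{lr})$, which gives a closed form for $w$ after `nstep` steps.

```python
# Plain-Python check of the gradient descent above, using the
# hand-derived gradient d/dw (3a - w)^2 = -2 * (3a - w).
a, w0 = 3.0, 2.0
nstep, lr = 20, 1e-1

w = w0
for _ in range(nstep):
    grad = -2.0 * (3.0 * a - w)
    w -= lr * grad

# Closed form: the gap (3a - w) shrinks by a factor (1 - 2*lr) per step.
w_closed = 3.0 * a - (3.0 * a - w0) * (1.0 - 2.0 * lr) ** nstep

print(w, w_closed)
```

Both values should agree with the last `w` printed by the TensorFlow loop up to `float32` rounding.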


## Multiple variables

Here is code to minimize $(\langle w,x \rangle+b)^2$ over $w$ and $b$, where $w$ and $x$ are vectors of size 2 and $b$ is a scalar. $x$ is given by the user (it can be hard-coded; no interactive input is needed).

To compute the scalar product of two vectors $x$ and $y$, you can use:

`tf.reduce_sum(tf.multiply(x,y))`
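
As a quick check of this expression, the sketch below compares it with `tf.tensordot`, an alternative way to compute the same scalar product. The vectors are arbitrary values chosen for illustration.

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])

# Element-wise product followed by a sum (the form used in this notebook)
dot_reduce = tf.reduce_sum(tf.multiply(x, y))

# Same result: contract the last axis of x with the first axis of y
dot_tensordot = tf.tensordot(x, y, axes=1)

print(dot_reduce.numpy(), dot_tensordot.numpy())
```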

In [19]:
w = tf.Variable([2.0, 3.0])
b = tf.Variable([3.0])
x = tf.constant([1.0, 1.0])

nstep = 20
lr = 1e-1

def myfunc(w,b,x):
    return (tf.reduce_sum(tf.multiply(w, x))+b)**2

for i in range(nstep):
    
    # Computing the function while recording it on a gradient tape
    with tf.GradientTape() as tape:
        f = myfunc(w,b,x)
    print("Step %d: f(%s,%s)=%s" % (i, w.numpy(), b.numpy(), f.numpy()))
    
    # Computing the gradient through the tape over all the variables
    d_f_d_w, d_f_d_b = tape.gradient(f,[w,b])
    # Doing a gradient descent step over all the variables
    w.assign_add(-lr*d_f_d_w)
    b.assign_add(-lr*d_f_d_b)

f = myfunc(w,b,x)
print("Step %d: f(%s,%s)=%s" % (nstep, w.numpy(), b.numpy(), f.numpy()))

Step 0: f([2. 3.],[3.])=[64.]
Step 1: f([0.39999998 1.4       ],[1.4])=[10.239998]
Step 2: f([-0.23999995  0.76000005],[0.76000005])=[1.6384003]
Step 3: f([-0.49599996  0.50400007],[0.50400007])=[0.26214418]
Step 4: f([-0.5984      0.40160003],[0.40160003])=[0.04194307]
Step 5: f([-0.63936     0.36064002],[0.36064002])=[0.00671089]
Step 6: f([-0.655744  0.344256],[0.344256])=[0.00107374]
Step 7: f([-0.6622976   0.33770242],[0.33770242])=[0.0001718]
Step 8: f([-0.6649191   0.33508098],[0.33508098])=[2.7487846e-05]
Step 9: f([-0.66596764  0.33403242],[0.33403242])=[4.398207e-06]
Step 10: f([-0.6663871   0.33361298],[0.33361298])=[7.0371317e-07]
Step 11: f([-0.66655487  0.3334452 ],[0.3334452])=[1.1257001e-07]
Step 12: f([-0.666622    0.33337808],[0.33337808])=[1.8001609e-08]
Step 13: f([-0.6666488   0.33335125],[0.33335125])=[2.8840987e-09]
Step 14: f([-0.66665953  0.33334053],[0.33334053])=[4.6299337e-10]
Step 15: f([-0.6666638   0.33333623],[0.33333623])=[7.4695965e-11]
Step 16: f([-0.
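
The convergence rate visible in this output can be derived by hand. Writing $s_k = \langle w_k, x \rangle + b_k$ for the value inside the square at step $k$, so that $f = s_k^2$, and $\mathrm{lr}$ for the learning rate:

$$
\begin{aligned}
\nabla_w f &= 2\, s_k\, x, \qquad \frac{\partial f}{\partial b} = 2\, s_k, \\
s_{k+1} &= \langle w_k - 2\,\mathrm{lr}\, s_k\, x,\; x \rangle + b_k - 2\,\mathrm{lr}\, s_k
         = s_k \left( 1 - 2\,\mathrm{lr}\,(\lVert x \rVert^2 + 1) \right).
\end{aligned}
$$

With $\mathrm{lr} = 0.1$ and $x = (1, 1)$, the factor is $1 - 0.2 \times 3 = 0.4$, so $f$ is multiplied by $0.4^2 = 0.16$ at each step. This matches the printed sequence $64 \to 10.24 \to 1.6384$.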

## tf.function

Until now, our code has been running in _eager_ mode, which means that each operation is dispatched individually by the Python process on the CPU to the accelerator (such as a GPU), and control returns to Python before the next operation is issued.

To increase performance (i.e. to run a whole block of code on the accelerator without round trips to Python), the code must be represented as a _graph_ of operations (a dataflow graph), which can then be optimized and executed as a whole.

TensorFlow builds this _graph_ automatically through the `tf.function` decorator, which is placed before a function definition. It relies on the _AutoGraph_ feature of the framework to convert Python control flow into graph operations.
The first time the function is called, its Python/TensorFlow instructions are executed once (_traced_) and the TensorFlow operations are recorded into a graph.
The graph is then optimized for the target device.
On subsequent calls with the same input signature, the Python code is no longer executed; the recorded graph runs instead.

In the next example, note that the output of `print` appears only once: `print` is a Python instruction, so it runs only while the graph is being built.




In [20]:
@tf.function
def foo(x):
    print("Graphing foo")
    return tf.multiply(tf.constant([2]),x)

print(foo(1))
print(foo(1))
print(foo(1))



Graphing foo
tf.Tensor([2], shape=(1,), dtype=int32)
tf.Tensor([2], shape=(1,), dtype=int32)
tf.Tensor([2], shape=(1,), dtype=int32)
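
A graph is reused only for inputs with a compatible signature. The sketch below counts how many times a hypothetical function `double` is traced: the counter is updated by ordinary Python code, so it changes only while a graph is being built. Calling the function with a tensor of a new shape is expected to trigger a new trace.

```python
import tensorflow as tf

counter = {"traces": 0}  # updated by a Python side effect, so only during tracing

@tf.function
def double(x):
    counter["traces"] += 1
    return 2 * x

double(tf.constant(1.0))
double(tf.constant(2.0))         # same dtype and shape: the graph is reused
n_after_scalars = counter["traces"]

double(tf.constant([1.0, 2.0]))  # new shape: the function is traced again
n_after_vector = counter["traces"]

print(n_after_scalars, n_after_vector)
```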




Here is our optimized code. Note that a `tf.Variable` should be created outside of a `tf.function`; creating a new variable on each call makes the function fail.

In [21]:
w = tf.Variable([2.0, 3.0])
b = tf.Variable([3.0])
x = tf.constant([1.0, 1.0])

nstep = 20
lr = 1e-1

@tf.function
def myfunc(w,b,x):
    return (tf.reduce_sum(tf.multiply(w, x))+b)**2

for i in range(nstep):
    
    # Computing the function while recording it on a gradient tape
    with tf.GradientTape() as tape:
        f = myfunc(w,b,x)
    print("Step %d: f(%s,%s)=%s" % (i, w.numpy(), b.numpy(), f.numpy()))
    
    # Computing the gradient through the tape over all the variables
    d_f_d_w, d_f_d_b = tape.gradient(f,[w,b])
    # Doing a gradient descent step over all the variables
    w.assign_add(-lr*d_f_d_w)
    b.assign_add(-lr*d_f_d_b)

f = myfunc(w,b,x)
print("Step %d: f(%s,%s)=%s" % (nstep, w.numpy(), b.numpy(), f.numpy()))

Step 0: f([2. 3.],[3.])=[64.]
Step 1: f([0.39999998 1.4       ],[1.4])=[10.239999]
Step 2: f([-0.23999995  0.76000005],[0.76000005])=[1.6384006]
Step 3: f([-0.49599996  0.50400007],[0.50400007])=[0.2621442]
Step 4: f([-0.5984      0.40160003],[0.40160003])=[0.04194307]
Step 5: f([-0.63936     0.36064002],[0.36064002])=[0.00671089]
Step 6: f([-0.655744  0.344256],[0.344256])=[0.00107374]
Step 7: f([-0.6622976   0.33770242],[0.33770242])=[0.0001718]
Step 8: f([-0.6649191   0.33508098],[0.33508098])=[2.7487835e-05]
Step 9: f([-0.66596764  0.33403242],[0.33403242])=[4.3982036e-06]
Step 10: f([-0.6663871   0.33361298],[0.33361298])=[7.0371254e-07]
Step 11: f([-0.66655487  0.3334452 ],[0.3334452])=[1.1257001e-07]
Step 12: f([-0.666622    0.33337808],[0.33337808])=[1.8001604e-08]
Step 13: f([-0.6666488   0.33335125],[0.33335125])=[2.8840965e-09]
Step 14: f([-0.66665953  0.33334053],[0.33334053])=[4.629932e-10]
Step 15: f([-0.6666638   0.33333623],[0.33333623])=[7.4695805e-11]
Step 16: f([-0.6



## tf.Module

To help build more complex functions and models, TensorFlow provides a generic base class, `tf.Module`, to encapsulate a model. Variables are created in the `__init__` constructor of the class, and the actual computation (the forward pass) goes in the `__call__` method.

It also provides utilities such as retrieving all the variables of a model.


In [22]:
class MyModule(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable([1.0, 2.0], name='w')
        self.b = tf.Variable([3.0], name='b')
        # Dummy non trainable variable
        self.dummy = tf.Variable(4.0, trainable=False, name='dummy')

    @tf.function
    def __call__(self,x):
        return (tf.reduce_sum(tf.multiply(self.w, x))+self.b)**2

aModule = MyModule()
# All trainable variables
print("trainable variables:", aModule.trainable_variables)
# Every variable
print("all variables:", aModule.variables)

trainable variables: (<tf.Variable 'b:0' shape=(1,) dtype=float32, numpy=array([3.], dtype=float32)>, <tf.Variable 'w:0' shape=(2,) dtype=float32, numpy=array([1., 2.], dtype=float32)>)
all variables: (<tf.Variable 'b:0' shape=(1,) dtype=float32, numpy=array([3.], dtype=float32)>, <tf.Variable 'dummy:0' shape=() dtype=float32, numpy=4.0>, <tf.Variable 'w:0' shape=(2,) dtype=float32, numpy=array([1., 2.], dtype=float32)>)
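
Variable collection also works across nested modules. The sketch below uses two hypothetical classes, `Affine` and `SquaredAffine`, to show that the variables of a sub-module stored as an attribute are reported by the parent module.

```python
import tensorflow as tf

class Affine(tf.Module):
    """Hypothetical sub-module computing <w, x> + b."""
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable([1.0, 2.0], name='w')
        self.b = tf.Variable([3.0], name='b')

    def __call__(self, x):
        return tf.reduce_sum(tf.multiply(self.w, x)) + self.b

class SquaredAffine(tf.Module):
    """Hypothetical parent module that squares the output of Affine."""
    def __init__(self, name=None):
        super().__init__(name=name)
        self.affine = Affine()

    def __call__(self, x):
        return self.affine(x) ** 2

model = SquaredAffine()
# The variables of the nested Affine module are found automatically.
print([v.name for v in model.trainable_variables])
print(model(tf.constant([1.0, 1.0])).numpy())
```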




## Wrapping it all

Here is simple code to optimize a module with an unknown number of variables.

In [23]:
class MyModule(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable([1.0, 2.0], name='w')
        self.b = tf.Variable([3.0], name='b')

    @tf.function
    def __call__(self,x):
        return (tf.reduce_sum(tf.multiply(self.w, x))+self.b)**2

aModule = MyModule()
x = tf.constant([1.0, 1.0])

nstep = 20
lr = 1e-1
for i in range(nstep):
    
    # Computing the function while recording it on a gradient tape
    with tf.GradientTape() as tape:
        f = aModule(x)
    print("Step %d:\n\tf=%s" % (i, f.numpy()))
    for aVar in aModule.trainable_variables:
        print('\t%s=%s'%(aVar.name,aVar.numpy()))
    
    # Computing the gradient through the tape over all the variables
    grads = tape.gradient(f,aModule.trainable_variables)

    # Doing a gradient descent step over all the variables
    for aVariable,aGrad  in zip(aModule.trainable_variables, grads):
        aVariable.assign_add(-lr*aGrad)

f = aModule(x)
print("Step %d:\n\tf=%s" % (nstep, f.numpy()))
for aVar in aModule.trainable_variables:
    print('\t%s=%s'%(aVar.name,aVar.numpy()))

Step 0:
	f=[36.]
	b:0=[3.]
	w:0=[1. 2.]
Step 1:
	f=[5.76]
	b:0=[1.8000001]
	w:0=[-0.19999993  0.8000001 ]
Step 2:
	f=[0.9216003]
	b:0=[1.32]
	w:0=[-0.67999995  0.32000008]
Step 3:
	f=[0.1474561]
	b:0=[1.128]
	w:0=[-0.872       0.12800007]
Step 4:
	f=[0.02359299]
	b:0=[1.0512]
	w:0=[-0.94879997  0.05120005]
Step 5:
	f=[0.00377489]
	b:0=[1.02048]
	w:0=[-0.97951996  0.02048002]
Step 6:
	f=[0.00060398]
	b:0=[1.0081921]
	w:0=[-0.991808  0.008192]
Step 7:
	f=[9.663589e-05]
	b:0=[1.0032768]
	w:0=[-0.99672323  0.00327679]
Step 8:
	f=[1.5461555e-05]
	b:0=[1.0013107]
	w:0=[-0.9986893   0.00131072]
Step 9:
	f=[2.4738488e-06]
	b:0=[1.0005243]
	w:0=[-9.994757e-01  5.242932e-04]
Step 10:
	f=[3.958008e-07]
	b:0=[1.0002097]
	w:0=[-9.9979031e-01  2.0972377e-04]
Step 11:
	f=[6.329813e-08]
	b:0=[1.0000838]
	w:0=[-9.9991614e-01  8.3898325e-05]
Step 12:
	f=[1.0122903e-08]
	b:0=[1.0000335]
	w:0=[-9.9996644e-01  3.3580083e-05]
Step 13:
	f=[1.6187052e-09]
	b:0=[1.0000134]
	w:0=[-9.99986589e-01  1.34575475e-05]
Step 14:
	f=[2.5707791e-10]
	b:0=[1.0000052]
	w:0=[-9.9999464e-01  5.4109159e-06]
Step 15:
	f=[4.067502e-11]
	b:0=[1.000002]
	w:0=[-9.9999785e-01  2.2041856e-06]
Step 16:
	f=[6.5689676e-12]
	b:0=[1.0000007]
	w:0=[-9.999991e-01  9.286450e-07]
Step 17:
	f=[1.0267343e-12]
	b:0=[1.0000002]
	w:0=[-9.9999964e-01  4.1604477e-07]
Step 18:
	f=[1.7408297e-13]
	b:0=[1.]
	w:0=[-9.9999982e-01  2.1338893e-07]
Step 19:
	f=[3.1974423e-14]
	b:0=[0.99999994]
	w:0=[-9.9999988e-01  1.2994238e-07]
Step 20:
	f=[3.5527137e-15]
	b:0=[0.9999999]
	w:0=[-9.9999994e-01  9.4179583e-08]
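
The loop body above can be factored into a reusable function. The sketch below is one possible arrangement, not a prescribed pattern: `sgd_step` is a hypothetical helper that performs one gradient descent step on any `tf.Module` whose call returns the value to minimize. It uses `assign_sub`, which has the same effect as `assign_add` with a negated argument.

```python
import tensorflow as tf

class MyModule(tf.Module):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.w = tf.Variable([1.0, 2.0], name='w')
        self.b = tf.Variable([3.0], name='b')

    @tf.function
    def __call__(self, x):
        return (tf.reduce_sum(tf.multiply(self.w, x)) + self.b) ** 2

def sgd_step(module, x, lr):
    """Hypothetical helper: one gradient descent step on module(x)."""
    with tf.GradientTape() as tape:
        f = module(x)
    variables = module.trainable_variables
    grads = tape.gradient(f, variables)
    for var, grad in zip(variables, grads):
        var.assign_sub(lr * grad)  # same effect as assign_add(-lr * grad)
    return f  # value of the function before the update

module = MyModule()
x = tf.constant([1.0, 1.0])
losses = [sgd_step(module, x, lr=1e-1).numpy()[0] for _ in range(3)]
print(losses)
```

The first values of `losses` should follow the `f` values printed for steps 0 to 2 above.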


