In [5]:
# Based on the TensorFlow tutorial: https://www.tensorflow.org/tutorials/customization/basics
# Modified by Mehdi Ammi, Univ. Paris 8

# TensorFlow: Tensors and Operations

## Overview

This notebook introduces you to the foundational concepts of TensorFlow, including:

 - Importing necessary libraries.
 - Creating and manipulating tensors.
 - Leveraging GPU acceleration for computations.
 - Building efficient data pipelines using tf.data.Dataset.

## Import TensorFlow

To get started, import the tensorflow module. As of TensorFlow 2, eager execution is turned on by default. Eager execution enables a more interactive frontend to TensorFlow, which you will later explore in more detail.

In [6]:
import tensorflow as tf

## Working with Tensors

A tensor is a multi-dimensional array. Like NumPy ndarray objects, tf.Tensor objects have a data type and a shape. Unlike ndarrays, they can also reside in accelerator memory, such as that of a GPU. TensorFlow provides a comprehensive set of operations that consume and produce tensors, such as tf.math.add, tf.linalg.matmul, and tf.linalg.inv. These operations automatically convert built-in Python types, as the examples below show.

In [7]:
print(tf.math.add(1, 2))
print(tf.math.add([1, 2], [3, 4]))
print(tf.math.square(5))
print(tf.math.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.math.square(2) + tf.math.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)


Every tensor has a shape and a datatype, which can be inspected as shown below:

In [None]:
x = tf.linalg.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

In [None]:
tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>

Key differences between NumPy arrays and TensorFlow tensors include:

 - Accelerator Memory: Tensors can be stored in memory of hardware accelerators like GPUs and TPUs, enabling faster computations.
 - Immutability: Tensors are immutable once created, which helps in ensuring the consistency of data during model training and evaluation.
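
A minimal sketch of the immutability point, assuming TensorFlow 2 with eager execution. Item assignment on a tf.Tensor raises an error, so an "update" produces a new tensor (here through tf.tensor_scatter_nd_update), while state that must change in place is held in a tf.Variable. The variable names are illustrative only.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# A tf.Tensor does not support item assignment.
try:
    t[0] = 10
except TypeError as err:
    print("Cannot modify a tensor in place:", err)

# Build a new tensor with the first element replaced; t itself is unchanged.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(t.numpy(), updated.numpy())

# For mutable state, use a tf.Variable and assign a new value to it.
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])
print(v.numpy())
```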

## Interoperability with NumPy
TensorFlow integrates seamlessly with NumPy, the fundamental package for scientific computing with Python. Converting between TensorFlow tensors and NumPy arrays is straightforward and efficient:

 - TensorFlow operations can automatically convert NumPy arrays to tensors, allowing you to leverage TensorFlow's optimized operations on your NumPy data.
 - Similarly, NumPy operations can convert tensors to NumPy arrays automatically, enabling you to use NumPy's extensive functionality on your TensorFlow data.
 - You can explicitly convert a tensor to a NumPy array using the .numpy() method, which is particularly useful when you need to perform operations that are only available in NumPy.

In [None]:
import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.math.multiply(ndarray, 42)
print(tensor)


print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

In [None]:
>>
TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)
And NumPy operations convert Tensors to NumPy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]
The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
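
The output above shows dtype=float64 because np.ones creates float64 arrays by default and the conversion preserves that type. The sketch below, which is an illustration rather than part of the original tutorial, makes the conversion explicit with tf.convert_to_tensor and changes the type with tf.cast before converting back with .numpy().

```python
import numpy as np
import tensorflow as tf

ndarray = np.ones([3, 3])  # NumPy creates float64 values by default

# Explicit conversion keeps the NumPy dtype.
as_tensor = tf.convert_to_tensor(ndarray)

# Cast to float32, a common choice for model inputs.
as_float32 = tf.cast(as_tensor, tf.float32)

# Convert back to a NumPy array; the dtype follows the tensor.
back = as_float32.numpy()

print(as_tensor.dtype, as_float32.dtype, back.dtype)
```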

## Utilizing GPU Acceleration

TensorFlow is designed to take advantage of hardware accelerators such as GPUs. Many operations in TensorFlow are optimized to use GPUs, providing significant speed improvements, especially for large-scale computations and deep learning models. TensorFlow automatically decides whether to use the GPU or CPU for an operation, transferring tensors between the two as necessary.

In [None]:
x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

In [None]:
>>
Is there a GPU available: 
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
Is the Tensor on GPU #0:  
True

### Device names
The Tensor.device property provides the fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which the program is executing and the device within that host. This information is required for distributed execution of a TensorFlow program. The string ends with GPU:<N> if the tensor is placed on the N-th GPU on the host.
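
As an illustration of how the device string is structured, the sketch below prints it and splits it into its components with tf.DeviceSpec.from_string. It assumes eager execution; the values printed depend on the machine it runs on.

```python
import tensorflow as tf

x = tf.random.uniform([3, 3])

# Fully qualified device name of the tensor.
print(x.device)

# Parse the string into its parts: job, replica, task, device type and index.
spec = tf.DeviceSpec.from_string(x.device)
print("job:", spec.job)
print("replica:", spec.replica)
print("task:", spec.task)
print("device type:", spec.device_type)
print("device index:", spec.device_index)
```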
    
    
### Explicit device placement
In TensorFlow, placement refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when no explicit guidance is provided, TensorFlow automatically decides which device executes an operation and copies tensors to that device if needed.

However, TensorFlow operations can be explicitly placed on specific devices using the tf.device context manager. For example:

In [None]:
import time

def time_matmul(x):
  """Multiply x by itself 10 times and print the total elapsed time."""
  start = time.time()
  for _ in range(10):
    tf.linalg.matmul(x, x)

  elapsed = time.time() - start

  print("10 loops: {:0.2f}ms".format(1000 * elapsed))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

In [None]:
>>
On CPU:
10 loops: 42.76ms
On GPU:
10 loops: 300.72ms
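
In the run recorded above, the GPU loop took longer than the CPU loop. One plausible explanation, which the source does not confirm, is a one-time startup cost on the first GPU call. The sketch below is a hedged variant of the timing code: it performs an untimed warm-up multiplication and reads the final result with .numpy() so that the computation has finished before the clock stops. The helper name time_matmul_warm and its loops parameter are hypothetical.

```python
import time
import tensorflow as tf

def time_matmul_warm(x, loops=10):
  """Return the time in ms for `loops` multiplications, after a warm-up."""
  # Untimed warm-up call, so one-time setup is not counted.
  tf.linalg.matmul(x, x).numpy()

  start = time.perf_counter()
  for _ in range(loops):
    y = tf.linalg.matmul(x, x)
  # Reading the value makes sure the last result is actually available.
  y.numpy()
  return 1000 * (time.perf_counter() - start)

# Always time the CPU; add GPU #0 only if one is present.
devices = ["CPU:0"]
if tf.config.list_physical_devices("GPU"):
  devices.append("GPU:0")

timings = {}
for name in devices:
  with tf.device(name):
    x = tf.random.uniform([1000, 1000])
    timings[name] = time_matmul_warm(x)
  print("{}: 10 loops: {:0.2f}ms".format(name, timings[name]))
```

Results vary with hardware and matrix size, so treat any single measurement as indicative only.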

## Building Data Pipelines

The tf.data.Dataset API is a powerful tool for building input pipelines to feed data into your machine learning models. It allows you to create complex and highly efficient data processing pipelines. 

Refer to the tf.data: Build TensorFlow input pipelines guide to learn more.


### Create a source Dataset

You can create a source dataset from various input sources, such as tensors, files, or by generating data programmatically.

Refer to the Reading input data section of the tf.data: Build TensorFlow input pipelines guide for more information.

In [None]:
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

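# Create a temporary text file (mkstemp returns an open file descriptor and a path)
import os
import tempfile
fd, filename = tempfile.mkstemp()
os.close(fd)  # Close the descriptor; the file is reopened by path below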

with open(filename, 'w') as f:
  f.write("""Line 1
Line 2
Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)

### Apply transformations

Transformations can be applied to the dataset using methods like map, batch, and shuffle, enabling you to preprocess and organize your data efficiently.

In [None]:
ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)
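
To see what map and batch do on their own, the sketch below applies them without shuffle, so the result is deterministic. It is an illustration rather than part of the original tutorial; it uses as_numpy_iterator to collect the batches as plain lists and the drop_remainder argument of batch to discard a final, smaller batch.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Square every element, then group the results into batches of four.
squared = ds.map(tf.math.square)
batched = squared.batch(4)

# With drop_remainder=True the incomplete last batch is discarded.
full_only = squared.batch(4, drop_remainder=True)

batches = [b.tolist() for b in batched.as_numpy_iterator()]
full_batches = [b.tolist() for b in full_only.as_numpy_iterator()]

print(batches)       # [[1, 4, 9, 16], [25, 36]]
print(full_batches)  # [[1, 4, 9, 16]]
```

Note that shuffle(2) in the cell above uses a buffer of only two elements, so each element can move only a short distance from its original position.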

### Iterate

tf.data.Dataset objects support iteration to loop over records:

In [None]:
print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)

In [None]:
>> Elements of ds_tensors:
tf.Tensor([4 9], shape=(2,), dtype=int32)
tf.Tensor([ 1 25], shape=(2,), dtype=int32)
tf.Tensor([16 36], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'Line 3' b'  '], shape=(2,), dtype=string)