<br>

<div align=center><font color=maroon size=6><b>Customization basics: tensors and operations</b></font></div>

<br>

<font size=4><b>References:</b></font>
1. TF2 official tutorials: <a href="https://www.tensorflow.org/tutorials" style="text-decoration:none;">TensorFlow Tutorials</a> 
    * `TensorFlow > Learn > TensorFlow Core > Tutorials` > <a href="https://www.tensorflow.org/tutorials/customization/basics" style="text-decoration:none;">Customization basics: tensors and operations</a>
        * Run in <a href="https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/customization/basics.ipynb" style="text-decoration:none;">Google Colab</a>

<br>
<br>
<br>

This is an introductory TensorFlow tutorial that shows how to:

* Import the required package.
* Create and use tensors.
* Use GPU acceleration.
* Build a data pipeline with `tf.data.Dataset`.

<br>

## Import TensorFlow

To get started, import the `tensorflow` module. <font size=3 color=maroon>As of TensorFlow 2, **eager execution** is turned on by default. Eager execution enables a more interactive frontend to TensorFlow,</font> which you will later explore in more detail.

In [1]:
import tensorflow as tf

In [2]:
tf.__version__

'2.8.0'
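
As a quick check of the eager-execution claim, the following minimal sketch asks TensorFlow whether eager mode is active and runs one operation whose value is available immediately. The variable names `eager_enabled` and `total` are illustrative only.

```python
import tensorflow as tf

# In TensorFlow 2.x, eager execution is expected to be on by default.
eager_enabled = tf.executing_eagerly()
print(eager_enabled)

# With eager execution, an operation runs right away and returns a
# concrete value, with no separate session or graph-run step.
total = tf.math.add(2, 4)
print(total.numpy())
```

If eager execution has been disabled elsewhere in the program, the first line prints `False` and `.numpy()` is not available on the result.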

<br>

## Tensors

A Tensor is a multi-dimensional array. Similar to NumPy `ndarray` objects, `tf.Tensor` objects have a data type and a shape. <font color=maroon size=3>Additionally, `tf.Tensor`s can reside in accelerator memory (like a GPU).</font> TensorFlow offers a rich library of operations (for example, `tf.math.add`, `tf.linalg.matmul`, and `tf.linalg.inv`) that consume and produce `tf.Tensor`s. <font color=maroon size=3>These operations automatically convert built-in Python types.</font> For example:


In [3]:
print(tf.math.add(1, 2))
print(tf.math.add([1, 2], [3, 4]))
print(tf.math.square(5))
print(tf.math.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.math.square(2) + tf.math.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)
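
The automatic conversion of built-in Python types can be made visible with `tf.convert_to_tensor`. The sketch below shows the default dtypes chosen for Python integers and floats, and an explicit `tf.cast` before combining the two. The variable names are illustrative.

```python
import tensorflow as tf

# Python ints become int32 tensors, Python floats become float32 tensors.
int_tensor = tf.convert_to_tensor([1, 2, 3])
float_tensor = tf.convert_to_tensor([1.0, 2.5])
print(int_tensor.dtype, float_tensor.dtype)

# Tensors of different dtypes are not mixed implicitly, so cast first.
mixed = tf.cast(int_tensor[:2], tf.float32) + float_tensor
print(mixed)
```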


<br>

Each `tf.Tensor` has a shape and a datatype:

In [4]:
x = tf.linalg.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>


In [5]:
x.numpy()

array([[2, 3]])

<br>

<font size=3 color=maroon>The most obvious differences between NumPy arrays and `tf.Tensor`s are:

1. Tensors can be backed by accelerator memory (like GPU, TPU).
2. Tensors are immutable.</font>
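
A minimal sketch of what immutability means in practice: item assignment on a `tf.Tensor` fails, so an update produces a new tensor instead. Using `tf.Variable` for mutable state is shown as one common alternative, not the only one. The variable names are illustrative.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# A tf.Tensor does not support in-place item assignment.
try:
    t[0] = 10
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)

# Build a new tensor with the changed value; `t` itself is untouched.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(t.numpy(), updated.numpy())

# For state that must change over time, a tf.Variable can be reassigned.
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])
print(v.numpy())
```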

<br>

### NumPy compatibility

Converting between a TensorFlow `tf.Tensor` and a NumPy `ndarray` is easy:

* TensorFlow operations automatically convert NumPy ndarrays to Tensors.
* NumPy operations automatically convert Tensors to NumPy ndarrays.

Tensors are explicitly converted to NumPy ndarrays using their `.numpy()` method. These conversions are typically cheap because the array and the `tf.Tensor` share the underlying memory representation when possible. Sharing is not always possible, however: the `tf.Tensor` may be hosted in GPU memory while NumPy arrays are always backed by host memory, in which case the conversion involves a copy from GPU to host memory.

In [6]:
import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.math.multiply(ndarray, 42)
print(tensor)
print()

print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))
print()

print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)

And NumPy operations convert Tensors to NumPy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]

The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]
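
One detail visible in the output above is `dtype=float64`: the tensor keeps the NumPy array's dtype rather than TensorFlow's usual `float32` default. The following small sketch makes the difference explicit. The variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

# np.ones defaults to float64, and the conversion preserves that dtype.
from_numpy = tf.convert_to_tensor(np.ones([2, 2]))

# tf.ones defaults to float32.
native = tf.ones([2, 2])
print(from_numpy.dtype, native.dtype)

# Converting back with .numpy() keeps float64 as well.
round_trip = from_numpy.numpy()
print(round_trip.dtype)
```

When such a tensor is later combined with `float32` tensors, an explicit `tf.cast` is usually needed.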


<br>
<br>
<br>

## GPU acceleration

Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, <font size=3 color=maroon>TensorFlow automatically decides whether to use the GPU or CPU for an operation—copying the tensor between CPU and GPU memory, if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed.</font>

For example:

In [7]:
x = tf.random.uniform([3, 3])

print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))
print()

print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is there a GPU available: 
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

Is the Tensor on GPU #0:  
True


<br>
<br>

### Device names

The `Tensor.device` property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. <font size=3 color=maroon>This is required for distributed execution of a TensorFlow program. The string ends with `GPU:<N>` if the tensor is placed on the `N`-th GPU on the host.</font>
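
To see what such a device name looks like, the sketch below pins a tensor to the CPU (so that it runs on any machine) and splits the last two colon-separated fields off the string. The exact prefix of the name can vary between setups, so only the trailing `<TYPE>:<N>` part is relied on here.

```python
import tensorflow as tf

# Place the tensor on the CPU so the example does not require a GPU.
with tf.device("CPU:0"):
    x = tf.zeros([2, 2])

# The full name typically looks like
# /job:localhost/replica:0/task:0/device:CPU:0
print(x.device)

# The last two colon-separated fields are the device type and index.
device_type, device_index = x.device.split(":")[-2:]
print(device_type, device_index)
```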

<br>
<br>

### Explicit device placement

<font size=3 color=maroon>In TensorFlow, *placement* refers to how individual operations are assigned to (placed on) a device for execution. As mentioned, when no explicit guidance is provided, TensorFlow automatically decides on which device to execute an operation and copies tensors to that device if needed.

However, TensorFlow operations can be explicitly placed on specific devices using the `tf.device` context manager.</font> 

For example:

In [8]:
import time

def time_matmul(x):
    start = time.time()
    for loop in range(10):
        tf.linalg.matmul(x, x)

    result = time.time()-start

    print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("CPU:0")
    time_matmul(x)

print()    

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
    print("On GPU:")
    with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
        x = tf.random.uniform([1000, 1000])
        assert x.device.endswith("GPU:0")
        time_matmul(x)

On CPU:
10 loops: 70.04ms

On GPU:
10 loops: 1535.52ms


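Running the same cell a second time gives a very different GPU timing, most likely because the first run also pays one-time GPU initialization costs:

In [9]: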
import time

def time_matmul(x):
    start = time.time()
    for loop in range(10):
        tf.linalg.matmul(x, x)

    result = time.time()-start

    print("10 loops: {:0.2f}ms".format(1000*result))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("CPU:0")
    time_matmul(x)

print()    

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
    print("On GPU:")
    with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
        x = tf.random.uniform([1000, 1000])
        assert x.device.endswith("GPU:0")
        time_matmul(x)

On CPU:
10 loops: 43.00ms

On GPU:
10 loops: 1.00ms
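
One way to keep one-time startup work out of such timings is an untimed warm-up call before the clock starts. The sketch below is a variation on `time_matmul`, not the tutorial's own function. The helper name `time_matmul_warm` is hypothetical, and the final `.numpy()` call is there on the assumption that copying the result to the host waits for any pending device work.

```python
import time

import tensorflow as tf


def time_matmul_warm(x, loops=10):
    """Return elapsed milliseconds for `loops` matmuls, excluding warm-up."""
    # Untimed warm-up: absorbs one-time initialization on first use.
    tf.linalg.matmul(x, x).numpy()

    start = time.perf_counter()
    for _ in range(loops):
        result = tf.linalg.matmul(x, x)
    # Copy the last result to the host so the work is finished
    # before the clock is stopped.
    result.numpy()
    return 1000 * (time.perf_counter() - start)


# Use the GPU when one is present, otherwise fall back to the CPU.
device = "GPU:0" if tf.config.list_physical_devices("GPU") else "CPU:0"
with tf.device(device):
    x = tf.random.uniform([1000, 1000])
    elapsed_ms = time_matmul_warm(x)

print("{} 10 loops: {:0.2f}ms".format(device, elapsed_ms))
```

Timings still vary from run to run, so averaging over several measurements gives a steadier picture.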


<br>
<br>
<br>

## Datasets

This section uses the [`tf.data.Dataset` API](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/data.ipynb) to build a pipeline for feeding data to your model. `tf.data.Dataset` is used to build performant, complex input pipelines from simple, re-usable pieces that will feed your model's training or evaluation loops.

<br>

### Create a source `Dataset`

<font size=3 color=maroon>Create a ***source*** dataset using either:

- a factory function such as `tf.data.Dataset.from_tensors` or `tf.data.Dataset.from_tensor_slices`, or

- an object that reads from files, such as `tf.data.TextLineDataset` or `tf.data.TFRecordDataset`.


Refer to the _Reading input data_ section of the [tf.data: Build TensorFlow input pipelines](https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/data.ipynb) guide for more information.
</font>
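
The two factory functions differ in how they treat the first dimension of their input. As a minimal sketch with made-up data: `from_tensors` wraps the whole input as a single element, while `from_tensor_slices` yields one element per row.

```python
import tensorflow as tf

data = [[1, 2], [3, 4], [5, 6]]

# One element holding the entire 3x2 tensor.
whole = tf.data.Dataset.from_tensors(data)

# Three elements, each a length-2 row.
slices = tf.data.Dataset.from_tensor_slices(data)

whole_shapes = [e.shape.as_list() for e in whole]
slice_shapes = [e.shape.as_list() for e in slices]
print(whole_shapes)
print(slice_shapes)
```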

In [10]:
# help(tf.data.Dataset.from_tensor_slices)

In [11]:
# help(tf.data.Dataset.from_tensors)

In [12]:
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Create a small plain-text file with a few lines
import tempfile

_, filename = tempfile.mkstemp()

with open(filename, 'w') as f:
    f.write("""Line 1
    Line 2
    Line 3
  """)

ds_file = tf.data.TextLineDataset(filename)

<br>

In [13]:
ds_tensors

<TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int32, name=None)>

In [14]:
for e in ds_tensors.take(2):
    print(e.numpy())

1
2


In [15]:
# ds_tensors.take(2).numpy()
#
# Error: AttributeError: 'TakeDataset' object has no attribute 'numpy'


# ds_tensors.take(2)[1].numpy()
#
# Error: TypeError: 'TakeDataset' object is not subscriptable

<br>
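
Because a dataset is neither a tensor nor a sequence, it has to be iterated to get values out. Besides the `for` loop shown earlier, `as_numpy_iterator()` is one option that yields NumPy values directly. This is a sketch on a fresh dataset, and the names `ds` and `first_two` are illustrative.

```python
import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# take(2) returns another dataset; iterate it to materialize the values.
first_two = [int(v) for v in ds.take(2).as_numpy_iterator()]
print(first_two)
```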

In [16]:
[line for line in ds_file.take(3)]

[<tf.Tensor: shape=(), dtype=string, numpy=b'Line 1'>,
 <tf.Tensor: shape=(), dtype=string, numpy=b'    Line 2'>,
 <tf.Tensor: shape=(), dtype=string, numpy=b'    Line 3'>]

<br>
<br>

### Apply transformations

Use transformation functions such as `tf.data.Dataset.map`, `tf.data.Dataset.batch`, and `tf.data.Dataset.shuffle` to apply transformations to dataset records.

In [17]:
ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)

ds_file = ds_file.batch(2)

In [18]:
ds_file

<BatchDataset element_spec=TensorSpec(shape=(None,), dtype=tf.string, name=None)>

In [19]:
# ds_file.shape
#
# AttributeError: 'BatchDataset' object has no attribute 'shape'
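
The effect of each transformation is easier to see on a small dataset without shuffling. The sketch below uses `tf.data.Dataset.range` as a stand-in source and shows that `batch` can leave a smaller final batch unless `drop_remainder=True` is passed, and that `shuffle` changes only the order of the elements. All variable names are illustrative.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(6)        # 0, 1, ..., 5 as int64 scalars
squared = ds.map(tf.math.square)     # applied element by element

# Six elements in batches of four: the last batch holds only two.
batches = [b.numpy().tolist() for b in squared.batch(4)]
print(batches)

# drop_remainder=True discards the incomplete final batch.
full_batches = [b.numpy().tolist()
                for b in squared.batch(4, drop_remainder=True)]
print(full_batches)

# Shuffling reorders elements but keeps the same set of values.
shuffled = squared.shuffle(buffer_size=6, seed=0)
shuffled_values = sorted(int(v) for v in shuffled.as_numpy_iterator())
print(shuffled_values)
```

The unknown batch size is also why the `element_spec` of a batched dataset reports a shape of `(None,)`.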

<br>

### Iterate

`tf.data.Dataset` objects support iteration to loop over records:

In [20]:
print('Elements of ds_tensors:')
for x in ds_tensors:
    print(x)

print('\nElements in ds_file:')
for x in ds_file:
    print(x)

Elements of ds_tensors:
tf.Tensor([4 1], shape=(2,), dtype=int32)
tf.Tensor([16 25], shape=(2,), dtype=int32)
tf.Tensor([36  9], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'    Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'    Line 3' b'  '], shape=(2,), dtype=string)
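
The file output above still carries the leading spaces and the whitespace-only last line from the text that was written. If that is not wanted, one possible cleanup, sketched here on a separate temporary file, is to strip each line with `tf.strings.strip` and filter out the empty ones. The names `path`, `cleaned` and `result` are illustrative.

```python
import os
import tempfile

import tensorflow as tf

# Write a file with the same kind of stray whitespace as above.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("Line 1\n    Line 2\n    Line 3\n  ")

lines = tf.data.TextLineDataset(path)

# Strip surrounding whitespace, then drop lines that end up empty.
cleaned = lines.map(tf.strings.strip).filter(
    lambda s: tf.strings.length(s) > 0)

# String elements come back as bytes, so decode them for display.
result = [s.decode("utf-8") for s in cleaned.as_numpy_iterator()]
print(result)

os.remove(path)
```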


<br>
<br>
<br>

```python
# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
```

<br>
<br>
<br>