In [1]:
import tensorflow as tf

In [3]:
import numpy as np

Tensors
A Tensor is a multi-dimensional array. Similar to NumPy ndarray objects, tf.Tensor objects have a data type and a shape. Additionally, tf.Tensors can reside in accelerator memory (like a GPU). TensorFlow offers a rich library of operations (for example, tf.math.add, tf.linalg.matmul, and tf.linalg.inv) that consume and produce tf.Tensors. These operations automatically convert built-in Python types. For example:

In [2]:
print(tf.math.add(1, 2))
print(tf.math.add([1, 2], [3, 4]))
print(tf.math.square(5))
print(tf.math.reduce_sum([1, 2, 3]))

# Operator overloading is also supported
print(tf.math.square(2) + tf.math.square(3))

tf.Tensor(3, shape=(), dtype=int32)
tf.Tensor([4 6], shape=(2,), dtype=int32)
tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)
tf.Tensor(13, shape=(), dtype=int32)


In [5]:
print(np.add(1, 2))
print(np.add([1, 2], [3, 4]))
print(np.square(5))
# NumPy has no reduce_sum; the equivalent call is np.sum([1, 2, 3])

# Operator overloading is also supported
print(np.square(2) + np.square(3))

3
[4 6]
25
13


Each tf.Tensor has a shape and a datatype:

In [6]:
x = tf.linalg.matmul([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)
(1, 2)
<dtype: 'int32'>


In [8]:
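# np.multiply is an element-wise product with broadcasting, not a matrix product;
# it matches tf.linalg.matmul here only because the first operand has a single element.
x = np.multiply([[1]], [[2, 3]])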
print(x)
print(x.shape)
print(x.dtype)

[[2 3]]
(1, 2)
int64


In [40]:
x = np.dot([[1]], [[2, 3]])
print(x)
print(x.shape)
print(x.dtype)

[[2 3]]
(1, 2)
int64


In [24]:
# Shapes (2,) and (3,) cannot be broadcast together, so this call raises ValueError:
# np.multiply([1,6],[2, 3, 4])

In [22]:
np.multiply([1,6],[2, 3])

array([ 2, 18])

In [17]:
np.dot([1,6],[2, 3])

np.int64(20)

In [19]:
np.dot([1,6,4],[[2, 3], [2,4], [2,3]])

array([22, 39])
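The cells above mix two different operations, so a short side-by-side sketch may help. np.multiply is an element-wise product with broadcasting, while np.dot is a matrix product (for 2-D inputs it plays the same role as tf.linalg.matmul). The array names a, b, and m are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

a = np.array([[1]])       # shape (1, 1)
b = np.array([[2, 3]])    # shape (1, 2)

# Element-wise product: `a` is broadcast across the columns of `b`.
elementwise = np.multiply(a, b)

# Matrix product: (1, 1) times (1, 2) gives shape (1, 2).
matrix = np.dot(a, b)

# The two agree here only because `a` holds a single element.
print(np.array_equal(elementwise, matrix))

# With 2x2 inputs the results differ.
m = np.array([[1, 2], [3, 4]])
squared_entries = np.multiply(m, m)   # squares each entry
matrix_square = np.dot(m, m)          # row-by-column sums
print(squared_entries)
print(matrix_square)

# Element-wise multiplication needs broadcast-compatible shapes;
# shapes (2,) and (3,) are not compatible.
try:
    np.multiply([1, 6], [2, 3, 4])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)
```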

The most obvious differences between NumPy arrays and tf.Tensors are:

Tensors can be backed by accelerator memory (like GPU, TPU).
Tensors are immutable.
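
As a minimal sketch of what immutability means in practice (assuming TensorFlow 2.x with eager execution, as used throughout this notebook): item assignment on a tf.Tensor fails, so an "update" produces a new tensor, and mutable state lives in a tf.Variable instead. The names t, updated, and v are illustrative.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# Unlike a NumPy array, a tf.Tensor does not support item assignment.
try:
    t[0] = 10
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# An "update" builds a new tensor and leaves the original untouched.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10])
print(t.numpy(), updated.numpy())

# For state that must change in place, use a tf.Variable.
v = tf.Variable([1, 2, 3])
v.assign([10, 2, 3])
print(v.numpy())
```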

Tensors are explicitly converted to NumPy ndarrays using their .numpy() method. These conversions are typically cheap, since the array and the tf.Tensor share the underlying memory representation where possible. Sharing is not always possible, however: the tf.Tensor may be hosted in GPU memory while NumPy arrays are always backed by host memory, in which case the conversion involves a copy from GPU to host memory.

In [20]:
ndarray = np.ones([3, 3])
ndarray

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [21]:
print("TensorFlow operations convert numpy arrays to Tensors automatically")
tensor = tf.math.multiply(ndarray, 42)
print(tensor)

TensorFlow operations convert numpy arrays to Tensors automatically
tf.Tensor(
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]], shape=(3, 3), dtype=float64)


In [25]:
print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

And NumPy operations convert Tensors to NumPy arrays automatically
[[43. 43. 43.]
 [43. 43. 43.]
 [43. 43. 43.]]


In [26]:
print("The .numpy() method explicitly converts a Tensor to a numpy array")
print(tensor.numpy())

The .numpy() method explicitly converts a Tensor to a numpy array
[[42. 42. 42.]
 [42. 42. 42.]
 [42. 42. 42.]]


GPU acceleration
Many TensorFlow operations are accelerated using the GPU for computation. Without any annotations, TensorFlow automatically decides whether to use the GPU or CPU for an operation—copying the tensor between CPU and GPU memory, if necessary. Tensors produced by an operation are typically backed by the memory of the device on which the operation executed. For example:

In [27]:
print("Is there a GPU available: ")
print(tf.config.list_physical_devices("GPU"))

Is there a GPU available: 
[]


In [29]:
x = tf.random.uniform([3, 3])

In [30]:
x

<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[0.8449755 , 0.12773073, 0.08117127],
       [0.66799366, 0.29608917, 0.21665585],
       [0.8179673 , 0.24631274, 0.16238463]], dtype=float32)>

In [31]:
print("Is the Tensor on GPU #0:  ")
print(x.device.endswith('GPU:0'))

Is the Tensor on GPU #0:  
False


Device names
The Tensor.device property provides a fully qualified string name of the device hosting the contents of the tensor. This name encodes many details, such as an identifier of the network address of the host on which this program is executing and the device within that host. This is required for distributed execution of a TensorFlow program. The string ends with GPU:<N> if the tensor is placed on the N-th GPU on the host.
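
To make the structure of such a name concrete, here is a small pure-Python sketch that splits a device string into its fields. The example string follows the form TensorFlow typically reports for a local eager-mode program, but treat it as an assumption; parse_device_name is a hypothetical helper written for this illustration, not part of TensorFlow.

```python
def parse_device_name(name):
    """Split a device string into a dict of its slash-separated fields."""
    fields = {}
    for part in name.strip("/").split("/"):
        # Split on the first colon only, so "device:GPU:0" keeps "GPU:0" intact.
        key, _, value = part.partition(":")
        fields[key] = value
    return fields

# Assumed example of a fully qualified device name.
example = "/job:localhost/replica:0/task:0/device:GPU:0"

parsed = parse_device_name(example)
print(parsed)

# The last field carries the device type and its index on the host.
device_type, _, index = parsed["device"].partition(":")
print(device_type, int(index))
```

This is why the cells in this notebook test placement with x.device.endswith("GPU:0") rather than comparing the whole string: the job, replica, and task fields vary between environments.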


By default, TensorFlow chooses the device for each operation automatically. Operations can also be explicitly placed on specific devices using the tf.device context manager. For example:

In [33]:
import time

def time_matmul(x):
  # Time 10 consecutive multiplications of x with itself.
  start = time.time()
  for _ in range(10):
    tf.linalg.matmul(x, x)

  elapsed = time.time() - start

  print("10 loops: {:0.2f}ms".format(1000 * elapsed))

# Force execution on CPU
print("On CPU:")
with tf.device("CPU:0"):
  x = tf.random.uniform([1000, 1000])
  assert x.device.endswith("CPU:0")
  time_matmul(x)

# Force execution on GPU #0 if available
if tf.config.list_physical_devices("GPU"):
  print("On GPU:")
  with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
    x = tf.random.uniform([1000, 1000])
    assert x.device.endswith("GPU:0")
    time_matmul(x)

On CPU:
10 loops: 863.39ms


In [35]:
# print("On GPU:")
# with tf.device("GPU:0"): # Or GPU:1 for the 2nd GPU, GPU:2 for the 3rd etc.
#     x = tf.random.uniform([1000, 1000])
#     assert x.device.endswith("GPU:0")
#     time_matmul(x)

In [36]:
x = tf.random.uniform([1000, 1000])
x

<tf.Tensor: shape=(1000, 1000), dtype=float32, numpy=
array([[0.81983244, 0.19604385, 0.229913  , ..., 0.23888636, 0.2408377 ,
        0.53636587],
       [0.72336555, 0.67997515, 0.3489883 , ..., 0.66726327, 0.38190365,
        0.5603876 ],
       [0.96385753, 0.11692548, 0.5668056 , ..., 0.04055905, 0.6299014 ,
        0.9153645 ],
       ...,
       [0.00293708, 0.25390017, 0.20568776, ..., 0.38237143, 0.44168222,
        0.20762956],
       [0.75548506, 0.84115577, 0.10746491, ..., 0.9055704 , 0.5898943 ,
        0.23693836],
       [0.63817096, 0.19101882, 0.20496035, ..., 0.84124935, 0.02521706,
        0.22696614]], dtype=float32)>

In [37]:
tf.linalg.matmul(x, x)

<tf.Tensor: shape=(1000, 1000), dtype=float32, numpy=
array([[235.45631, 251.81937, 241.7474 , ..., 243.86044, 246.05455,
        242.9179 ],
       [236.75821, 255.52428, 238.84564, ..., 242.32378, 246.53368,
        245.0959 ],
       [238.8576 , 259.1536 , 242.15292, ..., 253.00952, 257.80334,
        253.7004 ],
       ...,
       [251.40213, 257.87973, 248.24126, ..., 255.15138, 256.55768,
        251.34755],
       [241.95166, 255.66258, 240.57596, ..., 243.66803, 250.64886,
        244.01338],
       [247.52821, 259.07687, 242.63423, ..., 253.01822, 256.44254,
        252.03659]], dtype=float32)>

In [38]:
np.multiply(x, x)

array([[6.7212522e-01, 3.8433190e-02, 5.2859984e-02, ..., 5.7066690e-02,
        5.8002796e-02, 2.8768834e-01],
       [5.2325773e-01, 4.6236619e-01, 1.2179283e-01, ..., 4.4524026e-01,
        1.4585039e-01, 3.1403428e-01],
       [9.2902136e-01, 1.3671568e-02, 3.2126859e-01, ..., 1.6450369e-03,
        3.9677578e-01, 8.3789217e-01],
       ...,
       [8.6264299e-06, 6.4465299e-02, 4.2307455e-02, ..., 1.4620791e-01,
        1.9508319e-01, 4.3110035e-02],
       [5.7075769e-01, 7.0754302e-01, 1.1548706e-02, ..., 8.2005775e-01,
        3.4797528e-01, 5.6139786e-02],
       [4.0726218e-01, 3.6488190e-02, 4.2008743e-02, ..., 7.0770049e-01,
        6.3589995e-04, 5.1513631e-02]], dtype=float32)

In [39]:
np.dot(x, x)

array([[235.45628, 251.81943, 241.74738, ..., 243.86038, 246.05458,
        242.91785],
       [236.75824, 255.52434, 238.84567, ..., 242.32376, 246.5337 ,
        245.09598],
       [238.85764, 259.15356, 242.1529 , ..., 253.00954, 257.80328,
        253.70035],
       ...,
       [251.40216, 257.87973, 248.24124, ..., 255.15138, 256.5577 ,
        251.3476 ],
       [241.95172, 255.66258, 240.57599, ..., 243.66805, 250.64879,
        244.01332],
       [247.52824, 259.0768 , 242.63422, ..., 253.01825, 256.44257,
        252.03659]], dtype=float32)

Datasets
This section uses the tf.data.Dataset API to build a pipeline for feeding data to your model. tf.data.Dataset is used to build performant, complex input pipelines from simple, re-usable pieces that will feed your model's training or evaluation loops. (Refer to the tf.data: Build TensorFlow input pipelines guide to learn more.)

In [41]:
ds_tensors = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

In [42]:
ds_tensors

<_TensorSliceDataset element_spec=TensorSpec(shape=(), dtype=tf.int32, name=None)>

In [43]:
# Create a temporary text file (mkstemp returns an open file descriptor and the file path)
import tempfile
_, filename = tempfile.mkstemp()

In [44]:
_

3

In [45]:
filename

'C:\\Users\\lenovo\\AppData\\Local\\Temp\\tmp0wsdmy7g'

In [46]:
# Note: the indentation inside the triple-quoted string is written to the file as-is.
with open(filename, 'w') as f:
  f.write("""Line 1
             Line 2
             Line 3
          """)

In [60]:
with open(filename, 'r') as f:
  f.readlines()  # the return value is discarded, so this cell shows no output

In [61]:
f

<_io.TextIOWrapper name='C:\\Users\\lenovo\\AppData\\Local\\Temp\\tmp0wsdmy7g' mode='r' encoding='utf-8'>

In [50]:
ds_file = tf.data.TextLineDataset(filename)

In [51]:
ds_file

<TextLineDatasetV2 element_spec=TensorSpec(shape=(), dtype=tf.string, name=None)>

In [52]:
ds_tensors = ds_tensors.map(tf.math.square).shuffle(2).batch(2)

In [53]:
ds_tensors

<_BatchDataset element_spec=TensorSpec(shape=(None,), dtype=tf.int32, name=None)>

In [54]:
ds_file = ds_file.batch(2)

In [55]:
ds_file

<_BatchDataset element_spec=TensorSpec(shape=(None,), dtype=tf.string, name=None)>

In [56]:
print('Elements of ds_tensors:')
for x in ds_tensors:
  print(x)

print('\nElements in ds_file:')
for x in ds_file:
  print(x)

Elements of ds_tensors:
tf.Tensor([4 1], shape=(2,), dtype=int32)
tf.Tensor([16  9], shape=(2,), dtype=int32)
tf.Tensor([25 36], shape=(2,), dtype=int32)

Elements in ds_file:
tf.Tensor([b'Line 1' b'             Line 2'], shape=(2,), dtype=string)
tf.Tensor([b'             Line 3' b'          '], shape=(2,), dtype=string)
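
The ds_tensors batches above are only lightly shuffled because shuffle(2) uses a buffer of two elements. The following pure-Python sketch models that behavior under simplifying assumptions. It is not TensorFlow's implementation, and buffered_shuffle and batch are hypothetical helpers written for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)      # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]            # emit a random buffered element
        buffer[idx] = item           # and replace it with the new one
    rng.shuffle(buffer)              # drain what is left in random order
    yield from buffer

def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current

# Mirror the pipeline above: square, shuffle with a buffer of 2, batch by 2.
source = [1, 2, 3, 4, 5, 6]
squares = [x * x for x in source]
shuffled = list(buffered_shuffle(squares, buffer_size=2, seed=0))
batches = list(batch(shuffled, 2))
print(batches)
```

In this model, the element emitted at output position k can only come from the first k + buffer_size input elements, which is why a small buffer keeps the output close to the original order. A buffer at least as large as the dataset is needed for a uniform shuffle.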
