## Tensor

A tensor is a container for numerical data. Tensors are a generalization of matrices to an arbitrary number of dimensions. 

Nearly all current machine-learning systems use tensors as their basic input data structure.

A tensor has the following properties:

 * **Number of axes:** A vector has 1 axis, a matrix has 2 axes, etc. The number of axes is also called the rank.
 * **Shape:** A tuple of integers that describes how many dimensions the tensor has along each axis. A matrix might have the shape (3, 5). A vector has a shape with a single element, such as (5,), whereas a scalar has an empty shape ().
 * **Data type:** The data type of the tensor values. This is usually a numeric type such as int8 or float32, but it could also be a character or string type.
 
Let's look at the MNIST dataset as an example:

In [9]:
import numpy as np
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print('Tensor train_images:')
print('Number of axes:', train_images.ndim)
print('Shape:', train_images.shape)
print('Data type:', train_images.dtype)

Tensor train_images:
Number of axes: 3
Shape: (60000, 28, 28)
Data type: uint8


`train_images` is a 3D tensor that is an array of 60,000 matrices of 28 × 28 integers.

### Scalars (0D tensors)
A tensor that contains only one number is called a *scalar*.

In [10]:
x = np.array(12)
print('Number of axes:', x.ndim)
print('Shape:', x.shape)
print('Data type:', x.dtype)

Number of axes: 0
Shape: ()
Data type: int64


### Vectors (1D tensors)
An array of numbers is called a *vector*. It has exactly one axis.

In [11]:
x = np.array([3.0, 5.0, 1.0, 24.0])
print('Number of axes:', x.ndim)
print('Shape:', x.shape)
print('Data type:', x.dtype)

Number of axes: 1
Shape: (4,)
Data type: float64


This vector has four entries, so it is called a 4-dimensional vector.

**Note:** Don’t confuse a 4-dimensional vector with a 4D tensor! A 4-dimensional vector has only one axis and has four dimensions along its axis, whereas a 4D tensor has four axes (and may have any number of dimensions along each axis). The term dimensionality is overloaded and can denote either the number of entries along a specific axis (as in the case of our 4-dimensional vector) or the number of axes in a tensor (such as a 4D tensor).
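
The difference can be checked directly in NumPy. The following is a minimal sketch; the shape `(2, 3, 4, 5)` of the 4D tensor is an arbitrary choice made only for illustration.

```python
import numpy as np

# A 4-dimensional vector: one axis with four entries along it.
vector = np.array([3.0, 5.0, 1.0, 24.0])

# A 4D tensor: four axes. The sizes (2, 3, 4, 5) are arbitrary (assumption).
tensor_4d = np.zeros((2, 3, 4, 5))

print('Vector    -> axes:', vector.ndim, 'shape:', vector.shape)
print('4D tensor -> axes:', tensor_4d.ndim, 'shape:', tensor_4d.shape)
```

Here `ndim` reports the number of axes, while each entry of `shape` reports the number of dimensions along one axis.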




### Matrices (2D tensors)
An array of vectors is a matrix. A matrix has two axes, often referred to as rows and columns. 

In [13]:
x = np.array([[5, 63, 5, 24, 8],
              [6, 71, 8, 42, 3],
              [7, 33, 1, 30, 8]])
print('Number of axes:', x.ndim)
print('Shape:', x.shape)
print('Data type:', x.dtype)

Number of axes: 2
Shape: (3, 5)
Data type: int64


### Higher-dimensional tensors
We can create an nD tensor by arranging (n-1)D tensors in an array. Here is an example of a 3D tensor:

In [14]:
x = np.array([
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    [[10, 20, 30],
     [40, 50, 60],
     [70, 80, 90]]
    ])
print('Number of axes:', x.ndim)
print('Shape:', x.shape)
print('Data type:', x.dtype)

Number of axes: 3
Shape: (2, 3, 3)
Data type: int64


## TODO Tensor operations

## Practical tensor examples

In general, the first axis (axis 0) in all data tensors will be the samples axis. In the MNIST example, samples are images of digits. In addition, models don’t process an entire dataset at once. Instead, the data is usually broken into smaller batches. For such a batch tensor, the first axis (axis 0) is called the batch axis.
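
Selecting a batch amounts to slicing along axis 0. The sketch below uses an array of zeros with the same shape and data type as `train_images` as a stand-in, so that it runs without downloading MNIST; the batch size of 128 and the batch index `n` are illustrative assumptions.

```python
import numpy as np

# Stand-in for the MNIST training images (assumption: zeros with the
# same shape and dtype as the real train_images tensor).
train_images = np.zeros((60000, 28, 28), dtype=np.uint8)

batch_size = 128  # illustrative choice

# The first batch: samples 0 to 127 along axis 0.
first_batch = train_images[:batch_size]

# The n-th batch, counting from 0.
n = 3
nth_batch = train_images[batch_size * n : batch_size * (n + 1)]

print('First batch shape:', first_batch.shape)
print('Batch', n, 'shape:', nth_batch.shape)
```

Only axis 0 changes in size; the remaining axes keep the shape of a single sample.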

### Timeseries data

When you want to represent a timeseries, it makes sense to store it in a 3D tensor with an explicit time axis. In this case, axis 0 is the samples axis, axis 1 represents time, and axis 2 represents the features recorded at each timestep.

<img src="images/3d-tensor.png" height="30" width="400"/>

For example, take a dataset of stock prices:

 * There are 390 minutes in a trading day; these are our timesteps
 * At the end of every minute the following features are recorded:
   * the current stock price
   * the highest price in the past minute
   * the lowest price in the past minute
   
When a trading day constitutes a single example, it can be represented as a tensor of shape (390, 3). A dataset of 200 trading days can be represented as a tensor of shape (200, 390, 3).
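
A minimal sketch of this layout is shown below. The values are random placeholders rather than real prices, and the variable names are hypothetical; only the shapes matter here.

```python
import numpy as np

days, minutes, features = 200, 390, 3

# Placeholder data (assumption: random numbers instead of real prices).
rng = np.random.default_rng(seed=0)
prices = rng.random((days, minutes, features))

# One trading day is a 2D tensor of shape (timesteps, features).
single_day = prices[0]

# The last timestep of every day: shape (samples, features).
last_minute = prices[:, -1, :]

# Feature 0 (the current price) for every day and minute:
# shape (samples, timesteps).
current_price = prices[:, :, 0]

print('Dataset:', prices.shape)
print('Single day:', single_day.shape)
print('Last minute of each day:', last_minute.shape)
print('Current price only:', current_price.shape)
```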

### Image data

Images have three dimensions: height, width, and color depth. A batch of 128 color images of size 256 × 256 can be stored in a tensor of shape (128, 256, 256, 3).

<img src="images/4d-tensor.png" height="30" width="400"/>

The color dimension is often called the **channel** axis. Note that some image formats have more than 3 channels. For example, satellite images often contain non-visible bands of radiation.

Grayscale images could be stored as a 2D tensor, e.g. (256, 256). But usually they are stored as a 3D tensor with a 'fake' channel dimension of size 1, e.g. (256, 256, 1), because it is more convenient to use the same number of axes for grayscale and color images.
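
In NumPy, the extra axis can be added without copying or changing the pixel values. The sketch below uses an all-zero image as a placeholder.

```python
import numpy as np

# Placeholder grayscale image stored as a 2D tensor (assumption: all zeros).
gray = np.zeros((256, 256), dtype=np.uint8)

# Add a channel axis of size 1 at the end.
gray_with_channel = np.expand_dims(gray, axis=-1)

# Equivalent indexing syntax.
gray_with_channel_alt = gray[..., np.newaxis]

# A color image of the same size, for comparison.
color = np.zeros((256, 256, 3), dtype=np.uint8)

print('Grayscale 2D:', gray.shape)
print('Grayscale 3D:', gray_with_channel.shape)
print('Color:', color.shape)
```

Both the grayscale and the color image now have three axes, so the same code can index height, width, and channel for either one.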

There are two common conventions for the shape of image tensors:

 * `channels-last` places the color-depth axis at the end: (samples, height, width, color_depth)
 * `channels-first` places the color-depth axis right after the samples axis: (samples, color_depth, height, width)
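
Converting between the two layouts is a matter of reordering the axes. The following sketch does this with `np.transpose` on a placeholder batch of zeros; the variable names are hypothetical.

```python
import numpy as np

# Placeholder batch in channels-last layout:
# (samples, height, width, color_depth)
batch_last = np.zeros((128, 256, 256, 3), dtype=np.uint8)

# Move the color-depth axis (axis 3) right behind the samples axis:
# (samples, color_depth, height, width)
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

# And back again to channels-last.
restored = np.transpose(batch_first, (0, 2, 3, 1))

print('channels-last: ', batch_last.shape)
print('channels-first:', batch_first.shape)
print('restored:      ', restored.shape)
```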
 

### Text data
The problem with text is that sentences and documents have variable length but must be represented as a fixed-shape tensor. One common solution is to convert each document to a fixed-size vector, for example by using a TF-IDF representation or a Doc2Vec algorithm. Another solution is to represent each word as an integer ID and put the sequence of word IDs into a vector of fixed size. Shorter documents are padded and longer documents are cut off.
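
The integer-ID approach can be sketched in a few lines of plain Python and NumPy. The example documents, the maximum length of 8, the whitespace tokenization, and the choice of 0 as the padding ID are all assumptions made for illustration; real pipelines typically use a library tokenizer.

```python
import numpy as np

# Toy documents (assumption: made up for this example).
documents = [
    'the cat sat on the mat',
    'the dog barked',
    'a very long sentence that will be cut off at the maximum length',
]

max_length = 8  # fixed vector size (illustrative choice)
PAD_ID = 0      # ID reserved for padding (assumption)

# Build the vocabulary: every new word gets the next free ID, starting at 1.
vocabulary = {}
for doc in documents:
    for word in doc.split():
        if word not in vocabulary:
            vocabulary[word] = len(vocabulary) + 1


def encode(doc):
    """Map a document to a list of exactly max_length word IDs."""
    ids = [vocabulary[word] for word in doc.split()]
    ids = ids[:max_length]                            # cut off long documents
    return ids + [PAD_ID] * (max_length - len(ids))   # pad short documents


encoded = np.array([encode(doc) for doc in documents])

print('Shape:', encoded.shape)
print(encoded)
```

The result is a 2D tensor of shape (samples, max_length), regardless of how long the individual documents are.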

## Output encoding

Binary classification: Output is the probability of the class label 1.

Categorical classification: Output is a probability distribution over all classes.
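
The two output encodings differ mainly in shape and in what the values sum to. The sketch below assumes that the model produces raw scores which are turned into probabilities with a sigmoid (binary case) or a softmax (categorical case). These are common choices but not the only ones, and the scores here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical raw model scores for a batch of 4 samples.
binary_scores = rng.normal(size=(4, 1))   # one score per sample
class_scores = rng.normal(size=(4, 10))   # one score per class (10 classes)

# Sigmoid maps each score to a probability of the class label 1.
binary_output = 1.0 / (1.0 + np.exp(-binary_scores))

# Softmax maps each row of scores to a probability distribution.
# Subtracting the row maximum first improves numerical stability.
shifted = class_scores - class_scores.max(axis=1, keepdims=True)
exp_scores = np.exp(shifted)
class_output = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print('Binary output shape:', binary_output.shape)
print('Categorical output shape:', class_output.shape)
print('Row sums:', class_output.sum(axis=1))
```

Each row of the categorical output sums to 1, while each binary output is a single value between 0 and 1.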

Depending on the problem, the output often looks like one of the input encodings. Examples are text translation, speech recognition, semantic segmentation (image to image), and video-to-video translation.