# Data Representation of Neural Networks


---
* Tensor
  * Container for data (usually numerical data)
  * Generalization of matrices to an arbitrary number of dimensions
    * In the context of tensors, a dimension is often called an "axis"
  * [ELI5] A tensor is simply a way of mathematically representing physical information, generally at a point in space.
    * Say you want to represent information regarding temperature. You need just a single number (a scalar, or 0th-order tensor).
    * Now say you want to represent fluid velocity in a flow. You need 3 numbers to represent the components of velocity in the three orthogonal (perpendicular) directions. We arrange these numbers in a row and call it a vector, or 1st-order tensor.
    * Tensors are used because tensor operations let you carry out, in a single step, algebra that would otherwise be very time-consuming to write out by hand. Tensor math is just a set of systematic ways to add and multiply the elements together in meaningful ways.
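As a small illustration of the bullets above, the sketch below stores a temperature as a rank-0 tensor and a fluid velocity as a rank-1 tensor, then uses one dot product to get the flow speed. The numeric values are made up for illustration.

```python
import numpy as np

# Temperature at one point: a single number, i.e. a rank-0 tensor.
temperature = np.array(21.5)

# Fluid velocity at the same point: three components (x, y, z), i.e. a rank-1 tensor.
velocity = np.array([3.0, 4.0, 12.0])

# One tensor operation (a dot product) replaces writing out vx*vx + vy*vy + vz*vz by hand.
speed = np.sqrt(np.dot(velocity, velocity))

print(temperature.ndim)  # 0
print(velocity.ndim)     # 1
print(speed)             # 13.0
```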


## Scalars (rank-0 tensors)
* A tensor that contains only one number is called a scalar (or scalar tensor, or rank-0 tensor, or 0D tensor)
* A scalar tensor has 0 axes (`ndim == 0`)

```python
import numpy as np

x = np.array(12)
x.ndim  # 0
```
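A rank-0 tensor is not the same thing as a vector that happens to hold one number. The sketch below compares the two: wrapping the value in a list adds an axis of length 1, which shows up in both `ndim` and `shape`.

```python
import numpy as np

scalar = np.array(12)    # one number, no axes
vector = np.array([12])  # the same number, wrapped in one axis of length 1

print(scalar.ndim, scalar.shape)  # 0 ()
print(vector.ndim, vector.shape)  # 1 (1,)
```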

## Vectors (rank-1 tensors)
* An array of numbers is called a vector, or rank-1 tensor, or 1D tensor
* Has exactly one axis

```python
import numpy as np

x = np.array([12, 3, 6, 14, 7])
x.ndim  # 1
```
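The word "dimension" is overloaded here. The vector above has five entries along its single axis, so it is sometimes called a 5-dimensional vector, yet it is still a rank-1 tensor. The sketch below shows that the entry count lives in `shape`, while the rank lives in `ndim`.

```python
import numpy as np

x = np.array([12, 3, 6, 14, 7])

print(x.ndim)   # 1 -> one axis, so a rank-1 tensor
print(x.shape)  # (5,) -> five entries along that axis
print(len(x))   # 5
```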

## Matrices (rank-2 tensors)
* An array of vectors is a matrix, or rank-2 tensor, or 2D tensor
* Has two axes (often referred to as rows and columns)
* Example
  * Vector data: rank-2 tensors of shape `(samples, features)`, where each sample is a vector of numerical attributes (“features”)
    * An actuarial dataset of people, where we consider each person’s age, gender, and income. Each person can be characterized as a vector of 3 values, and thus an entire dataset of 100,000 people can be stored in a rank-2 tensor of shape `(100000, 3)`.

```json
{
   "data":{
      "Person 1":{
         "age":10,
         "gender":"M",
         "income":10000
      },
      "Person 2":{
         "age":10,
         "gender":"M",
         "income":10000
      },
      "Person 3": {},
      "Person 4" : {},
      ...............
      "Person 100k" : {}
   }
}
```
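A tensor holds numbers only, so records like the ones above have to be converted before they fit into a `(samples, features)` matrix. The sketch below does this for three hypothetical records; encoding gender as `0`/`1` through the `GENDER_CODES` mapping is an assumption made for illustration, not something prescribed by the dataset.

```python
import numpy as np

# Hypothetical records in the same form as the JSON above.
people = {
    "Person 1": {"age": 10, "gender": "M", "income": 10000},
    "Person 2": {"age": 34, "gender": "F", "income": 52000},
    "Person 3": {"age": 58, "gender": "M", "income": 47000},
}

# Assumed numeric encoding for the categorical "gender" feature.
GENDER_CODES = {"M": 0, "F": 1}

# One row per person (sample), one column per feature: age, gender, income.
data = np.array(
    [[p["age"], GENDER_CODES[p["gender"]], p["income"]] for p in people.values()]
)

print(data.shape)  # (3, 3) -> (samples, features)
```

With 100,000 records the same construction would give a tensor of shape `(100000, 3)`.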

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])
x.ndim  # 2
```

## Rank-3 and higher tensors
* If you pack such matrices in a new array, you obtain a rank-3 tensor (or 3D tensor), which you can visually interpret as a cube of numbers
* By packing rank-3 tensors in an array, you can create a rank-4 tensor, and so on
* Example
  * Timeseries data or sequence data: rank-3 tensors of shape `(samples, timesteps, features)`, where each sample is a sequence (of length `timesteps`) of feature vectors
    * A dataset of stock prices. Every minute, we store the current price of the stock, the highest price in the past minute, and the lowest price in the past minute. Thus, every minute is encoded as a vector of 3 values, an entire day of trading is encoded as a matrix of shape `(390, 3)` (there are 390 minutes in a trading day), and 250 days’ worth of data can be stored in a rank-3 tensor of shape `(250, 390, 3)`. Here, each sample is one day’s worth of data.

```json
{
   "data":{
      "Day_1":{
         "Minute_1":{
            "Current_price":1000,
            "Highest_price":1200,
            "Lowest_price":900
         },
         "Minute_2":{
            "Current_price":1000,
            "Highest_price":1200,
            "Lowest_price":900
         },
         "Minute_3" : {},
         "Minute_4" : {},
         ................
         "Minute_390" : {}
      },
      ..........................
      "Day_250" : {}
   }
}
```
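The stock-price layout can be sketched with placeholder values. The prices below are randomly generated stand-ins (an assumption, not real market data, and they do not respect the high/low relationship); the point is the shape, and what indexing along each axis returns.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_days, minutes_per_day, n_features = 250, 390, 3  # features: current, highest, lowest

# Placeholder prices, not real data: shape (samples, timesteps, features).
prices = rng.uniform(900.0, 1200.0, size=(n_days, minutes_per_day, n_features))

print(prices.ndim)         # 3
print(prices.shape)        # (250, 390, 3)
print(prices[0].shape)     # (390, 3) -> one sample: a full trading day
print(prices[0, 0].shape)  # (3,) -> one minute: current, highest, lowest
```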

```python
import numpy as np

x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])
x.ndim  # 3
```
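The packing described at the start of this section can be done explicitly with `np.stack`, which joins arrays along a new leading axis. The sketch below stacks three copies of the rank-2 matrix into a rank-3 tensor (the same array as above), then stacks two of those into a rank-4 tensor.

```python
import numpy as np

matrix = np.array([[5, 78, 2, 34, 0],
                   [6, 79, 3, 35, 1],
                   [7, 80, 4, 36, 2]])

# Packing three matrices adds a new first axis: rank 2 -> rank 3.
cube = np.stack([matrix, matrix, matrix])
print(cube.ndim, cube.shape)  # 3 (3, 3, 5)

# Packing two rank-3 tensors adds another axis: rank 3 -> rank 4.
hypercube = np.stack([cube, cube])
print(hypercube.ndim, hypercube.shape)  # 4 (2, 3, 3, 5)
```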

### Image Data (Rank-4 Tensors)
* Images typically have three dimensions: height, width, and color depth, so a batch of images is a rank-4 tensor of shape `(samples, height, width, color_depth)`
  * A batch of 128 grayscale images of size 256 × 256 could thus be stored in a tensor of shape `(128, 256, 256, 1)`
  * A batch of 128 color images of the same size could be stored in a tensor of shape `(128, 256, 256, 3)`
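The two image batches above can be allocated directly to check their shapes. Storing pixel values as `uint8` is an assumption here (images are often converted to floats later); `nbytes` shows how much memory each batch takes.

```python
import numpy as np

# Batch of 128 grayscale 256 x 256 images: (samples, height, width, color_depth).
gray_batch = np.zeros((128, 256, 256, 1), dtype=np.uint8)

# Batch of 128 color (RGB) 256 x 256 images: three channels instead of one.
color_batch = np.zeros((128, 256, 256, 3), dtype=np.uint8)

print(gray_batch.ndim, gray_batch.shape)    # 4 (128, 256, 256, 1)
print(color_batch.ndim, color_batch.shape)  # 4 (128, 256, 256, 3)

# One byte per uint8 value.
print(gray_batch.nbytes)   # 8388608 (8 MiB)
print(color_batch.nbytes)  # 25165824 (24 MiB)
```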



### Video Data (Rank-5 Tensors)
* A video can be understood as a sequence of frames, each frame being a color image
* Because each frame can be stored in a rank-3 tensor `(height, width, color_depth)`, a sequence of frames can be stored in a rank-4 tensor `(frames, height, width, color_depth)`
  * A batch of different videos can be stored in a rank-5 tensor of shape `(samples, frames, height, width, color_depth)`
* For instance, a 60-second, 144 × 256 YouTube video clip sampled at 4 frames per second would have 240 frames. A batch of four such video clips would be stored in a tensor of shape `(4, 240, 144, 256, 3)`
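The frame count and the size of the resulting tensor follow from simple arithmetic. The sketch below computes both without allocating the array; the `float32` storage type is an assumption (a common choice once pixel values are rescaled), not something stated above.

```python
import math

import numpy as np

clip_seconds, frames_per_second = 60, 4
frames = clip_seconds * frames_per_second  # 240 frames per clip

# (samples, frames, height, width, color_depth)
shape = (4, frames, 144, 256, 3)
n_values = math.prod(shape)

# Assumed storage type: float32, i.e. 4 bytes per value.
bytes_per_value = np.dtype(np.float32).itemsize
total_bytes = n_values * bytes_per_value

print(frames)       # 240
print(shape)        # (4, 240, 144, 256, 3)
print(n_values)     # 106168320
print(total_bytes)  # 424673280 (405 MiB)
```

Even a short, low-resolution batch of video holds over a hundred million values, which is why the data type chosen for a tensor matters.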

## Key Attributes of a Tensor
* A tensor is defined by:
  * Number of axes (rank)
  * Shape: a tuple of integers that describes how many dimensions (entries) the tensor has along each axis
  * Data type (dtype): the type of the data contained in the tensor (for example `uint8`, `float32`, or `float64`)

The MNIST training images form a rank-3 tensor of 8-bit unsigned integers. More precisely, it’s an array of 60,000 matrices of 28 × 28 integers. Each such matrix is a grayscale image, with coefficients between 0 and 255.
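The three attributes can be read off any NumPy array. The sketch below builds a small stand-in with the same layout as MNIST (ten blank 28 × 28 images rather than the real data) and uses `np.iinfo` to confirm that an 8-bit unsigned integer covers exactly the 0 to 255 pixel range.

```python
import numpy as np

# Stand-in for MNIST: 10 blank 28 x 28 grayscale images (not the real dataset).
images = np.zeros((10, 28, 28), dtype=np.uint8)

print(images.ndim)   # 3 -> rank
print(images.shape)  # (10, 28, 28) -> (samples, height, width)
print(images.dtype)  # uint8 -> data type

# Range of values an 8-bit unsigned integer can hold.
info = np.iinfo(np.uint8)
print(info.min, info.max)  # 0 255
```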


```python
from tensorflow.keras.datasets import mnist

# The first call downloads the dataset from
# https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images.ndim   # 3
train_images.shape  # (60000, 28, 28)
train_images.dtype  # dtype('uint8')
```