# Data representations for neural networks
Matrices are 2D tensors, and tensors are a generalization of matrices to an arbitrary number of dimensions
(note that in the context of tensors, a dimension is often called an axis).



## Scalars (0D tensors)
A tensor that contains only one number is called a scalar (or scalar tensor, or 0-dimensional
tensor, or 0D tensor). In Numpy, a float32 or float64 number is a scalar tensor (or scalar
array). You can display the number of axes of a Numpy tensor via the ndim attribute; a scalar
tensor has 0 axes (ndim == 0). The number of axes of a tensor is also called its rank.
Here’s a Numpy scalar:


```python
import numpy as np
x = np.array(12)
print(x)
print(x.ndim)
```

```
12
0
```
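The claim that a float32 or float64 number is itself a scalar tensor can be checked directly. This small sketch inspects NumPy's own scalar types next to the 0D array from the cell above; every one of them has zero axes and an empty shape.

```python
import numpy as np

s32 = np.float32(3.5)   # a NumPy float32 scalar
s64 = np.float64(3.5)   # a NumPy float64 scalar
a0 = np.array(12)       # a 0D array, as in the cell above

for t in (s32, s64, a0):
    # each scalar tensor reports 0 axes and the empty shape ()
    print(type(t).__name__, t.ndim, t.shape)
```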

## Vectors (1D tensors)
An array of numbers is called a vector, or 1D tensor. A 1D tensor is said to have exactly
one axis. Following is a Numpy vector:


```python
x = np.array([12, 3, 6, 14, 7])
print(x)
print(x.ndim)
print(x.shape)
```

```
[12  3  6 14  7]
1
(5,)
```


This vector has five entries and so is called a 5-dimensional vector. 

Don’t confuse a 5D vector with a 5D tensor! A 5D vector has only one axis and has five dimensions along its
axis, whereas a 5D tensor has five axes (and may have any number of dimensions
along each axis). 

Dimensionality can denote either the number of entries along a specific
axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a
5D tensor), which can be confusing at times. In the latter case, it’s technically more
correct to talk about a tensor of rank 5 (the rank of a tensor being the number of axes),
but the ambiguous notation 5D tensor is common regardless.
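To make the two meanings of "dimension" concrete, here is a minimal sketch comparing a 5D vector with a 5D tensor. The specific shape (2, 3, 1, 4, 2) is arbitrary; any five-element shape gives a rank-5 tensor.

```python
import numpy as np

vec5 = np.zeros(5)                  # a 5D vector: one axis holding 5 entries
ten5 = np.zeros((2, 3, 1, 4, 2))    # a 5D tensor (rank 5): five axes

print(vec5.ndim, vec5.shape)   # 1 (5,)
print(ten5.ndim, ten5.shape)   # 5 (2, 3, 1, 4, 2)
```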





## Matrices (2D tensors)

An array of vectors is a matrix, or 2D tensor. A matrix has two axes (often referred to as
rows and columns). You can visually interpret a matrix as a rectangular grid of numbers.
This is a Numpy matrix:



```python
x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])
print(x.ndim)
print(x.shape)
```

```
2
(3, 5)
```



The entries from the first axis are called the rows, and the entries from the second axis
are called the columns. In the previous example, [5, 78, 2, 34, 0] is the first row of x,
and [5, 6, 7] is the first column.
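Rows and columns correspond to indexing along the first and second axis respectively. A short sketch using the same matrix:

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

first_row = x[0]      # index 0 along axis 0 -> a row
first_col = x[:, 0]   # every row, index 0 along axis 1 -> a column
print(first_row)      # [ 5 78  2 34  0]
print(first_col)      # [5 6 7]
```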



## 3D tensors and higher-dimensional tensors

If you pack such matrices in a new array, you obtain a 3D tensor, which you can visually
interpret as a cube of numbers. Following is a Numpy 3D tensor:


```python
x = np.array([[[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]]])
print(x)
print(x.ndim)
print(x.shape)
print(x.dtype)  # the default integer dtype is platform-dependent (often int64 on 64-bit Linux/macOS)
```

```
[[[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]

 [[ 5 78  2 34  0]
  [ 6 79  3 35  1]
  [ 7 80  4 36  2]]]
3
(3, 3, 5)
int32
```


By packing 3D tensors in an array, you can create a 4D tensor, and so on. In deep learning,
you’ll generally manipulate tensors that are 0D to 4D, although you may go up to
5D if you process video data.
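Packing 3D tensors into a new array to get a 4D tensor can be sketched with np.stack, which joins arrays along a new first axis by default. The stand-in cube below is just np.arange reshaped to the (3, 3, 5) shape of the example above.

```python
import numpy as np

# Stand-in 3D tensor with the same shape as the example above, (3, 3, 5)
cube = np.arange(45).reshape((3, 3, 5))

# Packing four such 3D tensors along a new first axis gives a 4D tensor
t4 = np.stack([cube, cube, cube, cube])
print(t4.ndim, t4.shape)  # 4 (4, 3, 3, 5)
```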



## Key attributes

A tensor is defined by three key attributes:

* Number of axes (rank)—For instance, a 3D tensor has three axes, and a matrix has
two axes. This is also called the tensor’s ndim in Python libraries such as Numpy.

* Shape—This is a tuple of integers that describes how many dimensions the tensor
has along each axis. For instance, the previous matrix example has shape
(3, 5), and the 3D tensor example has shape (3, 3, 5). A vector has a shape
with a single element, such as (5,), whereas a scalar has an empty shape, ().

* Data type (usually called dtype in Python libraries)—This is the type of the data
contained in the tensor; for instance, a tensor’s type could be float32, uint8,
float64, and so on. On rare occasions, you may see a char tensor. Note that
string tensors don't exist in Numpy (or in most other libraries) in the variable-length sense, because tensors
live in preallocated, contiguous memory segments, and strings, being variable
length, would preclude the use of this implementation; Numpy's classic string dtypes are fixed-width instead.
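The three attributes map directly onto ndarray attributes. This sketch uses a small stand-in tensor (two blank 28 × 28 grayscale images, an assumption chosen to mirror MNIST) and shows that converting the dtype leaves rank and shape untouched.

```python
import numpy as np

t = np.zeros((2, 28, 28), dtype=np.uint8)   # e.g. two 28 x 28 grayscale images
print(t.ndim)    # rank: 3 (always equal to len(t.shape))
print(t.shape)   # (2, 28, 28)
print(t.dtype)   # uint8

# Changing the dtype (here to float32) keeps the rank and shape unchanged
tf = t.astype("float32")
print(tf.dtype, tf.shape)   # float32 (2, 28, 28)
```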



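To make this more concrete, let's look back at the MNIST data and check these three attributes on the training images:

```python
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

print(train_images.ndim)
print(train_images.shape)
print(train_images.dtype)
```

```
3
(60000, 28, 28)
uint8
```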


So what we have here is a 3D tensor of 8-bit integers. More precisely, it’s an array of 60,000 matrices of 28 × 28 integers.
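Selecting one of those matrices means indexing the first axis. To keep the sketch self-contained, it uses a random stand-in for train_images (an assumption: 100 random uint8 images instead of the real 60,000 MNIST digits, so nothing needs downloading).

```python
import numpy as np

# Stand-in for train_images: random pixels in [0, 255], MNIST image shape
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)

digit = train_images[4]   # select sample #4 along the first (samples) axis
print(digit.shape)        # (28, 28): a single matrix
print(digit.dtype)        # uint8
```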


## Manipulating tensors in Numpy

Selecting a specific digit along the first axis uses the syntax train_images[i]. Selecting specific elements in a tensor is called tensor slicing.
Let’s look at the tensor-slicing operations you can do on Numpy arrays.
The following example selects digits #10 to #100 (#100 isn’t included) and puts
them in an array of shape (90, 28, 28):


```python
my_slice = train_images[10:100]
print(my_slice.shape)
```

```
(90, 28, 28)
```


It’s equivalent to this more detailed notation, which specifies a start index and stop
index for the slice along each tensor axis. Note that : is equivalent to selecting the
entire axis:



```python
my_slice = train_images[10:100, :, :]        # Equivalent to the previous example
print(my_slice.shape)

my_slice = train_images[10:100, 0:28, 0:28]  # Also equivalent to the previous example
print(my_slice.shape)
```



In general, you may select between any two indices along each tensor axis. For
instance, to select 14 × 14 pixels **in the bottom-right corner of all images**, you do this:


```python
my_slice = train_images[:, 14:, 14:]
```

It’s also possible to use negative indices. Much like negative indices in Python lists,
they indicate a position relative to the end of the current axis. In order to crop the
images to patches of 14 × 14 pixels centered in the middle, you do this:


```python
my_slice2 = train_images[:, 7:-7, 7:-7]
```
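Both crops can be checked on a stand-in tensor (an assumption: 10 images filled with consecutive integers instead of real MNIST pixels, so that equal slices are meaningful). Since each axis has 28 entries, the stop index -7 is the same as 21.

```python
import numpy as np

# Stand-in for train_images: 10 "images" of consecutive integers
train_images = np.arange(10 * 28 * 28).reshape((10, 28, 28))

bottom_right = train_images[:, 14:, 14:]   # rows 14..27, columns 14..27
center = train_images[:, 7:-7, 7:-7]       # -7 counts back from the end: 28 - 7 = 21

print(bottom_right.shape)   # (10, 14, 14)
print(center.shape)         # (10, 14, 14)

# The negative-index crop equals spelling out the stop index 21
print(np.array_equal(center, train_images[:, 7:21, 7:21]))  # True
```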

## The notion of data batches

In general, the first axis (axis 0, because indexing starts at 0) in all data tensors you’ll
come across in deep learning will be the samples axis (sometimes called the samples
dimension). 

In the MNIST example, samples are images of digits.
In addition, deep-learning models don’t process an entire dataset at once; rather,
they break the data into small batches. Concretely, here’s one batch of our MNIST digits,
with batch size of 128:

```python
batch = train_images[:128]
print(batch.shape)
```

```
(128, 28, 28)
```

And here’s the next batch:

```python
batch1 = train_images[128:256]
print(batch1.shape)
```

```
(128, 28, 28)
```

And the nth batch:

```python
n = 3  # example batch index
batch2 = train_images[128 * n:128 * (n + 1)]
```
When considering such a batch tensor, the first axis (axis 0) is called the batch axis or
batch dimension. This is a term you’ll frequently encounter when using Keras and other
deep-learning libraries.
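Generalizing the nth-batch slice gives a loop over the whole dataset. The helper below, iterate_batches, is a hypothetical name for this sketch (not a Keras API); note that the last batch is smaller whenever the sample count is not a multiple of the batch size. The 300-sample stand-in array is an assumption.

```python
import numpy as np

def iterate_batches(data, batch_size=128):
    """Yield consecutive batches along axis 0 (the batch axis).

    Hypothetical helper: the last batch is smaller when len(data) is not
    a multiple of batch_size.
    """
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

# Stand-in data: 300 samples shaped like MNIST images
data = np.zeros((300, 28, 28), dtype=np.uint8)
shapes = [b.shape for b in iterate_batches(data)]
print(shapes)  # [(128, 28, 28), (128, 28, 28), (44, 28, 28)]
```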

## Real-world examples of data tensors
Let’s make data tensors more concrete with a few examples similar to what you’ll
encounter later. The data you’ll manipulate will almost always fall into one of the following
categories:
* Vector data—2D tensors of shape (samples, features)
* **Timeseries data or sequence data—3D tensors of shape (samples, timesteps,
features)**
* Images—4D tensors of shape (samples, height, width, channels) or (samples,
channels, height, width). TensorFlow uses the channels-last convention by default: (samples, height, width, color_depth).

* Video—5D tensors of shape (samples, frames, height, width, channels) or
(samples, frames, channels, height, width)

## Vector data
This is the most common case. In such a dataset, each single data point can be encoded
as a vector, and thus a batch of data will be encoded as a 2D tensor (that is, an array of
vectors), where the first axis is the samples axis and the second axis is the features axis.
Let’s take a look at two examples:
* An actuarial dataset of people, where we consider each person’s age, ZIP code,
and income. Each person can be characterized as a vector of 3 values, and thus
an entire dataset of 100,000 people can be stored in a 2D tensor of shape
(100000, 3). *(number of samples, number of features)*


* A dataset of text documents, where we represent each document by the counts
of how many times each word appears in it (out of a dictionary of 20,000 common
words). Each document can be encoded as a vector of 20,000 values (one
count per word in the dictionary), and thus an entire dataset of 500 documents
can be stored in a 2D tensor of shape (500, 20000). *(Here, words == features.)*
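The word-count encoding can be sketched in a few lines. The six-word dictionary and the two documents below are hypothetical stand-ins for the 20,000-word dictionary and the 500 documents; words outside the dictionary are simply ignored in this sketch.

```python
import numpy as np

# Hypothetical tiny dictionary standing in for the 20,000 common words
vocabulary = ["the", "cat", "sat", "on", "mat", "dog"]
word_index = {word: i for i, word in enumerate(vocabulary)}

documents = ["the cat sat on the mat", "the dog sat"]

# One row per document (samples axis), one column per dictionary word (features axis)
counts = np.zeros((len(documents), len(vocabulary)), dtype="int32")
for i, doc in enumerate(documents):
    for word in doc.split():
        if word in word_index:   # words outside the dictionary are ignored
            counts[i, word_index[word]] += 1

print(counts.shape)  # (2, 6)
print(counts)
```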

## Timeseries data or sequence data
Whenever time matters in your data (or the notion of sequence order), it makes sense
to store it in a 3D tensor with an explicit time axis. Each sample can be encoded as a
sequence of vectors (a 2D tensor), and thus a batch of data will be encoded as a 3D
tensor of shape (samples, timesteps, features).

**The time axis is always the second axis (axis of index 1)**, by convention. Let’s look at a
few examples:

* A dataset of stock prices. Every minute, we store the current price of the stock,
the highest price in the past minute, and the lowest price in the past minute.
Thus every minute is encoded as a 3D vector, an entire day of trading is
encoded as a 2D tensor of shape (390, 3) (there are 390 minutes in a trading
day), and 250 days’ worth of data can be stored in a 3D tensor of shape (250,
390, 3). Here, each sample would be one day’s worth of data. *(samples, timesteps,
features)*


* A dataset of tweets, where we encode each tweet as a sequence of 280 characters
out of an alphabet of 128 unique characters. In this setting, each character can
be encoded as a binary vector of size 128 (an all-zeros vector except for a 1 entry
at the index corresponding to the character). Then each tweet can be encoded
as a 2D tensor of shape (280, 128), and a dataset of 1 million tweets can be
stored in a tensor of shape (1000000, 280, 128).
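The tweet encoding can be sketched as follows, under the assumption that the 128-character alphabet is ASCII, so a character's index is simply ord(c). One-hot rows for positions past the end of a short tweet, and for non-ASCII characters, stay all zeros; that padding choice is a simplification for this sketch, and one_hot_tweet is a hypothetical helper name.

```python
import numpy as np

MAX_LEN = 280        # characters per tweet
ALPHABET_SIZE = 128  # assumption: ASCII, so a character's index is ord(c)

def one_hot_tweet(text):
    """Encode a tweet as a (MAX_LEN, ALPHABET_SIZE) binary matrix.

    Characters past MAX_LEN are dropped; unused positions and non-ASCII
    characters stay all-zero rows (a simplifying assumption).
    """
    encoded = np.zeros((MAX_LEN, ALPHABET_SIZE), dtype="uint8")
    for t, char in enumerate(text[:MAX_LEN]):
        index = ord(char)
        if index < ALPHABET_SIZE:
            encoded[t, index] = 1
    return encoded

tweets = ["Hello, tensors!", "Deep learning"]
batch = np.stack([one_hot_tweet(t) for t in tweets])
print(batch.shape)  # (2, 280, 128): (samples, timesteps, features)
```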

-------------------------------------------------------

In our initial example, we were building our network by stacking Dense layers on
top of each other. A Keras layer instance looks like this:
```python
import keras

layer = keras.layers.Dense(512, activation='relu')
```

This layer can be interpreted as a function, which takes as input a 2D tensor and
returns another 2D tensor—a new representation for the input tensor. Specifically, the
function is as follows (where W is a 2D tensor and b is a vector, both attributes of the
layer):

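```python
# Pseudocode: relu and dot stand for tensor operations;
# input has shape (samples, input_dim), W has shape (input_dim, 512), b has shape (512,)
output = relu(dot(input, W) + b)
```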
Let’s unpack this. We have three tensor operations here: a dot product (dot) between
the input tensor and a tensor named W; an addition (+) between the resulting 2D tensor
and a vector b; and, finally, a relu operation. relu(x) is max(x, 0).
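The formula can be sketched in plain NumPy. Because the samples axis comes first, this sketch multiplies the inputs by W on the right, dot(inputs, W), with W of shape (input_dim, units). The random inputs and weights are stand-ins, not trained values, and dense_forward is a hypothetical helper name. Adding b relies on broadcasting, covered below.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.)   # relu(x) = max(x, 0), element-wise

def dense_forward(inputs, W, b):
    """Sketch of what a Dense layer computes: relu(dot(inputs, W) + b).

    inputs: (batch_size, input_dim), W: (input_dim, units), b: (units,)
    """
    return relu(np.dot(inputs, W) + b)   # b is broadcast over the batch axis

rng = np.random.default_rng(0)
inputs = rng.random((4, 784))            # 4 flattened 28 x 28 images (random stand-ins)
W = rng.normal(size=(784, 512)) * 0.01   # hypothetical weights for 512 units
b = np.zeros(512)

output = dense_forward(inputs, W, b)
print(output.shape)          # (4, 512)
print((output >= 0).all())   # True: relu never outputs negatives
```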

## Element-wise operations


The relu operation and addition are element-wise operations: operations that are
applied independently to each entry in the tensors being considered. This means
these operations are highly amenable to massively parallel implementations (vectorized
implementations, a term that comes from the vector processor supercomputer architecture
from the 1970–1990 period). If you want to write a naive Python implementation
of an element-wise operation, you use a for loop, as in this naive
implementation of an element-wise relu operation:


```python
def naive_relu(x):
    assert len(x.shape) == 2   # x must be a 2D Numpy tensor.
    x = x.copy()               # Avoid overwriting the input tensor.
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x
```

You do the same for addition:

```python
def naive_add(x, y):    
    assert len(x.shape) == 2     #<---------- x and y are 2D Numpy tensors.
    assert x.shape == y.shape
    x = x.copy()                #<---------- Avoid overwriting the input tensor.
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x
```
On the same principle, you can do element-wise multiplication, subtraction, and so on.
In practice, when dealing with Numpy arrays, these operations are available as well-optimized
built-in Numpy functions implemented as compiled C loops. Linear-algebra operations such as
the dot product go further and delegate the heavy lifting to a
Basic Linear Algebra Subprograms (BLAS) implementation if you have one installed
(which you should). BLAS are low-level, highly parallel, efficient tensor-manipulation
routines that are typically implemented in Fortran or C.
So, in Numpy, you can do the following element-wise operation, and it will be blazing
fast:

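```python
import numpy as np
x = np.random.random((20, 100))   # example 2D tensors of the same shape
y = np.random.random((20, 100))
z = x + y                         # Element-wise addition
z = np.maximum(z, 0.)             # Element-wise relu
```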
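To confirm that the naive loops and the built-in Numpy operations compute the same thing, this sketch repeats the two naive functions from above and compares their result with the vectorized version on random stand-in data. It also times both, but the exact timings depend on your machine, so none are stated here.

```python
import time
import numpy as np

def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

def naive_add(x, y):
    assert len(x.shape) == 2 and x.shape == y.shape
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[i, j]
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 100))
y = rng.normal(size=(20, 100))

start = time.perf_counter()
naive = naive_relu(naive_add(x, y))
t_naive = time.perf_counter() - start

start = time.perf_counter()
vectorized = np.maximum(x + y, 0.)
t_vectorized = time.perf_counter() - start

print(np.allclose(naive, vectorized))  # True
print(f"naive: {t_naive:.6f} s, vectorized: {t_vectorized:.6f} s")
```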

## Broadcasting

What happens with addition when the shapes of the two tensors
being added differ?
When possible, and if there’s no ambiguity, the smaller tensor will be broadcasted to
match the shape of the larger tensor. Broadcasting consists of two steps:

1.  Axes (called broadcast axes) are added to the smaller tensor to match the ndim of
the larger tensor.

2.  The smaller tensor is repeated along these new axes to match the full shape
of the larger tensor.

Let’s look at a concrete example. Consider X with shape (32, 10) and y with shape
(10,). 
* First, we add an empty first axis to y, whose shape becomes (1, 10). 
* Then, we repeat y 32 times along this new axis, so that we end up with a tensor Y with shape
(32, 10), where Y[i, :] == y for i in range(0, 32). At this point, we can proceed to
add X and Y, because they have the same shape.

The repetition operation is entirely virtual: it happens at the algorithmic
level rather than at the memory level.
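The two steps can be spelled out explicitly and compared with Numpy's automatic broadcasting, using random stand-ins for X and y. np.broadcast_to then shows that the repetition is virtual: it returns a read-only view whose stride along the new axis is 0, so no data is copied.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 10))
y = rng.random((10,))

# Step 1: add a broadcast axis, shape (10,) -> (1, 10)
y_expanded = y[np.newaxis, :]
# Step 2: repeat along the new axis, shape (1, 10) -> (32, 10)
Y = np.repeat(y_expanded, 32, axis=0)

explicit = X + Y   # both operands now have shape (32, 10)
implicit = X + y   # Numpy broadcasts y automatically
print(np.array_equal(explicit, implicit))  # True

# A view that "repeats" y without copying it: stride 0 along axis 0
Y_view = np.broadcast_to(y, (32, 10))
print(Y_view.shape, Y_view.strides[0])    # (32, 10) 0
```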

With broadcasting, you can generally apply two-tensor element-wise operations if one
tensor has shape (a, b, … n, n + 1, … m) and the other has shape (n, n + 1, … m). The
broadcasting will then automatically happen for axes a through n - 1.
The following example applies the element-wise maximum operation to two tensors
of different shapes via broadcasting:


```python
import numpy as np
x = np.random.random((64, 3, 32, 10))  # x is a random tensor with shape (64, 3, 32, 10).
y = np.random.random((32, 10))         # y is a random tensor with shape (32, 10).
z = np.maximum(x, y)                   # The output z has shape (64, 3, 32, 10) like x.
```



## Tensor dot

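The dot operation, also called a tensor product, is not to be confused with the element-wise product (the * operator in Numpy): instead of operating on each entry independently, it combines entries of the input tensors. The dot product of two vectors is a scalar, and only vectors with the same number of elements are compatible for a dot product.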


You can also take the dot product between a matrix x and a vector y, which returns a vector where the coefficients are the dot products between y and the rows of x.
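Both cases can be written as naive loops, in the same spirit as naive_relu and naive_add above, and checked against np.dot. The function names and the small example arrays are illustrative.

```python
import numpy as np

def naive_vector_dot(x, y):
    assert len(x.shape) == 1 and len(y.shape) == 1
    assert x.shape[0] == y.shape[0]   # same number of elements required
    z = 0.
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z                          # a scalar

def naive_matrix_vector_dot(x, y):
    assert len(x.shape) == 2 and len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    z = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        z[i] = naive_vector_dot(x[i, :], y)   # dot of row i of x with y
    return z

a = np.array([1., 2., 3.])
b = np.array([4., 5., 6.])
print(naive_vector_dot(a, b), np.dot(a, b))   # 32.0 32.0

m = np.array([[1., 0., 2.],
              [0., 1., 1.]])
print(naive_matrix_vector_dot(m, b), np.dot(m, b))   # [16. 11.] [16. 11.]
```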


Note that as soon as one of the two tensors has an ndim greater than 1, dot is no longer
symmetric, which is to say that dot(x, y) isn’t the same as dot(y, x).
Of course, a dot product generalizes to tensors with an arbitrary number of axes.
The most common application may be the dot product between two matrices. You
can take the dot product of two matrices x and y (dot(x, y)) if and only if
x.shape[1] == y.shape[0]. The result is a matrix with shape (x.shape[0],
y.shape[1]), where the coefficients are the dot products between the rows of x
and the columns of y.
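A naive loop version makes the shape rule visible: coefficient (i, j) is the dot product of row i of x with column j of y. The random (3, 4) and (4, 2) matrices are stand-ins; with these shapes, dot(y, x) is not even defined, which illustrates the lack of symmetry.

```python
import numpy as np

def naive_matrix_dot(x, y):
    assert len(x.shape) == 2 and len(y.shape) == 2
    assert x.shape[1] == y.shape[0]   # inner dimensions must match
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):       # rows of x
        for j in range(y.shape[1]):   # columns of y
            z[i, j] = np.dot(x[i, :], y[:, j])
    return z

rng = np.random.default_rng(0)
x = rng.random((3, 4))
y = rng.random((4, 2))
z = naive_matrix_dot(x, y)
print(z.shape)                        # (3, 2) == (x.shape[0], y.shape[1])
print(np.allclose(z, np.dot(x, y)))   # True
```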



## Tensor reshaping

A third type of tensor operation that’s essential to understand is tensor reshaping.
Although it wasn’t used in the Dense layers in our first neural network example, we
used it when we preprocessed the digits data before feeding it into our network:
```python
train_images = train_images.reshape((60000, 28 * 28))
```
Reshaping a tensor means rearranging its rows and columns to match a target shape.
Naturally, the reshaped tensor has the same total number of coefficients as the initial
tensor. Reshaping is best understood via simple examples:


```python
x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])
print(x.shape)

x1 = x.reshape((6, 1))
print(x1)

x2 = x.reshape((2, 3))
print(x2)
```

```
(3, 2)
[[0.]
 [1.]
 [2.]
 [3.]
 [4.]
 [5.]]
[[0. 1. 2.]
 [3. 4. 5.]]
```
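The MNIST preprocessing step works the same way. This sketch applies it to a stand-in tensor (an assumption: 5 images of consecutive integers rather than 60,000 real digits) and checks that no coefficients are lost. By default Numpy reshapes in row-major (C) order, so pixel (r, c) of an image lands at position r * 28 + c of its flattened vector.

```python
import numpy as np

# Stand-in for the MNIST training images: 5 samples instead of 60,000
images = np.arange(5 * 28 * 28).reshape((5, 28, 28))

flat = images.reshape((5, 28 * 28))   # same preprocessing, smaller sample count
print(flat.shape)                     # (5, 784)
print(flat.size == images.size)       # True: no coefficients added or lost

# Row-major order: row r, column c of image k is at position r * 28 + c
k, r, c = 2, 10, 3
print(flat[k, r * 28 + c] == images[k, r, c])  # True
```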
