# Chapter 2:

## Tensors:
>>* **Scalars** are **`Rank-0 Tensor`** ... `np.array(22)` ... `np.array(22).ndim`
>>* **Vectors** are **`Rank-1 Tensor`** ... `np.array([22])` ... `np.array([22]).ndim`
>>* **Matrices** are **`Rank-2 Tensor`** ... `np.array([[22]])` ... `np.array([[22]]).ndim`
>>* Arrays with more axes are **`Rank-n Tensor`** ... `np.array([[[22]]])` ... `np.array([[[22]]]).ndim`
>* Usually, we deal with `Rank-0` to `Rank-4` tensors, but we can encounter `Rank-5` tensors when processing **video data**.
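
A quick way to check the rank of a tensor in NumPy is its `ndim` attribute (the number of axes), while `shape` gives the size of each axis. The arrays below are arbitrary examples chosen only to show one tensor of each rank.

```python
import numpy as np

scalar = np.array(22)                 # rank-0
vector = np.array([22, 7, 1])         # rank-1
matrix = np.array([[22, 7], [1, 3]])  # rank-2
rank3 = np.zeros((2, 3, 4))           # e.g. (samples, timesteps, features)
rank5 = np.zeros((1, 2, 4, 4, 3))     # e.g. (samples, frames, height, width, channels)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix),
                ("rank3", rank3), ("rank5", rank5)]:
    # ndim is the rank; shape lists the size along each axis
    print(name, t.ndim, t.shape)
```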

### Real-world examples of Tensors:
>* **Vector data**: `Rank-2` tensors of shape `(samples, features)`, where `samples` is the number of instances (data points).
>>* A dataset of text documents, where we represent each document by the counts of how many times each word appears in it (out of a dictionary of 20,000 common words). Each document is then a vector of 20,000 counts, so a dataset of 500 documents is stored in a tensor of `shape (500, 20000)`.
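
Here is a minimal sketch of that count encoding, assuming a hypothetical five-word vocabulary and three toy documents in place of the 20,000-word dictionary and 500 documents. Splitting on whitespace stands in for real tokenization.

```python
import numpy as np

# Hypothetical tiny vocabulary standing in for the 20,000-word dictionary
vocabulary = ["the", "cat", "sat", "on", "mat"]
word_index = {word: i for i, word in enumerate(vocabulary)}

documents = ["the cat sat on the mat", "the cat sat", "on the mat"]

# One row per document (sample), one column per vocabulary word (feature)
counts = np.zeros((len(documents), len(vocabulary)), dtype=np.int32)
for row, doc in enumerate(documents):
    for word in doc.split():
        if word in word_index:          # ignore words outside the dictionary
            counts[row, word_index[word]] += 1

print(counts.shape)  # (3, 5)
print(counts[0])     # [2 1 1 1 1]
```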

>* **Timeseries data or sequence data**: `Rank-3` tensors of shape `(samples, timesteps, features)`. Audio data also falls into this category.
>>* A dataset of stock prices. **Every minute**, we store the `current price` of the stock, the `highest price in the past minute`, and the `lowest price in the past minute`. Thus, every minute is encoded as a 3D vector, an entire day of trading is encoded as a matrix of shape (390, 3) (there are 390 minutes in a trading day), and 250 days’ worth of data can be stored in a rank-3 tensor of `shape (250, 390, 3)`. Here, **each instance would be one day’s worth of data.**
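
The stock-price layout can be sketched with placeholder data (random numbers, not real prices) to see how indexing the first axis yields one day and indexing two axes yields one minute.

```python
import numpy as np

days, minutes_per_day, features = 250, 390, 3   # sizes from the example above

# Placeholder data: random numbers standing in for real prices
rng = np.random.default_rng(0)
prices = rng.random((days, minutes_per_day, features))

one_day = prices[0]          # one sample: a (390, 3) matrix
one_minute = prices[0, 0]    # one timestep: [current, high, low]

print(prices.ndim, prices.shape)         # 3 (250, 390, 3)
print(one_day.shape, one_minute.shape)   # (390, 3) (3,)
```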

>* **Images**: `Rank-4` tensors of shape `(samples, height, width, channels)`, where each sample is a 2D grid of pixels, and each pixel is represented by a vector of values (“channels”).
>>* A batch of 128 **grayscale images** of size 256 × 256 could thus be stored in a tensor of `shape (128, 256, 256, 1)`
>>* A batch of 128 **RGB images** or **HSV images** of the same size could be stored in a tensor of `shape (128, 256, 256, 3)`
>>> Note: there are two conventions for image tensors: **channels-first** `(samples, channels, height, width)` and **channels-last** `(samples, height, width, channels)`. The Keras API supports both, but channels-last is the TensorFlow standard.
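
Converting between the two layouts is just a reordering of axes. In Keras the choice corresponds to the `data_format` argument (`"channels_last"` or `"channels_first"`) of layers such as `Conv2D`. The sketch below uses an all-zero `uint8` batch as placeholder pixel data.

```python
import numpy as np

# A batch of 128 RGB images in channels-last layout: (samples, height, width, channels)
images_last = np.zeros((128, 256, 256, 3), dtype=np.uint8)

# Reorder the axes to channels-first: (samples, channels, height, width)
images_first = np.transpose(images_last, (0, 3, 1, 2))
print(images_first.shape)  # (128, 3, 256, 256)

# And back to channels-last
images_back = np.transpose(images_first, (0, 2, 3, 1))
print(images_back.shape)   # (128, 256, 256, 3)
```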

>* **Video**: `Rank-5` tensors of shape `(samples, frames, height, width, channels)`, where each sample is a sequence (of length `frames`) of images.
>>* A 60-second, 144 × 256 YouTube video clip sampled at 4 frames per second would have 240 frames. A batch of 4 such video clips would be stored in a tensor of `shape (4, 240, 144, 256, 3)`. That’s a total of **106,168,320** values! If the dtype of the tensor was `float32`, each value would be stored in `32 bits` (4 bytes), so the tensor would take up `405 MB`. Heavy! Videos you encounter in real life are much lighter, because they aren’t stored in `float32`, and they’re typically compressed by a large factor (such as in the `MPEG` format).
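
The arithmetic behind those numbers, as a short sketch: the 106,168,320 figure corresponds to a batch of four clips, and "405 MB" is 405 MiB (units of 2^20 bytes).

```python
import numpy as np

clips, frames, height, width, channels = 4, 60 * 4, 144, 256, 3
shape = (clips, frames, height, width, channels)

num_values = int(np.prod(shape))
bytes_float32 = num_values * np.dtype(np.float32).itemsize   # 4 bytes per value

print(num_values)               # 106168320
print(bytes_float32 / 2**20)    # 405.0 (MiB)
```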

### Broadcasting:
> When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when

> 1. They are equal, or
> 2. One of them is 1.

> If these conditions are not met, a `ValueError: operands could not be broadcast together` exception is thrown, indicating that the arrays have incompatible shapes.

> Input arrays do not need to have the same number of dimensions. The resulting array will have the same number of dimensions as the input array with the greatest number of dimensions, where the size of each dimension is the largest size of the corresponding dimension among the input arrays. Note that missing dimensions are assumed to have size one.
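
The rules can be checked without creating any arrays through `np.broadcast_shapes` (available in NumPy 1.20 and newer). The shapes below are arbitrary examples, and the last part triggers the `ValueError` on purpose.

```python
import numpy as np  # np.broadcast_shapes needs NumPy 1.20 or newer

# Shapes are aligned on the right; missing leading dimensions count as 1
image_shape = np.broadcast_shapes((256, 256, 3), (3,))
mixed_shape = np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5))
print(image_shape)   # (256, 256, 3)
print(mixed_shape)   # (8, 7, 6, 5)

# Trailing dimensions 3 and 4 differ and neither is 1, so this fails
raised = False
try:
    np.ones((2, 3)) + np.ones((4,))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```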

> For example, if you have a 256x256x3 array of RGB values, and you want to scale each color in the image by a different value, you can multiply the image by a one-dimensional array with 3 values. Lining up the sizes of the trailing axes of these arrays according to the broadcast rules shows that they are compatible:

>>* `Image  (3d array): 256 x 256 x 3`
>>* `Scale  (1d array):             3`
>>* `Result (3d array): 256 x 256 x 3`
>* When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.
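
The RGB scaling example in NumPy, assuming a random placeholder image and arbitrary per-channel factors:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))       # placeholder RGB image, values in [0, 1)
scale = np.array([0.5, 1.0, 2.0])       # arbitrary per-channel factors (R, G, B)

scaled = image * scale                  # (256, 256, 3) * (3,) -> (256, 256, 3)
print(scaled.shape)                     # (256, 256, 3)

# Red values are halved, green unchanged, blue doubled, for every pixel
print(np.allclose(scaled[..., 0], image[..., 0] * 0.5))  # True
```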

>In terms of implementation, **no new rank-3 tensor is created for scaling values**, because that would be terribly inefficient. The **broadcasting operation is entirely virtual**: it happens **at the algorithmic level rather than at the memory level**.
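
`np.broadcast_to` makes this "virtual" expansion visible: it returns a read-only view instead of a copy, and the new axes get a stride of 0, so every position along them reads the same memory.

```python
import numpy as np

scale = np.array([0.5, 1.0, 2.0])              # float64, 8 bytes per value
expanded = np.broadcast_to(scale, (256, 256, 3))

print(expanded.shape)                     # (256, 256, 3)
print(expanded.strides)                   # (0, 0, 8): the two new axes have stride 0
print(np.shares_memory(expanded, scale))  # True: no data was copied
```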

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0])
b = np.array([[1.0, 2.0, 3.0]])
a.shape, b.shape, a+b
```

```
((3,), (1, 3), array([[11., 22., 33.]]))
```

```python
# matrix multiplication is not commutative: c·d != d·c in general
import numpy as np

c = np.array([[2, 2], [2, 2]])
d = np.array([[1, 2], [3, 4]])

np.dot(c, d), np.dot(d, c)
```

```
(array([[ 8, 12],
        [ 8, 12]]),
 array([[ 6,  6],
        [14, 14]]))
```

>* More generally, you can take the dot product between higher-dimensional tensors. As in the 2D case, the last axis of the first tensor must match the first axis of the second, and that shared axis disappears from the result:
>>* `(a, b, c, d) • (d,)   → (a, b, c)`
>>* `(a, b, c, d) • (d, e) → (a, b, c, e)`
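
These shape rules can be checked with `np.dot`; the sizes `a` through `e` below are arbitrary, and `np.einsum` spells out which axis is summed over.

```python
import numpy as np

a, b, c, d, e = 2, 3, 4, 5, 6           # arbitrary axis sizes
rng = np.random.default_rng(0)
x = rng.random((a, b, c, d))
v = rng.random(d)
m = rng.random((d, e))

xv = np.dot(x, v)    # (a, b, c, d) . (d,)   -> (a, b, c)
xm = np.dot(x, m)    # (a, b, c, d) . (d, e) -> (a, b, c, e)
print(xv.shape, xm.shape)   # (2, 3, 4) (2, 3, 4, 6)

# The same products written with einsum: the shared axis d is summed away
print(np.allclose(xv, np.einsum("abcd,d->abc", x, v)))    # True
print(np.allclose(xm, np.einsum("abcd,de->abce", x, m)))  # True
```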

## Geometric interpretation of tensor operations:
> (This idea is discussed in full in this chapter.)

>* The basic operation of a neural network layer is `W·X + b`, an **affine transformation**: a linear transformation (which can rotate, scale, or shear the space) followed by a translation by `b`. The activation function applied afterward makes the overall transformation non-linear.
>* As we train the neural network, we search for a transformation of the input space in which our categories become easy to separate (in a classification task, for example).
>* Think of a crumpled ball of paper: before it was crumpled, it was a flat sheet.
>* In a neural network, our data arrives like that crumpled paper ball, and training searches for the sequence of geometric transformations that uncrumples it back into a flat sheet.
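
A small NumPy sketch of one layer acting on 2D points: `W` here is an arbitrary 90-degree rotation matrix and `b` an arbitrary translation, chosen only so the effect is easy to see. With points stored one per row, applying `W·x` to every point becomes `points @ W.T`, and `np.maximum(..., 0)` plays the role of a relu activation.

```python
import numpy as np

# Four 2D points (the corners of a unit square), one per row
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

theta = np.pi / 2                          # rotate by 90 degrees
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
b = np.array([0.5, -0.5])                  # arbitrary translation

# Affine transform: rotate each point (W @ p), then translate by b
transformed = points @ W.T + b

# relu folds every negative coordinate onto 0, a non-linear step
activated = np.maximum(transformed, 0.0)
print(np.round(transformed, 3))
print(np.round(activated, 3))
```
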
### Some geometric transformations:
> ![1](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/11.PNG?raw=true)
> ![2](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/22.PNG?raw=true)
> ![3](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/33.PNG?raw=true)
> ![4](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/44.PNG?raw=true)
> ![5](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/55.PNG?raw=true)
> ![6](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/66.PNG?raw=true)
> ![7](https://github.com/OmarAllam22/Images-for-notebooks/blob/main/77.PNG?raw=true)

#### Optimizers:
>* Optimizers are not the same thing as gradient descent; rather, they are algorithms that define how gradient descent updates the parameters. So there is ordinary gradient descent (batch, mini-batch, or stochastic), and there are variants such as SGD with momentum, RMSprop, and Adagrad.

>* These variants, and several others, are known as **optimization methods or optimizers**. In particular, the concept of momentum, which is used in many of these variants, deserves your attention.
>* `Momentum` addresses two issues with SGD: **convergence speed** and **local minima**. By accumulating past gradients, momentum can carry the parameters past shallow local minima instead of letting them get stuck there, although it does not guarantee reaching the **global minimum**.
>* A **learning rate schedule** (a method for adjusting the learning rate over time) also affects how well training converges and whether it escapes local minima.
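
Momentum and a learning rate schedule fit in a few lines of plain Python. This is a minimal sketch, assuming the classic update `velocity = momentum * velocity - lr * gradient` and a simple exponential decay of the learning rate; `sgd_momentum` is a hypothetical helper, not a Keras API. In Keras the same ideas are exposed through `tf.keras.optimizers.SGD(momentum=...)` and the schedules in `tf.keras.optimizers.schedules`.

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.1, momentum=0.9,
                 decay=1.0, steps=200):
    """Minimize a 1D function with SGD plus momentum (hypothetical helper).

    grad_fn -- callable returning the gradient at w
    decay   -- per-step multiplicative learning-rate decay (1.0 = constant rate)
    """
    w = float(w0)
    velocity = 0.0
    lr = learning_rate
    for _ in range(steps):
        g = grad_fn(w)
        # velocity mixes the previous update with the current gradient step
        velocity = momentum * velocity - lr * g
        w = w + velocity
        # exponential learning-rate schedule: shrink the step size over time
        lr *= decay
    return w

# Example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = sgd_momentum(lambda w: 2 * (w - 3.0), w0=0.0, decay=0.99)
print(w_final)  # approximately 3.0
```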

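```python
import tensorflow as tf

x = tf.Variable(0.)                      # a trainable scalar variable, initialized to 0
with tf.GradientTape() as tape:          # record the operations applied to x
    y = 2 * x + 3
grad_of_y_wrt_x = tape.gradient(y, x)    # dy/dx = 2
grad_of_y_wrt_x
```

```
<tf.Tensor: shape=(), dtype=float32, numpy=2.0>
```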

>* A few important notes on the cell above:

>>* `tf.Variable()` is a special kind of tensor that is **mutable**: unlike an ordinary `tf.Tensor`, which is immutable, its value can be updated in place (for example with `assign()`). This is exactly what model weights need, since training changes them at every step. A `GradientTape` also automatically **watches** trainable variables, so `tape.gradient()` can compute gradients with respect to them; an ordinary tensor would have to be watched explicitly with `tape.watch()`.
>>>* This reflects the idea behind the name `TensorFlow`: **tensors** **flow** through computation graphs, while variables hold the state (such as weights) that those computations read and update.

>>* `GradientTape()`: records the operations executed inside its `with` block so that gradients can be computed afterward by automatic differentiation.

>>* Don't forget to initialize the weights `W` and `b` before training; `W` in particular is initialized with small random values.
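
For intuition about what `tape.gradient()` returns, here is a NumPy sketch (hand-derived gradients, not TensorFlow's autodiff) for a single linear layer with a mean-squared-error loss. It assumes random placeholder data, initializes `W` with small random values and `b` with zeros (the Keras default for `Dense` biases), and checks the gradients against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))               # 8 samples, 3 features (placeholder data)
y = rng.normal(size=(8, 1))               # placeholder regression targets

# Random initialization of the weights, zeros for the bias
W = rng.normal(scale=0.1, size=(3, 1))
b = np.zeros(1)

def mse_loss(W, b):
    pred = x @ W + b                      # (8, 3) @ (3, 1) + (1,) -> (8, 1)
    return np.mean((pred - y) ** 2)

# Hand-derived gradients of loss = mean(err**2), where err = x @ W + b - y
err = x @ W + b - y
grad_W = 2 * x.T @ err / err.size        # shape (3, 1)
grad_b = 2 * err.sum(axis=0) / err.size  # shape (1,)

# Sanity check against central finite differences
eps = 1e-6
num_grad_W = np.zeros_like(W)
for i in range(W.shape[0]):
    step = np.zeros_like(W)
    step[i, 0] = eps
    num_grad_W[i, 0] = (mse_loss(W + step, b) - mse_loss(W - step, b)) / (2 * eps)
num_grad_b = (mse_loss(W, b + eps) - mse_loss(W, b - eps)) / (2 * eps)

print(np.allclose(grad_W, num_grad_W, atol=1e-6))  # True
print(np.allclose(grad_b, num_grad_b, atol=1e-6))  # True
```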

```python
# Another example: a few manual training steps for a small binary classifier
import tensorflow as tf

# define the model: 784 inputs -> 10 hidden units -> 1 output probability
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# define the loss function (returns one loss value per sample)
def loss_fn(y_true, y_pred):
    return tf.keras.losses.binary_crossentropy(y_true, y_pred)

# define the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# define random placeholder data: 1000 samples, labels 0 or 1 stored as floats
x_train = tf.random.normal((1000, 784))
y_train = tf.cast(tf.random.uniform((1000, 1), minval=0, maxval=2, dtype=tf.int32), tf.float32)

# define the training loop (10 full-batch steps)
for i in range(10):
    with tf.GradientTape() as tape:
        y_pred = model(x_train)
        # tape.gradient() differentiates the sum of this per-sample vector;
        # wrap it in tf.reduce_mean(...) to train on the average loss instead
        loss = loss_fn(y_train, y_pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
gradients
```

```
[<tf.Tensor: shape=(784, 10), dtype=float32, numpy=
 array([[-0.25137883, -2.6181302 , -1.9342039 , ...,  1.3155369 ,
         -0.2628116 ,  0.43288547],
        [ 0.1335316 , -1.8421947 ,  0.10670638, ...,  1.3369033 ,
          0.63244146,  0.40717515],
        [-0.41456538,  0.42275426,  1.0531392 , ...,  0.4635477 ,
          0.23076028, -0.4242389 ],
        ...,
        [ 0.01488625,  2.805472  ,  2.9826677 , ...,  0.8803037 ,
          0.32808232,  0.10142536],
        [ 0.05976688,  1.4644451 ,  3.1204963 , ...,  0.6548274 ,
         -0.5716774 ,  0.32218298],
        [ 0.22088124, -1.1825455 , -0.05075556, ..., -1.0533674 ,
          0.5194074 ,  0.0714402 ]], dtype=float32)>,
 <tf.Tensor: shape=(10,), dtype=float32, numpy=
 array([  2.322889 , -20.620571 , -20.07185  ,   4.1506615, -11.206947 ,
        -23.711704 , -10.409934 , -16.378399 ,   6.685594 ,  -4.8212214],
       dtype=float32)>,
 <tf.Tensor: shape=(10, 1), dtype=float32, numpy=
 array([[ -5.473991 ],
        [-32.
```

_______________________________