## Data Manipulation

tensor = n-dimensional array

similar to numpy's ndarray, with two killer features:
 - gpu support to accelerate computation
 - the tensor class supports automatic differentiation

hence tensors are widely used for deep learning

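tensorflow log levels (these appear to be the values of the TF_CPP_MIN_LOG_LEVEL environment variable; the cell below sets the level of the python-side logger instead):

 Level | Level for Humans | Level Description
 ------|------------------|------------------------------------
 0     | DEBUG            | [Default] Print all messages
 1     | INFO             | Filter out INFO messages
 2     | WARNING          | Filter out INFO & WARNING messages
 3     | ERROR            | Filter out all messages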

In [68]:
import tensorflow as tf
tf.get_logger().setLevel('INFO')

tensor with 1 axis = vector

tensor with 2 axes = matrix

tensor with 3 or more axes = no special mathematical name

In [69]:
x = tf.range(12)
x

<tf.Tensor: shape=(12,), dtype=int32, numpy=array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=int32)>

In [70]:
x.shape

TensorShape([12])

to get the total number of elements in a tensor (the product of the entries of its shape), use tf.size

In [71]:
tf.size(x)

<tf.Tensor: shape=(), dtype=int32, numpy=12>

In [72]:
x = tf.reshape(x, (3, 4))
x.shape

TensorShape([3, 4])

we don't have to specify every dimension in tf.reshape. put -1 for one dimension and tf infers it from the total number of elements and the remaining dimensions.

In [73]:
x = tf.reshape(x, (-1, 3))
x.shape

TensorShape([4, 3])
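
a minimal plain-python sketch of how the -1 can be resolved, assuming only that the total number of elements must be preserved. `infer_shape` is a hypothetical helper written for illustration, not a tensorflow function:

```python
from math import prod


def infer_shape(num_elements, shape):
    """Replace a single -1 in `shape` so that the product equals num_elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    # product of the dimensions that were given explicitly
    known = prod(d for d in shape if d != -1)
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return tuple(num_elements // known if d == -1 else d for d in shape)
    if known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return tuple(shape)


print(infer_shape(12, (-1, 3)))  # (4, 3)
print(infer_shape(12, (3, -1)))  # (3, 4)
```

the missing dimension is just the element count divided by the product of the known dimensions, which is why only one -1 is allowed.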

we can initialize tensors with zeros, ones, other constants, or random samples from a distribution, as shown below

In [74]:
tf.zeros((2, 3, 4))

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]], dtype=float32)>

In [75]:
tf.ones((2, 3, 4))

<tf.Tensor: shape=(2, 3, 4), dtype=float32, numpy=
array([[[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]],

       [[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]]], dtype=float32)>

In [76]:
tf.random.normal(shape = [3, 4])

<tf.Tensor: shape=(3, 4), dtype=float32, numpy=
array([[-0.67817247,  1.3990657 ,  0.4567462 ,  0.30003092],
       [ 0.14387898,  1.0841188 ,  0.7172565 ,  2.1216674 ],
       [-1.7713815 ,  0.20140563,  0.6365468 , -1.2981452 ]],
      dtype=float32)>

we can also specify exact values using python lists or ndarrays. 

In [77]:
tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

<tf.Tensor: shape=(3, 3), dtype=int32, numpy=
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int32)>

### Operations on Tensors

In [78]:
x = tf.range(5,10)
y = tf.range(11,16)

In [79]:
x, y

(<tf.Tensor: shape=(5,), dtype=int32, numpy=array([5, 6, 7, 8, 9], dtype=int32)>,
 <tf.Tensor: shape=(5,), dtype=int32, numpy=array([11, 12, 13, 14, 15], dtype=int32)>)

In [80]:
x+y, x-y, x*y, x/y, x**y

(<tf.Tensor: shape=(5,), dtype=int32, numpy=array([16, 18, 20, 22, 24], dtype=int32)>,
 <tf.Tensor: shape=(5,), dtype=int32, numpy=array([-6, -6, -6, -6, -6], dtype=int32)>,
 <tf.Tensor: shape=(5,), dtype=int32, numpy=array([ 55,  72,  91, 112, 135], dtype=int32)>,
 <tf.Tensor: shape=(5,), dtype=float64, numpy=array([0.45454545, 0.5       , 0.53846154, 0.57142857, 0.6       ])>,
 <tf.Tensor: shape=(5,), dtype=int32, numpy=
 array([   48828125, -2118184960, -1895237401,           0, -1010140999],
       dtype=int32)>)
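
note that most of the x**y values above are not the true powers: results such as 6**12 do not fit in int32. the printed numbers are consistent with two's-complement wrap-around, which this plain-python sketch reproduces (`wrap_int32` is a hypothetical helper for illustration). casting the tensors to a float type or to int64 first should avoid the problem for these inputs:

```python
def wrap_int32(n):
    """Map an arbitrary python int into the signed 32-bit range by wrap-around."""
    return (n + 2**31) % 2**32 - 2**31


xs = [5, 6, 7, 8, 9]
ys = [11, 12, 13, 14, 15]

# python ints have arbitrary precision, so these are the exact powers
true_powers = [a ** b for a, b in zip(xs, ys)]
wrapped = [wrap_int32(p) for p in true_powers]

print(true_powers[:2])  # [48828125, 2176782336]
print(wrapped)          # [48828125, -2118184960, -1895237401, 0, -1010140999]
```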

In [81]:
tf.math.exp(5.5)

<tf.Tensor: shape=(), dtype=float32, numpy=244.69194>

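we can also concatenate multiple tensors to form a larger tensor.

axis 0 -> first entry of the shape (rows); concatenating along it stacks the tensors on top of each other

axis 1 -> second entry of the shape (columns); concatenating along it places the tensors side by side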

In [82]:
x = tf.reshape(tf.range(12, dtype = tf.float32), (3, 4))
y = tf.reshape(tf.range(12, 24, dtype = tf.float32), (3, 4))

In [83]:
tf.concat([x, y], axis = 0)

<tf.Tensor: shape=(6, 4), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [16., 17., 18., 19.],
       [20., 21., 22., 23.]], dtype=float32)>

In [84]:
tf.concat([x, y], axis = 1)

<tf.Tensor: shape=(3, 8), dtype=float32, numpy=
array([[ 0.,  1.,  2.,  3., 12., 13., 14., 15.],
       [ 4.,  5.,  6.,  7., 16., 17., 18., 19.],
       [ 8.,  9., 10., 11., 20., 21., 22., 23.]], dtype=float32)>

we can create boolean tensors using comparison operators

In [85]:
x == y 

<tf.Tensor: shape=(3, 4), dtype=bool, numpy=
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])>

summing all elements of a tensor yields a tensor with only one element

In [86]:
tf.reduce_sum(x)

<tf.Tensor: shape=(), dtype=float32, numpy=66.0>

### Broadcasting Mechanism

broadcasting: lets elementwise operations be performed on tensors of different shapes by expanding one or both tensors (copying elements along axes of length 1) to a compatible common shape

In [87]:
a = tf.reshape(tf.range(3), (3, 1))
a

<tf.Tensor: shape=(3, 1), dtype=int32, numpy=
array([[0],
       [1],
       [2]], dtype=int32)>

In [88]:
b = tf.reshape(tf.range(2), (1, 2))
b

<tf.Tensor: shape=(1, 2), dtype=int32, numpy=array([[0, 1]], dtype=int32)>

In [89]:
a + b

<tf.Tensor: shape=(3, 2), dtype=int32, numpy=
array([[0, 1],
       [1, 2],
       [2, 3]], dtype=int32)>
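
a plain-python sketch of the shape rule behind the result above, assuming numpy-style broadcasting (shapes are aligned from the right, and an axis of length 1 is stretched to match the other tensor). `broadcast_shape` is a hypothetical helper, not a tensorflow function:

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, aligning axes from the right."""
    n = max(len(shape_a), len(shape_b))
    # pad the shorter shape with leading 1s so both have n axes
    a = (1,) * (n - len(shape_a)) + tuple(shape_a)
    b = (1,) * (n - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)


print(broadcast_shape((3, 1), (1, 2)))  # (3, 2)

# the same a + b as above, written out by hand with nested lists
a = [[0], [1], [2]]  # shape (3, 1)
b = [[0, 1]]         # shape (1, 2)
result = [[a[i][0] + b[0][j] for j in range(2)] for i in range(3)]
print(result)  # [[0, 1], [1, 2], [2, 3]]
```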

indexing and slicing work just like in normal python lists: x[-1] selects the last element and x[1:3] selects the second and third elements

### Saving Memory

executing operations on tensors can cause the result to be stored at a new memory location.

this is undesirable because we want to use memory efficiently, and there might be hundreds of operations left to run.

so where possible we want to perform these updates _in place_

if we don't update in place, other references will still point to the old memory location, so parts of the code may inadvertently use stale parameters

In [90]:
x = tf.range(12)
y = tf.range(13, 25)
print(id(x))
print(id(y))
y = x + y
print(id(y))

139827493260176
139827493260368
139827493261136


In [91]:
z = tf.Variable(tf.zeros_like(y))
print(id(z))
z.assign(x + y)
print(id(z))

139827493077968
139827493077968


even once state is stored in a tf.Variable, we may want to avoid excess allocations for tensors that are not model parameters. because tensorflow tensors are immutable and gradients do not flow through variable assignments, tensorflow provides no explicit way to run an individual operation in place.

but tensorflow provides the tf.function decorator, which wraps a computation inside a tensorflow graph that gets compiled and optimized before running.

this allows tf to prune unused values and to reuse prior allocations that are no longer needed.

this minimizes the memory overhead of tensorflow computations

In [92]:
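@tf.function
def computation(x, y):
    z = tf.zeros_like(y)  # unused value, pruned out of the compiled graph
    a = x + y  # intermediate allocations can be reused once no longer needed
    b = a + y
    c = b + y
    return c + y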

computation(x, y)

<tf.Tensor: shape=(12,), dtype=int32, numpy=
array([ 52,  61,  70,  79,  88,  97, 106, 115, 124, 133, 142, 151],
      dtype=int32)>

converting a tensor to a numpy ndarray (or back) is easy, and the converted result does not share memory with the original. this matters more than it seems: when running operations on the cpu or gpu, we don't want computation to halt while waiting to see whether numpy wants to do something else with the same chunk of memory

In [93]:
a = x.numpy()
b = tf.constant(a)
type(a), type(b)

(numpy.ndarray, tensorflow.python.framework.ops.EagerTensor)

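to convert a size-1 tensor to a python scalar, we can call .item() on its numpy array or use python built-ins such as float() and int() (recent numpy versions deprecate float()/int() on arrays that are not 0-dimensional, so .item() is the safer choice)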

In [94]:
a = tf.constant([3.5]).numpy()
a, a.item(), float(a), int(a)

(array([3.5], dtype=float32), 3.5, 3.5, 3)

## Data Preprocessing

creating a small temporary dataset as a csv file

In [95]:
import os

os.makedirs(os.path.join('..', 'data'), exist_ok=True)
data_file = os.path.join('..', 'data', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Price\n')  # Column names
    f.write('NA,Pave,127500\n')  # Each row represents a data example
    f.write('2,NA,106000\n')
    f.write('4,NA,178100\n')
    f.write('NA,NA,140000\n')

In [96]:
import pandas as pd

data = pd.read_csv(data_file)
print(data)

   NumRooms Alley   Price
0       NaN  Pave  127500
1       2.0   NaN  106000
2       4.0   NaN  178100
3       NaN   NaN  140000


In [97]:
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
inputs['NumRooms'] = inputs['NumRooms'].fillna(inputs['NumRooms'].mean())
print(inputs)

   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN


In [98]:
inputs = pd.get_dummies(inputs, dummy_na=True)
print(inputs)

   NumRooms  Alley_Pave  Alley_nan
0       3.0           1          0
1       2.0           0          1
2       4.0           0          1
3       3.0           0          1


In [99]:
X, y = tf.constant(inputs.values), tf.constant(outputs.values)
X, y

(<tf.Tensor: shape=(4, 3), dtype=float64, numpy=
 array([[3., 1., 0.],
        [2., 0., 1.],
        [4., 0., 1.],
        [3., 0., 1.]])>,
 <tf.Tensor: shape=(4,), dtype=int64, numpy=array([127500, 106000, 178100, 140000])>)

### Exercises

In [100]:
data = pd.read_csv(data_file)
inputs, outputs = data.iloc[:, 0:2], data.iloc[:, 2]
inputs['NumRooms'] = inputs['NumRooms'].fillna(inputs['NumRooms'].mean())
print(inputs)

   NumRooms Alley
0       3.0  Pave
1       2.0   NaN
2       4.0   NaN
3       3.0   NaN


In [101]:
inputs = inputs.drop('Alley', axis = 1)

In [102]:
X = tf.constant(inputs)
X

<tf.Tensor: shape=(4, 1), dtype=float64, numpy=
array([[3.],
       [2.],
       [4.],
       [3.]])>

## Linear Algebra

x ∈ ℝ means x is a real-valued scalar

a scalar in tensorflow = a tensor with just one element

In [103]:
x = tf.constant(3.0)
y = tf.constant(5.0)

x+y, x-y, x*y, x/y, x**y

(<tf.Tensor: shape=(), dtype=float32, numpy=8.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=-2.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=15.0>,
 <tf.Tensor: shape=(), dtype=float32, numpy=0.6>,
 <tf.Tensor: shape=(), dtype=float32, numpy=242.99998>)

vector = list of scalar values, represented as a tensor with one axis

In [104]:
x = tf.range(4)
x

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([0, 1, 2, 3], dtype=int32)>

for a normal python list we would use len(). len() also works on tensors (it returns the size of the first axis), while x.shape gives the length along every axis

In [105]:
x.shape

TensorShape([4])

generally, uppercase letters are used to denote matrices.

mathematically, we write $A \in \mathbb{R}^{m \times n}$ for a matrix with m rows and n columns

square matrix => m = n

In [106]:
A = tf.reshape(tf.range(20), (5, 4))
A

<tf.Tensor: shape=(5, 4), dtype=int32, numpy=
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]], dtype=int32)>

tf.transpose(A) computes the transpose of a matrix (rows and columns swapped).

In [107]:
tf.transpose(A)

<tf.Tensor: shape=(4, 5), dtype=int32, numpy=
array([[ 0,  4,  8, 12, 16],
       [ 1,  5,  9, 13, 17],
       [ 2,  6, 10, 14, 18],
       [ 3,  7, 11, 15, 19]], dtype=int32)>

just as vectors generalize scalars and matrices generalize vectors, tensors generalize matrices to an arbitrary number of axes.

In [108]:
T = tf.reshape(tf.range(40), (2,5,4))
T

<tf.Tensor: shape=(2, 5, 4), dtype=int32, numpy=
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]],

       [[20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35],
        [36, 37, 38, 39]]], dtype=int32)>

2 = number of matrices (axis 0)

5 = number of rows in each matrix (axis 1)

4 = number of columns in each matrix (axis 2)

In [109]:
A = tf.reshape(tf.range(20, dtype = tf.float32), (5, 4))
B = A
A, A+B

(<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
 array([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]], dtype=float32)>,
 <tf.Tensor: shape=(5, 4), dtype=float32, numpy=
 array([[ 0.,  2.,  4.,  6.],
        [ 8., 10., 12., 14.],
        [16., 18., 20., 22.],
        [24., 26., 28., 30.],
        [32., 34., 36., 38.]], dtype=float32)>)

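B = A does not copy anything: no new memory is allocated, and B is just another name for the same tensor object, as the identical ids show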

In [110]:
id(A), id(B)

(139827492943248, 139827492943248)

hadamard product = elementwise product of two matrices of the same shape

In [111]:
A * B

<tf.Tensor: shape=(5, 4), dtype=float32, numpy=
array([[  0.,   1.,   4.,   9.],
       [ 16.,  25.,  36.,  49.],
       [ 64.,  81., 100., 121.],
       [144., 169., 196., 225.],
       [256., 289., 324., 361.]], dtype=float32)>

tf.reduce_sum() calculates the sum of all elements of a tensor

In [112]:
A = tf.reshape(tf.range(100), (2,5,10))
tf.reduce_sum(A)

<tf.Tensor: shape=(), dtype=int32, numpy=4950>

we can also specify the axis along which the tensor is reduced; that axis disappears from the output shape

In [113]:
tf.reduce_sum(A, axis = 0), tf.reduce_sum(A, axis = 1), tf.reduce_sum(A, axis = 2)

(<tf.Tensor: shape=(5, 10), dtype=int32, numpy=
 array([[ 50,  52,  54,  56,  58,  60,  62,  64,  66,  68],
        [ 70,  72,  74,  76,  78,  80,  82,  84,  86,  88],
        [ 90,  92,  94,  96,  98, 100, 102, 104, 106, 108],
        [110, 112, 114, 116, 118, 120, 122, 124, 126, 128],
        [130, 132, 134, 136, 138, 140, 142, 144, 146, 148]], dtype=int32)>,
 <tf.Tensor: shape=(2, 10), dtype=int32, numpy=
 array([[100, 105, 110, 115, 120, 125, 130, 135, 140, 145],
        [350, 355, 360, 365, 370, 375, 380, 385, 390, 395]], dtype=int32)>,
 <tf.Tensor: shape=(2, 5), dtype=int32, numpy=
 array([[ 45, 145, 245, 345, 445],
        [545, 645, 745, 845, 945]], dtype=int32)>)

reducing a tensor along all of its axes is equivalent to summing all of its elements, i.e. reduce_sum without the axis parameter

In [114]:
tf.reduce_sum(A, [0, 1, 2])

<tf.Tensor: shape=(), dtype=int32, numpy=4950>

In [115]:
A

<tf.Tensor: shape=(2, 5, 10), dtype=int32, numpy=
array([[[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]], dtype=int32)>

In [116]:
sum_a = tf.reduce_sum(A, axis = 1, keepdims = True)
sum_a

<tf.Tensor: shape=(2, 1, 10), dtype=int32, numpy=
array([[[100, 105, 110, 115, 120, 125, 130, 135, 140, 145]],

       [[350, 355, 360, 365, 370, 375, 380, 385, 390, 395]]], dtype=int32)>

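keepdims = True (above) keeps the reduced axis with length 1, which is handy for broadcasting the sums against A. tf.cumsum() instead computes a cumulative (running) sum along an axis, so the shape of the tensor is unchanged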

In [117]:
tf.cumsum(A, axis = 0)

<tf.Tensor: shape=(2, 5, 10), dtype=int32, numpy=
array([[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
        [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19],
        [ 20,  21,  22,  23,  24,  25,  26,  27,  28,  29],
        [ 30,  31,  32,  33,  34,  35,  36,  37,  38,  39],
        [ 40,  41,  42,  43,  44,  45,  46,  47,  48,  49]],

       [[ 50,  52,  54,  56,  58,  60,  62,  64,  66,  68],
        [ 70,  72,  74,  76,  78,  80,  82,  84,  86,  88],
        [ 90,  92,  94,  96,  98, 100, 102, 104, 106, 108],
        [110, 112, 114, 116, 118, 120, 122, 124, 126, 128],
        [130, 132, 134, 136, 138, 140, 142, 144, 146, 148]]], dtype=int32)>

with axis = 0, the second matrix of the result is the sum of the first and second matrices of A

dot product = sum over the products of the elements at the same position

In [118]:
x = tf.Variable([0, 1, 2, 3], dtype = tf.float32)
y = tf.ones(4, dtype = tf.float32)
x, y, tf.tensordot(x, y, axes = 1)

(<tf.Variable 'Variable:0' shape=(4,) dtype=float32, numpy=array([0., 1., 2., 3.], dtype=float32)>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([1., 1., 1., 1.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=6.0>)

this is equivalent to an elementwise multiplication followed by a sum

In [119]:
tf.reduce_sum(x * y)

<tf.Tensor: shape=(), dtype=float32, numpy=6.0>

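matrix-vector product: for $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$, Ax is a vector of length m whose i-th element is the dot product $a_i^\top x$, where $a_i^\top$ is the i-th row of A

this can be thought of as a transformation of vectors from $\mathbb{R}^n$ to $\mathbb{R}^m$.

tf.linalg.matvec(A, x) is the syntax.

the number of columns of A (its length along axis 1) must be the same as the length of x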

In [120]:
A = tf.reshape(tf.range(20, dtype = tf.float32), (5,4))
A.shape

TensorShape([5, 4])

In [121]:
x.shape

TensorShape([4])

In [122]:
tf.linalg.matvec(A, x)

<tf.Tensor: shape=(5,), dtype=float32, numpy=array([ 14.,  38.,  62.,  86., 110.], dtype=float32)>
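
a plain-python sketch of what the matrix-vector product computes, one dot product per row of A. `matvec` here is a hypothetical helper operating on nested lists, not the tensorflow function; the values of A and x match the cells above:

```python
def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x, one dot product per row."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal the length of x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# same values as tf.reshape(tf.range(20), (5, 4)) and tf.range(4)
A = [[4 * i + j for j in range(4)] for i in range(5)]
x = [0, 1, 2, 3]

print(matvec(A, x))  # [14, 38, 62, 86, 110]
```

the shape check makes the constraint explicit: each of the 5 rows has 4 entries, matching the 4 entries of x, and the result has one element per row.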

tf.matmul() performs matrix multiplication; for A of shape (n, k) and B of shape (k, m), the result has shape (n, m)

In [123]:
B = tf.ones((4,3), tf.float32)
tf.matmul(A, B)

<tf.Tensor: shape=(5, 3), dtype=float32, numpy=
array([[ 6.,  6.,  6.],
       [22., 22., 22.],
       [38., 38., 38.],
       [54., 54., 54.],
       [70., 70., 70.]], dtype=float32)>

norms intuitively tell us how big a vector is. big here refers to the magnitude of the components, not to the dimensionality.

in linear algebra, a norm is a function that maps a vector to a non-negative scalar.

the l2 norm is the square root of the sum of the squares of all elements: $\|x\|_2 = \sqrt{\sum_i x_i^2}$

the l2 norm is the most commonly used norm in deep learning.

In [124]:
u = tf.range(20, dtype = tf.float32)
tf.norm(u)

<tf.Tensor: shape=(), dtype=float32, numpy=49.699093>

In [125]:
v = tf.reshape(tf.range(20, dtype = tf.float32), (5,4))
tf.norm(v)

<tf.Tensor: shape=(), dtype=float32, numpy=49.699093>

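the l1 norm is the sum of the absolute values of all elements: $\|x\|_1 = \sum_i |x_i|$

we can use tf.reduce_sum(tf.abs(u)) to calculate the l1 norm

the frobenius norm is the square root of the sum of the squares of a matrix's elements. it is the analogue of the l2 norm for matrices, which is why tf.norm(v) above returns the same value as tf.norm(u)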
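
a plain-python sketch of the three norms, using the same values as u and v above. the helper names are hypothetical and written only to show the arithmetic:

```python
import math


def l1_norm(values):
    """Sum of absolute values."""
    return sum(abs(v) for v in values)


def l2_norm(values):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


def frobenius_norm(matrix):
    """l2 norm of the matrix flattened into a single vector."""
    return l2_norm([v for row in matrix for v in row])


u = list(range(20))                           # like tf.range(20)
v = [u[4 * i:4 * i + 4] for i in range(5)]    # like reshaping u to (5, 4)

print(l1_norm(u))                        # 190
print(round(l2_norm(u), 3))              # 49.699
print(frobenius_norm(v) == l2_norm(u))   # True
```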

why do we need norms?

in deep learning we often try to solve optimization problems such as maximizing the probability assigned to observed data or minimizing the distance between predictions and ground-truth observations.

these objectives are often expressed as norms.

### Exercises

In [126]:
A = tf.reshape(tf.range(20, dtype = tf.float32), (5, 4))
A == tf.transpose(tf.transpose(A))

<tf.Tensor: shape=(5, 4), dtype=bool, numpy=
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]])>

In [127]:
B = tf.reshape(tf.range(20, 40, dtype = tf.float32), (5, 4))

tf.transpose(A + B) == tf.transpose(A) + tf.transpose(B)

<tf.Tensor: shape=(4, 5), dtype=bool, numpy=
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])>

In [128]:
C = tf.reshape(tf.range(16, dtype = tf.float32), (4, 4))
C + tf.transpose(C)

<tf.Tensor: shape=(4, 4), dtype=float32, numpy=
array([[ 0.,  5., 10., 15.],
       [ 5., 10., 15., 20.],
       [10., 15., 20., 25.],
       [15., 20., 25., 30.]], dtype=float32)>

In [129]:
X = tf.reshape(tf.range(40), (2,5,4))
len(X)

2

In [130]:
Z = tf.reshape(tf.range(80, dtype = tf.float32), (2,2,5,4))
tf.linalg.norm(Z)

<tf.Tensor: shape=(), dtype=float32, numpy=409.2432>

## Calculus

method of exhaustion = in ancient times, people did not know how to calculate the area of a circle exactly. so they inscribed polygons inside the circle and calculated the areas of those polygons; as the number of sides grows, the polygon's area approximates the circle's area better and better.

this is where integral calculus has its origin.


roughly 2000 years later, differential calculus was invented. optimization problems are among its most important applications.

in deep learning, we train models, updating them successively so that they get better and better as they see more data.

usually, getting better means minimizing a loss function. the loss is basically a score that tells us how bad the model is.

the task of fitting models can be decomposed into two concerns:

1. optimization: fitting our models to observed data.

2. generalization: producing models whose validity extends beyond the exact data samples used to train them.

in deep learning, we typically choose loss functions that are differentiable with respect to the model's parameters. this lets us determine, for each parameter, how rapidly the loss would increase or decrease if we were to increase or decrease that parameter by an infinitesimally small amount.

In [131]:
def f(x):
    return 3 * x ** 2 - 4 * x
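
a small sketch of what the derivative of this f means numerically: the difference quotient (f(x + h) - f(x)) / h at x = 1 should approach the analytic derivative 6x - 4 = 2 as h shrinks. `numerical_lim` is a helper name chosen here for illustration:

```python
def f(x):
    return 3 * x ** 2 - 4 * x


def numerical_lim(f, x, h):
    """Forward difference quotient approximating f'(x)."""
    return (f(x + h) - f(x)) / h


h = 0.1
for _ in range(5):
    # the quotient equals 2 + 3h for this f, so it tends to 2 as h -> 0
    print(f"h={h:.5f}, numerical limit={numerical_lim(f, 1, h):.5f}")
    h *= 0.1
```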

## Automatic Differentiation

deep learning frameworks expedite this work by calculating derivatives automatically, aka automatic differentiation.

based on the model, the system builds a computational graph that tracks which operations were performed on which data to produce the output.

automatic differentiation enables the system to subsequently backpropagate gradients. backpropagate = trace back through the computational graph, filling in the partial derivatives with respect to each parameter.

In [132]:
x = tf.range(4, dtype = tf.float32)

it is important that we do not allocate new memory every time a derivative is taken with respect to a parameter, because we often update the same parameters thousands of times and could quickly run out of memory.

In [133]:
x = tf.Variable(x)
x

<tf.Variable 'Variable:0' shape=(4,) dtype=float32, numpy=array([0., 1., 2., 3.], dtype=float32)>

assume $y = 2\mathbf{x}^\top\mathbf{x}$

In [134]:
with tf.GradientTape() as t:
    y = 2 * tf.tensordot(x, x, axes = 1)

y

<tf.Tensor: shape=(), dtype=float32, numpy=28.0>

In [135]:
x_grad = t.gradient(y, x)
x_grad

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  4.,  8., 12.], dtype=float32)>

for $y = 2\mathbf{x}^\top\mathbf{x}$, the gradient with respect to $\mathbf{x}$ is $4\mathbf{x}$. let's verify that

In [136]:
x_grad == 4*x

<tf.Tensor: shape=(4,), dtype=bool, numpy=array([ True,  True,  True,  True])>
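
as an independent sanity check that does not rely on GradientTape, the same gradient can be approximated with central finite differences in plain python. this is only a sketch; `y_fn` and `numerical_gradient` are hypothetical helpers, and the step size h = 1e-4 is an arbitrary choice:

```python
def y_fn(x):
    """y = 2 * (x . x) for a list of floats."""
    return 2 * sum(v * v for v in x)


def numerical_gradient(fn, x, h=1e-4):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((fn(x_plus) - fn(x_minus)) / (2 * h))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
grad = numerical_gradient(y_fn, x)
print([round(g, 4) for g in grad])  # [0.0, 4.0, 8.0, 12.0]
```

the approximation agrees with 4x up to floating-point error, matching the gradient computed by the tape.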

In [137]:
abc = tf.range(4, dtype= tf.float32)
tf.reduce_sum(abc), 2 * abc, tf.reduce_sum(abc * abc)

(<tf.Tensor: shape=(), dtype=float32, numpy=6.0>,
 <tf.Tensor: shape=(4,), dtype=float32, numpy=array([0., 2., 4., 6.], dtype=float32)>,
 <tf.Tensor: shape=(), dtype=float32, numpy=14.0>)

detaching computation:

say y is calculated as a function of x, and z is calculated as a function of both y and x. sometimes we want the gradient of z with respect to x while treating y as a constant, so that only the role x played after y was calculated is taken into account.

this can be done using tf.stop_gradient()

In [138]:
with tf.GradientTape(persistent = True) as t:
    y = x * x
    u = tf.stop_gradient(y)
    z = u * x

In [141]:
x_grad = t.gradient(z, x)
x_grad == u

<tf.Tensor: shape=(4,), dtype=bool, numpy=array([ True,  True,  True,  True])>

In [142]:
t.gradient(y, x) == 2 * x

<tf.Tensor: shape=(4,), dtype=bool, numpy=array([ True,  True,  True,  True])>

gradients can also be calculated through python control flow such as conditionals, loops and arbitrary function calls.

In [143]:
def f(a):
    b = a * 2
    while tf.norm(b) < 1000:
        b = b * 2
    if tf.reduce_sum(b) > 0:
        c = b
    else:
        c = 100 * b
    return c

In [144]:
a = tf.Variable(tf.random.normal(shape = ()))
with tf.GradientTape() as t:
    d = f(a)

d_grad = t.gradient(d, a)
d_grad

<tf.Tensor: shape=(), dtype=float32, numpy=204800.0>

In [147]:
d_grad == d/a

<tf.Tensor: shape=(), dtype=bool, numpy=False>
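
the comparison above returns False even though f is piecewise linear in a: for any given a, f(a) = k * a for some constant k, so the gradient should equal d / a. a likely explanation is float32 rounding in the division, so comparing with a tolerance is safer than exact equality. a plain-python sketch of the same function, assuming a fixed input a = -0.37 chosen only for illustration and a finite-difference gradient in place of GradientTape:

```python
import math


def f(a):
    b = a * 2
    while abs(b) < 1000:  # for a scalar, the norm is just the absolute value
        b = b * 2
    return b if b > 0 else 100 * b


a = -0.37  # hypothetical input; the notebook draws a from a normal distribution
d = f(a)

# central finite difference; h is small enough that a - h and a + h
# follow the same branches and the same number of loop iterations as a
h = 1e-6
d_grad = (f(a + h) - f(a - h)) / (2 * h)

print(math.isclose(d_grad, d / a, rel_tol=1e-6))  # True
```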

## Probability 

machine learning is, in some ways, all about making predictions.

we often need to predict the probability of an event happening, or how likely a given reading is, etc.

refer to github.com/asrjy/ai-notes for detailed notes on probability for machine learning.