## TensorFlow: Static Graphs
PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph and use automatic differentiation to compute gradients.
The biggest difference between the two is that TensorFlow's computational graphs are static, whereas PyTorch uses dynamic computational graphs.

In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding input data to the graph. In PyTorch, each forward pass defines a new computational graph.

In [2]:
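# Import the necessary packages.
# Note: this notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session);
# under TensorFlow 2.x these calls are available in tf.compat.v1 with eager execution disabled.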
import tensorflow as tf
import numpy as np

In [3]:
# First we set up the computational graph:

# N is batch size
# D_in is input dimension
# H is hidden dimension
# D_out is output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

In [4]:
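# Create placeholders for the input and target data; they are filled
# with actual values (via feed_dict) when the graph is executed.
# shape=(None, ...) leaves the batch dimension unspecified.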

x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

In [5]:
# create Variables for the weights and 
# initialize them with random data.
# A TensorFlow Variable persists its value across
# executions of the graph

w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

In [6]:
# Forward pass
# Compute the predicted y using operations on TensorFlow Tensors.
# These lines only add nodes to the graph; no numeric computation happens yet.

h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)
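
For readers who want to see the numbers the forward pass would produce, here is a rough NumPy equivalent that runs eagerly. It is a sketch, not part of the original notebook; the fixed seed and the use of `np.random.default_rng` are assumptions made for reproducibility.

```python
import numpy as np

# Same sizes as in this section
N, D_in, H, D_out = 64, 1000, 100, 10

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility

x = rng.standard_normal((N, D_in))     # stands in for the fed input data
w1 = rng.standard_normal((D_in, H))    # first-layer weights
w2 = rng.standard_normal((H, D_out))   # second-layer weights

h = x @ w1                    # shape (N, H)
h_relu = np.maximum(h, 0.0)   # ReLU; the scalar 0 broadcasts like tf.zeros(1)
y_pred = h_relu @ w2          # shape (N, D_out)

print(h.shape, h_relu.shape, y_pred.shape)
```

Unlike the graph version, each line here computes its result immediately.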

In [7]:
# Compute the loss (sum of squared errors) using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

In [8]:
# Compute the gradients of the loss with respect to w1 and w2
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])
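
As a rough picture of what `tf.gradients` derives automatically for this network, the sketch below writes out the backward pass by hand in NumPy and checks one entry against a finite-difference estimate. The small sizes, the seed, and the helper `loss_fn` are assumptions chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2   # small sizes chosen for this sketch only

x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))


def loss_fn(w1, w2):
    """Sum of squared errors of the two-layer ReLU network (hypothetical helper)."""
    h_relu = np.maximum(x @ w1, 0.0)
    return np.sum((y - h_relu @ w2) ** 2)


# Forward pass, keeping the intermediates needed for the backward pass
h = x @ w1
h_relu = np.maximum(h, 0.0)
y_pred = h_relu @ w2

# Backward pass: apply the chain rule from the loss back to the weights
grad_y_pred = 2.0 * (y_pred - y)     # d loss / d y_pred
grad_w2 = h_relu.T @ grad_y_pred     # d loss / d w2
grad_h_relu = grad_y_pred @ w2.T     # d loss / d h_relu
grad_h = grad_h_relu * (h > 0)       # ReLU passes gradient only where h > 0
grad_w1 = x.T @ grad_h               # d loss / d w1

# Central finite-difference estimate for a single entry of w1
eps = 1e-6
w1_plus = w1.copy()
w1_plus[0, 0] += eps
w1_minus = w1.copy()
w1_minus[0, 0] -= eps
numeric = (loss_fn(w1_plus, w2) - loss_fn(w1_minus, w2)) / (2 * eps)

print(grad_w1[0, 0], numeric)
```

The two printed values should agree closely, which is a quick sanity check that the hand-written chain rule matches the loss defined above.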

In [9]:
# Update the weights using gradient descent.
# The assign ops below only take effect when they are evaluated, so
# new_w1 and new_w2 must be fetched each time the graph is executed.
# Note:
# In TensorFlow, updating the values of the weights
# is part of the computational graph;
# in PyTorch, this happens outside the computational graph.

learning_rate = 1e-6  # learning rate (gradient descent step size)

new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
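
For contrast with the in-graph `assign` ops above, here is a NumPy sketch in which the weight update is an ordinary in-place statement executed by Python on every iteration, outside any graph. It reuses the sizes and learning rate from this section; the seed, the hand-written gradients, and the iteration count of 50 are assumptions made to keep the example short.

```python
import numpy as np

N, D_in, H, D_out = 64, 1000, 100, 10
learning_rate = 1e-6
rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility

x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

losses = []
for step in range(50):  # assumption: fewer iterations than the notebook uses
    # Forward pass
    h = x @ w1
    h_relu = np.maximum(h, 0.0)
    y_pred = h_relu @ w2
    losses.append(np.sum((y - y_pred) ** 2))

    # Backward pass (hand-written gradients of the sum-of-squares loss)
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T @ grad_y_pred
    grad_w1 = x.T @ ((grad_y_pred @ w2.T) * (h > 0))

    # Gradient descent step, applied imperatively in plain Python
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

print(losses[0], losses[-1])
```

Here the update happens as soon as the `-=` statements run, whereas in the graph version the update only happens when `new_w1` and `new_w2` are evaluated in a session.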

In [10]:
# number of epochs
epochs = 500

In [14]:
# Start a TensorFlow session to execute the graph
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2
    sess.run(tf.global_variables_initializer())
    
    # Create numpy arrays holding the actual data
    # for the inputs x and targets y
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    
    for i in range(epochs):
        # Execute the graph many times. On each execution, feed_dict
        # binds x_value to x and y_value to y.
        # Fetching loss computes the loss; fetching new_w1 and new_w2
        # applies the weight updates. Their values are returned as
        # numpy arrays, which we discard here.

        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})
        print(i, loss_value)

0 26856216.0
1 22561762.0
2 21610822.0
3 20932736.0
4 19044610.0
5 15430631.0
6 11177201.0
7 7319739.0
8 4573650.0
9 2830814.5
10 1813216.5
11 1225071.6
12 881036.0
13 670022.44
14 532796.6
15 437559.25
16 367538.3
17 313462.66
18 270213.34
19 234721.12
20 205090.83
21 180031.86
22 158664.44
23 140342.77
24 124536.38
25 110832.79
26 98876.66
27 88413.414
28 79225.69
29 71129.57
30 63976.473
31 57643.695
32 52022.652
33 47022.266
34 42561.43
35 38579.438
36 35019.89
37 31828.242
38 28959.805
39 26377.145
40 24049.463
41 21948.855
42 20050.271
43 18332.729
44 16776.467
45 15364.527
46 14082.914
47 12917.729
48 11857.256
49 10891.403
50 10011.109
51 9207.289
52 8473.479
53 7802.87
54 7189.3604
55 6628.0723
56 6113.5586
57 5641.924
58 5209.412
59 4812.579
60 4448.0854
61 4113.158
62 3805.3748
63 3522.254
64 3261.5608
65 3021.539
66 2800.333
67 2596.374
68 2408.2568
69 2234.6687
70 2074.4536
71 1926.4451
72 1789.784
73 1663.3711
74 1546.479
75 1438.3016
76 1338.2213
77 1245.5511
78 1159.731