## Dynamic & Static

Neural network frameworks currently fall into two groups: static-graph frameworks and dynamic-graph frameworks. The biggest difference between PyTorch and frameworks such as TensorFlow and Caffe is how they represent the computational graph. TensorFlow (in the 1.x graph API used below) uses static graphs: we define the computational graph once and then execute it repeatedly. PyTorch uses dynamic graphs: a new computational graph is built on every forward pass. This section looks at the advantages and disadvantages of static and dynamic graphs.

For the user, the two kinds of graph behave very differently, and each has its own advantages. Dynamic graphs are easier to debug: the code runs line by line like ordinary Python, so users can inspect it with whatever tools they like, and the flow of computation is intuitive to follow. A static graph is defined first and run afterwards; once it has been built there is no need to rebuild it on later runs, so it can execute faster than a dynamic graph.
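
The contrast between the two styles can be sketched in plain Python. This is a toy illustration only: the `Placeholder`, `Mul`, and `Add` classes and the `dynamic_forward` function are hypothetical names made up for this example and are not part of TensorFlow or PyTorch.

```python
# Toy illustration only: these classes are hypothetical and are not
# part of TensorFlow or PyTorch.

class Placeholder:
    """A named input whose value is supplied when the graph is run."""
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Mul:
    """A graph node that multiplies the values of two child nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) * self.right.evaluate(feed)


class Add:
    """A graph node that adds the values of two child nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


inputs = [(1, 2), (3, 4), (5, 6)]

# Static style: define the graph once. Nothing is computed at this point;
# static_graph only describes the expression a * b + a.
a, b = Placeholder("a"), Placeholder("b")
static_graph = Add(Mul(a, b), a)

# Then run the same graph many times, feeding different inputs.
static_results = [static_graph.evaluate({"a": u, "b": v}) for u, v in inputs]


# Dynamic style: the computation is ordinary Python that runs immediately,
# so it can branch on the data and be stepped through in a debugger.
def dynamic_forward(u, v):
    out = u * v + u
    if out > 10:  # data-dependent control flow
        out = out * 2
    return out


dynamic_results = [dynamic_forward(u, v) for u, v in inputs]

print(static_results)   # [3, 15, 35]
print(dynamic_results)  # [3, 30, 70]
```

In the static version the structure of the computation is fixed before any data arrives, which is what lets a framework reuse it across runs. In the dynamic version the `if` statement is plain Python, which is what makes debugging feel natural.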

![](https://ws3.sinaimg.cn/large/006tNc79ly1fmai482qumg30rs0fmq6e.gif)

### TensorFlow: Static Graph

In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. 
Here we use TensorFlow to fit a simple two-layer net:

In [1]:
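# Code in file autograd/tf_two_layer_net.py
# Note: this example uses the TensorFlow 1.x graph API (tf.placeholder,
# tf.Session), which is not available at the top level of TensorFlow 2.x.
import tensorflow as tf
import numpy as np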

# First we set up the computational graph:

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

In [2]:
# Create placeholders for the input and target data; these will be filled
# with real data when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
# A TensorFlow Variable persists its value across executions of the graph.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

In [3]:
# Forward pass: Compute the predicted y using operations on TensorFlow Tensors.
# Note that this code does not actually perform any numeric operations; it
# merely sets up the computational graph that we will later execute.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

In [4]:
# Compute loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2.
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Update the weights using gradient descent. To actually update the weights
# we need to evaluate new_w1 and new_w2 when executing the graph. Note that
# in TensorFlow the act of updating the value of the weights is part of
# the computational graph; in PyTorch this happens outside the computational
# graph.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

In [5]:
# Now we have built our computational graph, so we enter a TensorFlow session to
# actually execute the graph.
with tf.Session() as sess:
  # Run the graph once to initialize the Variables w1 and w2.
  sess.run(tf.global_variables_initializer())

  # Create numpy arrays holding the actual data for the inputs x and targets y
  x_value = np.random.randn(N, D_in)
  y_value = np.random.randn(N, D_out)
  for _ in range(500):
    # Execute the graph many times. Each time it executes we want to bind
    # x_value to x and y_value to y, specified with the feed_dict argument.
    # Each time we execute the graph we want to compute the values for loss,
    # new_w1, and new_w2; the values of these Tensors are returned as numpy
    # arrays.
    loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                feed_dict={x: x_value, y: y_value})
    print(loss_value)

27855396.0
22304114.0
20955164.0
20435418.0
19126156.0
16154638.0
12239609.0
8337312.5
5337310.0
3323413.0
2100637.5
1379123.2
955671.5
699161.5
536945.2
428586.94
352145.7
295298.16
251192.06
215878.75
186988.39
162916.22
142595.66
125296.69
110459.336
97668.375
86587.92
76945.73
68529.28
61161.36
54693.164
48998.867
43974.984
39530.066
35594.78
32099.953
28991.652
26220.05
23745.195
21529.629
19543.95
17761.402
16157.533
14714.482
13412.182
12235.715
11172.024
10209.08
9336.236
8544.175
7825.163
7171.174
6576.0796
6034.2227
5540.472
5089.7773
4678.326
4302.56
3959.326
3644.7527
3356.7056
3092.2358
2849.9905
2627.6777
2423.7437
2236.565
2064.6602
1906.6196
1761.2465
1627.5468
1504.5259
1391.2698
1286.8767
1190.6262
1101.9138
1020.16144
944.67206
875.01965
810.72144
751.32556
696.5172
645.80994
598.93677
555.5604
515.43
478.34186
443.97736
412.1577
382.7119
355.43677
330.17737
306.7508
285.0323
264.89874
246.23602
228.92667
212.86612
197.96039
184.12537
171.28299
159.36926
148.31435
...
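
The `tf.gradients` call in the code above derives the gradients of the loss from the graph automatically. For this particular two-layer net they can also be written out by hand with the chain rule. The following is a minimal NumPy sketch of that calculation, checked against finite differences. The small layer sizes, the random seed, and the helper names `loss_fn`, `manual_grads`, and `numeric_grad` are assumptions made for this example, not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small sizes so the finite-difference check is fast (assumed values, not
# the ones used in the tutorial code).
N, D_in, H, D_out = 4, 5, 3, 2
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))


def loss_fn(w1, w2):
    """Same model and loss as the tutorial: ReLU hidden layer, squared error."""
    h_relu = np.maximum(x @ w1, 0)
    y_pred = h_relu @ w2
    return np.sum((y - y_pred) ** 2)


def manual_grads(w1, w2):
    # Forward pass, keeping the intermediates needed by the backward pass.
    h = x @ w1
    h_relu = np.maximum(h, 0)
    y_pred = h_relu @ w2

    # Backward pass: apply the chain rule layer by layer.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T @ grad_y_pred
    grad_h_relu = grad_y_pred @ w2.T
    grad_h = grad_h_relu * (h > 0)  # ReLU passes gradient only where h > 0
    grad_w1 = x.T @ grad_h
    return grad_w1, grad_w2


def numeric_grad(f, w, eps=1e-6):
    """Central finite differences, perturbing one entry of w at a time."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        plus = f()
        w[idx] = old - eps
        minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


grad_w1, grad_w2 = manual_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

# Largest absolute disagreement between the two methods.
err_w1 = np.max(np.abs(grad_w1 - num_w1))
err_w2 = np.max(np.abs(grad_w2 - num_w2))
print(err_w1, err_w2)
```

Both errors should be very small, which suggests the hand-derived formulas match the true gradients. Automatic differentiation removes the need to derive and maintain such formulas for every new model.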

### PyTorch: Dynamic Graph

Here we use PyTorch Tensors and autograd to implement our two-layer network.

When using autograd, the forward pass of your network will define a computational graph; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.
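
To make the idea of a graph recorded during the forward pass concrete, here is a minimal scalar-only sketch in plain Python. The `Value` class is a hypothetical stand-in for a Tensor with autograd, written for this example; it is not how PyTorch is implemented, and it supports only addition, multiplication, and ReLU.

```python
# A hypothetical, scalar-only stand-in for a Tensor with autograd.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents    # nodes this value was computed from
        self.backward_fn = None   # pushes self.grad back to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out.backward_fn = backward_fn
        return out

    def relu(self):
        out = Value(max(self.data, 0.0), (self,))

        def backward_fn():
            self.grad += (1.0 if self.data > 0 else 0.0) * out.grad

        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically order the graph recorded during the forward pass,
        # then apply the chain rule from the output back to the inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()


w, x, b = Value(3.0), Value(2.0), Value(-1.0)

# The forward pass builds the graph for out = relu(w * x + b).
out = (w * x + b).relu()
out.backward()
first_grads = (w.grad, x.grad, b.grad)
print(out.data, first_grads)  # 5.0 (2.0, 3.0, 1.0)

# A second forward pass builds a brand-new graph. Gradients on w, x, and b
# accumulate across backward calls, which is why a training loop resets
# them between iterations.
out2 = (w * x + b).relu()
out2.backward()
print(w.grad)  # 4.0
```

Each arithmetic operation creates a new node that remembers its inputs, so the graph exists only as a by-product of running the forward code. Running the forward code again produces a fresh graph, which is the dynamic behavior described above.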

In [6]:
import torch

device = torch.device('cpu')
# device = torch.device('cuda') # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

In [7]:
# Create random Tensors to hold input and outputs
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

# Create random Tensors for weights; setting requires_grad=True means that we
# want to compute gradients for these Tensors during the backward pass.
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

In [8]:
learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors. Since w1 and
    # w2 have requires_grad=True, operations involving these Tensors will cause
    # PyTorch to build a computational graph, allowing automatic computation of
    # gradients. Since we are no longer implementing the backward pass by hand we
    # don't need to keep references to intermediate values.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss. Loss is a Tensor of shape (), and loss.item()
    # is a Python number giving its value.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent. For this step we just want to mutate
    # the values of w1 and w2 in-place; we don't want to build up a computational
    # graph for the update steps, so we use the torch.no_grad() context manager
    # to prevent PyTorch from building a computational graph for the updates.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after running the backward pass
        w1.grad.zero_()
        w2.grad.zero_()

0 34676156.0
1 31003460.0
2 28006888.0
3 22690798.0
4 16010568.0
5 9993319.0
6 5935406.0
7 3604461.0
8 2358743.25
9 1681317.5
10 1287884.375
11 1036888.6875
12 861576.6875
13 730156.125
14 627009.0625
15 543271.5
16 473835.0
17 415600.15625
18 366345.21875
19 324227.21875
20 287952.15625
21 256592.421875
22 229325.734375
23 205518.171875
24 184644.296875
25 166278.171875
26 150046.5
27 135670.0
28 122907.6875
29 111545.2109375
30 101396.140625
31 92318.265625
32 84202.4609375
33 76903.265625
34 70328.328125
35 64399.71875
36 59044.68359375
37 54194.0625
38 49795.609375
39 45799.80078125
40 42170.5390625
41 38869.48046875
42 35859.8203125
43 33114.39453125
44 30604.767578125
45 28307.232421875
46 26203.96875
47 24276.0390625
48 22505.15234375
49 20877.240234375
50 19379.43359375
51 18000.31640625
52 16728.583984375
53 15556.255859375
54 14474.2626953125
55 13475.0244140625
56 12550.2607421875
57 11695.1162109375
58 10903.232421875
59 10170.10546875
60 9490.240234375
61 8859.5771484375
...

429 0.0001787524961400777
430 0.00017459774971939623
431 0.0001703459129203111
432 0.00016702547145541757
433 0.0001630586921237409
434 0.00015913098468445241
435 0.00015652754518669099
436 0.00015282558160834014
437 0.00014949274191167206
438 0.00014634917897637933
439 0.00014341549831442535
440 0.00014059308159630746
441 0.00013798766303807497
442 0.00013543413660954684
443 0.00013244131696410477
444 0.00012995539873372763
445 0.00012726624845527112
446 0.00012493676331359893
447 0.00012244308891240507
448 0.0001203226056532003
449 0.00011782200454035774
450 0.00011502904089866206
451 0.00011307625391054899
452 0.00011080759577453136
453 0.00010920184286078438
454 0.00010718530393205583
455 0.00010493594891158864
456 0.00010309080244041979
457 0.00010094624303746969
458 9.970759856514633e-05
459 9.791643969947472e-05
460 9.592665446689352e-05
461 9.431871876586229e-05
462 9.2730660981033e-05
463 9.117438457906246e-05
464 8.932536729844287e-05
465 8.803009404800832e-05
466 8.664794586