# Warm-up: numpy

Before introducing PyTorch, we will first implement the network using numpy.

Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations:

In [2]:
# -*- coding: utf-8 -*-
import numpy as np

# Manual backpropagation!!!

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 33604035.24248965
1 34429935.33959008
2 40323255.426083535
3 43828622.5379537
4 37651793.99561229
5 23818217.722445913
6 11419783.48685687
7 4977190.318096895
8 2438917.617055584
9 1485323.6403014986
10 1076694.827247693
11 857251.0286308117
12 712880.7570681137
13 604933.5932219281
14 519207.9972245651
15 448968.83282428596
16 390525.1846320142
17 341376.76094681583
18 299756.1123010445
19 264318.59620949783
20 233901.57025635347
21 207685.35969553352
22 184972.3479305404
23 165195.76598169864
24 147925.12320808112
25 132788.02413390524
26 119465.96934776581
27 107706.173893275
28 97297.91187987139
29 88053.77324523684
30 79822.27835007265
31 72475.73682922109
32 65901.71706986947
33 60020.049096947354
34 54736.86106719043
35 49981.88496717531
36 45692.89850624885
37 41815.51563226028
38 38308.02971222328
39 35128.28577094067
40 32238.904752444007
41 29615.46747264543
42 27227.95237896195
43 25054.857146111608
44 23072.689125661233
45 21262.836594495446
46 19608.27963535999
47 18094

479 6.608357418764179e-07
480 6.289183212556317e-07
481 5.985511937887261e-07
482 5.696622861604171e-07
483 5.421707811431315e-07
484 5.1600444231775e-07
485 4.911099914530159e-07
486 4.6741785551805545e-07
487 4.4487398338571226e-07
488 4.234211003059386e-07
489 4.0300965824325106e-07
490 3.83582207666172e-07
491 3.650946604829654e-07
492 3.4750234005763734e-07
493 3.307600787884032e-07
494 3.148258421220463e-07
495 2.996664306507415e-07
496 2.8523553581235116e-07
497 2.7150425395149565e-07
498 2.584382058737915e-07
499 2.460011107398181e-07


![title](img/my_backprop2.jpg)
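
Because the backward pass above is derived by hand, it is worth checking it against a numerical estimate. The following is a minimal sketch of such a gradient check, not part of the original example: the helper names `loss_and_grads` and `numerical_grad`, the small layer sizes, the fixed seed, and the step size `eps` are all assumptions chosen for illustration. It compares each hand-derived gradient entry with a central finite difference of the loss.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Forward pass and manual backward pass for the two-layer ReLU network."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0  # ReLU passes no gradient where its input was negative
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numerical_grad(f, w, i, j, eps=1e-5):
    """Central finite difference of f() with respect to the entry w[i, j]."""
    old = w[i, j]
    w[i, j] = old + eps
    plus = f()
    w[i, j] = old - eps
    minus = f()
    w[i, j] = old  # restore the original weight
    return (plus - minus) / (2 * eps)

# Small, hypothetical sizes so that every weight can be checked quickly.
rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

_, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)
f = lambda: loss_and_grads(x, y, w1, w2)[0]

max_abs_err = 0.0
for w, g in ((w1, grad_w1), (w2, grad_w2)):
    for i in range(w.shape[0]):
        for j in range(w.shape[1]):
            num = numerical_grad(f, w, i, j)
            max_abs_err = max(max_abs_err, abs(num - g[i, j]))
print("max absolute error:", max_abs_err)
```

If the manual gradients are right, the reported error should be tiny compared with the size of the gradients themselves. A large error usually points to a transposed matrix or a missing factor in the backward pass.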

## PyTorch: Tensors

Numpy is a great framework, but it cannot utilize GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or greater, so unfortunately numpy won’t be enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Like numpy arrays, PyTorch Tensors do not know anything about deep learning or computational graphs or gradients; they are a generic tool for scientific computing.
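
As a rough illustration of how closely the two array types correspond, the sketch below runs the same computation once with numpy and once with PyTorch. The array `a` and its values are made up for this example; `torch.from_numpy` and `Tensor.numpy()` convert between the two representations on the CPU.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(a)  # a Tensor that shares memory with `a`

# The same matrix product followed by ReLU, written with each library
np_result = np.maximum(a.dot(a.T), 0)
torch_result = t.mm(t.t()).clamp(min=0)

print(torch_result.shape)                            # torch.Size([2, 2])
print(np.allclose(np_result, torch_result.numpy()))  # True
```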

However, unlike numpy, PyTorch Tensors can utilize GPUs to accelerate their numeric computations. To run a PyTorch Tensor on GPU, you simply need to place it on the right device, for example by passing a `device` argument when you create it.
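
A minimal sketch of device placement, assuming a recent PyTorch version: the fallback to the CPU and the tensor shapes are choices made for this illustration only. A tensor can either be created directly on a device or moved there afterwards with `.to(device)`.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU is visible.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(64, 1000, device=device)  # created directly on `device`
w = torch.randn(1000, 100).to(device)     # created on the CPU, then moved

# Both operands live on the same device, and so does the result.
h = x.mm(w)
print(h.device, h.shape)
```

Operations such as `mm` expect both operands on the same device, which is why the example code passes the same `device` to every tensor it creates.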

Here we use PyTorch Tensors to fit a two-layer network to random data. Like the numpy example above, we need to manually implement the forward and backward passes through the network:

In [3]:
# -*- coding: utf-8 -*-
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    print(t, loss)

    # Backprop to compute gradients of loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 36033240.0
1 31648028.0
2 28145876.0
3 22592880.0
4 15793286.0
5 9816277.0
6 5801947.5
7 3492640.75
8 2248613.0
9 1569968.25
10 1176699.5
11 928932.6875
12 758982.5
13 633928.3125
14 537037.5625
15 459626.40625
16 396316.125
17 343775.59375
18 299834.875
19 262647.78125
20 230957.171875
21 203765.234375
22 180326.28125
23 160031.234375
24 142365.96875
25 126968.0390625
26 113495.015625
27 101661.5234375
28 91242.171875
29 82052.21875
30 73919.8125
31 66710.1015625
32 60301.7734375
33 54610.234375
34 49527.36328125
35 44981.359375
36 40907.08203125
37 37248.23828125
38 33959.82421875
39 30999.015625
40 28327.65234375
41 25915.265625
42 23733.099609375
43 21755.373046875
44 19959.095703125
45 18326.982421875
46 16841.97265625
47 15491.4736328125
48 14260.171875
49 13135.7734375
50 12109.0654296875
51 11169.5986328125
52 10309.5537109375
53 9522.072265625
54 8799.09375
55 8135.7529296875
56 7526.65478515625
57 6966.869140625
58 6452.42138671875
59 5978.87060546875
60 5542.40966796875
61