The corresponding pyunit test case is in
<b>tests.core.np.TestDenseLayer.DenseLayerStandAlone#test_basic_op</b>

Note that given $w, b$ and $x$, we calculate 

$y=wx+b$ 

whereas you need to provide $x^T$, $w$ and $b$ to PyTorch, which calculates

$y_{torch}=x^Tw^T+b$, in which case $y=y^{T}_{torch}$.

Strictly speaking, the shape of $b$ should matter, but I have not yet tested this.
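
The relationship between the two conventions, and the effect of the shape of $b$, can be checked with a small NumPy sketch. It assumes plain NumPy broadcasting and reuses the values from the test case set up later in this notebook; the names `b_col` and `b_flat` are illustrative only and do not come from the pyunit code, whose handling of $b$ is not shown here.

```python
import numpy as np

w = np.array([[1, 3, -1], [0, -4, 2]])      # shape (out=2, in=3)
x = np.array([[1, -1], [2, 3], [-1, -2]])   # shape (in=3, batch=2), one sample per column
b_col = np.array([[-3], [2]])               # shape (2, 1): one bias per output row
b_flat = b_col.ravel()                      # shape (2,): the layout PyTorch stores

# Column convention: y = w x + b, with b broadcast across the batch columns.
y = w @ x + b_col

# Row convention used by nn.Linear: y_torch = x^T w^T + b, one sample per row.
y_torch = x.T @ w.T + b_flat

print(np.array_equal(y, y_torch.T))         # the two results are transposes of each other

# A flat b in the column convention broadcasts along the batch axis instead.
# Here batch size equals output size, so no shape error is raised.
y_wrong = w @ x + b_flat
print(np.array_equal(y_wrong, y))
```

Under these assumptions, a flat $b$ in the column convention gives a different result without raising an error, which suggests the shape of $b$ does matter there.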

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np 
import torch.optim as optim 
import sys 
import os 
import matplotlib.pyplot as plt

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.dense =  nn.Linear(3,2)

    def forward(self, x):
        return self.dense(x) 

In [3]:
net = Net() 
print("Printing dense layer weights...")
print(net.dense.weight.data)
print(net.dense.bias.data)
print("finished printing dense layer weights")

Printing dense layer weights...
tensor([[ 0.3781,  0.0311, -0.0762],
        [-0.3608,  0.4820,  0.3797]])
tensor([ 0.1245, -0.1654])
finished printing dense layer weights


<h2>Manually set the layer weights</h2>
Set the weights to match those used in the pyunit test case. This is usually better than
relying on seeding the random number generators.

The expected output is also computed directly as

$y_{pred}=xw^{T}+b$

where $x$ is the already-transposed input, with one sample per row.

In [4]:
model_w = np.array([[1, 3, -1], [0, -4, 2]])
model_b = np.array([-3, 2])
w_tensor = torch.from_numpy(model_w).float()
b_tensor = torch.from_numpy(model_b).float() 
w = nn.Parameter(w_tensor)
b = nn.Parameter(b_tensor)
net.dense.weight  = w 
net.dense.bias = b
x = np.array([[1, -1], [2, 3], [-1, -2]]).T
print("#------ ")
print("model_w=np.{}".format(repr(model_w)))
print("model_b=np.{}".format(repr(model_b)))
print("x=np.{}".format(repr(x)))
yy = x@model_w.T + model_b 
print("# ---- expected final value (directly computed)----")
print("y-predicted=np.{}".format(repr(yy)))

#------ 
model_w=np.array([[ 1,  3, -1],
       [ 0, -4,  2]])
model_b=np.array([-3,  2])
x=np.array([[ 1,  2, -1],
       [-1,  3, -2]])
# ---- expected final value (directly computed)----
y-predicted=np.array([[  5,  -8],
       [  7, -14]])


In [5]:
input = torch.from_numpy(x).float() 
print("Input Shape:{}".format(input.shape))
output = net(input)
print("Output:{}".format(output.data))
y = np.array([[-1, 1], [-3, -1]]).T
target = torch.from_numpy(y).float()
print("Target:{}".format(target.data))

Input Shape:torch.Size([2, 3])
Output:tensor([[  5.,  -8.],
        [  7., -14.]])
Target:tensor([[-1., -3.],
        [ 1., -1.]])


In [6]:
criteria = nn.MSELoss()
loss = criteria(output, target)
print("loss: {}".format(loss))
optimizer = torch.optim.SGD(net.parameters(), lr=.001)

loss: 66.5
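
As a cross-check on the value above, the loss can be reproduced by hand. This is a minimal NumPy sketch that assumes the default `reduction='mean'` of `nn.MSELoss`, i.e. the mean of the squared differences over all four elements; the arrays are copied from the printed output and target.

```python
import numpy as np

output = np.array([[5.0, -8.0], [7.0, -14.0]])   # network output printed above
target = np.array([[-1.0, -3.0], [1.0, -1.0]])   # target printed above

diff = output - target          # element-wise error
loss = np.mean(diff ** 2)       # mean over all elements, as with reduction='mean'
print("loss: {}".format(loss))
```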


In [7]:
optimizer.zero_grad()
# perform a backward pass (backpropagation)
loss.backward()
# Update the parameters
optimizer.step()

In [8]:
torch.set_printoptions(precision=8, sci_mode=False)
print("Printing weight and weight gradient after one step")
print("Weight:{}".format(net.dense.weight.data))
print("W Grad:{}".format(net.dense.weight.grad.data))
print("\nPrinting bias and bias gradient after one step")
print("Bias:{}".format(net.dense.bias.data))
print("B grad:{}".format(net.dense.bias.grad.data))

Printing weight and weight gradient after one step
Weight:tensor([[ 1.00000000,  2.98499990, -0.99100000],
        [-0.00400000, -3.97550011,  1.98450005]])
W Grad:tensor([[  0.00000000,  15.00000000,  -9.00000000],
        [  4.00000000, -24.50000000,  15.50000000]])

Printing bias and bias gradient after one step
Bias:tensor([-3.00600004,  2.00900006])
B grad:tensor([ 6., -9.])
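
The gradients and updated parameters above can be derived by hand, which may help when comparing against the pyunit test case. The sketch below assumes a mean-reduced MSE loss and plain SGD (no momentum or weight decay, the `torch.optim.SGD` defaults); names such as `grad_out`, `w_new` and `b_new` are illustrative only.

```python
import numpy as np

x = np.array([[1.0, 2.0, -1.0], [-1.0, 3.0, -2.0]])   # (batch=2, in=3)
w = np.array([[1.0, 3.0, -1.0], [0.0, -4.0, 2.0]])    # (out=2, in=3)
b = np.array([-3.0, 2.0])                             # (out=2,)
target = np.array([[-1.0, -3.0], [1.0, -1.0]])        # (batch=2, out=2)
lr = 0.001

output = x @ w.T + b                                  # forward pass

# d(loss)/d(output) for loss = mean((output - target)^2) over all elements
grad_out = 2.0 * (output - target) / output.size

grad_w = grad_out.T @ x          # d(loss)/dw, shape (out, in)
grad_b = grad_out.sum(axis=0)    # d(loss)/db, summed over the batch

# One plain SGD step: p <- p - lr * grad
w_new = w - lr * grad_w
b_new = b - lr * grad_b

print("W Grad:\n{}".format(grad_w))
print("B grad: {}".format(grad_b))
print("Weight:\n{}".format(w_new))
print("Bias: {}".format(b_new))
```

Up to float32 rounding, these values should agree with the tensors printed by PyTorch above.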
