# Introduction to Human Language Technology (601.467/667)
## Kenton Murray

![PyTorch](Pictures/IntroHLT/PyTorch.png)

First, let's load PyTorch. (At this point you should have conda, PyTorch, and Jupyter Notebook installed.) Then we will check the version and run a simple command.

In [1]:
import torch

In [2]:
torch.__version__

'1.9.0'

In [3]:
fc1 = torch.nn.Linear(3,2)

In [4]:
print(fc1)

Linear(in_features=3, out_features=2, bias=True)


If everything is installed correctly, you should see the version number and a printed `Linear` layer.





Now, let's return to last week's logistic regression examples.

![LogisticRegression](Pictures/IntroHLT/LogisticRegression.png)

In [6]:
x = torch.Tensor([0,1])

In [7]:
print(x)

tensor([0., 1.])


In [8]:
W = torch.nn.Linear(2,1)

In [8]:
print(W)
print(W.weight)
print(W.bias)

Linear(in_features=2, out_features=1, bias=True)
Parameter containing:
tensor([[-0.3554, -0.5974]], requires_grad=True)
Parameter containing:
tensor([-0.2394], requires_grad=True)


In [10]:
LinearTransform = W(x)

In [11]:
print("LinearTransform:", LinearTransform)

LinearTransform: tensor([-1.2021], grad_fn=<AddBackward0>)
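As a quick check of what `W(x)` computes, the sketch below compares the layer's output with the affine map $xW^\top + b$ built directly from `W.weight` and `W.bias`. The seed is an arbitrary choice, not from the notebook, so the numbers will differ from those printed here.

```python
import torch

# nn.Linear(2, 1) computes x @ W.T + b, with W of shape (1, 2) and b of shape (1,).
torch.manual_seed(0)  # arbitrary seed so the example is reproducible
W = torch.nn.Linear(2, 1)
x = torch.tensor([0.0, 1.0])

by_layer = W(x)                      # what the layer returns
by_hand = x @ W.weight.T + W.bias    # the same affine map written out

print(by_layer, by_hand)
```

With $x = [0, 1]$, the result is just the second weight plus the bias.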


In [12]:
output = torch.sigmoid(LinearTransform)

In [13]:
print("output:", output)

output: tensor([0.2311], grad_fn=<SigmoidBackward>)


So what does this number mean?

What happens when we try it again?

In [14]:
W = torch.nn.Linear(2,1)
LinearTransform = W(x)
print("LinearTransform:", LinearTransform)
output = torch.sigmoid(LinearTransform)
print("output:", output)

LinearTransform: tensor([-0.6212], grad_fn=<AddBackward0>)
output: tensor([0.3495], grad_fn=<SigmoidBackward>)


Why does the number change?

In [15]:
for i in range(9):
    # Each new Linear layer gets freshly randomized weights
    W = torch.nn.Linear(2,1)
    print(W.weight)

Parameter containing:
tensor([[ 0.7055, -0.1307]], requires_grad=True)
Parameter containing:
tensor([[-0.3631, -0.0564]], requires_grad=True)
Parameter containing:
tensor([[0.5110, 0.0677]], requires_grad=True)
Parameter containing:
tensor([[0.5051, 0.1598]], requires_grad=True)
Parameter containing:
tensor([[-0.6636, -0.5504]], requires_grad=True)
Parameter containing:
tensor([[-0.0710,  0.2904]], requires_grad=True)
Parameter containing:
tensor([[-0.2769, -0.5019]], requires_grad=True)
Parameter containing:
tensor([[-0.5132, -0.3038]], requires_grad=True)
Parameter containing:
tensor([[-0.5941, -0.4805]], requires_grad=True)
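The numbers change because every `torch.nn.Linear` starts from randomly drawn weights. This is a minimal sketch, assuming an arbitrary seed of 42. It shows that resetting the generator with `torch.manual_seed` reproduces the same draw. It also checks that the draws stay within PyTorch's default initialization range of $\pm 1/\sqrt{\text{number of inputs}}$, which is about $\pm 0.707$ for two inputs.

```python
import math
import torch

# Resetting the seed before creating each layer gives identical initial weights.
torch.manual_seed(42)  # 42 is arbitrary
first = torch.nn.Linear(2, 1).weight.detach().clone()

torch.manual_seed(42)
second = torch.nn.Linear(2, 1).weight.detach().clone()

print(first, second)  # identical, because the generator was reset

# PyTorch's default init draws uniformly from [-1/sqrt(in_features), 1/sqrt(in_features)].
bound = 1 / math.sqrt(2)
print(bool((first.abs() <= bound + 1e-6).all()))
```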


![one_does_not](Pictures/IntroHLT/one_does_not.jpg)



We need to learn the weights!

But first, what is $x$?

## Binary Language ID

"gatos", "cat", "hola", "gata", "hello", "at"

Is a word Spanish or English?

### Input vector, $x$

Here, we are going to *manually* choose two features for each word to use as our embedding. Hand-picking features like this was common before deep learning; now it is more common to let embeddings be learned automatically. For our toy example, the two features are the fraction of the word's letters that are "o" or "a", and the word's length.

![WordEmbeddings](Pictures/IntroHLT/WordEmbeddings.png)


In [16]:
# Assign values to all words
# word = [fraction of letters that are "o" or "a", word length]
gatos = [0.4, 5]
cat = [0.33, 3]
hola = [0.5, 4]
gata = [0.5, 4]
hello = [0.2, 5]
at = [0.5, 2]
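The hand-assigned values above can also be computed from the words themselves. This is a sketch using a hypothetical helper `features`, which is not part of the original notebook. It rounds to two decimals so that "cat" gets 0.33, matching the values above.

```python
def features(word):
    """Hypothetical helper: [fraction of letters that are 'o' or 'a', length]."""
    oa_count = sum(1 for ch in word.lower() if ch in "oa")
    # Round to two decimals to match the hand-entered values (e.g. 1/3 -> 0.33)
    return [round(oa_count / len(word), 2), len(word)]


for w in ["gatos", "cat", "hola", "gata", "hello", "at"]:
    print(w, features(w))
```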

In [18]:
#Choose one of our examples to be x
x = torch.Tensor(cat)

W = torch.nn.Linear(2,1)
LinearTransform = W(x)
print("LinearTransform:", LinearTransform)
output = torch.sigmoid(LinearTransform)
print("output:", output)

LinearTransform: tensor([0.8019], grad_fn=<AddBackward0>)
output: tensor([0.6904], grad_fn=<SigmoidBackward>)


## Training

Our weight matrix $W$ is only randomly initialized: it hasn't been trained, so it hasn't learned useful weights yet. How do we train it?

### Labeled Data!

![LabeledData](Pictures/IntroHLT/LabeledData.png)

In [20]:
English = torch.Tensor([[0],[1],[0],[0],[1],[1]])
print(English)

tensor([[0.],
        [1.],
        [0.],
        [0.],
        [1.],
        [1.]])


In [21]:
Xs = [gatos, cat, hola, gata, hello, at]
Xs = torch.Tensor(Xs)
print(Xs)

tensor([[0.4000, 5.0000],
        [0.3300, 3.0000],
        [0.5000, 4.0000],
        [0.5000, 4.0000],
        [0.2000, 5.0000],
        [0.5000, 2.0000]])
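The loop in the next cell feeds one word at a time, but `torch.nn.Linear` also accepts a whole matrix of inputs, one row per example. Below is a short sketch confirming that one batched call matches the row-by-row loop. It uses a freshly initialized `W`, so its numbers will differ from the outputs shown here.

```python
import torch

# The six words from above, one row each: [fraction of 'o'/'a', length]
Xs = torch.tensor([[0.4, 5.0], [0.33, 3.0], [0.5, 4.0],
                   [0.5, 4.0], [0.2, 5.0], [0.5, 2.0]])
W = torch.nn.Linear(2, 1)

# One call scores every word: shape (6, 1), one probability per word
batch_out = torch.sigmoid(W(Xs))
# The same computation one row at a time, stacked back together
loop_out = torch.stack([torch.sigmoid(W(x)) for x in Xs])

print(batch_out.shape)
```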


In [22]:
for x, output in zip(Xs, English):
    print("x:", x)
    LinearTransform = W(x)
    print("LinearTransform:", LinearTransform)
    predicted_output = torch.sigmoid(LinearTransform)
    print("predicted_output:", predicted_output)
    print("output:", output, "\n\n")

x: tensor([0.4000, 5.0000])
LinearTransform: tensor([1.6982], grad_fn=<AddBackward0>)
predicted_output: tensor([0.8453], grad_fn=<SigmoidBackward>)
output: tensor([0.]) 


x: tensor([0.3300, 3.0000])
LinearTransform: tensor([0.8019], grad_fn=<AddBackward0>)
predicted_output: tensor([0.6904], grad_fn=<SigmoidBackward>)
output: tensor([1.]) 


x: tensor([0.5000, 4.0000])
LinearTransform: tensor([1.2886], grad_fn=<AddBackward0>)
predicted_output: tensor([0.7839], grad_fn=<SigmoidBackward>)
output: tensor([0.]) 


x: tensor([0.5000, 4.0000])
LinearTransform: tensor([1.2886], grad_fn=<AddBackward0>)
predicted_output: tensor([0.7839], grad_fn=<SigmoidBackward>)
output: tensor([0.]) 


x: tensor([0.2000, 5.0000])
LinearTransform: tensor([1.6410], grad_fn=<AddBackward0>)
predicted_output: tensor([0.8377], grad_fn=<SigmoidBackward>)
output: tensor([1.]) 


x: tensor([0.5000, 2.0000])
LinearTransform: tensor([0.4123], grad_fn=<AddBackward0>)
predicted_output: tensor([0.6017], grad_fn=<SigmoidBackward>)
output: tensor([1.])

![sad](Pictures/IntroHLT/sad.jpg)

### But how garbage are they?!?!

We need a *badness* metric. In machine learning, we call this a loss. The simplest version just compares our answers to the correct ones:

`predicted_output - output`

In [23]:
for x, output in zip(Xs, English):
    print("x:", x)
    LinearTransform = W(x)
    print("LinearTransform:", LinearTransform)
    predicted_output = torch.sigmoid(LinearTransform)
    print("predicted_output:", predicted_output)
    print("output:", output)
    loss = (predicted_output - output)
    print("loss:", loss, "\n\n")

x: tensor([0.4000, 5.0000])
LinearTransform: tensor([1.6982], grad_fn=<AddBackward0>)
predicted_output: tensor([0.8453], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.8453], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
LinearTransform: tensor([0.8019], grad_fn=<AddBackward0>)
predicted_output: tensor([0.6904], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.3096], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
LinearTransform: tensor([1.2886], grad_fn=<AddBackward0>)
predicted_output: tensor([0.7839], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.7839], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
LinearTransform: tensor([1.2886], grad_fn=<AddBackward0>)
predicted_output: tensor([0.7839], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.7839], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
LinearTransform: tensor([1.6410], grad_fn=<AddBackward0>)
predicted_output: tensor([0.8377], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

Computing the loss is not enough, though: we also need to use it to update the parameters. PyTorch has built-in optimizers (more on this later). First, let's do a little software engineering and define a class called `LogisticRegression`.

In [43]:
class LogisticRegression(torch.nn.Module):
    def __init__(self, D_in):
        super(LogisticRegression, self).__init__()
        self.W = torch.nn.Linear(D_in, 1)
    
    def forward(self, x):
        LinearTransform = self.W(x)
        predicted_output = torch.sigmoid(LinearTransform)
        return predicted_output

Now we can create logistic regression models for binary classification with inputs $x$ of any dimension. Let's go back to our two-feature example.

In [44]:
model = LogisticRegression(2)
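Because `D_in` is a constructor argument, the same class covers any input size. The sketch below restates the class compactly, with the same behavior as the cell above. It checks that a model has `D_in` weights plus one bias, and that it maps a batch of random inputs to probabilities. The input sizes and batch size are arbitrary.

```python
import torch

class LogisticRegression(torch.nn.Module):
    # Same model as above: one linear layer followed by a sigmoid.
    def __init__(self, D_in):
        super().__init__()
        self.W = torch.nn.Linear(D_in, 1)

    def forward(self, x):
        return torch.sigmoid(self.W(x))


for D_in in (2, 5, 300):
    m = LogisticRegression(D_in)
    n_params = sum(p.numel() for p in m.parameters())  # D_in weights + 1 bias
    probs = m(torch.randn(4, D_in))                     # a batch of 4 random inputs
    print(D_in, n_params, probs.shape)
```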

Now we can attach an optimizer to our model's parameters (again, more details later).

In [45]:
optimizer = torch.optim.SGD(list(model.parameters()), lr=0.1)

Now let's actually use this loss to update the weights through backpropagation (more on this soon).

In [27]:
for x, output in zip(Xs, English):
    print("x:", x)
    predicted_output = model(x)
    print("predicted_output:", predicted_output)
    print("output:", output)
    loss = (predicted_output - output)
    print("loss:", loss, "\n\n")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    #print("optimizer:", optimizer, "\n\n")

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0252], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0252], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0791], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9209], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0403], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0403], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0378], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0378], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0177], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9823], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.1252], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.8748], grad_fn=<SubBackward0>) 




![sad](Pictures/IntroHLT/sad.jpg)

Why are they still bad?

Let's try going through the data more times.

In [97]:
for t in range (1, 50):
    for x, output in zip(Xs, English):
        print("x:", x)
        predicted_output = model(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = (predicted_output - output)
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0468], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0468], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.1106], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.8894], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0634], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0634], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0576], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0576], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0268], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9732], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.1545], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.8455], grad_fn=<SubBackward0>) 


x: tensor([0.4000, 5.0000])
...

loss: tensor([0.0024], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0196], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9804], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0069], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0069], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0068], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0068], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0021], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9979], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0559], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9441], grad_fn=<SubBackward0>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0021], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0021], grad_fn=<SubBackward0>)

...

output: tensor([1.])
loss: tensor([-0.9904], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0028], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0028], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0027], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0027], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0007], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9993], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0347], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9653], grad_fn=<SubBackward0>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0007], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0007], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0092], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

output: tensor([0.])
loss: tensor([0.0016], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0016], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0016], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0004], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9996], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0263], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9737], grad_fn=<SubBackward0>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0004], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0004], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0063], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9937], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0016], grad_fn=<SigmoidBackward>)
output: tensor([0.])
...

output: tensor([1.])
loss: tensor([-0.9790], grad_fn=<SubBackward0>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0002], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0002], grad_fn=<SubBackward0>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0046], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9954], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0011], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0011], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0011], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor([0.0011], grad_fn=<SubBackward0>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0002], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor([-0.9998], grad_fn=<SubBackward0>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0207], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

What is a loss? What happens with it? What is this backprop?


![forward](Pictures/IntroHLT/forward.jpg)

![forward2](Pictures/IntroHLT/forward2.png)

For our logistic regression example, the forward pass is simply $\sigma(W(x))$.

Recall that our loss scores the difference between the predicted output and the correct output. We want these to be as close as possible, so we want to minimize the loss.



![GradientDescent](Pictures/IntroHLT/GradientDescent.png)
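For plain SGD, `optimizer.step()` can be written out by hand: move each parameter a small step against its gradient, $w \leftarrow w - \eta \frac{\partial L}{\partial w}$. The sketch below compares a manual step with `torch.optim.SGD`. It assumes a squared-error loss on the "cat" example, an arbitrary seed, and the same learning rate of 0.1 used above.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for reproducibility
lr = 0.1
x = torch.tensor([0.33, 3.0])  # "cat"
y = torch.tensor([1.0])        # English

model = torch.nn.Linear(2, 1)
twin = torch.nn.Linear(2, 1)
twin.load_state_dict(model.state_dict())  # identical starting weights

# Manual gradient-descent step
loss = ((torch.sigmoid(model(x)) - y) ** 2).sum()
loss.backward()  # fills .grad for every parameter
with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad  # w <- w - lr * dL/dw

# The same step using the built-in optimizer
optimizer = torch.optim.SGD(twin.parameters(), lr=lr)
optimizer.zero_grad()
loss_twin = ((torch.sigmoid(twin(x)) - y) ** 2).sum()
loss_twin.backward()
optimizer.step()

print(model.weight, twin.weight)  # the two should match
```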

![chan](Pictures/IntroHLT/chan.jpg)

Backpropagation is simply the method of calculating the gradients for all the variables. It goes through the variables in the network in reverse order, calculating each gradient with the chain rule from calculus.

![backprop](Pictures/IntroHLT/backprop.png)

### Chain Rule

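$\frac{d}{d\theta}f(g(\theta)) = f'(g(\theta))\,g'(\theta)$

For our model, $f = \sigma$ and $g = W$, so $f(g(\theta)) = \sigma(W(\theta))$. Using the derivative of the sigmoid, $\sigma'(z) = \sigma(z)(1-\sigma(z))$:

$\frac{d}{d\theta}\sigma(W(\theta)) = \sigma(W(\theta))\,(1-\sigma(W(\theta)))\,W'(\theta)$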
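To connect the formula to `loss.backward()`, here is a sketch with a single scalar weight $w$ standing in for $\theta$, so $W(\theta) = wx + b$ and $W'(\theta) = x$. The values of $w$, $b$, and $x$ are arbitrary. Autograd's gradient should match the chain-rule expression.

```python
import torch

w = torch.tensor(0.5, requires_grad=True)  # arbitrary example values
b = torch.tensor(-0.2)
x = torch.tensor(3.0)

z = w * x + b          # W(theta)
out = torch.sigmoid(z) # sigma(W(theta))
out.backward()         # backprop fills w.grad

# Chain rule by hand: sigma(z) * (1 - sigma(z)) * W'(w), with W'(w) = x
s = torch.sigmoid(z.detach())
by_hand = s * (1 - s) * x
print(w.grad, by_hand)
```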

### Better Loss Functions

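The loss function we have used so far is the raw difference between the predicted and the correct value. Its sign is the problem: lowering the prediction always lowers this loss, whatever the correct label is. Minimizing it therefore drives every prediction toward 0, which is what happened in the runs above. More commonly used loss functions avoid this, for example by squaring the difference or taking its absolute value: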
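Here is a quick way to see the sign problem, sketched with an arbitrary prediction of 0.6. The gradient of the raw difference with respect to the prediction is the same for both labels. The squared difference, by contrast, gives a gradient that points toward the correct label.

```python
import torch

raw_grads = []
for correct in (0.0, 1.0):
    predicted = torch.tensor([0.6], requires_grad=True)
    loss = (predicted - torch.tensor([correct])).sum()  # raw difference
    loss.backward()
    raw_grads.append(predicted.grad.clone())
    print(correct, predicted.grad)  # tensor([1.]) both times

# Squared difference: gradient is 2 * (predicted - correct)
predicted = torch.tensor([0.6], requires_grad=True)
((predicted - torch.tensor([1.0])) ** 2).sum().backward()
print(predicted.grad)  # negative, so a descent step raises the prediction toward 1
```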

* Mean Absolute Error Loss (L1)
* Mean Squared Error (MSE)
* Cross-Entropy   
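For comparison, this sketch applies all three losses to the same made-up predictions. For cross-entropy it uses `torch.nn.BCELoss`, PyTorch's binary cross-entropy, which expects probabilities such as sigmoid outputs.

```python
import torch

predicted = torch.tensor([0.9, 0.2, 0.6])  # made-up probabilities
correct = torch.tensor([1.0, 0.0, 0.0])

l1 = torch.nn.L1Loss()(predicted, correct)    # mean |correct - predicted|
mse = torch.nn.MSELoss()(predicted, correct)  # mean (correct - predicted)^2
bce = torch.nn.BCELoss()(predicted, correct)  # mean of -[y log p + (1 - y) log(1 - p)]

print(l1.item(), mse.item(), bce.item())
```

Note how cross-entropy punishes the confident wrong answer (0.6 for a label of 0) much more than the other two.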

MSE: $\frac{1}{n}\sum_{i=1}^{n}(\text{Correct}_i - \text{Predicted}_i)^2$

In [28]:
criterion = torch.nn.MSELoss(reduction='sum')
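Note that `reduction='sum'` drops the $\frac{1}{n}$ from the formula, while PyTorch's default, `reduction='mean'`, keeps it. The sketch below uses made-up numbers to check the formula against both settings. Because the training loop below feeds one example at a time, the choice makes no difference there.

```python
import torch

predicted = torch.tensor([[0.8], [0.3], [0.6]])  # made-up values
correct = torch.tensor([[1.0], [0.0], [1.0]])

# The formula above, written out
by_formula = ((correct - predicted) ** 2).sum() / correct.numel()
mean_loss = torch.nn.MSELoss()(predicted, correct)               # reduction='mean'
sum_loss = torch.nn.MSELoss(reduction='sum')(predicted, correct)  # no division by n

print(by_formula.item(), mean_loss.item(), sum_loss.item())
```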

In [30]:
for t in range (1, 50):
    for x, output in zip(Xs, English):
        print("x:", x)
        predicted_output = model(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0153], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.0002, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0610], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.8818, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0356], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.0013, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0355], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.0013, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0176], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.9650, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.1310], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.7551, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
...


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.3881], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.3744, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.4654], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2166, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3686], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.1359, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.2598], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.5479, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.4544], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.2977, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.5072], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2572, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
...


output: tensor([1.])
loss: tensor(0.3576, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.4649], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2161, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3683], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.1356, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.2506], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.5617, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.4798], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.2706, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4889], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2390, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.4031], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

output: tensor([1.])
loss: tensor(0.5736, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.5019], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.2481, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4722], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2230, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.4153], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.3419, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.4639], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2152, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3675], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.1351, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.2420], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3668], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.1345, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.2348], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.5855, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.5245], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.2261, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4546], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2067, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.4281], grad_fn=<SigmoidBackward>)
output: tensor([1.])
loss: tensor(0.3271, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.4626], grad_fn=<SigmoidBackward>)
output: tensor([0.])
loss: tensor(0.2140, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
...


What are some of the other things we can try?

*  More loss functions
*  Different optimizers
*  More times through the data
*  Minibatches
*  More Data
*  Different Initializations
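Several of these ideas can be combined. The sketch below is one possible setup, not the course's prescribed one. It uses binary cross-entropy as the loss, shuffled minibatches of two words, and 2000 passes over the data. The seed, batch size, learning rate, and epoch count are all arbitrary choices.

```python
import torch

torch.manual_seed(0)  # arbitrary seed
Xs = torch.tensor([[0.4, 5.0], [0.33, 3.0], [0.5, 4.0],
                   [0.5, 4.0], [0.2, 5.0], [0.5, 2.0]])
English = torch.tensor([[0.0], [1.0], [0.0], [0.0], [1.0], [1.0]])

# Logistic regression again: linear layer + sigmoid
model = torch.nn.Sequential(torch.nn.Linear(2, 1), torch.nn.Sigmoid())
criterion = torch.nn.BCELoss()  # binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batch_size = 2

with torch.no_grad():
    initial_loss = criterion(model(Xs), English).item()

for epoch in range(2000):
    order = torch.randperm(len(Xs))  # reshuffle the words every epoch
    for start in range(0, len(Xs), batch_size):
        idx = order[start:start + batch_size]  # indices of one minibatch
        optimizer.zero_grad()
        loss = criterion(model(Xs[idx]), English[idx])
        loss.backward()
        optimizer.step()

with torch.no_grad():
    final_loss = criterion(model(Xs), English).item()
    print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
    print(model(Xs).squeeze())  # probability of English for each word
```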

## Neural Network

In [31]:
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)    # input -> hidden
        self.linear2 = torch.nn.Linear(H, D_out)   # hidden -> output

    def forward(self, x):
        # clamp(min=0) sets negative values to 0, i.e. a ReLU activation
        h_relu = self.linear1(x).clamp(min=0)
        # No sigmoid on the output: this is an unbounded score
        y_pred = self.linear2(h_relu)
        return y_pred


# D_in is input dimension; H is hidden dimension; D_out is output dimension.
D_in, H, D_out = 2, 2, 1

# Construct model
model2 = TwoLayerNet(D_in, H, D_out)

# Loss function and optimizer
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(list(model2.parameters()), lr=0.1)
print(list(model2.parameters()))

[Parameter containing:
tensor([[-0.0861, -0.3097],
        [-0.4803,  0.3839]], requires_grad=True), Parameter containing:
tensor([ 0.6667, -0.3263], requires_grad=True), Parameter containing:
tensor([[ 0.3753, -0.5971]], requires_grad=True), Parameter containing:
tensor([-0.2337], requires_grad=True)]
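Since `clamp(min=0)` is a ReLU, the same architecture can also be written with `torch.nn.Sequential`. The sketch below restates `TwoLayerNet`, copies its weights into a `Sequential` version, and checks that both compute the same outputs on our six words.

```python
import torch

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(self.linear1(x).clamp(min=0))  # clamp(min=0) == ReLU


net = TwoLayerNet(2, 2, 1)
seq = torch.nn.Sequential(torch.nn.Linear(2, 2), torch.nn.ReLU(), torch.nn.Linear(2, 1))

# Copy the weights across so both models compute the same function
with torch.no_grad():
    seq[0].weight.copy_(net.linear1.weight)
    seq[0].bias.copy_(net.linear1.bias)
    seq[2].weight.copy_(net.linear2.weight)
    seq[2].bias.copy_(net.linear2.bias)

Xs = torch.tensor([[0.4, 5.0], [0.33, 3.0], [0.5, 4.0],
                   [0.5, 4.0], [0.2, 5.0], [0.5, 2.0]])
print(net(Xs).shape, seq(Xs).shape)
```

Unlike `LogisticRegression`, neither version ends with a sigmoid. Its output is therefore an unbounded score, which the MSE loss pulls toward 0 or 1.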


In [33]:
for t in range (1, 50):
    for x, output in zip(Xs, English):
        print("x:", x)
        predicted_output = model2(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print(list(model2.parameters()))
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.5181], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1693, -0.5731],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.7416, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5684, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.5181], requires_grad=True)]
loss: tensor(0.2684, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.4589], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.1693, -0.5731],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.7416, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5684, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4145], requires_grad=True)]
loss: tensor(0.2927, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
...
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1781, -0.6618],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8020, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5624, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.5150], requires_grad=True)]
loss: tensor(0.2653, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.4120], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.1781, -0.6618],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8020, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5624, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4120], requires_grad=True)]
loss: tensor(0.3457, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.5296], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
...

loss: tensor(0.5146, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3797], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1794, -0.7727],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8501, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5498, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3797], requires_grad=True)]
loss: tensor(0.1442, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.3038], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.1794, -0.7727],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8501, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5498, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3038], requires_grad=True)]
loss: tensor(0.4847, grad_fn=<MseLossBackward>)

...

predicted_output: tensor([0.8307], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2161, -0.7313],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.9717, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.5872, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4683], requires_grad=True)]
loss: tensor(0.0287, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.5022], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2260, -0.6916],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.9916, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6081, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.5022], requires_grad=True)]
loss: tensor(0.2522, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
...

[Parameter containing:
tensor([[ 0.2274, -0.7531],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.0183, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6036, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3296], requires_grad=True)]
loss: tensor(0.4494, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.8415], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2274, -0.7531],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.0183, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6036, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4637], requires_grad=True)]
loss: tensor(0.0251, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4954], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
...

predicted_output: tensor([0.4889], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2478, -0.7963],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.1109, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6184, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4889], requires_grad=True)]
loss: tensor(0.2390, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.3911], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2478, -0.7963],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.1109, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6184, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3911], requires_grad=True)]
loss: tensor(0.3707, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
...

loss: tensor(0.4577, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.9633], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2609, -0.7437],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.1372, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6467, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4588], requires_grad=True)]
loss: tensor(0.0013, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4661], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2633, -0.7342],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.1419, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6524, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4661], requires_grad=True)]
loss: tensor(0.2173, grad_fn=<MseLossBackward>)

...

loss: tensor(0.2096, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.3759], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2736, -0.7551],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.1897, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6598, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3662], requires_grad=True)]
loss: tensor(0.3895, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.7493], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.3008, -0.5080],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2720, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6617, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4911], requires_grad=True)]
loss: tensor(0.5615, grad_fn=<MseLossBackward>)

...

loss: tensor(0.1552, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.3151], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2823, -0.7803],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2351, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6620, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3151], requires_grad=True)]
loss: tensor(0.4691, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.9921], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2823, -0.7803],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2351, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6620, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4521], requires_grad=True)]
...

x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3390], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2616, -0.9193],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2225, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6063, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3390], requires_grad=True)]
loss: tensor(0.1149, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.2712], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2616, -0.9193],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2225, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6063, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.2712], requires_grad=True)]
loss: tensor(0.5312, grad_fn=<MseLossBackward>) 


...

loss: tensor(4.7091e-05, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.4532], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2909, -0.8021],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2810, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6633, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4532], requires_grad=True)]
loss: tensor(0.2054, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.3626], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])
[Parameter containing:
tensor([[ 0.2909, -0.8021],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2810, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6633, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3626], requires_grad=True)]
...

output: tensor([0.])
[Parameter containing:
tensor([[ 0.2914, -0.8001],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2821, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6646, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4880], requires_grad=True)]
loss: tensor(0.2381, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.3904], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2914, -0.8001],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2821, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6646, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3904], requires_grad=True)]
loss: tensor(0.1524, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.3123], grad_fn=<AddBackward0>)
predicted_output.grad: None
output: tensor([1.])

## Why Deeper?

In [35]:
class EightLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(EightLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, H)
        self.linear3 = torch.nn.Linear(H, H)
        self.linear4 = torch.nn.Linear(H, H)
        self.linear5 = torch.nn.Linear(H, H)
        self.linear6 = torch.nn.Linear(H, H)
        self.linear7 = torch.nn.Linear(H, H)
        self.linear8 = torch.nn.Linear(H, D_out)

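    def forward(self, x):
        # Only linear1 is followed by a ReLU (clamp(min=0)). Despite their names,
        # h_relu2 ... h_relu7 are plain linear outputs, so linear2 ... linear8
        # together act as a single affine map.
        h_relu = self.linear1(x).clamp(min=0)
        h_relu2 = self.linear2(h_relu)
        h_relu3 = self.linear3(h_relu2)
        h_relu4 = self.linear4(h_relu3)
        h_relu5 = self.linear5(h_relu4)
        h_relu6 = self.linear6(h_relu5)
        h_relu7 = self.linear7(h_relu6)
        y_pred = self.linear8(h_relu7)
        return y_pred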
    

# D_in is input dimension; H is hidden dimension; D_out is output dimension.
D_in, H, D_out = 2, 4, 1

# Construct model
model8 = EightLayerNet(D_in, H, D_out)

# Loss and optimizer.
# Note: this optimizer holds model2's parameters, not model8's, so stepping it
# never changes model8's weights.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(list(model2.parameters()), lr=0.1)
print(list(model8.parameters()))

[Parameter containing:
tensor([[-0.6424,  0.1296],
        [-0.0156,  0.6513],
        [ 0.0760,  0.2171],
        [ 0.2923,  0.3507]], requires_grad=True), Parameter containing:
tensor([ 0.3033, -0.3790,  0.2678, -0.0189], requires_grad=True), Parameter containing:
tensor([[-0.2819, -0.0837, -0.1564,  0.2387],
        [-0.2601, -0.2592,  0.2486,  0.2023],
        [ 0.1555,  0.4985, -0.2878, -0.0495],
        [-0.2454, -0.4824,  0.1429,  0.2333]], requires_grad=True), Parameter containing:
tensor([ 0.0871,  0.0315, -0.2115, -0.2815], requires_grad=True), Parameter containing:
tensor([[ 0.4783,  0.4812, -0.4707, -0.0440],
        [-0.4070,  0.2737, -0.3041, -0.3743],
        [ 0.0208, -0.0410,  0.4279, -0.3769],
        [ 0.2437, -0.2150, -0.4834, -0.0580]], requires_grad=True), Parameter containing:
tensor([0.2695, 0.4822, 0.2320, 0.1880], requires_grad=True), Parameter containing:
tensor([[ 0.3235, -0.2275,  0.0768,  0.2460],
        [ 0.3084, -0.3215,  0.4021,  0.4775],
...

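In `EightLayerNet.forward` as written, only `linear1` is followed by a ReLU (`clamp(min=0)`), so `linear2` through `linear8` are applied back to back. A composition of affine maps is itself affine, so those seven layers add parameters but no expressive power over a single `Linear(H, D_out)`. The sketch below checks this numerically with freshly initialized layers; the seed, batch size, and variable names are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)
H = 4

# Seven Linear layers applied back to back with no activation in between,
# mirroring linear2 ... linear8 in EightLayerNet (hidden size H, output size 1).
layers = [torch.nn.Linear(H, H) for _ in range(6)] + [torch.nn.Linear(H, 1)]
stack = torch.nn.Sequential(*layers)

with torch.no_grad():
    # Fold the whole stack into one affine map: y = W_eff h + b_eff.
    W_eff = torch.eye(H)
    b_eff = torch.zeros(H)
    for layer in layers:
        W_eff = layer.weight @ W_eff
        b_eff = layer.weight @ b_eff + layer.bias

    h = torch.randn(5, H)            # a batch of 5 hidden vectors
    deep = stack(h)                  # shape (5, 1)
    shallow = h @ W_eff.T + b_eff    # shape (5, 1)

print(torch.allclose(deep, shallow, atol=1e-5))  # True
```

Putting a nonlinearity such as `.clamp(min=0)` (or `torch.relu`) after each hidden layer is what prevents this collapse and lets extra depth matter.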
In [37]:
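# Give model8 its own optimizer. The output recorded below was produced with the
# shared `optimizer` from the cell above, which was built over model2's parameters,
# so model8 never updated and every epoch repeats the same predictions and losses.
optimizer8 = torch.optim.SGD(model8.parameters(), lr=0.1)
for t in range(1, 50):
    for x, output in zip(Xs, English):
        print("x:", x)
        predicted_output = model8(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        #print(list(model8.parameters()))
        print("loss:", loss, "\n\n")
        optimizer8.zero_grad()
        loss.backward()
        optimizer8.step()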

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0797], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0063, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0715], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8622, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0813], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0656], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8730, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
...

loss: tensor(0.8730, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0797], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0063, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0715], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8622, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0813], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0656], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8730, grad_fn=<MseLossBackward>)
...

x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0813], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0656], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8730, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0797], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0063, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0715], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8622, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
...

x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0797], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0063, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0715], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8622, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0813], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0656], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8730, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
...

output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 2.0000])
predicted_output: tensor([0.0656], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8730, grad_fn=<MseLossBackward>) 


x: tensor([0.4000, 5.0000])
predicted_output: tensor([0.0797], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0063, grad_fn=<MseLossBackward>) 


x: tensor([0.3300, 3.0000])
predicted_output: tensor([0.0715], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8622, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.5000, 4.0000])
predicted_output: tensor([0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([0.2000, 5.0000])
predicted_output: tensor([0.0813], grad_fn=<AddBackward0>)
output: tensor([1.])
loss: tensor(0.8439, grad_fn=<MseLossBackward>)
...

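Every epoch above repeats exactly the same predictions and losses. `loss.backward()` does compute gradients for model8, but the optimizer that was stepped was built over model2's parameters (`optimizer = torch.optim.SGD(list(model2.parameters()), lr=0.1)` in the model8 cell), so model8's weights never move. A quick check like the following can catch this kind of mismatch before training. `optimizer_covers` is a hypothetical helper, not a PyTorch function; it relies only on `optimizer.param_groups`, where each group's `"params"` entry lists the tensors the optimizer updates.

```python
import torch

def optimizer_covers(model, optimizer):
    """Return True if every parameter of `model` is managed by `optimizer`."""
    managed = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return all(id(p) in managed for p in model.parameters())

# Two independent models; the optimizer only knows about model_b.
model_a = torch.nn.Linear(2, 1)
model_b = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model_b.parameters(), lr=0.1)

print(optimizer_covers(model_a, opt))  # False
print(optimizer_covers(model_b, opt))  # True
```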
### And Operator
*  0 x 0 = 0
*  1 x 0 = 0
*  0 x 1 = 0
*  1 x 1 = 1

### Or Operator
*  0 + 0 = 0
*  1 + 0 = 1
*  0 + 1 = 1
*  1 + 1 = 1

### Xor Operator
*  0 xor 0 = 0
*  1 xor 0 = 1
*  0 xor 1 = 1
*  1 xor 1 = 0
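AND and OR are each linearly separable: one linear unit followed by a threshold computes them. XOR is not, but it can be written as (x1 OR x2) AND NOT (x1 AND x2), which needs one hidden layer. The sketch below uses hand-picked weights and a hard step function instead of a sigmoid, purely for illustration; none of these values come from training.

```python
import torch

bits = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])

def step(z):
    # Hard threshold: 1 where z > 0, else 0.
    return (z > 0).float()

# A single linear unit each (hand-picked weights, not learned).
and_out = step(bits @ torch.tensor([1., 1.]) - 1.5)
or_out = step(bits @ torch.tensor([1., 1.]) - 0.5)

# XOR needs a hidden layer: feed the OR and AND units into one more unit
# that fires when OR is on and AND is off.
hidden = torch.stack([or_out, and_out], dim=1)
xor_out = step(hidden @ torch.tensor([1., -1.]) - 0.5)

print(and_out)  # tensor([0., 0., 0., 1.])
print(or_out)   # tensor([0., 1., 1., 1.])
print(xor_out)  # tensor([0., 1., 1., 0.])
```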


In [40]:
bits = torch.Tensor([[0.,0.],[1.,0.],[0.,1.],[1.,1.]])
and_y = torch.Tensor([[0.],[0.],[0.],[1.]])
or_y = torch.Tensor([[0.],[1.],[1.],[1.]])
xor_y = torch.Tensor([[0.],[1.],[1.],[0.]])

### Logistic Regression And

In [46]:
for t in range (1, 50):
    for x, output in zip(bits, and_y):
        print("x:", x)
        predicted_output = model(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print(list(model.parameters()))
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

x: tensor([0., 0.])
predicted_output: tensor([0.5232], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.0005, -0.3515]], requires_grad=True), Parameter containing:
tensor([0.0929], requires_grad=True)]
loss: tensor(0.2737, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.5168], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.0005, -0.3515]], requires_grad=True), Parameter containing:
tensor([0.0668], requires_grad=True)]
loss: tensor(0.2671, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4230], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.0253, -0.3515]], requires_grad=True), Parameter containing:
tensor([0.0410], requires_grad=True)]
loss: tensor(0.1789, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.4068], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
...

x: tensor([1., 1.])
predicted_output: tensor([0.3616], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[ 0.0324, -0.2631]], requires_grad=True), Parameter containing:
tensor([-0.3376], requires_grad=True)]
loss: tensor(0.4075, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.4236], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.0619, -0.2337]], requires_grad=True), Parameter containing:
tensor([-0.3082], requires_grad=True)]
loss: tensor(0.1794, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.4337], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.0619, -0.2337]], requires_grad=True), Parameter containing:
tensor([-0.3288], requires_grad=True)]
loss: tensor(0.1881, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3581], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
...

[Parameter containing:
tensor([[ 0.1558, -0.0947]], requires_grad=True), Parameter containing:
tensor([-0.5591], requires_grad=True)]
loss: tensor(0.1323, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.3965], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1558, -0.0947]], requires_grad=True), Parameter containing:
tensor([-0.5759], requires_grad=True)]
loss: tensor(0.1572, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3341], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1368, -0.0947]], requires_grad=True), Parameter containing:
tensor([-0.5949], requires_grad=True)]
loss: tensor(0.1116, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.3584], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[ 0.1368, -0.1096]], requires_grad=True), Parameter containing:
tensor([-0.6098], requires_grad=True)]
...

[Parameter containing:
tensor([[0.2663, 0.0540]], requires_grad=True), Parameter containing:
tensor([-0.7631], requires_grad=True)]
loss: tensor(0.1431, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3259], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.2485, 0.0540]], requires_grad=True), Parameter containing:
tensor([-0.7809], requires_grad=True)]
loss: tensor(0.1062, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.3759], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.2485, 0.0397]], requires_grad=True), Parameter containing:
tensor([-0.7952], requires_grad=True)]
loss: tensor(0.3895, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.3174], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.2778, 0.0689]], requires_grad=True), Parameter containing:
tensor([-0.7659], requires_grad=True)]
...

predicted_output: tensor([0.3693], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.3821, 0.2021]], requires_grad=True), Parameter containing:
tensor([-0.9174], requires_grad=True)]
loss: tensor(0.1364, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3246], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.3649, 0.2021]], requires_grad=True), Parameter containing:
tensor([-0.9346], requires_grad=True)]
loss: tensor(0.1054, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.4022], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.3649, 0.1878]], requires_grad=True), Parameter containing:
tensor([-0.9489], requires_grad=True)]
loss: tensor(0.3573, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.2849], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
...

### 2-Layer And

In [41]:
for t in range (1, 50):
    for x, output in zip(bits, and_y):
        print("x:", x)
        predicted_output = model2(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print(list(model2.parameters()))
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

x: tensor([0., 0.])
predicted_output: tensor([1.9666], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2914, -0.8000],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.2821, -0.4541], requires_grad=True), Parameter containing:
tensor([[ 0.6646, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.4498], requires_grad=True)]
loss: tensor(3.8676, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([-0.4821], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.2914, -0.8000],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 2.0207, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.2330, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.0565], requires_grad=True)]
loss: tensor(0.2324, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.1409], grad_fn=<AddBackward0>)
output: tensor([0.])
...

output: tensor([0.])
[Parameter containing:
tensor([[ 0.1655, -0.8956],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8046, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.1378, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3276], requires_grad=True)]
loss: tensor(0.0031, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.1708], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[ 0.1671, -0.8956],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.8062, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.1599, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.3164], requires_grad=True)]
loss: tensor(0.0292, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.0743], grad_fn=<AddBackward0>)
output: tensor([1.])
[Parameter containing:
tensor([[ 0.1671, -0.8902],
...

loss: tensor(0.5860, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.1947], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.0871, -1.0929],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6514, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.2482, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.6046], requires_grad=True)]
loss: tensor(0.0379, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.0738], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.0871, -1.0929],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6610, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.3125, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.5657], requires_grad=True)]
loss: tensor(0.0054, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
...

loss: tensor(0.2064, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.4415], grad_fn=<AddBackward0>)
output: tensor([1.])
[Parameter containing:
tensor([[-0.2743, -1.1507],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6500, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.4871, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.5511], requires_grad=True)]
loss: tensor(0.3119, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([-0.0744], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.3287, -1.2052],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.5955, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.4620, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.6628], requires_grad=True)]
loss: tensor(0.0055, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
...

x: tensor([1., 0.])
predicted_output: tensor([0.1792], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.4789, -1.2219],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.5585, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.4785, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.6959], requires_grad=True)]
loss: tensor(0.0321, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4771], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.4617, -1.2219],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.5757, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.5172, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.6601], requires_grad=True)]
loss: tensor(0.2276, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.5646], grad_fn=<AddBackward0>)
output: tensor([1.])
...

loss: tensor(0.0540, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.2462], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.5974, -1.1037],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6125, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.4926, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7463], requires_grad=True)]
loss: tensor(0.0606, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4078], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.5731, -1.1037],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6368, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.5426, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.6970], requires_grad=True)]
loss: tensor(0.1663, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
...

[Parameter containing:
tensor([[-0.6903, -1.0506],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6133, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.5120, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7720], requires_grad=True)]
loss: tensor(0.0896, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3755], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.6597, -1.0506],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6439, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.5673, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7121], requires_grad=True)]
loss: tensor(0.1410, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.6254], grad_fn=<AddBackward0>)
output: tensor([1.])
[Parameter containing:
tensor([[-0.6597, -1.0080],
...

predicted_output: tensor([-0.2865], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.7339, -1.0108],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6488, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.6136, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7253], requires_grad=True)]
loss: tensor(0.0821, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.3259], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.7339, -1.0108],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6136, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.5192, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7826], requires_grad=True)]
loss: tensor(0.1062, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3504], grad_fn=<AddBackward0>)
output: tensor([0.])
...

### Logistic Regression Xor

In [47]:
for t in range (1, 50):
    for x, output in zip(bits, xor_y):
        print("x:", x)
        predicted_output = model(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print(list(model.parameters()))
        print("loss:", loss, "\n\n")
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

x: tensor([0., 0.])
predicted_output: tensor([0.2655], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.4733, 0.3158]], requires_grad=True), Parameter containing:
tensor([-1.0176], requires_grad=True)]
loss: tensor(0.0705, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.3648], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.4733, 0.3158]], requires_grad=True), Parameter containing:
tensor([-1.0279], requires_grad=True)]
loss: tensor(0.4035, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.3357], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.5028, 0.3158]], requires_grad=True), Parameter containing:
tensor([-0.9985], requires_grad=True)]
loss: tensor(0.4413, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.4699], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
...

loss: tensor(0.3222, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4025], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.5401, 0.3613]], requires_grad=True), Parameter containing:
tensor([-0.7565], requires_grad=True)]
loss: tensor(0.3570, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.5504], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.5401, 0.3901]], requires_grad=True), Parameter containing:
tensor([-0.7277], requires_grad=True)]
loss: tensor(0.3030, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.3197], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.5128, 0.3628]], requires_grad=True), Parameter containing:
tensor([-0.7550], requires_grad=True)]
loss: tensor(0.1022, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.4363], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

loss: tensor(0.3193, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.5812], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.5339, 0.3940]], requires_grad=True), Parameter containing:
tensor([-0.6001], requires_grad=True)]
loss: tensor(0.3378, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.3479], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.5056, 0.3657]], requires_grad=True), Parameter containing:
tensor([-0.6284], requires_grad=True)]
loss: tensor(0.1210, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.4654], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.5056, 0.3657]], requires_grad=True), Parameter containing:
tensor([-0.6442], requires_grad=True)]
loss: tensor(0.2858, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4374], grad_fn=<SigmoidBackward>)
output: tensor([1.])
...

[Parameter containing:
tensor([[0.4829, 0.3540]], requires_grad=True), Parameter containing:
tensor([-0.5416], requires_grad=True)]
loss: tensor(0.1353, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.4811], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.4829, 0.3540]], requires_grad=True), Parameter containing:
tensor([-0.5587], requires_grad=True)]
loss: tensor(0.2693, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4554], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.5088, 0.3540]], requires_grad=True), Parameter containing:
tensor([-0.5328], requires_grad=True)]
loss: tensor(0.2966, grad_fn=<MseLossBackward>) 


x: tensor([1., 1.])
predicted_output: tensor([0.5949], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.5088, 0.3811]], requires_grad=True), Parameter containing:
tensor([-0.5058], requires_grad=True)]
...

x: tensor([1., 1.])
predicted_output: tensor([0.5974], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.4753, 0.3593]], requires_grad=True), Parameter containing:
tensor([-0.4399], requires_grad=True)]
loss: tensor(0.3569, grad_fn=<MseLossBackward>) 


x: tensor([0., 0.])
predicted_output: tensor([0.3849], grad_fn=<SigmoidBackward>)
output: tensor([0.])
[Parameter containing:
tensor([[0.4465, 0.3306]], requires_grad=True), Parameter containing:
tensor([-0.4686], requires_grad=True)]
loss: tensor(0.1482, grad_fn=<MseLossBackward>) 


x: tensor([1., 0.])
predicted_output: tensor([0.4899], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
tensor([[0.4465, 0.3306]], requires_grad=True), Parameter containing:
tensor([-0.4868], requires_grad=True)]
loss: tensor(0.2602, grad_fn=<MseLossBackward>) 


x: tensor([0., 1.])
predicted_output: tensor([0.4673], grad_fn=<SigmoidBackward>)
output: tensor([1.])
[Parameter containing:
...

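In the loop above the XOR predictions keep hovering between roughly 0.25 and 0.6 instead of separating. That is expected: no single linear boundary puts (1,0) and (0,1) on one side and (0,0) and (1,1) on the other, so any logistic regression classifies at most three of the four points correctly. The sketch below trains logistic regression on all four points at once and compares AND with XOR. It swaps in full-batch gradient descent with `BCEWithLogitsLoss` instead of the per-example MSE loop used in this notebook; the learning rate and step count are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
bits = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
and_y = torch.tensor([[0.], [0.], [0.], [1.]])
xor_y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_logistic(X, y, steps=5000, lr=1.0):
    """Full-batch gradient descent on one Linear(2, 1) unit; returns training accuracy."""
    model = torch.nn.Linear(2, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally
    for _ in range(steps):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        predictions = (torch.sigmoid(model(X)) > 0.5).float()
    return (predictions == y).float().mean().item()

and_acc = train_logistic(bits, and_y)
xor_acc = train_logistic(bits, xor_y)
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

AND should reach an accuracy of 1.0, while XOR cannot exceed 0.75 no matter how long this runs.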
### 2-Layer Xor

In [48]:
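# Give model2 its own optimizer (lr=0.1 is an assumption, matching the value used
# for model2 earlier). In the recorded output below the parameters never change
# between steps, which suggests the shared `optimizer` was not tracking model2's
# parameters when this cell ran.
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)
for t in range(1, 50):
    for x, output in zip(bits, xor_y):
        print("x:", x)
        predicted_output = model2(x)
        print("predicted_output:", predicted_output)
        print("output:", output)
        loss = criterion(predicted_output, output)
        print(list(model2.parameters()))
        print("loss:", loss, "\n\n")
        optimizer2.zero_grad()
        loss.backward()
        optimizer2.step()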

x: tensor([0., 0.])
predicted_output: tensor([-0.3323], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.7107, -0.9748],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6810, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.6227, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7145], requires_grad=True)]
loss: tensor(0.1104, grad_fn=<MseLossBackward>)


x: tensor([1., 0.])
predicted_output: tensor([0.1103], grad_fn=<AddBackward0>)
output: tensor([1.])
[Parameter containing:
tensor([[-0.7107, -0.9748],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6810, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.6227, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7145], requires_grad=True)]
loss: tensor(0.7916, grad_fn=<MseLossBackward>)


x: tensor([0., 1.])
predicted_output: tensor([0.2748], grad_fn=<AddBackward0>)
output: tensor([1.])
[Parameter containing:
tensor([[-0.7107, -0.9748],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6810, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.6227, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7145], requires_grad=True)]
loss: tensor(0.5260, grad_fn=<MseLossBackward>)


x: tensor([1., 1.])
predicted_output: tensor([0.7145], grad_fn=<AddBackward0>)
output: tensor([0.])
[Parameter containing:
tensor([[-0.7107, -0.9748],
        [-0.5314, -0.2553]], requires_grad=True), Parameter containing:
tensor([ 1.6810, -0.4541], requires_grad=True), Parameter containing:
tensor([[-0.6227, -0.2972]], requires_grad=True), Parameter containing:
tensor([0.7145], requires_grad=True)]
loss: tensor(0.5105, grad_fn=<MseLossBackward>)


...
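
The recorded predictions can be reproduced by hand from the printed parameters. The sketch below assumes `model2` is `Linear(2, 2)` → ReLU → `Linear(2, 1)` with no output sigmoid. That architecture is inferred from the printed parameter shapes and the `AddBackward0` grad_fn, not copied from the cell that defines `model2`, so treat it as an assumption. It loads the printed (rounded) values and recomputes the four outputs, which also shows that the second hidden unit is zero for every input and that for `[1, 1]` both hidden units are off, so the output is just the final bias.

```python
import torch

# Assumed architecture (inferred from the printed shapes, not from the original cell):
# Linear(2, 2) -> ReLU -> Linear(2, 1), no sigmoid on the output.
model2 = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
)

# Copy in the (rounded) parameter values printed in the output above.
with torch.no_grad():
    model2[0].weight.copy_(torch.tensor([[-0.7107, -0.9748],
                                         [-0.5314, -0.2553]]))
    model2[0].bias.copy_(torch.tensor([1.6810, -0.4541]))
    model2[2].weight.copy_(torch.tensor([[-0.6227, -0.2972]]))
    model2[2].bias.copy_(torch.tensor([0.7145]))

bits = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])

with torch.no_grad():
    hidden = torch.relu(model2[0](bits))  # hidden activations, one row per input
    preds = model2(bits).squeeze(1)       # final outputs, shape (4,)

print(hidden)  # second column is all zeros: that hidden unit is "dead"
print(preds)   # close to the recorded -0.3323, 0.1103, 0.2748, 0.7145
```

Because the parameters are rounded to four decimals, the recomputed outputs can differ from the log in the last digit.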

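In the output above, `model2`'s printed parameters are identical at every step even though the loss is nonzero, so `optimizer.step()` is not updating them. A likely cause is an optimizer constructed over a different model's parameters. The self-contained sketch below rebuilds the loop with the optimizer created from `model2.parameters()` and adds a small hypothetical helper, `optimizer_covers`, that checks which parameters an optimizer actually holds. The architecture (inferred from the printed shapes), the learning rate `0.1`, and the seed are assumptions, not values taken from the original notebook.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for repeatability

# XOR inputs and targets (same layout as `bits` / `xor_y` above).
bits = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
xor_y = torch.tensor([[0.], [1.], [1.], [0.]])

# Assumed architecture: Linear(2, 2) -> ReLU -> Linear(2, 1).
model2 = torch.nn.Sequential(
    torch.nn.Linear(2, 2), torch.nn.ReLU(), torch.nn.Linear(2, 1)
)
criterion = torch.nn.MSELoss()
# The fix: build the optimizer from model2's own parameters (lr is an assumption).
optimizer = torch.optim.SGD(model2.parameters(), lr=0.1)


def optimizer_covers(opt, model):
    """Hypothetical helper: True if every parameter of `model` is registered with `opt`."""
    opt_ids = {id(p) for group in opt.param_groups for p in group["params"]}
    return all(id(p) in opt_ids for p in model.parameters())


# An optimizer over some other layer would leave model2 frozen.
other_layer = torch.nn.Linear(2, 1)
wrong_optimizer = torch.optim.SGD(other_layer.parameters(), lr=0.1)
print(optimizer_covers(wrong_optimizer, model2))  # False
print(optimizer_covers(optimizer, model2))        # True

bias_before = model2[2].bias.detach().clone()

for epoch in range(49):  # same number of passes as range(1, 50)
    for x, output in zip(bits, xor_y):
        loss = criterion(model2(x), output)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# With the correct optimizer, the parameters now change during training.
print(torch.equal(bias_before, model2[2].bias.detach()))  # False
```

Whether a network with only two ReLU hidden units actually reaches a low XOR loss depends on the initialization (a hidden unit can die, as in the recorded run), so this sketch only checks that the parameters are being updated.
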
# Word Embeddings


![WordEmbeddings](Pictures/IntroHLT/WordEmbeddings.png)

The features in this picture are chosen by hand. But how should one choose them? Rather than hand-picking features, most methods today learn them automatically from data.
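
One common way to learn word features in PyTorch is `torch.nn.Embedding`: a lookup table with one trainable vector per word, updated by backpropagation like any other layer's weights. The sketch below uses a hypothetical four-word vocabulary, a 3-dimensional embedding, and a toy objective (pulling the "cat" and "dog" vectors together), all chosen only for illustration. After one SGD step, only the rows that were looked up have changed.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy vocabulary: word -> integer index.
vocab = {"cat": 0, "dog": 1, "car": 2, "truck": 3}

# One trainable 3-dimensional vector per word (a 4 x 3 weight matrix).
emb = torch.nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)
optimizer = torch.optim.SGD(emb.parameters(), lr=0.1)

before = emb.weight.detach().clone()

# Toy objective (illustration only): make the "cat" and "dog" vectors closer.
ids = torch.tensor([vocab["cat"], vocab["dog"]])
vecs = emb(ids)                          # shape (2, 3)
loss = ((vecs[0] - vecs[1]) ** 2).sum()  # squared distance between the two vectors

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Which rows (words) were updated by this step?
changed = (emb.weight.detach() != before).any(dim=1)
print(changed)  # rows for "cat" and "dog" changed; "car" and "truck" did not
```

In a real model the objective comes from the task (for example, predicting neighboring words), but the mechanism is the same: the word vectors are parameters, and gradients shape them.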

## One-hot Vector

![OneHot](Pictures/IntroHLT/OneHot.png)
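
A one-hot vector has the length of the vocabulary, with a 1 at the word's index and 0 everywhere else. Any two different one-hot vectors have dot product 0, so they say nothing about which words are similar. Multiplying a one-hot row by an embedding matrix simply selects one row, which is why an embedding lookup can be viewed as a cheaper version of that matrix product. The sketch below, with a hypothetical four-word vocabulary, builds one-hot vectors with `torch.nn.functional.one_hot` and checks both facts.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy vocabulary.
vocab = ["cat", "dog", "car", "truck"]
word_to_id = {w: i for i, w in enumerate(vocab)}

ids = torch.tensor([word_to_id["dog"], word_to_id["truck"]])
one_hot = F.one_hot(ids, num_classes=len(vocab)).float()  # one_hot returns int64; cast for matmul
print(one_hot)
# tensor([[0., 1., 0., 0.],
#         [0., 0., 0., 1.]])

# Distinct one-hot vectors are orthogonal: no notion of similarity.
print(one_hot[0] @ one_hot[1])  # tensor(0.)

torch.manual_seed(0)
emb = torch.nn.Embedding(len(vocab), 3)

via_matmul = one_hot @ emb.weight  # (2, 4) @ (4, 3) -> (2, 3)
via_lookup = emb(ids)              # direct row lookup, same result
print(torch.allclose(via_matmul, via_lookup))  # True
```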