# Lab 8_1: Implementing XOR with Neural Networks

In [1]:
import torch

In [2]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [3]:
# for reproducibility
if device == 'cuda':
    torch.cuda.manual_seed_all(777)

torch.manual_seed(777)

<torch._C.Generator at 0xff5fad1cecf0>

In [4]:
# training data: the four XOR input/target pairs
X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32, device=device)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32, device=device)

In [5]:
# nn layers
linear = torch.nn.Linear(2, 1, bias=True) # input_dim = 2, output_dim = 1
sigmoid = torch.nn.Sigmoid()

`torch.nn.Sequential` is a PyTorch container module that builds a network by stacking layers: each layer's output is passed as the input to the next, in the order the layers are given.
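As a quick check of that ordering, the sketch below compares a `Sequential` forward pass with the same two layers applied by hand. The names `demo_linear`, `demo_sigmoid`, and `demo_model`, the seed, and the sample inputs are illustrative assumptions, not part of the lab.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-alone layers with the same shapes as in this lab
demo_linear = torch.nn.Linear(2, 1, bias=True)
demo_sigmoid = torch.nn.Sigmoid()

# Sequential calls its children in the order they were passed in
demo_model = torch.nn.Sequential(demo_linear, demo_sigmoid)

x = torch.tensor([[0.0, 1.0], [1.0, 1.0]])  # two made-up input rows

stacked = demo_model(x)                # forward pass through the container
manual = demo_sigmoid(demo_linear(x))  # the same layers applied by hand

same_output = torch.allclose(stacked, manual)
print(same_output)
```

Because the container holds references to the same layer objects, both paths share one set of weights, so the two results should agree.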

In [6]:
# model
model = torch.nn.Sequential(linear, sigmoid).to(device) # first linear, then sigmoid

Binary Cross-Entropy (BCE) Loss, also known as Log Loss, is a loss function commonly used in binary classification tasks. It measures the performance of a classification model whose output is a probability value between 0 and 1.

Here's the formula for Binary Cross-Entropy Loss:

$$ \text{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$

Where:
- $N$ is the number of samples.
- $y_i$ is the actual label (0 or 1) for the $i$-th sample.
- $p_i$ is the predicted probability for the $i$-th sample.

### Explanation:
- The BCE loss function penalizes the model more when it makes confident but incorrect predictions.
- If the actual label \( y_i \) is 1 and the predicted probability \( p_i \) is close to 0, the loss will be very high.
- Conversely, if \( y_i \) is 0 and \( p_i \) is close to 1, the loss will also be very high.
- The goal is to minimize this loss function during training, which means the model's predicted probabilities should be as close as possible to the actual labels.
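To make the formula concrete, here is a minimal sketch that evaluates the per-sample BCE terms by hand and compares their mean with `torch.nn.BCELoss`. The labels and probabilities in `y` and `p` are made-up values chosen for illustration: the first two predictions are confident and right, the last two confident and wrong.

```python
import torch

# Illustrative labels and predicted probabilities (not taken from the lab)
y = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
p = torch.tensor([[0.9], [0.1], [0.01], [0.99]])  # last two are confident but wrong

# Per-sample BCE terms, written directly from the formula
per_sample = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
manual_bce = per_sample.mean()

# Built-in loss with its default 'mean' reduction
builtin_bce = torch.nn.BCELoss()(p, y)

print(per_sample.squeeze().tolist())
print(manual_bce.item(), builtin_bce.item())
```

The two wrong predictions should contribute far larger terms than the two correct ones, and the hand-computed mean should match the built-in loss up to floating-point rounding.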

In [7]:
# define cost/loss & optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

In [8]:
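for step in range(2000):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    hypothesis = model(X)      # forward pass: predicted probabilities

    # cost/loss function
    cost = criterion(hypothesis, Y)
    cost.backward()            # backpropagate to compute gradients
    optimizer.step()           # update the weights and bias

    if step % 100 == 0:
        print(step, cost.item())

# final loss and learned parameters
print(cost.item())
for name, param in model.named_parameters():
    print(name, param.data)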

0 0.7273973822593689
100 0.6951808929443359
200 0.693813681602478
300 0.6933773756027222
400 0.6932311654090881
500 0.6931794881820679
600 0.6931601762771606
700 0.6931526064872742
800 0.6931495070457458
900 0.6931481957435608
1000 0.693147599697113
1100 0.6931474208831787
1200 0.6931473016738892
1300 0.6931472420692444
1400 0.6931471824645996
1500 0.6931471824645996
1600 0.6931471824645996
1700 0.6931471824645996
1800 0.6931471824645996
1900 0.6931471228599548
0.6931471824645996
0.weight tensor([[-5.1122e-05, -4.9037e-05]])
0.bias tensor([5.9384e-05])
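The loss levels off near 0.6931 and the learned weights and bias are close to zero, so the model outputs roughly sigmoid(0) = 0.5 for every input. The sketch below checks that a constant prediction of 0.5 on the XOR labels gives a BCE of ln 2 ≈ 0.6931. The names `Y_demo` and `p_half` are assumptions introduced here for illustration.

```python
import math
import torch

# XOR labels as in the lab, kept on the CPU for this stand-alone check
Y_demo = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# A constant prediction of 0.5 for every sample
p_half = torch.full_like(Y_demo, 0.5)

loss_at_half = torch.nn.BCELoss()(p_half, Y_demo).item()
print(loss_at_half, math.log(2))
```

The match suggests that this single linear layer followed by a sigmoid ends up no more informative than a constant 0.5 guess on XOR.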


In [9]:
# Accuracy computation
# predict class 1 where hypothesis > 0.5, else class 0
with torch.no_grad():
    hypothesis = model(X)
    predicted = (hypothesis > 0.5).float()
    accuracy = (predicted == Y).float().mean()
    print('\nHypothesis: ', hypothesis.cpu().numpy(), '\nPredicted: ', predicted.cpu().numpy(), '\nAccuracy: ', accuracy.item())


Hypothesis:  [[0.50001484]
 [0.5000026 ]
 [0.5000021 ]
 [0.4999898 ]] 
Predicted:  [[1.]
 [1.]
 [1.]
 [0.]] 
Accuracy:  0.75
