# Introduction to PyNeuraLogic

Learning the XOR operation is an elementary task, but it is a good example for showcasing the basics of problem encoding and library usage. Note that the problem serves only as a simple introduction to the library and is, in fact, a propositional rather than a relational problem<sup>1</sup>.

<sub>[1] That is, the template here does not contain any variables, so it corresponds to a standard neural network rather than a GNN.</sub>

The XOR operation has two inputs, $I_1 \in \{0, 1\}$ and $I_2 \in \{0, 1\}$, and one output, $O \in \{0, 1\}$. Table 1 summarizes the whole operation.

#### Table 1: The XOR truth table
| $I_1$ | $I_2$ | $O$ |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
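
The table can be reproduced with a few lines of plain Python, which is a handy sanity check for the dataset defined later. This is only a sketch built on Python's `^` operator; the helper name `xor_truth_table` is ours and is not part of PyNeuraLogic.

```python
from itertools import product


def xor_truth_table():
    """Return the rows (i1, i2, o) of the XOR truth table."""
    # product((0, 1), repeat=2) enumerates all four input combinations.
    return [(i1, i2, i1 ^ i2) for i1, i2 in product((0, 1), repeat=2)]


for i1, i2, o in xor_truth_table():
    print(i1, i2, o)
```

The rows come out in the order `(0, 0)`, `(0, 1)`, `(1, 0)`, `(1, 1)`, which differs from the row order of Table 1 but describes the same operation.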

## Template

The model for learning the XOR operation can be expressed in multiple ways. The following model reduces the architecture to a single rule, which represents one layer. The rule can be read as: _"Atom xor is implied by atom xy."_

In [16]:
from neuralogic.nn import get_evaluator
from neuralogic.core import Atom, Backend, Template
from neuralogic.core.settings import Settings, Optimizer
from neuralogic.utils.data import Dataset

In [2]:
with Template().context() as template:
    template.add_rule(Atom.xor[1, 8] <= Atom.xy[8, 2])

We also declared a weight with the given dimensions for each atom: $W_{xor}$ for atom `xor` and $W_{xy}$ for atom `xy`. Since we did not specify concrete values for the weights, these learnable parameters are sampled randomly, by default from the uniform distribution (the distribution can be changed via settings).

The rule thus represents the following equation, where $f(x)$ is the output of the `xor` atom and $x$ is the value of the `xy` atom. Functions $\phi_{rule}$ and $\phi_{xor}$ are the activation functions of our rule and of the atom `xor`, respectively. In our case, $\phi_{rule}$ is the $\tanh$ function, and $\phi_{xor}$ is the identity function.

$$W_{xor} \in \mathbb{R}^{1 \times 8}, \quad W_{xy} \in \mathbb{R}^{8 \times 2}, \quad x \in \{0,1\}^2$$

$$f(x) = \phi_{xor}(W_{xor} \cdot \phi_{rule}(W_{xy} \cdot x)) $$
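
To make the equation concrete, the following sketch evaluates $f(x)$ in plain Python, without PyNeuraLogic. It assumes weights sampled uniformly from $[-1, 1]$ (the library's actual default range is not stated here), and the helper names `init_uniform`, `matvec`, and `forward` are hypothetical.

```python
import math
import random


def init_uniform(rows, cols, rng, low=-1.0, high=1.0):
    """Sample a rows x cols matrix uniformly; the [low, high] range is an assumption."""
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def forward(w_xor, w_xy, x):
    """Compute f(x) = phi_xor(W_xor . phi_rule(W_xy . x))."""
    hidden = [math.tanh(z) for z in matvec(w_xy, x)]  # phi_rule = tanh
    return matvec(w_xor, hidden)[0]  # phi_xor = identity, so no extra activation


rng = random.Random(0)
w_xy = init_uniform(8, 2, rng)   # W_xy has shape 8 x 2
w_xor = init_uniform(1, 8, rng)  # W_xor has shape 1 x 8

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(w_xor, w_xy, x))
```

Because the equation has no bias term and $\tanh(0) = 0$, the input $(0, 0)$ always maps to $0$ regardless of the weights; only the other three inputs depend on what is learned.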


## Defining a Dataset

To be able to learn our parameters $W_{xor}$
and $W_{xy}$, we need to create a training dataset
that contains examples. In our case, the dataset
examples are straightforward and mimic the truth
table (Table 1).

In [3]:
dataset = Dataset()

with template.context():
    dataset.add_examples(
        [
            Atom.xor[0] <= Atom.xy[[0, 0]],
            Atom.xor[1] <= Atom.xy[[0, 1]],
            Atom.xor[1] <= Atom.xy[[1, 0]],
            Atom.xor[0] <= Atom.xy[[1, 1]],
        ]
    )


Each example in the dataset corresponds to one row in the truth table. Within a dataset, the value attached to an atom is its actual value, not a learnable weight.

For instance, the following example can be read as: _"Given that the value of the atom xy is equal to the vector $(0, 1)$, we expect the atom xor to have a value equal to the scalar $1$."_

```
Atom.xor[1] <= Atom.xy[[0, 1]]
```

## Training

We can train the model either manually, by writing a training loop as in other popular frameworks, or with the predefined training loop implemented inside evaluators. Evaluators are suitable for quick prototyping and for switching between different backends, such as DyNet or Java. They can be customized via settings to specify the optimizer, learning rate, error function, and more. In our example, we have chosen the DyNet backend with a stochastic gradient descent optimizer and train for 100 epochs.

In [43]:
settings = Settings(optimizer=Optimizer.SGD, epochs=100)

evaluator = get_evaluator(Backend.DYNET, template, settings=settings)

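print_every = 10  # report the average loss every 10 epochs

# Each iteration of evaluator.train corresponds to one epoch
for epoch, (total_loss, seen_instances) in enumerate(evaluator.train(dataset)):
    if epoch % print_every == 0:
        print(f"Epoch {epoch}, average loss {total_loss / seen_instances}")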

Epoch 0, average loss 1.0051223076879978
Epoch 10, average loss 4.786243493981601e-06
Epoch 20, average loss 1.1231510357173624e-09
Epoch 30, average loss 1.5502428171460664e-13
Epoch 40, average loss 6.578666715659383e-14
Epoch 50, average loss 1.3155780894041831e-16
Epoch 60, average loss 1.3155780894041831e-16
Epoch 70, average loss 1.3155780894041831e-16
Epoch 80, average loss 1.3155780894041831e-16
Epoch 90, average loss 1.3155780894041831e-16


Before training starts, the dataset is grounded with the template. Grounding yields one computation graph for each query in the dataset. In our case, every query produces a computation graph with the same structure but with different input and target values.
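
As a library-independent illustration of what a manual training loop computes for this template, the sketch below runs per-example stochastic gradient descent on the same one-rule network in plain Python. It is not how PyNeuraLogic implements training: the squared-error loss, the learning rate, the number of epochs, the uniform $[-1, 1]$ initialization, and the name `train` are all assumptions chosen for the example.

```python
import math
import random

# The four XOR examples as (input vector, target) pairs.
EXAMPLES = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 0.0)]


def train(epochs=500, lr=0.05, hidden=8, seed=0):
    """Train f(x) = W_xor . tanh(W_xy . x) with per-example SGD (assumed squared error)."""
    rng = random.Random(seed)
    w_xy = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
    w_xor = [rng.uniform(-1, 1) for _ in range(hidden)]
    avg_losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, target in EXAMPLES:
            # Forward pass: same structure for every example, different inputs.
            h = [math.tanh(row[0] * x[0] + row[1] * x[1]) for row in w_xy]
            out = sum(v * h_i for v, h_i in zip(w_xor, h))
            error = out - target
            total_loss += error ** 2
            # Backward pass: gradient of the squared error w.r.t. the output.
            d_out = 2.0 * error
            for i in range(hidden):
                # Use the old w_xor[i] for the hidden gradient before updating it.
                d_z = d_out * w_xor[i] * (1.0 - h[i] ** 2)
                w_xor[i] -= lr * d_out * h[i]
                w_xy[i][0] -= lr * d_z * x[0]
                w_xy[i][1] -= lr * d_z * x[1]
        avg_losses.append(total_loss / len(EXAMPLES))
    return avg_losses, w_xor, w_xy


losses, w_xor, w_xy = train()
print(f"first epoch: {losses[0]}, last epoch: {losses[-1]}")
```

The inner loop mirrors the grounding described above: each example is pushed through a network of identical structure, and only the input and target values change.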

## Testing
Evaluators also encapsulate testing with a user-friendly interface that is analogous to training.

In [44]:
for label, predicted in evaluator.test(dataset):
    print(f"Label: {label}, predicted: {predicted}")


Label: 0.0, predicted: 0.0
Label: 1.0, predicted: 1.0
Label: 1.0, predicted: 1.0
Label: 0.0, predicted: -2.2939730115467682e-08
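
The last prediction is not exactly `0.0`, so an exact equality check against the label would fail. One possible way to compare predictions with labels is a tolerance-based check, sketched here with the values copied from the printout above; the tolerance `1e-6` and the helper name `all_close` are arbitrary choices for illustration.

```python
import math

# (label, predicted) pairs copied from the printout above.
results = [
    (0.0, 0.0),
    (1.0, 1.0),
    (1.0, 1.0),
    (0.0, -2.2939730115467682e-08),
]


def all_close(pairs, abs_tol=1e-6):
    """Return True if every prediction lies within abs_tol of its label."""
    return all(
        math.isclose(label, predicted, abs_tol=abs_tol)
        for label, predicted in pairs
    )


print(all_close(results))  # → True
```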
