# Introduction to PyNeuraLogic

Learning the XOR operation is a popular elementary task and serves here as an example to showcase the basics of problem encoding and library usage. Note that the problem is merely
propositional rather than [relational](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_719): we do not yet use logical variables, so the rules correspond to a standard feedforward network rather than a relational model such as a GNN.


The XOR operation has two inputs, $I_1 \in \{0, 1\}$ and $I_2 \in \{0, 1\}$, and one output, $O \in \{0, 1\}$. Table 1 summarizes the whole operation.

#### Table 1: The XOR truth table
| $I_1$ | $I_2$ | $O$ |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
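
As a quick sanity check, the same table can be reproduced in plain Python. This is only an illustrative sketch that uses the built-in `^` operator; it does not involve PyNeuraLogic, and the name `truth_table` is introduced here for illustration.

```python
from itertools import product

# Enumerate every combination of the two binary inputs and compute XOR
# with Python's bitwise operator. The row order (00, 01, 10, 11) differs
# from Table 1, but the rows themselves are the same.
truth_table = [(i1, i2, i1 ^ i2) for i1, i2 in product((0, 1), repeat=2)]

for i1, i2, o in truth_table:
    print(f"{i1} XOR {i2} = {o}")
```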

Install PyNeuraLogic from PyPI:

In [None]:
! pip install neuralogic

## Template

The model for learning the XOR operation can be expressed
in multiple ways. The following model reduces the
architecture to a single rule, which represents one classic neural layer.
The rule can be read as: _"proposition xor is implied by proposition xy"_. (A nullary "relation" with no arguments corresponds to a proposition, i.e. a simple statement.)

In [1]:
from neuralogic.nn import get_evaluator
from neuralogic.core import Backend, Relation, Template, Settings, Optimizer
from neuralogic.dataset import Dataset

In [2]:
template = Template()
template.add_rule(Relation.xor[1, 8] <= Relation.xy[8, 2])

We also declared the dimensionality of the weights for each part: $W_{xor}$ for relation `xor` and $W_{xy}$ for relation `xy`. Since we did not specify concrete weight values, the weights are learnable and are initialized randomly, by default from the uniform distribution (the distribution can be changed via settings).

This rule then represents the following equation, where $f(x)$ is the output of the `xor` proposition and $x$ is the value of the `xy` proposition. Functions $\phi_{rule}$ and $\phi_{xor}$ are the activation functions of our rule and of the proposition `xor`, respectively. In our case, $\phi_{rule}$ is the $\tanh$ function, and $\phi_{xor}$ is the identity function.

$$W_{xor} \in \mathbb{R}^{1 \times 8}, \quad W_{xy} \in \mathbb{R}^{8 \times 2}, \quad x \in \{0,1\}^2$$

$$f(x) = \phi_{xor}(W_{xor} \cdot \phi_{rule}(W_{xy} \cdot x)) $$
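
To make the shapes in this equation concrete, here is a minimal plain-Python sketch of the forward pass. It is not how PyNeuraLogic evaluates the rule internally. The names `forward`, `W_xy`, `W_xor`, and `HIDDEN` are introduced here for illustration, and the initialization range $[-1, 1]$ is an assumption.

```python
import math
import random

random.seed(0)  # arbitrary seed, only to make this sketch repeatable

HIDDEN = 8  # inner dimension shared by W_xor (1 x 8) and W_xy (8 x 2)

# Assumption: weights are drawn uniformly from [-1, 1]; the library's
# actual default range may differ.
W_xy = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
W_xor = [random.uniform(-1, 1) for _ in range(HIDDEN)]  # single row, stored flat


def forward(x):
    """Compute f(x) = W_xor . tanh(W_xy . x) with an identity output activation."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_xy]
    return sum(w * h for w, h in zip(W_xor, hidden))


for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))
```

Because the equation contains no bias term and $\tanh(0) = 0$, the input $(0, 0)$ maps to exactly $0$ whatever the weights are.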


## Defining a Dataset

To learn the parameters $W_{xor}$
and $W_{xy}$, we need to create a training dataset
that contains examples. In our case, the
examples are straightforward and mimic the truth
table (Table 1).

In [3]:
dataset = Dataset()


dataset.add_examples(
    [
        Relation.xor[0] <= Relation.xy[[0, 0]],
        Relation.xor[1] <= Relation.xy[[0, 1]],
        Relation.xor[1] <= Relation.xy[[1, 0]],
        Relation.xor[0] <= Relation.xy[[1, 1]],
    ]
)


Each example in the dataset corresponds to one row in the truth table. In a dataset, the value attached to each proposition is its actual value, not a learnable weight.

For instance, the following example can be read as: _"Given that the proposition xy has a value equal to the vector $(0, 1)$, we expect the proposition xor to have a value equal to the scalar $1$."_

```
Relation.xor[1] <= Relation.xy[[0, 1]]
```

## Training

We can either train manually by writing a training loop, as in other popular frameworks, or use the predefined training loop implemented inside evaluators. Evaluators are suitable for quick prototyping and for switching between different backends, such as DyNet or Java. They can be conveniently customized via settings to specify the optimizer, learning rate, error function, and more. In our example, we have chosen the Java backend with a stochastic gradient descent optimizer for training.

In [4]:
printouts = 10

settings = Settings(optimizer=Optimizer.SGD, epochs=100)
evaluator = get_evaluator(template, settings)
built_dataset = evaluator.build_dataset(dataset)

for epoch, (total_loss, seen_instances) in \
    enumerate(evaluator.train(built_dataset)):
    if epoch % printouts == 0:
        print(f"Epoch {epoch}, average loss {total_loss / seen_instances}")

Epoch 0, average loss 0.8486421179926945
Epoch 10, average loss 0.2523365313646398
Epoch 20, average loss 0.22581171403149267
Epoch 30, average loss 0.20155291380951051
Epoch 40, average loss 0.16632702588300935
Epoch 50, average loss 0.12272502214289742
Epoch 60, average loss 0.08378590695186636
Epoch 70, average loss 0.05659655622355965
Epoch 80, average loss 0.03952319346092247
Epoch 90, average loss 0.028906304842870375


Before training begins, our dataset is "grounded" with our template. Grounding yields one computation graph for each query in the dataset. In this propositional setting, every query produces a computation network with the same structure but with different input and target values.
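
To illustrate what a manual training loop over such identically structured networks might look like, here is a plain-Python sketch of per-example stochastic gradient descent on the same two-layer architecture. It does not use the PyNeuraLogic API or reproduce the evaluator's internals. The squared-error loss, the learning rate `LR`, the epoch count, the seed, and the initialization range are all assumptions chosen for illustration.

```python
import math
import random

random.seed(1)  # arbitrary seed so the sketch is repeatable

HIDDEN, LR, EPOCHS = 8, 0.05, 2000  # hypothetical hyperparameters

# The four examples mirror the dataset: (input vector, expected output).
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Assumed initialization: uniform in [-1, 1].
W_xy = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)]
W_xor = [random.uniform(-1, 1) for _ in range(HIDDEN)]


def forward(x):
    """Return the hidden activations and the output for input x."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_xy]
    return hidden, sum(w * h for w, h in zip(W_xor, hidden))


def average_loss():
    """Mean squared error over all four examples."""
    return sum((forward(x)[1] - y) ** 2 for x, y in examples) / len(examples)


initial_loss = average_loss()

for epoch in range(EPOCHS):
    for x, y in examples:  # same structure, different inputs and targets
        hidden, out = forward(x)
        grad_out = 2 * (out - y)  # derivative of the squared error
        for j in range(HIDDEN):
            # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
            # Computed before W_xor[j] is updated.
            grad_hidden = grad_out * W_xor[j] * (1 - hidden[j] ** 2)
            W_xor[j] -= LR * grad_out * hidden[j]
            for i in range(2):
                W_xy[j][i] -= LR * grad_hidden * x[i]

final_loss = average_loss()
print(f"average loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

All four examples share the weights `W_xy` and `W_xor`; only the input vector and the target differ between them.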

## Testing
Evaluators also encapsulate testing with a user-friendly interface that is analogous to training.

In [5]:
for label, predicted in evaluator.test(built_dataset):
    print(f"Label: {label}, predicted: {predicted}")


Label: 0, predicted: 0
Label: 1, predicted: 0.7998393002473414
Label: 1, predicted: 0.7991515636502554
Label: 0, predicted: 0.0324124386538237
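
The predictions are real-valued, so turning them into binary XOR outputs requires a decision rule. The sketch below applies a threshold of 0.5 to the values printed above. The threshold is an assumption made for illustration, not a PyNeuraLogic setting.

```python
# (label, predicted) pairs copied from the test output above.
results = [
    (0, 0.0),
    (1, 0.7998393002473414),
    (1, 0.7991515636502554),
    (0, 0.0324124386538237),
]

THRESHOLD = 0.5  # assumed decision boundary, not a library setting

# Map each real-valued prediction to a binary class.
predicted_classes = [int(predicted > THRESHOLD) for _, predicted in results]

# Fraction of examples whose thresholded prediction matches the label.
accuracy = sum(
    cls == label for cls, (label, _) in zip(predicted_classes, results)
) / len(results)

print(predicted_classes, accuracy)
```

With this threshold, all four thresholded predictions match their labels.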
