# Neuro-symbolic AI - LTN

This workbook explores **hybrid intelligent systems**, and in particular how to combine a knowledge-base structure with a neural network, in other words how to create a **neuro-symbolic** solution.


Throughout this notebook, you will work on simple hybrid systems, using logic and, of course, neural networks. You will find some guided examples to aid your understanding, and some exercises for you to implement on your own.

#### Content:
* [LTN - Logic Tensor Networks](#ltn)
    * [Getting started](#ltn-start)
    * [Exercise: Wine classification](#ltn-ex)

## LTN - Logic Tensor Network <a class="anchor" id="ltn"></a>

[Logic tensor networks (LTN)](https://arxiv.org/pdf/2012.13635.pdf) are a framework that combines neural networks with first-order logic, so that a model can both learn from data and reason with logical rules. The knowledge base (KB) is represented as a set of logical formulas, and the neural (deep learning) part learns the weights of the networks that evaluate these formulas. The formulas integrate prior domain knowledge into the neural network and act as constraints on its output: if the output violates a formula, the network is penalised. During training, an LTN therefore has not only to improve its predictive power, but also to satisfy the logical formulas.

In what follows, we will use [LTNtorch](https://github.com/tommasocarraro/LTNtorch), an LTN implementation based on the deep learning package [PyTorch](https://github.com/pytorch/pytorch) (similar to TensorFlow).

### Getting started <a class="anchor" id="ltn-start"></a>

In [None]:
# uncomment to install LTNtorch; make sure PyTorch is installed as well
# !pip install LTNtorch

import torch
import ltn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay

### Exercise - wine classification <a class="anchor" id="ltn-ex"></a>

The following exercise is adapted from the book _Dingli, A. and Farrugia, D. (2023) Neuro-Symbolic AI_.

We are going to implement a simple LTN to classify wine types. The dataset used is the [red & white wine dataset](https://www.kaggle.com/datasets/numberswithkartik/red-white-wine-dataset). The dataset contains 6000 data points and 11 features describing different wine characteristics.


**Dataset**

In [None]:
# import dataset
wine_df = pd.read_csv('./wine_dataset.csv')

wine_df.sample(5)

In [None]:
# number of samples per class
wine_df['style'].value_counts()

The dataset is fairly clean (although imbalanced!), so we only need to perform the following processing steps:
- Convert the label column `style` to binary
- Separate the features from the label
- Normalise the features (right now the numerical features all have different scales)
- Create the train/test datasets

In [None]:
# convert style to binary True=red / False=white
wine_df['style'] = np.where(wine_df['style'] == 'red', True, False)

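# separate features from label
X = wine_df.loc[:, wine_df.columns != 'style'].values
y = wine_df['style']

# split dataset: train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# standardise the features: fit the scaler on the training set only,
# so that no information from the test set leaks into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)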


The features and labels need to be converted into [PyTorch tensors](https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html), multi-dimensional arrays that allow for efficient computation within the PyTorch framework.

In [None]:
# convert to tensors
X_train = torch.as_tensor(X_train).to(dtype=torch.float32)
y_train = torch.as_tensor(y_train.values).to(dtype=torch.float32)
X_test = torch.as_tensor(X_test).to(dtype=torch.float32)
y_test = torch.as_tensor(y_test.values).to(dtype=torch.float32)

While training a model, we want to pass the data in 'batches' and reshuffle the data at every epoch to reduce overfitting. In PyTorch this is done using [`DataLoader`](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#preparing-your-data-for-training-with-dataloaders), an iterable that creates the batches to feed into the model.

The input to `DataLoader` is a `TensorDataset`, which combines the feature and label tensors into a single dataset.

By default, `DataLoader` puts data into batches at random, and with an imbalanced dataset some batches may contain only 0s or only 1s. We want each batch to contain samples from both classes. For this reason, we are going to create a `balanced_sampler` (see code below) to pass to `DataLoader`, so that samples are drawn with a probability inversely proportional to their class frequency and the batches are more balanced.

In [None]:
from torch.utils.data import WeightedRandomSampler
from torch.utils.data import TensorDataset, DataLoader

# count how many samples of each class are in the training set
class_sample_count = torch.bincount(y_train.long())

# the weight of each class is the inverse of its sample count
weight = 1.0 / class_sample_count.double()

# assign to each sample the weight of its class
samples_weight = weight[y_train.long()]

# create the balanced sampler helper to pass to DataLoader
balanced_sampler = WeightedRandomSampler(
    samples_weight,
    num_samples=len(samples_weight),
    replacement=True)


# create train and test TensorDataset
train_dataset = TensorDataset(X_train, y_train)
test_dataset = TensorDataset(X_test, y_test)

# create train and test DataLoader
train_dataloader = DataLoader(train_dataset, batch_size=16, sampler=balanced_sampler)
test_dataloader = DataLoader(test_dataset, batch_size=16)
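
To see what the balanced sampler changes, here is a minimal, self-contained sketch on hypothetical toy data (900 negatives and 100 positives, not the wine dataset). It compares the fraction of positive labels seen in one pass with plain shuffling and with a `WeightedRandomSampler`; the helper `positive_fraction` is an illustrative name, not part of PyTorch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# hypothetical imbalanced toy data: 900 negatives (0) and 100 positives (1)
labels = torch.cat([torch.zeros(900), torch.ones(100)])
features = torch.randn(1000, 11)
dataset = TensorDataset(features, labels)

# per-sample weight = inverse frequency of the sample's class
class_count = torch.bincount(labels.long())
sample_weight = (1.0 / class_count.double())[labels.long()]
sampler = WeightedRandomSampler(
    sample_weight, num_samples=len(sample_weight), replacement=True)


def positive_fraction(loader):
    """Fraction of positive labels seen over one pass of the loader."""
    seen = torch.cat([batch_labels for _, batch_labels in loader])
    return seen.mean().item()


plain = positive_fraction(DataLoader(dataset, batch_size=16, shuffle=True))
balanced = positive_fraction(DataLoader(dataset, batch_size=16, sampler=sampler))
print(f"positives with plain shuffling: {plain:.2f}")
print(f"positives with balanced sampler: {balanced:.2f}")
```

Note that the sampler draws with replacement, so in one epoch some minority samples are seen several times and some majority samples are not seen at all. It makes balanced batches likely, but it does not strictly guarantee that every batch contains both classes.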

**Neural Network**

The neural network part of our LTN system will be a very simple **feed-forward neural network _N_**, with the following layers (input size, output size):

- **Input layer**: (11, 64), with ReLU activation
- **Hidden layer**: (64, 64), with ReLU activation, followed by dropout (p = 0.1)
- **Output layer**: (64, 1), with sigmoid activation

Let's define the model using PyTorch syntax, so that we can use it within the LTNtorch framework.

In [None]:
class ModelN(torch.nn.Module):
    def __init__(self):
        # define model components
        super(ModelN, self).__init__()
        
        # activation functions
        self.sigmoid = torch.nn.Sigmoid()
        self.relu = torch.nn.ReLU()
        
        # layers
        self.layer1 = torch.nn.Linear(11, 64)
        self.layer2 = torch.nn.Linear(64, 64)
        self.layer3 = torch.nn.Linear(64, 1)
        
        # dropout
        self.dropout = torch.nn.Dropout(p=0.1)
        
    # construct the feed-forward network
    def forward(self, x):
        # layer 1: input layer (11, 64) followed by the ReLU activation
        x = self.relu(self.layer1(x))
        # layer 2: hidden layer (64, 64) followed by the ReLU activation
        x = self.relu(self.layer2(x))
        # apply dropout with probability 0.1 (active only in training mode)
        x = self.dropout(x)
        # layer 3: output layer (64, 1) followed by the sigmoid function
        return self.sigmoid(self.layer3(x))

The code above is roughly equivalent to the following TensorFlow (Keras) code:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(11,)),
    Dense(64, activation='relu'),
    Dropout(0.1),
    Dense(1, activation='sigmoid')
])
```

**Logic KB: Predicates, Connectives and Quantifiers**

In an LTN, the knowledge base is represented by first-order logic (FOL) elements.

In Week 2, we saw that FOL extends classical propositional logic (simple true/false statements) by providing **generalisations** through **predicates** and **quantifiers**. For example, a FOL formula is:

<div style='text-align: center;'>
∀ x (Sunny(x) ∧ ￢ Weekend(x) -> ￢ Hiking(me)) 
</div>

In plain English: for every day of the week x, if x is sunny and not at the weekend, then I don't go hiking. In particular, the statement above has the following components:

* **predicates**: Sunny(x), Weekend(x) and Hiking(me) are predicates, whose true/false value depends on the specific value of their argument (e.g. the _variable_ x = day of the week)
* **connectives**: ∧, ￢ and -> are connectives, which allow multiple predicates to be combined
* **quantifiers**: ∀ (for all) generalises the full combination of predicates to any day of the week.
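
The formula can be evaluated by hand in classical (true/false) logic. The sketch below does so in plain Python on hypothetical facts about the week: the `sunny` and `weekend` tables and the helper names `implies` and `rule_holds` are made up for illustration.

```python
# Hypothetical toy facts about each day of the week (illustration only).
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
sunny = {"Mon": True, "Tue": False, "Wed": True, "Thu": False,
         "Fri": True, "Sat": True, "Sun": False}
weekend = {day: day in ("Sat", "Sun") for day in days}


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def rule_holds(hiking_me: bool) -> bool:
    """Evaluate: for all x, (Sunny(x) and not Weekend(x)) -> not Hiking(me)."""
    return all(
        implies(sunny[x] and not weekend[x], not hiking_me)
        for x in days
    )


# the rule is satisfied if I do not go hiking
print(rule_holds(hiking_me=False))
# at least one sunny weekday exists, so going hiking violates the rule
print(rule_holds(hiking_me=True))
```

Here `all(...)` plays the role of the quantifier ∀, `and` / `not` / `implies` are the connectives, and the dictionary look-ups are the predicates.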

To create the KB for our neuro-symbolic solution, we need to define the predicates, connectives and quantifiers related to our dataset.

As seen above, a predicate is a 'function' that maps a variable x to a true/false value. For this reason, we can choose as our _predicate_ the simple **feed-forward neural network _N_** defined before: for each data point x in the wine dataset, it will predict (= map) this variable to a label of true (= red) or false (= white). We can therefore write:

<div style='text-align: center;'>
    <i>N</i>(x,L) 
</div>

In plain English: the network _N_ associates (predicts) the data point x with the label L (red/white, true/false).

Another fact that we know about our dataset is that the two possible labels (red/white) are mutually exclusive: if a wine is predicted as red, it cannot also be predicted as white.

Given all these facts we can define our **knowledge base** with the following logic rules:
<div style='text-align: center; white-space: pre-wrap;'>
    1. <i>N</i>(x<sub>r</sub>,L<sub>r</sub>)
    2. <i>N</i>(x<sub>w</sub>,L<sub>w</sub>) 
    3. ∀ x<sub>L</sub>: <i>N</i>(x<sub>L</sub>, L)
    4. ∀ x<sub>￢L</sub>: ￢<i>N</i>(x<sub>￢L</sub>, L)
</div>

Let's try to understand these rules:

* <strong><i>N</i>(x<sub>r</sub>,L<sub>r</sub>)</strong> and <strong><i>N</i>(x<sub>w</sub>,L<sub>w</sub>)</strong>: here x<sub>r</sub> and x<sub>w</sub> are data points in our training set whose classification/label is already known, red and white respectively. These rules ensure that the network correctly classifies what is already known (hence true); in other words, <i>N</i>(x<sub>r</sub>,L<sub>w</sub>) or <i>N</i>(x<sub>w</sub>,L<sub>r</sub>) would be false (and a contradiction). The idea behind these rules is to ensure that the neural network _N_ reasons without contradictions in the **training phase**: all the true facts have to be considered as such throughout the whole training process.
* <strong>∀ x<sub>L</sub>: <i>N</i>(x<sub>L</sub>, L)</strong>: here x<sub>L</sub> is any data point in our dataset with which the label L is associated. This rule, together with <strong>∀ x<sub>￢L</sub>: ￢<i>N</i>(x<sub>￢L</sub>, L)</strong>, where x<sub>￢L</sub> is any data point associated with the opposite label, is meant to ensure the _mutual exclusivity_ of our classification: if a data point x is classified/predicted as L, it cannot simultaneously be predicted as the opposite, ￢L. In other words, these rules are based on the _laws of logic_ seen during Week 2: something that is true cannot also be false (law of non-contradiction). The idea behind these rules is to ensure that the neural network _N_ reasons without contradictions in the **prediction and optimisation phase**: if a contradiction is reached, the network is failing.

It's now time to code our KB. Given the rules above, we will need a predicate, a connective (￢) and a quantifier (∀). First, we create the predicate _N_:

In [None]:
# create predicate N (i.e., our Neural Network)
N = ltn.Predicate(ModelN())
print(N)

Next, we create the NOT (￢) connective.

In LTN, a connective is created using the constructor `Connective()`, which takes as input a _fuzzy connective semantics_ from the `ltn.fuzzy_ops` module. As we saw in Week 6, fuzzy logic extends binary logic by considering all the values between 1 = True and 0 = False, so binary logic is just a special case of fuzzy logic. We use fuzzy semantics so that truth values can take any value in the range [0, 1].

If you are still wondering why we use fuzzy logic in an LTN, think about the output of a NN for binary classification: the sigmoid function squashes the output of the neural network to a value between 0 and 1, which can be interpreted as the probability of the input belonging to a certain class. This is the reason why we extend binary logic to fuzzy logic when creating our connectives!
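
As a small illustration of the step from binary to fuzzy logic, the sketch below implements the standard fuzzy negation, NOT(a) = 1 - a, in NumPy. The function name `fuzzy_not` and the example values are assumptions made for this illustration; the sketch does not use LTNtorch itself.

```python
import numpy as np


def fuzzy_not(truth):
    """Standard fuzzy negation: NOT(a) = 1 - a, for truth values in [0, 1]."""
    return 1.0 - np.asarray(truth, dtype=float)


# on binary truth values, fuzzy negation coincides with classical NOT
binary = np.array([0.0, 1.0])
print(fuzzy_not(binary))

# hypothetical sigmoid outputs: degree to which each wine is "red"
p_red = np.array([0.05, 0.8, 0.5])
# degree to which each wine is "not red"
print(fuzzy_not(p_red))
```

A wine that is "red" to degree 0.8 is "not red" to degree 0.2, while the binary values 0 and 1 are simply swapped, which is the sense in which binary logic is a special case.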

In [None]:
# create the NOT standard connective
Not = ltn.Connective(ltn.fuzzy_ops.NotStandard())
print(Not)

Finally, we create the FOR ALL (∀) quantifier.

In LTN, a quantifier is created using the constructor `Quantifier()`, which takes as input an aggregation semantics and a character indicating which type of quantification it performs ('e' for exists, 'f' for forall).

In [None]:
# create the FOR ALL quantifier
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2),
                        quantifier="f")
print(Forall)

_Error aggregation_ combines multiple error measures into a single value. In particular, **p-mean error aggregation** (`AggregPMeanError`) takes the error 1 - a of each truth value a, computes the generalised mean of these errors with exponent p (the p-th root of the average of their p-th powers), and returns 1 minus this mean. In the case above, p = 2. In an LTN, we use p-mean error aggregation as a fuzzy version of **for all**: since we want our KB rules to hold for all data points in the dataset, aggregating the errors lets us assess how well the network satisfies a rule across all data points.
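
To make the aggregation concrete, here is a minimal NumPy sketch of the p-mean error formula from the LTN paper, 1 - (mean((1 - a)^p))^(1/p). The function name `p_mean_error` and the truth values are hypothetical; this is a re-implementation for illustration, not the LTNtorch code.

```python
import numpy as np


def p_mean_error(truth_values, p=2):
    """Return 1 minus the generalised mean (exponent p) of the errors 1 - a."""
    a = np.asarray(truth_values, dtype=float)
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)


# hypothetical truth values of one rule on four data points:
# three points satisfy the rule well, one violates it
truths = [0.9, 0.8, 0.95, 0.2]

for p in (1, 2, 6):
    print(f"p={p}: {p_mean_error(truths, p):.3f}")
```

With p = 1 the result is the plain average of the truth values. As p grows, the single badly violated point weighs more and the aggregated value moves towards the minimum, so a larger p gives a stricter reading of "for all".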

The connective and quantifier defined above, with these specific semantics, are the approach recommended by LTNtorch for binary classification. For more complex or different situations, refer to the [LTNtorch examples page](https://github.com/tommasocarraro/LTNtorch/tree/main/examples).

**Optimisation**

We are almost there! The last thing to define is a way to optimise our solution. Since we are dealing with a hybrid system, we need to optimise two aspects of the system: the neural network's prediction performance, and its ability to reason without contradiction. For the first, we will use an Adam optimiser, as is usually done for neural networks. For the second, we use `SatAgg`, the satisfaction aggregator, an operator which aggregates the truth values of _all_ the rules included in the knowledge base. The training loss will then be 1 minus this aggregated satisfaction level.

In [None]:
SatAgg = ltn.fuzzy_ops.SatAgg()
optimizer = torch.optim.Adam(N.parameters(), lr=0.001)
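
To show how a satisfaction level becomes a differentiable loss, here is a plain-PyTorch sketch that does not use LTNtorch. It assumes that both the quantifier and the satisfaction aggregator use p-mean error aggregation with p = 2; the helper `p_mean_error` and the predicate outputs are hypothetical.

```python
import torch


def p_mean_error(truth_values: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Plain-PyTorch stand-in for p-mean error aggregation (assumed formula)."""
    return 1.0 - torch.mean((1.0 - truth_values) ** p) ** (1.0 / p)


# hypothetical predicate outputs N(x) for one mini-batch
n_on_red = torch.tensor([0.9, 0.7, 0.8], requires_grad=True)   # wines labelled red
n_on_white = torch.tensor([0.2, 0.4], requires_grad=True)      # wines labelled white

# rule for positive examples: forall x_L, N(x_L)
rule_red = p_mean_error(n_on_red)
# rule for negative examples: forall x_not_L, NOT N(x_not_L), with NOT(a) = 1 - a
rule_white = p_mean_error(1.0 - n_on_white)

# aggregate the truth values of the two rules, then turn satisfaction into a loss
sat = p_mean_error(torch.stack([rule_red, rule_white]))
loss = 1.0 - sat
loss.backward()

print(f"satisfaction: {sat.item():.3f}, loss: {loss.item():.3f}")
print("gradient on red examples:  ", n_on_red.grad)
print("gradient on white examples:", n_on_white.grad)
```

The gradients are negative for the red examples and positive for the white ones: a gradient-descent step pushes the outputs on red wines up and those on white wines down, which is exactly what satisfying both rules requires.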

**Training!**

Finally, it's time to train our model.

In [None]:
epochs = 30

for epoch in range(epochs):
    # reset the training loss for every epoch
    train_loss = 0.0

    # start batching the data
    for batch_idx, (data, labels) in enumerate(train_dataloader):
        # transform the data points into logic variables
        x_L = ltn.Variable("x_L", data[torch.nonzero(labels)]) # positive examples
        x_not_L = ltn.Variable("x_not_L", data[torch.nonzero(torch.logical_not(labels))]) # negative examples

        # compute the SAT(isfaction) level
        # the loss of our NN depends on the ability of the network to satisfy the non-contradiction rules
        sat_agg = SatAgg(
            Forall(x_L, N(x_L)),
            Forall(x_not_L, Not(N(x_not_L))))

        # compute loss and perform back-propagation
        # set the gradients to zero before starting the backpropagation of the training process
        # see this thread for more details: 
        # https://stackoverflow.com/questions/48001598/why-do-we-need-to-call-zero-grad-in-pytorch/48009142#48009142
        optimizer.zero_grad()
        loss = 1. - sat_agg
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        
        if (batch_idx + 1) % 100 == 0:    # print the average loss every 100 mini-batches
            print(f'[{epoch + 1}, {batch_idx + 1:5d}] loss: {train_loss / 100:.3f}')
            train_loss = 0.0
    

Let's now check the accuracy of our LTN model:

In [None]:
# switch to evaluation mode so that dropout is disabled at prediction time
N.model.eval()

n_correct = 0
# iterate over the test batches; no gradients are needed for evaluation
with torch.no_grad():
    for data, labels in test_dataloader:
        # get the predictions for the given samples
        predictions = N.model(data).numpy()

        # convert the predictions to a binary classification (i.e., 0 or 1)
        predictions = np.where(predictions > 0.5, 1., 0.).flatten()

        # count the correctly classified samples in this batch
        n_correct += accuracy_score(labels.numpy(), predictions, normalize=False)

# accuracy over the whole test set
n_correct / len(test_dataset)
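
Because the two classes are imbalanced, accuracy alone can hide poor performance on the minority class. The sketch below shows how a confusion matrix complements it, using scikit-learn's `confusion_matrix`. The labels and predictions are made up for illustration and are not results from the wine model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# hypothetical ground truth and predictions (1 = red, 0 = white)
y_true = np.array([1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1])

# rows are true classes, columns are predicted classes, in the order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
accuracy = accuracy_score(y_true, y_pred)

# recall of the red class: correctly predicted reds / all true reds
recall_red = cm[1, 1] / cm[1].sum()

print(cm)
print(f"accuracy: {accuracy:.2f}, recall (red): {recall_red:.2f}")
```

With the real model, `y_true` and `y_pred` would be the concatenated test labels and thresholded predictions. `ConfusionMatrixDisplay`, already imported at the top of the notebook, can plot the same matrix.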

**Task 1**

Try to run the same neural network, but without using the logical KB (you can use TensorFlow if you prefer). How does it compare to the LTN solution in terms of accuracy and training speed?

In [None]:
# write here your code

**Task 2**

Implement an LTN for multi-class classification.

* Option 1 (easy): Use the famous [Iris dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html) to predict the 3 classes (0 = Setosa, 1 = Versicolour, 2 = Virginica) of iris flowers.

* Option 2: Pick your favourite multi-class dataset.

In [None]:
# write here your code


# For option 1 uncomment the code below:
# from sklearn import datasets
# iris = datasets.load_iris()
# features = pd.DataFrame(iris.data, columns=iris.feature_names)
# print(features.head())
# label = pd.DataFrame(iris.target, columns=['label'])
# print(label.sample(5))