## Multi-class multi-label classification

We now turn to multi-label classification, whereby multiple labels can be assigned to each example. As a first example
of the reach of LTNs, we shall see how the previous example can be extended naturally using LTN to account for multiple
labels, which is not always a trivial extension for most ML algorithms.

The standard approach to the multi-label problem is to provide explicit negative examples for each class. By contrast,
LTN can use background knowledge to relate classes directly to each other, thus becoming a powerful tool in the case of
the multi-label problem, where typically the labelled data is scarce.

We explore the Leptograpsus crabs dataset, consisting of 200 examples of 5 morphological measurements of 50 crabs.
The task is to classify the crabs according to their colour and sex. There are four labels: blue, orange, male, and female.
The colour labels are mutually-exclusive, and so are the labels for sex. LTN will be used to specify such information
logically.

For this specific task, LTN uses the following language and grounding:

**Domains:**
- $items$, denoting the examples from the crabs data set;
- $labels$, denoting the class labels.

**Variables:**
- $x_{blue}, x_{orange}, x_{male}, x_{female}$ for the positive examples of each class;
- $x$, used to denote all the examples;
- $D(x_{blue}) = D(x_{orange}) = D(x_{male}) = D(x_{female}) = D(x) = items$.

**Constants:**
- $l_{blue}, l_{orange}, l_{male}, l_{female}$: the labels of each class;
- $D(l_{blue}) = D(l_{orange}) = D(l_{male}) = D(l_{female}) = labels$.

**Predicates:**
- $P(x,l)$ denoting the fact that item $x$ is labelled as $l$;
- $D_{in}(P) = items,labels$.

**Axioms:**

- $\forall x_{blue} P(x_{blue}, l_{blue})$: all the examples coloured by blue should have label $l_{blue}$;
- $\forall x_{orange} P(x_{orange}, l_{orange})$: all the examples coloured by orange should have label $l_{orange}$;
- $\forall x_{male} P(x_{male}, l_{male})$: all the examples that are males should have label $l_{male}$;
- $\forall x_{female} P(x_{female}, l_{female})$: all the examples that are females should have label $l_{female}$;
- $\forall x \lnot (P(x, l_{blue}) \land P(x, l_{orange}))$: if an example $x$ is labelled as blue, it cannot be labelled
as orange too;
- $\forall x \lnot (P(x, l_{male}) \land P(x, l_{female}))$: if an example $x$ is labelled as male, it cannot be labelled
as female too.

Notice how the last two logical rules represent the mutual exclusion of the labels on colour and sex, respectively.
As a result, negative examples are not used explicitly in this specification.


**Grounding:**
- $\mathcal{G}(items)=\mathbb{R}^{5}$, items are described by 5 features;
- $\mathcal{G}(labels)=\mathbb{N}^{4}$, we use a one-hot encoding to represent labels;
- $\mathcal{G}(x_{blue}) \in \mathbb{R}^{m_1 \times 5}, \mathcal{G}(x_{orange}) \in \mathbb{R}^{m_2 \times 5},\mathcal{G}(x_{male}) \in \mathbb{R}^{m_3 \times 5},\mathcal{G}(x_{female}) \in \mathbb{R}^{m_4 \times 5}$.
These sequences are not mutually-exclusive, one example can for instance be in both $x_{blue}$ and $x_{male}$;
- $\mathcal{G}(x) \in \mathbb{R}^{m \times 5}$, that is, $\mathcal{G}(x)$ is a sequence of all the examples;
- $\mathcal{G}(l_{blue}) = [1, 0, 0, 0]$, $\mathcal{G}(l_{orange}) = [0, 1, 0, 0]$, $\mathcal{G}(l_{male}) = [0, 0, 1, 0]$, $\mathcal{G}(l_{female}) = [0, 0, 0, 1]$;
- $\mathcal{G}(P \mid \theta): x,l \mapsto l^\top \cdot \sigma\left(\operatorname{MLP}_{\theta}(x)\right)$, where $MLP$
has four output neurons corresponding to as many labels, and $\cdot$ denotes the dot product as a way of selecting an
output for $\mathcal{G}(P \mid \theta)$. In fact, multiplying the $MLP$’s output by the one-hot vector $l^\top$ gives the probability
corresponding to the label denoted by $l$. By contrast with the previous example, notice the use of a *sigmoid* function instead of a *softmax* function. We need that because labels are not mutually exclusive anymore.


### Dataset

Now, let's import the dataset.

The Leptograpsus crabs dataset consists of 200 examples. Every example is represented by 5 features. The dataset
is subdivided into train and test set. In particular, we use 160 examples for training and 40 for test.

## 多类别多标签分类

我们现在转向多标签分类，其中每个示例可以分配多个标签。作为LTN（逻辑张量网络）应用的第一个例子，我们将看到如何自然地使用LTN扩展之前的示例以处理多个标签，这对于大多数机器学习算法来说并不是一个简单的扩展。

解决多标签问题的标准方法是为每个类别提供明确的负样本。相比之下，LTN可以**使用背景知识直接将类别彼此关联**，从而在**标签数据稀少**的情况下成为一种强大的工具。

我们将探索Leptograpsus蟹数据集，该数据集包含50只蟹的5个形态测量值的200个示例。任务是根据蟹的颜色和性别对其进行分类。有四个标签：蓝色、橙色、雄性和雌性。颜色标签是互斥的，性别标签也是如此。LTN将用于逻辑上指定这些信息。

对于这个特定任务，LTN使用以下语言和基础：

**域：**
- $items$，表示蟹数据集中的示例；
- $labels$，表示类别标签。

**变量：**
- $x_{blue}, x_{orange}, x_{male}, x_{female}$ 表示每个类别的正例；
- $x$，用于表示所有示例；
- $D(x_{blue}) = D(x_{orange}) = D(x_{male}) = D(x_{female}) = D(x) = items$。

**常量：**
- $l_{blue}, l_{orange}, l_{male}, l_{female}$：每个类别的标签；
- $D(l_{blue}) = D(l_{orange}) = D(l_{male}) = D(l_{female}) = labels$。

**谓词：**
- $P(x,l)$ 表示项 $x$ 被标记为 $l$；
- $D_{in}(P) = items,labels$。

**公理：**

- $\forall x_{blue} P(x_{blue}, l_{blue})$：所有标记为蓝色的示例应具有标签 $l_{blue}$；
- $\forall x_{orange} P(x_{orange}, l_{orange})$：所有标记为橙色的示例应具有标签 $l_{orange}$；
- $\forall x_{male} P(x_{male}, l_{male})$：所有雄性示例应具有标签 $l_{male}$；
- $\forall x_{female} P(x_{female}, l_{female})$：所有雌性示例应具有标签 $l_{female}$；
- $\forall x \lnot (P(x, l_{blue}) \land P(x, l_{orange}))$：如果示例 $x$ 被标记为蓝色，则不能被标记为橙色；
- $\forall x \lnot (P(x, l_{male}) \land P(x, l_{female}))$：如果示例 $x$ 被标记为雄性，则不能被标记为雌性。

注意最后两条逻辑规则分别表示颜色和性别标签的互斥性。因此，在这个规范中没有显式使用负样本。

**基础：**
- $\mathcal{G}(items)=\mathbb{R}^{5}$，项由5个特征描述；
- $\mathcal{G}(labels)=\mathbb{N}^{4}$，我们使用独热编码来表示标签；
- $\mathcal{G}(x_{blue}) \in \mathbb{R}^{m_1 \times 5}, \mathcal{G}(x_{orange}) \in \mathbb{R}^{m_2 \times 5},\mathcal{G}(x_{male}) \in \mathbb{R}^{m_3 \times 5},\mathcal{G}(x_{female}) \in \mathbb{R}^{m_4 \times 5}$。这些序列不是互斥的，例如，一个示例可以同时在 $x_{blue}$ 和 $x_{male}$ 中；
- $\mathcal{G}(x) \in \mathbb{R}^{m \times 5}$，即 $\mathcal{G}(x)$ 是所有示例的序列；
- $\mathcal{G}(l_{blue}) = [1, 0, 0, 0]$，$\mathcal{G}(l_{orange}) = [0, 1, 0, 0]$，$\mathcal{G}(l_{male}) = [0, 0, 1, 0]$，$\mathcal{G}(l_{female}) = [0, 0, 0, 1]$；
- $\mathcal{G}(P \mid \theta): x,l \mapsto l^\top \cdot \sigma\left(\operatorname{MLP}_{\theta}(x)\right)$，其中 $MLP$ 有四个输出神经元，对应于多个标签，$\cdot$ 表示点积作为选择 $\mathcal{G}(P \mid \theta)$ 的输出的方法。实际上，将 $MLP$ 的输出与独热向量 $l^\top$ 相乘会得到对应于标签 $l$ 的概率。与前一个例子相比，注意这里使用了 *sigmoid* 函数而不是 *softmax* 函数。我们需要这样做，因为标签不再是互斥的。

### 数据集

现在，让我们导入数据集。

Leptograpsus蟹数据集包含200个示例。每个示例由5个特征表示。数据集被分为训练集和测试集。特别是，我们使用160个示例进行训练，40个示例进行测试。

In [1]:
import torch
import pandas as pd

df = pd.read_csv("datasets/crabs.dat", sep=" ", skipinitialspace=True) # sep=" "表示以空格分隔，skipinitialspace=True表示跳过初始空格 # 返回值类型是DataFrame # skipinitialspace：是否跳过字段值前面的空格。
df = df.sample(frac=1)  # shuffle dataset # 打乱数据集 # 返回值类型是DataFrame # frac：抽样比例，1 表示打乱所有数据。
df = df.replace({'B': 0, 'O': 1, 'M': 2, 'F': 3})

features = torch.tensor(df[['FL', 'RW', 'CL', 'CW', 'BD']].to_numpy())
labels_sex = torch.tensor(df['sex'].to_numpy())
labels_color = torch.tensor(df['sp'].to_numpy())

train_data = features[:160].float()
test_data = features[160:].float()
train_sex_labels = labels_sex[:160].long()
test_sex_labels = labels_sex[160:].long()
train_color_labels = labels_color[:160].long()
test_color_labels = labels_color[160:].long()

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
  df = df.replace({'B': 0, 'O': 1, 'M': 2, 'F': 3})


### LTN setting

In order to define our knowledge base (axioms), we need to define predicate $P$, constants $l_{blue}$, $l_{orange}$, $l_{male}$,
$l_{female}$, connectives, universal quantifier, and the `SatAgg` operator.

For the connectives and quantifier, we use the stable product configuration (seen in the tutorials).

For predicate $P$, we have two models. The first one implements an $MLP$ which outputs the logits for the four classes of
the dataset, given an example $x$ in input. The second model takes as input a labelled example $(x,l)$, it computes the logits
using the first model and then returns the prediction (*sigmoid*) for class $l$.

The constants $l_{blue}$, $l_{orange}$, $l_{male}$, and $l_{female}$, represent the one-hot labels for the four classes, as we have already seen in the
definition of the grounding for this task.

`SatAgg` is defined using the `pMeanError` aggregator.

### LTN 设置

为了定义我们的知识库（公理），我们需要定义谓词 $P$、常量 $l_{blue}$、$l_{orange}$、$l_{male}$、$l_{female}$、连接词、全称量词以及 `SatAgg` 操作符。

对于连接词和量词，我们使用稳定乘积配置（在教程中已经见过）。

对于谓词 $P$，我们有两种模型。第一种模型实现了一个 MLP（多层感知器），它输出数据集中四个类别的logits，输入为一个示例 $x$。第二种模型输入一个标记示例 $(x, l)$，它使用第一种模型来计算logits，然后返回类别 $l$ 的预测值（*sigmoid*）。

常量 $l_{blue}$、$l_{orange}$、$l_{male}$ 和 $l_{female}$ 表示四个类别的独热标签（one-hot labels），正如我们在该任务的基础定义中所见。

`SatAgg` 使用 `pMeanError` 聚合器定义。

In [3]:
import ltn

# we define the constants # 定义常量
l_blue = ltn.Constant(torch.tensor([1, 0, 0, 0]))
l_orange = ltn.Constant(torch.tensor([0, 1, 0, 0]))
l_male = ltn.Constant(torch.tensor([0, 0, 1, 0]))
l_female = ltn.Constant(torch.tensor([0, 0, 0, 1]))

# we define predicate P # 定义谓词P
class MLP(torch.nn.Module):
    """
    This model returns the logits for the classes given an input example. It does not compute the softmax, so the output
    are not normalized.
    This is done to separate the accuracy computation from the satisfaction level computation. Go through the example
    to understand it.
    该模型在给定输入示例的情况下返回类别的对数值（logits）。它不计算 softmax，因此输出未归一化。
    这样做是为了将准确性计算与满意度水平计算分开。请通读示例以理解这一点。
    """
    def __init__(self, layer_sizes=(5, 16, 16, 8, 4)):
        super(MLP, self).__init__()
        self.elu = torch.nn.ELU()
        self.dropout = torch.nn.Dropout(0.2)
        self.linear_layers = torch.nn.ModuleList([torch.nn.Linear(layer_sizes[i - 1], layer_sizes[i]) for i in range(1, len(layer_sizes))])

    def forward(self, x, training=False):
        """
        Method which defines the forward phase of the neural network for our multi class classification task.
        In particular, it returns the logits for the classes given an input example.

        :param x: the features of the example
        :param training: whether the network is in training mode (dropout applied) or validation mode (dropout not applied)
        :return: logits for example x
        定义神经网络前向传播阶段的方法，用于我们的多类别分类任务。
        特别地，它返回给定输入示例的各类别的对数值（logits）。
        
        :param x: 示例的特征
        :param training: 指示网络是否处于训练模式（应用 dropout）或验证模式（不应用 dropout）
        :return: 示例 x 的对数值
        """
        for layer in self.linear_layers[:-1]:
            x = self.elu(layer(x))
            if training:
                x = self.dropout(x)
        logits = self.linear_layers[-1](x)
        return logits


class LogitsToPredicate(torch.nn.Module):
    """
    This model has inside a logits model, that is a model which compute logits for the classes given an input example x.
    The idea of this model is to keep logits and probabilities separated. The logits model returns the logits for an example,
    while this model returns the probabilities given the logits model.

    In particular, it takes as input an example x and a class label l. It applies the logits model to x to get the logits.
    Then, it applies a softmax（这里应该是笔误，应该是*sigmoid*） function to get the probabilities per classes. Finally, it returns only the probability related
    to the given class l.
    这个模型内部包含一个对数值（logits）模型，即一个给定输入示例 $x$ 计算类别对数值的模型。这个模型的理念是将对数值和概率分开。对数值模型返回一个示例的对数值，而这个模型则在给定对数值模型的情况下返回概率。
    
    具体来说，它的输入是一个示例 $x$ 和一个类别标签 $l$。它将对数值模型应用于 $x$ 以获得对数值。然后，它应用 softmax（这里应该是笔误，应该是*sigmoid*） 函数以获得各类别的概率。最后，它只返回与给定类别 $l$ 相关的概率。
    """
    def __init__(self, logits_model):
        super(LogitsToPredicate, self).__init__()
        self.logits_model = logits_model
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x, l, training=False):
        logits = self.logits_model(x, training=training)
        probs = self.sigmoid(logits) # 将 logits 通过 Sigmoid 激活函数转换为概率。
        out = torch.sum(probs * l, dim=1)
        return out

# mlp = MLP() # 原来的写法会报错
# P = ltn.Predicate(LogitsToPredicate(mlp)) # 原来的写法会报错
mlp = MLP().to(ltn.device)
P = ltn.Predicate(LogitsToPredicate(mlp).to(ltn.device))

# we define the connectives, quantifiers, and the SatAgg # 定义连接词、量词和SatAgg
Not = ltn.Connective(ltn.fuzzy_ops.NotStandard())
And = ltn.Connective(ltn.fuzzy_ops.AndProd())
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2), quantifier="f")
SatAgg = ltn.fuzzy_ops.SatAgg()

### Utils

Now, we need to define some utility classes and functions.

We define a standard PyTorch data loader, which takes as input the dataset and returns a generator of batches of data.
In particular, we need a data loader instance for training data and one for testing data.

Then, we define functions to evaluate the model performances. The model is evaluated on the test set using the following metrics:
- the satisfaction level of the knowledge base: measure the ability of LTN to satisfy the knowledge;
- the classification accuracy: this time, the accuracy is defined as $1 - HL$, where $HL$ is the average Hamming loss,
i.e. the fraction of labels predicted incorrectly, with a classification threshold of 0.5 (given an example $u$,
if the model outputs a value greater than 0.5 for class $C$ then $u$ is deemed as belonging to class $C$).

### 工具类和函数

现在，我们需要定义一些工具类和函数。

我们定义了一个标准的 PyTorch 数据加载器，它接受数据集作为输入，并返回一个数据批次生成器。
特别是，我们需要为训练数据和测试数据分别定义一个数据加载器实例。

然后，我们定义一些函数来评估模型的性能。模型使用以下指标在测试集上进行评估：
- 知识库的满足度：衡量 LTN 满足知识的能力；
- 分类准确率：这次，准确率定义为 $1 - HL$，其中 $HL$ 是平均汉明损失，即预测错误标签的比例，分类阈值为 0.5（给定一个示例 $u$，如果模型对类 $C$ 输出的值大于 0.5，那么 $u$ 被视为属于类 $C$）。

In [5]:
from sklearn.metrics import accuracy_score
import numpy as np

class DataLoader(object):
    def __init__(self,
                 data,
                 labels,
                 batch_size=1,
                 shuffle=True):
        self.data = data
        self.labels_sex = labels[0]
        self.labels_color = labels[1]
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __len__(self):
        return int(np.ceil(self.data.shape[0] / self.batch_size))

    def __iter__(self):
        n = self.data.shape[0]
        idxlist = list(range(n))
        if self.shuffle:
            np.random.shuffle(idxlist)

        for _, start_idx in enumerate(range(0, n, self.batch_size)):
            end_idx = min(start_idx + self.batch_size, n)
            data = self.data[idxlist[start_idx:end_idx]]
            labels_sex = self.labels_sex[idxlist[start_idx:end_idx]]
            labels_color = self.labels_color[idxlist[start_idx:end_idx]]

            yield data, labels_sex, labels_color


# define metrics for evaluation of the model # 定义评估模型的指标

# it computes the overall satisfaction level on the knowledge base using the given data loader (train or test) # 使用给定的数据加载器（训练或测试）计算知识库的整体满意度
def compute_sat_level(loader):
    mean_sat = 0
    for data, labels_sex, labels_color in loader:
        x = ltn.Variable("x", data)
        x_blue = ltn.Variable("x_blue", data[labels_color == 0])
        x_orange = ltn.Variable("x_orange", data[labels_color == 1])
        x_male = ltn.Variable("x_male", data[labels_sex == 2])
        x_female = ltn.Variable("x_female", data[labels_sex == 3])
        mean_sat += SatAgg(
            Forall(x_blue, P(x_blue, l_blue)),
            Forall(x_orange, P(x_orange, l_orange)),
            Forall(x_male, P(x_male, l_male)),
            Forall(x_female, P(x_female, l_female)),
            Forall(x, Not(And(P(x, l_blue), P(x, l_orange)))),
            Forall(x, Not(And(P(x, l_male), P(x, l_female))))
        )
    mean_sat /= len(loader)
    return mean_sat

# it computes the overall accuracy of the predictions of the trained model using the given data loader # 使用给定的数据加载器计算训练模型的预测的整体准确性
# (train or test) # （训练或测试）
def compute_accuracy(loader, threshold=0.5): # threshold的含义是分类阈值，即大于0.5则判定为属于该类别
    mean_accuracy = 0.0
    for data, labels_sex, labels_color in loader:
        # predictions = mlp(data).detach().numpy() # 原来的写法会报错
        predictions = mlp(data.to(ltn.device)).detach().cpu().numpy()
        labels_male = (labels_sex == 2) # 将标签转换为布尔值
        labels_female = (labels_sex == 3)
        labels_blue = (labels_color == 0)
        labels_orange = (labels_color == 1)
        onehot = np.stack([labels_blue, labels_orange, labels_male, labels_female], axis=-1).astype(np.int32) # axis=-1：在最后一个轴上堆叠。最终实现了将布尔标签转换为独热编码格式。 # astype(np.int32)：转换为 32 位整数。
        # 将预测值二值化，并转换为整数。
        predictions = predictions > threshold # 比较运算符，返回布尔值。
        predictions = predictions.astype(np.int32)
        nonzero = np.count_nonzero(onehot - predictions, axis=-1).astype(np.float32) # np.count_nonzero：计算数组中非零元素的数量。axis=-1：沿最后一个轴计算。astype(np.float32)：转换为 32 位浮点数。 # 这里计算非0的个数，因为差可能是1，也可能是-1
        multilabel_hamming_loss = nonzero / predictions.shape[-1] # predictions.shape[-1]：预测数组的最后一个维度的大小。
        mean_accuracy += np.mean(1 - multilabel_hamming_loss)
        # np.mean(1 - multilabel_hamming_loss)：这里计算是先把每一行（或者说每一个螃蟹个体）的汉明损失计算出来，然后对所有螃蟹求平均值。其实和之间用所有的非0个数加起来，一起计算汉明损失是一样的。具体在ex3.md中，## 1有解释。

    return mean_accuracy / len(loader)

# create train and test loader # 创建训练和测试加载器
train_loader = DataLoader(train_data, (train_sex_labels, train_color_labels), 64, shuffle=True) # 传递给DataLoader的第二个参数是元组，元组的第一个元素是性别标签，第二个元素是颜色标签
test_loader = DataLoader(test_data, (test_sex_labels, test_color_labels), 64, shuffle=False)

### Learning

Let us define $D$ the data set of all examples. The objective function is given by $\operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{D}}(\phi)$.

In practice, the optimizer uses the following loss function:

$\boldsymbol{L}=\left(1-\underset{\phi \in \mathcal{K}}{\operatorname{SatAgg}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{B}}(\phi)\right)$

where $B$ is a mini batch sampled from $D$.

### Querying

To illustrate the learning of constraints by LTN, we have queried three formulas
that were not explicitly part of the knowledge base, over time during learning:
- $\phi_1: \forall x (P(x, l_{blue}) \implies \lnot P(x, l_{orange}))$;
- $\phi_2: \forall x (P(x, l_{blue}) \implies P(x, l_{orange}))$;
- $\phi_3: \forall x (P(x, l_{blue}) \implies P(x, l_{male}))$.

For querying, we use $p=5$ when approximating the universal quantifiers with
`pMeanError`. A higher $p$ denotes a stricter universal quantification with a stronger
focus on outliers. We should expect $\phi_1$ to hold true (every
blue crab cannot be orange and vice-versa), and we should expect $\phi_2$ (every blue crab is also orange) and $\phi_3$
(every blue crab is male) to be false.

In the following, we define some functions computing the three formulas and the implication connective, since we need it
to define the formulas.

### 学习

我们定义 $D$ 为所有示例的数据集。目标函数由 $\operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{D}}(\phi)$ 给出。

在实际操作中，优化器使用以下损失函数：

$$
\boldsymbol{L}=\left(1-\underset{\phi \in \mathcal{K}}{\operatorname{SatAgg}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{B}}(\phi)\right)
$$

其中 $B$ 是从 $D$ 中采样的一个小批量。

### 查询

为了说明 LTN 学习约束的过程，我们在学习过程中查询了三个不显式包含在知识库中的公式：
- $\phi_1: \forall x (P(x, l_{blue}) \implies \lnot P(x, l_{orange}))$;
- $\phi_2: \forall x (P(x, l_{blue}) \implies P(x, l_{orange}))$;
- $\phi_3: \forall x (P(x, l_{blue}) \implies P(x, l_{male}))$。

在查询时，我们使用 $p=5$ 来通过 `pMeanError` 近似全称量词。较高的 $p$ 表示更严格的全称量化，更多地关注离群值。我们期望 $\phi_1$ 为真（每只蓝蟹不能是橙色的，反之亦然），并且我们期望 $\phi_2$（每只蓝蟹也是橙色的）和 $\phi_3$（每只蓝蟹是雄性的）为假。

在下文中，我们定义了一些计算三个公式和蕴涵连接词的函数，因为我们需要这些函数来定义公式。

In [6]:
Implies = ltn.Connective(ltn.fuzzy_ops.ImpliesReichenbach())

def phi1(features):
    x = ltn.Variable("x", features)
    return Forall(x, Implies(P(x, l_blue), Not(P(x, l_orange))), p=5)

def phi2(features):
    x = ltn.Variable("x", features)
    return Forall(x, Implies(P(x, l_blue), P(x, l_orange)), p=5)

def phi3(features):
    x = ltn.Variable("x", features)
    return Forall(x, Implies(P(x, l_blue), P(x, l_male)), p=5)

# it computes the satisfaction level of a formula phi using the given data loader (train or test) # 使用给定的数据加载器（训练或测试）计算公式phi的满意度
def compute_sat_level_phi(loader, phi):
    mean_sat = 0
    for features, _, _ in loader:
        mean_sat += phi(features).value
    mean_sat /= len(loader)
    return mean_sat # 计算得到对于这个dataloader中的数据，对于指定的公式phi，每个batch的平均的满意度

In the following, we learn our LTN in the multi-class multi-label classification task using the satisfaction of the knowledge base as
an objective. In other words, we want to learn the parameters $\theta$ of binary predicate $P$ in such a way the three
axioms in the knowledge base are maximally satisfied. We train our model for 500 epochs and use the `Adam` optimizer.

下面，我们在多类别多标签分类任务中使用知识库的满足度作为目标来学习我们的逻辑张量网络（LTN）。换句话说，我们希望以使知识库中的三个公理最大程度上得到满足的方式来学习二元谓词 $P$ 的参数 $\theta$。我们训练模型500个epoch，并使用 `Adam` 优化器。

In [7]:
optimizer = torch.optim.Adam(P.parameters(), lr=0.001)

for epoch in range(500):
    train_loss = 0.0
    for batch_idx, (data, labels_sex, labels_color) in enumerate(train_loader):
        optimizer.zero_grad()
        # we ground the variables with current batch data # 我们使用当前批次数据对变量进行实例化
        x = ltn.Variable("x", data)
        x_blue = ltn.Variable("x_blue", data[labels_color == 0])
        x_orange = ltn.Variable("x_orange", data[labels_color == 1])
        x_male = ltn.Variable("x_male", data[labels_sex == 2])
        x_female = ltn.Variable("x_female", data[labels_sex == 3])
        sat_agg = SatAgg(
            Forall(x_blue, P(x_blue, l_blue)),
            Forall(x_orange, P(x_orange, l_orange)),
            Forall(x_male, P(x_male, l_male)),
            Forall(x_female, P(x_female, l_female)),
            Forall(x, Not(And(P(x, l_blue), P(x, l_orange)))),
            Forall(x, Not(And(P(x, l_male), P(x, l_female))))
        )
        loss = 1. - sat_agg
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_loss = train_loss / len(train_loader)

    # we print metrics every 20 epochs of training # 每20个epoch打印指标
    if epoch % 20 == 0:
        print(" epoch %d | loss %.4f | Train Sat %.3f | Test Sat %.3f | Train Acc %.3f | Test Acc %.3f | "
                        "Test Sat Phi 1 %.3f | Test Sat Phi 2 %.3f | Test Sat Phi 3 %.3f " %
              (epoch, train_loss, compute_sat_level(train_loader),
                        compute_sat_level(test_loader),
                        compute_accuracy(train_loader), compute_accuracy(test_loader),
                        compute_sat_level_phi(test_loader, phi1), compute_sat_level_phi(test_loader, phi2),
                        compute_sat_level_phi(test_loader, phi3)))

 epoch 0 | loss 0.3986 | Train Sat 0.605 | Test Sat 0.606 | Train Acc 0.478 | Test Acc 0.438 | Test Sat Phi 1 0.512 | Test Sat Phi 2 0.708 | Test Sat Phi 3 0.685 
 epoch 20 | loss 0.3702 | Train Sat 0.630 | Test Sat 0.631 | Train Acc 0.531 | Test Acc 0.506 | Test Sat Phi 1 0.541 | Test Sat Phi 2 0.744 | Test Sat Phi 3 0.753 
 epoch 40 | loss 0.3491 | Train Sat 0.652 | Test Sat 0.651 | Train Acc 0.798 | Test Acc 0.775 | Test Sat Phi 1 0.587 | Test Sat Phi 2 0.755 | Test Sat Phi 3 0.775 
 epoch 60 | loss 0.2753 | Train Sat 0.728 | Test Sat 0.718 | Train Acc 0.931 | Test Acc 0.913 | Test Sat Phi 1 0.668 | Test Sat Phi 2 0.576 | Test Sat Phi 3 0.730 
 epoch 80 | loss 0.2107 | Train Sat 0.796 | Test Sat 0.776 | Train Acc 0.948 | Test Acc 0.913 | Test Sat Phi 1 0.743 | Test Sat Phi 2 0.405 | Test Sat Phi 3 0.624 
 epoch 100 | loss 0.1376 | Train Sat 0.852 | Test Sat 0.849 | Train Acc 0.986 | Test Acc 0.975 | Test Sat Phi 1 0.840 | Test Sat Phi 2 0.252 | Test Sat Phi 3 0.469 
 epoch 120 | los

Notice that variables $x_{blue}$, $x_{orange}$, $x_{male}$, and $x_{female}$ are grounded batch by batch with new data
arriving from the data loader. This is exactly what
we mean with $\mathcal{G}_{x \leftarrow \boldsymbol{B}}(\phi(x))$, where $B$ is a mini-batch sampled by the data loader.

Notice also that `SatAgg` takes as input the four axioms and returns one truth value which can be interpreted as the satisfaction
level of the knowledge base.

Note that after 100 epochs the test accuracy is around 1. This shows the power of LTN in learning
the multi-class multi-label classification task only using the satisfaction of a knowledge base as an objective.

At the beginning of the training, the truth values of $\phi_1$, $\phi_2$, and $\phi_3$ are non-informative. Instead, during
training, one can see a trend towards the satisfaction of $\phi_1$, and an opposite trend for $\phi_2$ and $\phi_3$, as expected.
This shows the ability of LTN to query and reason on never-seen formulas.


当然，以下是您提供的内容的中文翻译：

请注意，变量 $ x_{blue} $, $ x_{orange} $, $ x_{male} $, 和 $ x_{female} $ 是通过数据加载器逐批接收新数据进行基础化的。这正是我们所指的 $\mathcal{G}_{x \leftarrow \boldsymbol{B}}(\phi(x))$，其中 $B$ 是由数据加载器采样的小批量数据。

还请注意，`SatAgg` 接受四个公理作为输入并返回一个真值，该真值可以解释为知识库的满意度水平。

请注意，在经过 100 轮训练后，测试准确率约为 1。这显示了 LTN 在仅使用知识库满意度作为目标来学习多类别多标签分类任务中的强大能力。

在训练开始时，$\phi_1$, $\phi_2$, 和 $\phi_3$ 的真值是无信息的。相反，在训练过程中，可以看到 $\phi_1$ 的满意度呈上升趋势，而 $\phi_2$ 和 $\phi_3$ 的满意度则呈下降趋势，正如预期的那样。这显示了 LTN 查询和推理从未见过的公式的能力。

我们在训练的时候，使用了一些公式，我们后来随着训练的不断推进，在这个过程中，不断去查询（query）一些新的公式（phi 1，2，3），这些公式模型是没有见过的，但是我们训练使得模型学习到了知识库中已有的公式，根据这些已有的公式大概是可以推导出这些新的公式应该是真还是假的，所以我们可以看到，随着训练的不断推进，这些新的公式的满意度也在不断变化，该变大的在变大（即该公式在知识库的情况下，应该是真的），该变小的在变小（即该公式在知识库的情况下，应该是假的）。这说明了LTN的强大能力。