## Regression

Another important problem in Machine Learning is regression, where a relationship is estimated between one independent variable $X$ and a continuous dependent variable $Y$.
The essence of regression is, therefore, to approximate a function $f(x) = y$ by a function $f^*$, given examples $(x_i, y_i)$ such that $f(x_i) = y_i$.

In LTN one can model a regression task by defining $f^*$ as a learnable function whose parameter values $\theta$ are
constrained by data. Additionally, a regression task requires a notion of *equality*. We therefore define the
predicate $Eq$ as a smooth version of the symbol $=$ in order to turn the constraint $f(x_i) = y_i$ into a smooth
optimization problem.

In this example, we explore regression on a real estate dataset with 414 examples, each described in
terms of 6 real-valued features:
- the transaction date (converted to a float);
- the age of the house;
- the distance to the nearest station;
- the number of convenience stores in the vicinity;
- the latitude and longitude coordinates.

The model has to predict the house price from these input features.

For this specific task, LTN uses the following language and grounding:

**Domains:**
- $samples$, denoting the houses and their features;
- $prices$, denoting the house prices.

**Variables:**
- $x$ for the samples;
- $y$ for the prices;
- $D(x) = samples$;
- $D(y) = prices$.

**Functions:**
- $f^*(x)$: the regression function that has to be learned;
- $D_{in}(f^*) = samples$;
- $D_{out}(f^*) = prices$, where $D_{out}(.)$ is a function which returns the domain of the output of a given logical
function.

**Predicates:**
- $Eq(y_1, y_2)$: a smooth equality predicate which measures how similar $y_1$ and $y_2$ are;
- $D_{in}(Eq) = prices, prices$.

**Axioms:**

- $\forall Diag(x,y) \text{ } Eq(f^*(x), y)$: the output of $f^*$ should be equal to the ground truth $y$ for each
example $x$ given in input.

Notice again the use of $Diag$: when $x$ and $y$ are grounded onto sequences of values, $Diag$ enforces a
one-to-one correspondence between the two sequences. In other words, we aggregate over pairs of corresponding samples
and prices, rather than over every possible combination of samples and prices.
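The NumPy sketch below makes the effect of $Diag$ concrete. It does not use LTNtorch and runs on made-up values. Pairing row $i$ with row $i$ gives $m$ truth values. Comparing every prediction with every price gives an $m \times m$ grid, and the diagonal of that grid is exactly the $Diag$ result. The helper `smooth_eq` is hypothetical. It implements the smooth equality grounded below, with $\alpha = 0.05$.

```python
import numpy as np

def smooth_eq(u, v, alpha=0.05):
    """Hypothetical helper: exp(-alpha * Euclidean distance) computed over the last axis."""
    return np.exp(-alpha * np.sqrt(np.sum((u - v) ** 2, axis=-1)))

# made-up predictions f*(x_i) and prices y_i for m = 3 houses, shape (m, 1)
preds = np.array([[30.0], [42.0], [25.0]])
prices = np.array([[30.0], [40.0], [50.0]])

# Diag(x, y): the i-th prediction is compared only with the i-th price -> shape (m,)
diag_truths = smooth_eq(preds, prices)

# without Diag: every prediction is compared with every price -> shape (m, m)
all_truths = smooth_eq(preds[:, None, :], prices[None, :, :])

print(diag_truths.shape, all_truths.shape)            # (3,) (3, 3)
print(np.allclose(np.diag(all_truths), diag_truths))  # True
```

Without $Diag$, the quantifier would aggregate $m^2$ truth values. Most of them compare a house with another house's price, which is not what the axiom is meant to express.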


**Grounding:**
- $\mathcal{G}(samples)=\mathbb{R}^{6}$, samples are described by 6 features;
- $\mathcal{G}(prices)=\mathbb{R}$;
- $\mathcal{G}(x) \in \mathbb{R}^{m \times 6}, \mathcal{G}(y) \in \mathbb{R}^{m \times 1}$. Notice that this specification
refers to the same number $m$ of examples for $x$ and $y$ due to the above one-to-one correspondence
obtained with the use of $Diag$;
- $\mathcal{G}(Eq(\mathbf{u}, \mathbf{v}))=\exp \left(-\alpha \sqrt{\sum_{j}\left(u_{j}-v_{j}\right)^{2}}\right)$,
where the hyper-parameter $\alpha$ is a real number that controls how strict the smooth equality is. In this example, we use
$\alpha = 0.05$. Intuitively, the smooth equality is $\exp(-\alpha\, d(\mathbf{u}, \mathbf{v}))$, where
$d(\mathbf{u}, \mathbf{v})$ is the Euclidean distance between $\mathbf{u}$ and $\mathbf{v}$. It equals 1 when the
distance is zero and decreases exponentially towards 0 as the distance grows. In our example, $\mathbf{u}$ is the
prediction $f^*(x_i)$ for a sample and $\mathbf{v}$ is the corresponding ground-truth price $y_i$. The index $j$ runs
over the components of these vectors. Since prices are one-dimensional, $d$ reduces to $|f^*(x_i) - y_i|$. Our
objective is to maximize the truth degree of this predicate;
- $\mathcal{G}(f^*(x) \mid \theta) = \operatorname{MLP}_{\theta}(x)$, where $\operatorname{MLP}_{\theta}$
is a Multi-Layer Perceptron whose output layer is a single neuron with linear activation, producing the price prediction.
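As a quick numeric check of the smooth equality with $\alpha = 0.05$, the sketch below evaluates $\exp(-\alpha\, d)$ for a few prediction errors. It is plain Python with a hypothetical helper `smooth_eq`, not the LTNtorch predicate defined later. The truth value halves every $\ln 2 / \alpha \approx 13.9$ price units, so a small $\alpha$ makes the predicate tolerant of moderate errors.

```python
import math

ALPHA = 0.05  # strictness of the smooth equality used in this example

def smooth_eq(u, v, alpha=ALPHA):
    """Hypothetical helper: exp(-alpha * ||u - v||_2) for two equal-length vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return math.exp(-alpha * dist)

# one-dimensional "prices": prediction vs. ground truth of 30.0
for error in [0.0, 1.0, 10.0, 50.0]:
    print(f"|prediction - truth| = {error:5.1f} -> Eq = {smooth_eq([30.0 + error], [30.0]):.4f}")
```

This prints truth values of 1.0000, 0.9512, 0.6065 and 0.0821. A larger $\alpha$ would push the same errors much closer to 0.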

### Dataset

Now, let's import the dataset.

The real estate dataset has 414 examples with 6 features each. We subdivide the dataset into 330 training examples and
84 test examples.

```python
import torch
import pandas as pd

data = pd.read_csv("datasets/real-estate.csv")
data = data.sample(frac=1)  # shuffle the rows before splitting

# the 6 input features of each house
x = torch.tensor(data[['X1 transaction date', 'X2 house age',
                     'X3 distance to the nearest MRT station',
                     'X4 number of convenience stores', 'X5 latitude', 'X6 longitude']].to_numpy()).float()

# target: house price per unit area, shape (414, 1)
y = torch.tensor(data[['Y house price of unit area']].to_numpy()).float()

# first 330 examples for training, remaining 84 for testing
x_train, y_train = x[:330], y[:330]
x_test, y_test = x[330:], y[330:]
```

### LTN setting

In order to define our knowledge base (axioms), we need to define the function $f^*$, the predicate $Eq$,
the universal quantifier, and the `SatAgg` operator.

For the universal quantifier, we use the stable product configuration introduced in the earlier tutorials, i.e., the
`pMeanError` aggregator with $p = 2$.

For the function $f^*$, we use a simple MLP with two hidden layers of 8 neurons each and ELU activations.

For the predicate $Eq$, we use a lambda function which implements the *grounding* defined above.

`SatAgg` is defined using the `pMeanError` aggregator.
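For reference, the `pMeanError` aggregator computes
$A_{pME}(a_1, \dots, a_n) = 1 - \left(\frac{1}{n}\sum_{i=1}^{n}(1 - a_i)^p\right)^{1/p}$.
Below is a minimal NumPy sketch of that formula with $p = 2$. The function name `pmean_error` is hypothetical, and the sketch omits the small epsilon-based stabilization that LTNtorch's stable configuration applies. For $p \geq 1$ the result never exceeds the plain average of the truth values, and as $p$ grows it approaches the minimum, i.e., a stricter "for all".

```python
import numpy as np

def pmean_error(truths, p=2):
    """Hypothetical sketch of the pMeanError aggregator (no stabilization):
    1 - (mean((1 - a_i) ** p)) ** (1 / p)."""
    a = np.asarray(truths, dtype=float)
    return 1.0 - np.mean((1.0 - a) ** p) ** (1.0 / p)

print(pmean_error([1.0, 1.0, 1.0]))       # every instance fully satisfied
print(pmean_error([0.5, 0.5]))            # uniform half-truth
print(pmean_error([1.0, 1.0, 1.0, 0.0]))  # one instance completely violated
print(np.mean([1.0, 1.0, 1.0, 0.0]))      # plain average, for comparison
```

The third call returns 0.5, while the plain average of the same values is 0.75. A single badly satisfied example therefore weighs more heavily on $\forall$ than it would under an arithmetic mean.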

```python
import torch
import ltn

# function f*: an MLP that maps the 6 house features to a price
class MLP(torch.nn.Module):
    """
    This model returns the predicted price of the house given as input. The output layer is linear
    since the model is applied to a regression problem.
    """
    def __init__(self, layer_sizes=(6, 8, 8, 1)):
        super().__init__()
        self.elu = torch.nn.ELU()
        self.linear_layers = torch.nn.ModuleList([torch.nn.Linear(layer_sizes[i - 1], layer_sizes[i])
                                                  for i in range(1, len(layer_sizes))])

    def forward(self, x):
        """
        Forward pass of the neural network for our regression task.

        :param x: the features of the example
        :return: prediction for example x
        """
        # hidden layers with ELU activations
        for layer in self.linear_layers[:-1]:
            x = self.elu(layer(x))
        # linear output neuron: the predicted price
        out = self.linear_layers[-1](x)
        return out

# wrap the MLP as an LTN function and move it to the device used by LTN
f = ltn.Function(MLP().to(ltn.device))

# Equality predicate (not trainable): exp(-alpha * Euclidean distance between u and v)
alpha = 0.05
Eq = ltn.Predicate(func=lambda u, v: torch.exp(
    -alpha * torch.sqrt(torch.sum(torch.square(u - v), dim=1)))
)

# universal quantifier (pMeanError aggregator, p=2) and SatAgg operator
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2), quantifier="f")
SatAgg = ltn.fuzzy_ops.SatAgg()
```

### Utils

Now, we need to define some utility classes and functions.

We define a simple data loader in the style of PyTorch's, which takes the dataset as input and yields mini-batches of data.
In particular, we need one data loader instance for the training data and one for the test data.
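The loaders below use a batch size of 64. With that size, the 330 training examples are split into 6 batches and the 84 test examples into 2, and the last batch of each set is smaller. The pure-Python sketch below reproduces the index arithmetic of such a loader. `make_batches` is a hypothetical helper, not part of PyTorch or LTNtorch.

```python
import math
import random

def make_batches(n, batch_size, shuffle=False, seed=None):
    """Hypothetical helper: split indices 0..n-1 into consecutive mini-batches,
    optionally after shuffling them; the last batch may be smaller."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    return [idx[start:start + batch_size] for start in range(0, n, batch_size)]

train_batches = make_batches(330, 64, shuffle=True, seed=0)
test_batches = make_batches(84, 64)

print(len(train_batches), [len(b) for b in train_batches])  # 6 batches: five of 64, one of 10
print(len(test_batches), [len(b) for b in test_batches])    # 2 batches: 64 and 20
# the number of batches equals ceil(n / batch_size)
print(len(train_batches) == math.ceil(330 / 64))
```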

Then, we define functions to evaluate the model's performance. The model is evaluated on both the training and the test
sets using the following metrics:
- the satisfaction level of the knowledge base, which measures how well LTN satisfies the knowledge;
- the RMSE (Root Mean Squared Error), which measures the quality of the predictions.
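The RMSE is $\sqrt{\frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2}$. The `compute_rmse` function below averages the RMSE of each batch. This is close to the RMSE over the whole set but not, in general, equal to it, especially when the last batch is smaller. The small NumPy sketch below shows the difference on made-up errors. The helper `rmse` is hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# made-up ground truths and predictions, split into two batches of unequal size
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([11.0, 21.0, 33.0])  # errors: 1, 1, 3
batches = [slice(0, 2), slice(2, 3)]

per_batch = [rmse(y_true[b], y_pred[b]) for b in batches]  # [1.0, 3.0]
mean_of_batches = float(np.mean(per_batch))                # 2.0
overall = rmse(y_true, y_pred)                             # sqrt(11 / 3)
print(mean_of_batches, round(overall, 3))
```

This prints `2.0 1.915`. The per-batch average gives the 1-element batch as much weight as the 2-element one, which is why it overestimates the overall RMSE here.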

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# a simple custom data loader (not torch.utils.data.DataLoader) that yields mini-batches of (x, y)
class DataLoader:
    def __init__(self,
                 x,
                 y,
                 batch_size=1,
                 shuffle=True):
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __len__(self):
        # number of batches, counting a final partial batch
        return int(np.ceil(self.x.shape[0] / self.batch_size))

    def __iter__(self):
        n = self.x.shape[0]
        idxlist = list(range(n))
        if self.shuffle:
            np.random.shuffle(idxlist)

        for start_idx in range(0, n, self.batch_size):
            end_idx = min(start_idx + self.batch_size, n)
            batch_idx = idxlist[start_idx:end_idx]
            yield self.x[batch_idx], self.y[batch_idx]

# metrics used to evaluate the model

# mean satisfaction level of the knowledge base over the batches of the given loader (train or test)
def compute_sat_level(loader):
    mean_sat = 0.0
    with torch.no_grad():  # evaluation only: no computation graph is needed
        for x_data, y_data in loader:
            x = ltn.Variable("x", x_data)
            y = ltn.Variable("y", y_data)
            mean_sat += Forall(ltn.diag(x, y), Eq(f(x), y)).value.item()
    return mean_sat / len(loader)

# mean of the per-batch RMSEs between predictions and ground truth for the given loader (train or test)
def compute_rmse(loader):
    mean_rmse = 0.0
    with torch.no_grad():
        for x, y in loader:
            # move inputs to the LTN device, then bring predictions back to the CPU for NumPy
            predictions = f.model(x.to(ltn.device)).cpu().numpy()
            # square root of the MSE, instead of `squared=False`, which recent scikit-learn releases deprecate
            mean_rmse += np.sqrt(mean_squared_error(y.cpu().numpy(), predictions))
    return mean_rmse / len(loader)

# create train and test loaders
train_loader = DataLoader(x_train, y_train, 64, shuffle=True)
test_loader = DataLoader(x_test, y_test, 64, shuffle=False)
```

### Learning

Let $D$ denote the training data set. Given the knowledge base $\mathcal{K}=\{\forall Diag(x,y) \text{ } Eq(f^*(x), y)\}$,
the objective function is $\operatorname{SatAgg}_{\phi \in \mathcal{K}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{D}}(\phi)$.

In practice, the optimizer minimizes the following loss function:

$$
\boldsymbol{L}=\left(1-\underset{\phi \in \mathcal{K}}{\operatorname{SatAgg}} \mathcal{G}_{\boldsymbol{\theta}, x \leftarrow \boldsymbol{B}}(\phi)\right)
$$

where $B$ is a mini-batch sampled from $D$.
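Here is a worked example of this loss on a tiny batch. The NumPy sketch below chains the steps used in this section:

- the smooth equality for each $Diag$ pair ($\alpha = 0.05$);
- the `pMeanError` aggregation for $\forall$ ($p = 2$);
- $\boldsymbol{L} = 1 - \text{sat}$.

Since $\mathcal{K}$ contains a single axiom, the aggregate over $\mathcal{K}$ equals that axiom's truth value. The helper names are hypothetical and the values are made up. LTNtorch's stabilization is omitted, so the numbers only approximate what the library would return.

```python
import numpy as np

ALPHA, P = 0.05, 2  # smooth-equality strictness and pMeanError exponent used in this example

def smooth_eq(preds, targets, alpha=ALPHA):
    """Hypothetical helper: exp(-alpha * ||u - v||) for each Diag pair (row i with row i)."""
    return np.exp(-alpha * np.sqrt(np.sum((preds - targets) ** 2, axis=1)))

def forall_pmean_error(truths, p=P):
    """Hypothetical helper: pMeanError aggregation, 1 - mean((1 - a) ** p) ** (1 / p)."""
    return 1.0 - np.mean((1.0 - truths) ** p) ** (1.0 / p)

# a made-up mini-batch B of two houses: predictions f*(x_i) and true prices y_i, shape (2, 1)
preds = np.array([[30.0], [40.0]])
targets = np.array([[30.0], [50.0]])

truths = smooth_eq(preds, targets)  # [1.0, exp(-0.5)]
sat = forall_pmean_error(truths)    # truth value of the single axiom
loss = 1.0 - sat
print(round(float(sat), 3), round(float(loss), 3))
```

The loss comes out at about 0.278. With $p = 2$, $1 - \text{sat}$ is exactly the root mean square of $1 - Eq$ over the batch, so the loss plays the role of an RMSE measured in truth values rather than in prices.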

In the following, we train our LTN on the regression task using the satisfaction of the knowledge base as
the objective. In other words, we learn the parameters $\theta$ of the function $f^*$ so that the only
axiom in the knowledge base is maximally satisfied. We train the model for 500 epochs with the `Adam` optimizer (learning rate 0.0005).

```python
optimizer = torch.optim.Adam(f.parameters(), lr=0.0005)

for epoch in range(500):
    train_loss = 0.0
    for batch_idx, (x_data, y_data) in enumerate(train_loader):
        optimizer.zero_grad()
        # ground the variables with the current batch of data
        x = ltn.Variable("x", x_data)  # samples
        y = ltn.Variable("y", y_data)  # ground truths
        # truth value of the single axiom: forall Diag(x, y) Eq(f(x), y)
        sat_agg = Forall(ltn.diag(x, y), Eq(f(x), y)).value
        loss = 1. - sat_agg
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    train_loss = train_loss / len(train_loader)

    # print metrics every 50 epochs of training
    if epoch % 50 == 0:
        print(" epoch %d | loss %.4f | Train Sat %.3f | Test Sat %.3f | Train RMSE %.3f | Test RMSE %.3f " %
              (epoch, train_loss, compute_sat_level(train_loader), compute_sat_level(test_loader),
                    compute_rmse(train_loader), compute_rmse(test_loader)))
```



```
epoch 0 | loss 0.9987 | Train Sat 0.002 | Test Sat 0.002 | Train RMSE 140.747 | Test RMSE 144.584
epoch 50 | loss 0.2989 | Train Sat 0.701 | Test Sat 0.733 | Train RMSE 9.098 | Test RMSE 7.128
epoch 100 | loss 0.2860 | Train Sat 0.696 | Test Sat 0.743 | Train RMSE 9.362 | Test RMSE 6.900
epoch 150 | loss 0.2808 | Train Sat 0.715 | Test Sat 0.755 | Train RMSE 8.395 | Test RMSE 6.551
epoch 200 | loss 0.2990 | Train Sat 0.716 | Test Sat 0.755 | Train RMSE 8.853 | Test RMSE 6.614
epoch 250 | loss 0.2906 | Train Sat 0.715 | Test Sat 0.751 | Train RMSE 8.678 | Test RMSE 6.714
epoch 300 | loss 0.2977 | Train Sat 0.721 | Test Sat 0.759 | Train RMSE 9.691 | Test RMSE 6.490
epoch 350 | loss 0.2683 | Train Sat 0.720 | Test Sat 0.761 | Train RMSE 8.639 | Test RMSE 6.415
epoch 400 | loss 0.2878 | Train Sat 0.734 | Test Sat 0.756 | Train RMSE 8.167 | Test RMSE 6.599
epoch 450 | loss 0.2709 | Train Sat 0.728 | Test Sat 0.755 | Train RMSE 8.899 | Test RMSE 6.653
```


Notice that the variables $x$ and $y$ are grounded batch by batch with new data coming from the data loader. This is exactly what
we mean by $\mathcal{G}_{x \leftarrow \boldsymbol{B}}(\phi(x))$, where $B$ is a mini-batch sampled by the data loader.

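Notice also that the knowledge base contains a single axiom, so `SatAgg` would simply return the truth value of that
axiom. This is why the training loop and `compute_sat_level` use `Forall(ltn.diag(x, y), Eq(f(x), y)).value` directly
instead of calling `SatAgg`.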
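Why `SatAgg` returns the axiom's own truth value follows from the `pMeanError` formula. For a single truth value $a \in [0, 1]$, $1 - \left((1 - a)^p\right)^{1/p} = 1 - (1 - a) = a$. The short check below applies this to arbitrary truth values. It uses a hypothetical `sat_agg` function, a plain-Python sketch of the formula rather than the LTNtorch `SatAgg` class.

```python
def sat_agg(axiom_truths, p=2):
    """Hypothetical sketch of SatAgg with the pMeanError aggregator (no stabilization)."""
    n = len(axiom_truths)
    return 1.0 - (sum((1.0 - a) ** p for a in axiom_truths) / n) ** (1.0 / p)

# with a knowledge base of one axiom, the aggregate equals that axiom's truth value
for a in [0.0, 0.25, 0.73, 1.0]:
    assert abs(sat_agg([a]) - a) < 1e-12

# with several axioms, the aggregate lies below their plain average
print(sat_agg([0.9, 0.6]))
```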

Note that after 300 epochs the test RMSE is around 6.5, whereas after the first epoch it was around 145. The exact
values vary between runs because of the random shuffle and initialization. This shows that LTN can learn the regression
task using only the satisfaction of the knowledge base as the objective.

