In [2]:
import torch

In [4]:
x = torch.tensor([float("nan"), float("inf"), -float("inf"), 3.14])
torch.nan_to_num(x)

tensor([ 0.0000e+00,  3.4028e+38, -3.4028e+38,  3.1400e+00])
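
The defaults shown above replace NaN with 0 and the infinities with the largest finite values of the dtype. As a minimal sketch, the replacements can be overridden through the `nan`, `posinf`, and `neginf` keyword arguments; the values `1e6` and `-1e6` below are arbitrary choices for illustration.

```python
import torch

x = torch.tensor([float("nan"), float("inf"), -float("inf"), 3.14])

# Replace NaN with 0 and map the infinities to chosen finite values
# (1e6 / -1e6 are illustrative, not a recommendation).
cleaned = torch.nan_to_num(x, nan=0.0, posinf=1e6, neginf=-1e6)
print(cleaned)
```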

In [10]:
y = torch.rand(100)
yhat = torch.rand(100)

### Scale-dependent Errors
### 1. **Mean Absolute Error (MAE)**:
#### Formula:
$ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
import torch

def mae(y, yhat):
    return torch.sum(torch.abs(y - yhat)) / len(y)
```
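
A small worked example can make the formula concrete. The tensors below are made-up values, and the comparison against `torch.nn.functional.l1_loss` (which uses mean reduction by default) is only a sanity check.

```python
import torch
import torch.nn.functional as F

def mae(y, yhat):
    return torch.sum(torch.abs(y - yhat)) / len(y)

y = torch.tensor([3.0, 5.0, 2.0, 7.0])
yhat = torch.tensor([2.0, 5.0, 4.0, 8.0])

# Absolute errors are 1, 0, 2, 1, so the mean is 4 / 4 = 1.0
result = mae(y, yhat)
builtin = F.l1_loss(yhat, y)  # built-in equivalent with mean reduction
print(result, builtin)
```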

---

### 2. **Mean Squared Error (MSE)**:
#### Formula:
$ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
def mse(y, yhat):
    return torch.sum(torch.square(y - yhat)) / len(y)
```
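
Because the errors are squared, MSE penalizes one large miss more than several small ones. The sketch below uses two made-up forecasts with the same MAE (1.0) but different MSE.

```python
import torch
import torch.nn.functional as F

def mse(y, yhat):
    return torch.sum(torch.square(y - yhat)) / len(y)

y = torch.zeros(4)
spread = torch.tensor([1.0, 1.0, 1.0, 1.0])  # four small errors
spike = torch.tensor([0.0, 0.0, 0.0, 4.0])   # one large error

mse_spread = mse(y, spread)  # (1 + 1 + 1 + 1) / 4 = 1.0
mse_spike = mse(y, spike)    # (0 + 0 + 0 + 16) / 4 = 4.0
print(mse_spread, mse_spike)
```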

---

### 3. **Root Mean Squared Error (RMSE)**:
#### Formula:
$ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
def rmse(y, yhat):
    return torch.sqrt(torch.sum(torch.square(y - yhat)) / len(y))
```
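
RMSE is expressed in the same units as $ y $, which is what makes it scale-dependent. As a quick illustration with made-up values, rescaling both series by 100 rescales the RMSE by the same factor.

```python
import torch

def rmse(y, yhat):
    return torch.sqrt(torch.sum(torch.square(y - yhat)) / len(y))

y = torch.tensor([3.0, 5.0, 2.0, 7.0])
yhat = torch.tensor([2.0, 5.0, 4.0, 8.0])

base = rmse(y, yhat)                  # sqrt(6 / 4) = sqrt(1.5)
rescaled = rmse(100 * y, 100 * yhat)  # same data in units 100 times smaller
print(base, rescaled)
```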

---


### Percentage Errors
### 4. **Mean Absolute Percentage Error (MAPE)**:
#### Formula:
$ \text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100 $
Where:
- $ y_i $ is the actual value (must be non-zero, since it appears in the denominator),
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
def mape(y, yhat):
    return (torch.sum(torch.abs((y - yhat) / y)) / len(y)) * 100
```
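
MAPE is undefined when any actual value is zero. The sketch below shows the failure and one possible guard: `mape_safe` is a hypothetical helper, and clamping the denominator with `eps` is an assumption made for illustration, not a standard definition.

```python
import torch

def mape(y, yhat):
    return (torch.sum(torch.abs((y - yhat) / y)) / len(y)) * 100

def mape_safe(y, yhat, eps=1e-8):
    # Hypothetical variant: clamp |y| away from zero before dividing.
    denom = torch.clamp(torch.abs(y), min=eps)
    return torch.mean(torch.abs(y - yhat) / denom) * 100

y = torch.tensor([0.0, 2.0, 4.0])
yhat = torch.tensor([1.0, 2.0, 3.0])

raw = mape(y, yhat)           # inf, because of the division by y = 0
guarded = mape_safe(y, yhat)  # finite, but dominated by the clamped term
print(raw, guarded)
```

The guarded value is finite but still very large, so dropping or masking zero targets may be a better fit for some data.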

---

### 5. **Symmetric Mean Absolute Percentage Error (SMAPE)**:
#### Formula:
$ \text{SMAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{2 \left| y_i - \hat{y}_i \right|}{\left| y_i \right| + \left| \hat{y}_i \right|} $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
def smape(y, yhat):
    return torch.sum(torch.abs(y - yhat) / ((torch.abs(y) + torch.abs(yhat)) / 2)) / len(y)
```
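
With this definition each term lies between 0 and 2, and a term becomes 0/0 when both the actual and predicted values are zero. The sketch below uses a hypothetical `smape_safe` helper that treats such terms as zero error via `torch.nan_to_num`; that convention is an assumption.

```python
import torch

def smape_safe(y, yhat):
    # Hypothetical variant: treat 0/0 terms (y = yhat = 0) as zero error.
    ratio = 2 * torch.abs(y - yhat) / (torch.abs(y) + torch.abs(yhat))
    return torch.mean(torch.nan_to_num(ratio, nan=0.0))

y = torch.tensor([0.0, 1.0, 2.0])
yhat = torch.tensor([0.0, -1.0, 2.0])

# Terms: 0/0 -> 0, 2 * 2 / 2 = 2 (opposite signs hit the upper bound), 0 / 4 = 0
result = smape_safe(y, yhat)  # (0 + 2 + 0) / 3
print(result)
```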




### Scale-independent Errors
### 6. **Mean Absolute Scaled Error (MASE)**:
#### Formula:
$ \text{MASE} = \frac{\frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{\frac{1}{n-1} \sum_{i=2}^{n} \left| y_i - y_{i-1} \right|} $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ n $ is the number of data points,
- The denominator is the MAE of the naive one-step forecast $ \hat{y}_i = y_{i-1} $, computed on the same series.

#### Torch Implementation:
```python
def mase(y, yhat):
    # Compute the MAE of the model
    mae_model = torch.sum(torch.abs(y - yhat)) / len(y)
    
    # Compute the MAE of the naive model (y_t = y_(t-1))
    naive_error = torch.sum(torch.abs(y[1:] - y[:-1])) / (len(y) - 1)
    
    # Compute MASE
    return mae_model / naive_error
```
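
A MASE below 1 means the forecast beats the naive one-step forecast on average. A common variant computes the scaling term on a separate training series instead of the evaluated one. The sketch below assumes that setup; `mase_insample` and `y_train` are hypothetical names.

```python
import torch

def mase_insample(y, yhat, y_train):
    # Hypothetical variant: scale by the naive one-step MAE on the training series.
    scale = torch.mean(torch.abs(y_train[1:] - y_train[:-1]))
    return torch.mean(torch.abs(y - yhat)) / scale

y_train = torch.tensor([1.0, 3.0, 2.0, 4.0, 3.0])  # naive errors 2, 1, 2, 1 -> MAE 1.5
y = torch.tensor([5.0, 4.0])
yhat = torch.tensor([4.5, 4.5])                    # forecast MAE 0.5

result = mase_insample(y, yhat, y_train)  # 0.5 / 1.5, below 1: better than naive
print(result)
```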

---

### 7. **Relative Mean Squared Error (relMSE)**:
#### Formula:
$ \text{relMSE} = \frac{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} $
Where:
- $ y_i $ is the actual value,
- $ \hat{y}_i $ is the predicted value,
- $ \bar{y} $ is the mean of the actual values,
- $ n $ is the number of data points.

#### Torch Implementation:
```python
def rel_mse(y, yhat):
    # Compute the MSE of the model
    mse_model = torch.sum(torch.square(y - yhat)) / len(y)
    
    # Compute the variance of the true values (y)
    variance_y = torch.sum(torch.square(y - torch.mean(y))) / len(y)
    
    # Compute relMSE
    return mse_model / variance_y
```
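
With the variance of $ y $ as the denominator, relMSE equals $ 1 - R^2 $: a forecast that always predicts the mean scores 1, and a perfect forecast scores 0. A quick check on made-up values:

```python
import torch

def rel_mse(y, yhat):
    mse_model = torch.sum(torch.square(y - yhat)) / len(y)
    variance_y = torch.sum(torch.square(y - torch.mean(y))) / len(y)
    return mse_model / variance_y

y = torch.tensor([1.0, 2.0, 3.0, 4.0])

mean_forecast = torch.full_like(y, y.mean().item())
baseline = rel_mse(y, mean_forecast)  # 1.0: no better than predicting the mean
perfect = rel_mse(y, y)               # 0.0: perfect forecast
print(baseline, perfect)
```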



### Probabilistic Errors
### 1. **Quantile Loss**:
#### Formula:
$ \text{Quantile Loss}(q) = \begin{cases} q \cdot (y - \hat{y}) & \text{if } y \geq \hat{y} \\ (1 - q) \cdot (\hat{y} - y) & \text{if } y < \hat{y} \end{cases} $
Where:
- $ y $ is the true value,
- $ \hat{y} $ is the predicted value,
- $ q $ is the quantile being predicted.

#### Torch Implementation:
```python
import torch

def quantile_loss(y, yhat, q):
    # Calculate the quantile loss
    loss = torch.max(q * (y - yhat), (q - 1) * (y - yhat))
    return torch.mean(loss)
```
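
The asymmetry is easiest to see on a single point. In the sketch below (made-up values), with $ q = 0.9 $ an under-prediction by 1 costs 0.9 while an over-prediction by 1 costs only 0.1, which pushes the prediction toward the 90th percentile.

```python
import torch

def quantile_loss(y, yhat, q):
    loss = torch.max(q * (y - yhat), (q - 1) * (y - yhat))
    return torch.mean(loss)

y = torch.tensor([10.0])

under = quantile_loss(y, torch.tensor([9.0]), 0.9)   # y > yhat: 0.9 * 1
over = quantile_loss(y, torch.tensor([11.0]), 0.9)   # y < yhat: 0.1 * 1
median = quantile_loss(y, torch.tensor([9.0]), 0.5)  # q = 0.5: half the absolute error
print(under, over, median)
```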

---

### 2. **Multi-Quantile Loss (MQLoss)**:
#### Formula:
$ \text{MQLoss}(y, \hat{y}, Q) = \frac{1}{k} \sum_{i=1}^{k} \text{Quantile Loss}(q_i) $
Where:
- $ Q = \{q_1, q_2, \dots, q_k\} $ is the set of quantiles,
- $ q_i $ is the $ i $-th quantile in the set.

#### Torch Implementation:
```python
def multi_quantile_loss(y, yhat, quantiles):
    losses = []
    for q in quantiles:
        loss = torch.max(q * (y - yhat), (q - 1) * (y - yhat))
        losses.append(torch.mean(loss))
    return torch.mean(torch.stack(losses))
```
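
The loop above scores the same `yhat` against every quantile. When a model outputs one prediction per quantile, the loss can be computed in a single broadcasted step. The sketch below assumes `yhat` has shape `(n, k)` with one column per quantile; `mq_loss_columns` is a hypothetical name.

```python
import torch

def mq_loss_columns(y, yhat, quantiles):
    # Assumed shapes: y is (n,), yhat is (n, k), quantiles has length k.
    q = torch.as_tensor(quantiles, dtype=yhat.dtype)  # (k,)
    diff = y.unsqueeze(-1) - yhat                     # (n, k), broadcast over quantiles
    loss = torch.max(q * diff, (q - 1) * diff)        # pinball loss per entry
    return loss.mean()

y = torch.tensor([10.0, 20.0])
yhat = torch.tensor([[9.0, 10.0, 11.0],
                     [18.0, 20.0, 22.0]])

result = mq_loss_columns(y, yhat, [0.1, 0.5, 0.9])
print(result)  # (0.1 + 0.2 + 0 + 0 + 0.1 + 0.2) / 6, about 0.1
```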

---

### 3. **Implicit Quantile Loss (IQLoss)**:
#### Formula:
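$ \text{IQLoss}(q) = \frac{1}{n} \sum_{i=1}^{n} w_i(q) \left| y_i - \hat{y}_i \right|, \quad w_i(q) = \begin{cases} q & \text{if } y_i \geq \hat{y}_i \\ 1 - q & \text{if } y_i < \hat{y}_i \end{cases} $
Where:
- $ \hat{y}_i $ is the predicted value,
- $ y_i $ is the true value,
- $ w_i(q) $ is the weight for quantile $ q $; the implementation below averages the loss over all supplied quantiles.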

#### Torch Implementation:
```python
def implicit_quantile_loss(y, yhat, quantiles):
    losses = []
    for q in quantiles:
        diff = y - yhat
        weight = torch.where(diff >= 0, q, 1 - q)
        loss = weight * torch.abs(diff)
        losses.append(torch.mean(loss))
    return torch.mean(torch.stack(losses))
```
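
As written, the weighted form is the same pinball loss used in the quantile loss above: $ w \cdot |y - \hat{y}| $ equals $ \max(q (y - \hat{y}), (q - 1)(y - \hat{y})) $ element by element. The sketch below checks this on random differences; the seed and sample size are arbitrary.

```python
import torch

torch.manual_seed(0)
diff = torch.randn(1000)  # stands in for y - yhat
q = 0.8

# Max form, as in quantile_loss
max_form = torch.max(q * diff, (q - 1) * diff)

# Weight form, as in implicit_quantile_loss (weights built as tensors here)
weight = torch.where(diff >= 0, torch.full_like(diff, q), torch.full_like(diff, 1 - q))
weight_form = weight * torch.abs(diff)

same = torch.allclose(max_form, weight_form)
print(same)  # True
```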

---

### 4. **Distribution Loss (DistributionLoss)**:
#### Formula (using Kullback-Leibler Divergence):
$ \text{KL}(p \,\|\, q) = \sum_{i=1}^{n} p_i \log \frac{p_i}{q_i} $
Where:
- $ p_i $ is the probability the true distribution assigns to outcome $ i $,
- $ q_i $ is the probability the predicted distribution assigns to the same outcome,
- $ n $ is the number of outcomes.

#### Torch Implementation (KL Divergence):
```python
import torch
import torch.nn.functional as F

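def distribution_loss(y, yhat, p_dist, q_dist):
    # y and yhat are unused here; the loss compares the two distributions directly.
    # F.kl_div expects log-probabilities as input and probabilities as target,
    # so this computes KL(p || q); 'batchmean' divides the sum by the first dimension.
    kl_loss = F.kl_div(q_dist.log(), p_dist, reduction='batchmean')
    return kl_loss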
```
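
To check the argument order of `F.kl_div`, the sketch below compares it with the formula written out by hand. It assumes each row of `p_dist` and `q_dist` is a probability distribution over 5 bins; the random softmax inputs are made up for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Assumed layout: each row is a probability distribution over 5 bins.
p_dist = torch.softmax(torch.randn(4, 5), dim=-1)
q_dist = torch.softmax(torch.randn(4, 5), dim=-1)

kl_builtin = F.kl_div(q_dist.log(), p_dist, reduction="batchmean")
# Same quantity written out: sum_i p_i * log(p_i / q_i), averaged over rows
kl_manual = (p_dist * (p_dist.log() - q_dist.log())).sum(dim=-1).mean()

# The divergence is zero when the two distributions match
kl_same = F.kl_div(p_dist.log(), p_dist, reduction="batchmean")
print(kl_builtin, kl_manual, kl_same)
```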
