NWRMSLE stands for Normalized Weighted Root Mean Squared Logarithmic Error. It's a metric used to evaluate regression models, particularly when the target values are non-negative and span a wide range. This metric is often seen in Kaggle competitions or other machine learning tasks where accurate prediction of quantities is important, especially when those quantities can vary by several orders of magnitude.

### Understanding NWRMSLE:

1. **Root Mean Squared Logarithmic Error (RMSLE)**: 
   - The RMSLE is a variation of the Root Mean Squared Error (RMSE) computed on log-transformed values. For the same absolute error, it penalizes underestimation more than overestimation.
   - It's calculated by applying `log(1 + x)` (natural logarithm) to the predicted and actual values, squaring the difference between the transformed values, and taking the square root of the mean of those squared differences.

2. **Normalized**:
   - "Normalized" means the error is rescaled to account for the weighting. In the formula given below, the weighted sum of squared log errors is divided by the sum of the weights, so the result does not depend on the overall scale of the weights.

3. **Weighted**:
   - "Weighted" indicates that different values or different predictions might be given different weights. This is useful when certain errors are more significant than others or when the data is imbalanced.
   - Weights can be applied based on the importance of specific samples or categories in the data.
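
The asymmetry of RMSLE described above can be checked numerically. The following is a minimal sketch with made-up numbers; the helper name `rmsle` is introduced here for illustration and is not part of the original text. It compares an under-prediction and an over-prediction that miss the actual value by the same absolute amount.

```python
import numpy as np

def rmsle(actual, predicted):
    """Unweighted RMSLE: RMSE computed on log(1 + x)-transformed values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))

actual = [100.0]
under = rmsle(actual, [60.0])   # 40 units too low
over = rmsle(actual, [140.0])   # 40 units too high

print(f"under-prediction: {under:.4f}")
print(f"over-prediction:  {over:.4f}")
```

The under-prediction yields the larger error because the logarithm is steeper for smaller arguments.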

### Formula:

The formula for NWRMSLE might vary slightly depending on specific implementations, but it generally looks something like this:

\[ NWRMSLE = \sqrt{\frac{\sum w_i \cdot (\log(p_i + 1) - \log(a_i + 1))^2}{\sum w_i}} \]

where:
- \( p_i \) is the predicted value for the ith observation,
- \( a_i \) is the actual value for the ith observation,
- \( w_i \) is the weight for the ith observation,
- \( \log \) is the natural logarithm.
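
As a quick worked example with made-up numbers, take two observations: \( a_1 = 9,\ p_1 = 19,\ w_1 = 1 \) and \( a_2 = 99,\ p_2 = 99,\ w_2 = 3 \). The second prediction is exact, so only the first contributes to the numerator:

\[ NWRMSLE = \sqrt{\frac{1 \cdot (\log 20 - \log 10)^2 + 3 \cdot (\log 100 - \log 100)^2}{1 + 3}} = \sqrt{\frac{(\log 2)^2}{4}} = \frac{\log 2}{2} \approx 0.347 \]

The heavily weighted exact prediction pulls the score down even though it adds nothing to the numerator, because its weight still counts in the denominator.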

### Use in Target Transformation:

NWRMSLE can be particularly useful in cases where the target variable has undergone a transformation, such as a logarithmic transformation. This is common in situations where the target variable spans several orders of magnitude. By using logarithmic transformation on the target, models can often achieve a more balanced and accurate prediction across the range of values. The NWRMSLE then becomes an appropriate metric for evaluating these models because it inherently works with the logarithms of the predictions and actual values, aligning well with the transformed nature of the target.
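
To illustrate the connection, the sketch below uses hypothetical values and a made-up set of prediction errors in log space. It shows that a weighted RMSE computed on `log(1 + x)`-transformed targets is the same number as NWRMSLE computed on the back-transformed predictions.

```python
import numpy as np

actual = np.array([3.0, 40.0, 500.0, 6000.0])
weights = np.array([1.0, 1.0, 2.0, 2.0])

# Targets as a model would see them after the log(1 + x) transformation.
log_actual = np.log1p(actual)

# Hypothetical model output in log space (log target plus some error).
log_predicted = log_actual + np.array([0.10, -0.20, 0.05, -0.10])

# Weighted RMSE computed directly in log space.
weighted_rmse_log = np.sqrt(
    np.average((log_predicted - log_actual) ** 2, weights=weights)
)

# Map predictions back to the original scale, then apply the NWRMSLE formula.
predicted = np.expm1(log_predicted)
nwrmsle_original = np.sqrt(
    np.average((np.log1p(predicted) - np.log1p(actual)) ** 2, weights=weights)
)

print(weighted_rmse_log, nwrmsle_original)
```

The two values agree up to floating-point error, which is why a model trained with a weighted squared-error loss on log-transformed targets is optimizing the same quantity the metric measures.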

### Key Points:

- Because NWRMSLE compares values on a log scale, it measures relative rather than absolute error. A given absolute error counts for less when both the predicted and actual values are large, which can be desirable in certain contexts.
- For the same absolute error, it penalizes underestimates more than overestimates, which can be particularly useful in scenarios where underestimation has a greater cost.
- The inclusion of weights allows for flexibility in emphasizing certain parts of the data according to their relevance or importance. 
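
The relative-error behaviour can be seen with a small sketch using made-up values: a 10% over-prediction produces nearly the same score whether the actual value is 100 or 100,000.

```python
import numpy as np

def nwrmsle(actual, predicted, weights):
    """Weighted RMSLE, normalized by the sum of the weights."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    weights = np.asarray(weights, dtype=float)
    squared_log_error = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return np.sqrt(np.sum(weights * squared_log_error) / np.sum(weights))

small = nwrmsle([100.0], [110.0], [1.0])          # off by 10
large = nwrmsle([100000.0], [110000.0], [1.0])    # off by 10,000

print(f"small scale: {small:.4f}")
print(f"large scale: {large:.4f}")
```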

### Conclusion:

NWRMSLE is a powerful metric for evaluating regression models, especially in cases where target transformation is applied, and the data has a wide range or is imbalanced. It offers a nuanced way to assess model performance, taking into account the scale and importance of different errors.

In [1]:
import numpy as np

def nwrmsle(actual, predicted, weights):
    """
    Calculate the Normalized Weighted Root Mean Squared Logarithmic Error.
    :param actual: numpy array of actual values
    :param predicted: numpy array of predicted values
    :param weights: numpy array of weights for each observation
    :return: calculated NWRMSLE
    """
    # log(1 + x) keeps zero values finite; inputs are assumed to be non-negative
    log_actual = np.log(actual + 1)
    log_predicted = np.log(predicted + 1)
    
    # Calculate the squared log error
    squared_log_error = (log_predicted - log_actual) ** 2
    
    # Calculate the mean squared log error with weights
    mean_squared_log_error = np.sum(weights * squared_log_error) / np.sum(weights)
    
    # Return the square root of the mean squared log error
    return np.sqrt(mean_squared_log_error)

# Example data
actual_values = np.array([10, 20, 30, 40, 50])
predicted_values = np.array([12, 22, 29, 43, 52])
weights = np.array([1, 1.5, 1, 2, 1.5])

# Calculate NWRMSLE
error = nwrmsle(actual_values, predicted_values, weights)
print("NWRMSLE:", error)


NWRMSLE: 0.08749628569079618
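
The function above assumes well-formed NumPy inputs. As one possible hardening, the sketch below accepts plain lists and adds input checks. The name `nwrmsle_checked` and the specific checks are assumptions made for illustration, not part of the original notebook. It uses `np.log1p` and `np.average`, which compute the same quantities as `np.log(x + 1)` and the explicit weighted sum.

```python
import numpy as np

def nwrmsle_checked(actual, predicted, weights):
    """NWRMSLE with basic input validation (illustrative sketch)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    weights = np.asarray(weights, dtype=float)

    if not (actual.shape == predicted.shape == weights.shape):
        raise ValueError("actual, predicted and weights must have the same shape")
    # This sketch rejects negatives, since the metric is meant for non-negative targets.
    if np.any(actual < 0) or np.any(predicted < 0):
        raise ValueError("actual and predicted must be non-negative")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    squared_log_error = (np.log1p(predicted) - np.log1p(actual)) ** 2
    # np.average divides the weighted sum by the sum of the weights.
    return np.sqrt(np.average(squared_log_error, weights=weights))

result = nwrmsle_checked([10, 20, 30, 40, 50], [12, 22, 29, 43, 52], [1, 1.5, 1, 2, 1.5])
print("NWRMSLE:", result)
```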


Defining weights in the context of calculating a metric like Normalized Weighted Root Mean Squared Logarithmic Error (NWRMSLE) depends on the specific requirements of your problem and dataset. Weights are used to give different importance to different observations in your dataset. Here are some common strategies for defining weights:

### 1. **Uniform Weights**:
- **Equal Importance**: If you believe that each observation in your dataset is equally important, you can assign the same weight to all observations. This is the simplest approach and reduces NWRMSLE to the plain, unweighted RMSLE.

  ```python
  weights = np.ones(len(actual_values))  # For a numpy array of actual_values
  ```

### 2. **Error Magnitude-Based Weights**:
- **Larger Impact for Specific Ranges**: If errors in certain ranges of your target variable are more critical than others, you might assign higher weights to those ranges. For example, in a sales forecasting problem, underestimating high-volume sales could be more detrimental than underestimating low-volume sales.

  ```python
  weights = np.where(actual_values > threshold, higher_weight, lower_weight)
  ```

### 3. **Category-Based Weights**:
- **Different Categories**: If your dataset contains different categories or groups and some groups are more important than others, you can assign weights based on these categories.

  ```python
  weights = np.array([weight_dict[category] for category in categories])
  ```

### 4. **Variance-Based Weights**:
- **Inversely Proportional to Variance**: Observations with lower variance (more certainty) can be treated as more reliable. Here, you could weight each observation by the inverse of its own variance estimate.

  ```python
  weights = 1 / per_observation_variance  # array with one variance estimate per observation
  ```

### 5. **Time-Based Weights**:
- **Time Series Data**: In time series forecasting, more recent data might be more relevant than older data. Weights could decrease for older observations.

  ```python
  weights = np.linspace(start_weight, end_weight, len(actual_values))
  ```

### 6. **Custom Business Logic**:
- **Business Objectives**: Sometimes, weights are determined by specific business goals or objectives, such as focusing on certain customers, regions, or time periods.

### 7. **Model Confidence**:
- **Confidence of Predictions**: If your model can output prediction confidence intervals or probabilities, you can use these to weight observations. More confidence in a prediction could lead to a higher weight.

### Key Considerations:
- **Balance**: Ensure that the weights are balanced and do not overly favor certain observations unless strongly justified.
- **Validation**: It's important to validate the choice of weights through cross-validation or other model validation techniques, checking that the weighted metric tracks the outcome you actually care about.
- **Interpretability**: Be aware that introducing weights can make the model evaluation and interpretation more complex.
- **Data Understanding**: Deep understanding of your data and the problem context is crucial in defining meaningful and effective weights.
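
One property worth keeping in mind when balancing weights: because the metric divides by the sum of the weights, only their relative sizes matter. The sketch below, which reuses the example data and a compact version of the `nwrmsle` function, shows that multiplying all weights by a constant or rescaling them to mean 1 leaves the score unchanged.

```python
import numpy as np

def nwrmsle(actual, predicted, weights):
    """Weighted RMSLE, normalized by the sum of the weights."""
    squared_log_error = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return np.sqrt(np.sum(weights * squared_log_error) / np.sum(weights))

actual = np.array([10, 20, 30, 40, 50])
predicted = np.array([12, 22, 29, 43, 52])
weights = np.array([1, 1.5, 1, 2, 1.5])

base = nwrmsle(actual, predicted, weights)
scaled = nwrmsle(actual, predicted, 10 * weights)                  # all weights times 10
normalized = nwrmsle(actual, predicted, weights / weights.mean())  # rescaled to mean 1

print(base, scaled, normalized)
```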

In summary, the choice of weights should reflect the importance of different observations in achieving your overall modeling goals and should be guided by a thorough understanding of the data and the problem context.

## 1. Uniform Weights

In [2]:
import numpy as np

actual_values = np.array([10, 15, 20, 25, 30])  # example actual values
predicted_values = np.array([12, 14, 22, 24, 31])  # example predicted values

# Assign the same weight to every observation
uniform_weights = np.ones(len(actual_values))
print("Uniform Weights:", uniform_weights)


Uniform Weights: [1. 1. 1. 1. 1.]
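
As a sanity check on the uniform case, the following sketch recomputes the score for the example arrays in two ways: with the weighted formula and all weights equal to 1, and with a plain mean. The variable names `weighted` and `unweighted` are introduced here for illustration.

```python
import numpy as np

actual_values = np.array([10, 15, 20, 25, 30])
predicted_values = np.array([12, 14, 22, 24, 31])
uniform_weights = np.ones(len(actual_values))

squared_log_error = (np.log1p(predicted_values) - np.log1p(actual_values)) ** 2

# Weighted formula with all weights equal to 1
weighted = np.sqrt(np.sum(uniform_weights * squared_log_error) / np.sum(uniform_weights))
# Plain RMSLE
unweighted = np.sqrt(np.mean(squared_log_error))

print(weighted, unweighted)
```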


## 2. Error Magnitude-Based Weights

In [4]:
threshold = 20
higher_weight = 2
lower_weight = 1

# Assign a higher weight to actual values above the threshold
magnitude_weights = np.where(actual_values > threshold, higher_weight, lower_weight)
print("Magnitude-Based Weights:", magnitude_weights)


Magnitude-Based Weights: [1 1 1 2 2]


## 3. Category-Based Weights

In [5]:
categories = np.array(['A', 'B', 'A', 'C', 'B'])  # example categories
weight_dict = {'A': 1, 'B': 1.5, 'C': 2}

# Assign weights according to category
category_weights = np.array([weight_dict[category] for category in categories])
print("Category-Based Weights:", category_weights)


Category-Based Weights: [1.  1.5 1.  2.  1.5]


## 4. Variance-Based Weights

In [6]:
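some_feature = np.array([5, 10, 5, 15, 10])  # example feature

# Inverse of the feature's variance.
# Note: np.var over the whole array returns a single scalar, not one weight
# per observation; per-observation weights need a variance estimate for each row.
variance_weights = 1 / np.var(some_feature)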
print("Variance-Based Weights:", variance_weights)


Variance-Based Weights: 0.07142857142857142
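
The cell above produces a single number rather than one weight per observation. One way to get per-observation weights is sketched below. It assumes each observation has several repeated measurements; the array `repeated_measurements` and the constant `epsilon` are hypothetical and not from the original notebook.

```python
import numpy as np

# Hypothetical repeated measurements: one row per observation, one column per repeat.
repeated_measurements = np.array([
    [10.0, 11.0, 9.0],
    [20.0, 26.0, 14.0],
    [30.0, 30.5, 29.5],
])

# Sample variance of each row (ddof=1), giving one value per observation.
per_observation_variance = np.var(repeated_measurements, axis=1, ddof=1)

# Small constant (an assumption) guards against division by zero.
epsilon = 1e-8
variance_weights = 1.0 / (per_observation_variance + epsilon)

print("Per-observation variance:", per_observation_variance)
print("Variance-Based Weights:", variance_weights)
```

Observations whose repeats agree closely get the largest weights, and noisy ones are down-weighted.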


## 5. Time-Based Weights

In [7]:
start_weight = 1
end_weight = 2

# Give more recent observations a higher weight (assumes the data is in chronological order)
time_weights = np.linspace(start_weight, end_weight, len(actual_values))
print("Time-Based Weights:", time_weights)


Time-Based Weights: [1.   1.25 1.5  1.75 2.  ]
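
The linear ramp above is one option. A different scheme, exponential decay, is sketched below as an alternative, not as something the original notebook uses. The `half_life` value is a hypothetical parameter, and the observations are assumed to be in chronological order with the most recent one last.

```python
import numpy as np

n_observations = 5
half_life = 2.0  # hypothetical: the weight halves every 2 steps back in time

# Age of each observation in steps; the last observation is the most recent (age 0).
age = np.arange(n_observations - 1, -1, -1)
decay_weights = 0.5 ** (age / half_life)

print("Exponential Decay Weights:", decay_weights)
```

Compared with a linear ramp, exponential decay discounts old observations more aggressively while keeping every weight positive.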


## 6. Custom Business Logic Weights

In [8]:
# Assume these weights were defined by specific business logic
custom_weights = np.array([1, 2, 1, 3, 2])
print("Custom Business Logic Weights:", custom_weights)


Custom Business Logic Weights: [1 2 1 3 2]


## 7. Model Confidence Weights

In [10]:
model_confidence = np.array([0.8, 0.5, 0.9, 0.6, 0.7])  # example model confidence values

# Use the model confidence values directly as weights
confidence_weights = model_confidence
print("Model Confidence Weights:", confidence_weights)


Model Confidence Weights: [0.8 0.5 0.9 0.6 0.7]
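
To see how the choice of weights changes the score, the sketch below evaluates the same example predictions under each per-observation weight vector from the cells above. The variance-based result is left out because it is a single scalar rather than one weight per observation. The dictionary `weight_schemes` and the compact `nwrmsle` helper are introduced here for illustration.

```python
import numpy as np

def nwrmsle(actual, predicted, weights):
    """Weighted RMSLE, normalized by the sum of the weights."""
    squared_log_error = (np.log1p(predicted) - np.log1p(actual)) ** 2
    return np.sqrt(np.sum(weights * squared_log_error) / np.sum(weights))

actual_values = np.array([10, 15, 20, 25, 30])
predicted_values = np.array([12, 14, 22, 24, 31])

# Weight vectors copied from the example cells.
weight_schemes = {
    "uniform": np.ones(5),
    "magnitude": np.array([1, 1, 1, 2, 2]),
    "category": np.array([1, 1.5, 1, 2, 1.5]),
    "time": np.linspace(1, 2, 5),
    "custom": np.array([1, 2, 1, 3, 2]),
    "confidence": np.array([0.8, 0.5, 0.9, 0.6, 0.7]),
}

scores = {
    name: nwrmsle(actual_values, predicted_values, w)
    for name, w in weight_schemes.items()
}
for name, score in scores.items():
    print(f"{name:>10}: {score:.4f}")
```

Schemes that put more weight on observations with large log errors give a higher score, and schemes that emphasize well-predicted observations give a lower one.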
