docs causal fidelity ts: interpretation and remarks

deel-ai · Dec 10, 2021 · af217c7
1 parent e059af8

Showing 2 changed files with 60 additions and 4 deletions.
diff --git a/docs/api/deletion_ts.md b/docs/api/deletion_ts.md
@@ -6,7 +6,35 @@ This metric computes the capacity of the model to make predictions while perturb
 Specific explanation metrics for time series are necessary because time series and images have different shapes (number of dimensions) and perturbations should be applied differently to them.
 As the insertion and deletion metrics use input perturbation to be computed, creating new metrics for time series is natural[^2].
 
-The better the method, the smaller the score.
+
+## Score interpretation
+
+The interpretation of the score depends on the metric used to evaluate your model (the `metric` parameter).
+
+- For metrics that increase with model performance (such as accuracy):
+  if explanations are accurate, the model's score quickly falls from its value on the non-perturbed input to that of a random predictor.
+  Thus, in this case, a lower score represents a more accurate explanation.
+
+- For metrics that decrease with model performance (such as losses):
+  if explanations are accurate, the model's score quickly rises.
+  Thus, in this case, a higher score represents a more accurate explanation.
+
+## Remarks
+
+This metric only evaluates the order of importance between features, not the attribution values themselves.
+
+The parameters `metric`, `steps` and `max_percentage_perturbed` can drastically change the score:
+
+- For inputs with many features, increasing `steps` captures the differences between attribution methods more finely.
+
+- The order of importance among low-importance features may not matter; decreasing `max_percentage_perturbed`
+  so that only the most important features are perturbed may therefore make the score more relevant.
+
+Some attribution methods also return negative attributions.
+For those methods, do not take the absolute value of the explanations before computing the insertion and deletion metrics.
+Otherwise, strongly negative attributions may get high absolute values, which changes the order of importance between features.
+Take all of the remarks above into account to obtain a relevant score.
+
 
 ## Example
 
@@ -25,5 +53,5 @@ score = metric.evaluate(explanations)
 
 {{xplique.metrics.DeletionTS}}
 
-[^1]: [RISE: Randomized Input Sampling for Explanation of Black-box Models (2018)](https://arxiv.org/abs/1806.07421)
-[^2]: [Towards a Rigorous Evaluation of XAI Methods on Time Series (2019)](https://arxiv.org/abs/1909.07082)
+[^1]:[RISE: Randomized Input Sampling for Explanation of Black-box Models (2018)](https://arxiv.org/abs/1806.07421)
+[^2]:[Towards a Rigorous Evaluation of XAI Methods on Time Series (2019)](https://arxiv.org/abs/1909.07082)
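Both pages describe the same mechanics, so one sketch can illustrate the score interpretation and the remark on negative attributions. The code below is a minimal NumPy sketch under stated assumptions, not xplique's `DeletionTS` or `InsertionTS` implementation: the helpers `perturbation_curve` and `auc`, the toy linear model, the zero baseline, and summarizing the curve by its area are hypothetical choices made for illustration. Features are ranked by their signed attribution; the top-ranked ones are either replaced by the baseline (deletion) or are the only ones kept (insertion), and an accuracy-like score is recorded at each step.

```python
import numpy as np


def perturbation_curve(score_fn, x, explanation, baseline=0.0, steps=10,
                       max_percentage_perturbed=1.0, mode="deletion"):
    """Hypothetical helper: model score as more top-ranked features are perturbed.

    Features are ranked by decreasing *signed* attribution (no absolute value).
    - "deletion": the k top-ranked features are replaced by `baseline`.
    - "insertion": every feature starts at `baseline`, the k top-ranked ones keep their value.
    """
    if mode not in ("deletion", "insertion"):
        raise ValueError("mode must be 'deletion' or 'insertion'")
    flat_x = x.astype(float).ravel()
    order = np.argsort(-explanation.ravel(), kind="stable")  # most important first
    n_max = int(max_percentage_perturbed * order.size)       # cap on perturbed features
    scores = []
    for step in range(steps + 1):
        top_k = order[:int(round(step * n_max / steps))]
        if mode == "deletion":
            x_step = flat_x.copy()
            x_step[top_k] = baseline
        else:
            x_step = np.full_like(flat_x, baseline)
            x_step[top_k] = flat_x[top_k]
        scores.append(score_fn(x_step.reshape(x.shape)))
    return np.array(scores)


def auc(curve):
    """Trapezoidal area under the curve, with the x-axis rescaled to [0, 1]."""
    return float(np.mean((curve[1:] + curve[:-1]) / 2.0))


# Toy model on a (timesteps=4, channels=2) series: an accuracy-like score in [0, 1].
weights = np.array([[3.0, 0.0], [0.0, 2.0], [1.0, 0.0], [0.0, -1.0]])
x = np.ones((4, 2))


def score_fn(series):
    return 1.0 / (1.0 + np.exp(-np.sum(weights * series)))


good = weights * x      # attributions that rank features as the model uses them
reversed_expl = -good   # same attributions with the ranking reversed

results = {}
for name, expl in [("good", good), ("reversed", reversed_expl)]:
    results[name] = {
        mode: auc(perturbation_curve(score_fn, x, expl, steps=8, mode=mode))
        for mode in ("deletion", "insertion")
    }
    print(name, results[name])

# Negative attributions: taking the absolute value changes the feature ranking.
signed_order = np.argsort(-good.ravel(), kind="stable")
abs_order = np.argsort(-np.abs(good).ravel(), kind="stable")
print(signed_order)  # [0 3 4 1 2 5 6 7]
print(abs_order)     # [0 3 4 7 1 2 5 6]
```

With this accuracy-like score, the explanation that matches the model gets a lower deletion area and a higher insertion area than the reversed one, and taking the absolute value moves the negatively attributed feature (index 7) from last place to fourth in the ranking.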
diff --git a/docs/api/insertion_ts.md b/docs/api/insertion_ts.md
@@ -3,7 +3,35 @@
 The Time Series Insertion Fidelity metric measures the faithfulness of explanations on Time Series predictions[^2].
 This metric computes the capacity of the model to make predictions while only the most important features are not perturbed[^1].
 
-The better the method, the higher the score.
+
+## Score interpretation
+
+The interpretation of the score depends on the metric used to evaluate your model (the `metric` parameter).
+
+- For metrics that increase with model performance (such as accuracy):
+  if explanations are accurate, the model's score quickly rises to its value on the non-perturbed input.
+  Thus, in this case, a higher score represents a more accurate explanation.
+
+- For metrics that decrease with model performance (such as losses):
+  if explanations are accurate, the model's score quickly falls to its value on the non-perturbed input.
+  Thus, in this case, a lower score represents a more accurate explanation.
+
+## Remarks
+
+This metric only evaluates the order of importance between features, not the attribution values themselves.
+
+The parameters `metric`, `steps` and `max_percentage_perturbed` can drastically change the score:
+
+- For inputs with many features, increasing `steps` captures the differences between attribution methods more finely.
+
+- The order of importance among low-importance features may not matter; decreasing `max_percentage_perturbed`
+  so that only the most important features are perturbed may therefore make the score more relevant.
+
+Some attribution methods also return negative attributions.
+For those methods, do not take the absolute value of the explanations before computing the insertion and deletion metrics.
+Otherwise, strongly negative attributions may get high absolute values, which changes the order of importance between features.
+Take all of the remarks above into account to obtain a relevant score.
+
 
 ## Example