# First thoughts on interpretation

## The task

The goal is to identify which features are most important for the model's predictions, i.e. which feature changes affect the predictions the most and, if possible, which changes of particular feature _values_ affect them the most.
The interpretation should make use of Formal Concept Analysis (FCA). For example, the result of the interpretation could be presented as a Concept Lattice.

The model to be interpreted is assumed to be a Black Box, i.e. nothing is known about its internal structure. The only things known are:
* the input data $X$ (the set of features and their values needed to run the model)
* the Black Box model as a function $BB: X \to Y$, where $Y$ is the model's prediction.

## Benefits of FCA

Formal Concept Analysis is a mathematical framework, but it can also be useful for business analysis. Several features in the original dataset may correspond to the same business value. To handle this, one can either change the input data and the model (not an option in our case) or account for it when interpreting the model. The latter can be done by assigning specific values of different features to the same Formal Concept.
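A minimal sketch of that grouping, assuming each row is a Python dict and that the mapping from (feature, value) pairs to shared business attributes is written by hand; `BUSINESS_ATTRIBUTES`, `to_business_attributes` and all feature names are hypothetical. Pairs mapped to the same attribute become a single attribute of the formal context, so they always end up in the same concepts.

```python
# Hypothetical hand-written mapping: several (feature, value) pairs that express
# the same business value are sent to one shared attribute of the formal context.
BUSINESS_ATTRIBUTES = {
    ("payment_method", "credit_card"): "pays_by_card",
    ("payment_method", "debit_card"): "pays_by_card",
    ("has_card_on_file", True): "pays_by_card",
    ("region", "EU"): "european_customer",
    ("country", "DE"): "european_customer",
}


def to_business_attributes(row):
    """Return the set of business attributes that a data row (a dict) has."""
    return {
        attr
        for (feature, value), attr in BUSINESS_ATTRIBUTES.items()
        if row.get(feature) == value
    }


rows = [
    {"payment_method": "credit_card", "region": "US", "country": "US"},
    {"payment_method": "cash", "has_card_on_file": True, "country": "DE"},
    {"payment_method": "cash", "region": "US", "country": "US"},
]
# One attribute set per object: this is the binary context, row by row.
context = [to_business_attributes(r) for r in rows]
```

Here the first two rows share the attribute `pays_by_card` even though it comes from different original features.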

## Drawbacks of FCA

FCA is good for interpreting results, but it is not always the best model for predicting them. We therefore assume that the given Black Box model is the best possible model for the task with respect to some metric. Hence FCA should be used only to interpret the Black Box rather than to predict its output.

## Ways to interpret

There are two basic ways to interpret a model:
1. Global: which features are the most important to the model given the data $X$. The outcome is the list of features ordered by their contribution over all rows of the data.
2. Local: which features are the most important to the model for a given row (object) $g \in X$. The outcome is the list of features ordered by their contribution to that specific row $g$ and, if possible, how exactly they influenced the prediction.
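To make the two notions concrete, a hedged sketch: for one row $g$, each feature is replaced in turn by a baseline value (the column mean, an assumption) and the features are ranked by the absolute change of $BB(g)$; averaging these local effects over all rows gives one possible global ranking. `local_importance`, `global_importance`, `rank` and the toy model `bb` are hypothetical, and the model is assumed to be a Python callable on a single row.

```python
from statistics import mean


def local_importance(bb, X, g):
    """Effect of each feature on BB(g): |BB(g with f set to its column mean) - BB(g)|."""
    baseline = [mean(col) for col in zip(*X)]  # column means of X
    base_pred = bb(g)
    effects = []
    for f in range(len(g)):
        g_f = list(g)
        g_f[f] = baseline[f]  # replace only feature f by its baseline value
        effects.append(abs(bb(g_f) - base_pred))
    return effects


def global_importance(bb, X):
    """Average the local effect of every feature over all rows of X."""
    per_row = [local_importance(bb, X, g) for g in X]
    return [mean(col) for col in zip(*per_row)]


def rank(effects):
    """Feature indices ordered from most to least important."""
    return sorted(range(len(effects)), key=lambda f: -effects[f])


# Toy black box (hypothetical): only features 0 and 2 matter, feature 2 more strongly.
def bb(g):
    return g[0] + 3 * g[2]


X = [[0.0, 5.0, 1.0], [2.0, -1.0, 3.0], [4.0, 2.0, 5.0]]
print(rank(local_importance(bb, X, X[0])))  # → [2, 0, 1]
print(rank(global_importance(bb, X)))       # → [2, 0, 1]
```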

## Ideas for interpretation

### Single-feature noise
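For every feature $f \in F$ ($F$ is the set of all features), replace the values of $f$ with some noise and calculate
$$\Delta Y_f = \frac{1}{|G|} \left| \{\, g \in G : |BB(g_f) - BB(g)| > k \,\} \right|,$$
where
* $G$ - the set of objects (rows of the data $X$),
* $g_f$ - row $g$ with the value of feature $f$ replaced by noise,
* $g$ - the original row,
* $k$ - a threshold corresponding to the precision of the interpretation,
* $|\cdot|$ - the cardinality of a set (inside the braces, the absolute value of the change in prediction).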
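A minimal sketch of $\Delta Y_f$, assuming that "noise" means a random permutation of the column of $f$ (as in permutation importance); Gaussian noise or resampling from the feature's distribution would fit the description equally well. `delta_y` and the toy model `bb` are hypothetical, and the model is a Python callable on a single row.

```python
import random


def delta_y(bb, X, f, k, seed=0):
    """Fraction of rows whose prediction changes by more than k when feature f is noised.

    Noise is a random permutation of column f (an assumption).
    """
    rng = random.Random(seed)
    noisy_column = rng.sample([g[f] for g in X], len(X))
    changed = 0
    for g, noisy_value in zip(X, noisy_column):
        g_f = list(g)
        g_f[f] = noisy_value  # g_f: row g with feature f replaced by noise
        if abs(bb(g_f) - bb(g)) > k:
            changed += 1
    return changed / len(X)


# Toy black box (hypothetical): feature 1 is ignored.
def bb(g):
    return g[0] + 3 * g[2]


X = [[0.0, 5.0, 1.0], [2.0, -1.0, 3.0], [4.0, 2.0, 5.0]]
scores = {f: delta_y(bb, X, f, k=0.5) for f in range(len(X[0]))}
ranking = sorted(scores, key=scores.get, reverse=True)  # most influential first
```

`scores[1]` is always `0.0` because `bb` ignores feature 1. With so few rows a single permutation is a noisy estimate, so averaging over several seeds would be a reasonable refinement.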

Then construct the formal context.
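One natural reading of this step, assumed here rather than stated in the text: objects are the rows $g \in G$, attributes are the features $f \in F$, and $g$ has $f$ whenever $|BB(g_f) - BB(g)| > k$. The sketch builds that context (again with permutation noise, an assumption) and enumerates its formal concepts by closing every attribute subset, which is only feasible for a small $F$. `build_context` and `formal_concepts` are hypothetical helpers.

```python
import random
from itertools import combinations


def build_context(bb, X, k, seed=0):
    """Row i gets attribute f iff noising f changes BB(X[i]) by more than k.

    Noise is a random permutation of each column (an assumption).
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    # One noisy version of every column, shared by all rows.
    noisy_columns = [rng.sample([g[f] for g in X], len(X)) for f in range(n_features)]
    context = []
    for i, g in enumerate(X):
        attrs = set()
        for f in range(n_features):
            g_f = list(g)
            g_f[f] = noisy_columns[f][i]
            if abs(bb(g_f) - bb(g)) > k:
                attrs.add(f)
        context.append(frozenset(attrs))
    return context


def formal_concepts(context, attributes):
    """All (extent, intent) pairs, found by closing every subset of attributes."""
    concepts = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(attributes, r):
            # Extent: objects that have every attribute in the subset.
            extent = frozenset(i for i, row in enumerate(context) if set(subset) <= row)
            # Intent: attributes shared by all objects of the extent.
            if extent:
                intent = frozenset.intersection(*(context[i] for i in extent))
            else:
                intent = frozenset(attributes)
            concepts.add((extent, intent))
    return concepts


# Toy black box (hypothetical): feature 1 is ignored.
def bb(g):
    return g[0] + 3 * g[2]


X = [[0.0, 5.0, 1.0], [2.0, -1.0, 3.0], [4.0, 2.0, 5.0]]
context = build_context(bb, X, k=0.5)
for extent, intent in sorted(formal_concepts(context, range(3)), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering these concepts by inclusion of their extents gives the Concept Lattice. For the same noise realization, $\Delta Y_f$ equals $|f'| / |G|$, the relative size of the extent of attribute $f$.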

### Multiple-feature noise

Find every combination of features that significantly influences the output.
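A brute-force sketch of this search, assuming each chosen column is noised by an independent random permutation and that "significantly" means the fraction of rows whose prediction changes by more than $k$ exceeds a cut-off `min_fraction`. Since there are $2^{|F|}$ combinations, `max_size` caps the combination size; `influential_combinations`, `min_fraction` and `max_size` are hypothetical.

```python
import random
from itertools import combinations


def influential_combinations(bb, X, k, min_fraction, max_size=2, seed=0):
    """Feature combinations whose joint noising changes BB by more than k
    on more than min_fraction of the rows.

    Noise is an independent random permutation of each chosen column (an assumption).
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(range(n_features), size):
            noisy = {f: rng.sample([g[f] for g in X], len(X)) for f in combo}
            changed = 0
            for i, g in enumerate(X):
                g_c = list(g)
                for f in combo:
                    g_c[f] = noisy[f][i]  # noise every feature of the combination
                if abs(bb(g_c) - bb(g)) > k:
                    changed += 1
            fraction = changed / len(X)
            if fraction > min_fraction:
                result[combo] = fraction
    return result


# Toy black box (hypothetical): feature 1 is ignored.
def bb(g):
    return g[0] + 3 * g[2]


X = [[0.0, 5.0, 1.0], [2.0, -1.0, 3.0], [4.0, 2.0, 5.0]]
combos = influential_combinations(bb, X, k=0.5, min_fraction=0.0)
```

Feature 1 alone can never appear in `combos`, since noising it never changes the prediction.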

### Using SHAP
https://github.com/slundberg/shap

SHAP is one of the most popular interpretation techniques. Maybe we can use it?
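The `shap` package provides efficient approximations of Shapley values. To show what is being approximated, here is a library-free sketch that computes exact Shapley values by enumerating all coalitions of features. It assumes a single baseline row (the column means) to represent "absent" features, whereas SHAP's explainers usually average over a background dataset; `shapley_values` and the toy model are hypothetical. The per-row values are a local explanation, and their mean absolute value per feature is a common global summary.

```python
from itertools import combinations
from math import factorial, isclose
from statistics import mean


def shapley_values(bb, g, baseline):
    """Exact Shapley values of BB at row g; absent features take baseline values.

    Enumerates all 2**(|F|-1) coalitions per feature, so only small F is practical.
    """
    n = len(g)

    def value(coalition):
        # Features in the coalition keep g's values, the rest are set to the baseline.
        return bb([g[j] if j in coalition else baseline[j] for j in range(n)])

    phi = []
    for f in range(n):
        others = [j for j in range(n) if j != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {f}) - value(set(s)))
        phi.append(total)
    return phi


# Toy black box (hypothetical): feature 1 is ignored.
def bb(g):
    return g[0] + 3 * g[2]


X = [[0.0, 5.0, 1.0], [2.0, -1.0, 3.0], [4.0, 2.0, 5.0]]
baseline = [mean(col) for col in zip(*X)]  # column means as the reference row
phi = shapley_values(bb, X[0], baseline)   # local explanation of row 0, approx. [-2, 0, -6]
# Efficiency: the values sum to BB(g) - BB(baseline).
assert isclose(sum(phi), bb(X[0]) - bb(baseline))
# Global importance: mean absolute Shapley value of each feature over all rows.
all_phi = [shapley_values(bb, g, baseline) for g in X]
mean_abs_shap = [mean(abs(v) for v in col) for col in zip(*all_phi)]
```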

#### SHAP feature importances

#### SHAP feature _value_ importances