# Explainable ecco!

I've built a small addition to the ecco library (https://github.com/jalammar/ecco) that uses OpenAI's API to automatically explain what the clusters found through non-negative matrix factorization (NNMF) might have in common!
I thought ecco was a nice library to build on because this project was quite experimental, and ecco's visualizations make it easy to assess whether the interpretation/summary that the LLM provides is reasonable!
Table of Contents:
- Short intro to ECCO
- Description of .explain() method
- .explain() yourself!

## Short intro to ECCO

In [1]:
import ecco
lm = ecco.from_pretrained('distilbert-base-uncased', activations=True)

Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertModel: ['vocab_transform.bias', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Ecco is a library that provides interactive visualizations for some well-known analysis methods for large language models.
It has a few pretrained models. For this project I'll be using the "distilbert-base-uncased" model, since I can easily run inference with it locally, but the .explain() method works for all the models in ecco.
Below I've provided an example of the explore method; Jay Alammar provides more in-depth examples here: https://jalammar.github.io/explaining-transformers/

In [2]:
text = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18"
inputs = lm.tokenizer([text], return_tensors="pt")
output = lm(inputs)
nnmf_example1 = output.run_nmf(n_components=2)
nnmf_example1.explore()

<IPython.core.display.Javascript object>

By hovering over each "factor", you can see how strongly each token responded to that specific factor. In the example above, factor 1 seemed to respond to the commas, factor 2 to the [SEP] token and factor 3 to the numbers of the simple sequence.
But what if this explanation step could be automated?
I tried using OpenAI's large language models for this - an example can be seen below:

In [9]:
nnmf_example1.explain()

['\nThis cluster responds to the last token in the sequence, which appears to mark the end of the input.',
 '\nThis factor responds to the [CLS] token at the beginning and the [SEP] token at the end of the sequence, but no other intuitive connection between the tokens was found.']

The NNMF provides a value for how much each token relates to each factor - this value is used by ECCO to colour the tokens above.
I turned these values into text by masking the original input sequence with the NNMF values: every token whose value falls under some threshold is masked.
I experimented with several different methods for finding a good masking threshold, and 0.01 generally provided reasonable results.
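
To make the masking step concrete, here is a minimal sketch of how it could look. The function name `mask_tokens`, the `_` mask symbol and the activation values are made up for illustration; this is not the actual code behind .explain(), just the general idea under those assumptions.

```python
def mask_tokens(tokens, activations, threshold=0.01, mask="_"):
    """Replace every token whose activation is below `threshold` with `mask`."""
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    return [tok if act >= threshold else mask for tok, act in zip(tokens, activations)]


# Hypothetical activations of one factor over a short counting sequence.
tokens = ["[CLS]", "1", ",", "2", ",", "3", "[SEP]"]
factor_activations = [0.0, 0.002, 0.31, 0.001, 0.27, 0.004, 0.0]

# Only the tokens the factor responds to stay readable.
masked_text = " ".join(mask_tokens(tokens, factor_activations))
print(masked_text)
```

With these made-up values only the commas survive the masking, which is the kind of text the LLM is then asked to summarize.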

## Description of .explain() method

The explain method mainly uses two techniques to guide/ground the model when it writes the summaries!
The first technique is known as in-context learning (Dong et al., 2022) - providing the model with a few examples of what you want it to do (and how to respond if it can't)!
In the "promp.py" file, I've written the prompt used by the explain method.
However, I quickly ran into trouble, since I wanted the model to have examples of different types of input.
I wanted it to provide appropriate summaries of snippets of code, as well as poems.
Because of this, I quickly ran out of space, since the maximum context length of the model I was using was 4097 tokens!
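
As a rough illustration of in-context learning, a few-shot prompt could be assembled like this. The helper `build_few_shot_prompt`, the instruction text and the example pairs are all hypothetical and only show the shape of such a prompt, including one example of how to respond when no pattern is found; the real prompt lives in the prompt file mentioned above.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked (input, explanation) pairs, and the new input."""
    parts = [instruction.strip(), ""]
    for masked_input, explanation in examples:
        parts.append(f"Input: {masked_input}")
        parts.append(f"Explanation: {explanation}")
        parts.append("")
    # The prompt ends with an open "Explanation:" for the model to complete.
    parts.append(f"Input: {query}")
    parts.append("Explanation:")
    return "\n".join(parts)


# Hypothetical worked examples; the second shows the "can't explain it" case.
examples = [
    ("_ _ , _ , _ _", "This factor responds to the commas separating the numbers."),
    ("_ the _ cat _", "No intuitive connection between the tokens was found."),
]
prompt = build_few_shot_prompt(
    "Describe what the unmasked tokens have in common.",
    examples,
    "[CLS] _ _ _ [SEP]",
)
print(prompt)
```

Every extra example makes the prompt longer, which is exactly why the context limit becomes a problem once several input types need their own examples.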

A common way of addressing this issue is to employ the second technique: indexing.
By embedding each of my example prompts + explanations in a vector space, I am able to pull out the most similar examples (those closest to the input in this embedding space), and use these specific examples to ground the model.
This way, it's possible to have a large number of highly specific instructions, from which the model can extract the most relevant information.
I manually analyzed some examples and embedded them using OpenAI's embedding model, "text-embedding-ada-002". I then wrote a customized search class (in embedding_searcher.py) that finds the most relevant examples, and a method that adds these examples until no more fit in the prompt.
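
The retrieval idea can be sketched roughly as follows. This is not the code from embedding_searcher.py: the function names, the toy 3-dimensional vectors and the word-count "tokenizer" are assumptions for illustration, and I'm assuming cosine similarity as the distance measure and that packing stops at the first example that doesn't fit.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_examples(query_vec, examples, token_budget):
    """Rank (text, embedding) pairs by similarity, then add them while they fit."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vec, ex[1]),
        reverse=True,
    )
    chosen, used = [], 0
    for text, _ in ranked:
        cost = count_tokens(text)
        if used + cost > token_budget:
            break  # assumption: stop at the first example that no longer fits
        chosen.append(text)
        used += cost
    return chosen


# Toy embeddings; real ones would come from an embedding model.
examples = [
    ("poem example: responds to rhyming words", [0.9, 0.1, 0.0]),
    ("code example: responds to brackets", [0.0, 0.2, 0.9]),
    ("counting example: responds to commas", [0.1, 0.9, 0.1]),
]
query_vec = [0.8, 0.3, 0.0]
selected = select_examples(query_vec, examples, token_budget=11)
print(selected)
```

Because the examples are ranked before they are packed, a larger pool of examples or a larger token budget only changes how many of the closest matches end up in the prompt.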

In summary, this is how .explain() (sort of) works - by creating some additional "factual" information that the LLM can use, as well as a way of "choosing" from this pool of specialized knowledge.
By combining this specialized knowledge with the model's huge background training, I hoped to automate a bit of explainability without having to fine-tune!

## .explain() yourself!

As an example, I'll let GPT-3.5 analyze this short poem written by GPT-3.5.


In [4]:
poem = """
A robot small yet smart and bright,
With features that delight the sight,
Anki Vector's the name to know,
A friend that's more than just a show.

He rolls around and scans the room,
With sensors that dispel the gloom,
He recognizes faces and can hear,
Your voice and commands he holds dear.

A charming bot that loves to play,
And keep you company all day,
Anki Vector, oh how we adore,
A companion we can't ignore. 
"""

Since this poem is much more complex than the counting sequence, let's look only at the final layer, and let's assume that more components might be appropriate:

In [12]:
inputs = lm.tokenizer([poem], return_tensors="pt")
output = lm(inputs)
nnmf_example2 = output.run_nmf(n_components=3, from_layer=5, to_layer=6)
nnmf_example2.explore()

<IPython.core.display.Javascript object>

In [14]:
nnmf_example2.explain()

[' This cluster responds mainly to verbs like "rolls", "scans", "recognizes" and "holds" which are associated with the robot\'s actions. Additionally, it responds to possessive pronouns like "\'s" which link the robot to its attributes.',
 '\nThis factor responds to the [CLS], [SEP] and "an ##ki vector"\'s tokens more often than others, but does not show a clear connection between other tokens.',
 '\nThis cluster responds to very few tokens, consisting of the [CLS] token and the [SEP] token, as well as conjunctions like "yet", "and" or "but".']

The results above have the clusters jumbled, which seems to be caused by the ordering in a list comprehension I used (I'm changing it to fix this):
- Factor 1 = cluster 3
- Factor 2 = cluster 1

These results are not great, but the project was interesting!
I think more examples would help guide it! And since I wrote the method so that it just keeps adding examples until there are no more tokens left, this technique should scale well to a larger database of examples, as well as to a model with a longer context length.
I think it serves well as a proof of concept, and the idea of using LLMs to analyse LLMs is super exciting.

## Improvements (TODOs)

Of course these results could be improved!
Some low-hanging fruits are:
- Figure out where the order of the clusters gets jumbled!
- More (and higher quality) explanation examples.
- A more complex masking-threshold selection (right now, the masking does not take into account that tokens above the threshold don't all have identical values)
- Access to models with longer context-length (GPT-4 has double the context length!)
- Exploration of different indexing/search methods like lexical or graph-based approaches

## References

- Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., ... & Sui, Z. (2022). A Survey for In-context Learning. arXiv preprint arXiv:2301.00234.