# Basic Examples of LM-Polygraph Usage for Visual LLMs

This notebook shows basic examples of obtaining uncertainty scores, together with the generated text, for visual LLMs using the high-level API function:

```estimate_uncertainty(model, estimator, input_text=input_text, image=url)```

## Install Dependencies

In [1]:
# This notebook assumes lm-polygraph is already installed. If it is not, run:
# pip install git+https://github.com/artemshelmanov/lm-polygraph.git

!python -m spacy download en_core_web_sm

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


## Basic Imports

In [2]:
%load_ext autoreload
%autoreload 2

from transformers import AutoModelForVision2Seq, AutoProcessor
from lm_polygraph.model_adapters.visual_whitebox_model import VisualWhiteboxModel
from lm_polygraph import estimate_uncertainty
from lm_polygraph.estimators import MaximumTokenProbability, MaximumSequenceProbability, SemanticEntropy, EigValLaplacian



## UQ for Whitebox LLMs

### Initialize model

In [3]:
base_model = AutoModelForVision2Seq.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

# Create whitebox model
model = VisualWhiteboxModel(base_model, processor)

# Text prompt and image URL used in the examples below
input_text = ["<grounding>An image of"]
url = "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.png"

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


### Sequence-level UQ for a Whitebox LLM

In [4]:
estimator = MaximumSequenceProbability()
estimate_uncertainty(model, estimator, input_text=input_text, image=url)

`generation_config` default values have been modified to match model-specific defaults: {'no_repeat_ngram_size': 3, 'pad_token_id': 1, 'bos_token_id': 0, 'eos_token_id': 2}. If this is not desired, please set these values explicitly.


UncertaintyOutput(uncertainty=13.223609924316406, input_text=['<grounding>An image of'], generation_text='Snowman in<phrase> a hat</phrase><object><patch_index_0145><patch_index_0246></object> in the snow', generation_tokens=[6709, 581, 12, 64007, 10, 3958, 64008, 64009, 64158, 64259, 64010, 12, 5, 1842], model_path=None, estimator='MaximumSequenceProbability')
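To make the score above easier to read, here is a minimal sketch of a sequence-probability style score. It assumes (the notebook does not confirm this) that the estimator reports the negative log-likelihood of the generated sequence, so higher values mean a less likely, more uncertain generation. The helper `sequence_nll` and the token probabilities are hypothetical and are not part of LM-Polygraph.

```python
import math


def sequence_nll(token_probs):
    """Negative log-likelihood of a sequence given per-token probabilities."""
    return -sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities, not taken from the model above.
confident = [0.9, 0.95, 0.99]
hesitant = [0.3, 0.5, 0.2]

print("confident:", sequence_nll(confident))
print("hesitant: ", sequence_nll(hesitant))
```

Under this assumption the score grows with both the number of tokens and how unlikely each token is, so scores are easiest to compare across generations of similar length.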

In [5]:
# This example takes about 3 minutes to run.

estimator = SemanticEntropy()
estimate_uncertainty(model, estimator, input_text=input_text, image=url)

Some weights of the model checkpoint at microsoft/deberta-large-mnli were not used when initializing DebertaForSequenceClassification: ['config']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Keyword argument `return_dict` is not a valid argument for this processor and will be ignored.


UncertaintyOutput(uncertainty=41.580437001058634, input_text=['<grounding>An image of'], generation_text='Snowman in<phrase> a hat</phrase><object><patch_index_0145><patch_index_0246></object> in the snow', generation_tokens=[6709, 581, 12, 64007, 10, 3958, 64008, 64009, 64158, 64259, 64010, 12, 5, 1842], model_path=None, estimator='SemanticEntropy')
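The log above shows an NLI model (`microsoft/deberta-large-mnli`) being loaded, which suggests that sampled generations are compared by meaning. The sketch below illustrates the general idea behind semantic entropy in its simplest form: group sampled answers into meaning clusters, then take the entropy of the cluster frequencies. It is a conceptual illustration only, not LM-Polygraph's implementation. The cluster labels are hypothetical, and the real estimator may weight clusters by sequence likelihoods and report values on a different scale.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids):
    """Entropy of the empirical distribution over meaning clusters."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical cluster labels for five sampled captions.
# Samples that share a label are assumed to mean the same thing.
all_agree = [0, 0, 0, 0, 0]
mostly_disagree = [0, 1, 2, 0, 3]

print("all agree:      ", cluster_entropy(all_agree))
print("mostly disagree:", cluster_entropy(mostly_disagree))
```

When every sample falls into one cluster the entropy is zero. The more the samples spread across distinct meanings, the higher the entropy.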

### Token-level UQ for a Whitebox LLM

In [6]:
estimator = MaximumTokenProbability()
estimate_uncertainty(model, estimator, input_text=input_text, image=url)

UncertaintyOutput(uncertainty=array([-0.2788393 , -0.9490852 , -0.39342225, -0.2518251 , -0.37809426,
       -0.47034308, -0.9533952 , -1.        , -0.21120858, -0.20057407,
       -0.9998684 , -0.17277305, -0.4382389 , -0.31373882], dtype=float32), input_text=['<grounding>An image of'], generation_text='Snowman in<phrase> a hat</phrase><object><patch_index_0145><patch_index_0246></object> in the snow', generation_tokens=[6709, 581, 12, 64007, 10, 3958, 64008, 64009, 64158, 64259, 64010, 12, 5, 1842], model_path=None, estimator='MaximumTokenProbability')
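The token-level output is an array with one score per generated token. The sketch below pairs the scores with the token IDs from the output above and lists the tokens the model was least sure about. It assumes each score is the negated probability of the generated token, which the [-1, 0] range suggests but the notebook does not state.

```python
# Scores and token IDs copied from the output above.
scores = [
    -0.2788393, -0.9490852, -0.39342225, -0.2518251, -0.37809426,
    -0.47034308, -0.9533952, -1.0, -0.21120858, -0.20057407,
    -0.9998684, -0.17277305, -0.4382389, -0.31373882,
]
token_ids = [6709, 581, 12, 64007, 10, 3958, 64008, 64009,
             64158, 64259, 64010, 12, 5, 1842]

# Assumption: score == -(token probability), so negate to recover probabilities.
probs = [-s for s in scores]

# Sort (token_id, probability) pairs from least to most probable.
ranked = sorted(zip(token_ids, probs), key=lambda pair: pair[1])

for token_id, p in ranked[:3]:
    print(f"token id {token_id}: probability {p:.3f}")
```

Under that reading, scores close to -1 mark tokens the model generated with near certainty, and scores close to 0 mark the least certain tokens.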