## Introduction
This notebook runs the processing steps one at a time for several models and renders the output. Each section can be run on its own after a kernel restart.

## Observations
* Symbolic tracing failed on every BERT model tried, because it creates proxies for mutually exclusive inputs, e.g. those of `DistilBertModel.forward`
  * This was addressed by exposing the `concrete_args` argument of `fx.symbolic_trace` through the `MAV` and `MavTracer` objects
  * For BERT models, `concrete_args={'inputs_embeds':None}` gets around this issue
* Even so, most NLP models use proxy variables in control flow, which `torch.fx` does not support, so tracing falls back to `torch.compile` (see the cell outputs)
  * Fixing more arguments via `concrete_args` might work around this. To be investigated.
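The two failure modes above can be illustrated without `torch` at all. The following is a toy sketch, not the real `torch.fx` machinery: `ToyProxy`, `toy_trace`, `forward` and `branchy` are hypothetical names invented for this illustration. It shows that a proxy is never `None` (so a "pass exactly one of these" check fails when every argument is a proxy), that pinning one argument to `None` avoids this, and that any `if` on a proxy still fails.

```python
class ToyProxy:
    """Toy stand-in for a symbolically traced value (NOT torch.fx.Proxy)."""

    def __init__(self, name):
        self.name = name

    def __bool__(self):
        # A traced value has no concrete truth value at trace time.
        raise TypeError(
            f"symbolically traced variable '{self.name}' "
            "cannot be used as input to control flow"
        )


def forward(input_ids=None, inputs_embeds=None):
    """Mimics a mutually exclusive input check in a BERT-style forward."""
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("specify either input_ids or inputs_embeds, not both")
    if input_ids is None and inputs_embeds is None:
        raise ValueError("specify input_ids or inputs_embeds")
    return "ids path" if input_ids is not None else "embeds path"


def branchy(x):
    """Branches on the value itself, which a proxy cannot support."""
    return 1 if x else 0


def toy_trace(fn, arg_names, concrete_args=None):
    """Call fn with a ToyProxy per argument, except those pinned in concrete_args."""
    concrete_args = concrete_args or {}
    kwargs = {
        n: concrete_args[n] if n in concrete_args else ToyProxy(n)
        for n in arg_names
    }
    try:
        return fn(**kwargs)
    except (TypeError, ValueError) as exc:
        return f"trace failed: {exc}"


names = ["input_ids", "inputs_embeds"]
all_proxies = toy_trace(forward, names)  # both arguments look "provided"
pinned = toy_trace(forward, names, concrete_args={"inputs_embeds": None})
control_flow = toy_trace(branchy, ["x"])  # pinning other arguments cannot help here

print(all_proxies)
print(pinned)
print(control_flow)
```

Pinning an argument only helps when the branch depends on that argument; a branch on a value that must stay symbolic still needs a different tracing approach.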

## File system size checks

In [8]:
!df -BG /dev/sdc

Filesystem     1G-blocks  Used Available Use% Mounted on
/dev/sdc            251G  150G       89G  63% /


In [6]:
!du -md1 ~/.cache/huggingface/hub

3719	/home/dev/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GPTQ
149	/home/dev/.cache/huggingface/hub/models--openai--whisper-tiny
331	/home/dev/.cache/huggingface/hub/models--timm--vit_base_patch16_224.augreg2_in21k_ft_in1k
528	/home/dev/.cache/huggingface/hub/models--timm--vgg16.tv_in1k
320	/home/dev/.cache/huggingface/hub/models--stabilityai--sd-vae-ft-ema
6214	/home/dev/.cache/huggingface/hub/models--CompVis--stable-diffusion-v1-4
1	/home/dev/.cache/huggingface/hub/.locks
233	/home/dev/.cache/huggingface/hub/models--t5-small
752	/home/dev/.cache/huggingface/hub/models--timm--swin_large_patch4_window7_224.ms_in22k_ft_in1k
18	/home/dev/.cache/huggingface/hub/models--google--bert_uncased_L-2_H-128_A-2
62	/home/dev/.cache/huggingface/hub/models--timm--resnet26d.bt_in1k
1635	/home/dev/.cache/huggingface/hub/models--openai--clip-vit-large-patch14
1	/home/dev/.cache/huggingface/hub/models--EleutherAI--gpt-neox-20b
420	/home/dev/.cache/huggingface/hub/models--timm--convnext_bas
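The shell checks above can also be done from Python using only the standard library. This is a convenience sketch, not part of the original notebook: `dir_size_mib` and `report` are hypothetical helper names. It sums apparent file sizes rather than the disk blocks that `du` counts, so the numbers may differ slightly from the listing above.

```python
import shutil
from pathlib import Path


def dir_size_mib(path: Path) -> float:
    """Sum the sizes of regular files under path in MiB, skipping symlinked files."""
    total = 0
    for p in path.rglob("*"):
        # Skipping symlinks avoids counting a linked file twice.
        if p.is_file() and not p.is_symlink():
            total += p.stat().st_size
    return total / 2**20


def report(cache_dir: Path) -> dict:
    """Return per-subdirectory sizes in MiB, largest first."""
    sizes = {d.name: dir_size_mib(d) for d in cache_dir.iterdir() if d.is_dir()}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


# Free space on the root file system (roughly what `df` reports).
usage = shutil.disk_usage("/")
print(f"free: {usage.free / 2**30:.0f} GiB of {usage.total / 2**30:.0f} GiB")

# Per-model cache sizes (roughly what `du -md1` reports).
hub = Path.home() / ".cache" / "huggingface" / "hub"
if hub.is_dir():
    for name, mib in report(hub).items():
        print(f"{mib:8.0f}\t{name}")
```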

## DistilBERT

In [5]:
import sys
sys.path.append('..')
from transformers import DistilBertModel, DistilBertTokenizer
import torch
from idlmav import MAV, plotly_renderer

model = DistilBertModel.from_pretrained("distilbert-base-uncased")
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model.eval()
inputs = tokenizer("Hello world", return_tensors="pt")
device = 'cpu'

mav = MAV(model, inputs, concrete_args={'inputs_embeds':None})
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile



## T5-small encoder

In [4]:
import sys
sys.path.append('..')
from transformers import T5Model, T5Tokenizer
import torch
from idlmav import MAV, plotly_renderer

model = T5Model.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model.eval()
inputs = tokenizer("translate English to French: Hello, how are you?", return_tensors="pt")
device = 'cpu'

mav = MAV(model.encoder, inputs, device=device, concrete_args={'inputs_embeds':None})
with plotly_renderer('notebook_connected'): mav.show_figure()

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


Tracing failed with torch.fx.symbolic_trace: finfo(): argument 'type' (position 1) must be torch.dtype, not Attribute
Tracing with torch.compile



Forward pass failed. Last successful node: "relative_buckets_1:iadd". Possible error node: "values:embedding".: index out of range in self

While executing %values : [num_users=1] = call_function[target=torch.nn.functional.embedding](args = (%relative_buckets_1, %l_self_modules_block_modules_0_modules_layer_modules_0_modules_self_attention_modules_relative_attention_bias_parameters_weight_, None, None, 2.0, False, False), kwargs = {})
Original traceback:
  File "/home/dev/ai/idlmav/.venv/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 1124, in forward
    layer_outputs = layer_module(
  File "/home/dev/ai/idlmav/.venv/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 675, in forward
    self_attention_outputs = self.layer[0](
  File "/home/dev/ai/idlmav/.venv/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py", line 593, in forward
    attention_output = self.SelfAttention(
  File "/home/dev/ai/idlmav/.venv/lib/python3.

## BERT mini

In [4]:
mav.show_widget(add_overview=True)

HBox(children=(Box(children=(FloatRangeSlider(value=(-9.5, 0.5), layout=Layout(height='400px'), max=0.5, min=-…

In [3]:
import sys
sys.path.append('..')
from transformers import BertModel, BertTokenizer
import torch
from idlmav import MAV, plotly_renderer

model = BertModel.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
tokenizer = BertTokenizer.from_pretrained("google/bert_uncased_L-2_H-128_A-2")
model.eval()
inputs = tokenizer("This is a test sentence.", return_tensors="pt")
device = 'cpu'

mav = MAV(model, inputs, concrete_args={'inputs_embeds':None})
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile


## ALBERT Lite

In [2]:
import sys
sys.path.append('..')
from transformers import AlbertModel, AlbertTokenizer
import torch
from idlmav import MAV, plotly_renderer

model = AlbertModel.from_pretrained("albert-base-v2")
tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model.eval()
inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
device = 'cpu'

mav = MAV(model, inputs, concrete_args={'inputs_embeds':None})
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile
Total nodes: 372. Input nodes: 19. Output nodes: 1. Largest level nodes: 8


## ModernBERT

In [None]:
import sys
sys.path.append('..')
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch
from idlmav import MAV, plotly_renderer, MavTracer

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
device = 'cpu'

mav = MAV(model, inputs, concrete_args={'inputs_embeds':None})
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile


INFO:2025-02-14 14:07:08 16178:16178 init.cpp:181] If you see CUPTI_ERROR_INSUFFICIENT_PRIVILEGES, refer to https://developer.nvidia.com/nvidia-development-tools-solutions-err-nvgpuctrperm-cupti


In [2]:
with plotly_renderer('notebook_connected'): mav.show_widget()

HBox(children=(Box(children=(FloatRangeSlider(value=(-9.5, 0.5), layout=Layout(height='400px'), max=0.5, min=-…

In [3]:
mav = MAV(model, inputs, concrete_args={'inputs_embeds':None}, show_param_nodes=True)
with plotly_renderer('notebook_connected'): mav.show_figure()

Tracing failed with torch.fx.symbolic_trace: symbolically traced variables cannot be used as inputs to control flow
Tracing with torch.compile
Total nodes: 1420. Input nodes: 163. Output nodes: 1. Largest level nodes: 6


In [4]:
with plotly_renderer('notebook_connected'): mav.show_widget()

HBox(children=(Box(children=(FloatRangeSlider(value=(-9.5, 0.5), layout=Layout(height='400px'), max=0.5, min=-…


## Mistral-7B
* Gated model: accept terms and conditions [here](https://huggingface.co/mistralai/Mistral-7B-v0.3)
* TODO: Run on better hardware
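As a rough guide to what "better hardware" might mean, the memory needed just to hold the weights can be estimated as parameter count times bytes per parameter. This is a back-of-the-envelope sketch: the 7e9 parameter count is an assumption read off the "7B" in the model name, `weight_memory_gib` is a hypothetical helper, and activations, the KV cache and loading overhead are ignored. The notebook does not record why the load stopped, so memory is only one plausible factor.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "float32") -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


N_PARAMS = 7e9  # assumption: rough parameter count implied by "7B"

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(N_PARAMS, dtype):5.1f} GiB")
```

If the estimate for the dtype in use exceeds the available RAM, loading in a smaller dtype or on a larger machine would be the obvious things to try.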

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
i=0; print(i)
tokenizer = AutoTokenizer.from_pretrained(model_id, device_map={"": "cpu"})
i+=1; print(i)

model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": "cpu"})
i+=1; print(i)
inputs = tokenizer("Hello my name is", return_tensors="pt")
i+=1; print(i)

outputs = model.generate(**inputs, max_new_tokens=20)
i+=1; print(i)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

0
1


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]


## Others under consideration

### MobileLLM-350M

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-350M", use_fast=False)
model = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-350M", trust_remote_code=True)
